Building a toy Python Enum class - Part I

Marton Trencseni - Tue 03 May 2022 - Python

Introduction

Enumerations, or enums for short, are part of the core language in traditional statically typed languages like C, C++ and Java. For example, in C++ we can write:

enum Color { RED, GREEN, BLUE };
Color r = RED;

switch(r)
{
    case RED  : std::cout << "red\n";   break;
    case GREEN: std::cout << "green\n"; break;
    case BLUE : std::cout << "blue\n";  break;
}

When the C++ compiler sees an enum definition, it picks an underlying integer type wide enough to hold the possible values (in the above case there are 3 possible values, so 2 bits would be enough, but the compiler allocates at least 1 byte, usually 4 bytes). Since C++ is statically typed, if we accidentally write a case statement like case PURPLE : std::cout << "purple\n"; break; we get an error from the compiler, since it knows that PURPLE is not a valid value of Color, as defined above.

Python is a very different language from C++: unlike, say, tuples, enumerations are not part of the core Python language. In Python, Enums (upper case) are part of the standard library and are implemented in Python itself, using classes in a tricky way. For example, here is enum.py (link) from cpython, and on line 1077 you can read the (very tricky) implementation of Enum. It looks something like:

class Enum(metaclass=EnumType):
    ...

As a former C++ programmer, I find this weird, cool and intriguing. So I decided to practice my Python Fu and write my own toy implementation of Python's Enum class, specifically for int values. Emphasis on "toy": the full source code for enum.py linked above is 2018 lines of code (!), which also includes related classes such as IntEnum, StrEnum, Flag, etc.

The IPython notebook is up on GitHub.

Enumerations in Python

In Python, to use the standard library Enum we first have to import it from the enum module:

from enum import Enum

class Color(Enum):
    RED: int = 1
    GREEN: int = 2
    BLUE: int = 3

Alternatively, we can skip giving values by hand, and use the auto() magic function:

from enum import Enum, auto

class Color(Enum):
    RED: int = auto()
    GREEN: int = auto()
    BLUE: int = auto()

The magic auto() function starts numbering at 1, so the above two are equivalent.
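To check this claim, here is a minimal sketch that compares the values the standard library assigns in the two cases. The class names ManualColor and AutoColor are made up for this comparison, so that both definitions can live side by side:

```python
from enum import Enum, auto

class ManualColor(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

class AutoColor(Enum):
    RED = auto()
    GREEN = auto()
    BLUE = auto()

# Collect the underlying values of each enum's members, in definition order.
manual_values = [member.value for member in ManualColor]
auto_values = [member.value for member in AutoColor]

print(manual_values == auto_values)  # True
```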

There are various ways to get hold of an enum member:

c1 = Color(1)
c2 = Color(Color.RED)
c3 = Color(Color['RED'])
print(type(Color), type(c1), type(Color.RED), type(Color['RED']))

Output:

<class 'enum.EnumMeta'> <enum 'Color'> <enum 'Color'> <enum 'Color'>

Equality tests:

Color(1) == 1             # False
Color.RED == Color(1)     # True
Color.RED == Color['RED'] # True
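The same comparisons can be written as assertions. This sketch also uses the .value attribute and the is operator, which the snippets above do not; with the standard library Enum, looking a member up by value or by name gives back the very same object:

```python
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

# A member is not equal to its underlying int value...
assert Color(1) != 1
# ...but the value is available through the .value attribute.
assert Color.RED.value == 1
# Lookup by value and lookup by name both return the same member object.
assert Color(1) is Color.RED
assert Color['RED'] is Color.RED

print("all comparisons hold")
```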

An Enum itself (not an instance) has a useful iteration interface (this sort of convenience does not exist in C++):

print(len(Color)) # 3
# can iterate:
for c in Color:
    print(c)

Output:

3
Color.RED
Color.GREEN
Color.BLUE

Examples of things that don't work with enums:

Color(4)      # ValueError: 4 is not a valid Color
Color('RED')  # ValueError: 'RED' is not a valid Color
c1 = Color(1) # okay
c1 += 1       # TypeError: unsupported operand type(s) for +=: 'Color' and 'int'
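To see these failures without stopping a script, we can wrap each call in try/except. The helper describe_failure below is a hypothetical name introduced just for this sketch; it is not part of the enum module:

```python
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

def describe_failure(action):
    """Run action() and return the name of the exception raised, or None."""
    try:
        action()
    except (ValueError, TypeError) as exc:
        return type(exc).__name__
    return None

print(describe_failure(lambda: Color(4)))      # ValueError
print(describe_failure(lambda: Color('RED')))  # ValueError
print(describe_failure(lambda: Color(1) + 1))  # TypeError
print(describe_failure(lambda: Color['RED']))  # None
```

The second and fourth lines show the difference between the two lookups: call syntax looks a member up by value, while square brackets look it up by name.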

Metaclasses in Python

So, as an exercise in Python meta-programming, let's write a simple Enum class that accomplishes the above. In the code above, notice that type(Color) is <class 'enum.EnumMeta'>; this is a big clue. Python has a feature called metaclasses, which let us construct classes the way we construct objects. In Python, classes are in fact just objects, by default of type type:

class Color():     # not deriving from Enum
    RED: int = 1
    GREEN: int = 2
    BLUE: int = 3

print(type(Color))

Output:

<class 'type'>
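Since a class is an object of type type, we can also build one by calling type() directly, without a class statement. This is a small sketch of that idea using the three-argument form of the built-in type(); it leaves out the int annotations used above:

```python
# Build a class by calling type() with a name, a tuple of base classes
# and a dictionary of class attributes.
Color = type('Color', (), {'RED': 1, 'GREEN': 2, 'BLUE': 3})

print(type(Color))              # <class 'type'>
print(Color.RED)                # 1
print(isinstance(Color, type))  # True
```

A metaclass is what we get when we replace type in this picture with our own subclass of type.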

But then, when we printed the type for Color(Enum), we got <class 'enum.EnumMeta'>, so what's going on? Let's check:

from enum import EnumMeta, Enum, auto

class Color(Enum):
    RED: int = 1
    GREEN: int = 2
    BLUE: int = 3

print(type(Color), type(Enum), type(EnumMeta))

Output:

<class 'enum.EnumMeta'> <class 'enum.EnumMeta'> <class 'type'>

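So both Enum and Color are of type EnumMeta, and EnumMeta itself is of type type. (In Python 3.11 and later the metaclass is named EnumType, with EnumMeta kept as an alias, so there the output shows enum.EnumType instead.) We will follow this pattern in our own implementation of Enum.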

In Python, with metaclasses, we can create classes whose type is our own metaclass, and we can have "constructor" code run when the class is defined (not when instances are created). Let's see an example:

class Enum(type):
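    def __new__(metacls, cls, bases, classdict, **kwds):
        print(f'Defining new Enum type {cls}:')
        print(f'- metacls = {metacls}')
        print(f'- bases = {bases}')
        print(f'- classdict = {classdict}')
        # actually create and return the class; without this, Color would be None
        return super().__new__(metacls, cls, bases, classdict, **kwds)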

class Color(metaclass=Enum): # specifying our own Enum metaclass
    RED: int = 1
    GREEN: int = 2
    BLUE: int = 3

Output:

Defining new Enum type Color:
- metacls = <class '__main__.Enum'>
- bases = ()
- classdict = {'__module__': '__main__', '__qualname__': 'Color',
  '__annotations__': {'RED': <class 'int'>, 'GREEN': <class 'int'>,
  'BLUE': <class 'int'>}, 'RED': 1, 'GREEN': 2, 'BLUE': 3}

Note that the code in __new__() ran when we declared Color! We never created an instance of Color in the snippet above! This is where we start our toy implementation, and this is also how the standard library Enum works: by constructing a special class when a class deriving from Enum is declared. Here Color does not actually derive from Enum, it uses Enum directly as its metaclass; we will fix this later.
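To make the timing explicit, here is a minimal sketch of a metaclass that records every class defined with it. RecordingMeta, defined_classes and Shape are hypothetical names used only for this illustration:

```python
defined_classes = []  # filled in at class definition time

class RecordingMeta(type):
    def __new__(metacls, name, bases, classdict, **kwds):
        new_class = super().__new__(metacls, name, bases, classdict, **kwds)
        # Runs once per class statement that uses this metaclass.
        defined_classes.append(name)
        return new_class

class Color(metaclass=RecordingMeta):
    RED = 1

class Shape(metaclass=RecordingMeta):
    CIRCLE = 1

# No instances were created, yet the metaclass has already run twice.
print(defined_classes)              # ['Color', 'Shape']
print(type(Color) is RecordingMeta) # True
```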

Basic Enum functionality

The example above was just an illustration, a Color defined like this is not useful. To accomplish the standard library functionality, we will use a chain of classes similar to enum.py:

class EnumMeta(type):
    ...

class Enum(metaclass=EnumMeta):
    ...

class Color(Enum):
    ...

This way, when we define an Enum like Color, it derives from Enum, whose metaclass is EnumMeta, so we can use both the inheritance and the metaclass features of Python. As a first order of business, in EnumMeta's __new__(), let's go through the enumerations defined by the user and save them into a dictionary. As seen in the output above, these are available in the passed-in classdict object:

class EnumMeta(type):
    def __new__(metacls, cls, bases, classdict, **kwds):
        enumerations = {x: y for x, y in classdict.items() if not x.startswith('__')}
        enum = super().__new__(metacls, cls, bases, classdict, **kwds)
        enum._enumerations = enumerations
        return enum

class Enum(metaclass=EnumMeta):
    pass

class Color(Enum): # deriving from our own Enum base class
    RED: int = 1
    GREEN: int = 2
    BLUE: int = 3

Now let's check how this works:

print(Color._enumerations)                     # {'RED': 1, 'GREEN': 2, 'BLUE': 3}
print(type(Color), type(Enum), type(EnumMeta)) # ...
Color(1)                                       # TypeError: Color() takes no arguments

The last call to the Color() constructor fails, because the default constructor inherited from object does not take arguments. Let's fix this in the Enum base class:

class Enum(metaclass=EnumMeta):
    def __init__(self, value):
        # make sure the passed in value is a valid enumeration value
        if value not in self.__class__._enumerations.values():
            raise ValueError(f'{value} is not a valid {self.__class__.__name__}')
        # save the actual enumeration value
        for k, v in self.__class__._enumerations.items():
            if v == value:
                self.__key = k
                self.__value = v

Now we can try again:

Color(1) # okay
Color(4) # ValueError: 4 is not a valid Color
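One detail worth knowing about the constructor above: attribute names that start with two underscores are name-mangled by Python inside a class body, so self.__key is actually stored under the name _Enum__key. The sketch below repeats the toy classes so far (without the int annotations) and inspects an instance:

```python
class EnumMeta(type):
    def __new__(metacls, cls, bases, classdict, **kwds):
        enumerations = {x: y for x, y in classdict.items() if not x.startswith('__')}
        enum = super().__new__(metacls, cls, bases, classdict, **kwds)
        enum._enumerations = enumerations
        return enum

class Enum(metaclass=EnumMeta):
    def __init__(self, value):
        if value not in self.__class__._enumerations.values():
            raise ValueError(f'{value} is not a valid {self.__class__.__name__}')
        for k, v in self.__class__._enumerations.items():
            if v == value:
                self.__key = k
                self.__value = v

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

c = Color(2)
# Inside class Enum, self.__key is rewritten to self._Enum__key.
print(vars(c))  # {'_Enum__key': 'GREEN', '_Enum__value': 2}
```

This keeps the two attributes out of the way of user code, at the cost of slightly surprising names when debugging.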

We have provided our own constructor, which accepts only the values defined in the user's Enum. Next, we can pick some low-hanging fruit and get the string and iteration related functionality working:

class EnumMeta(type):
    ...

    def __len__(cls):
        return len(cls._enumerations)

    def __iter__(cls):
        return (cls(value) for value in cls._enumerations.values())

class Enum(metaclass=EnumMeta):
    ...

    def __str__(self):
        return "%s.%s" % (self.__class__.__name__, self.__key)

print(Color(1))
print(len(Color))
for c in Color:
    print(c)

Output:

Color.RED
3
Color.RED
Color.GREEN
Color.BLUE
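Note that __len__() and __iter__() went on the metaclass, not on Enum. The reason is that len(Color) looks up __len__ on the type of Color, which is the metaclass. Here is a minimal sketch of the difference; CountingMeta, WithMeta and WithoutMeta are hypothetical names, and the hard-coded 3 is just a stand-in:

```python
class CountingMeta(type):
    def __len__(cls):
        # Used for len(SomeClass): special methods are looked up
        # on the type of the object, and the type of a class is its metaclass.
        return 3

class WithMeta(metaclass=CountingMeta):
    pass

class WithoutMeta:
    def __len__(self):
        # Only applies to instances, not to the class itself.
        return 3

print(len(WithMeta))       # 3
print(len(WithoutMeta()))  # 3

try:
    len(WithoutMeta)
except TypeError as exc:
    print('TypeError:', exc)
```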

Adding auto()

An easy feature to add is the magic auto() function, which lets us avoid writing out int values by hand and numbers the enumerations automatically in the class definition. auto() is just a function that runs when the class is defined, so it needs to return something. The results can then be "cleaned up" in the metaclass's __new__() function. We could use None or -999 as the auto() return value, but that would conflict with the user using that value in their own Enums, so let's create a class just for this:

class _Auto():
    pass

def auto():
    return _Auto()

Then:

class EnumMeta(type):
    def __new__(metacls, cls, bases, classdict, **kwds):
        enumerations = {x: y for x, y in classdict.items() if not x.startswith('__')}
        # handle auto()
        next_value = 1
        for k, v in enumerations.items():
            if not isinstance(v, _Auto): # if auto() was used, v will be an _Auto
                next_value = v + 1
            else:
                enumerations[k] = next_value
                next_value += 1
        enum = super().__new__(metacls, cls, bases, classdict, **kwds)
        enum._enumerations = enumerations
        return enum
    ...

Now we can do:

class Color(Enum):
    RED: int = auto()
    GREEN: int = auto()
    BLUE: int = auto()

In Color._enumerations, the _Auto placeholders will be replaced with 1, 2, 3.

The final version so far:

class _Auto():
    pass

def auto():
    return _Auto()

class EnumMeta(type):
    def __new__(metacls, cls, bases, classdict, **kwds):
        enumerations = {x: y for x, y in classdict.items() if not x.startswith('__')}
        # handle auto()
        next_value = 1
        for k, v in enumerations.items():
            if not isinstance(v, _Auto):
                next_value = v + 1
            else:
                enumerations[k] = next_value
                next_value += 1
        enum = super().__new__(metacls, cls, bases, classdict, **kwds)
        enum._enumerations = enumerations
        return enum

    def __len__(cls):
        return len(cls._enumerations)

    def __iter__(cls):
        return (cls(value) for value in cls._enumerations.values())

class Enum(metaclass=EnumMeta):
    def __init__(self, value):
        # make sure the passed in value is a valid enumeration value
        if value not in self.__class__._enumerations.values():
            raise ValueError(f'{value} is not a valid {self.__class__.__name__}')
        # save the actual enumeration value
        for k, v in self.__class__._enumerations.items():
            if v == value:
                self.__key = k
                self.__value = v

    def __str__(self):
        return "%s.%s" % (self.__class__.__name__, self.__key)

Conclusion

This is good progress, but some things are still missing: Color['RED'] doesn't work, Color.RED returns an int, equality doesn't work, etc. In the next part I will add more features to this toy class to cover the most commonly used functionality of the standard library Enum.
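The gaps listed above can be reproduced with a short sketch. It repeats the toy classes from this post (with isinstance() for the _Auto check and without the int annotations) and then probes the three missing behaviours:

```python
class _Auto():
    pass

def auto():
    return _Auto()

class EnumMeta(type):
    def __new__(metacls, cls, bases, classdict, **kwds):
        enumerations = {x: y for x, y in classdict.items() if not x.startswith('__')}
        next_value = 1
        for k, v in enumerations.items():
            if not isinstance(v, _Auto):
                next_value = v + 1
            else:
                enumerations[k] = next_value
                next_value += 1
        enum = super().__new__(metacls, cls, bases, classdict, **kwds)
        enum._enumerations = enumerations
        return enum

    def __len__(cls):
        return len(cls._enumerations)

    def __iter__(cls):
        return (cls(value) for value in cls._enumerations.values())

class Enum(metaclass=EnumMeta):
    def __init__(self, value):
        if value not in self.__class__._enumerations.values():
            raise ValueError(f'{value} is not a valid {self.__class__.__name__}')
        for k, v in self.__class__._enumerations.items():
            if v == value:
                self.__key = k
                self.__value = v

    def __str__(self):
        return "%s.%s" % (self.__class__.__name__, self.__key)

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

print(type(Color.RED))       # <class 'int'>
print(Color(1) == Color(1))  # False: two separate instances, default equality
try:
    Color['RED']
except TypeError as exc:     # the metaclass does not define __getitem__ yet
    print('TypeError:', exc)
```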