Building a toy Python Enum class - Part I
Marton Trencseni - Tue 03 May 2022 - Python
Introduction
Enumerations, or enums for short, are part of the core language in traditional statically typed languages like C, C++ and Java. For example, in C++ we can write:
enum Color { RED, GREEN, BLUE };
Color r = RED;
switch (r)
{
    case RED  : std::cout << "red\n"; break;
    case GREEN: std::cout << "green\n"; break;
    case BLUE : std::cout << "blue\n"; break;
}
When the C++ compiler sees an enum definition, it chooses an underlying integer type wide enough to hold the possible values (in the above case, there are 3 possible values, so 2 bits would be enough, but the compiler would actually allocate at least 1 byte, usually 4 bytes). Since C++ is statically typed, if we accidentally write a case statement like
case PURPLE : std::cout << "purple\n"; break;
we would get an error from the compiler, since it knows that PURPLE is not a valid value of Color, as defined above.
Python is a very different language from C++: enumerations are not part of the core Python language, unlike, say, tuples. In Python, Enums (upper case) are part of the standard library, and are implemented in Python code, using Python classes in a tricky way. For example, here is enum.py from cpython, and on line 1077 you can read the (very tricky) implementation of Enums. It looks something like:
class Enum(metaclass=EnumType):
    ...
As a former C++ programmer, I find this weird, cool and intriguing. So I decided to practice my Python Fu and write my own toy implementation of Python's Enum class, specifically for int values. Emphasis on "toy"; the full source code for enum.py linked above is 2018 lines of code (!), which also includes related classes such as IntEnum, StrEnum, Flag, etc.
The IPython notebook is up on GitHub.
Enumerations in Python
In Python, to use the standard library Enum we first have to import it from the enum module, like:
from enum import Enum

class Color(Enum):
    RED: int = 1
    GREEN: int = 2
    BLUE: int = 3
Alternatively, we can skip giving values by hand, and use the auto() magic function:
from enum import Enum, auto

class Color(Enum):
    RED: int = auto()
    GREEN: int = auto()
    BLUE: int = auto()
The magic auto() function starts numbering at 1, so the above two are equivalent.
There are various ways to create enums:
c1 = Color(1)
c2 = Color(Color.RED)
c3 = Color(Color['RED'])
print(type(Color), type(c1), type(Color.RED), type(Color['RED']))
Output:
<class 'enum.EnumMeta'> <enum 'Color'> <enum 'Color'> <enum 'Color'>
Equality tests:
Color(1) == 1 # False
Color.RED == Color(1) # True
Color.RED == Color['RED'] # True
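The first comparison above is False because members of a plain Enum are not ints. As a small aside, here is a sketch contrasting this with IntEnum, one of the related standard library classes mentioned earlier; IntColor is a made-up name used only for this illustration.

```python
from enum import Enum, IntEnum

class Color(Enum):
    RED = 1

class IntColor(IntEnum):  # hypothetical name, for illustration only
    RED = 1

print(Color.RED == 1)        # False: plain Enum members do not equal raw ints
print(Color.RED.value == 1)  # True: compare through .value instead
print(IntColor.RED == 1)     # True: IntEnum members are also ints
```

The toy class in this post follows the plain Enum behaviour, not the IntEnum one.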
An Enum itself (not an instance) has a useful iteration interface (this sort of convenience does not exist in C++):
print(len(Color)) # 3
# can iterate:
for c in Color:
    print(c)
Output:
3
Color.RED
Color.GREEN
Color.BLUE
Examples of things that don't work with enums:
Color(4) # ValueError: 4 is not a valid Color
Color('RED') # ValueError: 'RED' is not a valid Color
c1 = Color(1) # okay
c1 += 1 # TypeError: unsupported operand type(s) for +=: 'Color' and 'int'
Metaclasses in Python
So, as an exercise in Python meta-programming, let's write a simple Enum class that accomplishes the above. In the code above, notice that type(Color) is <class 'enum.EnumMeta'>; this is a big clue. Python has a feature called metaclasses, which is a way to construct classes the way we construct objects. In Python, classes are in fact just objects of type type:
class Color(): # not deriving from Enum
    RED: int = 1
    GREEN: int = 2
    BLUE: int = 3

print(type(Color))
Output:
<class 'type'>
But then, when we printed the type for Color(Enum), we got <class 'enum.EnumMeta'>, so what's going on? Let's check:
from enum import EnumMeta, Enum, auto

class Color(Enum):
    RED: int = 1
    GREEN: int = 2
    BLUE: int = 3

print(type(Color), type(Enum), type(EnumMeta))
Output:
<class 'enum.EnumMeta'> <class 'enum.EnumMeta'> <class 'type'>
So both Enum and Color are of type EnumMeta, and EnumMeta itself is of type type. We will follow this pattern in our own implementation of Enum.
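One way to see that classes really are objects is to build one by calling the built-in type() with three arguments (name, bases, namespace). The sketch below is only an illustration; DynamicColor is a made-up name and is not part of the toy implementation.

```python
# Build a class at runtime by calling type(name, bases, namespace).
# DynamicColor is a hypothetical name used only for this illustration.
DynamicColor = type('DynamicColor', (), {'RED': 1, 'GREEN': 2, 'BLUE': 3})

print(type(DynamicColor))              # <class 'type'>
print(DynamicColor.RED)                # 1
print(isinstance(DynamicColor, type))  # True: the class is an instance of type
```

A class statement does roughly the same thing, except that when a metaclass is specified it is called in place of type.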
In Python, with metaclasses, we can create classes whose type is our own metaclass, and we can have "constructor" code run when the class is defined (not when instances are created). Let's see an example:
class Enum(type):
    def __new__(metacls, cls, bases, classdict, **kwds):
        print(f'Defining new Enum type {cls}:')
        print(f'- metacls = {metacls}')
        print(f'- bases = {bases}')
        print(f'- classdict = {classdict}')

class Color(metaclass=Enum): # specifying our own Enum metaclass
    RED: int = 1
    GREEN: int = 2
    BLUE: int = 3
Output:
Defining new Enum type Color:
- metacls = <class '__main__.Enum'>
- bases = ()
- classdict = {'__module__': '__main__', '__qualname__': 'Color',
'__annotations__': {'RED': <class 'int'>, 'GREEN': <class 'int'>,
'BLUE': <class 'int'>}, 'RED': 1, 'GREEN': 2, 'BLUE': 3}
Note that the code in __new__() ran when we declared Color! We never created an instance of Color in the snippet above! This is where we start our toy implementation, and this is also how the standard library Enum works: by constructing a special class when a class deriving from Enum is declared. Here we are actually not deriving, we are metaclassing, which we will fix later.
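One detail of the illustration is worth spelling out: its __new__() has no return statement, so the name bound by the class statement ends up being None. The sketch below contrasts that with a metaclass that returns the class built by type; PrintingMeta, ReturningMeta, Broken and Working are hypothetical names chosen for this example.

```python
class PrintingMeta(type):  # hypothetical; same shape as the illustrative metaclass
    def __new__(metacls, cls, bases, classdict, **kwds):
        print(f'Defining new class {cls}')
        # no return statement, so __new__ returns None

class Broken(metaclass=PrintingMeta):
    RED: int = 1

class ReturningMeta(type):  # hypothetical; also returns the newly built class
    def __new__(metacls, cls, bases, classdict, **kwds):
        print(f'Defining new class {cls}')
        return super().__new__(metacls, cls, bases, classdict, **kwds)

class Working(metaclass=ReturningMeta):
    RED: int = 1

print(Broken)                         # None
print(type(Working) is ReturningMeta) # True
print(Working.RED)                    # 1
```

This is why a usable metaclass delegates to super().__new__() and returns the result.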
Basic Enum functionality
The example above was just an illustration; a Color defined like this is not useful. To accomplish the standard library functionality, we will use a chain of classes similar to enum.py:
class EnumMeta(type):
    ...

class Enum(metaclass=EnumMeta):
    ...

class Color(Enum):
    ...
This way, when we define an Enum like Color, it derives from Enum, whose metaclass is EnumMeta, so we can use both the inheritance and the metaclass features of Python. As a first order of business, in EnumMeta's __new__(), let's go through the enumerations defined by the user and save them into a dictionary. As seen in the output above, these are available in the passed-in classdict object:
class EnumMeta(type):
    def __new__(metacls, cls, bases, classdict, **kwds):
        # keep only the user-defined names, skipping dunder entries like __module__
        enumerations = {x: y for x, y in classdict.items() if not x.startswith('__')}
        enum = super().__new__(metacls, cls, bases, classdict, **kwds)
        enum._enumerations = enumerations
        return enum

class Enum(metaclass=EnumMeta):
    pass

class Color(Enum): # deriving from our own Enum base class
    RED: int = 1
    GREEN: int = 2
    BLUE: int = 3
Now let's check how this works:
print(Color._enumerations) # {'RED': 1, 'GREEN': 2, 'BLUE': 3}
print(type(Color), type(Enum), type(EnumMeta)) # ...
Color(1) # TypeError: Color() takes no arguments
The last call to the Color() constructor fails because the default constructor in Python does not take arguments. Let's fix this in the Enum base class:
class Enum(metaclass=EnumMeta):
    def __init__(self, value):
        # make sure the passed in value is a valid enumeration value
        if value not in self.__class__._enumerations.values():
            raise ValueError(f'{value} is not a valid {self.__class__.__name__}')
        # save the actual enumeration value
        for k, v in self.__class__._enumerations.items():
            if v == value:
                self.__key = k
                self.__value = v
Now we can try again:
Color(1) # okay
Color(4) # ValueError: 4 is not a valid Color
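A side note on the attribute names used in the constructor: inside a class body, names with two leading underscores such as __key and __value are subject to Python's name mangling, so they are stored under a name prefixed with the class they were written in. The sketch below shows the effect on a stand-alone class; Base is a hypothetical name, not part of the toy Enum.

```python
class Base:  # hypothetical class, for illustration only
    def __init__(self, value):
        self.__value = value  # actually stored as _Base__value

    def get(self):
        return self.__value   # inside Base the mangled name is resolved automatically

b = Base(1)
print(b.get())                # 1
print(b._Base__value)         # 1
print(hasattr(b, '__value'))  # False: there is no attribute literally named __value
```

By the same rule, the toy Enum's attributes live under _Enum__key and _Enum__value, even on instances of a subclass like Color.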
We have provided our own constructor, which accepts only the values defined in the user's Enum. We can pick some low-hanging fruit and get string and iteration related functionality working:
class EnumMeta(type):
    ...
    def __len__(cls):
        return len(cls._enumerations)
    def __iter__(cls):
        return (cls(value) for value in cls._enumerations.values())

class Enum(metaclass=EnumMeta):
    ...
    def __str__(self):
        return "%s.%s" % (self.__class__.__name__, self.__key)

print(Color(1))
print(len(Color))
for c in Color:
    print(c)
Output:
Color.RED
3
Color.RED
Color.GREEN
Color.BLUE
Adding auto()
An easy feature to add is the magic auto() function, which lets us avoid writing out int values by hand and increments automatically within the class definition. auto() is just a function that runs when the class is defined, so it needs to return something. The results can then be "cleaned up" in the metaclass's __new__() function. We could use None or -999 as the auto() return value, but that would conflict with the user using that value in their own Enums, so let's create a class just for this:
class _Auto():
    pass

def auto():
    return _Auto()
Then:
class EnumMeta(type):
    def __new__(metacls, cls, bases, classdict, **kwds):
        enumerations = {x: y for x, y in classdict.items() if not x.startswith('__')}
        # handle auto()
        next_value = 1
        for k, v in enumerations.items():
            if not isinstance(v, _Auto): # if auto() was used, v will be an _Auto
                next_value = v + 1
            else:
                enumerations[k] = next_value
                next_value += 1
        enum = super().__new__(metacls, cls, bases, classdict, **kwds)
        enum._enumerations = enumerations
        return enum
    ...
Now we can do:
class Color(Enum):
    RED: int = auto()
    GREEN: int = auto()
    BLUE: int = auto()
The values will be replaced with 1, 2, 3.
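The numbering loop also handles a mix of explicit values and auto(): an explicit int resets the counter to that value plus one. Here is a small self-contained sketch of just that part of the toy metaclass; Priority and its members are hypothetical names chosen for this example.

```python
class _Auto:
    pass

def auto():
    return _Auto()

class EnumMeta(type):
    def __new__(metacls, cls, bases, classdict, **kwds):
        enumerations = {x: y for x, y in classdict.items() if not x.startswith('__')}
        next_value = 1
        for k, v in enumerations.items():
            if isinstance(v, _Auto):
                # placeholder from auto(): assign the next free number
                enumerations[k] = next_value
                next_value += 1
            else:
                # explicit value: continue counting from here
                next_value = v + 1
        enum = super().__new__(metacls, cls, bases, classdict, **kwds)
        enum._enumerations = enumerations
        return enum

class Enum(metaclass=EnumMeta):
    pass

class Priority(Enum):  # hypothetical enum, for illustration only
    LOW = auto()
    MEDIUM = 10
    HIGH = auto()
    URGENT = auto()

print(Priority._enumerations)
# {'LOW': 1, 'MEDIUM': 10, 'HIGH': 11, 'URGENT': 12}
```

Note that the sketch assumes explicit values are ints; a non-int value would make v + 1 fail, which is consistent with the toy class targeting int values only.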
The final version so far:
class _Auto():
    pass

def auto():
    return _Auto()

class EnumMeta(type):
    def __new__(metacls, cls, bases, classdict, **kwds):
        enumerations = {x: y for x, y in classdict.items() if not x.startswith('__')}
        # handle auto()
        next_value = 1
        for k, v in enumerations.items():
            if not isinstance(v, _Auto):
                next_value = v + 1
            else:
                enumerations[k] = next_value
                next_value += 1
        enum = super().__new__(metacls, cls, bases, classdict, **kwds)
        enum._enumerations = enumerations
        return enum
    def __len__(cls):
        return len(cls._enumerations)
    def __iter__(cls):
        return (cls(value) for value in cls._enumerations.values())

class Enum(metaclass=EnumMeta):
    def __init__(self, value):
        # make sure the passed in value is a valid enumeration value
        if value not in self.__class__._enumerations.values():
            raise ValueError(f'{value} is not a valid {self.__class__.__name__}')
        # save the actual enumeration value
        for k, v in self.__class__._enumerations.items():
            if v == value:
                self.__key = k
                self.__value = v
    def __str__(self):
        return "%s.%s" % (self.__class__.__name__, self.__key)
Conclusion
This is good progress, but some things are still missing: Color['RED'] doesn't work, Color.RED returns an int, equality doesn't work, etc. In the next part I will add more features to this toy class to cover the most commonly used functionality of the standard library Enum.
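The gaps listed above can be checked with a short script. The sketch below uses a trimmed-down copy of the toy classes (without auto() and the string and iteration helpers), so it is an approximation of the code in this post, not the full version.

```python
class EnumMeta(type):
    def __new__(metacls, cls, bases, classdict, **kwds):
        enumerations = {x: y for x, y in classdict.items() if not x.startswith('__')}
        enum = super().__new__(metacls, cls, bases, classdict, **kwds)
        enum._enumerations = enumerations
        return enum

class Enum(metaclass=EnumMeta):
    def __init__(self, value):
        if value not in self.__class__._enumerations.values():
            raise ValueError(f'{value} is not a valid {self.__class__.__name__}')
        self.__value = value

class Color(Enum):
    RED: int = 1
    GREEN: int = 2
    BLUE: int = 3

# Color.RED is still the raw class attribute, not a Color instance
print(type(Color.RED))       # <class 'int'>

# two instances built from the same value do not compare equal,
# because the default == falls back to object identity
print(Color(1) == Color(1))  # False

# lookup by name is not supported yet
try:
    Color['RED']
    lookup_failed = False
except TypeError:
    lookup_failed = True
print(lookup_failed)         # True
```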