Python types for Data Scientists - Part III
Marton Trencseni - Fri 22 April 2022 - Python
Introduction
In the first post I showed how to get started using Python static type checking in ipython notebooks. The second post looked at slightly more advanced uses of typing to further increase the safety and readability of code. Here I will continue, and look at some aspects of type hinting:
- type check errors and runtime errors
- where type hints don't work
- Abstract Base Classes vs. Protocols
- types for class variables vs instance variables
The ipython notebook is up on GitHub. The best reference is the official Python documentation of the typing module.
Type check errors and runtime errors
It's important to remember that in Python, type hints are optional and ignored (not enforced) by the Python runtime. Type hints are interpreted by external programs. In these examples, I use nb_mypy, which runs mypy to do type checking. Then, irrespective of the result of type checking, the regular Python runtime runs (and ignores all type hints). In other contexts, such as when using an IDE, the IDE would run type checks in the background and show errors, or use the type hint information for code completion.
Given the roots of Python, I find this to be a good trade-off to introduce typing and get 80% of the benefits.
But it leads to some weird behaviour, which cannot be changed with nb_mypy: even if there is a type check error in the current cell, the code in the cell will still be run after the type check completes. This leads to some confusing outputs. For example:
def foo(i: int) -> None:
    print(i)
foo("hello")
Output:
error: Argument 1 to "foo" has incompatible type "str"; expected "int"
hello
First the nb_mypy type checker runs, finds the type error, and prints it, but then the code is executed anyway. And since the Python runtime ignores all type hints, the code runs just fine, since to Python the function foo() is equivalent to:
def foo(i):
    print(i)
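As a small illustration of the point above, the sketch below shows where the hints end up at runtime: Python stores them as metadata in the function's __annotations__ attribute but never checks arguments against them. The checked_foo() variant is a hypothetical addition, not something from the notebook, showing what would have to be written by hand to get runtime enforcement.

```python
def foo(i: int) -> str:
    # The annotation says int, but nothing enforces it at runtime.
    return f"got {i!r}"


def checked_foo(i: int) -> str:
    # Hypothetical variant: enforce the hint manually with isinstance().
    if not isinstance(i, int):
        raise TypeError(f"expected int, got {type(i).__name__}")
    return f"got {i!r}"


# Hints are stored as plain metadata on the function object.
print(foo.__annotations__)

# A type checker would flag this call; the interpreter runs it anyway.
print(foo("hello"))

# Only the hand-written check actually stops the wrong type.
try:
    checked_foo("hello")
except TypeError as exc:
    print("runtime check:", exc)
```

In practice the manual check is rarely worth it; the point is only that the interpreter itself never performs it.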
Where type hints don't work
There are some cases where writing type hints does not work as we'd expect. The big ones are for and while loops:
for i: int in range(3):
    print(i)
Output:
SyntaxError: invalid syntax
This is not a type check error, it's a syntax error. We cannot put the type hint for i in the for loop itself. It has to go before:
i: int
for i in range(3):
    print(i)
However, at least in such simple cases, I would just skip the type hint since it's quite ugly. It's actually not required: the type checker can infer the int type from range(), so this will produce a type check error:
def f(s: str) -> None:
    pass
for i in range(3): # okay
    f(i) # not okay
Output:
error: Argument 1 to "f" has incompatible type "int"; expected "str"
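A side note on the bare declaration form used earlier in this section, as a minimal sketch: writing i: int on its own only declares the type for the checker, it does not create or assign the variable. The function names below are made up for illustration.

```python
from typing import List


def loop_with_declared_type() -> List[int]:
    i: int  # declaration only; no value is assigned here
    seen: List[int] = []
    for i in range(3):  # the loop is what actually binds i
        seen.append(i)
    return seen


def declaration_binds_value() -> bool:
    x: int  # declares the type of x, but x still has no value
    try:
        x  # reading it fails, because nothing was ever assigned
    except NameError:  # UnboundLocalError is a subclass of NameError
        return False
    return True


print(loop_with_declared_type())
print(declaration_binds_value())
```

So the declaration before the loop is purely for the type checker and has no effect on what the loop does.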
Abstract Base Classes vs. Protocols
In the previous post, there was the example of declaring a Protocol for addability:
from typing import Protocol, TypeVar
class Addable(Protocol):
    def __add__(self, other): # anything that declares __add__() can stand in for an Addable
        raise NotImplementedError
T = TypeVar('T', bound=Addable)
def add(a: T, b: T):
    print(type(a), type(b))
    print(a+b) # checks for __add__(), uses __str__()
add(int(1), int(2)) # okay
In this example, whatever we pass to add() needs to declare __add__(). So if we define our own class MyInt without it, like this, the type check fails:
class MyInt():
    num: int
    def __init__(self, num: int):
        self.num = num
add(MyInt(1), MyInt(2)) # not okay
Output:
error: Value of type variable "T" of "add" cannot be "MyInt"
We can fix this by implementing __add__() in MyInt:
class MyInt(): # note that MyInt does not inherit Addable
    num: int
    def __init__(self, num: int):
        self.num = num
    def __add__(self, other):
        return MyInt(self.num + other.num)
    def __str__(self): # so print() works
        return str(self.num)
T = TypeVar('T', bound=Addable)
def add(a: T, b: T):
    print(type(a), type(b))
    print(a+b) # checks for __add__(), uses __str__()
add(MyInt(1), MyInt(2)) # okay
add(int(1), int(2)) # okay
Output:
<class '__main__.MyInt'> <class '__main__.MyInt'>
3
<class 'int'> <class 'int'>
3
This works, even though MyInt does not mention Addable in the class declaration at all!
What happens if we go back to the Addable declaration and change Protocol to ABC, like:
from abc import ABC
class Addable(ABC):
    def __add__(self, other):
        raise NotImplementedError
...
add(MyInt(1), MyInt(2)) # not okay
add(int(1), int(2)) # not okay
We get a type error from both lines:
error: Value of type variable "T" of "add" cannot be "MyInt"
error: Value of type variable "T" of "add" cannot be "int"
Neither MyInt nor int can stand in for an Addable if it's an ABC. Only classes that derive from an abstract base class can stand in for it.
Let's change MyInt to derive from Addable:
class MyInt(Addable):
    ...
add(MyInt(1), MyInt(2)) # okay, MyInt derives from Addable
add(int(1), int(2)) # not okay
Output:
error: Value of type variable "T" of "add" cannot be "int"
MyInt is now fine, but int still cannot stand in for an Addable. This shows the difference between Protocol and ABC. With Protocol, anything that implements the declared functions can stand in for that type, irrespective of inheritance. With ABC, only types that inherit from the base class (in the example above, MyInt inherits from Addable) can stand in for that type.
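The same distinction can also be observed at runtime. The sketch below is not from the notebook: it assumes typing.runtime_checkable, a standard-library decorator that lets isinstance() work with a Protocol, and the class names AddableProto, AddableABC and Money are made up for illustration. Note that the runtime check only tests that a method named __add__ exists; unlike mypy, it does not look at signatures.

```python
from abc import ABC
from typing import Protocol, runtime_checkable


@runtime_checkable
class AddableProto(Protocol):
    # Structural: any object with an __add__ method matches.
    def __add__(self, other): ...


class AddableABC(ABC):
    # Nominal: only subclasses match.
    def __add__(self, other):
        raise NotImplementedError


class Money:  # inherits from neither AddableProto nor AddableABC
    def __init__(self, amount: int):
        self.amount = amount

    def __add__(self, other):
        return Money(self.amount + other.amount)


m = Money(1)
print(isinstance(m, AddableProto))  # True: Money has __add__
print(isinstance(m, AddableABC))    # False: Money does not inherit from it
print(isinstance(1, AddableProto))  # True: int has __add__
print(isinstance(1, AddableABC))    # False: int does not inherit from it
```

This mirrors the mypy results above: the Protocol accepts anything with the right method, while the ABC accepts only its subclasses.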
Types for class variables vs instance variables
Let's try this code:
class Foo:
    num: int # num is NOT a class variable
    def __init__(self, num: int):
        self.num = num
f = Foo(1)
print(f.num) # prints 1
g = Foo(2)
print(f.num, g.num) # prints 1 2
print(Foo.num) # AttributeError: type object 'Foo' has no attribute 'num'
Output:
1
1 2
AttributeError: type object 'Foo' has no attribute 'num'
Here, we declare the num instance variable of class Foo to be of type int. We create 2 instances of Foo, and we see that each of them carries a separate num. Then we try to access the Foo.num class variable, and we get an AttributeError, because it doesn't exist.
Let's make one minor modification to the code, and assign an initial value to num:
class Foo:
    num: int = 0 # num is now a class variable
    def __init__(self, num: int):
        self.num = num
f = Foo(1)
print(f.num) # prints 1
g = Foo(2)
print(f.num, g.num) # prints 1 2
print(Foo.num) # prints 0
This one change creates num as a class variable, which can be accessed. Note that both the class variable and the instance variable carry the int type:
f = Foo(1)
print(Foo.num, f.num) # prints 0 1
f.num = "hello" # not okay
Foo.num = "world" # not okay
Output:
error: Incompatible types in assignment (expression has type "str", variable has type "int")
error: Incompatible types in assignment (expression has type "str", variable has type "int")
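To see at runtime what separates the two versions of Foo, the following sketch inspects the classes directly. The names Declared and Defaulted are hypothetical stand-ins for the two variants above. In both cases the annotation is recorded in __annotations__, but only the assignment creates an attribute on the class; the instance attribute set in __init__() then shadows it on each instance.

```python
class Declared:
    num: int  # annotation only: nothing is stored on the class

    def __init__(self, num: int):
        self.num = num


class Defaulted:
    num: int = 0  # annotation plus assignment: creates a class attribute

    def __init__(self, num: int):
        self.num = num


# Both classes record the annotation...
print(Declared.__annotations__, Defaulted.__annotations__)

# ...but only the one with an assigned value has a class attribute.
print(hasattr(Declared, "num"), hasattr(Defaulted, "num"))

d = Defaulted(1)
print(vars(d))        # the instance carries its own num
print(Defaulted.num)  # the class attribute is left untouched
```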
Conclusion
This post concludes this short series on Python typing for Data Scientists. I think the jury is still out on whether type hints are worth it in Data Science code (which tends to be short, linear and less structured than application software), but it's good to know that type hints exist and how they work.