Python types for Data Scientists - Part II
Marton Trencseni - Sun 17 April 2022 - Python
Introduction
In the previous post I showed how to get started with Python static type checking in IPython notebooks. Here I will look at slightly more advanced uses of typing to further increase the safety and readability of code. The IPython notebook is up on GitHub. The best reference is the official Python documentation of the typing module.
Optional types and Union
Sometimes we want to declare that something can be of a certain type, or None. Imagine we don't know about numpy.random.random_sample and we're writing a function randoms() to return a random list[float] of length num:
from random import random

def randoms(num: int) -> list[float]:
    if num >= 0:
        return [random() for _ in range(num)]
    else:
        return None # not okay, NoneType is not list[float]
We want to be good programmers and return None if a negative value for num is passed in, but None is not a list[float], so this won't work:
error: Incompatible return value type (got "None", expected "List[float]")
This is what Optional[T] is for. It declares that the value is either a T or None:
def randoms(num: int) -> Optional[list[float]]:
    if num >= 0:
        return [random() for _ in range(num)]
    else:
        return None # okay, return type is Optional[...]
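A caller of this version has to deal with the None case before using the result as a list. Here is a minimal sketch of what that could look like. The helper name mean_of_randoms() is made up for illustration and is not part of the original notebook:

```python
from random import random
from typing import Optional

def randoms(num: int) -> Optional[list[float]]:
    if num >= 0:
        return [random() for _ in range(num)]
    else:
        return None

def mean_of_randoms(num: int) -> Optional[float]:
    # hypothetical helper, for illustration only
    values = randoms(num)
    if values is None or len(values) == 0:
        return None
    # past the None check, a type checker treats values as list[float]
    return sum(values) / len(values)
```

Without the None check, a type checker such as mypy would be expected to complain that sum() and len() are being applied to a value that may be None.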
The same can also be achieved by using Union[]:
def randoms(num: int) -> Union[list[float], None]:
    if num >= 0:
        return [random() for _ in range(num)]
    else:
        return None # okay, return type is Union[..., None]
Note that None as a type hint is a special case: Python replaces it with type(None).
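Both points can be checked directly in a notebook cell. This is a small sketch using only the standard typing module:

```python
from typing import Optional, Union, get_args

# Optional[X] is shorthand for Union[X, None]
same = Optional[int] == Union[int, None]

# the None in the hint is stored as type(None), i.e. NoneType
args = get_args(Optional[int])

print(same)
print(args)
```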
What if we're a different kind of programmer, and we want to raise an exception instead of returning None, like:
def randoms(num: int) -> list[float]:
    if num >= 0:
        return [random() for _ in range(num)]
    else:
        raise ValueError # okay, there is no typed way to communicate raised exceptions
This is okay: in Python there is no typed way to communicate raised exceptions, so the return type stays list[float].
Finally, what if we want to return just a float if the user asks for a single random number, and None on negative input? Union is the solution:
def randoms(num: int) -> Union[float, list[float], None]:
    if num == 1:
        return random() # float
    elif num >= 0:
        return [random() for _ in range(num)] # list[float]
    else:
        return None # None
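The price of such a return type is paid by the caller, who has to find out which member of the Union came back. A sketch of how that might be handled with is None and isinstance() checks follows. The describe() helper is hypothetical, added only to illustrate the narrowing:

```python
from random import random
from typing import Union

def randoms(num: int) -> Union[float, list[float], None]:
    if num == 1:
        return random()
    elif num >= 0:
        return [random() for _ in range(num)]
    else:
        return None

def describe(result: Union[float, list[float], None]) -> str:
    # hypothetical helper: each branch narrows the Union to one member
    if result is None:
        return "nothing"
    if isinstance(result, list):
        return f"list of {len(result)} floats"
    return f"single float {result:.3f}"
```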
Note that as of Python 3.10, Union[X, Y] can be written as X | Y, but this does not work yet on Python 3.9:
def randoms(num: int) -> float | list[float] | None:
    if num == 1:
        return random() # float
    elif num >= 0:
        return [random() for _ in range(num)] # list[float]
    else:
        return None # None
Type aliases and NewType
Suppose we are building a library for machine learning and we are using list[float] for feature vectors. One way we can communicate this to the user of our library is by giving the arguments of our functions names like feature_vector. We can also accomplish this in our typing, by declaring an alias for list[float]:
FeatureVector = list[float] # type alias
We can now write FeatureVector interchangeably with list[float]:
def predict(fv: FeatureVector) -> float:
    return random()

fv: list[float] = [0.1, 0.2, 0.3]
predict(fv) # okay
fv: FeatureVector = [0.1, 0.2, 0.3]
predict(fv) # okay
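Since an alias is just another name for the same type, nothing distinguishes the two at runtime either. A small check, assuming Python 3.9 or later for the list[float] syntax:

```python
from typing import get_args, get_origin

FeatureVector = list[float]  # type alias

# the alias compares equal to the type it stands for
is_same = FeatureVector == list[float]

# and it decomposes into the same origin and arguments
origin, args = get_origin(FeatureVector), get_args(FeatureVector)
```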
Suppose that we want to declare our types as we did with type aliases, but we want to be stricter. We only want to accept list[float]s that were explicitly declared to be FeatureVectors. We can achieve this by using NewType:
FeatureVector = NewType('FeatureVector', list[float])
# all FeatureVectors are list[float], but not all list[float] are FeatureVectors

def predict(fv: FeatureVector) -> float:
    return random()

fv: list[float] = [0.1, 0.2, 0.3]
predict(fv) # not okay
fv: FeatureVector = FeatureVector([0.1, 0.2, 0.3]) # explicit cast
predict(fv) # okay
Output:
error: Argument 1 to "predict" has incompatible type "List[float]"; expected "FeatureVector"
In the above example, all FeatureVectors are list[float], but not all list[float] are FeatureVectors. So any function that accepts a list[float] will accept a FeatureVector, but not the other way around.
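It is worth keeping in mind that this distinction exists only for the type checker. At runtime, calling FeatureVector() performs no validation or conversion and simply hands back its argument, as this short sketch shows:

```python
from typing import NewType

FeatureVector = NewType('FeatureVector', list[float])

raw = [0.1, 0.2, 0.3]
fv = FeatureVector(raw)

# at runtime, calling FeatureVector() simply returns its argument
print(fv is raw)   # True
print(type(fv))    # <class 'list'>
```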
Generics with TypeVar
Suppose we want to write a function first() which returns the first element of a list, and we want to declare that the list contains things of type T, and the return type will be the same type T. We can accomplish this with a TypeVar:
T = TypeVar('T') # declare type variable T to be used
def first(li: list[T]) -> T:
    return li[0]
We can also mix TypeVars with Optional to make first() more useful:
T = TypeVar('T') # declare type variable T to be used
def first(li: list[T]) -> Optional[T]:
    return li[0] if len(li) > 0 else None # okay
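To see what the generic signature buys us, here is a sketch of first() being called with two different element types. The variable names are illustrative, and the comments describe what a type checker such as mypy would be expected to infer:

```python
from typing import Optional, TypeVar

T = TypeVar('T')

def first(li: list[T]) -> Optional[T]:
    return li[0] if len(li) > 0 else None

names: list[str] = ["ada", "grace"]
empty: list[int] = []

name = first(names)    # inferred as Optional[str]
number = first(empty)  # inferred as Optional[int]

if name is not None:
    print(name.upper())  # safe: name is narrowed to str here
```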
Note that we cannot bind a TypeVar by usage. In the example below, we cannot bind T to be str inside the function body (there is no "type solver"), so the type checker reports an error:
T = TypeVar('T') # declare type variable T to be used
def first(li: list[T]) -> T:
    return "hello" # not okay
The error is:
error: Incompatible return value type (got "str", expected "T")
Protocols
Let's look at another example, where we want to add two things:
T = TypeVar('T') # declare type variable T to be used
def add(a: T, b: T) -> T:
    return a+b # checks for __add__()
This will result in a type error, because the type checker doesn't know whether T implements __add__():
error: Unsupported left operand type for + ("T")
To achieve the desired typing, we have to use Protocols:
class Addable(Protocol):
    def __add__(self, other): # anything that declares __add__() can stand in for an Addable
        raise NotImplementedError
Here we are declaring a class Addable using typing.Protocol, which declares __add__(). Anything that declares __add__() can stand in for an Addable, even if it's not descended from Addable. For example, an int is an Addable. Examples:
def add(a: Addable, b: Addable) -> Addable:
    print(type(a), type(b))
    return a+b # checks for __add__()

add(str(1), str(2)) # okay
add(int(1), int(2)) # okay
add(int(1), float(2)) # okay
add(int(1), str(2)) # not a typecheck error, but a runtime error
Output:
<class 'str'> <class 'str'>
<class 'int'> <class 'int'>
<class 'int'> <class 'float'>
<class 'int'> <class 'str'>
TypeError: unsupported operand type(s) for +: 'int' and 'str' # coming from the last add()
All 4 of these pass the type checks, because str, int and float are all Addable, since they have __add__(). The last one will raise a runtime exception, since + doesn't work implicitly for int and str. Note that this is a runtime exception coming from running the code, not a type error: the type checker did not raise any errors.
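The same structural matching applies to our own classes. The sketch below uses a made-up Money class that never mentions Addable, and adds typing's runtime_checkable decorator (not used in the original notebook) so that the match can also be observed with isinstance(). Note that this runtime check only looks for the presence of __add__(), not at its signature:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Addable(Protocol):
    def __add__(self, other): ...

class Money:
    # hypothetical class; note that it does not inherit from Addable
    def __init__(self, cents: int) -> None:
        self.cents = cents

    def __add__(self, other: "Money") -> "Money":
        return Money(self.cents + other.cents)

total = Money(150) + Money(250)

money_is_addable = isinstance(total, Addable)      # has __add__()
int_is_addable = isinstance(1, Addable)            # has __add__()
object_is_addable = isinstance(object(), Addable)  # no __add__()
```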
There are 2 ways we can think about this mini-problem:
1. We want to allow adding of 2 different types (e.g. int and float), but only if it makes sense. We want the type checker to raise an error for cases when a runtime exception would be raised (e.g. adding int and str).
2. We only want to allow adding of exactly the same types, e.g. int, int or float, float or str, str. We will see that this is not achievable in Python with generic types.
Let's look at another version of this, where we declare a TypeVar and bind it to be Addable:
T = TypeVar('T', bound=Addable)
def add(a: T, b: T) -> T:
    print(type(a), type(b))
    return a+b # checks for __add__()

add(str(1), str(2)) # okay
add(int(1), int(2)) # okay
add(int(1), float(2)) # okay
add(int(1), str(2)) # typecheck error and runtime error
Here, the last line raises a type check error and a runtime error:
error: Value of type variable "T" of "add" cannot be "object" # typecheck error coming from the last add()
...
TypeError: unsupported operand type(s) for +: 'int' and 'str' # runtime error coming from the last add()
Note that the third version (int, float) still runs fine. So this version implements case 1 above, where different types can still be passed, as long as addition makes sense for them.
One last attempt could be to use a TypeVar constrained to a fixed set of types, where we limit ourselves to certain types that can stand in for T. But as before, in this case the type checker also doesn't enforce the instances of T to be the same:
T = TypeVar('T', int, float, str)
def add(a: T, b: T) -> T:
    return a+b # checks for __add__()
add(int(1), float(2)) # okay
It turns out we cannot use binding to get case 2 above, i.e. to force the type checker to make sure that both a and b arguments are actually the same type in add(a, b).
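If case 2 matters in practice, one fallback is to enforce it at runtime instead. This is not a static typing solution, only a sketch of a guard, and the name add_same() is hypothetical:

```python
from typing import TypeVar

T = TypeVar('T', int, float, str)

def add_same(a: T, b: T) -> T:
    # runtime guard: reject mixed types that the type checker lets through
    if type(a) is not type(b):
        raise TypeError(
            f"expected matching types, got {type(a).__name__} and {type(b).__name__}"
        )
    return a + b
```

With this guard, add_same(1, 2.0) still passes the type checker but raises a TypeError when run, while add_same(1, 2) and add_same("a", "b") behave like the plain add().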
Conclusion
In the next article I will look at more uses of Protocols and abstract base classes.