# Python types for Data Scientists - Part II

Marton Trencseni - Sun 17 April 2022 - Python

## Introduction

In the previous post I showed how to get started using Python static type checking in ipython notebooks. Here I will look at slightly more advanced uses of typing to further increase the safety and readability of code. The ipython notebook is up on GitHub. The best reference is the official Python documentation of the `typing` module.

## `Optional` types and `Union`

Sometimes we want to declare that something can be of a certain type, or `None`. Imagine we don't know about `numpy.random.random_sample` and we're writing a function `randoms()` to return a random `list[float]` of length `num`:

```
from random import random

def randoms(num: int) -> list[float]:
    if num >= 0:
        return [random() for _ in range(num)]
    else:
        return None  # not okay, NoneType is not list[float]
```

We want to be good programmers and return `None` if a negative value for `num` is passed in, but `None` is not a `list[float]`, so this won't work:

```
error: Incompatible return value type (got "None", expected "List[float]")
```

This is what `Optional[T]` is for. It declares that the type will be either `T` or `None`:

```
from typing import Optional

def randoms(num: int) -> Optional[list[float]]:
    if num >= 0:
        return [random() for _ in range(num)]
    else:
        return None  # okay, return type is Optional[...]
```

The same can also be achieved by using `Union[]`:

```
from typing import Union

def randoms(num: int) -> Union[list[float], None]:
    if num >= 0:
        return [random() for _ in range(num)]
    else:
        return None  # okay, return type is Union[..., None]
```

Note that `None` as a type hint is a special case and is replaced by `type(None)` by Python.
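The equivalence between `Optional[T]` and `Union[T, None]` can be checked at runtime. The following is a minimal sketch using `typing.get_args`, which is available from Python 3.8; the `int` payload type is just an arbitrary choice for illustration.

```python
from typing import Optional, Union, get_args

# Optional[int] is shorthand for Union[int, None]
same = Optional[int] == Union[int, None]

# the None in the hint is stored as type(None), i.e. the NoneType class
members = get_args(Optional[int])

print(same)     # True
print(members)  # (<class 'int'>, <class 'NoneType'>)
```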

What if we're a different kind of programmer, and we want to raise an exception instead of returning `None`, like:

```
def randoms(num: int) -> list[float]:
    if num >= 0:
        return [random() for _ in range(num)]
    else:
        raise ValueError  # okay, there is no typed way to communicate raised exceptions
```

This is okay. In Python, there is no way to express raised exceptions in a function's type signature.
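Since the signature cannot carry this information, a common convention is to document the exception in the docstring instead. The sketch below assumes that convention; the docstring wording and the error message are illustrative choices, not something a type checker reads.

```python
from random import random

def randoms(num: int) -> list[float]:
    """Return `num` random floats.

    Raises:
        ValueError: if `num` is negative.
    """
    if num < 0:
        raise ValueError(f"num must be non-negative, got {num}")
    return [random() for _ in range(num)]

try:
    randoms(-1)
except ValueError as exc:
    print(f"caught: {exc}")
```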

Finally, what if we want to return just a `float` if the user is asking for one random number, and `None` on negative input? `Union` is the solution:

```
def randoms(num: int) -> Union[float, list[float], None]:
    if num == 1:
        return random()  # float
    elif num >= 0:
        return [random() for _ in range(num)]  # list[float]
    else:
        return None  # None
```

Note that as of Python 3.10, `Union[X, Y]` can be written as `X | Y`, but this does not work yet on Python 3.9:

```
def randoms(num: int) -> float | list[float] | None:
    if num == 1:
        return random()  # float
    elif num >= 0:
        return [random() for _ in range(num)]  # list[float]
    else:
        return None  # None
```
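A consequence of returning a `Union` is that callers have to narrow the result before using it. The following is a sketch of how a caller might do this with `is None` and `isinstance()` checks, which type checkers such as mypy understand; the helper name `describe()` is hypothetical.

```python
from random import random
from typing import Union

def randoms(num: int) -> Union[float, list[float], None]:
    if num == 1:
        return random()
    elif num >= 0:
        return [random() for _ in range(num)]
    else:
        return None

def describe(num: int) -> str:
    result = randoms(num)
    if result is None:
        return "invalid input"  # result is None here
    if isinstance(result, list):
        return f"{len(result)} values"  # result is list[float] here
    return f"single value {result:.3f}"  # only float is left

print(describe(-1))  # invalid input
print(describe(3))   # 3 values
```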

## Type aliases and `NewType`

Suppose we are building a library for machine learning and we are using `list[float]` for feature vectors. One way we can communicate this to the user of our library is by giving the arguments of our functions names like `feature_vector`. We can also accomplish this in our typing, by declaring an alias for `list[float]`:

```
FeatureVector = list[float] # type alias
```

We can now write `FeatureVector` interchangeably with `list[float]`:

```
def predict(fv: FeatureVector) -> float:
    return random()

# separate variable names, so the type checker does not see a redefinition
plain: list[float] = [0.1, 0.2, 0.3]
predict(plain)  # okay

fv: FeatureVector = [0.1, 0.2, 0.3]
predict(fv)  # okay
```

Suppose that we want to name our types as we did with type aliases, but we want to be stricter. We only want to accept `list[float]`s that were explicitly declared to be `FeatureVector`s. We can achieve this by using `NewType`:

```
from typing import NewType

FeatureVector = NewType('FeatureVector', list[float])
# all FeatureVectors are list[float], but not all list[float] are FeatureVectors

def predict(fv: FeatureVector) -> float:
    return random()

plain: list[float] = [0.1, 0.2, 0.3]
predict(plain)  # not okay

fv: FeatureVector = FeatureVector([0.1, 0.2, 0.3])  # explicit cast
predict(fv)  # okay
```

Output:

```
error: Argument 1 to "predict" has incompatible type "List[float]"; expected "FeatureVector"
```

In the above example, all `FeatureVector`s are `list[float]`, but not all `list[float]`s are `FeatureVector`s. So any function that accepts a `list[float]` will accept a `FeatureVector`, but not the other way around.
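The difference between the two approaches is purely static. As a sketch of what happens at runtime: a type alias is just another name for the same type, and calling a `NewType` returns its argument unchanged. The names `FeatureAlias` and `total()` below are made up for this illustration.

```python
from typing import NewType

FeatureAlias = list[float]                             # type alias
FeatureVector = NewType('FeatureVector', list[float])  # distinct type for the checker

def total(values: list[float]) -> float:
    return sum(values)

fv = FeatureVector([0.5, 1.5, 2.0])

print(FeatureAlias == list[float])  # True, the alias is the same type
print(type(fv) is list)             # True, no wrapper object is created
print(total(fv))                    # 4.0, a FeatureVector is accepted as list[float]
```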

## Generics with `TypeVar`

Suppose we want to write a function `first()` which returns the first element of a list, and we want to declare that the list contains things of type `T`, and the return type will be the same type `T`. We can accomplish this with a `TypeVar`:

```
from typing import TypeVar

T = TypeVar('T')  # declare type variable T to be used

def first(li: list[T]) -> T:
    return li[0]
```

We can also mix `TypeVar`s with `Optional` to make `first()` more useful:

```
T = TypeVar('T')  # declare type variable T to be used

def first(li: list[T]) -> Optional[T]:
    return li[0] if len(li) > 0 else None  # okay
```

Note that we cannot bind a `TypeVar` by usage. In the example below, the type checker will not conclude from the function body that `T` must be `str` (there is no "type solver"), so this results in an error:

```
T = TypeVar('T')  # declare type variable T to be used

def first(li: list[T]) -> T:
    return "hello"  # not okay
```

The error is:

```
error: Incompatible return value type (got "str", expected "T")
```
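The binding goes in the other direction: `T` is determined at each call site from the types of the arguments. A small sketch of this, with the types noted in comments as a checker such as mypy would be expected to see them:

```python
from typing import TypeVar

T = TypeVar('T')

def first(li: list[T]) -> T:
    return li[0]

n: int = first([3, 1, 2])      # T is int for this call
s: str = first(["a", "b"])     # T is str for this call
# bad: str = first([3, 1, 2])  # a type checker would reject this: got int, expected str

print(n, s)  # 3 a
```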

## Protocols

Let's look at another example, where we want to `add` two things:

```
T = TypeVar('T')  # declare type variable T to be used

def add(a: T, b: T) -> T:
    return a+b  # checks for __add__()
```

This will result in a type error, because the type checker doesn't know whether `T` implements `__add__()`:

```
error: Unsupported left operand type for + ("T")
```

To achieve the desired typing, we have to use `Protocol`s:

```
from typing import Protocol

class Addable(Protocol):
    def __add__(self, other):  # anything that declares __add__() can stand in for an Addable
        raise NotImplementedError
```

Here we are declaring a class `Addable` using `typing.Protocol`, which declares `__add__()`. Anything that declares `__add__()` can stand in for an `Addable`, even if it's not descended from `Addable`. For example, an `int` is an `Addable`. Examples:

```
def add(a: Addable, b: Addable) -> Addable:
    print(type(a), type(b))
    return a+b  # checks for __add__()

add(str(1), str(2))    # okay
add(int(1), int(2))    # okay
add(int(1), float(2))  # okay
add(int(1), str(2))    # not a typecheck error, but a runtime error
```

Output:

```
<class 'str'> <class 'str'>
<class 'int'> <class 'int'>
<class 'int'> <class 'float'>
<class 'int'> <class 'str'>
TypeError: unsupported operand type(s) for +: 'int' and 'str' # coming from the last add()
```

All 4 of these pass the type checks, because `str`, `int` and `float` are all `Addable`, since they have `__add__()`. The last one will raise a runtime exception, since `+` doesn't work implicitly for `int` and `str`. Note that this is a runtime exception coming from running the code, not a type error: the type checker did not raise any errors.
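The same structural matching applies to user-defined classes. The sketch below uses a hypothetical `Money` class that never mentions `Addable`, and adds `typing.runtime_checkable` (not used in the example above) so that the match can also be observed with `isinstance()` at runtime. Note that `isinstance()` only checks that the method exists, not its signature.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Addable(Protocol):
    def __add__(self, other):
        raise NotImplementedError

class Money:  # hypothetical class, does not inherit from Addable
    def __init__(self, cents: int) -> None:
        self.cents = cents

    def __add__(self, other: "Money") -> "Money":
        return Money(self.cents + other.cents)

def add(a: Addable, b: Addable) -> Addable:
    return a + b

result = add(Money(150), Money(250))
if isinstance(result, Money):  # narrow from Addable back to Money
    print(result.cents)  # 400

print(isinstance(Money(1), Addable))  # True
print(isinstance(object(), Addable))  # False, object has no __add__()
```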

There are 2 ways we can think about this mini-problem:

- We want to allow adding of 2 different types (eg. `int` and `float`), but only if it makes sense. We want the type checker to raise an error for cases when a runtime exception would be raised (eg. adding `int` and `str`).
- We only want to allow adding of exactly the same types, eg. `int, int`, `float, float`, `str, str`. We will see that this is not achievable in Python with generic types.

Let's look at another version of this, where we declare a `TypeVar` and bind it to be `Addable`:

```
T = TypeVar('T', bound=Addable)

def add(a: T, b: T) -> T:
    print(type(a), type(b))
    return a+b  # checks for __add__()

add(str(1), str(2))    # okay
add(int(1), int(2))    # okay
add(int(1), float(2))  # okay
add(int(1), str(2))    # typecheck error and runtime error
```

Here, the last line raises a type check error and a runtime error:

```
error: Value of type variable "T" of "add" cannot be "object" # typecheck error coming from the last add()
...
TypeError: unsupported operand type(s) for +: 'int' and 'str' # runtime error coming from the last add()
```

Note that the third `int, float` version still runs fine. So this version implements case 1. above, where different types can still be passed, as long as addition makes sense for them.
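One practical difference between the `Protocol`-typed and the bound `TypeVar` versions is the return type the caller sees. As a sketch, with hypothetical `double_loose()` and `double()` helpers and `Any`-typed `__add__()` annotations added for illustration: the `Protocol`-typed function only promises an `Addable`, while the bound `TypeVar` version hands back the caller's own type.

```python
from typing import Any, Protocol, TypeVar

class Addable(Protocol):
    def __add__(self, other: Any) -> Any:
        ...

T = TypeVar('T', bound=Addable)

def double_loose(x: Addable) -> Addable:
    return x + x  # caller only knows the result is some Addable

def double(x: T) -> T:
    return x + x  # caller gets back the same type that was passed in

n: int = double(21)       # okay for the type checker, T is int
s: str = double("ab")     # okay for the type checker, T is str
# m: int = double_loose(21)  # a type checker would reject this: Addable is not int

print(n, s)  # 42 abab
```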

One last attempt could be to use a union'd `TypeVar`, where we limit ourselves to certain types that can stand in for `T`. But as before, in this case the type checker also doesn't enforce the instances of `T` to be the same, since it accepts an `int` wherever a `float` is expected:

```
T = TypeVar('T', int, float, str)

def add(a: T, b: T) -> T:
    return a+b  # checks for __add__()

add(int(1), float(2))  # okay
```

It turns out we cannot use binding to get case 2. above, ie. to force the type checker to make sure that both `a` and `b` arguments are actually the same type in `add(a, b)`.
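If exactly matching types really matter, one fallback is to enforce it at runtime rather than in the type system. The following is only a sketch of such a guard, not something the type checker verifies; the function name `add_same()` and the error message are hypothetical.

```python
from typing import TypeVar

T = TypeVar('T', int, float, str)

def add_same(a: T, b: T) -> T:
    # the type checker accepts (int, float), so compare the concrete types ourselves
    if type(a) is not type(b):
        raise TypeError(f"type mismatch: {type(a).__name__} and {type(b).__name__}")
    return a + b

print(add_same(1, 2))      # 3
print(add_same("a", "b"))  # ab

try:
    add_same(1, 2.0)
except TypeError as exc:
    print(exc)  # type mismatch: int and float
```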

## Conclusion

In the next article I will look at more uses of protocols and abstract base classes.