Random numbers, the natural logarithm and higher dimensional simplexes
Marton Trencseni - Sat 17 April 2021 - Data
Introduction
This is a fun post to exercise our brain and practice college math.
Let's play a game: draw a uniform random number between 0 and 1. What is the expected value?
from random import random
def once(n=1000*1000):
sum = 0.0
for _ in range(n):
sum += random()
return sum/n
print(once())
> 0.50023
As expected, it's 0.5.
Okay, what about if we draw two numbers?
def twice(n=1000*1000):
sum = 0.0
for _ in range(n):
sum += random()
sum += random()
return sum/n
print(twice())
> 1.00003
As expected, it's 1.0.
Okay, now let's turn it on its head: draw random numbers between 0 and 1, and stop when the sum exceeds 1. What is the expected number of draws?
def stop_at(z=1.0, n=1000*1000):
total_draws = 0.0
for _ in range(n):
trial_sum = 0.0
num_draws = 0
while True:
trial_sum += random()
num_draws += 1
if trial_sum >= z: break
total_draws += num_draws
return total_draws/n
print(stop_at(1))
> 2.71758
This returns $ e $, the base of the natural logarithm! Why does it show up here?
Practicing college math
As a reminder, the definition of $ e $:
Why is this the expected value of the draws?
Let $ N $ denote the random variable which denotes the number of draws required to cross 1.0. The expected value:
The sum starts at $ k = 2 $ because we need at least 2 draws, $ P(N=k) = 0 $ for $ k = 1 $.
The first step is to realize that $ P(N=k) $, the probability that we stop at $k$, is the probability that the sum is less than $1$ at $k-1$ draws, and exceeds $1$ after the $k$-th draw. Let $U_i$ denote the $i$-th draw, and let $S_k$ denote the sum up to $k$, $ S_k = \sum\limits_{i=1}^{k}{ U_i } $.
Then:
Now some elementary probability theory:
With $ A = S_{k-1} < 1 $ and $ B = S_{k} \geq 1 $:
Obviously, $ P( S_{k-1} < 1 \land S_k < 1 ) = P( S_k < 1 ) $, so:
Moving terms:
So:
Now we just need to figure out the formula for $ P( S_k < 1 ) $.
This is easiest solved geometrically. For $k=2$, $ P( S_k < 1 ) $ is the triangle area inside the x-y unit square where $x+y < 1$, which is clearly $1/2$. For $k=3$, $ P( S_k < 1 ) $ is the simplex volume inside the x-y-z unit square where $ x+y+z < 1 $. The simplex has as its base the previous triangle defined by $x+y < 1$, and has unit height.
The general formula for the area of triangles (volumes of simplexes) with unit height in $k$ dimensions is $ V = A \times k $, where A is the area (volume) of the base, which is the same formula for $k-1$. Inductively, this means that:
We have the factorial from the definition of $e$! Now we apply a simple identity of the factorial, $ k! = k \times (k-1)! $ to our formula:
We're almost done:
Let $j = k-1$:
Let $n = j-1$:
Varying the stopping threshold
We saw above that if we stop at $z=1$, we need an average of $e$ draws. What if we want to stop at $z=2$ or $z=3.14$? What is the $E(N_z)$, the average number of draws required if we want to stop at $z$?
import matplotlib.pyplot as plt
x = [x/10 for x in list(range(10, 50, 1))]
y = [stop_at(i, n=100*1000) for i in x]
fig, ax = plt.subplots()
ax.plot(x, y, marker='o')
ax.set(xlabel='z', ylabel='average draws',
title='E(N_z)')
ax.grid()
plt.show()
This is intuitive: if we need $e$ draws to reach $z$, and each additional draws averages $0.5$, so $E(N_z) = f(z) = e + 2(z-1)$.
Sometimes it's good to exercise our brain (and $\LaTeX$ skills) to stay sharp!