Randomness extractors: making fair coins out of biased coins

Marton Trencseni - Sun 26 May 2024 - Data

Introduction

In a previous post titled Fair coin from biased coin, I looked at the problem of creating a uniform random coin given access to a biased coin. I looked at multiple approaches, and determined that they're actually all the same in some sense. Recently, while reading Turing award winner Avi Widgerson's excellent book Mathematics and Computation, I learned that this is a simple case of a more general problem of deterministic randomness extraction: given a stream of random input bits, which are potentially biased and correlated, create a stream of output bits which are uniform random, ie. a series of fair coin tosses. The output bit stream, like in the Von Neumann case, is allowed to have a lower bitrate, ie. we don't always emit output bits on input.

Using the language of entropy, the input bit stream has non-zero, but less than 1-bit of entropy per bit: $ 0 < H_{input} < 1 $. We want to create an output bit stream which does have 1-bit of entropy per bit, or close to it: $ H_{output} \approx 1 $. The price we pay is (potentially) emitting less bits, ie. if the input bit stream has some bitrate $ r_{input} \: bits/sec $, then for the output bit stream $ r_{output} < r_{input} $.

I will skip the exact mathematical definitions and theorems, and instead go step-by-step to build more sophisticated randomness extractors, with Python code samples. The code is up on Github.

Base classes

First, let's define some Python base classes that capture the core concepts of our problem domain. A BitSource is something that emits bits (represented as ints here), while a RandomnessExtractor is something which internally gets bits from a BitSource, and sometimes emits a bit, sometimes returns None:

class BitSource(ABC):
    @abstractmethod
    def next(self) -> int:
        pass


class RandomnessExtractor(ABC):
    @abstractmethod
    def next(self) -> Optional[int]:
        pass

Let's define a BiasedBitSource:

class BiasedBitSource(BitSource):
    def __init__(self, bias=None):
        if bias is None:
            bias = random()
        self.bias = bias

    def next(self) -> int:
        if random() < self.bias:
            return 0
        else:
            return 1

Trivial case: fair coin from fair coin

Let's define a simple PasstroughExtractor, which just returns the input bit stream:

class PassthroughExtractor(RandomnessExtractor):
    def __init__(self, bit_source: BitSource):
        self.bit_source = bit_source
        self.bits = []
    def next(self) -> Optional[int]:
        bit = self.bit_source.next()
        result = bit
        print_bits(bit, result)
        return result

The trivial application then is:

bit_source = BiasedBitSource(bias=0.5)
extractor = PassthroughExtractor(bit_source)

for _ in range(100):
    result = extractor.next()

Let's see it in action. The first column is always the input bit stream, the second column is the output bit stream:

von Neumann: fair coin from biased coin

Let's make the following two assumptions about the input bit stream:

each bit in the input stream is an independent (potentially biased) coin toss
each bit in the input stream has identical bias (say $p=0.7$ for 1)

In other words, the input stream is made up of indepedent identical (iid) random variables. In this case, the von Neumann rule works:

break the input stream into non-overlapping 2 bit pairs
if the 2 bits are 00 or 11, don't emit
if the 2 bits are 01, emit 1
if the 2 bits are 10, emit 0

This works because the probability for both 01 and 10 is $p(1-p)$. Interestingly, this is the best possible approach, whether we know the bias of the input bit stream or not. The code for the VonNeumannExtractor is below:

class VonNeumannExtractor(RandomnessExtractor):
    def __init__(self, bit_source: BitSource):
        self.bit_source = bit_source
        self.bits = []

    def next(self) -> Optional[int]:
        bit = self.bit_source.next()
        if bit not in [0, 1]:
            raise ValueError("Bit must be 0 or 1.")
        self.bits.append(bit)
        result = None
        if len(self.bits) == 2:
            if self.bits == [0, 1]:
                result = 1
            elif self.bits == [1, 0]:
                result = 0
            self.bits = []  # Reset for the next pair of bits
        print_bits(bit, result)
        return result

Using it on a biased source with unknown bias:

bit_source = BiasedBitSource()
extractor = VonNeumannExtractor(bit_source)

for _ in range(100):
    result = extractor.next()

Prints something like:

0 .
1 1
1 .
0 0
0 .
0 .
...

Varying bias and the parity extractor

In the above case, we assumed that the input stream is made up of independent, identically distributed bits. What if we give up the second condition, ie. we allow each bit in the input bit stream to have different bias? In this case, the Von Neumann trick will not work, as subsequent bits no longer have identical probabilities, so 01 and 10 no longer have equal probability. In this case, it can be shown that taking N bit blocks of the input stream, and returning the parity bit as the output stream reduces the bias, and as N gets larger, makes the output stream closer to an even coin toss.

Let's define an input stream with varying bias:

class VaryingBiasedBitSource(BitSource):
    def __init__(self, min_cutoff=0, max_cutoff=1):
        self.min_cutoff = min_cutoff
        self.max_cutoff = max_cutoff
        self.bias = random()

    def next_bias(self, draw):
        bias = draw**2
        if random() < 0.5:
            bias = 1 - bias
        bias = max(self.min_cutoff, bias)
        bias = min(self.max_cutoff, bias)
        return bias

    def next(self) -> int:
        draw = random()
        if draw < self.bias:
            result = 0
        else:
            result = 1
        self.bias = self.next_bias(draw)
        return result

Note that there is no one right way to define the way the bits vary. I chose a specific rule to compute the next bias (with some functional and some random elements). Next, let's define the ParityExtractor:

class ParityExtractor(RandomnessExtractor):
    def __init__(self, bit_source: BitSource, N: int):
        self.bit_source = bit_source
        if N <= 0:
            raise ValueError("N must be a positive integer.")
        self.N = N
        self.bits = []

    def next(self) -> Optional[int]:
        bit = self.bit_source.next()
        if bit not in [0, 1]:
            raise ValueError("Bit must be 0 or 1.")
        self.bits.append(bit)
        result = None
        if len(self.bits) == self.N:
            result = sum(self.bits) % 2
            self.bits = []  # Reset for the next block of N bits
        print_bits(bit, result)
        return result

Let's try it:

bit_source = VaryingBiasedBitSource(min_cutoff=0.1, max_cutoff=0.7)
extractor = ParityExtractor(bit_source=bit_source, N=4)

for _ in range(100):
    result = extractor.next()

Prints something like:

0 .
1 .
0 .
0 1
0 .
1 .
1 .
0 0
0 .
1 .
1 .
1 1
0 .
...

Markov chain input stream and the Blum extractor

Let's now look at a case when the input bits are emitted by a Markov Chain process, ie. there are $k$ states, each state has transitions to two other states, with one transition having $p$ probability and emitting a 0 bit, the other having $1-p$ probability and emiting a 1 bit. Let's define MarkovChainBitSource with an example:

class MarkovChainBitSource(BitSource):
    def __init__(self, start_state: str):
        self.current_state = start_state

    @abstractmethod
    def get_transitions(self, state: str) -> Dict[int, Tuple[str, float]]:
        pass

    def next(self) -> int:
        transitions = self.get_transitions(self.current_state)
        rand = random()
        cumulative_probability = 0.0
        for bit, (next_state, probability) in transitions.items():
            cumulative_probability += probability
            if rand < cumulative_probability:
                self.current_state = next_state
                return bit
        raise RuntimeError("Probabilities do not sum to 1.")


class ExampleMarkovChainBitSource(MarkovChainBitSource):
    def __init__(self):
        super().__init__(start_state='A')
        self.transitions = {
            'A': {0: ('B', 0.5), 1: ('C', 0.5)},
            'B': {0: ('A', 0.7), 1: ('C', 0.3)},
            'C': {0: ('A', 0.4), 1: ('B', 0.6)},
        }

    def get_transitions(self, state: str) -> Dict[int, Tuple[str, float]]:
        return self.transitions[state]

Further, let's assume that our randomness extractor has access to the internal state of the Markov chain. In this case, the problem can be reduced to the Von Neumann case: for each state, we need to remember the last input bit the Markov chain produced when we were in this state the last time, take the current bit produced exiting this state, and apply the Von Neumann rule. Implementing this in Python is surprisingly simple:

class BlumExtractor(RandomnessExtractor):
    def __init__(self, bit_source: MarkovChainBitSource):
        self.bit_source = bit_source
        self.state_bits = defaultdict(list)  # Stores the last bits for each state

    def next(self) -> Optional[int]:
        current_state = self.bit_source.current_state
        bit = self.bit_source.next()
        self.state_bits[current_state].append(bit)
        result = None
        if len(self.state_bits[current_state]) == 2:
            bits = self.state_bits[current_state]
            if bits == [0, 1]:
                result = 1
            elif bits == [1, 0]:
                result = 0
            self.state_bits[current_state] = []
        print_bits(bit, result)
        return result

Let's try it:

bit_source = ExampleMarkovChainBitSource()
extractor = BlumExtractor(bit_source)

for _ in range(100):
    result = extractor.next()

Prints something like:

0 .
1 .
1 .
0 0
1 1
0 0
0 .
0 .
1 1
...

General case

Can we produce perfectly uniform bits from a biased input source that has some randomness (so it's not a deterministic process), but the bits may be correlated, and we don't know how? The answer is no, if we don't know anything about the structure of the input stream, there is no general way to create even a single perfectly uniform bit.

Conclusion

An interesting test for this code would be to apply statistical tests on the generated output stream to test how random they are. I will do this in a subsequent post.