Lamport's Byzantine Generals algorithm in Python

Marton Trencseni - Wed 30 July 2025 - Distributed

Introduction

Imagine a group of generals of the Byzantine army camped with their troops around an enemy city. The generals must agree upon a common battle plan: either attack or retreat. However, some of them may be traitors! The non-traitor generals succeed only if all of them choose the same action — either everyone attacks or everyone retreats; if even one non-traitor general does something different, the battle is lost.

In their seminal paper The Byzantine Generals Problem Lamport, Shostak and Pease pose this problem with the analogue of the Byzantine generals, and give solutions using distributed algorithms. Their abstract reads:

Reliable computer systems must handle malfunctioning components that give conflicting information to different parts of the system. This situation can be expressed abstractly in terms of a group of generals of the Byzantine army camped with their troops around an enemy city. Communicating only by messenger, the generals must agree upon a common battle plan. However, one or more of them may be traitors who will try to confuse the others. The problem is to find an algorithm to ensure that the loyal generals will reach agreement. It is shown that, using only oral messages, this problem is solvable if and only if more than two-thirds of the generals are loyal; so a single traitor can confound two loyal generals. With unforgeable written messages, the problem is solvable for any number of generals and possible traitors. Applications of the solutions to reliable computer systems are then discussed.

The problem asks: how can distributed processes reach agreement when a subset of them — up to $M$ in a total of $N$ nodes — might behave arbitrarily, lying, omitting, or forging messages? Failures of this kind are called Byzantine because, like traitorous generals, they do not merely crash; they actively try to mislead the rest of the system.

In this article, we will implement Lamport's classic solution for the Byzantine Generals problem, referred to as the oral messages version of the problem; this is the version we are considering in this article.

The code is on Github: mtrencseni/dca/byzantine/simple

Problem statement

The problem statement in plain English is the following: there are $N$ generals, $M$ of which is a traitor or faulty. They need to agree on a common order, which is commonly taken to be the two values attack and retreat. One of the generals is the king, who proposes the initial value (attack or retreat) and kicks off the algorithm. Note that the king may also be a traitor! Furthermore, if the king is not a traitor, we want all the other non-traitor generals to follow the king's orders.

Note: in Lamport's paper, the term commander is used instead of king; however, I find Lamport's terminology confusing, because the term commander is re-used in later stages of the algorithm for other nodes as well. So in this discussion, I avoid the term commander altogether.

The assumption is that if all non-traitors generals attack at the same time, they will be victorious, otherwise they get defeated. If they all agree to retreat, that is also an acceptable outcome. The act of attacking or retreating in the analogue story of Byzantine Generals is irrelevant, the point is that they all agree on a value. In this sense, attack and retreat are arbitrary, the value can be an arbitrary string — the point is that all non-traitors should agree on the same value.

The challenge is to find an algorithm for exchanging messages between the generals that fulfills these requirements.

Note that if we can assume that the king is not a traitor, the solution is trivial: the king just sends his orders (value) to the other generals, and they do what the king told them to do. The problem is only interesting if the king may be one of the $M$ traitors.

Assumptions

Translating the problem to the language of modern computers, we make the following assumptions:

There are $N$ nodes which communicate with each other by sending messages.
Of the $N$ total nodes, $M$ are traitors (malicious, faulty, misbehaving, down).
The non-traitor nodes do not know who the traitors are. The traitors can know of each other and can collude.
One of the nodes (the king) initiates the distributed algorithm with an initial value.
For all messages received, nodes can identify the sender node — traitors cannot spoof messages.
Non-traitor nodes follow the distributed algorithm given.
Traitor nodes do not follow the distributed algorithm, they can:
- not send some or all messages
- send duplicate messages
- send malformed messages
- send well-formed messages that do not follow the distributed algorithm
- send messages with a delay
- send messages out-of-order

In terms of implementation, most of the traitor behaviour must be handled at the software engineering level. For example, how long should non-traitor nodes wait for messages from other nodes, before they stop waiting and assume a default value in order to advance the algorithm? These considerations are not in scope for this article's toy implementation. In the toy implementation given here, traitors essentially follow the protocol in terms of sending messages, but they send on the wrong value (e.g. retreat instead of attack) in an attempt to confuse the other nodes.

Algorithm

Lamport, Shostak and Pease proved that no algorithm exists for $N < 3M+1$ that solves the Byzantine Generals problem, and gave a distributed algorithm that works for the $N \geq 3M+1$ case. Let's illustrate the challenge and solution approach in the $(M, N) = (1, 4)$ case. Let's start with the case when the king is the traitor, and the king sends conflicting messages to the other 3 non-traitor generals: attack, retreat, retreat. If the generals were just to follow what the king tells them to do, the one who attacks would get defeated. Either all 3 need to attack or all 3 need to retreat. The basic idea is that they forward to each other what the king told them, collect these values, and then they all run an identical majority() function on the collected values to arrive at a common decision. Let's use the following majority() function which uses the value with the higher count, or retreat in case of a tie:

def majority(values):
    c = Counter(values)
    return "attack" if c["attack"] > c["retreat"] else "retreat"

Assuming nodes (generals) are numbered from 0 and the king is node 0, in the previous example the traitor king sent the following values (orders) to:

node 1: attack
node 2: retreat
node 3: retreat

Let's call this initial phase of the king sending out these initial orders the $k=1$ phase. The remaining nodes now all run the $k=0$ phase; here they send each other the value the king sent them:

node 1 sends to node 2 and 3 the value he received, attack
node 2 sends to node 1 and 3 the value he received, retreat
node 3 sends to node 1 and 2 the value he received, retreat

Overall, the three nodeswill have the following values:

node 1: attack (from the king), retreat, retreat
node 2: retreat (from the king), attack, retreat
node 3: retreat (from the king), attack, retreat

The point of the algorithm is that all 3 nodes will end up with the same three pooled values: attack, retreat, retreat. Then they can run their identical majority() function on it, which would return retreat, and all 3 would retreat. The traitor king was not able to fool them.

Note that the generals don't know if the king is a traitor, but the algorithm has to work in that case. What happens if the king is not a traitor, but one of the other 3 generals is? In that case, the king sends the same order initially to all 3 nodes. Let's say that the king's order was to attack (but this is arbitrary, it could be retreat, or any other text). The king sends to:

node 1: attack
node 2: attack
node 3: attack

The nodes then run the same algorithm, but one of them is a traitor. Let's say the traitor is node 3, and he breaks the algorithm, and forwards retreat instead of the attack order he received from the king.

node 1 sends to node 2 and 3 the value he received, attack
node 2 sends to node 1 and 3 the value he received, attack
node 3 is a traitor and sends to node 1 and 2 the incorrect value retreat

Overall, the two non-traitor nodes 1 and 2 will have the following values:

node 1: attack (from the king), attack, retreat
node 2: attack (from the king), attack, retreat

Since they both have the values attack, attack, retreat, when they run their majority() function on this set they will both attack. The king doesn't take part in the algorithm and always just follows his initial order, so in the end, all 3 non-traitor nodes (node 0 [the king], 1 and 2) attack; as before, the traitor node was not able to fool them.

Note that we don't care what traitor nodes do — since they're traitors (or faulty), they don't follow the algorithm anyway.

In the $(M, N) = (1, 4)$ case, an easy to understand 2-phase protocol works: in phase $M=1$ the king sends out the initial orders, and then in phase $M=0$ the other nodes let each other know what the king sent them to make sure they all end up with the same majority value.

Let's look at the $(M, N) = (2, 7)$ case. Let's again assume that the king is a traitor and sends the following values to the other 6 generals:

node 1: attack
node 2: attack
node 3: attack
node 4: retreat
node 5: retreat
node 6: retreat

There are $M=2$ traitors, let's assume that node 6 is the second traitor. Similarly to the king, node 6 will also forward different values to the other nodes, to shift their majority balance and cause them to disagree. Node 6 forwards to:

node 1: retreat
node 2: retreat
node 3: retreat
node 4: attack
node 5: attack

The other nodes (1..5) are non-traitors and follow the algorithm, so they forward the value they received from the king. Each of them ends up with 6 values (1 from the king and 5 from the other generals), and assuming the algorithm concludes here, calls majority() on them.

Let's look at what node 1 received, from:

node 0: attack (in first stage)
node 2: attack
node 3: attack
node 4: retreat
node 5: retreat
node 6: retreat (traitor)

Running majority() on the above 6 values will yield retreat for node 1.

Now, let's look at what node 4 received, from:

node 0: retreat (in first stage)
node 1: attack
node 2: attack
node 3: attack
node 5: retreat
node 6: attack (traitor)

Running majority() on the above 6 values will yield attack for node 4. Node 1 and 4 don't agree!

The solution is to run one more stage of the algorithm, to double check the values received in the previous stage again. In the example just given, non-faulty generals can send each other what the traitor node 6 sent them, pool it (with the original value from node 6), run majority() on it, and then use that majority value, which is guaranteed to match (since no more traitors are left in this group, so they are truthful to each other), instead of the value they originally received from 6. Specifically, if they do this, they will all end up with the 5 values that node 6 forwarded to them (retreat, retreat, retreat, attack, attack), which will result in majority(...) = retreat. So with this additional verification step, node 4 would change his last value from attack to retreat, and also retreat, and the algorithm keeps working!

Of course the nodes don't actually know who is a traitor and who is not, so they have to run this additional third stage for all messages received in the second stage, which results in an exploding number of messages. So, in the $M=2$ case, 3 stages of messages are required. Similarly, in the $M=3$ case, 4 stages would be required, to keep checking for traitors breaking the majority pooled values.

In general, with $M$ traitors among $N \geq M+1$ nodes, $M+1$ stages are needed to ensure the correctness of the algorithm. These stages are referred to as the $k=M$ stage in which the king sends his order to the other $N-1$ generals (first stage), all the way down to the $k=0$ stage. This nomenclature of decreasing $k$s originates in Lamport's original paper.

Message paths

In the $M=1$ example above, the 3 generals send each other in stage 0 what the king sent them in stage 1, but they don't send it back to the king. In the second, $M=2$ example, when nodes 1 to 5 are checking what node 6 sent them, they don't send anything back to node 0 or 6. In general, the later stages of checking happen between nodes that were not on the path of the message cascade that is being checked. To know this "path" that the messages took, the messages contain a path field, which is a series of node ids. The last id is always the id of the node sending the message. In the example above, the king's original messages have path [0], node 6's messages that are being checked by the other nodes have path [0, 6]. Note that it's not the case that the same message is traveling through this path — path here means that 6 is sending messages because 0 sent it an original message that triggered the subsequent messages.

Maintaining these paths is important, without them the nodes wouldn't know what message they are verifying, as at higher $M$, an arbitrarily high number of messages are exchanged by nodes in later stages. Without the messages containing the path, the nodes wouldn't know what to do with the values.

An interesting question is, what happens if the traitor nodes change the paths, i.e. they send [0, 1, 2, 3] instead of [0, 2, 3, 4]? Can they break the algorithm like that? Interestingly, the answer is again that this is an engineering issue, theoretically they cannot break the algorithm:

In our examples and implementation, the first element of the path always has to be 0, since it's always node 0 who initiates the algorithm and starts the message cascade; so if a node received a path from a traitor node that doesn't start with a 0, he can just ignore it.
If a traitor node sends two messages, with matching paths, but different values, the receiving node can notice this, discard the second and later ones and just continue with the first one received; the sending node is a traitor anyway, so as long as $N \geq 3M+1$, traitors sending bad values are handled by the algorithm, so it doesn't matter which message is kept.
If a traitor sends a message with the last element not matching his own id, the receiver can just discard it since per our assumption nodes cannot spoof each other's identity, so the receiver knows the sender's id doesn't match.
If the traitor inserts or changes some arbitrary node ids in the "middle" (not first, not last) of the path, that is also not a problem: if it's an existing path that he already sent (or will send), that reduces to case 2 above; and if it's a path that the receiver doesn't actually need or expect per the algorithm, then the receiver can just discard. To understand this, note that the message cascade of the algorithm is perfectly deterministic, and hence the different paths that a receiver expects and needs to receive are known to him!

So, interestingly, spoofing paths does not theoretically break Lamport's distributed algorithm, even though in a production implementation these cases would need to be handled (with e.g. timeouts). For this reason, in our toy implementation, traitor nodes don't interfere with the paths they're passing on, and only change the values themselves.

Timeouts and default values

One more thing to address is default values. What happens if traitor nodes simply do not send certain messages, or just go silent and don't take part in the algorithm. This is again an engineering issue, as it can always be handled with timeouts and default values. In other words, if a traitor node doesn't send any value, we can always assume they did send some value that is "not the right value". Assuming that $N \geq 3M+1$ and our algorithm works, in these timeout (or similar) cases we can always just default to something (e.g. retreat) and pretend that that's what the node sent us, and overall the algorithm will still be correct (all non-traitor nodes will decide on the same value).

For this reason, the traitors in the toy implementation given here always do their part in the message cascade (but pass on different values), to keep the implementation clean from timeout/retry/default logic and easy to understand.

Two phases

The algorithm fundamentally has two phases:

receiving and sending messages (message cascade)
using the received message to recursively majority check the messages in the cascade tree

In Lamport's original paper, the two phases above are defined in an interleaved way, which I found very hard to understand. In the toy implementation given here, for pedagogical purposes, the two phases are separated:

Phase 1, Message Cascade: nodes recursively forward messages to each other, extending the path fields, for $M+1$ stages; all nodes store all received messages for later processing.
Phase 2, Decide: once the message cascade has finished, nodes use the messages stored in phase 1 to arrive at a final value (attack or retreat); no messages are exchanged in this phase.

Python implementation with HTTP Flask servers

Similarly to the previous distributed algorithms, I use Flask HTTP servers here. A driver fires up $N=3M+1$ Python processes, one for each node, of which $M$ are instructed to be traitors on the command line. For simplicity, the driver always makes node 0 the king (but the implementation of the algorithm in general.py does not have this assumption). The driver kicks off the king using the /start endpoint once all the nodes are up and running, who then kicks off the message cascade of $M+1$ stages. The driver checks each node using /status whether it has received all messages it needs, i.e. has phase 1 finished. Then it calls /decide to trigger them to make the decision based on the messages they have received and stored. Note that this two phase behaviour — specifically controlled by the driver — is to make the algorithm flow easier to understand, for example so that debug logs during the decision phase can be printed by the nodes sequentially, without interleaving. The nodes could trigger their own decide phase once they have received all messages automatically, i.e. the nodes could transition to phase 2 without the driver instructing them to do so.

The driver.py code:

m = int(sys.argv[1]) if len(sys.argv) >= 2 else 1
n = 3*m + 1

# randomly pick m traitors (could include commander 0)
traitors = set(random.sample(range(n), m))
print(f'Traitors: {traitors}')

def spawn(mod, *args):
    return subprocess.Popen([sys.executable, mod, *map(str, args)])

procs = []
# start every general
for gid in range(n):
    traitor = 1 if gid in traitors else 0
    procs.append(spawn("general.py", gid, n, m, traitor))
# let servers come up
time.sleep(1)
# kick off message cascade by telling the commander to issue his order
requests.post("http://127.0.0.1:8000/start", json={"order": "attack" if random.random() < 0.5 else "retreat"})
# wait until all generals report done
while True:
    time.sleep(0.1)
    try:
        if all(requests.get(f"http://127.0.0.1:{8000+i}/status").json()["done"] for i in range(n)):
            break # all done, exit waiting loop
    except:
        pass
# tell each general to decide based on what they've seen so far
print('Decisions:')
decisions = set()
for i in range(n):
    if i in traitors:
        print(f'General {i} is a traitor')
    else:
        value = requests.post(f"http://127.0.0.1:{8000+i}/decide").json()["value"]
        print(f'General {i} decided {value}')
        decisions.add(value)
if len(decisions) == 1:
    print(f'SUCCESS: all non-traitor generals decided to {value}!')
else:
    print('FAILURE: non-traitor generals decided differently')
# stop all generals
for p in procs:
    p.kill()

The generals are implemented in general.py. The /start endpoint is quite simple, it just sends the first stage of messages to all the other generals. If the king is not a traitor, the value specified by the driver is broadcast to all other generals, otherwise the king sends alternating attack and retreat orders:

@app.route("/start", methods=["POST"])
def start():
    # /start is essentially like order with: path=[], value=order, k=0
    global value
    value = request.get_json()["order"]
    if not traitor:
        broadcast([id], value)
    else:
        for i in range(0, n):
            if i == id: continue # don't order myself
            msg = {"path": [id], "value": other_value(value) if traitor and i % 2 == 0 else value}
            async_order(i, msg)
    global done
    done=True
    return jsonify(ok=True)

The second relevant endpoint is /order, where generals receive messages from the king, and from each other in subsequent stages of the message cascade.

@app.route("/order", methods=["POST"])
def order():
    msg = request.get_json()
    path = list(msg["path"])
    value = msg["value"]
    k = 1 + m - len(path)
    msg = {"path": path, "value": value}
    received_values[tuple(path)] = value
    if k > 0:
        broadcast(path + [id], other_value(value) if traitor else value)
    if all_messages_received():
        global done
        done = True
    return jsonify(ok=True)

The code is quite simple:

the value in this message is saved for later processing in received_values
if the k value, which numbers the stages of the message cascade and starts at M has not yet reached 0, the message is broadcast to all nodes not on the path:

def broadcast(path, value):
    for i in range(n):
        if i not in path:
            async_order(i, msg={"path": path, "value": value})

The final step in order() is to checks whether this is the last message that we expect to receive in the message cascade — this will be covered in a later section. If all messages have been received, the done flag is set.

Deciding

The driver repeatedly check whether all messages are done using their trivial /status endpoint:

@app.route("/status")
def status():
    return jsonify(done=done)

Once all nodes report being done (all nodes received all messages in the cascade), the driver calls each node's /decide endpoint. It is at this stage that the saved incoming messages are actually processed to decide on the final value. The algorithm used here is traditionally called OM(), the same nomenclature Lamport used in his original paper:

@app.route("/decide", methods=["POST"])
def decide():
    assert(done)
    global value
    if value is None:
        value = OM(path=shortest_path())
    return jsonify(value=value)

The semantics of the recursive algorithm OM(path) is essentially: what is the majority decision for the value of the message identified by path, and we call it with path=shortest_path() since it's the original message (and value) received from the king that we want to decide on. Since the driver always picks node 0 as the king, shortest_path() actually returns [0]:

def shortest_path():
    return list(min(received_values.keys(), key=len, default=None))

Now, on to the implementation of OM():

def OM(path):
    k = 1 + m - len(path)
    value = received_values[tuple(path)]
    if k == 0:
        # print(f'node={id}: in stage {k} for OM({k}) for path={path} I am just returning the raw received message -> {value}')
        return value
    values = [value]
    # others = []
    for i in range(n):
        if i not in list(path) + [id]:
            # others.append(i)
            values.append(OM(list(path)+[i]))
    # print(f'node={id}: in stage {k} I received from path={path} value {vals[0]}, my OM({k-1}) values from others ({others}) are {vals[1:]} -> {majority(values)}')
    return majority(values)

The algorithm OM() implements Lamport's logic explained in the introduction, by checking for each received value (up to $k>1$) from node X, what are other nodes reporting about this same value sent by X. It's best understood by looking at the traces (that are commented out above and on Github) in an example run. Below is an example run with $(M, N) = (2, 7)$, just showing the above traces for one of the nodes:

$ python3 driver.py 2
Traitors: {3, 5}
...
node=1: in stage 0 for path=[0, 2, 3] I am just returning the raw received message -> retreat
node=1: in stage 0 for path=[0, 2, 4] I am just returning the raw received message -> attack
node=1: in stage 0 for path=[0, 2, 5] I am just returning the raw received message -> retreat
node=1: in stage 0 for path=[0, 2, 6] I am just returning the raw received message -> attack
node=1: in stage 1 I received from path=[0, 2] value attack, my OM(k=0) values from others ([3, 4, 5, 6]) are ['retreat', 'attack', 'retreat', 'attack'] -> attack
node=1: in stage 0 for path=[0, 3, 2] I am just returning the raw received message -> retreat
node=1: in stage 0 for path=[0, 3, 4] I am just returning the raw received message -> retreat
node=1: in stage 0 for path=[0, 3, 5] I am just returning the raw received message -> attack
node=1: in stage 0 for path=[0, 3, 6] I am just returning the raw received message -> retreat
node=1: in stage 1 I received from path=[0, 3] value retreat, my OM(k=0) values from others ([2, 4, 5, 6]) are ['retreat', 'retreat', 'attack', 'retreat'] -> retreat
node=1: in stage 0 for path=[0, 4, 2] I am just returning the raw received message -> attack
node=1: in stage 0 for path=[0, 4, 3] I am just returning the raw received message -> retreat
node=1: in stage 0 for path=[0, 4, 5] I am just returning the raw received message -> retreat
node=1: in stage 0 for path=[0, 4, 6] I am just returning the raw received message -> attack
node=1: in stage 1 I received from path=[0, 4] value attack, my OM(k=0) values from others ([2, 3, 5, 6]) are ['attack', 'retreat', 'retreat', 'attack'] -> attack
node=1: in stage 0 for path=[0, 5, 2] I am just returning the raw received message -> retreat
node=1: in stage 0 for path=[0, 5, 3] I am just returning the raw received message -> attack
node=1: in stage 0 for path=[0, 5, 4] I am just returning the raw received message -> retreat
node=1: in stage 0 for path=[0, 5, 6] I am just returning the raw received message -> retreat
node=1: in stage 1 I received from path=[0, 5] value retreat, my OM(k=0) values from others ([2, 3, 4, 6]) are ['retreat', 'attack', 'retreat', 'retreat'] -> retreat
node=1: in stage 0 for path=[0, 6, 2] I am just returning the raw received message -> attack
node=1: in stage 0 for path=[0, 6, 3] I am just returning the raw received message -> retreat
node=1: in stage 0 for path=[0, 6, 4] I am just returning the raw received message -> attack
node=1: in stage 0 for path=[0, 6, 5] I am just returning the raw received message -> retreat
node=1: in stage 1 I received from path=[0, 6] value attack, my OM(k=0) values from others ([2, 3, 4, 5]) are ['attack', 'retreat', 'attack', 'retreat'] -> attack
node=1: in stage 2 I received from path=[0] value attack, my OM(k=1) values from others ([2, 3, 4, 5, 6]) are ['attack', 'retreat', 'attack', 'retreat', 'attack'] -> attack
General 1 decided attack
...

A final note: the king already sets their value when the driver calls the /start endpoint, they just decide on the value they proposed originally and sent to all the other generals.

Counting messages in the message cascade

Counting the expected messages in the message cascade can be done in either of two ways:

simulate the message cascade explicitly and count
find the closed-form formula

Both are simple. The first option can be implemented with a few lines of code, like:

def count_expected_messages():
    expected_messages = defaultdict(int)
    #
    def receive_message(node, path):
        k = 1 + m - len(path)
        expected_messages[node] += 1
        if k == 0: return
        for i in range(n):
            if i not in path+[node]:
                receive_message(i, path+[node])
    #
    for i in range(1, n):
        receive_message(i, path=shortest_path())
    return expected_messages[id]

For the second, which is what this implementation actually uses, the formula is:

def expected_messages(k):
    n = 3 * m + 1
    return perm(n-2, m-k)

The function used in the /order endpoint to check whether all messages have been received is then as simple as:

total_expected = sum(expected_messages(k) for k in range(m+1))
def all_messages_received():
    total_received = len(received_values)
    if total_received < total_expected:
        return False
    return True

Let's look at the number of messages received per node, for different values of $M$, in each stage (note that $k=0$ is always the last stage):

	$M=1$	$M=2$	$M=3$	$M=4$
$k=0$	$2$	$20$	$336$	$7\,920$
$k=1$	$1$	$5$	$56$	$990$
$k=2$		$1$	$8$	$110$
$k=3$			$1$	$11$
$k=4$				$1$
total	$3$	$26$	$401$	$9\,032$

Note that the king sends one message to each of the other $N-1$ nodes (the first $k=M$ messages), and then no other messages are sent or received by the king (since all subsequent messages carry a path with the king's node id as the first element).

Sample runs

Below are some sample runs for different $M$ values ($N=3M+1$ always):

$ python3 driver.py 1
Traitors: {0}
Decisions:
General 0 is a traitor
General 1 decided retreat
General 2 decided retreat
General 3 decided retreat
SUCCESS: all non-traitor generals decided to retreat!

$ python3 driver.py 1
Traitors: {3}
Decisions:
General 0 decided attack
General 1 decided attack
General 2 decided attack
General 3 is a traitor
SUCCESS: all non-traitor generals decided to attack!

$ python3 driver.py 2
Traitors: {0, 5}
Decisions:
General 0 is a traitor
General 1 decided retreat
General 2 decided retreat
General 3 decided retreat
General 4 decided retreat
General 5 is a traitor
General 6 decided retreat
SUCCESS: all non-traitor generals decided to retreat!

$ python3 driver.py 3
Traitors: {4, 6, 8}
Decisions:
General 0 decided attack
General 1 decided attack
General 2 decided attack
General 3 decided attack
General 4 is a traitor
General 5 decided attack
General 6 is a traitor
General 7 decided attack
General 8 is a traitor
General 9 decided attack
SUCCESS: all non-traitor generals decided to attack!

$ python3 driver.py 4
Traitors: {0, 1, 5, 7}
Decisions:
General 0 is a traitor
General 1 is a traitor
General 2 decided retreat
General 3 decided retreat
General 4 decided retreat
General 5 is a traitor
General 6 decided retreat
General 7 is a traitor
General 8 decided retreat
General 9 decided retreat
General 10 decided retreat
General 11 decided retreat
General 12 decided retreat
SUCCESS: all non-traitor generals decided to retreat!

Too complicated for 2025 AIs

For the last two years I've been using LLMs, specifically the latest ChatGPT models for coding. These LLMs are especially useful for the 100-1000 line programs I write for Bytepawn articles. In many cases, ChatGPT is able to produce a nearly complete solution on the first go. Interestingly, when working on the code for this article, OpenAI's o3 model repeatedly failed to produce a working implementation, even after ~10 attempts (both with refining, showing error logs, and starting over from scratch). It seems that the Byzantine Generals problem is (i) too complicated, (ii) the descriptions of the algorithm are too vague, and (iii) too sparsely discussed and implemented on the Internet — hence rare in its training data — so o3 can't get it right. On the other hand, when I asked it to switch from blocking HTTP calls (between generals) to a non-blocking solution, it gave a working solution in one go. Perhaps unsurprisingly given their training data, LLMs are much better software engineers than they are computer scientists.

Conclusion

By implementing Lamport's message cascade and majority vote logic in code, we see that the theoretical guarantees translate to a working system: every non-traitor node decides identically once the $M+1$ stages of forwarding finish. Although the exponential message growth makes the algorithm impractical for large $M$, the exercise lays a clear foundation for understanding modern, more efficient Byzantine-fault-tolerant protocols.

In future posts, I plan to look at:

engineering optimizations (e.g. use raw TCP connections between generals instead of HTTP)
enable multiple runs of the algorithm, initiated by different nodes
implement Lamport's second Byzantine Generals algorithm, which uses unforgeable signed messages (digital signatures)