Single round-trip optimization for Paxos with PaxosLease masters

Marton Trencseni - Sat 25 April 2026 - Distributed

Introduction

In the previous article, Combining Paxos and PaxosLease, I combined the durable Paxos replicated log with PaxosLease-based master election. The result was a small master-based replicated state machine: Paxos still decided the log, but PaxosLease decided which node was currently allowed to drive the log. In that implementation, non-master nodes refused client writes, and clients had to discover and connect to the current master.

That was a useful step, but it left one obvious inefficiency in place. Even though the system now had a master, every command still ran a full Paxos round:

prepare -> propose -> learn

The previous article ended exactly there: once you have a stable master, it is unnecessary for every new log slot to keep paying for both Paxos phases. The master is already the only node serving writes. In the steady state, for new Paxos rounds, we should be able to skip the prepare step and go directly to the propose step. This article makes that optimization in the smallest possible way, by adding one boolean flag, skip_prepare, which is updated as the lease is gained and lost and passed to the Paxos proposer. The code is on GitHub.

The inefficiency

In the previous implementation, the /command endpoint looked like this:

@app.route("/command", methods=["POST"])
def endpoint_command():
    data = request.get_json(force=True, silent=True) or {}
    if "command" not in data:
        return jsonify({"error": "Missing 'command' in JSON body"}), 400

    if not paxoslease_proposer.is_master():
        master = find_master()
        return jsonify({
            "status": "not_master",
            "reason": "This node is not the master; send the command to the master node.",
            "master": master,
        })

    round_id = get_current_round()
    result = proposer.paxos_round(round_id, data["command"])
    if result.get("status") == "success":
        advance_round(round_id + 1)
    return jsonify(result)

The master check is new compared to plain leaderless Paxos, but the Paxos round itself is still classic single-decree Paxos, repeated independently for every Paxos round, i.e. for every replicated log slot. Inside paxos_round(), the proposer increments its proposal number, sends prepare messages, waits for a majority of promises, adopts the highest-numbered already-accepted value if one exists, sends propose messages, waits for a majority of accepts, and finally broadcasts learn.

The prepare phase is what lets Paxos tolerate multiple proposers. Without it, two proposers could both try to write different values into the same round. One proposer might get some acceptors to accept value A, while another proposer might get other acceptors to accept value B. If neither proposer first asks acceptors what they have already promised or accepted, then a later proposal could overwrite earlier progress and violate the core Paxos rule: once a value may have been chosen, every later successful proposal for that round must preserve that same value.

Prepare prevents this. It makes the proposer ask a majority of acceptors: “have you already accepted something for this round?” If the answer is yes, the proposer must adopt the highest-numbered accepted value it sees. This is why different proposers do not create split-brain decisions for the same log slot. They may race, but the prepare phase forces them to converge on the same value once any value has made enough progress.

Once PaxosLease has elected a master, the situation is simpler. Only the lease holder should be accepting client commands, so there should be only one active proposer for the replicated log. After that proposer has successfully completed one full Paxos round in the current master epoch, the following rounds are fresh slots allocated by the same master. For those rounds, there are no competing client-command proposers and no earlier accepted value to discover. The master can effectively behave as if the prepare phase succeeded with empty acceptor state, and jump straight to propose.

The first round after becoming master is different. A new master cannot assume the next Paxos round is clean. The previous lease holder may have sent prepare or propose messages before losing the lease. It may even have completed the propose phase with a majority, meaning the value is already chosen according to Paxos, but then crashed or lost its lease before the learn messages reached everyone. In that case, the value is “stuck” in the acceptors: not necessarily learned everywhere, but already accepted by enough nodes that Paxos must preserve it.

So the new master must run a full prepare phase for its first Paxos round. That prepare phase discovers any accepted value left behind by the previous master. If such a value exists, the new master must propose that value rather than its own client command. Only after this full round succeeds can the master safely switch to the optimized steady state and skip prepare for subsequent fresh rounds in the same lease epoch.

Single-round optimization

In classic Paxos, the prepare phase serves an important purpose. It is the phase where a proposer asks acceptors:

Have you already promised a higher proposal id?
Have you already accepted a value for this round?

The second question is the key one. If some value was already accepted by a majority, or might still become chosen, a new proposer must not overwrite it with a different value. This is why the proposer inspects the prepare responses and, if needed, adopts the highest-numbered accepted value instead of its own initial value.
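This adoption rule is small enough to sketch as a standalone helper. The function name choose_value and the shape of the promise dicts below are illustrative, not taken from the article's codebase:

```python
# Sketch of the prepare-phase adoption rule. The function name and the
# response format are illustrative, not the article's actual code.

def choose_value(initial_value, promises):
    """Given prepare responses from a majority of acceptors, return the
    value the proposer is allowed to propose.

    Each promise is assumed to look like:
        {"accepted_id": int or None, "accepted_value": ... or None}
    """
    best_id = None
    best_value = None
    for p in promises:
        if p["accepted_id"] is not None:
            if best_id is None or p["accepted_id"] > best_id:
                best_id = p["accepted_id"]
                best_value = p["accepted_value"]
    # If any acceptor already accepted a value, adopt the one with the
    # highest proposal id; otherwise the proposer keeps its own value.
    return best_value if best_id is not None else initial_value
```

On empty acceptor state the proposer keeps its own client command; if any acceptor reports an accepted value, the highest-numbered one wins, which is exactly what makes it safe to enter a round that may already have history.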

So prepare is what makes it safe for a new proposer to enter a Paxos round that may already have history.

In the previous implementation, every command ran Paxos as if every log slot might have history:

1st command: prepare -> propose -> learn
2nd command: prepare -> propose -> learn
3rd command: prepare -> propose -> learn

That is conservative and correct, but once PaxosLease has elected a stable master, it is more work than necessary.

After the master has completed one full Paxos round, the situation changes. The master is now the only node that should be accepting client commands. It allocates log slots monotonically using current_round, and each successful command advances the round:

round_id = get_current_round()
result = proposer.paxos_round(
    round_id,
    data["command"],
    skip_prepare=get_skip_prepare()
)

if result.get("status") == "success":
    advance_round(round_id + 1)
    set_skip_prepare(True)

So the second, third, and all subsequent commands in the same stable leader epoch go into fresh log slots allocated by the same master. There should be no competing proposer trying to write different client commands into those slots. For those fresh slots, the prepare phase would only discover empty acceptor state, so the master can go directly to propose.

That is the optimized steady state:

1st command as master:  prepare -> propose -> learn
2nd command as master:             propose -> learn
3rd command as master:             propose -> learn

The first round after becoming master is different. It cannot safely skip prepare.

When a node becomes master, it does not automatically know the full state of the next Paxos slot. There may have been an earlier master. That earlier master may have sent propose messages for the current round, reached some acceptors, then crashed or lost its lease before the value was learned everywhere. In that case, some acceptors may already have an accepted value for the round. If the new master skipped prepare in its first round, it might propose a different value into a slot that already has a partially accepted value. That is exactly the situation the prepare phase is designed to handle.

So the first command after becoming master must run full Paxos:

prepare:
    discover any previously accepted value for this round

propose:
    propose either the new command,
    or the highest-numbered previously accepted value

learn:
    publish the chosen value

Only after this succeeds do we set:

set_skip_prepare(True)

This gives us a clean leader epoch rule:

At the start of a master epoch:
    be conservative
    run full Paxos once

After one successful full round:
    assume the master owns the following fresh slots
    skip prepare until the lease is lost

And when the node loses the lease, we reset:

set_skip_prepare(False)

So every new master epoch starts with full Paxos again.

Changing paxos.py

The main code change is to pass a skip_prepare flag to paxos_round() and condition the prepare phase on it:

def paxos_round(self, round_id, initial_value, skip_prepare=False):
    self.increment_proposal_id()
    pid = self.state.proposal_id

    prepare_responses = []
    chosen_value = initial_value

    if not skip_prepare:
        # phase 1: prepare, same code as before
        ...

    # phase 2: propose, same code as before
    ...

    # phase 3: learn, same code as before
    ...

The important point is that this line did not move into the if not skip_prepare block:

self.increment_proposal_id()

Every Paxos round still gets a new proposal id. The boolean only controls whether we send prepare messages. The rest of the method is unchanged: propose still goes to the acceptors, learn still broadcasts the chosen value, majority is still required.
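To see the whole control flow in one place, here is a minimal in-memory sketch of the same structure. The Acceptor and Proposer classes below are made up for illustration (single slot, local objects instead of HTTP calls to remote acceptors), but the branching on skip_prepare mirrors the real method:

```python
# Minimal in-memory sketch of a Paxos round with a skip_prepare flag.
# Single-slot and local for brevity; the real system keeps per-round
# acceptor state and counts majority responses over the network.

class Acceptor:
    def __init__(self):
        self.promised_id = 0
        self.accepted_id = None
        self.accepted_value = None

    def prepare(self, pid):
        if pid > self.promised_id:
            self.promised_id = pid
            return {"ok": True, "accepted_id": self.accepted_id,
                    "accepted_value": self.accepted_value}
        return {"ok": False}

    def propose(self, pid, value):
        if pid >= self.promised_id:
            self.promised_id = pid
            self.accepted_id = pid
            self.accepted_value = value
            return True
        return False


class Proposer:
    def __init__(self, acceptors):
        self.acceptors = acceptors
        self.proposal_id = 0

    def paxos_round(self, initial_value, skip_prepare=False):
        # every round gets a fresh proposal id, even on the fast path
        self.proposal_id += 1
        pid = self.proposal_id
        chosen_value = initial_value
        majority = len(self.acceptors) // 2 + 1

        if not skip_prepare:
            # phase 1: prepare, adopt highest-numbered accepted value
            promises = [a.prepare(pid) for a in self.acceptors]
            oks = [p for p in promises if p["ok"]]
            if len(oks) < majority:
                return {"status": "failure"}
            accepted = [p for p in oks if p["accepted_id"] is not None]
            if accepted:
                chosen_value = max(
                    accepted, key=lambda p: p["accepted_id"]
                )["accepted_value"]

        # phase 2: propose
        accepts = sum(a.propose(pid, chosen_value) for a in self.acceptors)
        if accepts < majority:
            return {"status": "failure"}

        # phase 3: learn would broadcast chosen_value here
        return {"status": "success", "value": chosen_value}
```

Note that if one acceptor already holds a partially accepted value from a previous master, a new proposer running the full round adopts that value instead of its own command, which is exactly why the first round of a new master epoch must not skip prepare.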

Adding skip_prepare to node.py

The flag belongs in node.py, because it is not really a Paxos property by itself. It is a property of the combined Paxos + PaxosLease node:

# whether the current master can skip Paxos prepare for fresh rounds
skip_prepare = False
skip_prepare_lock = threading.Lock()

def get_skip_prepare():
    with skip_prepare_lock:
        return skip_prepare

def set_skip_prepare(value):
    global skip_prepare
    with skip_prepare_lock:
        skip_prepare = value

Then the /command endpoint becomes:

@app.route("/command", methods=["POST"])
def endpoint_command():
    data = request.get_json(force=True, silent=True) or {}
    if "command" not in data:
        return jsonify({"error": "Missing 'command' in JSON body"}), 400

    if not paxoslease_proposer.is_master():
        set_skip_prepare(False)
        master = find_master()
        return jsonify({
            "status": "not_master",
            "reason": "This node is not the master; send the command to the master node.",
            "master": master,
        })

    round_id = get_current_round()
    result = proposer.paxos_round(
        round_id,
        data["command"],
        skip_prepare=get_skip_prepare()
    )

    if result.get("status") == "success":
        advance_round(round_id + 1)
        set_skip_prepare(True)
    else:
        set_skip_prepare(False)

    return jsonify(result)

This gives the desired behavior. When a node first becomes master, skip_prepare is false, so the first command goes through normal Paxos. If that succeeds, the node advances the log and flips skip_prepare to true. From then on, as long as the node remains master, new commands go straight to propose. If the node is not master, skip_prepare is reset to false; if a Paxos round fails, it is also reset to false. This keeps the fast path conservative: any awkward situation pushes us back to full Paxos. Note that the set_skip_prepare(False) calls here are not strictly required; resetting the flag when the lease is lost is already handled by the PaxosLease code (see below).

Resetting on lease release and expiry

Suppose a node is master and has already set skip_prepare=True. Then its lease expires, but no client sends a command to it during the interval when it is not master. Later, it becomes master again. If the flag is still true, the first command in the new master epoch would incorrectly skip prepare. So skip_prepare should be reset when the node loses the lease. For explicit lease release, this is easy:

@app.route("/paxoslease/stop", methods=["POST"])
def endpoint_paxoslease_stop():
    payload = paxoslease_proposer.release_lease()
    set_skip_prepare(False)
    return Response(
        json.dumps(payload, indent=2, sort_keys=True) + "\n",
        mimetype="application/json"
    )

For local lease expiry, a small callback to PaxosLeaseProposer handles this. The constructor now accepts on_lease_lost:

class PaxosLeaseProposer:
    def __init__(
        ...
        on_lease_lost=None,
    ):
        ...
        self.on_lease_lost = on_lease_lost
        ...

And _on_local_lease_timeout() calls it:

def _on_local_lease_timeout(self):
    with self._lock:
        print('Lease expired (local)')
        self.state.lease_owner = False
        self.state.lease_expires_at = None
        self._lease_timer = None

    self._cancel_extend_timer()

    if self.on_lease_lost is not None:
        self.on_lease_lost()

Then node.py wires the callback like this:

paxoslease_proposer = PaxosLeaseProposer(
    ...
    on_lease_lost=lambda: set_skip_prepare(False),
)

Now the invariant is simple:

If this node loses the lease, skip_prepare becomes False.

That means every new master epoch starts conservatively, with a full Paxos round.

Behavior

The behavior is now:

node becomes master:    skip_prepare = False

1st command as master:  prepare -> propose -> learn
                        skip_prepare = True

2nd command as master:             propose -> learn
3rd command as master:             propose -> learn

lease lost:             skip_prepare = False
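The saving can be quantified with a rough per-command cost model. The helper below is hypothetical; it assumes n acceptors and counts prepare/promise and propose/accept as request/response pairs and learn as a one-way broadcast:

```python
# Rough message and round-trip counts per command, assuming n acceptors,
# request/response pairs for prepare/promise and propose/accept, and a
# one-way learn broadcast.

def cost_per_command(n_acceptors, skip_prepare):
    messages = 0
    round_trips = 0
    if not skip_prepare:
        messages += 2 * n_acceptors  # prepare out, promise back
        round_trips += 1
    messages += 2 * n_acceptors      # propose out, accept back
    round_trips += 1
    messages += n_acceptors          # learn broadcast, one-way
    return {"messages": messages, "round_trips": round_trips}
```

Under these assumptions, with 3 nodes a full round costs 15 messages and 2 round trips, while the fast path costs 9 messages and 1 round trip: the single round trip of the title.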

Conclusion

PaxosLease gave the system a temporary master. Once there is a master, the common path can be cheaper. That is the main practical benefit of combining Paxos with a master lease: Paxos is still deciding the log, but the healthy steady-state command path is now just one message round trip.