PaxosLease: Extending and Releasing Leases
Marton Trencseni - Mon 15 December 2025 - Distributed
Introduction
In the previous post I implemented the basic PaxosLease algorithm in Python using HTTP and Flask. That version was deliberately minimal: a proposer could acquire a lease for T seconds, but could not retain it. Even if the node stayed healthy and wanted to remain master, the lease would still expire. It also had no way to give the lease back early; you just had to wait for the timeout. That's fine for a demo, but not for real systems.
In this follow-up I add two features that are essential in practice: extending a lease, and releasing a lease. Extending leases lets a node hold on to mastership for as long as it needs. Releasing leases lets a worker finish its job and immediately free the resource, instead of wasting time sitting on an idle lease. Both of these extensions come straight from the original PaxosLease paper; the acceptor algorithm stays almost unchanged, and most of the changes are in the proposer.
Under the hood, this version is still the same design: Flask nodes, each running both a proposer and an acceptor, talking over HTTP. We keep timers in memory (no need to write to disk in PaxosLease), we wait a fixed time on startup to respect the "max lease time" rule, and we rely on majority quorums and ballot numbers to guarantee that at most one node believes it holds the lease at any moment. The code is on GitHub.

Extending and releasing leases
The basic PaxosLease algorithm gives you a lease for a fixed duration T: the proposer starts its local timer, sends propose requests, acceptors start their own timers when they accept, and everyone eventually forgets the lease after T seconds. In many real systems the lease identifies a master node that you'd like to keep stable for as long as possible. You don't want your primary database node to randomly lose leadership and switch to a different node every few seconds just because some arbitrary lease duration expired.
The paper's extension mechanism is simple: in the prepare phase, if a majority of acceptors respond either with empty state or with an existing lease for this same proposer whose timer has not yet expired, then the proposer is allowed to propose itself again as the lease holder with a fresh duration. In other words, you're allowed to "renew" your lease as long as the majority of the cluster still thinks you're the current owner. This doesn't require any changes to the acceptor; it just stores whatever lease it's told to, with a new timeout. In my code I implement this with an extra flag on the proposer (extend_existing=True) that relaxes the "must be empty" check into "empty or belongs to me."
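Concretely, the relaxed majority check might look something like this sketch (the helper name and the response shape are illustrative, not necessarily what's in the repo):

def _prepare_response_is_open(self, response, extend_existing):
    # An acceptor counts toward the prepare majority if it is "open":
    # it either knows of no lease at all, or (when extending) the lease
    # it knows about already belongs to this node.
    value = response.get("accepted_value")
    if value is None:
        return True
    if extend_existing and value.get("owner") == self.node_id:
        return True
    return False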
Releasing leases goes in the opposite direction: instead of waiting for the lease to time out, the current owner may want to hand it back early. The paper's suggestion is to add a new release message that carries the ballot number of the lease you want to drop. Proposers first flip their own state from "I have the lease" to "I do not have the lease," then send release(n) to all acceptors. Each acceptor checks whether its accepted_n matches n. If it does, it clears its state and stops its timer; otherwise it ignores the message. This makes release idempotent and safe even if multiple proposers are trying things concurrently. In the code below I wired this into a /stop endpoint on the proposer, plus a new /release endpoint on the acceptor.
Code
The core of the implementation is still the acceptor and proposer classes. The acceptor hasn't changed much: it tracks promised_n, accepted_n, an accepted_value describing the lease, and a local timer. In on_propose, if the ballot is high enough, it updates its state and restarts the timer:
def on_propose(self, proposal_id, lease_owner, lease_seconds):
    with self._lock:
        # Accept if we haven't promised a higher ballot.
        if self.state.promised_n is None or proposal_id >= self.state.promised_n:
            self.state.promised_n = proposal_id
            self.state.accepted_n = proposal_id
            self.state.accepted_value = {
                "owner": lease_owner,
                "lease_seconds": lease_seconds,
                "lease_expires_at": time.time() + lease_seconds,
            }
            # (Re)start the acceptor's timer; the lease is forgotten when it fires.
            self._restart_timer(lease_seconds)
            success = True
        else:
            success = False
        return success, self.state
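For completeness, the HTTP side is just a thin Flask route around this method; something like the following sketch (the endpoint name, payload fields, and the bare Acceptor() constructor are illustrative):

from flask import Flask, request, jsonify

app = Flask(__name__)
acceptor = Acceptor()  # the acceptor class described in this post

@app.route("/propose", methods=["POST"])
def propose():
    req = request.get_json()
    success, state = acceptor.on_propose(
        req["proposal_id"], req["lease_owner"], req["lease_seconds"])
    return jsonify({
        "success": success,
        "accepted_n": state.accepted_n,
        "accepted_value": state.accepted_value,
    })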
To support early release, the acceptor gets a small on_release method. It looks at the ballot number in the release message; if it matches the currently accepted ballot, the acceptor cancels its timer and drops the lease immediately:
def on_release(self, proposal_id):
    with self._lock:
        if self.state.accepted_n == proposal_id:
            # Ballot matches the accepted lease: drop it immediately.
            if self._lease_timer is not None:
                self._lease_timer.cancel()
                self._lease_timer = None
            self.state.accepted_n = None
            self.state.accepted_value = None
            return True, self.state
        return False, self.state
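Because of the ballot check, release is idempotent: releasing twice, or with a stale ballot, is a no-op. A quick illustration, assuming the acceptor can be constructed standalone as Acceptor():

acceptor = Acceptor()
acceptor.on_propose(proposal_id=7, lease_owner=0, lease_seconds=5.0)
ok, _ = acceptor.on_release(proposal_id=7)  # matches accepted_n: lease dropped
assert ok
ok, _ = acceptor.on_release(proposal_id=7)  # accepted_n is now None: ignored
assert not ok
ok, _ = acceptor.on_release(proposal_id=3)  # stale ballot: ignored as well
assert not ok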
Most of the interesting logic lives in the proposer. It still generates proposal ids in its own "lane" (node_id + k*256), runs prepare and propose over HTTP, and retries once with a bumped ballot if prepare fails. What's new is that the proposer now runs two timers: a main expiry timer, and an extension timer that fires halfway through the remaining lease. When the extension timer fires, the proposer calls acquire_lease(extend_existing=True), which runs PaxosLease again but treats "majority of acceptors that either have no lease or still believe I'm the owner" as sufficient to proceed.
On successful acquire or extend, _start_local_lease_timer() sets lease_expires_at, starts the main expiry timer, and also starts the extension timer:
def _start_local_lease_timer(self, lease_seconds):
    self._cancel_local_lease_timer()
    with self._lock:
        self.state.lease_expires_at = time.time() + lease_seconds
        expires_at = self.state.lease_expires_at
    # main expiry timer
    self._lease_timer = threading.Timer(lease_seconds, self._on_local_lease_timeout)
    self._lease_timer.daemon = True
    self._lease_timer.start()
    # extension at half the remaining time
    self._cancel_extend_timer()
    remaining = expires_at - time.time()
    if remaining > 0:
        extend_after = remaining / 2.0
        self._extend_timer = threading.Timer(extend_after, self._on_extend_timer)
        self._extend_timer.daemon = True
        self._extend_timer.start()
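The extension callback itself is tiny; in spirit it does the following (my paraphrase of the flow described above, not the repo's exact code):

def _on_extend_timer(self):
    # Only try to renew if we still believe we hold the lease;
    # extend_existing=True relaxes the prepare check to
    # "empty or already owned by me."
    if self.state.lease_owner:
        self.acquire_lease(extend_existing=True)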
The /start endpoint still just calls proposer.acquire_lease(). The new /stop endpoint calls proposer.release_lease(), which sets lease_owner = False, cancels both timers, and sends out release messages with the current proposal_id, roughly like the sketch below.
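A sketch of release_lease along those lines (the field and helper names, the message format, and the requests-based send loop are illustrative):

def release_lease(self):
    with self._lock:
        # Flip local state first, so this node never acts as master
        # after this point, even if some release messages are lost.
        self.state.lease_owner = False
        proposal_id = self.state.proposal_id
    self._cancel_local_lease_timer()
    self._cancel_extend_timer()
    # Best effort: tell every acceptor to drop the lease with this ballot.
    for node in self.nodes:
        try:
            requests.post(f"http://{node}/release",
                          json={"proposal_id": proposal_id}, timeout=1.0)
        except requests.RequestException:
            pass  # a lost release just means that acceptor's timer expires

That's it for this toy implementation: press /start to become master with auto-renew, press /stop to resign.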
Sample trace
Let's walk through a concrete trace with three nodes, 0, 1 and 2, so a majority is any 2. We'll assume the base lease time is LEASE_SECONDS = 5.0 and that each message takes 500 ms, just like in the earlier post.
At time t = 0, we call /start on node 0. The proposer increments its proposal_id and sends prepare messages to all three acceptors. They receive them at t = 0.5s, find their promised_n empty, and all promise. The responses arrive back at node 0 at t = 1.0s. Node 0 sees that a majority (in fact, all three) have no current lease, so it starts its local lease timer until t = 6.0s and schedules an extension timer at half that time. It then sends propose messages with owner=0, lease_seconds=5 to all acceptors.
The acceptors get the propose calls at t = 1.5s. Each one accepts the lease, sets accepted_value.owner = 0, and starts its own timer to expire at t = 6.5s. The acceptor always starts its timer after the proposer, since it's conditional on receiving a message from the proposer; this point is crucial for correctness. Their propose responses reach node 0 at t = 2.0s. Node 0 now has a majority of accepts and sets lease_owner = True. From t = 2.0s onwards, node 0 is the master. Halfway through the lease, at t = 3.5s, its extension timer fires and it tries to renew. The next prepare round will see that a majority of acceptors believe that node 0 holds the lease. Because this is an extension (extend_existing=True), that's good enough: node 0 proposes itself again, acceptors refresh their timers, and node 0 extends its mastership without any gap.
Releasing the lease early looks similar but runs in reverse. Suppose the client calls /stop on node 0 at t = 4.0s. The proposer flips its local state to "no lease," cancels both timers, and sends release(proposal_id) to all acceptors. Each acceptor checks whether its accepted_n matches that proposal_id; if it does, it cancels its own timer and clears accepted_value. Within one network hop the cluster returns to the "no one has the lease" state, and another node can call /start to take over. If node 0 crashes instead of calling /stop, or its release messages fail to reach the other nodes, the lease simply expires based on the acceptors' timers, and there may be a small window where no new lease can be acquired. That small gap is the price you pay for never having overlapping leases while still staying diskless. The point is that with PaxosLease there can never be a situation where two different nodes believe they hold the lease. But there can be a situation where no live node holds the lease and, for a short time, nobody can acquire it, because the old owner died without releasing it.
Why PaxosLease works
The intuitive picture from the paper is simple and powerful: the proposer starts its timer before sending out propose requests, while acceptors only start their timers after accepting the proposal and just before replying. That means the proposer's lease interval [t_start, t_end = t_start + T] is always "ahead of" the acceptors' intervals — acceptors are a bit late to the party. If a majority of acceptors have committed to a particular lease and started their timers, the algorithm guarantees that no other proposer can collect a conflicting majority before the original proposer's timer expires. At most one proposer ever believes it has the lease at any moment.
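The ordering is easy to check mechanically. A throwaway snippet (not part of the implementation) showing that for any non-negative message delays, every acceptor's expiry is at least as late as the proposer's:

# The proposer starts its timer at send time; each acceptor starts its
# own timer only when the propose message arrives, one delay later.
T = 5.0                          # lease duration in seconds
t_send = 0.0                     # proposer starts its timer here
delays = [0.3, 0.5, 1.2]         # arbitrary per-acceptor network delays
proposer_expiry = t_send + T
acceptor_expiries = [t_send + d + T for d in delays]
# The proposer stops believing in the lease before any acceptor forgets it.
assert all(proposer_expiry <= e for e in acceptor_expiries)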
The proof in the paper splits this into two cases: a competing proposer with a lower ballot number, and a competing proposer with a higher ballot number. For a lower ballot b' < b, any majority that would let the new proposer hold the lease must intersect the majority that previously accepted the old proposer's lease. For an acceptor in that intersection to send an "empty" reply, its timer must have expired, which implies the original proposer's timer has also expired — so the interval where both proposers think they hold the lease is always empty. For a higher ballot b' > b, a similar majority-intersection argument shows that any acceptor that both accepted the old lease and later replies "empty" to the new prepare must have let its timer expire, which again implies the old lease has ended before the new one can logically begin. Either way, leases never overlap.
Liveness is the usual story with Paxos-type algorithms. There's a risk of dynamic deadlock: two over-eager proposers might keep generating higher ballot numbers, spamming prepare requests, and continually stepping on each other's toes, so neither manages to get a majority to accept their proposal. The fix is also the usual one: introduce backoff and randomness so proposers wait a small random delay before retrying, and in practice one of them will "win" and stabilize. This is not included in this toy implementation. The good news is that static deadlocks — the classic "naive majority vote" failure mode where two disjoint majorities lock in two different leaders forever — cannot happen here. Because acceptors can overwrite their own state with higher ballot numbers, and because of the majority intersection property, any old lease can be superseded cleanly once its timer is done.
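If you wanted to add that to this toy implementation, a minimal sketch could look like the following (assuming acquire_lease() returns a success boolean; the delay constants are arbitrary):

import random
import time

def acquire_with_backoff(proposer, max_attempts=10):
    # Retry prepare/propose with jittered exponential backoff, so two
    # competing proposers eventually stop bumping each other's ballots.
    for attempt in range(max_attempts):
        if proposer.acquire_lease():
            return True
        delay = min(0.05 * 2.0 ** attempt, 3.0) * random.random()
        time.sleep(delay)
    return False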
Conclusion
With extension and release, this toy PaxosLease implementation looks much closer to how you'd actually use it in a real system. A master node can now hold on to its role for long continuous periods, without rolling the consensus dice every few seconds, and can also resign quickly so that another node can take over. All of this still happens with no disk persistence on acceptors, just in-memory timers and a simple "wait M seconds after restart" rule to cover crashes.
In ScalienDB and Keyspace, something like this sat alongside a more traditional Multi-Paxos pipeline. One part of the system agreed on a replicated log of database commands; another part, via PaxosLease, agreed on who the current master was. The nice thing about PaxosLease is that once you understand Paxos, there's very little new magic: it's the same quorum and ballot machinery, but applied to time-bounded truth instead of eternal truth. In a future post I'll go back to the Multi-Paxos side and connect it with PaxosLease-based master leases.