Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Redlock

Redlock is Redis’s distributed locking algorithm. This spec intentionally finds Martin Kleppmann’s counterexample: when a client pauses (GC, network delay) while holding locks, the locks expire, another client acquires them, and both clients believe they hold the lock.

The spec

Source

module Redlock

// Redlock distributed locking — modeled to find Kleppmann's counterexample
// From: Antirez, "Distributed locks with Redis" (2016)
// Analysis: Kleppmann, "How to do distributed locking" (2016)
// N is the last Redis instance index (N+1 instances, 0..N)
// M is the last client index (M+1 clients, 0..M)
// TTL is the lock time-to-live in ticks
// MaxTime bounds the global clock to keep state space finite
// Use: specl check redlock.specl -c N=2 -c M=1 -c TTL=3 -c MaxTime=8
// Verified: N=2 M=1 TTL=3 MaxTime=8 -> MutualExclusion VIOLATED in 15 steps
// The counterexample demonstrates Kleppmann's attack: client 0 acquires majority,
// time advances (locks expire), client 1 acquires majority, both enter Holding

const N: Int
const M: Int
const TTL: Int
const MaxTime: Int

// Global time (discrete ticks, models real/wall-clock time)
var globalTime: Int

// Per client: local clock (may lag behind global time due to pauses)
var localClock: Dict[Int, Int]

// Per client: state 0=Idle, 1=Acquiring, 2=Holding, 3=Released
var clientState: Dict[Int, Int]

// Per client: which instances the client has locked
var clientLocks: Dict[Int, Set[Int]]

// Per client: local clock value at time of acquisition start
var acquireTime: Dict[Int, Int]

// Per client: next instance index to try locking (sequential acquisition)
var nextInstance: Dict[Int, Int]

// Per instance: locked by which client (-1 = unlocked)
var instanceLock: Dict[Int, Int]

// Per instance: global time at which lock expires
var instanceExpiry: Dict[Int, Int]

func Majority() { (N + 1) / 2 + 1 }

init {
    globalTime = 0;
    localClock = {c: 0 for c in 0..M};
    clientState = {c: 0 for c in 0..M};
    clientLocks = {c: {} for c in 0..M};
    acquireTime = {c: 0 for c in 0..M};
    nextInstance = {c: 0 for c in 0..M};
    instanceLock = {i: -1 for i in 0..N};
    instanceExpiry = {i: 0 for i in 0..N};
}

// --- Time modeling ---

// Global time advances by one tick
action Tick() {
    require globalTime < MaxTime;
    globalTime = globalTime + 1;
}

// Client c's local clock advances (catches up toward global time)
// A client that never fires ClockAdvance is effectively paused
action ClockAdvance(c: 0..M) {
    require localClock[c] < globalTime;
    localClock = localClock | {c: localClock[c] + 1};
}

// --- Lock expiry ---

// Instance i's lock expires (TTL exceeded in real/global time)
action ExpireLock(i: 0..N) {
    require instanceLock[i] != -1;
    require globalTime >= instanceExpiry[i];
    instanceLock = instanceLock | {i: -1};
}

// --- Client lock acquisition ---

// Client c starts acquiring the lock
action StartAcquire(c: 0..M) {
    require clientState[c] == 0;
    clientState = clientState | {c: 1};
    acquireTime = acquireTime | {c: localClock[c]};
    clientLocks = clientLocks | {c: {}};
    nextInstance = nextInstance | {c: 0};
}

// Client c successfully locks instance i (SET NX equivalent)
action TryLock(c: 0..M, i: 0..N) {
    require clientState[c] == 1;
    require nextInstance[c] == i;
    require instanceLock[i] == -1;
    instanceLock = instanceLock | {i: c};
    instanceExpiry = instanceExpiry | {i: globalTime + TTL};
    clientLocks = clientLocks | {c: clientLocks[c] union {i}};
    nextInstance = nextInstance | {c: i + 1};
}

// Client c fails to lock instance i (already locked by another client)
action FailLock(c: 0..M, i: 0..N) {
    require clientState[c] == 1;
    require nextInstance[c] == i;
    require instanceLock[i] != -1;
    require instanceLock[i] != c;
    nextInstance = nextInstance | {c: i + 1};
}

// Client c finishes acquisition: majority obtained and validity remaining
// Validity check uses LOCAL clock (this is the source of the bug)
action FinishAcquire(c: 0..M) {
    require clientState[c] == 1;
    require nextInstance[c] > N;
    require len(clientLocks[c]) >= Majority();
    require localClock[c] - acquireTime[c] < TTL;
    clientState = clientState | {c: 2};
}

// Client c fails acquisition: unlock all instances, return to Idle
action FailAcquire(c: 0..M) {
    require clientState[c] == 1;
    require nextInstance[c] > N;
    require len(clientLocks[c]) < Majority()
        or localClock[c] - acquireTime[c] >= TTL
    clientState = clientState | {c: 0};
    clientLocks = clientLocks | {c: {}};
    instanceLock = {i:
        if instanceLock[i] == c then -1 else instanceLock[i]
        for i in 0..N}
}

// Client c releases the lock (DEL on all instances if value matches)
action ReleaseLock(c: 0..M) {
    require clientState[c] == 2;
    clientState = clientState | {c: 3};
    instanceLock = {i:
        if instanceLock[i] == c then -1 else instanceLock[i]
        for i in 0..N}
}

// --- Safety property ---

// Mutual exclusion: at most one client in Holding state
// This SHOULD be VIOLATED — the checker finds Kleppmann's counterexample
invariant MutualExclusion {
    all c1 in 0..M: all c2 in 0..M:
        (clientState[c1] == 2 and clientState[c2] == 2)
        implies c1 == c2
}

What to notice

  • Intentional violation. Unlike other showcase specs, this one is designed to fail. The MutualExclusion invariant is violated — that’s the point.
  • Time modelling. Global time advances via Tick(), client local clocks advance via ClockAdvance(). A client that never fires ClockAdvance is effectively paused — modelling GC pauses or network delays.
  • The bug. Client checks lock validity using localClock (line 118), but locks expire based on globalTime. If a client pauses (local clock falls behind global time), its locks expire while it still thinks they’re valid.
  • Set operations. clientLocks[c] union {i} for accumulating acquired locks, conditional dict comprehension for releasing all locks on failure.
specl check redlock.specl -c N=2 -c M=1 -c TTL=3 -c MaxTime=8

The checker finds the violation in 15 steps.