Paxos Commit
Leslie Lamport, Jim Gray
Microsoft Research
Preview of a paper in preparation
Presented at HPTS, 12 Oct 2003, Asilomar, CA
Two Phase Commit
[Figure: 2PC message flow between the Resource Managers and the Transaction Manager (RequestCommit, Prepare, Prepared, Commit). RM states: working, prepared, committed, aborted. TM states: working, committed, aborted.]
• N Resource Managers (RMs)
• Want all RMs to commit or all to abort
• Coordinated by a Transaction Manager (TM)
  • TM sends Prepare, Commit-Abort
  • RM responds Prepared, Aborted
• 3N+1 messages
• N+1 stable writes
• Delay
  • 3 message delays
  • 2 stable write delays
• Blocking: if the TM fails, Commit-Abort stalls
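The failure-free round described on this slide can be sketched in a few lines of Python. This is only an illustration: the names `State` and `two_phase_commit` are hypothetical, and the message breakdown (one RequestCommit, then N each of Prepare, reply, and Commit-Abort) is an assumed reading of the diagram that happens to add up to the slide's 3N+1.

```python
from enum import Enum


class State(Enum):
    WORKING = "working"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


def two_phase_commit(can_prepare):
    """Run one failure-free 2PC round (illustrative sketch only).

    can_prepare[i] is True if RM i answers Prepared, False if it answers Aborted.
    Returns (final RM states, number of messages sent).
    """
    n = len(can_prepare)
    messages = 1  # assumed: one RequestCommit from an RM to the TM

    # Phase 1: TM sends Prepare to every RM; each RM replies.
    messages += n  # Prepare
    rm_states = [State.PREPARED if ok else State.ABORTED for ok in can_prepare]
    messages += n  # Prepared / Aborted replies

    # Phase 2: TM commits only if every RM is prepared, then tells every RM.
    all_prepared = all(s is State.PREPARED for s in rm_states)
    decision = State.COMMITTED if all_prepared else State.ABORTED
    messages += n  # Commit-Abort
    return [decision] * n, messages
```

A single RM answering Aborted forces every RM to abort, which is the all-or-nothing property the slide asks for. The sketch does not model the blocking case, where the TM fails between the two phases.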
Consensus
• N processes want to agree on a value
• Want to tolerate F faults
  • Tolerate F processes stopping
  • Tolerate F messages delayed or lost
• If there are no more than F faults in a window, then consensus is achieved
• Byzantine faults need 3F “acceptors”
• Benign faults need 2F+1 “acceptors”
  • Stalls but safe if more than F faults
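One way to see why the benign case uses 2F+1 acceptors: any two groups of F+1 acceptors must then share at least one member, so two conflicting values cannot both gather F+1 supporters. The brute-force check below is a small sketch of that counting argument; the function name `quorums_always_intersect` is made up for this example.

```python
from itertools import combinations


def quorums_always_intersect(num_acceptors, quorum_size):
    """Return True if every pair of quorums of the given size shares a member."""
    quorums = [set(q) for q in combinations(range(num_acceptors), quorum_size)]
    return all(a & b for a in quorums for b in quorums)


def benign_setup(f):
    """Acceptor count and quorum size for F benign faults, as on the slide."""
    return 2 * f + 1, f + 1
```

With F = 2, `quorums_always_intersect(*benign_setup(2))` checks all pairs of 3-member quorums out of 5 acceptors. Dropping to 2F = 4 acceptors with quorums of 2 breaks the property, since two disjoint pairs exist.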
Paxos Consensus
• Group has a leader known to all
  • Leader election is a subroutine
• Process proposes a value v to the leader
• Leader sends proposal (phase 2) (ballot, value) to all acceptors
• Acceptors respond with the max (ballot, value) they have seen
• If the leader gets no higher ballot, and gets at least F+1 responses, then the leader can announce (ballot, value)
• Full protocol is 3-phase
  • Phase 1: Leader starts a new ballot
  • Phase 2: Leader proposes a value
  • Phase 3: If the value is accepted by F+1 acceptors, then the value is accepted. If not, the leader tries to get the majority value accepted.
• 6F+4 messages, F+1 stable writes
• 4 message delays and 2 stable write delays
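The phase-2 exchange on this slide can be mimicked with a toy model. The sketch below follows only the simplified description above (acceptors reply with the highest (ballot, value) they have seen; the leader announces if no reply carries a higher ballot and at least F+1 acceptors reply). It leaves out phase 1, leader election, and failure handling, so it is not a full Paxos implementation. `Acceptor` and `lead_ballot` are hypothetical names.

```python
class Acceptor:
    """Toy acceptor that remembers the highest (ballot, value) it has seen."""

    def __init__(self):
        self.highest = (-1, None)

    def receive(self, ballot, value):
        # Adopt the proposal only if its ballot beats everything seen so far.
        if ballot > self.highest[0]:
            self.highest = (ballot, value)
        return self.highest


def lead_ballot(reachable_acceptors, ballot, value, f):
    """Leader side of the simplified exchange; returns the announcement or None."""
    responses = [a.receive(ballot, value) for a in reachable_acceptors]
    no_higher_ballot = all(b <= ballot for b, _ in responses)
    if no_higher_ballot and len(responses) >= f + 1:
        return (ballot, value)
    return None  # leader has to retry with a higher ballot
```

With three acceptors and F = 1, a leader on ballot 1 gets its value announced, while a later attempt on ballot 0 sees the higher ballot in the replies and gives up.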
Paxos Commit
[Figure: participants are the Commit Leader, Acceptors 0…2F, and RM0…N; RM0 sends request commit, followed by prepare, prepared, all prepared, and commit.]
• Obvious idea: have the TM use Paxos consensus on the RMs' prepared status
• More efficient idea:
  • 2F+1 acceptors (~2F+1 TMs)
  • Each RM leads a Paxos instance on: I’m Prepared.
  • If F+1 acceptors see all RMs prepared, then the transaction is committed.
• 2F(N+1) + 3N + 1 messages
• 5 message delays (one extra delay)
• 2 stable write delays
• == 2PC when F=0
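The message formula and the commit rule on this slide are easy to check numerically. The helpers below are a sketch with made-up names (`paxos_commit_messages`, `two_pc_messages`, `is_committed`); the acceptor views are modeled, as an assumption, as one list of booleans per acceptor, with one entry per RM.

```python
def paxos_commit_messages(n, f):
    """Message count from the slide: 2F(N+1) + 3N + 1."""
    return 2 * f * (n + 1) + 3 * n + 1


def two_pc_messages(n):
    """Message count for plain 2PC from the earlier slide: 3N + 1."""
    return 3 * n + 1


def is_committed(acceptor_views, f):
    """Committed once at least F+1 acceptors have seen every RM prepared.

    acceptor_views[a][i] is True if acceptor a has seen RM i prepared
    (a hypothetical representation chosen for this sketch).
    """
    seeing_all_prepared = sum(1 for view in acceptor_views if all(view))
    return seeing_all_prepared >= f + 1
```

Setting F = 0 makes the first term vanish, so `paxos_commit_messages(n, 0)` equals `two_pc_messages(n)` for every N, which is the "== 2PC when F=0" bullet. For N = 5 and F = 1 the formula gives 28 messages against 16 for 2PC.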
Paxos Commit (success case)
[Figure: message flow among the Resource Managers, Acceptors, and Commit Leader: Request Commit, Prepare, Prepared, All Prepared, Commit. States shown: working, prepared, AllPrepared, committed, aborted.]
Prepare to Commit
Jim Gray
Microsoft Research
Defense of Commit at HPTS, Asilomar, Oct 2003
2PC
• Atomicity – all or nothing
• Isolation – no concurrency anomalies
• Durability – state survives failures
• Reliability / Consistency – does right thing
• Availability: always up
I can do better
• Those 2PC wimps are
  • Stupid – they do not understand my app
  • Fascists – they force me to send messages
• I can do better
  • I can write async code
  • I can keep logs
  • I can deal with failures and complexities
• Indeed, this is my destiny: a full employment act
Commit
• KISS
• Simple fault / failure model
• It is hard to get these “optimizations” right.
• But you want availability…
• OK…
• No 2PC, just C
2PC Commit
• Availability: always up
• Atomicity – all or nothing
• Isolation – no concurrency anomalies
• Durability – state survives failures
• Reliability / Consistency – does right thing
• => 2PC++ = 3PC = Non Blocking Commit
Solves the availability problem
Commit -- AACID
• Atomicity – all or nothing
• Availability: always up
• Isolation – no concurrency anomalies
• Durability – state survives failures
• Reliability / Consistency – does right thing
Workflow still useful. But it is built on transactions.