
Paxos

Paxos. Lamport the archeologist and the “Part-time Parliament” of Paxos: The Part-time Parliament, TOCS 1998. Paxos Made Simple, ACM SIGACT News 2001. Paxos Made Live, PODC 2007. Paxos Made Moderately Complex, (Cornell) 2011. …….



  1. Paxos
     • Lamport the archeologist and the “Part-time Parliament” of Paxos:
       • The Part-time Parliament, TOCS 1998
       • Paxos Made Simple, ACM SIGACT News 2001
       • Paxos Made Live, PODC 2007
       • Paxos Made Moderately Complex, (Cornell) 2011
       • ……..
     CS 271

  2. The Paxos Atomic Broadcast Algorithm (thanks to Idit Keidar for slides)
     • Asynchronous system with crash failures.
     • Leader based: each process has an estimate of who is the current leader.
     • To order an operation, a process sends it to the current leader.
     • The leader sequences the operation and launches a consensus algorithm to fix the agreement.

  3. The Consensus Algorithm Structure
     • Two phases
     • Leader contacts a majority in each phase
     • There may be multiple concurrent leaders
     • Ballots distinguish among values proposed by different leaders
       • Unique, locally monotonically increasing
       • Processes respond only to the leader with the highest ballot seen so far

  4. Ballot Numbers
     • Pairs ⟨num, process id⟩
     • ⟨n1, p1⟩ > ⟨n2, p2⟩
       • if n1 > n2
       • or n1 = n2 and p1 > p2
     • Leader p chooses a unique, locally monotonically increasing ballot number
       • If the latest known ballot is ⟨n, q⟩, then p chooses ⟨n+1, p⟩
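The ordering rule on this slide can be sketched in a few lines. This is only an illustration, assuming a ballot is stored as a Python tuple `(num, process_id)`; the names `Ballot` and `next_ballot` are hypothetical and do not come from the slides.

```python
from typing import Tuple

# Assumed representation: (num, process_id). Not taken from the slides.
Ballot = Tuple[int, int]


def next_ballot(latest_known: Ballot, my_id: int) -> Ballot:
    """Return a ballot owned by my_id that is higher than latest_known."""
    num, _ = latest_known
    return (num + 1, my_id)


# Python compares tuples lexicographically, which matches the slide's rule:
# compare num first, then break ties on the process id.
assert (2, 1) > (1, 9)   # n1 > n2
assert (1, 2) > (1, 1)   # n1 = n2 and p1 > p2

print(next_ballot((3, 7), my_id=2))  # prints (4, 2)
```

Because the process id is part of the pair, two leaders that pick the same num still end up with distinct, totally ordered ballots.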

  5. The Two Phases of Paxos
     • Phase 1: prepare
       • If you believe you are the leader
       • Choose a new unique ballot number
       • Learn the outcome of all smaller ballots from a majority
     • Phase 2: accept
       • Leader proposes a value with its ballot number
       • Leader gets a majority to accept its proposal
       • A value accepted by a majority can be decided

  6. Paxos - Variables
     • BallotNum_i, initially ⟨0,0⟩: latest ballot p_i took part in (phase 1)
     • AcceptNum_i, initially ⟨0,0⟩: latest ballot in which p_i accepted a value (phase 2)
     • AcceptVal_i, initially ⊥: latest accepted value (phase 2)

  7. Phase I: Prepare - Leader
     • Periodically, until a decision is reached, do:
         if leader then
           BallotNum ← ⟨BallotNum.num+1, myId⟩
           send (“prepare”, BallotNum) to all
     • Goal: contact other processes, ask them to join this ballot, and get information about possible past decisions

  8. Phase I: Prepare - Cohort
     • Upon receive (“prepare”, bal) from i:
         if bal ≥ BallotNum then
           BallotNum ← bal
           send (“ack”, bal, AcceptNum, AcceptVal) to i
     • Notes:
       • bal ≥ BallotNum: this is a higher ballot than my current one, I better join it
       • BallotNum ← bal: this is a promise not to accept ballots smaller than bal in the future
       • The “ack” tells the leader about my latest accepted value and the ballot it was accepted in
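A minimal sketch of the cohort's prepare handler, assuming ballots are `(num, process_id)` tuples and that `None` plays the role of the initial "no value" marker. The class name `Cohort` and the convention of returning the reply instead of sending it over a network are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

Ballot = Tuple[int, int]  # assumed representation: (num, process_id)


@dataclass
class Cohort:
    ballot_num: Ballot = (0, 0)        # latest ballot joined (phase 1)
    accept_num: Ballot = (0, 0)        # latest ballot a value was accepted in
    accept_val: Optional[Any] = None   # None stands in for "no value yet"

    def on_prepare(self, bal: Ballot):
        """Return the "ack" to send to the leader, or None for a stale ballot."""
        if bal >= self.ballot_num:
            # Joining bal is a promise to ignore smaller ballots from now on.
            self.ballot_num = bal
            return ("ack", bal, self.accept_num, self.accept_val)
        return None


c = Cohort()
print(c.on_prepare((1, 1)))  # prints ('ack', (1, 1), (0, 0), None)
print(c.on_prepare((0, 5)))  # prints None
```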

  9. Phase II: Accept - Leader
     • Upon receive (“ack”, BallotNum, b, val) from a majority:
         if all vals = ⊥ then myVal = initial value
         else myVal = received val with highest b
         send (“accept”, BallotNum, myVal) to all /* proposal */
     • Note: the value accepted in the highest ballot might have been decided, I better propose this value
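The leader's value selection can be sketched as a small function. The name `choose_value`, the use of `None` for the empty value, and the input shape (a list of `(b, val)` pairs collected from a majority of acks) are assumptions for illustration.

```python
def choose_value(acks, initial_value):
    """Pick the value to propose in phase 2.

    acks: list of (b, val) pairs from a majority, where b is the ballot in
    which val was accepted and val is None if nothing was accepted yet.
    """
    accepted = [(b, val) for (b, val) in acks if val is not None]
    if not accepted:
        # Every cohort reported "no value": the leader is free to propose its own.
        return initial_value
    # Otherwise adopt the value accepted in the highest ballot.
    return max(accepted, key=lambda pair: pair[0])[1]


print(choose_value([((0, 0), None), ((0, 0), None)], "x"))            # prints x
print(choose_value([((1, 2), "a"), ((2, 1), "b"), ((0, 0), None)], "x"))  # prints b
```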

  10. Phase II: Accept - Cohort
      • Upon receive (“accept”, b, v):
          if b ≥ BallotNum then
            AcceptNum ← b; AcceptVal ← v /* accept proposal */
            send (“accept”, b, v) to all (first time only)
      • Note: b ≥ BallotNum means this is not from an old ballot
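A sketch of the accept handler that follows the slide literally, including the "first time only" rebroadcast. The class name `AcceptingCohort`, the `_echoed` set, and returning the message instead of broadcasting it are assumptions; values are assumed to be hashable so they can be stored in a set.

```python
class AcceptingCohort:
    def __init__(self):
        self.ballot_num = (0, 0)   # latest ballot joined in phase 1
        self.accept_num = (0, 0)   # latest ballot a value was accepted in
        self.accept_val = None     # None stands in for "no value yet"
        self._echoed = set()       # (b, v) pairs already rebroadcast

    def on_accept(self, b, v):
        """Return the "accept" message to send to all, or None."""
        if b < self.ballot_num:
            return None            # proposal from an old ballot: ignore it
        self.accept_num = b
        self.accept_val = v        # accept the proposal
        if (b, v) in self._echoed:
            return None            # already rebroadcast once
        self._echoed.add((b, v))
        return ("accept", b, v)


c = AcceptingCohort()
c.ballot_num = (2, 1)
print(c.on_accept((1, 3), "old"))  # prints None
print(c.on_accept((2, 1), "v1"))   # prints ('accept', (2, 1), 'v1')
print(c.on_accept((2, 1), "v1"))   # prints None
```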

  11. Paxos – Deciding
      • Upon receive (“accept”, b, v) from n-t:
          decide v
          periodically send (“decide”, v) to all
      • Upon receive (“decide”, v):
          decide v
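The decision rule amounts to counting distinct senders of matching “accept” messages. The sketch below assumes n processes of which at most t may crash; the class name `Decider` and its method names are hypothetical, and the periodic “decide” broadcast is left out.

```python
from collections import defaultdict


class Decider:
    def __init__(self, n, t):
        self.threshold = n - t             # accepts needed before deciding
        self.senders = defaultdict(set)    # (b, v) -> ids of processes that sent it
        self.decision = None

    def on_accept(self, sender, b, v):
        """Record an "accept" from sender; return the decision if one is reached."""
        self.senders[(b, v)].add(sender)   # a set ignores duplicate messages
        if self.decision is None and len(self.senders[(b, v)]) >= self.threshold:
            self.decision = v
        return self.decision

    def on_decide(self, v):
        """Adopt a decision announced by another process."""
        if self.decision is None:
            self.decision = v
        return self.decision


d = Decider(n=5, t=2)
for sender in (1, 2, 2, 3):
    result = d.on_accept(sender, (1, 1), "v1")
print(result)  # prints v1
```

Counting senders rather than messages matters here: a process that retransmits its “accept” must not be counted twice toward the n-t threshold.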

  12. In Failure-Free Execution
      [Figure: message diagram for processes 1, 2, …, n. Messages shown: (“prepare”, ⟨1,1⟩); (“ack”, ⟨1,1⟩, ⟨0,0⟩, ⊥); (“accept”, ⟨1,1⟩, v1) from process 1; (“accept”, ⟨1,1⟩, v1) among all processes; decide v1.]

  13. Performance?
      [Figure: the same message diagram for processes 1, 2, …, n, with the callout “Why is this phase needed?”. Messages shown: (“prepare”, ⟨1,1⟩); (“ack”, ⟨1,1⟩, ⟨0,0⟩, ⊥); (“accept”, ⟨1,1⟩, v1); (“accept”, ⟨1,1⟩, v1).]

  14. Failure-Free Execution
      [Figure: client C sends a request to servers S1, S2, …, Sn and receives a response. Phase 1: (“prepare”), (“ack”). Phase 2: (“accept”).]

  15. Observation
      • In Phase 1, no consensus values are sent:
        • Leader chooses the largest unique ballot number
        • Gets a majority to “vote” for this ballot number
        • Learns the outcome of all smaller ballots
      • In Phase 2, the leader proposes its own initial value or the latest value it learned in Phase 1

  16. Failure-Free Execution
      [Figure: client C sends a request to servers S1, S2, …, Sn and receives a response. Phase 1: (“prepare”), (“ack”). Phase 2: (“accept”).]

  17. Optimization
      • Run Phase 1 only when the leader changes
        • Phase 1 is called “view change” or “recovery mode”
        • Phase 2 is the “normal mode”
      • Each message includes BallotNum (from the last Phase 1) and ReqNum
      • Respond only to messages with the “right” BallotNum

  18. Paxos Atomic Broadcast: Normal Mode
      • Upon receive (“request”, v) from client:
          if (I am not the leader) then forward to leader
          else /* propose v as request number n */
            ReqNum ← ReqNum + 1
            send (“accept”, BallotNum, ReqNum, v) to all
      • Upon receive (“accept”, b, n, v) with b = BallotNum:
          /* accept proposal for request number n */
          AcceptNum[n] ← b; AcceptVal[n] ← v
          send (“accept”, b, n, v) to all (first time only)
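Normal mode can be sketched as a replica that keeps per-request state indexed by request number. The class name `Replica`, the `("forward", leader_id, v)` tuple, and returning messages instead of sending them are all assumptions made for illustration; only the two handlers mirror the slide.

```python
class Replica:
    def __init__(self, my_id, leader_id, ballot_num):
        self.my_id = my_id
        self.leader_id = leader_id
        self.ballot_num = ballot_num   # fixed by the last Phase 1
        self.req_num = 0               # last request number the leader assigned
        self.accept_num = {}           # request number -> ballot
        self.accept_val = {}           # request number -> value
        self._echoed = set()           # (b, n) pairs already rebroadcast

    def on_request(self, v):
        """Handle a client request; return the message to send."""
        if self.my_id != self.leader_id:
            return ("forward", self.leader_id, v)   # hypothetical message shape
        self.req_num += 1              # propose v as the next request number
        return ("accept", self.ballot_num, self.req_num, v)

    def on_accept(self, b, n, v):
        """Accept the proposal for request n if it carries the right ballot."""
        if b != self.ballot_num:
            return None
        self.accept_num[n] = b
        self.accept_val[n] = v
        if (b, n) in self._echoed:
            return None                # rebroadcast first time only
        self._echoed.add((b, n))
        return ("accept", b, n, v)


leader = Replica(my_id=1, leader_id=1, ballot_num=(1, 1))
follower = Replica(my_id=2, leader_id=1, ballot_num=(1, 1))
msg = leader.on_request("op-a")
print(msg)                          # prints ('accept', (1, 1), 1, 'op-a')
print(follower.on_accept(*msg[1:])) # prints ('accept', (1, 1), 1, 'op-a')
```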

  19. Recovery Mode
      • The new leader must learn the outcome of all the pending requests that have smaller BallotNums
        • The “ack” messages include AcceptNums and AcceptVals of all pending requests
      • For all pending requests, the leader sends “accept” messages
      • What if there are holes?
        • e.g., the leader learns of request number 13 and not of 12
        • fill in the gaps with dummy “do nothing” requests
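One way to picture recovery is as a merge of the acks followed by hole filling. The sketch below assumes each ack is a dict mapping a request number to an `(AcceptNum, AcceptVal)` pair; the function name `recover_log`, the `NOOP` marker, and the `first_pending` parameter are hypothetical.

```python
NOOP = "do-nothing"   # hypothetical marker for a dummy request


def recover_log(acks, first_pending):
    """Return {request number: value} the new leader should re-propose.

    acks: list of dicts, each mapping request number -> (accept_num, accept_val).
    first_pending: lowest request number that is not yet known to be decided.
    """
    best = {}
    for ack in acks:
        for n, (b, v) in ack.items():
            # For each request keep the value accepted in the highest ballot.
            if n not in best or b > best[n][0]:
                best[n] = (b, v)
    if not best:
        return {}
    last = max(best)
    # Any request number nobody reported becomes a dummy request.
    return {n: (best[n][1] if n in best else NOOP)
            for n in range(first_pending, last + 1)}


acks = [
    {13: ((1, 1), "c")},
    {11: ((1, 1), "a"), 13: ((2, 2), "d")},
]
print(recover_log(acks, first_pending=11))
# prints {11: 'a', 12: 'do-nothing', 13: 'd'}
```

The new leader would then send an “accept” for every entry of the returned mapping under its own ballot, so that request 12 gets a harmless outcome and request 13 can be delivered in order.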
