1 / 24

Paxos Made Simple

Paxos Made Simple. Jinghe Zhang. Introduction. Lock is the easiest way to manage concurrency Mutex and semaphore. Read and write locks. In distributed system: No master for issuing locks. Failures. Problem.

heath
Download Presentation

Paxos Made Simple

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Paxos Made Simple Jinghe Zhang

  2. Introduction • Lock is the easiest way to manage concurrency • Mutex and semaphore. • Read and write locks. • In distributed system: • No master for issuing locks. • Failures.

  3. Problem • How to reach consensus/data consistency in distributed system that can tolerate non-malicious failures?

  4. Requirements • Safety • Only a value that has been proposed may be chosen. • Only a single value is chosen. • A node never learns that a value has been chosen unless it actually has been. • Liveness • Some proposed value is eventually chosen. • If a value has been chosen, a node can eventually learn the value

  5. Paxos Properties • Paxos is an asynchronous consensus algorithm • Asynchronous networks • No common clocks or shared notion of time (local ideas of time are fine, but different processes may have very different “clocks”) • No way to know how long a message will take to get from A to B

  6. Paxos Properties • Paxos is guaranteed safe. • Consensus is a stable property: once reached it is never violated; the agreed value is not changed.

  7. Paxos Properties • Paxos is not guaranteed live. • Consensus is reached if “a large enough subnetwork...is non-faulty for a long enough time.” • Otherwise Paxos might never terminate.

  8. Paxos Algorithm • Key Assumptions: • Set of processes that run Paxos is known a-priori • Processes suffer crash failures • All processes have Greek names (but translate as “Fred”, “Cynthia”, “Nancy”…)

  9. Paxos Algorithm • 3 roles • proposer • acceptor • Learner • A node can act as more than one clients (usually 3). • 2 phases • Phase 1: Prepare request  Response • Phase 2: Accept request  Response

  10. Phase 1: (prepare request) (1) A proposer chooses a new proposal version number n , and sends a prepare request (“prepare”,n) to a majority of acceptors: (a) Can I make a proposal with number n ? (b) if yes, do you suggest some value for my proposal?

  11. Phase 1: (prepare request) (2) If an acceptor receives a prepare request (“prepare”, n) with n greater than that of any prepare request it has already responded, sends out (“ack”, n, n’, v’) or (“ack”, n,  , ) (a) responds with a promise not to accept any more proposals numbered less than n. (b) suggest the value v of the highest-number proposal that it has accepted if any, else 

  12. Phase 2: (accept request) (3) If the proposer receives responses from a majority of the acceptors, then it can issue an accept request (“accept”, n , v) with number n and value v: (a) n is the number that appears in the prepare request. (b) v is the value of the highest-numbered proposal among the responses

  13. Phase 2: (accept request) (4) If the acceptor receives an accept request (“accept”, n , v) , it accepts the proposal unless it has already responded to a prepare request having a number greater than n.

  14. Learning the decision • Obvious algorithm: whenever acceptor accepts a proposal, respond to all learners (“accept”, n, v). • No Byzantine-Failures: Acceptors informs a distinguished learner and let the distinguished learner broadcast the result. • Learner receives (“accept”, n, v) from a majority of acceptors, decides v, and sends (“decide”, v) to all other learners. • Learners receive (“decide”, v), decide v

  15. In Well-Behaved Runs 1 1 1 1 1 2 2 2 (“prepare”,1) (“accept”,1 ,v1) . . . . . . . . . (“ack”,1, ^, ^) n n n (“accept”,1 ,v1) 1: proposer 1-n: acceptors 1-n: learners decide v1

  16. Paxos is safe… • Intuition: • If a proposal with value v is decided, then every higher-numbered proposal issued by any proposer has value v. next prepare request with Proposal Number n+1 (what if n+k?) A majority of acceptors accept (n, v), v is decided

  17. Safety (proof) • Suppose (n, v)is the earliest proposal that passed. If none, safety holds. • Let (n’, v’) be the earliest issued proposal after (n, v) with a different value v’!=v • As (n’, v’) passed, it requires a major of acceptors. Thus, some process approve both (n, v) and (n’, v’), though it will suggest value v with version number k>= n. • As (n’, v’) passed, it must receive a response (“ack”, n’, j, v’) to its prepare request, with n<j<n’. Consider (j, v’) we get the contradiction.

  18. Liveness • Fischer-Lynch-Patterson (1985) • No consensus can be guaranteed in an asynchronous communication system in the presence of any failures. • Intuition: a “failed” process may just be slow, and can rise from the dead at exactly the wrong time.

  19. Liveness • FLP tells us that it is impossible for an asynchronous system to agree on anything with accuracy and liveness! • Liveness requires that agents are free to accept different values in subsequent rounds. • But: safety requires that once some round succeeds, no subsequent round can change it.

  20. Liveness(cont.) • Paper gives us a scenario with 2 proposers, and during the scenario no decision can be made. • As the paper points out, selecting a distinguished proposer will solve the problem. • “Leader election” • But Paxos doesn’t block in case of a lost message • Phase 1 can start with new rank even if previous attempts never ended

  21. Applications • Chubby lock service. • Petal: Distributed virtual disks. • Frangipani: A scalable distributed file system.

  22. Summary • Consensus is “impossible” • But this doesn’t turn out to be a big obstacle • We can achieve consensus with probability one in many situations • Paxos is an example of a consensus protocol, very simple

  23. If you are interested... • Lamport, Leslie (May 1998). "The Part-Time Parliament". ACM Transactions on Computer Systems16 (2): 133–169. (http://research.microsoft.com/users/lamport/pubs/lamport-paxos.pdf)

  24. Questions? Jenkins, if I want another yes-man, I’ll build one!

More Related