
Timeliness, Failure Detectors, and Consensus Performance

This presentation explores the concepts of timeliness, failure detectors, and consensus performance in the context of state machine replication. It examines different models, such as synchronous, asynchronous, and eventually stable, and discusses the impact of weak assumptions on algorithm performance.



Presentation Transcript


  1. Timeliness, Failure Detectors, and Consensus Performance
  • Alex Shraer, joint work with Dr. Idit Keidar
  • Technion – Israel Institute of Technology
  • PODC 2006

  2. How do you survive failures and achieve high availability?

  3. Replication

  4. State Machine Replication
  • Replicas are identical deterministic state machines
  • Process operations in the same order ⇒ remain consistent
  [figure: three replicas applying operations a, b, c in the same order]
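
A minimal sketch of this idea in Python (hypothetical code of ours, not from the talk): as long as deterministic replicas apply the same operations in the same order, they end up in identical states.

    class Replica:
        # Deterministic state machine: applying the same operations in the
        # same order from the same initial state yields the same state.
        def __init__(self):
            self.state = []  # toy state: the log of applied operations

        def apply(self, op):
            self.state.append(op)  # any deterministic transition would do

    # Three replicas processing a, b, c in the same agreed order stay identical.
    replicas = [Replica() for _ in range(3)]
    for op in ["a", "b", "c"]:
        for r in replicas:
            r.apply(op)
    assert all(r.state == ["a", "b", "c"] for r in replicas)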

  5. Consensus
  • Building block for state machine replication
  • Each process has an input, should decide on an output so that:
  • Agreement: decisions are the same
  • Validity: decision is the input of one process
  • Termination: eventually all correct processes decide
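
As a toy illustration (hypothetical Python, ours), the three properties can be checked against the inputs and decisions of a finished run:

    def check_consensus(inputs, decisions):
        # inputs: pid -> proposed value; decisions: pid -> decided value,
        # with one entry per correct process that has decided.
        values = set(decisions.values())
        agreement = len(values) <= 1                 # decisions are the same
        validity = values <= set(inputs.values())    # decision was someone's input
        termination = set(decisions) == set(inputs)  # every correct process decided
        return agreement and validity and termination

    assert check_consensus({1: "a", 2: "b", 3: "a"}, {1: "a", 2: "a", 3: "a"})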

  6. Basic Model
  • Message passing
  • Links between every pair of processes
  • Links do not create, duplicate, or alter messages (integrity)
  • Process and link failures

  7. Synchronous Model
  • Known bound Δ on message delay and processing time
  • Very convenient for algorithms
  • Requires very conservative timeouts
  • In practice: avg. latency < max. latency / 100 [Cardwell, Savage, Anderson 2000], [Bakr, Keidar 2002]
  • Computation might be too sloooow!

  8. Asynchronous Model
  • Unbounded message delay
  • Much more practical
  • Fault-tolerant consensus impossible [FLP85]

  9. Eventually Stable (Indulgent) Models
  • Initially asynchronous, for an unbounded period of time
  • Eventually reach stabilization
  • GST (Global Stabilization Time): following GST, certain assumptions hold
  • Examples:
  • ES (Eventual Synchrony) – starting from GST, all links have a bound on message delay [Dwork, Lynch, Stockmeyer 88]
  • Failure detectors, e.g., the Ω (leader) failure detector:
  • Outputs one trusted process
  • From some point, all correct processes trust the same correct process [Chandra, Toueg 96], [Chandra, Hadzilacos, Toueg 96]
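
For intuition, one common way to realize an Ω-style leader oracle (a hypothetical sketch of ours, not the paper's construction) is to trust the smallest-id process heard from recently; once heartbeats become timely, processes that see the same alive set converge on the same correct leader:

    import time

    class OmegaOracle:
        # Toy Omega-style oracle: outputs a single trusted process.
        def __init__(self, process_ids, timeout):
            self.last_heard = {p: time.monotonic() for p in process_ids}
            self.timeout = timeout

        def on_heartbeat(self, p):
            self.last_heard[p] = time.monotonic()

        def leader(self):
            now = time.monotonic()
            alive = [p for p, t in self.last_heard.items()
                     if now - t < self.timeout]
            # Deterministic rule: processes seeing the same alive set
            # trust the same process (the smallest id).
            return min(alive) if alive else None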

  10. Indulgent Models: Research Trend
  • Weaken post-GST assumptions as much as possible [Guerraoui, Schiper 96], [Aguilera et al. 03, 04], [Malkhi et al. 05]
  • Weaker = better?

  11. Indulgent Models: Research Trend
  • "You only need ONE machine with eventually ONE timely link. Buy the hardware to ensure it, set the timeout accordingly, and EVERYTHING WILL WORK."

  12. Consensus with Weak Assumptions
  [cartoon: a user asks the network "Why isn't anything happening???"; the network answers "Don't worry! It will eventually happen!"]

  13. Consensus with Weak Assumptions

  14. What's Going On?
  • In practice, bounds just need to hold "long enough" for the algorithm to finish (time T_A)
  • But T_A depends on our synchrony assumptions
  • With weak assumptions, T_A might be unbounded
  • For practical systems, eventual completion of the job is not enough!

  15. Our Goal • Understand the relationship between: • assumptions (1 timely link, failure detectors, etc.) that eventually hold • performance of algorithms that exploit these assumptions, and only them • Challenge: How do we understand the performance of asynchronous algorithms that make very different assumptions?

  16. Typical Metric: Count "Rounds"
  • Algorithms normally progress in rounds, though rounds are not synchronized among processes:
    at process pi: forever do
        send messages
        receive messages while (!some conditions)
        compute…
  • Previous work: look at synchronous runs (every message takes exactly δ time) and count rounds or "δ"s [Keidar, Rajsbaum 01], [Dutta, Guerraoui 02], [Guerraoui, Raynal 04], [Dutta et al. 03], etc.
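
The round skeleton above can be made concrete (a hypothetical Python sketch with names of our choosing; the waiting condition is the part that varies between models, as the GIRAF slides below make explicit):

    def run_rounds(pid, send, receive, waiting_condition, compute, state=None):
        # Generic round loop: rounds are local to the process,
        # not synchronized across processes.
        k = 0
        while True:
            k += 1
            send(pid, k, state)                 # send round-k messages
            received = []
            while not waiting_condition(k, received):
                msg = receive()                 # block for the next message
                if msg is not None:
                    received.append(msg)
            state = compute(k, received, state) # algorithm logic only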

  17. Are All “Rounds” the Same? • Algorithm 1 waits for messages from a majority that includes a pre-defined leader in each round • takes 3 rounds • Algorithm 2 waits for messages from all (unsuspected) processes in each round • E.g., group membership • takes 2 rounds

  18. Do All Rounds Cost the Same?
  [figure: a "LAN market" where oranges cost $1.00 and apples cost $1.00; on a LAN, both kinds of rounds are equally cheap]

  19. Do All "Rounds" Cost the Same?
  • On the Internet, n² timely links can be a rarity [Bakr, Keidar 02]
  • Timely communication with a leader or with a majority requires timeouts orders of magnitude smaller
  [figure: a "WAN market" where oranges cost $100.00 and apples cost $1.00]

  20. GIRAF: General Round-based Algorithm Framework
  • Inspired by Gafni's RRFD, generalizes it
  • Organize algorithms into rounds
  • Separate algorithm logic from waiting condition
  • Waiting condition defines the model
  • Allows reasoning about lower and upper bounds for rounds of different types
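
Since the waiting condition is the only model-specific piece, swapping the predicate changes the model without touching the algorithm logic in a round loop like run_rounds above. A hypothetical example of ours, in the style of slide 17's Algorithm 2:

    from collections import namedtuple

    Message = namedtuple("Message", ["sender", "round"])

    def all_unsuspected(round_k, received, unsuspected):
        # End round k once every process not currently suspected
        # (e.g., by a group-membership service) has been heard from.
        senders = {m.sender for m in received if m.round == round_k}
        return unsuspected <= senders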

  21. Defining Properties in GIRAF • Environment can have • perpetual properties • eventual properties • In every run r, there exists a round GSR(r) • GSR(r) – the first round from which: • no process fails • all eventual properties hold in each round

  22. Defining Properties • Timeliness of incoming, outgoing and bidirectional links. • Some known failure detector properties • Use properties to clearly define models

  23. Some Results: Context • Consensus problem • Global decision time metric • Time until all correct processes decide • Message passing • Crash failures • t < n/2 potential failures out of n>1 processes

  24. ◊LM Model: Leader and Majority
  • Nothing required before GSR
  • In every round k ≥ GSR, every correct process receives a round-k message from a majority of processes, one of which is the Ω-leader
  • In practice, requires much shorter timeouts than Eventual Synchrony [Bakr, Keidar]
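
Expressed the same way (a hypothetical sketch of ours, reusing the Message type above), ◊LM lets a round end as soon as a majority that includes the Ω-leader has been heard from; partially applied with n and the leader oracle's output, it can serve as the waiting condition in the round loop sketched earlier:

    def lm_waiting_condition(round_k, received, n, leader):
        # End round k once messages from a majority of the n processes
        # have arrived, one of them from the Omega-leader.
        senders = {m.sender for m in received if m.round == round_k}
        return len(senders) > n // 2 and leader in senders

    # With n = 5, messages from processes {1, 2, 3} suffice only if the
    # leader is among them.
    msgs = [Message(sender=s, round=7) for s in (1, 2, 3)]
    assert lm_waiting_condition(7, msgs, n=5, leader=1)
    assert not lm_waiting_condition(7, msgs, n=5, leader=4)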

  25. ◊LM: Previous Work • Most Ω-based algorithms wait for majority in each round (not ◊LM) • Paxos [Lamport 98] works for ◊LM • Takes constant number of rounds in Eventual Synchrony (ES) • But how many rounds without ES?

  26. Paxos Run in ES
  [figure: after an earlier ("prepare", 2) attempt, the Ω-leader, having seen ballots up to 20, sends ("prepare", 21); processes adopt BallotNum 21 and answer yes; the leader sends (Commit, 21, v1) and all processes decide v1. BallotNum = number of attempts to decide initiated by leaders]

  27. Paxos in ◊LM (w/out ES)
  [figure: in rounds GSR, GSR+1, GSR+2, GSR+3 the leader's successive prepares ("prepare", 2), ("prepare", 9), ("prepare", 14) are answered "no" by processes holding higher BallotNums (5, 8, 13, 20, …)]
  • Commit takes O(n) rounds!

  28. What Can We Hope For?
  • Tight lower bound for ES: 3 rounds from GSR [DGK05]
  • ◊LM is weaker than ES
  • One might expect consensus to take longer in ◊LM than in ES

  29. Result 1: Don't Need ES • Leader and majority can give you the same performance! • Algorithm that matches lower bound for ES!

  30. Our ◊LM Algorithm in a Nutshell
  • Commit with increasing ballot numbers, decide on a value committed by a majority (like Paxos, etc.)
  • Challenge: we don't know all ballots, so how do we choose the new one to be the highest?
  • Solution: choose it to be the round number
  • Challenge: rounds are wasted if a prepare/commit fails
  • Solution: pipeline prepares and commits, trying in each round
  • Challenge: do they really need to say no?
  • Solution: support the leader's prepare even if we have a higher ballot number
  • Challenge: a higher number may reflect a later decision; won't agreement be compromised?
  • Solution: a new field, "trustMe", ensures a supported leader doesn't miss real decisions
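
A heavily simplified sketch of two of these ideas (hypothetical helper names of ours; the actual algorithm and its proof are in the paper):

    def new_ballot(round_k):
        # Rounds grow monotonically, so using the round number as the
        # ballot makes it exceed any ballot from an earlier round,
        # without first having to learn every existing ballot.
        return round_k

    def support_prepare(leader_ballot, my_ballot, trust_me):
        # Unlike classic Paxos, don't answer "no" merely because we hold
        # a higher ballot: the trustMe flag marks when it is safe to
        # support the leader anyway, so it cannot miss a real decision.
        return trust_me or leader_ballot >= my_ballot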

  31. Example Run: GSR=100
  [figure: in round GSR the Ω-leader sends <PREPARE, …, trustMe>; in round GSR+1 all processes PREPARE with ballot 101 (!trustMe); in round GSR+2 all COMMIT and all DECIDE. Earlier ballots (5, 8, 13, 20) did not lead to a decision]

  32. Question 2: ◊S and Ω Equivalent?
  • ◊S and Ω are equivalent in the "classical" sense [Chandra, Hadzilacos, Toueg 96]
  • Both are weakest for consensus
  • ◊S: eventually (from GSR onward):
  • all faulty processes are suspected by every correct process
  • there exists one correct process that is not suspected by any correct process
  • Can we substitute Ω with ◊S in ◊LM?
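
The two ◊S properties can be written directly as a predicate over post-GSR suspicion lists (a hypothetical sketch of ours):

    def diamond_s_holds(correct, faulty, suspects):
        # suspects[p]: the set of processes that correct process p suspects.
        completeness = all(faulty <= suspects[p] for p in correct)
        accuracy = any(all(q not in suspects[p] for p in correct)
                       for q in correct)
        return completeness and accuracy

    # Process 2 is suspected by nobody, so accuracy holds even though
    # correct process 1 is suspected by process 2.
    assert diamond_s_holds(correct={1, 2}, faulty={3},
                           suspects={1: {3}, 2: {3, 1}})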

  33. Result 2: ◊S and Ω not that Equivalent
  • Consensus takes linear time from GSR
  • By reduction to the mobile failure model [Santoro, Widmayer 89]

  34. Result 3: Do We Need Oracles?
  • Timely communication with a majority suffices!
  • ◊AFM (All-From-Majority), simplified: in every round k ≥ GSR, every correct process p receives a round-k message from a majority of processes, and p's message reaches a majority of processes
  • Decision in 5 rounds from GSR
  • First constant-time algorithm without an oracle or ES
  • Idea: information passes to all nodes in 2 rounds
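
A sketch of this simplified ◊AFM condition as a global predicate over one round (hypothetical Python, ours; all n processes are taken to be correct for brevity):

    def afm_holds(n, delivered):
        # delivered[(s, r)] is True if process r received process s's
        # round-k message. AFM (simplified): every process receives from
        # a majority, and every process's message reaches a majority.
        majority = n // 2 + 1
        for p in range(n):
            got = sum(delivered.get((s, p), False) for s in range(n))
            reached = sum(delivered.get((p, r), False) for r in range(n))
            if got < majority or reached < majority:
                return False
        return True

    # Full delivery trivially satisfies the condition.
    assert afm_holds(3, {(s, r): True for s in range(3) for r in range(3)})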

  35. Result 4: Can We Assume Less?
  • ◊MFM (Majority-From-Majority): only a majority of correct processes receive round-k messages from a majority; the rest receive a message from a minority
  • Only a little is missing for ◊AFM
  • Stronger than models in the literature [Aguilera et al. 03, 04], [Malkhi et al. 05]
  • Bounded time from GSR is impossible!

  36. Conclusions
  • Which guarantees should one implement?
  • Weaker ≠ better: some previously suggested assumptions are too weak
  • Sometimes a little stronger = much better: worth longer timeouts / better hardware
  • ES is not essential: not worth longer timeouts / better hardware
  • Future: more models and bounds to explore with GIRAF
