When Is Agreement Possible? CS 188 Distributed Systems February 24, 2015


  1. When Is Agreement Possible? • CS 188 • Distributed Systems • February 24, 2015

  2. Introduction • Basics of agreement protocols • Impossibility of agreement in asynchronous system with failures • When is agreement possible?

  3. Basics of Agreement Protocols • What is agreement? • What are the necessary conditions for agreement?

  4. What Do We Mean By Agreement? • In simplest case, can n processors agree that a variable takes on value 0 or 1? • Only non-faulty processors need agree • More complex agreements can be built from this simple agreement

  5. Conditions for Agreement Protocols • Consistency • All participants agree on same value and decisions are final • Validity • Participants agree on a value at least one of them wanted • Termination • All participants choose a value in a finite number of steps
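
The three conditions can be checked mechanically against the record of one finished run. The following is a minimal sketch, not part of the original slides: the function name `check_agreement`, the dictionary layout, and the use of `None` for "never decided" are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Optional


def check_agreement(
    inputs: Dict[str, int],
    decisions: Dict[str, Optional[int]],
    faulty: FrozenSet[str] = frozenset(),
) -> Dict[str, bool]:
    """Check one finished run against the three conditions (hypothetical helper).

    inputs:    the value (0 or 1) each processor wanted
    decisions: the value each processor decided, or None if it never decided
    faulty:    processors excused from the conditions
    """
    correct = [p for p in inputs if p not in faulty]
    decided = [decisions.get(p) for p in correct]
    values = {d for d in decided if d is not None}
    return {
        # Consistency: non-faulty processors never decide different values.
        "consistency": len(values) <= 1,
        # Validity: any decided value was wanted by at least one participant.
        "validity": values <= set(inputs.values()),
        # Termination: every non-faulty processor decided.
        "termination": all(d is not None for d in decided),
    }


inputs = {"p1": 0, "p2": 1, "p3": 1}
decisions = {"p1": 1, "p2": 1, "p3": None}
report = check_agreement(inputs, decisions, faulty=frozenset({"p3"}))
```

Here `p3` never decides, but because it is listed as faulty the run still satisfies all three conditions. A finite record can only show that termination held in this run; it cannot show that the protocol always terminates.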

  6. Impossibility of Agreement in Async System With Failures • Assume a reliable, but asynchronous, message passing system • Any message may face arbitrary delays • Can a set of processors reach agreement if one of the processors fails?

  7. Agreement Isn’t Always Possible • In the general case for arbitrary systems • Adding some special properties to the system may change that result • But without those properties, provably impossible • A result sometimes abbreviated FLP • For Fischer, Lynch, and Paterson, who proved it

  8. Model of the System • The system consists of n processors • The goal is for all non-faulty processors to agree on value 0 or 1 • Rule out the trivial case of always agreeing on 0 (or 1) • Agreement depends on protocol, initial state, and inputs to each processor

  9. Bivalent and Univalent States • A bivalent state is a system state that could lead to either value being decided • A univalent state can only lead to one of the values being decided • 0-valent or 1-valent • Valency must take allowable failures into account!
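
Valency can be illustrated on a toy graph of configurations: a state is bivalent if both decisions are still reachable from it. This is a sketch under stated assumptions, not the formal definition from the proof. The graph encoding (a configuration maps either to a list of successors or to an integer decision) and the names `reachable_decisions` and `valency` are hypothetical, and the toy graph ignores failures, which the real definition must account for.

```python
from typing import Dict, List, Set, Union

# Hypothetical encoding: a configuration maps to its successor configurations,
# or to an int (0 or 1) once a decision has been made.
Graph = Dict[str, Union[int, List[str]]]


def reachable_decisions(graph: Graph, start: str) -> Set[int]:
    """Collect every decision value reachable from `start` (depth-first search)."""
    seen: Set[str] = set()
    found: Set[int] = set()
    stack = [start]
    while stack:
        config = stack.pop()
        if config in seen:
            continue
        seen.add(config)
        successors = graph[config]
        if isinstance(successors, int):
            found.add(successors)
        else:
            stack.extend(successors)
    return found


def valency(graph: Graph, start: str) -> str:
    found = reachable_decisions(graph, start)
    if found == {0, 1}:
        return "bivalent"
    if found == {0}:
        return "0-valent"
    if found == {1}:
        return "1-valent"
    return "no decision reachable"


# Toy graph: from C the system can move to D (which decides 0) or D' (which decides 1).
toy = {"C": ["D", "D'"], "D": ["d0"], "D'": ["d1"], "d0": 0, "d1": 1}
labels = {name: valency(toy, name) for name in ("C", "D", "D'")}
```

In the toy graph, `C` is bivalent while `D` and `D'` are univalent.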

  10. System Configuration • Processors have internal state • State of network is the set of messages sent, but not yet received • Event e is the receipt of message m by a processor • Which can lead to sending one or more new messages • Events are deterministic • A schedule is a sequence of events

  11. Proving the Result • Let’s assume the result is false • That we can reach agreement with one failure in these conditions • Use an adversarial model • Within rules of behavior, assume adversary can force any legal event • Look for contradictions

  12. What Can the Adversary Do? • Force any processor to perform an event at any moment • Choose any message to be delivered to any processor when it requests a message • Delay any message arbitrarily long • Kill one processor permanently, but only once

  13. The Necessity of Bivalency • There has to be an initial bivalent configuration for the system • Why? • If all processors started with value 1, the system would decide 1 • If all processors started with value 0, the system would decide 0

  14. Intermediate Initial States • If some processors start with value 0 and some with value 1 • Some initial states lead to result 1 • Some initial states lead to result 0 • All initial states lead to one or the other • So there is a 1-valent initial state that differs from a 0-valent initial state by one processor’s initial value
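
The "differ by one processor" step is a walk along a chain of initial states. Start with every processor at 0, flip one processor at a time until every processor is at 1, and somewhere along the way the outcome has to change. The sketch below assumes a hypothetical failure-free decision rule, `majority`, purely as a stand-in; the argument works for any rule that decides 0 on all zeros and 1 on all ones.

```python
from typing import Callable, Optional, Tuple

Config = Tuple[int, ...]  # one initial value (0 or 1) per processor


def adjacent_pair(
    n: int, outcome: Callable[[Config], int]
) -> Optional[Tuple[Config, Config]]:
    """Walk from all-0 to all-1, flipping one processor per step.

    Returns the first pair of neighbouring initial states whose outcomes
    differ, or None if the outcome never changes along the chain.
    """
    values = [0] * n
    previous = tuple(values)
    for i in range(n):
        values[i] = 1
        current = tuple(values)
        if outcome(previous) != outcome(current):
            return previous, current
        previous = current
    return None


def majority(config: Config) -> int:
    """Hypothetical decision rule: decide 1 only on a strict majority of 1s."""
    return 1 if 2 * sum(config) > len(config) else 0


pair = adjacent_pair(5, majority)
# pair == ((1, 1, 0, 0, 0), (1, 1, 1, 0, 0)): the two states differ only in processor 3
```

If the processor whose value differs is the one that fails, the remaining processors cannot tell the two states apart.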

  15. A Graphical Representation • [Figure: State x, among the 0-valent initial states, and State y, among the 1-valent initial states. Both show Node 1: 0, Node 2: 1, Node 3: 1, ...; Node N is 1 in one state and 0 in the other.] • What’s in these states? • They differ in only one value

  16. Why Does This Imply Bivalence? • What if that one differing processor is the processor that fails? • The system must still reach agreement from the remaining states • Which are identical, now • But on what value?

  17. Is This Possible? • [Figure: the same States x (0-valent) and y (1-valent), which differ only in the value of Node N.] • Does the system decide on 1? • Then State x wasn’t 0-valent, after all • Does the system decide on 0? • Then State y wasn’t 1-valent, after all • Looks like x and y must be bivalent

  18. So What? • So there has to be at least one bivalent initial state • Why’s that so bad? • If the system never leaves a bivalent state, it never makes a decision • We must show our adversary can’t perpetually force bivalency

  19. The Persistence of Bivalency • Let’s assume bivalency doesn’t persist • At some point, some bivalent state must transition to a univalent state • Implying at least two events • One to go to 0-valent • One to go to 1-valent • With no events leading to bivalent states

  20. A Graphical Representation • [Figure: from configuration C, event e leads to D and event e’ leads to D’.] • Remember, these events are each delivery of a message • So m and m’ must have been in the message delivery system state simultaneously

  21. Looking Closely at Events e and e’ • What would happen if we executed e first, then e’? • What would happen if we executed them in the opposite order? • Well, why should I care? • Would executing them in either order lead to the same state? • If so, there’s a contradiction

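  22. Order of Events e and e’ • [Figure: from configuration C, event e leads to D and event e’ leads to D’; e’ is then applied after D, and e after D’.]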

  23. Why Should They Lead to the Same State? • What if e and e’ occur on different processors? • Then they’re independent events • So they should produce the same result if executed in either order • So e and e’ could not have occurred on different processors
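
The independence claim can be seen in a toy model of a configuration: per-processor state plus the set of undelivered messages. This is a simplified sketch, not the model from the proof. A processor's state is just the log of messages it has received, the sending of new messages after a receipt is left out, and the names `deliver`, `Event`, and `Config` are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Event = Tuple[str, str]                   # (receiving processor, message)
States = Dict[str, Tuple[str, ...]]       # processor -> messages received so far
Config = Tuple[States, FrozenSet[Event]]  # (processor states, undelivered messages)


def deliver(config: Config, event: Event) -> Config:
    """Apply one event: remove the message from the network and update
    only the receiving processor's state. Deterministic by construction."""
    states, network = config
    if event not in network:
        raise ValueError("message is not in the network")
    proc, msg = event
    new_states = dict(states)
    new_states[proc] = states[proc] + (msg,)
    return new_states, network - {event}


# e and e' on different processors: either order gives the same configuration.
e, e_prime = ("P", "m"), ("Q", "m'")
C = ({"P": (), "Q": ()}, frozenset({e, e_prime}))
different_procs_commute = (
    deliver(deliver(C, e), e_prime) == deliver(deliver(C, e_prime), e)
)

# Both events on the same processor P: the order is visible in P's state.
f, f_prime = ("P", "m"), ("P", "m'")
C2 = ({"P": ()}, frozenset({f, f_prime}))
same_proc_commute = (
    deliver(deliver(C2, f), f_prime) == deliver(deliver(C2, f_prime), f)
)
```

In this model `different_procs_commute` is true and `same_proc_commute` is false, which is why the argument next turns to events on a single processor P.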

  24. Could the Events Occur on the Same Processor P? • If e was first, the state became 0-valent • If e’ was first, the state became 1-valent • But what if P then fails? • Since the event happened only at P, only P sees the effects • So we’re still in a bivalent state

  25. Recapitulating the Argument • It’s possible to start in a bivalent state • There must be some point at some processor P at which the bivalent state changes to univalent • If P fails before anyone else learns the valency, the rest of the system is still bivalent • And can never settle to univalency • Perpetual bivalency implies no agreement

  26. When Is Agreement Possible? • Didn’t we show in the last class that we can reach agreement if fewer than 1/3 of our processors are faulty? • Yes, but only if the message passing system is synchronous • Whether agreement is possible in a system depends on certain parameters

  27. Parameters for Agreement In Distributed Systems • Synchronous vs. asynchronous processors • Bounded vs. unbounded communications delay • Ordered vs. unordered messages • Point-to-point vs. broadcast communications

  28. Synchronous vs. Asynchronous Processors • Synchronous processors imply that all processors make progress predictably • More precisely, there is a constant s such that • for every s+1 steps taken by Pi • all Pj will take at least one step
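
The constant s can be tested against a recorded trace of steps. The sketch below assumes a trace is simply a list of processor names in the order they took steps; the function name `is_synchronous` and that trace format are assumptions. A finite trace can only refute the property for a given s, never prove it for all executions.

```python
from typing import List


def is_synchronous(trace: List[str], s: int) -> bool:
    """Check the bound on a finite trace: within any s+1 consecutive steps
    of one processor, every other processor takes at least one step.

    Only processors that appear in the trace are considered.
    """
    procs = set(trace)
    for p in procs:
        positions = [i for i, q in enumerate(trace) if q == p]
        # Slide a window over each run of s+1 consecutive steps taken by p.
        for k in range(len(positions) - s):
            window = set(trace[positions[k]: positions[k + s] + 1])
            if window != procs:
                return False
    return True


alternating = ["A", "B", "A", "B"]
lopsided = ["A", "A", "B"]
checks = (
    is_synchronous(alternating, 1),  # True: B steps between any two steps of A
    is_synchronous(lopsided, 1),     # False: A takes two steps while B takes none
    is_synchronous(lopsided, 2),     # True: no processor takes three steps
)
```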

  29. Bounded vs. Unbounded Communications Delay • Delay is bounded if and only if all messages arrive at their destination within t steps • Implies no lost messages • Doesn’t imply messages arrive in the order sent

  30. Ordered vs. Unordered Messages • Messages are ordered if they are received in the same real time order as their sending • Using true real time • In some cases, merely receiving all messages in same order at all processors is enough

  31. Point-to-Point vs. Broadcast Communications • Point-to-point communications means a given message sent by Pi is seen only by its destination Pj • Broadcast communications mean that Pi can send a message to all other processors in a single atomic step • Most typically by hardware broadcast

  32. So, When Can We Reach Agreement? • Case 1: Processors are synchronous and communications delay is bounded • Case 2: Messages are ordered and the transmission medium is broadcast • Case 3: Processors are synchronous and messages are ordered • And that’s it • (Case 1 covers Byzantine agreement)
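
The three cases amount to a predicate over the four parameters. The sketch below transcribes them directly; the function name `agreement_possible` and its boolean arguments are hypothetical names for the parameters listed above.

```python
from itertools import product


def agreement_possible(
    sync_processors: bool,
    bounded_delay: bool,
    ordered_messages: bool,
    broadcast: bool,
) -> bool:
    """Return True if the system falls under one of the three listed cases."""
    case1 = sync_processors and bounded_delay
    case2 = ordered_messages and broadcast
    case3 = sync_processors and ordered_messages
    return case1 or case2 or case3


# The setting of the impossibility result: asynchronous processors, unbounded
# delay, unordered messages, point-to-point communication.
flp_setting = agreement_possible(False, False, False, False)

# How many of the 16 parameter combinations admit agreement?
possible_count = sum(
    agreement_possible(*combo) for combo in product([False, True], repeat=4)
)
```

`flp_setting` is false, and the three cases cover 8 of the 16 combinations.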

  33. What Does This Result Mean? • For practical systems we really build • Not that we can never reach agreement • Good systems almost always do • But that we generally can’t guarantee it • Which implies that our systems should tolerate disagreements • At some times • Under some conditions

  34. When Is Disagreement OK? • For preference, when it doesn’t matter • E.g., when reasonable results possible even without agreement • Or when it eventually works itself out • With possible inconsistencies in the meantime • Or, at worst, when it is visible to people who can fix it

  35. When Is Disagreement Not OK? • When the consequences of disagreement are dire • When it results in unfixable problems • When its consequences are invisible, but relevant • Unfortunately, we don’t always get to choose when we can avoid it

  36. Minimizing Chances of Disagreement • Understand when agreement is most critical • In those cases, use protocols that are less likely to fail on agreement • Which usually have heavy expenses • So don’t always use them

  37. A Classification of Faults • More detailed than previously discussed • Produced by fault-tolerant computing community • Divides faults into classes • Stronger class is subset of weaker class

  38. An Ordered Fault Classification • [Figure: the fault classes in order, from most general to most restrictive] • Byzantine • Authenticated Byzantine • Incorrect Computation • Timing • Omission • Crash • Fail Stop

  39. Fail Stop Faults • A processor ceases operation • But informs other processors in computation that it has stopped • Relatively easy to deal with

  40. Crash Fault • A processor crashes or loses internal state and halts • Without notification to anyone else • Hard to distinguish from a really slow processor

  41. Omission Faults • A processor fails to do something in time • Like respond to a message • But otherwise it may still be operating correctly • Or it may have crashed

  42. Timing Fault • A processor completes a task before or after the window when it should • Or never • A late acknowledgement to a message, e.g.

  43. Incorrect Computation Fault • A processor fails to produce the correct results for a given set of input • Which could be merely not producing the results soon enough • Or could be sending back trash

  44. Authenticated Byzantine Fault • A processor exhibits an arbitrary or malicious fault • But authentication mechanisms detect any alterations made to others’ messages

  45. Byzantine Fault • Any and every fault • Having arbitrarily bad consequences • Possibly working in combination with other faults to produce really bad results • In this classification, all other faults are subclasses of Byzantine faults
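
Because the classes are ordered, with each class containing the ones before it, the classification maps naturally onto an ordered enumeration. This is a sketch: the numeric ranks and the helper `covers` are assumptions for illustration, and it treats the ordering as a strict chain, as the classification presents it.

```python
from enum import IntEnum


class Fault(IntEnum):
    """Fault classes from most restrictive to most general (ranks are arbitrary)."""
    FAIL_STOP = 1
    CRASH = 2
    OMISSION = 3
    TIMING = 4
    INCORRECT_COMPUTATION = 5
    AUTHENTICATED_BYZANTINE = 6
    BYZANTINE = 7


def covers(designed_for: Fault, actual: Fault) -> bool:
    """A protocol designed for a class also covers every class contained in it."""
    return actual <= designed_for


crash_handles_fail_stop = covers(Fault.CRASH, Fault.FAIL_STOP)   # True
crash_handles_timing = covers(Fault.CRASH, Fault.TIMING)         # False
byzantine_handles_all = all(covers(Fault.BYZANTINE, f) for f in Fault)  # True
```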