SECOND PART: Algorithms for UNRELIABLE Distributed Systems (The consensus problem)

Presentation Transcript


  1. SECOND PART: Algorithms for UNRELIABLE Distributed Systems (The consensus problem)

  2. Failures in Distributed Systems • Link failure: a link fails and remains inactive; the network may become disconnected • Processor crash: at some point, a processor stops taking steps • Byzantine processor: a processor changes state arbitrarily and sends messages with arbitrary content (the name recalls the untrustworthy Byzantine generals of the Byzantine Empire, 4th to 15th century A.D.)

  3. Link Failures [Figure: non-faulty links carrying messages a, b, c]

  4. Faulty link: some of the messages are not delivered [Figure: messages a, b, c sent over a faulty link]

  5. Crash Failures [Figure: non-faulty processor sending messages a, b, c]

  6. Faulty processor: some of the messages are not sent [Figure: faulty processor sending only some of its messages]

  7. After a failure, the processor disappears from the network [Figure: timeline of rounds 1 to 5 with one failure marked]

  8. Byzantine Failures [Figure: non-faulty processor sending messages a, b, c]

  9. Byzantine Failures: a faulty processor sends arbitrary messages, and some messages may not be sent [Figure: faulty processor sending garbled messages such as *!§ç# and %&/£]

  10. After a failure, the processor may continue functioning in the network [Figure: timeline of rounds 1 to 6 with two failures marked]

  11. Consensus Problem • Every processor has an input x ∈ X • Termination: Eventually every non-faulty processor must decide on a value y. • Agreement: All decisions by non-faulty processors must be the same. • Validity: If all inputs are the same, then the decision of a non-faulty processor must equal the common input (this avoids trivial solutions).
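The three conditions above can be checked mechanically on a finished execution. The following is a minimal sketch, not part of the slides: the function name `check_consensus`, the processor labels, and the convention that an undecided processor maps to `None` are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Optional, Set


def check_consensus(inputs: Dict[str, Hashable],
                    decisions: Dict[str, Optional[Hashable]],
                    faulty: Set[str]) -> Dict[str, bool]:
    """Check termination, agreement and validity for one finished execution.

    Hypothetical helper: `decisions[p]` is None (or missing) if p never decided.
    """
    correct = [p for p in inputs if p not in faulty]
    decided = [decisions.get(p) for p in correct]

    # Termination: every non-faulty processor has decided on some value.
    termination = all(d is not None for d in decided)

    # Agreement: the non-faulty processors decided on at most one distinct value.
    agreement = len({d for d in decided if d is not None}) <= 1

    # Validity: only constrains executions in which all inputs are equal.
    common = set(inputs.values())
    validity = len(common) != 1 or all(
        d in common for d in decided if d is not None
    )

    return {"termination": termination,
            "agreement": agreement,
            "validity": validity}


# All inputs are 1, but everybody decides 0: agreement holds, validity fails.
report = check_consensus({"a": 1, "b": 1}, {"a": 0, "b": 0}, set())
```

Faulty processors are excluded from all three checks, which matches the wording of the conditions: they only constrain non-faulty processors.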

  12. Agreement: everybody has an initial value; all non-faulty processors must decide the same value [Figure: processors start with different values and all finish with 3]

  13. Validity: if everybody starts with the same value, then every non-faulty processor must decide that value [Figure: all processors start with 1 and all finish with 1]

  14. Negative result for link failures • It is impossible to reach consensus in case of link failures, even in the synchronous case, and even if one only wants to tolerate a single link failure.

  15. Consensus under link failures: the 2 generals problem • There are two generals of the same army who have encamped a short distance apart. • Their objective is to capture a hill, which is possible only if they attack simultaneously. • If only one general attacks, he will be defeated. • The two generals can only communicate by sending messengers, which is not reliable. • Is it possible for them to attack simultaneously?

  16. The 2 generals problem [Figure: generals A and B exchanging the message “Let’s attack”]

  17. Impossibility of consensus under link failures • First of all, notice that messages must be exchanged to reach consensus (the generals might have different opinions in mind!) • Assume the problem can be solved, and let Π be the shortest protocol (i.e., the one with the minimum number of messages) for a given input configuration. • Suppose now that the last message in Π does not reach its destination. Since Π is correct, consensus must be reached anyway. This means the last message was useless, so Π could not have been the shortest protocol: a contradiction.

  18. Negative result for processor failures in asynchronous systems • It is impossible to reach consensus with crash failures in the asynchronous case, even if one only wants to tolerate a single crash failure.

  19. Assumption on the communication model for crash and Byzantine failures • Complete undirected graph • Synchronous network: we assume that messages are sent, delivered and read in the very same round

  20. Overview of Consensus Results • Let f be the maximum number of faulty processors

  21. A simple algorithm for fault-free consensus Each processor: • Broadcast its input to all processors • Decide on the minimum (only one round is needed)

  22. Start [Figure: five processors with inputs 0, 1, 2, 3, 4]

  23. Broadcast values [Figure: every processor receives the values 0, 1, 2, 3, 4]

  24. Decide on minimum [Figure: every processor holds 0, 1, 2, 3, 4 and picks 0]

  25. Finish [Figure: all five processors have decided 0]

  26. This algorithm satisfies the validity condition: if everybody starts with the same initial value, everybody decides on that value (the minimum) [Figure: all processors start with 1 and all finish with 1]
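The one-round fault-free algorithm is small enough to simulate directly. This is an illustrative sketch only: the function name `simple_consensus` and the processor labels `p0` to `p4` are assumptions, and the broadcast is modelled by simply handing every processor the full set of inputs.

```python
def simple_consensus(inputs):
    """One-round fault-free consensus: broadcast inputs, decide on the minimum."""
    # With no failures, the broadcast delivers every input to every processor.
    received = {p: set(inputs.values()) for p in inputs}
    # Each processor decides locally on the minimum value it has received.
    return {p: min(values) for p, values in received.items()}


# The example from the slides: inputs 0, 1, 2, 3, 4.
decisions = simple_consensus({"p0": 0, "p1": 1, "p2": 2, "p3": 3, "p4": 4})

# Validity: if everybody starts with 1, everybody decides 1.
same_input = simple_consensus({"p0": 1, "p1": 1, "p2": 1})
```

Because every processor sees the same set of values, any deterministic rule (minimum, maximum, and so on) would give agreement here; the slides use the minimum.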

  27. Consensus with Crash Failures: the simple algorithm doesn’t work. Each processor: • Broadcast value to all processors • Decide on the minimum

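  28. Start: the failed processor doesn’t broadcast its value to all processors [Figure: the processor with input 0 fails while broadcasting; the other processors have inputs 1, 2, 3, 4]

  29. Broadcast values [Figure: two processors receive 0, 1, 2, 3, 4; the other two receive only 1, 2, 3, 4]

  30. Decide on minimum [Figure: two processors pick 0; the other two pick 1]

  31. Finish: no consensus! [Figure: the surviving processors have decided 0, 1, 0, 1]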
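The failing execution can be reproduced with a small simulation. This is a sketch under stated assumptions: the function `one_round_with_crash`, the labels `p0` to `p4`, and the choice of which two processors the crashed one reaches are hypothetical; the slides only show that some processors receive 0 and others do not.

```python
def one_round_with_crash(inputs, crashed, reached):
    """One broadcast round in which `crashed` delivers its value only to `reached`."""
    decisions = {}
    for p in inputs:
        if p == crashed:
            continue  # the crashed processor never decides
        # p receives every value except, possibly, the crashed processor's one.
        seen = {v for q, v in inputs.items() if q != crashed or p in reached}
        decisions[p] = min(seen)
    return decisions


inputs = {"p0": 0, "p1": 1, "p2": 2, "p3": 3, "p4": 4}
# p0 crashes mid-broadcast: only p1 and p3 receive its value 0.
decisions = one_round_with_crash(inputs, crashed="p0", reached={"p1", "p3"})
```

Here `p1` and `p3` decide 0 while `p2` and `p4` decide 1, so agreement is violated after a single crash.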

  32. If an algorithm solves consensus in the presence of up to f failed (crashing) processors, we say it is an f-resilient consensus algorithm

  33. An f-resilient algorithm • Round 1: Broadcast my value • Round 2 to round f+1: Broadcast any newly received values • End of round f+1: Decide on the minimum value received
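The f-resilient algorithm can be simulated in a few lines. The sketch below is not from the slides: the function `flood_consensus`, the processor labels, and the `crashes` schedule format (a processor maps to the round in which it crashes and the recipients it still reaches in that round) are assumptions made for illustration.

```python
def flood_consensus(inputs, f, crashes=None):
    """Simulate f+1 synchronous rounds of flooding under crash failures.

    `crashes` (hypothetical format) maps a processor to (round, recipients):
    in that round it sends only to `recipients` and is silent afterwards.
    """
    crashes = crashes or {}
    known = {p: {v} for p, v in inputs.items()}  # values each processor has seen
    fresh = {p: {v} for p, v in inputs.items()}  # values not yet broadcast
    alive = set(inputs)

    for rnd in range(1, f + 2):  # rounds 1 .. f+1
        inbox = {p: set() for p in inputs}
        for p in sorted(alive):
            if p in crashes and crashes[p][0] == rnd:
                targets = set(crashes[p][1])  # partial broadcast, then crash
            else:
                targets = set(inputs) - {p}
            for q in targets:
                inbox[q] |= fresh[p]
        # Processors that crashed in this round take no further steps.
        alive -= {p for p, (r, _) in crashes.items() if r == rnd}
        for p in alive:
            fresh[p] = inbox[p] - known[p]  # only new values are rebroadcast
            known[p] |= fresh[p]

    # End of round f+1: decide on the minimum value received.
    return {p: min(known[p]) for p in alive}


inputs = {"p0": 0, "p1": 1, "p2": 2, "p3": 3, "p4": 4}
# Two chained crashes: p0 reaches only p1, then p1 reaches only p2.
crashes = {"p0": (1, ["p1"]), "p1": (2, ["p2"])}
three_rounds = flood_consensus(inputs, f=2, crashes=crashes)
two_rounds = flood_consensus(inputs, f=1, crashes=crashes)
```

With f = 2 the run lasts three rounds and the survivors `p2`, `p3`, `p4` all decide 0. Running the same two-crash schedule for only two rounds (f = 1, which the schedule exceeds) leaves `p2` with 0 and the others with 1, which illustrates why f+1 rounds are used.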

  34. Example: f=1 failures, f+1 = 2 rounds needed. Start [Figure: five processors with inputs 0, 1, 2, 3, 4]

  35. Example: f=1 failures, f+1 = 2 rounds needed. Round 1: broadcast all values to everybody [Figure: the processor with input 0 fails after reaching only some processors; two processors hold 0, 1, 2, 3, 4 (0 is a new value for them), the other two hold 1, 2, 3, 4]

  36. Example: f=1 failures, f+1 = 2 rounds needed. Round 2: broadcast all new values to everybody [Figure: every remaining processor now holds 0, 1, 2, 3, 4]

  37. Example: f=1 failures, f+1 = 2 rounds needed. Finish: decide on minimum value [Figure: every remaining processor decides 0]

  38. Example: f=2 failures, f+1 = 3 rounds needed. Start [Figure: five processors with inputs 0, 1, 2, 3, 4]

  39. Example: f=2 failures, f+1 = 3 rounds needed. Round 1: broadcast all values to everybody [Figure: Failure 1: the processor with input 0 fails after reaching a single processor, which holds 0, 1, 2, 3, 4; the other three hold 1, 2, 3, 4]

  40. Example: f=2 failures, f+1 = 3 rounds needed. Round 2: broadcast new values to everybody [Figure: Failure 2: the processor that learned 0 fails after forwarding it to a single processor; two processors hold 0, 1, 2, 3, 4 and two still hold only 1, 2, 3, 4]

  41. Example: f=2 failures, f+1 = 3 rounds needed. Round 3: broadcast new values to everybody [Figure: every remaining processor now holds 0, 1, 2, 3, 4]

  42. Example: f=2 failures, f+1 = 3 rounds needed. Finish: decide on the minimum value [Figure: the three surviving processors decide 0]

  43. Example: f=2 failures, f+1 = 3 rounds needed. Another example execution with 2 failures. Start [Figure: five processors with inputs 0, 1, 2, 3, 4]

  44. Example: f=2 failures, f+1 = 3 rounds needed. Round 1: broadcast all values to everybody [Figure: Failure 1: the processor with input 0 fails after reaching a single processor, which holds 0, 1, 2, 3, 4; the other three hold 1, 2, 3, 4]

  45. Example: f=2 failures, f+1 = 3 rounds needed. Round 2: broadcast new values to everybody. Remark: at the end of this round all processes know about all the other values [Figure: every remaining processor holds 0, 1, 2, 3, 4]

  46. Example: f=2 failures, f+1 = 3 rounds needed. Round 3: broadcast new values to everybody (no new values are learned in this round) [Figure: Failure 2 occurs; every remaining processor still holds 0, 1, 2, 3, 4]

  47. Example: f=2 failures, f+1 = 3 rounds needed. Finish: decide on minimum value [Figure: the three surviving processors decide 0]

  48. If there are f failures and f+1 rounds, then there is a round in which no processor fails. Example: 5 failures, 6 rounds [Figure: timeline of rounds 1 to 6 with one round marked “No failure”]

  49. In the algorithm, at the end of the round with no failure: • Every (non-faulty) processor knows about all the values of all other participating processors • This knowledge doesn’t change until the end of the algorithm

  50. Therefore, at the end of the round with no failure, everybody would decide the same value. However, we don’t know the exact position of this round, so we have to let the algorithm execute for f+1 rounds
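The counting argument behind the failure-free round is a pigeonhole argument: at most f failures spread over f+1 rounds must leave at least one round untouched. A minimal sketch follows; the function name `first_clean_round` and the sample crash schedule are hypothetical, not taken from the slides.

```python
def first_clean_round(crash_rounds, f):
    """Return the first round in 1..f+1 in which no processor crashes.

    `crash_rounds` lists the round of each failure (one entry per failure).
    """
    if len(crash_rounds) > f:
        raise ValueError("more than f failures")
    crashed_in = set(crash_rounds)
    for rnd in range(1, f + 2):
        if rnd not in crashed_in:
            return rnd
    # Unreachable: at most f distinct rounds cannot cover f+1 rounds.
    raise AssertionError("pigeonhole argument violated")


# 5 failures spread over 6 rounds: here round 4 is the failure-free one.
clean = first_clean_round([1, 2, 3, 5, 6], f=5)
```

The algorithm itself never computes this round, since processors cannot observe where the failures fall; that is why it simply runs for all f+1 rounds.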
