1 / 60

Modeling Failure Detectors and Timing Assumptions in Distributed Systems

Ali Ghodsi from UC Berkeley discusses the implementation and requirements of failure detectors in distributed systems. The focus is on completeness and accuracy, with an exploration of different types of detectors and their properties, such as strong completeness and accuracy, weak completeness, and eventual accuracy. The content dives into formal models, the interface of perfect failure detectors, and the challenges of achieving strong accuracy without synchronous systems. Various established detectors, including Perfect Detector, Strong Detector, and Weak Detector, are compared based on their completeness and accuracy characteristics.

amarouch
Download Presentation

Modeling Failure Detectors and Timing Assumptions in Distributed Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Failure Detectors Ali Ghodsi – UC Berkeley / KTH alig(at)cs.berkeley.edu

  2. Modeling Timing Assumptions  Tedious to model eventual synchrony (partial synchrony)  Timing assumptions mostly needed to detect failures  Heartbeats, timeouts, etc…  Use failure detectors to encapsulate timing assumptions  Black box giving suspicions regarding node failures  Accuracy of suspicions depends on model strength 8/31/2024 2 Ali Ghodsi, alig(at)cs.berkeley.edu

  3. Implementation of Failure Detectors Typical Implementation  Periodically exchange heartbeat messages  Timeout based on some worst case msg round trip  If timeout, then suspect node  If recv msg from suspected node, revise suspicion and increase time-out 8/31/2024 3 Ali Ghodsi, alig(at)cs.berkeley.edu

  4. Completeness and Accuracy  Two important types of requirements 1. Completeness requirements Requirements regarding actually crashed nodes  When do they have to be detected?  2. Accuracy requirements Requirements regarding actually alive nodes  When are they allowed to be suspected?   How to trivially achieve either? [d]  Together they are impossible in an asynchronous system! 8/31/2024 4 Ali Ghodsi, alig(at)cs.berkeley.edu

  5. Formal Model of FD  Augment formal model with failure detectors (FD)  A configuration consists of  State of each node  FD-state of each node  Transition function on node i gets extra parameter:  FD-state of node i  FD-state updated in comp(i) by another function  FD-function  Not modeled explicitly, but must satisfy some properties 8/31/2024 5 Ali Ghodsi, alig(at)cs.berkeley.edu

  6. Requirements: Completeness  Strong Completeness  Every crashed node is eventually detected by all correct nodes  There exists a time after which all crashed nodes are detected by all correct nodes  The book only studies detectors with this property  Is it realistic? [d] 8/31/2024 6 Ali Ghodsi, alig(at)cs.berkeley.edu

  7. Requirements: Completeness  Weak Completeness  Every crashed node is eventually detected by some correct node  There exists a time after which all crashed nodes are detected by some correct node  Possibly detected by different correct nodes 8/31/2024 7 Ali Ghodsi, alig(at)cs.berkeley.edu

  8. Requirements: Accuracy  Strong Accuracy  No correct node is ever suspected  For all nodes p and q,  p does not suspect q, unless q has crashed  Is it realistic? [d]  Strong assumption, requires synchrony  I.e. no premature timeouts 8/31/2024 8 Ali Ghodsi, alig(at)cs.berkeley.edu

  9. Requirements: Accuracy  Weak Accuracy  There exists a correct node which is never suspected by any node  There exists at least one correct node p  All nodes will never suspect p  Still strong assumption  One node is always “well-connected” 8/31/2024 9 Ali Ghodsi, alig(at)cs.berkeley.edu

  10. Requirements: Eventual Accuracy  Eventual Strong Accuracy  After some finite time the FD provides strong accuracy  Eventual Weak Accuracy  After some finite time the detector provides weak accuracy  After some time, the requirements are fulfilled  Prior to that, any behavior is possible!  Quite weak assumptions [d]  When can eventual weak accuracy be achieved? 8/31/2024 10 Ali Ghodsi, alig(at)cs.berkeley.edu

  11. Four Main Established Detectors  Four detectors with strong completeness  Perfect Detector (P)  Strong Accuracy Synchronous Systems  Strong Detector (S)  Weak Accuracy  Eventually Perfect Detector (P)  Eventual Strong Accuracy Asynchronous Systems  Eventually Strong Detector (S or Ω)  Eventual Weak Accuracy 8/31/2024 11 Ali Ghodsi, alig(at)cs.berkeley.edu

  12. Four Less Interesting Detectors  Four detectors with weak completeness  Detector Q  Strong Accuracy Synchronous Systems  Weak Detector (W)  Weak Accuracy  Eventually Detector Q (Q)  Eventual Strong Accuracy Asynchronous Systems  Eventually Weak Detector (W)  Eventual Weak Accuracy 8/31/2024 12 Ali Ghodsi, alig(at)cs.berkeley.edu

  13. Interface of Perfect Failure Detector  Module:  Name: PerfectFailureDetector, instance p  Events:  Indication: p, crash | pi  Notifies that node pihas crashed  Properties:  PFD1 (strong completeness)  PFD2 (strong accuracy) 8/31/2024 13 Ali Ghodsi, alig(at)cs.berkeley.edu

  14. Properties of P  Properties:  PFD1 (strong completeness)  Eventually every node that crashes is permanently detected by every correct node (liveness)  PFD2 (strong accuracy)  If a node p is detected by any node, then p has crashed (safety)  Safety or Liveness? 8/31/2024 14 Ali Ghodsi, alig(at)cs.berkeley.edu

  15. Implementing P in Synchrony  Assume synchronous system  Max transmission delay between 0 and  time units  Each node every  time units  Send <heartbeat> to all nodes  Each node waits + time units  If did not get <heartbeat> from pi  Detect <p, crash | pi> 8/31/2024 15 Ali Ghodsi, alig(at)cs.berkeley.edu

  16. Correctness of P  PFD1 (strong completeness)  A crashed node doesn’t send <heartbeat>  Eventually every node will notice the absence of <heartbeat>  PFD2 (strong accuracy)  Assuming local computation is negligible  Maximum time between 2 heartbeats  + time units  If alive, all nodes will recv hb in time  No inaccuracy   pi pj   max delay 8/31/2024 16 Ali Ghodsi, alig(at)cs.berkeley.edu

  17. Interface of EPFD  Module:  Name: EventuallyPerfectFailureDetector, instance ep  Events:  Indication: ep, suspect | pi  Notifies that node piis suspected to have crashed  Indication: ep, restore | pi  Notifies that node piis not suspected anymore  Properties:  PFD1 (strong completeness)  PFD2 (eventual strong accuracy). Eventually, no correct node is suspected by any correct node. 8/31/2024 17 Ali Ghodsi, alig(at)cs.berkeley.edu

  18. Implementing P  Assume partially synchronous system  Eventually some bounds exists  Every  time units at each node:  Send <heartbeat> to all nodes  Each node waits T time units  If did not get <heartbeat> from pi  Indicate <ep, suspect | pi> if piis not in suspected  Put piin suspected set  If get HB from pi, and piis in suspected  Indicate <ep, restore | pi> and remove pifrom suspected  Increase timeout T 8/31/2024 18 Ali Ghodsi, alig(at)cs.berkeley.edu

  19. Correctness of P  EPFD1 (strong completeness)  Same as before  EPFD2 (eventual strong accuracy)  Each time p is inaccurately suspected by a correct q timeout T is increased at q  Eventually system becomes synchronous, and T becomes larger than the unknown bound (T>  + )  q will receive HB on time, and never suspect p again 8/31/2024 19 Ali Ghodsi, alig(at)cs.berkeley.edu

  20. Leader Election 8/31/2024 20 Ali Ghodsi, alig(at)cs.berkeley.edu

  21. Leader Election vs Failure Detection  Failure detection captures failure behavior  Detect failed nodes  Leader election (LE) also captures failure behavior  Detect correct nodes (a single & same for all)  Formally, leader election is an FD  Always suspects all nodes except one (leader)  Ensures some properties regarding that node 8/31/2024 21 Ali Ghodsi, alig(at)cs.berkeley.edu

  22. Leader Election vs Failure Detection  We’ll define two leader election algorithm  Leader election (LE) which “matches” P  Eventual leader election () which “matches” P 8/31/2024 22 Ali Ghodsi, alig(at)cs.berkeley.edu

  23. Matching LE and P  P’s properties  P always eventually detects failures (strong completeness)  P never suspects correct nodes (strong accuracy)  Completeness of LE  Informally: eventually ditch crashed leaders  Formally: eventually every correct node trusts some correct node  Accuracy of LE  Informally: never ditch a correct leader  Formally: No two correct nodes trust different correct nodes Is this really accuracy? [d] Yes! Assume two nodes trust different correct nodes One of them must eventually switch, i.e. leaving a correct node    8/31/2024 23 Ali Ghodsi, alig(at)cs.berkeley.edu

  24. LE desirable properties  LE always eventually detects failures  Eventually every correct node trusts some correct node  LE is always accurate  No two correct nodes trust different correct nodes  But the above two permit the following elect p3 elect p1 elect p2elect p1 p1 elect p3 elect p1 p2 elect p3 p3  But P1is “inaccurately” leaving a correct leader 8/31/2024 24 Ali Ghodsi, alig(at)cs.berkeley.edu

  25. LE desirable properties  To avoid “inaccuracy” we add  Local Accuracy:  If a node is elected leader by pi, all previously elected leaders by pihave crashed Not allowed, as p1 is correct elect p3 elect p1 elect p2 elect p1 p1 elect p3 elect p1 p2 elect p3 p3 8/31/2024 25 Ali Ghodsi, alig(at)cs.berkeley.edu

  26. Interface of Leader Election  Module:  Name: LeaderElection, instance le  Events:  Request: le, Leader | pi  Indicate that the leader is node pi  Properties:  LE1 (eventual completeness). Eventually every correct node trusts some correct node  LE2 (agreement). No two correct nodes trust different correct nodes  LE3 (local accuracy). If a node is elected leader by pi, all previously elected leaders by pihave crashed 8/31/2024 26 Ali Ghodsi, alig(at)cs.berkeley.edu

  27. Implementing LE  Globally rank all nodes  e.g. rank ordering p1>p2>p3>p4 represented by function rank(process)=rank_int  e.g. rank(p1)=4, rank(p2)=3, rank(p3)=2, rank(p4)=1  maxrank(S)={return highest ranked node in S}  maxrank(S)=arg maxp∈S{rank(p)} 8/31/2024 27 Ali Ghodsi, alig(at)cs.berkeley.edu

  28. Implementing LE (2)  Implements: LeaderElection, instance le.  Uses: PerfectFailureDetector, instance P.  upon event le, init do  suspected = ; leader :=   upon event P, crash | pi do suspected := suspected  {pi}   upon exists leader  maxrank(-suspected) do  leader := maxrank(-suspected)  trigger le, Leader | pi 8/31/2024 28 Ali Ghodsi, alig(at)cs.berkeley.edu

  29. Matching  and P  P weakens P by only providing eventual accuracy  Weaken LE to  by only guaranteing eventual agreement LE Properties:  LE1 (eventual completeness). Eventually every correct node trusts some correct node  eventual  LE2 (agreement). No two correct nodes trust different correct nodes  LE3 (local accuracy). If a node is elected leader by pi, all previously elected leaders by pihave crashed 8/31/2024 29 Ali Ghodsi, alig(at)cs.berkeley.edu

  30. Interface of Eventual Leader Election  Module:  Name: EventualLeaderElection, instance   Events:  Request: , leader | pi  Notify that pi is trusted to be leader  Properties:  ELD1 (eventual completeness). Eventually every correct node trusts some correct node  ELD2 (eventual agreement). Eventually no two correct nodes trust different correct nodes 8/31/2024 30 Ali Ghodsi, alig(at)cs.berkeley.edu

  31. Eventual Leader Detection Ω  In crash-stop process abstraction  Ω is obtained directly from  P  Each node trusts the node with highest id among all nodes not suspected by ◊P  Eventually, exactly one correct process will be trusted by all correct processes 8/31/2024 31 Ali Ghodsi, alig(at)cs.berkeley.edu

  32. Implementing Ω Implements: EventualLeaderElection, instance Ω. Uses: EventuallyPerfectFailureDetector, instance ep.   upon event Ω, init do suspected := ; leader := pn;  trigger Ω, Leader | leader   upon event ep, suspect | pi do  suspected := suspected  {pi}  upon event ep, restore | pi do  suspected := suspected \ {pi}  upon exists leader  maxrank(-suspected) do leader := maxrank(-suspected) trigger Ω, Leader | pi    8/31/2024 32 Ali Ghodsi, alig(at)cs.berkeley.edu

  33. Ω for Crash Recovery  Can we elect a recovered node? [d]  Not if it keeps crash-recovering infinitely often!  Basic idea  Count number of times you’ve crashed (epoch)  Distribute your epoch periodically to all nodes  Elect leader with lowest (epoch, node_id)  Implementation  Similar to  P and  for crash-stop  Piggyback epoch with heartbeats  Store and load leader upon crash 8/31/2024 33 Ali Ghodsi, alig(at)cs.berkeley.edu

  34. Byzantine Leader Election  Processes observe leaders behavior  Trigger bld, Complain | p if leader p doesn’t adhere to specification, i.e. its Byzantine faulty  Resilience is f Byzantine nodes  N>3f  Function leader(round) rotates through nodes  1p1, 2p2, …, NpN, N+1p1, N+2p2, … 8/31/2024 34 Ali Ghodsi, alig(at)cs.berkeley.edu

  35. Interface of Byzantine ELE  Module:  Name: ByzantineLeaderElection, instance bld  Events:  Indication: bld, Trust | p Notify that p is trusted to be leader   Request: bld, Complain| p Complain that leader pisn’t trusted   Properties:  Eventual Succession. If more than f correct nodes complain about p then every correct node eventually trusts some other leader than p  Putsch Resistance. A correct node does not trust a new leader unless at least one correct node complained about the current one  Eventual Agreement. Eventually no two correct nodes trust different nodes 8/31/2024 35 Ali Ghodsi, alig(at)cs.berkeley.edu

  36. Rotating Byzantine LE Implements: ByzantineLeaderElection, instance bld. Uses: AuthPerfectPointToPointLinks, instance al.   upon event bld, init do round := 1; complList := [nil]*N; compl:= false; trigger bld, trust | leader(round)    upon event bld, Complain| p s.t. p=leader(round) and compl:=false do compl := True forall q in  do trigger al, send | q, [Complaint, round]  If p suspected of being faulty broadcast Complaint    8/31/2024 36 Ali Ghodsi, alig(at)cs.berkeley.edu

  37. Rotating Byzantine LE (2)  Implements: ByzantineLeaderElection, instance bld.  Uses: AuthPerfectPointToPointLinks, instance al.  upon event al, Deliver | p, [Complaint, r] s.t. r=round and complList[p]=nil  complList[p] := Complaint  if #(complList) > f and compl = False then compl := True forall q in  do  trigger al, send | q, [Complaint, round] If more than f complain then broadcast Complaint    else if #(complList) > 2f then round := round + 1 complList := [nil]*N compl := False trigger bld, trust | leader(round) If more than 2f complain then switch leader     8/31/2024 37 Ali Ghodsi, alig(at)cs.berkeley.edu

  38. Correctness of Byzantine ELE  Why not switch leader as soon as received more than f complaints? [d]  Byzantine nodes send complaints to only 1 node.  Thus, one correct node receives f+1 complaints  All other correct nodes receive 1 complaint  Only one correct node switched leader  Eventual succession  When received f+1 complaints, correct nodes relay bcast  Every correct node will get > 2f complaints  New leader will be elected 8/31/2024 38 Ali Ghodsi, alig(at)cs.berkeley.edu

  39. Correctness of Byzantine ELE (2)  Putsch resistance  f Byzantine nodes cannot generate f+1 complaints required for relaying and leader switch to happen  Eventual agreement  Correct nodes synchronously move through rounds, as a node moves to new round when it ensured every other correct node will move too (by relaying)  Eventually all faulty (Byzantine) nodes detected 8/31/2024 39 Ali Ghodsi, alig(at)cs.berkeley.edu

  40. Reductions 8/31/2024 40 Ali Ghodsi, alig(at)cs.berkeley.edu

  41. Reductions  We say X≼Y if  X can be solved given a solution of Y  Read X is reducible to Y  Informally, problem X is weaker or as hard as Y 8/31/2024 41 Ali Ghodsi, alig(at)cs.berkeley.edu

  42. Preorders, partial orders…  A relation ~ is a preorder on a set A if for any x,y,z in A  x~x (reflexivity)  x~y and y~z implies x~z (transitivity)  Difference between preorder and partial order  Partial order is a preorder with antisymmetry x~y and y~x implies x=y   I.e. two different objects x and y cannot be symmetric i.e. it isn’t possible that x~y and y~x for two different x and y  8/31/2024 42 Ali Ghodsi, alig(at)cs.berkeley.edu

  43. ≼ is a preorder  ≼ is a preorder  Reflexivity. X≼X X can be solved given a solution to X   Transitivity. X≼Y and Y≼Z implies X≼Z  Since Y≼Z, use impl. of Z to impl. Y. use impl. of Y to impl. X. Hence we impl. Y from Z’s impl.  ≼ is not antisymmetric, thus not a partial order  Two different X and Y can be equivalent Distinct problems X and Y can be solved from the other’s solution  8/31/2024 43 Ali Ghodsi, alig(at)cs.berkeley.edu

  44. Shortcut definitions  We write X≃Y if  X≼Y and Y≼X  Problem X is equivalent to Y  We write X≺Y if  X≼Y and not X≃Y  or equivalently, X≼Y and not Y≼X  Problem X is strictly weaker than Y, or  Problem Y is strictly stronger than X 8/31/2024 44 Ali Ghodsi, alig(at)cs.berkeley.edu

  45. Example  It is true that P≼P  Given P, we can implement P We just return P’s suspicions. P always satisfies P’s properties    In fact, P≺P in the asynchronous model  Because not P≼P is true  Reductions common in computability theory  If X≼Y, and if we know X is impossible to solve  Then Y is impossible to solve too  If P≼P, and some problem Z can be solved with P Then Z can also be solved with P  8/31/2024 45 Ali Ghodsi, alig(at)cs.berkeley.edu

  46. Weakest FD for a problem?  Often P is used to solve problem X  But P is not very practical (needs synchrony)  Is X a “practically” solvable problem?  Can we implement X with P?  Sometimes a weaker FD than P will not solve X  Proven using reductions 8/31/2024 46 Ali Ghodsi, alig(at)cs.berkeley.edu

  47. Weakest FD for a problem  We might know that X≼P (X solvable with P)  Can we solve X weaker FD than P, say P?  Or is it impossible, i.e. P≺X  Common proof to show P is weakest FD for X  Prove that P≼X  I.e. P can be solved given X  If P≼X then P≺X  Because we know P≺P and P≃X, i.e. P≺P≃X  If we can solve X with P, then  we can solve P with P, which is a contradiction 8/31/2024 47 Ali Ghodsi, alig(at)cs.berkeley.edu

  48. How are the detectors related 8/31/2024 48 Ali Ghodsi, alig(at)cs.berkeley.edu

  49. Trivial Reductions  Strongly complete  P≼P P is always strongly accurate, thus also eventually strongly accurate  P  S≼S S is always weakly accurate, thus also eventually weakly accurate  P S  S≼P P is always strongly accurate, thus also always weakly accurate  S  S≼P P is always eventually strongly accurate, thus also always eventually weakly accurate  8/31/2024 49 Ali Ghodsi, alig(at)cs.berkeley.edu

  50. Trivial Reductions (2)  Weakly complete  Q≼Q Q is always strongly accurate, thus also eventually strongly accurate  Q  W≼W W is always weakly accurate, thus also eventually weakly accurate  Q W  W≼Q Q is always strongly accurate, thus also always weakly accurate  W  W≼Q P is always eventually strongly accurate, thus also always eventually weakly accurate  8/31/2024 50 Ali Ghodsi, alig(at)cs.berkeley.edu

More Related