Modeling Failure Detectors and Timing Assumptions in Distributed Systems

Failure Detectors Ali Ghodsi – UC Berkeley / KTH alig(at)cs.berkeley.edu

Modeling Timing Assumptions  Tedious to model eventual synchrony (partial synchrony)  Timing assumptions mostly needed to detect failures  Heartbeats, timeouts, etc…  Use failure detectors to encapsulate timing assumptions  Black box giving suspicions regarding node failures  Accuracy of suspicions depends on model strength 8/31/2024 2 Ali Ghodsi, alig(at)cs.berkeley.edu

Implementation of Failure Detectors Typical Implementation  Periodically exchange heartbeat messages  Timeout based on some worst case msg round trip  If timeout, then suspect node  If recv msg from suspected node, revise suspicion and increase time-out 8/31/2024 3 Ali Ghodsi, alig(at)cs.berkeley.edu

Completeness and Accuracy  Two important types of requirements 1. Completeness requirements Requirements regarding actually crashed nodes  When do they have to be detected?  2. Accuracy requirements Requirements regarding actually alive nodes  When are they allowed to be suspected?   How to trivially achieve either? [d]  Together they are impossible in an asynchronous system! 8/31/2024 4 Ali Ghodsi, alig(at)cs.berkeley.edu

Formal Model of FD  Augment formal model with failure detectors (FD)  A configuration consists of  State of each node  FD-state of each node  Transition function on node i gets extra parameter:  FD-state of node i  FD-state updated in comp(i) by another function  FD-function  Not modeled explicitly, but must satisfy some properties 8/31/2024 5 Ali Ghodsi, alig(at)cs.berkeley.edu

Requirements: Completeness  Strong Completeness  Every crashed node is eventually detected by all correct nodes  There exists a time after which all crashed nodes are detected by all correct nodes  The book only studies detectors with this property  Is it realistic? [d] 8/31/2024 6 Ali Ghodsi, alig(at)cs.berkeley.edu

Requirements: Completeness  Weak Completeness  Every crashed node is eventually detected by some correct node  There exists a time after which all crashed nodes are detected by some correct node  Possibly detected by different correct nodes 8/31/2024 7 Ali Ghodsi, alig(at)cs.berkeley.edu

Requirements: Accuracy  Strong Accuracy  No correct node is ever suspected  For all nodes p and q,  p does not suspect q, unless q has crashed  Is it realistic? [d]  Strong assumption, requires synchrony  I.e. no premature timeouts 8/31/2024 8 Ali Ghodsi, alig(at)cs.berkeley.edu

Requirements: Accuracy  Weak Accuracy  There exists a correct node which is never suspected by any node  There exists at least one correct node p  All nodes will never suspect p  Still strong assumption  One node is always “well-connected” 8/31/2024 9 Ali Ghodsi, alig(at)cs.berkeley.edu

Requirements: Eventual Accuracy  Eventual Strong Accuracy  After some finite time the FD provides strong accuracy  Eventual Weak Accuracy  After some finite time the detector provides weak accuracy  After some time, the requirements are fulfilled  Prior to that, any behavior is possible!  Quite weak assumptions [d]  When can eventual weak accuracy be achieved? 8/31/2024 10 Ali Ghodsi, alig(at)cs.berkeley.edu

Four Main Established Detectors  Four detectors with strong completeness  Perfect Detector (P)  Strong Accuracy Synchronous Systems  Strong Detector (S)  Weak Accuracy  Eventually Perfect Detector (P)  Eventual Strong Accuracy Asynchronous Systems  Eventually Strong Detector (S or Ω)  Eventual Weak Accuracy 8/31/2024 11 Ali Ghodsi, alig(at)cs.berkeley.edu

Four Less Interesting Detectors  Four detectors with weak completeness  Detector Q  Strong Accuracy Synchronous Systems  Weak Detector (W)  Weak Accuracy  Eventually Detector Q (Q)  Eventual Strong Accuracy Asynchronous Systems  Eventually Weak Detector (W)  Eventual Weak Accuracy 8/31/2024 12 Ali Ghodsi, alig(at)cs.berkeley.edu

Interface of Perfect Failure Detector  Module:  Name: PerfectFailureDetector, instance p  Events:  Indication: p, crash | pi  Notifies that node pihas crashed  Properties:  PFD1 (strong completeness)  PFD2 (strong accuracy) 8/31/2024 13 Ali Ghodsi, alig(at)cs.berkeley.edu

Properties of P  Properties:  PFD1 (strong completeness)  Eventually every node that crashes is permanently detected by every correct node (liveness)  PFD2 (strong accuracy)  If a node p is detected by any node, then p has crashed (safety)  Safety or Liveness? 8/31/2024 14 Ali Ghodsi, alig(at)cs.berkeley.edu

Implementing P in Synchrony  Assume synchronous system  Max transmission delay between 0 and  time units  Each node every  time units  Send <heartbeat> to all nodes  Each node waits + time units  If did not get <heartbeat> from pi  Detect <p, crash | pi> 8/31/2024 15 Ali Ghodsi, alig(at)cs.berkeley.edu

Correctness of P  PFD1 (strong completeness)  A crashed node doesn’t send <heartbeat>  Eventually every node will notice the absence of <heartbeat>  PFD2 (strong accuracy)  Assuming local computation is negligible  Maximum time between 2 heartbeats  + time units  If alive, all nodes will recv hb in time  No inaccuracy   pi pj   max delay 8/31/2024 16 Ali Ghodsi, alig(at)cs.berkeley.edu

Interface of EPFD  Module:  Name: EventuallyPerfectFailureDetector, instance ep  Events:  Indication: ep, suspect | pi  Notifies that node piis suspected to have crashed  Indication: ep, restore | pi  Notifies that node piis not suspected anymore  Properties:  PFD1 (strong completeness)  PFD2 (eventual strong accuracy). Eventually, no correct node is suspected by any correct node. 8/31/2024 17 Ali Ghodsi, alig(at)cs.berkeley.edu

Implementing P  Assume partially synchronous system  Eventually some bounds exists  Every  time units at each node:  Send <heartbeat> to all nodes  Each node waits T time units  If did not get <heartbeat> from pi  Indicate <ep, suspect | pi> if piis not in suspected  Put piin suspected set  If get HB from pi, and piis in suspected  Indicate <ep, restore | pi> and remove pifrom suspected  Increase timeout T 8/31/2024 18 Ali Ghodsi, alig(at)cs.berkeley.edu

Correctness of P  EPFD1 (strong completeness)  Same as before  EPFD2 (eventual strong accuracy)  Each time p is inaccurately suspected by a correct q timeout T is increased at q  Eventually system becomes synchronous, and T becomes larger than the unknown bound (T>  + )  q will receive HB on time, and never suspect p again 8/31/2024 19 Ali Ghodsi, alig(at)cs.berkeley.edu

Leader Election 8/31/2024 20 Ali Ghodsi, alig(at)cs.berkeley.edu

Leader Election vs Failure Detection  Failure detection captures failure behavior  Detect failed nodes  Leader election (LE) also captures failure behavior  Detect correct nodes (a single & same for all)  Formally, leader election is an FD  Always suspects all nodes except one (leader)  Ensures some properties regarding that node 8/31/2024 21 Ali Ghodsi, alig(at)cs.berkeley.edu

Leader Election vs Failure Detection  We’ll define two leader election algorithm  Leader election (LE) which “matches” P  Eventual leader election () which “matches” P 8/31/2024 22 Ali Ghodsi, alig(at)cs.berkeley.edu

Matching LE and P  P’s properties  P always eventually detects failures (strong completeness)  P never suspects correct nodes (strong accuracy)  Completeness of LE  Informally: eventually ditch crashed leaders  Formally: eventually every correct node trusts some correct node  Accuracy of LE  Informally: never ditch a correct leader  Formally: No two correct nodes trust different correct nodes Is this really accuracy? [d] Yes! Assume two nodes trust different correct nodes One of them must eventually switch, i.e. leaving a correct node    8/31/2024 23 Ali Ghodsi, alig(at)cs.berkeley.edu

LE desirable properties  LE always eventually detects failures  Eventually every correct node trusts some correct node  LE is always accurate  No two correct nodes trust different correct nodes  But the above two permit the following elect p3 elect p1 elect p2elect p1 p1 elect p3 elect p1 p2 elect p3 p3  But P1is “inaccurately” leaving a correct leader 8/31/2024 24 Ali Ghodsi, alig(at)cs.berkeley.edu

LE desirable properties  To avoid “inaccuracy” we add  Local Accuracy:  If a node is elected leader by pi, all previously elected leaders by pihave crashed Not allowed, as p1 is correct elect p3 elect p1 elect p2 elect p1 p1 elect p3 elect p1 p2 elect p3 p3 8/31/2024 25 Ali Ghodsi, alig(at)cs.berkeley.edu

Interface of Leader Election  Module:  Name: LeaderElection, instance le  Events:  Request: le, Leader | pi  Indicate that the leader is node pi  Properties:  LE1 (eventual completeness). Eventually every correct node trusts some correct node  LE2 (agreement). No two correct nodes trust different correct nodes  LE3 (local accuracy). If a node is elected leader by pi, all previously elected leaders by pihave crashed 8/31/2024 26 Ali Ghodsi, alig(at)cs.berkeley.edu

Implementing LE  Globally rank all nodes  e.g. rank ordering p1>p2>p3>p4 represented by function rank(process)=rank_int  e.g. rank(p1)=4, rank(p2)=3, rank(p3)=2, rank(p4)=1  maxrank(S)={return highest ranked node in S}  maxrank(S)=arg maxp∈S{rank(p)} 8/31/2024 27 Ali Ghodsi, alig(at)cs.berkeley.edu

Implementing LE (2)  Implements: LeaderElection, instance le.  Uses: PerfectFailureDetector, instance P.  upon event le, init do  suspected = ; leader :=   upon event P, crash | pi do suspected := suspected  {pi}   upon exists leader  maxrank(-suspected) do  leader := maxrank(-suspected)  trigger le, Leader | pi 8/31/2024 28 Ali Ghodsi, alig(at)cs.berkeley.edu

Matching  and P  P weakens P by only providing eventual accuracy  Weaken LE to  by only guaranteing eventual agreement LE Properties:  LE1 (eventual completeness). Eventually every correct node trusts some correct node  eventual  LE2 (agreement). No two correct nodes trust different correct nodes  LE3 (local accuracy). If a node is elected leader by pi, all previously elected leaders by pihave crashed 8/31/2024 29 Ali Ghodsi, alig(at)cs.berkeley.edu

Interface of Eventual Leader Election  Module:  Name: EventualLeaderElection, instance   Events:  Request: , leader | pi  Notify that pi is trusted to be leader  Properties:  ELD1 (eventual completeness). Eventually every correct node trusts some correct node  ELD2 (eventual agreement). Eventually no two correct nodes trust different correct nodes 8/31/2024 30 Ali Ghodsi, alig(at)cs.berkeley.edu

Eventual Leader Detection Ω  In crash-stop process abstraction  Ω is obtained directly from  P  Each node trusts the node with highest id among all nodes not suspected by ◊P  Eventually, exactly one correct process will be trusted by all correct processes 8/31/2024 31 Ali Ghodsi, alig(at)cs.berkeley.edu

Implementing Ω Implements: EventualLeaderElection, instance Ω. Uses: EventuallyPerfectFailureDetector, instance ep.   upon event Ω, init do suspected := ; leader := pn;  trigger Ω, Leader | leader   upon event ep, suspect | pi do  suspected := suspected  {pi}  upon event ep, restore | pi do  suspected := suspected \ {pi}  upon exists leader  maxrank(-suspected) do leader := maxrank(-suspected) trigger Ω, Leader | pi    8/31/2024 32 Ali Ghodsi, alig(at)cs.berkeley.edu

Ω for Crash Recovery  Can we elect a recovered node? [d]  Not if it keeps crash-recovering infinitely often!  Basic idea  Count number of times you’ve crashed (epoch)  Distribute your epoch periodically to all nodes  Elect leader with lowest (epoch, node_id)  Implementation  Similar to  P and  for crash-stop  Piggyback epoch with heartbeats  Store and load leader upon crash 8/31/2024 33 Ali Ghodsi, alig(at)cs.berkeley.edu

Byzantine Leader Election  Processes observe leaders behavior  Trigger bld, Complain | p if leader p doesn’t adhere to specification, i.e. its Byzantine faulty  Resilience is f Byzantine nodes  N>3f  Function leader(round) rotates through nodes  1p1, 2p2, …, NpN, N+1p1, N+2p2, … 8/31/2024 34 Ali Ghodsi, alig(at)cs.berkeley.edu

Interface of Byzantine ELE  Module:  Name: ByzantineLeaderElection, instance bld  Events:  Indication: bld, Trust | p Notify that p is trusted to be leader   Request: bld, Complain| p Complain that leader pisn’t trusted   Properties:  Eventual Succession. If more than f correct nodes complain about p then every correct node eventually trusts some other leader than p  Putsch Resistance. A correct node does not trust a new leader unless at least one correct node complained about the current one  Eventual Agreement. Eventually no two correct nodes trust different nodes 8/31/2024 35 Ali Ghodsi, alig(at)cs.berkeley.edu

Rotating Byzantine LE Implements: ByzantineLeaderElection, instance bld. Uses: AuthPerfectPointToPointLinks, instance al.   upon event bld, init do round := 1; complList := [nil]*N; compl:= false; trigger bld, trust | leader(round)    upon event bld, Complain| p s.t. p=leader(round) and compl:=false do compl := True forall q in  do trigger al, send | q, [Complaint, round]  If p suspected of being faulty broadcast Complaint    8/31/2024 36 Ali Ghodsi, alig(at)cs.berkeley.edu

Rotating Byzantine LE (2)  Implements: ByzantineLeaderElection, instance bld.  Uses: AuthPerfectPointToPointLinks, instance al.  upon event al, Deliver | p, [Complaint, r] s.t. r=round and complList[p]=nil  complList[p] := Complaint  if #(complList) > f and compl = False then compl := True forall q in  do  trigger al, send | q, [Complaint, round] If more than f complain then broadcast Complaint    else if #(complList) > 2f then round := round + 1 complList := [nil]*N compl := False trigger bld, trust | leader(round) If more than 2f complain then switch leader     8/31/2024 37 Ali Ghodsi, alig(at)cs.berkeley.edu

Correctness of Byzantine ELE  Why not switch leader as soon as received more than f complaints? [d]  Byzantine nodes send complaints to only 1 node.  Thus, one correct node receives f+1 complaints  All other correct nodes receive 1 complaint  Only one correct node switched leader  Eventual succession  When received f+1 complaints, correct nodes relay bcast  Every correct node will get > 2f complaints  New leader will be elected 8/31/2024 38 Ali Ghodsi, alig(at)cs.berkeley.edu

Correctness of Byzantine ELE (2)  Putsch resistance  f Byzantine nodes cannot generate f+1 complaints required for relaying and leader switch to happen  Eventual agreement  Correct nodes synchronously move through rounds, as a node moves to new round when it ensured every other correct node will move too (by relaying)  Eventually all faulty (Byzantine) nodes detected 8/31/2024 39 Ali Ghodsi, alig(at)cs.berkeley.edu

Reductions 8/31/2024 40 Ali Ghodsi, alig(at)cs.berkeley.edu

Reductions  We say X≼Y if  X can be solved given a solution of Y  Read X is reducible to Y  Informally, problem X is weaker or as hard as Y 8/31/2024 41 Ali Ghodsi, alig(at)cs.berkeley.edu

Preorders, partial orders…  A relation ~ is a preorder on a set A if for any x,y,z in A  x~x (reflexivity)  x~y and y~z implies x~z (transitivity)  Difference between preorder and partial order  Partial order is a preorder with antisymmetry x~y and y~x implies x=y   I.e. two different objects x and y cannot be symmetric i.e. it isn’t possible that x~y and y~x for two different x and y  8/31/2024 42 Ali Ghodsi, alig(at)cs.berkeley.edu

≼ is a preorder  ≼ is a preorder  Reflexivity. X≼X X can be solved given a solution to X   Transitivity. X≼Y and Y≼Z implies X≼Z  Since Y≼Z, use impl. of Z to impl. Y. use impl. of Y to impl. X. Hence we impl. Y from Z’s impl.  ≼ is not antisymmetric, thus not a partial order  Two different X and Y can be equivalent Distinct problems X and Y can be solved from the other’s solution  8/31/2024 43 Ali Ghodsi, alig(at)cs.berkeley.edu

Shortcut definitions  We write X≃Y if  X≼Y and Y≼X  Problem X is equivalent to Y  We write X≺Y if  X≼Y and not X≃Y  or equivalently, X≼Y and not Y≼X  Problem X is strictly weaker than Y, or  Problem Y is strictly stronger than X 8/31/2024 44 Ali Ghodsi, alig(at)cs.berkeley.edu

Example  It is true that P≼P  Given P, we can implement P We just return P’s suspicions. P always satisfies P’s properties    In fact, P≺P in the asynchronous model  Because not P≼P is true  Reductions common in computability theory  If X≼Y, and if we know X is impossible to solve  Then Y is impossible to solve too  If P≼P, and some problem Z can be solved with P Then Z can also be solved with P  8/31/2024 45 Ali Ghodsi, alig(at)cs.berkeley.edu

Weakest FD for a problem?  Often P is used to solve problem X  But P is not very practical (needs synchrony)  Is X a “practically” solvable problem?  Can we implement X with P?  Sometimes a weaker FD than P will not solve X  Proven using reductions 8/31/2024 46 Ali Ghodsi, alig(at)cs.berkeley.edu

Weakest FD for a problem  We might know that X≼P (X solvable with P)  Can we solve X weaker FD than P, say P?  Or is it impossible, i.e. P≺X  Common proof to show P is weakest FD for X  Prove that P≼X  I.e. P can be solved given X  If P≼X then P≺X  Because we know P≺P and P≃X, i.e. P≺P≃X  If we can solve X with P, then  we can solve P with P, which is a contradiction 8/31/2024 47 Ali Ghodsi, alig(at)cs.berkeley.edu

How are the detectors related 8/31/2024 48 Ali Ghodsi, alig(at)cs.berkeley.edu

Trivial Reductions  Strongly complete  P≼P P is always strongly accurate, thus also eventually strongly accurate  P  S≼S S is always weakly accurate, thus also eventually weakly accurate  P S  S≼P P is always strongly accurate, thus also always weakly accurate  S  S≼P P is always eventually strongly accurate, thus also always eventually weakly accurate  8/31/2024 49 Ali Ghodsi, alig(at)cs.berkeley.edu

Trivial Reductions (2)  Weakly complete  Q≼Q Q is always strongly accurate, thus also eventually strongly accurate  Q  W≼W W is always weakly accurate, thus also eventually weakly accurate  Q W  W≼Q Q is always strongly accurate, thus also always weakly accurate  W  W≼Q P is always eventually strongly accurate, thus also always eventually weakly accurate  8/31/2024 50 Ali Ghodsi, alig(at)cs.berkeley.edu

Modeling Failure Detectors and Timing Assumptions in Distributed Systems