Fast Leader (Full) Recovery despite Dynamic Faults

Presentation Transcript


  1. Fast Leader (Full) Recovery despite Dynamic Faults. Ajoy K. Datta, Stéphane Devismes, Lawrence L. Larmore, Sébastien Tixeuil

  2. Joint Work: Sébastien Tixeuil, Ajoy K. Datta & Lawrence L. Larmore. ICDCN, 04/01/2013, Mumbai

  3. Self-Stabilization [Dijkstra,74]

  4. Self-Stabilization [Dijkstra,74]: A fault = a process state corruption

  10. Self-Stabilization [Dijkstra,74]: Recover after any number of transient faults

  11. Price of the Versatility
      • Several impossibility results
        • E.g., Leader Election and Token Circulation in anonymous networks
      • The stabilization time usually depends on global parameters (diameter, size of the network, …)

  13. When a small number of faults hit the system
      • Self-Stabilization: Ω(D) rounds

  14. When a small number of faults hit the system
      • Self-Stabilization: Ω(D) rounds
      • Stronger forms:
        • Fault Containment [Ghosh et al, Dist Comp 2007]
        • k-adaptive Self-Stabilization [Burman et al, OPODIS’05]
      • Weakened forms:
        • k-stabilization [Beauquier et al, PODC’98]

  16. Fault-Containment
      • Pros
        • Self-stabilizing
        • If f ≤ k faults, stabilization time in O(f) rounds
        • Containment radius
        • Fault gap is small
      • Cons (currently)
        • k=1, or
        • Surrounded by a majority of correct processes, or
        • Synchronous setting, or
        • Probabilistic recovery

  17. Fault gap
      • The minimum time between consecutive faulty transitions to have O(f) recovery time
      [Figure labels: ≥ fault gap; Illegitimate; O(f); Legitimate]

  18. Fault gap
      • The minimum time between consecutive faulty transitions to have O(f) recovery time
      [Figure labels: < fault gap; Illegitimate; >Ω(D); Legitimate]

  19. Time-Adaptive Self-stabilization
      • Self-Stabilization
      • If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously (static faults),
        • “output” stabilization in O(f) rounds

  20. Output vs. State Stabilization
      [Figure labels: f ≤ k faults; Illegitimate; O(f); Correct Output; >Ω(D); Legitimate]

  21. Output vs. State Stabilization
      • The fault gap depends on global parameters
      [Figure labels: f ≤ k faults; Illegitimate; O(f); Correct Output; >Ω(D); Legitimate]

  22. k-Stabilization (first definition): If the Hamming distance to a legitimate configuration is f ≤ k, i.e., f ≤ k faults occur simultaneously, the system eventually recovers. Otherwise, no guarantee.
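As a rough illustration of the Hamming-distance condition in this first definition, the sketch below counts the processes whose local state differs between two configurations. Modeling a configuration as a list of per-process states, and the names `hamming_distance` and `within_k_faults`, are assumptions made for this example; they are not notation from the talk. The definition speaks of the distance to some legitimate configuration, whereas the sketch compares against a single given one.

```python
from typing import Hashable, Sequence


def hamming_distance(config_a: Sequence[Hashable], config_b: Sequence[Hashable]) -> int:
    """Number of processes whose local state differs between two configurations."""
    if len(config_a) != len(config_b):
        raise ValueError("configurations must describe the same set of processes")
    return sum(1 for a, b in zip(config_a, config_b) if a != b)


def within_k_faults(current: Sequence[Hashable], legitimate: Sequence[Hashable], k: int) -> bool:
    """True if `current` is at most k state corruptions away from `legitimate`."""
    return hamming_distance(current, legitimate) <= k


# Hypothetical 5-process configurations: processes 1 and 3 were corrupted.
legitimate = [0, 1, 2, 3, 4]
corrupted = [0, 9, 2, 7, 4]
print(hamming_distance(corrupted, legitimate))  # 2
```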

  23. k-Stabilization (first definition)
      • Pros
        • Can solve more problems than self-stabilization
        • Usually, only-k-dependent stabilization time
        • Usually, only-k-dependent fault gap
      • Cons
        • Not self-stabilizing
        • Static faults: f ≤ k faults should occur in a single transition

  24. Our definition of k-stabilization
      • Faulty transition = one process state corruption
      • Dynamic faults:
        • If f ≤ k faulty transitions occur in an arbitrary manner, the system eventually recovers

  25. Our definition of k-stabilization
      [Figure labels: Illegitimate; Legitimate; 1 fault; 1 fault; 1 fault; f ≤ k faults]

  26. Our contribution
      • Leader recovery protocol
        • On an anonymous (yet oriented) ring
        • Asynchronous atomic read/write
        • k-stabilizing if n ≥ 18k + 1
        • Stabilization time O(k²) rounds
        • log(k) bits per process
      • This problem is unsolvable in the self-stabilizing setting
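To make the ring-size condition n ≥ 18k + 1 concrete, the following sketch inverts it in both directions. The function names are hypothetical helpers introduced for this example only.

```python
def max_tolerated_faults(n: int) -> int:
    """Largest k satisfying n >= 18*k + 1, i.e. floor((n - 1) / 18)."""
    if n < 1:
        raise ValueError("a ring needs at least one process")
    return (n - 1) // 18


def min_ring_size(k: int) -> int:
    """Smallest ring size n for which the condition n >= 18*k + 1 holds."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 18 * k + 1


print(max_tolerated_faults(19))  # 1
print(min_ring_size(2))          # 37
```

Under this condition, a ring of 18 processes gives k = 0, and each additional tolerated faulty transition requires 18 more processes.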

  27. Our contribution: The system starts in a legitimate configuration where one process is elected

  28. Our contribution: Some faulty transitions occur in an arbitrary manner

  29. Our contribution: Some faulty transitions occur in an arbitrary manner. Fault propagation

  31. Our contribution: If n ≥ 18k + 1, the system recovers the same leader in O(k²) rounds

  36. Fault gap
      [Figure labels: 0; O(k²) rounds; 0; Illegitimate; Legitimate; f ≤ k faulty transitions; f ≤ k faulty transitions]

  37. Main ideas of the algorithm

  38. Vote = Relative Address ∈ {-3k..3k} ∪ {⊥}
      • Interval of relevance: 6k+1 votes
      [Figure: ring with vote labels 0, -1, 1, -2, 2, -3, 3, 3k, ⊥, ⊥, ⊥]

  39. After k faults
      [Figure: ring with vote labels 0, -1, 1, -2, 2, -3, 3, ⊥, ⊥, ⊥]

  40. After k faults
      [Figure: ring with vote labels 0, -1, 1, -2, 0, -3, 3, ⊥, ⊥, ⊥]

  41. After k faults
      • At most 3k processes change their votes
      [Figure: ring with vote labels 1, 0, 1, -2, 0, -3, 3, ⊥, ⊥, ⊥]

  42. After k faults
      • At most 3k processes change their votes
      • Always a majority of votes for the previous leader
      [Figure: ring with vote labels 1, 0, 1, -2, 0, -3, 3, ⊥, ⊥, ⊥]
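The majority argument can be checked on a small example: an interval of relevance holds 6k+1 votes, at most 3k of them change, so at least 3k+1 still point to the previous leader. The sketch below is a simplified model, not the protocol itself. It assumes 0-based process indices, uses `None` for ⊥, and picks one sign convention for relative addresses (the offset from the voter to the leader); the talk does not fix these details, and all function names are hypothetical.

```python
from typing import List, Optional

Vote = Optional[int]  # None stands for ⊥ (assumption of this sketch)


def signed_offset(src: int, dst: int, n: int) -> int:
    """Signed ring offset from src to dst, in the range [-(n // 2), (n - 1) // 2]."""
    return (dst - src + n // 2) % n - n // 2


def legitimate_votes(n: int, k: int, leader: int) -> List[Vote]:
    """Each process within distance 3k of the leader votes with its relative address; others hold ⊥."""
    votes: List[Vote] = []
    for p in range(n):
        d = signed_offset(p, leader, n)
        votes.append(d if abs(d) <= 3 * k else None)
    return votes


def count_votes_for(votes: List[Vote], k: int, candidate: int) -> int:
    """Count processes whose vote is a relative address in -3k..3k pointing at `candidate`."""
    n = len(votes)
    return sum(
        1
        for p in range(n)
        if votes[p] is not None
        and abs(votes[p]) <= 3 * k
        and votes[p] == signed_offset(p, candidate, n)
    )


n, k, leader = 19, 1, 5            # n = 18k + 1
votes = legitimate_votes(n, k, leader)
full = count_votes_for(votes, k, leader)

# Change 3k = 3 votes inside the interval of relevance (hypothetical corruption pattern).
votes[5] = 2
votes[6] = None
votes[4] = 0
remaining = count_votes_for(votes, k, leader)
print(full, remaining)  # 7 4
```

With k = 1 the interval holds 7 votes, and the 4 that survive are still a strict majority of the interval.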

  43. Rumors
      • In a legitimate state, Vote = Rumor for every process
      • Main idea:
        • Vote: hard to change
        • Rumor: easy to change
      [Figure labels: Vote 1; Rumor 1]

  44. Rumors
      • If Rumor ≠ Vote
        • If Rumor ≠ ⊥
          • Candidate ← Rumor
        • Else
          • Candidate ← Vote
        • Initiate Query(Candidate)
      [Figure labels: Vote 1; Rumor 2]
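The candidate-selection rule on this slide can be sketched as a small pure function. Representing ⊥ as `None` and returning `None` when no query is needed are choices made for this example; the function name is hypothetical. The return value is unambiguous because the rule never yields ⊥ as a candidate: when Rumor is ⊥ and differs from Vote, Vote cannot be ⊥.

```python
from typing import Optional


def choose_candidate(vote: Optional[int], rumor: Optional[int]) -> Optional[int]:
    """Return the candidate to query, or None when Vote = Rumor and no query is initiated.

    None as an argument stands for ⊥ (assumption of this sketch).
    """
    if rumor == vote:
        return None          # nothing to check
    if rumor is not None:
        return rumor         # Candidate <- Rumor
    return vote              # Rumor = ⊥, so Candidate <- Vote (which is not ⊥ here)


print(choose_candidate(1, 2))     # 2
print(choose_candidate(1, None))  # 1
```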

  45. Rumors: Query(Candidate) traverses the interval of relevance of the candidate (6k+1 processes) and counts the votes for the candidate
      [Figure labels: Vote 1; Rumor 2]

  46. Query Return
      • If at least 3k+1 votes for the Candidate
        • If Rumor ≠ ⊥ ≠ Candidate
          • Initiate a Denial of Rumor in its interval of relevance
        • Vote ← Candidate
        • Rumor ← Candidate
      • Else
        • If Rumor = Candidate, then Rumor ← ⊥
        • Initiate a Denial of Candidate in its interval of relevance
        • If Vote = Candidate, then Vote ← ⊥
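The query-return rule can be sketched as a state update on the initiating process. This is a simplified model under several assumptions: ⊥ is `None`, the condition "Rumor ≠ ⊥ ≠ Candidate" is read as "Rumor is neither ⊥ nor the Candidate", and initiating a Denial is modeled by appending to a list rather than sending anything around the ring. The `Proc` class and `on_query_return` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Proc:
    vote: Optional[int]                                # None stands for ⊥
    rumor: Optional[int]
    denials: List[int] = field(default_factory=list)   # Denials this process has initiated


def on_query_return(p: Proc, candidate: int, votes_for: int, k: int) -> None:
    """Apply the query-return rule, given the number of votes counted for `candidate`."""
    if votes_for >= 3 * k + 1:
        # Candidate confirmed by a majority of its interval of relevance.
        if p.rumor is not None and p.rumor != candidate:
            p.denials.append(p.rumor)                  # kill the competing rumor
        p.vote = candidate
        p.rumor = candidate
    else:
        # Candidate rejected.
        if p.rumor == candidate:
            p.rumor = None
        p.denials.append(candidate)
        if p.vote == candidate:
            p.vote = None


k = 1
confirmed = Proc(vote=None, rumor=3)
on_query_return(confirmed, candidate=3, votes_for=4, k=k)

rejected = Proc(vote=1, rumor=2)
on_query_return(rejected, candidate=2, votes_for=1, k=k)
print(confirmed, rejected)
```

In the second call the rumor is dropped and denied while the vote is left untouched, which matches the slide's idea that votes are hard to change and rumors are easy to change.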

  47. Query Tracks

  48. Other tracks
      • Denial (to kill a rumor)
      • To manage lost queries
        • Probe wave
        • Report (see the paper)

  49. Deadlock Prevention
      • Every two neighboring processes share a resource
      • Think of the chopstick between two philosophers

  50. Deadlock Prevention
      • Every two neighboring processes share a resource
      • Think of the chopstick between two philosophers
      • Only a process that holds both its left and right resources can initiate a query
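The guard on query initiation can be sketched with one shared resource per ring edge. The encoding below is an assumption of this example: `holder[i]` names the process currently holding the resource shared by processes i and (i + 1) mod n. How resources are passed between neighbors is not described on the slide and is left out; `can_initiate_query` is a hypothetical name.

```python
from typing import List


def can_initiate_query(holder: List[int], p: int) -> bool:
    """True if process p holds both the resource on its left and the one on its right."""
    n = len(holder)
    left = holder[(p - 1) % n]   # resource shared with the left neighbor
    right = holder[p]            # resource shared with the right neighbor
    return left == p and right == p


# Hypothetical 5-process ring: processes 0 and 2 each hold both of their resources.
holder = [0, 2, 2, 4, 0]
initiators = [p for p in range(len(holder)) if can_initiate_query(holder, p)]
print(initiators)  # [0, 2]
```

Because each shared resource has a single holder, two neighboring processes can never both pass this check at the same time, as with the chopsticks of two adjacent philosophers.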
