
Distributed Error-Confinement

Distributed Error-Confinement. Shay Kutten (Technion) with Yossi Azar (Tel Aviv U.) and Boaz Patt-Shamir (Tel Aviv U.). Talk Overview. (1) (Confinement in) the context of self stabilization (2) What is error confinement?


Presentation Transcript


  1. Distributed Error-Confinement. Shay Kutten (Technion) with Yossi Azar (Tel Aviv U.) and Boaz Patt-Shamir (Tel Aviv U.)

  2. Talk Overview (1) (Confinement in) the context of self stabilization. (2) What is error confinement? (3) The new “agility” measure for fault tolerance. (4) The new “core-bootstrapping” idea for the algorithm. (5) Optimization question and answer for “core” construction. (6) Additional results: practical considerations, building blocks, lower bound. (7) Generalizations, open problems.

  3. “Self Stabilization” versus Error Confinement - Error confinement can be studied in the context of any fault model. - We study error confinement in the context of “self stabilization” (explained below), since if we manage to handle a severe kind of fault, handling other faults may be easier.

  4. Common model for distributed algorithms - Unique node IDs (A, B, …). - No shared memory. - Time complexity: sending a message over a link takes one time unit (at most, for asynchronous networks). [Figure: network of nodes A, B, C, D, E; labels: node name, node state (e.g. X=3), link, link weight, message]

  5. “Self Stabilization” - Node’s state: the values of all its variables. - Global state: the states of all nodes. - Legal states: a set of global states (those desired by the algorithm designers). - Stabilization: legality: starting from any global state, eventually the state is legal; closure: starting from a legal global state, no illegal state is reached (except by faults).

  6. “Self Stabilization” - Node’s state: the values of all its variables. - Global state: the states of all nodes. - Legal states: a set of global states (those desired by the algorithm designers). - Stabilization: legality: starting from any global state, eventually the state is legal; closure: starting from a legal global state, no illegal state is reached (except by faults). A “fault” means starting in an illegal state. Only the state may be faulty, not the program!

  7. Self Stabilization example: Token passing. Legality: - Exactly ONE node has the token. [Figure: nodes A, B, C, D, E; one node holds the token]

  8. Self Stabilization example: Token passing. Legality: - Exactly ONE node has the token. - The token circulates by messages. [Figure: nodes A, B, C, D, E; one node holds the token]

  9. Self Stabilization example: Token passing. Legality: - Exactly ONE node has the token. - The token circulates. [Figure: nodes A, B, C, D, E; one node holds the token]

  10. Self Stabilization example: Token passing. Legality: - Exactly ONE node has the token. - The token circulates. [Figure: nodes A, B, C, D, E; one node holds the token]

  11. Self Stabilization example: Token passing. Legality: - Exactly ONE node has the token. - The token circulates. [Figure: nodes A, B, C, D, E; one node holds the token]

  12. Self Stabilization problem example: Token passing. Legality: - Exactly ONE node has the token. - The token circulates. A fault brings the system to an illegal global state. [Figure: nodes A, B, C, D, E; labels: token, token, fault]
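
The slides do not say which token-passing protocol is meant. As an illustration only, the sketch below uses Dijkstra's classic K-state token ring, a well-known self-stabilizing protocol that is swapped in here as an assumption. The names `privileged`, `step` and `stabilize`, and the random "central daemon" scheduler, are hypothetical choices for this sketch. It shows both properties from the definition: from an arbitrary (faulty) state the ring reaches a state with exactly one token, and it then stays legal.

```python
import random


def privileged(x):
    """Indices of machines that currently hold a privilege (a 'token')."""
    n = len(x)
    # Machine 0 holds a token when its value equals that of the last machine.
    holders = [0] if x[0] == x[n - 1] else []
    # Machine i > 0 holds a token when its value differs from its predecessor.
    holders += [i for i in range(1, n) if x[i] != x[i - 1]]
    return holders


def step(x, k, rng):
    """Let one randomly chosen privileged machine make its move."""
    i = rng.choice(privileged(x))  # at least one machine is always privileged
    if i == 0:
        x[0] = (x[0] + 1) % k      # machine 0 increments modulo k
    else:
        x[i] = x[i - 1]            # other machines copy their predecessor


def stabilize(x, k, steps, seed=0):
    """Run `steps` moves on state `x` (values in 0..k-1, with k >= len(x))."""
    rng = random.Random(seed)
    for _ in range(steps):
        step(x, k, rng)
    return x
```

For example, starting from the arbitrary state `[3, 1, 4, 1, 5]` with `k = 6`, `privileged` initially reports several token holders (an illegal state); after enough calls to `step`, it reports exactly one, and further steps only move that single token around the ring.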

  13. Talk Overview (1) Confinement in the context of self stabilization. (2) What is error confinement? (3) The new “agility” measure for fault tolerance. (4) The new “core-bootstrapping” idea for the algorithm. (5) Optimization question and answer for “core” construction. (6) Additional results: practical considerations, building blocks, lower bound. (7) Generalizations, open problems.

  14. Motivation: “error propagation” (example). Internet routing: node A computes its shortest path to C based on messages from S. (1) Assume no fault. Message from S to A: “distance 7 to C”. A: “My distance to C via S: 7+4=11”. [Figure: nodes C, S, A, B; weights 4 (A to S) and 7 (S to C); arrow: traffic to C]

  15. Motivation: “error propagation” (example). (2) With fault (at B): B holds “distance 0 to C”. Message from S to A: “distance 7 to C”. A: “My distance to C via S: 7+4=11”. State corrupting fault (the adversary modifies data memory). [Figure: nodes C, S, A, B; weights 7, 4, 2; arrow: traffic to C]

  16. Recall: state corrupting fault (self stabilization): Not malicious! Just a one-time change of memory content. B holds “distance 0 to C”. State corrupting fault (the adversary modifies data memory). [Figure: nodes C, S, A, B; weights 7, 4, 2]

  17. Motivation: “error propagation” (example). (2) With fault (at B): B holds “distance 0 to C”. Message from S to A: “distance 7 to C”. A: “My distance to C via S: 7+4=11”. [Figure: nodes C, S, A, B; weights 7, 4, 2; B marked as faulty; arrow: traffic to C]

  18. Motivation: “error propagation” (example). (3) B’s fault propagated to A. Message from S to A: “distance 7 to C”. A: “My distance to C via B: 2+0=2”. [Figure: nodes C, S, A, B; weights 7, 4, 2; B marked as faulty with “distance 0 to C”; arrow: traffic to C]

  19. Motivation: “error propagation” (example). B’s fault propagated to A. Message from S to A: “distance 7 to C”. A: “My distance to C via B: 2+0=2”. (4) Traffic to C is sent the wrong way as a result of the fault propagation. [Figure: nodes C, S, A, B; weights 7, 4, 2; B marked as faulty with “distance 0 to C”]
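
The arithmetic on these slides can be replayed in a few lines. This is a minimal sketch, assuming that A simply picks the neighbor minimizing link weight plus advertised distance, as in distance-vector routing; the function name `best_route` and the dictionary layout are hypothetical. The link weights (4 to S, 2 to B) and the advertised distances (7 from S, 0 from the faulty B) are the ones on the slides.

```python
def best_route(link_cost, advertised):
    """Return (distance, next_hop) for the cheapest advertised route.

    link_cost:  weight of the link from this node to each neighbor
    advertised: distance to the destination claimed by each neighbor
    """
    return min((link_cost[n] + d, n) for n, d in advertised.items())


# Node A's links, taken from the slides: weight 4 to S, weight 2 to B.
a_links = {"S": 4, "B": 2}

# (1) No fault: only S advertises a route to C.
print(best_route(a_links, {"S": 7}))

# (3) B's corrupted state claims distance 0 to C, and A believes it.
print(best_route(a_links, {"S": 7, "B": 0}))
```

In the first call A routes via S at distance 7+4=11. In the second, the corrupted value from B wins with 2+0=2, so A's own (non-faulty) output becomes wrong: this is the propagation that error confinement is meant to prevent.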

  20. This is, actually, how the Internet (then called “ARPANET”) crashed in 1980. Faulty node: “I have distance 0 to everybody.” [Figure: nodes S, A, B, C, D; one node marked as faulty]

  21. “Error confinement”: a non-faulty node A outputs only correct output (or no output at all). Sounds impossible? A, to the faulty B (which claims “distance 0 to C”): “I do not believe you!” A’s output (to routing): “My distance to C via S: 7+4=11”. [Figure: nodes C, S, A, B; B marked as faulty]

  22. Error Confinement (Formally) • Π: problem specification, P: protocol. • P solves Π with error confinement if for any execution of P with behavior β (possibly containing a state corrupting fault), there exists a behavior β′ satisfying Π such that for all non-faulty nodes v: β′_v = β_v.

  23. Error Confinement (Formally) • Π: problem specification, P: protocol. • P solves Π with error confinement if for any execution of P with behavior β (possibly containing a state corrupting fault), there exists a behavior β′ satisfying Π such that for all non-faulty nodes v: β′_v = β_v (behavior: ignoring time).

  24. Error Confinement (Formally) • Π: problem specification, P: protocol. • P solves Π with error confinement if for any execution of P with behavior β (possibly containing a state corrupting fault), there exists a behavior β′ satisfying Π such that for all non-faulty nodes v: β′_v = β_v. • (“Stabilization” deals also with faulty nodes.)

  25. Talk Overview (1) Confinement in the context of self stabilization. (2) What is error confinement? (3) The new “agility” measure for fault tolerance. (4) The new “core-bootstrapping” idea for the algorithm. (5) Optimization question and answer for “core” construction. (6) Additional results: practical considerations, building blocks, lower bound. (7) Generalizations, open problems.

  26. Introducing a new measure of fault resilience: the resilience of a protocol is smaller at first. Input is given to S by the environment (e.g. the user) at time t0. [Figure: time axis with t0, t1, t2; nodes C, S, A, B, D]

  27. The resilience of a protocol is smaller at first (cont.). The environment (e.g. the user) gives input to S at time t0. If the adversary changes the state of S at time tf, shortly after the input ... [Figure: time axis with t0, tf, t1, t2; nodes C, S, A, B, D]

  28. The resilience of a protocol is smaller at first (cont.). The environment (e.g. the user) gives input to S at time t0. If the adversary changes the state of S at time tf, shortly after the input, then the input is lost forever. [Figure: time axis with t0, tf, t1, t2; nodes C, S, A, B, D]

  29. The resilience of a protocol grows with time. However, a fault, even in S, can be tolerated if it waits until after S has distributed the input value. [Figure: time axis with t0 (input), t1, tf, t2; nodes C, S, A, B, D]

  30. The resilience of a protocol grows with time (cont.). However, a fault, even in S, can be tolerated if it waits until after S has distributed the input value. [Figure: time axis with t0 (input), t1, tf, t2; nodes C, S, A, B, D; label: distribution]

  31. The resilience of a protocol grows with time. A fault, even in S, can be tolerated if it “waits” until after S has distributed the input value. [Figure: time axis with t0 (input), t1, tf, t2; nodes C, S, A, B, D; label: distribution]

  32. The resilience of a protocol grows with time. A fault, even in S, can be tolerated if it “waits” until after S has distributed the input value. [Figure: time axis with t0 (input), t1, tf, t2; nodes C, S, A, B, D; label: distribution]

  33. The resilience of a protocol grows with time. To destroy the replicated value, the adversary needs to hit more nodes at tf > t1 > t0. [Figure: time axis with t0 (input), t1, tf; nodes C, S, A, B, D]

  34. The resilience continues to grow with time. If no faults occurred by some later t3, then the input is replicated even further. [Figure: time axis with t1, t2, t3; nodes C, S, A, B, D]

  35. The resilience continues to grow with time. The later the faults, the more faults can be tolerated. [Figure: time axis with t1, t2, t3, tf; nodes C, S, A, B, D]

  36. Time-Space Cone. The later the faults, the more faults can be tolerated, if the protocol is designed to be robust. [Figure: time axis with t1, t2, t3; nodes C, S, A, B, D]

  37. A “narrow” cone: a LESS fault tolerant algorithm. Slower replication: fewer nodes offer help. [Figure: time axis with t1, t2, t3; nodes C, S, A, B, D]

  38. A “wider” cone: a more fault tolerant algorithm. Replication to more nodes, faster. [Figure: time axis with t1, t2, t3; nodes C, S, A, B, D]

  39. So, a recovery of corrupted values is theoretically possible for an adversary that is constrained according to a space-time cone, but what is the algorithm that does the recovery?

  40. Constraining faults: Agility • c-constrained environment: an environment generating faults tf time units after the input (c ≤ 1), only in a minority of the |Ball_S(c·tf)| nodes. • Algorithm with agility c: a broadcast algorithm that guarantees error confinement against c-constrained environments. [Figure: node set V with a ball of radius c·tf around S]
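
To make the constraint concrete, here is a small sketch of a check for whether a given set of faults is permitted at time tf. It rests on several assumptions not stated on the slide: the network is an unweighted graph, the ball around S is measured in hops with radius ⌊c·tf⌋, only faults inside the ball are counted, and "minority" means strictly fewer than half of the ball's nodes. The names `ball` and `fault_allowed` and the example topology are hypothetical.

```python
import math
from collections import deque


def ball(graph, source, radius):
    """Set of nodes within `radius` hops of `source` (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the ball's radius
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def fault_allowed(graph, source, c, t_f, faulty):
    """True if `faulty` hits only a strict minority of Ball_source(c * t_f)."""
    b = ball(graph, source, math.floor(c * t_f))
    hit = set(faulty) & b          # assumption: only faults in the ball count
    return 2 * len(hit) < len(b)


# Hypothetical five-node topology using the node names from the slides.
graph = {
    "S": ["A", "B"],
    "A": ["S", "C"],
    "B": ["S", "D"],
    "C": ["A"],
    "D": ["B"],
}
```

With `c = 1`, a fault at S alone is not permitted at `t_f = 0` (the ball is just S), but it is permitted at `t_f = 1`, when the ball has grown to S, A and B. This matches the earlier observation that resilience is small at first and grows with time.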

  41. An algorithm’s “agility” measures the rate at which the constraint on the adversary can be lifted. [Figure: space-time cone around S over time; nodes C, S, D; label: Agility]

  42. Talk Overview (1) Confinement in the context of self stabilization. (2) What is error confinement? (3) The new “agility” measure for fault tolerance. (4) The new “core-bootstrapping” idea for the algorithm. (5) Optimization question and answer for “core” construction. (6) Additional results: practical considerations, building blocks, lower bound. (7) Generalizations, open problems.

  43. The message resides at some nodes, which we term the “core”.

  44. A node can join the core when it “made sure” it heard the votes of all core nodes

  45. A node can join the core when it “made sure” it heard the votes of all core nodes

  46. A node can join the core when it “made sure” it heard the votes of all core nodes

  47. A node can join the core when it “made sure” it heard the votes of all core nodes

  48. and even the fault can be corrected

  49. and even the fault can be corrected
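
The transcript does not spell out the bootstrapping rule, so the following is only a rough sketch of the idea under stated assumptions: a joining node hears one vote from every current core node, adopts the value held by a strict majority of them, and the same majority rule is then used to repair corrupted core nodes. The names `majority_value`, `try_join_core` and `correct_core` are hypothetical, and message passing is abstracted away into a dictionary of node values.

```python
from collections import Counter


def majority_value(votes):
    """Return the value held by a strict majority of `votes`, else None."""
    if not votes:
        return None
    value, count = Counter(votes).most_common(1)[0]
    return value if 2 * count > len(votes) else None


def try_join_core(core_state, node):
    """Add `node` to the core with the majority value of all core votes.

    Assumption: `core_state` already contains a vote from every core node,
    i.e. the joining node has "made sure" it heard them all.
    """
    value = majority_value(list(core_state.values()))
    if value is not None:
        core_state[node] = value
    return value


def correct_core(core_state):
    """Overwrite every core node's value with the majority value, if any."""
    value = majority_value(list(core_state.values()))
    if value is not None:
        for n in core_state:
            core_state[n] = value
    return value
```

For instance, if the core is S, A and B, all holding 42, and a fault changes B's value to 0, a new node C that hears all three votes still joins with 42, and a subsequent `correct_core` call restores B to 42 as well.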
