
Enhancing The Fault-Tolerance of Nonmasking Programs

Sandeep S. Kulkarni and Ali Ebnenasir, Software Engineering and Network Systems Laboratory, Computer Science and Engineering Department, Michigan State University. Acknowledgement: this work is partially sponsored by NSF, DARPA NEST, ONR URI, and Michigan State University.


Presentation Transcript


  1. Enhancing The Fault-Tolerance of Nonmasking Programs. Sandeep S. Kulkarni and Ali Ebnenasir, Software Engineering and Network Systems Laboratory, Computer Science and Engineering Department, Michigan State University

  2. Acknowledgement • This work is partially sponsored by: • NSF, • DARPA NEST, • ONR URI, and • Michigan State University

  3. Motivation • Programs are subject to unanticipated faults • When new classes of faults are encountered, the corresponding fault-tolerance must be added • How to add fault-tolerance? • Develop from scratch (expensive approach) • Incrementally add fault-tolerance • Reuse the behaviors of the fault-intolerant program • Potential to preserve properties that are hard to specify (e.g., efficiency) • How to ensure correctness? • After-the-fact verification • Automatic addition of fault-tolerance (correct by construction)

  4. Motivation (Continued) • Problem: complexity of automatic addition • Automatic addition of fault-tolerance to distributed programs is NP-hard [FTRTFT00], [ICDCS02] • How do we deal with this complexity? • Develop heuristics • Identify the boundary of polynomial-time addition • Step-wise addition (weaker forms of fault-tolerance first) • The goal of this paper • Enhance the fault-tolerance of nonmasking programs • Partially automate the design of fault-tolerant programs

  5. Outline • Preliminary Concepts • Enhancement Problem • Enhancement in High Atomicity Model • Enhancement for Distributed Programs • Example: Byzantine Agreement Program • Conclusion and Future Work

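  6. Preliminary Concepts: Programs and Faults [Figure: state space Sp containing fault-span T, which contains invariant S; transitions labeled p, f, and p/f] • Finite state space Sp • Invariant S and fault-span T, with S ⊆ T ⊆ Sp • Program p, faults f, and safety specification ⊆ { (s0, s1) | (s0, s1) ∈ Sp × Sp } • Levels of fault-tolerance: failsafe, nonmasking, masking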

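  7. Step-Wise Addition [Figure: starting from an intolerant program, fault-tolerance is added step-wise, through failsafe or nonmasking fault-tolerance, toward masking fault-tolerance; prior work: [FTRTFT00], [ICDCS02]; this paper: from nonmasking to masking fault-tolerance]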

  8. Enhancement Problem [Figure: regions S, T, T', and S' = T' ∩ S within Sp; faults f] • Input: nonmasking program p, specification Spec, invariant S, faults f • Output of the synthesis algorithm: masking program p' with invariant S' and fault-span T' • Requirement: only fault-tolerance is added; no new functional behavior is added

  9. Enhancement in High Atomicity Model

  10. Enhancement in High Atomicity Model • High atomicity model: each process can read/write all program variables • ms: states from where safety can be violated by fault transitions alone [Figure: ms, fault transitions f, fault-span T, invariant S]
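To make ms concrete, here is a minimal Python sketch under an explicit-state encoding that the slides do not prescribe: states are hashable values, transitions are (s0, s1) pairs, and the safety specification is given as a set `bad` of transitions that violate it. The function name `compute_ms` is hypothetical. ms is computed as a backward fixpoint: a state is in ms if a fault transition from it violates safety or leads to a state already in ms.

```python
def compute_ms(faults, bad):
    """States from which fault transitions alone can violate safety.

    faults: set of fault transitions (s0, s1); bad: set of transitions that
    violate safety. Both encodings are assumptions of this sketch.
    """
    # Seed: states with a fault transition that is itself a safety violation.
    ms = {s0 for (s0, s1) in faults if (s0, s1) in bad}
    # Backward closure: a fault step into ms also puts the source into ms.
    changed = True
    while changed:
        changed = False
        for s0, s1 in faults:
            if s1 in ms and s0 not in ms:
                ms.add(s0)
                changed = True
    return ms
```

For example, `compute_ms({("x", "y"), ("y", "z")}, {("y", "z")})` returns `{"x", "y"}`: the fault at y is itself bad, and x reaches y by a fault.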

  11. Enhancement in High Atomicity Model (Continued) • Find a state predicate T' such that: • T' is closed in the computations of the program in the presence of faults • The specification is satisfied from every state of T' (i.e., no deadlocks) • Construct p' such that for every (s0, s1) ∈ p': • (s0, s1) does not violate safety • s0 ∈ T' ⇒ s1 ∈ T' • Deadlock states can appear because some transitions are removed [Figure: S', T', S, T, and ms]
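The conditions above can be checked mechanically for a candidate T' and p'. The sketch below uses the same hypothetical explicit-state encoding (sets of states, sets of (s0, s1) transitions, and a set `bad` of safety-violating transitions) and tests each single-step condition. It does not check that recovery from T' to the invariant eventually happens; that is handled separately (for instance, by removing cycles outside the invariant). The function name is illustrative.

```python
def satisfies_conditions(p_new, faults, bad, T_new):
    """Single-step checks for a candidate fault-span T' and program p'.

    All arguments are sets; transitions are (s0, s1) pairs and `bad`
    holds the transitions that violate safety (assumed encoding).
    """
    # (s0, s1) in p' must not violate safety.
    if any(t in bad for t in p_new):
        return False
    # Faults that execute from T' must not violate safety either.
    if any(s0 in T_new and (s0, s1) in bad for (s0, s1) in faults):
        return False
    # Closure: s0 in T' implies s1 in T' for transitions of p' and of f.
    if any(s0 in T_new and s1 not in T_new for (s0, s1) in p_new | faults):
        return False
    # No deadlocks: every state of T' has an outgoing transition in p'.
    sources = {s0 for (s0, _) in p_new}
    return all(s in sources for s in T_new)
```

A candidate fails, for example, if some state of T' keeps no outgoing transition of p' after unsafe transitions are removed, which is exactly the deadlock situation the slide warns about.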

  12. Partial Automation [Figure: fault-intolerant program → nonmasking program (manual) → masking program (automatic: enhancement); fault-intolerant program → masking program (addition, [FTRTFT00])]
      HighAtomicityEnhancement(p, f: transitions, T: StatePredicate, specification spec) {
          Calculate ms; Calculate mt;
          T' = ConstructFaultSpan( );
          if (T' = {})
              declare no masking f-tolerant program exists; exit;
          else
              Construct the transitions of p';
      }
      AddMasking(p, f: transitions, S: StatePredicate, specification spec) {
          1. Calculate ms; Calculate mt;
          2. . . .
          3. . . .
          4. repeat
                 4-1) . . .
                 4-2) . . .
                 4-3) T := ConstructFaultSpan( );
                 4-4) . . .
                 4-5) if (S = {} \/ T = {})
                          declare no masking f-tolerant program exists; exit;
             until (ExitConditionHolds);
          5. Remove cycles outside the invariant in T;
          6. Construct the transitions of p';
      }
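For intuition, the following Python sketch fills in HighAtomicityEnhancement on an explicit state graph. It is one plausible reading, not the authors' implementation: ConstructFaultSpan is rendered as a fixpoint that starts from T − ms and repeatedly removes states from which a fault leaves T' or from which no retained program transition exists (the deadlocks mentioned on slide 11). It assumes, as for a nonmasking program, that p has no cycles in T − S, so restricting p cannot create new cycles outside the invariant. All names (`high_atomicity_enhancement`, `bad`, and so on) are hypothetical.

```python
def compute_ms(faults, bad):
    """States from which fault transitions alone can violate safety."""
    ms = {s0 for (s0, s1) in faults if (s0, s1) in bad}
    changed = True
    while changed:
        changed = False
        for s0, s1 in faults:
            if s1 in ms and s0 not in ms:
                ms.add(s0)
                changed = True
    return ms


def high_atomicity_enhancement(p, faults, S, T, bad):
    """Return (S', T', p'), or None if no masking program is found.

    Sketch assumption: p is nonmasking, i.e., it has no cycles in T - S.
    """
    ms = compute_ms(faults, bad)
    # mt: program transitions that violate safety or reach ms.
    mt = {(s0, s1) for (s0, s1) in p if (s0, s1) in bad or s1 in ms}
    T_new = set(T) - ms
    while True:
        # Keep program transitions that stay inside T' and are not in mt.
        p_new = {(s0, s1) for (s0, s1) in p
                 if s0 in T_new and s1 in T_new and (s0, s1) not in mt}
        # States from which a fault leaves T' (closure under f would fail).
        escaping = {s0 for (s0, s1) in faults
                    if s0 in T_new and s1 not in T_new}
        # Deadlock states created by removing transitions.
        sources = {s0 for (s0, _) in p_new}
        deadlocks = {s for s in T_new if s not in sources}
        removed = escaping | deadlocks
        if not removed:
            break
        T_new -= removed
    S_new = T_new & set(S)
    if not T_new or not S_new:
        return None  # declare: no masking f-tolerant program exists
    return S_new, T_new, p_new
```

For example, with S = {a, b}, T = {a, b, c, d, e}, p = {a→b, b→a, c→a, c→d, d→e, e→a}, f = {a→c, e→d}, and bad = {d→e, e→d}, state e lands in ms (its fault is bad), d becomes a deadlock once d→e is dropped, and the sketch returns S' = {a, b}, T' = {a, b, c}, and p' = {a→b, b→a, c→a}.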

  13. Enhancement For Distributed Programs

  14. Difficulties with Distribution • Read/write restrictions (low atomicity model) • A program p with two processes j and k and two Boolean variables a and b • Process j cannot read b • Can we include the transition from (a=1, b=0) to (a=0, b=0)? • Only if we also include the transition from (a=1, b=1) to (a=0, b=1) • Groups of transitions (instead of individual transitions) must be chosen
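The grouping rule can be sketched in Python. Under the assumption (stated here as an assumption of the sketch) that a process cannot change a variable it cannot read, the group of a transition consists of all copies of it obtained by varying the unreadable variables while keeping them unchanged between s0 and s1. The helper `group` and the dictionary encoding of states are illustrative.

```python
from itertools import product


def group(s0, s1, readable, domains):
    """All transitions indistinguishable from s0 -> s1 for a process that
    reads only `readable` (hypothetical helper; states are dicts)."""
    unreadable = [v for v in domains if v not in readable]
    # Assumption: a process cannot change a variable it cannot read.
    for v in unreadable:
        if s0[v] != s1[v]:
            raise ValueError(f"transition changes unreadable variable {v}")
    result = set()
    # Vary the unreadable variables over their domains, keeping them unchanged.
    for values in product(*(domains[v] for v in unreadable)):
        t0, t1 = dict(s0), dict(s1)
        for v, val in zip(unreadable, values):
            t0[v] = t1[v] = val
        # Freeze the dicts so transitions can be stored in a set.
        result.add((tuple(sorted(t0.items())), tuple(sorted(t1.items()))))
    return result
```

For process j, which cannot read b, the group of (a=1, b=0) → (a=0, b=0) contains exactly that transition and (a=1, b=1) → (a=0, b=1), matching the slide.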

  15. Enhancement of Nonmasking Distributed Programs (flowchart) • Start • Calculate T'high • Calculate S'init and set S'low = S'init • Calculate Sreachable, the states reachable from S'low by fault/program transitions (search in T'high − S'low under the distribution restrictions) • If Sreachable = {}: T' = S'low, calculate the transitions of p', and stop • Otherwise, calculate Srecovery, the states from where recovery to S'low is possible • If Srecovery = {}: declare failure • Otherwise, set S'low = S'low ∪ Srecovery and recalculate Sreachable

  16. A High Atomicity Fault-Span • T'high: the largest possible domain for the states that can be included in the fault-span of the distributed program • S'high = S ∩ T'high [Figure: T'high, ms, T, S]

  17. The Initial Low Atomicity Invariant • Starting from S'high, remove states from where an outgoing transition crosses the boundary of S'high (e.g., s0) • The removal is a non-deterministic choice when there is more than one state to remove [Figure: S'init inside S'high inside T'high; state s0]
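A minimal sketch of this step, assuming explicit sets of states and transitions, where `p` would be the union of the transition groups retained for the distributed program (names hypothetical). It does not model the non-deterministic choice mentioned on the slide; it simply removes every state with a transition leaving the current set, and repeats the removal until no transition leaves the set, so that the resulting S'init is closed under `p`. The repetition is this sketch's assumption.

```python
def initial_low_invariant(S_high, p):
    """S'init: shrink S'high until no program transition leaves it."""
    S_init = set(S_high)
    while True:
        # States with an outgoing transition that crosses the current boundary.
        crossing = {s0 for (s0, s1) in p if s0 in S_init and s1 not in S_init}
        if not crossing:
            return S_init
        S_init -= crossing
```

Removing one state can expose another: with S'high = {s0, s1, s2} and transitions s0 → x, s1 → s0, s2 → s2, the sketch first drops s0, then s1, and returns {s2}.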

  18. Single-Step Reachable States • States reachable from S'low by a single fault/program transition (denoted Sreachable) [Figure: S'init, S'low, and T'high; states s0 to s3, with s1, s2, and s3 in Sreachable]

  19. Single-Step Recovery States • States from where safe recovery to S'low is possible in a single step (denoted Srecovery) • Goal: infinite computations are possible from all states in S'low • s0 represents a typical recovery state [Figure: S'init, S'low, and T'high; states s0, s2, s3, and Srecovery]
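The loop of slides 15 to 19 can be sketched as follows. This is an illustrative Python rendering under stated assumptions: states and transitions are explicit sets; `faults` are the fault transitions; `recovery_candidates` are program transitions already known to be safe and to respect the read/write restrictions (for example, whole groups); and T'high is closed under faults. Program transitions inside S'init are assumed to stay inside it by its construction, so only faults can lead out of S'low here. Function and parameter names are hypothetical.

```python
def enhance_low_atomicity(S_init, T_high, faults, recovery_candidates):
    """Return (T', recovery transitions), or None to declare failure."""
    S_low = set(S_init)          # S'low = S'init at the first iteration
    recovery_trans = set()
    while True:
        # Sreachable: states of T'high - S'low reached in one step from S'low.
        reachable = {s1 for (s0, s1) in faults
                     if s0 in S_low and s1 in T_high and s1 not in S_low}
        if not reachable:
            return S_low, recovery_trans     # T' = S'low
        # Srecovery: reachable states with a single-step recovery into S'low.
        step = {(s0, s1) for (s0, s1) in recovery_candidates
                if s0 in reachable and s1 in S_low}
        if not step:
            return None                      # declare failure
        recovery_trans |= step
        S_low |= {s0 for (s0, _) in step}    # S'low = S'low ∪ Srecovery
```

For example, with S'init = {s0}, faults s0 → s1 and s1 → s2, and candidate recoveries s1 → s0 and s2 → s1, S'low grows to {s0, s1, s2} in two iterations. Without s2 → s1, the second iteration finds no recovery state and the sketch declares failure.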

  20. Enhancement of Nonmasking Distributed Programs (the flowchart of slide 15, repeated)

  21. Example: Byzantine Agreement • Why this example? • It was used to illustrate the addition of masking fault-tolerance in [SRDS01] • Manual enhancement has already been applied to it [TSE98] • Processes: the general g and three non-generals j, k, and l • Variables (d: decision; b: Byzantine; f: finalized) • d.g : {0, 1} • d.j, d.k, d.l : {0, 1, ┴} • b.g, b.j, b.k, b.l : {0, 1} • f.j, f.k, f.l : {0, 1} • Safety specification: • Agreement: no two non-Byzantine non-generals can finalize with different decisions • Validity: if g is not Byzantine, no process can finalize with a decision different from that of g • A finalized process should not execute any transition

  22. Example: Byzantine Agreement • Read/write restrictions • Readable variables for process j: b.j, d.j, f.j, d.g, d.k, d.l • Process j can write d.j and f.j • Dijkstra's guarded commands • Guard → Statement denotes { (s0, s1) | Guard holds at s0 and atomic execution of Statement yields s1 } • Nonmasking fault-tolerant program transitions (process j) • d.j = ┴ ∧ f.j = 0 → d.j := d.g • d.j ≠ ┴ ∧ f.j = 0 → f.j := 1 • d.j = 1 ∧ d.k = 0 ∧ d.l = 0 → d.j := 0 • d.j = 0 ∧ d.k = 1 ∧ d.l = 1 → d.j := 1 • Fault transitions • ¬b.g ∧ ¬b.j ∧ ¬b.k ∧ ¬b.l → b.j := true • b.j → d.j := 0|1
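As an illustration, process j's nonmasking actions and the fault actions listed for j can be written as Python functions that return successor states. States are dictionaries keyed by the slide's variable names, `None` stands for ┴, and `true` for b.j is encoded as 1; these encodings and the function names are assumptions of this sketch. Actions for k and l would be symmetric.

```python
BOT = None  # stands for ┴ (no decision yet)


def program_j(s):
    """Successor states produced by process j's nonmasking actions."""
    succ = []
    if s["d.j"] is BOT and s["f.j"] == 0:
        succ.append({**s, "d.j": s["d.g"]})        # copy the general's decision
    if s["d.j"] is not BOT and s["f.j"] == 0:
        succ.append({**s, "f.j": 1})               # finalize
    if s["d.j"] == 1 and s["d.k"] == 0 and s["d.l"] == 0:
        succ.append({**s, "d.j": 0})               # correct toward k and l
    if s["d.j"] == 0 and s["d.k"] == 1 and s["d.l"] == 1:
        succ.append({**s, "d.j": 1})
    return succ


def faults_j(s):
    """Successor states produced by the fault actions listed for j."""
    succ = []
    if not (s["b.g"] or s["b.j"] or s["b.k"] or s["b.l"]):
        succ.append({**s, "b.j": 1})               # j becomes Byzantine
    if s["b.j"]:
        succ += [{**s, "d.j": v} for v in (0, 1)]  # Byzantine j changes d.j
    return succ
```

In the initial state where no process is Byzantine and only d.g = 1 is set, j's only enabled action copies the general's decision, and the only enabled fault makes j Byzantine.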

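  23. Example: Byzantine Agreement (Continued) • Why is enhancement easier? • s0: d.j = d.k = ┴, d.g = 1, d.l = 1, f.l = 0 • s0 → s1 is a good transition inside the invariant, but a premature finalization; s1: d.j = d.k = ┴, d.g = 1, d.l = 1, f.l = 1 • A fault transition then leads to s2: d.j = d.k = ┴, d.g = 0, d.l = 1, f.l = 1 • s3 (b.g = 1): d.j = d.k = ┴, d.g = 0, d.l = 1, f.l = 1 • s4: d.j = d.k = 0, d.g = 0, d.l = 1, f.l = 1 is a deadlock state: l has finalized with 1 while j and k hold 0, so finalizing j or k would violate agreement, and correcting l would change a finalized process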

  24. Example: Byzantine Agreement (Continued) • Masking fault-tolerant program • High atomicity reasoning: synthesize a masking program in the high atomicity model and then refine it to a distributed program • Transitions of process j (the guards of the nonmasking program are strengthened) • d.j = ┴ ∧ f.j = 0 → d.j := d.g • d.j ≠ ┴ ∧ f.j = 0 ∧ ((d.j = d.k) ∨ (d.j = d.l)) → f.j := 1 • d.j = 1 ∧ d.k = 0 ∧ d.l = 0 ∧ (f.j = 0) → d.j := 0 • d.j = 0 ∧ d.k = 1 ∧ d.l = 1 ∧ (f.j = 0) → d.j := 1
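The effect of the strengthened guards can be checked on the states of slide 23. In this sketch (same assumed dictionary encoding, `None` for ┴, hypothetical function names), the guards of non-general x are parameterized by the other two non-generals y and z. At s0, the nonmasking program lets l finalize prematurely, while the masking guard does not; at s4, the nonmasking correction would change the decision of the finalized process l, while the masking guard forbids it.

```python
BOT = None  # stands for ┴


def finalize_nonmasking(s, x):
    """Guard of x's finalize action in the nonmasking program."""
    return s["d." + x] is not BOT and s["f." + x] == 0


def finalize_masking(s, x, y, z):
    """Masking guard: x must also agree with at least one other non-general."""
    d = s["d." + x]
    return finalize_nonmasking(s, x) and (d == s["d." + y] or d == s["d." + z])


def correct_nonmasking(s, x, y, z):
    """Guard of x's correction actions in the nonmasking program."""
    d, dy, dz = s["d." + x], s["d." + y], s["d." + z]
    return (d == 1 and dy == 0 and dz == 0) or (d == 0 and dy == 1 and dz == 1)


def correct_masking(s, x, y, z):
    """Masking guard: only a process that has not finalized may correct."""
    return s["f." + x] == 0 and correct_nonmasking(s, x, y, z)
```

Usage on slide 23's states: with s0 = {d.g: 1, d.j: ┴, d.k: ┴, d.l: 1, f.l: 0}, `finalize_nonmasking(s0, "l")` holds but `finalize_masking(s0, "l", "j", "k")` does not.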

  25. Enhancement vs. Addition • Reuse the computations of the nonmasking program • Reasoning in high atomicity model has the potential to reduce the complexity of addition

  26. Synthesis Framework • Development of a synthesis framework • Developers of fault-tolerance can interactively add fault-tolerance to fault-intolerant programs • Partial automation helps us to reap the benefits of automation as much as possible • Enhancement identifies programs where partial automation is possible • Implementation of enhancement algorithms in the synthesis framework • http://www.cse.msu.edu/~sandeep/software/Code/synthesis-framework/

  27. Conclusion and Future Work • Enhancement simplifies automated design of masking programs • Less asymptotic complexity • Polynomial-time enhancement in the low atomicity model (in the state space of the nonmasking program) • Sound, but not complete • Reasoning in high atomicity simplifies the synthesis of masking distributed programs • Future Work: • A polynomial-time sound and complete enhancement algorithm for a restricted class of programs and specifications

  28. Thank You! Questions?

  29. Example: Triple Modular Redundancy • Processes: three processes j, k, and l • Variables and their domains • in.j, in.k, and in.l are Boolean variables • out ∈ {0, 1, ┴} • Nonmasking program (+ denotes addition modulo 3): • N1: (out = ┴) → out := in.j • N2: (out ≠ ┴) ∧ (out ≠ in.j) ∧ ((in.j = in.k) ∨ (in.j = in.l)) → out := in.j • Faults: • F: (in.j = in.k) ∧ (in.j = in.l) → in.j := 0|1 • Safety specification: • Do not reach states where out is different from the majority of the inputs • out should not be changed after it is assigned a value
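The following Python sketch replays a fault scenario for the nonmasking program. It reads "+ denotes addition modulo 3" as instantiating N1 and N2 for each process x with the other two indices x+1 and x+2 (modulo 3); that reading, the tuple encoding (in.j, in.k, in.l), `None` for ┴, and the function names are assumptions. After a fault sets in.j to 1 while in.k = in.l = 0, N1 may assign out := 1, which differs from the majority, and N2 later corrects out to 0, which changes an already assigned value. Both steps violate the safety specification, which is why this program is nonmasking rather than masking.

```python
BOT = None  # stands for ┴ (out not yet assigned)


def majority(ins):
    """Majority value of three Boolean inputs (encoded as 0/1)."""
    return 1 if sum(ins) >= 2 else 0


def nonmasking_step(ins, out):
    """Successors of (ins, out) under N1 and N2, instantiated for each
    process x with the other indices x+1 and x+2 taken modulo 3."""
    succ = []
    for x in range(3):
        a, b = ins[(x + 1) % 3], ins[(x + 2) % 3]
        if out is BOT:                                          # N1
            succ.append((ins, ins[x]))
        elif out != ins[x] and (ins[x] == a or ins[x] == b):   # N2
            succ.append((ins, ins[x]))
    return succ
```

Starting from ((1, 0, 0), None), the successors include ((1, 0, 0), 1) although `majority((1, 0, 0))` is 0; from ((1, 0, 0), 1), N2 instantiated for k and for l resets out to 0.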

  30. Example: Triple Modular Redundancy • Invariant: S = ((out = ┴) ∧ (in.j = in.k = in.l)) ∨ (out = in.j = in.k) ∨ (out = in.j = in.l) ∨ (out = in.k = in.l) • Fault-span: T = ((in.j = in.k = in.l) ⇒ ((out = ┴) ∨ (out = in.j = in.k = in.l))) • Enhancement algorithm: • Compute ms: ms = {} • Remove bad transitions: {t : t violates safety} and {t : t reaches ms} • Construct a new fault-span T': T' = T − { s : (out ≠ ┴) ∧ (out is not equal to the majority of the inputs) } • Masking program: M1: (out = ┴) ∧ ((in.j = in.k) ∨ (in.j = in.l)) → out := in.j
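A brute-force check of the masking program over the whole state space (2^3 input combinations times three values of out) is small enough to run. The sketch below instantiates M1 for each process by the same modulo-3 reading and verifies that T' is closed under M1 and the faults, that M1 assigns out only to the majority value, and that every state of T' with out = ┴ has an enabled action. It assumes the fault F may hit any one input (the slide lists it for in.j only); the encodings and names are hypothetical.

```python
from itertools import product

BOT = None  # stands for ┴


def majority(ins):
    return 1 if sum(ins) >= 2 else 0


def in_T_prime(ins, out):
    # T: if all inputs agree, out is ┴ or equals them.
    all_eq = ins[0] == ins[1] == ins[2]
    in_T = (not all_eq) or out is BOT or out == ins[0]
    # T' drops states where out is assigned but differs from the majority.
    return in_T and not (out is not BOT and out != majority(ins))


def masking_step(ins, out):
    """Successors under M1 instantiated for each process x (indices mod 3)."""
    succ = []
    for x in range(3):
        a, b = ins[(x + 1) % 3], ins[(x + 2) % 3]
        if out is BOT and (ins[x] == a or ins[x] == b):
            succ.append((ins, ins[x]))
    return succ


def faults(ins, out):
    """Assumed symmetric version of F: corrupt any one input while all agree."""
    if ins[0] == ins[1] == ins[2]:
        return [(tuple(v if i == x else ins[i] for i in range(3)), out)
                for x in range(3) for v in (0, 1)]
    return []


def check():
    states = [(ins, out) for ins in product((0, 1), repeat=3)
              for out in (0, 1, BOT)]
    Tp = [s for s in states if in_T_prime(*s)]
    for ins, out in Tp:
        # Closure: program and fault steps from T' stay in T'.
        for s2 in masking_step(ins, out) + faults(ins, out):
            assert in_T_prime(*s2)
        # Safety: out is only ever assigned the majority value.
        for _, new_out in masking_step(ins, out):
            assert new_out == majority(ins)
        # Progress: an unassigned out can always be assigned.
        if out is BOT:
            assert masking_step(ins, out)
    return len(Tp)
```

Because M1 fires only when out = ┴, the requirement that out never changes after assignment holds by construction; `check()` covers the remaining conditions and returns the size of T'.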

  31. Enhancement of Nonmasking Distributed Programs (the flowchart of slide 15, repeated)


  33. Enhancement of Nonmasking Distributed Programs (the flowchart of slide 15, repeated, with the note that S'init = S'low at the first iteration)

