## Motivation: Reliable Software


**On Building Reliable Concurrent Systems**

Vijay K. Garg, Professor, Department of ECE and CS; Director, PDSL; The University of Texas at Austin, Austin, TX 78712

**Motivation: Reliable Software**

- Multithreaded distributed programs are prone to errors.
- Concurrency, nondeterminism, process and channel failures

Techniques to ensure program correctness:

- Before program development: Model Checking
- During: Testing and Debugging
- After: Software Fault-Tolerance

**Paradise Approach**

Key Abstraction: Global Properties

- Model Checking and Verification: Check global properties against a model of the program (Promela)
- Testing and Debugging: Global breakpoints, trace analysis
- Software Fault-Tolerance: Monitoring for global properties, Controlled Re-execution

**Talk Outline**

- Motivation
- Monitoring Distributed Systems
- Clock: Tracking Dependency
- Camera: Global Snapshot (Checkpoint)
- Sensor: Detecting Global Properties
- Slicer: Computation Slicing
- Supervisor: Controlling Execution
- Other Projects at PDSL

**Paradise Environment**

[Figure: Paradise Environment, with components labeled Program, Monitor, Slicer, and Predicate, and the labels Observe and Control]

**Trace Model: Total Order vs Partial Order**

- Total order: interleaving of events in a trace
- Partial order: Lamport’s happened-before model

[Figure: a successful trace, a faulty trace, and the corresponding partial order trace, over events e1, e2 on P1 and f1, f2 on P2 with local states CS1/¬CS1 and CS2/¬CS2. Specification: CS1 ∧ CS2]

**Tracking Dependency**

Computation: a set of events ordered by the “happened before” relation

Problem: Timestamp events to answer

- e happened before f?
- e concurrent with f?

**Clocks in a Distributed System**

Result: s happened before t iff the vector at s is less than the vector at t.
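The result above can be illustrated with a minimal Python sketch of vector clocks. The class `VectorClock` and the helpers `happened_before` and `concurrent` are hypothetical names introduced here for illustration, and the three-process message pattern is an assumption, not taken from the talk.

```python
from typing import List, Tuple

Vector = Tuple[int, ...]


class VectorClock:
    """Vector clock kept by one process (illustrative helper, not from the talk)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid                  # index of the owning process
        self.v: List[int] = [0] * n     # one component per process

    def local_event(self) -> Vector:
        # Every event increments the owner's own component.
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self) -> Vector:
        # A send is an event; its timestamp travels with the message.
        return self.local_event()

    def receive(self, msg: Vector) -> Vector:
        # Merge the sender's knowledge component-wise, then count the receive event.
        self.v = [max(a, b) for a, b in zip(self.v, msg)]
        return self.local_event()


def happened_before(s: Vector, t: Vector) -> bool:
    # s < t: every component is <= and the vectors differ.
    return all(a <= b for a, b in zip(s, t)) and s != t


def concurrent(s: Vector, t: Vector) -> bool:
    # Distinct events that are unordered in either direction.
    return s != t and not happened_before(s, t) and not happened_before(t, s)


# Assumed scenario: P1 does a local event, P2 sends to P1, P3 does a local event.
p1, p2, p3 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
a = p1.local_event()    # (1, 0, 0)
m = p2.send()           # (0, 1, 0)
b = p1.receive(m)       # (2, 1, 0)
c = p3.local_event()    # (0, 0, 1)

print(happened_before(m, b))   # True: the send precedes the receive
print(concurrent(a, m))        # True: neither vector dominates the other
print(concurrent(b, c))        # True
```

Both questions from the Tracking Dependency slide reduce to component-wise comparisons, which cost O(n) per pair of events in this sketch.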
Vector Clocks [Fidge 89, Mattern 89]

[Figure: vector clock timestamps. P1: (1,0,0), (2,1,0), (3,1,0); P2: (0,1,0), (0,2,0); P3: (0,0,1), (0,0,2), (2,1,3)]

**Dynamic Chain Clocks**

- Problem with vector clocks: scalability, dynamic process structure
- Idea: Computing the “chains” in an online fashion for relevant events [Aggarwal and Garg PODC 05]

[Figure: a computation with 4 processes (P1 to P4) and events a to h, and the relevant subcomputation]

**Experimental Results**

Simulation of a computation with 1% relevant events. Measured:

- number of components vs number of threads
- total time overhead vs number of threads

**Talk Outline**

- Motivation and Overview
- Instrumentation
- Clock: Tracking Dependency
- Property Checking
- Camera: Global Snapshot (Checkpoint)
- Sensor: Detecting Global Properties
- Slicer: Computation Slicing

**Global Snapshot**

Problem: Compute a global snapshot/checkpoint of the system (state of the processes and the channels)

Motivation:

- Checkpointing for fault tolerance
- Distributed debugging
- Detecting stable predicates

**Key Difficulties: Taking care of messages**

[Figure: processes P1, P2, P3 with global states G1 and G2]

Two sites A and B with $400 each. Site A sends a message with $100 to site B.

- Problem 1: Inconsistent state (the recorded total is $900 instead of $800)
  - checkpoint (A) before message sent: $400
  - message received before checkpoint (B): $500
- Problem 2: Messages in transit (the recorded total is $700 unless the message in the channel is also recorded)
  - message sent before checkpoint (A): $300
  - checkpoint (B) before message received: $400

**Current Algorithms**

Key idea: white/red processes and messages

- A process must be red to act on a red message
- Record white messages received by red processes
- [Chandy and Lamport 85]: a "marker" message on every channel
- [Mattern 89, SBF+ 04]: number of white messages sent on every channel

O(N²) messages for completely connected topology

**Our Algorithms**

Joint work with Rahul Garg and Yogish Sabharwal, IBM IRL (ACM International Conference on Supercomputing 2006).

- Lower bound: Ω(N log w) messages
- Implementation: Blue Gene/L with MPI.
- w: average number of messages in transit

**Global Property Detection**

[Figure: processes P1 and P2 with critical sections and global states G1 and G2]

Predicate: A global condition expressed using variables on processes

- e.g., more than one process is in critical section, there is no token in the system

Problem: find a global state that satisfies the given predicate

**The Main Difficulty in Partial Order**

Algorithm for general predicate [Cooper and Marzullo 91]

[Figure: a computation with events e1, e2 on P1 and f1, f2 on P2, and its lattice of global states: {⊥}, {e1, ⊥}, {f1, ⊥}, {e1, f1, ⊥}, {e2, e1, ⊥}, {e2, e1, f1, ⊥}, {e1, f2, f1, ⊥}, {e2, e1, f2, f1, ⊥}]

Too many global states: A computation may contain as many as O(kⁿ) global states

- k: maximum number of events on a process
- n: number of processes

**Efficient Predicate Detection for Special Cases**

- stable predicate [Chandy and Lamport 85]: once the predicate becomes true, it stays true, e.g., deadlock
- unstable predicate:
  - observer independent predicate [Charron-Bost et al 95]: occurs in one interleaving implies occurs in all interleavings, e.g., any disjunction of local predicates
  - linear predicate [Chase and Garg 95], e.g., conjunctive predicates such as "there is no leader in the system"
  - relational predicate: x1 + x2 + … + xn ≥ k [Chase and Garg 95], e.g., violation of k-mutual exclusion

**Linear Predicates**

[Figure: consistent cuts W and X and their intersection]

The set of consistent cuts that satisfy a linear predicate is closed under intersection [Chase and Garg 95].

Examples:

- conjunctive predicates: critical1 and critical2
- channel predicates: “all channels are empty”, “there are exactly k messages in the channel from process P to Q”
- some relational predicates: x1 + x2 ≤ k, when xi is monotonically non-decreasing

**Conjunctive Predicates**

A predicate that can be expressed as l1 ∧ l2 ∧ … ∧ ln, where li is local to Pi. Conjunctive predicates detect errors that may be hidden in some run due to race conditions.
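As a hedged illustration of how a race can hide an error, the sketch below searches a recorded partial order trace for a consistent global state in which every local predicate holds. It is a brute-force search over all k^n candidate global states, not one of the algorithms cited in the talk, and the trace encoding (each local state as a pair of vector clock and local predicate value) and the names `is_consistent` and `detect_conjunctive` are assumptions made here.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[int, ...]
State = Tuple[Vector, bool]   # (vector clock of the local state, local predicate value)


def is_consistent(cut: Sequence[Vector]) -> bool:
    # A cut is consistent when no process has seen more events of process i
    # than process i itself has executed in the cut.
    n = len(cut)
    return all(cut[j][i] <= cut[i][i] for i in range(n) for j in range(n))


def detect_conjunctive(trace: List[List[State]]) -> Optional[Tuple[int, ...]]:
    """Return the indices of local states forming a consistent cut on which
    all local predicates are true, or None if no such cut exists."""
    for idx in product(*(range(len(states)) for states in trace)):
        chosen = [trace[p][k] for p, k in enumerate(idx)]
        if all(flag for _, flag in chosen) and is_consistent([vc for vc, _ in chosen]):
            return idx
    return None


# Assumed trace 1: P1 and P2 enter their critical sections with no messages exchanged.
faulty = [
    [((0, 0), False), ((1, 0), True), ((2, 0), False)],   # P1: idle, in CS, idle
    [((0, 0), False), ((0, 1), True), ((0, 2), False)],   # P2: idle, in CS, idle
]

# Assumed trace 2: P1 leaves its critical section and sends a message;
# P2 receives it before entering.
safe = [
    [((0, 0), False), ((1, 0), True), ((2, 0), False)],   # P1: idle, in CS, exit and send
    [((0, 0), False), ((2, 1), False), ((2, 2), True)],   # P2: idle, receive, in CS
]

print(detect_conjunctive(faulty))   # (1, 1): both in CS in some consistent global state
print(detect_conjunctive(safe))     # None: the message orders the two critical sections
```

In the first trace the violation is reported even if the observed interleaving happened to run the two critical sections one after the other, because the check is made against the partial order rather than a single total order.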
Examples:

- mutual exclusion problem: (P1 in CS) and (P2 in CS)
- missing primary: (P1 is secondary) and (P2 is secondary) and (P3 is secondary)

Importance: Sufficient for detection of any boolean expression of local predicates

**Conjunctive Predicates: Centralized Algorithm**

(l1 ∧ l2 ∧ … ∧ ln) is true iff there exist si in Pi such that li is true in state si, and si and sj are incomparable for distinct i, j.

**Algorithms for Conjunctive Predicates**

Centralized Algorithm [Garg and Waldecker 92]

Each non-checker process maintains its local vector and sends to the checker process the chain clock whenever

- local predicate is true
- at most once in each message interval.

Time complexity: Checker requires at most O(n²m) comparisons.

- token based algorithm [Garg and Chase 95]
- completely distributed algorithm [Garg and Chase 95]
- keeping queues shorter [Chiou and Korfhage 95]
- avoiding control messages [Hurfin, Mizuno, Raynal, Singhal 96]

**Other Special Classes of Predicates**

- Relational Predicates
- Let xi: number of tokens at Pi
- Σxi < k: loss of tokens
- Algorithms: max-flow techniques [Groselj 93, Chase and Garg 95, Wu and Chen 98]
- Dilworth's partition [Tomlinson and Garg 96]

**Relational Predicates**

Let xi ≥ 0 be a variable at Pi.
Predicates of the form Σxi ≥ k [Chase and Garg 95]

Algorithm: Consistent cut with minimum value = min cut in the flow graph

**Predicate Detection in General**

Explore the state-space (need to examine all global states) without constructing the graph

- breadth first manner [Cooper and Marzullo 91]
- depth first manner [Alagar and Venkatesan 94]
- lexical order [Garg 03]

[Figure: lattice of global states: {⊥}, {e1, ⊥}, {f1, ⊥}, {e1, f1, ⊥}, {e2, e1, ⊥}, {e2, e1, f1, ⊥}, {e1, f2, f1, ⊥}, {e2, e1, f2, f1, ⊥}]

**Talk Outline**

- Motivation and Overview
- Instrumentation
- Clock: Tracking Dependency
- Property Checking
- Camera: Global Snapshot (Checkpoint)
- Sensor: Detecting Global Properties
- Slicer: Computation Slicing
- Supervisor: Controlling Execution

**The Main Idea of Computation Slicing**

[Figure: partial order trace (state explosion) → slicing (keep all red global states) → slice]

**How does Computation Slicing Help?**

[Figure: to check b1 ∧ b2 on a partial order trace, slice for b1 (retain all global states satisfying b1), then check b2 on the slice]

**Example**

- Detect predicate (x*y + z < 5) ∧ (x ≥ 1) ∧ (z ≤ 3)

[Figure: a computation on P1, P2, P3 (variables x, y, z) with events a to h and u, v, w, x, and its slice with respect to (x ≥ 1) ∧ (z ≤ 3), containing {a,e,f,u,v}, {b}, {w}, {g}]

**Slice**

slice: a sub-trace such that:

- it contains all consistent cuts of the trace satisfying the given predicate
- it contains the least number of consistent cuts

[Garg and Mittal 01, Mittal and Garg 01]

[Figure: the Slicer takes a predicate and a trace and produces a slice]

**Results**

Efficient polynomial-time algorithms for computing the slice for:

- linear predicates [Garg and Mittal 01]
  - time-complexity: O(n²m)
- general predicate:
  - Theorem: Given a computation, if a predicate b can be detected efficiently then the slice for b can also be computed efficiently.
    [Mittal, Sen and Garg 03]
- combining slices: Boolean operators
- temporal logic operators: EF, AG, EG
- approximate slice: for arbitrary boolean expression

n: number of processes, m: number of events

**POTA Architecture [Sen and Garg 04]**

[Figure: POTA architecture. Components: Instrumentor, Analyzer (Slicer and Predicate Detector), Translator, and SPIN. Artifacts: Program, Predicate (Specification), Instrumented Program, Trace, Slice, Promela. Outputs: yes/witness or no/counterexample]

**Experiments: Dining Philosophers Trace Verification**

- POTA: Partial Order Trace Analyzer (based on slicing) [Sen and Garg 03]
- SPIN: A widely used model checking tool [Holzmann 97]
- SPIN: 250 seconds for n = 6, runs out of memory for n > 6.
- POTA: can handle n = 200. Used 400 seconds.

Predicate: Two neighboring dining philosophers do not eat concurrently

**Supervisor: Motivation for Control**

- maintain global invariants or proper order of events
- Examples: Distributed Debugging
  - ensure that busy1 ∨ busy2 is always true
  - ensure that m1 is delivered before m2
- Fault tolerance
  - On fault, rollback and execute under control

**Rollback Recovery for Software Faults**

[Figure: processes P1, P2, P3 with a faulty state and the restored state]

Re-execution Problem:

- To re-execute in order to avoid a recurrence of a previously detected failure
- Progressive Retry [Wang et al 97]
- Controlled Re-execution [Tarafdar and Garg 98]

**Controlled Re-execution**

[Figure: events a, c on P1 and d, b on P2 with critical sections and global state G1]

Add the synchronization necessary to maintain safety property

- e.g., mutual exclusion

**Results**

Efficient algorithms for computing the synchronization for:

- Locks [Tarafdar, Garg DISC98]
  - O(nm) algorithm for various types of locks
- disjunctive predicate [Mittal, Garg 00], e.g., (n-1)-mutual exclusion
  - time-complexity: O(m²)
  - minimizes the number of synchronization arrows
- region predicate [Mittal, Garg PODC 00], e.g., virtual clocks of processes are “approximately” synchronized
  - time-complexity: O(nm²)
  - maximizes the concurrency in the controlled computation

n: number of processes, m: number of events

**Conclusions**

- Efficient algorithms possible for monitoring global properties
- Observation and Control: a powerful abstraction
- Current execution engines are designed for performance rather than fault-tolerance

**Other Research Projects**

- Distributed simulation: GVT algorithms, fault-tolerance
- Recovery Schemes: Optimistic Message Logging, fault-tolerance without replication
- Model Checking: Partial Order Methods
- Formal Methods: Petri Nets, Lattice Theory, Max Plus Algebra

**Questions?**