1 / 40

Motivation: Reliable Software

On Building Reliable Concurrent Systems Vijay K. Garg Professor, Department of ECE and CS Director, PDSL The University of Texas at Austin Austin, TX 78712. Motivation: Reliable Software. Multithreaded Distributed programs are prone to errors.

jaegar
Download Presentation

Motivation: Reliable Software

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. On Building Reliable Concurrent SystemsVijay K. GargProfessor,Department of ECE and CSDirector, PDSLThe University of Texas at AustinAustin, TX 78712

  2. Motivation: Reliable Software • Multithreaded Distributed programs are prone to errors. • Concurrency, nondeterminism, process and channel failures Techniques to ensure program correctness • Before program development: Model Checking • During: Testing and Debugging • After: Software Fault-Tolerance

  3. Paradise Approach Key Abstraction: Global Properties • Model Checking and Verification: Check global properties against model of the program (Promela) • Testing and Debugging: Global breakpoints, Trace analysis • Software Fault-Tolerance: Monitoring for global properties, Controlled Reexecution

  4. Talk Outline • Motivation • Monitoring Distributed Systems • Clock : Tracking Dependency • Camera : Global Snapshot (Checkpoint) • Sensor : Detecting Global Properties • Slicer : Computation Slicing • Supervisor: Controlling Execution • Other Projects at PDSL

  5. Observe Program Monitor Slicer Predicate Paradise Environment Control

  6. Trace Model: Total Order vs Partial Order • Total order: interleaving of events in a trace • Partial order: Lamport’s happened-before model Successful Trace ¬CS1 ¬CS2 CS1 CS2 Specification: CS1 ΛCS2 f1 f2 e2 e1 Partial Order Trace Faulty Trace ¬ CS1 CS1 P1 e1 e2 ¬CS1 CS2 CS1 ¬CS2 ¬CS2 CS2  P2 f1 e2 f2 e1 f2 f1

  7. Tracking Dependency computation: a set of events ordered by “happened before” relation • Problem: Timestamp events to answer • e happened before f ? • e concurrent with f ?

  8. Clocks in a Distributed System Result:s happened before t i the vector at s is less than the vector at t. Vector Clocks [Fidge 89, Mattern 89] (1,0,0) (2,1,0) (3,1,0) P1 (0,1,0) (0,2,0) P2 (0,0,1) (0,0,2) (2,1,3) P3

  9. a b c d e f g h Dynamic Chain Clocks • Problem with vector clocks: scalability, dynamic process structure • Idea: Computing the “chains” in an online fashion [Aggarwal and Garg PODC 05] for relevant events a P1 P2 P3 P4 b c e d h f g The relevant subcomputation A computation with 4 processes

  10. Experimental Results Simulation of a computation with 1% relevant events Measured • number of components vs number of threads • total time overhead vs number of threads

  11. Talk Outline • Motivation and Overview • Instrumentation • Clock : Tracking Dependency • Property Checking • Camera : Global Snapshot (Checkpoint) • Sensor : Detecting Global Properties • Slicer : Computation Slicing

  12. Global Snapshot Problem: Compute a global snapshot/checkpoint of the system (state of the processes and the channels) Motivation • Checkpointing for fault tolerance • Distributed debugging • Detecting stable predicates

  13. G2 G1 P1 P2 P3 Key Difficulties: Taking care of messages Two sites A and B with $400 each. Site A sends a message with $ 100 to site B. • Problem 1: Inconsistent State checkpoint (A) before message sent - $400 message received before checkpoint (B) - $500 • Problem 2: Messages in transit message sent before checkpoint (A) - $300 checkpoint (B) before message received - $400

  14. Current Algorithms Key idea: white/red processes and messages • A process must be red to act on a red message • Record white messages received by red processes • [Chandy and Lamport 85] A "marker" message on every channel • [Mattern 89, SBF+ 04] Number of white messages sent on every channel O(N2) messages for completely connected topology

  15. Our Algorithms joint work with Rahul Garg, and Yogish Sabharwal, IBM IRL. (ACM International Conference on Supercomputing 2006) • Lower Bound:Ω(N log w) messages • Implementation: Blue Gene/L with MPI. • w: average number of messages in transit

  16. G1 G2 P1 P2 Critical section Global Property Detection Predicate: A global condition expressed using variables on processes • e.g., more than one process is in critical section, there is no token in the system Problem: find a global state that satisfies the given predicate

  17. The Main Difficulty in Partial Order Algorithm for general predicate [Cooper and Marzullo 91] {e2, e1, f2, f1, ┴ e1 e2 P1 {e2, e1, f1, ┴} {e1, f2, f1, ┴} {e2, e1, ┴} ┴ {e1, f1, ┴} T P2 {f1, ┴} {e1, ┴} f1 f2 {┴} Too many global states : A computation may contain as many as O(kn) global states • k: maximum number of events on a process • n: number of processes

  18. Efficient Predicate Detection for Special Cases • stable predicate:[Chandy and Lamport 85] • once the predicate becomes true, it stays true e.g., deadlock • unstable predicate: • observer independent predicate[Charron-Bost et al 95] occurs in one interleaving  occurs in all interleavings e.g., any disjunction of local predicate • linear predicate[Chase and Garg 95] e.g., conjunctive predicates such as there is no leader in the system • relational predicate: x1 + x2 +…+ xn ≥ k [Chase and Garg 95] e.g., violation of k-mutual exclusion

  19. W X W X W Å X Linear Predicates The set of consistent cuts that satisfy a linear predicate is closed under intersection [Chase and Garg 95]. Examples: • conjunctive predicates: critical1and critical2 • channel predicates: “all channels are empty” , “there are exactly k messages in the channel from process P to Q” • some relational predicates: x1 + x2· k, when xi is mon. non-decreasing

  20. Conjunctive Predicates A predicate that can be expressed as l1Λl2Λ … ln , where liis local to Pi. Detect errors that may be hidden in some run due to race conditions. Examples: • mutual exclusion problem: (P1 in CS) and (P2 in CS) • missing primary: (P1 is secondary) and (P2 is secondary) and (P3 is secondary) Importance: Sufficient for detection of any boolean expression of local predicates

  21. Conjunctive Predicates: Centralized Algorithm (l1Λl2Λ … ln) is true iff there exist si in Pi such that liis true in state si, and si and sj are incomparable for distinct i,j.

  22. Algorithms for Conjunctive Predicates Centralized Algorithm [Garg and Waldecker 92] Each non-checker process maintains its local vector and sends to the checker process the chain clock whenever • local predicate is true • at most once in each message interval. Time complexity: Checker requires at most O(n2m) comparisons. • token based algorithm [Garg and Chase 95] • completely distributed algorithm [Garg and Chase 95] • keeping queues shorter [Chiou and Korfhage 95] • avoiding control messages [Hurfin, Mizuno, Raynal, Singhal 96]

  23. Other Special Classes of Predicates • Relational Predicates • Let xi: number of token at Pi • Σxi < k: loss of tokens • Algorithms: max-flow techniques [Groselj 93, Chase and Garg 95, Wu and Chen 98] • Dilworth's partition [Tomlinson and Garg 96]

  24. Relational Predicates Let xi 0 be a variable at Pi . Predicates of the form [Chase and Garg 95]  xi k Algorithm: Consistent cut with minimum value = min cut in the flow graph

  25. Predicate Detection in General {e2, e1, f2, f1, ┴ Explore the state-space (need to examine all global states) without constructing the graph • breadth first manner [Cooper and Marzullo 91] • depth first manner [Alagar and Venkatesan 94] • lexical order [Garg 03] {e2, e1, f1, ┴} {e1, f2, f1, ┴} {e2, e1, ┴} {e1, f1, ┴} {f1, ┴} {e1, ┴} {┴}

  26. Talk Outline • Motivation and Overview • Instrumentation • Clock : Tracking Dependency • Property Checking • Camera : Global Snapshot (Checkpoint) • Sensor : Detecting Global Properties • Slicer : Computation Slicing • Supervisor: Controlling Execution

  27. The Main Idea of Computation Slicing state explosion Partial order trace keep all red global states slicing slice

  28. How does Computation Slicing Help? Partial order trace check b1Λb2 satisfy b1 retain all global states satisfying b1 slicing for b1 slice check b2

  29. 1 4 4 2 2 0 1 2 -1 1 0 3 d a b c e f x h w v u g x P1 {a,e,f,u,v} {b} y P2 {w} {g} z P3 Example • Detect predicate (x*y + z < 5) Λ (x ≥1) Λ (z ≤3) Slice with respect to (x ≥1) Λ (z ≤3) Computation

  30. Slicer Slice slice: a sub-trace such that: • it contains all consistent cuts of the trace satisfying the given predicate • it contains the least number of consistent cuts [Garg and Mittal 01, Mittal and Garg 01] predicate slice trace

  31. Results Efficient polynomial-time algorithms for computing the slice for: • linear predicates: [Garg and Mittal 01] • time-complexity: O(n2m) • general predicate: • Theorem: Given a computation, if a predicate b can be detected efficiently then the slice for b can also be computed efficiently. [Mittal,Sen and Garg 03] • combining slices: Boolean operators • temporal logic operators: EF, AG, EG • approximate slice: For arbitrary boolean expression • n: number of processes • m: number of events

  32. POTA Architecture [Sen and Garg 04] Predicate (Specification) Program Analyzer yes/ witness Slice Instrumentor Slicer Predicate Detector no/ counter example Trace Slice Instrumented Program Promela yes Execute Program Trace Translator Execute SPIN no/ counter example Specification

  33. Experiments: Dining Philosophers Trace Verification • POTA: Partial Order Trace Analyzer (based on slicing) [Sen and Garg 03] • SPIN: A widely used model checking tool [Holzmann 97] • SPIN: 250 seconds for n = 6, runs out of memory for n > 6. • POTA: can handle n= 200. Used 400 seconds. Predicate: Two neighboring dining philosophers do not eat concurrently

  34. Supervisor: Motivation for Control • maintain global invariants or proper order of events • Examples: Distributed Debugging • ensure that busy1 V busy2 is always true • ensure that m1 is delivered before m2 • Fault tolerance • On fault, rollback and execute under control

  35. faulty state P1 P2 P3 restored state Rollback Recovery for Software Faults Re-execution Problem: • To re-execute in order to avoid a recurrence of a previously detected failure • Progressive Retry [Wang et al 97] • Controlled Re-execution [Tarafdar and Garg 98]

  36. G1 a c P1 d b P2 Critical section Controlled Re-execution Add the synchronization necessary to maintain safety property • e.g., mutual exclusion

  37. Results Efficient algorithms for computing the synchronization for: • Locks[Tarafdar, Garg DISC98] • O(nm) algorithm for various types of locks • disjunctive predicate[Mittal, Garg 00] e.g., (n-1)-mutual exclusion • time-complexity:} O(m2) • minimizes the number of synchronization arrows • region predicate[Mittal, Garg PODC 00] e.g., virtual clocks of processes are “approximately” synchronized • time-complexity:O(nm2) • maximizes the concurrency in the controlled computation n: number of processes, m: number of events

  38. Conclusions • Efficient algorithms possible for monitoring global properties • Observation and Control: a powerful abstraction • Current execution engines are designed for performance rather than fault-tolerance

  39. Other Research Projects • Distributed simulation: GVT algorithms, fault-tolerance • Recovery Schemes: Optimistic Message Logging, fault-tolerance without replication • Model Checking: Partial Order Methods • Formal Methods: Petri Nets, Lattice Theory, Max Plus Algebra

  40. Questions • ?

More Related