consistent cuts n.
Skip this Video
Loading SlideShow in 5 Seconds..
Consistent Cuts PowerPoint Presentation
Download Presentation
Consistent Cuts

Consistent Cuts

106 Views Download Presentation
Download Presentation

Consistent Cuts

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Consistent Cuts Ken Birman

  2. Idea • We would like to take a snapshot of the state of a distributed computation • We’ll do this by asking participants to jot down their states • Under what conditions can the resulting “puzzle pieces” be assembled into a consistent whole?

  3. An instant in real-time • Imagine that we could photograph the system in real-time at some instant • Process state: • A set of variables and values • Channel state • Messages in transit through the network • In principle, the system is fully defined by the set of such states

  4. Problems? • Real systems don’t have real-time snapshot facilities • In fact, real systems may not have channels in this sense, either • How can we approximate the real-time concept of a cut using purely “logical time”?

  5. Deadlock detection • Need to detect cycles A B C D

  6. Deadlock is a “stable” property • Once a deadlock occurs, it holds in all future states of the system • Easy to prove that if a snapshot is computed correctly, a stable condition detected in the snapshot will continue to hold • Insight is that adding events can’t “undo” the condition

  7. Leads us to define consistent cut and snapshot • Think of the execution of a process as a history of events, Lamport-style • Events can be local, msg-send, msg-rcv • A consistent snapshot is a set of history prefixes and messages closed under causality • A consistent cut is the frontier of a consistent snapshot – the “process states”

  8. Deadlock detection • Need to detect cycles A B C D

  9. Deadlock detection • Need to detect cycles A B C D

  10. Deadlock detection • Need to detect cycles A B C D

  11. Deadlock detection • A “ghost” or “false” cycle! A B C D

  12. A ghost deadlock • Occurs when we accidently snapshot process states so as to include some events while omitting prior events • Can’t occur if the cut is computed consistently since this violates causal closure requirement

  13. A ghost deadlock A B C D

  14. A ghost deadlock A B C D

  15. A ghost deadlock A B C D

  16. Algorithms for computing consistent cuts • Paper focuses on a flooding algorithm • We’ll consider several other methods too • Logical timestamps • Flooding algorithm without blocking • Two-phase commit with blocking • Each pattern arises commonly in distributed systems we’ll look at in coming weeks

  17. Cuts using logical clocks • Suppose that we have Lamport’s basic logical clocks • But we add a new operation called “snap” • Write down your process state • Create empty “channel state” structure • Set your logical clock to some big value • Think of clock as (epoch-number, counter)? • Record channel state until rcv message with big incoming clock value

  18. How does this work? • Recall that with Lamport’s clocks, if e is causally prior to e’ then LT(e) < LT(e’) • Our scheme creates a snapshot for each process at instant it reaches logical time t • Easy to see that these events are concurrent: a possible instant in real-time • Depends upon FIFO channels, can’t easily tell when cut is complete – a sort of lazy version of the flooding algorithm

  19. Flooding algorithm • To make a cut, observer sends out messages “snap” • On receiving “snap” the first time, A • Writes down its state, creates empty channel state record for all incoming channels • Sends “snap” to all neighbor processes Waits for “snap” on all incoming channels • A’s piece of the snapshot is its state and the channel contents once it receives “snap” from all neighbors • Note: also assumes FIFO channels

  20. With 2-phase commit • In this, the initiator sends to all neighbors: • “Please halt” • A halts computation, sends “please halt” to all downstream neighbors • Waits for “halted” from all of them • Replies “halted” to upstream caller • Now initiator sends “snap” • A forwards “snap” downstream • Waits for replies • Collects them into its own state • Send’s own state to upstream caller and resumes

  21. Why does this work? • Forces the system into an idle state • In this situation, nothing is changing… • Usually, sender in this scheme records unacknowledged outgoing channel state • Alternative: upstream process tells receiver how many incoming messages to await, receiver does so and includes them in its state. • So a snapshot can be safely computed and there is nothing unaccounted for in the channels

  22. Observation • Suppose we use a two-phase property detection algorithm • In first phase, asks (for example), “what is your current state” • You reply “waiting for a reply from B” and give a “wait counter” • If a second round of the same algorithm detects the same condition with the same wait-counter values, the condition is stable…

  23. A ghost deadlock A B C D

  24. Look twice and it goes away… • But we could see “new” wait edges mimicking the old ones • This is why we need some form of counter to distinguish same-old condition from new edges on the same channels… • Easily extended to other conditions

  25. Consistent cuts • Offer the illusion that you took a picture of the system at an instant in real-time • A powerful notion widely used in real systems • Especially valuable after a failure • Allows us to reconstruct the state so that we can repair it, e.g. recreate missing tokens • But has awkward hidden assumptions

  26. Hidden assumptions • Use of FIFO channels is a problem • Many systems use some form of datagram • Many systems have multiple concurrent senders on same paths • These algorithms assume knowledge of system membership • Hard to make them fault-tolerant • Recall that a slow process can seem “faulty”

  27. High costs • With flooding algorithm, n2messages • With 2-phase commit algorithm, system pauses for a long time • We’ll see some tricky ways to hide these costs either by continuing to run but somehow delaying delivery of messages to the application, or by treating the cut algorithm as a background task • Could have concurrent activities that view same messages in different ways…

  28. Fault-tolerance • Many issues here • Who should run the algorithm? • If we decide that a process is faulty, what happens if a message from it then turns up? • What if failures leave a “hole” in the system state – missing messages or missing process state • Problems are overcome in virtual synchrony implementations of group communication tools

  29. Systems issues • Suppose that I want to add notions such as real-time, logical time, consistent cuts, etc to a complex real-world operating system (list goes on) • How should these abstractions be integrated with the usual O/S interfaces, like the file system, the process subsystem, etc? • Only virtual synchrony has really tackled these kinds of questions, but one could imagine much better solutions. A possible research topic, for a PhD in software engineering

  30. Theory issues • Lamport’s ideas are fundamentally rooted in static notions of system membership • Later with his work on Paxos he adds the idea of dynamically changing subsets of a static maximum set • Does true dynamicism, of sort used when we look at virtual synchrony, have fundamental implications?

  31. Example of a theory question • Suppose that I want to add a “location” type to a language like Java: • Object o is at process p at computer x • Objects {a,b,c} are replicas of  • Now notions of system membership and location are very fundamental to the type system • Need a logic of locations. How should it look? • Extend to a logic of replication and self-defined membership? But FLP lurks in the shadows…

  32. FLP

  33. Other questions • Checkpoint/rollback • Processes make checkpoints, probably when convenient • Some systems try to tell a process “when” to make them, using some form of signal or interrupt • But this tends to result in awkward, large checkpoints • Later if a fault occurs we can restart from the most recent checkpoint

  34. So, where’s the question? • The issue arises when systems use message passing and want to checkpoint/restart • Few applications are deterministic • Clocks, signals, threads & scheduling, interrupts, multiple I/O channels, order in which messages arrived, user input… • When rolling forward from a checkpoint actions might not be identical • Hence anyone who “saw” my actions may be in a state that won’t be recreated!

  35. Technical question • Suppose we make checkpoints in an uncoordinated manner • Now process p fails • Which other processes should roll back? • And how far might this rollback cascade?

  36. Rollback scenario

  37. Avoiding cascaded rollback? • Both making checkpoints, and rolling back, should happen along consistent cuts • In mid 1980’s several papers developed this into simple 2-phase protocols • Today would recognize them as algorithms that simply run on consistent cuts • For those who are interested: sender-based logging is the best algorithm in this area. (Alvisi’s work)