
Distributed Snapshots:


Leo

Presentation Transcript


  1. Distributed Snapshots: Non-blocking checkpoint coordination protocol Next: Uncoordinated Chkpnt

  2. Uncoordinated • Processes take chkpnts independently • Domino Effect! Next: Coordinated Blocking Chkpnt
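The Domino Effect can be illustrated with a small sketch. It assumes checkpoints and messages are tagged with times on a shared clock, and that every process has an initial checkpoint at time 0. The function name `recovery_line` and the data layout are hypothetical, chosen only for this illustration. Starting from each process's latest checkpoint, any message that was received before the receiver's checkpoint but sent after the sender's forces the receiver to roll back further.

```python
def recovery_line(checkpoints, messages):
    """Find the latest consistent set of independently taken checkpoints.

    checkpoints: process -> sorted list of checkpoint times (0 = initial state).
    messages: list of (sender, send_time, receiver, recv_time).
    Both layouts are assumptions made for this sketch.
    """
    # Start optimistically from everyone's most recent checkpoint.
    line = {p: times[-1] for p, times in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for sender, sent, receiver, received in messages:
            # Orphan message: recorded as received, but not recorded as sent.
            if sent > line[sender] and received <= line[receiver]:
                # Roll the receiver back to its last checkpoint before the receipt.
                line[receiver] = max(t for t in checkpoints[receiver] if t < received)
                changed = True
    return line


# Hypothetical run: each rollback exposes a new orphan message.
checkpoints = {"p": [0, 8], "q": [0, 4, 12]}
messages = [("p", 2, "q", 3), ("q", 6, "p", 7), ("p", 10, "q", 11)]
print(recovery_line(checkpoints, messages))  # {'p': 0, 'q': 0}
```

In this run both processes fall all the way back to their initial states, which is the cascade the slide warns about.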

  3. Coordinated Blocking • Processes are coordinated to form a consistent global state, and … [Figure: initiator and processes p1, p2, p3 exchanging “Ready!”, “okay, channels flushed” and “Go!”] Next: Coordinated Blocking Chkpnt (cont’)

  4. Coordinated Blocking (cont’) • Advantages: always consistent; no Domino Effect; less storage overhead • Disadvantage: large latency to chkpnt! Next: Coordinated Non-blocking Chkpnt

  5. Coordinated Non-blocking • Processes are coordinated, but … • Do we really need to block …? [Photos: Leslie Lamport, K. Mani Chandy] Next: Global-state Recording Algorithm

  6. Global-state Recording Alg. “Distributed snapshots: determining global states of distributed systems”, K. Mani Chandy and Leslie Lamport • Step 1: process states • Step 2: channel states • Step 3: end of the algorithm Next: Model of Distributed System

  7. Model of Distributed System • Processes • Channels: directed, FIFO, error-free [Figure: processes p, q, r connected by channels c1, c2, c3, c4] Next: Step 1, process states

  8. Step 1: process states • Initiator: save its local state; send marker tokens on all outgoing edges • All other processes: on receiving the first marker on any incoming edge, save state and propagate markers on all outgoing edges; then resume execution • Further markers are discarded. Next: Example
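A minimal sketch of Step 1 follows, under simplifying assumptions: local states do not change while markers travel, and markers are delivered in the order they were sent across the whole system. The function name `record_process_states` and the topology in the usage lines are hypothetical.

```python
from collections import deque


def record_process_states(channels, initiator, local_state):
    """Propagate markers and record each process state exactly once.

    channels: list of (src, dst) directed channels.
    local_state: process -> state to save (assumed static for this sketch).
    Returns (saved states, order in which processes checkpointed).
    """
    outgoing = {}
    for src, dst in channels:
        outgoing.setdefault(src, []).append(dst)

    # The initiator saves its state and sends a marker on every outgoing edge.
    saved = {initiator: local_state[initiator]}
    order = [initiator]
    markers = deque((initiator, dst) for dst in outgoing.get(initiator, []))

    while markers:
        src, dst = markers.popleft()
        if dst in saved:
            continue  # not the first marker for dst: discard it
        # First marker: save state, then propagate markers on all outgoing edges.
        saved[dst] = local_state[dst]
        order.append(dst)
        markers.extend((dst, nxt) for nxt in outgoing.get(dst, []))
    return saved, order


# Hypothetical topology with three processes and four directed channels.
channels = [("p", "q"), ("q", "r"), ("r", "p"), ("p", "r")]
saved, order = record_process_states(channels, "p", {"p": 1, "q": 2, "r": 3})
print(order)  # ['p', 'q', 'r']
```

Every process reachable from the initiator checkpoints exactly once, no matter how many markers reach it.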

  9. Example [Figure: processes p, q, r (one marked as initiator) connected by channels c1, c2, c3, c4, with timelines for p, q, r showing markers and checkpoints (x)] Next: Proof

  10. Proof • Let us assume that a message m exists, and it makes our cut inconsistent. [Figure: timelines of p and q with checkpoints (x) and message m between them] Next: Proof (cont’)

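  11. Proof (cont’) [Incomplete page] • (1) x1 is the 1st marker for process q • (2) x1 is not the 1st marker for process q • In both cases q saves its state before m arrives, because the FIFO channel delivers marker x1 ahead of m • Contradicts the assumption. [Figure: timelines of p and q with markers x1, x2 and message m] Next: Step 2, channel states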
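The property the proof rules out can be written as a small check. This sketch assumes each process numbers its local events and that a checkpoint at index k covers events 1..k. The function name `inconsistent_messages` and the tuple layout are hypothetical.

```python
def inconsistent_messages(checkpoint, messages):
    """Return the messages that make a cut inconsistent.

    checkpoint: process -> local event index of its checkpoint.
    messages: list of (sender, send_index, receiver, recv_index).
    A message breaks the cut when its receipt is inside the cut
    but its send is not (received "from the future").
    """
    return [
        (sender, sent, receiver, received)
        for sender, sent, receiver, received in messages
        if sent > checkpoint[sender] and received <= checkpoint[receiver]
    ]


cut = {"p": 2, "q": 3}
# Sent by p after its checkpoint, received by q before its checkpoint.
print(inconsistent_messages(cut, [("p", 3, "q", 2)]))  # [('p', 3, 'q', 2)]
# Sent before p's checkpoint, received after q's: merely in flight.
print(inconsistent_messages(cut, [("p", 1, "q", 4)]))  # []
```

A cut produced by the marker rule should always yield an empty list here, which is what the proof argues.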

  12. Step 2: channel states • In-flight messages: sent along the channel before the sender’s chkpnt, and received along the channel after the receiver’s chkpnt [Figure: channel between p and q carrying in-flight messages] Next: Example
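The definition of an in-flight message translates directly into a filter. This sketch assumes locally numbered events, with a checkpoint at index k covering events 1..k. The function name `in_flight_messages` and the tuple layout are hypothetical.

```python
def in_flight_messages(checkpoint, messages):
    """Group in-flight messages by channel.

    checkpoint: process -> local event index of its checkpoint.
    messages: list of (sender, send_index, receiver, recv_index).
    A message is in flight when it was sent before the sender's
    checkpoint and received after the receiver's checkpoint.
    """
    state = {}
    for sender, sent, receiver, received in messages:
        if sent <= checkpoint[sender] and received > checkpoint[receiver]:
            state.setdefault((sender, receiver), []).append(
                (sender, sent, receiver, received)
            )
    return state


cut = {"p": 2, "q": 3}
msgs = [("p", 1, "q", 4), ("p", 2, "q", 2), ("q", 1, "p", 3)]
print(in_flight_messages(cut, msgs))
# {('p', 'q'): [('p', 1, 'q', 4)], ('q', 'p'): [('q', 1, 'p', 3)]}
```

These are the messages that must be saved as channel state, since neither endpoint's saved state reflects their delivery.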

  13. Example • (1) p is receiving messages • (2) p has just saved its state [Figure: two panels showing processes p, q, r, s, t, u with numbered messages 1 to 8 and checkpoints (x)] Next: Example (cont’)

  14. Example (cont’) • p’s chkpnt triggered by a marker from q [Figure: processes p, q, r, s, t, u with numbered messages 1 to 8 and checkpoints (x), shown as a graph and as a timeline] Next: Algorithm (revised)

  15. Algorithm (revised) • Initiator: save its local state; send marker tokens on all outgoing edges • All other processes: on receiving the first marker on any incoming edge, save state and propagate markers on all outgoing edges; then resume execution, but also save the messages arriving on each incoming channel until a marker arrives through that channel • Guarantees a consistent global state! Next: Step 3, end of the algorithm
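The revised algorithm can be sketched as a small single-threaded simulation. The setting is an assumption made for illustration: processes hold an integer balance and messages are transfers, so a consistent snapshot must preserve the total. The names `Network`, `Process`, `send`, `deliver` and `start_snapshot` are hypothetical. The sketch also has the initiator record its incoming channels, which the slide does not spell out.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, balance):
        self.balance = balance    # live application state
        self.saved = None         # state recorded at the checkpoint
        self.recording = {}       # incoming channel -> messages seen so far
        self.channel_state = {}   # incoming channel -> recorded in-flight messages


class Network:
    def __init__(self, balances, links):
        self.procs = {name: Process(b) for name, b in balances.items()}
        self.chan = {link: deque() for link in links}  # (src, dst) -> FIFO queue

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def _checkpoint(self, name):
        p = self.procs[name]
        p.saved = p.balance
        # Start recording on every incoming channel ...
        p.recording = {c: [] for c in self.chan if c[1] == name}
        # ... and send a marker on every outgoing channel.
        for c in self.chan:
            if c[0] == name:
                self.chan[c].append(MARKER)

    def start_snapshot(self, initiator):
        self._checkpoint(initiator)

    def deliver(self, src, dst):
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.saved is None:
                self._checkpoint(dst)  # first marker triggers the checkpoint
            # A marker closes the recording of the channel it arrived on.
            p.channel_state[(src, dst)] = p.recording.pop((src, dst))
        else:
            p.balance += msg
            if (src, dst) in p.recording:
                p.recording[(src, dst)].append(msg)  # in-flight message


net = Network({"p": 100, "q": 100}, [("p", "q"), ("q", "p")])
net.send("q", "p", 10)    # 10 is now in flight on q -> p
net.start_snapshot("p")   # p saves 100 and sends a marker to q
net.deliver("p", "q")     # q saves 90 and sends a marker to p
net.deliver("q", "p")     # p receives 10 and records it as channel state
net.deliver("q", "p")     # marker closes channel q -> p
# Recorded total: 100 (p) + 90 (q) + 10 (in flight) = 200
```

The recorded global state accounts for the full 200 units even though no process was ever blocked.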

  16. Step 3: end of the algorithm • Did every process save its state and in-flight messages? • direct channel to the initiator? • spanning tree? • General solution? [Figure: initiator and processes p, q, r] Next: References
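One possible answer to the first question can be sketched as follows, assuming every process has some way to report back to the initiator (the slide leaves open whether that is a direct channel or a spanning tree). The function names `locally_done` and `snapshot_complete` are hypothetical.

```python
def locally_done(has_saved_state, incoming, markers_received):
    """A process is finished once it has saved its state and a marker
    has arrived on every one of its incoming channels."""
    return has_saved_state and set(incoming) <= set(markers_received)


def snapshot_complete(reports, all_processes):
    """The initiator treats the snapshot as complete when every process
    has reported that it is locally done.

    reports: process -> bool, as collected by the initiator (assumed transport).
    """
    finished = {p for p, done in reports.items() if done}
    return set(all_processes) <= finished


# q has saved its state but is still waiting for a marker on channel c3.
print(locally_done(True, ["c1", "c3"], ["c1"]))                       # False
print(snapshot_complete({"p": True, "q": False}, ["p", "q", "r"]))    # False
print(snapshot_complete({"p": True, "q": True, "r": True}, ["p", "q", "r"]))  # True
```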

  17. References “Distributed snapshots: determining global states of distributed systems”, K. Mani Chandy and Leslie Lamport
