
Hierarchical Coordinated Checkpointing Protocol

Himadri Sekhar Paul, Arobinda Gupta, R. Badrinath. Dept. of Computer Sc. & Engg., Indian Institute of Technology, Kharagpur, INDIA 721302. <hpaul,agupta,badri>@cse.iitkgp.ernet.in


Presentation Transcript


  1. Hierarchical Coordinated Checkpointing Protocol Himadri Sekhar Paul. Arobinda Gupta. R. Badrinath. Dept. of Computer Sc. & Engg. Indian Institute of Technology, Kharagpur, INDIA 721302. <hpaul,agupta,badri>@cse.iitkgp.ernet.in

  2. Motivation • Long-running applications executing on distributed systems. • Metacomputers running over a WAN. • Such systems are prone to failure, so fault tolerance is important. • Checkpoint and recovery is the technique considered.

  3. Motivation • Coordinated checkpointing is a popular scheme. • A coordinated checkpointing protocol is bottlenecked by the slowest link in the network. • The Hierarchical Coordinated Checkpointing Protocol accommodates heterogeneous link speeds, as found in a WAN.
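The bottleneck claim on this slide can be illustrated with a little arithmetic. The following is a minimal sketch, not the authors' model: it assumes a round consists of one request and one acknowledgement per link, ignores propagation delay, and uses a hypothetical 1 kB control message. The 10 Mbps and 1 Mbps link speeds are the ones given in the simulation setup slide.

```python
def transfer_time_s(message_bits, link_mbps):
    """Time to push one message across a link (propagation delay ignored)."""
    return message_bits / (link_mbps * 1_000_000)


def round_time_s(message_bits, link_speeds_mbps):
    """Duration of one request/ack round across all links.

    The coordinator cannot proceed until every acknowledgement has
    arrived, so the round lasts as long as the slowest link takes.
    """
    return max(2 * transfer_time_s(message_bits, s) for s in link_speeds_mbps)


CONTROL_MESSAGE_BITS = 8_000  # hypothetical 1 kB control message

# All participants on fast intra-cluster links.
fast_only = round_time_s(CONTROL_MESSAGE_BITS, [10.0, 10.0, 10.0])

# Same round, but one participant sits behind a slow inter-cluster link.
with_one_slow = round_time_s(CONTROL_MESSAGE_BITS, [10.0, 10.0, 1.0])

print(f"fast only: {fast_only * 1000:.1f} ms, one slow link: {with_one_slow * 1000:.1f} ms")
```

Under these assumptions a single slow link stretches the whole round by the ratio of the two link speeds, regardless of how many fast links take part.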

  4. System Model • Nodes are fail-safe. • The network is immune to partitioning. • Links are unreliable. • All computing nodes are reachable from one another. • The network is hierarchically connected: • Clusters of computing nodes are realized by high-speed networks. • Clusters are interconnected by lower-speed networks.

  5. System Model [Figure labels: Computation Nodes, Cluster.]

  6. Flat Coordinated Checkpointing Protocol (2-phase commit) [Figure: timeline with one Coordinator and two Followers. Messages: Ckpt Rqst, Ack, Ckpt Estb, Ack. Legend: Message, Checkpoint, Process blocked.]
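The message exchange named on this slide can be sketched as a small in-memory simulation. This is an illustrative sketch only: the class and function names are hypothetical, and it assumes that a follower takes a tentative checkpoint and blocks on Ckpt Rqst, then makes the checkpoint permanent and resumes on Ckpt Estb. The slide itself only names the messages and shows a blocked interval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Follower:
    name: str
    state: int = 0                   # stand-in for application state
    tentative: Optional[int] = None  # checkpoint taken but not yet established
    permanent: Optional[int] = None  # last established checkpoint
    blocked: bool = False

    def on_ckpt_rqst(self) -> str:
        # Phase 1 (assumed): save a tentative checkpoint and block the process.
        self.tentative = self.state
        self.blocked = True
        return "Ack"

    def on_ckpt_estb(self) -> str:
        # Phase 2 (assumed): establish the checkpoint and unblock.
        self.permanent = self.tentative
        self.tentative = None
        self.blocked = False
        return "Ack"


def flat_checkpoint(followers):
    """Run both phases; return the (sender, receiver, message) log."""
    log = []
    phases = (("Ckpt Rqst", Follower.on_ckpt_rqst),
              ("Ckpt Estb", Follower.on_ckpt_estb))
    for message, handler in phases:
        # The coordinator starts the next phase only after every
        # follower has acknowledged the current one.
        for f in followers:
            log.append(("Coordinator", f.name, message))
            log.append((f.name, "Coordinator", handler(f)))
    return log


followers = [Follower("P1", state=3), Follower("P2", state=7)]
log = flat_checkpoint(followers)
for entry in log:
    print(entry)
```

Because phase 2 cannot begin until the last phase 1 acknowledgement arrives, every follower stays blocked for as long as the slowest follower takes to answer.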

  7. Hierarchical Coordinated Checkpointing Protocol [Figure: timeline with an Initiator, a Leader, and Followers. Messages: Ckpt_rqst, Ckpt_estb, Ckpt_commit, each with an acknowledgement (AckCkpt_rqst, AckCkpt_estb, AckCkpt_commit). Legend: Message, Checkpoint, Blocked, Blocking at extra-cluster msg.]
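One way to see why a leader role might help is to count how many control messages cross the slow inter-cluster links in a single request/ack round. The sketch below rests on an assumption the slide does not state: that in the hierarchical protocol only cluster leaders exchange messages across clusters and relay them to their own followers, while in the flat protocol the coordinator contacts every process directly. The function names and cluster sizes are hypothetical.

```python
def flat_slow_link_messages(cluster_sizes, coordinator_cluster=0):
    """Inter-cluster messages in one flat request/ack round.

    Every process outside the coordinator's cluster receives one
    request and returns one acknowledgement over a slow link.
    """
    remote = sum(n for i, n in enumerate(cluster_sizes)
                 if i != coordinator_cluster)
    return 2 * remote


def hier_slow_link_messages(cluster_sizes):
    """Inter-cluster messages in one round if only leaders cross clusters.

    Assumption: the initiator sends one request to each remote leader
    and gets one aggregated acknowledgement back from it.
    """
    remote_clusters = len(cluster_sizes) - 1
    return 2 * remote_clusters


clusters = [4, 4, 4]  # hypothetical: three clusters of four processes
flat_count = flat_slow_link_messages(clusters)
hier_count = hier_slow_link_messages(clusters)
print(f"flat: {flat_count} slow-link messages, hierarchical: {hier_count}")
```

The count says nothing about the extra intra-cluster traffic or the third (commit) round, so it is only a rough indication of where the savings could come from.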

  8. Simulation Result • Simulation setup: • Two-level network with an intra-cluster link speed of 10 Mbps and an inter-cluster link speed of 1 Mbps. • The communication pattern of the application is random. • The fraction of extra-cluster application messages is varied. (Flat = Flat Coordinated Checkpointing Protocol; Hier = Hierarchical Coordinated Checkpointing Protocol)

  9. Simulation Result

  10. Conclusion & Future Work • In a two-level hierarchical network, the hierarchical checkpointing protocol incurs less latency than the flat checkpointing protocol, even at very high communication intensity. • The protocol can be extended to a generic hierarchical network.
