
12. Recovery

Study Meeting, M1 Yuuki Horita, 2004/5/14


Presentation Transcript


  1. 12. Recovery Study Meeting M1 Yuuki Horita 2004/5/14

  2. Contents • Introduction • Recovery • Checkpointing • Difficulty of Checkpointing • Synchronous checkpointing / recovery • (Asynchronous checkpointing / recovery)

  3. Introduction • Long computation in distributed environments • High failure rate • Host failure (a lot of hosts) • Network failure One failure may disturb entire computation ⇒ Need to start it again from the beginning • High cost Why don’t we utilize the previous computation? Recovery

  4. Recovery is not easy Suppose that a parallel computation is running on distributed resources:

       for (i = 0; i < MAXITER; i++) {
           local_compute();          // compute at each host
           global_state_exchange();  // communicate with neighbors
       }

     [Figure: per-host timelines labeled with iteration numbers 1, 7, 8]
     • need to save process states periodically • usually other processes have to be restored to a previous state as well • overhead
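A minimal single-process sketch of the first bullet above, in Python: save the loop state periodically and resume from the last saved state after a failure. The function name `run`, the parameters `ckpt_every` and `fail_at`, and the use of `pickle` are assumptions made for illustration; `state += i` stands in for local_compute(). The sketch deliberately ignores the hard part named on the slide, namely that other processes usually have to be restored as well.

```python
import os
import pickle

def run(max_iter, ckpt_path, ckpt_every=10, fail_at=None):
    """Run the loop, checkpointing every `ckpt_every` iterations.

    If a checkpoint file exists, resume from it instead of starting over.
    `fail_at` (hypothetical) simulates a host failure at that iteration.
    """
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            start, state = pickle.load(f)
    else:
        start, state = 0, 0
    for i in range(start, max_iter):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated host failure")
        state += i  # stand-in for local_compute()
        if (i + 1) % ckpt_every == 0:
            tmp = ckpt_path + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump((i + 1, state), f)
            os.replace(tmp, ckpt_path)  # swap in the new checkpoint atomically
    return state
```

After a failure, calling `run` again with the same `ckpt_path` repeats only the iterations since the last checkpoint rather than the whole computation.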

  5. Recovery

  6. Back/Forward Error Recovery • Forward-error recovery • Only when it is possible to remove errors • Enable processes to move forward • Ex) Redundancy, vote • Backward-error recovery • General • Restore to a previous error-free state • Ex) Checkpoint

  7. Backward-error recovery • Operation-based approach • Record all modifications of a process's state • State-based approach • Record the complete state at certain points
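The two approaches can be contrasted with a small Python sketch. The class names `OperationLogged` and `StateSaved`, and the choice of a dictionary as the process state, are assumptions for illustration only.

```python
import copy

_MISSING = object()  # marks "this key did not exist before the write"

class OperationLogged:
    """Operation-based: log every modification so that it can be undone."""
    def __init__(self):
        self.state = {}
        self.log = []  # entries are (key, previous value)

    def write(self, key, value):
        self.log.append((key, self.state.get(key, _MISSING)))
        self.state[key] = value

    def mark(self):
        return len(self.log)

    def roll_back(self, mark):
        # Undo modifications in reverse order until the mark is reached.
        while len(self.log) > mark:
            key, old = self.log.pop()
            if old is _MISSING:
                del self.state[key]
            else:
                self.state[key] = old

class StateSaved:
    """State-based: save the complete state at chosen points."""
    def __init__(self):
        self.state = {}
        self.saved = {}

    def write(self, key, value):
        self.state[key] = value

    def checkpoint(self):
        self.saved = copy.deepcopy(self.state)

    def roll_back(self):
        self.state = copy.deepcopy(self.saved)
```

The first variant pays a small cost on every write; the second pays a larger cost only when a checkpoint is taken.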

  8. State-based approach Terminology • checkpointing: the process of saving state • checkpoint: the recovery point at which checkpointing occurs • rolling back: the process of restoring a process to a prior state

  9. Checkpointing

  10. Problems of naïve checkpointing • Orphan Messages and the Domino Effect • Orphan message: a message whose receipt is recorded although its sending has been undone, which makes the state inconsistent • Domino Effect: a single rollback forces other processes to roll back in turn • Lost Messages • Livelocks

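  11. Orphan message and Domino Effect [Figure: timelines of X (checkpoints x1, x2, x3), Y (y1, y2) and Z (z1, z2); Y rolls back] • Orphan message: after the rollback, Y has not sent the message yet, but X has already received it • Domino Effect: the rollback spreads to other processes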
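The way one rollback drags other processes back can be simulated in a few lines of Python. This is a sketch under simplifying assumptions: all times are on one shared illustrative clock, every process has an initial checkpoint at time 0, and the function name `rollback_cut` and the tuple layout of a message are invented here.

```python
def rollback_cut(checkpoints, messages, failed):
    """Return the time each process ends up restored to.

    checkpoints: {process: sorted list of checkpoint times, starting with 0}
    messages:    list of (sender, send_time, receiver, receive_time)
    failed:      the process that rolls back to its latest checkpoint first
    float("inf") in the result means the process did not roll back.
    """
    cut = {p: float("inf") for p in checkpoints}
    cut[failed] = checkpoints[failed][-1]
    changed = True
    while changed:
        changed = False
        for sender, t_send, receiver, t_recv in messages:
            # Orphan: the send has been undone but the receive is still kept.
            if t_send > cut[sender] and t_recv <= cut[receiver]:
                earlier = [c for c in checkpoints[receiver] if c < t_recv]
                cut[receiver] = earlier[-1] if earlier else 0
                changed = True
    return cut

checkpoints = {"X": [0, 30, 70], "Y": [0, 50], "Z": [0, 40]}
messages = [("Y", 60, "X", 65),   # becomes an orphan when Y rolls back to 50
            ("X", 35, "Z", 38)]   # becomes an orphan when X rolls back to 30
result = rollback_cut(checkpoints, messages, failed="Y")
```

In this made-up run, Y's rollback forces X back to its checkpoint at 30, which in turn forces Z all the way back to its initial state.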

  12. Lost messages [Figure: timelines of X (checkpoints x1, x2, x3), Y (y1, y2) and Z (z1, z2); Y rolls back] • Lost message: X has already sent the message, but Y will never receive it

  13. Livelocks [Figure: timelines of X (checkpoint x1) and Y (checkpoint y1) exchanging messages m1, m2 and n1, n2]

  14. Consistency of Checkpoint • Strongly consistent set of checkpoints: no messages penetrating the set • Consistent set of checkpoints: no messages penetrating the set backward (need to deal with lost messages) [Figure: timelines of X, Y and Z with checkpoints x1, x2, y1, y2, z1, z2; one set of checkpoints is labeled strongly consistent, another consistent]
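A hedged Python sketch of the two definitions: a message penetrates the set backward if it is received before the receiver's checkpoint but sent after the sender's checkpoint, and penetrates it forward if it is sent before and received after. The function name `classify`, the message tuple layout, and the single shared clock are assumptions for illustration.

```python
def classify(cut, messages):
    """Classify a set of checkpoints.

    cut:      {process: checkpoint time}
    messages: list of (sender, send_time, receiver, receive_time)
    """
    backward = [m for m in messages
                if m[1] > cut[m[0]] and m[3] <= cut[m[2]]]   # orphan messages
    forward = [m for m in messages
               if m[1] <= cut[m[0]] and m[3] > cut[m[2]]]    # may become lost
    if backward:
        return "inconsistent"
    return "consistent" if forward else "strongly consistent"
```

For example, `classify({"X": 30, "Y": 50}, [("X", 25, "Y", 55)])` reports a consistent but not strongly consistent set, because the message crosses the checkpoints in the forward direction only.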

  15. Checkpoint/Recovery Algorithm • Synchronous • with global synchronization at checkpointing • Asynchronous • without global synchronization at checkpointing

  16. Preliminary (Assumption) ~Synchronous Checkpoint~ Goal To make a consistent global checkpoint Assumptions • Communication channels are FIFO • No partition of the network • End-to-end protocols cope with message loss due to rollback recovery and communication failure • No failure during the execution of the algorithm

  17. Preliminary (Two types of checkpoint) ~Synchronous Checkpoint~ tentative checkpoint: • a temporary checkpoint • a candidate for permanent checkpoint permanent checkpoint: • a local checkpoint at a process • a part of a consistent global checkpoint

  18. Checkpoint Algorithm ~Synchronous Checkpoint~
      Algorithm
      1. An initiating process (a single process that invokes this algorithm) takes a tentative checkpoint.
      2. It requests all the processes to take tentative checkpoints.
      3. It waits to hear from every process whether it succeeded in taking a tentative checkpoint.
      4. If it learns that all the processes have succeeded, it decides that all tentative checkpoints should be made permanent; otherwise, that they should be discarded.
      5. It informs all the processes of the decision.
      6. The processes that receive the decision act accordingly.
      Supplement: once a process has taken a tentative checkpoint, it must not send messages until it is informed of the initiator's decision.
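The steps above can be sketched as an in-memory simulation in Python. The `Process` class, its `can_checkpoint` failure switch, and the function name `synchronous_checkpoint` are hypothetical; real message passing, timeouts and failures during the algorithm are left out, in line with the assumptions stated on the earlier slide.

```python
class Process:
    def __init__(self, name, state, can_checkpoint=True):
        self.name = name
        self.state = state
        self.permanent = None      # last permanent checkpoint
        self.tentative = None      # pending tentative checkpoint
        self.may_send = True       # False between tentative checkpoint and decision
        self.can_checkpoint = can_checkpoint  # hypothetical failure switch

    def take_tentative(self):
        if not self.can_checkpoint:
            return False
        self.tentative = self.state
        self.may_send = False
        return True

    def apply_decision(self, commit):
        if commit and self.tentative is not None:
            self.permanent = self.tentative
        self.tentative = None
        self.may_send = True

def synchronous_checkpoint(initiator, others):
    # Phase 1: the initiator and then every other process take tentative checkpoints.
    replies = [initiator.take_tentative()]
    replies += [p.take_tentative() for p in others]
    # Phase 2: commit only if everyone succeeded, then inform everyone.
    commit = all(replies)
    for p in [initiator, *others]:
        p.apply_decision(commit)
    return commit
```

Because the decision is all-or-nothing, a single failed tentative checkpoint leaves every process with its previous permanent checkpoint.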

  19. Diagram of Checkpoint Algorithm ~Synchronous Checkpoint~ [Figure: the initiator takes a tentative checkpoint and sends requests to take a tentative checkpoint; the other processes reply OK; the initiator decides to commit and the tentative checkpoints become permanent checkpoints, forming a consistent global checkpoint. One checkpoint in the figure is marked as unnecessary.]

  20. Optimized Algorithm ~Synchronous Checkpoint~
      Each message is labeled in order of sending.
      Labeling Scheme
      • ⊥ : smallest label
      • ⊤ : largest label
      • last_label_rcvd_X[Y] : the label of the last message that X received from Y after X took its last permanent or tentative checkpoint; ⊥ if no such message exists
      • first_label_sent_X[Y] : the label of the first message that X sent to Y after X took its last permanent or tentative checkpoint; ⊥ if no such message exists
      • ckpt_cohort_X : the set of all processes that may have to take checkpoints when X decides to take a checkpoint
      [Figure: timelines of X and Y exchanging messages labeled x2, x3, y1, y2]
      A checkpoint request needs to be sent only to the processes in ckpt_cohort.

  21. Optimized Algorithm ~Synchronous Checkpoint~ ckpt_cohort_X = { Y | last_label_rcvd_X[Y] > ⊥ } • Y takes a tentative checkpoint only if last_label_rcvd_X[Y] >= first_label_sent_Y[X] > ⊥ [Figure: timelines of X and Y marking last_label_rcvd_X[Y] on X and first_label_sent_Y[X] on Y]
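The two conditions translate directly into Python. As an assumption for this sketch, labels are positive integers and ⊥ is represented by 0; the function names are invented here.

```python
BOTTOM = 0  # assumed encoding of the smallest label

def ckpt_cohort(last_label_rcvd_x):
    """Processes Y from which X received a message since X's last checkpoint.

    last_label_rcvd_x: {Y: label of the last message X received from Y}
    """
    return {y for y, label in last_label_rcvd_x.items() if label > BOTTOM}

def must_take_tentative(last_label_rcvd_x_y, first_label_sent_y_x):
    """True if Y has to take a tentative checkpoint on X's request."""
    return last_label_rcvd_x_y >= first_label_sent_y_x > BOTTOM
```

With this encoding, the checks annotated in the deck's diagram of the optimized algorithm (2 >= 0 > 0, 2 >= 1 > 0, 2 >= 2 > 0) evaluate to False, True and True respectively.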

  22. Optimized Algorithm ~Synchronous Checkpoint~
      Algorithm
      1. An initiating process takes a tentative checkpoint.
      2. It requests every p ∈ ckpt_cohort to take a tentative checkpoint (the request carries the sender's last_label_rcvd[receiver]).
      3. If a process that receives the request needs to take a checkpoint, it does the same as steps 1 and 2; otherwise, it returns an OK message.
      4. Each requesting process waits for OK from every p ∈ ckpt_cohort.
      5. If the initiator learns that all the processes have succeeded, it decides that all tentative checkpoints should be made permanent; otherwise, that they should be discarded.
      6. It informs every p ∈ ckpt_cohort of the decision.
      7. The processes that receive the decision act accordingly.
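The request phase of this algorithm (the first three steps) can be sketched in Python as a traversal that follows the cohorts. This is only an illustration: it covers which processes end up taking tentative checkpoints, not the OK replies or the final decision, and the function name, the nested-dictionary layout, and the encoding of ⊥ as 0 are all assumptions.

```python
BOTTOM = 0  # assumed encoding of the smallest label

def tentative_checkpointers(initiator, last_rcvd, first_sent):
    """Return the set of processes that take a tentative checkpoint.

    last_rcvd[x][y]:  label of the last message x received from y since
                      x's last checkpoint (missing or 0 if none)
    first_sent[y][x]: label of the first message y sent to x since
                      y's last checkpoint (missing or 0 if none)
    """
    taking = {initiator}
    pending = [initiator]
    while pending:
        x = pending.pop()
        for y, label in last_rcvd.get(x, {}).items():
            if label <= BOTTOM or y in taking:
                continue  # y is not in x's cohort, or already checkpointing
            if label >= first_sent.get(y, {}).get(x, BOTTOM) > BOTTOM:
                taking.add(y)
                pending.append(y)  # y now repeats the request to its own cohort
    return taking
```

A process that has sent nothing to anyone since its last checkpoint is never added, which is the saving the optimized algorithm aims for.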

  23. Diagram of Optimized Algorithm ~Synchronous Checkpoint~ [Figure: timelines of A, B, C and D exchanging messages ab1, ba1, ba2, ac1, ac2, ca2, cb1, cb2, bd1, cd1, dc1, dc2; tentative checkpoints, OK replies, the decision to commit and the resulting permanent checkpoints are marked, with the checks 2 >= 0 > 0, 2 >= 1 > 0 and 2 >= 2 > 0 annotated] ckpt_cohort_X = { Y | last_label_rcvd_X[Y] > ⊥ } • last_label_rcvd_X[Y] >= first_label_sent_Y[X] > ⊥

  24. Correctness ~Synchronous Checkpoint~ • A set of permanent checkpoints taken by this algorithm is consistent • No process sends messages after taking a tentative checkpoint until it receives the decision • New checkpoints include no message from processes that do not take a checkpoint • The tentative checkpoints are either all made permanent or all discarded

  25. Recovery Algorithm ~Synchronous Recovery~
      Labeling Scheme
      • ⊥ : smallest label
      • ⊤ : largest label
      • last_label_rcvd_X[Y] : the label of the last message that X received from Y after X took its last permanent or tentative checkpoint; ⊥ if no such message exists
      • first_label_sent_X[Y] : the label of the first message that X sent to Y after X took its last permanent or tentative checkpoint; ⊥ if no such message exists
      • roll_cohort_X : the set of all processes that may have to roll back to the latest checkpoint when process X rolls back
      • last_label_sent_X[Y] : the label of the last message that X sent to Y before X took its latest permanent checkpoint; ⊤ if no such message exists

  26. Recovery Algorithm ~Synchronous Recovery~ roll_cohort_X = { Y | X can send messages to Y } • Y will restart from the permanent checkpoint only if last_label_rcvd_Y[X] > last_label_sent_X[Y]
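The rollback condition is a single comparison; a small Python sketch makes the role of ⊤ visible. Encoding ⊥ as 0 and ⊤ as infinity is an assumption of this sketch, as is the function name.

```python
BOTTOM = 0            # assumed encoding of the smallest label
TOP = float("inf")    # assumed encoding of the largest label

def must_roll_back(last_label_rcvd_y_x, last_label_sent_x_y):
    """True if Y has to restart from its permanent checkpoint when X rolls back.

    last_label_rcvd_y_x: last label Y received from X (BOTTOM if none)
    last_label_sent_x_y: last label X sent to Y before its permanent
                         checkpoint (TOP if none, as defined in the deck)
    """
    return last_label_rcvd_y_x > last_label_sent_x_y
```

Y rolls back only when it has received a message from X that X, after restoring its checkpoint, no longer remembers sending.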

  27. Recovery Algorithm ~Synchronous Recovery~
      Algorithm
      1. An initiator requests every p ∈ roll_cohort to prepare to roll back (the request carries the sender's last_label_sent[receiver]).
      2. If a process that receives the request needs to roll back, it does the same as step 1; otherwise, it returns an OK message.
      3. Each requesting process waits for OK from every p ∈ roll_cohort.
      4. If the initiator learns that every p ∈ roll_cohort has succeeded, it decides to roll back; otherwise, not to roll back.
      5. It informs every p ∈ roll_cohort of the decision.
      6. The processes that receive the decision act accordingly.

  28. Diagram of Synchronous Recovery [Figure: timelines of A, B, C and D exchanging messages ab1, ba1, ba2, ac1, ac2, cb1, cb2, bd1, dc1, dc2; requests to roll back, OK replies and the decision to roll back are marked, with the checks 2 > 1, 0 > 1 and 0 > ⊤ annotated] roll_cohort_X = { Y | X can send messages to Y } • last_label_rcvd_Y[X] > last_label_sent_X[Y]

  29. Drawbacks of Synchronous Approach • Additional messages are exchanged • Synchronization delay • An unnecessary extra load on the system if failure rarely occurs

  30. Asynchronous Checkpoint Characteristics • Each process takes checkpoints independently • No guarantee that a set of local checkpoints is consistent • A recovery algorithm has to search for a consistent set of checkpoints • No additional messages • No synchronization delay • Lighter load during normal execution

  31. Preliminary (Assumptions) ~Asynchronous Checkpoint / Recovery~ Goal To find the latest consistent set of checkpoints Assumptions • Communication channels are FIFO • Communication channels are reliable • The underlying computation is event-driven

  32. Preliminary (Two types of log) ~Asynchronous Checkpoint / Recovery~ • An event is saved in memory at the receipt of each message (volatile log) • The volatile log is periodically flushed to disk (stable log) ⇔ checkpoint • volatile log: quick access; lost if the corresponding processor fails • stable log: slow access; not lost even if processors fail
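The split between the two logs can be illustrated with a short Python sketch. The class name `EventLog`, the JSON-lines file format, and the `crash` method that simulates a processor failure are assumptions made for this example.

```python
import json
import os

class EventLog:
    def __init__(self, path):
        self.path = path       # stable log on disk
        self.volatile = []     # volatile log in memory

    def record(self, event):
        """Save an event in memory at the receipt of a message."""
        self.volatile.append(event)

    def flush(self):
        """Append the volatile log to the stable log (a checkpoint)."""
        with open(self.path, "a", encoding="utf-8") as f:
            for event in self.volatile:
                f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to write the data to disk
        self.volatile.clear()

    def crash(self):
        """Simulate a processor failure: memory contents are lost."""
        self.volatile.clear()

    def stable_events(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]
```

Events recorded after the last flush disappear on a crash, which is why recovery restarts from the stable log.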

  33. Preliminary (Definition) ~Asynchronous Checkpoint / Recovery~
      Definition
      • CkPt_i : the checkpoint (stable log) that processor i rolls back to when a failure occurs
      • RCVD_{i←j}(CkPt_i / e) : the number of messages received by processor i from processor j, per the information stored in the checkpoint CkPt_i or event e
      • SENT_{i→j}(CkPt_i / e) : the number of messages sent by processor i to processor j, per the information stored in the checkpoint CkPt_i or event e

  34. Recovery Algorithm ~Asynchronous Checkpoint / Recovery~
      Algorithm
      1. When a process crashes, it recovers to its latest checkpoint CkPt.
      2. It broadcasts a message saying that it has failed. The others receive this message and roll back to their latest event.
      3. Each process sends SENT(CkPt) to its neighboring processes.
      4. Each process waits for SENT(CkPt) messages from every neighbor.
      5. On receiving SENT_{j→i}(CkPt_j) from j, if i notices that RCVD_{i←j}(CkPt_i) > SENT_{j→i}(CkPt_j), it rolls back to the latest event e such that RCVD_{i←j}(e) = SENT_{j→i}(CkPt_j).
      6. Repeat steps 3, 4, and 5 N times (N is the number of processes).
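The repeated comparison of RCVD and SENT counts can be sketched as a round-based simulation in Python. Everything about the data layout is an assumption for illustration: each process has a list of event records with cumulative `sent` and `rcvd` counts, record 0 is the initial state with all counts zero, and `restart` gives the record each process starts recovery from (the checkpoint for the failed process, the latest event for the others).

```python
def async_recover(history, restart):
    """Return the index of the event each process finally rolls back to."""
    current = dict(restart)
    procs = list(history)
    for _ in range(len(procs)):  # N rounds, N = number of processes
        # Every process announces its SENT counts for its current record.
        sent = {(j, i): history[j][current[j]]["sent"].get(i, 0)
                for j in procs for i in procs if i != j}
        for i in procs:
            for j in procs:
                if i == j:
                    continue
                # Step back until i has received no more than j remembers sending.
                while history[i][current[i]]["rcvd"].get(j, 0) > sent[(j, i)]:
                    current[i] -= 1
    return current

# Made-up run: X crashes and restarts from record 1, having lost its second send to Y.
history = {
    "X": [{"sent": {}, "rcvd": {}},
          {"sent": {"Y": 1}, "rcvd": {}},
          {"sent": {"Y": 2}, "rcvd": {}}],
    "Y": [{"sent": {}, "rcvd": {}},
          {"sent": {}, "rcvd": {"X": 1}},
          {"sent": {}, "rcvd": {"X": 2}},
          {"sent": {"Z": 1}, "rcvd": {"X": 2}}],
    "Z": [{"sent": {}, "rcvd": {}},
          {"sent": {}, "rcvd": {"Y": 1}}],
}
final = async_recover(history, {"X": 1, "Y": 3, "Z": 1})
```

In this run Y rolls back in the first round and Z only in the second, once Y's reduced SENT count reaches it, which illustrates why the comparison is repeated rather than done once.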

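  35. Asynchronous Recovery [Figure: timelines of X, Y and Z with events Ex0 to Ex3, Ey0 to Ey3 and Ez0 to Ez2 and checkpoints x1, y1, z1; each process compares its RCVD count with the SENT count reported by each neighbor, with checks such as 2 <= 2, 3 <= 2, 0 <= 0 (at X), 1 <= 2, 1 <= 1 (at Y) and 0 <= 0, 1 <= 1, 2 <= 1 (at Z) annotated] Condition checked: RCVD_{i←j}(CkPt_i) <= SENT_{j→i}(CkPt_j)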
