
Failure Recovery of Overlay Tree-based Structures

Doctoral Thesis. Failure Recovery of Overlay Tree-based Structures. Ing. Vladimír Dynda. Doc. RNDr. Ing. Petr Zemánek, CSc. (supervisor). Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Computer Science and Engineering.



Presentation Transcript


  1. Doctoral Thesis • Failure Recovery of Overlay Tree-based Structures • Ing. Vladimír Dynda • Doc. RNDr. Ing. Petr Zemánek, CSc. (supervisor) • Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Computer Science and Engineering

  2. Agenda • Introduction • Solution • BR Platform • Bypass Routing • Leader Link Election • Tree Reconnection • Summary of Results • Conclusion Vladimír Dynda: Failure Recovery of Overlay Tree-based Structures

  3. Agenda • Introduction • Solution • BR Platform • Bypass Routing • Leader Link Election • Tree Reconnection • Summary of Results • Conclusion

  4. Introduction • Problem statement [Figure: tree T = (TM, CE) in S = (N, L); failed cluster FC; fragments T0 to T6; TR = (TM \ FC, CE')]

  5. Introduction • Problem statement • Failure recovery • Reconnection of T0, T1, ..., TN-1 into a restored network TR = (TM \ FC, CE') • Correctness: TR is acyclic • Completeness: TR contains all the fragments
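
The two recovery conditions on this slide can be checked mechanically. The sketch below is only an illustration and not part of the thesis: it assumes each fragment is already a tree and is represented by an integer index, and that the new links in CE' are given as pairs of fragment indices. The function name `check_reconnection` and this representation are hypothetical. It uses a union-find structure to test that the links close no cycle (correctness) and join every fragment (completeness).

```python
from typing import Iterable, Tuple


def check_reconnection(num_fragments: int,
                       links: Iterable[Tuple[int, int]]) -> Tuple[bool, bool]:
    """Return (correct, complete) for a proposed set of reconnection links.

    Assumption: fragments are numbered 0..num_fragments-1 and each one is
    already a tree, so only the links between fragments can form a cycle.
    """
    parent = list(range(num_fragments))

    def find(i: int) -> int:
        # Walk up to the set representative, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    correct = True
    components = num_fragments
    for a, b in links:
        ra, rb = find(a), find(b)
        if ra == rb:
            correct = False   # this link would close a cycle
        else:
            parent[ra] = rb   # merge the two groups of fragments
            components -= 1
    complete = components == 1
    return correct, complete
```

For example, three links chaining four fragments satisfy both conditions, while dropping one link keeps the result acyclic but leaves a fragment disconnected.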

  6. Introduction • Problem statement • Environment • Asynchronous distributed system • No central authority / no global knowledge • Unlimited sizes of S and T • Arbitrary traffic direction in T • Failures • Node failures only • Fail-stop failure model • Failures must not split S

  7. Introduction • Goals of the thesis • Proposal of a generic recovery platform • Illustration of the tree restoration methods • Simulation & verification of the theoretical properties • Survey of possible applications

  8. Introduction • State of the art • On-demand / preplanned recovery • Preplanned methods • Employ pre-computed backup structures • Existing preplanned methods • Complete graph (Narada) • Ancestor list (Yang-Fei, EFTMRP, HMTP) • Administrative hierarchy (Nice, Nemo) • Secondary trees (Dual-tree, Coop-net) • Link to random nodes (HMTP, Yoid)

  9. Introduction • State of the art • Weaknesses of the existing methods • Poor scalability • Restricted set of applicable trees • Single points of failure • Fixed level of fault tolerance • Unrecoverable multiple failures • Non-local restoration

  10. Agenda • Introduction • Solution • BR Platform • Bypass Routing • Leader Link Election • Tree Reconnection • Summary of Results • Conclusion

  11. BR Platform • Bypass ring platform • Ensures correctness and completeness • Forms a basis for tree reconnection • Fabric of redundant links in T: • Bypass rings of optional diameter • Alternative paths in the event of failure • Location & routing among the fragments

  12. BR Platform • Failure recovery [Figure: stages labelled Bypass rings, Bypass routing, Leader link election, Tree reconnection; tree T = (TM, CE) with failed cluster FC, nodes n1 and n2, bypass rings BRT(n1,2), BRT(n1,3), BRT(n1,4), BRT(n2,2), BC(FC) with its leader, and TR = (TM \ FC, CE')]

  13. BR Platform • Elemental steps of the recovery • Initialization of the platform • Failure detection • Designated nodes discovery • Leader link election • Tree reconnection • Bypass rings reconfiguration [Side labels in the figure: Bypass routing; Correctness & Completeness]

  14. Agenda • Introduction • Solution • BR Platform • Bypass Routing • Leader Link Election • Tree Reconnection • Summary of Results • Conclusion

  15. Bypass Routing • Partially ordered tree (POT) [Figure: tree T = (TM, CE) with labelled nodes (17, CE, E8, 9F, B9, 72, ...), showing the ordered neighbor sequences SeqT(A0) and SeqT(3C), the ordered rays R-(A0,3C) and R+(A0,3C), and BT(A0,3C)]

  16. Bypass Routing • Bypass ring BRT(n, d) [Figure: node n with n0 to n3 in SeqT(n); rays R+(n,ni) and R-(n,ni) and BT(n,ni) for i = 0..3; bypass rings BRT(n,2), BRT(n,3), BRT(n,4) = BRT(n,dmax), with dmax = 4]

  17. Bypass Routing • Bypass rings [Figure: ray R+(n,n1) through n, n1, n2, n3, n4, n5, ..., ndmax in T = (TM, CE), with FC and BT(n,n1); bypass rings BRT(n1,2), BRT(n1,3), BRT(n2,4), BRT(n2,5), ..., BRT(nm,dmax)]

  18. Bypass Routing • Routing algorithm • <FC>T = BT(ni, nj), nj AT(ni) FC (operator symbols missing in the transcript) [Figure: T = (TM, CE) with FC and BC(FC) through ni1, nj1, ni2, nj2, ni3, nj3; BT(ni1,nj1), BT(ni2,nj2), BT(ni3,nj3); ray R+(ni1,nj1)]

  19. Bypass Routing • Example [Figure: the example tree T = (TM, CE) with FC, BC(FC), the ray R+(72,3C), and the bypass rings BRT(3C,2), BRT(3C,3), BRT(A0,4)]

  20. Bypass Routing • Properties • Memory overhead at node n ∈ T: O(degT(n) * dmax) • Routing is successful if lenX(ni, ni+1) ≤ dmax, X = R+(ni, nj), for all neighbors ni and ni+1 ∈ BC(FC) • Lower bound of maximum size of FC: dmax/2 nodes for arbitrary clusters

  21. Agenda • Introduction • Solution • BR Platform • Bypass Routing • Leader Link Election • Tree Reconnection • Summary of Results • Conclusion

  22. Leader Link Election • Leader link election (LLE) • Guarantees correctness • Communication structure: BC(FC) • Node states • Passive: initial state of the election • Active: leader candidates • Relay: election is lost

  23. Leader Link Election • LLE on ordered rings [Figure: ring BC(FC) = BRT(n,2) formed by n0 to nN-1 from SeqT(n) around FC, with ID(n0) < ID(n1) < ... < ID(nN-1); messages ELECTION(n0) and ELECTION(n1) with the comparisons ID(n0) < ID(n1) and ID(n1) < ID(n2); label "ID(nN-1) < ID(n0)" next to the leader marker; <FCAT(FC)>]
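
One plausible reading of this slide, assumed here rather than confirmed by the transcript, is that on a ring whose IDs ascend as ID(n0) < ID(n1) < ... < ID(nN-1), the leader link is the single link where the ascending order breaks, that is, the wrap-around from nN-1 back to n0 (the comparison symbol on the "ID(nN-1) < ID(n0)" label may have been lost in extraction). The sketch below finds that link centrally from a list of IDs; it does not model the ELECTION messages or the node states, and the function name `leader_link` is hypothetical.

```python
from typing import List, Tuple


def leader_link(ring_ids: List[int]) -> Tuple[int, int]:
    """Return the single ring link (a, b) with ID(a) > ID(b).

    Assumption: ring_ids lists distinct node IDs in ring order and is a
    rotation of an ascending sequence, so exactly one such link exists.
    """
    n = len(ring_ids)
    # Collect every link whose successor has a smaller ID than its origin.
    descents = [(ring_ids[i], ring_ids[(i + 1) % n])
                for i in range(n)
                if ring_ids[i] > ring_ids[(i + 1) % n]]
    if len(descents) != 1:
        raise ValueError("ring is not an ordered ring")
    return descents[0]
```

Because every node can evaluate the comparison with its successor locally, exactly one node finds the order violated, which is what makes the choice unambiguous under this reading.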

  24. Leader Link Election • LLE in partially ordered trees [Figure: sweep process and hierarchical identifiers HIDT(nr,ni) on BC(FC) along R+, with SeqT(nr), SeqT(A1) and FC; HIDT(4F,D8) = 4F.A1.BA.D8, HIDT(4F,97) = 4F.A1.BA.97, HIDT(4F,16) = 4F.A1.16; messages ELECTION(4F.*), SWEEP(4F.A1), ELECTION(A1.BA.97); comparison A1.BA < A1.16; leader marked; <FCAT(FC)>]

  25. Leader Link Election • Example [Figure: the example tree T = (TM, CE) with FC and nr; messages ELECTION(3C.A0.1D), SWEEP(3C.A0), ELECTION(A0.B9.CE); comparisons 3C.A0 < 3C.A0 and A0.B9 < A0.1D; leader marked; <FCAT(FC)>]

  26. Leader Link Election • Properties • Average message complexity: O(N log_b N), where b is the average branching factor of FC nodes in T • Time complexity: O(N)

  27. Agenda • Introduction • Solution • BR Platform • Bypass Routing • Leader Link Election • Tree Reconnection • Summary of Results • Conclusion

  28. Tree Reconnection • Reconnection methods • Reconnect the fragments located by the routing algorithm • Abide by the results of LLE • Designed to meet the specific application requirements • Influence properties of the restored tree

  29. Tree Reconnection • LR method [Figure: BC(FC) with n1, n2, n3]

  30. Tree Reconnection • HR-x method • Link rule: (q0, qi) if i ≡ 1 (mod x); (qi-1, qi) otherwise [Figure: HR-1 on BC(FC) with n1, n2, n3 and node labels q0 to q5]

  31. Tree Reconnection • HR-x method • Link rule: (q0, qi) if i ≡ 1 (mod x); (qi-1, qi) otherwise [Figure: HR-2 on BC(FC) with n1, n2, n3]
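
The HR-x link rule can be tried out on a small sequence. This is a sketch under stated assumptions, not the thesis algorithm: it reads the garbled condition as i ≡ 1 (mod x), assumes the nodes to be linked form a sequence q0, q1, ..., q(k-1), and assumes one link is created for each i from 1 to k-1. The function name `hr_links` and the index-pair output are hypothetical.

```python
from typing import List, Tuple


def hr_links(k: int, x: int) -> List[Tuple[int, int]]:
    """Index pairs (j, i), each meaning a link between q_j and q_i.

    Assumption: nodes are q_0 .. q_(k-1) and a link is made for i = 1..k-1.
    """
    if k < 1 or x < 1:
        raise ValueError("k and x must be positive")
    links = []
    for i in range(1, k):
        if (i - 1) % x == 0:           # i is congruent to 1 modulo x
            links.append((0, i))       # attach q_i directly to q_0
        else:
            links.append((i - 1, i))   # attach q_i to its predecessor
    return links
```

Under this reading, `hr_links(5, 1)` gives a star around q0, and `hr_links(5, 2)` gives chains of length two hanging off q0. Every q_i with i > 0 links to exactly one lower-indexed node, so the result is always a tree on the k nodes.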

  32. Tree Reconnection • Example [Figure: the example tree with FC; messages ELECTION(3C.A0.1D), SWEEP(3C.A0), ELECTION(A0.B9.CE); reconnection by HR-2 into TR = (TM \ FC, CE'); <FCAT(FC)>]

  33. Tree Reconnection • Properties

  34. Tree Reconnection • Properties

  35. Agenda • Introduction • Solution • BR Platform • Bypass Routing • Leader Link Election • Tree Reconnection • Summary of Results • Conclusion

  36. Summary of Results • Properties of the BR platform • Node memory overhead: • O(degT(n) * dmax) • Average message complexity: • O(N log_b N) for arbitrary failures • N for single failures • Lower bound of max. recoverable failure: • dmax/2 nodes for arbitrary failed clusters • dmax-1 nodes for internal failed clusters
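
To make the trade-off controlled by dmax concrete, the bounds on this slide can be evaluated for specific values. The helper names below are hypothetical, the memory figure is only the growth term inside the O(...) with constant factors ignored, and rounding dmax/2 down for odd dmax is an assumption the slides do not state.

```python
def node_memory_entries(degree: int, dmax: int) -> int:
    """Growth term deg_T(n) * dmax of the per-node memory overhead."""
    return degree * dmax


def min_recoverable_cluster(dmax: int, internal: bool = False) -> int:
    """Lower bound on the largest recoverable failed cluster, in nodes.

    Assumption: dmax/2 is rounded down when dmax is odd.
    """
    return dmax - 1 if internal else dmax // 2
```

With dmax = 4, the value the slides call ample for GFS, a node of degree 3 keeps on the order of 12 entries, and the bounds come to 2 nodes for arbitrary failed clusters and 3 nodes for internal ones.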

  37. Summary of Results • Simulation results • Successfully recovered cluster • Average diameter: dmax-2 • Average size: 1.5 dmax • Linear recovery time • dmax parameter • Controls fault-tolerance vs. costs • dmax = 4 provides ample tolerance for GFS

  38. Summary of Results • Properties of the platform • Locality • Multiple failure recovery • Scalability • Application requirements consideration • Optional level of fault tolerance • Protection selectivity • Designated nodes discovery • Tree reconnection method • Independence of the protected tree type

  39. Summary of Results • Applications • Overlay multicast • Applicable in all types • Network-layer multicast • Extension with BR(n,1) needed • Sample application: GFS multicast • Designed for large-scale P2P systems • Based on a layered administrative hierarchy • Employs BR platform to achieve fault-tolerance

  40. Agenda • Introduction • Solution • BR Platform • Bypass Routing • Leader Link Election • Tree Reconnection • Summary of Results • Conclusion

  41. Conclusion • Thesis summary • Analysis of overlay trees environment and identification of recovery properties • Proposal of BR platform • Design of the specialized leader election • Illustration of the tree reconnection • Simulation of the platform • Outline of the overlay multicast scheme

  42. Conclusion • Ideas for further research • Autonomous management of fault-tolerance level and protection selectivity • More sophisticated tree reconnection methods • Extension of the platform for network-layer multicast

  43. Thank You
