
Scalable Failure Recovery for Tree-based Overlay Networks


Presentation Transcript


1. Scalable Failure Recovery for Tree-based Overlay Networks
Dorian C. Arnold, University of Wisconsin
Paradyn/Condor Week, April 30 – May 3, 2007, Madison, WI

2. Overview
• Motivation
  • Address the likely frequent failures in extreme-scale systems
• State Compensation
  • Tree-based overlay network (TBŌN) failure recovery
  • Use surviving state to compensate for lost state
  • Leverage TBŌN properties: inherent information redundancies
  • Weak data consistency model: convergent recovery
    • Final output stream converges to the non-failure case
    • Intermediate output packets may differ
    • Preserves all output information

3. HPC System Trends
• 60% of systems are larger than 10^3 processors
• 10 systems are larger than 10^4 processors

4. Large Scale System Reliability
[Figure: reliability data for BlueGene/L and Purple]

5. Current Reliability Approaches
• Fail-over (hot backup)
  • Replace the failed primary with a backup replica
  • Extremely high overhead: 100% minimum!
• Rollback recovery
  • Roll back to a checkpoint after failure
  • May require dedicated resources and lead to overloaded network/storage resources

6. Our Approach: State Compensation
• Leverage inherent TBŌN redundancies
  • Avoid explicit replication
  • No overhead during normal operation
• Rapid recovery
  • Limited process participation
• General recovery model
  • Applies to broad classes of computations

7. Background: TBŌN Model
[Figure: an application front-end (FE) at the root, a tree of communication processes (CP) beneath it, and application back-ends (BE) at the leaves; each CP holds filter state and applies a packet filter. TBŌN input enters at the back-ends and TBŌN output emerges at the front-end.]
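The model on this slide can be captured in a few lines of code. The sketch below only illustrates the structure as described (a front-end at the root, a tree of communication processes, back-ends at the leaves, with filter state and input channels at each CP); all type and field names are invented for this example and are not the MRNet API.

```cpp
// Illustrative sketch of the TBON structure described above; not the MRNet API.
#include <memory>
#include <vector>

struct Packet { std::vector<int> values; };              // data flowing up the tree

struct CommProcess {
    std::vector<std::unique_ptr<CommProcess>> children;  // empty at back-end leaves
    std::vector<Packet> channel_state;                   // packets received but not yet filtered
    std::vector<int>    filter_state;                    // state held by the aggregation filter
};

// The application front-end sits at the root of the tree of communication
// processes; TBON input enters at the back-ends and TBON output leaves the root.
using FrontEnd = CommProcess;
```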

8. A Trip Down Theory Lane
• Formal treatment provides confidence in the recovery model
  • Reason about semantics before/after recovery
  • Recovery model doesn't change the computation
• Prove algorithmic soundness
• Understand recovery model characteristics
• System reliability cannot depend upon intuition and ad hoc reasoning

9. Theory Overview
• TBŌN end-to-end argument: output depends only on state at the end-points
• Can recover from the loss of any internal filter and channel states
[Figure: tree with CP_i at the root, CP_j below it, and leaves CP_k and CP_l, annotated with internal filter state and channel state]

10. Theory Overview (cont'd)
• TBŌN Output Theorem: output depends only on the channel states and the root's filter state
• All-encompassing Leaf State Theorem: the state at the leaves subsumes the channel state (all state throughout the TBŌN)
• Result: only leaf state is needed to recover from root/internal failures
[Figure: tree of CP_i, CP_j, CP_k, CP_l annotated with filter state and channel state]

11. Theory Overview (cont'd)
• TBŌN Output Theorem
• All-encompassing Leaf State Theorem
• Builds on the Inherent Redundancy Theorem

12. Background: Notation
[Figure: tree with CP_i at the root above CP_j, whose children are CP_k and CP_l, labeled with filter states fs_n(CP_i) and fs_p(CP_j) and channel states cs_{n,p}(CP_i) and cs(CP_j)]

13. Background: Data Aggregation
Filter function: f( in_n(CP_i), fs_n(CP_i) ) → ( out_n(CP_i), fs_{n+1}(CP_i) )
The filter takes the packets from the input channels and the current filter state, and produces an output packet and an updated filter state.
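A minimal sketch of this interface, assuming a max-aggregation filter purely as an example; the type names and `max_filter` are hypothetical, chosen only to mirror the reconstructed signature (packets from the input channels and the current filter state in, an output packet and the updated filter state out).

```cpp
// Sketch of f( in_n(CP_i), fs_n(CP_i) ) -> ( out_n(CP_i), fs_{n+1}(CP_i) ),
// using "maximum seen so far" as an example aggregation. Illustrative only.
#include <algorithm>
#include <utility>
#include <vector>

struct Packet { std::vector<int> values; };
using FilterState = std::vector<int>;   // here: empty, or one running maximum

std::pair<Packet, FilterState>
max_filter(const std::vector<Packet>& inputs, FilterState state)
{
    // Fold every value waiting on the input channels into the running maximum.
    for (const Packet& p : inputs) {
        for (int v : p.values) {
            if (state.empty())
                state.push_back(v);
            else
                state[0] = std::max(state[0], v);
        }
    }
    Packet out{state};     // out_n: the current maximum
    return {out, state};   // fs_{n+1}: state carried into the next round
}
```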

14. Background: Filter Function
• Built on state join and difference operators
• State join operator ⊔ updates the current state by merging inputs: fs_n(CP_i) ⊔ in_n(CP_i) → fs_{n+1}(CP_i)
• Commutative: a ⊔ b = b ⊔ a
• Associative: (a ⊔ b) ⊔ c = a ⊔ (b ⊔ c)
• Idempotent: a ⊔ a = a
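To make the three properties concrete, the sketch below uses set union as the join; set union is just one operator that happens to be commutative, associative, and idempotent, not the join any particular filter is required to use.

```cpp
// Set union as an example state join: commutative, associative, idempotent.
#include <cassert>
#include <set>

using State = std::set<int>;

State join(const State& a, const State& b)
{
    State out = a;
    out.insert(b.begin(), b.end());
    return out;
}

int main()
{
    State a{1, 2}, b{2, 3}, c{4};
    assert(join(a, b) == join(b, a));                    // commutative
    assert(join(join(a, b), c) == join(a, join(b, c)));  // associative
    assert(join(a, a) == a);                             // idempotent
    return 0;
}
```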

15. Background: Descendant Notation
[Figure: desc^0(CP_i) is CP_i itself, desc^1(CP_i) its children, and so on down through desc^{k-1}(CP_i) to the leaves desc^k(CP_i)]
• fs( desc^k(CP_i) ): join of the filter states of the specified processes
• cs( desc^k(CP_i) ): join of the channel states of the specified processes
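A small sketch of this notation, again using set union as the join: `desc` collects the processes k levels below a node and `fs_of_level` joins their filter states. The node layout and function names are invented for illustration.

```cpp
// desc^k(root): the processes k levels below `root`.
// fs(desc^k(root)): the join of their filter states (set union here).
#include <set>
#include <vector>

using State = std::set<int>;

struct CP {
    State filter_state;
    std::vector<const CP*> children;   // non-owning, for illustration only
};

State join(const State& a, const State& b)
{
    State out = a;
    out.insert(b.begin(), b.end());
    return out;
}

std::vector<const CP*> desc(const CP& root, int k)
{
    if (k == 0) return {&root};
    std::vector<const CP*> level;
    for (const CP* child : root.children)
        for (const CP* cp : desc(*child, k - 1))
            level.push_back(cp);
    return level;
}

State fs_of_level(const CP& root, int k)
{
    State acc;
    for (const CP* cp : desc(root, k))
        acc = join(acc, cp->filter_state);
    return acc;
}

int main()
{
    CP leaf1{{1, 2}, {}}, leaf2{{3}, {}};
    CP root{{}, {&leaf1, &leaf2}};
    return fs_of_level(root, 1) == State{1, 2, 3} ? 0 : 1;  // fs(desc^1(root))
}
```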

16. TBŌN Properties: Inherent Redundancy Theorem
The join of a CP's filter state with its pending channel state equals the join of the CP's children's filter states:
fs_n(CP_0) ⊔ cs_{n,p}(CP_0) ⊔ cs_{n,q}(CP_0) = fs_p(CP_1) ⊔ fs_q(CP_2)
[Figure: CP_0 with children CP_1 and CP_2]
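This invariant can be checked on a toy case. The sketch below simulates one parent (CP0) with two children (CP1, CP2) using the set-union join from the earlier sketch; the concrete states are made up solely to illustrate the theorem: whatever a child has folded into its filter state is either still in flight to the parent or already folded into the parent's filter state.

```cpp
// Toy check of the Inherent Redundancy Theorem with set union as the join.
#include <cassert>
#include <set>

using State = std::set<int>;

State join(const State& a, const State& b)
{
    State out = a;
    out.insert(b.begin(), b.end());
    return out;
}

int main()
{
    State fs_cp1{1, 2, 3};   // child CP1's filter state
    State fs_cp2{4, 5};      // child CP2's filter state

    // CP1's output has reached CP0 and been filtered; CP2's output is still
    // pending on CP0's input channel.
    State fs_cp0{1, 2, 3};   // parent CP0's filter state
    State cs_cp0{4, 5};      // CP0's pending channel state

    // fs(CP0) joined with cs(CP0) equals fs(CP1) joined with fs(CP2).
    assert(join(fs_cp0, cs_cp0) == join(fs_cp1, fs_cp2));
    return 0;
}
```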

17. TBŌN Properties: Inherent Redundancy Theorem
The join of a CP's filter state with its pending channel state equals the join of the CP's children's filter states:
fs( desc^0(CP_0) ) ⊔ cs( desc^0(CP_0) ) = fs( desc^1(CP_0) )
[Figure: CP_0 with children CP_1 and CP_2]

18. TBŌN Properties: All-encompassing Leaf State Theorem
The join of the states from a sub-tree's leaves equals the join of the states at the sub-tree's root and all in-flight data.
[Figure: three-level tree with root CP_0, internal nodes CP_1 and CP_2, and leaves CP_3, CP_4, CP_5, CP_6]

19. TBŌN Properties: All-encompassing Leaf State Theorem
The join of the states from a sub-tree's leaves equals the join of the states at the sub-tree's root and all in-flight data:
fs( desc^k(CP_0) ) = fs( desc^0(CP_0) ) ⊔ cs( desc^0(CP_0) ) ⊔ … ⊔ cs( desc^{k-1}(CP_0) )
From the Inherent Redundancy Theorem, applied level by level:
fs( desc^1(CP_0) ) = fs( desc^0(CP_0) ) ⊔ cs( desc^0(CP_0) )
fs( desc^2(CP_0) ) = fs( desc^1(CP_0) ) ⊔ cs( desc^1(CP_0) )
…
fs( desc^k(CP_0) ) = fs( desc^{k-1}(CP_0) ) ⊔ cs( desc^{k-1}(CP_0) )

20. State Composition
• Motivated by the previous theory
  • State at the leaves of a sub-tree subsumes state throughout the higher levels
• Compose state below failure zones to compensate for lost state
  • Addresses root and internal failures
• State decomposition for leaf failures
  • Generate the child's state from the parent's and siblings' state

21. State Composition
• If CP_j fails, all state associated with CP_j is lost
• TBŌN Output Theorem: output depends only on the channel states and the root filter state
• All-encompassing Leaf State Theorem: state at the leaves subsumes the channel state (all state throughout the TBŌN)
• Therefore, leaf states can replace the lost channel state without changing the computation's semantics
[Figure: CP_i above failed CP_j and its children CP_k and CP_l, annotated with cs(CP_i, m), fs(CP_j), and cs(CP_j)]

22. State Composition Algorithm
if detect child failure
    remove failed child from input list
    resume filtering from non-failed children
endif
if detect parent failure
    do
        determine / connect to new parent
    while failure to connect
    propagate filter state to new parent
endif
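Below is a hedged C++ rendering of the pseudocode above. Every type and helper (`ChildId`, `ParentConnection`, `pick_new_parent`, `try_connect`, and so on) is a placeholder invented for this sketch rather than part of MRNet or the authors' implementation; the intent is only to show the shape of the two failure-handling paths.

```cpp
// Sketch of the state composition algorithm; all names are hypothetical stubs.
#include <optional>
#include <vector>

struct ChildId          { int rank; };
struct ParentConnection { /* transport handle omitted */ };
struct FilterState      { std::vector<int> values; };

class CommProcess {
public:
    // "if detect child failure": drop the dead channel and keep filtering
    // over the surviving children; the theory above guarantees their
    // descendants' state compensates for the loss.
    void on_child_failure(ChildId failed) {
        remove_from_input_list(failed);
    }

    // "if detect parent failure": retry connecting to a new parent until one
    // accepts, then propagate this process's filter state to it.
    void on_parent_failure() {
        std::optional<ParentConnection> parent;
        do {
            parent = try_connect(pick_new_parent());
        } while (!parent);
        send_filter_state(*parent, filter_state_);
    }

private:
    FilterState filter_state_;

    // Stubs standing in for real membership, connection, and transport code.
    void remove_from_input_list(ChildId) {}
    ChildId pick_new_parent() { return ChildId{0}; }
    std::optional<ParentConnection> try_connect(ChildId) { return ParentConnection{}; }
    void send_filter_state(ParentConnection&, const FilterState&) {}
};
```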

23. Summary: Theory can be Good!
• Allows us to make recovery guarantees
• Yields sensible, understandable results
• Better-informed implementation
  • What needs to be implemented
  • What does not need to be implemented

24. References
• Arnold and Miller, "State Compensation: A Scalable Failure Recovery Model for Tree-based Overlay Networks", UW-CS Technical Report, February 2007.
• Other papers and software: http://www.paradyn.org/mrnet

25. Bonus Slides!

26. State Composition Performance
recovery latency = (connection establishment × max adoptees) + output overhead
LAN connection establishment: ~1 millisecond
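A worked instance of the formula as reconstructed above: only the ~1 ms LAN connection-establishment figure comes from the slide; the adoptee count and output overhead below are assumed numbers, included just to show how the terms combine.

```cpp
// Worked example of: recovery latency =
//   (connection establishment x max adoptees) + output overhead.
#include <cstdio>

int main()
{
    const double connection_establishment_ms = 1.0;  // LAN figure from the slide
    const int    max_adoptees                = 4;    // assumed: most orphans adopted by one new parent
    const double output_overhead_ms          = 0.5;  // assumed

    const double recovery_latency_ms =
        connection_establishment_ms * max_adoptees + output_overhead_ms;

    std::printf("estimated recovery latency: %.1f ms\n", recovery_latency_ms);
    return 0;
}
```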

27. Failure Model
• Fail-stop
• Multiple, simultaneous failures
  • Failure zones: regions of contiguous failure
• Application process failures
  • May be viewed as sequential data sources/sinks
  • Amenable to basic reliability mechanisms: simple restart, sequential checkpointing
