1 / 36

Asynchronous rebalancing of a replicated tree

Asynchronous rebalancing of a replicated tree. Marek Zawirski INRIA & UPMC, France Marc Shapiro INRIA & LIP6, France Nuno Preguiça UNL, Portugal CFS E , May 2011, Saint-Malo. Summary. Overview of Treedoc: Abstractly, a lways- responsive replicated sequence

nenet
Download Presentation

Asynchronous rebalancing of a replicated tree

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Asynchronous rebalancingof a replicated tree MarekZawirskiINRIA & UPMC, France Marc Shapiro INRIA & LIP6, France Nuno Preguiça UNL, Portugal CFSE, May 2011, Saint-Malo

  2. Summary • Overview of Treedoc: • Abstractly, always-responsivereplicated sequence • Built as a replicated ordering tree • Problem faced: • Tree unbalance • Solution for asynchronous tree rebalance • Algorithm requirements statement • Novel F-translate algorithm Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  3. Treedoc – a replicated sequence replica2 replica1 I I 0 1 0 1 • Replicated representation: • Grow-only binary tree • Stable, unique position ids L P L P Total order ”<”: infix traversal =1 P • Sequence of atoms: • Ops: read, addAt, removeAt L I P replica3 I 0 1 L P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  4. Treedoc – a replicated sequence replica2 replica1 I I 0 1 0 1 L P L P L I P replica3 I S 0 1 addAt(, S) < < S I S P L P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  5. Treedoc – a replicated sequence replica2 replica1 I I 0 1 0 1 L P L P 0 S L I S P replica3 I 0 1 addAt(, S) =10 < < S S I S P L P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  6. Treedoc – a replicated sequence replica2 replica1 I I 0 1 0 1 L P L P 0 S X L I S P replica3 I 0 1 addAt(, S) S L P removeAt( ) P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  7. Treedoc – a replicated sequence replica2 replica1 I I 0 1 0 1 L P L P 0 S L I S replica3 I 0 1 addAt(, S) S L P removeAt( ) P [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  8. Treedoc – a replicated sequence replica2 replica1 • Operation-based replication: • Immediate local execution • Propagate (cbcast) & replay I I 0 1 0 1 L P L P 0 0 S S L I S replica3 I 0 1 addAt(, S) addAt(, S) addAt(, S) S S S L P 0 removeAt( ) P S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  9. Treedoc – a replicated sequence replica2 replica1 • Operation-based replication: • Immediate local execution • Propagate (cbcast) & replay I I 0 1 0 1 L P L P 0 0 S S L I S replica3 I 0 1 addAt(, S) S L P 0 removeAt( ) removeAt( ) removeAt( ) P P P S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  10. Treedoc – a replicated sequence replica2 replica1 • Operation-based replication: • Immediate local execution • Propagate (cbcast) & replay I I 0 1 0 1 L P L P 0 0 0 0 E S A S addAt(, A) addAt(, A) A A E L I S replica3 addAt(, E) I E 0 1 L P ? 0 S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  11. Treedoc – a replicated sequence replica2 replica1 • Operation-based replication: • Immediate local execution • Propagate (cbcast) & replay I I 0 1 0 1 L P L P 0 0 0 0 S A S E A addAt(, A) addAt(, A) addAt(, A) A A A A E L I S replica3 addAt(, E) addAt(, E) addAt(, E) I E E E 0 1 L P 0 Predefined order:red <green< blue… S [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  12. Treedoc – a replicated sequence replica2 replica1 • Operation-based replication: • Immediate local execution • Propagate (cbcast) & replay • Concurrent commute • Eventually consistent I I 0 1 0 1 L P L P 0 0 0 0 S S E A E A A E L I S replica3 I 0 1 L P 0 0 Predefined order:red <green< blue… S E A [Shapiro, Preguiça et. al, 2007, 2009] Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  13. The tree rebalance problem • With time tree gets worse and worse • Unbalanced, empty nodes, lot of colors… • Various negative impacts I 0 1 rebalance L P 0 0 S E A • Tree rebalance: • Create minimal tree from nonempty nodes • Keep order “<“ • Use single color (white) • New ids (rectangles), incompatible with old L 0 1 • Challenge: • How to avoid system-wide consensus? E S 0 1 Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree A I

  14. The core-nebula architecture Idea: limit consensus to a smaller number of replicas Divide replicas into two disjoint sets: [Lețiaet. al, 2009] CORE • a stable group • execute tree operations& agree on rebalance • easier agreement NEBULA • sites join & leave, dynamic • generate tree operations • learns about rebalance • perform catch-upprotocolto integrate conc. changes • never blocked Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  15. Rebalance in core, catch-up from nebula core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 L P P L P P L P P L P P 1 1 1 1 S S S S addAt(, S) addAt(, S) addAt(, S) addAt(, S) S S S S • Any pair of replicas can exchange operations in the same epoch Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  16. Rebalance in core, catch-up from nebula core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 L P P L P P L P P L P P 1 1 1 1 S S S S • Any pair of replicas can exchange operations in the same epoch removeAt( ) removeAt( ) removeAt( ) removeAt( ) P P P P Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  17. Rebalance in core, catch-up from nebula core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 L I P L I P L I P L I P 1 1 1 1 S S S S • Any pair of replicas can exchange operations in the same epoch removeAt( ) I Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  18. Rebalance in core, catch-up from nebula core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 rebalance L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U addAt(, P) addAt(, P) addAt(, U) addAt(, U) final state U P U P • Any pair of replicas can exchange operations in the same epoch • rebalance@coreinitiates new epoch • rebalance@coreand operations@coreinherently concurrent to ops@nebula! S 0 L 0 T addAt(, T) T Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  19. Rebalance in core, catch-up from nebula core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 rebalance L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U addAt(, U) addAt(, P) addAt(, P) final state P U P X • Pairwise catch-up movesnebula replica to the next epoch S ? 0 L 0 T addAt(, T) T Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  20. Rebalance in core, catch-up from nebula core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 rebalance L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state • Pairwise catch-up movesnebula replica to the next epoch • replay core ops until finalstate S 0 L removeAt( ) 0 I T Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  21. Rebalance in core, catch-up from nebula core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 rebalance L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state catch-up • Pairwise catch-up movesnebula replica to the next epoch • replay core ops until finalstate • replay rebalance on final nodes • translate nodes of nebula operations || rebalanceinto the new tree S S 0 0 L L 0 1 ? T P Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  22. Naive translation algorithm(s) core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 rebalance L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state catch-up S S • Naive translation algorithm: Create new position respecting old order observed at the nebula replica 0 0 L L 0 1 < < ~[Lețiaet. al, 2009] L P S T P < < S L P Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  23. Naive translation algorithm(s) core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 rebalance L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state catch-up S S 0 0 L L removeAt( ) 0 I 1 T P Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  24. Naive translation algorithm(s) core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 rebalance L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state catch-up catch-up S S S 0 0 0 L L L 0 1 1 T P U < < L U S Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  25. Naive translation algorithm(s) core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 rebalance L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state catch-up catch-up S S S 0 0 0 L L L 0 1 1 T P U addAt(, P) addAt(, P) P U Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  26. Naive translation algorithm(s) core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 rebalance L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state catch-up catch-up Order observed at nebula2 broken! S S S 0 0 0 L L L < 0 U P 1 1 < P P U U T P U Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  27. Towards correct translate: requirements • Order-preserving • For every , the order is preserved between epochs: < => < • Deterministic • For every , nebulai, nebulaj, is translated identically: @nebulai = @nebulaj • Non-disruptive • For every created by addAtand created by translate: != Solution: designate all cases in advance using final state only! X Y X Y Y X X X X X X Y X Y Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  28. F-Translatealgorithm & sentinels core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state • Sentinel position : • Designated position for every potential translate • < < … < < • Materialized on translate S Xi S 0 0 L L S1 0 X X2 X1 Y Xim T L1 L2 L3 L4 Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  29. F-Translatealgorithm & sentinels core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state F-Translate: • Right child of final node S S 0 0 L L S1 0 T L1 L1 L2 L3 L4 1 U Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  30. F-Translatealgorithm & sentinels core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state F-Translate: • Child of empty f. node • Etc. S S S 0 0 0 L L S1 L S1 0 T L1 L2 L3 L3 L4 L1 L1 L2 L3 L4 0 1 P U Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  31. F-Translatealgorithm & sentinels core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state S S S 0 0 0 L L S1 L S1 0 T L1 L2 L3 L3 L4 L1 L1 L2 L3 L4 removeAt( ) I 0 1 P U Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  32. F-Translatealgorithm & sentinels core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state S S S S 0 0 0 0 L L S1 L S1 L S1 0 T L1 L2 L3 L3 L4 L1 L1 L2 L3 L3 L4 L1 L1 L2 L3 L4 0 1 0 1 P U P U Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  33. F-Translatealgorithm & sentinels core1 nebula1 nebula2 nebula3 I I I I 0 1 0 1 0 1 0 1 L I P L I P L I P L I P 1 0 0 1 1 1 1 1 S P S U S P S U final state S S S S 0 0 0 0 L L S1 L S1 L S1 0 0 0 0 T T L1 L1 L2 L3 L3 L4 T L1 L1 L2 L3 L3 L4 T L1 L1 L2 L3 L3 L4 L1 L1 L2 L3 L3 L4 1 0 1 0 0 1 1 0 P U U P P U P U Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  34. F-Translate: sentinel implementation • How to implement sentinelAfter? • TODO: REPLCACE w/figure!!! • show if time allows & audience is not lost • Empty nodes are necessary! • Is more empty nodes discarded than introduced?Yes, but… • How to minimize empty nodes in sentinelAfter • Using balanced binary tree! • Define addAt such that it avoids empty nodes, long paths • Two-round empty node discard X O(1) number of empty nodes / path + O(log imax) / path Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  35. Summary • Problem faced: • Tree unbalance • Solution for asynchronous tree rebalance • Algorithm requirements statement • Novel F-translate algorithm: • Identify and utilize final state prior to rebalance • Use sentinel positions • Prototype catch-up implementation • Future work? • Evaluation of sentinelAfterimplementation • Formal order-preservation proof Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

  36. Appendix: the unbalance problem • Use sparse tree and heuristic to assign PosID[Weiss et. al, ‘09] • Tested on Wikipedia traces; works at the cost of possible anomaly • Use list instead of a tree [Roh et. al, ‘10] • Different costs and convergence characteristics? • Rebalance the tree [Shapiro, Preguiça et. al, ‘09] • System-wide consensus;inherent limitations • The core-nebula idea [Lețiaet. al ‘09]; incorrect translation • This work brings: • More formalization of the core-nebula for asynchronous systems • Flaws revealed in naive algorithms • Translation requirements statement • Novel F-translate algorithm • First prototype implementation in Java (subject to further studies) Zawirski, Shapiro, Preguiça- Asynchronous rebalancing of a replicated tree

More Related