
Minimizing Churn in Distributed Systems

P. Brighten Godfrey, Scott Shenker, and Ion Stoica. UC Berkeley. SIGCOMM’06.


Presentation Transcript


  1. Minimizing Churn in Distributed Systems P. Brighten Godfrey, Scott Shenker, and Ion Stoica UC Berkeley SIGCOMM’06

  2. Road Map • Introduction • Simulation • Basic Properties • Analysis • Applications • Discussion • Conclusion

  3. Introduction • Churn • Change in the set of participating nodes due to joins, graceful leaves, and failures • A quantitative guide to the churn resulting from node selection strategies • Analytically characterize the performance of strategies • Compare the performance of strategies on different real traces

  4. Road Map • Introduction • Simulation • Basic Properties • Analysis • Applications • Discussion • Conclusion

  5. Churn Simulation Model • System Model • Node status • Up (in use, or available), down • Nodes in use • Definition of churn • Example • Two nodes fail and are replaced by others
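The churn formula itself is not preserved in this transcript. The sketch below assumes one plausible reading of the slide: churn is the sum, over successive changes to the in-use set, of the fraction of in-use nodes that changed, divided by the run time. The function name `churn` and its arguments are hypothetical.

```python
def churn(in_use_sets, duration):
    """Assumed definition: per-event fraction of changed in-use nodes, summed
    over all events and normalized by the run time."""
    total = 0.0
    for prev, cur in zip(in_use_sets, in_use_sets[1:]):
        changed = len(prev ^ cur)            # nodes that entered or left the in-use set
        size = max(len(prev), len(cur))
        if size:
            total += changed / size
    return total / duration


# Slide example: two of four in-use nodes fail and are replaced by others.
before = {"a", "b", "c", "d"}
after = {"a", "b", "e", "f"}
print(churn([before, after], duration=10.0))
```

Under this reading, a strategy that keeps the same nodes in use for longer scores lower churn for the same run time.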

  6. Selection Strategies • Predictive fixed strategies • Fixed Decent: select randomly from the 50% of nodes with the most uptime • Fixed Most Available: the most time up • Fixed Longest Lived: greatest average session time • Agnostic fixed strategies • Fixed Random • Predictive replacement strategies • Max Expectation: greatest expected remaining uptime • Longest Uptime: longest current uptime • Optimal • Agnostic replacement strategies • Random Replacement (RR) • Passive Preference List: fail and then replace • Active Preference List
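To make the difference between replacement strategies concrete, here is a minimal sketch of how three of them might pick a replacement when an in-use node fails. The function names and the `up_since` bookkeeping are illustrative assumptions, not taken from the talk.

```python
import random


def random_replacement(available, rng=random):
    """Agnostic: pick any available node uniformly at random."""
    return rng.choice(sorted(available))


def longest_uptime(available, up_since, now):
    """Predictive: pick the available node whose current session is longest."""
    return max(sorted(available), key=lambda node: now - up_since[node])


def passive_preference_list(available, preference):
    """Agnostic: pick the first available node in a fixed preference order."""
    for node in preference:
        if node in available:
            return node
    raise LookupError("no available node")


available = {"a", "b"}
print(longest_uptime(available, {"a": 5.0, "b": 1.0}, now=10.0))
print(passive_preference_list(available, ["c", "a", "b"]))
```

An Active Preference List would additionally switch back to a preferred node as soon as it recovers, which this sketch does not model.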

  7. Traces • Synthetic traces • Pareto session-time PDF • Exponent a = 1.5 and b fixed so that the mean is 30 minutes
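The PDF is not preserved in this transcript, so the exact Pareto form is unknown. As an illustration only, the sketch below assumes the classical Pareto distribution with shape a and scale b, whose mean is a·b/(a − 1) for a > 1; a shifted variant would give a different b. The helper names are hypothetical.

```python
import random


def pareto_scale_for_mean(shape, mean):
    """Scale b of a classical Pareto(shape, b) with the requested mean."""
    if shape <= 1:
        raise ValueError("the mean is finite only for shape > 1")
    return mean * (shape - 1) / shape


def sample_session_times(count, shape=1.5, mean=30.0, seed=0):
    """Draw session times in minutes under the classical-Pareto assumption."""
    rng = random.Random(seed)
    scale = pareto_scale_for_mean(shape, mean)
    # random.paretovariate returns samples with scale 1, so rescale by b.
    return [scale * rng.paretovariate(shape) for _ in range(count)]


print(pareto_scale_for_mean(1.5, 30.0))
print(sample_session_times(3))
```

With a = 1.5 the variance is infinite, so sample means converge slowly toward 30 minutes.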

  8. Simulation Setup • Event-based simulator • Selection algorithm reacts immediately after each change • Chord protocol simulator • No message loss, except when a node fails while the datagram is in flight • At least 10 trials • Sample 1000 random nodes • 95% confidence intervals
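The slide does not say how the 95% confidence intervals were computed. One common choice for a small number of trials, assumed here, is a Student-t interval around the mean; the constant below is the standard two-sided 95% critical value for 9 degrees of freedom, which matches 10 trials.

```python
import math
import statistics

T_CRITICAL_DF9 = 2.262  # two-sided 95% Student-t value for 10 trials (assumption)


def confidence_interval_95(samples, t_critical=T_CRITICAL_DF9):
    """Return (low, high) of a t-based interval around the sample mean."""
    mean = statistics.mean(samples)
    half_width = t_critical * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - half_width, mean + half_width


# Hypothetical churn values from 10 trials, for illustration only.
trials = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.0]
print(confidence_interval_95(trials))
```

For a different number of trials, `t_critical` would need to be replaced with the value for the matching degrees of freedom.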

  9. Basic Properties • Synthetic Pareto lifetimes • Fixed k = 50 • All fixed strategies perform the same • All nodes have the same mean session time

  10. Benefit of Replacement Strategies • 1.3~5 times improvement • Dynamically selecting nodes for long-running distributed applications would be worthwhile

  11. Benefit of Replacement Strategies • The best fixed strategies match the performance of the best replacement strategy • The traces are shorter

  12. Agnostic Strategies • RR is worse for small k, but is within a factor of 2 of Max Expectation • RR is 1.2~3 times better than Passive PL and 2.5~10 times better than Active PL

  13. Road Map • Introduction • Simulation • Basic Properties • Analysis • Applications • Discussion • Conclusion

  14. Analysis of Fixed and PL Strategies • Fixed strategies • Nodes recover instantaneously • Each failure and recovery is counted, normalized by time • The number of failures of a node • Expected churn • Passive Preference List strategies • If k is large, the same as fixed strategies • Active Preference List strategies • Pays more, because it switches back after the node recovers
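The expected-churn expression is not preserved in this transcript. The sketch below is a back-of-the-envelope version under stated assumptions: recovery is instantaneous, node i fails at rate 1/μ_i where μ_i is its mean session time, and each failure and each recovery changes a 1/k fraction of the k in-use nodes. It is not claimed to be the authors' formula.

```python
def expected_fixed_churn(mean_session_times):
    """Assumed model: (2 / k) * sum(1 / mu_i) over the k fixed nodes."""
    k = len(mean_session_times)
    if k == 0:
        raise ValueError("need at least one node")
    return (2.0 / k) * sum(1.0 / mu for mu in mean_session_times)


# Two fixed nodes with 30-minute mean sessions: churn per minute.
print(expected_fixed_churn([30.0, 30.0]))
```

Under this model, picking nodes with longer mean sessions lowers churn directly, which is why predictive fixed strategies try to estimate μ_i.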

  15. Analysis of Random Replacement • Intuition • Waiting time paradox • RR is (roughly) selecting the current session of a random node • This is biased towards longer sessions • RR does very badly when stable nodes are rare • One node with mean r >> 1 and the others with mean 1 • Churn of RR is about 2 and that of the best fixed strategy is • Churn rate
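The waiting time paradox can be checked numerically. The sketch below lays one node's sessions end to end, probes random instants, and compares the mean length of the session in progress with the mean length of all sessions. Exponential session times and the parameter values are assumptions chosen for illustration.

```python
import bisect
import random


def inspection_paradox(num_sessions=100_000, num_probes=10_000, mean=1.0, seed=0):
    """Return (mean session length, mean length of the session seen at a random time)."""
    rng = random.Random(seed)
    sessions = [rng.expovariate(1.0 / mean) for _ in range(num_sessions)]
    ends, clock = [], 0.0
    for length in sessions:
        clock += length
        ends.append(clock)                    # end time of each back-to-back session
    observed = []
    for _ in range(num_probes):
        probe = rng.uniform(0.0, clock)
        index = min(bisect.bisect_right(ends, probe), num_sessions - 1)
        observed.append(sessions[index])      # session that contains the probe time
    return sum(sessions) / num_sessions, sum(observed) / num_probes


average, seen = inspection_paradox()
print(average, seen)
```

The session seen at a random instant is longer on average because long sessions cover more of the timeline, which is the bias RR benefits from.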

  16. Analysis of Random Replacement • Agreement of the analysis with a simulation for n = 20 and the previous Pareto-distributed session time plot

  17. Characteristics of Random Replacement • X’ is more skewed than X • If E[X’] = E[X], then • x’ and x are the yth percentile values of X’ and X • The churn of RR decreases as the distributions become more “skewed” • If the session time distributions are stable and have equal mean, RR’s expected churn is at most twice the expected churn of any fixed or Preference List strategy

  18. Road Map • Introduction • Simulation • Basic Properties • Analysis • Applications • Discussion • Conclusion

  19. Anycast • Whenever its current server fails, a client obtains a list of the m servers to which it has the lowest latency and connects to a random one of these m • Switching to another server is not counted • Latencies were obtained from a synthetic edge network delay space generator • It is modeled on measurements of latency between DNS servers
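The client-side rule on this slide can be sketched as follows. The function name, the latency table, and the choice to consider only servers that are currently up are assumptions made for illustration.

```python
import random


def pick_server(latency_ms, up, m, rng=random):
    """Choose uniformly among the m lowest-latency servers that are up."""
    live = [server for server in latency_ms if server in up]
    candidates = sorted(live, key=lambda server: latency_ms[server])[:m]
    if not candidates:
        raise LookupError("no server is up")
    return rng.choice(candidates)


latency_ms = {"a": 10.0, "b": 20.0, "c": 30.0, "d": 5.0}
up = {"a", "b", "c"}                      # "d" is closest but currently down
print(pick_server(latency_ms, up, m=2))
```

With m = 1 the rule always returns the closest live server, and larger m spreads the choice over more servers, closer to Random Replacement.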

  20. Anycast • Trade-off between server list size m and latency t • t increases => Passive PL • m increases => RR • Hybrid: • ω decreases: Passive PL to Longest Uptime

  21. Anycast • When session time is small, the end host experiences the mean server failure rate, as in Active PL

  22. DHT Neighbor Selection • Long-distance neighbors • Deterministic topology (Active PL) • Randomized topology (RR) • Simulation • Sample n nodes from Gnutella • Feed into Chord protocol simulator • Two nodes send a message to the node owning a single key • The trial fails when both messages are lost

  23. DHT Neighbor Selection • Randomized topologies are more stable, but have slightly longer routes • Randomized topologies can also reduce maintenance bandwidth

  24. Multicast • Select one of m suitable nodes as parent • Suitable: has available bandwidth to serve another child • Strategies • Longest Uptime, Minimum Depth, Minimum Latency • Homogeneous bandwidth
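The three parent-selection strategies differ only in the key they optimize over the suitable candidates. A minimal sketch follows, with a hypothetical `Candidate` record and strategy names spelled as strings for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    uptime: float      # length of the current session
    depth: int         # hops from the multicast root
    latency: float     # latency from the joining node
    free_slots: int    # children this node can still serve


def choose_parent(candidates, strategy):
    """Filter to suitable nodes, then apply the named strategy."""
    suitable = [c for c in candidates if c.free_slots > 0]
    if not suitable:
        raise LookupError("no suitable parent")
    if strategy == "longest_uptime":
        return max(suitable, key=lambda c: c.uptime)
    if strategy == "minimum_depth":
        return min(suitable, key=lambda c: c.depth)
    if strategy == "minimum_latency":
        return min(suitable, key=lambda c: c.latency)
    raise ValueError(f"unknown strategy: {strategy}")


nodes = [
    Candidate("a", uptime=120.0, depth=3, latency=40.0, free_slots=1),
    Candidate("b", uptime=15.0, depth=1, latency=90.0, free_slots=2),
    Candidate("c", uptime=300.0, depth=2, latency=10.0, free_slots=0),
]
print(choose_parent(nodes, "longest_uptime").name)
```

Node "c" is never chosen here because it has no spare bandwidth, regardless of how well it scores on uptime or latency.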

  25. Multicast

  26. DHT Replica Placement • Root set (Passive PL) • Nodes with IDs closest to the key (object) keep the replicas • Root directory (RR) • The directory is replicated on the root set • Replicas may be on any node in the system • Simulation • Lazy replication • On equal footing
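A root set can be illustrated as the nodes whose IDs lie nearest the key. The sketch below assumes a circular ID space and symmetric ring distance; a real DHT such as Chord uses its own notion of closeness (for example, successors of the key), so treat the distance function and the 16-bit ID space as placeholders.

```python
def root_set(node_ids, key, replicas, id_space=2 ** 16):
    """Return the `replicas` node IDs closest to `key` on an assumed ring."""
    def ring_distance(node_id):
        direct = abs(node_id - key) % id_space
        return min(direct, id_space - direct)

    # Break ties by node ID so the result is deterministic.
    return sorted(node_ids, key=lambda n: (ring_distance(n), n))[:replicas]


print(root_set([10, 200, 65530, 4000], key=5, replicas=2))
```

Because the set is determined by the key alone, a node that fails is replaced by the next closest one, which is the Passive Preference List behavior the slide refers to.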

  27. DHT Replica Placement • There are many permanent failures in Gnutella traces

  28. Road Map • Introduction • Simulation • Basic Properties • Analysis • Applications • Discussion • Conclusion

  29. Discussion • When would one use Random Replacement? • Minimize churn • Longest Uptime • RR would be easier to implement • Uptime is not easy to determine • Network problems, lying nodes • What about load balance? • The results do not address fairness between users

  30. Road Map • Introduction • Simulation • Basic Properties • Analysis • Applications • Discussion • Conclusion

  31. Conclusion • A guide to the performance of a range of node selection strategies in real-world traces • Highlight and explain analytically the good performance of RR relative to smart strategies • Explain the performance implications of a variety of existing distributed systems designs
