Study tradeoffs in node failure detection in P2P networks to optimize control resources while reducing false positives. Analyze techniques and models for efficient failure detection.
Exploring Tradeoffs in Failure Detection in P2P Networks • Shelley Zhuang, Ion Stoica, Randy Katz • Sahara Retreat, January 2003
Problem Statement • One of the key challenges to achieve robustness in overlay networks: quickly detect a node failure • Canonical solution: each node periodically pings its neighbors • Study the fundamental limitations and tradeoffs between detection time, control overhead, and probability of false positives • Determine the optimal control resource allocation strategy for a given network topology, failure rate, and load distribution
Network Model • P2P system with n nodes • Each node A knows d other nodes • Average path length = l
Failure Model • Failure rate of each node is λf • Node up-time ~ i.i.d. T = exponential(λf) • Failstop failures • If a neighbor is lost, a node can use another neighbor to route the packet w/o affecting the path length
Packet Loss Probability • δ = average time it takes a node to detect that a neighbor has failed • Probability that a node forwards a packet to a neighbor that has failed is $1 - e^{-\lambda_f \delta} \approx \delta \lambda_f$ (using $P(T - t \le \delta \mid T > t) = P(T \le \delta)$) • Probability that the packet is lost is $p_l \approx l \delta \lambda_f$
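The loss estimate on this slide can be checked numerically. The sketch below is illustrative only and is not taken from the talk: the function names are hypothetical, and the multi-hop formula assumes that each of the l hops independently risks forwarding to a failed neighbor.

```python
import math


def stale_forward_probability(failure_rate, detection_time):
    """Exact P(T <= delta) for exponential up-time: 1 - exp(-lambda_f * delta)."""
    return 1.0 - math.exp(-failure_rate * detection_time)


def packet_loss_exact(failure_rate, detection_time, path_length):
    """Probability that at least one hop forwards to a failed neighbor.

    ASSUMPTION (not stated on the slide): hops fail independently.
    """
    p_hop = stale_forward_probability(failure_rate, detection_time)
    return 1.0 - (1.0 - p_hop) ** path_length


def packet_loss_approx(failure_rate, detection_time, path_length):
    """First-order estimate from the slide: p_l ~ l * delta * lambda_f."""
    return path_length * detection_time * failure_rate


# Example values (hypothetical): one failure per hour on average,
# 10 s detection time, 5-hop paths.
lam, delta, hops = 1.0 / 3600.0, 10.0, 5
print(packet_loss_exact(lam, delta, hops), packet_loss_approx(lam, delta, hops))
```

The two values stay close as long as δλf is small, which is the regime in which the linear approximation is meant to be used.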
Aliveness Techniques • Baseline • Each node sends a ping message to each of its neighbors every Δ seconds
Aliveness Techniques • Information Sharing • Piggyback failures of neighbors in acknowledgement messages • Best case: completely connected graph of degree d
Aliveness Techniques • Information Sharing with Boosting • When a node detects the failure of a neighbor, D, it announces the failure to all other nodes that have D as their neighbor • Best case: completely connected graph of degree d
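To see why boosting can shorten detection time without extra probes, the following Monte Carlo sketch estimates how long it takes until the first of D's neighbors notices the failure. It is a simplified model, not the analysis from the talk. It assumes that every neighbor pings D every Δ seconds with an independent, uniformly random phase, that a failed ping is recognized immediately (no timeout), and that the announcement reaches the other neighbors instantly. The function name and parameters are hypothetical.

```python
import random


def mean_first_detection(num_neighbors, ping_interval, trials=20000, seed=0):
    """Average time from D's failure until the first neighbor's next ping.

    ASSUMPTIONS: independent uniform ping phases, zero ping timeout,
    instantaneous failure announcement.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each neighbor's next ping lands uniformly in [0, ping_interval)
        # after the failure; with boosting, the earliest ping informs everyone.
        total += min(rng.uniform(0.0, ping_interval) for _ in range(num_neighbors))
    return total / trials


# One neighbor corresponds to the baseline (about interval / 2 on average);
# more neighbors sharing detections bring the average toward interval / (d + 1).
for d in (1, 4, 9):
    print(d, round(mean_first_detection(d, 10.0), 3))
```

Under these assumptions the expected minimum of d uniform phases is Δ/(d+1), so the benefit grows with the number of nodes that share D as a neighbor.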
Case Studies • d-regular network • Chord (PROBE_TO_THRESH) • Constant overhead: S probes every T seconds • Δ = Td/S • Tradeoff between loss probability and size of the neighbor set, d
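The fixed-budget relation Δ = Td/S can be combined with the loss estimate p_l ≈ l δ λf to explore the tradeoff in d. The sketch below is a rough illustration under two assumptions that are not in the slides: the baseline detection time is taken as δ = Δ/2, and the path length is modeled as l = log(n)/log(d). All names are hypothetical.

```python
import math


def probe_interval(window_seconds, probes_per_window, degree):
    """Per-neighbor ping interval under a fixed budget: Delta = T * d / S."""
    return window_seconds * degree / probes_per_window


def approx_loss(failure_rate, window_seconds, probes_per_window, degree, num_nodes):
    """Loss estimate p_l ~ l * delta * lambda_f for a given neighbor-set size."""
    interval = probe_interval(window_seconds, probes_per_window, degree)
    detection_time = interval / 2.0  # ASSUMPTION: baseline detects after Delta / 2 on average
    path_length = math.log(num_nodes) / math.log(degree)  # ASSUMPTION: l = log_d(n)
    return path_length * detection_time * failure_rate


# Hypothetical budget: 30 probes per 60 s, one failure per hour, 1024 nodes.
for degree in (2, 4, 8, 16, 32):
    print(degree, approx_loss(1.0 / 3600.0, 60.0, 30.0, degree, 1024))
```

A larger neighbor set shortens paths but stretches the interval between pings to any one neighbor, which is the tension the slide points to.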
d-Regular Network Packet Loss Probability
Chord Packet Loss Probability • Sharing w/ boosting (simple)
Chord Probability of False Positive • baseline: 3, 0.0000337177 • boosting: 10, 0.0000121711
Conclusion • Analyzed packet loss probability in a d-regular network • Examined four keep-alive techniques in Chord • By carefully designing keep-alive algorithms, it is possible to significantly reduce packet loss probability w/o additional control overhead • Boosting can achieve both lower packet loss probability and probability of false positive than baseline