Peer Pressure: Distributed Recovery in Gnutella

Presentation Transcript

1. Peer Pressure: Distributed Recovery in Gnutella • Pedram Keyani, Brian Larson, Muthukumar Senthil • Computer Science Department, Stanford University

2. Introduction • Gnutella is a P2P file sharing protocol • The issue we are addressing is distributed recovery from malicious attacks in Gnutella • Our solution is a mechanism for proactive failure detection and recovery • Our experimental process and models • The fruits of our labor: RESULTS!

3. Failure in Gnutella • Nodes in Gnutella can fail for any number of reasons • Failure of the 4% most highly connected nodes fragments the network to the point where it is unusable • The details are given in work by Stefan Saroiu

4. Scale Free Networks (Gnutella, Internet) • Abide by a power law where • # of nodes of degree N is proportional to $N^{-\lambda}$ • $\lambda$ is observed to be roughly 2.3 • Scale free networks are highly resilient to large scale random failures but vulnerable to malicious attacks on the most highly connected, well-known nodes
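
To make the power law on this slide concrete, here is a minimal sketch that normalizes N^(-lambda) into a probability distribution over node degrees. The function name and the `max_degree` cutoff are assumptions made for illustration; the slides give only the exponent (roughly 2.3).

```python
def degree_distribution(max_degree: int, lam: float = 2.3) -> dict[int, float]:
    """Return P(degree = n) proportional to n**(-lam) for n in 1..max_degree.

    max_degree is a hypothetical cutoff; lam = 2.3 is the value on the slide.
    """
    weights = {n: n ** (-lam) for n in range(1, max_degree + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


dist = degree_distribution(max_degree=100)
# Low-degree nodes dominate; high-degree hubs are rare but present.
print(f"P(degree=1)  = {dist[1]:.3f}")
print(f"P(degree=10) = {dist[10]:.5f}")
```

Under this model a few hubs carry a large share of the links, which is why removing them deliberately does far more damage than removing nodes at random.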

5. Exponential Networks • Connections between nodes are random • The absence of preferential connections ensures that no single node holds the entire network together • They react the same way to malicious attacks and random failures

6. Scale Free and Exponential

7. Our Hypothesis To allow Gnutella to recover from malicious attacks, nodes must plan for failures by discovering and maintaining backup connections that form an exponential network. These backups are used to replace dead neighbors in the case of a malicious attack.

8. Recovery Method • Build and maintain a virtual exponential network connecting all the nodes • Accomplish this through random node discovery • Detect malicious attacks on active network • Switch over to exponential network

9. Random Node Discovery • Problem: no centralized name authority to give a truly random node • Solution: use random walks through the network to arrive at random node • Random Discovery Ping (RDP) is forwarded to only one of a node’s neighbors, selected in such a way to give a random distribution • RDPs use a hop count of 20, roughly equal to the network diameter
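
The random walk behind the Random Discovery Ping could be sketched roughly as follows. The slide does not say how the forwarding neighbor is selected, so the uniform choice here is an assumption and only a placeholder: a plain uniform walk tends to end at well-connected nodes more often than at others. The graph representation and function name are also hypothetical.

```python
import random


def random_discovery_ping(graph, start, hops=20, rng=random):
    """Walk `hops` steps from `start`, forwarding to one neighbor per step.

    graph maps a node id to a list of neighbor ids. hops=20 is the hop count
    from the slide; the uniform neighbor choice is an assumption.
    """
    current = start
    for _ in range(hops):
        neighbors = graph[current]
        if not neighbors:
            break  # dead end: the walk stops at an isolated node
        current = rng.choice(neighbors)
    return current


graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c"],
}
print(random_discovery_ping(graph, "a", rng=random.Random(0)))
```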

10. Maintenance of Virtual Exponential Network • Each node discovers N random nodes, where N is the minimum number of connections the node wants to maintain • Then periodically ping these nodes to make sure they are alive • Discover new neighbors to replace them should they die
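
One maintenance round as described on this slide might look like the sketch below. The class name, the `discover` and `is_alive` callables, and the `max_attempts` cap are all hypothetical; in the authors' system, discovery would be an RDP and the liveness check a ping.

```python
import random


class BackupSet:
    """Hypothetical holder for a node's virtual (backup) neighbors."""

    def __init__(self, target_size, discover, is_alive):
        self.target_size = target_size  # N from the slide
        self.discover = discover        # callable returning a random node id
        self.is_alive = is_alive        # callable: node id -> bool (a ping)
        self.backups = set()

    def maintain(self, max_attempts=100):
        """One round: drop dead backups, then refill up to target_size."""
        self.backups = {n for n in self.backups if self.is_alive(n)}
        attempts = 0
        while len(self.backups) < self.target_size and attempts < max_attempts:
            candidate = self.discover()
            attempts += 1
            if self.is_alive(candidate):
                self.backups.add(candidate)
        return self.backups


rng = random.Random(0)
alive = {f"n{i}" for i in range(20)}
backup_set = BackupSet(
    target_size=3,
    discover=lambda: f"n{rng.randrange(20)}",
    is_alive=lambda n: n in alive,
)
backup_set.maintain()
alive -= set(list(backup_set.backups)[:1])  # one backup dies
backup_set.maintain()                       # it is replaced on the next round
print(sorted(backup_set.backups))
```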

11. Failure Detection • Random failures result in loss of 1st degree neighbors • Malicious attacks result in greater loss of 2nd degree neighbors than 1st degree • Keep a history (30 seconds) of 1st and 2nd degree neighbor loss • If 2nd degree loss exceeds 1st degree loss and a threshold (50%), mark as malicious
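
The detection rule can be sketched as a sliding-window classifier. The slide does not define how "loss" is measured, so this sketch assumes it is the fraction of known 1st or 2nd degree neighbors lost within the 30-second window, and that losses are recorded in timestamp order. The class and method names are hypothetical.

```python
from collections import deque


class FailureDetector:
    WINDOW_SECONDS = 30.0  # history length from the slide
    THRESHOLD = 0.5        # 50% threshold from the slide

    def __init__(self):
        self.events = deque()  # (timestamp, degree), degree is 1 or 2

    def record_loss(self, timestamp, degree):
        """Record the loss of one 1st or 2nd degree neighbor."""
        self.events.append((timestamp, degree))

    def classify(self, now, first_total, second_total):
        """Return 'malicious', 'random', or 'none' for the current window."""
        # Discard losses older than the history window.
        while self.events and self.events[0][0] < now - self.WINDOW_SECONDS:
            self.events.popleft()
        lost1 = sum(1 for _, d in self.events if d == 1)
        lost2 = sum(1 for _, d in self.events if d == 2)
        if lost1 == 0 and lost2 == 0:
            return "none"
        # Assumption: loss is a fraction of the neighbors known at that degree.
        frac1 = lost1 / first_total if first_total else 0.0
        frac2 = lost2 / second_total if second_total else 0.0
        if frac2 > frac1 and frac2 > self.THRESHOLD:
            return "malicious"
        return "random"


detector = FailureDetector()
for t in range(6):
    detector.record_loss(timestamp=float(t), degree=2)
detector.record_loss(timestamp=6.0, degree=1)
# 6 of 10 second-degree neighbors lost versus 1 of 4 first-degree neighbors.
print(detector.classify(now=10.0, first_total=4, second_total=10))
```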

12. Reacting to Failures • For each neighbor lost, replace it with a node from the virtual exponential network • Only nodes local to an attack will switch, preserving the rest of the network structure • Do not attempt to discover additional random nodes during an attack • When attack is deemed to be over, return to normal operations
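
The switch-over step might be sketched as below: each lost neighbor is replaced from the backup list, and no new discovery is started while the attack lasts. The function and parameter names are hypothetical, and leaving replacement to the client's normal behavior when no attack is detected is an assumption.

```python
def react_to_losses(active, backups, lost, under_attack):
    """Return updated (active, backups) after handling lost neighbors.

    active: set of current neighbor ids; backups: ordered list of backup ids.
    """
    active = set(active) - set(lost)
    backups = [b for b in backups if b not in active and b not in lost]
    if under_attack:
        for _ in lost:
            if not backups:
                break  # no spare left; discovery stays paused during an attack
            active.add(backups.pop(0))
    return active, backups


active, backups = react_to_losses(
    active={"hub1", "hub2", "p7"},
    backups=["r3", "r9", "r4"],
    lost={"hub1", "hub2"},
    under_attack=True,
)
print(sorted(active), backups)
```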

13. P2P Simulator • Generalized P2P network simulator • Handles message routing, time management • Support for bringing nodes up or down, injecting failures, logging • Also created a compatible Gnutella client, and our enhanced Gnutella client • About 5k lines of Java

14. Modeling Gnutella • No standard way to do this • Protocol only specifies message formats • Clients free to implement other aspects • Some degree of standardization • We used the most common client in our simulation model - Bearshare

15. Bootstrapping • How do nodes connect in our simulation? • The now-defunct www.gnutellahosts.com • Maintained a list of highly available, well-connected nodes • Clients connected by receiving the address of one of these nodes • Bearshare clients do something similar • Connect to the service “public.bearshare.net” • Keep a range of neighbors (3-10)

16. Uptime Distribution • How long do nodes stay up in our simulation? • Modeled by a power law function • Most nodes are up for a short period of time, few are up for a long period • Many users just sign off after getting their content • Most users are dialup users • Within a reasonable time slice, nodes have uptimes following the power law distribution
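
A power-law uptime model of this kind can be sampled with a Pareto distribution, as in the sketch below. The slides do not give the parameters the authors used, so `min_uptime` and `alpha` are placeholder values chosen only for illustration.

```python
import random


def sample_uptimes(count, min_uptime=60.0, alpha=1.5, seed=None):
    """Draw `count` uptimes in seconds from a Pareto (power-law) distribution.

    min_uptime and alpha are hypothetical; they are not taken from the slides.
    """
    rng = random.Random(seed)
    return [min_uptime * rng.paretovariate(alpha) for _ in range(count)]


uptimes = sample_uptimes(10_000, seed=42)
median = sorted(uptimes)[len(uptimes) // 2]
mean = sum(uptimes) / len(uptimes)
# The heavy tail pulls the mean above the median: most sessions are short,
# a few are very long.
print(f"median = {median:.0f} s, mean = {mean:.0f} s")
```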

17. Our Experiments • Ran with recovery method and without • No failures – just ran our simulator without removing any nodes (control) • Malicious attack on most highly connected nodes

18. Malicious Attack • Ran the experiment for 10 minutes • We removed 5% of the most highly connected nodes over a 5 minute interval in the middle • Representative of a coordinated distributed attack on the network
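
The attack described here can be expressed as a removal schedule. This sketch assumes that node degree is ranked once, at the start of the run, and that removals are spaced evenly over the 5-minute interval; the slides specify neither detail. With a 10-minute run, the middle interval starts at 150 seconds.

```python
def attack_schedule(graph, fraction=0.05, start=150.0, duration=300.0):
    """Return [(time_in_seconds, node)] removals of the highest-degree nodes.

    graph maps a node id to a list of neighbor ids. Even spacing and a
    one-time degree ranking are assumptions.
    """
    count = round(fraction * len(graph))
    targets = sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:count]
    step = duration / count if count else 0.0
    return [(start + i * step, node) for i, node in enumerate(targets)]


# A star-shaped toy graph: one hub connected to 39 leaves.
leaves = [f"n{i}" for i in range(39)]
toy_graph = {"hub": list(leaves), **{leaf: ["hub"] for leaf in leaves}}
schedule = attack_schedule(toy_graph)
print(schedule)
```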

19. Metrics • Large number of metrics that we could have used • We picked metrics that measure • How partitioned the network is • How useful the network is in sending queries

20. Size of Largest Connected Component • Largest set of nodes $V$ such that any $v_m, v_n \in V$ have a path between them • Measures the number of nodes that can potentially communicate with each other • Can get any data from any other node

21. # of Connected Components • Number of separate pieces of the network • If number of CC’s is large then the network is heavily partitioned • Not possible to retrieve content between CC’s • Want to monitor this number to make sure it is not increasing
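
Both partition metrics (size of the largest connected component and number of connected components) can be computed from one breadth-first traversal. The sketch below assumes an undirected graph stored as an adjacency mapping in which every node appears as a key; the function name is hypothetical.

```python
from collections import deque


def connected_components(graph):
    """Return a list of components, each a set of node ids."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


graph = {
    "a": ["b"], "b": ["a"],
    "c": ["d"], "d": ["c", "e"], "e": ["d"],
    "f": [],
}
components = connected_components(graph)
print("number of components:", len(components))
print("largest component size:", max(len(c) for c in components))
```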

22. Nodes Reachable Within 6 Hops • Sum of the numbers of 1st, 2nd, ..., 6th degree neighbors of a node • End-to-end measurement of how many nodes you can reach with a query • Typically queries are forwarded about 6 hops • Rough estimate of the number of nodes a user can search
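
This metric amounts to a depth-limited breadth-first search. The sketch below counts each reachable node once, at its shortest hop distance, and excludes the source itself; the adjacency-mapping representation and the function name are assumptions.

```python
from collections import deque


def reachable_within(graph, source, max_hops=6):
    """Count nodes other than `source` reachable in at most max_hops hops."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return len(distance) - 1


# A chain of 10 nodes: 0 - 1 - 2 - ... - 9
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(reachable_within(chain, 0))
```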

23. Results – Largest CC

24. Results – Number of CCs

25. Results - % of nodes within 6 hops

26. Failure Detection Results

27. Random Node Distribution

28. Messages Per Node Results

29. Conclusions • By planning for and detecting failures our recovery method can drastically increase the likelihood that the network will not become partitioned • It lessens the impact of malicious attacks on the querying capability of the network

30. Further Work • Investigating other techniques for random node discovery • Restoring network to a scale free topology immediately following failures • How the Gnutella network has changed over time

31. Thanks • Stefan Saroiu and Steven Gribble for letting us use their data and giving us advice • Armando Fox, George Candea, Dave Patterson, Aaron Brown • Bling-Bling Industries, 2001