1 / 24

GIA: Making Gnutella-like P2P Systems Scalable

GIA: Making Gnutella-like P2P Systems Scalable. Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau. The Peer-to-peer Phenomenon. Internet-scale distributed system Distributed file-sharing applications E.g., Napster, Gnutella, KaZaA File sharing is the dominant P2P app

edison
Download Presentation

GIA: Making Gnutella-like P2P Systems Scalable

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau

  2. The Peer-to-peer Phenomenon • Internet-scale distributed system • Distributed file-sharing applications • E.g., Napster, Gnutella, KaZaA • File sharing is the dominant P2P app • Mass-market • Mostly music, some video, software

  3. The Problem • Potentially millions of users • Wide range of heterogeneity • Large transient user population • Existing search solutions cannot scale • Flooding-based solutions limit capacity • Distributed Hash Tables (DHTs) not necessarily appropriate

  4. Our Solution: GIA • Scalable Gnutella-like P2P system • Design principles: • Explicitly account for node heterogeneity • Query load proportional to node capacity • Results: • Gia outperforms Gnutella by 3–5 orders of magnitude

  5. Outline • Existing approaches • GIA: Scalable Gnutella • Results: Simulations & Experiments • Conclusion

  6. Gnutella • Distributed search and download • Unstructured: ad-hoc topology • Peers connect to random nodes • Random search • Flood queries across network • Scaling problems • As network grows, search overhead increases P6 P5 P4 has “madonna- american-life.mp3” P1 P4 who has “madonna” P2 has “madonna- ray-of-light.mp3” P3 P2

  7. Note: Not questioning the utility of DHTs in general, merely for mass-market file sharing Distributed Hash Tables (DHTs) • Structured solution • Given a filename, find its location • Can DHTs do file sharing? • Probably, but with lots of extra work:Caching, keyword searching • Do we need DHTs? • Not necessarily: Great at finding rare files, but most queries are for popular files

  8. Other Solutions • Supernodes [KaZaA] • Classify nodes as low- or high-capacity • Only pushes the problem to a bigger scale • Random Walks [Lv et al] • Forwarding is blind • Queries can get stuck in overloaded nodes • Biased Random Walks [Adamic et al] • Right idea, but exacerbates overloaded-node problem

  9. Outline • Existing approaches • GIA: Scalable Gnutella • Results: Simulations & Experiments • Conclusion

  10. GIA: 10,000-foot view • Unstructured, but take node capacity into account • High-capacity nodes have room for more queries: so, send most queries to them • Will work only if high-capacity nodes: • Have correspondingly more answers, and • Are easily reachable from other nodes

  11. GIA Design • Make high-capacity nodes easily reachable • Dynamic topology adaptation • Make high-capacity nodes have more answers • One-hop replication • Search efficiently • Biased random walks • Prevent overloaded nodes • Active flow control • Make high-capacity nodes easily reachable • Dynamic topology adaptation • Make high-capacity nodes have more answers • One-hop replication • Search efficiently • Biased random walks • Prevent overloaded nodes • Active flow control Query

  12. Dynamic Topology Adaptation • Make high-capacity nodes have high degree (i.e., more neighbors) • Per-node level of satisfaction, S: • 0  no neighbors, 1  enough neighbors • Function of: • Node’s capacity ● Neighbors’ capacities • Neighbors’ degrees ● Their age • When S << 1, look for neighbors aggressively

  13. Active Flow Control • Accept queries based on capacity • Actively allocation “tokens” to neighbors • Send query to neighbor only if we have received token from it • Incentives for advertising true capacity • High capacity neighbors get more tokens • Allocate tokens with weighted fair queuing

  14. Practical Considerations • Query resilience: node death • Periodic keep-alive messages • Query responses are implicit keep-alives • Determining node capacity • Function of bandwidth and “age” of node • Finding rare items • Bifurcate the random walk every 10 hops …

  15. Outline • Existing approaches • GIA: Scalable Gnutella • Results: Simulations & Experiments • Conclusion

  16. Simulation Results • Compare four systems • FLOOD: TTL-scoped, random topologies • RWRT: Random walks, random topologies • SUPER: Supernode-based search • GIA: search using GIA protocol suite • Metric: • Collapse point: aggregate throughput that the system can sustain

  17. Questions • What is the relative performance of the four algorithms? • Which of the GIA components matters the most? • How does the system behave in the face of transient nodes?

  18. GIA outperforms SUPER, RWRT & FLOOD by many orders of magnitude in terms of aggregate query load System Performance

  19. No single component is useful by itself; the combinationof them all is what makes GIA scalable Factor Analysis

  20. Even under heavy churn, GIA outperforms the otheralgorithms by many orders of magnitude Transient Behavior Static SUPER Static RWRT (1% repl)

  21. Deployment • Prototype client implementation using C++ • Deployed on PlanetLab: • 100 machines spread across 4 continents • Measured the progress of topology adaptation…

  22. Nodes quickly discover each other and soon reach their target “satisfaction level” Progress of Topology Adaptation

  23. Outline • Existing approaches • GIA: Scalable Gnutella • Results: Simulations & Experiments • Conclusion

  24. Summary • GIA: scalable Gnutella • 3–5 orders of magnitude improvement in system capacity • Unstructured approach is good enough! • DHTs may be overkill • Incremental changes to deployed systems • Status: Prototype implementation deployed on PlanetLab

More Related