P2P Architecture Case Study: Gnutella Network

P2P Architecture Case Study: Gnutella Network. Matei Rîpeanu, The University of Chicago. Why analyze the Gnutella network? Unprecedented scale: up to 100k nodes, 100 TB of data, 10M files today. Self-organizing network. Staggering growth: more than 50-fold during the first half of 2001.



  1. P2P Architecture Case Study: Gnutella Network • Matei Rîpeanu, The University of Chicago

  2. Why analyze the Gnutella network? • Unprecedented scale: up to 100k nodes, 100 TB of data, 10M files today • Self-organizing network • Staggering growth: more than 50-fold during the first half of 2001 • Open architecture, simple and flexible protocol • Interesting mix of social and technical issues

  3. Overview • Gnutella protocol • Tools for exploring the network • Network growth • Structural graph analysis • Is Gnutella a power-law network? • Generated (overhead) network traffic • Traffic estimates • Overlay network topology mapping

  4. Gnutella protocol overview • P2P file-sharing application on top of an overlay network • Nodes maintain open TCP connections • Messages are broadcast (flooded) or back-propagated • Protocol:

  5. Gnutella search mechanism • Steps: • Node 2 initiates search for file A [Figure: overlay of nodes 1-7]

  6. Gnutella search mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors [Figure: overlay of nodes 1-7, query messages labeled A]

  7. Gnutella search mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message [Figure: overlay of nodes 1-7, query messages labeled A]

  8. Gnutella search mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message [Figure: overlay of nodes 1-7, query messages labeled A, replies labeled A:5 and A:7]

  9. Gnutella search mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is back-propagated [Figure: overlay of nodes 1-7, replies labeled A:5 and A:7]

  10. Gnutella search mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is back-propagated [Figure: overlay of nodes 1-7, replies labeled A:5 and A:7]

  11. Gnutella search mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is back-propagated • File download [Figure: overlay of nodes 1-7, "download A" label]
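
The search steps on these slides can be sketched as a small simulation. This is only an illustration: the slides show nodes 1-7 but not their links, so the `OVERLAY` wiring below is a hypothetical topology, and `flood_query` is an invented helper name. Nodes 5 and 7 hold file A, matching the A:5 and A:7 reply labels.

```python
from collections import deque

# Hypothetical wiring: the slides number the nodes 1-7 but do not list links.
OVERLAY = {
    1: [2, 4, 7],
    2: [1, 3, 6],
    3: [2, 5],
    4: [1, 7],
    5: [3, 6],
    6: [2, 5],
    7: [1, 4],
}
FILES = {5: {"A"}, 7: {"A"}}  # nodes 5 and 7 hold file A


def flood_query(overlay, files, origin, wanted, ttl=7):
    """Flood a query from origin; return {holder: reply path back to origin}."""
    came_from = {origin: None}  # neighbor each node first heard the query from
    queue = deque([(origin, ttl)])
    hits = {}
    while queue:
        node, remaining = queue.popleft()
        if node != origin and wanted in files.get(node, set()):
            # The reply is back-propagated along the path the query arrived on.
            path, hop = [], node
            while hop is not None:
                path.append(hop)
                hop = came_from[hop]
            hits[node] = path
        if remaining == 0:
            continue  # TTL exhausted: the query is not forwarded further
        for neighbor in overlay[node]:
            if neighbor not in came_from:  # duplicate queries are dropped
                came_from[neighbor] = node
                queue.append((neighbor, remaining - 1))
    return hits


hits = flood_query(OVERLAY, FILES, origin=2, wanted="A")
for holder, path in sorted(hits.items()):
    print(f"reply A:{holder} travels {path}")
```

With this wiring, node 2 receives one reply from each holder and can then download the file directly from either of them; lowering `ttl` to 1 keeps the query from reaching any holder.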

  12. Tools for network exploration • Eavesdropper: inserts modified nodes into the network to eavesdrop on traffic. • Crawler: connects to all active nodes and uses the membership protocol to discover graph topology. • Client-server approach. • Graph analysis tools: high-volume offline computations.
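
A crawler of this kind is essentially a breadth-first traversal of the overlay. The sketch below assumes a caller-supplied `get_neighbors` function standing in for the membership-protocol exchange; that function, and the toy network used to exercise it, are hypothetical and not taken from the talk.

```python
from collections import deque


def crawl(seed, get_neighbors):
    """Discover nodes and undirected links reachable from seed, breadth first."""
    seen = {seed}
    edges = set()
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for peer in get_neighbors(node):
            edges.add(frozenset((node, peer)))  # same link seen from both ends counts once
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen, edges


# Toy stand-in for asking a live node about its neighbors.
toy_network = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
nodes, links = crawl("a", lambda node: toy_network[node])
print(len(nodes), "nodes,", len(links), "links")
```

A real crawl would also have to cope with nodes joining and leaving while the traversal is in progress, which this sketch ignores.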

  13. Network growth • High user interest • Users tolerate high latency, low quality results • Better resources • DSL and cable modem nodes grew from 24% to 41% over first 6 months. Today >50%. • Open architecture / open-source environment • Competing implementations • Lower overhead network traffic, improved resource utilization, better structure

  14. Growth invariants (1): avg. node connectivity • 3.4 links per node on average

  15. Growth invariants (2): network diameter • Node-to-node distance maintains similar distribution • Average node-to-node distance grew 25% while the network grew 50 times over 6 months
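
Both growth invariants can be computed from a crawled adjacency list. The sketch below is a minimal version with invented function names; it counts links per node as undirected links divided by nodes, which is the reading under which 170k links over 50k nodes gives 3.4, and it measures average node-to-node distance by running a breadth-first search from every node.

```python
from collections import deque


def links_per_node(adj):
    """Undirected links divided by nodes (half the average degree)."""
    return sum(len(peers) for peers in adj.values()) / 2 / len(adj)


def average_distance(adj):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for peer in adj[node]:
                if peer not in dist:
                    dist[peer] = dist[node] + 1
                    queue.append(peer)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy path graph a-b-c-d, used only to exercise the functions.
path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(links_per_node(path_graph), average_distance(path_graph))
```

All-pairs breadth-first search costs on the order of nodes times links, which is why this kind of analysis is run offline on crawl snapshots.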

  16. Is Gnutella a power-law network? • Power-law networks: the number of links per node follows a power-law distribution • Examples: the Internet, in/out links to/from HTML pages, citation networks, the US power grid, social networks. • Implications: high tolerance to random node failure but low reliability when facing an 'intelligent' adversary [Figure, November 2000]

  17. Is Gnutella a power-law network? • Later, larger networks display a bimodal distribution • Implications: • High tolerance to random node failures preserved • Increased reliability when facing an attack. [Figure, May 2001]
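
One rough way to check a crawl snapshot for power-law behavior is to fit a straight line to the degree histogram on log-log axes. The sketch below uses an ordinary least-squares fit, which is a crude estimator and not necessarily the method behind these slides; the function name and the synthetic data are illustrative only.

```python
import math
from collections import Counter


def loglog_slope(degrees):
    """Least-squares slope of log(node count) against log(degree)."""
    hist = Counter(d for d in degrees if d > 0)
    points = [(math.log(d), math.log(c)) for d, c in hist.items()]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Synthetic degrees whose counts fall off as degree**-2.
synthetic = [1] * 10000 + [2] * 2500 + [5] * 400 + [10] * 100
print(round(loglog_slope(synthetic), 3))
```

A slope that stays roughly constant across the whole degree range suggests a power law; a bimodal distribution such as the one reported for the later snapshots shows up as points that bend away from a single line.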

  18. Overview • Gnutella protocol • Network growth • Structural graph analysis • Generated network traffic: • Traffic estimates • Does the Gnutella overlay network topology match the underlying resources?

  19. Traffic analysis • 6-8 kbps per link over all connections • Traffic structure changed over time

  20. Total generated traffic 1 Gbps (or 330 TB/month)! • Compare to 15,000 TB/month in US Internet backbone (Dec. 2000) • Note that this estimate excludes actual file transfers • Q: Does it matter? Reasoning: • QUERY and PING messages are flooded. They form more than 90% of generated traffic • predominant TTL=7 • >95% of nodes are less than 7 hops away • measured traffic at each link about 6 kbps • network with 50k nodes and 170k links
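
The headline figure follows from the per-link measurement and the link count on this slide. The arithmetic below assumes decimal units (1 kbps = 1,000 bit/s, 1 TB = 10^12 bytes) and a 30-day month; those conventions are assumptions, not stated on the slide.

```python
LINKS = 170_000                      # links in the measured network
BITS_PER_SECOND_PER_LINK = 6_000     # about 6 kbps per link
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month
BACKBONE_TB_PER_MONTH = 15_000       # US Internet backbone, Dec. 2000

total_bps = LINKS * BITS_PER_SECOND_PER_LINK
bytes_per_month = total_bps / 8 * SECONDS_PER_MONTH
tb_per_month = bytes_per_month / 1e12

print(f"{total_bps / 1e9:.2f} Gbps")        # 1.02 Gbps
print(f"{tb_per_month:.0f} TB/month")       # 330 TB/month
print(f"{tb_per_month / BACKBONE_TB_PER_MONTH:.1%} of backbone traffic")
```

Under these assumptions the overhead traffic alone comes to roughly 2% of the quoted backbone volume, before any file transfers are counted.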

  21. Topology mismatch The overlay network topology doesn’t match the underlying Internet infrastructure topology! • 40% of all nodes are in the 10 largest Autonomous Systems (AS) • Only 2-4% of all TCP connections link nodes within the same AS • Largely ‘random wiring’ • Entropy experiment gives similar results
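
The mismatch can be quantified by comparing the share of overlay links that stay inside one AS with the share expected if nodes were wired at random. The sketch below assumes a node-to-AS mapping is already available (for example from IP-prefix lookups); the function names and the toy data are hypothetical.

```python
from collections import Counter


def intra_as_fraction(links, as_of):
    """Share of links whose two endpoints sit in the same AS."""
    same = sum(1 for a, b in links if as_of[a] == as_of[b])
    return same / len(links)


def random_wiring_baseline(as_of):
    """Chance that two distinct nodes picked uniformly at random share an AS."""
    n = len(as_of)
    sizes = Counter(as_of.values())
    return sum(s * (s - 1) for s in sizes.values()) / (n * (n - 1))


# Toy data: four nodes in two ASes, four overlay links.
as_of = {"a": 1, "b": 1, "c": 2, "d": 2}
links = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(intra_as_fraction(links, as_of), random_wiring_baseline(as_of))
```

An observed fraction close to the baseline is what 'random wiring' would look like; a topology-aware overlay would push the observed fraction well above it.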

  22. Conclusions • Gnutella: self-organizing, large-scale, P2P application based on overlay network. It works! • Growth hindered by the volume of generated traffic and inefficient resource use. • Discovered growth invariants specific to large-scale systems that: • Help predict resource usage • Give hints for better search and resource organization techniques.

  23. Thank you! Questions?

  24. What's next? • Organize the overlay network to match the underlying infrastructure topology. • Investigate methods for reducing traffic (query routing/filtering, better information organization). • Is the Gnutella network a small-world network? What are the implications?

  25. Statistical laws of large-scale systems • Zipf's law: the size of the r-th largest occurrence of the event is inversely proportional to its rank: $y \sim r^{-b}$, with $b$ close to unity. • Power-law distributions: the probability distribution of event X is $P[X = x] = x^{-k}$ • Pareto distribution: the cumulative probability distribution is $P[X > x] = x^{-(k-1)}$ • Zipf, Pareto and power-law distributions are basically different ways to express the same phenomenon
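
The claim that the three laws describe the same phenomenon can be made explicit with a short derivation. It is a sketch that treats X as continuous, assumes k > 1, and ignores normalization constants; N, the number of observations, and y_r, the r-th largest observation, are symbols introduced here for illustration.

```latex
\begin{align*}
P[X = x] &\propto x^{-k}
  && \text{(power law)} \\
P[X > x] &\propto \int_x^{\infty} t^{-k}\,dt = \frac{x^{-(k-1)}}{k-1}
  && \text{(Pareto: exponent } k-1\text{)} \\
r &\approx N \, P[X \ge y_r] \propto y_r^{-(k-1)}
  && \text{(about } r \text{ observations are at least } y_r\text{)} \\
y_r &\propto r^{-1/(k-1)}
  && \text{(Zipf with } b = 1/(k-1)\text{)}
\end{align*}
```

Under these assumptions a Zipf exponent b close to unity corresponds to a power-law exponent k close to 2.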

  26. [Figure: graph diagrams over nodes labeled A-H]

  27. Overview • Gnutella protocol • Network growth • Statistical properties of large-scale systems • Power-law distributions. • Power-law networks. • Generated (overhead) network traffic.

  28. Power-law distributions • The probability distribution of event X is $P[X = x] = x^{-k}$ • Present all over the WWW and Internet space: the number of HTML pages within a site, visits to a site, links to a page, cache document popularity, etc.

  29. Power-law distributions in Gnutella • Number of shared files per node • Query popularity follows a power-law distribution [Kas01] • Implications: • Caching is an effective solution to reduce traffic and query latency • New search and node organizing mechanisms!
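
Why caching pays off under power-law query popularity can be seen by computing how much of the request stream the most popular queries account for. The sketch below assumes a Zipf popularity with exponent b = 1 over 10,000 distinct queries; both numbers are illustrative and are not taken from [Kas01].

```python
def zipf_coverage(num_queries, cache_size, b=1.0):
    """Share of requests served by caching the cache_size most popular queries."""
    weights = [rank ** -b for rank in range(1, num_queries + 1)]
    return sum(weights[:cache_size]) / sum(weights)


for size in (10, 100, 1000):
    print(size, f"{zipf_coverage(10_000, size):.0%}")
```

Under these assumptions, caching answers for only 1% of the distinct queries already covers roughly half of all requests, which is the intuition behind the caching bullet.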
