
Topology Mapping of Peer-to-Peer Systems

Suat Mercan, Sep 23rd, 2009. CS 790G: COMPLEX NETWORKS.


Presentation Transcript


  1. Topology Mapping of Peer-to-Peer Systems. Suat Mercan, Sep 23rd, 2009. CS 790G: COMPLEX NETWORKS

  2. Outline • Characterization of users of P2P systems • Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002. • Effect of P2P traffic on the underlying network • Sen and Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002. • Peer-to-Peer Topologies • Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002. • Searching on the P2P network • K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001. • Deciphering proprietary P2P systems (like Kazaa) • Leibowitz et al., “Deconstructing the Kazaa Network”, WIAPP, 2003.

  3. Introduction to Peer-to-Peer (P2P) systems • End-systems (or peers) can act as both clients and servers of data, hence the system is scalable and reliable • Peer participation is voluntary and membership is dynamic, hence the topology keeps changing • File sharing is the most popular use, so peer-to-peer systems have become synonymous with peer-to-peer file-sharing networks

  4. Classification of P2P systems • Centralized (e.g. Napster) • Decentralized • Structured (e.g. Chord, CAN, Pastry, Tapestry) • Unstructured (e.g. Gnutella, Kazaa, Freenet, eDonkey, eMule, Direct Connect, …)

  5. Popularity of unstructured decentralized P2P networks • Gnutella host count, maintained by LimeWire (http://www.limewire.com) • good candidates for measurement studies because they are: • deployed and widely used • heavy bandwidth consumers during data transfer, hence a concern for network operators • already the subject of quite a few measurement studies

  6. Gnutella protocol overview • Connecting to the Gnutella network • bootstrap using the GWebCache system and a locally cached host list • Ping/Pong messages are exchanged with potential neighbors • Searching on the network • Query messages are flooded on the network • QueryHit messages are received (back-propagated along the Query path) from peers having the requested content • Downloading the content • peers download files directly from peers having the requested content
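The search step can be illustrated with a small simulation. This is a minimal sketch, not a Gnutella implementation: it assumes the overlay is a plain adjacency dict and that every link has the same delay, so the first copy of a Query to reach a peer arrives along a shortest-hop path, which a breadth-first traversal reproduces. Duplicate Queries are dropped and a hop limit (TTL) is decremented on each forward, as in Gnutella; the default of 7 and the name `flood_query` are illustrative choices.

```python
from collections import deque


def flood_query(graph, source, holders, ttl=7):
    """Simulate flooding a Query from `source` and return QueryHit paths.

    graph:   dict mapping each peer to a list of neighbouring peers.
    holders: set of peers that have the requested content.
    ttl:     hop limit; each forward decrements it by one (default assumed).
    Each returned path runs from a holder back to `source`, i.e. the
    reverse of the path the Query took, which is how a QueryHit travels.
    """
    parent = {source: None}            # who forwarded the Query to each peer
    frontier = deque([(source, ttl)])  # breadth-first = equal link delays
    hits = []
    while frontier:
        peer, remaining = frontier.popleft()
        if peer in holders and peer != source:
            path, node = [], peer
            while node is not None:    # follow reverse-path pointers
                path.append(node)
                node = parent[node]
            hits.append(path)
        if remaining == 0:
            continue                   # TTL exhausted: do not forward
        for nbr in graph.get(peer, []):
            if nbr not in parent:      # duplicate Queries are dropped
                parent[nbr] = peer
                frontier.append((nbr, remaining - 1))
    return hits
```

A peer that holds the content still forwards the Query, so several QueryHits can come back for one search.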

  7. Characterization of Users of P2P systems • Latency • Lifetime of peers • Bottleneck bandwidth • Number of files shared and downloaded • Degree of cooperation

  8. Measurement Methodology • active crawling of the Napster and Gnutella systems • Napster: issued queries for popular content, then queried the central server for peer information • Gnutella: used the protocol's Ping/Pong messages to gather metadata about peers, then about their neighbors, and so on Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
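A crawl of this kind is essentially a breadth-first traversal of the overlay. The sketch below assumes a hypothetical `get_neighbors(peer)` callable that stands in for the Ping/Pong exchange and raises `OSError` when a peer cannot be reached; the study's actual crawler is not described at this level of detail, and the optional `max_peers` budget is an illustrative addition.

```python
from collections import deque


def crawl(seeds, get_neighbors, max_peers=None):
    """Breadth-first crawl of a P2P overlay.

    get_neighbors(peer) is a hypothetical callable standing in for the
    Ping/Pong exchange: it returns the peers `peer` reports as neighbours
    and raises OSError if the peer cannot be contacted.
    Returns (discovered peers, undirected edges, unreachable peers).
    """
    seen = set(seeds)
    queue = deque(seeds)
    edges = set()
    unreachable = set()
    while queue:
        peer = queue.popleft()
        try:
            neighbors = get_neighbors(peer)
        except OSError:
            unreachable.add(peer)      # peer left, or is firewalled
            continue
        for nbr in neighbors:
            edges.add(frozenset((peer, nbr)))
            if nbr in seen:
                continue
            if max_peers is not None and len(seen) >= max_peers:
                continue               # crawl budget exhausted
            seen.add(nbr)
            queue.append(nbr)
    return seen, edges, unreachable
```

Because peers join and leave during a crawl, the resulting graph is a time-smeared approximation rather than an instantaneous snapshot.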

  9. Host Lifetime analysis • 20% of peers in Napster and Gnutella have an IP-level uptime of 93% or more • Napster peers have higher application-level uptimes than Gnutella peers • the best 20% of Napster peers have an uptime of 83% or more, and the best 20% of Gnutella peers have an uptime of 45% or more • median session duration is 60 minutes for both Napster and Gnutella Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002

  10. Latency analysis (Gnutella) • 20% of peers have a latency of at most 70 ms and 20% have a latency of at least 280 ms • correlation between downstream bottleneck bandwidth and latency: two clusters, for modems (20-60 Kbps, 100-1000 ms) and broadband (1 Mbps, 60-300 ms) Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002

  11. Downloads, Uploads and Shared Files • the relative number of downloads and uploads varies significantly across bandwidth classes • different classes show clear client-like or server-like behavior Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002

  12. Shared Files vs. Shared Data (Napster and Gnutella) • strong correlation between the number of files shared and the amount of data shared (MB) • the slope of both lines is 3.7 MB per file, the size of a typical MP3 audio file Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
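The slide does not say how the lines were fitted. One simple way to recover a per-file size from (files shared, MB shared) pairs is a least-squares fit through the origin, sketched below; the function name and any input data are illustrative.

```python
def slope_through_origin(files, megabytes):
    """Least-squares slope k of megabytes ~ k * files (no intercept).

    Minimising sum((y - k*x)**2) over k gives k = sum(x*y) / sum(x*x).
    """
    if not files or len(files) != len(megabytes):
        raise ValueError("need two equal-length, non-empty sequences")
    sxy = sum(x * y for x, y in zip(files, megabytes))
    sxx = sum(x * x for x in files)
    if sxx == 0:
        raise ValueError("all file counts are zero")
    return sxy / sxx
```

Forcing the line through the origin encodes the assumption that a peer sharing zero files shares zero MB; a fit with an intercept would be the alternative if that assumption is doubtful.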

  13. Degree of Cooperation (Napster) • 30% of the peers report their bandwidth as 64 Kbps or less but actually have significantly higher bandwidth • 10% of the peers reporting high bandwidth (3 Mbps or higher) actually have significantly lower bandwidth Saroiu et al., “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN, 2002
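The misreporting check can be phrased as a comparison between each peer's self-reported bandwidth and an independently measured bottleneck bandwidth. In this sketch the cut-off for "significantly higher" (`factor`) is a hypothetical parameter, and the fraction is taken over peers that report 64 Kbps or less; the slide does not state the exact denominator or threshold used.

```python
def misreport_fraction(peers, reported_max_kbps=64, factor=2.0):
    """Fraction of low-reporting peers whose measured bandwidth is much higher.

    peers:  iterable of (reported_kbps, measured_kbps) pairs.
    factor: hypothetical threshold; "significantly higher" is taken to
            mean measured > factor * reported.
    Only peers reporting <= reported_max_kbps form the denominator.
    """
    low = [(r, m) for r, m in peers if r <= reported_max_kbps]
    if not low:
        return 0.0
    understated = sum(1 for r, m in low if m > factor * r)
    return understated / len(low)
```

The symmetric check for peers that overstate their bandwidth (reported 3 Mbps or more, measured much lower) follows the same pattern with the inequalities reversed.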

  14. Effect of P2P traffic on the underlying network • host distribution and host connectivity • traffic volume and mean bandwidth usage • traffic patterns over time • connection duration and on-time • methodology: passive measurements at routers, identifying P2P traffic by port number
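Port-based classification assigns a flow to a P2P system when either endpoint uses a port associated with that system. The port table below lists commonly cited default ports and is an assumption, not the study's exact list; clients configured to use other ports are missed, which is the main weakness of this approach.

```python
# Assumed default ports per system; real clients may use other ports.
P2P_PORTS = {
    1214: "FastTrack",
    6346: "Gnutella",
    6347: "Gnutella",
    411: "DirectConnect",
    412: "DirectConnect",
}


def classify_flow(src_port, dst_port):
    """Return the P2P system a flow is attributed to, or None."""
    return P2P_PORTS.get(src_port) or P2P_PORTS.get(dst_port)


def bytes_per_system(flows):
    """Sum bytes per P2P system from (src_port, dst_port, nbytes) records."""
    totals = {}
    for src, dst, nbytes in flows:
        system = classify_flow(src, dst)
        if system is not None:
            totals[system] = totals.get(system, 0) + nbytes
    return totals
```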

  15. Datasets used for analysis • FastTrack is the most popular in terms of the number of participating hosts and average traffic volume per day • the rapid growth of P2P traffic is mainly caused by the increasing number of hosts in the system • Direct Connect systems have a higher traffic volume per IP address S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002

  16. Host distribution analysis • # of IP addresses in FastTrack ranges from 0.5 to 2 million • ratio of # of IP addresses in FastTrack:Gnutella:DirectConnect is 150:30:1 • Density of a prefix is the number of unique active IP addresses belonging to it • Density of an AS is the number of unique prefixes belonging to it • FastTrack hosts are distributed more densely than Gnutella and Direct Connect hosts (64:16:4) S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
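The two density definitions can be computed from a set of active IPs and a routing-table snapshot. This sketch assumes a hypothetical `prefix_to_as` dict (CIDR prefix to AS number), maps each IP to its longest matching prefix with a linear scan, and counts an AS's density over prefixes that contain at least one active IP; a real tool would use a radix tree and actual BGP data.

```python
import ipaddress


def densities(active_ips, prefix_to_as):
    """Prefix density (active IPs per prefix) and AS density (prefixes per AS).

    active_ips:   iterable of IP address strings seen in P2P traffic.
    prefix_to_as: hypothetical routing snapshot, CIDR string -> AS number.
    Each IP is assigned to its longest matching prefix; IPs with no match
    are ignored. Only prefixes holding at least one active IP are counted.
    """
    table = [(ipaddress.ip_network(p), asn) for p, asn in prefix_to_as.items()]
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)  # most specific first
    ips_in = {}   # network -> set of active addresses
    as_of = {}    # network -> AS number
    for ip in set(active_ips):
        addr = ipaddress.ip_address(ip)
        for net, asn in table:
            if addr in net:
                ips_in.setdefault(net, set()).add(addr)
                as_of[net] = asn
                break
    prefix_density = {str(net): len(addrs) for net, addrs in ips_in.items()}
    as_density = {}
    for net in ips_in:
        as_density[as_of[net]] = as_density.get(as_of[net], 0) + 1
    return prefix_density, as_density
```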

  17. Host connectivity analysis (FastTrack) • 48% of individual IPs communicate with at most one IP and 89% with at most 10 IPs • 75% of prefixes and ASes communicate with at least 2 prefixes or ASes • very few hosts have very high connectivity and most hosts have very low connectivity S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
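Connectivity at the IP level reduces to counting distinct communication partners per address. A minimal sketch over (source, destination) pairs, ignoring flow direction (an assumption); the same functions apply at the prefix or AS level if each IP is first mapped to its prefix or AS.

```python
def peer_counts(flows):
    """Number of distinct partners each IP exchanged traffic with.

    flows: iterable of (src_ip, dst_ip) pairs; direction is ignored.
    """
    partners = {}
    for a, b in flows:
        if a == b:
            continue                   # ignore self-traffic
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {ip: len(p) for ip, p in partners.items()}


def fraction_at_most(counts, k):
    """Fraction of IPs that communicate with at most k distinct IPs."""
    values = list(counts.values())
    return sum(1 for v in values if v <= k) / len(values) if values else 0.0
```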

  18. Connection duration and On-time (FastTrack) • 50% of the IPs are online for less than one minute/day • 60% of IPs, 40% of prefixes and 30% of ASes stay for less than 10 mins/day • 65% of the IPs join only once • at the AS and prefix levels, participation is not very transient S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW, 2002
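On-time per IP can be derived from flow start and end times once overlapping flows from the same host are merged, so concurrent connections are not double-counted. In this sketch the number of merged periods is used as a rough proxy for how many times an IP joined; that proxy is an assumption, since the slide does not define a join.

```python
def on_time_per_ip(sessions):
    """Per-IP online time and number of distinct online periods.

    sessions: iterable of (ip, start, end) flow records, times in seconds.
    Overlapping or touching flows of the same IP are merged first, so
    concurrent connections are not double-counted.
    Returns dict ip -> (total_seconds, merged_periods).
    """
    by_ip = {}
    for ip, start, end in sessions:
        by_ip.setdefault(ip, []).append((start, end))
    result = {}
    for ip, intervals in by_ip.items():
        intervals.sort()
        total, periods = 0, 1
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:              # overlaps current period
                cur_end = max(cur_end, end)
            else:                             # gap: close current period
                total += cur_end - cur_start
                periods += 1
                cur_start, cur_end = start, end
        result[ip] = (total + cur_end - cur_start, periods)
    return result
```

Counting IPs whose total is under 60 seconds per day, or whose period count is 1, gives statistics of the kind listed on the slide.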

  19. Peer-to-Peer Topologies • Goal: To discover and analyze the Gnutella overlay topology and evaluate generated traffic • methodology: active crawling

  20. Gnutella Network Growth • number of nodes in the largest connected component of the Gnutella network • a significantly larger network was found during Memorial Day and Thanksgiving • 50-fold increase within 6 months Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002

  21. Distribution of node-to-node shortest paths • more than 95% of node pairs are at most 7 hops apart • the longest node-to-node shortest path is 12 hops Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002
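A distribution like this one can be computed by running a breadth-first search from every node of a crawled snapshot and histogramming the hop counts. This sketch assumes an undirected overlay stored as an adjacency dict; its cost is one BFS per node. The largest key of the histogram is the longest shortest path (diameter) of the reachable component.

```python
from collections import Counter, deque


def hop_distribution(graph):
    """Histogram of shortest-path lengths over all reachable ordered node pairs.

    graph: dict mapping node -> iterable of neighbours (undirected overlay).
    """
    hist = Counter()
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph.get(node, ()):
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        for target, d in dist.items():
            if target != source:
                hist[d] += 1
    return hist


def fraction_within(hist, max_hops):
    """Fraction of reachable pairs at most `max_hops` apart."""
    total = sum(hist.values())
    return sum(c for d, c in hist.items() if d <= max_hops) / total if total else 0.0
```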

  22. Average node connectivity • the average number of connections per node remains constant at 3.4 Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002

  23. Node connectivity distribution • Nov 2000: Gnutella node connectivity follows a power-law distribution • March 2001: connectivity does not look like a power law for all nodes; the power-law distribution is preserved for nodes with more than 10 links, while for fewer than 10 links the distribution is almost constant Ripeanu et al., “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002
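Whether the tail above some degree cutoff looks like a power law can be checked by estimating its exponent. The sketch uses the continuous maximum-likelihood estimator alpha = 1 + n / Σ ln(d_i / d_min); for integer degrees this is only an approximation, and the slide does not say how the distributions were fitted. Passing `d_min=11` would restrict the fit to the "more than 10 links" regime described above.

```python
import math


def powerlaw_alpha(degrees, d_min):
    """Approximate maximum-likelihood power-law exponent for degrees >= d_min.

    Continuous estimator alpha = 1 + n / sum(ln(d / d_min)); for integer
    degrees this is an approximation, not an exact discrete fit.
    """
    tail = [d for d in degrees if d >= d_min]
    if not tail:
        raise ValueError("no degrees at or above d_min")
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0:
        raise ValueError("all tail degrees equal d_min; exponent undefined")
    return 1 + len(tail) / log_sum
```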

  24. Searching on the P2P network • methodology: passive measurements at one or two peers that joined the Gnutella network and logged the Query messages routed through them

  25. Query popularity distribution • two distinct distributions of document popularity, with a break at query rank 100 • the most popular documents are equally popular • less popular documents follow a Zipf-like distribution, with alpha between 0.63 and 1.24 K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001.
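A common, if rough, way to estimate a Zipf exponent is a least-squares line through log(frequency) versus log(rank); the negated slope is alpha. The sketch lets the caller restrict the fit to a rank range, so the two regimes on either side of rank 100 can be fitted separately. It is an illustrative estimator, not necessarily the method used in the cited study.

```python
import math


def zipf_alpha(counts, rank_lo=1, rank_hi=None):
    """Estimate a Zipf exponent by fitting a least-squares line to
    log(frequency) against log(rank) over ranks rank_lo..rank_hi (inclusive).
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    window = ranked[rank_lo - 1:rank_hi]
    if len(window) < 2:
        raise ValueError("need at least two ranks to fit a line")
    xs = [math.log(r) for r in range(rank_lo, rank_lo + len(window))]
    ys = [math.log(f) for f in window]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return -cov / var  # frequency ~ rank**(-alpha), so slope = -alpha
```

For example, `zipf_alpha(counts, rank_lo=101)` fits only the less popular queries beyond the break.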

  26. Deciphering P2P systems

  27. File download distribution by bytes • CDF of the byte popularity distribution for the 10% and 1% most popular files • 0.8% of all files account for 80% of the generated traffic • 0.1% of the most bandwidth-hungry files (top 1% of all files) generate 50% of the traffic N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
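The concentration figures on this slide answer one question: what is the smallest fraction of files that accounts for a given share of the bytes? Sorting files by bytes transferred and accumulating gives that directly. Here `bytes_per_file` is assumed to hold the total bytes downloaded for each file (size times number of downloads).

```python
def top_fraction_for_share(bytes_per_file, share=0.8):
    """Smallest fraction of files that together account for `share` of all bytes."""
    sizes = sorted(bytes_per_file, reverse=True)   # heaviest files first
    total = sum(sizes)
    if total == 0:
        return 0.0
    running = 0
    for i, b in enumerate(sizes, start=1):
        running += b
        if running >= share * total:
            return i / len(sizes)
    return 1.0
```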

  28. File size distribution • note the log scale on the x-axis • 3 distinct modes • 100 KB for pictures • 2-5 MB for music files • 700 MB for movies N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003

  29. Quantity and Rate of Distinct Files • new files seen at different time scales: every day, hour and minute • 150,000 distinct files during a 17-day period • daily graph: the number of new files seen kept decreasing, but never reached a steady-state value (the rate at which files are injected into the network) • hourly graph: time-of-day effect • per-minute graph: 50 new files seen every minute on average N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
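Counting new files at different time scales amounts to bucketing observations by time and counting file IDs not seen in any earlier bucket. The sketch assumes (timestamp, file_id) observation records and an illustrative function name; buckets in which no new file appeared are simply absent from the result.

```python
def new_files_per_interval(observations, interval):
    """Count files seen for the first time in each time bucket.

    observations: iterable of (timestamp_seconds, file_id) pairs.
    interval:     bucket width in seconds (86400 = day, 3600 = hour, 60 = minute).
    Returns dict bucket_index -> number of previously unseen files.
    """
    seen = set()
    new_counts = {}
    for ts, file_id in sorted(observations, key=lambda obs: obs[0]):
        if file_id in seen:
            continue
        seen.add(file_id)
        bucket = int(ts // interval)
        new_counts[bucket] = new_counts.get(bucket, 0) + 1
    return new_counts
```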

  30. Rate of change of popularity of files • percentage of files that make it into the list of the N most popular files: (a) in consecutive intervals and (b) after T intervals, compared with the first list • the measurement interval is 24 hours • 15% of the highly popular files remain popular throughout the experiment, while the rest are popular only for short time intervals N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP, 2003
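The two comparisons on this slide, consecutive intervals and each interval against the first, can be computed from per-interval request counts. A minimal sketch, assuming each interval is a dict of file ID to request count and that ties in popularity are broken by file ID; the function names are illustrative.

```python
def top_n(counts, n):
    """IDs of the n most-requested files (ties broken by ID for determinism)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {file_id for file_id, _ in ranked[:n]}


def popularity_retention(intervals, n):
    """Percent overlap of top-n lists: (consecutive pairs, each vs. the first)."""
    lists = [top_n(c, n) for c in intervals]
    if not lists:
        return [], []

    def overlap_pct(ref, cur):
        return 100.0 * len(ref & cur) / len(cur) if cur else 0.0

    consecutive = [overlap_pct(prev, cur) for prev, cur in zip(lists, lists[1:])]
    vs_first = [overlap_pct(lists[0], cur) for cur in lists[1:]]
    return consecutive, vs_first
```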

  31. Open Questions • Mapping a global snapshot of the entire Gnutella topology • Bootstrapping of peers in unstructured peer-to-peer systems • More efficient searching on P2P networks: efforts in this direction include random walks, Bloom-filter-based techniques, etc. • End-point privacy/anonymity is absent in most of these peer-to-peer networks

  32. References • Papers covered in the seminar: • S. Saroiu, P. Gummadi and S. Gribble, “A Measurement Study of Peer-to-Peer File Sharing Systems”, MMCN 2002. • S. Sen and J. Wang, “Analyzing peer-to-peer traffic across large networks”, IMW 2002. • M. Ripeanu, I. Foster and A. Iamnitchi, “Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design”, IEEE Internet Computing, 2002. • K. Sripanidkulchai, “The popularity of Gnutella queries and its implications on scalability”, 2001. • N. Leibowitz, M. Ripeanu and A. Wierzbicki, “Deconstructing the Kazaa Network”, WIAPP 2003. • Papers not covered in the seminar: • J. Chu, K. Labonte and B. Levine, “Availability and Locality Measurements of Peer-to-Peer File Systems”, SPIE, July 2002. • F. Bustamante and Y. Qiao, “Friendships that last: Peer lifespan and its role in P2P protocols”, WCW 2003. • R. Bhagwan, S. Savage and G. Voelker, “Understanding Availability”, IPTPS 2003. • S. Saroiu et al., “An Analysis of Internet Content Delivery Systems”, OSDI 2002. • Markatos et al., “Tracing a large-scale Peer-to-Peer System: An hour in the life of Gnutella”, CCGrid 2002.
