A Measurement Study of Peer-to-Peer File Sharing Systems Stefan Saroiu, P. Krishna Gummadi, Steven D. Gribble Presented by Zhengxiang Pan March 18th, 2003
Introduction • Napster & Gnutella • Population of users • Bottleneck bandwidth of hosts & latencies • How long peers remain connected • Number of files shared & downloaded
Methodology-architecture • Napster’s architecture • A cluster of central servers • Each peer connects to one server • Servers cooperate to process queries • Gnutella’s architecture • No centralized servers • Peers form an overlay network • Queries are sent by controlled flooding
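The controlled flood on the Gnutella side can be sketched as a hop-limited breadth-first search. This is only an illustration, not the Gnutella wire protocol: the `overlay` dictionary, the peer names, and the `flood_query` function are assumptions made for the example.

```python
from collections import deque


def flood_query(overlay, origin, ttl):
    """Return the set of peers reached by a TTL-limited flood from origin.

    overlay maps each peer to the peers it is directly connected to.
    Each forwarding hop uses up one unit of TTL, and a peer that has
    already seen the query does not forward it again.
    """
    reached = {origin}
    frontier = deque([(origin, ttl)])
    while frontier:
        peer, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # TTL exhausted: the query is not forwarded further
        for neighbor in overlay.get(peer, ()):
            if neighbor not in reached:
                reached.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return reached


# Hypothetical five-peer overlay used only for illustration.
toy_overlay = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
```

With this toy overlay, a query from "a" with a TTL of 2 reaches "b", "c", and "d" but not "e", which is three hops away.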
Methodology-crawler • Napster crawler • A large number of connections to a single server • Issue popular queries in parallel • Captured 40%-60% of the local users • Gnutella crawler • Iteratively send ping messages with large TTLs • Discover new hosts by receiving pong messages • Captured 25%-50% of the total population
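The iterative ping/pong discovery of the Gnutella crawler can be sketched as a worklist loop. The `ping` callable is hypothetical: it stands in for sending a ping with a large TTL to a host and collecting the hosts named in the returned pongs. The fake network below exists only to make the sketch runnable.

```python
def crawl(seed_hosts, ping):
    """Discover hosts by repeatedly pinging every newly learned host.

    ping(host) is an assumed callable that returns the hosts learned
    from the pong messages received after pinging that host.
    """
    discovered = set(seed_hosts)
    pending = list(seed_hosts)
    while pending:
        host = pending.pop()
        for found in ping(host):
            if found not in discovered:
                discovered.add(found)
                pending.append(found)
    return discovered


# Stand-in for the network: which hosts answer with pongs for each ping.
fake_pongs = {
    "seed": ["p1", "p2"],
    "p1": ["p3"],
    "p2": ["p1"],
    "p3": [],
    "unreachable": ["p4"],  # never learned, so never crawled
}


def fake_ping(host):
    return fake_pongs.get(host, [])
```

Hosts that no crawled peer ever reports stay invisible to a crawler of this kind, which is one plausible reason a crawl captures only part of the population.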
Methodology-directly measured characteristics • Latency • Measure the round-trip time of exchanging a 40-byte TCP packet • Lifetime • Offline: does not respond to TCP SYN packets • Inactive: responds with TCP RST • Active: accepts the connection • Bottleneck bandwidth • Used as an approximation of available bandwidth • Actively measure upstream and downstream using a few TCP packets
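The three lifetime states can be approximated with an ordinary TCP connect attempt. This is a minimal sketch, not the tool used in the study: `classify_host` and its timeout are assumptions, and every failure other than a refused connection (timeout, unreachable host) is treated as offline.

```python
import socket


def classify_host(ip, port, timeout=3.0):
    """Classify a peer as "active", "inactive", or "offline".

    active:   the TCP connection is accepted
    inactive: the host answers with a reset (connection refused)
    offline:  no usable answer within the timeout (assumed mapping)
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return "active"
    except ConnectionRefusedError:
        return "inactive"
    except OSError:
        # Includes timeouts and unreachable-host errors.
        return "offline"
```

Probing a peer repeatedly with a function like this and recording the timestamps gives the raw data from which session durations and uptime fractions can be derived.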
Results-bandwidth Downstream & upstream bottleneck bandwidth - 50% in Napster & 60% in Gnutella use broadband connections - 25% in Napster & 8% in Gnutella use modems - 20% in Napster & 30% in Gnutella have high bandwidth (>3 Mbps)
Result-reported bandwidth - 22% of Napster users report “unknown” bandwidth
Result-latency Latencies for Gnutella users - In an unstructured, ad hoc overlay, a substantial fraction of connections suffer from high latency - Latencies differ for trans-oceanic peers
Result-availability - Only 20% of peers had an IP-level uptime of 93% or more - Median session duration: 60 minutes
Result-files - 25% of Gnutella peers do not share any files - 40%-60% of peers share 5%-20% of the shared files
Result-download & upload - The percentage of peers in each bandwidth class is roughly the same as the percentage of files shared by that bandwidth class.
Result-cooperation - 30% of the users that report their bandwidth as 64 Kbps or less actually have a significantly greater bandwidth. - 10% of the users reporting high bandwidth (3 Mbps or higher) in reality have significantly lower bandwidth.
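A check of reported against measured bandwidth might look like the sketch below. The slide does not say what counts as "significantly" different, so the factor-of-two `tolerance` is purely an assumption, as is the `is_misreported` name.

```python
def is_misreported(reported_kbps, measured_kbps, tolerance=2.0):
    """Flag a peer whose measured bandwidth differs from its reported
    bandwidth by more than the given factor, in either direction.

    The default factor of 2 is an illustrative assumption.
    """
    if reported_kbps <= 0 or measured_kbps <= 0:
        raise ValueError("bandwidths must be positive")
    ratio = measured_kbps / reported_kbps
    return ratio > tolerance or ratio < 1 / tolerance
```

For example, a peer reporting 64 Kbps but measured at 1500 Kbps is flagged, while one reporting 64 Kbps and measured at 56 Kbps is not.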
Result-resilience of Gnutella overlay Although highly resilient in the face of random breakdowns, Gnutella is nevertheless highly vulnerable in the face of well-orchestrated, targeted attacks.
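The contrast between random breakdowns and targeted attacks can be illustrated on a toy overlay by removing peers and measuring the largest connected group that remains. This is a sketch under stated assumptions: the hub-and-leaf topology, the function names, and the use of degree to pick attack targets are all chosen for the example and are not taken from the measured Gnutella graph.

```python
import random


def largest_component(overlay, removed):
    """Size of the largest connected group of peers after removals."""
    unseen = set(overlay) - set(removed)
    best = 0
    while unseen:
        stack = [unseen.pop()]
        size = 0
        while stack:
            peer = stack.pop()
            size += 1
            for neighbor in overlay[peer]:
                if neighbor in unseen:
                    unseen.remove(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


def highest_degree_peers(overlay, k):
    """Targeted attack: pick the k best-connected peers."""
    return sorted(overlay, key=lambda p: len(overlay[p]), reverse=True)[:k]


def random_peers(overlay, k, seed=0):
    """Random breakdown: pick k peers uniformly at random."""
    return random.Random(seed).sample(sorted(overlay), k)


def make_toy_overlay():
    """Two connected hubs, each with four leaf peers (hypothetical)."""
    overlay = {"h1": {"h2"}, "h2": {"h1"}}
    for hub in ("h1", "h2"):
        for i in range(4):
            leaf = f"{hub}-leaf{i}"
            overlay[hub].add(leaf)
            overlay[leaf] = {hub}
    return overlay
```

In this toy overlay, removing the two hubs leaves every remaining peer isolated, whereas removing two leaves keeps the other eight peers connected.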
Conclusion • Heterogeneity of hosts • Carefully delegate responsibilities • Clear evidence of client-like and server-like behaviors • Peers tend to misreport information if there is an incentive to do so • Build in incentives for telling the truth • Verify reported information