A measurement study of peer to peer file sharing systems
1 / 32

A Measurement Study of Peer-to-Peer File Sharing Systems - PowerPoint PPT Presentation

  • Uploaded on

A Measurement Study of Peer-to-Peer File Sharing Systems. Presented by Cristina Abad. Motivation. In a P2P file sharing system, peers are usually in the “edge” of the network Does this affect/limit the quality of the infrastructure?

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about ' A Measurement Study of Peer-to-Peer File Sharing Systems' - samantha-koch

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript


  • In a P2P file sharing system, peers are usually in the “edge” of the network

  • Does this affect/limit the quality of the infrastructure?

  • What are the characteristics of hosts that choose to participate?

  • Solution: Measure Gnutella and Napster traffic to help understand these issues


  • Crawler periodically takes “snapshot” of Napster/Gnutella

    • capture basic info (peers, files shared, …)

  • For peers discovered

    • measure bottleneck bandwidth

    • measure latency

    • track content and degree of sharing

  • Measure lifetime

    • track availability of peers (at P2P and IP level)

Crawling napster
Crawling Napster

  • Peers can only be discovered by querying index

  • Crawler issues queries with names of popular song artists

  • Query responses contain

    • IP, reported bandwidth, files shared (number, names and sizes)

  • Results:

    • Captured 40-60% of Napster hosts (contributing to 80-95% of total files)

    • Could not capture peers that do not share files

Crawling gnutella
Crawling Gnutella

  • Crawler uses ping/pong to discover peers

  • Each crawl captured aprox. 10000 peers

Measuring bandwidth
Measuring bandwidth

  • Reported bandwidth may not be accurate (ignorance or lies)

  • Use bottleneck bandwidth as approximation to available bandwidth

    • capacity of slowest host along path between two hosts

  • Used SProbe to actively measure both upstream and downstream bottleneck bandwidth

    • Similar to “packet pair” technique

Packet pair technique
Packet Pair Technique

  • Two packets queued next to each other at bottleneck link exit the link t seconds apart:

  • Then,

    Kevin Lai and Mary Baker. “Measuringbandwidth”. In Proceedings of IEEE INFOCOM '99. 1999.

s2: size of second packet

bbnl: bottleneck bandwidth

How many peers are server like
How many peers are server-like?

8% have upstream

bb  10Mbps

  • High-bandwidth, low latency, high availability

Measurement modeling and analysis of a peer to peer file sharing workload

Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

Presented by

Cristina Abad

Three tiered approach
Three-tiered approach File-Sharing Workload

  • Analyze 200-day trace of Kazaa traffic

    • Considered only traffic going from U. Washington to the outside

  • Develop a model of multimedia workloads

    • Analyze and confirm hypothesis

  • Explore potential impact of locality -awareness in Kazaa

Contributions File-Sharing Workload

  • Obtained some useful characterizations of Kazaa’s traffic

  • Showed that Kazaa’s workload is not Zipf

    • Showed that other workloads (multimedia) may not be Zipf either

  • Presented a model of P2P file-sharing workloads based on their trace results

    • Validated the model through simulations that yielded results very similar to those from traces

  • Proved the usefulness of exploiting locality-aware request routing

Measurement results
Measurement results File-Sharing Workload

  • Users are patient

  • Users slow down as they age

  • Kazaa is not one workload

  • Kazaa clients fetch objects at-most-once

  • Popularity of objects is often short-lived

  • Kazaa is not Zipf

User characteristics 1
User characteristics (1) File-Sharing Workload

  • Users are patient

User characteristics 2
User characteristics (2) File-Sharing Workload

  • Users slow down as they age

    • clients “die”

    • older clients ask for less each time they use system

User characteristics 3
User characteristics (3) File-Sharing Workload

  • Client activity

    • Tracing used could only detect users when their clients transfer data

    • Thus, they only report statistics on client activity, which is a lower bound on availability

    • Avg session lengths are typically small (median: 2.4 mins)

      • Many transactions fail

      • Periods of inactivity may occur during a request if client cannot find an available server with the object

Object characteristics 1
Object characteristics (1) File-Sharing Workload

  • Kazaa is not one workload

Object characteristics 2
Object characteristics (2) File-Sharing Workload

  • Kazaa object dynamics

    • Kazaa clients fetch objects at most once

    • Popularity of objects is often short-lived

    • Most popular objects tend to be recently born objects

    • Most requests are for old objects

Object characteristics 3
Object characteristics (3) File-Sharing Workload

  • Kazaa is not Zipf

  • Web access patterns are Zipf: small number of objects are extremely popular, but there is a long tail of unpopular requests.

  • Zipf’s law: popularity of ith-most popular object is proportional to i-α, (α: Zipf coefficient)

  • (Zipf) looks linear on log-log scale

Model of p2p file sharing workloads
Model of P2P file-sharing workloads File-Sharing Workload

  • On average, a client requests 2 objects/day

  • P(x): probability that a user requests an object of popularity rank x  Zipf(1)

    • Adjusted so that objects are requested at most once

  • A(x): probability that a newly arrived object is inserted at popularity rank x  Zipf(1)

  • All objects are assumed to have same size

  • Use caching to observe performance changes (effectiveness  hit rate)

Model simulation results
Model – Simulation results File-Sharing Workload

  • File-sharing effectiveness diminishes with client age

    • System evolves towards one with no locality and objects chosen at random from large space

  • New object arrivals improve performance

    • Arrivals replenish supply of popular objects

  • New clients cannot stabilize performance

    • Can’t compensate for increasing number of old clients

    • Overall bandwidth increases in proportion to population size

Model validation
Model validation File-Sharing Workload

  • By tweaking the arrival rate of of new objects, were able to match trace results (with 5475 new arrivals per year)

Exploring locality awareness
Exploring locality-awareness File-Sharing Workload

  • Currently organizations shape or filter P2P traffic

  • Alternative strategy: exploit locality in file-sharing workload

    • Caching; or,

    • Use content available within organization to substantially decrease external bandwidth usage

    • Result: 86% of externally downloaded bytes could be avoided by using an organizational proxy

Questions? File-Sharing Workload

Analysis File-Sharing Workload

  • How can results obtained be used when evaluating P2P schemes?

  • Are any of the measurements obtained biased?

  • Peers are heterogeneous

    • Incentives

    • Enforcement (e.g. super-peers in Kazaa)

SProbe File-Sharing Workload

  • Works in uncooperative environments

  • Works on asymmetric network paths

  • Exploit properties of TCP protocol

    • Send SYN packet with large payload; then, measure time dispersion of received RST packet

Zipf File-Sharing Workload

  • Linguist George Kingsley Zipf observed that for many frequency distributions, the n-th largest frequency is proportional to a negative power of the rank order n

  • "Zipf's law" is also sometimes used to refer to the corresponding probability distribution

  • Is an instance of a power law

  • Zipf's law is often demonstrated by plotting the data, with the axes being log(rank order) and log(frequency). If the points are close to a single straight line, the distribution follows Zipf's law.