A Measurement Study of Peer-to-Peer File Sharing Systems

Presented by Cristina Abad

Motivation
  • In a P2P file sharing system, peers are usually at the “edge” of the network
  • Does this affect/limit the quality of the infrastructure?
  • What are the characteristics of hosts that choose to participate?
  • Solution: Measure Gnutella and Napster traffic to help understand these issues
Methodology
  • Crawler periodically takes “snapshot” of Napster/Gnutella
    • capture basic info (peers, files shared, …)
  • For peers discovered
    • measure bottleneck bandwidth
    • measure latency
    • track content and degree of sharing
  • Measure lifetime
    • track availability of peers (at P2P and IP level)
Crawling Napster
  • Peers can only be discovered by querying index
  • Crawler issues queries with names of popular song artists
  • Query responses contain
    • IP, reported bandwidth, files shared (number, names and sizes)
  • Results:
    • Captured 40-60% of Napster hosts (contributing to 80-95% of total files)
    • Could not capture peers that do not share files
Crawling Gnutella
  • Crawler uses ping/pong to discover peers
  • Each crawl captured approx. 10,000 peers
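
The ping/pong discovery described above can be pictured as a breadth-first walk over the overlay. The following is a minimal sketch, not the crawler used in the study: the `pong_replies` dictionary is a hypothetical stand-in for the peer addresses a real crawler would learn from pong messages, and the `max_peers` cap is an assumption that mirrors the roughly 10,000 peers captured per crawl.

```python
from collections import deque


def crawl(seed, pong_replies, max_peers=10_000):
    """Breadth-first peer discovery.

    pong_replies maps a peer to the peers named in its pong responses
    (a hypothetical in-memory stand-in for real network replies).
    """
    seen = {seed}
    queue = deque([seed])
    while queue and len(seen) < max_peers:
        peer = queue.popleft()
        # "Ping" the peer and read the addresses reported back.
        for neighbor in pong_replies.get(peer, []):
            if neighbor not in seen and len(seen) < max_peers:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Toy overlay with five peers; "a" is the crawler's entry point.
overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": []}
snapshot = crawl("a", overlay)
```

Here `snapshot` holds every peer reachable from the seed, which corresponds to one crawl "snapshot" in the methodology.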
Measuring bandwidth
  • Reported bandwidth may not be accurate (ignorance or lies)
  • Use bottleneck bandwidth as approximation to available bandwidth
    • capacity of the slowest hop along the path between two hosts
  • Used SProbe to actively measure both upstream and downstream bottleneck bandwidth
    • Similar to “packet pair” technique
Packet Pair Technique
  • Two packets queued next to each other at the bottleneck link exit the link t seconds apart: $t = s_2 / b_{bnl}$
  • Then, $b_{bnl} = s_2 / t$
    • $s_2$: size of second packet
    • $b_{bnl}$: bottleneck bandwidth

Kevin Lai and Mary Baker. “Measuring Bandwidth”. In Proceedings of IEEE INFOCOM '99. 1999.
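
As a worked illustration of the packet-pair idea (bottleneck bandwidth equals the size of the second packet divided by the time dispersion t), here is a minimal sketch. The function name, the units, and the sample numbers are assumptions chosen for illustration; they are not taken from the slides or from SProbe.

```python
def bottleneck_bandwidth_bps(second_packet_bytes, dispersion_seconds):
    """Estimate bottleneck bandwidth in bits per second from one packet pair."""
    if dispersion_seconds <= 0:
        raise ValueError("dispersion must be positive")
    # Convert the packet size to bits, then divide by the spacing between
    # the two packets as they leave the bottleneck link.
    return second_packet_bytes * 8 / dispersion_seconds


# Hypothetical sample: a 1500-byte second packet arriving 1.2 ms after the first.
estimate = bottleneck_bandwidth_bps(1500, 0.0012)
```

With these made-up inputs the estimate comes out at roughly 10 Mbps. A real tool would take many samples and filter out those distorted by cross traffic.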

How many peers are server-like?

  • 8% have upstream bottleneck bandwidth (bb) ≥ 10 Mbps

  • High-bandwidth, low latency, high availability
Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

Presented by Cristina Abad

Three-tiered approach
  • Analyze 200-day trace of Kazaa traffic
    • Considered only traffic going from U. Washington to the outside
  • Develop a model of multimedia workloads
    • Analyze and confirm hypothesis
  • Explore potential impact of locality-awareness in Kazaa
Contributions
  • Obtained some useful characterizations of Kazaa’s traffic
  • Showed that Kazaa’s workload is not Zipf
    • Showed that other workloads (multimedia) may not be Zipf either
  • Presented a model of P2P file-sharing workloads based on their trace results
    • Validated the model through simulations that yielded results very similar to those from traces
  • Proved the usefulness of exploiting locality-aware request routing
Measurement results
  • Users are patient
  • Users slow down as they age
  • Kazaa is not one workload
  • Kazaa clients fetch objects at-most-once
  • Popularity of objects is often short-lived
  • Kazaa is not Zipf
User characteristics (1)
  • Users are patient
User characteristics (2)
  • Users slow down as they age
    • clients “die”
    • older clients ask for less each time they use system
User characteristics (3)
  • Client activity
    • The tracing method could only detect users while their clients were transferring data
    • Thus, the authors only report statistics on client activity, which is a lower bound on availability
    • Session lengths are typically short (median: 2.4 min)
      • Many transactions fail
      • Periods of inactivity may occur during a request if client cannot find an available server with the object
Object characteristics (1)
  • Kazaa is not one workload
Object characteristics (2)
  • Kazaa object dynamics
    • Kazaa clients fetch objects at most once
    • Popularity of objects is often short-lived
    • Most popular objects tend to be recently born objects
    • Most requests are for old objects
Object characteristics (3)
  • Kazaa is not Zipf
  • Web access patterns are Zipf: a small number of objects are extremely popular, but there is a long tail of unpopular requests
  • Zipf’s law: popularity of the i-th most popular object is proportional to $i^{-\alpha}$ (α: Zipf coefficient)
  • (Zipf) looks linear on log-log scale
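
To make the proportionality concrete, the sketch below turns the rule "popularity of rank i is proportional to i to the power of minus α" into normalized request probabilities. The catalogue size of 1000 objects and the function name are assumptions for illustration only.

```python
def zipf_probabilities(n_objects, alpha=1.0):
    """Return request probabilities for ranks 1..n_objects under Zipf's law."""
    # Unnormalized weight of rank i is i^(-alpha).
    weights = [rank ** -alpha for rank in range(1, n_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical catalogue of 1000 objects with Zipf coefficient 1.
probs = zipf_probabilities(1000, alpha=1.0)
top10_share = sum(probs[:10])  # share of requests going to the 10 most popular objects
```

With α = 1 the most popular object is requested twice as often as the second, three times as often as the third, and so on, which is the "few very popular objects plus a long tail" shape described above.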
Model of P2P file-sharing workloads
  • On average, a client requests 2 objects/day
  • P(x): probability that a user requests an object of popularity rank x (Zipf with α = 1)
    • Adjusted so that objects are requested at most once
  • A(x): probability that a newly arrived object is inserted at popularity rank x (Zipf with α = 1)
  • All objects are assumed to have same size
  • Use caching to observe performance changes (effectiveness measured as hit rate)
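
The request side of this model can be sketched as a small simulation: each client draws objects from a Zipf(1) popularity distribution, never fetches the same object twice, and a shared cache records the daily hit rate. This is a simplified illustration rather than the authors' simulator. The population sizes, the unbounded cache, and the redraw-on-repeat rule are assumptions, and new-object arrivals, A(x), are left out.

```python
import random
from itertools import accumulate


def simulate_hit_rate(n_clients=50, n_objects=2000, days=100,
                      requests_per_day=2, seed=0):
    """Return the shared-cache hit rate for each simulated day."""
    if requests_per_day * days > n_objects:
        raise ValueError("clients would run out of unfetched objects")
    rng = random.Random(seed)
    # Cumulative Zipf(1) weights: rank r has weight 1/r.
    cum_weights = list(accumulate(1.0 / rank for rank in range(1, n_objects + 1)))
    objects = range(n_objects)
    fetched_by = [set() for _ in range(n_clients)]  # per-client history
    cached = set()                                  # shared, unbounded cache
    daily_hit_rate = []
    for _day in range(days):
        hits = requests = 0
        for client_seen in fetched_by:
            for _request in range(requests_per_day):
                # Fetch-at-most-once: redraw until the object is new to this client.
                obj = rng.choices(objects, cum_weights=cum_weights)[0]
                while obj in client_seen:
                    obj = rng.choices(objects, cum_weights=cum_weights)[0]
                client_seen.add(obj)
                requests += 1
                if obj in cached:
                    hits += 1
                else:
                    cached.add(obj)
        daily_hit_rate.append(hits / requests)
    return daily_hit_rate


rates = simulate_hit_rate(days=30)
```

Since all objects are assumed to have the same size, the request hit rate and the byte hit rate coincide in this sketch.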
Model – Simulation results
  • File-sharing effectiveness diminishes with client age
    • System evolves towards one with no locality and objects chosen at random from large space
  • New object arrivals improve performance
    • Arrivals replenish supply of popular objects
  • New clients cannot stabilize performance
    • Can’t compensate for increasing number of old clients
    • Overall bandwidth increases in proportion to population size
Model validation
  • By tweaking the arrival rate of new objects, the authors were able to match the trace results (with 5475 new arrivals per year)
Exploring locality-awareness
  • Currently organizations shape or filter P2P traffic
  • Alternative strategy: exploit locality in file-sharing workload
    • Caching; or,
    • Use content available within organization to substantially decrease external bandwidth usage
    • Result: 86% of externally downloaded bytes could be avoided by using an organizational proxy
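
The kind of calculation behind a "bytes avoided" figure can be sketched as a replay of a download trace through an idealized organizational proxy. The trace format, the object names, and the unbounded cache are assumptions for illustration. This is not the authors' trace or methodology, and the toy numbers are unrelated to the 86% result.

```python
def external_bytes_avoided(trace):
    """Fraction of externally downloaded bytes a proxy cache could have served.

    trace: iterable of (object_id, size_bytes) pairs in time order.
    """
    cached = set()
    total = saved = 0
    for object_id, size_bytes in trace:
        total += size_bytes
        if object_id in cached:
            # A copy already exists inside the organization.
            saved += size_bytes
        else:
            cached.add(object_id)
    return saved / total if total else 0.0


# Hypothetical trace: two objects are downloaded twice, one only once.
trace = [("song-a", 4), ("video-b", 700), ("song-a", 4), ("video-b", 700), ("song-c", 5)]
fraction = external_bytes_avoided(trace)
```

In this toy trace the repeated downloads account for 704 of 1413 bytes, so about half of the external traffic could have been served locally.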
Analysis
  • How can results obtained be used when evaluating P2P schemes?
  • Are any of the measurements obtained biased?
  • Peers are heterogeneous
    • Incentives
    • Enforcement (e.g. super-peers in Kazaa)
SProbe
  • Works in uncooperative environments
  • Works on asymmetric network paths
  • Exploits properties of the TCP protocol
    • Sends SYN packets with large payloads, then measures the time dispersion of the returned RST packets
Zipf
  • Linguist George Kingsley Zipf observed that for many frequency distributions, the n-th largest frequency is proportional to a negative power of the rank order n
  • "Zipf's law" is also sometimes used to refer to the corresponding probability distribution
  • Is an instance of a power law
  • Zipf's law is often demonstrated by plotting the data, with the axes being log(rank order) and log(frequency). If the points are close to a single straight line, the distribution follows Zipf's law.
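
The log-log check described above can be sketched as a least-squares line fit on log(rank) versus log(frequency). This is a minimal illustration: the function name and the synthetic input are assumptions, and a slope near minus α with R² near 1 is only an informal indication of Zipf-like behavior, not a formal statistical test.

```python
import math


def loglog_slope(frequencies):
    """Fit log(frequency) against log(rank); return (slope, r_squared)."""
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # R^2 close to 1 means the points lie near a single straight line.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, r_squared


# Synthetic frequencies that follow Zipf's law exactly with coefficient 1.
ideal = [1000 / rank for rank in range(1, 101)]
slope, r2 = loglog_slope(ideal)
```

For the synthetic input the fitted slope is approximately -1 and R² is approximately 1. A workload that bends away from the line at the popular end would show a visibly poorer fit.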