A Measurement Study of Peer-to-Peer File Sharing Systems

Presented by

Cristina Abad


Motivation

  • In a P2P file sharing system, peers are usually at the “edge” of the network

  • Does this affect/limit the quality of the infrastructure?

  • What are the characteristics of hosts that choose to participate?

  • Solution: Measure Gnutella and Napster traffic to help understand these issues


Napster


Gnutella


Methodology

  • Crawler periodically takes “snapshot” of Napster/Gnutella

    • capture basic info (peers, files shared, …)

  • For peers discovered

    • measure bottleneck bandwidth

    • measure latency

    • track content and degree of sharing

  • Measure lifetime

    • track availability of peers (at P2P and IP level)


Crawling Napster

  • Peers can only be discovered by querying index

  • Crawler issues queries with names of popular song artists

  • Query responses contain

    • IP, reported bandwidth, files shared (number, names and sizes)

  • Results:

    • Captured 40-60% of Napster hosts (contributing to 80-95% of total files)

    • Could not capture peers that do not share files


Crawling Gnutella

  • Crawler uses ping/pong to discover peers

  • Each crawl captured approx. 10,000 peers
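
The ping/pong discovery above amounts to a breadth-first traversal of the overlay. The following is a minimal sketch only: `neighbors_of` is a hypothetical callback standing in for a ping sent to a peer and the addresses learned from its pong replies, and the toy overlay is invented for illustration. No real Gnutella messages are sent.

```python
from collections import deque


def crawl(seed_peers, neighbors_of, max_peers=10_000):
    """Breadth-first discovery of peers, starting from known seed peers.

    neighbors_of(peer) is a hypothetical stand-in for a ping/pong exchange:
    it returns the peer addresses learned from that peer's pong replies.
    """
    seen = set()
    queue = deque()
    for peer in seed_peers:
        if peer not in seen:
            seen.add(peer)
            queue.append(peer)

    # Stop when the overlay is exhausted or the snapshot is large enough.
    while queue and len(seen) < max_peers:
        peer = queue.popleft()
        for other in neighbors_of(peer):
            if other not in seen and len(seen) < max_peers:
                seen.add(other)
                queue.append(other)
    return seen


# Toy overlay used only for illustration (not measured data).
overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
snapshot = crawl(["a"], lambda peer: overlay.get(peer, []))
```

The `max_peers` cap mirrors the idea of a bounded snapshot: a crawl that runs too long sees a network that has already changed.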


Measuring bandwidth

  • Reported bandwidth may not be accurate (ignorance or lies)

  • Use bottleneck bandwidth as approximation to available bandwidth

    • capacity of the slowest link (hop) along the path between two hosts

  • Used SProbe to actively measure both upstream and downstream bottleneck bandwidth

    • Similar to “packet pair” technique


Packet Pair Technique

  • Two packets queued next to each other at the bottleneck link exit the link t seconds apart:

    $t = \frac{s_2}{b_{bnl}}$

  • Then,

    $b_{bnl} = \frac{s_2}{t}$

    where $s_2$ is the size of the second packet and $b_{bnl}$ is the bottleneck bandwidth.

    Kevin Lai and Mary Baker. “Measuring Bandwidth”. In Proceedings of IEEE INFOCOM '99. 1999.
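
As a worked example of the packet pair relation, the sketch below divides the size of the second packet by the measured dispersion. The function name, the units (bytes in, bits per second out), and the sample numbers are assumptions chosen for illustration; they are not taken from the study.

```python
def bottleneck_bandwidth_bps(second_packet_bytes, dispersion_seconds):
    """Estimate bottleneck bandwidth in bits per second.

    Applies bandwidth = s2 / t, where s2 is the size of the second packet
    and t is the time gap between the two packets after the bottleneck link.
    """
    if dispersion_seconds <= 0:
        raise ValueError("dispersion must be positive")
    return second_packet_bytes * 8 / dispersion_seconds


# Illustrative numbers only: a 1500-byte packet arriving 1.2 ms after the first
# corresponds to roughly 10 Mbps.
estimate = bottleneck_bandwidth_bps(1500, 0.0012)
```

A single pair is sensitive to cross traffic that slips between the two packets, so practical tools typically combine many samples.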


How many peers are server-like?

  • Server-like peers have high bandwidth, low latency, and high availability

  • 8% have upstream bottleneck bandwidth (bb) ≥ 10 Mbps


  • Availability – Host uptimes


  • Availability – Session duration


Free-riders


Is Gnutella robust?


Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

Presented by

Cristina Abad


Three-tiered approach

  • Analyze 200-day trace of Kazaa traffic

    • Considered only traffic going from U. Washington to the outside

  • Develop a model of multimedia workloads

    • Analyze and confirm hypothesis

  • Explore potential impact of locality-awareness in Kazaa


Contributions

  • Obtained some useful characterizations of Kazaa’s traffic

  • Showed that Kazaa’s workload is not Zipf

    • Showed that other workloads (multimedia) may not be Zipf either

  • Presented a model of P2P file-sharing workloads based on their trace results

    • Validated the model through simulations that yielded results very similar to those from traces

  • Proved the usefulness of exploiting locality-aware request routing


Measurement results

  • Users are patient

  • Users slow down as they age

  • Kazaa is not one workload

  • Kazaa clients fetch objects at-most-once

  • Popularity of objects is often short-lived

  • Kazaa is not Zipf


User characteristics (1)

  • Users are patient


User characteristics (2)

  • Users slow down as they age

    • clients “die”

    • older clients ask for less each time they use system


User characteristics (3)

  • Client activity

    • Tracing used could only detect users when their clients transfer data

    • Thus, they only report statistics on client activity, which is a lower bound on availability

    • Avg session lengths are typically small (median: 2.4 mins)

      • Many transactions fail

      • Periods of inactivity may occur during a request if client cannot find an available server with the object


Object characteristics (1)

  • Kazaa is not one workload


Object characteristics (2)

  • Kazaa object dynamics

    • Kazaa clients fetch objects at most once

    • Popularity of objects is often short-lived

    • Most popular objects tend to be recently born objects

    • Most requests are for old objects


Object characteristics (3)

  • Kazaa is not Zipf

  • Web access patterns are Zipf: small number of objects are extremely popular, but there is a long tail of unpopular requests.

  • Zipf’s law: popularity of the ith-most popular object is proportional to $i^{-\alpha}$ ($\alpha$: Zipf coefficient)

  • (Zipf) looks linear on log-log scale
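
To make the Zipf definition concrete, the sketch below normalizes the weights $i^{-\alpha}$ into a probability distribution over a finite catalog. The catalog size of 1000 and the helper name are assumptions for illustration only.

```python
def zipf_probabilities(n_objects, alpha=1.0):
    """Return request probabilities for ranks 1..n_objects under Zipf(alpha).

    The weight of rank i is i ** -alpha; dividing by the sum of all weights
    turns the weights into probabilities.
    """
    weights = [rank ** -alpha for rank in range(1, n_objects + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


# Illustrative catalog of 1000 objects with Zipf coefficient 1.
probs = zipf_probabilities(1000, alpha=1.0)
top10_share = sum(probs[:10])  # share of requests going to the 10 most popular objects
```

With a coefficient of 1, the most popular object is requested twice as often as the second and three times as often as the third, which is the "few very popular objects, long tail" shape described above.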


Model of P2P file-sharing workloads

  • On average, a client requests 2 objects/day

  • P(x): probability that a user requests an object of popularity rank x; follows Zipf(1)

    • Adjusted so that objects are requested at most once

  • A(x): probability that a newly arrived object is inserted at popularity rank x; follows Zipf(1)

  • All objects are assumed to have same size

  • Use caching to observe performance changes (effectiveness is measured as hit rate)
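
The request side of the model can be sketched as follows. This is a simplified illustration, not the authors' simulator: it covers only one client drawing from a Zipf distribution with the fetch-at-most-once adjustment, it omits new object arrivals, and the function name, the seed, and the catalog size are assumptions.

```python
import random


def simulate_client(n_objects, days, requests_per_day=2, alpha=1.0, seed=0):
    """Simulate one client's requests under a fetch-at-most-once Zipf model.

    Each request picks a popularity rank with probability proportional to
    rank ** -alpha, restricted to objects the client has not fetched yet.
    Returns the ranks in the order they were requested.
    """
    rng = random.Random(seed)
    remaining = list(range(1, n_objects + 1))
    fetched = []
    n_requests = min(days * requests_per_day, n_objects)
    for _ in range(n_requests):
        weights = [rank ** -alpha for rank in remaining]
        rank = rng.choices(remaining, weights=weights, k=1)[0]
        fetched.append(rank)
        remaining.remove(rank)  # at most once: never request this object again
    return fetched


# Illustrative run: 10 days at 2 requests/day over a 100-object catalog.
history = simulate_client(n_objects=100, days=10)
```

Because popular objects are removed from a client's candidate set once fetched, an older client draws increasingly from the unpopular tail, which is the intuition behind the effect of client age on hit rate.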


Model – Simulation results

  • File-sharing effectiveness diminishes with client age

    • System evolves towards one with no locality and objects chosen at random from large space

  • New object arrivals improve performance

    • Arrivals replenish supply of popular objects

  • New clients cannot stabilize performance

    • Can’t compensate for increasing number of old clients

    • Overall bandwidth increases in proportion to population size


Model validation

  • By tweaking the arrival rate of new objects, the authors were able to match the trace results (with 5475 new arrivals per year)


Exploring locality-awareness

  • Currently, organizations shape or filter P2P traffic

  • Alternative strategy: exploit locality in file-sharing workload

    • Caching; or,

    • Use content available within organization to substantially decrease external bandwidth usage

    • Result: 86% of externally downloaded bytes could be avoided by using an organizational proxy
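
The kind of saving reported above can be estimated from a request trace. The sketch below assumes an idealized organizational proxy with unlimited storage, where any object fetched once from outside is always available internally afterwards. The trace format, function name, and toy numbers are assumptions for illustration and do not reproduce the study's 86% figure.

```python
def bytes_avoided_fraction(requests):
    """Fraction of requested bytes an ideal organizational proxy would serve.

    requests is a time-ordered iterable of (object_id, size_bytes) pairs.
    The first request for an object goes outside; every later request for
    the same object is counted as served from inside the organization.
    """
    cached = set()
    total_bytes = 0
    avoided_bytes = 0
    for object_id, size_bytes in requests:
        total_bytes += size_bytes
        if object_id in cached:
            avoided_bytes += size_bytes
        else:
            cached.add(object_id)
    return avoided_bytes / total_bytes if total_bytes else 0.0


# Toy trace (sizes in MB), invented for illustration.
trace = [("song1", 4), ("video1", 700), ("song1", 4), ("video1", 700), ("song2", 5)]
fraction = bytes_avoided_fraction(trace)
```

Measuring bytes rather than requests matters here: a few repeated large objects can dominate the savings even when most requests are for small ones.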


Questions?


Analysis

  • How can results obtained be used when evaluating P2P schemes?

  • Are any of the measurements obtained biased?

  • Peers are heterogeneous

    • Incentives

    • Enforcement (e.g. super-peers in Kazaa)


SProbe

  • Works in uncooperative environments

  • Works on asymmetric network paths

  • Exploits properties of the TCP protocol

    • Sends a pair of SYN packets with large payloads, then measures the time dispersion of the RST packets received in response


Zipf

  • Linguist George Kingsley Zipf observed that for many frequency distributions, the n-th largest frequency is proportional to a negative power of the rank order n

  • "Zipf's law" is also sometimes used to refer to the corresponding probability distribution

  • Is an instance of a power law

  • Zipf's law is often demonstrated by plotting the data, with the axes being log(rank order) and log(frequency). If the points are close to a single straight line, the distribution follows Zipf's law.
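
The log-log check described above can be done numerically by fitting a straight line to (log rank, log frequency) and reading off the slope. The sketch below uses an ordinary least-squares fit written out by hand; the helper name and the synthetic data are assumptions for illustration.

```python
import math


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    frequencies must be positive and sorted from most to least popular,
    so that position 1 is rank 1. A slope near -alpha with points close
    to the fitted line is consistent with Zipf's law.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data: exact Zipf(1) frequencies, and the same data with the
# most popular objects capped to mimic a flattened head.
ideal = [1000 / rank for rank in range(1, 101)]
capped = [min(freq, 50.0) for freq in ideal]
ideal_slope = loglog_slope(ideal)
capped_slope = loglog_slope(capped)
```

On the exact Zipf data the slope comes out at about -1. Capping the head makes the fitted line shallower, which illustrates how a flattened head shows up as a departure from the straight line.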

