Characteristics of Current P2P File-Sharing Systems



Characteristics of Current P2P File-Sharing Systems (with a brief excursion into network measurement tools)

Stefan Saroiu

P. Krishna Gummadi

Steven Gribble

University of Washington


Peer-to-Peer Frenzy

  • Both research and industrial excitement

    • CAN, Chord, Past, Tapestry, JXTA, Farsite, Publius, Morpheus, AudioGalaxy

  • Basic Premise

    • wide-area, distributed system

    • voluntary, ad-hoc, dynamic home-user peers exchange information (mostly large files)

  • Many proposals, yet nobody knows the participating peers’ characteristics and behavior


Napster & Gnutella

[Figure: side-by-side diagrams of a Napster network (peers plus the central napster.com servers) and a Gnutella network. Legend: P = peer, S = server, Q = query, R = response, D = file download.]


Methodology

2 stages:

  • periodically crawl Gnutella/Napster

    • discover peers and their metadata

  • feed output from crawl into measurement tools:

    • bottleneck bandwidth – SProbe

    • latency – SProbe

    • peer availability – LF

    • degree of content sharing – Napster crawler


Network Bandwidth Scenarios

  • Network measurements

  • Dynamic server/peer selection

  • P2P overlay formation

    • or application-level multicast

  • Placement of content replicas


Network Bandwidth

  • Throughput:

    • number of bytes transferred during a fixed interval of time

  • Available bandwidth:

    • the maximum attainable throughput of a newly started flow

  • Bottleneck bandwidth:

    • maximum throughput ideally obtained across the slowest link

  • Hard to measure:

    • throughput, available bandwidth

  • Easier to measure:

    • bottleneck bandwidth


One-Packet Model

[Figure: traversal time of a probing packet plotted against packet size; slope = 1 / bottleneck bandwidth.]
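
The one-packet relationship (slope = 1 / bottleneck bandwidth) can be turned into an estimate by fitting a line to (packet size, traversal time) samples and inverting the slope. The sketch below is only an illustration of that idea, not the method of any particular tool: the function name, the least-squares fit, and the sample values are all assumptions.

```python
def bottleneck_from_one_packet(samples):
    """Estimate bottleneck bandwidth in bytes/s from (size_bytes, time_s) samples.

    Fits time = intercept + slope * size by ordinary least squares and
    returns 1 / slope, following slope = 1 / bottleneck bandwidth.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_size = sum(size for size, _ in samples) / n
    mean_time = sum(time for _, time in samples) / n
    sxx = sum((size - mean_size) ** 2 for size, _ in samples)
    sxy = sum((size - mean_size) * (time - mean_time) for size, time in samples)
    if sxx == 0 or sxy <= 0:
        raise ValueError("samples do not show time growing with packet size")
    return sxx / sxy  # equals 1 / slope


# Hypothetical, noise-free samples: 10 ms fixed delay plus size / 125000 B/s.
samples = [(size, 0.010 + size / 125_000) for size in (200, 600, 1000, 1400)]
estimate = bottleneck_from_one_packet(samples)  # close to 125000 bytes/s (1 Mbps)
```

Real measurements are noisy, so a practical tool would need many samples per packet size and some form of filtering before the fit.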


Packet-Pair Model

[Figure: two packets sent back to back are spread apart by the bottleneck link and arrive Δt apart.]

  • Δt = packet size / bottleneck bandwidth

  • time dispersion Δt is inversely proportional to the bottleneck bandwidth
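
The packet-pair relation, Δt = packet size / bottleneck bandwidth, can be inverted to get a bandwidth estimate from a measured dispersion. A minimal sketch follows; the function names are hypothetical, and taking the median over several pairs is an assumed way of damping outliers, not something the slides specify.

```python
import statistics


def bottleneck_from_pair(packet_size_bytes, dispersion_s):
    """Invert dt = size / bandwidth; the result is in bits per second."""
    if dispersion_s <= 0:
        raise ValueError("dispersion must be positive")
    return packet_size_bytes * 8 / dispersion_s


def bottleneck_from_pairs(packet_size_bytes, dispersions_s):
    """Median of the per-pair estimates (assumed outlier filter)."""
    return statistics.median(
        bottleneck_from_pair(packet_size_bytes, dt) for dt in dispersions_s
    )


# Hypothetical measurements: 1500-byte packets arriving about 12 ms apart,
# with one pair compressed by queueing somewhere along the path.
estimate_bps = bottleneck_from_pairs(1500, [0.012, 0.012, 0.004, 0.012, 0.012])
```

With these made-up numbers the estimate comes out at roughly 1 Mbps, because the single compressed pair does not move the median.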


Vital Properties of an Ideal Tool

  • Accurate

  • Fast:

    • 1 min/measurement too slow

  • Scalable:

    • flooding the network will not work

  • Works in Uncooperative Environments

    • can’t deploy software at both endpoints


Properties of an Ideal Tool

  • Active:

    • existing traffic might not be suitable

  • TCP/UDP based:

    • ICMP heavily filtered

  • Cross-traffic resilient:

    • should detect and give up in the face of cross traffic

  • Works on Asymmetric Paths

  • Flexible to Bandwidth Changes

  • Controlled Evaluations



SProbe Uses TCP Tricks

  • From local host to remote host

    • No cooperation needed

[Figure: the local host sends SYN packets to the remote host, which replies with RST packets.]


SProbe Uses TCP Tricks

  • From remote host to local host

    • Involuntary cooperation of the application layer

[Figure: exchange between the local host and a remote Web server: HTTP GET request, data packets, ACK (last data packet).]




More SProbe

  • Bottleneck Bandwidth

  • Latency

  • Availability (LF):

    • send a SYN packet

    • receive:

      • SYN/ACK – host active

      • RST – host inactive, but online

      • nothing – host offline
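
The three-way availability rule above maps directly onto a small classifier. The sketch below only encodes the decision table from the slide; the names are hypothetical, and the probe reply is passed in as a string (or `None` for a timeout) so that no raw-socket code is needed.

```python
from collections import Counter
from enum import Enum


class HostState(Enum):
    ACTIVE = "host active"
    INACTIVE_ONLINE = "host inactive, but online"
    OFFLINE = "host offline"


def classify_probe(reply):
    """Map the reply to a SYN probe onto the three states from the slide.

    reply is "SYN/ACK", "RST", or None when nothing came back.
    """
    if reply == "SYN/ACK":
        return HostState.ACTIVE
    if reply == "RST":
        return HostState.INACTIVE_ONLINE
    if reply is None:
        return HostState.OFFLINE
    raise ValueError(f"unexpected reply: {reply!r}")


# Hypothetical replies collected from one host over five probes.
history = ["SYN/ACK", "SYN/ACK", None, "RST", "SYN/ACK"]
summary = Counter(classify_probe(reply) for reply in history)
```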


P2P Characteristics

  • How many peers are “server-like”?

  • Who are the free-riders?

  • Do peers tend to lie?

  • How robust is the Gnutella overlay?


Most Peers have Cable Modem-like Bandwidths





Availability

  • Periodic probes yield uptime segments, each with a start and an end

[Figure: timeline of uptime segments with start and end marks; a 12-hour span is labeled.]

  • Divide the trace into two periods

  • Keep segments that:

    • start in the 1st period

    • end in the 1st or 2nd period

  • Draw conclusions only about segments no longer than the 2nd period
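
One reading of this filtering rule is sketched below. It is an interpretation rather than the authors' code: the function name, the half-open boundary handling, the use of `None` for a segment whose end was never observed, and the 12-hour period length (borrowed from the figure label) are all assumptions.

```python
def usable_segments(segments, trace_start, split, trace_end):
    """Filter (start, end) uptime segments following the two-period rule.

    A segment is kept if it starts in the 1st period [trace_start, split),
    its end was observed no later than trace_end, and it is no longer than
    the 2nd period, so that long segments are not under-counted.
    """
    second_period = trace_end - split
    kept = []
    for start, end in segments:
        if end is None:  # still up when the trace stopped: length unknown
            continue
        starts_in_first = trace_start <= start < split
        if starts_in_first and end <= trace_end and end - start <= second_period:
            kept.append((start, end))
    return kept


# Hypothetical 24-hour trace (times in hours), split into two 12-hour periods.
segments = [(1, 5), (10, 20), (2, 23), (13, 15), (11, None)]
kept = usable_segments(segments, trace_start=0, split=12, trace_end=24)
# kept == [(1, 5), (10, 20)]
```

The segment (2, 23) is dropped because it is longer than the second period, (13, 15) because it starts in the second period, and (11, None) because its end was never seen.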




Power-Law Networks Are Here to Stay

  • Barabási and Albert showed that networks which…

    • grow by continuous addition of new nodes

    • exhibit preferential attachment (likelihood of connecting to a node depends on the node’s degree)

  • …have a power-law distribution of vertex degree

  • Examples: Internet, WWW, Gnutella
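
The two ingredients named above, growth and preferential attachment, are easy to simulate. The following is a minimal sketch of such a growth process under assumed parameters (`n` nodes, `m` links per new node, a small starting clique); it illustrates the mechanism and makes no claim about how Gnutella itself was modeled.

```python
import random
from collections import Counter


def preferential_attachment_edges(n, m, seed=0):
    """Grow an undirected graph to n nodes; each new node links to m distinct
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Start from a clique of m + 1 nodes so every node has nonzero degree.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears here once per incident edge, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


def degree_histogram(edges):
    """Map degree -> number of nodes with that degree."""
    degree = Counter(node for edge in edges for node in edge)
    return Counter(degree.values())


edges = preferential_attachment_edges(n=2000, m=2)
histogram = degree_histogram(edges)
# Most nodes keep the minimum degree m, while a few early nodes become hubs.
```

Plotting `histogram` on log-log axes is the usual way to see the heavy tail.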


Resilience to Failures

  • Power-law networks (Cohen et al.):

    • very resilient in the face of random node failures

      • a giant spanning cluster still exists

    • fairly resilient in the face of cascading failures

    • very vulnerable in the face of orchestrated attacks (directed at high-degree nodes)
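
The contrast between random failures and orchestrated attacks can be explored on a synthetic graph. The sketch below grows a preferential-attachment graph, removes the same fraction of nodes either at random or in decreasing order of degree, and reports the largest connected component that survives. Everything here (graph size, `m`, the 10% fraction, the seed, the function names) is an assumption chosen for illustration; it is not the Gnutella data.

```python
import random
from collections import deque


def grow_graph(n, m, rng):
    """Preferential-attachment graph as an adjacency dict (node -> set)."""
    adj = {node: set() for node in range(n)}
    endpoints = []  # each node listed once per incident edge

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)
        endpoints.extend((a, b))

    for i in range(m + 1):  # small starting clique
        for j in range(i):
            link(i, j)
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            link(new, target)
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are gone."""
    seen = set(removed)
    best = 0
    for root in adj:
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


def compare_failures(n=2000, m=2, fraction=0.1, seed=1):
    rng = random.Random(seed)
    adj = grow_graph(n, m, rng)
    count = int(n * fraction)
    random_nodes = rng.sample(sorted(adj), count)
    hubs = sorted(adj, key=lambda node: len(adj[node]), reverse=True)[:count]
    return {
        "intact": largest_component(adj, ()),
        "random": largest_component(adj, random_nodes),
        "targeted": largest_component(adj, hubs),
    }


result = compare_failures()
```

On graphs of this kind, removing the hubs should leave a noticeably smaller giant component than removing the same number of random nodes.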


Gnutella

[Figure: snapshot of the Gnutella overlay, 1771 hosts, Fri Feb 16 05:21:52-05:23:22 PST]

Popular sites:

  • 212.239.171.174

  • adams-00-305a.Stanford.EDU

  • 0.0.0.0


30% random failures

1771 – 471 – 294 hosts

Fri Feb 16 05:21:52-05:23:22 PST


4% orchestrated failures

1771 - 63 hosts

Fri Feb 16 05:21:52-05:23:22 PST


Discussion

  • Heterogeneity:

    • 3 orders of magnitude of bandwidth

      • 50 Kbps to 100 Mbps

    • 6 orders of magnitude of latency

      • 10 µs to 10 s

    • >4 orders of magnitude in availability

      • 1%-99.99%

  • Peers should not be treated as equals


Cooperating, Well-Behaved Peers

  • Incentive:

    • game-theoretic approaches to enforcing local behavior for global benefit

  • System enforcement:

    • peers can:

      • measure each other's characteristics (SProbe)

      • enforce the reported ones

        • a reported 56Kbps peer should not download content at higher speed
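
One way a serving peer could enforce a reported bandwidth is a simple rate limiter keyed to the reported value. The sketch below is an assumed design, not something the slides describe: the class name, the one-second burst cap, and the explicit `now_s` clock argument are all hypothetical.

```python
class ReportedRateLimiter:
    """Caps a peer's download rate at the bandwidth it reported."""

    def __init__(self, reported_bps):
        self.bytes_per_s = reported_bps / 8
        self.allowance = 0.0  # bytes the peer may currently receive
        self.last_s = None    # time of the previous request

    def grant(self, requested_bytes, now_s):
        """Return how many of requested_bytes may be sent at time now_s."""
        if self.last_s is not None:
            self.allowance += (now_s - self.last_s) * self.bytes_per_s
            # Never bank more than one second's worth of credit.
            self.allowance = min(self.allowance, self.bytes_per_s)
        self.last_s = now_s
        granted = min(requested_bytes, int(self.allowance))
        self.allowance -= granted
        return granted


# A peer that reported 56 Kbps (7000 bytes/s) asks for 10000 bytes each time.
limiter = ReportedRateLimiter(56_000)
first = limiter.grant(10_000, now_s=0.0)   # 0: no credit accumulated yet
second = limiter.grant(10_000, now_s=1.0)  # 7000: one second of credit
third = limiter.grant(10_000, now_s=1.5)   # 3500: half a second of credit
```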


Feedback to Current Proposals

  • CAN, Chord, Past:

    • efficient storage and lookup algorithms:

      • O(log N) time and space

    • at the price of maintaining rigid network structure: hypercubes, butterflies, Plaxton trees

    • unclear how network structure is maintained given heterogeneity and dynamics of peers

  • Conjecture – these networks will have a hard time stabilizing:

    • will need lots of routine maintenance traffic


Instead, Gnutella…

  • Easy join procedure:

    • this simplicity gave Gnutella its power-law shape

  • Easy-to-implement protocol (broadcast)

  • Lots of maintenance traffic already

    • although the protocol has become smarter with its subsequent versions

  • Searching is a nightmare


Document Popularity

  • Follows Zipf distribution

    • long-tailed

  • Popular documents become more popular with Napster/Gnutella

  • Currently, need to resubmit queries in the hope that someone will answer

  • Wish-list based system
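
For reference, a Zipf popularity distribution is commonly written as below. The symbols are not from the slides: r is a document's popularity rank, N the number of documents, and α the exponent (α = 1 gives the classic Zipf law).

```latex
P(r) = \frac{r^{-\alpha}}{\sum_{k=1}^{N} k^{-\alpha}}, \qquad r = 1, \dots, N
```

The long tail mentioned above corresponds to the slow decay of P(r) at large r.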


Wide-area Network Measurements

  • Sending even a few packets can be interpreted as hostile behavior

  • Even a few SYN packets are sufficient to trigger software firewalls

    • dialogue box pops up – possible scan from washington.edu, click OK or Cancel

  • Many confused, angry, threatening e-mails sent to many people (security, root, Ed):

    • active Internet measurements are not simple to perform


Excerpt from e-mail

“Thank you for your reply. Unfortunately, I did not authorise anybody from washington.edu to attempt to crack into my computer. Attempting to break into computers is a crime in Australia. Please advise the names and contact details of the people involved in this "research" so that I can contact the Australian Federal Police, who will no doubt contact your Federal Bureau of Investigation to investigate this incident and institute criminal proceedings against those concerned.”


Current Work

  • Quantify and show that current proposals are too rigid for Napster/Gnutella-like peer dynamics

  • Wish-list, delayed exchange system

    • big distributed scheduling problem

  • SGet

    • a downloading tool with automatic server selection

    • no bandwidth is wasted


Questions?

Beautiful Sieg Hall

“Pride of UW”

