A Measurement Study of Peer-to-Peer File Sharing Systems
This presentation is the property of its rightful owner.
Sponsored Links
1 / 64

Presentation by Nanda Kishore Lella [email protected] PowerPoint PPT Presentation


  • 125 Views
  • Uploaded on
  • Presentation posted in: General

A Measurement Study of Peer-to-Peer File Sharing Systems by Stefan Saroiu P. Krishna Gummadi Steven D. Gribble. Presentation by Nanda Kishore Lella [email protected] Outline. P2P Overview What is a peer? Example applications Benefits of P2P P2P Content Sharing Challenges

Download Presentation

Presentation by Nanda Kishore Lella [email protected]

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Presentation by nanda kishore lella lella 2 wright

A Measurement Study of Peer-to-Peer File Sharing SystemsbyStefan SaroiuP. Krishna GummadiSteven D. Gribble

Presentation

by

Nanda Kishore Lella

[email protected]


Outline

Outline

  • P2P Overview

    • What is a peer?

    • Example applications

    • Benefits of P2P

  • P2P Content Sharing

    • Challenges

    • Group management/data placement approaches

    • Measurement studies

  • Conclusion


What is peer to peer p2p

What is Peer-to-Peer (P2P)?

  • Most people think of P2P as music sharing

    Examples:

  • Napster

  • Gnutella


What is a peer

What is a peer?

  • Contrasted with Client-Server model

  • Servers are centrally maintained and administered

  • Client has fewer resources than a server


What is a peer1

What is a peer?

  • A peer’s resources are similar to the resources of the other participants

  • P2P – peers communicating directly with other peers and sharing resources


P2p application taxonomy

P2P Application Taxonomy

P2P Systems

Distributed Computing

File Sharing

Collaboration

Platforms

JXTA


P2p goals benefits

P2P Goals/Benefits

  • Cost sharing

  • Resource aggregation

  • Improved scalability/reliability

  • Increased autonomy

  • Anonymity/privacy

  • Dynamism

  • Ad-hoc communication


P2p file sharing

P2P File Sharing

  • Content exchange

    • Gnutella

  • File systems

    • Oceanstore

  • Filtering/mining

    • Opencola


Research areas

Research Areas

  • Peer discovery and group management

  • Data location and placement

  • Reliable and efficient file exchange

  • Security/privacy/anonymity/trust


Current research

Current Research

  • Group management and data placement

    • Chord, CAN, Tapestry, Pastry

  • Anonymity

    • Publius

  • Performance studies

    • Gnutella measurement study etc.


Management placement challenges

Management/Placement Challenges

  • Per-node state

  • Bandwidth usage

  • Search time

  • Fault tolerance/resiliency


Approaches

Approaches

  • Centralized

  • Flooding

  • Document Routing


Centralized

Centralized

Bob

Alice

  • Napster model

  • Benefits:

    • Efficient search

    • Limited bandwidth usage

    • Efficient network handling

  • Drawbacks:

    • Central point of failure

    • Limited scale

Judy

Jane


Flooding

Flooding

Carl

Jane

  • Gnutella model

  • Benefits:

    • No central point of failure

    • Limited per-node state

  • Drawbacks:

    • Slow searches

    • Bandwidth intensive

Bob

Alice

Judy


Document routing

Document Routing

001

012

  • FreeNet, Chord, CAN, Tapestry, Pastry model

  • Benefits:

    • More efficient searching

    • Limited per-node state

  • Drawbacks:

    • Limited fault-tolerance vs redundancy

212 ?

212 ?

332

212

305


Document routing can

Document Routing – CAN

  • Associate to each node and item a unique id in an d-dimensional space

  • Goals

    • Scales to hundreds of thousands of nodes

    • Handles rapid arrival and failure of nodes

  • Properties

    • Routing table size O(d)

    • Guarantees that a file is found in at most d*n1/d steps, where n is the total number of nodes

Slide modified from another presentation


Can example two dimensional space

CAN Example: Two Dimensional Space

  • Space divided between nodes

  • All nodes cover the entire space

  • Each node covers either a square or a rectangular area

  • Example:

    • Node n1:(1, 2) first node that joins  cover the entire space

7

6

5

4

3

n1

2

1

0

0

2

3

4

6

7

5

1

Slide modified from another presentation


Can example two dimensional space1

CAN Example: Two Dimensional Space

  • Node n2:(4, 2) joins  space is divided between n1 and n2

7

6

5

4

3

n2

n1

2

1

0

0

2

3

4

6

7

5

1

Slide modified from another presentation


Can example two dimensional space2

CAN Example: Two Dimensional Space

  • Node n2:(4, 2) joins  space is divided between n1 and n2

7

6

n3

5

4

3

n2

n1

2

1

0

0

2

3

4

6

7

5

1

Slide modified from another presentation


Can example two dimensional space3

CAN Example: Two Dimensional Space

  • Nodes n4:(5, 5) and n5:(6,6) join

7

6

n5

n4

n3

5

4

3

n2

n1

2

1

0

0

2

3

4

6

7

5

1

Slide modified from another presentation


Can example two dimensional space4

CAN Example: Two Dimensional Space

  • Nodes: n1:(1, 2); n2:(4,2); n3:(3, 5); n4:(5,5);n5:(6,6)

  • Items: f1:(2,3); f2:(5,1); f3:(2,1); f4:(7,5);

7

6

n5

n4

n3

f4

5

4

f1

3

n2

n1

2

f3

1

f2

0

0

2

3

4

6

7

5

1

Slide modified from another presentation


Can example two dimensional space5

CAN Example: Two Dimensional Space

  • Each item is stored by the node who owns its mapping in the space

7

6

n5

n4

n3

f4

5

4

f1

3

n2

n1

2

f3

1

f2

0

0

2

3

4

6

7

5

1

Slide modified from another presentation


Can query example

CAN: Query Example

  • Each node knows its neighbors in the d-space

  • Forward query to the neighbor that is closest to the query id

  • Can route around some failures

    • some failures require local flooding

  • Example: assume n1 queries f4

7

6

n5

n4

n3

f4

5

4

f1

3

n2

n1

2

f3

1

f2

0

0

2

3

4

6

7

5

1

Slide modified from another presentation


Can query example1

7

6

n5

n4

n3

f4

5

4

f1

3

n2

n1

2

f3

1

f2

0

0

2

3

4

6

7

5

1

CAN: Query Example

Slide modified from another presentation


Can query example2

7

6

n5

n4

n3

f4

5

4

f1

3

n2

n1

2

f3

1

f2

0

0

2

3

4

6

7

5

1

CAN: Query Example

Slide modified from another presentation


Can query example3

7

6

n5

n4

n3

f4

5

4

f1

3

n2

n1

2

f3

1

f2

0

0

2

3

4

6

7

5

1

CAN: Query Example

Slide modified from another presentation


Node failure recovery

Node Failure Recovery

  • Simple failures

    • know your neighbor’s neighbors

    • when a node fails, one of its neighbors takes over its zone

  • More complex failure modes

    • simultaneous failure of multiple adjacent nodes

    • scoped flooding to discover neighbors

    • hopefully, a rare event

Slide modified from another presentation


Document routing chord

N5

N10

N110

K19

N20

N99

N32

N80

N60

Document Routing – Chord

  • MIT project

  • Uni-dimensional ID space

  • Keep track of log N nodes

  • Search through log N nodes to find desired key


Document routing chord 2

Each node and key is assigned an id.

If a node needs a key,

searches in its table of

n nodes for the key.

If fails, goes to the last node of its table and repeats until it finds the key.

Search through log N nodes to find desired key

N5

N10

N110

K19

N20

N99

N32

N80

N60

Document Routing – Chord(2)


Doc routing tapestry pastry

43FE

993E

13FE

73FE

F990

04FE

9990

ABFE

239E

1290

Doc Routing – Tapestry/Pastry

  • Global mesh of meshes

  • Suffix-based routing

  • Uses underlying network distance in constructing mesh


Naming in tapestry

Every node has a 4 bit

name similar to IP address

Each bit in the name

can hold 16 types

Keys present at the node are in accordance with

the node name.

43FE

993E

13FE

73FE

F990

04FE

9990

ABFE

239E

1290

Naming in Tapestry


Tapestry routing

7598

B4F8

Msg to 4598

4598

9098

6789

B437

Tapestry Routing


Remaining problems

Remaining Problems?

  • Hard to handle highly dynamic environments

  • Methods don’t consider peer characteristics


Measurement studies

Measurement Studies

Gnutella vs. Napster


Presentation by nanda kishore lella lella 2 wright

R

R

Q

Q

P

napster.com

P

P

D

S

S

Q

P

P

P

P

R

S

S

P

Q

P

P

P

Q

D

P

P

Q

Napster

Gnutella

Q

peer

query

P

D

file download

R

response

server

S


Methodology

Methodology

2 stages:

  • periodically crawl Gnutella/Napster

    • discover peers and their metadata

  • feed output from crawl into measurement tools:

    • bottleneck bandwidth – SProbe

    • latency – SProbe

    • peer availability – LF

    • degree of content sharing – Napster crawler


Crawling

Crawling

  • May 2001

  • Napster crawl

    • query index server and keep track of results

    • query about returned peers

    • don’t capture users sharing unpopular content

  • Gnutella crawl

    • send out ping messages with large TTL


Measurement study

Measurement Study

  • How many peers are server-like…client-like?

  • Bandwidth, latency

  • Connectivity

  • Who is sharing what?


Graph results

Graph results

  • CDF: cumulative distribution function

  • From this graph, we see that while 78% of the participating peers have downstream bottleneck

    bandwidths of at least 1000Kbps

  • Only 8% of the peers have upstream bottleneck bandwidths of at least 10Mbps.

  • 22% of the participating peers have upstream bottleneck bandwidths of 100Kbps or less.


Reported bandwidth

Reported Bandwidth


Graph results1

Graph results

  • The percentage of Napster users connected with modems (of 64Kbps or less) is

  • About 25%, while the percentage of Gnutella users with similar connectivity is as low as 8%.

  • 50% of the users in Napster and 60% of the users in Gnutella use broadband connections

  • only about 20% of the users in Napster and 30% of the users in Gnutella have very high bandwidth connections

  • Overall, Gnutella users on average tend to have higher

    downstream bottleneck bandwidths than Napster users.


Graph results2

Graph results

  • Approximately 20% of the peers have latencies of at least 280ms,

  • Another 20% have latencies of at most 70ms


Graph results3

Graph results

  • This graph illustrates the presence of two clusters; a smaller one situated at (20-60Kbps, 100-1,000ms) and a larger one at over (1,000Kbps, 60-300ms).

  • horizontal lines in the graph they predicate that the latency also depends on the location of the peer for measuring system.


Measured uptime

Measured Uptime


Number of shared files

Number of Shared Files


Correlation of free riding with b w

Correlation of Free-Riding with B/W


Power law

Power law

  • A connected cluster of peers that spans the entire network survives even in the presence of a large percentage p of random peer breakdowns, where p can be as large as:

where m is the minimum node degree and K is the maximum node degree.

α <3.


Gnutella

Gnutella

  • Popular sites:

  • 212.239.171.174

  • adams-00-305a.Stanford.EDU

  • 0.0.0.0

1771 hosts

Fri Feb 16 05:21:52-05:23:22 PST


30 random failures

30% random failures

1771 – 471 – 294 hosts

Fri Feb 16 05:21:52-05:23:22 PST


4 orchestrated failures

4% orchestrated failures

1771 - 63 hosts

Fri Feb 16 05:21:52-05:23:22 PST


Results overview

Results Overview

  • Lots of heterogeneity between peers

    • Systems should consider peer capabilities

  • Peers lie

    • Systems must be able to verify reported peer capabilities or measure true capabilities


Points of discussion

Points of Discussion

  • Is it all hype?

  • Should P2P be a research area?

  • Do P2P applications/systems have common research questions?

  • What are the “killer apps” for P2P systems?


Conclusion

Conclusion

  • P2P is an interesting and useful model

  • There are lots of technical challenges to be solved


  • Login