
Exploiting Content Localities for Efficient Search in P2P Systems

Lei Guo (1), Song Jiang (2), Li Xiao (3), and Xiaodong Zhang (1)

(1) College of William and Mary, USA

(2) Los Alamos National Laboratory, USA

(3) Michigan State University, USA


Peer-to-Peer Search

  • Two Performance Objectives

    • Individual peer: improve the search quality

    • Internet management: minimize the search cost

P2P user: “Fast, fast, fast, and the more the better!”

Network manager: “Don’t be so greedy, the Internet is shared by all the people!”


Existing Solutions

  • Generally aim at one of the two objectives and perform poorly on the other

  • Flooding:

    • Most effective for user’s experience

    • Least efficient for network resource utilization

  • Random walk:

    • Traffic efficient, but

    • Long response time and limited number of search results
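The trade-off between the two schemes can be illustrated with a small sketch. The 8-peer overlay, the TTL and step values, and the function names `flood` and `random_walk` are assumptions made for illustration; none of them come from the presentation.

```python
import random
from collections import deque

def flood(graph, start, ttl):
    """Flooding: each peer forwards the query to all neighbors until the TTL runs out.
    Returns (set of peers reached, number of messages sent)."""
    visited = {start}
    frontier = deque([(start, ttl)])
    messages = 0
    while frontier:
        peer, hops_left = frontier.popleft()
        if hops_left == 0:
            continue
        for neighbor in graph[peer]:
            messages += 1  # every link traversal costs a message, duplicates included
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return visited, messages

def random_walk(graph, start, steps, rng):
    """Random walk: a single walker forwards the query to one random neighbor per step."""
    visited = {start}
    peer = start
    for _ in range(steps):
        peer = rng.choice(graph[peer])
        visited.add(peer)
    return visited, steps  # one message per step

# Hypothetical overlay: a ring of 8 peers plus two chords.
graph = {
    0: [1, 7, 4], 1: [0, 2], 2: [1, 3, 6], 3: [2, 4],
    4: [3, 5, 0], 5: [4, 6], 6: [5, 7, 2], 7: [6, 0],
}
reached_f, msgs_f = flood(graph, start=0, ttl=2)
reached_w, msgs_w = random_walk(graph, start=0, steps=4, rng=random.Random(1))
```

In this toy overlay, flooding with a TTL of 2 reaches every peer but sends more messages than there are peers, while the walker sends one message per step and can reach at most one new peer each time.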


Super-Node Architecture

  • Super-node

    • Index server for its leaf nodes

  • Problems

    • Index based search has limits

      • Hard for full-text search

      • Impossible for encrypted content search

    • Not responsible for the content quality of its leaf nodes

    • The structure becomes large and inefficient.

      • A leaf node has to connect to multiple super-nodes to avoid a single point of failure

      • Generating an increasingly large number of super-nodes


Gnutella Population in One Day (2003)

[Figure: number of peers and number of super peers over one day]

On average, one super-node connects to only 3-4 peers!


Outline

  • Our Measurement Study

  • CAC: Constructing Content Abundant Cluster

  • SPIRP: Selectively Prefetching Indices from Responding Peers

  • CAC-SPIRP: Combining CAC and SPIRP

  • Performance Evaluation

  • Conclusion


Our Measurement Study

  • Existing measurement studies

    • A small percentage of popular files account for most shared storage and transmissions in P2P systems

    • A small number of peers contribute the majority of files in P2P systems

    • These findings are only indirect evidence of content locality

      • Some files may never be accessed, or are accessed only rarely

  • Our purpose

    • Fully understand the localities in the peer community and individual peers

    • Get first-hand traces for our simulation study


Trace Collection

  • Four-day crawling on the Gnutella network

    • Based on the open-source code of LimeWire Gnutella

    • Session-based collection (over the whole lifetime of each peer)

  • Query sending traces by different peers

    • 25,764 peers

    • 409,129 queries

  • Content indices of different peers

    • Full indices of 18,255 peers

    • 37% free riders


Content Locality in the Peer Community

[Figure: two plots, “Queries Replied by Top Query Responders (%)” and “Results Replied by Top Result Providers (%)”, each against “Top Content Providers (in percentage)”; insets show “Percentage of Peers (%)” against “Number of Queries” and “Number of Results” on logarithmic axes]

A small group of peers can reply to nearly all queries and provide most of the results
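A curve of this kind can be computed from per-peer counts as sketched below. This is only an illustration: the function name `top_share` is hypothetical and the counts are made-up toy values, not data from the trace.

```python
def top_share(results_per_peer, top_fraction):
    """Fraction of all results that come from the top `top_fraction` of peers."""
    counts = sorted(results_per_peer, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always include at least one peer in the "top" group.
    k = max(1, round(len(counts) * top_fraction))
    return sum(counts[:k]) / total

# Made-up result counts for ten peers (NOT taken from the measurement).
toy_counts = [500, 300, 100, 40, 30, 20, 10, 0, 0, 0]
share = top_share(toy_counts, 0.2)  # share of results provided by the top 20% of peers
```

Evaluating `top_share` over a range of fractions gives one point per x-axis value of the plot.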


The Localities of Search Interests of Individual Peers

  • A peer can get search results from a small number of its top query responders: they share the same search interests

  • Similar to the idea in the Locality of Interest scheme, but our conclusion is based on real P2P systems

[Figure: two bar charts, “Query Contributions (%)” of Top Query Responders and “Result Contributions (%)” of Top Result Providers, for the top 1, top 10, top 5%, top 10%, and top 20%]


Reorganizing the P2P Management Structure

  • Clustering the small number of content-abundant peers

  • Prefetching indices from the top query responders


CAC: Constructing Content Abundant Cluster

  • Objectives

    • Clustering the small number of content-abundant peers in the P2P overlay

    • Providing high quality and fast service

  • Content Abundant Cluster

    • An overlay on top of P2P network

    • Self-evaluating, self-identifying, and self-organizing

    • Persistent public service for all peers in the system

    • Strongly content-based (not index-based)


CAC: System Structure

[Figure: overlay graph of peers labeled with level numbers 0 to 4 around the CAC; steps shown: Clustering, Leveling, Dynamic Update]


CAC: Search Operations

  • Queries are sent to CAC first

    • Up-flowing operation

    • Flooding in CAC

  • Unsatisfied queries are propagated from CAC to the whole system

    • Down-flooding operation

    • Propagated from low levels to high levels
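The two operations can be sketched as follows. Several details here are assumptions rather than statements from the slides: a peer's level is taken to be its hop distance to the nearest CAC member (CAC members are level 0), up-flowing always picks a lowest-level neighbor, the overlay is connected, and flooding inside the CAC itself is not modeled. The function names and the six-peer overlay are hypothetical.

```python
from collections import deque

def assign_levels(graph, cac_members):
    """Multi-source BFS: CAC members get level 0, every other peer its hop distance to the CAC."""
    level = {peer: 0 for peer in cac_members}
    queue = deque(cac_members)
    while queue:
        peer = queue.popleft()
        for nb in graph[peer]:
            if nb not in level:
                level[nb] = level[peer] + 1
                queue.append(nb)
    return level

def up_flow(graph, level, source):
    """Forward a query hop by hop to a lower-level neighbor until it reaches the CAC."""
    path = [source]
    while level[path[-1]] > 0:
        path.append(min(graph[path[-1]], key=lambda nb: level[nb]))
    return path

def down_flood(graph, level, cac_members):
    """Propagate an unsatisfied query outward, only over links from a lower to a higher level."""
    reached = set(cac_members)
    queue = deque(cac_members)
    messages = 0
    while queue:
        peer = queue.popleft()
        for nb in graph[peer]:
            if level[nb] > level[peer]:  # same-level and downward links stay unused
                messages += 1
                if nb not in reached:
                    reached.add(nb)
                    queue.append(nb)
    return reached, messages

# Hypothetical overlay with peers "a" and "b" forming the CAC.
graph = {
    "a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b", "e"],
    "d": ["b", "e", "f"], "e": ["c", "d"], "f": ["d"],
}
cac = ["a", "b"]
level = assign_levels(graph, cac)
path = up_flow(graph, level, "f")
reached, messages = down_flood(graph, level, cac)
```

Because links are used in one direction only, each link carries the down-flooded query at most once, which is the source of the traffic saving compared with plain flooding.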


Up-flowing

[Figure: the leveled overlay graph (levels 0 to 4) with a query flowing up toward the CAC]


Down-flooding

[Figure: the leveled overlay graph (levels 0 to 4) with a query flooding down from the CAC; some links are marked “Unused links”]


SPIRP: Selectively Prefetching Indices from Responding Peers

  • Basic operations

    • Peer I initiates a query q

      • Query hits: displays the results

      • Query misses: sends q

    • Peer R responds to query q

      • Sends the query results, and

      • piggybacks the indices of all its shared files

    • Peer I receives the response

      • Displays the search results, and

      • stores the piggybacked indices

  • Index updating

    • Active updating of indices by responding peers

    • Index updates demanded by requesting peers

  • Replacement of file indices
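The basic operations and index replacement can be sketched as a small local cache. Several choices here are assumptions, not taken from the slides: the class name `IndexCache`, capacity counted in file names, substring keyword matching, and least-recently-used replacement of whole index sets.

```python
from collections import OrderedDict

class IndexCache:
    """Local store of index sets piggybacked on query responses.
    Replacement is LRU over responding peers (an assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of file names stored
        self.sets = OrderedDict()     # responder id -> set of file names

    def size(self):
        return sum(len(files) for files in self.sets.values())

    def store(self, responder, file_names):
        """Store a responder's index set, evicting the least recently used sets if needed."""
        files = set(file_names)
        if len(files) > self.capacity:
            return False              # this index set can never fit
        self.sets.pop(responder, None)
        while self.size() + len(files) > self.capacity:
            self.sets.popitem(last=False)
        self.sets[responder] = files
        return True

    def lookup(self, query):
        """Return (responder, file) pairs whose file name contains every query keyword."""
        words = query.lower().split()
        hits = []
        for responder, files in self.sets.items():
            matched = sorted(f for f in files if all(w in f.lower() for w in words))
            hits.extend((responder, f) for f in matched)
        for responder in {r for r, _ in hits}:
            self.sets.move_to_end(responder)  # mark as recently useful
        return hits

cache = IndexCache(capacity=4)
cache.store("R1", ["beethoven_symphony5.mp3", "mozart_requiem.mp3"])
cache.store("R2", ["beatles_help.mp3", "abba_waterloo.mp3"])
hits = cache.lookup("beethoven mp3")        # served locally, no query is sent
cache.store("R3", ["bach_toccata.mp3"])     # cache is full, so R2's set is replaced
```

A query that hits in the cache is answered without any network traffic; a miss would fall back to the normal search of the underlying overlay.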


SPIRP Technique

[Animated figure, five frames: initiating peer I and responding peers R1 (classic music) and R2 (pop music)]

  • Frame 1: Query = “Beethoven mp3”. Where are these files?

  • Frame 2: Query = “Beetle mp3”. Where are these files? (NULL)

  • Frame 3: Query = “Beetle mp3”

  • Frame 4: Query = “Beetle mp3”. Not enough space to save indices

  • Frame 5: Query = “Beetle mp3”. Replace complete


CAC-SPIRP

  • CAC: application level infrastructure

    • Significantly reducing bandwidth consumption

    • Good response time when queries succeed in CAC

    • Long response time when queries fail in CAC

  • SPIRP: client-oriented and overlay independent

    • Significantly reducing response time

    • Little traffic when queries can be satisfied in cache

    • Same traffic as flooding when cache misses

  • CAC-SPIRP

    • The two techniques are easy to combine

    • Considers the trade-off between the two performance objectives

    • Combines the merits of both: high search quality and low search cost


Simulation Environment

  • Content trace and query trace

    • Four-day Gnutella crawl from our measurement study

  • Overlay topology

    • Traces by Clip2 Distributed Search Solutions

  • Session duration

    • Pareto distribution fitted from measurement results

      $P(x) = 14.5311 \cdot x^{-1.8598}$
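Session durations could be drawn from such a fit as sketched below. The slide does not say whether P(x) is a density or a tail probability, nor what the time unit is; this sketch assumes P(x) is a Pareto probability density, from which the shape `ALPHA` and minimum `X_MIN` are derived. The names and the sampling routine are illustrative only.

```python
import random

COEFF = 14.5311     # coefficient of the fitted P(x)
EXPONENT = 1.8598   # exponent of the fitted P(x)

# Assumption: P(x) is a Pareto density, alpha * x_min**alpha * x**-(alpha + 1).
ALPHA = EXPONENT - 1.0
X_MIN = (COEFF / ALPHA) ** (1.0 / ALPHA)

def sample_session_duration(rng):
    """Inverse-transform sampling: x = x_min * u**(-1/alpha) with u uniform in (0, 1]."""
    u = 1.0 - rng.random()  # rng.random() lies in [0, 1), so u lies in (0, 1]
    return X_MIN * u ** (-1.0 / ALPHA)

rng = random.Random(0)
durations = [sample_session_duration(rng) for _ in range(1000)]
```

If P(x) were instead the tail probability, the shape would be the exponent itself and `X_MIN` would change accordingly.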


Evaluation Metrics

  • Query success rate

    • CAC: success rate in CAC (normalized to flooding)

    • SPIRP: success rate in local cache (normalized to flooding)

  • Overall network traffic

    • Accumulated communication traffic for all queries, responses, and index transfers (normalized to flooding)

  • Average response time

    • Measured by the number of routing hops (normalized to flooding)

  • Evaluated for different query satisfaction levels

    • 1, 10, or 50 results, representing different user demands
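As a sketch of how such a normalized success rate might be computed: the assumption here is that "normalized to flooding" means the ratio of a scheme's value to the flooding value on the same queries. The function names and the per-query result counts are made up for illustration.

```python
def success_rate(result_counts, min_results):
    """Fraction of queries that received at least `min_results` results."""
    satisfied = sum(1 for n in result_counts if n >= min_results)
    return satisfied / len(result_counts)

def normalized_success_rate(scheme_counts, flooding_counts, min_results):
    """Success rate of a scheme relative to flooding on the same queries (flooding = 1.0)."""
    baseline = success_rate(flooding_counts, min_results)
    return success_rate(scheme_counts, min_results) / baseline

# Made-up numbers of results per query (NOT simulation output).
scheme_results = [0, 3, 12, 60]
flooding_results = [2, 15, 40, 80]

# One value per query satisfaction level: 1, 10, and 50 results.
rates = {m: normalized_success_rate(scheme_results, flooding_results, m) for m in (1, 10, 50)}
```

The same ratio form applies to traffic and to response time measured in routing hops.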


Performance Evaluation for CAC

[Figure: three plots against “Cluster Size (In Percentage of P2P Network Size)”, 0 to 50: “Success Rate in CAC (normalized)”, “Overall Traffic (Normalized)”, and “Avg Response Time (Normalized)”; curves for Minimum Results = 1, 10, and 50]

The top 5% of content-abundant peers are good enough for cluster construction


CAC Member Selection

[Figure: three plots against “Success Response Rate of CAC Peers”, 0 to 0.04: “Success Rate in CAC (normalized)”, “Avg Response Time (Normalized)”, and “Overall Traffic (Normalized)”; curves for Minimum Results = 1, 10, and 50]

  • Overall traffic is not sensitive to CAC member quality

  • Traffic can be significantly reduced even for randomly selected CAC members

  • CAC down-flooding is very efficient


CAC-SPIRP Overall Performance

[Figure: three plots against “Size of Incoming Index Set Buffer (in M Bytes)”, 0 to 10: “Success Rate in Local Cache”, “Average Response Time (Normalized)”, and “Overall Traffic (Normalized)”; legends: peers having 1 to 5, 10 to 20, 30 to 40, and at least 50 queries satisfied; Query Satisfaction = 1, 10, and 50]

CAC-SPIRP reduces both the overall traffic and response time significantly


Conclusion

  • CAC-SPIRP fundamentally addresses the P2P search problem by reorganizing the P2P management structure.

    • Exploiting organizational content locality

      • CAC: a content abundant cluster provides high quality and fast services.

    • Exploiting user content locality

      • SPIRP: a client prefetching technique to speed up search by avoiding unnecessary queries

