Improving Search in Peer-to-Peer Networks

Improving Search in Peer-to-Peer Networks
Beverly Yang, Hector Garcia-Molina
Presented by Shreeram Sahasrabudhe (sas4@lehigh.edu)

Goals
• Reduce the number of nodes that handle each query.
• Three search techniques:
  • Iterative Deepening
  • Directed BFS
  • Local Indices
• Evaluation and extensive measurements of these techniques on the Gnutella network.
• Ready-to-use results and recommendations.

Current Techniques
• Gnutella: Breadth-First Search (BFS) with depth limit D (typically 7)
  • Disadvantages
    • Wastes resources
    • Inefficient
• Freenet: Depth-First Search (DFS)
  • Disadvantages
    • Poor response time

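Iterative Deepening
• Required:
  • System-wide policy P = {a, b, c}
  • Time between successive iterations, W
• [Diagram, policy P = {a, b, c}: source S sends the query with TTL a; the nodes at depth a freeze it; S waits for time W (Wait = W), then sends a Resend message (TTL a, carrying the query_id); the frozen query continues with TTL b-a.]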
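The iteration over the policy depths can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not the protocol itself: the overlay is a plain adjacency dictionary, `results_at` (a hypothetical name) gives the number of matching results each node holds, and the freeze/Resend mechanism and the wait time W are not modeled; only the effect is kept, namely that each iteration makes just the newly reached nodes process the query.

```python
from collections import deque


def bfs_within(graph, source, max_depth):
    """Return {node: depth} for every node within max_depth hops of source."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for nbr in graph[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return depth


def iterative_deepening(graph, results_at, source, policy, z):
    """Search to each depth in `policy` until at least z results are found.

    Returns (depth_reached, total_results, nodes_processed).
    """
    depths = bfs_within(graph, source, policy[-1])
    total = processed = prev = 0
    for limit in policy:
        # Only nodes with prev < depth <= limit process the query now;
        # nearer nodes already answered in an earlier iteration.
        ring = [n for n, d in depths.items() if prev < d <= limit]
        processed += len(ring)
        total += sum(results_at.get(n, 0) for n in ring)
        if total >= z:
            return limit, total, processed
        prev = limit
    return policy[-1], total, processed


# Hypothetical toy overlay: S is the query source.
graph = {
    "S": ["A", "B"], "A": ["S", "C"], "B": ["S", "D"],
    "C": ["A", "E"], "D": ["B"], "E": ["C"],
}
results_at = {"A": 1, "C": 2, "D": 1, "E": 5}

print(iterative_deepening(graph, results_at, "S", [1, 2, 3], z=3))
```

With Z = 3 the search stops at depth 2, so node E at depth 3 never handles the query; that saving is the point of the technique.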

Directed BFS
• Send queries to a subset of nodes.
• The subset is selected by heuristics such as: select the node …
  • That has returned the highest number of results for previous queries
  • Whose response messages have taken the lowest average number of hops
  • That has forwarded the most messages to our client
  • That has the shortest message queue
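The four heuristics listed above amount to ranking neighbors by one statistic each. The sketch below shows one way to express that; the `NeighborStats` record, its field names, and the heuristic labels are hypothetical and only illustrate the idea, since the slides do not specify how a client stores these statistics.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    """Per-neighbor statistics a client might keep (hypothetical fields)."""
    name: str
    results_returned: int      # results received through this neighbor so far
    avg_response_hops: float   # average hops taken by its response messages
    messages_forwarded: int    # messages it has forwarded to our client
    queue_length: int          # current length of its message queue


# Each heuristic maps a neighbor to a sort key; smaller keys rank first.
HEURISTICS = {
    "most_results": lambda n: -n.results_returned,
    "fewest_hops": lambda n: n.avg_response_hops,
    "most_forwarded": lambda n: -n.messages_forwarded,
    "shortest_queue": lambda n: n.queue_length,
}


def select_neighbors(neighbors, heuristic, k=1):
    """Return the names of the k neighbors ranked best by the heuristic."""
    ranked = sorted(neighbors, key=HEURISTICS[heuristic])
    return [n.name for n in ranked[:k]]


# Hypothetical statistics for three neighbors.
neighbors = [
    NeighborStats("A", 10, 3.5, 200, 4),
    NeighborStats("B", 25, 4.0, 150, 1),
    NeighborStats("C", 5, 2.1, 900, 9),
]

for name in HEURISTICS:
    print(name, select_neighbors(neighbors, name))
```

The query is then sent only to the selected neighbors instead of to all of them.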

Local Indices
• Each node n maintains an index over the data of every node within r hops.
• A node can therefore process a query on behalf of every node within r hops.
• Small r = less storage (e.g. 70 KB for r = 1).
• [Diagram, policy P = {1, 5}: the query from source S is marked "process" at depths 1 and 5; depths 2, 3 and 4 are not marked.]
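A radius-r index can be illustrated with a small Python sketch. It assumes the overlay is an adjacency dictionary and that each node's collection is a list of file names; `build_local_index` and `answer_query` are hypothetical helper names, and the index is built here by a direct graph walk rather than by exchanging messages.

```python
from collections import deque


def nodes_within(graph, source, r):
    """Return the set of nodes at most r hops from source (source included)."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == r:
            continue
        for nbr in graph[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return set(depth)


def build_local_index(graph, files, node, r):
    """Map each file name to the nodes within r hops of `node` that hold it."""
    index = {}
    for peer in nodes_within(graph, node, r):
        for name in files.get(peer, ()):
            index.setdefault(name, set()).add(peer)
    return index


def answer_query(index, name):
    """Answer on behalf of every indexed node, without contacting them."""
    return sorted(index.get(name, ()))


# Hypothetical line-shaped overlay S - A - B - C and file collections.
graph = {"S": ["A"], "A": ["S", "B"], "B": ["A", "C"], "C": ["B"]}
files = {"S": ["a.mp3"], "A": ["b.mp3"], "B": ["a.mp3", "c.mp3"], "C": ["d.mp3"]}

index_r1 = build_local_index(graph, files, "A", r=1)
print(answer_query(index_r1, "a.mp3"))
print(answer_query(index_r1, "d.mp3"))
```

With r = 1, node A can answer for S and B but knows nothing about C, which is two hops away; raising r widens coverage at the price of a larger index.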

More work
• Node join
  • The joining node sends a join message with a TTL of r, containing metadata over its collection.
  • A node receiving a join message sends back a join message with its own metadata.
• Periodic refreshes
• Cost?
  • QueryJoinRatio = average ratio of queries to join messages
  • QueryUpdateRatio = average ratio of queries to update messages

Experiment
• Data collection
  • Observed Gnutella network traffic for 1 month.
  • Determined general statistics such as the average number of files shared per user, query strings, etc.
• Iterative Deepening
  • For each query Q sent: logged the response messages arriving within 2 minutes.
  • Sent Ping messages to all neighbors: recorded hops and IP address.
  • The same data was used for Local Indices.
• Directed BFS
  • Same as above, but each query was sent to a single node.

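Cost
• Bandwidth cost in BFS: [formula not recovered]
• Processing cost: [formula not recovered]
• Quantities annotated on the formulas: nodes at depth n; size of the query message; redundant edges between depths n-1 and n; size of a record; response messages from nodes at depth n; total records; size of the header.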
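Since the formula itself did not survive, the sketch below shows only one plausible way the annotated quantities could combine into a bandwidth figure; it should be read as an assumption, not as the slide's formula. The assumed model: a query message crosses one link per node reached and one per redundant edge, and each response message (a header plus its records) travels back as many hops as the depth it came from. All numbers in the example are made up.

```python
def bandwidth_cost(nodes_at, redundant_at, responses_at, records_at,
                   query_size, header_size, record_size):
    """Assumed bandwidth model for one BFS query (not the slide's formula).

    Each list is indexed by depth: element i describes depth i + 1.
    Sizes are in bytes; the return value is total bytes transferred.
    """
    cost = 0
    per_depth = zip(nodes_at, redundant_at, responses_at, records_at)
    for i, (n_nodes, n_redundant, n_responses, n_records) in enumerate(per_depth):
        depth = i + 1
        # Query traffic: one copy per node reached plus one per redundant edge.
        cost += (n_nodes + n_redundant) * query_size
        # Response traffic: headers and records travel `depth` hops back.
        cost += depth * (n_responses * header_size + n_records * record_size)
    return cost


# Hypothetical two-level example.
total = bandwidth_cost(
    nodes_at=[4, 12], redundant_at=[0, 3],
    responses_at=[1, 2], records_at=[2, 5],
    query_size=80, header_size=75, record_size=70,
)
print(total)
```

Under this model the redundant edges contribute query traffic without reaching any new node, which is one reason a deep BFS wastes bandwidth.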

Results
• Iterative Deepening
  • Neighbors = 8
  • Desired number of results Z = 50
  • Policies P_d = {d, d+1, …, D} for d = 1, 2, 3, …, D
• Cost varies with d and with W (“overshooting”). [Chart: cost]
• Time varies with W and with d.
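The family of policies P_d = {d, d+1, …, D} is easy to enumerate. The helper below is a small illustration; the function name `policies` is hypothetical.

```python
def policies(max_depth):
    """Return {d: [d, d+1, ..., max_depth]} for d = 1 .. max_depth."""
    return {d: list(range(d, max_depth + 1)) for d in range(1, max_depth + 1)}


for d, levels in policies(3).items():
    print(d, levels)
```

P_1 visits every depth in turn, while P_D goes straight to the maximum depth D, which is the same as a single BFS with depth limit D.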

Directed BFS
• Studied 8 heuristics.
• ‘Random neighbor’ is the baseline for comparison.
• [Chart: cost]

Local Indices

Conclusions
• Three new search systems specified and tested.
• Recommendation: Local Indices with r = 1.
• Savings: 61% in bandwidth, 49% in processing.
