
“A Local Search Mechanism for Peer-to-Peer Networks”

“A Local Search Mechanism for Peer-to-Peer Networks”. Vana Kalogeraki, Dimitrios Gunopulos & Demetris Zeinalipour (University of California – Riverside) < vana@cs.ucr.edu , dg@cs.ucr.edu , csyiazti@cs.ucr.edu >.


Presentation Transcript


  1. “A Local Search Mechanism for Peer-to-Peer Networks” Vana Kalogeraki, Dimitrios Gunopulos & Demetris Zeinalipour (University of California – Riverside) < vana@cs.ucr.edu, dg@cs.ucr.edu, csyiazti@cs.ucr.edu > CIKM 2002 – Eleventh International Conference on Information and Knowledge Management, November 4-9, McLean, VA http://www.cs.ucr.edu/~csyiazti/publications.html

  2. Presentation Outline • Introduction: Information Retrieval (I.R) in Peer-to-Peer networks. • Techniques for Distributed I.R. • Breadth-First Search. • Random Breadth-First Search. • Intelligent Search with profiling. • Experimental Evaluation. • Related Work. • Conclusions & Future Work.

  3. Introduction to Peer-to-Peer [Figure: the virtual P2P topology and the physical topology] • Peer-to-Peer Computing definition: “Sharing of computer resources and information through direct exchange”. • Clients (downloaders) are also servers. • Clients may join or leave the network at any time => highly fault-tolerant, but at a cost! • Searches are done within the virtual network, while the actual downloads are done offline (with HTTP).

  4. Introduction to Peer-to-Peer • Peer-to-Peer (P2P) systems are becoming increasingly popular. • P2P file-sharing systems such as Gnutella, Napster and Freenet realized a distributed infrastructure for sharing files. • Traditionally, files were shared using the Client-Server model (e.g. HTTP), which does not scale because the services are centralized. • P2P systems offer new advantages in simplicity of use, robustness, self-organization and scalability.

  5. Information Retrieval in P2P Problem: “How to efficiently retrieve information in P2P systems where each node shares a collection of documents?” • Documents consist of keywords. • Resembles Information Retrieval, but the resources are now distributed. • Primary data structures such as global inverted indexes can’t be maintained efficiently.

  6. Solutions for P2P Information Retrieval 1) Centralized Approaches • Centralized indexes. • e.g. Napster, SETI@HOME [Figure, with a Centralized Index: 1) Upload Index 2) Query/QueryHit 3) Download (offline)] 2) Purely Distributed Approaches • Each node has only local knowledge. • I.R is done using brute-force mechanisms. • e.g. Gnutella, Fasttrack (Kazaa) [Figure: 1) Connect 2) Query/QueryHit 3) Download (offline)] 3) Hybrid Approaches • One or more peers have partial indexes of the contents of others. • e.g. Limewire's Ultrapeers [Figure: 1) Connect 2) IntelligentQuery/QueryHit 3) Download (offline)]

  7. Motivation • On 1st June we crawled the Gnutella P2P network for 5 hours with 17 workstations. • We analyzed 15,153,524 query messages. • Observation: high locality of specific queries. • Can we exploit this property for more efficient searches?

  8. Presentation Outline • Introduction: Information Retrieval (I.R) in Peer-to-Peer networks. • Techniques for Distributed I.R. • Breadth-First Search. • Random Breadth-First Search. • Intelligent Search with profiling. • Experimental Evaluation. • Related Work. • Conclusions & Future Work.

  9. Techniques for Distributed I.R. 1. Breadth-First Search (BFS) • Each Query Message is propagated along all outgoing links of a peer using a TTL (time-to-live). • The TTL is decremented on each forward until it becomes 0. • This is the technique used for I.R in P2P systems such as Gnutella. • Results? • The physical network comes to its knees. • Long delays for search results. [Figure: Peer q, node A, P2P Network N, Peer d; 1 = QUERY, 2 = QUERYHIT]
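
The flooding described on this slide can be sketched as below. This is a minimal illustration rather than Gnutella's actual implementation: the `overlay` graph, the `flood_query` name, and the choices to skip the sending peer and to drop duplicate queries on arrival are all assumptions made for the example.

```python
from collections import deque

def flood_query(graph, source, ttl):
    """Flood a query from `source`; return (peers reached, messages sent)."""
    seen = {source}                            # peers that already saw the query
    frontier = deque([(source, None, ttl)])    # (peer, who sent it, remaining TTL)
    messages = 0
    while frontier:
        peer, sender, remaining = frontier.popleft()
        if remaining == 0:                     # TTL exhausted: stop forwarding
            continue
        for neighbor in graph[peer]:
            if neighbor == sender:             # assumption: never echo to the sender
                continue
            messages += 1                      # one Query message per outgoing link
            if neighbor not in seen:           # assumption: duplicates are dropped
                seen.add(neighbor)
                frontier.append((neighbor, peer, remaining - 1))
    return seen, messages

# Hypothetical 5-peer overlay given as adjacency lists.
overlay = {
    "q": ["a", "b"],
    "a": ["q", "b", "c"],
    "b": ["q", "a", "d"],
    "c": ["a"],
    "d": ["b"],
}
reached, sent = flood_query(overlay, "q", ttl=2)
print(sorted(reached), sent)
```

Even on this tiny graph some messages land on peers that have already seen the query, which is the overhead the slide complains about.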

  10. Techniques for Distributed I.R. 2. Modified Random BFS • Each Query Message is forwarded to only a fraction of the outgoing links (e.g. ½ of them). • The TTL is again decremented on each forward until it becomes 0. • Results? • Fewer messages, but possibly fewer results. • This algorithm is probabilistic. • Some segments may become unreachable. [Figure: Peer q, nodes A, B and C in P2P Network N, Peer d; 1 = QUERY, 2 = QUERYHIT; one segment is marked unreachable]
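
A possible sketch of the random variant follows. The names `pick_subset` and `random_bfs`, the example `overlay`, the "at least one link" rule and the seeded generator are assumptions for illustration; the slide only states that a fraction of the outgoing links is used.

```python
import random

def pick_subset(candidates, fraction, rng):
    """Pick a random fraction of the candidate links (assumed: at least one)."""
    if not candidates:
        return []
    k = max(1, int(len(candidates) * fraction))
    return rng.sample(candidates, k)

def random_bfs(graph, source, ttl, fraction=0.5, seed=None):
    """Forward the query to a random subset of links at every hop."""
    rng = random.Random(seed)
    seen = {source}
    frontier = [(source, None, ttl)]           # (peer, who sent it, remaining TTL)
    messages = 0
    while frontier:
        next_frontier = []
        for peer, sender, remaining in frontier:
            if remaining == 0:
                continue
            candidates = [n for n in graph[peer] if n != sender]
            for neighbor in pick_subset(candidates, fraction, rng):
                messages += 1
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append((neighbor, peer, remaining - 1))
        frontier = next_frontier
    return seen, messages

# Hypothetical 5-peer overlay given as adjacency lists.
overlay = {
    "q": ["a", "b"],
    "a": ["q", "b", "c"],
    "b": ["q", "a", "d"],
    "c": ["a"],
    "d": ["b"],
}
reached, sent = random_bfs(overlay, "q", ttl=2, fraction=0.5, seed=7)
print(sorted(reached), sent)
```

With `fraction=1.0` the function degenerates to plain flooding, which makes the trade-off easy to compare: fewer messages, but peers outside the sampled links are never asked.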

  11. Techniques for Distributed I.R. 3. Intelligent Search Mechanism (ISM) • Idea: each Query Message is forwarded intelligently, based on which queries a peer answered in the past. • Components of ISM (for each node u) • Profile Mechanism, kept for each neighbor in N(u). • Peer Ranking Mechanism, for ranking peers locally and sending a search query only to the ones most likely to answer. • Similarity Function, for finding similar search queries. • Search Mechanism, for propagating queries based on local indexes. [Figure: Peer q consults its profiles; 1 = QUERY, 2 = QUERYHIT; Peer d]

  12. Techniques for Distributed I.R. 3. Intelligent Search Mechanism (ISM) a) Profile mechanism. • Maintains a list of past queries routed through that host. • Every time a QueryHit is received, the table is updated. • The profile manager uses a Least Recently Used policy to keep the most recent queries in the repository. • Profiles are kept for neighbors only, so the cost of maintaining them is O(Td), where T is the limit on the number of queries kept per profile and d is the degree of the node (size: T*d).
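
One way to picture a single neighbor profile with Least Recently Used eviction is the sketch below. The class name `NeighborProfile`, its methods and the sample queries are hypothetical; the slide does not say how the table is actually stored.

```python
from collections import OrderedDict

class NeighborProfile:
    """Bounded table of the most recent queries answered via one neighbor."""

    def __init__(self, capacity):
        self.capacity = capacity           # T in the slide's O(Td) bound
        self._queries = OrderedDict()      # keys ordered oldest -> newest

    def record_hit(self, query):
        """Update the table when a QueryHit for `query` comes back."""
        self._queries[query] = True
        self._queries.move_to_end(query)   # mark as most recently used
        if len(self._queries) > self.capacity:
            self._queries.popitem(last=False)  # evict the least recently used

    def queries(self):
        return list(self._queries)

profile = NeighborProfile(capacity=2)
for text in ["italy disaster", "super bowl", "italy disaster", "san diego"]:
    profile.record_hit(text)
print(profile.queries())
```

A node of degree d would hold one such object per neighbor, for example in a dictionary keyed by neighbor id, which gives the T*d total size stated on the slide.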

  13. Techniques for Distributed I.R. 3. Intelligent Search Mechanism (ISM) b) Peer Ranking Mechanism. • Before forwarding a Query Message, a peer performs an on-the-fly ranking of its peers to determine the best paths. • We use the Aggregate Similarity of peer Pi to a query q, computed by a peer Pk as the sum, over the profile queries qj that Pi answered, of Sim(q, qj) raised to a power (1 in the example). Example: Assume host Pk needs to forward a query q = “italy disaster” to two of its peers {P1, P2, P3}. Pk maintains queries {q1, q2, …, q5} in its profile, with Sim(q, q1) = 0.8, Sim(q, q2) = 0.6, Sim(q, q3) = 0.5, Sim(q, q4) = 0.4, Sim(q, q5) = 0.3. • P1 answered {q1} => $\mathrm{Psim}_{P_k}(P_1, q) = 0.8^{1} = 0.8$ • P2 answered {q2, q3} => $\mathrm{Psim}_{P_k}(P_2, q) = 0.6^{1} + 0.5^{1} = 1.1$ • P3 answered {q4, q5} => $\mathrm{Psim}_{P_k}(P_3, q) = 0.4^{1} + 0.3^{1} = 0.7$ • Pk therefore forwards q to P2 and P1.
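
The ranking in this example can be reproduced with a few lines of code. This is a sketch under assumptions: the function names and the `alpha` parameter name are invented here, the exponent is fixed to 1 as in the slide's sums, and Sim(q, q5) is taken as 0.3 so that the third sum comes to 0.7.

```python
def aggregate_similarity(answered, similarity, alpha=1.0):
    """Sum Sim(q, qj)**alpha over the profile queries a peer answered."""
    return sum(similarity[qj] ** alpha for qj in answered)

def rank_peers(profiles, similarity, top_k, alpha=1.0):
    """Return the top_k peers by aggregate similarity, plus all scores."""
    scores = {
        peer: aggregate_similarity(answered, similarity, alpha)
        for peer, answered in profiles.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores

# Similarity of the new query q to each stored query (values from the example).
similarity = {"q1": 0.8, "q2": 0.6, "q3": 0.5, "q4": 0.4, "q5": 0.3}
# Which stored queries each neighbor answered, as kept in Pk's profiles.
profiles = {"P1": ["q1"], "P2": ["q2", "q3"], "P3": ["q4", "q5"]}

best, scores = rank_peers(profiles, similarity, top_k=2)
print(best)
```

Raising the exponent above 1 would give more weight to the closest past queries, which is one plausible reason for having it as a parameter at all.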

  14. Techniques for Distributed I.R. 3. Intelligent Search Mechanism (ISM) c) Similarity Function: the cosine similarity. • Assume that L is the set of all words (in the Profile Manager), e.g. L = {elections, bush, clinton, super, bowl, san, diego, …, italy, earthquake, disaster}. • We define an |L|-dimensional space where each query is a vector. If q = “italy disaster” => q (vector of q) = [0,0,0,…,1,0,1]. • Recall that we have a vector for each qi stored in the Profile Manager (i.e. qi). • The similarity of q and qi is the cosine of the angle between their vectors: $\mathrm{Sim}(q, q_i) = \frac{\vec{q} \cdot \vec{q_i}}{|\vec{q}|\,|\vec{q_i}|}$.
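
A small sketch of the vector encoding and the cosine similarity follows. The helper names are hypothetical, the vocabulary contains only the words the slide spells out (the elided part of L is left out), and the second query "italy earthquake" is made up for the comparison.

```python
import math

def to_vector(query, vocabulary):
    """Binary vector with a 1 for every vocabulary word present in the query."""
    words = set(query.lower().split())
    return [1 if term in words else 0 for term in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Only the words listed on the slide; the real L is larger.
vocabulary = ["elections", "bush", "clinton", "super", "bowl",
              "san", "diego", "italy", "earthquake", "disaster"]

q = to_vector("italy disaster", vocabulary)
q1 = to_vector("italy earthquake", vocabulary)   # hypothetical stored query
print(q, cosine_similarity(q, q1))
```

The two queries share one of their two words, so the similarity comes out at about one half; queries with no word in common score 0.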

  15. Techniques for Distributed I.R. 3. Intelligent Search Mechanism (ISM) d) Search Mechanism • Uses the Peer Ranking Mechanism to forward queries to the nodes most likely to hold the information we are looking for. [Figure: Peer q, its profiles, QUERY (1), Peer d]

  16. Presentation Outline • Introduction: Information Retrieval (I.R) in Peer-to-Peer networks. • Techniques for Distributed I.R. • Breadth-First Search. • Random Breadth-First Search. • Intelligent Search with profiling. • Experimental Evaluation. • Related Work. • Conclusions & Future Work.

  17. Experimental Evaluation [Figure: a Data-Peer (e.g. usa) with Routing Structures (Profiles), XQL over PDOM-XML, a P2P Network Manager Module, XML Data Files and usa.graph; other peers shown: mexico, argentina, u.k, china, italy, india, france, greece, germany] • We use a decentralized Newspaper application built on top of the REUTERS dataset (22,531 documents grouped by 84 countries). • Random network of 100 peers. • Each peer has documents from 3 countries. • The average degree of a node is 7 ~= log2(100) (connected graph).

  18. Experimental Evaluation • We perform 400 sequential queries with a delay of 4 sec. • We compare Doc. Ratio (recall rate) vs. Num. of messages for: • BFS (Gnutella message flooding) (forward to all neighbors, i.e. degree nodes). • Modified BFS (randomly forward to degree/2 nodes). • Intelligent Search Mechanism (forward to the M=3 highest-ranked nodes + 1 random).
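
The forwarding rule used for the Intelligent Search Mechanism here (the M highest-ranked neighbors plus one random neighbor) could look like the sketch below. The function name, the sample scores and the choice to draw the random neighbor only from those not already selected are assumptions.

```python
import random

def select_targets(scores, m=3, rng=random):
    """Pick the m best-scoring neighbors plus one random neighbor from the rest."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    targets = ranked[:m]
    rest = ranked[m:]
    if rest:                                   # nothing to add if all are chosen
        targets.append(rng.choice(rest))
    return targets

# Hypothetical aggregate-similarity scores for six neighbors.
scores = {"n1": 0.9, "n2": 0.1, "n3": 0.7, "n4": 0.0, "n5": 0.4, "n6": 0.2}
targets = select_targets(scores, m=3, rng=random.Random(0))
print(targets)
```

The extra random neighbor presumably lets a peer keep learning about neighbors whose profiles are still empty, instead of always querying the same top-ranked ones.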

  19. Experimental Evaluation • We measure Doc. Ratio (recall rate) vs. Num. of messages with Time-to-Live (TTL)=4 • BFS (Gnutella) uses ~763 messages w/ recall rate 100% • Random BFS(degree/2) uses ~120 (16%) msgs w/ recall rate 42% • Intelligent Search uses ~131 (17%) msgs w/ recall rate ~55% • Recall Rate improves over time with Intelligent Search since Peer Profiles get more knowledge.
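
The percentages on this slide follow directly from the message counts, as the short calculation below shows. The `summarize` helper and the "recall per unit of message share" figure are not from the slides; they are added here only as a rough way to compare the two methods.

```python
BFS_MESSAGES = 763   # baseline from the slide (recall rate 100%)

# (messages, recall rate) as reported on the slide for TTL = 4.
results = {
    "Random BFS": (120, 0.42),
    "Intelligent Search": (131, 0.55),
}

def summarize(messages, recall, baseline=BFS_MESSAGES):
    """Return (share of BFS messages in percent, recall per unit of share)."""
    share = messages / baseline
    return round(share * 100), recall / share

for name, (messages, recall) in results.items():
    percent, efficiency = summarize(messages, recall)
    print(f"{name}: {percent}% of BFS messages, efficiency {efficiency:.2f}")
```

Both methods use roughly one sixth of the flooding traffic, and Intelligent Search returns more of the documents for that budget.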

  20. Experimental Evaluation • We again measure Doc. Ratio (recall rate) vs. Num. of messages, this time with Time-to-Live (TTL) = 5. • BFS (TTL=4) uses ~763 messages w/ recall rate 100% • Random BFS (degree/2) uses ~28% msgs w/ recall rate ~72% • Intelligent Search uses ~35% (of BFS msgs) w/ recall rate ~90%! • A large number of peers receive unnecessary messages under BFS. • We get close to the same recall (90%) with only 35% of the msgs.

  21. Presentation Outline • Introduction: Information Retrieval (I.R) in Peer-to-Peer networks. • Techniques for Distributed I.R. • Breadth-First Search. • Random Breadth-First Search. • Intelligent Search with profiling. • Experimental Evaluation. • Related Work. • Conclusions & Future Work.

  22. Related Work • Improving Search in P2P, B. Yang et al. (Stanford) • Iterative Deepening, until Z results are returned. • Directed BFS based on aggregate statistics (e.g. number of results a peer returned, shortest queue, forwarded the most data). • Local Indexes: each node maintains an index over the data of peers r hops away. • Routing Indices for P2P, Crespo et al. (Stanford) • Compound Indices: each node sends a clustered summary of its topics to its neighbors (e.g. 100 databases, 4 theory, 10 OS). • Might be too costly for highly dynamic P2P systems.

  23. Related Work • Freenet (Clarke et al.): search by identifiers. Uses SHA-1 hashes of resources, and information is retrieved based on key closeness in a DFS manner. • Others such as Chord: systems that focus on scalable object location, which becomes feasible by hashing and distributing objects in the P2P system (searches are by identifier).

  24. Conclusions • P2P systems offer several advantages such as scalability, robustness and simplicity of use. • Efficient P2P Information Retrieval is not feasible with the current search algorithms. • We propose an Intelligent Search Mechanism that uses local knowledge to improve Information Retrieval in P2P. • Our mechanism achieves a 90% recall rate while using only 35% of the messages that BFS uses.

  25. Future Work • We plan to deploy our middleware infrastructure on a larger P2P network with more Queries. • We want to probe different Network Topologies such as ASMap with PowerLaws. • We want to probe different Peer-Profile maintenance policies at peers. • Compare the performance of our method with different proposed algorithms (iterative deepening, local indexes, etc).

  26. “A Local Search Mechanism for Peer-to-Peer Networks” Vana Kalogeraki, Dimitrios Gunopulos & Demetris Zeinalipour (University of California – Riverside) < vana@cs.ucr.edu, dg@cs.ucr.edu, csyiazti@cs.ucr.edu > CIKM 2002 – Eleventh International Conference on Information and Knowledge Management, November 4-9, McLean, VA http://www.cs.ucr.edu/~csyiazti/publications.html
