1 / 21

Bookmark-driven Query Routing in Peer-to-Peer Web Search

27th Annual International ACM SIGIR Conference - Workshop on Peer-to-Peer Information Retrieval - Sheffield July 29. Bookmark-driven Query Routing in Peer-to-Peer Web Search. Matthias Bender, Sebastian Michel , Gerhard Weikum, Christian Zimmer. Max-Planck-Institute for Computer Science

jbair
Download Presentation

Bookmark-driven Query Routing in Peer-to-Peer Web Search

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 27th Annual International ACM SIGIR Conference - Workshop on Peer-to-Peer Information Retrieval - Sheffield July 29 Bookmark-driven Query Routing in Peer-to-Peer Web Search Matthias Bender, Sebastian Michel, Gerhard Weikum, Christian Zimmer Max-Planck-Institute for Computer Science Saarbrücken, Germany smichel@mpi-sb.mpg.de

  2. Overview • Motivation • Peer-to-Peer Systems • Related Work • Design Fundamentals • Bookmark-driven Query Routing • Conclusion/Summary • Ongoing and Future Work Sebastian Michel, Max-Planck-Institute for Computer Science

  3. Motivation • “Why ask one if you can ask thousands?” • Break information monopolies. • Intellectual input from a large number of users. • Use bookmarks to find relevant peers. Peer-to-Peer Web Search Sebastian Michel, Max-Planck-Institute for Computer Science

  4. P2P Systems • peer: • “one that is of equal standing with another” • “one belonging to the same societal group especially based on age, grade, or status ” source: Merriam-Webster Online Dictionary • Benefits • no single point of failure • resource/data sharing • Problems/Challenges: • authority/trust/incentives • high dynamics • … Sebastian Michel, Max-Planck-Institute for Computer Science

  5. Structured P2P-Systems • Distributed Hashtable (DHT) Highly efficient support of one “simple” method lookup(key) + robustness to load skew, failures, dynamics in O(log n) routing hops! • Chord: I. Stoica et al. • CAN: S. Ratnasamy et al. • P-Grid: K. Aberer Sebastian Michel, Max-Planck-Institute for Computer Science

  6. p1 p56 p8 k54 k10 p51 p48 p14 p21 p42 k24 p38 p32 k38 k30 Chord • Peers and keys are mapped to the same cyclic ID space using a hash function • Key k (e.g., hash(file name)) is assigned to the node with key p (e.g., hash(IP address)) such that k  p and there is no node p‘ with k  p‘ and p‘<p Sebastian Michel, Max-Planck-Institute for Computer Science

  7. p8 + 1 p14 p8 + 2 p14 p8 + 4 p14 p8 + 8 p21 Lookup(54) p8 + 16 p32 p8 + 32 p42 k54 p51 + 1 p42 + 1 p48 p56 p42 + 2 p51 + 2 p48 p56 p51 + 4 p42 + 4 p56 p48 p51 + 8 p42 + 8 p1 p51 p42 + 16 p51 + 16 p8 p1 p51 + 32 p42 + 32 p14 p21 Chord • Using finger tables to speed up lookup process • Store pointers to few distant peers • Lookup in O(log n) steps fingertable p8 fingertable p51 p1 p56 Chord Ring p8 p51 p48 p14 fingertable p42 p42 p21 p38 p32 Sebastian Michel, Max-Planck-Institute for Computer Science

  8. Related Work • Distributed IR: • CORI (Callan et al., 1995) • GLOSS (Gravano et al., 1999) • “A decision-theoretic approach to db selection in networked IR” (Fuhr 1999) • P2P Search: • GALANX (DeWitt et al., 2003) • Odissea (Suel et al., 2003) • PlanetP (Cuenca-Acuna et al., 2002) • Metasearch Engines Sebastian Michel, Max-Planck-Institute for Computer Science

  9. P2 P2 P1 Distributed Directory Term  List of Peers Distributed Directory Term  List of Peers P1 P3 P1 P3 P2 P5 P3 P6 P4 P6 P4 P4 P5 P5 P6 Step 0:Post per-termsummaries of local indexes Step 1:Retrieve list of peersfor each query term Step 2:Retrieve and combine local query results from peers Design Fundamentals a: P1 P6 P4 b: P5 P3 P1 P6 ... Query Routing Sebastian Michel, Max-Planck-Institute for Computer Science

  10. System Architecture Architecture of a single peer Event Handler Local QProcessor Communicator PeerList Processor Term  PeerList Local Index Poster Global QProcessor Chord Ring Connector Sebastian Michel, Max-Planck-Institute for Computer Science

  11. Why bookmark driven QR? Sebastian Michel, Max-Planck-Institute for Computer Science

  12. Bookmark-driven Query Routing • bookmarks reflect user interests • “Tell me what books you read and I tell you who you are” use bookmarks to find “relevant” peers bookmarks you have Sebastian Michel, Max-Planck-Institute for Computer Science

  13. Optimize Relevance • Notion of “relevant” • Similar content • No or small overlap Sebastian Michel, Max-Planck-Institute for Computer Science

  14. Bookmark-driven Query Routing • Similarity between two peers? • Compare bookmarks instead of comparing local indexes (too expensive). • Assumption: Index has been created by focused crawling using bookmarks as crawl seeds. • We can compare bookmark lists by • comparing the URLs • comparing term distributions of the documents referenced by the URLs Sebastian Michel, Max-Planck-Institute for Computer Science

  15. Information Similarity • Kullback-Leibler distance (relative entropy) where f and g are probability distributions. • Measure for information inequality. Sebastian Michel, Max-Planck-Institute for Computer Science

  16. Overlap and Benefit this is query independent: can be precomputed, cached,… Sebastian Michel, Max-Planck-Institute for Computer Science

  17. Query based peer assessment • Calculate similarity between the query and the bookmarks. • Use term distribution of the top-k local results Sebastian Michel, Max-Planck-Institute for Computer Science

  18. Conclusion/Summary • P2P approach for collaborative search • Scalable Search engine • Extensible system architecture • Bookmark-driven query routing Sebastian Michel, Max-Planck-Institute for Computer Science

  19. Ongoing and Future Work • Complete the implementation • Experiments on real web data • Replication Sebastian Michel, Max-Planck-Institute for Computer Science

  20. Prototype Sebastian Michel, Max-Planck-Institute for Computer Science

  21. Thanks for your attention. Questions? Sebastian Michel, Max-Planck-Institute for Computer Science

More Related