Kjetil Nørvåg

Taxonomy Caching: A Scalable Low-Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems. Kjetil Nørvåg, Norwegian University of Science and Technology, Trondheim, Norway; Christos Doulkeridis and Michalis Vazirgiannis, Athens University of Economics and Business, Athens, Greece.

Presentation Transcript


  1. Taxonomy Caching: A Scalable Low-Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems • Kjetil Nørvåg, Norwegian University of Science and Technology, Trondheim, Norway • Christos Doulkeridis and Michalis Vazirgiannis, Athens University of Economics and Business, Athens, Greece

  2. Outline • Motivation and example application • Taxonomies and taxonomy-based querying • Taxonomy-based query routing • Taxonomy caching: architecture and maintenance • Experimental results • Summary and further work ICPS'2006

  3. Motivation • Mobile devices have high storage capacity and wireless support • They contain multimedia documents that can be shared • Possibly also other data/services: • Temperature or other environmental data • Important challenge: finding the files and services! • Problems: • Dynamic contents, location, and visibility • Limited bandwidth ⇒ Centralized indexing/search engines are not applicable ⇒ P2P network and search

  4. Example application: MobiShare • Devices share resources by hosting web services • Each device is connected to a CAS • CASs are connected P2P • [More details in Valavanis et al., Web Intelligence’2003]

  5. Outline of basic idea 1) Describe contents according to a taxonomy 2) Cache taxonomy information at remote peers 3) Use the cached knowledge to route queries to appropriate peers • Why? 1) Should reduce latency 2) Should increase recall at the same cost

  6. Resource description • Taxonomy-based resource description • Also applicable to audio/video • More than one taxonomy may exist in the system • Resource description: taxonomy ID and set of categories

  7. Taxonomy-based querying • Query: 1) Request for all resources belonging to category Cj, or 2) Request for all resources belonging to category Cj and satisfying some additional property • Example properties: text contents, metadata
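A resource description and the two query types can be sketched as a small data structure plus a matching function. This is an illustration under stated assumptions, not the authors' code: categories are assumed to be encoded as DMOZ-style paths such as "Top/Arts/Music/Jazz", a resource filed under a subcategory is assumed to also belong to its ancestor categories, and the names `ResourceDescription`, `in_category` and `matches` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ResourceDescription:
    taxonomy_id: str                               # which taxonomy the categories refer to
    categories: set[str]                           # assumed encoding: paths like "Top/Arts/Music/Jazz"
    metadata: dict = field(default_factory=dict)   # only consulted by additional properties


def in_category(category: str, target: str) -> bool:
    # Assumption: a resource in a subcategory also belongs to every ancestor category.
    return category == target or category.startswith(target + "/")


def matches(res: ResourceDescription, taxonomy_id: str, target: str,
            predicate: Optional[Callable[[ResourceDescription], bool]] = None) -> bool:
    """Query type 1 when predicate is None; type 2 (category + extra property) otherwise."""
    if res.taxonomy_id != taxonomy_id:
        return False
    if not any(in_category(c, target) for c in res.categories):
        return False
    return predicate is None or predicate(res)
```

For example, a song described as `ResourceDescription("dmoz", {"Top/Arts/Music/Jazz"}, {"title": "So What"})` matches a type 1 query for "Top/Arts/Music", while a type 2 query for the same category with a title predicate only matches if the predicate also holds.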

  8. Searching in unstructured P2P networks • Basic search technique: local execution of the query, then forwarding if TTL > 0 • Naïve flooding (all neighbors) • Normalized flooding (only K neighbors) • Random walks: only one random neighbor, but W walks initiated • Problem: only a limited number of peers can be searched (the query horizon) • Possible improvements: • Routing indices • Summary indexing (Bloom filters etc.) • Result caching • However: still limited scalability and coverage
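The three baseline search techniques, and why they only reach peers inside the query horizon, can be sketched on a toy overlay given as an adjacency dictionary. This is a simplified simulation under assumptions: duplicate messages are simply dropped, normalized flooding samples K neighbors uniformly at random (excluding the sender), and the function names `flood` and `random_walks` are hypothetical.

```python
import random
from collections import deque
from typing import Optional

Graph = dict[int, list[int]]   # peer -> list of neighbor peers


def flood(graph: Graph, start: int, ttl: int, k: Optional[int] = None,
          rng: Optional[random.Random] = None) -> set[int]:
    """Peers that process the query. k=None: naive flooding; otherwise normalized flooding."""
    if rng is None:
        rng = random.Random(0)
    reached = {start}                        # the query is executed locally at every reached peer
    frontier = deque([(start, None, ttl)])   # (peer, peer it came from, remaining TTL)
    while frontier:
        peer, came_from, ttl_left = frontier.popleft()
        if ttl_left == 0:                    # forward only while TTL > 0
            continue
        candidates = [n for n in graph[peer] if n != came_from]
        if k is not None and len(candidates) > k:
            candidates = rng.sample(candidates, k)
        for n in candidates:
            if n not in reached:             # duplicates are dropped
                reached.add(n)
                frontier.append((n, peer, ttl_left - 1))
    return reached


def random_walks(graph: Graph, start: int, ttl: int, walkers: int,
                 rng: Optional[random.Random] = None) -> set[int]:
    """W independent walks, each forwarding to a single random neighbor per hop."""
    if rng is None:
        rng = random.Random(0)
    reached = {start}
    for _ in range(walkers):
        peer = start
        for _ in range(ttl):
            if not graph[peer]:
                break
            peer = rng.choice(graph[peer])
            reached.add(peer)
    return reached
```

On a chain of five peers, flooding from one end with TTL 2 reaches only three peers: whatever lies further away is beyond the query horizon, regardless of how relevant it is.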

  9. Taxonomy caching • Basic idea: • Maintain taxonomic information about remote contents in a taxonomy cache (TCache) • Mapping from taxonomic concept to set of peers • Advantages: • Cheaper to maintain than a full-text index • More applicable to multimedia data • More robust with respect to changes in contents • Used to improve query routing ⇒ higher recall and reduced latency
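At its core the TCache is a mapping from a taxonomic concept to a set of peers. A minimal sketch, assuming the cache also stores how many resources each peer holds in the category (the slides mention this count in the architecture description); the class and method names are hypothetical:

```python
from collections import defaultdict


class TCache:
    """Minimal taxonomy cache: category -> {peer: number of resources in that category}."""

    def __init__(self) -> None:
        self._entries: defaultdict[str, dict[str, int]] = defaultdict(dict)

    def record(self, category: str, peer: str, n_resources: int) -> None:
        # Remember that a remote peer holds n_resources resources in this category.
        self._entries[category][peer] = n_resources

    def lookup(self, category: str) -> set[str]:
        # Remote peers known to hold resources in the category (empty set if none).
        # .get() avoids inserting an empty entry for unknown categories.
        return set(self._entries.get(category, {}))
```

Only (category, peer, count) triples are stored, which is why such a cache can be expected to be much smaller than a full-text index and to change less often when individual documents change.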

  10. Query routing using taxonomy cache (TCache) • Basis: one of the traditional routing strategies • Query forward peers: PF • Starting point: PF = neighbors = PN = {PN1,…,PNn} • Lookup in TCache: Lookup(category) → PC = {PC1,…,PCm} • PF = PN ∪ PC • Query forwarded to (a subset of) PF
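The construction of the candidate set PF = PN ∪ PC can be written as a single set expression. In this sketch the TCache lookup is passed in as a callable, and the peer the query came from is excluded; both are assumptions, and `forward_candidates` is a hypothetical name. Which subset of PF actually receives the query is decided by the forwarding alternatives TCB, TCA, TCCN and TCDN.

```python
from typing import Callable, Iterable, Optional


def forward_candidates(neighbors: Iterable[str],
                       tcache_lookup: Callable[[str], Iterable[str]],
                       category: str,
                       previous: Optional[str] = None) -> set[str]:
    """PF = PN ∪ PC, without the peer that sent us the query (assumed)."""
    pn = set(neighbors)                # PN: direct neighbors
    pc = set(tcache_lookup(category))  # PC: peers the TCache associates with the category
    return (pn | pc) - {previous}
```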

  11. Query forwarding alternatives (1) • Query forward peers: PF • Number of neighbors (excluding the peer the query came from): Nn • Number of matches from the lookup: Nc • Ranking of peers in PC: • Based on the number of resources within a category • Peers with a high number of resources are considered experts • TCB: • Highest-ranked peer in PC + the Nn neighbors in {PN1,…,PNn} • Forwarding to a peer in PC is called a jump • A jump can be to a peer beyond the query horizon! • TCA: • If Nc ≥ Nn: forward to the Nn highest-ranked peers in PC • If Nc < Nn: forward to all Nc peers in PC + (Nn - Nc) randomly selected neighbors
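The TCB and TCA rules can be sketched as functions that take the neighbor list (already excluding the previous hop) and the TCache matches with their resource counts. Assumptions not stated on the slides: ties in the expert ranking are broken by peer id, neighbors that also appear in PC are not picked twice, and the function names are hypothetical.

```python
import random
from typing import Optional


def rank(pc_counts: dict[str, int]) -> list[str]:
    # Experts first: more resources in the category means a higher rank.
    return sorted(pc_counts, key=lambda p: (-pc_counts[p], p))


def tcb(neighbors: list[str], pc_counts: dict[str, int]) -> list[str]:
    """All Nn neighbors plus one jump to the highest-ranked peer in PC."""
    jump = rank(pc_counts)[:1]
    return list(neighbors) + [p for p in jump if p not in neighbors]


def tca(neighbors: list[str], pc_counts: dict[str, int],
        rng: Optional[random.Random] = None) -> list[str]:
    """Nn targets: experts from PC first, padded with random neighbors if Nc < Nn."""
    if rng is None:
        rng = random.Random(0)
    nn, ranked = len(neighbors), rank(pc_counts)
    if len(ranked) >= nn:
        return ranked[:nn]
    pool = [p for p in neighbors if p not in pc_counts]
    return ranked + rng.sample(pool, min(nn - len(ranked), len(pool)))
```

TCB keeps the fan-out of the underlying strategy and adds one jump, whereas TCA keeps the total fan-out at Nn and replaces neighbors by cached experts whenever enough of them are known.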

  12. Query forwarding alternatives (2) • TCCN: • If Nc ≥ Nn: forward to all Nc peers in PC • If Nc < Nn: forward to all Nc peers in PC + (Nn - Nc) neighbors • TCDN: • If Nc ≥ Nn: forward to the Nn/2 highest-ranked peers in PC + a random selection of Nn/2 other peers in PC • If Nc < Nn: forward to all Nc peers in PC + (Nn - Nc) neighbors
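TCCN and TCDN differ from TCA only in the Nc ≥ Nn case. A sketch under assumptions: the (Nn - Nc) neighbors in the Nc < Nn case are picked at random (the slides leave the choice open), an odd Nn is split as floor(Nn/2) experts plus the remainder chosen randomly, and all names are hypothetical.

```python
import random
from typing import Optional


def _rank(pc_counts: dict[str, int]) -> list[str]:
    # Experts first; ties broken by peer id (assumption).
    return sorted(pc_counts, key=lambda p: (-pc_counts[p], p))


def _fill(neighbors: list[str], pc_counts: dict[str, int], nn: int, nc: int,
          rng: random.Random) -> list[str]:
    # Nc < Nn: all Nc cached peers plus (Nn - Nc) neighbors, picked at random here.
    pool = [p for p in neighbors if p not in pc_counts]
    return _rank(pc_counts) + rng.sample(pool, min(nn - nc, len(pool)))


def tccn(neighbors: list[str], pc_counts: dict[str, int],
         rng: Optional[random.Random] = None) -> list[str]:
    if rng is None:
        rng = random.Random(0)
    nn, nc = len(neighbors), len(pc_counts)
    if nc >= nn:
        return _rank(pc_counts)                       # forward to all Nc cached peers
    return _fill(neighbors, pc_counts, nn, nc, rng)


def tcdn(neighbors: list[str], pc_counts: dict[str, int],
         rng: Optional[random.Random] = None) -> list[str]:
    if rng is None:
        rng = random.Random(0)
    nn, nc = len(neighbors), len(pc_counts)
    if nc < nn:
        return _fill(neighbors, pc_counts, nn, nc, rng)
    ranked = _rank(pc_counts)
    experts, others = ranked[: nn // 2], ranked[nn // 2:]
    return experts + rng.sample(others, nn - nn // 2)  # diversify with random cached peers
```

TCCN can fan out to many more than Nn peers when the cache knows many matches; TCDN caps the fan-out at Nn but mixes top experts with randomly chosen cached peers, so queries are not always sent to the same few experts.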

  13. Distributing taxonomic information • Basic mechanism: piggyback the matching category on the query result • The result is returned along the original path, possibly involving jumps • This makes revalidation of contents in intermediate TCaches possible • Coverage is gradually extended (beyond the query horizon) • Lazy distribution by gossiping is also possible
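The piggybacking mechanism can be sketched as follows: the answering peer attaches the matching category and its resource count to the result, and every peer the result passes on its way back updates (or revalidates) its own TCache. Assumptions: the path is given as a list with the querying peer first and the answering peer last, the querying peer caches the information as well, and the helper names are hypothetical.

```python
from collections import defaultdict

# peer -> category -> {answering peer: number of resources}
TCaches = defaultdict[str, defaultdict[str, dict[str, int]]]


def new_tcaches() -> TCaches:
    return defaultdict(lambda: defaultdict(dict))


def return_result(path: list[str], category: str, n_resources: int,
                  tcaches: TCaches) -> list[str]:
    """Send a result back along `path` (querying peer first, answering peer last).

    Each hop caches category -> answering peer; an existing entry is thereby revalidated.
    Returns the peers whose TCache was updated, in the order the result reaches them.
    """
    answering = path[-1]
    updated = []
    for hop in reversed(path[:-1]):
        tcaches[hop][category][answering] = n_resources   # piggybacked information
        updated.append(hop)
    return updated
```

Because a path may contain jumps, peers along it can learn about answering peers far outside their own query horizon, which is how coverage grows over time.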

  14. TCache architecture and maintenance • Aim: provide an efficient mapping C → {PC1,…,PCm} • For each category: peers, number of resources, and TTL • TTL: • Regularly decremented • Reset to its start value at revalidation • Caching policy: aggressive vs. selective • Compacting techniques: peer upgrade & non-expert pruning
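The TTL-based maintenance described above can be sketched with one entry per (category, peer) pair. Several details are assumptions not given on the slides: the start value of 5, eviction of an entry once its TTL reaches 0, and the reading of "non-expert pruning" as dropping peers below a minimum resource count. Peer upgrade is not modeled. All names are hypothetical.

```python
from dataclasses import dataclass

START_TTL = 5   # assumed start value; the slides do not specify one


@dataclass
class Entry:
    n_resources: int
    ttl: int = START_TTL


class TCache:
    def __init__(self) -> None:
        self.entries: dict[str, dict[str, Entry]] = {}   # category -> peer -> Entry

    def revalidate(self, category: str, peer: str, n_resources: int) -> None:
        # Insert or refresh an entry; the TTL is reset to its start value.
        self.entries.setdefault(category, {})[peer] = Entry(n_resources)

    def tick(self) -> None:
        # Periodic maintenance: decrement every TTL and evict expired entries.
        for category in list(self.entries):
            peers = self.entries[category]
            for peer in list(peers):
                peers[peer].ttl -= 1
                if peers[peer].ttl <= 0:
                    del peers[peer]
            if not peers:
                del self.entries[category]

    def prune_non_experts(self, min_resources: int) -> None:
        # One possible reading of non-expert pruning: drop peers with few resources.
        for category in list(self.entries):
            kept = {p: e for p, e in self.entries[category].items()
                    if e.n_resources >= min_resources}
            if kept:
                self.entries[category] = kept
            else:
                del self.entries[category]

    def lookup(self, category: str) -> set[str]:
        return set(self.entries.get(category, {}))
```

Entries that keep being revalidated by piggybacked results stay in the cache, while information about peers that no longer answer ages out automatically.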

  15. Experimental setup • Simulations • Excerpts of the DMOZ taxonomy • Synthetic network topologies • Resource allocation: 80/20 rule • Queries are taxonomic categories • A number of peers act as querying peers • Measured: contacted peers, messages, recall, and latency • In this presentation: results using flooding and TCDN query routing
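One common reading of the 80/20 rule for resource allocation is that 20% of the peers hold 80% of the resources. The sketch below generates such an allocation; it is an assumed interpretation for illustration, not necessarily the exact generator used in the simulations, and `allocate_80_20` is a hypothetical name.

```python
import random
from typing import Optional


def allocate_80_20(n_peers: int, n_resources: int,
                   rng: Optional[random.Random] = None) -> tuple[dict[int, int], set[int]]:
    """Return ({peer: resource count}, set of 'rich' peers); 20% of peers get 80% of resources."""
    if rng is None:
        rng = random.Random(0)
    peers = list(range(n_peers))
    rng.shuffle(peers)
    n_rich = max(1, n_peers // 5)
    rich = peers[:n_rich]
    poor = peers[n_rich:] or rich          # degenerate case: fewer than two peers
    n_rich_res = (4 * n_resources) // 5    # 80% of the resources, integer arithmetic
    counts = dict.fromkeys(range(n_peers), 0)
    for _ in range(n_rich_res):
        counts[rng.choice(rich)] += 1
    for _ in range(n_resources - n_rich_res):
        counts[rng.choice(poor)] += 1
    return counts, set(rich)
```

Such a skewed allocation creates a small set of expert peers per category, which is the situation in which ranking cached peers by resource count pays off.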

  16. Improvements in recall

  17. Primary reason for improvement: more intelligent query forwarding

  18. Improvement and scalability

  19. Latency reduction • TCache results in very fast retrieval of the first results • The time to find all results is approximately the same, because both techniques use flooding

  20. Summary and further work • Presented motivation and context • Taxonomy-based querying and query routing • TCache architecture and maintenance • Experimental results supporting our claims • Future/ongoing work: • Employing the techniques for XML/XPath querying in a P2P context (to appear at IEEE P2P’2006) • Integration of different taxonomies