Publish/Subscribe Systems with Distributed Hash Tables and Languages from IR

Presentation Transcript


  1. Publish/Subscribe Systems with Distributed Hash Tables and Languages from IR Christos Tryfonopoulos & Manolis Koubarakis Intelligent Systems Lab Dept. of Electronic & Computer Engineering Technical University of Crete, Greece http://www.intelligence.tuc.gr

  2. Overview • Motivation • Distributed resource sharing • The DHTrie protocols • Local filtering algorithms • Conclusions

  3. Motivation • Resource sharing is at the core of today’s computing (Web, P2P, Grid). • One-time as well as continuous querying functionality is needed. • Data models and languages based on Information Retrieval are useful for annotating and querying resources. • Many nice technologies to build on (e.g., overlay networks, agents).

  4. Related work • Distributed information retrieval • p-Search, PlanetP, [Li et al. 2003], [Cohen et al. 2003], [Reynolds et al. 2002], … • Publish/subscribe • Non DHT-based • SIFT, SIENA, Le Subscribe, Gryphon, P2P-DIET, … • DHT-based • Scribe, Bayeux (topic-based) • [Tam et al. 2003], [Pietzuch et al. 2003], [Terpstra et al. 2003], [Triantafillou et al. 2004] (content-based)

  5. Distributed resource (file) sharing • Two kinds of basic functionality are expected: • One-time querying • A user poses a query such as “I want photos of Euro 2004 champions”. • The system returns a list of pointers to matching resources. • Publish/subscribe • A user posts a continuous query to receive a notification whenever a photo of “Euro 2004 champions” is published. • The system notifies the subscriber with a pointer to the peer that published the photo.

  6. Distributed resource sharing • One-time query scenario

  7. Distributed resource sharing • Publish/subscribe scenario

  8. Achievements in the context of DIET • Languages and data models from IR (emphasis on textual information). • Efficient filtering algorithms. • The system P2P-DIET • A super-peer based P2P system. • Implemented on top of the lightweight mobile agent platform DIET Agents. DIET project: www.dfki.uni-kl.de/IVSWEB/DIET DIET Agents: http://diet-agents.sourceforge.net/ P2P-DIET: http://www.intelligence.tuc.gr/p2pdiet • Current work: • Solve the pub/sub problem using ideas from DHTs.

  9. Distributed Hash Tables (DHTs) • Created to solve the object location problem in a distributed (dynamic) network of nodes. • Support only one operation: given a key, map the key onto a node. • Many existing systems (Chord, CAN, Pastry, Tapestry, P-Grid, DKS, Viceroy, …). • Locating a node needs a number of messages that is logarithmic in the number of nodes.
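The single DHT operation described above can be illustrated with a minimal sketch. It assumes a Chord-style rule (a key belongs to the first node whose ID is greater than or equal to the key's ID, wrapping around the ring); the slides do not say which DHT is used, so this rule, the 16-bit identifier space, and the names `H`, `successor` and `peer-…` are all illustrative assumptions. The sketch also uses global knowledge of all node IDs, whereas a real DHT reaches the same answer by routing over a logarithmic number of hops.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 16  # assumption: a small identifier space, for readability only


def H(key: str) -> int:
    """Hash a string key into the identifier space [0, 2**ID_BITS)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


def successor(node_ids: list[int], key_id: int) -> int:
    """Return the first node ID >= key_id, wrapping around the ring.

    Assumption: Chord-style key assignment; other DHTs use other rules.
    """
    ring = sorted(node_ids)
    idx = bisect_left(ring, key_id)
    return ring[idx % len(ring)]


# Hypothetical network of eight peers and one key lookup.
nodes = [H(f"peer-{n}") for n in range(8)]
responsible = successor(nodes, H("some-key"))
```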

  10. Data model… • Publications are sets of attribute-value pairs (A,s), where A is a named attribute and s is a text value. • An example of a publication in model AWP: {(AUTHOR, “John Smith”), (TITLE, “Information dissemination in P2P systems”), (ABSTRACT, “In this paper we show …”)}

  11. …and query language • Examples of continuous queries in model AWP

  12. Distributed resource sharing revisited • Publish/subscribe scenario

  13. The DHTrie protocols • Subscribing with a continuous query • Assume query q of the form: • Then for a random attribute A_i and a random word w_j contained in either s_i or wp_i, we create the string A_i w_j (A_i concatenated with w_j) and use it as the key to forward the query to the peer with ID = H(A_i w_j).
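The subscription step can be sketched as follows. The form of the query is not reproduced in this transcript, so the sketch assumes a query is represented as a mapping from attribute names to the (non-empty) list of words appearing in that attribute's constraints; that representation, the SHA-1 based `H`, and the function name `subscription_key` are assumptions made for illustration.

```python
import hashlib
import random

ID_BITS = 16  # assumption: small identifier space for readability


def H(key: str) -> int:
    """Hash a string key into the identifier space [0, 2**ID_BITS)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


def subscription_key(query: dict[str, list[str]], rng: random.Random) -> str:
    """Pick a random attribute and a random word of that attribute,
    and return their concatenation (the string A_i w_j of the slide).

    Assumption: every attribute in `query` has at least one word.
    """
    attribute = rng.choice(sorted(query))
    word = rng.choice(query[attribute])
    return attribute + word


# Hypothetical continuous query with two attributes.
query = {"AUTHOR": ["smith"], "TITLE": ["p2p", "systems"]}
key = subscription_key(query, random.Random(0))
target_peer_id = H(key)  # the query is forwarded to the peer responsible for this ID
```

Because the query is stored at a single peer only, the publishing side has to contact every peer that could hold a matching query.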

  14. The DHTrie protocols (cont’d) • Publishing a resource • Assume a publication p of the form: • Obtain a list of peer IDs by hashing the string A_i w_j for all words and all attributes in p (necessary to ensure correctness). Use indirect message passing and the DHT infrastructure to forward the message. • The receiving node contacts the neighbors included in the recipient list, removes them from the list, and forwards the message.
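A minimal sketch of how the recipient list might be computed for a publication. It assumes a publication is a mapping from attribute names to text values and that words are obtained by lowercasing and splitting on whitespace; the tokenization, the SHA-1 based `H`, and the name `recipient_ids` are illustrative assumptions, and subscribers would have to normalize words in the same way for the keys to agree.

```python
import hashlib

ID_BITS = 16  # assumption: small identifier space for readability


def H(key: str) -> int:
    """Hash a string key into the identifier space [0, 2**ID_BITS)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


def recipient_ids(publication: dict[str, str]) -> list[int]:
    """Hash attribute+word for every word of every attribute value.

    Every (attribute, word) pair is covered because a matching query
    may have been indexed under any one of them.
    """
    ids = set()
    for attribute, text in publication.items():
        for word in text.lower().split():  # assumption: naive tokenization
            ids.add(H(attribute + word))
    return sorted(ids)


publication = {
    "AUTHOR": "John Smith",
    "TITLE": "Information dissemination in P2P systems",
}
recipients = recipient_ids(publication)
```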

  15. Direct message passing • The traditional way to forward a message to more than one recipient. • Send a lookup() message for each recipient. • For k recipients we need O(k log(N)) lookup messages. • Multicast techniques are not applicable, since the group of peers to be contacted is not known a priori.
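To make the O(k log(N)) bound concrete, here is a back-of-the-envelope estimate. It assumes each lookup costs about ceil(log2 N) messages, which is an illustrative constant rather than a figure from the slides, and the values of k and N are hypothetical.

```python
import math


def direct_lookup_messages(k: int, n_peers: int) -> int:
    """Estimate messages for k independent lookups in a network of n_peers.

    Assumption: one lookup costs ceil(log2(n_peers)) messages.
    """
    hops_per_lookup = math.ceil(math.log2(n_peers))
    return k * hops_per_lookup


# Hypothetical case: a publication with 50 recipients in a 1024-peer network.
estimate = direct_lookup_messages(50, 1024)
print(estimate)  # prints 500
```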

  16. Indirect message passing • Incorporate the recipient list into the message • Avoid asking the same routing question more than once • Opportunistic forwarding • Increase in message size due to: • publication size • process publication (remove stopwords, stemming) • use inverted (and compressed) index • recipient list size • use gap compression (avoid peer IDs)
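One common reading of gap compression is to sort the recipient IDs and transmit the differences between consecutive IDs instead of the IDs themselves; the gaps are smaller numbers and so compress better. The sketch below follows that reading, which is an assumption about what the slide intends, and the function names are hypothetical.

```python
def gap_encode(peer_ids: list[int]) -> list[int]:
    """Replace sorted, de-duplicated IDs by the gaps between neighbours.

    The first gap is measured from 0, so it equals the smallest ID.
    """
    gaps = []
    previous = 0
    for peer_id in sorted(set(peer_ids)):
        gaps.append(peer_id - previous)
        previous = peer_id
    return gaps


def gap_decode(gaps: list[int]) -> list[int]:
    """Rebuild the sorted ID list by accumulating the gaps."""
    ids = []
    total = 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


encoded = gap_encode([1005, 1000, 1012])  # [1000, 5, 7]
decoded = gap_decode(encoded)             # [1000, 1005, 1012]
```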

  17. The DHTrie protocols • Notifying interested subscribers • To find all matching queries in a peer, we use filtering algorithm BestFitTrie. [Tryfonopoulos, Koubarakis, Drougas, SIGIR 2004] • Once all matching queries are found, a notification message is created and forwarded to peers using indirect message passing.

  18. Some (preliminary) results

  19. Filtering algorithms at each super-peer • Query clustering algorithm BestFitTrie • Data structure is a hash table of tries • Hash table is used for fast access to trie roots • We search for the best place to store query q, in two phases: • Best position trie-wise • Best position forest-wise • Matching procedure examines only tries with roots contained in the incoming document

  20. Filtering algorithms at each super-peer • PrefixTrie: Prefix-based clustering (handle a query as a sequence of words) • BestFitTrie: Set-based clustering (handle a query as a set of words)
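The "hash table of tries" idea can be sketched in a heavily simplified, prefix-based form. This is not the authors' BestFitTrie (which searches for the best position for a query) and not necessarily their PrefixTrie either: it assumes a query is a plain conjunction of words, orders the words alphabetically to obtain a canonical sequence, and ignores attributes and proximity operators. All class and method names are hypothetical.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}  # next word -> child node
        self.query_ids: list[str] = []             # queries ending at this node


class PrefixForest:
    """Hash table of tries, keyed by the first word of each stored query."""

    def __init__(self) -> None:
        self.roots: dict[str, TrieNode] = {}

    def insert(self, query_id: str, words: list[str]) -> None:
        ordered = sorted(set(words))  # assumption: alphabetical canonical order
        if not ordered:
            raise ValueError("a query needs at least one word")
        node = self.roots.setdefault(ordered[0], TrieNode())
        for word in ordered[1:]:
            node = node.children.setdefault(word, TrieNode())
        node.query_ids.append(query_id)

    def match(self, document_words: list[str]) -> list[str]:
        """Return IDs of queries whose words all occur in the document.

        Only tries whose root word occurs in the document are examined.
        """
        doc = set(document_words)
        matched: list[str] = []
        for word in doc:
            root = self.roots.get(word)
            if root is not None:
                self._collect(root, doc, matched)
        return sorted(matched)

    def _collect(self, node: TrieNode, doc: set[str], matched: list[str]) -> None:
        matched.extend(node.query_ids)
        for word, child in node.children.items():
            if word in doc:
                self._collect(child, doc, matched)


forest = PrefixForest()
forest.insert("q1", ["p2p", "systems"])
forest.insert("q2", ["p2p"])
forest.insert("q3", ["grid", "p2p"])
hits = forest.match(["information", "p2p", "systems"])  # ["q1", "q2"]
```

Queries sharing a prefix share trie nodes, so a common prefix is tested once per document rather than once per query.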

  21. Filtering algorithms at each super-peer

  22. Filtering algorithms at each super-peer • [Figure: comparison of BestFitTrie and PrefixTrie; series labelled BestFitTrie 1M, PrefixTrie 1M, BestFitTrie 3M, PrefixTrie 3M]

  23. Other interesting issues • Load balancing • Frequency of occurrence of words may overload certain peers. • Index queries under infrequent words. • Use controlled replication. • Word frequency computation • Also useful in other types of queries (VSM). • Global vs Local ranking schemes. • Propose a hybrid ranking scheme, with updating and estimation mechanisms.
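The suggestion to index queries under infrequent words can be sketched as a change to how the indexing word is chosen: instead of a random word, pick the one with the lowest known frequency. The frequency table below is hypothetical, as is the choice to treat unseen words as frequency 0; how frequencies are actually computed or estimated across peers is the open issue the slide raises.

```python
def rarest_word(words: list[str], frequency: dict[str, int]) -> str:
    """Return the word with the lowest frequency (ties broken alphabetically).

    Assumption: words missing from `frequency` count as frequency 0.
    """
    return min(sorted(words), key=lambda word: frequency.get(word, 0))


# Hypothetical word frequencies observed in past publications.
frequency = {"the": 1000, "systems": 40, "dhtrie": 2}
index_word = rarest_word(["the", "dhtrie", "systems"], frequency)
```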

  24. Thank you Funding sources: IST/FET project DIET (www.dfki.uni-kl.de/IVSWEB/DIET) IST/FET project Evergrow (http://www.evergrow.org) Heraclitus Ph.D. Fellowship Program (Greece)
