Content-Based Publish-Subscribe Over Structured P2P Networks

Peter Triantafillou and Ioannis Aekaterinidis, Research Academic Computer Technology Institute and Department of Computer Engineering and Informatics, University of Patras, Greece.


Presentation Transcript


  1. Content-Based Publish-Subscribe Over Structured P2P Networks • Peter Triantafillou and Ioannis Aekaterinidis • Research Academic Computer Technology Institute and Department of Computer Engineering and Informatics, University of Patras, Greece

  2. Agenda • Introduction/Goal • Publish-Subscribe Systems • Publish-Subscribe over Chord • Processing Subscriptions • Processing Events • Improving Performance • Conclusion

  3. Introduction • Publish-subscribe systems are becoming very popular for building large-scale distributed systems and applications • Anonymity between publisher and subscriber • Centralized: • Adv: Global image of the system makes the matching algorithm easy to implement • Dis: Scalability • Decentralized: • Adv: Scalability • Challenge: development of an efficient distributed matching algorithm

  4. Goal • Chose to use Chord because: • Simplicity • Popularity • Scalable • Self-Organizing • Well Performing • Challenge: to develop a strategy for using DHTs to provide good support for range predicates • Which are popular when specifying subscription attributes

  5. Agenda • Introduction/Goal • Publish-Subscribe Systems • Publish-Subscribe over Chord • Processing Subscriptions • Processing Events • Improving Performance • Conclusion

  6. Publish-Subscribe Systems • Asynchronous messaging paradigm • Senders (publishers) of messages are not programmed to send their messages to specific receivers (subscribers) • Published messages are characterized into classes (without knowledge of what subscribers there may be) • Subscribers express interest in one or more classes and only receive messages that are of interest (without knowledge of what publishers there are) • This decoupling of publishers and subscribers allows for greater scalability and a more dynamic network topology

  7. Pub-Sub Message Filtering • Subscribers receive only a subset of the total messages published • 2 Main Types • Topic Based • Content Based • Hybrid • Coupling of topic and content based systems

  8. Topic Based Pub/Sub Systems • Much like newsgroups • Users join a group (topic) • All messages related to that topic are broadcast to all users participating in the specific group

  9. Content Based Pub/Sub Systems • Preferable • Give users the ability to express their interest by specifying predicates over the values of a number of well defined attributes • Matching of publications (events) to subscriptions (interest) is done based on the content (values of attributes)

  10. Hybrid System • Publishers post messages to a topic while subscribers register content-based subscriptions to one or more topics • Publications and subscriptions are automatically classified in topics (using an application-specific schema) • Drawbacks: • Design of the domain schema plays a fundamental role in the system's performance • Many false positives are likely to occur

  11. Agenda • Introduction/Goal • Publish-Subscribe Systems • Publish-Subscribe over Chord • Processing Subscriptions • Processing Events • Improving Performance • Conclusion

  12. Event Schema • Set of typed attributes • Each attribute ai consists of: • Type: belongs to a predefined set of primitive data types • Name: a string • Value v(ai): any value in the range defined by the minimum and maximum values (vmin(ai), vmax(ai)), at the attribute's precision vpr(ai)

  13. Subscription Schema • Contains all interesting subscription-attribute data types (integers, strings, etc.) and all common operators (=, ≠, <, >, etc.) • Event matches subscription iff all the subscription’s attribute predicates/constraints are satisfied • Can have two or more constraints for the same attribute
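
The matching rule on this slide (an event matches iff every constraint holds, with several constraints allowed on one attribute) can be sketched as follows. This is a minimal illustration, not the authors' code: the `(attribute, operator, operand)` triple layout, the `OPS` table, and the `<=`/`>=` entries are assumptions, and the sample values are borrowed from the stock example used later in the slides.

```python
import operator

# Operator symbols from the subscription schema mapped to Python comparisons.
# "<=" and ">=" are assumed to be among the "etc." operators.
OPS = {
    "=": operator.eq,
    "≠": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}

def matches(event, subscription):
    """Return True iff every constraint in the subscription holds for the event.

    event: dict mapping attribute name -> value
    subscription: list of (attribute, operator symbol, operand) triples;
    the same attribute may appear more than once (for example a range).
    """
    for attr, op, operand in subscription:
        if attr not in event:            # a constrained attribute is absent
            return False
        if not OPS[op](event[attr], operand):
            return False
    return True

# Two constraints on Price express the range 8.30 < Price < 8.70.
subscription = [
    ("Exchange", "=", "NYSE"),
    ("Symbol", "=", "OTE"),
    ("Price", ">", 8.30),
    ("Price", "<", 8.70),
]
event = {"Exchange": "NYSE", "Symbol": "OTE", "Price": 8.40, "Low": 8.22}
print(matches(event, subscription))  # True
```

This centralized check only states what "match" means; the rest of the slides describe how the same result is obtained in a distributed way over Chord.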

  14. Subscription Identifier • Concatenation of 3 parts: • C1: id of the node receiving the subscription • Size: m bits in a Chord ring with an m-bit address space • C2: id of the subscription itself • Size: bits equal to the rounded-up base-2 logarithm of the maximum number of outstanding subscriptions a node can have • C3: number of attributes on which constraints are declared • Size: max value = total number of attributes supported by the system

  15. Subscription ID Example • Assume a Chord ring with a 3-bit identifier address space • Each node can support 8 outstanding subscriptions, with an attribute schema of 7 attributes • The example subID encodes subscription 3 (C2=3), belonging to node 4 (C1=4), with constraints on 5 attributes (C3=5)
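
The subID layout above can be sketched as bit packing. This is an illustration under assumptions: the slides only say the three parts are concatenated, so placing C1 in the most significant bits, and giving C3 just enough bits to hold the value 7, are choices made here. The function names are hypothetical.

```python
import math

M_BITS = 3       # Chord identifier bits (width of C1), from the example
MAX_SUBS = 8     # outstanding subscriptions per node, from the example
NUM_ATTRS = 7    # attributes in the schema, from the example

C2_BITS = math.ceil(math.log2(MAX_SUBS))   # rounded-up log2(8) = 3 bits
C3_BITS = NUM_ATTRS.bit_length()           # bits needed to hold values up to 7 = 3

def pack_sub_id(c1, c2, c3):
    """Concatenate C1 | C2 | C3 into one integer (C1 in the high bits, assumed)."""
    return (c1 << (C2_BITS + C3_BITS)) | (c2 << C3_BITS) | c3

def unpack_sub_id(sub_id):
    """Split a packed subID back into (c1, c2, c3)."""
    c3 = sub_id & ((1 << C3_BITS) - 1)
    c2 = (sub_id >> C3_BITS) & ((1 << C2_BITS) - 1)
    c1 = sub_id >> (C2_BITS + C3_BITS)
    return c1, c2, c3

sub_id = pack_sub_id(c1=4, c2=3, c3=5)
print(format(sub_id, "09b"))   # 100011101
print(unpack_sub_id(sub_id))   # (4, 3, 5)
```

Keeping C1 and C3 inside the identifier is what later lets the matching phase find the owning node and the number of constrained attributes without any extra lookup.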

  16. Storing Subscriptions • Done using the hash function provided by Chord (SHA-1) • Returns an identifier uniformly distributed in the address space • k=h(v(ai)) • Following the Chord API, the subID is placed at node: successor(k)

  17. Storing Subscriptions • Procedure:

  18. Storing Example • Attributes are processed one at a time • Subscription ID is stored at: • Successor(h(“NYSE”)), in the list dedicated to attribute Exchange • Successor(h(“OTE”)), in the list dedicated to attribute Symbol • Since the Price constraint is the range 8.30<Price<8.70 with a precision of 0.01, the subscription ID is stored at Successor(h(v)) for each value v = 8.31, 8.32, …, 8.69, in the list dedicated to attribute Price
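
The storing steps in this example can be sketched on a toy in-memory ring. Everything structural here is an assumption for illustration: the 6-bit identifier space, the node ids in `NODES`, hashing the value's string form, and the nested `store` dictionary standing in for the per-attribute lists kept at each node. Only the use of SHA-1 and `successor(h(value))` comes from the slides.

```python
import hashlib
from collections import defaultdict

M = 6                                               # identifier bits (toy value)
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])   # hypothetical node ids

def h(value):
    """SHA-1 of the value's string form, reduced to an M-bit identifier."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def successor(k):
    """First node id >= k on the ring, wrapping around to the smallest id."""
    for n in NODES:
        if n >= k:
            return n
    return NODES[0]

# store[node][attribute][value] -> list of subIDs (stand-in for per-node lists)
store = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

def store_equality(sub_id, attr, value):
    """Equality constraint: the subID goes to exactly one node."""
    store[successor(h(value))][attr][value].append(sub_id)

def range_values(low, high, precision_digits):
    """Values strictly between low and high on the precision grid, as strings."""
    scale = 10 ** precision_digits
    lo, hi = round(low * scale), round(high * scale)
    return [f"{i / scale:.{precision_digits}f}" for i in range(lo + 1, hi)]

def store_range(sub_id, attr, low, high, precision_digits):
    """Range constraint: the subID is stored once per value in the range."""
    for v in range_values(low, high, precision_digits):
        store[successor(h(v))][attr][v].append(sub_id)

store_equality(1, "Exchange", "NYSE")
store_equality(1, "Symbol", "OTE")
store_range(1, "Price", 8.30, 8.70, 2)
print(len(range_values(8.30, 8.70, 2)))   # 39 values: 8.31 ... 8.69
```

The range is enumerated on an integer grid rather than by repeated floating-point addition, so each value hashes from the same canonical string an event would use.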

  19. Updating Subscriptions • When an attribute with an equality constraint is updated, only 2 nodes are affected: • Delete the subscription ID from: • nodeID = successor(h(vstale_value(ai))) • Add the subscription ID to the appropriate list at: • nodeID = successor(h(vupdated_value(ai)))

  20. Updating Subscriptions • The procedure for updating a range value depends on the new values of the range bounds (vlow_NEW(ai) and vhigh_NEW(ai)) compared to the old values • If vlow_NEW(ai) < vlow(ai) store the subID to the nodes that cover [vlow_NEW(ai), vlow(ai)) range • If vhigh_NEW(ai) > vhigh(ai) store the subID to the nodes that cover (vhigh(ai), vhigh_NEW(ai)] range

  21. Updating Subscriptions • If vlow_NEW(ai) > vlow(ai) delete the subID from the nodes that cover [vlow(ai), vlow_NEW(ai)) range. • If vhigh_NEW(ai) < vhigh(ai) delete the subID from the nodes that cover (vhigh_NEW(ai), vhigh(ai)] range
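
The four range-update rules on the last two slides amount to a set difference between the old and new ranges. The sketch below assumes bounds are inclusive integers counted in precision units (for example 831 for 8.31 at precision 0.01); `range_update_plan` is a hypothetical helper name.

```python
def range_update_plan(old_low, old_high, new_low, new_high):
    """Return (values_to_add, values_to_delete) for a changed range constraint.

    Bounds are inclusive integers in precision units. Values only in the new
    range need the subID stored; values only in the old range need it deleted.
    """
    old = set(range(old_low, old_high + 1))
    new = set(range(new_low, new_high + 1))
    return sorted(new - old), sorted(old - new)

# Old range 8.31..8.69, new range 8.25..8.60 (precision 0.01).
to_add, to_delete = range_update_plan(831, 869, 825, 860)
print(to_add)      # [825, 826, 827, 828, 829, 830]
print(to_delete)   # [861, 862, 863, 864, 865, 866, 867, 868, 869]
```

Here the lower bound moved down, so [vlow_NEW, vlow) is added, and the upper bound moved down, so (vhigh_NEW, vhigh] is deleted; values shared by both ranges are never touched.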

  22. Processing Events: Matching Algorithm

  23. Matching Events with Subscriptions Example • Suppose we have Subscriptions 1 and 2, generated by two clients connected to a Chord node, and Event 1 • First, the algorithm collects all the subID lists in which the values of the event attributes satisfy the corresponding constraints of the subscriptions

  24. Matching Events with Subscriptions Example (continued) • The algorithm starts with attribute Exchange = “NYSE” and retrieves the subID list (LExchange) from node successor(h(“NYSE”)) • This list contains only the subID1 • LExchange -> subID1

  25. Matching Events with Subscriptions Example (continued) • Next attribute Symbol = “OTE”; subID list (LSymbol) from node successor(h(“OTE”)) is retrieved • LSymbol -> subID1, subID2 • Both subscriptions appear because the event's Symbol value satisfies a constraint of each

  26. Matching Events with Subscriptions Example (continued) • Next attribute Price = 8.40; subID list (LPrice) from node successor(h(8.40)) is retrieved • LPrice -> subID1 • Only subscription 1 appears because only its Price range contains 8.40

  27. Matching Events with Subscriptions Example (continued) • Lastly attribute Low = 8.22; subID list (LLow) from node successor(h(8.22)) is retrieved • LLow -> subID2 • Only subscription 2 appears because only it has a constraint on attribute Low

  28. Matching Events with Subscriptions Example (continued) • After this phase of the matching process the collected subscription ID lists are: • LExchange -> subID1 • LSymbol -> subID1, subID2 • LPrice -> subID1 • LLow -> subID2 • Subscription 1 was found in 3 lists while subscription 2 was found in 2 • By processing the C3 part of the subIDs we find that both subscriptions have constraints over 3 attributes

  29. Matching Events with Subscriptions Example (continued) • Since subscription 1 was found in 3 lists, a match is implied and its subID is kept in order to inform the node that generated the subscription about the matched event • Subscription 2 was found in only 2 of its 3 lists, so it does not match • The node that received the subscription (nodeID equal to the C1 field of subID1) holds the metadata needed to locate the IP address of the client that generated it • That node is contacted and the event is delivered to the interested client
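
The counting step of this example can be sketched as follows: a subscription matches when the number of collected lists that contain its subID equals its C3 field. The `SubID` tuple, the node id 4, and `match_event` are assumptions made for illustration; the list contents and the C3 value of 3 come from the slides.

```python
from collections import Counter, namedtuple

# Unpacked subID fields: owning node, subscription number, constrained attributes.
SubID = namedtuple("SubID", ["c1", "c2", "c3"])

def match_event(collected_lists):
    """Return the subIDs found in as many lists as their C3 field.

    collected_lists: attribute name -> subIDs retrieved from the node
    responsible for the event's value of that attribute.
    """
    hits = Counter()
    for sub_ids in collected_lists.values():
        for sub_id in set(sub_ids):      # count each list at most once per subID
            hits[sub_id] += 1
    return [s for s, n in hits.items() if n == s.c3]

sub1 = SubID(c1=4, c2=1, c3=3)   # node id 4 is a hypothetical value
sub2 = SubID(c1=4, c2=2, c3=3)

lists = {
    "Exchange": [sub1],
    "Symbol": [sub1, sub2],
    "Price": [sub1],
    "Low": [sub2],
}
for s in match_event(lists):
    print(f"deliver event via node {s.c1} for subscription {s.c2}")
```

Only subscription 1 is reported, because subscription 2 appears in 2 lists but declares constraints on 3 attributes.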

  30. Expected Performance • Subscription Storage Procedure: • Average number of hops needed to store a subID depends on the type of constraints over the attributes • Equality: ½ log(N) • subID is stored in a single node • Range constraint: r nodes are affected, which leads to r * ½ log(N) hops on average to store the subID

  31. Expected Performance • Update/Deletion of Subscription • Again, depends on the type of constraints over the attributes • Equality: update performed by contacting log(N) nodes on average (two lookups of ½ log(N) hops each: one to delete, one to add) • Ranges: number of nodes is k*log(N) on average • k depends on whether the new range is smaller or wider than the old range

  32. Expected Performance • Event-Processing (matching) • Involves contacting nodes to collect the subscription ID lists • Reminder: in a Chord network with N nodes and an m-bit identifier space (2^m identifiers), the average number of nodes that must be contacted to find a successor is: • ½ log(N) hops • By design, this proposal leads to fast and scalable event matching.

  33. Agenda • Introduction/Goal • Publish-Subscribe Systems • Publish-Subscribe over Chord • Processing Subscriptions • Processing Events • Improving Performance • Conclusion

  34. Improving Performance • Currently storing a subscription over the Chord ring takes r* ½ * log(N) hops on average for every attribute • r depends on precision, high, and low values • Using an order preserving hash function we can optimize to r+ ½ *log(N) hops

  35. Order Preserving Chord • Using an order-preserving hash function onto the 2^m identifier space • Expected performance: • ½ log(N) hops to locate the node storing the minimum value of the range (vlow(ai)) • Then, perform r hops to store the remaining values in the range, for a total of r + ½ log(N) hops
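
The two cost expressions can be compared numerically. This is only arithmetic on the formulas from the slides; taking the logarithm base 2 (as is usual for Chord) and the sample values N = 1024 and r = 39 are assumptions.

```python
import math

def lookup_hops(n_nodes):
    """Average hops for one Chord lookup: 1/2 * log2(N)."""
    return 0.5 * math.log2(n_nodes)

def store_range_hops_plain(n_nodes, r):
    """Plain Chord: one independent lookup per value in the range."""
    return r * lookup_hops(n_nodes)

def store_range_hops_ophf(n_nodes, r):
    """Order-preserving hashing: one lookup, then r neighbor-to-neighbor hops."""
    return r + lookup_hops(n_nodes)

n, r = 1024, 39
print(store_range_hops_plain(n, r))   # 195.0
print(store_range_hops_ophf(n, r))    # 44.0
```

The gap grows with both r and N, since the plain scheme pays the full lookup cost for every value in the range.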

  36. Order Preserving Hash Function • Suppose every attribute is characterized by • vmin(ai): minimum value ai can take • vmax(ai): maximum value ai can take • vpr(ai): precision of ai • vj(ai) is any value in [vlow(ai), vhigh(ai)] • OPHF is:

  37. Subscription and Event Processing with OPHF • Example: Storing Subscription • Consider Chord ring with 3-bit ids and 8 nodes • Subscription of a single integer attribute a arriving at node 3 with constraint 0<v(a)<4 • Using Chord requires O(r*log(N)) hops to store the subID at three nodes

  38. Subscription and Event Processing with OPHF • Using the OPHF with Chord: • Perform O(log(N)) hops only once to reach the first node (node 6) • Storing the subID at nodes 7 and 0 requires 2 more hops
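
The OPHF formula itself did not survive in this transcript, so the following is not the authors' function. It is one simple order-preserving mapping, sketched to show how consecutive values can land on consecutive identifiers. The linear scaling, the `offset` parameter, the attribute domain 0..7, and the offset value 5 are all assumptions, chosen so that the values 1, 2, 3 reproduce the node placement 6, 7, 0 from the example above.

```python
def ophf(v, vmin, vmax, m, offset=0):
    """Map an integer grid value v in [vmin, vmax] to an m-bit identifier.

    Larger values map to identifiers that are equal or further clockwise on
    the ring, so order is preserved up to wraparound. Hypothetical sketch.
    """
    if not vmin <= v <= vmax:
        raise ValueError("value outside the attribute's domain")
    span = vmax - vmin + 1
    return (offset + (v - vmin) * (2 ** m) // span) % (2 ** m)

# Constraint 0 < v(a) < 4 on an integer attribute covers the values 1, 2, 3.
ids = [ophf(v, vmin=0, vmax=7, m=3, offset=5) for v in (1, 2, 3)]
print(ids)   # [6, 7, 0]
```

Because neighboring values map to neighboring identifiers, after the first lookup each further value is reached by following a successor pointer instead of issuing a new lookup.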

  39. Agenda • Introduction/Goal • Publish-Subscribe Systems • Publish-Subscribe over Chord • Processing Subscriptions • Processing Events • Improving Performance • Conclusion

  40. Conclusion • Not Addressed • Load Balancing • Small Domain Problem • Able to support equality and range attributes while leveraging Chord to build a scalable, self-organizing, well performing content based publish-subscribe system.

  41. Sources Cited • http://en.wikipedia.org/wiki/Publish/subscribe
