
Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

Yi-Min Wang, Lili Qiu, Dimitris Achlioptas, Gautam Das, Paul Larson, and Helen J. Wang. Microsoft Research. DISC 2002, Toulouse, France.


Presentation Transcript


  1. Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks Yi-Min Wang, Lili Qiu, Dimitris Achlioptas, Gautam Das, Paul Larson, and Helen J. Wang Microsoft Research DISC 2002 Toulouse, France

  2. Motivation • The increasing popularity of event notification • Yahoo! Alerts, MSN Mobile, AOL Anywhere, InfoSpace, … • Complements the traditional polling model on the Web • Examples: stock quotes, sports scores, weather, news, … • Event Distribution Network (EDN) • Distributed and scalable event distribution • Parallels the idea of a Content Distribution Network (CDN), applied to event distribution • Built on top of a self-configuring overlay network of servers • Content-based publish/subscribe through in-network processing of aggregated subscription filters • Versus simply extending topic-based pub/sub with all filter processing at end servers

  3. Flat dispatcher-based model

  4. Subscription Partitioning • Basic idea: similarity-based clustering for reducing total event traffic • Event Space Partitioning (ESP) • Filter Set Partitioning (FSP)

  5. Equality Predicates • Hash predicates to get a uniform distribution • Treat the hashed domain as the event space • Use Event Space Partitioning • A subscription is a point; it does not intersect multiple sub-spaces • Use over-partitioning for better load balancing • Use an offline greedy algorithm to assign buckets to servers for load balancing • Use an indirection table to dynamically map buckets to servers for load re-balancing • Use Bloom filters to further reduce traffic • Fast detection of true negatives at the expense of a (very low) false-positive rate
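
The hashing, over-partitioning, indirection-table, and Bloom-filter ideas on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `stable_hash`, `BloomFilter`, and `EqualityRouter`, the round-robin initial table, the filter size, and the choice of one Bloom filter per server are all hypothetical.

```python
import hashlib


def stable_hash(value: str, seed: int = 0) -> int:
    """Deterministic 64-bit hash of a string (hypothetical helper)."""
    data = f"{seed}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, value: str):
        return [stable_hash(value, seed) % self.num_bits
                for seed in range(self.num_hashes)]

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(value))


class EqualityRouter:
    """Routes equality subscriptions and events through hashed buckets."""

    def __init__(self, num_servers: int, over_partitioning_ratio: int):
        # Over-partitioning: many more buckets than servers.
        self.num_buckets = num_servers * over_partitioning_ratio
        # Indirection table: bucket -> server (assumed round-robin start).
        self.bucket_to_server = [b % num_servers
                                 for b in range(self.num_buckets)]
        # Assumption: one Bloom filter per server summarizes its keys.
        self.filters = [BloomFilter() for _ in range(num_servers)]

    def bucket_of(self, key: str) -> int:
        return stable_hash(key) % self.num_buckets

    def subscribe(self, key: str) -> int:
        server = self.bucket_to_server[self.bucket_of(key)]
        self.filters[server].add(key)
        return server

    def route_event(self, key: str):
        """Return the target server, or None if no subscriber can match."""
        server = self.bucket_to_server[self.bucket_of(key)]
        if self.filters[server].might_contain(key):
            return server
        return None  # true negative detected without contacting the server


router = EqualityRouter(num_servers=4, over_partitioning_ratio=10)
home = router.subscribe("MSFT")
print(home, router.route_event("MSFT"))
```

Because an equality subscription hashes to a single bucket, it lives on exactly one server. In this sketch, moving a bucket would also require rebuilding the affected Bloom filters, which is omitted here.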

  6. Simulation Results • Actual Notification Money log • 1.48M subscriptions with 0.29M unique filters over 21,741 stock symbols • Zipf-like distribution

  7. Simulation Results (Cont.) • Simulate 100M new subscriptions from 43,734 symbols • Scaled-up Zipf-like distribution • Perturbation and permutation • Uniform distribution • 50 servers with over-partitioning ratio = 10 • Without load re-balancing • Load imbalance (max/min) ranged from 1.41 to 6.66 (Uniform case) • With imbalance threshold of 2.0 • Re-balancing was triggered only 5 times, each time involving re-assignment of up to 3 buckets and migration of up to 0.7% of subscriptions.
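
The greedy bucket assignment and threshold-triggered re-balancing evaluated above might look roughly like the sketch below. The slides do not give the actual algorithms, so the heaviest-bucket-first heuristic, the bucket-selection rule, and the names `greedy_assign`, `imbalance`, and `rebalance` are assumptions; `threshold=2.0` and `max_moves=3` merely echo the numbers on the slide.

```python
def greedy_assign(bucket_loads, num_servers):
    """Offline greedy: heaviest bucket first, onto the least-loaded server."""
    loads = [0] * num_servers
    table = {}
    for bucket in sorted(bucket_loads, key=lambda b: (-bucket_loads[b], b)):
        target = min(range(num_servers), key=lambda s: (loads[s], s))
        table[bucket] = target
        loads[target] += bucket_loads[bucket]
    return table


def server_loads(table, bucket_loads, num_servers):
    """Total subscription load per server under an indirection table."""
    loads = [0] * num_servers
    for bucket, server in table.items():
        loads[server] += bucket_loads[bucket]
    return loads


def imbalance(loads):
    """Load imbalance as the max/min ratio used on the slide."""
    low = min(loads)
    return float("inf") if low == 0 else max(loads) / low


def rebalance(table, bucket_loads, num_servers, threshold=2.0, max_moves=3):
    """Move buckets from the heaviest to the lightest server until the
    imbalance falls to the threshold or max_moves is reached."""
    moved = []
    for _ in range(max_moves):
        loads = server_loads(table, bucket_loads, num_servers)
        if imbalance(loads) <= threshold:
            break
        heavy = max(range(num_servers), key=lambda s: loads[s])
        light = min(range(num_servers), key=lambda s: loads[s])
        gap = loads[heavy] - loads[light]
        # Only buckets smaller than the gap can narrow it.
        candidates = [b for b, s in table.items()
                      if s == heavy and 0 < bucket_loads[b] < gap]
        if not candidates:
            break
        # Assumed rule: pick the bucket closest to half the gap.
        best = min(candidates,
                   key=lambda b: (abs(bucket_loads[b] - gap / 2), b))
        table[best] = light  # update the indirection table entry
        moved.append(best)
    return moved


loads_by_bucket = {0: 5, 1: 4, 2: 3, 3: 2}
table = {0: 0, 1: 0, 2: 0, 3: 1}  # deliberately skewed start
print(rebalance(table, loads_by_bucket, num_servers=2))
print(server_loads(table, loads_by_bucket, num_servers=2))
```

Only the indirection table changes during re-balancing; the hash function and bucket boundaries stay fixed, so only subscriptions in the moved buckets migrate.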

  8. Range Predicates • Use Filter Set Partitioning • K-means clustering • Use the center point to represent a rectangle • R-tree-based clustering • R-tree: dynamic index structure for multi-dimensional data rectangles • Offline R-tree algorithm • Exhaustively and recursively search for partitions that minimize the sum of bounding rectangle volumes • Online R-tree algorithm • Insert from the root down the path that greedily minimizes the increase in bounding rectangle volume • Simulation results • Offline R-tree > Online R-tree > K-means > Random
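
The online insertion rule above (follow the path that least increases the bounding rectangle volume) can be illustrated with a single-level simplification: each server keeps one bounding rectangle, and a new subscription goes to the server whose rectangle grows the least. This is a sketch under that assumption, not a full R-tree; the helper names are hypothetical.

```python
def volume(rect):
    """Volume of an axis-aligned rectangle given as (lows, highs)."""
    lows, highs = rect
    result = 1.0
    for lo, hi in zip(lows, highs):
        result *= (hi - lo)
    return result


def enlarge(box, rect):
    """Smallest rectangle that contains both box and rect."""
    (box_lo, box_hi), (rect_lo, rect_hi) = box, rect
    return (tuple(min(a, b) for a, b in zip(box_lo, rect_lo)),
            tuple(max(a, b) for a, b in zip(box_hi, rect_hi)))


def assign_online(boxes, rect):
    """Assign rect to the server whose bounding box volume grows least.

    boxes is a list with one bounding rectangle (or None) per server;
    it is updated in place. Ties go to the lowest server index.
    """
    best, best_cost = None, None
    for i, box in enumerate(boxes):
        if box is None:
            cost = volume(rect)
        else:
            cost = volume(enlarge(box, rect)) - volume(box)
        if best_cost is None or cost < best_cost:
            best, best_cost = i, cost
    boxes[best] = rect if boxes[best] is None else enlarge(boxes[best], rect)
    return best


boxes = [None, None]
for sub in [((0, 0), (1, 1)), ((10, 10), (11, 11)), ((0.5, 0.5), (1.5, 1.5))]:
    print(sub, "-> server", assign_online(boxes, sub))
```

A real R-tree applies the same rule recursively at every level and splits overflowing nodes; this sketch also ignores per-server capacity limits.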

  9. Related Work • Pub/Sub systems • Echo, Elvin, Gryphon, Herald, Hierarchical Proxy Architecture, Information Bus, JEDI, Keryx, Ready, Scribe, Siena, … • Clustering in pub/sub • All previous work focuses on reducing the number of multicast groups [OAA+00, RLW+02, WKM00]

  10. Summary • Proposed two subscription partitioning and routing approaches • Event Space Partitioning • Filter Set Partitioning • Evaluated performance via simulations • Subscription partitioning reduces network traffic • Over-partitioning helps to achieve good load balancing dynamically • Bloom filter further reduces event traffic

  11. Simulation Results • 10,000 random subscriptions per server on average • Offline R-tree performs the best; reduces event traffic by 20% to 60%

  12. Model of Content-based Pub/Sub • Content-based filtering • Event schema with d attributes, supporting equality and range predicates • Event: a point in the d-dimensional space • Subscription: a rectangle in that space • Match: a rectangle contains the point • Content-based routing • Based on a subset of attributes • Consider d’-dimensional points and rectangles where d’ ≤ d
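
The point-in-rectangle model above can be written down directly. The sketch below assumes closed intervals and represents an equality predicate as a degenerate interval (low equals high); the function names and the sample attribute values are hypothetical.

```python
def matches(subscription, event):
    """True if the event point lies inside the subscription rectangle.

    subscription is (lows, highs); event is a tuple of d attribute values.
    """
    lows, highs = subscription
    return all(lo <= x <= hi for lo, x, hi in zip(lows, event, highs))


def project(values, dims):
    """Keep only the attributes used for routing (d' <= d of them)."""
    return tuple(values[i] for i in dims)


def project_subscription(subscription, dims):
    lows, highs = subscription
    return project(lows, dims), project(highs, dims)


# d = 3 attributes: symbol id (equality), price range, volume range.
subscription = ((10, 0, 5), (10, 100, 7))
routing_dims = (0, 1)  # route on the first two attributes only

for event in [(10, 50, 6), (10, 50, 8)]:
    routed = matches(project_subscription(subscription, routing_dims),
                     project(event, routing_dims))
    print(event, "routed:", routed, "matched:", matches(subscription, event))
```

Routing on a subset of attributes can forward an event that the full filter later rejects, but it never drops an event that the full filter would accept, because dropping dimensions only relaxes the containment test.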

  13. EDN Network Architecture • Submit subscriptions • Subscription routing • Content-based route updates • Peer exchange of route updates • Content-based event routing • Notification delivery [Figure: event source, EDN nodes, Notification Routing Services, and subscriber, with edges labeled by step numbers 1 to 6]

  14. Backup Slides

  15. Optimize various performance metrics, subject to load-balancing constraints • Minimize total event traffic • Volume of union of rectangles • Maximize overall system throughput • Minimize end-to-end latency [Figure: subscription rectangles with a precise summary and an imprecise summary]
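
The "volume of union of rectangles" objective can be made concrete in two dimensions. The sketch below assumes that, for uniformly distributed events, the traffic a server receives is proportional to the area covered by the union of its subscription rectangles; that proportionality and the names `union_area` and `total_traffic` are assumptions for illustration.

```python
def union_area(rects):
    """Area of the union of 2-D rectangles given as (x1, y1, x2, y2).

    Uses coordinate compression: the plane is cut into grid cells at every
    rectangle edge, and each cell is counted once if any rectangle covers it.
    """
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    area = 0.0
    for x_lo, x_hi in zip(xs, xs[1:]):
        for y_lo, y_hi in zip(ys, ys[1:]):
            covered = any(r[0] <= x_lo and x_hi <= r[2] and
                          r[1] <= y_lo and y_hi <= r[3] for r in rects)
            if covered:
                area += (x_hi - x_lo) * (y_hi - y_lo)
    return area


def total_traffic(partition):
    """Sum of per-server union areas for a list of rectangle groups."""
    return sum(union_area(group) for group in partition)


a, b = (0, 0, 2, 2), (1, 1, 3, 3)          # two overlapping subscriptions
c, d = (10, 10, 12, 12), (11, 11, 13, 13)  # two more, far away

clustered = [[a, b], [c, d]]  # similar subscriptions share a server
mixed = [[a, c], [b, d]]      # dissimilar subscriptions share a server
print(total_traffic(clustered), total_traffic(mixed))
```

Grouping overlapping rectangles on the same server lets their overlap be counted once, which is why similarity-based partitioning lowers the total.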

  16. The EDN Optimization Problem [Figure: centralized and distributed architectures connecting event sources, the Notification Routing Service, servers, and subscribers; numbered labels 1 to 5 mark the steps, which include Partition Existing Subscriptions, Route New Subscriptions, Route Events, and Summary Reporting]

  17. Three Research Directions • Theoretical Study • Optimal or approximation algorithms for simplified versions • System Design and Simulation • Subscription partitioning for reducing event traffic • Summary-based routing for enhancing system throughput • Indigo-based Implementation • Extensible routing & pub/sub architecture

  18. An R-tree-based EDN pub/sub system

  19. System Design and Simulation: Summary-based Routing • Basic idea: summary precision-based load balancing for enhancing system throughput

  20. If the dispatcher is not the bottleneck, use a precise summary. • Otherwise, reduce summary precision until either the outgoing link or the servers are about to become the bottleneck. • Throughput increases • Reducing summary precision further would generate excessive false-positive traffic, which throttles back the dispatcher • Throughput decreases
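
The precision-selection rule above can be illustrated with a toy throughput model. Everything in the model is an assumption made for illustration: each precision level is a hypothetical pair (dispatcher rate, false-positive ratio), the outgoing link and the servers are collapsed into one downstream capacity, and every useful event is assumed to cost `1 + fp_ratio` downstream deliveries.

```python
def throughput(level, downstream_capacity):
    """Useful events per second under the toy model."""
    dispatcher_rate, fp_ratio = level
    # False positives consume downstream capacity without being useful.
    downstream_rate = downstream_capacity / (1.0 + fp_ratio)
    return min(dispatcher_rate, downstream_rate)


def choose_precision(levels, downstream_capacity):
    """Pick a summary precision level.

    levels is ordered from most precise to least precise. Start with the
    precise summary and coarsen only while the dispatcher is the bottleneck
    and coarsening still increases throughput.
    """
    chosen = 0
    for candidate in range(1, len(levels)):
        dispatcher_rate, _ = levels[chosen]
        current = throughput(levels[chosen], downstream_capacity)
        if current < dispatcher_rate:
            break  # link or servers already limit throughput
        if throughput(levels[candidate], downstream_capacity) <= current:
            break  # coarser summary would not help
        chosen = candidate
    return chosen


# Hypothetical levels: coarser summaries are cheaper to match but leak
# more false positives downstream.
levels = [(100, 0.0), (200, 0.1), (400, 0.5), (800, 2.0)]
best = choose_precision(levels, downstream_capacity=450)
print(best, throughput(levels[best], 450))
```

In this toy setting the search stops at the level where the downstream side becomes the limit; the coarsest level would yield lower throughput because false-positive traffic dominates.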

  21. Simulation results • Imprecise summaries enhance throughput

  22. Imprecise summaries combined with R-tree-based partitioning further enhance throughput

  23. Dispatcher-to-link and dispatcher-to-server bottleneck ratios

  24. EDN on Herald • Piggyback subscription routing & summary reporting on the multicast-tree-forming process • Need to additionally consider notification traffic (because subscribers are now part of the multicast tree) [Figure: subscription routing to a subscriber]

  25. Indigo-based Implementation • Indigo M2 routing & pub/sub architecture was not extensible • EDN used M2 messaging and built a WS-compliant, extensible routing & pub/sub architecture on top of it • Close collaboration with Indigo • Extensibility proposals to Indigo • Some appeared in M3 • But most sealed for security for now • Some being considered for M4

  26. EDN Extensible Routing and Pub/Sub [Figure: layered architecture with components: Namespace Binding Layer • EDN Subscription Manager • EDN Route Manager • MS Route Manager • WS-Eventing Subscription Manager • WS-Routing Route Manager • EDN R-tree Matcher • XPath Filter Matcher • Indigo Messaging]

  27. Other XML-Messaging/Indigo interactions • State dependency management • Design tool for new features involving “state transplant” • E.g., System Restore (across time), Intellimirror (across space) • Repair tool providing consistent undo • System Restore + rollback of “atomic units” • GoBack3 + roll-forward of “atomic units” • Troubleshooting tool • Trace-diff & state-diff approaches • Our automatic, bottom-up, black-box discovery approach complements their manual, top-down, logical declaration approach (TravisM) • Install-time and run-time information augments the authoring-time information • Targeted problem spaces help identify things to declare for manageability
