
Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

Yi-Min Wang, Lili Qiu, Dimitris Achlioptas, Gautam Das, Paul Larson, and Helen J. Wang. Microsoft Research. DISC 2002, Toulouse, France.


Presentation Transcript


  1. Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks Yi-Min Wang, Lili Qiu, Dimitris Achlioptas, Gautam Das, Paul Larson, and Helen J. Wang Microsoft Research DISC 2002 Toulouse, France

  2. Motivation • Phenomenal growth in Web usage • Future trends • Switch from polling to notifications • Examples: stock quotes, sports scores, weather, news, … • Yahoo! Alerts, MSN Mobile, AOL anywhere, InfoSpace, … • Complements the traditional polling model of the Web • Event Distribution Network (EDN) • Distributed and scalable event distribution • Parallels the idea of a Content Distribution Network (CDN), applied to event distribution • Built on top of a self-configuring overlay network of servers • Content-based publish/subscribe systems through in-network processing of aggregated subscription filters

  3. Dispatcher-based model

  4. Model of Content-based Pub/Sub • Content-based filtering/routing • Event schema with d attributes, supporting equality and range predicates • Event: a point in the d-dimensional space • Subscription: a rectangle in that space • Match: the rectangle contains the point
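The matching rule on this slide can be illustrated with a minimal sketch. The `Subscription` class, its `ranges` field, and the two-attribute example values are hypothetical names chosen for illustration, and closed intervals are an assumption (the slides do not say whether range bounds are inclusive).

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Subscription:
    # One closed interval (low, high) per attribute.
    # An equality predicate is the degenerate interval low == high.
    ranges: Tuple[Tuple[float, float], ...]

    def matches(self, event: Sequence[float]) -> bool:
        """Return True if the event point lies inside this rectangle."""
        if len(event) != len(self.ranges):
            raise ValueError("event and subscription must have the same dimension")
        return all(lo <= x <= hi for (lo, hi), x in zip(self.ranges, event))


# Hypothetical 2-attribute schema: (price, volume).
sub = Subscription(((100.0, 120.0), (0.0, 5000.0)))
hit = sub.matches((110.0, 2500.0))   # point inside the rectangle
miss = sub.matches((130.0, 2500.0))  # price falls outside [100, 120]
```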

  5. Subscription Partitioning • Basic idea: similarity-based clustering for reducing total event traffic • Event Space Partitioning (ESP) • Filter Set Partitioning (FSP)

  6. Equality Predicates • Hash predicates to get a uniform distribution • Treat the hashed domain as the event space • Use Event Space Partitioning • A subscription is a point; it does not intersect multiple sub-spaces • Use over-partitioning for better load balancing • Use an offline greedy algorithm to assign buckets to servers for load balancing • Use an indirection table to dynamically map buckets to servers for load re-balancing • Use Bloom filters to further reduce traffic • Fast detection of true negatives at the expense of a (very low) false-positive rate
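The Bloom filter bullet can be illustrated with a small sketch. It assumes the filter summarizes the set of subscribed attribute values (for example, stock symbols) so that an event whose value is definitely absent can be dropped early; the slides do not spell out where the filter sits. The `BloomFilter` class, its default sizes, and the SHA-256-based hashing are illustrative choices, not the authors' implementation.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Set summary with no false negatives and a tunable false-positive rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely not present"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


subscribed = BloomFilter()
for symbol in ("MSFT", "IBM"):
    subscribed.add(symbol)

# An event is forwarded only if its symbol might have subscribers.
forward_msft = subscribed.might_contain("MSFT")
```

A `False` answer is always correct, which is what makes early dropping safe; a `True` answer still needs the exact match against the real subscriptions.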

  7. Simulation Results • Actual Notification Money log • 1.48M subscriptions with 0.29M unique filters over 21,741 stock symbols • Zipf-like distribution

  8. Simulation Results (Cont.) • Simulate 100M new subscriptions from 43,734 symbols • Scaled-up Zipf-like distribution • Perturbation and permutation • Uniform distribution • 50 servers with over-partitioning ratio = 10 • Without load re-balancing • Load imbalance (max/min) ranged from 1.41 to 6.66 (Uniform case) • With imbalance threshold of 2.0 • Re-balancing was triggered only 5 times, each time involving re-assignment of up to 3 buckets and migration of up to 0.7% of subscriptions.
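The pieces behind these numbers (hashed buckets, over-partitioning, a greedy bucket-to-server assignment stored in an indirection table, and the max/min imbalance metric) can be sketched as follows. The slides do not state which greedy rule was used, so the "heaviest bucket to the least-loaded server" rule below is an assumption, and all function names are hypothetical.

```python
import hashlib
import heapq
from typing import List


def bucket_of(value: str, num_buckets: int) -> int:
    """Hash an equality-predicate value into one of num_buckets buckets."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def greedy_assign(bucket_loads: List[int], num_servers: int) -> List[int]:
    """Build an indirection table: table[bucket] = server.

    Assumed greedy rule: take buckets from heaviest to lightest and give
    each one to the currently least-loaded server.
    """
    heap = [(0, s) for s in range(num_servers)]
    heapq.heapify(heap)
    table = [0] * len(bucket_loads)
    for b in sorted(range(len(bucket_loads)), key=lambda i: -bucket_loads[i]):
        load, server = heapq.heappop(heap)
        table[b] = server
        heapq.heappush(heap, (load + bucket_loads[b], server))
    return table


def server_loads(bucket_loads: List[int], table: List[int], num_servers: int) -> List[int]:
    loads = [0] * num_servers
    for b, server in enumerate(table):
        loads[server] += bucket_loads[b]
    return loads


def imbalance(loads: List[int]) -> float:
    """Load imbalance as max/min, the metric quoted on the slide."""
    lowest = min(loads)
    return float("inf") if lowest == 0 else max(loads) / lowest


# Toy example: 5 buckets (subscription counts) over 2 servers.
bucket_loads = [8, 7, 6, 5, 4]
table = greedy_assign(bucket_loads, num_servers=2)
loads = server_loads(bucket_loads, table, num_servers=2)
needs_rebalance = imbalance(loads) > 2.0  # threshold value taken from the slide
```

With 50 servers and an over-partitioning ratio of 10 there would be 500 buckets; because lookups go through `table`, re-balancing only has to rewrite a few table entries and migrate the subscriptions in those buckets.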

  9. Range Predicates • Use Filter Set Partitioning • K-means clustering • Use the center point to represent a rectangle • R-tree-based clustering • R-tree: dynamic index structure for multi-dimensional data rectangles • Offline R-tree algorithm • Exhaustively and recursively search for partitions that minimize the sum of bounding rectangle volumes • Online R-tree algorithm • Insert from the root down the path that greedily minimizes the increase in bounding rectangle volume • Simulation results • Offline R-tree > Online R-tree > K-means > Random
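The online insertion rule can be sketched in a flattened form. This is a single-level simplification, not a full R-tree: each partition keeps one bounding rectangle, and a new subscription goes to the partition whose bounding volume grows the least. The `Rect` alias and function names are hypothetical, and the partitions are assumed to be seeded with at least one rectangle each.

```python
from typing import List, Tuple

# A rectangle is one (low, high) interval per attribute.
Rect = Tuple[Tuple[float, float], ...]


def volume(rect: Rect) -> float:
    v = 1.0
    for lo, hi in rect:
        v *= hi - lo
    return v


def enlarge(box: Rect, rect: Rect) -> Rect:
    """Smallest rectangle that covers both box and rect."""
    return tuple(
        (min(b_lo, r_lo), max(b_hi, r_hi))
        for (b_lo, b_hi), (r_lo, r_hi) in zip(box, rect)
    )


def choose_partition(boxes: List[Rect], rect: Rect) -> int:
    """Index of the partition whose bounding volume increases least."""
    best, best_cost = 0, float("inf")
    for i, box in enumerate(boxes):
        cost = volume(enlarge(box, rect)) - volume(box)
        if cost < best_cost:
            best, best_cost = i, cost
    return best


def insert(boxes: List[Rect], rect: Rect) -> int:
    """Assign rect to a partition and grow that partition's bounding box."""
    i = choose_partition(boxes, rect)
    boxes[i] = enlarge(boxes[i], rect)
    return i


# Two seeded partitions in a 2-attribute space.
boxes: List[Rect] = [
    ((0.0, 10.0), (0.0, 10.0)),
    ((50.0, 60.0), (50.0, 60.0)),
]
chosen = insert(boxes, ((8.0, 12.0), (8.0, 12.0)))
```

In an actual R-tree the same "least enlargement" choice is repeated at every level from the root to a leaf, as the slide describes.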

  10. Related Work • Pub/Sub systems • Echo, Elvin, Gryphon, Herald, Hierarchical Proxy Architecture, Information Bus, JEDI, Keryx, Ready, Scribe, Siena, … • Clustering in pub/sub • All previous work focuses on reducing the number of multicast groups [OAA+00, RLW+02, WKM00]

  11. Summary • Proposed two subscription partitioning and routing approaches • Event Space Partitioning • Filter Set Partitioning • Evaluated performance via simulations • Subscription partitioning reduces network traffic • Over-partitioning helps to achieve good load balancing dynamically • Bloom filters further reduce event traffic
