Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

Yi-Min Wang, Lili Qiu, Dimitris Achlioptas, Gautam Das, Paul Larson, and Helen J. Wang

Microsoft Research

DISC 2002

Toulouse, France


Motivation

  • Phenomenal growth in Web usage

  • Future trends

    • Switch from polling to notifications

    • Example: stock quotes, sports scores, weather, news, …

    • Yahoo! Alerts, MSN Mobile, AOL anywhere, InfoSpace, …

    • Complements the traditional polling model on the Web

  • Event Distribution Network (EDN)

    • Distributed and scalable event distribution

      • Parallels the idea of a Content Distribution Network (CDN), applied to event distribution

      • Built on top of a self-configuring overlay network of servers

    • Content-based publish/subscribe systems through in-network processing of aggregated subscription filters


Dispatcher-based Model


Model of Content-based Pub/Sub

  • Content-based filtering/routing

    • Event schema with d attributes, supporting equality and range predicates

    • Event: a point in the d-dimensional space

    • Subscription: a rectangle in that space

    • Match: a rectangle contains the point
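The matching rule above can be sketched in a few lines. This is only an illustration of the geometric model, not code from the talk: the `Subscription` class, its `ranges` field, and the choice of closed intervals are assumptions, with an equality predicate written as an interval whose two ends coincide.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Subscription:
    # One closed interval (low, high) per attribute (hypothetical layout).
    # An equality predicate is the degenerate interval low == high.
    ranges: Tuple[Tuple[float, float], ...]

    def matches(self, event: Sequence[float]) -> bool:
        """Return True if the event point lies inside this rectangle."""
        if len(event) != len(self.ranges):
            raise ValueError("event and subscription must have the same dimension")
        return all(lo <= x <= hi for (lo, hi), x in zip(self.ranges, event))


# d = 2: a range predicate on the first attribute, equality on the second.
sub = Subscription(((10.0, 20.0), (5.0, 5.0)))
print(sub.matches((15.0, 5.0)))  # True
print(sub.matches((15.0, 6.0)))  # False
```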


Subscription Partitioning

  • Basic idea: similarity-based clustering for reducing total event traffic

    • Event Space Partitioning (ESP)

    • Filter Set Partitioning (FSP)


Equality Predicates

  • Hash predicates to get uniform distribution

    • Treat the hashed domain as the event space

  • Use Event Space Partitioning

    • Subscription is a point; does not intersect multiple sub-spaces

  • Use over-partitioning for better load balancing

    • Use offline greedy algorithm to assign buckets to servers for load balancing

    • Use indirection table to dynamically map buckets to servers for load re-balancing

  • Use Bloom filters to further reduce traffic

    • Fast detection of true negatives at the expense of a (very low) false-positive rate
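A minimal sketch of how the pieces above might fit together for a single equality attribute such as a stock symbol. Everything named here is an assumption made for illustration: the SHA-256-based hash, the 500 buckets, the round-robin initial indirection table, the Bloom filter sizes, and the placement of the filter check before forwarding are not taken from the talk.

```python
import hashlib

NUM_BUCKETS = 500   # hypothetical: 50 servers x over-partitioning ratio 10
NUM_SERVERS = 50    # hypothetical


def _hash(value: str, salt: int) -> int:
    """Salted 64-bit hash; different salts act as independent hash functions."""
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def bucket_of(value: str) -> int:
    """Hash the attribute value into the (roughly uniform) bucket space."""
    return _hash(value, 0) % NUM_BUCKETS


# Indirection table: bucket -> server. Re-balancing only rewrites entries here.
indirection = [b % NUM_SERVERS for b in range(NUM_BUCKETS)]


def server_for(value: str) -> int:
    return indirection[bucket_of(value)]


class BloomFilter:
    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str):
        return [_hash(value, salt) % self.num_bits
                for salt in range(1, self.num_hashes + 1)]

    def add(self, value: str) -> None:
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(value))


subscribed = BloomFilter()
for symbol in ("MSFT", "IBM"):
    subscribed.add(symbol)


def route(event_symbol: str):
    """Return the server to forward to, or None if nobody can be subscribed."""
    if not subscribed.might_contain(event_symbol):
        return None  # true negative: the event is dropped without forwarding
    return server_for(event_symbol)
```

Because a subscription with an equality predicate hashes to exactly one bucket, it is stored on exactly one server, and an event for that value needs to visit only that server.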


Simulation Results

  • Actual Notification Money log

    • 1.48M subscriptions with 0.29M unique filters over 21,741 stock symbols

    • Zipf-like distribution


Simulation Results (Cont.)

  • Simulate 100M new subscriptions from 43,734 symbols

    • Scaled-up Zipf-like distribution

    • Perturbation and permutation

    • Uniform distribution

  • 50 servers with over-partitioning ratio = 10

  • Without load re-balancing

    • Load imbalance (max/min) ranged from 1.41 to 6.66 (Uniform case)

  • With imbalance threshold of 2.0

    • Re-balancing was triggered only 5 times, each time involving re-assignment of up to 3 buckets and migration of up to 0.7% of subscriptions.
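The bucket assignment and threshold-triggered re-balancing described above could look roughly like the following. The talk does not spell out either algorithm, so this is a sketch under stated assumptions: the greedy step is taken to be "heaviest bucket first onto the least-loaded server", and re-balancing is taken to move one bucket at a time from the most-loaded to the least-loaded server. All function names are hypothetical.

```python
import heapq


def greedy_assign(bucket_loads, num_servers):
    """Offline greedy: heaviest bucket first, onto the least-loaded server."""
    heap = [(0, s) for s in range(num_servers)]
    heapq.heapify(heap)
    table = [0] * len(bucket_loads)
    for b in sorted(range(len(bucket_loads)), key=lambda b: -bucket_loads[b]):
        load, s = heapq.heappop(heap)
        table[b] = s
        heapq.heappush(heap, (load + bucket_loads[b], s))
    return table  # indirection table: bucket -> server


def server_loads(bucket_loads, table, num_servers):
    loads = [0] * num_servers
    for b, s in enumerate(table):
        loads[s] += bucket_loads[b]
    return loads


def imbalance(loads):
    """Load imbalance measured as max/min, as on the slide."""
    lo = min(loads)
    return float("inf") if lo == 0 else max(loads) / lo


def rebalance(bucket_loads, table, num_servers, threshold=2.0):
    """Re-map buckets in place until max/min <= threshold; return moved buckets."""
    moved = []
    while True:
        loads = server_loads(bucket_loads, table, num_servers)
        if imbalance(loads) <= threshold:
            break
        src, dst = loads.index(max(loads)), loads.index(min(loads))
        gap = loads[src] - loads[dst]
        # Only consider moves that strictly narrow the gap between src and dst.
        candidates = [b for b, s in enumerate(table)
                      if s == src and 0 < bucket_loads[b] < gap]
        if not candidates:
            break  # no single-bucket move helps
        best = min(candidates, key=lambda b: abs(bucket_loads[b] - gap / 2))
        table[best] = dst
        moved.append(best)
    return moved


loads_per_bucket = [6, 1, 1, 1, 1]   # subscriptions per bucket (made-up numbers)
table = [0, 0, 0, 1, 1]              # a deliberately skewed starting assignment
print(rebalance(loads_per_bucket, table, num_servers=2))  # [1, 2]
print(server_loads(loads_per_bucket, table, 2))           # [6, 4]
```

Only the indirection-table entries for the moved buckets change, so only the subscriptions in those buckets have to migrate.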


Range Predicates

  • Use Filter Set Partitioning

  • K-means clustering

    • Use center point to represent a rectangle

  • R-tree-based clustering

    • R-tree: dynamic index structure for multi-dimensional data rectangles

    • Offline R-tree algorithm

      • Exhaustively and recursively search for partitions that minimize sum of bounding rectangle volumes

    • Online R-tree algorithm

      • Insert from root down the path that greedily minimizes the increase in bounding rectangle volume

  • Simulation results

    • Offline R-tree > Online R-tree > K-means > Random
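The online insertion rule can be illustrated with a flat, single-level simplification: each server keeps one bounding rectangle, and a new subscription goes to the server whose rectangle grows least in volume. This is not a real R-tree (there is no tree descent and no node splitting), and seeding each server with the first subscriptions that arrive is an assumption made only to keep the sketch short.

```python
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]  # (low, high) per dimension


def volume(r: Rect) -> float:
    v = 1.0
    for lo, hi in r:
        v *= hi - lo
    return v


def enclose(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that contains both a and b."""
    return tuple((min(al, bl), max(ah, bh))
                 for (al, ah), (bl, bh) in zip(a, b))


def assign_online(subscriptions: List[Rect], num_servers: int):
    """Greedily place each rectangle where the bounding volume grows least."""
    boxes: List[Rect] = []       # one bounding rectangle per server
    assignment: List[int] = []   # server index chosen for each subscription
    for rect in subscriptions:
        if len(boxes) < num_servers:
            boxes.append(rect)   # assumed seeding: first arrivals open servers
            assignment.append(len(boxes) - 1)
            continue
        best = min(range(num_servers),
                   key=lambda s: volume(enclose(boxes[s], rect)) - volume(boxes[s]))
        boxes[best] = enclose(boxes[best], rect)
        assignment.append(best)
    return assignment, boxes


subs = [
    ((0.0, 1.0), (0.0, 1.0)),
    ((10.0, 11.0), (10.0, 11.0)),
    ((0.5, 1.5), (0.0, 1.0)),
    ((10.0, 12.0), (10.0, 11.0)),
]
assignment, boxes = assign_online(subs, num_servers=2)
print(assignment)  # [0, 1, 0, 1]
```

An event then needs to be forwarded only to servers whose bounding rectangle contains it, which is why smaller total bounding volume means less event traffic.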


Related Work

  • Pub/Sub systems

    • Echo, Elvin, Gryphon, Herald, Hierarchical Proxy Architecture, Information Bus, JEDI, Keryx, Ready, Scribe, Siena, …

  • Clustering in the pub/sub

    • All previous work focuses on reducing the number of multicast groups [OAA+00, RLW+02, WKM00]


Summary

  • Proposed two subscription partitioning and routing approaches

    • Event Space Partitioning

    • Filter Set Partitioning

  • Evaluated performance via simulations

    • Subscription partitioning reduces network traffic

    • Over-partitioning helps to achieve good load balancing dynamically

    • Bloom filter further reduces event traffic

