
Revisiting Content-Based Publish/Subscribe

Revisiting Content-Based Publish/Subscribe. Costin Raiciu, David Rosenblum, Mark Handley University College London. Problem. Why has Large Scale Content-Based Publish/Subscribe not been adopted in the real world? Intense research efforts - many solutions exist


Presentation Transcript


  1. Revisiting Content-Based Publish/Subscribe Costin Raiciu, David Rosenblum, Mark Handley University College London

  2. Problem • Why has Large Scale Content-Based Publish/Subscribe not been adopted in the real world? • Intense research efforts - many solutions exist • Gryphon, Siena, Hermes, Jedi, Medym, … • Plenty of possible applications • RSS Dissemination, Online Games, Stock Quotes, … • Is this problem important? • Are we exploring the wrong side of the solution space? • What to do next?

  3. Contributions • Reasons for lack of adoption: • Complexity • Application Diversity • Lack of Deployment • Research Agenda: • Layering CBPS • Creating Configurable Solutions • Proof of Concept

  4. Complexity of CBPS • Content-Based Publish/Subscribe is composed of two sub-problems: • Content-Based Matching - for each notification, find the set of matching subscribers • Event Routing - deliver each notification to the matching subscribers • Both sub-problems are difficult on their own! • Event Routing • Optimal delivery tree changes with (almost) every notification • Computing it is equivalent to computing the minimum Steiner tree, which is NP-complete for certain cost metrics
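
The matching sub-problem defined on slide 4 can be illustrated with a minimal sketch. It assumes that notifications are attribute dictionaries and that each subscription is a predicate over them; the names `match`, `subscriptions` and the stock-quote attributes are hypothetical and do not come from the talk. A linear scan like this only shows what "find the set of matching subscribers" means, not how a large-scale system would do it.

```python
from typing import Callable, Dict, List

# Assumption: a notification is a flat dictionary of attributes.
Notification = Dict[str, object]
# Assumption: a subscription is a predicate over a notification.
Predicate = Callable[[Notification], bool]


def match(notification: Notification,
          subscriptions: Dict[str, Predicate]) -> List[str]:
    """Return the sorted ids of subscribers whose predicate accepts the notification."""
    return sorted(sub_id
                  for sub_id, predicate in subscriptions.items()
                  if predicate(notification))


# Hypothetical stock-quote subscriptions.
subscriptions: Dict[str, Predicate] = {
    "alice": lambda n: n.get("symbol") == "ACME" and n.get("price", 0) > 100,
    "bob": lambda n: n.get("symbol") == "ACME",
    "carol": lambda n: n.get("type") == "alert",
}

recipients = match({"symbol": "ACME", "price": 120}, subscriptions)
print(recipients)
```

Event routing, the second sub-problem, would then take `recipients` as its input and decide how to deliver the notification to them.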

  5. Complexity of Content-Based Matching • Related to search in large-scale networks • Active research field – structured overlays, etc. • Theoretically not scalable • Consider the following static system • N – nodes in the system • R – replication rate for subscriptions • H – (max) number of nodes a notification visits • IS – storage load balancing • IR – routing load balancing

  6. Complexity of Content-Based Matching (2) • We can easily show that: • If a notification matches all subscriptions, then (formula not preserved in the transcript) • If a subscription matches all notifications, then (formula not preserved in the transcript) • Generic content-based matching solutions cannot be scalable in all directions • Either replicate subscriptions to all the nodes (R=N) • Or broadcast notifications to all the nodes (H=N) • Or create bottlenecks, for either storage or routing (IS=N, IR=N) • Or select a trade-off, e.g. N^(1/3)

  7. Application Diversity • Survey of applications suitable for CBPS, 5 applications selected: • Online Games, RSS Feeds, Stock Quotes, Security Alerts, Location Based Services • Tolerable message latency – 1 ms to 1 min • Number of publishers – 1 to 10^6 • Number of subscribers – 10^2 to 10^6 • Notification frequency – 10^-2/s to 10^4/s

  8. Application Diversity (2) • High Diversity – can a single solution accommodate all applications? No. • Current solutions – not built with specific applications in mind • Embed optimizations based on expected properties of applications, rather than particular examples • Siena – clustering of subscriptions based on geographic proximity • Hermes – distributing event load using message types + clustering • Applications do not seem to benefit from the optimizations of any single architecture!

  9. Lack of Deployment • CBPS is a trade-off between broadcast and publisher-side filtering of messages • If CBPS solutions cannot be easily used, application developers will use alternative solutions! • Advantages • Use current solutions to deploy applications • Find out the impact of different optimizations on the performance of the application • Could develop a research agenda • Difficulties • No single deployment can accommodate all applications! • Multiple solutions should be made available

  10. Our Proposal • Layer Content-Based Publish/Subscribe • Solve Content-Based Matching and Event Routing Separately • Compose full CBPS solutions from pieces • Create configurable solutions for the two sub-problems • Provide parameters that allow a solution to be tuned for a specific application • Supporting a new application • Tune the event routing and content-based matching algorithms • Compose them into a full solution • Deploy them using a predefined infrastructure
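
The composition proposed on slide 10 can be sketched as two narrow interfaces glued together by a thin wrapper. This is only an illustration under stated assumptions: the interface names `Matcher` and `Router`, the classes `CentralizedMatcher`, `PointToPointRouter` and `PubSub`, and the addresses are all hypothetical and are not an API from the talk. The two concrete layers are the simplest possible choices (a single matching table and direct delivery).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Tuple

Notification = Dict[str, object]
Predicate = Callable[[Notification], bool]


class Matcher(Protocol):
    """Content-based matching layer (hypothetical interface)."""
    def subscribe(self, address: str, predicate: Predicate) -> None: ...
    def match(self, notification: Notification) -> List[str]: ...


class Router(Protocol):
    """Event routing layer (hypothetical interface)."""
    def deliver(self, addresses: List[str], notification: Notification) -> None: ...


class CentralizedMatcher:
    """Simplest matching layer: one table, scanned linearly."""
    def __init__(self) -> None:
        self._subs: List[Tuple[str, Predicate]] = []

    def subscribe(self, address: str, predicate: Predicate) -> None:
        self._subs.append((address, predicate))

    def match(self, notification: Notification) -> List[str]:
        return [addr for addr, pred in self._subs if pred(notification)]


class PointToPointRouter:
    """Simplest routing layer: one delivery per address, recorded in memory."""
    def __init__(self) -> None:
        self.delivered: List[Tuple[str, Notification]] = []

    def deliver(self, addresses: List[str], notification: Notification) -> None:
        for addr in addresses:
            self.delivered.append((addr, notification))


@dataclass
class PubSub:
    """A full CBPS solution composed from one matcher and one router."""
    matcher: Matcher
    router: Router

    def subscribe(self, address: str, predicate: Predicate) -> None:
        self.matcher.subscribe(address, predicate)

    def publish(self, notification: Notification) -> None:
        # The router sees only the addresses chosen by the matcher.
        self.router.deliver(self.matcher.match(notification), notification)


system = PubSub(CentralizedMatcher(), PointToPointRouter())
system.subscribe("10.0.0.1", lambda n: n["symbol"] == "ACME")
system.publish({"symbol": "ACME", "price": 101})
```

Supporting a different application would then mean swapping or re-tuning either layer while keeping `PubSub` unchanged.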

  11. Layering Content-Based Publish/Subscribe (Figure: Content-Based Matching and Event Routing layers, with inputs labelled s and n) • Benefits of Layering • Solving CBPS is easier • Modularity • Interoperability between layers • Independent development • Testing • Adaptability • We can combine solutions to support new applications • Existing work can be leveraged

  12. Benefits of Layering (cont’d) • Supports different scales for content-based matching and event routing • In some cases, simple solutions are enough – centralized matching and point-to-point routing are sufficient • Example: stock quote matching • Different companies could provide the two services • Separates the domains of trust • Confidential content-based publish/subscribe is not cheap • Requires subscribers and publishers to share a secret key • Separation minimizes the number of trusted servers • Content-Based Matching nodes need access to subscription and notification payload • Event routing nodes only need the addresses of the subscribers

  13. Drawbacks of Layering • Suboptimal solutions • Independent optimizations of the two layers • Increased costs • Duplicate data structures maintained in the two layers • Inter layer signalling costs – when nodes are specialized in content-based matching or event routing • Our hope • Costs are moderate • Resulting solutions have good-enough performance • Argument similar to SQL?

  14. Configurable Solutions • Optimizations must be application specific • Layering allows us to reuse solution parts to accommodate new applications • The solution parts must be easily configurable to support new applications • Cut down the process of supporting a new application to fine tuning – similar to creating indexes in DBs. • Composition of different solutions • Focus on content-based matching

  15. Configurable Solutions (2) • What parameters should we control? • Application requirements: notification latency, throughput • Solution parameters: R, H, load balancing • Can we use low-level parameters to control high-level parameters? • Simple model • Latency can be improved by minimizing H • Throughput can be improved by increasing H and optimizing storage load balancing • Bottlenecks can be alleviated with increased replication and routing load balancing • Real dependencies – cannot be inferred from this model

  16. Proof of Concept: Configurable Solution (Figure: circular identifier space from 0 to 2^160, divided into Cluster 1, Cluster 2, …, Cluster X-1, Cluster X) • R, H parameters • N nodes – uniformly distributed identifiers in a circular space • Divide the space into X regions, such that N = X*R • Full mesh of nodes • Cluster membership protocol is distributed • Each node knows R, N

  17. Proof of Concept: Configurable Solution (2) (Figure: circular identifier space from 0 to 2^160, divided into Cluster 1, Cluster 2, …, Cluster X-1, Cluster X) • Rendezvous function f • S -> {1, …, X} • N -> {1, …, X} • Subscribe process • f(s) = 2 • rnd{-H/2, …, H/2} = -1 • Store s on cluster 1 = 2 - 1 • Publish process • f(N) = 2 • Route to Clusters 1, 2 and 3
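
The subscribe and publish processes on slide 17 can be sketched at the level of clusters as follows. Several details are assumptions made for illustration and are not confirmed by the talk: the rendezvous function f is taken to be a SHA-1 hash of a key attribute reduced modulo X, H is assumed to be even and smaller than X, cluster numbers wrap around the ring, and the R-fold replication inside a cluster is not modelled. The class name `ClusterRendezvous` and its methods are hypothetical.

```python
import hashlib
import random
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Notification = Dict[str, object]
Predicate = Callable[[Notification], bool]


class ClusterRendezvous:
    def __init__(self, num_clusters: int, h: int,
                 rng: Optional[random.Random] = None) -> None:
        # Assumption: H is even and H + 1 <= X, so the visited clusters are distinct.
        if h % 2 != 0 or h < 0 or h >= num_clusters:
            raise ValueError("expected an even H with 0 <= H < X")
        self.x = num_clusters
        self.half = h // 2
        self.rng = rng or random.Random()
        # cluster number -> subscriptions stored on that cluster
        self.store: Dict[int, List[Tuple[str, Predicate]]] = defaultdict(list)

    def f(self, key: str) -> int:
        """Assumed rendezvous function: hash of a key attribute onto {1, ..., X}."""
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.x + 1

    def _wrap(self, cluster: int) -> int:
        """Map any integer onto {1, ..., X}, treating the clusters as a ring."""
        return (cluster - 1) % self.x + 1

    def subscribe(self, subscriber: str, key: str, predicate: Predicate) -> int:
        """Store the subscription on cluster f(s) + rnd{-H/2, ..., H/2}."""
        offset = self.rng.randint(-self.half, self.half)
        cluster = self._wrap(self.f(key) + offset)
        self.store[cluster].append((subscriber, predicate))
        return cluster

    def publish(self, key: str,
                notification: Notification) -> Tuple[List[int], List[str]]:
        """Visit clusters f(N) - H/2 ... f(N) + H/2 and match on each one."""
        centre = self.f(key)
        visited = [self._wrap(centre + d)
                   for d in range(-self.half, self.half + 1)]
        matched = [sub for c in visited
                   for sub, pred in self.store[c] if pred(notification)]
        return visited, matched


ring = ClusterRendezvous(num_clusters=8, h=2, rng=random.Random(0))
stored_on = ring.subscribe("alice", "ACME", lambda n: n["price"] > 100)
visited, matched = ring.publish("ACME", {"price": 120})
print(stored_on, visited, matched)
```

Because a subscription and a notification with the same key map to the same centre cluster, the randomly chosen storage cluster always lies inside the window of H+1 clusters that the notification visits.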

  18. Analysis of the solution • Fine grained trade-off: • Routing Hops: H+1 • Replication Rate: R • Load Balancing: IR = N/R*H, IS = N/R*H • Fine tuning • Choose minimal R and H such that the imbalance is not a bottleneck for the desired performance • Flexibility • Different applications on the same architecture • Rendezvous functions • Primary key attributes • Hashes of attribute type

  19. Analysis of the solution (2) • Some issues • Full mesh of nodes – stretch paths with log N, for log N routing tables • Computing N through sampling – what are the implications on consistency? • Drawbacks • Average Case = Worst Case • It would be pleasant to have average routing hops a lot smaller • Can mitigate to some extent by using optimizations • Assumes all distributions are uniform • Replication is the same for all subscriptions

  20. Deployment Issues • Testbed: PlanetLab • Freely available to the scientific community • ~400 nodes scattered throughout the world • Deploying a single solution is not enough • We would like • A common API for content-based matching and event routing • A common code base • Networking functions • Logging • The ability to • Configure layers easily • Compose full CBPS solutions

  21. Summary • We have analyzed reasons for the lack of adoption of large-scale Content-Based Publish/Subscribe • Complexity • Application Diversity • Lack of deployment of current solutions • We have proposed two techniques to mitigate this state of affairs • Layering Content-Based Publish/Subscribe • Content-Based Matching • Event Routing • Building Configurable Solutions • Proof of concept for Content-Based Matching

  22. Questions? Costin Raiciu c.raiciu@cs.ucl.ac.uk David Rosenblum d.rosenblum@cs.ucl.ac.uk
