Revisiting Content-Based Publish/Subscribe

Revisiting Content-Based Publish/Subscribe Costin Raiciu, David Rosenblum, Mark Handley University College London

Problem • Why has Large-Scale Content-Based Publish/Subscribe not been adopted in the real world? • Intense research efforts - many solutions exist • Gryphon, Siena, Hermes, Jedi, Medym, … • Plenty of possible applications • RSS Dissemination, Online Games, Stock Quotes, … • Is this problem important? • Are we exploring the wrong side of the solution space? • What to do next?

Contributions • Reasons for lack of adoption: • Complexity • Application Diversity • Lack of Deployment • Research Agenda: • Layering CBPS • Creating Configurable Solutions • Proof of Concept

Complexity of CBPS • Content-Based Publish/Subscribe is composed of two sub-problems: • Content-Based Matching - for each notification, find the set of matching subscribers • Event Routing - deliver each notification to the matching subscribers • Both sub-problems are difficult on their own! • Event Routing • Optimal delivery tree changes with (almost) every notification • Computing it is equivalent to computing the minimum Steiner tree, which is NP-complete for certain cost metrics
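
The two sub-problems on this slide can be illustrated with a minimal sketch. It assumes that notifications are attribute dictionaries and that subscriptions are predicates over them; the subscriber names and the `match` and `deliver` helpers are hypothetical and do not come from any of the systems named in the talk.

```python
from typing import Callable, Dict, List

Notification = Dict[str, object]
# Assumption: a subscription is a predicate over a notification's attributes.
Subscription = Callable[[Notification], bool]


def match(notification: Notification,
          subscriptions: Dict[str, Subscription]) -> List[str]:
    """Content-based matching: IDs of subscribers whose predicate accepts the notification."""
    return sorted(sid for sid, pred in subscriptions.items() if pred(notification))


def deliver(notification: Notification,
            subscribers: List[str],
            send: Callable[[str, Notification], None]) -> None:
    """Event routing, in its simplest form: one unicast send per matching subscriber."""
    for subscriber in subscribers:
        send(subscriber, notification)


# Hypothetical subscriptions for a stock-quote feed.
subscriptions: Dict[str, Subscription] = {
    "alice": lambda n: n.get("symbol") == "IBM" and n.get("price", 0) > 100,
    "bob": lambda n: n.get("symbol") == "IBM",
    "carol": lambda n: n.get("symbol") == "MSFT",
}
```

This naive version scans every subscription for every notification and sends one message per subscriber, which is exactly the cost that large-scale solutions try to avoid.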

Complexity of Content-Based Matching • Related to search in large-scale networks • Active research field – structured overlays, etc. • Theoretically not scalable • Consider the following static system • N – nodes in the system • R – replication rate for subscriptions • H – (max) number of nodes a notification visits • IS – storage load balancing • IR – routing load balancing

Complexity of Content-Based Matching (2) • We can easily show that: • If a notification matches all subscriptions, then • If a subscription matches all notifications, then • Generic content-based matching solutions cannot be scalable in all dimensions • Either replicate subscriptions to all the nodes (R=N) • Or broadcast notifications to all the nodes (H=N) • Or create bottlenecks, for either storage or routing (IS=N, IR=N) • Or select a trade-off, e.g. N^(1/3)

Application Diversity • Survey of applications suitable for CBPS, 5 applications selected: • Online Games, RSS Feeds, Stock Quotes, Security Alerts, Location Based Services • Tolerable message latency – 1 ms to 1 min • Number of publishers – 1 to 10^6 • Number of subscribers – 10^2 to 10^6 • Notification frequency – 10^-2/s to 10^4/s

Application Diversity (2) • High Diversity – can a single solution accommodate all applications? No. • Current solutions – not built with specific applications in mind • Embed optimizations based on expected properties of applications, rather than particular examples • Siena – clustering of subscriptions based on geographic proximity • Hermes – distributing event load using message types + clustering • Applications do not seem to benefit from the optimizations of any single architecture!

Lack of Deployment • CBPS is a trade-off between broadcast and publisher-side filtering of messages • If CBPS solutions cannot be easily used, application developers will use alternative solutions! • Advantages • Use current solutions to deploy applications • Find out the impact of different optimizations on the performance of the application • Could develop a research agenda • Difficulties • No single deployment can accommodate all applications! • Multiple solutions should be made available

Our Proposal • Layer Content-Based Publish/Subscribe • Solve Content-Based Matching and Event Routing Separately • Compose full CBPS solutions from pieces • Create configurable solutions for the two sub-problems • Provide parameters that allow a solution to be tuned for a specific application • Supporting a new application • Tune the event routing and content-based matching algorithms • Compose them into a full solution • Deploy them using a predefined infrastructure
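
One way to picture the proposed layering is as two small interfaces that are composed into a full solution. The sketch below is an assumption about what such an API could look like; `Matcher`, `Router`, `CentralizedMatcher`, `PointToPointRouter` and `PubSub` are hypothetical names, and the talk does not define a concrete API.

```python
from typing import Callable, Dict, List, Protocol, Tuple

Notification = Dict[str, object]
Predicate = Callable[[Notification], bool]


class Matcher(Protocol):
    """Content-based matching layer (hypothetical interface)."""
    def subscribe(self, subscriber: str, predicate: Predicate) -> None: ...
    def match(self, notification: Notification) -> List[str]: ...


class Router(Protocol):
    """Event routing layer (hypothetical interface)."""
    def route(self, notification: Notification, subscribers: List[str]) -> None: ...


class CentralizedMatcher:
    """Simplest matcher: one node stores and evaluates every subscription."""
    def __init__(self) -> None:
        self._subs: Dict[str, Predicate] = {}

    def subscribe(self, subscriber: str, predicate: Predicate) -> None:
        self._subs[subscriber] = predicate

    def match(self, notification: Notification) -> List[str]:
        return [s for s, p in self._subs.items() if p(notification)]


class PointToPointRouter:
    """Simplest router: one delivery per subscriber, recorded here instead of sent."""
    def __init__(self) -> None:
        self.delivered: List[Tuple[str, Notification]] = []

    def route(self, notification: Notification, subscribers: List[str]) -> None:
        for s in subscribers:
            self.delivered.append((s, notification))


class PubSub:
    """A full CBPS solution composed from one matcher and one router."""
    def __init__(self, matcher: Matcher, router: Router) -> None:
        self.matcher, self.router = matcher, router

    def subscribe(self, subscriber: str, predicate: Predicate) -> None:
        self.matcher.subscribe(subscriber, predicate)

    def publish(self, notification: Notification) -> None:
        self.router.route(notification, self.matcher.match(notification))
```

Because `PubSub` depends only on the two interfaces, either layer could be swapped for a distributed implementation without touching the other.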

Layering Content-Based Publish/Subscribe • [Figure: Content-Based Matching layer and Event Routing layer] • Benefits of Layering • Solving CBPS is easier • Modularity • Interoperability between layers • Independent development • Testing • Adaptability • We can combine solutions to support new applications • Existing work can be leveraged

Benefits of Layering (cont’d) • Supports different scale for content-based matching and event routing • In some cases, simple solutions are enough – centralized matching and point-to-point routing • Example: stock quote matching • Different companies could provide the two services • Separates the domains of trust • Confidential content-based publish/subscribe is not cheap • Requires subscribers and publishers to share a secret key • Separation minimizes the number of trusted servers • Content-Based Matching nodes need access to subscription and notification payload • Event routing nodes only need the addresses of the subscribers

Drawbacks of Layering • Suboptimal solutions • Independent optimizations of the two layers • Increased costs • Duplicate data structures maintained in the two layers • Inter layer signalling costs – when nodes are specialized in content-based matching or event routing • Our hope • Costs are moderate • Resulting solutions have good-enough performance • Argument similar to SQL?

Configurable Solutions • Optimizations must be application specific • Layering allows us to reuse solution parts to accommodate new applications • The solution parts must be easily configurable to support new applications • Cut down the process of supporting a new application to fine tuning – similar to creating indexes in DBs. • Composition of different solutions • Focus on content-based matching

Configurable Solutions (2) • What parameters should we control? • Application requirements: notification latency, throughput • Solution parameters: R, H, load balancing • Can we use low-level parameters to control high-level parameters? • Simple model • Latency can be improved by minimizing H • Throughput can be improved by increasing H and optimizing storage load balancing • Bottlenecks can be alleviated with increased replication and routing load balancing • Real dependencies – cannot be inferred from this model

Proof of Concept: Configurable Solution • [Figure: circular identifier space from 0 to 2^160, divided into Cluster 1, Cluster 2, …, Cluster X-1, Cluster X] • R, H parameters • N nodes – uniformly distributed identifiers in a circular space • Divide the space into X regions, such that N = X*R • Full mesh of nodes • Cluster membership protocol is distributed • Each node knows R, N
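
The partitioning described on this slide can be sketched as follows. The sketch assumes a 160-bit identifier space (reading the figure label as 2^160) and equal-width regions, so each cluster holds about R nodes only in expectation when identifiers are uniformly distributed. `num_clusters` and `cluster_of` are hypothetical helper names.

```python
ID_BITS = 160                 # assumption: identifiers live in [0, 2^160)
ID_SPACE = 2 ** ID_BITS


def num_clusters(n_nodes: int, replication: int) -> int:
    """X such that N = X * R; every node can compute it from N and R."""
    if replication <= 0 or n_nodes % replication != 0:
        raise ValueError("N must be a positive multiple of R")
    return n_nodes // replication


def cluster_of(node_id: int, x: int) -> int:
    """Map an identifier to its cluster, numbered 1..X, using equal-width regions."""
    if not 0 <= node_id < ID_SPACE:
        raise ValueError("identifier outside the circular space")
    return node_id * x // ID_SPACE + 1
```

For example, with N = 1000 nodes and R = 10, every node derives X = 100 clusters locally, without any coordination beyond knowing N and R.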

Proof of Concept: Configurable Solution (2) • [Figure: circular identifier space from 0 to 2^160, divided into Cluster 1, Cluster 2, …, Cluster X-1, Cluster X] • Rendezvous function f • S -> {1, …, X} • N -> {1, …, X} • Subscribe process • f(s) = 2 • rnd{-H/2,…,H/2} = -1 • Store s on cluster 1 = 2-1 • Publish process • f(n) = 2 • Route to clusters 1, 2 and 3
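
The subscribe and publish processes can be simulated in a few lines. This is a sketch under several assumptions that the slide does not state: H is even, cluster numbers wrap around the ring, X is at least H+1, the rendezvous function is a SHA-1 hash of a single key attribute, and matching within a cluster is plain key equality. `ClusterRing` and its methods are hypothetical names.

```python
import hashlib
import random
from typing import Dict, List, Optional, Tuple


class ClusterRing:
    def __init__(self, x: int, h: int, rng: Optional[random.Random] = None) -> None:
        self.x, self.h = x, h
        self.rng = rng or random.Random()
        # Subscriptions stored per cluster, as (key, subscriber) pairs.
        self.store: Dict[int, List[Tuple[str, str]]] = {c: [] for c in range(1, x + 1)}

    def _wrap(self, cluster: int) -> int:
        """Keep cluster numbers in 1..X on the circular space."""
        return (cluster - 1) % self.x + 1

    def rendezvous(self, key: str) -> int:
        """f: map a subscription or notification key to a cluster in 1..X."""
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.x + 1

    def subscribe(self, key: str, subscriber: str) -> int:
        """Store the subscription on cluster f(s) + rnd{-H/2, ..., H/2}."""
        offset = self.rng.randint(-(self.h // 2), self.h // 2)
        cluster = self._wrap(self.rendezvous(key) + offset)
        self.store[cluster].append((key, subscriber))
        return cluster

    def publish(self, key: str) -> Tuple[List[int], List[str]]:
        """Visit clusters f(n) - H/2 .. f(n) + H/2 and collect matching subscribers."""
        home = self.rendezvous(key)
        visited = [self._wrap(home + d) for d in range(-(self.h // 2), self.h // 2 + 1)]
        matches = [s for c in visited for k, s in self.store[c] if k == key]
        return visited, matches
```

With H = 2 a notification visits three clusters, which corresponds to the slide's example of f(n) = 2 being routed to clusters 1, 2 and 3. The random offset spreads subscriptions with the same rendezvous point over H+1 clusters, and the publish path covers all of them, so no match is missed.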

Analysis of the solution • Fine-grained trade-off: • Routing Hops: H+1 • Replication Rate: R • Load Balancing: IR = N/(R*H), IS = N/(R*H) • Fine tuning • Choose R and H minimal such that the imbalance is not a bottleneck for the desired performance • Flexibility • Different applications on the same architecture • Rendezvous functions • Primary key attributes • Hashes of attribute type
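
A worked example makes the trade-off concrete. The helper below reads the slide's load-balancing expression as N/(R*H), which is an interpretation of the flattened formula, and `tradeoff` is a hypothetical name.

```python
def tradeoff(n: int, r: int, h: int) -> dict:
    """Evaluate the slide's trade-off figures for N nodes, replication R and parameter H."""
    if n % r != 0 or h <= 0:
        raise ValueError("need N to be a multiple of R and H > 0")
    return {
        "clusters": n // r,          # X, from N = X * R
        "hops": h + 1,               # routing hops per notification
        "replication": r,            # copies of each subscription
        "imbalance": n / (r * h),    # assumed reading of IR = IS = N/(R*H)
    }
```

For N = 1000, R = 10 and H = 4 this gives 100 clusters, 5 routing hops and an imbalance factor of 25. Doubling H to 8 halves the imbalance to 12.5 at the cost of 9 hops, which is the kind of per-application tuning the slide describes.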

Analysis of the solution (2) • Some issues • Full mesh of nodes – could be replaced with log N routing tables, at the cost of stretching paths by log N • Computing N through sampling – what are the implications for consistency? • Drawbacks • Average Case = Worst Case • It would be desirable to have average routing hops a lot smaller • Can mitigate to some extent by using optimizations • Assumes all distributions are uniform • Replication is the same for all subscriptions

Deployment Issues • Testbed: PlanetLab • Freely available to the scientific community • ~400 nodes scattered throughout the world • Deploying a single solution is not enough • We would like • A common API for content-based matching and event routing • A common code base • Networking functions • Logging • The ability to • Configure layers easily • Compose full CBPS solutions

Summary • We have analyzed reasons for the lack of adoption of large-scale Content-Based Publish/Subscribe • Complexity • Application Diversity • Lack of deployment of current solutions • We have proposed two techniques to mitigate this state of affairs • Layering Content-Based Publish/Subscribe • Content-Based Matching • Event Routing • Building Configurable Solutions • Proof of concept for Content-Based Matching

Questions? Costin Raiciu c.raiciu@cs.ucl.ac.uk David Rosenblum d.rosenblum@cs.ucl.ac.uk
