Mercury: Building Distributed Applications with Publish-Subscribe

  1. Mercury: Building Distributed Applications with Publish-Subscribe • Ashwin Bharambe, Carnegie Mellon University • Monday Seminar Talk

  2. Quick Terminology Recap • Basics • Publishers: inject data/events/publications • Subscribers: register interests/subscriptions • Brokers: match subscriptions with publications and deliver to subscribers • Mercury: distributed publish-subscribe system • Performs matching and content routing in distributed fashion • Data model [figure: an example publication {Name = ashwin, Age = 23, X = 192.3, Y = 223.4} and an example subscription {Name = *, Age > 35, X > 100, X < 180}]
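
To make the data model concrete, here is a minimal Python sketch; the names and the matching routine are illustrative assumptions, not Mercury's actual API. A publication is a set of attribute-value pairs, and a subscription is a conjunction of per-attribute predicates:

import operator

# Supported predicate operators (illustrative subset).
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}

def matches(publication, subscription):
    """True if the publication satisfies every predicate in the subscription."""
    for attr, op, value in subscription:
        if attr not in publication:
            return False
        if value == "*":                      # wildcard: any value matches
            continue
        if not OPS[op](publication[attr], value):
            return False
    return True

pub = {"Name": "ashwin", "Age": 23, "X": 192.3, "Y": 223.4}
sub = [("Name", "=", "*"), ("Age", ">", 35), ("X", ">", 100), ("X", "<", 180)]
print(matches(pub, sub))                      # False: Age = 23 fails "Age > 35"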

  3. Virtual reality example [figure: a user in the virtual world generates events at position (x = 100, y = 200); the user's arena, with corners (50, 250) and (150, 150), corresponds to the interests x ≥ 50, x ≤ 150, y ≥ 150, y ≤ 250]
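
A tiny hypothetical helper (not from the talk) showing how a player's arena could be translated into range predicates of this kind, while the player's own position is injected as an event:

def arena_subscription(x, y, radius=50):
    """Range predicates covering a square arena centred on the player."""
    return [("x", ">=", x - radius), ("x", "<=", x + radius),
            ("y", ">=", y - radius), ("y", "<=", y + radius)]

def position_publication(x, y):
    """The event a player publishes as it moves through the world."""
    return {"x": x, "y": y}

print(arena_subscription(100, 200))
# [('x', '>=', 50), ('x', '<=', 150), ('y', '>=', 150), ('y', '<=', 250)]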

  4. Mercury goals • Implement distributed publish-subscribe • Support range queries • Avoid hot-spots in the system • Flooding anything is bad • Avoid publication flooding completely • Avoid subscription flooding as much as possible • Consider queries like SELECT * FROM RECORDS • Peer-to-peer scenario • No dedicated brokers • Highly dynamic network

  5. Talk Contents • Mercury Architecture • Overlay construction • Routing guarantees • Overlay properties • How randomness is useful • Load balancing; histogram maintenance • Application Design

  6. Attribute Hubs [figure: hubs in the system for the attributes name, age, x, and y; structure of a single hub, e.g. the X-hub covering the value range [0, 1000) with node ranges beginning at 0, 150, 250, 450, 700, and 900] • Each attribute range is divided into bins • A node is responsible for a range of attribute values • Assigned when the node joins; can change dynamically
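
A rough sketch of a single attribute hub, assuming a sorted list of range boundaries; the Hub class and its methods are hypothetical, not Mercury's implementation:

import bisect

class Hub:
    def __init__(self, attribute, boundaries, nodes):
        # boundaries[i] is the lower end of the value range owned by nodes[i],
        # e.g. [0, 150, 250, 450, 700, 900] over the X-hub's range [0, 1000).
        self.attribute = attribute
        self.boundaries = boundaries
        self.nodes = nodes

    def owner(self, value):
        """Node currently responsible for this attribute value."""
        i = bisect.bisect_right(self.boundaries, value) - 1
        return self.nodes[max(i, 0)]

x_hub = Hub("x", [0, 150, 250, 450, 700, 900],
            ["n0", "n1", "n2", "n3", "n4", "n5"])
print(x_hub.owner(192.3))   # "n1", which owns the bin [150, 250)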

  7. Routing [figure: the subscription {Name = *, X > 100, X < 180} is generated at a point S and sent to a single hub; hubs shown for name, age, x, and y] • Send a subscription to one hub • Which one? An interesting question in itself! • Determine query selectivity and send to the most selective hub

  8. Routing (contd.) [figure: the publication {Name = ashwin, Age = 23} generated at one point is routed to the name, age, x, and y hubs] • We must send publications to all hubs • Ensures matching
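
The two routing rules on slides 7 and 8 can be sketched as follows; the hub objects, their store_subscription/deliver methods, and the selectivity estimator are assumed interfaces, not Mercury's API:

def route_subscription(subscription, hubs, selectivity):
    """Send the subscription to a single hub, ideally the most selective one.
    selectivity(sub, attr) is assumed to estimate the fraction of publications
    matching the predicates on attr (smaller = more selective)."""
    candidates = {attr for attr, _, _ in subscription if attr in hubs}
    best = min(candidates, key=lambda a: selectivity(subscription, a))
    hubs[best].store_subscription(subscription)

def route_publication(publication, hubs):
    """Send the publication to each hub named by one of its attributes,
    so it is guaranteed to meet any matching subscription."""
    for attr, value in publication.items():
        if attr in hubs:
            hubs[attr].deliver(value, publication)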

  9. Routing illustrated [figure: the x-hub Hx with node ranges [0, 105), [105, 210), [210, 320) and the y-hub Hy with node ranges [0, 80), [80, 160), [160, 240), [240, 320); the subscription 50 ≤ x ≤ 150, 150 ≤ y ≤ 250 and the publication {x = 100, y = 200} meet at a rendezvous point]

  10. Hub structure and routing (~Symphony) [figure: within a hub, a node chooses a long link to a node at ring distance x with probability P(x) = 1/(x ln n)] • Naïve routing along the circle scales linearly • Utilize the small-world phenomenon [Kleinberg 2000] • Know thy neighbors and one random person, and you can contact anybody quickly • Routing policy: choose the link which gets you closest to the destination • Performance: average hop length = O(log²(n)/k) with k "random" links • Need to be careful when node ranges are not uniform
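
A sketch of how such long links can be drawn and used, based on the P(x) = 1/(x ln n) rule above; the functions below are an assumption about the mechanism, not Mercury's exact code:

import random

def draw_long_link_distance(n):
    """Sample a ring distance x in [1, n) with density p(x) = 1/(x ln n).
    Inverse CDF: F(x) = ln(x)/ln(n), so x = n**u for u uniform in [0, 1)."""
    return n ** random.random()

def next_hop(neighbours, destination, ring_size):
    """Greedy routing: forward to the neighbour whose position leaves the
    smallest remaining clockwise distance to the destination."""
    def remaining(position):
        return (destination - position) % ring_size
    return min(neighbours, key=remaining)

With k such links per node, greedy routing takes O(log²(n)/k) hops on average, as quoted on the slide.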

  11. Caching • O(log²(n)) is good, but each hop is still an application-level hop • Latency can be quite large if the overlay is non-optimized • For distributed applications like games, this is far from optimal • Exploit locality in the access patterns of an application • In addition to the k "random" links, keep cached links • Store nodes which were the rendezvous points for recent publications
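
One simple way to realize this is an LRU map from value ranges to the nodes that recently served as rendezvous points; the RendezvousCache below is illustrative only:

from collections import OrderedDict

class RendezvousCache:
    """Remembers recent rendezvous points so repeated traffic to the same
    region of the attribute space can skip most overlay hops."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()            # (lo, hi) value range -> node

    def remember(self, value_range, node):
        self.entries[value_range] = node
        self.entries.move_to_end(value_range)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used

    def lookup(self, value):
        for (lo, hi), node in self.entries.items():
            if lo <= value < hi:
                self.entries.move_to_end((lo, hi))
                return node                     # shortcut straight to that node
        return None                             # fall back to normal routing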

  12. Performance (Uniform workload) • #long links = 6 • #cache links = log(n) • Publications were generated from a uniform distribution

  13. Performance (Skewed workload) • #long links = 6 • #cache links = log(n) • Publications were generated from a highly skewed Zipf distribution

  14. Performance (Memory reference trace) • #long links = 6 • #cache links = log(n) • Publications were generated from memory references of the SPEC2000 benchmark

  15. Two Problems [figure: a skewed probability distribution Pr(X = x) over attribute values x] 1. Load Balancing • Concern because publication values need not follow a uniform, or a priori known, distribution • Node ranges are assigned when the nodes join

  16. Problems (contd.) 2. Hub Selectivity [figure: the subscription {Name = *, X > 100, X < 180} could be sent to the Name hub or to the X hub] • Recall: a subscription is sent to one "randomly" chosen hub! • Ideally, it should be sent to the most selective hub • Need to estimate the selectivity of a subscription

  17. Hail randomness • Randomized construction of the network gives additional benefits! • It turns out this network is an expander with high probability • Random walks mix rapidly, i.e., they approach the stationary distribution quickly • Uniform sampling is non-trivial • Node ranges are not uniform across nodes • Random walks: an efficient way of sampling • No explicit hierarchy required (as in RanSub [USITS '03]) • In general, several statistics about a very dynamic network can be maintained efficiently
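
The sampling primitive itself is short; a minimal sketch, assuming each node can enumerate its overlay neighbours (the neighbours_of callback is hypothetical):

import random

def random_walk_sample(start_node, walk_length, neighbours_of):
    """Walk walk_length hops, picking a uniform random neighbour each step,
    and return the node where the walk ends. On a rapidly mixing overlay the
    endpoint is distributed approximately per the walk's stationary distribution."""
    node = start_node
    for _ in range(walk_length):
        node = random.choice(neighbours_of(node))
    return node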

  18. Hub Selectivity (ideas) • Use sampling to build approximate histograms • Approach 1 (Push): • Each rendezvous point selects publications with a certain probability and sends them off with a specific TTL • A log²(n)-length random walk ensures good mixing • Traffic overhead per publication • Approach 2 (Pull): • Perform uniform random sampling periodically • Each sample = the histogram of the sampled node • Question: how to combine histograms?
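
For the pull approach, one naive possibility, offered here only as an assumption since the slide leaves the question open, is to sum the sampled nodes' per-bucket counts and normalise into an approximate distribution:

from collections import Counter

def merge_histograms(sampled_histograms):
    """sampled_histograms: one dict per sampled node, mapping bucket -> count.
    Returns an approximate probability mass per bucket."""
    merged = Counter()
    for hist in sampled_histograms:
        merged.update(hist)
    total = sum(merged.values()) or 1
    return {bucket: count / total for bucket, count in merged.items()}

samples = [{(0, 100): 12, (100, 200): 3}, {(100, 200): 7, (200, 300): 8}]
print(merge_histograms(samples))   # {(0, 100): 0.4, (100, 200): ~0.33, (200, 300): ~0.27}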

  19. Load balancing (ideas) • Sample the "average" load in the system • Utilize the histograms to quickly find high- and low-load areas • Strategy 1: • A lightly loaded node gracefully leaves the overlay • Re-inserts itself into a highly loaded area • Strategy 2: • Use load "diffusion": "heavy" nodes shed load to neighbors • Only if the neighbor is "light"
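
A rough sketch of the two strategies; the node/overlay methods and the 0.5x / 2x load thresholds below are assumptions, not values from the talk:

def rebalance_by_rejoin(node, average_load, overlay):
    """Strategy 1: a lightly loaded node leaves gracefully and re-inserts
    itself next to a heavily loaded node, splitting that node's range."""
    if node.load < 0.5 * average_load:                 # assumed threshold
        heavy = overlay.sample_heavy_node()            # e.g. located via the histograms
        overlay.leave(node)
        overlay.join_and_split(node, heavy)

def rebalance_by_diffusion(node, average_load):
    """Strategy 2: a heavy node sheds part of its range to a ring neighbour,
    but only if that neighbour is itself lightly loaded."""
    if node.load > 2 * average_load:                   # assumed threshold
        neighbour = node.ring_successor
        if neighbour.load < average_load:
            node.transfer_range_to(neighbour)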

  20. Distributed Game Design • Current implementation: Distributed version of the Asteroids game! • Questions: • How is state distributed across the system? • How is consistency handled in the system? • Cheating???

  21. Conclusion • A distributed publish-subscribe system supporting • Range queries • Scalable routing and matching • Randomized network construction • Provides routing guarantees • Also yields an elegant way of sampling in a distributed system • Exports an API for applications • Implemented; deployed on Emulab • Distributed game using Mercury • Almost done • To be deployed on PlanetLab soon
