
Scalable Self-Repairing Publish/Subscribe

Robbert van Renesse, Ken Birman, Werner Vogels. Cornell University.



  1. Scalable Self-Repairing Publish/Subscribe Robbert van Renesse Ken Birman Werner Vogels Cornell University

  2. Background • ISIS, Horus, Ensemble systems • Strong properties (for replicated data) • Adaptive (changing network/app behavior) • Problems… • as fast as slowest receiver • “Jim Gray effect” • no IP Multicast

  3. New Direction • Probabilistically Strong Guarantees • Randomized protocols • Compartmentalization • No reliance on IP multicast, clock sync • Auto-configuration, self-repair → JBI

  4. Three Main Components • Astrolabe • Aggregation Service • SelectCast • Dissemination Service • Bimodal Multicast • End-to-end reliability

  5. Aggregation • Ability to summarize information from distributed sources. • aka data fusion in sensor networks. • The basis for scalability! • Standard service in databases. • Why not in distributed systems?

  6. Examples • Barrier Synchronization • Voting • Resource Location • Multicast Routing

  7. Astrolabe • Astrolabe takes continuous snapshots of the global state of a distributed system, and aggregates this information in user-specified ways.

  8. Four Design Principles • Scalability through Hierarchy • Flexibility through Mobile SQL • Robustness through p2p Gossip • Security through Certificates

  9. DNS-like Domain Hierarchy • Domains identified by path names • [Figure label: Attribute list]

  10. MIB • Each domain has an attribute list called “MIB” (management information base). • MIBs of internal domains generated by aggregating child domains’ MIBs.

  11. Domain Table • No servers for any domain: a MIB is replicated on all hosts in its domain! • Each host maintains not only the MIBs of its own domains, but also those of its sibling domains. • Sibling MIBs organized in “domain tables”.

  12. Domain Table Example

  13. Aggregation • SQL query “summarizes” data • Dynamically changing query output is visible domain-wide (like a spreadsheet) • [Figure labels: Domain1, Domain2]

  14. Example queries • SELECT SUM(nmembers) AS nmembers • SELECT MAX(depth) + 1 AS depth • SELECT MIN(minl) AS minl (minimum load) • … • Functions gossiped with everything else.
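The three example queries can be read as a bottom-up fold over the domain hierarchy. The following is a minimal Python sketch under that reading, not Astrolabe's actual implementation: the dictionary layout and the `aggregate` helper are hypothetical, and only the attribute names (`nmembers`, `depth`, `minl`) come from the slide.

```python
from typing import Dict, List

Mib = Dict[str, int]  # hypothetical in-memory stand-in for a MIB


def aggregate(children: List[Mib]) -> Mib:
    """Compute a parent domain's MIB from its child domains' MIBs."""
    return {
        # SELECT SUM(nmembers) AS nmembers
        "nmembers": sum(c["nmembers"] for c in children),
        # SELECT MAX(depth) + 1 AS depth
        "depth": max(c["depth"] for c in children) + 1,
        # SELECT MIN(minl) AS minl
        "minl": min(c["minl"] for c in children),
    }


# Leaf hosts report themselves as one member at depth 0 with their own load.
hosts_a = [{"nmembers": 1, "depth": 0, "minl": 7},
           {"nmembers": 1, "depth": 0, "minl": 3}]
hosts_b = [{"nmembers": 1, "depth": 0, "minl": 5}]

domain_a = aggregate(hosts_a)
domain_b = aggregate(hosts_b)
root = aggregate([domain_a, domain_b])
print(root)
```

Each level only needs its children's summaries, which is what keeps the per-host state small.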

  15. Aggregation • [Figure labels: Domain1, Domain2]

  16. Aggregation • O(log n) info per host • [Figure labels: Domain1, Domain2]

  17. Other Examples • Which are the three lowest loaded hosts? • Which domains contain hosts with an out-of-date virus database? • Do >30% of hosts measure elevated radiation? • Which domains contain subscribers interested in some topic? • Where is the nearest logging server?

  18. Epidemic or Gossip Protocols • Used to keep domain tables up-to-date • Randomized Communication between (nearby) hosts: • Fast (latency grows O(log n)) • Hard to stop (robust even in the face of Denial-of-Service attacks) • Probabilistically Real-Time guarantees on latency (based on epidemiological analysis).
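The logarithmic latency claim can be explored with a toy simulation. This sketch assumes a deliberately simplified model that the slides do not specify: push-only gossip, synchronous rounds, and uniformly random peers (no notion of "nearby"). The function name `gossip_rounds` is hypothetical.

```python
import random


def gossip_rounds(n: int, seed: int = 0) -> int:
    """Count rounds until an update starting at host 0 reaches all n hosts."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        # Every informed host pushes the update to one random host.
        targets = {rng.randrange(n) for _ in informed}
        informed |= targets
        rounds += 1
    return rounds


for n in (16, 256, 4096):
    print(n, gossip_rounds(n))
```

Because the informed set can at most double per round, the round count is bounded below by log2(n); in this model it typically stays within a small multiple of that.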

  19. How it works… • [Figure labels: SQL, gossip]

  20. SelectCast • Disseminate messages through Astrolabe hierarchy • (Application-level) Routers selected through domain aggregation: • SELECT FIRST(3, routers) AS routers, MIN(minload) AS minload ORDER BY minload • Exploit heterogeneity, don’t hide it!
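One way to read the router-selection query is sketched below in Python. It assumes that `FIRST(3, routers)` concatenates the child rows' router lists in `ORDER BY minload` order and keeps the first three; that interpretation, the `Row` type, and the `select_routers` helper are assumptions for illustration, not confirmed Astrolabe semantics.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    """Hypothetical row of a domain table: one child domain's summary."""
    routers: List[str]
    minload: float


def select_routers(rows: List[Row], k: int = 3) -> Row:
    """Pick up to k routers for the parent, preferring lightly loaded children."""
    ordered = sorted(rows, key=lambda r: r.minload)           # ORDER BY minload
    routers = [name for row in ordered for name in row.routers][:k]  # FIRST(k, routers)
    return Row(routers, ordered[0].minload)                   # MIN(minload)


rows = [Row(["c"], 0.9), Row(["a", "b"], 0.1), Row(["d"], 0.5)]
parent = select_routers(rows)
print(parent)
```

Selecting on measured load is what lets the tree favor capable hosts instead of treating all hosts as equal.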

  21. Multicast Tree

  22. Fault Masking

  23. Filtering (Pub/Sub) • SQL condition on each message • For example: • MIN(version) < 3 • MAX(radiation) > 300 • OR(subject) // BLOOM FILTERS • TRUE • Generalization of topic based publishing
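The `OR(subject)` line suggests that each domain summarizes its subscribers' topics as a Bloom filter and that parent domains OR the children's filters together. The Python sketch below illustrates that idea under assumed parameters (a 64-bit filter, 3 hash functions derived from SHA-256); the helper names are hypothetical and nothing here is taken from the actual system.

```python
import hashlib
from typing import Iterable

BITS = 64  # assumed filter width
K = 3      # assumed number of hash functions


def bloom(subjects: Iterable[str]) -> int:
    """Encode a set of subjects as a Bloom filter stored in an int bitmask."""
    mask = 0
    for subject in subjects:
        for i in range(K):
            digest = hashlib.sha256(f"{i}:{subject}".encode()).digest()
            mask |= 1 << (int.from_bytes(digest[:4], "big") % BITS)
    return mask


def aggregate(child_masks: Iterable[int]) -> int:
    """OR(subject): a parent's filter is the bitwise OR of its children's."""
    out = 0
    for m in child_masks:
        out |= m
    return out


def may_contain(mask: int, subject: str) -> bool:
    """True if the domain might hold a subscriber (false positives possible)."""
    probe = bloom([subject])
    return (mask & probe) == probe


domain = aggregate([bloom(["news"]), bloom(["sports"])])
print(may_contain(domain, "news"), may_contain(domain, "sports"))
```

A message is forwarded into a domain only when `may_contain` is true; a false positive costs an unnecessary forward but never drops a delivery.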

  24. Filtering Example

  25. Scalability • Latency, memory use, CPU load, and load on network links all grow O(log N) and are independent of update rate. • Highly robust to omission and crash failures. • Confirmed by analysis, simulation, and experiment. • O(1) lookup for most useful queries.

  26. Emulab topology (U. Utah)

  27. Experiments

  28. Real vs. Simulation • [Figure panels: The real thing, Simulation]

  29. Membership • Domain failure detected when its attributes are no longer being updated. • Domains discovered (and partitions repaired) through • gossip • occasional broadcast and multicast • configuration • Special precautions for domains separated by firewalls and NAT boxes
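Detecting failure from stale attributes can be sketched as a timeout on the last observed update. This is a minimal Python illustration; the `live_domains` helper, the timestamp map, and the timeout value are all hypothetical, since the slides do not say how staleness is measured.

```python
from typing import Dict, Set


def live_domains(last_update: Dict[str, float], now: float, timeout: float) -> Set[str]:
    """Return domains whose attributes were refreshed within `timeout` seconds."""
    return {name for name, ts in last_update.items() if now - ts <= timeout}


# Hypothetical sibling domains with the time their MIB was last refreshed.
last_update = {"/cornell/cs": 100.0, "/cornell/ece": 40.0}
alive = live_domains(last_update, now=110.0, timeout=30.0)
print(alive)
```

Domains that drop out of the live set would simply be omitted from the domain table until gossip, broadcast, or configuration rediscovers them.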

  30. Security • Integrated PKI • integrity, no confidentiality • prevents “Sybil” Attacks • Remove outliers • Summarize in a robust way • Compartmentalize • Exploit domain hierarchy

  31. Bimodal Multicast • Probabilistic end-to-end reliability • Uses IP Multicast or SelectCast for initial dissemination • Runs a background gossip protocol to do repairs of message loss • Performance improves with scale • share buffering load
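The two phases, a best-effort initial multicast followed by background gossip repair, can be illustrated with a toy Python simulation. The model is an assumption: losses are independent per receiver, rounds are synchronous, and each node sends its digest to one random peer, which then pulls whatever it is missing. The function name `bimodal` and all parameters are hypothetical.

```python
import random
from typing import List, Set


def bimodal(n_nodes: int, n_msgs: int, loss: float, rounds: int, seed: int = 0) -> List[Set[int]]:
    """Return the set of message ids each node holds after gossip repair."""
    rng = random.Random(seed)
    # Phase 1: unreliable dissemination. Node 0 is the sender and has everything;
    # every other node independently misses each message with probability `loss`.
    have = [set(range(n_msgs))] + [
        {m for m in range(n_msgs) if rng.random() >= loss}
        for _ in range(n_nodes - 1)
    ]
    # Phase 2: background gossip. Each node sends a digest of what it holds to a
    # random peer; the peer retrieves any messages it lacks from that node.
    for _ in range(rounds):
        for node in range(n_nodes):
            peer = rng.randrange(n_nodes)
            have[peer] |= have[node]
    return have


before = bimodal(n_nodes=20, n_msgs=50, loss=0.3, rounds=0)
after = bimodal(n_nodes=20, n_msgs=50, loss=0.3, rounds=10)
print(sum(map(len, before)), sum(map(len, after)))
```

Since any node holding a message can repair a peer, the buffering and retransmission load is spread across receivers rather than concentrated at the sender.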

  32. Work in Progress • Evaluate Scalability and Performance • emulation, simulation, deployment • Improve support for low power apps • self configuration • Improve expressiveness • pattern matching
