
The Horus and Ensemble Projects: Accomplishments and Limitations

Ken Birman, Robert Constable, Mark Hayden, Jason Hickey, Christoph Kreitz, Robbert van Renesse, Ohad Rodeh, Werner Vogels. Department of Computer Science, Cornell University.


Presentation Transcript


1. The Horus and Ensemble Projects: Accomplishments and Limitations
Ken Birman, Robert Constable, Mark Hayden, Jason Hickey, Christoph Kreitz, Robbert van Renesse, Ohad Rodeh, Werner Vogels
Department of Computer Science, Cornell University

2. Reliable Distributed Computing: Increasingly urgent, still unsolved
• Distributed computing has swept the world
  • Impact has become revolutionary
  • Vast wave of applications migrating to networks
  • Already as critical a national infrastructure as water, electricity, or telephones
• Yet distributed systems remain
  • Unreliable, prone to inexplicable outages
  • Insecure, easily attacked
  • Difficult (and costly) to program, bug-prone

3. A National Imperative
• Potential for catastrophe cited by
  • DARPA ISAT study commissioned by Anita Jones (1985; I briefed the findings, which became the basis for refocusing much of ITO under Howard Frank)
  • PCCIP report, PTAC
  • NAS study of trust in cyberspace
• Need a quantum improvement in technologies, packaged in easily used, practical forms

4. Quick Timeline
• Cornell has developed 3 generations of reliable group communication technology
  • Isis Toolkit: 1987-1990
  • Horus System: 1990-1994
  • Ensemble System: 1994-1999
• Today engaged in a new effort reflecting a major shift in emphasis
  • Spinglass Project: 1999-

5. Questions to consider
• Have these projects been successful?
• What did we do?
• How can impact be quantified?
• What limitations did we encounter?
• How is industry responding?
• What next?

6. Timeline (Isis, Horus, Ensemble): Isis
• Introduced reliability into group computing
• Virtual synchrony execution model
• Elaborate, monolithic, but adequate speed
• Many transition successes
  • New York, Swiss Stock Exchanges
  • French Air Traffic Control console system
  • Southwestern Bell Telephone network mgt.
  • Hiper-D (next generation AEGIS)

7. Virtual Synchrony Model (diagram)
Processes p, q, r, s, and t move through a sequence of group views:
• G0 = {p,q}
• G1 = {p,q,r,s}: r and s request to join and are added, with state transfer
• G2 = {q,r,s}: p fails (crash)
• G3 = {q,r,s,t}: t requests to join and is added, with state transfer
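To make the diagram concrete, here is a minimal OCaml sketch (illustrative only, not the Isis/Horus/Ensemble API) of the central idea: every surviving member observes the same sequence of membership views.

```ocaml
(* Illustrative only: the view sequence from the slide, which virtual
   synchrony delivers identically to all surviving members. *)
type view = { view_id : int; members : string list }

let views = [
  { view_id = 0; members = [ "p"; "q" ] };            (* G0 *)
  { view_id = 1; members = [ "p"; "q"; "r"; "s" ] };  (* G1: r, s join; state transfer *)
  { view_id = 2; members = [ "q"; "r"; "s" ] };       (* G2: p fails *)
  { view_id = 3; members = [ "q"; "r"; "s"; "t" ] };  (* G3: t joins; state transfer *)
]

let () =
  List.iter
    (fun v ->
       Printf.printf "G%d = {%s}\n" v.view_id (String.concat ", " v.members))
    views
```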

8. Why a "model"?
• Models can be reduced to theory – we can prove the properties of the model, and can decide if a protocol achieves it
• Enables rigorous application-level reasoning
• Otherwise, the application must guess at possible misbehaviors and somehow overcome them

9. Virtual Synchrony
• Became widely accepted – basis of literally dozens of research systems and products worldwide
• Seems to be the only way to solve problems based on replication
• Very fast in small systems, but faces scaling limitations in large ones

10. How Do We Use The Model?
• Makes it easy to reason about the state of a distributed computation
• Allows us to replicate data or computation for fault-tolerance (or because multiple users share same data)
• Can also replicate security keys, do load-balancing, synchronization…
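As a hedged illustration of the replication point (a sketch with made-up names, not Cornell's actual toolkit code): if virtual synchrony delivers the same updates, in the same order and views, to every member, then each member can apply them to a local copy and all replicas stay identical.

```ocaml
(* Minimal sketch: each replica folds the identical, totally ordered
   delivery sequence over its local state, so all copies agree. *)
type update = Set of string * int

let apply state (Set (key, value)) =
  (key, value) :: List.remove_assoc key state

let deliver_all updates = List.fold_left apply [] updates

let () =
  (* The same delivery log that every member receives. *)
  let log = [ Set ("x", 1); Set ("y", 2); Set ("x", 3) ] in
  let replica = deliver_all log in
  List.iter (fun (k, v) -> Printf.printf "%s = %d\n" k v) replica
```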

11. French ATC system (simplified)
(Diagram of the main components: onboard radar, X.500 directory, controller consoles, and the air traffic database with flight plans, etc.)

12. A center contains...
• Perhaps 50 "teams" of 3-5 controllers each
• Each team supported by a workstation cluster
• Cluster-style database server has flight plan information
• Radar server distributes real-time updates
• Connections to other control centers (40 or so in all of Europe, for example)

13. Process groups arise here:
• Cluster of servers running critical database server programs
• Cluster of controller workstations supports ATC by teams of controllers
• Radar must send updates to the relevant group of control consoles
• Flight plan updates must be distributed to the "downstream" control centers

14. Role For Virtual Synchrony?
• The French government knows the requirements for safety in the ATC application
• With our model, we can reduce those requirements to a formal set of statements
• This lets us establish that our solution will really be safe in their setting
• Contrast with the usual ad-hoc methodologies...

15. More Isis Users
• New York Stock Exchange
• Swiss Stock Exchange
• Many VLSI fabrication facilities
• Many telephony control applications
• Hiper-D – an AEGIS rebuild prototype
• Various NSA and military applications
• Architecture contributed to SC-21/DD-21

16. Timeline (Isis, Horus, Ensemble): Horus
• Simpler, faster group communication system
• Uses a modular layered architecture; layers are "compiled," headers compressed for speed
• Supports dynamic adaptation and real-time apps
• Partitionable version of virtual synchrony
• Transitioned primarily through Stratus Computer
  • Phoenix product, for telecommunications

17.-19. Layered Microprotocols in Horus (diagram, repeated across three build steps)
• Interface to Horus is extremely flexible
• Horus manages the group abstraction
• Group semantics (membership, actions, events) are defined by a stack of modules, e.g. ftol, vsync, filter, encrypt, sign
• Ensemble stacks plug-and-play modules to give design flexibility to the developer
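The following OCaml sketch (toy layer names and types only, not the real Horus/Ensemble module interfaces) illustrates the plug-and-play idea: each microprotocol layer is a transformation on messages, and a stack is simply the composition of whichever layers a group needs.

```ocaml
(* Toy model of a layered protocol stack: a layer rewrites a message on
   the way down, and a stack is a left-to-right composition of layers. *)
type msg = { headers : string list; payload : string }
type layer = msg -> msg

let add_header name m = { m with headers = name :: m.headers }

(* Stand-ins for microprotocols such as ftol, vsync, and encrypt. *)
let ftol m = add_header "ftol" m
let vsync m = add_header "vsync" m
let encrypt m =
  add_header "encrypt" { m with payload = String.uppercase_ascii m.payload }

(* Different groups can assemble different stacks from the same modules. *)
let stack layers m = List.fold_left (fun m layer -> layer m) m layers

let () =
  let send = stack [ encrypt; vsync; ftol ] in
  let out = send { headers = []; payload = "hello group" } in
  Printf.printf "headers: %s | payload: %s\n"
    (String.concat ", " out.headers) out.payload
```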

20. Group Members Use Identical Multicast Protocol Stacks
(Diagram: every group member runs the same ftol / vsync / encrypt stack.)

21. With Multiple Stacks, Multiple Properties
(Diagram: multiple protocol stacks built from ftol, vsync, and encrypt layers, one per group.)

22. Timeline (Isis, Horus, Ensemble): Ensemble
• Horus-like stacking architecture, equally fast
• Includes a group-key mechanism for secure group multicast and key management
• Uses a high-level language, can be formally proved – an unexpected and major success
• Many early transition successes
  • DD-21, Quorum via collaboration with BBN
  • Nortel, STC: commercial users
  • Discussions with MS (COM+): could be the basis of standards

23. Proving Ensemble Correct
• Unlike Isis and Horus, Ensemble is coded in a language with strong semantics (ML)
• So we took a spec of virtual synchrony from MIT's IOA group (Nancy Lynch)
• And are actually able to prove that our code implements the spec, and that the spec captures the virtual synchrony property!
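As a loose illustration of the kind of obligation such a specification pins down (my paraphrase, not the actual IOA specification or the Ensemble proof), here is an OCaml check of one virtual synchrony property: any two processes that both survive into the next view must have delivered the same messages in the old view.

```ocaml
(* Sketch of one virtual-synchrony obligation as an executable check:
   processes that move together from the old view to the new view must
   agree on the set of messages delivered in the old view. *)
let agreement_ok ~deliveries ~old_members ~new_members =
  let survivors = List.filter (fun p -> List.mem p new_members) old_members in
  match
    List.map (fun p -> List.sort compare (List.assoc p deliveries)) survivors
  with
  | [] -> true
  | first :: rest -> List.for_all (( = ) first) rest

let () =
  (* p and q survive and delivered the same set; r left the group, so it
     is not checked. *)
  let deliveries = [ ("p", [ "m1"; "m2" ]); ("q", [ "m2"; "m1" ]); ("r", [ "m1" ]) ] in
  Printf.printf "agreement holds: %b\n"
    (agreement_ok ~deliveries
       ~old_members:[ "p"; "q"; "r" ] ~new_members:[ "p"; "q"; "s" ])
```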

24. Why is this important?
• If we use Ensemble to secure keys, our proof is a proof of security of the group keying protocol…
• And the proof extends not just to the algorithm but also to the actual code implementing it
• These are the largest formal proofs ever undertaken!

25. Why is this feasible?
• Power of the NuPRL system: a fifth-generation theorem proving technology
• Simplifications gained through modularity: compositional code inspires a style of compositional proof
• Ensemble itself is unusually elegant; protocols are spare and clear

26. Other Accomplishments
• An automated optimization technology
  • Often, a simple modular protocol becomes complex when optimized for high performance
  • Our approach automates optimization: the basic protocol is only coded once, and we work with a single, simple, clear version
  • The optimizer works almost as well as hand-optimization and can be invoked at runtime

27. Optimization
(Diagram: an ftol / vsync / encrypt stack before and after optimization.) The original code is simple but inefficient; the optimized version is provably the same, yet the inefficiencies are eliminated.
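A toy OCaml illustration of that equivalence claim (my own example, far simpler than Ensemble's real header-compression and optimization machinery): a layered send path and a fused send path that precomputes the combined header produce identical messages, so the clear version and the fast version agree.

```ocaml
(* Layered version: each toy layer prepends its own header, one pass per layer. *)
let layered headers payload =
  List.fold_left (fun msg h -> h ^ "|" ^ msg) payload (List.rev headers)

(* "Optimized" version: the combined header is precomputed once, so the
   common send path does a single concatenation. *)
let fused headers =
  let prefix = String.concat "|" headers in
  fun payload -> if prefix = "" then payload else prefix ^ "|" ^ payload

let () =
  let headers = [ "ftol"; "vsync"; "encrypt" ] in
  let a = layered headers "data" in
  let b = fused headers "data" in
  Printf.printf "layered: %s\nfused:   %s\nequal: %b\n" a b (a = b)
```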

28. Other Accomplishments
• Real-Time Fault-Tolerant Clusters
  • Problem originated in the AEGIS tracking server
  • Need a scalable, fault-tolerant parallel server with rapid real-time guarantees

29. AEGIS Problem
(Diagram: emulate a single tracking server with a cluster of tracking servers, under a 100ms deadline.)

30. Other Accomplishments
• Real-Time Fault-Tolerant Clusters
  • Problem originated in the AEGIS tracking server
  • Need a scalable, fault-tolerant parallel server with rapid real-time guarantees
  • With Horus, we achieved 100ms response time even when nodes crash, scalability to 64 nodes or more, load balancing, and linear speedups
  • Our approach emerged as one of the major themes in SC-21, which became DD-21

31. Other Accomplishments
• A flexible, object-oriented toolkit
  • Standardizes the sorts of things programmers do most often
  • Programmers are able to work with high-level abstractions rather than being forced to reimplement common tools, like replicated data, each time they are needed
  • Embedding into the NT COM architecture

32. Security Architecture
• Group key management
• Fault-tolerant, partitionable
• Currently exploring a very large scale configuration that would permit rapid key refresh and revocation even with millions of users
• All provably correct
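A minimal OCaml sketch of how group keying and view changes fit together (assumed behavior for illustration only; the "key" below is a stand-in random value, not Ensemble's actual cryptography or rekeying protocol): whenever the membership view changes, the group adopts a fresh key delivered only to current members, so processes that left or were revoked cannot read later traffic.

```ocaml
(* Toy rekeying rule: every new view gets a fresh group key, known only
   to the members of that view. *)
type view = { view_id : int; members : string list }
type keyed_view = { view : view; key : int }

let rekey v = { view = v; key = Random.bits () }  (* stand-in for real key agreement *)

let () =
  Random.self_init ();
  let views =
    [ { view_id = 0; members = [ "p"; "q"; "r" ] };
      { view_id = 1; members = [ "p"; "q" ] } ]   (* r left: it never sees the new key *)
  in
  List.iter
    (fun v ->
       let kv = rekey v in
       Printf.printf "view %d {%s}: key %d\n" kv.view.view_id
         (String.concat ", " kv.view.members) kv.key)
    views
```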

33. Transition Paths?
• Through collaboration with BBN, delivered to the DARPA QUOIN effort
• Part of the DD-21 architecture
• Strong interest in industry; good prospects for "major vendor" offerings within a year or two
• A good success for Cornell and DARPA

34. What Next?
• Continue some work with Ensemble
  • Research focuses on proof of the replication stack
  • Otherwise, keep it alive, support and extend it
  • Play an active role in transition
  • Assist standards efforts
• But shift in focus to a completely new effort
  • Emphasize adaptive behavior, extreme scalability, robustness against local disruption
  • Fits the "Intrinsically Survivable Systems" initiative

35. Throughput Stability: Achilles Heel of Group Multicast
• When scaled to even modest environments, the overheads of virtual synchrony become a problem
• One serious challenge involves management of group membership information
• But multicast throughput also becomes unstable with high data rates and large system size
• This is a problem in every protocol we've studied, including other "scalable, reliable" protocols

36. Throughput Scenario
(Diagram: most members are healthy... but one is slow.)

37. Virtually synchronous Ensemble multicast protocols
(Plot: group throughput of the healthy members versus degree of slowdown, for 32, 64, and 96 group members.) Throughput as one member of a multicast group is "perturbed" by forcing it to sleep for varying amounts of time.

38. Bimodal Multicast in Spinglass
• A new family of protocols with stable throughput, extremely scalable, fixed and low overhead per process and per message
• Gives tunable probabilistic guarantees
• Includes a membership protocol and a multicast protocol
• Ideal match for small, nomadic devices
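To suggest why this scales, here is a simplified OCaml sketch of the gossip step underlying bimodal multicast (not the Spinglass implementation; the fanout, round count, and group size are illustrative): in each round, every process that already has a message pushes it to a few randomly chosen peers, so per-process work stays fixed while the fraction of processes holding the message grows very quickly.

```ocaml
(* One gossip round: every process that has the message pushes it to
   [fanout] peers chosen uniformly at random. *)
let gossip_round ~fanout ~n has_msg =
  let next = Array.copy has_msg in
  Array.iter
    (fun infected ->
       if infected then
         for _ = 1 to fanout do
           next.(Random.int n) <- true
         done)
    has_msg;
  next

let () =
  Random.self_init ();
  let n = 1000 in
  let state = ref (Array.init n (fun i -> i = 0)) in  (* a single initial sender *)
  for round = 1 to 8 do
    state := gossip_round ~fanout:2 ~n !state;
    let have = Array.fold_left (fun c b -> if b then c + 1 else c) 0 !state in
    Printf.printf "round %d: %d of %d processes have the message\n" round have n
  done
```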

39. Throughput with 25% Slow Processes
(Two plots: average throughput versus slowdown.)

40. Spinglass: Summary of objectives
• Radically different approach yields stable, scalable protocols with steady throughput
• Small footprint, tunable to match conditions
• Completely asynchronous, hence demands a new style of application development
• But opens the door to a new lightweight reliability technology supporting large autonomous environments that adapt

41. Conclusions
• Cornell: leader in reliable distributed computing
• High impact on important DoD problems, such as AEGIS (DD-21), QUOIN, NSA intelligence gathering, and many other applications
• Demonstrated modular plug-and-play protocols that perform well and can be proved correct
• Transition into standard, off-the-shelf O/S
• Spinglass – the next major step forward
