
Virtual Synchrony

Virtual Synchrony. Justin W. Hart CS 614 11/17/2005. Papers. The Process Group Approach to Reliable Distributed Computing . Birman. CACM, Dec 1993, 36(12):37-53. Understanding the Limitations of Causally and Totally Ordered Communication .  Cheriton and Skeen.  14th SOSP, 1993. Background.


Presentation Transcript


  1. Virtual Synchrony Justin W. Hart CS 614 11/17/2005

  2. Papers • The Process Group Approach to Reliable Distributed Computing. Birman. CACM, Dec 1993, 36(12):37-53. • Understanding the Limitations of Causally and Totally Ordered Communication.  Cheriton and Skeen.  14th SOSP, 1993.

  3. Background • Chandy-Lamport Logical Clocks • Consistent Cuts • Distributed Snapshots • Publish/Subscribe • Fail-Stop

  4. Fail Stop • Group Membership Service • Processes appear to fail by halting • How does this affect the FLP result?

  5. Motivation • Information Backplane • Customization • Hierarchical Structure • Fault-Tolerance • Reliability

  6. Process Groups • Types of groups • Anonymous groups • Explicit groups • Implementation requirements • Group communication • Group membership as input • Synchronization

  7. Anonymous Groups • Group addressing • Messages sent exactly once to all or no recipients • Ordering • Logging

  8. Explicit Groups • Group members cooperate directly • May execute algorithms based on membership knowledge • Communication is sensitive to membership changes

  9. Building groups over conventional technology • Conventional message passing technologies • Group addressing • Logical time & causal dependency • Message delivery ordering • State transfer • Fault tolerance

  10. Close Synchrony • Close Synchrony • 100% lock-step execution model

  11. A synchronous execution • With true synchrony, executions run in genuine lock-step. • [Figure: timeline of processes p, q, r, s, t, u]

  12. So… what’s wrong with that? • Under close synchrony, execution is limited by the slowest process in the group!

  13. Virtual Synchrony • Relax synchronization requirements where possible • Benefit by allowing for asynchronous interactions • Do this where the result is identical to close synchrony

  14. A few protocols… • fbcast • cbcast • abcast • gbcast

  15. Four protocols!?!? • …but Justin. The paper only discussed 2 protocols… you’re getting off-topic!

  16. A few protocols… • fbcast • Simple protocol upon which we’ll build the others. • Delivery is FIFO ordered, with respect to the original sender • Accomplished easily with a logical timestamp • cbcast • abcast • gbcast

  17. Single updater • If p is the only update source, the need is a bit like the TCP “fifo” ordering • fbcast is a good choice for this case • [Figure: timeline of processes p, r, s, t with updates numbered 1 to 4]
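The FIFO rule behind fbcast can be sketched with one sequence number per sender. This is a minimal illustration, not the ISIS implementation: the class name `FifoReceiver`, the `receive` signature, and the use of a plain integer counter as the "logical timestamp" are all assumptions made for the example.

```python
from collections import defaultdict


class FifoReceiver:
    """Hypothetical receiver that delivers each sender's messages in send order."""

    def __init__(self):
        self.next_seq = defaultdict(int)  # next sequence number expected per sender
        self.held = defaultdict(dict)     # sender -> {seq: payload} for early arrivals
        self.delivered = []               # (sender, payload) in delivery order

    def receive(self, sender, seq, payload):
        if seq < self.next_seq[sender]:
            return  # duplicate of a message that was already delivered
        self.held[sender][seq] = payload
        # Deliver for as long as the next expected message from this sender is here.
        while self.next_seq[sender] in self.held[sender]:
            msg = self.held[sender].pop(self.next_seq[sender])
            self.delivered.append((sender, msg))
            self.next_seq[sender] += 1


r = FifoReceiver()
r.receive("p", 1, "update-2")  # arrives early and is held back
r.receive("p", 0, "update-1")  # fills the gap, so both are delivered
r.receive("p", 2, "update-3")
print(r.delivered)
```

Messages from different senders are not ordered relative to each other here, which is exactly the gap that cbcast closes.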

  18. A few protocols… • fbcast • cbcast • Receipt is causally ordered • Protocol in paper uses token passing • Another simple protocol uses vector timestamps • abcast • gbcast

  19. Causally ordered updates • Simple protocol based on token passing

  20. Causally ordered updates • Example: messages from p and s arrive out of order at t • c is early: VT(c) = [1,0,1,1] but VT(t) = [0,0,0,1]: clearly we are missing one message from s • When b arrives, we can deliver both it and message c, in order • [Figure: timeline of processes p, r, s, t with VT(a) = [0,0,0,1], VT(b) = [1,0,0,1], VT(c) = [1,0,1,1]]
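The vector-timestamp delivery test can be sketched as follows. It is a minimal illustration of the usual causal-delivery rule, not the protocol from the paper. Two things are assumptions about the figure, which the transcript does not spell out: the vector entries are ordered [p, r, s, t], and b was sent by p while c was sent by s. The class and method names are hypothetical.

```python
class CausalReceiver:
    """Hypothetical receiver that holds back messages until their causal past is delivered."""

    def __init__(self, vt):
        self.vt = list(vt)   # this process's current vector timestamp
        self.pending = []    # (name, sender index, timestamp) not yet deliverable
        self.delivered = []  # message names in delivery order

    def _deliverable(self, sender, ts):
        # The message must be the next one from its sender...
        if ts[sender] != self.vt[sender] + 1:
            return False
        # ...and must not depend on anything we have not seen from other processes.
        return all(ts[k] <= self.vt[k] for k in range(len(ts)) if k != sender)

    def receive(self, name, sender, ts):
        self.pending.append((name, sender, ts))
        progress = True
        while progress:  # one delivery may unblock others, so rescan until stable
            progress = False
            for msg in list(self.pending):
                n, s, t = msg
                if self._deliverable(s, t):
                    self.pending.remove(msg)
                    self.vt[s] = t[s]
                    self.delivered.append(n)
                    progress = True


P, R, S, T = 0, 1, 2, 3               # assumed index order of the vector entries
t = CausalReceiver([0, 0, 0, 1])      # process t after sending a
t.receive("c", S, [1, 0, 1, 1])       # early: depends on a message t has not seen
held_after_c = list(t.delivered)
t.receive("b", P, [1, 0, 0, 1])       # b is deliverable, which then releases c
print(held_after_c, t.delivered, t.vt)
```

Under those assumptions, c waits in `pending` until b arrives, and then both are delivered in causal order.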

  21. Causally ordered updates • Each thread corresponds to a different lock • In effect: red “events” never conflict with green ones! • [Figure: timeline of processes p, r, s, t with numbered red and green events]

  22. Hey… that sped things up! • Now I get it! Processes only have to wait for processes that they depend on. Not the slowest in the group!

  23. A few protocols… • fbcast • cbcast • abcast • Atomic delivery ordering • With respect to other abcasts • More costly than cbcast, but with a stronger ordering property • ISIS builds abcast over cbcast • gbcast
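One simple way to see what a total order among abcasts means is a fixed sequencer that stamps every message with a global number. This is a stand-in technique chosen for brevity. It is not how ISIS builds abcast over cbcast, and the names `Sequencer`, `TotalOrderReceiver`, `on_data`, and `on_order` are hypothetical.

```python
import itertools


class Sequencer:
    """Hypothetical single process that assigns global sequence numbers."""

    def __init__(self):
        self._counter = itertools.count()

    def assign(self, msg_id):
        return next(self._counter), msg_id


class TotalOrderReceiver:
    """Delivers payloads strictly in the sequencer's order."""

    def __init__(self):
        self.payloads = {}   # msg_id -> payload, as received from senders
        self.order = {}      # global sequence number -> msg_id, from the sequencer
        self.next_seq = 0
        self.delivered = []

    def on_data(self, msg_id, payload):
        self.payloads[msg_id] = payload
        self._drain()

    def on_order(self, seq, msg_id):
        self.order[seq] = msg_id
        self._drain()

    def _drain(self):
        # Deliver only when both the next order stamp and its payload are present.
        while self.next_seq in self.order and self.order[self.next_seq] in self.payloads:
            msg_id = self.order.pop(self.next_seq)
            self.delivered.append(self.payloads.pop(msg_id))
            self.next_seq += 1


seq = Sequencer()
o1, o2 = seq.assign("m1"), seq.assign("m2")

q, r = TotalOrderReceiver(), TotalOrderReceiver()
q.on_data("m1", "x"); q.on_data("m2", "y"); q.on_order(*o1); q.on_order(*o2)
r.on_data("m2", "y"); r.on_order(*o2); r.on_order(*o1); r.on_data("m1", "x")
print(q.delivered, r.delivered)
```

Both receivers end up with the same delivery sequence even though the network handed them the pieces in different orders. The extra round through the sequencer is one way to see why abcast costs more than cbcast.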

  24. A few protocols… • fbcast • cbcast • abcast • gbcast • Atomic delivery ordering • With respect to everything

  25. Three Round Multicast

  26. As a time-line picture • [Figure: 2PC timeline across processes p, q, r, s, t, with a 2PC initiator. Phase 1: “Vote?”, all vote “commit”. Phase 2: “Commit!”]
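The two phases in the picture can be sketched as a single function. This is a deliberately simplified illustration that ignores failures, timeouts, and logging. The function name and the convention that each participant is a callable returning True for “commit” are assumptions made for the example.

```python
def two_phase_commit(participants):
    """participants maps a process name to a zero-argument vote function (hypothetical)."""
    # Phase 1: the initiator asks every participant for its vote.
    votes = {name: vote() for name, vote in participants.items()}
    # The decision is "commit" only if every single vote was yes.
    decision = "commit" if all(votes.values()) else "abort"
    # Phase 2: the initiator announces the same decision to everyone.
    outcome = {name: decision for name in participants}
    return decision, outcome


all_yes = {name: (lambda: True) for name in ["q", "r", "s", "t"]}
one_no = dict(all_yes, s=lambda: False)

print(two_phase_commit(all_yes)[0])
print(two_phase_commit(one_no)[0])
```

A real protocol also has to decide what participants do if the initiator fails between the two phases, which this sketch does not address.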

  27. Just one more…

  28. Flush protocol • We say that a message is unstable if some receiver has it but (perhaps) others don’t • For example, q’s message is unstable at process r • If q fails we want to “flush” unstable messages out of the system
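The idea of flushing can be sketched as the survivors pooling whatever unstable messages they hold from the failed process, so that every survivor ends up with the same set. This is a rough illustration under that assumption, not the actual ISIS flush protocol. The function name and the dictionary layout are hypothetical.

```python
def flush(survivors):
    """survivors maps a process name to the set of unstable message ids it holds.

    After the call every survivor holds the union, so each unstable message
    is delivered everywhere (among survivors) rather than at only some receivers.
    """
    unstable = set().union(*survivors.values())
    for have in survivors.values():
        have |= unstable - have  # "re-send" whatever this survivor is missing
    return unstable


# q has failed; r had received q's message m2, but p and s had not.
survivors = {"p": {"m1"}, "r": {"m1", "m2"}, "s": set()}
flushed = flush(survivors)
print(sorted(flushed), survivors)
```

Once the sets agree, the group can move on without any survivor having seen a message that the others missed.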

  29. Styles of groups • Peer Groups • Processes cooperate closely • Client-Server Groups • Group acts as a server • Client multicasts repeatedly to the group • Diffusion Groups • Group serves information • Clients connect to receive data from group • Hierarchical Groups • Offer scalability through a hierarchy of connected groups

  30. Historical Aside • Two major classes of real systems • Virtual synchrony • Weaker properties – not quite “FLP consensus” • Much higher performance (orders of magnitude) • Requires that majority of system remain connected. Partitioning failures force protocols to wait for repair • Quorum-based state machine protocols are • Closer to FLP definition of consensus • Slower (by orders of magnitude) • Sometimes can make progress in partitioning situations where virtual synchrony can’t

  31. Names of some famous systems • Isis was first practical virtual synchrony system • Later followed by Transis, Totem, Horus • Today: Best options are Jgroups, Spread, Ensemble • Technology is now used in IBM Websphere and Microsoft Windows Clusters products! • Paxos was first major state machine system • BASE and other Byzantine Quorum systems now getting attention from the security community • (End of Historical aside)

  32. Sounds good… what’s wrong with it? • Tries to solve state problems at communication level • This violates the end-to-end argument! • Consistency requirements are typically stated with respect to application state

  33. Stable vs Durable • Stable – messages are buffered until received by all group members • Durable – message will be delivered, even if the sender dies

  34. Ordering semantics • Incidental Ordering • Semantic Ordering • Prescriptive Ordering

  35. The problem with CATOCS • It can’t say “for sure” • It can’t say the “whole story” • It can’t say “together” • It can’t say it efficiently

  36. It can’t say “for sure” • Processes communicating over a “hidden” channel • Common database • Shared memory • Two threads reacting to external event

  37. It can’t say “together” • Standard solution – locking • Transaction models allow for abort and rollback • Higher level conditions… what happens if a message arrives, but is not successfully processed?

  38. Stock trading example

  39. Can’t say the “whole story” • Not everything can be expressed through the “happens-before” relationship • Semantic ordering constraints • Causal memory, the weakest of these, cannot be expressed in causal multicast • Total ordering helps some of these, but is far too expensive • Inexpensive, state-level protocols with logical clocks can solve these

  40. It can’t say it efficiently • False causality • Potential causality != Actual causality • Memory requirements for buffering “unstable” messages • Ordering information during transmission and reception

  41. And… what of the end to end argument? • All of this considers our communication channels… isn’t the application-level check far more important?

  42. Classes of distributed applications • Data dissemination • Netnews • Trading application example • Global predicate evaluation • Transactional applications • Replicated data • Replication in the large • Distributed real-time applications

  43. Implementing only part of the messaging? • Can you cut down on overhead by implementing only part of the messaging using CATOCS?

  44. Semantics • Are the semantics of state-based approaches superior to those of virtual synchrony?

  45. Scalability • N Processes • Time T to propagate a message across the system • Grows roughly proportional with the square root of the number of processes • Arcs in the active causal graph grow quadratically • Quadratic causal graph

  46. Buffering grows • Quadratic arcs • Linear communication of causal dependencies • Linear growth in required buffering • Changing topologies doesn’t help • CATOCS would require separate process groups for read and write to accomplish optimization of updates vs queries

  47. Group membership protocols • Must enforce atomic delivery semantics • Run our most expensive protocol… gbcast • Failures increase with the size of the system, increasing load on the GMS

  48. Who uses ISIS? • Brokerage • Database replication and triggers

  49. ISIS-based utilities • NEWS • A pub/sub application that will replay histories • NMGR • Manages batch-style jobs and performs load sharing • Parallel make

  50. ISIS-based utilities • DECEIT • NFS compatible file system • META/LOMITA • Sensors & actuators • Abstract sensors • Specify control actions in high-level terms • SPOOLER/LONG-HAUL FACILITY
