Aida Omerovic 4. March 2008 Seminar on Dependable and Adaptive Distributed Systems

Total Order Broadcast and Multicast Algorithms: Taxonomy and Survey
(Paper by X. Défago, A. Schiper, and P. Urbán) ACM Computing Surveys, Vol. 36, No. 4, Dec 2004, pp. 372-421

Outline
• Background
• Problem specification
• Classes of ordering mechanisms
• Failure related concepts
• Fault tolerance
• Discussion

Background
Total order broadcast and multicast algorithms
• Both synchronous and asynchronous system models
Lack of a roadmap for use of the algorithms.
Lack of generality of existing comparisons.

Notions, terms…
Broadcast (messages are sent to all processes) vs. multicast (messages are sent to a subset of processes)
Closed vs. open groups (whether the sender belongs to the group)
Single vs. multiple groups (disjoint/overlapping)
• Ensuring total order at the intersection of groups
Dynamic groups
• Processes join and leave at runtime
Partitionable groups
• Splitting of groups into subgroups through primary partition membership or partitionable membership

Motivation
• Concurrency and global control in distributed systems
• Total order broadcast: a group communication primitive
• Ensures that messages sent to a set of processes are delivered by all those processes in the same order
• Important in: clock synchronisation, active replication, distributed shared memory, distributed mutual exclusion, cooperative writing, replicated databases performance…

Main contributions
• Classification w.r.t. ordering mechanisms
• Characteristic with the strongest influence on the behavior
• Definition of five classes of ordering mechanisms
• Survey of approx. 60 published total order broadcast algorithms
• Study of properties and behaviour

A correct process
Def.: A correct process never exhibits any of the following faulty behaviors:
• Crash failures (the process stops performing any activity)
• Omission failures (the process omits performing some actions)
• Timing failures (the process violates the system's timing assumptions). Apply only to synchronous systems.
• Byzantine failures (the process exhibits arbitrary faulty behaviour)

The problem specification
The total order broadcast problem specification
Two primitives:
• TO-broadcast(m). For any message m and any run: executed at most once!
• TO-deliver(m)
Properties of total order broadcast:
• 1. Validity (if a correct process p TO-broadcasts m, then p eventually TO-delivers m)
• 2. Uniform agreement (if a process p TO-delivers m, then all correct processes eventually TO-deliver m)
• 3. Uniform integrity (every process p TO-delivers m at most once, and only if m was previously TO-broadcast by its sender)
• 4. Uniform total order (if processes p and q both TO-deliver m and m’, then p TO-delivers m before m’ iff q TO-delivers m before m’)
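
The integrity and total order properties above can be checked mechanically on recorded delivery logs. The following is a minimal sketch, assuming each process's deliveries are available as a list of message identifiers; the function names and the sample logs are illustrative and do not come from the paper.

```python
from itertools import combinations

def satisfies_integrity(logs, broadcast):
    """Each process delivers a message at most once, and only if it was broadcast."""
    return all(
        len(log) == len(set(log)) and set(log) <= set(broadcast)
        for log in logs.values()
    )

def satisfies_total_order(logs):
    """Any two processes deliver their common messages in the same relative order.

    Assumes integrity already holds (no duplicate deliveries within a log).
    """
    for p, q in combinations(logs, 2):
        pos_q = {m: i for i, m in enumerate(logs[q])}
        # Messages delivered by both processes, listed in p's delivery order.
        common = [m for m in logs[p] if m in pos_q]
        # q must deliver them in that same order.
        if any(pos_q[a] > pos_q[b] for a, b in zip(common, common[1:])):
            return False
    return True

# Hypothetical logs: r delivers m2 before m1, whereas p delivers m1 before m2.
logs = {"p": ["m1", "m2", "m3"], "q": ["m1", "m3"], "r": ["m2", "m1"]}
print(satisfies_integrity(logs, {"m1", "m2", "m3"}))
print(satisfies_total_order(logs))
```

Validity and agreement are liveness properties, so they cannot be refuted from a finite log prefix in the same way; the sketch deliberately covers only the two safety properties.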

The problem specification cont.
Properties 1, 2 and 3 satisfied -> ”reliable broadcast”.
Properties 1 and 2: ”liveness properties”. (At any point in time, the property can still eventually hold.)
Properties 3 and 4: ”safety properties”. (Once the property has been violated, it can never hold again.)
Properties 2 and 4: uniform. (Apply to both correct and faulty processes.) Costly.
Algorithms tolerant to Byzantine failures cannot guarantee any of the uniform properties above.
Nonuniform: properties 2 and 4 are weakened so that they apply only to correct processes, with no restriction on the faulty ones. Voting can be a measure.

The problem specification cont.
Alternative: uniform properties are those enforced by honest processes, correct or not. (Honest process: behaves according to its specification.)
An issue: contamination. (A faulty process in an inconsistent state ”legally” TO-broadcasts a message prior to crashing, thus contaminating the correct processes.)
Note: contamination is possible even under the strongest specification given so far.
It is disallowed by
• ”gap-free uniform total order” (no gaps in the delivery sequence)
• ”prefix order” (the history of one process is a prefix of the history of the other)
However, contamination cannot be avoided in case of arbitrary failures (e.g. correct delivery by a faulty process).
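
To make the difference concrete, here is a small sketch of a prefix order check on two delivery histories. The helper names and the sample histories are assumptions made for illustration. The faulty history skips m2, which does not contradict uniform total order (m1 and m3 appear in the same relative order in both histories) but does violate prefix order.

```python
def is_prefix(a, b):
    """True if list a is a prefix of list b."""
    return len(a) <= len(b) and b[:len(a)] == a

def satisfies_prefix_order(history_p, history_q):
    """One history must be a prefix of the other."""
    return is_prefix(history_p, history_q) or is_prefix(history_q, history_p)

correct_history = ["m1", "m2", "m3"]
faulty_history = ["m1", "m3"]  # hypothetical: skipped m2 before crashing

print(satisfies_prefix_order(correct_history, faulty_history))
print(satisfies_prefix_order(["m1", "m2"], correct_history))
```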

The problem specification cont.
Other ordering properties include:
• FIFO order. Messages from the same sender are delivered in the order in which they were sent (not guaranteed by total order).
• Causal order (m precedes m’ if the sending event of m precedes the sending event of m’). Generally: broadcast of m before m’ implies delivery of m before m’ by correct processes.
Note: these two properties further restrict the total order property definition by properties related to SENDERS.
Causal order <-> FIFO order + Local order
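
That total order alone does not give FIFO order can be seen in a tiny example: if every process delivers m2 before m1, the delivery order is total, yet it contradicts the order in which a single sender sent the two messages. The sketch below uses illustrative names and checks only the relative order of delivered messages, which is a simplification of the full FIFO property.

```python
def satisfies_fifo(delivery_log, send_order):
    """Check that messages from each sender are delivered in sending order.

    delivery_log: list of message ids in delivery order at one process.
    send_order: dict mapping sender -> list of message ids in the order sent.
    """
    pos = {m: i for i, m in enumerate(delivery_log)}
    for sent in send_order.values():
        # Messages of this sender that were delivered, in sending order.
        delivered = [m for m in sent if m in pos]
        if any(pos[a] > pos[b] for a, b in zip(delivered, delivered[1:])):
            return False
    return True

send_order = {"s": ["m1", "m2"]}      # hypothetical sender s sends m1, then m2
same_log_everywhere = ["m2", "m1"]    # totally ordered, but reversed

print(satisfies_fifo(same_log_everywhere, send_order))
print(satisfies_fifo(["m1", "m2"], send_order))
```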

Classes of ordering mechanisms
… according to how the ordering (e.g. timestamp, sequence number) is performed and by whom (type of role).
Process roles: sender, destination, sequencer.
Five classes of total order broadcast algorithms:
• Fixed sequencer (sequencer)
• Moving sequencer (sequencer), token
• Privilege based (sender), token
• Communication history (sender), timestamp
• Destinations agreement (destination), timestamp
Another distinction is between time-free and time-based (physical time) ordering.
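
As an illustration of the first class, here is a minimal failure-free sketch of fixed-sequencer ordering: one process stamps each message with a sequence number, and every destination buffers messages until it can deliver them in sequence order. The class and attribute names are assumptions for this sketch, and network transport is replaced by direct method calls.

```python
class Sequencer:
    """Assigns consecutive sequence numbers to messages."""

    def __init__(self):
        self.next_seq = 0

    def order(self, message):
        seq = self.next_seq
        self.next_seq += 1
        return seq, message


class Destination:
    """Delivers messages strictly in sequence-number order."""

    def __init__(self):
        self.expected = 0      # next sequence number to deliver
        self.pending = {}      # received but not yet deliverable
        self.delivered = []

    def receive(self, seq, message):
        self.pending[seq] = message
        # Deliver as long as the next expected message is available.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1


sequencer = Sequencer()
stamped = [sequencer.order(m) for m in ["a", "b", "c"]]

d1, d2 = Destination(), Destination()
for seq, m in stamped:
    d1.receive(seq, m)
for seq, m in reversed(stamped):   # simulate out-of-order arrival
    d2.receive(seq, m)

print(d1.delivered, d2.delivered)
```

Both destinations end up with the same delivery sequence regardless of arrival order. The sketch says nothing about what happens if the sequencer crashes.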

Classes of ordering mechanisms cont.
None of the five is failure tolerant!!!

Failure related conceptual issues
Synchronous system: a system where upper bounds on the relative speed of processes and on communication delay are known.
Asynchronous system: the two parameters are unbounded.
Timed asynchronous model: asynchronous model with a notion of physical time and the assumption that ”most of the messages are likely to reach their destination within a delay δ”.

Failure related conceptual issues cont.
Consensus in asynchronous systems has no deterministic solution if even a single process can crash.
Total order broadcast can be transformed into consensus -> the impossibility holds here as well!
Solution: extend the asynchronous system with oracles. An oracle provides information that processes can use to guide their choices.
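
The transformation can be sketched as follows: every process TO-broadcasts its proposal and decides on the first proposal it TO-delivers. Since all processes deliver in the same order, they all decide the same value. The `IdealTotalOrderBroadcast` class below is a hypothetical failure-free stand-in, used only to show the shape of the reduction.

```python
class IdealTotalOrderBroadcast:
    """Hypothetical stand-in: one shared log fixes the delivery order for everyone."""

    def __init__(self):
        self._log = []

    def to_broadcast(self, message):
        self._log.append(message)

    def delivery_sequence(self):
        # Every process observes this same sequence.
        return list(self._log)


def decide(tob):
    """Consensus decision: the first TO-delivered proposal."""
    sender, value = tob.delivery_sequence()[0]
    return value


tob = IdealTotalOrderBroadcast()
for process, proposal in [("p", "x"), ("q", "y"), ("r", "z")]:
    tob.to_broadcast((process, proposal))

decisions = {name: decide(tob) for name in ("p", "q", "r")}
print(decisions)
```

Because consensus can be built this way on top of total order broadcast, an asynchronous total order broadcast algorithm that tolerated a crash would contradict the impossibility of consensus.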

Failure related conceptual issues cont.
Process controlled crash: the ability to artificially force the crash of a process. Useful for crashing incorrect or suspected processes.
However, a fault-tolerant algorithm can only tolerate the crash of a bounded number of processes.
Failures: provoked + genuine => provoking failures degrades the actual fault tolerance of the system.

Fault tolerance mechanisms
The main fault-tolerance mechanisms that the algorithms rely on:
• Failure detection
• Formalized by completeness (prevents blocking) and accuracy (prevents algorithms from running forever without solving the problem)
• Group membership service (manages membership of groups of services)
• Provides consistent failure notification
• Resilient communication pattern (avoids any potential blocking pattern)
• Message stability (at least one process is correct…)
• Consensus
• Mechanisms for lossy channels (tokens, acknowledgments…)
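
One common way to approximate failure detection in practice is with heartbeats and a timeout. The sketch below is an assumption-laden illustration, not a mechanism taken from the paper: the class name, the timeout value, and the explicit `now` argument (used instead of a real clock so the behaviour is reproducible) are all hypothetical.

```python
class HeartbeatFailureDetector:
    """Suspects a process if no heartbeat has arrived within `timeout` time units."""

    def __init__(self, processes, timeout):
        self.timeout = timeout
        self.last_heard = {p: 0.0 for p in processes}

    def heartbeat(self, process, now):
        self.last_heard[process] = now

    def suspects(self, now):
        return {p for p, t in self.last_heard.items() if now - t > self.timeout}


fd = HeartbeatFailureDetector(["p", "q"], timeout=2.0)
fd.heartbeat("p", now=1.0)
fd.heartbeat("q", now=1.0)
fd.heartbeat("p", now=3.0)

suspected_at_4 = fd.suspects(now=4.0)   # q has been silent for 3.0 time units
fd.heartbeat("q", now=4.5)              # q was only slow, not crashed
suspected_at_5 = fd.suspects(now=5.0)
print(suspected_at_4, suspected_at_5)
```

A crashed process stops sending heartbeats and is eventually suspected, which relates to completeness. A slow but correct process can be suspected by mistake and later cleared, which is where accuracy becomes the issue.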

Conclusion
• Problem specification
• Five classes of total order broadcast algorithms
• Failure related concepts
• Fault tolerance mechanisms
• The paper also offers a survey of approx. 60 algorithms

Discussion topics
• Adaptability of the algorithms (e.g. total order multicast in dynamic, partitionable groups)
• Synchrony and timeliness
• Performance of the different algorithms
• Fairness of the different algorithms (e.g. privilege based)
• Suitability of algorithms for open vs. closed groups (e.g. processes have to know of each other in privilege-based algorithms)
• Is this approach comprehensive and adequate?
• Relevant issues not yet covered?
• A reflection on this approach in relation to some earlier seminar topics? Can the principles be adopted elsewhere?

That’s it, folks!
