Exploring Fault-Tolerant Distributed Computing: Atomic Broadcast and Its Applications
This presentation covers fault-tolerant distributed computing with a focus on atomic broadcast. It explains why distributed computing is useful: it spreads and balances computational load, makes bigger problems solvable, and lets problems be handled locally instead of centralizing all the data. It defines atomic broadcast and its properties (validity, uniform agreement, uniform integrity, total order), introduces the Atom system, and discusses its relevance to e-textiles, including data synchronization and light-weight processing. It closes with future directions, namely the scalability limits of classic fault-tolerant algorithms and gossip-based alternatives, followed by Q&A.
Presentation Transcript
Fault-Tolerant Distributed Computing: Atomic Broadcast
Outline • Why distributed computing? • Atomic Broadcast • The Atom system • Relevance for e-textiles • What’s next? • Q&A
Why Distributed Computing? • Spread and balance the computational weight of applications • Solve bigger problems • Deal with problems locally instead of centralizing all the data
Example • Space filtering vs. raw consensus • Acoustic Beam Forming: master collects information from slaves and decides according to the relevance of data • Consensus: no master, all processes decide upon one common value
Atomic Broadcast: Definition (1) • Atomic Broadcast = the same set of messages is delivered by all the processes in the same order • Consensus = all processes decide upon one common value among those proposed
Atomic Broadcast: Definition (2) • Validity: If a correct process broadcasts a message m, it eventually delivers m • Uniform agreement: If a process delivers a message m, then every correct process eventually delivers m • Uniform integrity: Every message m is delivered at most once, and only if it was previously broadcast by sender(m) • Total order: If two correct processes p and q both deliver two messages m and m’, then p delivers m before m’ iff q delivers m before m’
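The integrity and total order properties above can be checked mechanically on recorded delivery logs. The following is a minimal sketch, not part of the original slides: the function names (`at_most_once`, `total_order`, `check_logs`) and the log format (one list of message ids per process) are assumptions made for illustration. It only covers the "at most once" half of uniform integrity and the pairwise ordering condition.

```python
from itertools import combinations


def at_most_once(log):
    """Partial integrity check: no message appears twice in a delivery log."""
    return len(log) == len(set(log))


def total_order(log_p, log_q):
    """True if p and q deliver every pair of common messages in the same order."""
    pos_p = {m: i for i, m in enumerate(log_p)}
    pos_q = {m: i for i, m in enumerate(log_q)}
    common = [m for m in log_p if m in pos_q]
    return all(
        (pos_p[a] < pos_p[b]) == (pos_q[a] < pos_q[b])
        for a, b in combinations(common, 2)
    )


def check_logs(logs):
    """Check both properties over a dict {process name: delivery log}."""
    return all(at_most_once(log) for log in logs.values()) and all(
        total_order(logs[p], logs[q]) for p, q in combinations(logs, 2)
    )


# Hypothetical delivery logs for two processes.
good = {"p": ["m1", "m2", "m3"], "q": ["m1", "m2", "m3"]}
bad = {"p": ["m1", "m2"], "q": ["m2", "m1"]}
print(check_logs(good))  # True
print(check_logs(bad))   # False
```

A checker like this is useful in tests: run the broadcast system, record what each process delivers, and assert the properties afterwards.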
Atomic Broadcast: Bad News • Impossible to achieve in a totally asynchronous system [Fischer, Lynch, Paterson 85]
Atomic Broadcast: Good News • Can be done using unreliable failure detectors • Based on a Consensus algorithm described in [Chandra, Toueg 96]
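The idea of building atomic broadcast on top of consensus can be sketched as follows: processes repeatedly run consensus on the batch of messages they have received but not yet delivered, then deliver the decided batch in a deterministic order. The code below is a simplified single-machine illustration under stated assumptions, not the Atom implementation: `decide` is a stand-in stub for a real failure-detector-based consensus protocol, there are no crashes or network, and all class and function names are hypothetical.

```python
def decide(proposals):
    """Stand-in for Consensus: return one of the proposed values.

    Assumption: a real system would run a fault-tolerant consensus protocol
    here; this stub simply picks the proposal of the lowest process id.
    """
    return proposals[min(proposals)]


class Process:
    def __init__(self, pid):
        self.pid = pid
        self.r_delivered = set()  # messages received via reliable broadcast
        self.a_delivered = []     # messages atomically delivered, in order

    def r_deliver(self, msg):
        self.r_delivered.add(msg)

    def propose(self):
        """Propose every message received but not yet A-delivered."""
        return frozenset(self.r_delivered - set(self.a_delivered))

    def a_deliver_batch(self, batch):
        """Deliver the decided batch in a deterministic (sorted) order."""
        for msg in sorted(batch):
            if msg not in self.a_delivered:
                self.a_delivered.append(msg)


def run_round(processes):
    """One consensus instance: all propose, one batch is decided, all deliver."""
    proposals = {p.pid: p.propose() for p in processes}
    batch = decide(proposals)
    for p in processes:
        p.a_deliver_batch(batch)


procs = [Process(i) for i in range(3)]
# Process 0 has only seen "b" so far; the others have seen "a" and "b".
procs[0].r_deliver("b")
for p in procs[1:]:
    p.r_deliver("a")
    p.r_deliver("b")
run_round(procs)          # decided batch is {"b"}
procs[0].r_deliver("a")   # "a" reaches process 0 later
run_round(procs)          # decided batch is {"a"}
print([p.a_delivered for p in procs])  # every process: ['b', 'a']
```

The delivery order is fixed by the sequence of consensus decisions, not by the order in which messages happen to arrive at each process, which is what gives every process the same sequence.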
Atom • Open source Atomic Broadcast system
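[Diagram: Atom architecture. Components shown: Producer, Consumer, Atom (AB task 1, AB task 2, AB task 3), FD (failure detector), RB (reliable broadcast), One_run. Labeled interactions: A-broadcast, A-deliver, R-broadcast, transmission, do_Consensus, do_decide, start, cancel, suspect, trust.]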
Relevance to E-textiles • Synchronization of data • Coordination of decisions and actions • Light-weight process • Buffer sizes can be predicted
What’s Next? • Scalability is a problem for classic fault-tolerant distributed algorithms • Bimodal Multicast [Ken Birman, Mark Hayden, Oznur Ozkasap, Zhen Xiao, Mihai Budiu, Yaron Minsky, 1998] • Gossip protocol • Relaxes the “strong” reliability guarantees, replacing them with probabilistic guarantees • Converges to “strong” reliability in the absence of failures • Scalable with steady throughput
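To give a feel for how gossip spreads a message, here is a small simulation of generic push gossip. It is a sketch under stated assumptions and not the Bimodal Multicast protocol itself: the function `push_gossip` and its parameters (`fanout`, `loss`, `seed`) are hypothetical, every informed node forwards to `fanout` random peers each round, and each forwarded copy is dropped independently with probability `loss`.

```python
import random


def push_gossip(n, fanout, rounds, loss=0.0, seed=0):
    """Simulate push gossip; return the number of informed nodes after each round."""
    rng = random.Random(seed)
    informed = {0}                  # node 0 is the original sender
    history = [len(informed)]
    for _ in range(rounds):
        newly = set()
        for node in informed:
            peers = [p for p in range(n) if p != node]
            for target in rng.sample(peers, min(fanout, len(peers))):
                if rng.random() >= loss:  # this copy survives the link
                    newly.add(target)
        informed |= newly
        history.append(len(informed))
    return history


# Informed-node counts per round, without and with message loss.
print(push_gossip(n=50, fanout=3, rounds=8))
print(push_gossip(n=50, fanout=3, rounds=8, loss=0.3))
```

The informed count never decreases and typically grows quickly in the first few rounds; with loss the spread is slower but still progresses, which illustrates the probabilistic rather than strict delivery guarantee.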