
Broadcast Variants


Presentation Transcript


  1. Broadcast Variants

  2. why broadcasts? • distributed systems are inherently group oriented and hence it is more useful to talk about one-to-all or one-to-many communication, that is broadcast and multicast within the broader context of group communication • most useful in database replication and in the general case of state machine replication – where every server replica is expected to respond to the same sequence of requests Distributed Systems (DNR)

  3. compared to unicast communication, broadcast is complicated by two issues: message ordering (at the receiving end) and reliability (the sending process may crash) • message ordering and reliability are orthogonal to each other, although hybrid models that combine them are common

  4. figure annotations: • p1 and p2, with p1 broadcasting in FIFO order and the messages being received out of order • p2 crashing in the midst of a broadcast

  5. message ordering definitions: • FIFO order – if a process p sends m1 before it sends m2, then m2 is not delivered at any process q before m1 (easily implemented using message sequence numbers) • total order – if a process (correct or faulty) p delivers a message m1 before m2, then every process delivers m2 only after it has delivered m1 • causal order – for every process q, if m1 happens before m2 (m1 → m2), then m2 is not delivered at q before m1 is
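
The remark that FIFO order is easily implemented with sequence numbers can be sketched as follows. This is a minimal illustration rather than code from the slides: the class name `FifoReceiver` and its fields are assumptions, and the network is replaced by direct method calls.

```python
from collections import defaultdict


class FifoReceiver:
    """Delivers the messages of each sender in the order they were sent."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # next expected sequence number per sender
        self.buffer = defaultdict(dict)    # held-back messages: sender -> {seq: msg}
        self.delivered = []                # (sender, msg) pairs in delivery order

    def receive(self, sender, seq, msg):
        """Called when the network hands over a message, possibly out of order."""
        if seq < self.next_seq[sender] or seq in self.buffer[sender]:
            return  # duplicate: already delivered or already buffered
        self.buffer[sender][seq] = msg
        # Deliver every consecutive message that is now available.
        while self.next_seq[sender] in self.buffer[sender]:
            n = self.next_seq[sender]
            self.delivered.append((sender, self.buffer[sender].pop(n)))
            self.next_seq[sender] = n + 1


q = FifoReceiver()
q.receive("p", 1, "m2")   # m2 arrives first and is held back
q.receive("p", 0, "m1")   # m1 arrives; both are now delivered in send order
```

Here `q.delivered` ends up with m1 ahead of m2 even though m2 arrived first.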

  6. causal ordering ⇒ single source FIFO ordering • total ordering ⇏ FIFO or causal ordering • a combination of FIFO-total order broadcast (which enforces single source FIFO), or causal-total order broadcast (which preserves causality), is possible

  7. m1 → m2 (FIFO) and m1 → m3 (causal) are maintained in the total order [figure: time lines of p1, p2 and p3 exchanging m1, m2 and m3]

  8. we will discuss: • best effort broadcast (BEBcast) • reliable broadcast (RBcast) • terminating reliable broadcast (TRBcast) • uniform reliable broadcast (URBcast) • (uniform reliable) causal order broadcast (COBcast) • (uniform reliable) total order broadcast (ABcast, or atomic broadcast)

  9. assumptions • groups are static: dynamic groups are not addressed here • processes do not have access to stable storage (no fail-recovery) • the system is asynchronous and, at the network level, communication is point-to-point • processes are fail-stop unless otherwise stated

  10. Channels – two interpretations of the liveness criterion: • reliable channel – a reliable channel between processes p and q ensures the following: if p executes send(m) and q is correct, then q eventually receives m • quasi reliable channel – a quasi reliable channel between processes p and q ensures the following: if p and q are both correct and p executes send(m), then q eventually receives m

  11. reliable vs. quasi-reliable: • let process q be correct; a reliable channel implies that if p executes send(m) at time t and crashes at time t+1, then q must still eventually receive m; this is a useful model of a shared persistent space • a quasi reliable channel is weaker: both p and q must be correct; this is a useful model of TCP with error recovery

  12. Best effort broadcast (BEBcast) • the burden of ensuring reliability is only on the sender: as long as the sender of a message does not crash, the properties of a quasi reliable channel ensure that all correct processes eventually deliver the message • operations: • at p, BEBcast(m): for every process q ≠ p, send(m) by reliable unicast • on receive(m) at q: BEBdeliver(m) at q

  13. transport level mechanisms: reliable unicast by TCP (ack-implosion problem) or IP multicast

  14. properties: • validity (a liveness property) – for any two correct processes p and q, every message broadcast by p is eventually delivered by q • integrity (a safety property) – for any message m, every correct process q delivers m at most once, and only if m was previously broadcast by some process p

  15. [figure]

  16. Reliable broadcast (RBcast) • in best effort broadcast, if the sender fails right after broadcasting, end-to-end error recovery is no longer possible, so the correct processes might disagree on whether or not to deliver the message • reliable broadcast ensures that correct processes agree on the messages they deliver even when the sender crashes, i.e., it adheres to the properties of a reliable channel

  17. reliable broadcast is built on top of best-effort broadcast + failure detector abstraction

  18. operations: • at p, RBcast(m): BEBcast(m) • at q, on BEBdeliver(m): RBdeliver(m) • if q (unreliably) detects that p has crashed, then q executes BEBcast(m) • note – other correct processes that receive the retransmission must handle duplicates properly
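
The operations above can be sketched in Python as a small in-memory simulation. All names (`RbProcess`, `on_crash_detected`, the `only_to` parameter used to model a sender that crashes part-way through its broadcast) are assumptions made for illustration. The sketch sends to every process including the sender itself, so that the sender also delivers its own message, and the failure detector is simulated by calling `on_crash_detected` by hand.

```python
class RbProcess:
    def __init__(self, pid, network):
        self.pid = pid
        self.network = network      # shared dict: pid -> RbProcess
        self.crashed = False
        self.suspected = set()      # processes reported as crashed
        self.seen = {}              # msg_id -> (origin, payload), filters duplicates
        self.delivered = []         # RBdelivered payloads, in order

    def beb_cast(self, packet, only_to=None):
        # Best-effort broadcast: one point-to-point send per destination.
        # only_to (hypothetical) models a sender that crashes mid-broadcast.
        for pid, proc in self.network.items():
            if only_to is None or pid in only_to:
                proc.beb_deliver(packet)

    def rb_cast(self, msg_id, payload, only_to=None):
        self.beb_cast((msg_id, self.pid, payload), only_to)

    def beb_deliver(self, packet):
        msg_id, origin, payload = packet
        if self.crashed or msg_id in self.seen:
            return                              # crashed, or duplicate
        self.seen[msg_id] = (origin, payload)
        self.delivered.append(payload)          # RBdeliver
        if origin in self.suspected:
            self.beb_cast(packet)               # origin already suspected: relay

    def on_crash_detected(self, crashed_pid):
        # Relay every message already delivered from the crashed process.
        self.suspected.add(crashed_pid)
        for msg_id, (origin, payload) in list(self.seen.items()):
            if origin == crashed_pid:
                self.beb_cast((msg_id, origin, payload))


network = {}
for pid in ("p1", "p2", "p3"):
    network[pid] = RbProcess(pid, network)

network["p1"].rb_cast("id-1", "m", only_to={"p1", "p2"})  # p3 is never reached
network["p1"].crashed = True
network["p2"].on_crash_detected("p1")                     # p2 relays m to p3
network["p3"].on_crash_detected("p1")
```

After the relay, p3 has delivered m exactly once even though p1 crashed before reaching it.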

  19. properties: • validity – if a correct process p broadcasts a message m, then p eventually delivers m • integrity – for a message m, a correct process q delivers m at most once and only if m was previously broadcast by some process p • agreement (a liveness property) – if a correct process p delivers a message m, then m is eventually delivered by every correct process q

  20. Is the following run acceptable? • process p executes RBcast(m) and later crashes; some process q RBdelivers m and then crashes; all other processes are correct, but none of them RBdelivers m • yes: p crashes, so validity is not violated; q crashes too, so agreement, which only constrains correct processes, is not violated either

  21. uniform reliable broadcast (URBcast) • consider the scenario discussed earlier: process p1 executes RBcast(m) and later crashes; some process p2 RBdelivers m and then crashes; all other processes are correct, but none of them RBdelivers m; this run satisfies reliable broadcast, yet it seems to be lacking in some respect

  22. the problem is that q RBdelivers m first, and only takes a step to rebroadcast it if it detects the failure of the source • URBcast ensures that a process (correct or not) delivers the message only when it knows that the message has been seen (BEBdelivered) by all correct processes • the URB property is important when, say, processes interact with the outside world: the fact that a process has delivered a message matters even if it crashes afterwards, because before crashing it might have communicated with the external world; other processes must be aware of this situation

  23. the agreement property is replaced by uniform agreement – if some process p (correct or not) delivers a message m, then m is eventually delivered by every correct process q • the reliable channel assumption holds: if p executes send(m) to q and q is correct, then q eventually receives m

  24. operations: • at p, URBcast(m): BEBcast(m) • at q, on BEBdeliver(m): if m is received by q for the first time and q ≠ p, then BEBcast(m), followed by URBdeliver(m)

  25. Causal order broadcast (COBcast) • reliable broadcast does not guarantee any ordering among messages delivered by different processes • single source FIFO ordering is a special case of causal ordering where messages from the same process should be delivered in the order they were broadcast

  26. practical scenario: • on a publish-subscribe whiteboard, p1 broadcasts a proposal m1 to all, to which p2 (after seeing it) replies with a comment m2 to all • here m1 → m2 • due to an arbitrary delay, p3 receives m2 before m1 and has to withhold m2 • a suitable ‘middleware’ for causal ordering would relieve the programmer from performing such a task

  27. we say that a message m1 may potentially have caused another message m2 (written m1 → m2) if any of the following applies • m1 and m2 were broadcast by the same process p and m1 was broadcast before m2 • m1 was delivered by process p, m2 was broadcast by process p, and m2 was broadcast after the delivery of m1 • there exists some message m’ such that m1 → m’ and m’ → m2

  28. [figure]

  29. additional property: • causal delivery – no process p delivers a message m2 unless p has already delivered every message m1 such that m1 → m2 • causally ordered broadcast can be achieved in the presence of crash failures • when RBcast is replaced by URBcast, we get a uniform reliable causally ordered broadcast • two implementations are discussed:

  30. no-waiting causal broadcast • whenever a process RBdelivers m, it COdelivers m without waiting for other messages to be RBdelivered • algorithm outline: • each message m carries a control field past_m which includes all messages that causally precede m

  31. when a message m is RBdelivered, past_m is inspected first: all messages in past_m that have not yet been COdelivered must be COdelivered before m itself is COdelivered • each process memorises all messages it has COBcast or COdelivered in a variable past_list • past_list and past_m are ordered sets

  32. at pi:
      init: past_list = delivered_list = empty;
      upon <COBcast(m)> {
        RBcast(m, past_list);
        past_list = past_list ∪ {m}; }
      upon <RBdeliver(pj, past_m, m)>
        if (m ∉ delivered_list) then {
          for all messages m’ ∈ past_m not delivered so far {
            COdeliver(m’) in deterministic order;
            delivered_list = delivered_list ∪ {m’};
            past_list = past_list ∪ {m’}; }
          COdeliver(pj, m);
          delivered_list = delivered_list ∪ {m};
          past_list = past_list ∪ {m}; }
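
A minimal Python sketch of the no-waiting algorithm, under stated assumptions: the class `NoWaitCausal` is a hypothetical name, messages are identified by (sender, payload) pairs, and the underlying RBcast is replaced by handing packets to each process by hand so that arrival order can be controlled.

```python
class NoWaitCausal:
    def __init__(self, pid):
        self.pid = pid
        self.past = []          # ordered causal past: (sender, msg) pairs
        self.delivered = []     # COdelivered (sender, msg) pairs, in order

    def co_broadcast(self, msg):
        """Build the packet that would be handed to RBcast."""
        packet = (self.pid, list(self.past), msg)   # msg travels with its past
        self.past.append((self.pid, msg))
        return packet

    def rb_deliver(self, packet):
        sender, past_m, msg = packet
        if (sender, msg) in self.delivered:
            return                      # already COdelivered through some past_m
        for item in past_m:             # deliver the missing causal past first
            if item not in self.delivered:
                self._co_deliver(item)
        self._co_deliver((sender, msg))

    def _co_deliver(self, item):
        self.delivered.append(item)
        if item not in self.past:
            self.past.append(item)


p1, p2, p3 = NoWaitCausal("p1"), NoWaitCausal("p2"), NoWaitCausal("p3")

pkt1 = p1.co_broadcast("m1")
p1.rb_deliver(pkt1)
p2.rb_deliver(pkt1)             # p2 sees m1 ...
pkt2 = p2.co_broadcast("m2")    # ... and replies with m2, so m1 -> m2
p2.rb_deliver(pkt2)

p3.rb_deliver(pkt2)             # m2 overtakes m1 on the way to p3
p3.rb_deliver(pkt1)             # the late copy of m1 is discarded
```

At p3, m1 is COdelivered out of the past carried by m2, so no waiting is needed; the price is that every packet carries its whole causal history.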

  33. in the figure above, p4 RBdelivers m2 first, but since the message carries m1 in its past_m, m1 and m2 are COdelivered in order; finally, when m1 is RBdelivered from p1, it is discarded • weakness: messages grow long because of the past causal history they carry

  34. waiting causal order broadcast • instead of keeping a record of all past messages, history is now represented by vector clocks • vector clocks essentially capture the causal precedence between messages • waiting COBcast relies, as before, on the underlying RBcast and RBdeliver primitives

  35. every process p maintains a vector clock that represents the number of messages that p has COdelivered from every other process, i.e., VCp[j], j = 1..n, j ≠ p, and the number of messages it has itself COBcast, i.e., VCp[p] • this vector is then attached to every message m that p COBcasts • a process q that RBdelivers m interprets this vector time stamp to determine how many messages are missing (if any), and from which process

  36. as far as the previous messages from p are concerned, this is VCp[p]-1, plus all messages delivered by p before it sent m, that is VCp[k], k ≠ p • process q needs to COdeliver all these missing messages before it can COdeliver m

  37. at p2, the vector time stamp [0,2,0] is interpreted as follows: one message from p1 is still pending, one message from p1 has already been RBdelivered but is waiting to be COdelivered, and none is pending from p0

  38. at pi:
      init: pending = empty; ∀ i,j: VCi[j] = 0;
            pending list ordered in increasing order of vector time
      upon COBcast(m) {
        VCi[i]++;
        COdeliver(pi, m); / receive locally
        RBcast(VCi, pi, m); }
      upon RBdeliver(VCj, pj, m) {
        for i ≠ j / ignore messages from self
          augment pending with (VCj, pj, m);
        wait until VCj[j] = VCi[j]+1 and ∀ k ≠ j: VCj[k] ≤ VCi[k]; {
          remove (VCj, pj, m) from pending;
          COdeliver(pj, m);
          VCi[j]++; } }
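
The waiting variant can be sketched in Python as below. This is an illustration under assumptions: `VectorCausal` is a hypothetical name, processes are indexed 0..n-1, the sender increments its own entry before the clock is attached (so the first message of a process carries a 1 in the sender's slot), and RBcast is again replaced by handing packets over by hand.

```python
class VectorCausal:
    def __init__(self, index, n):
        self.i = index
        self.vc = [0] * n       # vc[j]: messages COdelivered from j (own: COBcast)
        self.pending = []       # RBdelivered packets waiting for their causal past
        self.delivered = []     # COdelivered (sender index, msg) pairs, in order

    def co_broadcast(self, msg):
        self.vc[self.i] += 1
        self.delivered.append((self.i, msg))    # deliver locally
        return (list(self.vc), self.i, msg)     # packet for RBcast

    def rb_deliver(self, packet):
        if packet[1] == self.i:
            return                              # ignore messages from self
        self.pending.append(packet)
        self._drain()

    def _deliverable(self, ts, j):
        # Next message from j, and nothing j had delivered is missing here.
        return ts[j] == self.vc[j] + 1 and all(
            ts[k] <= self.vc[k] for k in range(len(ts)) if k != j)

    def _drain(self):
        progress = True
        while progress:                         # repeat until nothing more unblocks
            progress = False
            for packet in list(self.pending):
                ts, j, msg = packet
                if self._deliverable(ts, j):
                    self.pending.remove(packet)
                    self.delivered.append((j, msg))
                    self.vc[j] += 1
                    progress = True


p0, p1, p2 = (VectorCausal(i, 3) for i in range(3))

pkt1 = p0.co_broadcast("m1")    # carries [1, 0, 0]
p1.rb_deliver(pkt1)
pkt2 = p1.co_broadcast("m2")    # carries [1, 1, 0], so m1 -> m2

p2.rb_deliver(pkt2)             # m2 must wait: p2 has nothing from p0 yet
waiting = list(p2.pending)
p2.rb_deliver(pkt1)             # m1 arrives, then m2 becomes deliverable
```

Compared with the no-waiting variant, packets stay small (one vector of n counters), but m2 has to sit in `pending` until m1 arrives.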

  39. Total order broadcast (TOBcast) • causal order broadcast enforces a global ordering for all messages that are causally dependent on each other • messages that are not causally dependent are said to be concurrent and can be delivered in any order • a total order abstraction orders all messages, even those that are concurrent • it is sometimes possible to have a total order that does not respect causal order • a convenient abstraction for managing replicated state machines (e.g., in fault tolerant servers)

  40. totally ordered reliable broadcast cannot be achieved in the presence of crash failures when the underlying communication is asynchronous • this is because totally ordered broadcast is equivalent to consensus; recall that consensus cannot be solved in an asynchronous system with failures (FLP result) • assumptions: asynchronous with no process failures, or synchronous with fail-stop processes • how do we achieve causal-total order broadcast?

  41. properties: • validity – if a correct process p broadcasts a message m, then p eventually delivers m • integrity – for a message m, a correct process q delivers m at most once, and only if m was previously broadcast by some process p • uniform agreement (atomicity in delivery) – if a process p delivers a message m, then m is eventually delivered by every correct process q • uniform total order (an order property) – if a process (correct or faulty) p delivers a message m1 before m2, then every process delivers m2 only after it has delivered m1.

  42. algorithm 1 – asynchronous with no process failures • assume reliable (a stronger condition under the no-failure assumption) and single source FIFO channels (each process stamps sequence numbers) • each process maintains an increasing counter, a time stamp, which is attached to every message it broadcasts • each process also maintains a vector with estimates of the time stamps of all others

  43. suppose ts[j] is the vector element on pi that corresponds to pj; it says that pi will never again receive a message from pj with a time stamp smaller than or equal to this value • processes use special time stamp update messages to keep the estimates up to date • RBdelivered messages are queued in a pending list in increasing order of <time stamp ts(m) : pid> pairs, say ts(m)^; the pid is used to break a tie • ABdeliver can be done for the message at the head of the pending list once its time stamp is no greater than any element of the current vector time of the process

  44. at pi: (0 ≤ i ≤ n-1)
      init: ts[j] = 0 (0 ≤ j ≤ n-1); pending = empty;
      ABcast(m) {
        ts[i]++;
        add (m, ts[i], pi) to pending;
        RBcast(m, ts[i], pi); }
      upon RBdeliver(m, ts(msg), pj), j ≠ i / ignore self msg {
        ts[j] = ts(msg);
        add (m, ts(msg), pj) to pending;
        if (ts(msg) > ts[i]) then {
          ts[i] = ts(msg);
          RBcast(new_ts, ts[i], pi); } }
      upon RBdeliver(new_ts, ts(new_ts), pj), j ≠ i / ignore self msg
        ts[j] = ts(new_ts);
      delivery_test() / at any time
        while (m, ts(msg), pj) at head of pending list and ∀ k: ts(msg) ≤ ts[k] {
          remove (m, ts(msg), pj) from pending;
          ABdeliver(m); }
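
The time stamp algorithm can be exercised with a small simulation. The sketch below is an assumption-laden illustration: `TsProcess` and `run` are hypothetical names, each ordered pair of processes gets its own FIFO queue to stand in for a reliable FIFO channel, no process fails, and a seeded random scheduler picks which channel delivers next.

```python
import random
from collections import deque


class TsProcess:
    def __init__(self, i, n, channels):
        self.i, self.n = i, n
        self.channels = channels    # channels[(src, dst)] is a FIFO deque
        self.ts = [0] * n           # ts[i]: own counter, ts[j]: estimate for pj
        self.pending = []           # (time stamp, sender, msg) triples
        self.delivered = []         # ABdelivered messages, in order

    def _send_all(self, kind, payload):
        for dst in range(self.n):
            if dst != self.i:
                self.channels[(self.i, dst)].append((kind, self.ts[self.i], payload))

    def ab_cast(self, msg):
        self.ts[self.i] += 1
        self.pending.append((self.ts[self.i], self.i, msg))
        self._send_all("msg", msg)
        self._delivery_test()

    def receive(self, src, packet):
        kind, stamp, payload = packet
        self.ts[src] = stamp
        if kind == "msg":
            self.pending.append((stamp, src, payload))
            if stamp > self.ts[self.i]:
                self.ts[self.i] = stamp
                self._send_all("new_ts", None)  # tell the others our new time
        self._delivery_test()

    def _delivery_test(self):
        self.pending.sort(key=lambda e: (e[0], e[1]))   # pid breaks ties
        while self.pending and all(self.pending[0][0] <= t for t in self.ts):
            self.delivered.append(self.pending.pop(0)[2])


def run(n, broadcasts, seed):
    rng = random.Random(seed)
    channels = {(s, d): deque() for s in range(n) for d in range(n) if s != d}
    procs = [TsProcess(i, n, channels) for i in range(n)]
    for sender, msg in broadcasts:
        procs[sender].ab_cast(msg)
    while True:
        busy = [key for key, queue in channels.items() if queue]
        if not busy:
            break
        src, dst = rng.choice(busy)             # arbitrary interleaving
        procs[dst].receive(src, channels[(src, dst)].popleft())
    return [p.delivered for p in procs]


orders = run(3, [(0, "a"), (1, "b"), (2, "c"), (0, "d")], seed=1)
```

Whatever interleaving the scheduler picks, every entry of `orders` should be the same sequence, because a message is only released once no process can still send something with a smaller time stamp.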

  45. total order broadcast with time stamps

  46. Total order broadcast by consensus • uses reliable broadcast and consensus as building blocks • messages are first disseminated using a reliable broadcast primitive and are stored in a bag of unordered messages at every process • processes then use consensus to order the messages in the bag

  47. the algorithm works in rounds • there is one consensus instance per round • the messages to be delivered in a round are agreed upon before proceeding to the next round • RBcast can be replaced with URBcast to give ‘uniform total order broadcast’ • algorithm 2 – synchronous with fail-stop processes

  48. [figure]

  49. init: unordered = delivered = empty; round = 1; wait = false;
      TOBcast(m) { RBcast(m); }
      upon RBdeliver(m) {
        if (m ∉ delivered) then unordered = unordered ∪ {m}; }
      upon ((unordered ≠ empty) ∧ (wait = false)) {
        wait = true;
        propose(round, unordered); } / propose() and decide() are consensus primitives
      upon (m’ = decide(round)) { / may take f+1 rounds in case of failures
        delivered = delivered ∪ m’;
        unordered = unordered \ m’;
        TOdeliver(m’);
        round++;
        wait = false; }
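
The round structure can be illustrated with the sketch below. It is not a real consensus protocol: `FirstProposalConsensus` is a toy stand-in (an assumption) in which the first proposal made in a round becomes that round's decision, and `TobProcess` with its `step` method is a hypothetical name. Messages inside a decided batch are delivered in sorted order so that every process uses the same deterministic order.

```python
class FirstProposalConsensus:
    """Toy stand-in for consensus: the first proposal in a round wins."""

    def __init__(self):
        self.decisions = {}

    def propose(self, round_no, value):
        # Returns the decision of the round, fixing it if none exists yet.
        return self.decisions.setdefault(round_no, frozenset(value))


class TobProcess:
    def __init__(self, consensus):
        self.consensus = consensus
        self.unordered = set()      # RBdelivered but not yet ordered
        self.delivered = []         # TOdelivered messages, in order
        self.round = 1

    def rb_deliver(self, msg):
        if msg not in self.delivered:
            self.unordered.add(msg)

    def step(self):
        """Run one consensus round if there is anything to order."""
        if not self.unordered:
            return
        batch = self.consensus.propose(self.round, self.unordered)
        for msg in sorted(batch):               # deterministic order in a batch
            if msg not in self.delivered:
                self.delivered.append(msg)      # TOdeliver
        self.unordered -= batch
        self.round += 1


consensus = FirstProposalConsensus()
p, q = TobProcess(consensus), TobProcess(consensus)

p.rb_deliver("b")
p.step()                # round 1 decides {b}
q.rb_deliver("a")
q.rb_deliver("b")
q.step()                # q adopts the round 1 decision, a stays unordered
q.step()                # round 2 decides {a}
p.rb_deliver("a")
p.step()                # p adopts the round 2 decision
```

Although p and q RBdeliver a and b in different orders, both end up with the same TOdelivery sequence, because the order is fixed by the per-round decisions rather than by arrival order.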

  50. Terminating reliable broadcast (TRBcast) • uniform reliable broadcast says that if some process p (correct or not) delivers a message m, then m is eventually delivered by every correct process q • however, q cannot decide whether it should wait for m or not: q has no means to distinguish the case where some process has delivered m, so that q can indeed wait for m, from the case where no process will ever deliver m, in which case q should definitely not keep waiting for m
