
Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms

Authors: Ozalp Babaoglu and Keith Marzullo. Distributed Systems: 526 U1580. Professor: Ching-Chi Hsu.

Presentation Transcript


  1. Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms • Authors: Ozalp Babaoglu and Keith Marzullo • Distributed Systems: 526 U1580 • Professor: Ching-Chi Hsu

  2. Introduction • Many problems in distributed computing can be cast as executing some notification or reaction when the state of the system satisfies a particular condition • Global Predicate Evaluation (GPE): establishing the truth of a Boolean expression whose variables may refer to the global system state • A global state may not be consistent • Asynchronous system: • no bounds on the relative speeds of processes or on message delays • impossible to maintain synchronized local clocks • communication remains the only possible mechanism for synchronization • channels are reliable but may deliver messages out of order

  3. Outline • Two classes of solutions to the GPE problem: • A reactive architecture: each process, when executing an event, notifies the monitor p0 by sending it a message describing the event • A snapshot architecture: the monitor p0 sends each process a 'state enquiry' message

  4. Definitions (1) • distributed system: a collection of sequential processes p1, p2, ..., pn connected by unidirectional communication channels • events: the activity of each sequential process, consisting of internal events and communication events send(m) or receive(m) with another process • local history of process p_i: h_i = e_i^1 e_i^2 ... (e_i^k, also written eik, is the k-th event of p_i) • global history: H = h_1 ∪ h_2 ∪ ... ∪ h_n • cause-effect relation '→': • If e_i^k, e_i^l ∈ h_i and k < l, then e_i^k → e_i^l • If e_i = send(m) and e_j = receive(m), then e_i → e_j • If e → e' and e' → e'', then e → e'' • Concurrent e || e': neither e → e' nor e' → e

  5. Definitions (2) • distributed computation: a partially ordered set defined by the pair (H, →) • space-time diagram: graphical representation of a distributed computation [Figure: space-time diagram with events e11 ... e16 on p1, e21 ... e23 on p2 and e31 ... e36 on p3]

  6. Definitions (3) • the local state of p_i immediately after executing event e_i^k is denoted σ_i^k • global state: Σ = (σ_1, ..., σ_n) • a cut C, specified by the tuple (c1, ..., cn), is a subset of the global history H that contains an initial prefix of each local history, i.e. C = h_1^c1 ∪ ... ∪ h_n^cn • a run R is a total ordering of all events in H that is consistent with each local history • Example: pp6 • Note that a single distributed computation may have many runs

  7. Example • Inconsistent cut and phantom deadlock [Figure: the space-time diagram of p1, p2 and p3 annotated with req and resp messages and two cuts, C and C']

  8. Consistency • A consistent cut C is such that • ∀ e, e': (e ∈ C) ∧ (e' → e) ⇒ e' ∈ C • A consistent global state is one corresponding to a consistent cut • A consistent run R is such that • ∀ e, e': (e → e') ⇒ e appears before e' in R • Example: pp6 • If a run is consistent, then all the global states in the sequence it passes through are consistent as well

  9. Observing Distributed Computations • The monitor p0 assumes a passive role in that it does not send any messages of its own • The application processes notify p0 by sending it a message whenever they execute an event • The monitor p0 constructs an observation of the underlying distributed computation from the order in which these notifications arrive • Due to the variability of message delays, an observation can correspond to a consistent run, an inconsistent run or no run at all • O1 = e21 e11 e31 e32 e34 e12 e22 e33 e13 e14 e35 ... => not a run • O2 = e11 e31 e21 e32 e12 e33 e34 e13 e22 e35 e36 ... => inconsistent run • O3 = e31 e21 e11 e12 e32 e33 e13 e34 e14 e22 e15 ... => consistent run • Message order can be restored by defining a delivery rule that decides when received messages are presented to the application process

  10. FIFO delivery • First-In-First-Out (FIFO) delivery • for all messages m and m' from p_i to p_j: • send_i(m) → send_i(m') ⇒ deliver_j(m) → deliver_j(m') • FIFO can be implemented by adding sequence numbers to messages • While FIFO delivery is sufficient to guarantee that observations correspond to runs, it is not sufficient to guarantee consistent observations
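
The sequence-number idea can be sketched as follows. This is a minimal illustration, not code from the presentation: the class name `FifoReceiver` and its fields are hypothetical, and it assumes each sender numbers its messages 0, 1, 2, ... per destination.

```python
class FifoReceiver:
    """Delivers messages from each sender in the order they were sent."""

    def __init__(self):
        self.next_seq = {}   # sender -> next sequence number expected
        self.pending = {}    # sender -> {seq: payload} received too early
        self.delivered = []  # (sender, payload) in delivery order

    def receive(self, sender, seq, payload):
        """Called when the channel hands over a message; delivers zero or more messages."""
        buffered = self.pending.setdefault(sender, {})
        buffered[seq] = payload
        expected = self.next_seq.get(sender, 0)
        # Deliver the longest run of consecutive sequence numbers available.
        while expected in buffered:
            self.delivered.append((sender, buffered.pop(expected)))
            expected += 1
        self.next_seq[sender] = expected


receiver = FifoReceiver()
receiver.receive("p1", 1, "b")  # arrives early, is buffered
receiver.receive("p1", 0, "a")  # unblocks both "a" and "b"
receiver.receive("p1", 2, "c")
print(receiver.delivered)
```

Receipt and delivery are kept separate here: a message that arrives out of order is held back until every earlier message from the same sender has been delivered.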

  11. Observing Distributed Computations with Real-Time Clocks • Environment: • message delays are bounded by δ • channels are FIFO • a global real-time clock exists • each message includes RC(e), the value of the global real-time clock when event e occurs, as its timestamp • DR1: • At time t, deliver all received messages with timestamps up to t − δ in increasing timestamp order • Observations are consistent if the following is satisfied • Clock condition: e → e' ⇒ RC(e) < RC(e')
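
A small sketch of delivery rule DR1, assuming integer clock readings and a known delay bound passed in as `delta`. The class `RealTimeMonitor` and the method names are illustrative only.

```python
import heapq


class RealTimeMonitor:
    """Buffers notifications and releases them according to DR1."""

    def __init__(self, delta):
        self.delta = delta       # assumed upper bound on message delay
        self.buffer = []         # min-heap of (timestamp, arrival order, event)
        self.arrivals = 0
        self.observation = []    # events in delivery order

    def receive(self, timestamp, event):
        heapq.heappush(self.buffer, (timestamp, self.arrivals, event))
        self.arrivals += 1

    def tick(self, now):
        """At time `now`, deliver all buffered messages with timestamp <= now - delta."""
        while self.buffer and self.buffer[0][0] <= now - self.delta:
            _, _, event = heapq.heappop(self.buffer)
            self.observation.append(event)


monitor = RealTimeMonitor(delta=5)
monitor.receive(3, "e21")   # arrives first although it happened later
monitor.receive(1, "e11")
monitor.tick(7)             # only timestamps <= 2 are safe to deliver
monitor.tick(10)            # now timestamps <= 5 are safe
print(monitor.observation)
```

Waiting for the delay bound to elapse is what makes the rule safe: once the clock reads t, no message with a timestamp of t minus the bound or earlier can still be in transit.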

  12. Observing Distributed Computations with Logical Clocks • Environment: • channels are FIFO • asynchronous communication • logical clocks are implemented • each message includes LC(e), the value of the logical clock when event e occurs, as its timestamp • DR2: • Deliver all messages that are stable at p0 in increasing timestamp order • Note: a message m is stable at p if no future message with timestamp smaller than TS(m) can be received by p • Given FIFO channels, m is stable at p0 once p0 has received at least one message with timestamp greater than TS(m) from every other process
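
The stability test behind DR2 can be sketched as below. This assumes FIFO channels and strictly increasing timestamps per sender, and it breaks ties between equal timestamps by process name, which is a choice made for this example. `LogicalClockMonitor` is a hypothetical name.

```python
import heapq


class LogicalClockMonitor:
    """Delivers notifications in timestamp order once they are stable (DR2)."""

    def __init__(self, processes):
        self.latest = {p: 0 for p in processes}  # highest timestamp seen per process
        self.buffer = []                         # min-heap of (timestamp, sender, event)
        self.observation = []

    def _stable(self, timestamp, sender):
        # Stable once every other process has sent something with a larger timestamp.
        return all(ts > timestamp for p, ts in self.latest.items() if p != sender)

    def receive(self, sender, timestamp, event):
        self.latest[sender] = max(self.latest[sender], timestamp)
        heapq.heappush(self.buffer, (timestamp, sender, event))
        while self.buffer and self._stable(self.buffer[0][0], self.buffer[0][1]):
            self.observation.append(heapq.heappop(self.buffer)[2])


monitor = LogicalClockMonitor(["p1", "p2"])
monitor.receive("p2", 2, "b")   # not stable: nothing heard from p1 yet
monitor.receive("p1", 1, "a")   # "a" is stable because p2 already sent timestamp 2
monitor.receive("p1", 3, "c")   # makes "b" stable; "c" waits for p2
print(monitor.observation)
```

A process that stops sending blocks delivery of everyone else's messages under this rule, so a practical monitor would need the processes to keep communicating.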

  13. Logical Clocks • Logical Clock • each process p_i maintains a local variable LC_i • when a new event e_i occurs, p_i sets LC_i to • LC_i + 1 if e_i is an internal or send event • max{LC_i, TS(m)} + 1 if e_i = receive(m) [Figure: the space-time diagram annotated with logical clock values; p1: 1, 2, 4, 5, 6, 7; p2: 1, 5, 6; p3: 1, 2, 3, 4, 5, 7]
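
The two update rules translate directly into code. The sketch below uses a hypothetical `LamportClock` class, with timestamps passed around as plain integers.

```python
class LamportClock:
    """Logical clock LC_i of a single process."""

    def __init__(self):
        self.value = 0

    def tick(self):
        """Internal or send event: LC := LC + 1. Returns the event's timestamp."""
        self.value += 1
        return self.value

    def receive(self, timestamp):
        """Receive event: LC := max(LC, TS(m)) + 1."""
        self.value = max(self.value, timestamp) + 1
        return self.value


p1, p3 = LamportClock(), LamportClock()
p1.tick()                  # p1: 1
p1.tick()                  # p1: 2
for _ in range(3):
    ts = p3.tick()         # p3: 1, 2, 3; the last event sends a message
print(p1.receive(ts))      # p1 jumps past the sender's timestamp
```

The receive rule is what makes the clock condition hold across processes: the receiver's clock is pushed beyond the timestamp carried by the message.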

  14. Observing Distributed Computations with Causal Delivery • Causal Delivery (CD): for messages m and m' sent to the same process p_k, • send_i(m) → send_j(m') ⇒ deliver_k(m) → deliver_k(m') • If p0 uses a delivery rule satisfying CD, then all of its observations will be consistent

  15. Efficient Delivery • For implementing causal delivery, what is really needed is an effective procedure for deciding: • given events e, e' that are causally related and their clock values, does there exist some other event e'' such that e → e'' → e'? • Given RC(e) < RC(e') (or LC(e) < LC(e')), it may be that • e → e' or e || e', i.e. all we can conclude is ¬(e' → e) • These observations suggest a timing mechanism TC whereby causal precedence relations between events can be deduced from their timestamps • Strong Clock Condition: • e → e' ≡ TC(e) < TC(e')

  16. Causal History (1) • Causal history of event e • θ(e) = { e' ∈ H | e' → e } ∪ {e} • That is, θ(e) is the smallest consistent cut that includes e [Figure: the space-time diagram of p1, p2 and p3 with the causal history of event e14 highlighted]

  17. Causal Histories (2) • Maintaining Causal History • Each process p_i initializes local variable θ_i to ∅ • Each message m contains a timestamp TS(m) which is the causal history of its send event • Scheme • If e_i is an internal or send event, • then θ_i = {e_i} ∪ the causal history of the previous local event • If e_i is the receive of message m by process p_i from p_j, • then θ_i = {e_i} ∪ the causal history of the previous local event of p_i ∪ the causal history of the corresponding send event at p_j • The strong clock condition is satisfied if clock comparison is interpreted as set inclusion • e → e' ≡ θ(e) ⊂ θ(e'), or e → e' ≡ e ∈ θ(e') if e ≠ e' • Problem: causal histories grow rapidly
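
As a rough sketch of this scheme, each process can carry its causal history as a set of (process, index) pairs. The names `HistoryProcess` and `happened_before` are made up for this example.

```python
class HistoryProcess:
    """A process that tracks the causal history of its latest event as a set."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.theta = frozenset()   # causal history of the previous local event

    def _execute(self, received=frozenset()):
        self.count += 1
        event = (self.name, self.count)
        # New history = previous local history, plus the sender's history, plus the event.
        self.theta = self.theta | received | {event}
        return event, self.theta

    def internal_or_send(self):
        return self._execute()

    def receive(self, timestamp):
        """`timestamp` is the causal history attached to the message."""
        return self._execute(timestamp)


def happened_before(e, e_prime, theta_of_e_prime):
    """e -> e' holds iff e differs from e' and e is in the causal history of e'."""
    return e != e_prime and e in theta_of_e_prime


p1, p2 = HistoryProcess("p1"), HistoryProcess("p2")
a, theta_a = p1.internal_or_send()   # p1 sends a message
b, theta_b = p2.internal_or_send()   # independent event at p2
c, theta_c = p2.receive(theta_a)     # p2 receives p1's message
print(sorted(theta_c))
```

Every message carries a copy of the whole set, which illustrates why this representation does not scale and why a more compact encoding is desirable.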

  18. Vector Clocks • The causal history of an event can be represented as a fixed-dimensional vector VC(e)[1..n] rather than a set, where • VC(e)[i] = k iff θ_i(e) = h_i^k, for i = 1, 2, ..., n (θ_i(e) is the part of θ(e) that belongs to p_i) [Figure: the space-time diagram annotated with vector clock values; p1: (1,0,0), (2,1,0), (3,1,3), (4,1,3), (5,1,3), (6,1,3); p2: (0,1,0), (1,2,4), (4,3,4); p3: (0,0,1), (1,0,2), (1,0,3), (1,0,4), (1,0,5), (1,0,6)]

  19. Maintaining Vector Clocks • Maintaining Vector Clocks • Each process p_i maintains a local vector VC_i[1..n] • Each message m contains a timestamp TS(m) which is the vector clock value VC(e) of its send event e • Scheme • if e_i is an internal or send event • VC_i[i] = VC_i[i] + 1, and VC(e_i) = VC_i • if e_i = receive(m) • VC_i = max{VC_i, TS(m)} (component-wise) • VC_i[i] = VC_i[i] + 1 • VC(e_i)[j] is the number of events of p_j that causally precede event e_i of p_i • V < V' ≡ (V ≠ V') ∧ (∀k: 1 ≤ k ≤ n: V[k] ≤ V'[k])
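
The update scheme and the comparison V < V' can be sketched as follows. Process indices are 0-based here, and `VectorClock` and `less_than` are names chosen for this example. The usage assumes a message from the first event of p2 to the second event of p1, and one from the first event of p1 to the second event of p3.

```python
class VectorClock:
    """Vector clock VC_i of process i in a system of n processes (0-based index)."""

    def __init__(self, n, i):
        self.i = i
        self.v = [0] * n

    def tick(self):
        """Internal or send event: increment the local component."""
        self.v[self.i] += 1
        return tuple(self.v)

    def receive(self, timestamp):
        """Receive event: component-wise max with TS(m), then increment locally."""
        self.v = [max(a, b) for a, b in zip(self.v, timestamp)]
        self.v[self.i] += 1
        return tuple(self.v)


def less_than(v, w):
    """V < W iff V != W and V[k] <= W[k] for every k."""
    return v != w and all(a <= b for a, b in zip(v, w))


p1, p2, p3 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
e11 = p1.tick()          # (1, 0, 0)
e21 = p2.tick()          # (0, 1, 0)
e12 = p1.receive(e21)    # (2, 1, 0)
e31 = p3.tick()          # (0, 0, 1)
e32 = p3.receive(e11)    # (1, 0, 2)
print(e12, e32, less_than(e11, e32), less_than(e21, e32))
```

Unlike a scalar logical clock, two timestamps that are incomparable under `less_than` identify concurrent events.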

  20. Properties of Vector Clocks • Properties of Vector Clocks • Strong Clock Condition: e → e' ≡ VC(e) < VC(e') • Simple Strong Clock Condition: for e_i of p_i and e_j of p_j with i ≠ j, e_i → e_j ≡ VC(e_i)[i] ≤ VC(e_j)[i] • Concurrent • e_i || e_j ≡ (VC(e_i)[i] > VC(e_j)[i]) ∧ (VC(e_j)[j] > VC(e_i)[j]) • Pairwise Inconsistent • for i ≠ j, e_i and e_j are pairwise inconsistent ≡ (VC(e_i)[i] < VC(e_j)[i]) ∨ (VC(e_j)[j] < VC(e_i)[j]) • Consistent Cut: the cut (c1, c2, ..., cn) is consistent iff • ∀ i, j: 1 ≤ i, j ≤ n: VC(e_i^ci)[i] ≥ VC(e_j^cj)[i] • Counting: the number of events that causally precede e is given by #(e) • #(e) = (Σ_{j=1..n} VC(e)[j]) − 1 • Weak Gap-Detection: given e_i and e_j, • if VC(e_i)[k] < VC(e_j)[k] for some k ≠ j, • then there exists an event e_k such that ¬(e_k → e_i) ∧ (e_k → e_j)
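
Three of these properties are easy to check mechanically. In the sketch below the function names are illustrative, indices are 0-based, and a cut is given by the vector clock of the last event it includes from each process (the zero vector if it includes none). The sample timestamps are the ones listed for p1, p2 and p3 on slide 18.

```python
def concurrent(vc_i, i, vc_j, j):
    """e_i || e_j for events of distinct processes i and j."""
    return vc_i[i] > vc_j[i] and vc_j[j] > vc_i[j]


def is_consistent_cut(frontier):
    """frontier[i] is the vector clock of the last event of process i in the cut."""
    n = len(frontier)
    return all(frontier[i][i] >= frontier[j][i]
               for i in range(n) for j in range(n))


def events_before(vc):
    """Counting property: number of events that causally precede the event."""
    return sum(vc) - 1


e13, e22 = (3, 1, 3), (1, 2, 4)
print(concurrent(e13, 0, e22, 1))

good_cut = [(2, 1, 0), (0, 1, 0), (1, 0, 3)]   # frontier e12, e21, e33
bad_cut = [(1, 0, 0), (1, 2, 4), (0, 0, 1)]    # e22 depends on p3 events outside the cut
print(is_consistent_cut(good_cut), is_consistent_cut(bad_cut))
print(events_before(e13))
```

The consistent-cut test only compares each frontier event's own component against what the other frontier events know about that process, so it needs n² comparisons rather than an inspection of the whole history.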

  21. Implementing Causal Delivery with Vector Clocks • Babaoglu & Marzullo • monitor p0 maintains an array D[1..n] where D[i] contains TS(m_i)[i], m_i being the last message delivered from process p_i • DR3: • Deliver message m from process p_j as soon as both of the following are satisfied • D[j] = TS(m)[j] − 1 => guarantees FIFO • D[k] ≥ TS(m)[k], ∀k ≠ j => guarantees the causal relation • DR4: • Monitor p0 maintains a counter D • Deliver message m of event e_i as soon as • D = #(e_i) − 1
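
Delivery rule DR3 can be sketched as a monitor that buffers notifications until both conditions hold. The class `CausalMonitor` is hypothetical, process indices are 0-based, and every event is assumed to be reported to the monitor with its vector clock as timestamp.

```python
class CausalMonitor:
    """Monitor p0 applying delivery rule DR3."""

    def __init__(self, n):
        self.d = [0] * n        # d[i]: own component of the last message delivered from process i
        self.pending = []       # received but not yet deliverable
        self.observation = []

    def _deliverable(self, j, ts):
        fifo_ok = self.d[j] == ts[j] - 1
        causal_ok = all(self.d[k] >= ts[k] for k in range(len(ts)) if k != j)
        return fifo_ok and causal_ok

    def receive(self, j, ts, event):
        self.pending.append((j, ts, event))
        progress = True
        while progress:                      # one delivery may unblock others
            progress = False
            for message in list(self.pending):
                sender, stamp, name = message
                if self._deliverable(sender, stamp):
                    self.pending.remove(message)
                    self.d[sender] = stamp[sender]
                    self.observation.append(name)
                    progress = True


monitor = CausalMonitor(3)
monitor.receive(0, (2, 1, 0), "e12")   # held back: e11 and e21 not yet delivered
monitor.receive(1, (0, 1, 0), "e21")
monitor.receive(0, (1, 0, 0), "e11")   # unblocks e12 as well
print(monitor.observation)
```

The resulting observation never shows an event before one that causally precedes it, whatever order the notifications arrive in.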

  22. Causal Delivery with Vector Clocks: Examples [Figure: space-time diagram for monitor p0 (initially D = [0,0]) and processes p1 and p2, with events timestamped (1,0), (1,1), (1,2), (2,2) and (3,2)]

  23. Distributed Snapshots • In this strategy, p0 requests the states of the other processes and then combines them into a global state • Definition: • channel state: for each channel from p_i to p_j, • χ_i,j = set difference between σ_i and σ_j, i.e. the messages that σ_i records as sent to p_j but that σ_j does not record as received • incoming channels of process p_i: IN_i • outgoing channels of process p_i: OUT_i • Snapshot Protocols • Chandy and Lamport [1985] • Morgan [1985]

  24. Snapshot Protocol 1 • Assumption: • a global real-time clock RC exists • Each message carries a timestamp • Message delays are bounded • global clock algorithm • p0 sends [take snapshot at t_ss] to all processes • When clock RC reads t_ss, each process p_i does the following: • records its local state σ_i, • sends an empty message over all its outgoing channels • and starts recording all messages received over each of its incoming channels • The first time p_i receives a message from p_j with timestamp greater than or equal to t_ss, p_i stops recording messages for that channel

  25. Snapshot Protocol 2 • Assumption: • Bounded message delays • Channels are FIFO • Chandy & Lamport • p0 sends [take snapshot] to itself • Each process p_i, on receiving [take snapshot]: • If it is the first time • records its local state σ_i • sends [take snapshot] over each of its outgoing channels • starts recording messages from the other incoming channels • If it is not the first time • stops recording messages from that incoming channel
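
The marker rules can be tried out in a small single-threaded simulation. Everything below is an assumption made for illustration: three processes that transfer money, FIFO channels modelled as deques, and process 0 acting as the initiator instead of a separate monitor. The check at the end is that the recorded states plus the recorded channel contents add up to the money in the system.

```python
from collections import deque

MARKER = "take snapshot"


class Process:
    def __init__(self, pid, balance, peers):
        self.pid, self.balance, self.peers = pid, balance, peers
        self.recorded_state = None   # local state once recorded
        self.recording = {}          # incoming peer -> still recording?
        self.channel_state = {}      # incoming peer -> messages recorded in transit

    def record(self, network, skip=None):
        """First marker (or initiation): record state, send markers, start recording."""
        self.recorded_state = self.balance
        for q in self.peers:
            self.channel_state[q] = []
            self.recording[q] = q != skip   # the marker's own channel is recorded as empty
            network[(self.pid, q)].append(MARKER)

    def on_message(self, sender, msg, network):
        if msg == MARKER:
            if self.recorded_state is None:
                self.record(network, skip=sender)
            else:
                self.recording[sender] = False
        else:
            self.balance += msg
            if self.recording.get(sender):
                self.channel_state[sender].append(msg)

    def send_money(self, to, amount, network):
        self.balance -= amount
        network[(self.pid, to)].append(amount)


def deliver(network, procs, src, dst):
    procs[dst].on_message(src, network[(src, dst)].popleft(), network)


ids = [0, 1, 2]
network = {(a, b): deque() for a in ids for b in ids if a != b}
procs = {i: Process(i, 100, [q for q in ids if q != i]) for i in ids}

procs[1].send_money(0, 10, network)   # in transit on channel 1 -> 0
procs[0].record(network)              # process 0 initiates the snapshot
procs[2].send_money(1, 5, network)    # sent before process 2 records
deliver(network, procs, 0, 1)         # marker reaches process 1
deliver(network, procs, 2, 1)         # the 5 arrives after process 1 recorded
deliver(network, procs, 0, 2)         # marker reaches process 2
while any(network.values()):          # drain the remaining messages
    for (src, dst), queue in network.items():
        if queue:
            deliver(network, procs, src, dst)

recorded = sum(p.recorded_state for p in procs.values())
in_transit = sum(sum(msgs) for p in procs.values() for msgs in p.channel_state.values())
total = recorded + in_transit
print(recorded, in_transit, total)
```

No process is in the recorded state at any single instant of this execution, yet the snapshot accounts for every unit of money, which is the sense in which the constructed global state is consistent.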

  26. Chandy & Lamport (1985) [Figure: space-time diagram of the snapshot protocol with p0, p1 (events e11 ... e16 and recording event e1*) and p2 (events e21 ... e25 and recording event e2*)] • Real computation R = e21 e11 e12 e13 e22 e14 e23 e24 e15 e25 e16 • in terms of global states: Σ00 Σ01 Σ11 Σ21 Σ31 Σ32 Σ42 Σ43 Σ44 Σ54 Σ55 Σ65

  27. Properties of Snapshots • Definition • Σ^a: the global state in which the snapshot protocol is initiated, • Σ^f: the global state in which the protocol terminates and • Σ^S: the global state constructed • e_i* denotes the event when p_i receives [take snapshot] for the first time, causing p_i to record its state • let t_i be the time when e_i* occurs • e_i is a prerecording event if e_i → e_i*, • otherwise it is a postrecording event • Properties • There exists a run R' such that Σ^a ⇝ Σ^S ⇝ Σ^f in R', where ⇝ means "leads to" • That is to say, Σ^S could have happened

  28. Argumentation (1) • Chandy & Lamport (1985) • consider any (postrecording, prerecording) pair (e, e') in which e immediately precedes e' in R • then ¬(e → e') • swapping all such pairs results in another consistent run R' • swap (e13, e22): r1 = e21 e11 e12 e22 e13 e14 e23 e24 e15 e25 e16 • swap (e14, e23): r2 = e21 e11 e12 e22 e13 e23 e14 e24 e15 e25 e16 • swap (e13, e23): R' = e21 e11 e12 e22 e23 e13 e14 e24 e15 e25 e16 • the global state after executing the last prerecording event (e23) in R' is Σ^S (= Σ23), the constructed global state • If the computation had followed this run, Σ^S would have happened

  29. Argumentation (2) • Lai & Yang (1987) • Let GSN(t_i : p_i ∈ P) be a snapshot taken between times τ1 and τ2 during the computation R • Let d = τ2 − τ1, and construct R' as follows: • R' is the same as R except that every postrecording event in R is postponed by d units of time, that is • R'(t) = R(t) if R(t) is an event at p_i and t ≤ t_i • R'(t) = R(t − d) if R(t − d) is an event at p_i and t − d > t_i • R'(t) = ∅ otherwise • Example

  30. Properties of Global Predicates • Stable Predicates • Many system properties one wishes to detect have the characteristic that once they become true, they remain true • If Φ is a stable predicate, then since Σ^a ⇝ Σ^S ⇝ Σ^f: • (Φ is true in Σ^S) => (Φ is true in Σ^f) • (Φ is false in Σ^S) => (Φ is false in Σ^a) • Nonstable Predicates • the condition encoded by the predicate may not persist long enough for it to be true when the predicate is evaluated • if a predicate Φ is found to be true by the monitor, we do not know whether Φ ever held during the actual run

  31. Nonstable Predicates • Two problems • The condition encoded by the predicate may not persist long enough for it to be true when the predicate is evaluated • If a predicate Φ is found to be true by the monitor, we do not know whether Φ ever held during the actual run • The predicate may have held even if it is not detected, and even if it is detected it may never have held • Extended nonstable global predicates: apply to the entire distributed computation • Possibly(Φ): some consistent observation of the computation passes through a global state in which Φ holds • Definitely(Φ): every consistent observation of the computation passes through a global state in which Φ holds

  32. Detecting Possibly(Φ) and Definitely(Φ) • Smin(σ_i^k): the global state with the smallest level in the lattice containing σ_i^k • Smax(σ_i^k): the global state with the largest level in the lattice containing σ_i^k • Examples: Smin(σ_1^3) = Σ31, Smax(σ_1^3) = Σ33 • Smin(σ_i^k) = (σ_1^c1, σ_2^c2, …, σ_n^cn): ∀j: VC(σ_j^cj)[j] = VC(σ_i^k)[j] • Smax(σ_i^k) = (σ_1^c1, σ_2^c2, …, σ_n^cn): ∀j: VC(σ_j^cj)[i] ≤ VC(σ_i^k)[i] and ((σ_j^cj = σ_j^f) or (VC(σ_j^(cj+1))[i] > VC(σ_i^k)[i])) • The minimum level containing σ_i^k is the sum of the components of the vector timestamp VC(σ_i^k) • An algorithm for detecting Definitely(Φ): O(k^n), where k is the maximum number of events a monitored process has executed
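
For small computations, both modalities can be evaluated by brute force over the lattice of consistent cuts. The sketch below is not the algorithm referred to on the slide; it is a level-by-level search written for illustration, with hypothetical function names. A cut is a tuple giving the number of events included from each process, `vcs[i][k-1]` is the vector clock of the k-th event of process i (0-based process index), and the predicate `phi` is evaluated on the cut.

```python
from itertools import product


def is_consistent(cut, vcs):
    """A cut is consistent if no included event depends on an excluded one."""
    n = len(cut)
    return all(vcs[j][cut[j] - 1][i] <= cut[i]
               for j in range(n) if cut[j] > 0 for i in range(n))


def possibly(phi, vcs):
    """True if some consistent global state satisfies phi."""
    ranges = [range(len(history) + 1) for history in vcs]
    return any(is_consistent(cut, vcs) and phi(cut) for cut in product(*ranges))


def definitely(phi, vcs):
    """True if every path through the lattice meets a state satisfying phi."""
    n = len(vcs)
    initial, final = (0,) * n, tuple(len(history) for history in vcs)
    level = set() if phi(initial) else {initial}   # states reachable while avoiding phi
    while level:
        if final in level:
            return False                            # a whole run avoided phi
        successors = set()
        for cut in level:
            for i in range(n):
                if cut[i] < len(vcs[i]):
                    nxt = cut[:i] + (cut[i] + 1,) + cut[i + 1:]
                    if is_consistent(nxt, vcs) and not phi(nxt):
                        successors.add(nxt)
        level = successors
    return True


independent = [[(1, 0)], [(0, 1)]]     # one event per process, no messages
print(possibly(lambda c: c == (1, 0), independent),
      definitely(lambda c: c == (1, 0), independent))
print(definitely(lambda c: sum(c) == 1, independent))

with_message = [[(1, 0)], [(1, 1)]]    # the event of p2 receives from the event of p1
print(possibly(lambda c: c == (0, 1), with_message))
```

The search visits at most one entry per global state, so its cost grows with the size of the lattice, which is exponential in the number of processes in the worst case.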

  33. Example
