Computer Science 328 Distributed Systems

1 / 30

# Computer Science 328 Distributed Systems - PowerPoint PPT Presentation

Computer Science 328 Distributed Systems Lecture 6 Global States & Causality Detecting Global Properties Process Histories and States For a process P i , history(P i ) = h i = <e i 0 , e i 1 , … > prefix history(P i k ) = h i k = <e i 0 , e i 1 , …,e i k >

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Computer Science 328 Distributed Systems' - niveditha

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Computer Science 328Distributed Systems

Lecture 6

Global States & Causality

Process Histories and States
• For a process Pi,

history(Pi) = hi = <ei0, ei1, … >

prefix history(Pik) = hik = <ei0, ei1, …,eik >

Sik : Pi ‘s state immediately before kth event

• For a set of processes,

global history:H = i (hi)

global state:S = i (Siki)

a cut C  H = h1c1 h2c2 …  hncn

the frontier of C = {eici, i = 1,2, … n}

Consistent cut

Inconsistent cut

Consistent States
• A cut C is consistent if

e  C(if f  e then f  C)

• A global state S is consistent if

it corresponds to a consistent cut

e12

e10

e11

e13

P1

e21

P2

e22

e20

P3

e31

e30

e32

Global States
• ARunis a total ordering of events in H that is consistent with each hi’s ordering
• A Linearization is a run consistent with happens-before () relation in H.
• Linearizations pass through consistent global states.
• A global state Sk is reachable from global state Si, if there is a lineralization, L, that passes through Si and then through Sk.
• A DS evolves as a series of transitions between global states S0 ,S1 , ….
Global State Predicates
• Aglobal-state-predicateis a function from the set of global states to {true, false} , e.g. deadlock, termination
• A stable global-state-predicateis one that once it becomes true, it remains true in subsequent global states, e.g. deadlock
• if P is a gobal-state-predicate of being deadlocked, then

safety S reachable from S0, P(S) = false

• If P is a global-state predicate of reaching termination, then

liveness(P)  L lineralizatoins from S0  SL :passes through SL & P(SL) = true

• We need a way to record global states
The “Snapshot” Algorithm
• Records a set of process and channel states such that the combination is a consistent GS.
• Assumptions:
• No failure, all messages arrive intact, exactly once
• Communication channels are unidirectional and FIFO-ordered
• There is a comm. path between any two processes
• Any process may initiate the snapshot (sends Marker)
• Snapshot does not interfere with normal execution
• Each process records its state and the state of its incoming channels (no central collection)
The “Snapshot” Algorithm (2)

1. Marker sending rule for process Pk

• After Pk has recorded its state
• for each outgoing channel C, send a marker on C

2. Marker receiving rule for a process Pk

on receipt of a marker over channel C

• ifPk has not yet recorded its state
• record Pk’s state
• record the state of C` as “empty”
• turn on recording of messages over other incoming channels

else

• record the state of C` as all the messages received over C` since Pk saved its state
Chandy and Lamport’s ‘Snapshot’ Algorithm
• Marker receiving rule for process pi
• On pi’s receipt of a marker message over channel c:
• if (pi has not yet recorded its state) it
• records its process state now;
• records the state of c as the empty set;
• turns on recording of messages arriving over other incoming channels;
• else
• pi records the state of c as the set of messages it has received over c
• since it saved its state.
• end if
• Marker sending rule for process pi
• After pi has recorded its state, for each outgoing channel c:
• pi sends one marker message over c
• (before it sends any other message over c).

e11,2

e14

e13

M

M

e24

M

M

e21,2,3

M

M

e31

e32,3,4

1- P1 initiates snapshot: records its state (S1); sends Markers to P2 & P3; turns on recording for channels C21 and C31

2- P2 receives Marker over C12, records its state (S2), sets state(C12) = {} sends Marker to P1 & P3; turns on recording for channel C32

3- P1 receives Marker over C21, sets state(C21) = {a}

4- P3 receives Marker over C13, records its state (S3), sets state(C13) = {} sends Marker to P1 & P2; turns on recording for channel C23

5- P2 receives Marker over C32, sets state(C32) = {b}

6- P3 receives Marker over C23, sets state(C23) = {}

7- P1 receives Marker over C31, sets state(C31) = {}

Snapshot Example

e10

e13

P1

a

e23

P2

e20

b

P3

e30

Reachability Between States in Snapshot Algorithm
• Let ei and ej be events occurring at pi and pj, respectively such that ei ej
• the snapshot algorithm ensures that
• if ej is in the cut then ei is also in the cut.
• if ej occurred before pj recorded its state, then ei must have occurred before
• pi recorded its state.

Include(obj1)

obj1.method()

P2 has obj1

Causality Violation

Physical Time

1

2

P1

0

1

P2

0

5

6

2

4

P3

0

4

3

• Causality violation occurs when order of messages causes an action based on information that another host has not yet received.
• In designing a DS, potential for causality violation is important

Violation: (1,0,0) < (2,1,2)

Detecting Causality Violation

Physical Time

1,0,0

2,0,0

0,0,0

P1

(1,0,0)

P2

0,0,0

2,1,2

2,2,2

(2,0,0)

(2,0,2)

P3

0,0,0

2,0,2

2,0,1

• Potential causality violation can be detected by vector timestamps.
• If the vector timestamp of a message is less than the local vector timestamp, on arrival, there is a potential causality violation.
Communication Modes in DS
• Unicast (best effort or reliable)
• Messages are sent from exactly one host to one host
• Best effort guarantees that if a message is delivered it would be intact
• Reliable guarantees delivery of messages.
• Messages are sent from exactly one host to all hosts on the same network.
• Reliable broadcast protocols are not practical
• Multicast
• Messages are sent from exactly one host to several hosts on the same or different networks.
• Multicast can be implemented above a reliable unicast
Basic Multicast
• A basic multicast primitive guarantees a correct process will eventually deliver the message, as long as the multicaster does not crash.
• A straightforward way to implement B-multicast is to use a reliable one-to-one send operation:
• B-multicast(g,m): for each process p in g, send (p,m).
Reliable Multicast
• Integrity: A correct process pdelivers a message m at most once.
• Validity: If a correct process multicasts message m, then it will eventually deliver m.
• Guarantees liveness to the sender.
• Agreement: If a correct process delivers message m, then all the other correct processes in group(m) will eventually deliver m.
• Property of “all or nothing.”
• Validity and agreement together ensure that overall liveness: if one process (the sender) eventually delivers a message m, then, since the correct processes agree on the set of messages they deliver, it follows that m will eventually be delivered to all the group’s correct members.
Ordered Multicast
• FIFO ordering: If a correct process issues multicast(g,m) and then multicast(g,m’), then every correct process that delivers m’ will deliver m before m’.
• Causal ordering: If multicast(g,m)  multicast(g,m’) then any correct process that delivers m’ will deliver m before m’.
• Totally ordering: If a correct process delivers message m before m’, then any other correct process that delivers m’ will deliver m before m’.
Total, FIFO and Causal Ordering
• Totally ordered messages T1 and T2.
• FIFO-related messages F1 and F2.
• Causally related messages C1 and C3
• Total ordering does not imply causal ordering.
• Causal ordering does not imply total ordering.
• Hybrid mode: causal-total ordering, FIFO-total ordering.

os.interesting

Bulletin board:

From

Subject

Item

23

A.Hanlon

Mach

24

G.Joseph

Microkernels

25

A.Hanlon

Re: Microkernels

26

T.L’Heureux

RPC performance

27

M.Walker

Re: Mach

end

Display From Bulletin Board Program
Ordering Guarantees (FIFO)
• Process messages from each host in the order they were sent:
• Each process keeps a sequence number for each host.
• When a message is received,

as expected, accept

higher than expected, buffer in a queue

lower than expected, reject

If Message# is

Implementing FIFO Ordering
• Spg: the number of messages p has sent to g.
• Rqg: the sequence number of the latest group-g message p has delivered from q.
• For p to FO-multicast m to g
• p piggy backs the value Spg onto the message.
• p B-multicasts m to g.
• p increments Spg by 1.
• Upon receipt of m from q with sequence number S:
• p checks whether S=Rqg+1. If so, p FO-delivers m.
• If S > Rqg+1, p places the message in the hold-back queue until the intervening messages have been delivered and S=Rqg+1.

Accept: 2 = 1 + 1

Accept 1 = 0 + 1

Reject: 1 < 1 + 1

2 0 0

Accept 1 = 0 + 1

2 0 0

Buffer 2 > 0 + 1

Accept Buffer 2 = 1 + 1

Example: FIFO Multicast

Physical Time

Reject: 1 < 1 + 1

2 0 0

2 1 0

1 0 0

2 1 0

P1

0 0 0

1

2

2

1

1

1

P2

0 0 0

2 1 0

2 0 0

1 0 0

1

P3

0 0 0

2 1 0

0 0 0

1 0 0

Accept: 1 = 0 + 1

0 0 0

Sequence Vector

ISIS algorithm for total ordering

P

2

1 Message

3

2

P

2

4

2 Proposed Seq

1

3 Agreed Seq

1

2

P

1

3

P

3

Causal Ordering using vector timestamps

The number of group-g messages

from process j that happened before

the current message to be multicast

0,0,0

1,1,0

1,0,0

Accept:

Accept Buffered message

Example: Causal Ordering Multicast

Reject:

Accept

1,0,0

1,1,0

1,1,0

P1

(1,1,0)

(1,1,0)

(1,0,0)

P2

0,0,0

1,1,0

1,0,0

(1,0,0)

(1,1,0)

P3

0,0,0

1,1,0

Accept

Buffer, missing P1(1)

Physical Time