Global Predicate Detection

Global Predicate Detection and Event Ordering




Presentation Transcript


Global Predicate Detection and Event Ordering


Our Problem

To compute predicates over the state of a distributed application


Model

  • Message passing

  • No failures

  • Two possible timing assumptions:

    • Synchronous System

    • Asynchronous System

      • No upper bound on message delivery time

      • No bound on relative process speeds

      • No centralized clock


Asynchronous systems

  • Weakest possible assumptions

  • Weak assumptions $\equiv$ fewer vulnerabilities

  • Asynchronous $\neq$ slow

  • “Interesting” model w.r.t. failures


Client-Server

  • Processes exchange messages using Remote Procedure Call (RPC)

A client requests a service by sending the server a message. The client blocks while waiting for a response.

The server computes the response (possibly asking other servers) and returns it to the client.


Deadlock!


Goal

  • Design a protocol by which a processor can determine whether a global predicate (say, deadlock) holds


Wait-For Graphs

  • Draw arrow from $p_i$ to $p_j$ if $p_j$ has received a request from $p_i$ but has not responded yet

  • Cycle in WFG $\Rightarrow$ deadlock

  • Deadlock $\Rightarrow \Diamond$ cycle in WFG (a deadlock eventually shows up as a cycle)
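
Once a monitor has assembled a wait-for graph, checking it for deadlock is a cycle search. The sketch below is one possible way to do that with a depth-first search; the dictionary representation and the names `find_cycle`, `deadlocked`, and `healthy` are assumptions made for illustration, not part of the original slides.

```python
from typing import Dict, List, Optional

def find_cycle(wfg: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of the wait-for graph as a list of nodes, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, finished
    color: Dict[str, int] = {}
    for node, targets in wfg.items():
        color[node] = WHITE
        for t in targets:
            color[t] = WHITE
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        stack.append(node)
        for nxt in wfg.get(node, []):
            if color[nxt] == GRAY:        # back edge: nxt is already on the stack
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(color):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# Hypothetical wait-for graphs: an edge p -> q means p waits for q's reply.
deadlocked = {"p1": ["p2"], "p2": ["p3"], "p3": ["p1"]}
healthy = {"p1": ["p2"], "p2": ["p3"], "p3": []}
print(find_cycle(deadlocked))
print(find_cycle(healthy))
```

The search itself is the easy part; the rest of the deck is about making sure the graph handed to it reflects a state the system could actually have been in.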


The protocol

  • p0 sends a message to p1 p3

  • On receipt of p0 ‘s message, pi replies with its state and wait-for info


An execution

Ghost Deadlock!


We have a problem...

  • Asynchronous system

    • no centralized clock, etc. etc.

  • Synchrony useful to

    • coordinate actions

    • order events


Events and Histories

  • Processes execute sequences of events

  • Events can be of 3 types: local, send, and receive

  • $e_p^i$ is the i-th event of process p

  • The local history $h_p$ of process p is the sequence of events executed by process p

    • $h_p^k$: prefix that contains first k events

    • $h_p^0$: initial, empty sequence

  • The history H is the set $h_{p_0} \cup h_{p_1} \cup \ldots \cup h_{p_{n-1}}$

NOTE: In H, local histories are interpreted as sets, rather than sequences, of events


Ordering events

  • Observation 1:

    • Events in a local history are totally ordered

  • Observation 2:

    • For every message m, send(m) precedes receive(m)


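Happened-before (Lamport [1978])

  • A binary relation $\rightarrow$ defined over events

    • if $e_i^k, e_i^l \in h_i$ and $k < l$, then $e_i^k \rightarrow e_i^l$

    • if $e_i = send(m)$ and $e_j = receive(m)$, then $e_i \rightarrow e_j$

    • if $e \rightarrow e'$ and $e' \rightarrow e''$, then $e \rightarrow e''$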
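
The three rules can be turned into a small program that computes the relation for a toy execution. This is a sketch under assumed names: `happened_before`, the event labels, and the input format (per-process event lists plus send/receive pairs) are all hypothetical. It uses a naive transitive closure that is only suitable for small examples.

```python
from itertools import product

def happened_before(histories, messages):
    """histories: {process: [event, ...]} in local order.
    messages: [(send_event, receive_event), ...].
    Returns the set of pairs (e, e2) such that e happened before e2."""
    hb = set()
    # Rule 1: events of one process are ordered by their position.
    for events in histories.values():
        for k in range(len(events)):
            for l in range(k + 1, len(events)):
                hb.add((events[k], events[l]))
    # Rule 2: send(m) precedes receive(m).
    hb.update(messages)
    # Rule 3: close the relation under transitivity.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(hb), repeat=2):
            if b == c and (a, d) not in hb:
                hb.add((a, d))
                changed = True
    return hb

# Hypothetical execution: p1 sends a message at a2 that p2 receives at b1.
histories = {"p1": ["a1", "a2"], "p2": ["b1", "b2"], "p3": ["c1"]}
messages = [("a2", "b1")]
hb = happened_before(histories, messages)
print(("a1", "b2") in hb)                          # ordered through the message
print(("a1", "c1") in hb or ("c1", "a1") in hb)    # c1 is concurrent with a1
```

Pairs that appear in neither direction, such as `a1` and `c1`, are concurrent, which is why the relation is only a partial order.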


Space-Time diagrams

  • A graphic representation of a distributed execution

  • H and $\rightarrow$ impose a partial order


Runs and Consistent Runs

  • A run is a total ordering of the events in H that is consistent with the local histories of the processors

    • Ex: $h_1, h_2, \ldots, h_n$ is a run

  • A run is consistent if the total order imposed in the run is an extension of the partial order induced by $\rightarrow$

  • A single distributed computation may correspond to several consistent runs!
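
To make the difference between a run and a consistent run concrete, the following sketch enumerates every ordering of a four-event execution and classifies it. The names `is_run` and `is_consistent_run` and the sample events are assumptions for illustration. Checking only the local order and the send/receive pairs is enough, because a total order that respects those pairs automatically respects their transitive closure.

```python
from itertools import permutations

def is_run(run, histories):
    """A run keeps every local history in its original order."""
    pos = {e: i for i, e in enumerate(run)}
    return all(pos[h[k]] < pos[h[k + 1]]
               for h in histories.values() for k in range(len(h) - 1))

def is_consistent_run(run, histories, messages):
    """A consistent run additionally places each send before its receive."""
    pos = {e: i for i, e in enumerate(run)}
    return is_run(run, histories) and all(pos[s] < pos[r] for s, r in messages)

# Hypothetical execution: a1 is a send whose receive is b2.
histories = {"p1": ["a1", "a2"], "p2": ["b1", "b2"]}
messages = [("a1", "b2")]
events = [e for h in histories.values() for e in h]

runs = [r for r in permutations(events) if is_run(r, histories)]
consistent_runs = [r for r in runs if is_consistent_run(r, histories, messages)]
print(len(runs), len(consistent_runs))
```

Here the run `b1, b2, a1, a2` (the concatenation $h_2, h_1$) is a legal run but not a consistent one, since it places the receive before the send.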


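Cuts

  • A cut C is a subset of the global history H made of one prefix of each local history: $C = h_1^{c_1} \cup \ldots \cup h_n^{c_n}$

  • The frontier of C is the set of events $e_1^{c_1}, \ldots, e_n^{c_n}$ (the last event of each process included in C)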


Global states and cuts

  • The global state of a distributed computation is an n-tuple of local states

  • $\Sigma = (\sigma_1, \ldots, \sigma_n)$

  • To each cut $(c_1, \ldots, c_n)$ corresponds a global state $(\sigma_1^{c_1}, \ldots, \sigma_n^{c_n})$


Consistent cuts and consistent global states

  • A cut C is consistent if $\forall e, e' : (e \in C) \land (e' \rightarrow e) \Rightarrow e' \in C$

  • A consistent global state is one corresponding to a consistent cut
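
A small checker illustrates the usual criterion: a cut is consistent when it is closed under happened-before, that is, it never contains an event without everything that precedes it. The sketch assumes a cut is given as the number of events taken from each process; the function names and sample data are hypothetical. Because each process contributes a prefix, local order is closed automatically and only the send/receive pairs need checking.

```python
def cut_events(histories, cut):
    """Events included in the cut; cut[p] = number of events taken from p."""
    included = set()
    for p, count in cut.items():
        included.update(histories[p][:count])
    return included

def is_consistent_cut(histories, messages, cut):
    """Every receive inside the cut must have its send inside the cut."""
    included = cut_events(histories, cut)
    return all(send in included
               for send, recv in messages if recv in included)

# Hypothetical execution: a2 is a send whose receive is b1.
histories = {"p1": ["a1", "a2"], "p2": ["b1", "b2"]}
messages = [("a2", "b1")]

print(is_consistent_cut(histories, messages, {"p1": 2, "p2": 1}))  # send included
print(is_consistent_cut(histories, messages, {"p1": 1, "p2": 1}))  # receive without send
```

The second cut is the "message from the future" situation: it records a receipt whose send has not happened yet.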


What $p_0$ sees

  • Not a consistent global state: the cut contains the event corresponding to the receipt of the last message by $p_3$ but not the corresponding send event


Our task

  • Develop a protocol by which a processor can build a consistent global state

  • Informally, we want to be able to take a snapshot of the computation

  • Not obvious in an asynchronous system...


Our approach

  • Develop a simple synchronous protocol

  • Refine protocol as we relax assumptions

  • Record:

    • processor states

    • channel states

  • Assumptions:

    • FIFO channels

    • Each m timestamped with T(send(m))


Snapshot I

  • 1. $p_0$ selects $t_{ss}$

  • 2. $p_0$ sends “take a snapshot at $t_{ss}$” to all processes

  • 3. when clock of $p_i$ reads $t_{ss}$ then $p_i$

    • records its local state $\sigma_i$

    • sends an empty message along its outgoing channels

    • starts recording messages received on each of its incoming channels

    • stops recording a channel when it receives first message with timestamp greater than or equal to $t_{ss}$


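Correctness

Theorem: Snapshot I produces a consistent cut

Proof: Need to prove $(e_j \in C) \land (e_i \rightarrow e_j) \Rightarrow e_i \in C$

  • <Definition> An event e is in C if and only if $T(e) < t_{ss}$

  • <Assumption> $e_j \in C$, hence $T(e_j) < t_{ss}$

  • <Assumption> $e_i \rightarrow e_j$

  • <Property of real time> $e_i \rightarrow e_j \Rightarrow T(e_i) < T(e_j)$

  • Therefore $T(e_i) < T(e_j) < t_{ss}$, and by the definition $e_i \in C$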


Clock Condition

$e \rightarrow e' \Rightarrow T(e) < T(e')$ <Property of real time>

Can the Clock Condition be implemented some other way?


Lamport Clocks

  • Each process maintains a local variable LC

  • LC(e) = value of LC for event e


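Increment Rules

  • If $e_i$ is an internal or send event: $LC(e_i) = LC + 1$

  • If $e_i = receive(m)$: $LC(e_i) = \max(LC, TS(m)) + 1$

Timestamp m with $TS(m) = LC(send(m))$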
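
As an illustration, here is a minimal sketch of a process that maintains a Lamport clock. It assumes the standard rules (add one on an internal or send event, take the maximum of the local clock and the message timestamp plus one on a receive); the class and method names are hypothetical.

```python
class LamportProcess:
    """A process carrying a Lamport logical clock LC."""

    def __init__(self, name):
        self.name = name
        self.lc = 0

    def internal(self):
        # Internal event: advance the clock by one.
        self.lc += 1
        return self.lc

    def send(self):
        # Send event: advance the clock; the result is the message timestamp.
        self.lc += 1
        return self.lc

    def receive(self, ts):
        # Receive event: jump past both the local clock and the timestamp.
        self.lc = max(self.lc, ts) + 1
        return self.lc

p, q = LamportProcess("p"), LamportProcess("q")
p.internal()             # LC(p) = 1
ts = p.send()            # LC(p) = 2, message carries timestamp 2
q.internal()             # LC(q) = 1
print(q.receive(ts))     # max(1, 2) + 1
```

The receive rule is what enforces the Clock Condition across processes: the receive is always stamped later than the send.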


Space-Time Diagrams and Logical Clocks

[Figure: space-time diagram with events labeled by Lamport clock values from 1 to 9]


A subtle problem

  • when LC = t do S

    • doesn’t make sense for Lamport clocks!

    • there is no guarantee that LC will ever be t

    • S is anyway executed after LC = t

  • Fixes:

    • If e is internal/send and LC = t-2

      • execute e and then S

    • If $e = receive(m) \land (TS(m) \geq t) \land (LC \leq t-1)$

      • put message back in channel

      • re-enable e; set LC = t-1; execute S


An obvious problem

  • No $t_{ss}$!

  • Choose $\Omega$ large enough that it cannot be reached by applying the update rules of logical clocks

  • Doing so assumes

    • upper bound on message delivery time

    • upper bound on relative process speeds

  • Better relax it


Snapshot II

  • $p_0$ selects $\Omega$

  • $p_0$ sends “take a snapshot at $\Omega$” to all processes; it waits for all of them to reply and then sets its logical clock to $\Omega$

  • when clock of $p_i$ reads $\Omega$ then $p_i$

    • records its local state $\sigma_i$

    • sends an empty message along its outgoing channels

    • starts recording messages received on each incoming channel

    • stops recording a channel when it receives first message with timestamp greater than or equal to $\Omega$


Relaxing synchrony

[Figure: timeline of a process in Snapshot II: it receives “take a snapshot at $\Omega$”, and later records its local state, sends the empty message, and monitors its channels]

  • Process does nothing for the protocol during this time!

  • Use empty message to announce snapshot!


Snapshot III

  • Processor $p_0$ sends itself “take a snapshot”

  • when $p_i$ receives “take a snapshot” for the first time from $p_j$:

    • records its local state $\sigma_i$

    • sends “take a snapshot” along its outgoing channels

    • sets channel from $p_j$ to empty

    • starts recording messages received over each of its other incoming channels

  • when $p_i$ receives “take a snapshot” beyond the first time from $p_k$:

    • $p_i$ stops recording channel from $p_k$

  • when $p_i$ has received “take a snapshot” on all channels, it sends collected state to $p_0$ and stops.
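
The marker-based protocol above can be exercised in a small single-threaded simulation. The sketch below assumes FIFO channels modeled as queues and an application that moves integer amounts between processes, so that conservation of the total gives an easy sanity check. All names (`Process`, `drain`, `MARKER`) and the scheduling order are assumptions for illustration, and the initiator records all of its incoming channels rather than literally sending itself a message.

```python
from collections import deque

MARKER = "take a snapshot"

class Process:
    def __init__(self, pid, peers, balance):
        self.pid, self.peers, self.balance = pid, peers, balance
        self.recorded_state = None   # sigma_i, set when the snapshot starts
        self.channel_log = {}        # incoming channel -> messages recorded in transit
        self.recording = set()       # incoming channels still being recorded

    def send(self, net, dest, amount):
        self.balance -= amount
        net[(self.pid, dest)].append(amount)

    def start_snapshot(self, net, first_from=None):
        self.recorded_state = self.balance
        for q in self.peers:
            net[(self.pid, q)].append(MARKER)   # announce on every outgoing channel
            self.channel_log[q] = []
        # The channel the first marker arrived on is recorded as empty.
        self.recording = {q for q in self.peers if q != first_from}

    def deliver(self, net, src):
        msg = net[(src, self.pid)].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.start_snapshot(net, first_from=src)
            else:
                self.recording.discard(src)     # stop recording this channel
        else:
            self.balance += msg
            if src in self.recording:
                self.channel_log[src].append(msg)

def drain(procs, net):
    """Deliver messages in a fixed channel order until every channel is empty."""
    progress = True
    while progress:
        progress = False
        for (src, dst), chan in sorted(net.items()):
            if chan:
                procs[dst].deliver(net, src)
                progress = True

net = {(i, j): deque() for i in range(3) for j in range(3) if i != j}
procs = [Process(i, [j for j in range(3) if j != i], 100) for i in range(3)]
procs[1].send(net, 2, 30)        # in flight when the snapshot starts
procs[0].start_snapshot(net)     # p0 initiates
procs[2].send(net, 0, 10)        # sent before p2 sees any marker
drain(procs, net)

in_states = sum(p.recorded_state for p in procs)
in_channels = sum(sum(log) for p in procs for log in p.channel_log.values())
print(in_states, in_channels, in_states + in_channels)
```

In this scenario the recorded local states alone do not add up to the initial total; the two transfers that were in flight are captured as channel state, which is why the protocol records channels as well as processors.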


Snapshots: a perspective

  • The global state $\Sigma^s$ saved by the snapshot protocol is a consistent global state

  • But did it ever occur during the computation?

    • a distributed computation provides only a partial order of events

    • many total orders (runs) are compatible with that partial order

    • all we know is that $\Sigma^s$ could have occurred

  • We are evaluating predicates on states that may have never occurred!


An Execution and its Lattice


Reachability

  • $\Sigma^{kl}$ is reachable from $\Sigma^{ij}$ if there is a path from $\Sigma^{ij}$ to $\Sigma^{kl}$ in the lattice
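
For a small execution the lattice can be built explicitly. The sketch below assumes the lattice nodes are the consistent cuts, written as tuples of per-process event counts, with an edge whenever one process executes one more event. The function names, the 0-based process indices, and the dependency format are hypothetical choices made for this example.

```python
from collections import deque
from itertools import product

def consistent(cut, deps):
    # deps holds pairs ((p, k), (q, l)): the k-th event of process p is a send
    # whose receive is the l-th event of process q (events counted from 1).
    return all(cut[p] >= k for (p, k), (q, l) in deps if cut[q] >= l)

def lattice(lengths, deps):
    """Map each consistent cut to the consistent cuts one event further along."""
    cuts = [c for c in product(*(range(n + 1) for n in lengths))
            if consistent(c, deps)]
    nodes = set(cuts)
    edges = {c: [] for c in cuts}
    for c in cuts:
        for i in range(len(c)):
            nxt = c[:i] + (c[i] + 1,) + c[i + 1:]
            if nxt in nodes:
                edges[c].append(nxt)
    return edges

def reachable(edges, start, goal):
    """Breadth-first search for a path from start to goal in the lattice."""
    seen, queue = {start}, deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return True
        for nxt in edges[cur]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Two processes with two events each; the first event of process 0 is a send
# that is received as the second event of process 1.
edges = lattice((2, 2), [((0, 1), (1, 2))])
print(len(edges))                            # number of consistent global states
print(reachable(edges, (0, 0), (2, 2)))      # initial state to final state
print(reachable(edges, (1, 0), (0, 1)))      # events cannot be undone
```

Each path from the initial cut to the final cut corresponds to one consistent run of the execution.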


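So, why do we care about $\Sigma^s$ again?

  • Deadlock is a stable property

    • Deadlock $\Rightarrow \Box$ Deadlock (once it holds, it holds forever)

  • If a run R of the snapshot protocol starts in $\Sigma^i$ and terminates in $\Sigma^f$, then $\Sigma^s$ is reachable from $\Sigma^i$ and $\Sigma^f$ is reachable from $\Sigma^s$

  • Deadlock in $\Sigma^s$ implies deadlock in $\Sigma^f$

  • No deadlock in $\Sigma^s$ implies no deadlock in $\Sigma^i$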


Same problem, different approach

  • Monitor process does not query explicitly

  • Instead, it passively collects information and uses it to build an observation.

  • (reactive architectures, Harel and Pnueli [1985])

  • An observation is an ordering of the events of the distributed computation based on the order in which the receiver is notified of the events.


Observations: a few observations

  • An observation puts no constraint on the order in which the monitor receives notifications


Causal delivery

  • FIFO delivery guarantees: $send_i(m) \rightarrow send_i(m') \Rightarrow deliver_j(m) \rightarrow deliver_j(m')$

  • Causal delivery generalizes FIFO: $send_i(m) \rightarrow send_k(m') \Rightarrow deliver_j(m) \rightarrow deliver_j(m')$

[Figure: space-time diagram distinguishing send events, receive events, and deliver events]


Causal Delivery in Synchronous Systems

  • We use the upper bound $\Delta$ on message delivery time

  • DR1: At time t, $p_0$ delivers all messages it received with timestamp up to $t - \Delta$ in increasing timestamp order


Causal Delivery with Lamport Clocks

  • DR1.1: Deliver all received messages in increasing (logical clock) timestamp order.

  • Problem: Lamport Clocks don’t provide gap detection

[Figure: messages with timestamps 1 and 4. Should $p_0$ deliver?]

Gap detection: given two events e and e' and their clock values LC(e) and LC(e'), where LC(e) < LC(e'), determine whether some event e'' exists s.t. LC(e) < LC(e'') < LC(e')


Stability

  • DR2: Deliver all received stable messages in increasing (logical clock) timestamp order.

  • A message m received by p is stable at p if p will never receive a future message m' s.t. TS(m') < TS(m)


Implementing Stability

  • Real-time clocks

    • wait for $\Delta$ time units

  • Lamport clocks

    • wait on each channel for m s.t. TS(m) > LC(e)

  • Design better clocks!


Clocks and STRONG Clocks

  • Lamport clocks implement the clock condition: $e \rightarrow e' \Rightarrow LC(e) < LC(e')$

  • We want new clocks that implement the strong clock condition: $e \rightarrow e' \equiv SC(e) < SC(e')$


Causal Histories

  • The causal history of an event e in $(H, \rightarrow)$ is the set $\theta(e) = \{e' \in H \mid e' \rightarrow e\} \cup \{e\}$


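How to build $\theta(e)$

  • Each process $p_i$:

    • initializes $\theta := \emptyset$

    • if $e_i^k$ is an internal or send event, then $\theta(e_i^k) = \{e_i^k\} \cup \theta(e_i^{k-1})$

    • if $e_i^k$ is a receive event for message m, then $\theta(e_i^k) = \{e_i^k\} \cup \theta(e_i^{k-1}) \cup \theta(send(m))$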


Pruning causal histories

  • Prune segments of history that are known to all processes (Peterson, Buchholz and Schlichting)

  • Use a more clever way to encode $\theta(e)$


Vector Clocks

  • Consider i(e), the projection of (e) on pi

  • i(e) is a prefix of hi :i(e) = hiki – it can be encoded using ki

  • (e) = 1(e) [ 2(e) [ . . . [ n(e) can be encoded using

Represent using an n-vector VC such that


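Update rules

  • If $e_i$ is an internal or send event: $VC(e_i)[i] := VC[i] + 1$

  • If $e_i = receive(m)$: $VC(e_i) := \max(VC, TS(m))$, then $VC(e_i)[i] := VC[i] + 1$

Message m is timestamped with $TS(m) = VC(send(m))$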
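
A minimal sketch of a process maintaining a vector clock follows. It assumes the standard update rules (increment the own component on every event, and take the componentwise maximum with the message timestamp before incrementing on a receive); the class and method names are hypothetical. The demo replays the first few events of the example on the next slide.

```python
class VectorClockProcess:
    """A process carrying a vector clock with one component per process."""

    def __init__(self, index, n):
        self.index = index          # 0-based position of this process in the vector
        self.vc = [0] * n

    def local_or_send(self):
        # Internal or send event: count one more local event.
        self.vc[self.index] += 1
        return list(self.vc)        # a copy, usable as the message timestamp

    def receive(self, ts):
        # Receive event: merge what the sender knew, then count the receive itself.
        self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
        self.vc[self.index] += 1
        return list(self.vc)

p1, p2, p3 = (VectorClockProcess(i, 3) for i in range(3))
m1 = p1.local_or_send()      # p1 sends:               [1, 0, 0]
print(p3.receive(m1))        # p3 receives from p1:    [1, 0, 1]
m2 = p2.local_or_send()      # p2 sends:               [0, 1, 0]
print(p1.receive(m2))        # p1 receives from p2:    [2, 1, 0]
```

Returning copies of the vector keeps a timestamp from changing after it has been attached to a message.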


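Example

[Figure: space-time diagram with events labeled by their vector clocks]

  • $p_1$: [1,0,0], [2,1,0], [3,1,2], [4,1,2], [5,1,2]

  • $p_2$: [0,1,0], [1,2,3], [4,3,3]

  • $p_3$: [1,0,1], [1,0,2], [1,0,3], [5,1,4]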


Operational interpretation

  • $VC(e_i)[i] \equiv$ no. of events executed by $p_i$ up to and including $e_i$

  • $VC(e_i)[j] \equiv$ no. of events executed by $p_j$ that happen before $e_i$ of $p_i$


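VC properties: event ordering

  • Given two vectors V and V', less than is defined as:

  • $V < V' \equiv (V \neq V') \land (\forall k : 1 \leq k \leq n : V[k] \leq V'[k])$

  • Strong Clock Condition: $e \rightarrow e' \equiv VC(e) < VC(e')$

  • Simple Strong Clock Condition: Given $e_i$ of $p_i$ and $e_j$ of $p_j$, where $i \neq j$: $e_i \rightarrow e_j \equiv VC(e_i)[i] \leq VC(e_j)[i]$

  • Concurrency: Given $e_i$ of $p_i$ and $e_j$ of $p_j$, where $i \neq j$: $e_i \parallel e_j \equiv (VC(e_i)[i] > VC(e_j)[i]) \land (VC(e_j)[j] > VC(e_i)[j])$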
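
These ordering tests are short enough to write out directly. The sketch below assumes the usual definitions (V is less than V' when the vectors differ and every component of V is at most the matching component of V'; two events are concurrent when neither vector is less than the other). The function names are hypothetical, and the sample vectors are taken from the example slide.

```python
def vc_less(v, w):
    """V < W: the vectors differ and V is componentwise at most W."""
    return v != w and all(a <= b for a, b in zip(v, w))

def vc_concurrent(v, w):
    """Neither event happened before the other."""
    return v != w and not vc_less(v, w) and not vc_less(w, v)

def simple_happened_before(v_i, i, v_j):
    """Simple strong clock test for events of two different processes:
    only component i (0-based here) of the two vectors is compared."""
    return v_i[i] <= v_j[i]

print(vc_less([2, 1, 0], [5, 1, 2]))                    # same process, earlier event
print(vc_concurrent([3, 1, 2], [1, 2, 3]))              # events of p1 and p2
print(simple_happened_before([1, 0, 0], 0, [1, 0, 1]))  # p1's send, p3's receive
```

The simple test inspects a single component, but it is only valid when the two events belong to different processes.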


VC properties: consistency

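  • Pairwise inconsistency

  • Events $e_i$ of $p_i$ and $e_j$ of $p_j$ ($i \neq j$) are pairwise inconsistent (i.e. can’t be on the frontier of the same consistent cut) if and only if $(VC(e_i)[i] < VC(e_j)[i]) \lor (VC(e_j)[j] < VC(e_i)[j])$

  • Consistent Cut

  • A cut defined by $(c_1, \ldots, c_n)$ is consistent if and only if $\forall i, j : 1 \leq i \leq n, 1 \leq j \leq n : VC(e_i^{c_i})[i] \geq VC(e_j^{c_j})[i]$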
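
With vector clocks, checking a cut needs only the timestamps of its frontier events. The sketch below assumes the standard criterion that no frontier event may know of more events of process i than process i itself has in the cut. The function name is hypothetical, indices are 0-based, and the sample frontiers use vectors from the example slide.

```python
def is_consistent_frontier(frontier):
    """frontier[i] is the vector clock of the last event of process i in the cut.
    Consistent iff frontier[i][i] >= frontier[j][i] for every pair i, j."""
    n = len(frontier)
    return all(frontier[i][i] >= frontier[j][i]
               for i in range(n) for j in range(n))

# Frontier made of p1's 2nd event, p2's 1st event and p3's 1st event.
print(is_consistent_frontier([[2, 1, 0], [0, 1, 0], [1, 0, 1]]))
# p2's event [1,2,3] already depends on three events of p3, but the cut
# contains only one event of p3.
print(is_consistent_frontier([[1, 0, 0], [1, 2, 3], [1, 0, 1]]))
```

The check takes a number of comparisons quadratic in the number of processes and never looks at the events behind the frontier.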


VC properties: weak gap detection

  • Weak gap detection

  • Given $e_i$ of $p_i$ and $e_j$ of $p_j$, if $VC(e_i)[k] < VC(e_j)[k]$ for some $k \neq j$, then there exists $e_k$ s.t. $\lnot(e_k \rightarrow e_i) \land (e_k \rightarrow e_j)$

[Figure: two example diagrams, one with events labeled [2,0,1], [2,2,2], [0,0,2] and one with events labeled [1,0,1], [2,1,1], [0,0,1]]


VC properties: strong gap detection

  • Strong gap detection

  • Given $e_i$ of $p_i$ and $e_j$ of $p_j$, if $VC(e_i)[i] < VC(e_j)[i]$, then there exists $e_i'$ s.t. $(e_i \rightarrow e_i') \land (e_i' \rightarrow e_j)$


VCs for Causal Delivery

  • Each process increments the local component of its VC only for events that are notified to the monitor

  • Each message notifying event e is timestamped with VC(e)

  • The monitor keeps all notification messages in a set M


Stability

  • Suppose $p_0$ has received $m_j$ from $p_j$

  • When is it safe for $p_0$ to deliver $m_j$?

    • There is no earlier message in M

    • There is no earlier message from $p_j$: $TS(m_j)[j] = 1 +$ no. of $p_j$ messages delivered by $p_0$

    • There is no earlier message $m_k''$ from $p_k$ ($k \neq j$) … ?


Checking for $m_k''$

  • Let $m_k'$ be the last message $p_0$ delivered from $p_k$

  • By strong gap detection, $m_k''$ exists only if $TS(m_k')[k] < TS(m_j)[k]$

  • Hence, deliver $m_j$ as soon as $\forall k : TS(m_k')[k] \geq TS(m_j)[k]$


The protocol

  • p0 maintains an array D[1,. . .,n] of counters

  • D[i]=TS(mi)[i] where mi is the last message delivered from pi

  • DR3: Deliver m from $p_j$ as soon as both of the following conditions are satisfied: $D[j] = TS(m)[j] - 1$ and $D[k] \geq TS(m)[k]$ for all $k \neq j$
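
The monitor's delivery rule can be sketched as a small class. The version below assumes the usual two conditions for causal delivery with vector clocks: the message is the next one from its sender, and the monitor has already delivered everything from other processes that the message depends on. The class name, the 0-based indices, and the payload strings are assumptions for illustration.

```python
class Monitor:
    """Monitor p0 that delivers notifications in causal order."""

    def __init__(self, n):
        self.D = [0] * n       # D[i]: own component of the last message delivered from p_i
        self.pending = []      # the set M of received but undelivered notifications
        self.delivered = []

    def _deliverable(self, j, ts):
        next_from_sender = self.D[j] == ts[j] - 1
        no_gap_elsewhere = all(self.D[k] >= ts[k]
                               for k in range(len(ts)) if k != j)
        return next_from_sender and no_gap_elsewhere

    def receive(self, j, ts, payload):
        self.pending.append((j, ts, payload))
        progress = True
        while progress:        # one delivery may unblock other pending messages
            progress = False
            for item in list(self.pending):
                sender, stamp, data = item
                if self._deliverable(sender, stamp):
                    self.pending.remove(item)
                    self.D[sender] = stamp[sender]
                    self.delivered.append(data)
                    progress = True

# Event a of process 0 causally precedes event b of process 1,
# but the notification for b reaches the monitor first.
monitor = Monitor(2)
monitor.receive(1, [1, 1], "b")
print(monitor.delivered)       # b is held back
monitor.receive(0, [1, 0], "a")
print(monitor.delivered)       # a is delivered, which then releases b
```

Holding `b` back until `a` arrives is what turns the monitor's observation into a consistent one.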

