( a consistent cut C) (b happened before a) b C If this is not true, then the cut is inconsistent. Consistent cut. A cut is a set of events . b. g. c. a. d. P1. e. m. f. P2. P3. k. h. i. j. Cut 1. Cut 2. (Not consistent). (Consistent).
The set of states immediately following a consistent cut forms a consistent snapshot of a distributed system.
How to record a consistent snapshot? Note that
1. The recording must be non-invasive
2. Recording must be done on-the-fly.
You cannot stop the system.
Step 2. Every other process, upon receiving a marker for the first time (and before doing anything else) (a) Turns red (b) Records its own state (c) sends markers along all outgoing channels
The algorithm terminates when (1) every process turns red, and (2) Every process has received a marker through each incoming channel.Two steps
Hint. A pair of actions (a, b) can be scheduled in any order, if there is no causal order between them, so (a; b) is equivalent to (b; a)Why does it work?
Easy conceptualization of the snapshot state
Let an observer observe the following actions:
w[i] w[k] r[k] w[j] r[i] w[l] r[j] r[l] …
w[i] w[k] w[j] r[k] r[i] w[l] r[j] r[l] … [Lemma 1]
w[i] w[k] w[j] r[k] w[l] r[i] r[j] r[l] … [Lemma 1]
w[i] w[k] w[j] w[l] r[k] r[i] r[j] r[l] … [done!]
the tokens circulating in the systemExample 1. Count the tokens
Are these consistent cuts?
How to account for the channel states?
Use sent and received variables for each process.
Let machine i start Chandy-lamport snapshot before it has sent M along ch1. Also, let machine j receive the marker after it sends out M’ along ch2. Observe that the snapshot state is
down up M’
Doesn’t this appear strange? This state was never reached during the computation!
The observed state is a feasible state that is reachable
from the initial configuration. It may not actually be visited
during a specific execution.
The final state of the original computation is always
reachable from the observed state.
What good is a snapshot if that state has never been visited by the system?
- It is relevant for the detection of stable predicates.
- Useful for checkpointing.
What if the channels are not FIFO?
Study how Lai-Yang algorithm works. It does not use any marker
LY1. The initiator records its own state. When it needs to send a message m to another process, it sends a message (m, red).
LY2. When a process receives a message (m, red), it records its state if it has not already done so, and then accepts the message m.
Question 1. Why will it work?
Question 1 Are there any limitations of this approach?
Distributed snapshot = distributed read.
Distributed reset = distributed write
How difficult is distributed reset?