Chapter 21. Asynchronous Network Computing with Process Failures By Sindhu Karthikeyan. 21.1 The Network Model
Theorem 21.1 If A is an asynchronous broadcast system with a reliable broadcast channel, then there is an asynchronous send/receive system B with reliable FIFO send/receive channels that has the same user interface as A and that “simulates” A, as follows. For every execution α of B, there is an execution α′ of A such that the following conditions hold:
1. α and α′ are indistinguishable to U (the composition of the users Ui).
2. For each i, a stop_i occurs in α exactly if it does in α′.
Moreover, if α is fair, then α′ is also fair.
. System B has one processor Qi for each processor Pi of A.
. Each Qi is responsible for simulating Pi and for participating in the simulation of the broadcast channel.
. Qi simulates a bcast(m)_i output of Pi by performing send(m,t)_{i,j} outputs for all j ≠ i, where t is a local integer-valued tag (starting at 1 and incremented with each successive broadcast), and by also performing an internal step that simulates receive(m)_{i,i}.
. If Qi receives a message (m,t) sent by Qj, it helps simulate Pj’s broadcast by relaying the message: it sends (m,t,j) to all processors other than i and j.
. Qi collects the tagged messages that were broadcast by Pj, j ≠ i, whether they are received directly from Qj or through relays.
. Qi is also allowed to perform an internal step simulating receive(m)_{j,i}. Qi can do this only when it has a message (m,t) originally broadcast by Pj, it has already relayed (m,t,j) to all processors other than i and j, and it has already simulated receive_{j,i} events for the messages from Pj with all tag values less than t.
3. If a message with tag t is sent by any processor Qi, then all messages from the same originating processor with smaller tag values have previously been sent to all processors.
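The relay scheme above can be sketched in Python as a small single-threaded simulation. This is only an illustration under stated assumptions, not the construction from the proof: the class name Q, the helper run, and the use of one triple (m, t, origin) for both direct and relayed messages are hypothetical simplifications, and each Qi relays a message the first time it obtains it, whether directly or through a relay.

```python
from collections import deque

class Q:
    """Hypothetical simulator process Q_i; processes are numbered 0..n-1."""
    def __init__(self, i, n, channels):
        self.i, self.n = i, n
        self.channels = channels       # channels[(src, dst)] is a FIFO deque
        self.next_tag = 1              # tags start at 1, one per broadcast
        self.pending = {j: {} for j in range(n) if j != i}   # origin -> {tag: m}
        self.expected = {j: 1 for j in range(n) if j != i}   # next tag to deliver
        self.delivered = []            # simulated receive events as (origin, m)

    def bcast(self, m):
        """Simulate bcast(m)_i: send (m, t) to every other process."""
        t = self.next_tag
        self.next_tag += 1
        for j in range(self.n):
            if j != self.i:
                self.channels[(self.i, j)].append((m, t, self.i))
        self.delivered.append((self.i, m))   # internal step for receive(m)_{i,i}

    def on_receive(self, msg):
        m, t, origin = msg
        if (origin == self.i or t < self.expected[origin]
                or t in self.pending[origin]):
            return                            # this message was already handled
        for k in range(self.n):               # relay before any delivery
            if k != self.i and k != origin:
                self.channels[(self.i, k)].append(msg)
        self.pending[origin][t] = m
        # Deliver in tag order, so smaller tags always come first.
        while self.expected[origin] in self.pending[origin]:
            tag = self.expected[origin]
            self.delivered.append((origin, self.pending[origin].pop(tag)))
            self.expected[origin] = tag + 1

def run(n, script):
    """Perform the broadcasts in script, then drain all channels."""
    channels = {(a, b): deque() for a in range(n) for b in range(n) if a != b}
    procs = [Q(i, n, channels) for i in range(n)]
    for i, m in script:
        procs[i].bcast(m)
    progress = True
    while progress:
        progress = False
        for (src, dst), ch in channels.items():
            if ch:
                procs[dst].on_receive(ch.popleft())
                progress = True
    return procs
```

For example, run(3, [(0, "a"), (0, "b"), (1, "c")]) leaves every process with "a" delivered before "b", since delivery is gated on the tag order.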
Impossibility of Agreement in the presence of Faults
Asynchronous network system
For Agreement problem
. Each user Ui has input actions decide(v)_i and output actions init(v)_i, and Ui is assumed to perform at most one init_i action in any execution.
We consider the following conditions on the combined system consisting of A and Ui.
Well-formedness: In any execution and for any i, the interactions between Ui and A are well-formed for i.
Agreement: In any execution, there is at most one decision value.
Validity: In any execution, if all init actions that occur contain the same value v, then v is the only possible decision value.
Failure-free termination: In any fair failure-free execution in which init events occur on all ports, a decide event occurs on each port.
We say that an asynchronous network system solves the agreement problem if it guarantees well-formedness, agreement, validity, and failure-free termination.
f-failure termination, 0 ≤ f ≤ n: In any fair execution in which init events occur on all ports, if there are stop events on at most f ports, then a decide event occurs on every non-failing port.
Wait-free termination is defined to be the special case of f-failure termination where f = n.
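As a rough illustration, the conditions can be written as predicates over a finished trace of events. This is a sketch under assumptions: the event encoding ("init", i, v), ("decide", i, v), ("stop", i) and the function names are hypothetical, agreement is read as "at most one decision value", and fairness cannot be checked on a finite trace, so the trace is simply taken to be a complete fair execution.

```python
def decisions(trace):
    """Set of values appearing in decide events."""
    return {e[2] for e in trace if e[0] == "decide"}

def agreement(trace):
    """At most one decision value occurs in the trace."""
    return len(decisions(trace)) <= 1

def validity(trace):
    """If all init events carry the same v, then only v may be decided."""
    inits = {e[2] for e in trace if e[0] == "init"}
    return len(inits) != 1 or decisions(trace) <= inits

def f_failure_termination(trace, n, f):
    """With init on all n ports and at most f stops, every live port decides."""
    inited = {e[1] for e in trace if e[0] == "init"}
    stopped = {e[1] for e in trace if e[0] == "stop"}
    decided = {e[1] for e in trace if e[0] == "decide"}
    if inited != set(range(n)) or len(stopped) > f:
        return True                      # hypothesis not met: nothing to check
    return set(range(n)) - stopped <= decided

def wait_free_termination(trace, n):
    """Special case f = n."""
    return f_failure_termination(trace, n, n)
```

In a three-port trace where port 2 stops and ports 0 and 1 both decide 1, all of these predicates hold for f = 1.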
Proof: We construct a fair, failure-free, input-first execution in which no process ever decides. The construction begins with a bivalent initialization.
Then we repeatedly extend the current execution, including at least one step of process 1 in the first extension, then at least one step of 2 in the second extension, and so on, in round-robin order, all while maintaining bivalence and avoiding failures.
The resulting execution is fair, because each process takes infinitely many steps.
But no process ever reaches a decision, which contradicts the failure-free termination requirement.
Theorem 21.2 above states that the agreement problem cannot be solved in an asynchronous network system, even with only a single stopping failure.
The agreement problem can, however, be solved in a randomized asynchronous network. This model is stronger than the ordinary asynchronous network model because it allows the processors to make random choices during the computation.
Here the correctness conditions are slightly weaker than in the ordinary asynchronous network model: all the other conditions are still guaranteed, but the termination condition is now probabilistic.
All non-faulty processors will decide by time t after the arrival of all inputs with probability at least p(t), where p is a particular monotone nondecreasing function that approaches 1 as t grows. This implies eventual termination with probability 1.
. Each process Pi has some local variables x and y, which are initially null.
. An init(v)_i input causes process Pi to set x := v.
. Pi executes a series of stages, each stage consisting of two rounds. Pi begins stage 1 after it receives its initial value in an init_i input.
. It continues to perform the algorithm even after it decides.
At each stage s ≥ 1, Pi does the following:
Round 1: Pi broadcasts (“first”,s,v), where v is the current value of x, and then waits to obtain n − f messages of the form (“first”,s,*). If all of these have the same value v, then Pi sets y := v; otherwise it sets y := null.
Round 2: Pi broadcasts (“second”,s,v), where v is its current value of y, and then waits to obtain n − f messages of the form (“second”,s,*).
There are 3 cases:
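The two rounds of a stage can be sketched in Python as follows. Round 1 follows the description above directly. The case analysis in round 2 is an assumption, taken from the usual statement of Ben-Or's algorithm for binary values with n > 3f: decide v when all n − f received values equal v, adopt v without deciding when at least n − 2f of them do, and otherwise choose x at random. The function names and the use of None for null are hypothetical.

```python
import random

def round1_y(first_values):
    """Value of y after round 1, given the n - f values received in
    ("first", s, *) messages: the common value, or None (null)."""
    v = first_values[0]
    return v if all(u == v for u in first_values) else None

def round2_update(second_values, n, f, rng=random):
    """Return (new_x, decision) after round 2, where decision is None
    when the process does not decide in this stage.

    Assumes values in {0, 1}, n > 3f, and len(second_values) == n - f.
    """
    for v in (0, 1):
        count = second_values.count(v)
        if count == len(second_values):
            return v, v              # all n - f messages carry v: decide v
        if count >= n - 2 * f:
            return v, None           # enough support: adopt v, do not decide
    return rng.choice((0, 1)), None  # otherwise pick x uniformly at random
```

With n = 4 and f = 1, round2_update([1, 1, None], 4, 1) adopts 1 without deciding, which matches the n − 2f threshold used in the agreement argument.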
Proof: For validity, suppose that all init events that occur in an execution contain the same value v. Then any process that completes stage 1 must decide on v in that stage, which satisfies the validity condition.
For agreement, suppose that Pi decides v at stage s and that no process decides at any smaller-numbered stage. Then Pi receives n − f (“second”, s, v) messages. This implies that any other process Pj that completes stage s receives at least n − 2f (“second”, s, v) messages, since it hears from all but at most f of the processors that Pi hears from.
This means that Pj cannot decide on a value different from v at stage s.
Since this holds for every Pj that completes stage s, every such process begins stage s + 1 with x = v. As in the validity argument, any process that completes stage s + 1 must then decide v at stage s + 1.
Lemma 21.5 For any adversary and any s ≥ 0, with probability at least 1 − (1 − 1/2^n)^s, all nonfaulty processes decide within s + 1 stages.
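To get a feel for the bound 1 − (1 − 1/2^n)^s, it can be evaluated numerically. The helper names below are hypothetical; the sketch only evaluates the expression in the lemma and searches for the first s at which it reaches a target probability.

```python
def decide_probability_bound(n, s):
    """Lower bound from the lemma: 1 - (1 - 1/2**n)**s."""
    return 1 - (1 - 1 / 2**n) ** s

def stages_for(n, target):
    """Smallest s whose bound reaches target (requires 0 <= target < 1)."""
    s = 0
    while decide_probability_bound(n, s) < target:
        s += 1
    return s
```

For n = 2 the bound is 0.4375 at s = 2, and it first reaches 0.9 at s = 9. The bound tends to 1 as s grows, but since the per-stage success probability is only 1/2^n, the number of stages needed grows exponentially with n.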
Lemma 21.6 For any adversary and any t ≥ 0, with probability p(t), all non-faulty processors decide within time t after the last init event.
The main correctness result is
Lemma 21.5 The BenOr algorithm guarantees well-formedness, agreement, and validity. It also guarantees that, with probability 1, all nonfaulty processors eventually decide.
Another way to solve the agreement problem in a fault-prone asynchronous network is to strengthen the model by adding a new type of system component known as a failure detector. A failure detector is a module that provides the processes in an asynchronous network with information about previous process failures.
The simplest failure detector is a perfect failure detector, which is guaranteed to report only failures that have actually happened and to eventually report all such failures to all other non-failed processes.
Formally, we consider a system A that has the same structure as an asynchronous network system, except that it has additional input actions inform-stopped(j)_i for each pair i and j of ports, i ≠ j.
Architecture for an asynchronous broadcast system with a perfect failure detector.
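The two guarantees of a perfect failure detector can be illustrated with a small Python model. This is a sketch only: the class and method names are hypothetical, and the detector is modeled as a module that observes stop events and then emits the inform-stopped(j)_i actions one at a time.

```python
class PerfectFailureDetector:
    """Toy model of a perfect failure detector for processes 0..n-1."""
    def __init__(self, n):
        self.n = n
        self.stopped = set()                          # processes that have failed
        self.informed = {i: set() for i in range(n)}  # failures already told to i

    def stop(self, j):
        """Record a stop_j event."""
        self.stopped.add(j)

    def pending(self):
        """Reports still owed, as (j, i) pairs meaning inform-stopped(j)_i."""
        return [(j, i)
                for j in sorted(self.stopped)
                for i in range(self.n)
                if i not in self.stopped and j not in self.informed[i]]

    def step(self):
        """Emit one owed report, or None if there is nothing to report."""
        owed = self.pending()
        if not owed:
            return None
        j, i = owed[0]
        self.informed[i].add(j)
        return (j, i)
```

Because step only draws from pending, a report is never produced for a process that has not stopped, and repeated calls to step eventually deliver every failure to every process that is still running.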
Each process Pi attempts to stabilize two pieces of data:
val(j) = v ∈ V means that Pi knows that Pj’s initial value is v.