
Reaching Consensus: Why it can’t be done


For Distributed Algorithms 2014

Presentation by Ziv Ronen

Based on “Impossibility of Distributed Consensus with One Faulty Process” by Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson

- The problem
- Why the problem is unsolvable
- If time allows: how to solve the problem with initially dead processors

- Consensus in the real world
- Our mission
- Model:
- Objectives
- Network
- Possible faults

- There are many cases in which we want several processors to agree on an action.
- Usually, it is more important that all processors agree on the same action than which action is chosen.
- For example, in a distributed database we want every transaction to be committed either by all processors or by none of them.

- Such agreement in a fault-free network is trivial.
- For instance, we can choose a leader that tells all the others what to do.
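A minimal Python sketch of the trivial fault-free approach, assuming reliable delivery and a hypothetical rule that the processor with the smallest id is the leader (the name `leader_consensus` is ours, not from the source):

```python
from typing import Dict


def leader_consensus(inputs: Dict[int, int]) -> Dict[int, int]:
    """Fault-free sketch: the processor with the smallest id acts as leader.

    The leader "broadcasts" its input and every processor, the leader
    included, adopts it as its output. Without failures every broadcast
    message arrives, so all outputs agree.
    """
    leader = min(inputs)          # hypothetical leader-election rule
    decision = inputs[leader]     # the value the leader tells everyone
    return {p: decision for p in inputs}


print(leader_consensus({1: 1, 2: 0, 3: 1, 4: 0}))  # {1: 1, 2: 1, 3: 1, 4: 1}
```

The sketch silently assumes the leader's broadcast always arrives; if the leader crashes first, the others cannot distinguish a dead leader from a slow one, which is the difficulty developed below.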

- However, real-world processors are subject to failures:
- They might stop working (good case).
- They might go haywire (bad case).
- They might become malevolent (worse case).

- We want an algorithm that, in every network and for every decision to be made, leads all processors to choose a single common action.
- However, we require at least two possible options, and both of them must actually be able to happen.

- We will work on a simplified problem, in which the processors only need to agree on a number that is either 1 (commit) or 0 (discard).
- Initially, each processor chooses its initial number arbitrarily (this simulates decisions based on the system’s condition):
- 1 if it can commit, 0 if it can’t.

- Each processor needs to choose an action. Once the action is chosen, it cannot be undone.
- In the end, all the processors need to agree on the action, meaning they all choose 1 or they all choose 0.

- We require that the algorithm can return both 1 and 0 (possibly in different cases).
- So “always discard” or “always commit” is not an acceptable policy for our database.

- We assume a fully asynchronous network:
- If we send a message to a non-faulty processor, it reaches it after a finite but unbounded time.

- We also assume that the network is fully connected. For generality, we also assume full sense of direction (each processor knows which processor is at the other end of each link),
- so any other topology can be simulated.

[Figure: delivery of a message M2 from P1 to P2 in a synchronous network (measured in ticks) versus an asynchronous one; in the second version, P2 is faulty.]

- We will assume that the processors can only stop working entirely.
- We will also assume that only a single processor can malfunction in any given run.
- However, we also assume that:
- Other processors can’t tell that a processor has stopped working.
- A processor can fail at any given time.

- N≥2 processors.
- For each processor:
- Input value $x_p \in \{0,1\}$, part of the problem input.
- Output value $y_p \in \{0,1,b\}$, initially $b$; it can change only once.
- Infinite storage

- Messages are of the form $(p,m)$, where $p$ is the target processor and $m$ is the message. Any processor can send such a message to any other processor.
- Every message stays in a “message buffer” from the time it is sent until the time it is received.
- Initially, the buffer is empty.

- Goal: at the end, $y_{p_1} = y_{p_2} \neq b$ for every two processors $p_1, p_2$.
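The model above can be sketched in Python as follows; this is an illustration under our own assumptions (the class names `Proc` and `Config`, `None` standing for b, and a `Counter` as the multiset message buffer are not from the source):

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional

B = None  # stands for the undecided output value "b"


@dataclass
class Proc:
    x: int                  # input value x_p in {0, 1}
    y: Optional[int] = B    # output value y_p in {0, 1, b}

    def decide(self, v: int) -> None:
        # The output register is write-once: it may change from b only once.
        if self.y is not B:
            raise ValueError("output already decided")
        self.y = v


@dataclass
class Config:
    procs: Dict[int, Proc]
    # The message buffer is a multiset of (target, message) pairs, initially empty.
    buffer: Counter = field(default_factory=Counter)

    def send(self, target: int, msg: str) -> None:
        self.buffer[(target, msg)] += 1


def goal_reached(c: Config) -> bool:
    """True when every processor has decided and all decisions are equal."""
    ys = {p.y for p in c.procs.values()}
    return len(ys) == 1 and B not in ys


c = Config({1: Proc(1), 2: Proc(0), 3: Proc(1), 4: Proc(0)})
c.send(2, "m1")
for proc in c.procs.values():
    proc.decide(0)
print(goal_reached(c))  # True
```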

[Figure: an example run with four processors (x1 = 1, x2 = 0, x3 = 1, x4 = 0) and the message buffer. Initially all outputs are b and the buffer is empty; as messages such as (2,m1), (4,m2), (4,m3), (2,m2), (2,m3) are sent and delivered, processors 2 and 3 decide 0, and finally all four processors output 0.]

- Intuition
- Proof
- Definitions
- Lemma 1
- Lemma 2
- Lemma 3

- Let us build some intuition for why this task is impossible.
- I will demonstrate it on the database-consensus problem:
- All the databases should output 1 if all working databases have input value 1.
- All the databases should output 0 if at least one working database has input value 0.
- Here, “working” means not having failed at the beginning of the algorithm.

- We choose an initial state in which both results are possible.
- In our case, if processor 1 fails during the algorithm, the result might be 1.
- Otherwise, the result should be 0.

[Figure: four processors with x1 = 0 and x2 = x3 = x4 = 1; all outputs are b.]

- If processor 1 sends its first message:
- All processors learn that it can’t commit.
- The algorithm should decide 0.

[Figure: processor 1 (x1 = 0) tells the others “I failed to commit”, and the decision is 0.]

- If processor 1 failed before sending this message, the algorithm should decide without it.

- Since all the other processors can commit, the algorithm should decide 1.

[Figure: processor 1 has failed (asleep) before sending its message; the other processors decide 1.]

- Let us say that a processor has “quasi-failed” if:
- It may be alive or dead.
- If it is alive, it will execute its next step only after the algorithm has “finished” without it.

[Figure: a quasi-failed processor 1 (x1 = 0, y1 = b) is like Schrödinger's cat: it may be alive or dead.]

- If processor 1 quasi-failed:
- The algorithm has three choices:

[Figure: processor 1 is quasi-failed (asleep); x1 = 0, x2 = x3 = x4 = 1, and all outputs are b.]

- Decide 0.
- In this case, if processor 1 actually failed:
- The result will be wrong!

[Figure: the other processors decide 0 while processor 1 is asleep.]

- Decide 1.
- In this case, if the processor wakes up:
- The result will be wrong!

[Figure: the other processors decide 1 while processor 1 is asleep.]

- Not deciding.
- In this case, if the processor actually failed:
- The algorithm will never decide.

[Figure: the other processors remain undecided (“?”) while processor 1 is asleep.]

- There is an initial state in which both answers are possible (Lemma 2).
- There is an event at a specific processor (in our case, processor 1 starting to work and sending its message) whose occurrence, no matter when it happens (Lemma 1), determines the outcome.
- If that processor quasi-fails, we can’t decide (because the answer depends on whether it actually failed, and we can’t know that).
- If we do not decide, we reach another such state (Lemma 3), and so on, so we are stuck forever.

- Remember that in the example we forced the processors to agree according to a specific policy. In the real problem (and in the following proof), we only need them to agree on the same value, no matter which.

- Configuration: the combination of the internal state (input, output, memory) for each processor and the messages in the buffer.
- Step: a single action of one processor. For processor $p$, a step consists of:
- Trying to receive a message (removing it from the message buffer). On success, $p$ receives $(p,m)$; otherwise it receives $(p,\emptyset)$.
- Performing local computation, during which it may send any finite number of messages.

[Figure: Step 1 and Step 2 of processor 2 (x2 = 0): it receives (2,m1) from the message buffer and changes y2 from b to 1.]

- Event e=(p,m): the receiving of message m by p
- Since our processors are deterministic, the configuration produced by a step depends only on the current configuration and the received message, i.e., on the event.
- The event $e = (p,\emptyset)$ is always applicable, for every $p$.

- e(C): the configuration reached from C by the event e.
- Schedule: a finite or infinite sequence σ of events.
- σ(C): the configuration reached from configuration C by applying the events of a finite schedule σ in order.

[Figure: applying the schedule $\sigma = ((1,\emptyset),(2,m_1))$: processor 1 takes a step without receiving a message, then processor 2 receives m1 and sets y2 = 1.]
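A sketch of events and schedules, assuming configurations are encoded as immutable sorted tuples (so they can be compared) and the protocol is given as a deterministic transition function; `make_config`, `apply_event`, `apply_schedule`, and the toy protocol `toy_delta` are hypothetical names. It replays the schedule σ = ((1,∅),(2,m1)) from the figure, with m1 written as the string "1":

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Sequence, Tuple

Event = Tuple[int, Optional[str]]    # (p, m); m = None plays the role of the null message
# A configuration: (sorted (id, local state) pairs, sorted buffer of (target, msg)).
Config = Tuple[Tuple[Tuple[int, Hashable], ...], Tuple[Tuple[int, str], ...]]
# Transition: (p, local state, received message or None) -> (new state, sent messages)
Transition = Callable[[int, Hashable, Optional[str]],
                      Tuple[Hashable, Iterable[Tuple[int, str]]]]


def make_config(states: Dict[int, Hashable], buffer=()) -> Config:
    return tuple(sorted(states.items())), tuple(sorted(buffer))


def apply_event(c: Config, e: Event, delta: Transition) -> Config:
    """e(C): p receives m (nothing if m is None), then takes one deterministic step."""
    p, m = e
    states, buf = dict(c[0]), list(c[1])
    if m is not None:
        buf.remove((p, m))           # ValueError if e is not applicable to C
    states[p], sent = delta(p, states[p], m)
    buf.extend(sent)
    return make_config(states, buf)


def apply_schedule(c: Config, sigma: Sequence[Event], delta: Transition) -> Config:
    """sigma(C) for a finite schedule sigma = (e1, e2, ...)."""
    for e in sigma:
        c = apply_event(c, e, delta)
    return c


def toy_delta(p, state, m):
    # Hypothetical toy protocol: receiving the message "v" makes an undecided
    # processor output v; a null step changes nothing and sends nothing.
    x, y = state
    if m is not None and y is None:
        return (x, int(m)), ()
    return state, ()


C = make_config({1: (1, None), 2: (0, None)}, [(2, "1")])
sigma = [(1, None), (2, "1")]        # sigma = ((1, null), (2, m1)) with m1 = "1"
print(apply_schedule(C, sigma, toy_delta))
# (((1, (1, None)), (2, (0, 1))), ())
```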

- Reachable: configuration $C$ is reachable from $C'$ if there is a schedule $\sigma$ such that $\sigma(C') = C$.
- Accessible configuration: configuration $C$ is accessible if it is reachable from some initial configuration.
- $DV(C) = \{v \mid v \neq b \text{ and } \exists p: y_p = v\}$: the set of values that were chosen by some processor.
- A protocol is partially correct if:
- Every accessible configuration $C$ satisfies $|DV(C)| \le 1$.
- There exist two accessible configurations $C, C'$ with $DV(C) = \{0\}$ and $DV(C') = \{1\}$.

[Figure: as processors 1 and 2 decide, DV(C) grows from {} to {0} to {0,1}; the last configuration (y1 = 0, y2 = 1) violates partial correctness.]
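DV(C) and the two partial-correctness conditions can be checked mechanically on a finite sample of accessible configurations. The helper names are hypothetical and configurations are reduced to their outputs; since a real protocol has infinitely many accessible configurations, a check like this can refute partial correctness but not prove it:

```python
from typing import Dict, Iterable, List, Optional, Set

Outputs = Dict[int, Optional[int]]   # processor -> y_p, with None standing for b


def dv(outputs: Outputs) -> Set[int]:
    """DV(C): the set of values already chosen by some processor."""
    return {y for y in outputs.values() if y is not None}


def partially_correct(accessible: Iterable[Outputs]) -> bool:
    """Check both partial-correctness conditions over a finite collection of
    accessible configurations (each represented only by its outputs)."""
    seen: List[Set[int]] = []
    for outputs in accessible:
        d = dv(outputs)
        if len(d) > 1:               # condition 1: |DV(C)| <= 1
            return False
        seen.append(d)
    return {0} in seen and {1} in seen   # condition 2: both values possible


print(dv({1: None, 2: None}), dv({1: 0, 2: None}), dv({1: 0, 2: 1}))
# set() {0} {0, 1}
```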

- Nonfaulty: a processor is nonfaulty if it takes infinitely many steps.
- Faulty: a processor that is not nonfaulty (it stops taking steps after some time).
- Admissible: a run is admissible if it contains at most one faulty processor and the message buffer is fair (every message sent to a nonfaulty processor is eventually received).
- Deciding: a run is deciding if eventually $y_p \neq b$ for some processor $p$.
- A protocol P is totally correct in spite of one fault if:
- P is partially correct.
- Every admissible run of P is a deciding run.

- No consensus protocol is totally correct in spite of one fault.
- Assume, for contradiction, that some protocol P’ is totally correct in spite of one fault.

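- Lemma 1: For any configuration C and any two disjoint finite schedules σ1, σ2 that are both applicable to C: σ1(σ2(C)) = σ2(σ1(C)).
- Disjoint: no processor takes a step in both schedules.

- Proof:
- Each step changes only the stepping processor’s state and the buffer (removing the received message and adding the sent ones). Since σ1 and σ2 involve different processors and messages are never lost, each schedule remains applicable after the other, and the resulting states and buffer are the same in either order.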

[Figure: two disjoint schedules, “Sequence 1” and “Sequence 2”, applied in normal order and in opposite order, lead to the same processor states and the same message buffer.]
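A small experiment illustrating Lemma 1, assuming a hypothetical toy protocol (`ping_delta`) on four processors: two schedules that use disjoint sets of processors are applied in both orders, and the resulting configurations (states and message buffer) coincide.

```python
from collections import Counter


def step(config, event, delta):
    """Apply event (p, m) to config = (states, buffer); m = None is the null message."""
    states, buffer = dict(config[0]), Counter(config[1])
    p, m = event
    if m is not None:
        if buffer[(p, m)] == 0:
            raise ValueError("event not applicable")
        buffer[(p, m)] -= 1
    states[p], sent = delta(p, states[p], m)
    buffer.update(sent)
    return states, +buffer           # unary + drops zero counts


def ping_delta(p, count, m):
    # Hypothetical toy protocol on a ring 1 -> 2 -> 3 -> 4 -> 1: a null step
    # sends "ping" to the next processor; a received message is counted.
    if m is None:
        return count, [(p % 4 + 1, "ping")]
    return count + 1, []


def run(config, schedule):
    for e in schedule:
        config = step(config, e, ping_delta)
    return config


C = ({1: 0, 2: 0, 3: 0, 4: 0}, Counter({(2, "ping"): 1, (4, "ping"): 1}))
sigma1 = [(1, None), (2, "ping")]    # events of processors 1 and 2 only
sigma2 = [(3, None), (4, "ping")]    # events of processors 3 and 4 only
print(run(run(C, sigma1), sigma2) == run(run(C, sigma2), sigma1))  # True
```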

- Let FDV(C) be the union of DV(C’) for each C’ reachable from C.
- If FDV(C) = {0,1}, C is bivalent.
- If |FDV(C)|=1, C is univalent.
- If FDV(C) = {0}, C is 0-valent.
- If FDV(C) = {1}, C is 1-valent.
- P’ is totally correct, so $FDV(C) \neq \emptyset$ for every accessible configuration C.

- Intuitively, FDV(C) is the set of decisions that are still possible from configuration C.
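A sketch of FDV(C) and valency, assuming the reachable configurations form a finite graph given explicitly as a successor map together with DV for each configuration (real protocols generally have infinitely many configurations); the graph and all names are hypothetical:

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set


def fdv(succ: Dict[Hashable, Iterable[Hashable]],
        dv: Dict[Hashable, Set[int]], c: Hashable) -> Set[int]:
    """FDV(C): union of DV(C') over every C' reachable from C (C itself included)."""
    seen, queue, values = {c}, deque([c]), set()
    while queue:
        cur = queue.popleft()
        values |= dv.get(cur, set())
        for nxt in succ.get(cur, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return values


def valency(succ, dv, c) -> str:
    v = fdv(succ, dv, c)
    if v == {0, 1}:
        return "bivalent"
    if len(v) == 1:
        return f"{next(iter(v))}-valent"
    return "no decision reachable"   # impossible for a totally correct protocol


# Hypothetical reachability graph: from C, one branch decides 0, the other 1.
succ = {"C": ["A", "B"], "A": ["A0"], "B": ["B1"]}
dv = {"A0": {0}, "B1": {1}}
print(valency(succ, dv, "C"), valency(succ, dv, "A"), valency(succ, dv, "B1"))
# bivalent 0-valent 1-valent
```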

- Lemma 2: There is a bivalent initial configuration.

- Assume otherwise:
- By partial correctness (and since every initial configuration is univalent), P’ has both 0-valent and 1-valent initial configurations.
- Call two initial configurations adjacent if they differ only in the input value of a single processor.
- Any two initial configurations can be joined by a chain of adjacent configurations.
- Hence, some 0-valent initial configuration is adjacent to some 1-valent one.
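The chain argument can be sketched as a walk that flips one differing input at a time; `chain`, `adjacent_pair`, and the valency oracle used in the example are hypothetical stand-ins (the oracle does not come from any real protocol):

```python
from typing import Callable, List, Tuple

Inputs = Tuple[int, ...]   # an initial configuration, described by its input vector


def chain(a: Inputs, b: Inputs) -> List[Inputs]:
    """Initial configurations from a to b, flipping one differing input at a time."""
    path, cur = [a], list(a)
    for i, (u, v) in enumerate(zip(a, b)):
        if u != v:
            cur[i] = v
            path.append(tuple(cur))
    return path


def adjacent_pair(a: Inputs, b: Inputs, valency: Callable[[Inputs], int]):
    """Two adjacent configurations on the chain whose valencies differ.

    Assumes, as in the proof by contradiction, that every initial configuration
    is univalent (valency returns 0 or 1) and that valency(a) != valency(b).
    """
    path = chain(a, b)
    for c0, c1 in zip(path, path[1:]):
        if valency(c0) != valency(c1):
            return c0, c1
    raise ValueError("the endpoints have the same valency")


# Hypothetical valency oracle: 1-valent exactly when every input is 1.
print(adjacent_pair((0, 0, 0, 0), (1, 1, 1, 1), lambda x: int(all(x))))
# ((1, 1, 1, 0), (1, 1, 1, 1))
```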


- Reminder 1: there are adjacent 0-valent and 1-valent initial configurations.
- Call them C0 and C1, respectively.

- C0 and C1 are adjacent, so exactly one processor, p, has different input values in them.
- Reminder 2: P’ is totally correct in spite of one fault.
- So P’ must reach a decision even if a processor fails.

- Let R be an admissible run from C0 in which p fails at the very beginning (takes no steps). By total correctness in spite of one fault, R is a deciding run. Let σ be a finite prefix of its schedule that reaches a decision.
- If $1 \in DV(\sigma(C_0))$, then $1 \in FDV(C_0)$, but C0 is 0-valent. So $1 \notin DV(\sigma(C_0))$, and therefore $DV(\sigma(C_0)) = \{0\}$.
- However, C0 and C1 differ only in p’s input, and p takes no step in σ. So σ is also applicable to C1, and $\sigma(C_0)$ and $\sigma(C_1)$ are identical except for p’s internal state (p never decided). Hence $DV(\sigma(C_1)) = DV(\sigma(C_0)) = \{0\}$, so $0 \in FDV(C_1)$, but C1 is 1-valent. Contradiction.

- For any configuration C and event e = (p,m) applicable to C, let $R_{ne}(C)$ be the set of all configurations reachable from C without applying e.
- Note that e can be applied to any $C' \in R_{ne}(C)$: the message m stays in the buffer until e is applied (and the null event is always applicable).

- Let $eR(C) = \{e(C') \mid C' \in R_{ne}(C)\}$.
- Two configurations C, C’ are called neighbors if one is reachable from the other in a single step.
- Equivalently, C’ = e’(C) or C = e’(C’) for some event e’.
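On an explicit finite transition graph with labeled edges (a hypothetical encoding; real configuration graphs are usually infinite), $R_{ne}(C)$ and eR(C) can be computed by a search that simply never follows an e-labeled edge:

```python
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical explicit transition graph: configuration -> [(event, next configuration)]
Graph = Dict[Hashable, List[Tuple[Hashable, Hashable]]]


def reach_without(succ: Graph, c: Hashable, e: Hashable) -> Set[Hashable]:
    """R_ne(C): configurations reachable from C by schedules that never apply e."""
    seen, queue = {c}, deque([c])
    while queue:
        cur = queue.popleft()
        for ev, nxt in succ.get(cur, []):
            if ev != e and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def e_reach(succ: Graph, c: Hashable, e: Hashable) -> Set[Hashable]:
    """eR(C) = {e(C') : C' in R_ne(C)}, read off the e-labeled edges."""
    return {nxt for cp in reach_without(succ, c, e)
            for ev, nxt in succ.get(cp, []) if ev == e}


succ = {"C": [("e", "D"), ("f", "C1")],
        "C1": [("e", "D1"), ("g", "C2")],
        "C2": [("e", "D2")]}
print(sorted(reach_without(succ, "C", "e")), sorted(e_reach(succ, "C", "e")))
# ['C', 'C1', 'C2'] ['D', 'D1', 'D2']
```

In the example, C and C1 are neighbors, since C1 is reached from C by the single event f.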

- Lemma 3: Let C be a bivalent configuration and e = (p,m) an event applicable to C. Then eR(C) contains a bivalent configuration.

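- Assume, for contradiction, that every $D \in eR(C)$ is univalent.
- Since C is bivalent, for each $i \in \{0,1\}$ there is an i-valent configuration $E_i$ reachable from C. Let $\sigma_i$ be a schedule with $E_i = \sigma_i(C)$.
- Define the configuration $F_i$ as follows:
- If $e \notin \sigma_i$, then $F_i = e(E_i)$.
- If $e \in \sigma_i$, split $\sigma_i$ at the first occurrence of e, so that $\sigma_i(C) = \sigma_i'(e(\sigma_i''(C)))$ where $\sigma_i''$ does not contain e; then $F_i = e(\sigma_i''(C))$.

- In both cases $F_i \in eR(C)$, so $F_i$ is univalent, and it must be i-valent,
- since either $F_i$ is reachable from $E_i$ (first case) or $E_i$ is reachable from $F_i$ (second case).

- So eR(C) contains both 0-valent and 1-valent configurations.
- Pick i such that e(C) is (1-i)-valent. By an easy induction on the length of the schedule leading from C to $F_i$, there exist two neighbors $C_0, C_1 \in R_{ne}(C)$ such that $D_k = e(C_k)$ is k-valent for $k \in \{0,1\}$.
- Without loss of generality, assume $C_1 = e'(C_0)$ for some event e’.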

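[Figure: diagrams for the induction. Along the path from C toward F1, the configurations obtained by applying e go from 0-valent (e(C)) to 1-valent (F1), so some neighbors C0, C1 on the path have a 0-valent and a 1-valent image under e. A third panel marks a configuration R on the path with e(R) ∈ eR(C) bivalent, which contradicts the assumption.]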

- Reminders:
- e = (p,m).
- C0, C1 are neighbors.
- $D_i = e(C_i)$ is i-valent for $i \in \{0,1\}$.
- $C_1 = e'(C_0)$.
- Lemma 1: If two schedules are disjoint, they can be applied in either order, with the same result.

- Let e’ = (p’,m’).
- If p’ ≠ p: the schedules σ = (e) and σ’ = (e’) are disjoint, so by Lemma 1: $D_1 = e(e'(C_0)) = \sigma(\sigma'(C_0)) = \sigma'(\sigma(C_0)) = e'(e(C_0)) = e'(D_0)$. But then $1 \in FDV(D_0)$, contradicting that $D_0$ is 0-valent.
- If p’ = p: consider a finite deciding run from C0 in which p takes no steps. Such a run exists: it looks like a run in which p has failed (p quasi-fails), and P’ is totally correct in spite of one fault.

[Figure from “Impossibility of Distributed Consensus with One Faulty Process” by Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson.]

- Consider a deciding run from C0 in which p quasi-fails:
- Let σ be the corresponding schedule.
- Let $A = \sigma(C_0)$.
- A is a deciding configuration, so $|DV(A)| > 0$; since a decided value never changes and, by partial correctness of P’, no second value can appear later, $|FDV(A)| = 1$.
- $\sigma' = (e', e)$ and $\sigma'' = (e)$ are disjoint from σ, since σ contains no event of p (p quasi-fails), while σ’ and σ’’ contain only events of p (since p = p’).


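- By Lemma 1: $e(A) = \sigma''(\sigma(C_0)) = \sigma(\sigma''(C_0)) = \sigma(e(C_0)) = \sigma(D_0)$, and since $D_0$ is 0-valent, $0 \in FDV(A)$.
- By Lemma 1: $e(e'(A)) = \sigma'(\sigma(C_0)) = \sigma(\sigma'(C_0)) = \sigma(D_1)$, and since $D_1$ is 1-valent, $1 \in FDV(A)$.
- But then A is bivalent, contradicting $|FDV(A)| = 1$!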

[Figure from Fischer, Lynch, and Paterson: by Lemma 1, the two configurations e(A) = σ(D0) and e(e’(A)) = σ(D1) are both reachable from A, so A is bivalent, but σ is deciding.]

- To finish the proof, we now show an execution that never reaches a decision.
- Reminder:
- A protocol P is totally correct in spite of one fault if:
- P is partially correct.
- Every admissible run of P is a deciding run.

- A run is admissible if it contains at most one faulty processor and the message buffer is fair.
- A run is deciding if eventually $y_p \neq b$ for some processor p (and therefore it reaches a univalent configuration).

- To show that P is not totally correct in spite of one fault, we assume that P is partially correct and find an admissible run that is not deciding.

- First, we define a way to ensure that the run is admissible. We keep a queue of the processors and divide the run into stages as follows:
- A stage ends when the first processor in the processor queue receives the earliest message still pending for it (or no message, if none is pending).
- At the end of the stage, that processor is removed from the head of the queue and placed at its tail.

- Since each stage ends with a step of the processor at the head of the queue, in which it receives its earliest pending message, infinitely many stages mean that:
- Every processor takes infinitely many steps.
- Every message is eventually received.

- Therefore, the run is admissible.

[Figure: a queue of four processors rotating over successive stages. A processor at entry j of the queue takes its stage-ending step within at most j stages (e.g., 3); a message at place j among those pending for a processor is received within at most N · j stages (e.g., 4 · 3 = 12).]
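The stage discipline can be sketched as a round-robin scheduler. This is a simplification under stated assumptions: `stage_endings` is a hypothetical name, the pending messages are fixed up front, and the events chosen inside each stage are omitted:

```python
from collections import deque
from typing import Deque, Dict, Hashable, Iterator, List, Optional, Tuple


def stage_endings(procs: List[int], pending: Dict[int, Deque[Hashable]],
                  num_stages: int) -> Iterator[Tuple[int, Optional[Hashable]]]:
    """Yield the stage-ending event (p, m) of each stage, round-robin over procs.

    pending[p] holds the messages sent to p, oldest first. Each stage ends when
    the processor at the head of the queue receives its earliest pending
    message (None if it has none); that processor then moves to the tail.
    Events inside a stage and messages sent during the run are left out.
    """
    queue = deque(procs)
    for _ in range(num_stages):
        p = queue.popleft()
        m = pending[p].popleft() if pending[p] else None
        yield p, m
        queue.append(p)


pending = {1: deque(["a", "b"]), 2: deque(), 3: deque(["c"]), 4: deque()}
print(list(stage_endings([1, 2, 3, 4], pending, 8)))
# [(1, 'a'), (2, None), (3, 'c'), (4, None), (1, 'b'), (2, None), (3, None), (4, None)]
```

Every processor appears once every N stages, and each appearance consumes its oldest pending message, which is why the constructed run is admissible.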

- Now, we make sure that it is not deciding:
- Start from a bivalent initial configuration C (Lemma 2).
- Let e = (p,m) be the event in which p, the first processor in the processor queue, receives the earliest message m pending for it (or $m = \emptyset$ if there is none). By Lemma 3, there is a bivalent configuration C’ reachable from C by a schedule that ends with e.
- Set C = C’ (the stage ends).
- Return to the step that picks e.

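- Every stage ends in a bivalent configuration, and every configuration inside a stage can still reach that bivalent configuration, so it is bivalent too. A decided configuration would be univalent, so the run is not deciding.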

- Therefore, P is not totally correct!
Q.E.D

THE END!

Questions?

[Figure (supplementary to Lemma 2): a chain of initial configurations of four processors. Between a 0-valent and a 1-valent configuration, intermediate configurations are inserted, each differing from its neighbor in the input of a single processor; each is labeled 0-valent or 1-valent (“?-valent” while still undetermined), until two adjacent configurations with different valences are found.]

Initially dead processors

- Assume:
- N processors.
- At least $L = \lceil (N+1)/2 \rceil$ processors (a majority) are alive; the others are dead from the start and never take a step.
- The processors don’t know which ones are alive.

- We want to reach consensus.

- In the first stage, we build a distributed directed graph G.
- The graph is built as follows:
- Each processor has a corresponding node.
- Each processor sends its id to every other processor.
- Each processor waits for messages from L-1 other processors.
- If processor j receives the message of processor i (among those L-1), the edge (i,j) is added to the graph.

[Figure: construction of G for N = 7 processors, shown in several steps.]
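A simulation sketch of the first stage, assuming processors are numbered 1..n, dead processors never send, and asynchrony is modeled by a random arrival order; `build_stage1_graph` is a hypothetical name:

```python
import random
from typing import Iterable, Set, Tuple


def build_stage1_graph(alive: Iterable[int], n: int,
                       seed: int = 0) -> Tuple[Set[Tuple[int, int]], int]:
    """Return the edges of G and L for one simulated run of the first stage.

    Edge (i, j) means that j heard from i among the first L - 1 messages it
    received. Sketch assumptions: processors are 1..n, only processors in
    `alive` send or receive, and asynchrony is a random arrival order.
    """
    alive = sorted(alive)
    L = n // 2 + 1                        # equals ceil((n + 1) / 2)
    assert len(alive) >= L, "at least L processors must be alive"
    rng = random.Random(seed)
    edges = set()
    for j in alive:
        senders = [i for i in alive if i != j]
        rng.shuffle(senders)              # arbitrary arrival order
        for i in senders[:L - 1]:         # j stops waiting after L - 1 messages
            edges.add((i, j))
    return edges, L


edges, L = build_stage1_graph({1, 2, 3, 4, 5}, n=7)
print(L, sorted(edges))
```

Whatever the arrival order, every living node ends up with in-degree exactly L-1, which is the only property of G the following claims use.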

- In the second stage, we build the graph G+, the transitive closure of G, so that every processor learns enough of the graph.
- The graph is built as follows:
- Each processor sends to all the others its:
- id.
- Initial value.
- The L-1 processors it heard from in the first stage (its in-neighbors in G).
- Each processor waits until it has received such a message from all its ancestors in G.

[Figure: second-stage messages for N = 7, of the form (id, initial value, in-neighbors): 2, x2, [3,4,5]; 3, x3, [2,4,5]; 4, x4, [2,3,5]; 5, x5, [2,4,6]; 6, x6, [2,3,5].]
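The graph that the second stage makes known can be computed centrally as a transitive closure. This sketch uses hypothetical names and a simple fixed-point iteration rather than any particular algorithm from the source; `ancestors` lists the processors whose second-stage messages a node waits for:

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def transitive_closure(nodes: Iterable[int], edges: Set[Edge]) -> Set[Edge]:
    """G+: (i, j) is an edge iff G has a nonempty directed path from i to j."""
    nodes = list(nodes)
    reach = {v: {j for (i, j) in edges if i == v} for v in nodes}
    changed = True
    while changed:                        # iterate until no new pair appears
        changed = False
        for v in nodes:
            extra = set().union(*(reach[w] for w in reach[v])) - reach[v]
            if extra:
                reach[v] |= extra
                changed = True
    return {(i, j) for i in nodes for j in reach[i]}


def ancestors(closure: Set[Edge], v: int) -> Set[int]:
    """Nodes with a path to v in G, i.e. the processors v must hear from."""
    return {i for (i, j) in closure if j == v}


G = {(1, 2), (2, 3), (3, 1), (4, 1)}
Gp = transitive_closure(range(1, 5), G)
print(sorted(ancestors(Gp, 2)))  # [1, 2, 3, 4]
```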

- Claim: G+ contains one, and only one, clique of size L or more that is not fully contained in another clique.
- Proof that G+ contains at least one, in the following steps:
- For each k < N, because every node of G has in-degree L-1, if G contains a path of size k then:
- G contains a cycle of size at least L, or
- G contains a path of size k+1.
- (Look at the L-1 in-neighbors of the first node of the path: if one of them is off the path, prepend it; otherwise all of them are on the path, and the farthest one closes a cycle of size at least L.)
- Corollary: if G contains a path of size N, it contains a cycle of size at least L (because option 2 is impossible: a path cannot have more than N nodes).
- Corollary: G contains a cycle of size at least L (start from a single node, a path of size 1, and apply the step repeatedly).
- Since G+ is the transitive closure of G, if G contains a cycle of size k then G+ contains a clique of size k.

[Figure: illustration of the path step. Along a path A1, ..., AL, the annotations count how many of a node’s L-1 incoming edges can come from earlier nodes on the path (“at most i-1” for Ai) and how many must come from elsewhere (“at least L-i”), with sub-paths of sizes k-(L-1) and L-2 marked.]
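The path argument is constructive, so it can be sketched as code; `find_long_cycle` is a hypothetical name, and the example graph is made up so that every node has in-degree L-1:

```python
from typing import List, Sequence, Set, Tuple


def find_long_cycle(nodes: Sequence[int], edges: Set[Tuple[int, int]], L: int) -> List[int]:
    """Constructive version of the path argument.

    Assumes every node has exactly L - 1 in-neighbors in G (and L >= 2).
    The path path[0] -> path[1] -> ... grows backwards from its first node:
    if that node has an in-neighbor off the path, prepend it (a path of size
    k + 1); otherwise all its in-neighbors lie on the path, and the farthest
    one closes a cycle of size at least L.
    """
    preds = {v: [i for (i, j) in edges if j == v] for v in nodes}
    path = [nodes[0]]
    while True:
        on_path = set(path)
        outside = [u for u in preds[path[0]] if u not in on_path]
        if outside:
            path.insert(0, outside[0])
            continue
        far = max(path.index(u) for u in preds[path[0]])
        cycle = path[:far + 1]            # path[0] -> ... -> path[far] -> path[0]
        assert len(cycle) >= L
        return cycle


# Hypothetical G with N = 5, L = 3: every node has exactly 2 in-neighbors.
edges = {(2, 1), (3, 1), (1, 2), (3, 2), (1, 3), (2, 3),
         (1, 4), (2, 4), (3, 5), (4, 5)}
print(find_long_cycle([1, 2, 3, 4, 5], edges, 3))
```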

- G+ contains at most one such clique:
- If it contained two, then, since each has at least L nodes and L is a majority of the nodes, some node would belong to both cliques.
- By transitivity, the union of the nodes of both cliques is then also a clique, so neither was maximal.

[Figure: transitivity through a shared node connects any node i of one clique with any node j of the other.]
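Combining both parts of the claim, here is a sketch that finds the unique clique by computing strongly connected components through reachability. The names are hypothetical, and the assertion encodes the claim, so it should never fire on a graph in which every node has in-degree L-1:

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def reachable_from(v: int, succ: Dict[int, List[int]]) -> Set[int]:
    """Nodes reachable from v by a nonempty path, i.e. v's row in G+."""
    seen, stack = set(), list(succ[v])
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(succ[u])
    return seen


def unique_big_clique(nodes: Sequence[int], edges: Set[Tuple[int, int]], L: int) -> Set[int]:
    """The unique maximal clique of G+ with at least L nodes.

    In G+ a clique is a set of mutually reachable nodes, so the maximal
    cliques are the strongly connected components of G.
    """
    succ = {v: [j for (i, j) in edges if i == v] for v in nodes}
    reach = {v: reachable_from(v, succ) for v in nodes}
    cliques = {frozenset({v} | {u for u in reach[v] if v in reach[u]}) for v in nodes}
    big = [c for c in cliques if len(c) >= L]
    assert len(big) == 1, "contradicts the claim above"
    return set(big[0])


# Hypothetical random G with N = 7, L = 4: each node picks L - 1 in-neighbors.
rng = random.Random(1)
n, L = 7, 4
nodes = list(range(1, n + 1))
edges = {(i, j) for j in nodes for i in rng.sample([k for k in nodes if k != j], L - 1)}
print(sorted(unique_big_clique(nodes, edges, L)))
```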

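- Claim: each living processor knows about the clique.
- This is because every node in the graph is a descendant of the clique, so all nodes of the clique are its ancestors, and each processor waits for messages from all its ancestors. (The ancestors of any node induce a subgraph in which every node still has in-degree L-1, so by the argument above they contain a cycle of size at least L, which must lie in the unique clique.)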

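- The consensus: let $f: 2^{V \times \{0,1\}} \to \{0,1\}$ be any fixed function, known to all processors (part of their state), that maps a set of (id, initial value) pairs to a bit. Then f(unique clique), applied to the clique’s members and their initial values, is a binary value known to all living processors, and each of them decides it.
- Consensus is reached!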
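The source leaves f open. As one hypothetical choice, a majority rule over the clique’s initial values makes both decisions possible (all-0 inputs give 0, all-1 inputs give 1), and since all living processors see the same clique, they all compute the same bit:

```python
from typing import Iterable, Tuple


def f_majority(clique_inputs: Iterable[Tuple[int, int]]) -> int:
    """One possible choice of f: the majority of the clique's initial values,
    with ties broken toward 0 (a hypothetical rule, not fixed by the source)."""
    pairs = list(clique_inputs)
    ones = sum(x for _, x in pairs)
    return 1 if 2 * ones > len(pairs) else 0


# Every living processor learns the same (id, initial value) pairs for the
# unique clique, so every living processor computes the same decision.
clique = {(1, 1), (2, 1), (3, 0), (5, 1)}
views = [set(clique) for _ in range(3)]   # three processors' identical views
print({f_majority(v) for v in views})     # {1}
```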

THE END!

Questions?