
Lecture 4: Introduction to Principles of Distributed Computing

Sergio Rajsbaum

Math Institute

UNAM, Mexico


Lecture 4

Consensus in partially synchronous systems, and failure detectors

  • Part I: Realistic timing model and metric

  • Part II: Failure detectors, algorithms

  • Part III: this is the best possible

  • Part IV: New directions and extensions


Consensus: A Fundamental Abstraction

Each process has an input, should decide an output s.t.

Agreement: correct processes’ decisions are the same

Validity: decision is input of one process

Termination: eventually all correct processes decide

There are at least two possible input values, 0 and 1.

X0 denotes the set of all possible input vectors over the input values V.
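As an illustration of these three properties (an addition to the transcript, not part of the slides), here is a minimal Python sketch that checks a finished run; the run representation and the names are assumptions made for the example:

# Minimal sketch: checking the consensus properties on a finished run.
# A run is a list of (input, decision, crashed) triples, one per process;
# decision is None if the process never decided. Names are illustrative.

def check_consensus(run):
    inputs = [inp for (inp, _, _) in run]
    correct = [(inp, dec) for (inp, dec, crashed) in run if not crashed]

    # Termination: eventually all correct processes decide.
    termination = all(dec is not None for (_, dec) in correct)
    # Agreement: correct processes' decisions are the same.
    agreement = len({dec for (_, dec) in correct if dec is not None}) <= 1
    # Validity: every decision is the input of one of the processes.
    validity = all(dec in inputs for (_, dec) in correct if dec is not None)
    return termination and agreement and validity

# Example: three processes, one crashes before deciding.
print(check_consensus([(0, 1, False), (1, 1, False), (1, None, True)]))  # True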


The Lecture in a Nutshell

  • Consensus solvability depends on how long connectivity is preserved by a particular model

  • In the synchronous model consensus is solvable, in the asynchronous model it is not. What about intermediate, more realistic models?

[Figure: the initial states X0, the states after one round L(X0), and the states after two rounds L2(X0); connectivity may be preserved or destroyed from one round to the next.]


Basic Model

  • Message passing (essentially equivalent to read/write shared memory model)

  • Channels between every pair of processes

  • Crash failures

    t < n potential failures out of n >1 processes

  • No message loss among correct processes


Is consensus solvable? If so, how long does it take to solve it?

  • It depends on what exactly the model is

  • But what is a realistic model?

  • And what are the common scenarios within the model? The nature of a distributed system is to include complex combinations of failures and delays


How Fast Can We Solve Consensus?

Depends on the timing model:

  • Message delays

  • Processing times

  • Clocks

And on the metric used:

  • Worst case

  • Average

  • etc.


    The Rest of This Lecture

    • Part I: Realistic timing model and metric

    • Part II: Upper bounds

    • Part III: this is the best possible

    • Part IV: New directions and extensions


    Part I: Realistic Timing Model


    First two simple models


    Asynchronous Model

    • Unbounded message delay, processor speed

      Consensus impossible even for t=1 [FLP85]


    Synchronous Model

    • Algorithm runs in synchronous rounds:

      • send messages to any set of processes,

      • receive messages from previous round,

      • do local processing (possibly decide, halt)


    • If process i crashes in a round, then any subset of the messages i sends in this round can be lost


    Synchronous Consensus

    • In a run with f failures (f<t)

      • Processes can decide in f+1 rounds

        [Lamport Fischer 82; Dolev, Reischuk, Strong 90] (early-deciding)

    • 1 round with no failures

  • In this talk we count rounds to decide, not to halt

    • halting takes min(f+2, t+1) rounds [Dolev, Reischuk, Strong 90]
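To make the synchronous round structure concrete, here is a toy Python sketch (an illustration added to the transcript, not the early-deciding algorithm cited above): the classic flooding algorithm in which every process relays all values it has seen for t+1 rounds and then decides deterministically, e.g., the minimum value. Crashes are modeled crudely as a process stopping at the start of a round, so the "partial send" behavior of the model is not exercised.

# Toy sketch of flooding consensus in the synchronous model: t+1 rounds,
# then decide the minimum value seen. Not the early-deciding algorithm.

def flooding_consensus(inputs, t, crashed_at=None):
    """inputs: {pid: value}; crashed_at: {pid: round in which pid stops}."""
    crashed_at = crashed_at or {}
    known = {p: {v} for p, v in inputs.items()}      # values each process has seen
    for rnd in range(1, t + 2):                      # rounds 1 .. t+1
        alive = [p for p in inputs if crashed_at.get(p, t + 2) > rnd]
        snapshot = {p: set(known[p]) for p in alive}
        for sender in alive:                         # everyone broadcasts its set
            for receiver in alive:
                known[receiver] |= snapshot[sender]
    # correct processes now hold the same set and decide deterministically
    return {p: min(known[p]) for p in inputs if p not in crashed_at}

print(flooding_consensus({1: 0, 2: 1, 3: 1}, t=1, crashed_at={1: 1}))  # {2: 1, 3: 1}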


    The Middle Ground

    Many real networks are neither synchronous nor asynchronous

    • During long stable periods, delays and processing times are bounded

      • Like synchronous model

    • Some unstable periods

      • Like asynchronous model


    Partial Synchrony Model [Dwork, Lynch, Stockmeyer 88]

    • Processes have clocks (with bounded drift)

    • D, upper bound on message delay

    • r, upper bound on processing time

    • GST, global stabilization time

      • Until GST, unstable: bounds do not hold

      • After GST, stable: bounds hold

      • GST unknown


    Partial Synchrony in Practice

    • For D, r, choose bounds that hold with high probability

    • Stability forever?

      • We assume that once the system is stable, it remains stable

      • In practice, has to last “long enough” for given algorithm to terminate

      • A commonly used model that alternates between stable and unstable times:

        Timed Asynchronous Model [Cristian, Fetzer 98]


    Consensus with Partial Synchrony

    • Solvable; requires t < n/2 [DLS88]

    • Unbounded running time, by [FLP85], because the model can be asynchronous for an unbounded time


    Exercise

    • Prove that consensus is not solvable in the partially synchronous model, if t ≥ n/2

    • Prove that if t<n/2, it takes unbounded running time to be solved


    In a Practical System

    Can we say more than:

    consensus will be solved eventually?


    Performance Metric

    Number of rounds in well-behaved runs

    • Well-behaved:

      • No failures

      • Stable from the beginning

    • Motivation: common case


    The Rest of This Lecture

    • Part II: best known algorithms decide in 2 rounds in well-behaved runs

      • 2D time (with message delay bound D and zero processing time)

    • Part III: this is the best possible

    • Part IV: new directions and extensions


    Part II: Algorithms, and the Failure Detector Abstraction

    II.a Failure Detectors and Partial Synchrony


    II.b Algorithms


    Time-Free Algorithms

    • Goal: abstract away time, get simpler algorithms

    • We describe the algorithms using failure detector abstraction [Chandra, Toueg 96]


    Unreliable Failure Detectors [Chandra, Toueg 96]

    • Each process has local failure detector oracle

      • Typically outputs list of processes suspected to have crashed at any given time

    • Unreliable: failure detector output can be arbitrary for unbounded (finite) prefix of run


    Performance of Failure Detector Based Consensus Algorithms

    • Implement a failure detector in the partial synchrony model

    • Design an algorithm for the failure detector

    • Analyze the performance in well-behaved runs of the combined algorithm


    A Natural Failure Detector Implementation in the Partial Synchrony Model

    • Implement failure detector using timeouts:

      • When expecting a message from a process i, wait D + r + clock skew before suspecting i

    • In well-behaved runs, D, r always hold, hence no false suspicions
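A minimal sketch of this timeout rule in Python (an illustration added to the transcript; the heartbeat period, the concrete numbers, and the class and method names are assumptions, not part of the slides):

import time

# Minimal sketch of a timeout-based failure detector. Every process is assumed
# to send a heartbeat every HEARTBEAT_PERIOD; process i is suspected when no
# message from i has arrived within HEARTBEAT_PERIOD + D + r + clock skew.

HEARTBEAT_PERIOD = 1.0              # assumed heartbeat period (seconds)
D, R, SKEW = 0.5, 0.1, 0.05         # assumed bounds: delay, processing, clock skew
TIMEOUT = HEARTBEAT_PERIOD + D + R + SKEW

class EventuallyPerfectFD:
    def __init__(self, processes):
        self.last_heard = {p: time.monotonic() for p in processes}

    def on_message(self, p):        # call whenever any message from p arrives
        self.last_heard[p] = time.monotonic()

    def suspected(self):            # current list of suspected processes
        now = time.monotonic()
        return [p for p, t in self.last_heard.items() if now - t > TIMEOUT]

In the partial synchrony model, where GST and the exact bounds are unknown, implementations typically also increase the timeout after every false suspicion; that refinement is omitted here. In well-behaved runs the bounds hold from the start, so there are no false suspicions.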


    The resulting failure detector is <>P - Eventually Perfect

    • Strong Completeness: From some point on, every faulty process is suspected by every correct process

    • Eventual Strong Accuracy: From some point on, every correct process is not suspected


    Weakest Failure Detectors for Consensus

    • <>S - Eventually Strong

      • Strong Completeness

      • Eventual Weak Accuracy: From some point on, some correct process is not suspected

    • Ω (Omega) - Leader

      • Outputs one trusted process

      • From some point, all correct processes trust the same correct process


    A Simple Ω Implementation

    • Use <>P implementation

    • Output lowest id non-suspected process

      In well-behaved runs: process 1 always trusted
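A minimal sketch of this construction (an illustration added to the transcript; it reuses the hypothetical EventuallyPerfectFD class from the earlier sketch, and anticipates the exercise that follows):

# Minimal sketch: an Omega (leader) oracle built from an eventually perfect
# failure detector by trusting the lowest-id process that is not suspected.

class OmegaFromEventuallyPerfect:
    def __init__(self, processes, fd):
        self.processes = sorted(processes)   # process ids, e.g. [1, 2, ..., n]
        self.fd = fd                         # e.g. an EventuallyPerfectFD instance

    def leader(self):
        suspects = set(self.fd.suspected())
        for p in self.processes:
            if p not in suspects:
                return p                     # in well-behaved runs: process 1
        return self.processes[0]             # fallback if everyone is suspected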


    Exercise

    • Write the algorithm code for this failure detector Ω, and prove it is correct


    Relationships among Failure Detector Classes

    • <>P is a subset of <>S (every <>P failure detector is also a <>S failure detector)

    • <>S is strictly weaker than <>P

    • <>S ~ Ω [Chandra, Hadzilacos, Toueg 96]

      Food for thought:

      What is the weakest timing model where <>S and/or Ω are implementable but <>P is not?


    Relationships among Failure Detector Classes - Recent Results

    Partial answer: In PODC’03, Aguilera et al. present a system S with synchronous processes:

    • any number of them may crash, and

    • only the output links of an unknown correct process are eventually timely (all other links can be asynchronous and/or lossy)

      <>P is not implementable in S, but Ω is.

      This gives a new proof that <>S is strictly weaker than <>P.


    Note on the Power of Consensus

    • Consensus cannot implement <>P, interactive consistency, atomic commit, …

    • So its “universality”, in the sense of

      • wait-free objects in shared memory [Herlihy 93]

      • state machine replication [Lamport 78; Schneider 90]

        does not cover sensitivity to failures, timing, etc.


    Other Failure Detector Implementations

    Food for thought:

    When is building <>P more costly than <>S or Ω?

    Partial answer: Aguilera et al. (PODC’03) observe:

    • any implementation of <>P (even in a perfectly synchronous system) requires all alive processes to send messages forever, while Ω can be implemented such that eventually only the leader sends messages


    Other Failure Detector Implementations

    • Message efficient <>S implementation [Larrea, Fernández, Arévalo 00]

    • QoS tradeoffs between accuracy and completeness [Chen, Toueg, Aguilera 00]

    • Leader Election [Aguilera, Delporte, Fauconnier, Toueg 01]

    • Adaptive <>P [Fetzer, Raynal, Tronel 01]


    Part II: Algorithms, and the Failure Detector Abstraction

    II.a Failure Detectors and Partial Synchrony

    II.b Algorithms


    Algorithms that Take 2 Rounds in Well-Behaved Runs

    • <>S-based [Schiper 97; Hurfin, Raynal 99; Mostefaoui, Raynal 99]

    • Ω-based for t < n/3 [Mostefaoui, Raynal 00]

    • Ω-based for t < n/2 [Dutta, Guerraoui 01]

    • Paxos (optimized version) [Lamport 89; 96]

      • Leader-based (Ω)

      • Also tolerates omissions, crash recoveries

    • COReL - Atomic Broadcast [Keidar, Dolev 96]

      • Group membership based (<>P)


    Of This Laundry List, We Present Two Algorithms

    • <>S-based [MR99]

    • Paxos


    <>S-based Consensus [MR99]

    • val ← input v; est ← null
      for r = 1, 2, … do
        coord ← (r mod n) + 1
        if I am coord, then send (r, val) to all
        wait for ((r, val) from coord OR suspect coord (by <>S))
        if received val from coord then est ← val else est ← null
        send (r, est) to all
        wait for (r, est) from n-t processes
        if any non-null est received then val ← est
        if all ests have the same v then send (“decide”, v) to all; return(v)
      od

    • Upon receive (“decide”, v), forward to all, return(v)
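As a sanity check of the two-round claim, here is a small Python simulation of round 1 in a well-behaved run (an illustration added to the transcript; no failures, no suspicions, and message delivery is collapsed into plain assignments):

# Toy simulation of round 1 of the <>S-based algorithm in a well-behaved run:
# two message exchanges, after which every process decides the coordinator's value.

def simulate_well_behaved_round(inputs, t):
    """inputs: {pid: value}; returns the decisions reached in round 1."""
    n = len(inputs)
    coord = (1 % n) + 1                        # rotating coordinator of round 1
    # Exchange 1: the coordinator's (1, val) reaches everyone; nobody suspects it.
    est = {p: inputs[coord] for p in inputs}
    # Exchange 2: everyone sends (1, est) and waits for estimates from n-t processes.
    decisions = {}
    for p in inputs:
        received = [est[q] for q in sorted(inputs)[: n - t]]
        if None not in received and len(set(received)) == 1:
            decisions[p] = received[0]         # all ests equal: decide
    return decisions

print(simulate_well_behaved_round({1: 'a', 2: 'b', 3: 'b'}, t=1))  # all decide 'b'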


    In Well-Behaved Runs

    [Figure: round 1 with no failures and no suspicions. The coordinator sends (1, v1) to processes 1…n, every process sets est = v1 and echoes (1, v1), and all processes decide v1 after two message exchanges.]


    In Case of Omissions

    The algorithm can block in case of transient message omissions, waiting for a specific round message that will not arrive


    Paxos [Lamport 88; 96; 01]

    • Uses the Ω failure detector

    • Phase 1: prepare

      • A process who trusts itself tries to become leader

      • Chooses largest unique (using ids) ballot number

      • Learns outcome of all smaller ballots

    • Phase 2: accept

      • Leader proposes a value with its ballot number.

      • Leader gets a majority to accept its proposal.

      • A value accepted by a majority can be decided


    Paxos - Variables

    • Type Rank

      • totally ordered set with minimum element r0

    • Variables:

      Rank BallotNum, initially r0

      Rank AcceptNum, initially r0

      Value ∪ {⊥} AcceptVal, initially ⊥


    Paxos Phase I: Prepare

    • Periodically, until decision is reached, do:
        if leader (by Ω) then
          BallotNum ← (unique rank > BallotNum)
          send (“prepare”, BallotNum) to all

    • Upon receive (“prepare”, rank) from i
        if rank > BallotNum then
          BallotNum ← rank
          send (“ack”, rank, AcceptNum, AcceptVal) to i


    Paxos Phase II: Accept

    Upon receive (“ack”, BallotNum, b, val) from n-t
      if all vals = ⊥ then myVal ← initial value
      else myVal ← received val with highest b
      send (“accept”, BallotNum, myVal) to all /* proposal */

    Upon receive (“accept”, b, v) with b ≥ BallotNum
      AcceptNum ← b; AcceptVal ← v /* accept proposal */
      send (“accept”, b, v) to all (first time only)


    Paxos – Deciding

    Upon receive (“accept”, b, v) from n-t

    decide v

    periodically send (“decide”, v) to all

    Upon receive (“decide”, v)

    decide v
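The acceptor-side state and handlers of the last three slides can be collected into a short Python sketch (an illustration added to the transcript, not Lamport's text; ranks are modeled as (round, process id) pairs so that they are unique and totally ordered, and None plays the role of ⊥):

# Minimal sketch of single-decree Paxos acceptor logic, mirroring the slides.

BOTTOM = None                                  # stands for ⊥
R0 = (0, 0)                                    # minimum rank r0

class Acceptor:
    def __init__(self):
        self.ballot_num = R0                   # Rank BallotNum, initially r0
        self.accept_num = R0                   # Rank AcceptNum, initially r0
        self.accept_val = BOTTOM               # Value AcceptVal, initially ⊥

    def on_prepare(self, rank):
        """Phase 1: promise not to take part in lower-ranked ballots."""
        if rank > self.ballot_num:
            self.ballot_num = rank
            return ("ack", rank, self.accept_num, self.accept_val)
        return None                            # ignore a stale prepare

    def on_accept(self, b, v):
        """Phase 2: accept the proposal unless a higher ballot was promised."""
        if b >= self.ballot_num:
            self.accept_num, self.accept_val = b, v
            return ("accept", b, v)            # acceptance, forwarded to all
        return None

# Leader side, after collecting ("ack", ...) messages from n-t acceptors:
def choose_proposal(acks, my_initial_value):
    accepted = [(anum, aval) for (_, _, anum, aval) in acks if aval is not BOTTOM]
    if not accepted:
        return my_initial_value                # all vals were ⊥
    return max(accepted, key=lambda x: x[0])[1]   # val with the highest AcceptNum

A value is decided once (“accept”, b, v) messages with the same ballot have arrived from n-t processes, as on the preceding slide.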


    In Well-Behaved Runs

    [Figure: process 1, which our Ω implementation always trusts, sends (“prepare”, 1) to processes 1…n, collects (“ack”, 1, r0, ⊥) from all, then sends (“accept”, 1, v1), and all processes decide v1.]


    Optimization

    • Allow process 1 (only!) to skip Phase 1

      • use rank r0

      • propose its own initial value

    • Takes 2 rounds in well-behaved runs

    • Takes 2 rounds for repeated invocations with the same leader


    What About Message Loss?

    • Does not block in case of a lost message

      • Phase I can start with new rank even if previous attempts never ended

    • But constant omissions can violate liveness

    • Specify conditional liveness:

      If n-t correct processes including the leader can communicate with each other

      then they eventually decide


    Synchronous Consensus

    • In a run with f failures (f<t)

      • Processes can decide in f+1 rounds

      • And no less!

        [Lamport Fischer 82; Dolev, Reischuk, Strong 90] (early-deciding)

    • 1 round with no failures

    • In this talk we count rounds to decide, not to halt

      • halting takes min(f+2, t+1) rounds [Dolev, Reischuk, Strong 90]


    Uniform Consensus

    • Uniform agreement: decision of every two processes is the same

      Recall: with consensus, only correct processes have to agree (disagreement with the dead is OK)

      This version of consensus will be useful to extend the lower bound argument to asynchronous models


    Synchronous Uniform Consensus

    Every algorithm has a run with f failures (f < t-1) that takes at least f+2 rounds to decide

    • [Charron-Bost, Schiper 00; KR 01]

      • as opposed to f+1 for consensus


    A Simple Proof of the Uniform Consensus Synchronous Lower Bound [Keidar, Rajsbaum IPL 02]


    Theorem: f+2 Lower Bound

    • Assume n > t and f < t-1

    • Lf(X0) - the final states of runs with f failures

      • connected

      • in any state in Lf(X0) there are at least 3 non-failed processes, and 2 of them can still fail

    • Take z, z’ ∈ X0 s.t. val(z) ≠ val(z’),

      • let x, x’ be failure-free extensions of z, z’: x = z.(i,[0])f ∈ Lf(X0)


    Exercise

    • Modify the theorem and the proof of this talk for the consensus problem (instead of the uniform consensus problem)


    Upper Bounds From Part II

    We saw that there are algorithms that take 2 rounds to decide in well-behaved runs

    • <>S-based, Ω-based, Paxos, COReL

    • We presented two of them.


    Why are there no 1-Round Algorithms?

    There is a lower bound of 2 rounds in well-behaved executions

    • Similar bounds shown in [Dwork, Skeen 83; Lamport 00]

  • We will show that the bound follows from a similar bound on Uniform Consensus in the synchronous model


    Uniform Consensus

    • Uniform agreement: decision of every two processes is the same

      Recall: with consensus, only correct processes have to agree


    From Consensus to Uniform Consensus

    In the partial synchrony model, any algorithm A for consensus solves uniform consensus [Guerraoui 95]

    Proof: Assume by contradiction that A does not solve uniform consensus

    • in some run, p,q decide differently, p fails

    • p may be non-faulty, and may wake up after q decides


    Synchronous Uniform Consensus

    Every algorithm has a well-behaved run that takes 2 rounds to decide

    • More generally, it has a run with f failures (f < t-1) that takes at least f+2 rounds to decide [Charron-Bost, Schiper 00; KR 01]

      • as opposed to f+1 for consensus


    Bibliography

    • Keidar and Rajsbaum, “A Simple Proof of the Uniform Consensus Synchronous Lower Bound,” IPL, Vol. 85, pp. 47-52, 2003.

    • Keidar and Rajsbaum, “On the Cost of Fault-Tolerant Consensus When There Are No Faults,” available from Keidar’s web page, including slides and papers.

    • Moses and Rajsbaum, “A Layered Analysis of Consensus,” SIAM J. Comput. 31(4): 989-1021, 2002.

    • Mostéfaoui, Rajsbaum, and Raynal, “Conditions on Input Vectors for Consensus Solvability in Asynchronous Distributed Systems,” J. ACM, 2003.


    End of Lecture 4

