Applications of Probabilistic Quorums to Iterative Algorithms

HyunYoung Lee, University of Denver

Jennifer L. Welch, Texas A&M University

presented at ICDCS 2001

- The Probabilistic Quorum Algorithm (PQA)
- Abstracting PQA into Random Register (RR)
- Using RRs in Iterative Convergent Algorithms
- Monotone RRs and their Performance
- Simulation Results
- Conclusions

The Probabilistic Quorum Algorithm (PQA)

- Distributed shared memory provides the illusion of shared variables for inter-process communication on top of a message-passing distributed system
- Benefits of shared memory paradigm:
  - familiar from uniprocessor case
  - supports good software development practice
- Examples: TreadMarks [Amza+], DASH [Gharachorloo+], ...

[Figure: application processes 1..r invoke operations such as read(Y) and write(X,3) on their clients and get back return(Y,5) and ack(X); clients 1..r exchange send/recv messages over the network with servers 1..n, which together implement shared variables X, Y, Z, ...]

- Keep a copy of shared variable at n replica servers that communicate by messages.
- A quorum is a subset of replica servers.
- To write: client updates copies in a quorum with new value plus timestamp.
- To read: client receives copies from a quorum and returns value with latest timestamp.

- To ensure each read obtains latest value written, every read quorum must intersect every write quorum.
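A minimal Python sketch of the write and read rules above, assuming integer timestamps and an in-memory list standing in for the n replica servers. The names `Replica`, `quorum_write`, and `quorum_read` are illustrative and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One server's copy of the shared variable (hypothetical layout)."""
    value: object = None
    timestamp: int = 0


def quorum_write(replicas, quorum, value, timestamp):
    """Update the copy held by every server in the write quorum."""
    for server in quorum:
        replicas[server] = Replica(value, timestamp)


def quorum_read(replicas, quorum):
    """Collect copies from the read quorum; return the one with the latest timestamp."""
    latest = max((replicas[server] for server in quorum),
                 key=lambda copy: copy.timestamp)
    return latest.value, latest.timestamp
```

With five replicas, writing value 4 at timestamp 9 to servers {0, 1, 2} and then reading from {2, 3, 4} returns (4, 9), because the two majority quorums share server 2. Reading from {3, 4} alone, which does not intersect the write quorum, returns the initial copy.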

[Figure: replicas holding (value, timestamp) pairs (4, 9:00), (10, 8:00), (4, 9:00), (4, 9:00), (12, 7:00), with a write quorum and a read quorum marked.]

- Availability: minimum number of servers that must fail to disable every quorum [Peleg & Wool].
- Optimal (largest) availability is $\Theta(n)$, achieved when every set of size $\lfloor n/2 \rfloor + 1$ is a quorum.
- Load: probability of accessing the busiest server, in the best case [Naor & Wool].
- Optimal (smallest) load is $\Theta(1/\sqrt{n})$.

Tradeoff Theorem [Naor & Wool]: For any quorum system, if load is optimal, $\Theta(1/\sqrt{n})$, then availability is at most $O(\sqrt{n})$.

- Relax requirement that every read quorum overlap every write quorum.
- Instead, choose each quorum uniformly at random from the set of all k-sized subsets of the n replica servers, for k < n/2.

Theorem: If $k = \Theta(\sqrt{n})$, then

- availability is $n - k = \Theta(n)$
- load is $\Theta(1/\sqrt{n})$

- To handle server failures: keep trying until enough responses to form a quorum are received.

[Figure: replicas holding (value, timestamp) pairs (4, 9:00), (10, 8:00), (12, 7:00), (4, 9:00), (12, 7:00), with a write quorum and a read quorum marked.]

Drawback: A read quorum might not overlap the most recent write quorum, causing a read to return an out-of-date value.

Theorem: Probability of not overlapping is $< e^{-h^2}$, when $k = h\sqrt{n}$.
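The bound can be checked numerically. The sketch below assumes both quorums are drawn independently and uniformly from all k-sized subsets of n servers, so the exact probability that they are disjoint is C(n-k, k) / C(n, k); it then compares that value with exp(-h^2) for h = k / sqrt(n). The function names are illustrative only.

```python
import math


def nonoverlap_probability(n, k):
    """Exact probability that two random k-subsets of n servers are disjoint."""
    return math.comb(n - k, k) / math.comb(n, k)


def overlap_failure_bound(n, k):
    """The bound exp(-h^2) with h = k / sqrt(n)."""
    h = k / math.sqrt(n)
    return math.exp(-h * h)
```

For example, with n = 100 and k = 20 (so h = 2) the bound is exp(-4), roughly 0.018, and the exact disjointness probability is smaller still.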

- What are the semantics of the shared variable (register) implemented by the PQA?
- What kind of applications can tolerate reads returning, with low probability, out-of-date values?

Abstracting PQA into Random Register (RR)

One writer and multiple readers.

[R1] Every read or write invocation has a response.

[R2] Every read R reads from some write W:

(1) W begins before R ends.

(2) R's value is the same as W's value.

(3) W is the latest such write.

[Figure: timeline of writes W1(c), W2(b), W3(a), W4(c) and a read R(c).]

[R3] For every finite execution ending with a write W, the probability that W is read from infinitely often is 0 (over all extensions with an infinite number of writes).

Related Work:

- Most work on randomized shared objects concerns termination, not correct responses.
- [Afek+] and [Jayanti+] assumed a fixed subset of shared objects that can return incorrect values.

Theorem 1: PQA implements an RR.

Proof:

[R1]: Each invocation gets a response since no lost messages and only crash failures of servers.

[R2]: Each read reads a value written by a previous or overlapping write, since no data corruption.

[R3]: Show probability that at least one replica in a write quorum is never overwritten is 0:

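$$
\begin{aligned}
\Pr(\geq 1 \text{ replica survives } h \text{ writes})
&\leq k \Pr(\text{replica } j \text{ survives } h \text{ writes}) \\
&= k \Pr(j \notin Q_1 \wedge \dots \wedge j \notin Q_h) \\
&= k \prod_{i=1}^{h} \Pr(j \notin Q_i) \\
&= k \left( \frac{n-k}{n} \right)^{h} \\
&\to 0 \text{ as } h \to \infty.
\end{aligned}
$$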
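The union bound in the proof of [R3] can be compared against a small Monte Carlo experiment. This is only an illustrative sketch: it assumes every write quorum is an independent uniform k-subset of the n servers, and the names `survival_bound` and `simulate_survival` are hypothetical.

```python
import random


def survival_bound(n, k, h):
    """Union bound k * ((n - k) / n) ** h on some replica surviving h writes."""
    return k * ((n - k) / n) ** h


def simulate_survival(n, k, h, trials, seed=0):
    """Estimate the probability that a replica of one write quorum is
    still not overwritten after h later writes to random quorums."""
    rng = random.Random(seed)
    servers = range(n)
    survived = 0
    for _ in range(trials):
        stale = set(rng.sample(servers, k))       # quorum of the write W
        for _ in range(h):
            stale -= set(rng.sample(servers, k))  # a later write overwrites its quorum
            if not stale:
                break
        survived += bool(stale)
    return survived / trials
```

With n = 10 and k = 3 the bound falls from 3 at h = 0 to below 1e-6 by h = 50, and the simulated survival frequency stays under the bound.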

Using RRs in Iterative Convergent Algorithms

- Repeatedly apply a function to a vector to produce another vector until reaching a fixed point.
- Responsibility for vector components is distributed across several processes.
- Vector component updates are based on possibly out-of-date views of the vector components.

Requirements:

[A1]: All views come from the past.

[A2]: Every component is updated infinitely often.

[A3]: Each view is used only finitely often.

[Figure: vector components plotted over time steps 0 to 3. Red views are updated ones; arrows indicate the views used in the last update.]

[A1], [A2], [A3] are equivalent to the existence of a partition of the update sequence into pseudocycles (p.c.’s):

- at least one update per component, and
- every view used was created in current or previous p.c.

[Figure: consecutive pseudocycles p.c. i-1 and p.c. i, with one view crossed out (X).]

Theorem [UD]: Sufficient condition on F for convergence to fixed point, if update sequence satisfies [A1]-[A3]: There exists an integer M and a sequence of sets $D_0, D_1, \ldots$ such that

- each $D_K$ is a Cartesian product of m sets (independence)
- $D_0 \supseteq D_1 \supseteq \cdots \supseteq D_M = D_{M+1} = \cdots = \{\text{fixed point}\}$
- if $x \in D_K$, then $F(x) \in D_{K+1}$, for all K.

[Figure: nested sets of m-vectors $D_0, D_1, \ldots, D_{M-1}, D_M$ shrinking to the fixed point.]

- G is a weighted directed graph with n nodes.
- Compute n x n vector x; process i updates the i-th row of x, $1 \le i \le n$.
- Initially x is the adjacency matrix for G.
- F(x) computes y, where $y_{ij} = \min_{1 \le k \le n} \{ x_{ik} + x_{kj} \}$.

Shown to be an ACO (asynchronously contracting operator) by [UD].

Claim: Worst-case number of pseudocycles for F to converge is $\lceil \log_2 \mathrm{diameter}(G) \rceil$.
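A small Python sketch of the function F applied synchronously, where each application plays the role of one pseudocycle. It assumes the adjacency matrix has 0 on the diagonal and infinity for missing edges, and it uses a unit-weight chain (with edges in both directions) as a hypothetical test input; none of the names come from the talk.

```python
import math

INF = math.inf


def apply_F(x):
    """One application of F: y[i][j] = min over k of x[i][k] + x[k][j]."""
    n = len(x)
    return [[min(x[i][k] + x[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def iterate_to_fixed_point(x):
    """Apply F until nothing changes; return the fixed point and the
    number of applications that changed the matrix."""
    applications = 0
    while True:
        y = apply_F(x)
        if y == x:
            return x, applications
        x, applications = y, applications + 1


def chain_adjacency(num_nodes):
    """Adjacency matrix of a unit-weight chain (assumed test graph)."""
    x = [[INF] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        x[i][i] = 0
        if i + 1 < num_nodes:
            x[i][i + 1] = x[i + 1][i] = 1
    return x
```

Each application doubles the number of edges a recorded path may use, so a 5-node chain (diameter 4) reaches its fixed point after 2 applications and a 34-node chain (diameter 33) after 6.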

Theorem 2: If F is an ACO, then every iterative execution using RRs for the vector components converges with probability 1.

Proof: Show the sequence of updates in the execution satisfies [A1], [A2] and [A3] with probability 1.

[A1]: All views are from the past by [R2].

[A2]: Application ensures every component is updated infinitely often.

[A3] holds with probability 1: Each view is used finitely often with probability 1 by [R3].

- RRs can be used to implement any ACO, which includes algorithms for
  - APSP
  - transitive closure
  - constraint satisfaction
  - solving system of linear equations

- If PQA is used for the RRs, improved load and availability are provided.
- Convergence is guaranteed with probability 1.
- But how long does it take to converge?

- A round finishes when every process has, at least once,
  - read all the vector components
  - applied the function
  - updated its own vector components

How many (expected) rounds per p.c.?

We don’t know with current RR definition, so modify definition...

Monotone RRs and their Performance

[R1] - [R3] plus

[R4]: If read R by process i reads from W and a later read R' by i reads from W' , then W' does not precede W.

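[Figure: write W'(b) precedes write W(c); a read R(c) followed by a later read R'(b) by the same process is crossed out (X).]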

[R5]: There exists q s.t. for all r,

$\Pr[r \text{ reads are needed until } W \text{ or a later write is read from}] \le (1 - q)^{r-1} q$.

So q is the probability of a “successful” read (w.r.t. W).

Same as previous probabilistic quorum algorithm, except:

- Read client keeps track of value with latest timestamp that it has seen so far.
- This value is returned if its timestamp is later than all those obtained from current quorum.

Theorem 3: Attains $q = 1 - \binom{n-k}{k} / \binom{n}{k}$.

W's or later value is read if a subsequent read quorum overlaps W's quorum.
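The monotone read rule can be sketched as a client-side cache. This is a minimal illustration, assuming integer timestamps and replicas represented as (value, timestamp) tuples; the class name `MonotoneReader` is hypothetical.

```python
class MonotoneReader:
    """Read client that never returns a value older than one it has
    already returned (illustrative sketch)."""

    def __init__(self):
        self.value = None
        self.timestamp = -1  # assumed to be older than any real timestamp

    def read(self, replicas, quorum):
        """Query the quorum, then return the newest value seen so far."""
        for server in quorum:
            value, timestamp = replicas[server]
            if timestamp > self.timestamp:
                self.value, self.timestamp = value, timestamp
        return self.value
```

If server 0 holds ("c", 2) and servers 1 and 2 still hold ("b", 1), a reader that first queries {0} and then {1, 2} returns "c" both times, whereas a non-monotone read of {1, 2} would return the older "b".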

Theorem 4: Expected number of rounds per pseudocycle, when implementing an ACO with monotone RRs, is at most 1/q.

Proof: For p.c. h to end, each process i must read from a write no earlier than the first write in p.c. h-1.

Once this read occurs for i, every later read by i is at least as recent, since monotone.

Expected # rounds for first read is 1/q by [R5].

Corollary: For monotone PQA, expected # rounds per p.c. is at most $\left(1 - ((n-k)/n)^{k}\right)^{-1}$.

Expression is between 1 and 2 when $k = \sqrt{n}$.
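A quick numeric check of these quantities, as a sketch: it assumes n is a perfect square so that k = sqrt(n) is an integer, takes q to be one minus the probability that two uniformly random k-subsets are disjoint, and uses illustrative function names.

```python
import math


def success_probability(n, k):
    """q = 1 - C(n-k, k) / C(n, k): chance a read quorum overlaps a write quorum."""
    return 1 - math.comb(n - k, k) / math.comb(n, k)


def rounds_bound(n, k):
    """The expression (1 - ((n - k) / n) ** k) ** -1."""
    return 1 / (1 - ((n - k) / n) ** k)
```

For n = 4 and k = 2, 1/q is 1.2 and the expression evaluates to about 1.33; for n = 10000 and k = 100 the expression is about 1.58. In both cases 1/q does not exceed the expression, and both lie between 1 and 2.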

Strict quorum system has 1 round per p.c.

Monotone PQA has > 1 expected round per p.c. but may have fewer messages per p.c.

Which has better message complexity?

Messages per round in synchronous case:

- Each of the m vector components is read by each of the p processes and written by one.
- Each operation generates two messages to each of the k quorum members.
Total: 2m(p+1)k messages per round.

MPQA: When $k = \sqrt{n}$, expected # messages per p.c. is $c \cdot 2m(p+1)\sqrt{n}$, $1 < c < 2$.

- High availability, $\Theta(n)$:
  - Strict: $k = \lfloor n/2 \rfloor + 1$, so # messages per p.c. is $2m(p+1)(\lfloor n/2 \rfloor + 1)$. Worse than MPQA.
- Low load, $\Theta(1/\sqrt{n})$:
  - Strict: $k = \sqrt{n}$ (e.g., rows and columns of grid), so # messages per p.c. is $2m(p+1)\sqrt{n}$. Asymptotically same.
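The comparison can be made concrete with a few lines of Python. This sketch assumes the synchronous message count 2m(p+1)k per round, one round per pseudocycle for a strict majority system, and the expected-rounds expression (1 - ((n-k)/n)^k)^(-1) for monotone PQA with k = floor(sqrt(n)); the function names are illustrative.

```python
import math


def messages_per_round(m, p, k):
    """m components, each read by p processes and written by one;
    every operation sends two messages to each of k quorum members."""
    return 2 * m * (p + 1) * k


def strict_majority_messages(m, p, n):
    """Strict majority quorums: k = floor(n/2) + 1, one round per pseudocycle."""
    return messages_per_round(m, p, n // 2 + 1)


def mpqa_expected_messages(m, p, n):
    """Monotone PQA with k = floor(sqrt(n)), scaled by the expected
    number of rounds per pseudocycle."""
    k = math.isqrt(n)
    expected_rounds = 1 / (1 - ((n - k) / n) ** k)
    return expected_rounds * messages_per_round(m, p, k)
```

With m = 10, p = 10, and n = 100, the strict majority system sends 11220 messages per pseudocycle, while the monotone PQA estimate is roughly 3400.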

Simulation Results

Simulated non-monotone and monotone RR implementations using PQs with APSP application to study:

- difference between synchronous and asynchronous cases
- expected convergence time in non-monotone case (no analysis)
- actual expected convergence time in monotone case compared to computed upper bound

Input graph: [Figure: chain of nodes labeled 1, 2, ..., 34 with edge labels 1.]

$\lceil \log_2 33 \rceil = 6$ pseudocycles to converge.

Measured rounds till convergence (when simulated results equaled precomputed actual answer).

Each plotted point is average of 7 runs.

Computed upper bound is not tight.

Synch & asynch are very similar.

Monotone is better than non-monotone.

Conclusions

- Proposed two specifications of randomized shared variables that can return wrong answers, monotone and non-monotone random read-write registers.
- Both specs can be implemented with PQA of [MRW].
- Our specs can be used to implement a significant class of iterative convergent algorithms, characterized by [UD]; algorithms converge with probability 1.
- Computed bounds on convergence time and message complexity for ACOs in monotone case.
- Simulation results indicate monotone is faster than non-monotone, asynch and synch are similar, and computed upper bound is not tight.

- Are our specs of more general interest? Other good algs that implement them? Different specs better?
- Useful applications for other shared data structures (e.g., stack) with errors? How to specify and implement them?
- How to tolerate client failures? Approximate agreement as an application?