
Two Techniques for Proving Lower Bounds



  1. Two Techniques for Proving Lower Bounds Hagit Attiya, Technion

  2. Goal of this Presentation • Describe two common techniques for proving lower bounds in distributed computing: • Information theory arguments • Covering • Variations • Applications

  3. My always first slide… [diagram relating a problem, an algorithm, and an implementation, against a nicer vs. a real system architecture]

  4. Part I Information Theory Arguments

  5. Overview • Bound the flow of information among processes (and memory) • Show that information takes a long time to be acquired • Argue that solving a particular problem requires information about many processes • Usually applies to: • shared memory systems • synchronous executions (which imply lower bounds also for asynchronous executions) • Details depend on the primitives used

  6. Single-writer registers: Possible argument • Need to read from each process • The state of a process can be found only in its own register • Hence, the first process must read n registers

  7. Not really • When processes take steps together, the first process doubles its information in its 2nd step • But it can't do better than that

  8. More Refined Argument • Consider synchronized executions: processes take steps in rounds, and all reads appear before all writes • INF(p_i, t−1): the set of inputs influencing process p_i at the start of round t • INF(p_i, 0) = {p_i} • For t ≥ 1, if p_i reads in round t a value written by p_j: INF(p_i, t) = INF(p_i, t−1) ∪ INF(p_j, t−1) • For t ≥ 1, if p_i writes in round t: INF(p_i, t) = INF(p_i, t−1)

  9. INF determines the state • Lemma: If the states of the processes in INF(p_i, t−1) are the same in configurations C and C′, then p_i takes the same steps in a t-round execution from C and from C′ • Proof by case analysis

  10. Size of INF • Let I(t) = max_i |INF(p_i, t)| • Lemma: I(0) = 1, and I(t) ≤ 2·I(t−1) • Hence I(t) ≤ 2^t
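
A minimal sketch (my own illustration, not code from the talk) of the influence-set bookkeeping above: in each round, reads see the register contents from the end of the previous round, and a process that reads p_j's register merges INF(p_j, t−1) into its own set. The read schedule and helper names below are assumptions made for the example.

    def influence_sets(n, rounds, read_target):
        # read_target(i, t) -> index j whose register p_i reads in round t,
        # or None if p_i only writes in round t
        INF = [{i} for i in range(n)]            # INF(p_i, 0) = {p_i}
        for t in range(1, rounds + 1):
            old = [s.copy() for s in INF]        # reads see end-of-previous-round state
            for i in range(n):
                j = read_target(i, t)
                if j is not None:                # p_i reads a value written by p_j
                    INF[i] = old[i] | old[j]
                # if p_i writes, INF(p_i, t) = INF(p_i, t-1): nothing to do
            assert all(len(s) <= 2 ** t for s in INF)   # the lemma: I(t) <= 2^t
        return INF

    # an idealized "doubling" schedule: in round t, p_i reads p_((i + 2^(t-1)) mod n)
    n = 8
    INF = influence_sets(n, rounds=3, read_target=lambda i, t: (i + 2 ** (t - 1)) % n)
    print([sorted(s) for s in INF])              # after log2(8) = 3 rounds, all 8 inputs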

  11. Simple application: Computing OR • Consider the input configuration C0 = (0, 0, …, 0) • In fewer than log n rounds, the influence set of every process has size < n • So some process p_i is not in INF(p_1, log n − 1) • By the lemma, p_1 returns the same value in C0 and in C1 = (0, …, 0, 1, 0, …, 0), where only p_i's input is 1 • A contradiction, since the OR of the inputs is 0 in C0 and 1 in C1

  12. Application: Approximate agreement • For a small ε > 0: • Processes start with inputs in [0,1] • They must decide on outputs in [0,1] such that • all outputs are within ε of each other (agreement), and • if all inputs are v, the output is v (validity) • The system is asynchronous, and a process must decide even if it runs by itself (solo termination)
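
A tiny sketch (illustration only, not from the talk) of the two conditions, as a checker over the inputs and decided outputs of one execution; the names eps, inputs and outputs are assumptions for the example.

    def check_approximate_agreement(inputs, outputs, eps):
        agreement = max(outputs) - min(outputs) <= eps           # all outputs within eps
        validity = (len(set(inputs)) != 1) or outputs == [inputs[0]] * len(outputs)
        return agreement and validity

    print(check_approximate_agreement([0.0, 1.0], [0.4, 0.45], eps=0.1))   # True
    print(check_approximate_agreement([0.3, 0.3], [0.3, 0.35], eps=0.1))   # False (validity)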

  13. Application: Approximate agreement [Attiya, Shavit, Lynch] • Consider the input configuration C0 = (0, 0, …, 0) • Run all processes to completion from C0 ⇒ by validity they must decide 0 • If the number of rounds T < log n, then I(T) < n • ⇒ ∃ a process p_i ∉ INF(p_1, T)

  14. Approximate agreement (cont.) • Consider the two input configurations C0 = (0, …, 0) and C1 = (0, …, 0, 1, 0, …, 0), which differ only in p_i's input • Run p_i to completion: it must decide 1 • Since p_i ∉ INF(p_1, T), p_1 still decides 0 when running from this configuration, contradicting agreement • Theorem: Solo-terminating approximate agreement requires Ω(log n) rounds in a synchronous failure-free run

  15. Approximate agreement (cont.) • Overhead of solo termination: Ω(log n) rounds even in “nice” (synchronous, failure-free) runs, since otherwise a synchronous algorithm could solve the problem in one round

  16. With multi-writer registers • The previous theorem does not hold • There is a wait-free approximate agreement algorithm that takes O(1) rounds in “nice” executions [Schenk] • Even simpler: an O(1) OR algorithm

  17. With multi-writer registers (cont.) • The O(1) OR algorithm only needs to distinguish between a few initial configurations. Can you find it? • Overhead of single-writer registers: Ω(log n), separating single-writer from multi-writer registers

  18. Information flow with multi-writer registers • The previous argument does not hold • Instead, consider how learning more information allows a process to differentiate between input configurations, e.g., (0, …, 0), (0, …, 0, …, 1), (0, …, 1, …, 0), (1, …, 1, …, 0) • Capture this as a partitioning of process states and memory values [Beame]

  19. Multi-writer registers: Ordering events • Within each round: put all reads first, then all writes • Reads obtain the values written at the end of the previous round

  20. Partitioning into equivalence classes • For process p and round t, two input configurations are in the same equivalence class of P(p,t) if p is in the same state after t rounds from both (in a synchronous failure-free execution) • P(t): the number of classes after t rounds (max over p); V(R,t) and V(t) are defined similarly for memory locations R • Lemma: P(t) ≤ P(t−1)·V(t−1) and V(t) ≤ n·P(t−1) + V(t−1) • ⇒ P(t), V(t) ≤ (4n+2)^(2^t−2)

  21. Application: The collect problem • update(v) stores v as the latest value of a process • collect() returns a set of values (one per process) • When each process initially stores one of two values, there are 2^n possible input configurations, each leading to a different output • The previous lemma implies (4n+2)^(2^t−2) ≥ P(t) ≥ 2^n • ⇒ Must have Ω(log n) rounds
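
A small sketch (my own illustration) of why the lemma forces Ω(log n) rounds: iterate the recurrence upper bounds for P(t) and V(t) and find the first round at which P(t) could reach the 2^n distinct outputs that collect() must produce. The base values P(0) = 2 (each process holds one of two inputs) and V(0) = 1 (fixed initial memory contents) are assumptions made for the example.

    def rounds_needed(n):
        P, V = 2, 1
        t = 0
        while P < 2 ** n:                # fewer classes than input configurations
            P, V = P * V, n * P + V      # P(t) <= P(t-1)V(t-1), V(t) <= nP(t-1)+V(t-1)
            t += 1
        return t

    for n in (8, 64, 1024, 65536):
        print(n, rounds_needed(n))       # grows roughly like log2(n)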

  22. Also for other primitives (CAS) • Non-reading CAS:
      CAS(R, old, new) {
        if R == old then
          R = new
          return success
        else return fail
      }
  • Reading CAS returns the old value (can be handled, but we won’t do that) • Can also extend to non-reading kCAS

  23. Careful with CAS • More information can flow through a sequence of CAS steps • With R == 0 initially, in the sequence cas(R,0,1), cas(R,1,2), …, cas(R,n−1,n) every CAS succeeds • On the other hand, in the sequence cas(R,n−1,n), cas(R,n−2,n−1), …, cas(R,0,1) only the last CAS succeeds
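
A short sketch (illustration only) of non-reading CAS and the two sequences above; the Register class is an assumption made for the example.

    class Register:
        def __init__(self, value=0):
            self.value = value
        def cas(self, old, new):
            # non-reading CAS: reports only success/failure, never the old value
            if self.value == old:
                self.value = new
                return True
            return False

    n = 5
    R = Register(0)
    print([R.cas(v, v + 1) for v in range(n)])            # all True: every CAS succeeds,
    # so each step can carry information about all the preceding ones
    R = Register(0)
    print([R.cas(v, v + 1) for v in reversed(range(n))])  # only the last CAS succeeds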

  24. Ordering events within a round • Put all reads first and all writes last • For every register R whose current value is v, order the CAS events as follows: • first all events with old ≠ v: they all fail • then all events with old == v: only the first succeeds (assumes operations are non-degenerate) • This allows proving a lemma analogous to the multi-writer register case (with different constants)
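
A sketch of my reading of this ordering rule (illustrative only): given a register's value at the start of the round and the CAS events of that round, place the non-matching events first (all fail) and the matching ones after them (only the first succeeds).

    def order_cas_round(value, cas_events):
        # cas_events: list of (process, old, new) on one register; value: start-of-round value
        failing = [e for e in cas_events if e[1] != value]
        matching = [e for e in cas_events if e[1] == value]
        ordered = [(e, False) for e in failing]
        for i, e in enumerate(matching):
            ordered.append((e, i == 0))              # only the first matching CAS succeeds
        end_value = matching[0][2] if matching else value
        return ordered, end_value

    events = [("p1", 5, 7), ("p2", 0, 1), ("p3", 0, 9)]
    print(order_cas_round(0, events))
    # p1 fails (old != 0); p2 succeeds; p3 fails even though its old == 0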

  25. Information Flow with Bounded Fan-In • Arbitrary objects, but bounded contention: not too many processes access the same base object simultaneously • Isolate processes in a Q-independent execution: • only processes in Q take steps • they access only objects not modified by other processes • For a process p ∈ Q, a Q-independent execution is indistinguishable from a p-solo execution

  26. Constructing independent executions • Lemma: For any algorithm using only objects with contention ≤ w and every t ≥ 0, there is a t-round Q_t-independent execution with |Q_t| ≥ n/(w+2)^t • Proof by induction, with a trivial base case • Induction step: consider a t-round Q_t-independent execution, look at the next steps the processes in Q_t are about to perform, and construct an undirected graph (V,E) • We use the following result from graph theory. Turán's theorem: any graph (V,E) has an independent set of size at least |V|²/(|V|+2|E|)

  27. Induction step: The graph • V = Q_t • E contains an edge {p_i, p_j} if • p_i and p_j access the same object, or • p_i is about to read an object modified by p_j, or • p_j is about to read an object modified by p_i • |E| ≤ |Q_t|(w+1)/2 • Turán's theorem and the inductive hypothesis ⇒ there is an independent set Q_{t+1} of size ≥ n/(w+2)^(t+1) • Omit all steps of Q_t − Q_{t+1} from the execution to get a Q_{t+1}-independent execution
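
A small sketch (my own illustration) of the graph-theoretic step: a greedy minimum-degree rule yields an independent set of size at least |V|²/(|V|+2|E|), the Turán bound used above. Here vertices stand for the processes of Q_t and edges for the conflicts listed on this slide.

    def greedy_independent_set(vertices, edges):
        adj = {v: set() for v in vertices}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        independent, alive = set(), set(vertices)
        while alive:
            v = min(alive, key=lambda x: len(adj[x] & alive))  # minimum remaining degree
            independent.add(v)
            alive -= adj[v] | {v}                              # drop v and its neighbours
        return independent

    V = range(6)
    E = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]               # a path on 6 vertices
    I = greedy_independent_set(V, E)
    assert len(I) >= len(V) ** 2 / (len(V) + 2 * len(E))       # 36/16 = 2.25, so |I| >= 3
    print(sorted(I))                                           # e.g. [0, 2, 4]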

  28. Application: Weak Test&Set • Weak test&set: like test&set, but with at most one success • Take t such that (w+2)^t < n • The lemma gives a t-round {p_i, p_j}-independent execution • Each of p_i and p_j seems to be running solo ⇒ each must succeed • Contradiction • Theorem: The solo step complexity of weak test&set is Ω(log n / log w)

  29. Part II Covering

  30. Covering: The basic idea • Several processes write to the same location • Writes by early processes are lost if there is no read in between • ⇒ Processes must write to distinct locations • ⇒ Another process must read these locations

  31. Max Register • WriteMax(v,R) operation • A ReadMax operation op returns the maximal value written by a WriteMax operation that • completed before op started, or • overlaps op • Special case of a linearizable object
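
A minimal sketch (not from the talk) of the obvious max register built from n single-writer registers: WriteMax writes to the caller's own register and ReadMax reads all n of them. It matches the theorem below, which shows that reading n different registers is also necessary. The class and method names are assumptions for the example.

    class MaxRegister:
        def __init__(self, n):
            self.reg = [0] * n              # reg[i] is written only by process i

        def write_max(self, i, v):
            if v > self.reg[i]:
                self.reg[i] = v             # single-writer: only process i writes here

        def read_max(self):
            return max(self.reg)            # reads all n registers

    M = MaxRegister(3)
    M.write_max(0, 5)
    M.write_max(2, 9)
    print(M.read_max())                     # 9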

  32. Lower bound for the ReadMax operation [Jayanti, Tan, Toueg] • Theorem: ReadMax must read n different registers • The proof is constructive

  33. Construction for the lower bound • Proof by induction on k = 0, …, n: construct an execution α_k β_k γ_k where • in α_k, p_1 … p_k perform WriteMax operations • β_k consists of writes by p_1 … p_k to k distinct registers R_1 … R_k • in γ_k, p_n performs a ReadMax operation that reads R_1 … R_k • The base case is simple • Taking k = n yields the result

  34. Inductive Step • Start from the execution α_k β_k γ_k given by the inductive hypothesis • Let p_{k+1} perform WriteMax operations after α_k; if we then run the writes β_k by p_1 … p_k to R_1 … R_k and p_n's ReadMax operation, the ReadMax does not observe p_{k+1} • ⇒ p_{k+1} must write to some register R ∉ {R_1, …, R_k}

  35. Inductive Step (cont.) • π_k: p_{k+1} performs WriteMax operations after α_k, and must write to a register R ∉ {R_1, …, R_k} • After α_k π_k and the writes β_k by p_1 … p_k to R_1 … R_k, p_n's ReadMax operation must read R ∉ {R_1, …, R_k}

  36. Inductive Step (cont.) • Stop π_k when p_{k+1} is about to write to such a register R_{k+1} ∉ {R_1, …, R_k} • The claim follows with the registers R_1 … R_k, R_{k+1} and α_{k+1} = α_k π_k

  37. Swap objects • The theorem holds for other primitives and objects, e.g., (register-to-memory) swap • Need some care in constructing π_k, γ_k
      swap(R, v) {
        tmp = R
        R = v
        return tmp
      }

  38. Result holds also for other objects • E.g., counters: the constructed execution contains many Increment operations • Better algorithms exist when • there are few Increment operations, or • the max register holds bounded values [Aspnes, Attiya, Censor-Hillel]

  39. Counters with CAS • Counters can be implemented with a single location R and a single CAS per operation: • To increment: read the previous value from R, then CAS the value +1 into R • To read the counter, simply read R • ⇒ Lots of contention on R! • ⇒ This is inevitable
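
A minimal sketch (illustrative only) of the counter scheme above, written sequentially in Python; a retry loop is added so an increment always completes when its CAS fails. Every attempt goes through the single location R, which is the source of the contention discussed next.

    class CASCounter:
        def __init__(self):
            self.R = 0

        def _cas(self, old, new):
            if self.R == old:            # stands in for an atomic hardware CAS
                self.R = new
                return True
            return False

        def increment(self):
            while True:
                v = self.R               # read the previous value from R
                if self._cas(v, v + 1):  # CAS +1 into R
                    return

        def read(self):
            return self.R                # reading the counter is a single read of R

    c = CASCounter()
    for _ in range(5):
        c.increment()
    print(c.read())                      # 5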

  40. The memory stalls measure [Dwork, Herlihy, Waarts] • If k processes access (or modify) the same location in the same configuration: • the first process incurs one step and no stalls • the second process incurs one step and one stall • … • the k'th process incurs one step and k−1 stalls

  41. Lower bound on number of stalls • Theorem: ReadCounter must incur n stalls + steps • Similar construction as in the previous theorem: p_1 … p_k perform Increment operations and are left poised on registers R_1 … R_m, m ≤ k • p_n performs a ReadCounter operation that accesses R_1 … R_m

  42. Lower bound on number of stalls (cont.) • p_n's ReadCounter operation accesses R_1 … R_m and incurs k stalls + steps • Taking k = n yields the theorem

  43. Wrap-up • There are many lower bound results, but fewer techniques… • Some results & techniques are relevant to questions asked in Transform • The material is based on a monograph in progress with Faith Ellen • Let me know if you want to proof-read it!
