
Lower Bounds for Read / Write Streams

Paul Beame. Joint work with Trinh Huynh (Dang-Trinh Huynh-Ngoc), University of Washington.



Presentation Transcript

  1. Lower Bounds for Read/Write Streams. Paul Beame. Joint work with Trinh Huynh (Dang-Trinh Huynh-Ngoc), University of Washington

  2. Data stream Algorithms • Many huge successes • No need to remind people at this workshop! • Some problems provably hard • E.g. frequency moments $F_k$, $k > 2$, require space $\Omega(n^{1-2/k})$ [Bar-Yossef-Jayram-Kumar-Sivakumar 02], [Chakrabarti-Khot-Sun 03]

  3. Beyond Data Streams • Disk storage can be huge • Can stream data to/from disks in real time • Sequential access hides latency • Motivates multipass streams • Analyzed by similar methods to single pass • Why stop at a single copy? • Working with more than one copy at once may make computations easier • Why stream the data onto disks exactly as read? • Can make modifications to data while writing

  4. Read/write streams model • Disks → read/write streams • Key parameters: space, #passes = reversals • Assume #streams is constant • Introduced by [Grohe-Schweikardt 05] [Figure: several streams of bits, each readable and writable, attached to a small memory]

  5. Read/write streams model • Much more powerful than data-stream model • Sort with $O(\log n)$ passes, $O(\log n)$ space, 3 streams • MergeSort • Exactly compute any frequency moment • Data-stream requires passes × space = $\Omega(n)$ • $\Theta(\log n)$ passes, $O(1)$ space gives all of LOGSPACE [Hernich-Schweikardt 08] What can be computed in $o(\log n)$ passes + small space?
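
The "sort, then compute any frequency moment exactly" bullets can be illustrated as follows. This is a minimal sketch, not the talk's construction: Python's built-in `sorted` stands in for the $O(\log n)$-pass merge sort over streams, and the function name `exact_frequency_moment` is hypothetical. Once equal items are adjacent, one further scan with a constant number of counters suffices.

```python
def exact_frequency_moment(stream, k):
    """Return F_k = sum over distinct values of (frequency ** k)."""
    # Assumption: sorted() stands in for the multi-pass stream merge sort.
    ordered = sorted(stream)
    total = 0
    run_value, run_length = None, 0
    for item in ordered:  # single left-to-right scan of the sorted stream
        if run_length and item == run_value:
            run_length += 1
        else:
            if run_length:
                total += run_length ** k  # close the previous run
            run_value, run_length = item, 1
    if run_length:
        total += run_length ** k  # close the final run
    return total
```

For example, `exact_frequency_moment([3, 1, 3, 2, 3, 1], 2)` adds the squares of the frequencies 3, 2 and 1.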

  6. Previous lower bounds for R/W streams • In $o(\log n)$ passes need $\Omega(n^{1-\epsilon})$ space to • Sort n numbers [Grohe-Schweikardt 05] • Test set-equality A=B, multiset equality, XQuery, XPath [Grohe-Hernich-Schweikardt 06] • Same lower bounds apply for randomized algorithms with one-sided error [Grohe-Hernich-Schweikardt 06]

  7. Previous lower bounds for R/W streams • Lower bounds for general randomness and two-sided error: • In $o(\log n / \log\log n)$ passes, need $\Omega(n^{1-\epsilon})$ space to: • Approximate $F^*_\infty$ within factor 2 • Find Empty-Join, XQuery/XPath-Filtering etc. [B-Jayram-Rudra 07] What about approximating frequency moments $F_k$ for $k > 2$?

  8. Our Main Result Theorem: Any randomized R/W-stream algorithm using $o(\log n)$ passes needs $\Omega(n^{1-4/k-\epsilon})$ space to 2-approximate $F_k$ • Implies polynomial space for $k > 4$ • Compare with: $\Theta(n^{1-2/k})$ on data streams R/W streams with $o(\log n)$ passes don’t help much for approximating frequency moments.

  9. Methods

  10. [Alon-Matias-Szegedy 96] approach to lower bounding $F_k$ in data streams • Reduce testing t-party set-disjointness to $F_k$. Easy! • Simulate any data-stream algorithm by a multi-party number-in-hand communication game. Trivial! • Apply $\Omega(n/t)$ communication lower bound on t-party set-disjointness [AMS 96, Saks-Sun 02, Bar-Yossef-Jayram-Kumar-Sivakumar 02, Chakrabarti-Khot-Sun 03, Grönemeier 09] (tight!) Solved easily by R/W streams! Fails for R/W streams! Cannot be applied to R/W streams!

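  11. Promise Set-Disjointness (DISJ) • $\mathrm{DISJ}_{n,t}(x_1,\dots,x_t) = 0$ if $x_1,\dots,x_t$ are pairwise disjoint; $= 1$ if there exists $a$ such that $a \in x_i$ for every $i$; undefined otherwise [Figure: sets $x_1, x_2, x_3, x_4, x_5$] • t-party NIH communication: $\Omega(n/t)$ • Approximating $F_k$ ⇒ testing $\mathrm{DISJ}_{n,t}$ for $t \approx n^{1/k}$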
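
The last bullet rests on a gap in the value of $F_k$ between the two promise cases. A rough version of that gap, assuming each $x_i$ is presented as the list of its elements and ignoring constant factors (the frequency notation $f_a$ is introduced here for illustration):

```latex
\[
F_k = \sum_{a \in [n]} f_a^{\,k}, \qquad f_a = \bigl|\{\, i : a \in x_i \,\}\bigr| .
\]
\[
\text{pairwise disjoint: } f_a \le 1 \text{ for all } a \;\Rightarrow\; F_k \le n,
\qquad
\text{common element } a:\; f_a = t \;\Rightarrow\; F_k \ge t^k .
\]
```

Once $t^k$ exceeds $n$ by a large enough constant factor, that is $t \approx n^{1/k}$ up to constants, a constant-factor approximation of $F_k$ separates the two cases.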

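  12. R/W streams easily solve $\mathrm{DISJ}_{n,t}$ • Testing $\mathrm{DISJ}_{n,t}$ with 2 streams, 3 passes, $O(\log n)$ space • Input: $x_1,x_2,\dots,x_t \in \{0,1\}^n$ [Figure: one stream holding $x_1, x_2, \dots, x_{t-1}, x_t$ and a second stream holding $x_2, \dots, x_{t-1}, x_t, x_1$]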
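
One way to see why a second stream makes promise set-disjointness easy is sketched below. It is an illustrative simplification, not necessarily the algorithm drawn on the slide: under the promise it is enough to compare $x_1$ with $x_2$, so the sketch copies $x_1$ onto a second stream and then scans both streams in lock-step. Python lists stand in for streams, and the name `disj_two_streams` is hypothetical.

```python
def disj_two_streams(xs):
    """xs: list of t equal-length 0/1 lists satisfying the DISJ promise.

    Returns 1 if some position is set in every x_i, else 0.
    Under the promise, x_1 and x_2 intersect exactly in the "1" case.
    """
    n = len(xs[0])
    stream1 = [bit for x in xs for bit in x]  # input stream: x_1 x_2 ... x_t
    stream2 = []                              # initially empty second stream

    # Scan 1: copy x_1 from stream 1 onto stream 2.
    # The head on stream 1 ends at the start of x_2.
    for i in range(n):
        stream2.append(stream1[i])

    # Scan 2: rewind stream 2, then advance both heads together,
    # comparing bit i of x_1 (stream 2) with bit i of x_2 (stream 1).
    # Only the position counter i is kept in memory: O(log n) bits.
    for i in range(n):
        if stream2[i] == 1 and stream1[n + i] == 1:
            return 1
    return 0
```

The point of the sketch is that no player-by-player communication bottleneck arises: the shifted copy lets two different parts of the input sit under the heads at the same time.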

  13. How to prove lower bounds in R/W streams? • Lower bounds [GS05], [GHS05], [BJR07] for R/W streams don’t use [AMS96] outline • Introduce permuted 2-party versions of problems • Employ ad-hoc combinatorial arguments We take a more general approach related to [AMS96] directly using NIH comm. complexity

  14. Our approach to lower bound $F_k$ • R/W streams algorithm for t-party permuted-DISJ on input size $n$ ⇒ number-in-hand communication protocol for t-party DISJ on input size $\approx n/t^2$

  15. [Alon-Matias-Szegedy 96]’s approach to lower bound $F_k$ in data streams: • Reduce testing t-party set-disjointness to $F_k$. Easy! • Simulate data-stream algorithms by a multi-party number-in-hand communication game • Apply communication lower bound on t-party set-disjointness [AMS96, SS02, B-YJKS02, CKS03, G09] (tight!) Our approach to lower bound $F_k$ in R/W streams: • 1. Reduce testing permuted t-party DISJ to $F_k$ • 2. Simulate R/W streams for permuted DISJ by NIH comm. for DISJ on slightly smaller input size (apply our simulation)

  16. Ideas from the proof

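  17. Segmenting $\mathrm{DISJ}_{n,t}$ • Input: $x_1,x_2,\dots,x_t \in \{0,1\}^n$ • View $\mathrm{DISJ}_{n,t}$ as an OR of $m$ subproblems $\mathrm{DISJ}_{n/m,t}$ [Figure: each of $x_1, x_2, \dots, x_{t-1}, x_t$ divided into segments $1, 2, \dots, m$, each of length $n/m$]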

  18. Permuted DISJ • Fix permutations $\pi_1,\pi_2,\dots,\pi_t$ on $[m]$ • Permuted-$\mathrm{DISJ}_{n,m,t}$: the input is $\pi_1(x_1),\pi_2(x_2),\dots,\pi_t(x_t)$ • View Permuted-$\mathrm{DISJ}_{n,m,t}$ as an OR of $m$ subproblems $\mathrm{DISJ}_{n/m,t}$ [Figure: the segments of $x_i$, each of length $n/m$, appear in $\pi_i(x_i)$ in the order $\pi_i(1), \pi_i(2), \dots, \pi_i(m)$]
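
The segmenting and shuffling on slides 17 and 18 can be made concrete with a small sketch. The conventions here are assumptions made for illustration: indices are 0-based, position `j` of player `i`'s shuffled string holds segment `perms[i][j]`, and the function names are hypothetical.

```python
def segments(x, m):
    """Split the 0/1 list x into m consecutive segments of equal length."""
    size = len(x) // m
    return [x[j * size:(j + 1) * size] for j in range(m)]


def permuted_input(xs, perms, m):
    """For each player i, reorder the m segments of xs[i] by perms[i]."""
    shuffled = []
    for x, pi in zip(xs, perms):
        segs = segments(x, m)
        # Position j of the shuffled string holds segment pi[j].
        shuffled.append([segs[pi[j]] for j in range(m)])
    return shuffled


def permuted_disj(shuffled, perms, m):
    """OR over the m subproblems: does some segment label c have a
    position that is set in the label-c segment of every player?"""
    for c in range(m):
        parts = [sh[pi.index(c)] for sh, pi in zip(shuffled, perms)]
        width = len(parts[0])
        if any(all(part[a] == 1 for part in parts) for a in range(width)):
            return 1
    return 0
```

The reference evaluator `permuted_disj` has random access to all segments, which is exactly what a streaming algorithm lacks: the segments of one subproblem sit at unrelated positions in the different players' strings.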

  19. Why is permuted-DISJ hard? • Intuitively, to solve a subproblem (e.g. blue), we need to compare at least two blue segments • Need to compare at least two segments of every color • If segments are shuffled, many passes are needed [Figure: the segments of one subproblem $\mathrm{DISJ}_{n/m,t}$ at scattered positions in $\pi_i(x_i)$, $\pi_j(x_j)$, $\pi_l(x_l)$]

  20. Permuted DISJ • Good subproblem: computation always depends only on at most one of its t segments (and the memory/state) • If segments are randomly shuffled: with $o(\log m)$ passes and $t = o(m^{1/2})$ parties, 99% of the $m$ subproblems are good • Reduction idea: try to embed an ordinary $\mathrm{DISJ}_{n/m,t}$ in one of the good subproblems • Catch: which subproblems are good depends on the input

  21. Simulation • s-space R/W streams algo A for Permuted-$\mathrm{DISJ}_{n,m,t}$ ⇒ NIH comm. protocol for $\mathrm{DISJ}_{n/m,t}$ • t players on input $y_1,y_2,\dots,y_t$: • Generate $m-1$ $\mathrm{DISJ}_{n/m,t}$’s that look like* $y_1,y_2,\dots,y_t$ • Shuffle with $\pi_1,\pi_2,\dots,\pi_t$ • $(y_1,y_2,\dots,y_t)$ is good w.h.p. • Run A on $\pi_1(x_1),\dots,\pi_t(x_t)$ [Figure: each $y_i$ embedded in $x_i$, which is shuffled into $\pi_i(x_i)$] *same sizes but don’t intersect

  22. Generating the extended input • Given $y_1,y_2,\dots,y_t$, players • Exchange the sizes of each of the sets • $O(t \log n)$ bits • Choose random consistent reordering of the indices of each $y_1,y_2,\dots,y_t$ • Generate $m-1$ random inputs to $\mathrm{DISJ}_{n/m,t}$ with same set sizes as $y_1,y_2,\dots,y_t$ but that are disjoint • Place $y_1,y_2,\dots,y_t$ in random position and then shuffle • Key observation: if $y_1,y_2,\dots,y_t$ are disjoint then this resolves the catch • After shuffling, all the subproblems look the same, so the probability that the subproblem where $y_1,y_2,\dots,y_t$ lands is good does not depend on the input
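
The padding step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the protocol itself: the players' shared randomness is modeled by one `random.Random` object, each $y_i$ is a Python set of indices in `range(width)` with `width` playing the role of $n/m$, the set sizes are assumed to sum to at most `width`, and the final shuffle by $\pi_1,\dots,\pi_t$ is omitted. The name `extend_input` is hypothetical.

```python
import random


def extend_input(ys, width, m, rng):
    """Embed the real instance ys among m - 1 disjoint look-alike instances.

    Returns (subproblems, slot): a list of m families of t sets, and the
    index at which the relabeled real instance was placed.
    """
    sizes = [len(y) for y in ys]
    if sum(sizes) > width:
        raise ValueError("set sizes too large for disjoint padding instances")

    # One random relabeling of the indices, shared by all players.
    relabel = list(range(width))
    rng.shuffle(relabel)
    hidden = [{relabel[a] for a in y} for y in ys]

    # m - 1 random pairwise-disjoint families with the same set sizes.
    subproblems = []
    for _ in range(m - 1):
        elems = rng.sample(range(width), sum(sizes))
        family, start = [], 0
        for size in sizes:
            family.append(set(elems[start:start + size]))
            start += size
        subproblems.append(family)

    # Place the real instance at a uniformly random position.
    slot = rng.randrange(m)
    subproblems.insert(slot, hidden)
    return subproblems, slot
```

When the $y_i$ are themselves disjoint, the relabeled real instance has the same distribution as the padding instances, which is the reason its position carries no information about the input.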

  23. Simulating R/W stream algorithm A using NIH communication • As A executes on input $v = \pi_1(x_1),\dots,\pi_t(x_t)$, players know all inputs except $y_1,\dots,y_t$ • Each player builds up a copy of a dependency graph $\sigma(v)$ for the elements of each stream so far • Using $\sigma(v)$, at each step all players either • know the next move, or • know which one player knows the next block of moves • that player communicates • know that two players’ info is needed: simulation “fails” • If subproblem $y_1,\dots,y_t$ is good for $v$ then simulation does not fail • If players detect failure they output “not disjoint” • If input was disjoint then only 1% chance of this

  24. Dependency Graph • Vertices: elements of each stream in each pass • Edges: from element to elements in previous pass that contained heads at same time it did [Figure: stream contents at pass 0, pass 1, …, pass j-1, pass j, pass j+1, with scans labeled "Stream R to L", "Stream L to R", "Stream L to R"]

  25. Why most subproblems are good • Simple case: algorithm just makes copies of the input stream and compares them • # of subproblems with > 1 segment read at same time on single pass through the streams (L-to-R or R-to-L on each stream) • ≤ # segments appearing in the same (or reversed) order • Almost surely, for random permutations $\pi_1,\pi_2,\dots,\pi_t$, no pair has a common subsequence or inverted subsequence longer than $2e\,m^{1/2}$ • When $t$ is $o(m^{1/2})$ the total is $o(m)$.
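
The quantity in the simple case, the longest common subsequence of two permutations, is easy to compute, which makes the $m^{1/2}$ scaling simple to check experimentally. The sketch below uses the standard reduction to longest increasing subsequence (patience sorting); the function name is hypothetical and permutations are given as Python lists over the same set of values.

```python
from bisect import bisect_left


def lcs_of_permutations(p, q):
    """Length of the longest common subsequence of permutations p and q."""
    # Rewrite p in terms of positions in q; a common subsequence of p and q
    # is then exactly an increasing subsequence of this position sequence.
    pos_in_q = {value: i for i, value in enumerate(q)}
    seq = [pos_in_q[value] for value in p]

    # tails[k] = smallest possible last element of an increasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)
```

The inverted case on the slide corresponds to calling `lcs_of_permutations(p, q[::-1])`.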

  26. Why most subproblems are good • General case: may combine information about all streams onto a single stream in a single pass • What is combined may depend on the input values • Each element depends on the segments that it can reach in the input stream via the dependency graph

  27. Why most subproblems are good • For each fixed $v$, after $p = o(\log m)$ passes: • Each element can depend on only $2^{O(p)}$ different input segments • For any one stream, the sequence of its elements’ dependencies on input segments is the interleaving of $2^{O(p)}$ monotone subsequences from $\pi_1,\pi_2,\dots,\pi_t$ ⇒ Only $2^{O(p)}\, t\, m^{1/2}$ bad subproblems on input $v$, where $2^{O(p)} = m^{o(1)}$

  28. Communication Cost of Simulation • For each fixed $v$, after $p = o(\log m)$ passes: • Only $2^{O(p)}\, t$ elements depend on a segment and have a neighbor that does not depend on it • Players only need to communicate when segment dependencies change • only happens $2^{O(p)}\, t$ times at cost of $O(ps)$ bits per time
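
Combining this cost estimate with the $\Omega(n/t)$ communication bound for set-disjointness gives a back-of-the-envelope version of the space bound. This is a sketch only: constants, the error analysis, the additive $O(t \log n)$ set-up cost, and the exact choice of $m$ are glossed over.

```latex
\begin{align*}
\text{communication used by the simulation} &\le 2^{O(p)}\, t \cdot O(ps), \\
\text{communication needed for } \mathrm{DISJ}_{n/m,t} &\ge \Omega\!\left(\frac{n/m}{t}\right), \\
\Longrightarrow \qquad s &\ge \Omega\!\left(\frac{n}{m\, t^{2}\, p\, 2^{O(p)}}\right)
  = \Omega\!\left(\frac{n}{m^{1+o(1)}\, t^{2}}\right)
  \quad \text{for } p = o(\log m).
\end{align*}
```

Taking $m$ only slightly larger than $t^2$ (as required for $t = o(m^{1/2})$) and $t \approx n^{1/k}$ gives $s = \Omega(n^{1-4/k-\epsilon})$, which is the shape of the main result.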

  29. Limitations and Future Work

  30. Limitation of using permuted-DISJ • R/W streams algo for Permuted-$\mathrm{DISJ}_{n,m,t}$ ⇒ NIH CC protocol for $\mathrm{DISJ}_{n/m,t}$ • Gap from data stream due to loss in input size • Most of this loss is necessary • Need $n/m = \Omega(t^2)$ to use $\Omega(n/t)$ CC lower bound for $\mathrm{DISJ}_{n/m,t}$ • Efficient R/W algo for Permuted-$\mathrm{DISJ}_{n,m,t}$ unless $m \ge t^{3/2}$ • Implies that $n$ is $\Omega(mt^2)$, which is $\Omega(t^{3.5})$ ⇒ Since we need $t \approx n^{1/k}$, the lower bound $\Omega(n/t)$ is trivial for $k \le 3.5$

  31. A longest-common-subsequence problem on permutations • Algorithm for Permuted-$\mathrm{DISJ}_{n,m,t}$ follows from the following theorem: In any 3 permutations on $[m]$ there is a pair with longest common subsequence length $\ge m^{1/3}$. Proof: For each $i \in [m]$ define a triple $t_i$ of integers: for each of the 3 pairs of permutations, put the length of the longest common subsequence for that pair that ends with value $i$. Can show that all $m$ triples are different. So some triple must contain a coordinate $\ge m^{1/3}$ • Tight even for 4 permutations
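
The triples in the proof can be computed directly, which gives a quick check of the "all $m$ triples are different" step on concrete permutations. The sketch below uses a simple quadratic dynamic program; the function names are hypothetical and permutations are Python lists over the same values.

```python
from itertools import combinations


def lcs_ending_at(p, q):
    """For each value v, the length of the longest common subsequence
    of permutations p and q that ends with v."""
    pos_q = {value: i for i, value in enumerate(q)}
    best = {}
    for idx, v in enumerate(p):
        # Extend the best common subsequence ending at some u that
        # precedes v in both p and q.
        best[v] = 1 + max(
            (best[u] for u in p[:idx] if pos_q[u] < pos_q[v]),
            default=0,
        )
    return best


def lcs_triples(perms):
    """Map each value to its triple, one coordinate per pair of the
    three permutations in perms."""
    tables = [lcs_ending_at(a, b) for a, b in combinations(perms, 2)]
    return {v: tuple(table[v] for table in tables) for v in perms[0]}
```

Distinctness follows because any two values appear in the same relative order in at least two of the three permutations, so the later one has a strictly larger coordinate for that pair. With $m$ distinct triples of positive integers, the largest coordinate $c$ must satisfy $c^3 \ge m$.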

  32. R/W stream algorithm for Permuted-$\mathrm{DISJ}_{n,m,t}$ for large $t$ [Figure: two copies of the stream $\pi_1(x_1), \pi_2(x_2), \pi_3(x_3), \pi_4(x_4), \pi_5(x_5), \pi_6(x_6)$] • Compare $m^{1/3}$ blocks each time • In any three permutations on $[m]$ there is a pair with longest common subsequence length $\ge m^{1/3}$. • $t > m^{2/3}$, any $\pi$: testing Permuted-$\mathrm{DISJ}_{n,m,t}$ with 2 streams, 3 passes, $O(\log(nmt))$ space

  33. Open problems • Is $\Omega(n^{1-4/k-\epsilon})$ lower bound for R/W streams tight? • Gap from $O(n^{1-2/k})$ upper bound in data stream • Can’t use Permuted-$\mathrm{DISJ}_{n,m,t}$ to close it • Polynomial space to compute $F_k$ for $2 < k \le 4$? • Other problems on R/W streams? • $L(m,k)$ = maximum LCS length that can be guaranteed between some pair in any set of $k$ permutations on $[m]$. • We show $L(m,3) \approx L(m,4) \approx m^{1/3}$ • What is $L(m,k)$ for other values of $k$? • [B-Blais-Huynh 08] $L(m,k) = m^{1/3+o(1)}$ for $k \le m^{O(1)}$
