Tight Lower Bounds for the Distinct Elements Problem

David Woodruff, MIT (dpwood@mit.edu). Joint work with Piotr Indyk.


Presentation Transcript


  1. Tight Lower Bounds for the Distinct Elements Problem
     David Woodruff, MIT, dpwood@mit.edu
     Joint work with Piotr Indyk

  2. The Problem
     Example stream: … 4 3 7 3 1 1 0
     • Stream of elements $a_1, \dots, a_n$, each in $\{1, \dots, m\}$
     • Want $F_0$ = number of distinct elements
     • Elements arrive in adversarial order
     • Algorithms are given one pass over the stream
     • Goal: minimum-space algorithm

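  3. A Trivial Algorithm
     Example stream: … 4 3 7 3 1 1 0; vector before the stream: 00000000, after: 10011011
     • Keep the $m$-bit characteristic vector $v$ of the stream
     • $j$ in stream $\Leftrightarrow v_j = 1$
     • $F_0 = \mathrm{wt}(10011011) = 5$
     • Space = $m$ bits
     Can we do better?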
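
The trivial algorithm on this slide can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and elements are assumed to be 0-indexed (in `range(m)`) so that the slide's example stream, which contains a 0, fits in an 8-bit vector.

```python
def distinct_elements_exact(stream, m):
    """Exact F0 via an m-bit characteristic vector (elements assumed in range(m))."""
    v = [0] * m          # v[j] == 1 iff element j has appeared in the stream
    for a in stream:     # a single pass over the stream
        v[a] = 1
    return sum(v)        # F0 = Hamming weight of v


stream = [4, 3, 7, 3, 1, 1, 0]
print(distinct_elements_exact(stream, m=8))  # 5
```

The space used is exactly the m bits of `v`, independent of the stream length n.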

  4. Negative Results
     • Any algorithm computing $F_0$ exactly must use $\Omega(m)$ space [AMS96]
     • Any deterministic alg. that outputs $x$ with $|F_0 - x| < \varepsilon F_0$ must use $\Omega(m)$ space [AMS96]
     • What about randomized approximation algorithms?

  5. Rand. Approx. Algorithms for $F_0$
     • An $O((\log\log m)/\varepsilon^2 + \log m \cdot \log(1/\varepsilon))$-space alg. outputs $x$ with $\Pr[|F_0 - x| < \varepsilon F_0] > 3/4$ [BJKST02]
     • Lots of hashing tricks
     Is this optimal?
     • Previous lower bounds:
       • $\Omega(\log m)$ [AMS96]
       • $\Omega(1/\varepsilon)$ [Bar-Yossef]
     • Open problem of [BJKST02]: GAP: $1/\varepsilon \ll 1/\varepsilon^2$
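
To give a flavour of the "hashing tricks", here is a minimal k-minimum-values sketch. It is an illustrative stand-in, not the [BJKST02] algorithm and it does not achieve its space bound: it keeps the k smallest hash values and estimates $F_0$ from the k-th smallest. The helper names, the use of SHA-256 as the hash, and the default k = 256 are all assumptions made for this example.

```python
import hashlib
import heapq


def _hash01(item):
    """Map an item deterministically to a pseudo-random float in (0, 1]."""
    digest = hashlib.sha256(str(item).encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def kmv_estimate(stream, k=256):
    """Estimate F0 in one pass from the k smallest distinct hash values."""
    heap = []     # max-heap (values negated) holding the k smallest hashes
    kept = set()  # hashes currently in the heap, used to skip repeats
    for a in stream:
        h = _hash01(a)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)  # drop the largest kept hash
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)         # fewer than k distinct hashes: count is exact
    return (k - 1) / (-heap[0])  # standard k-minimum-values estimator


print(kmv_estimate([4, 3, 7, 3, 1, 1, 0]))
print(kmv_estimate(range(10000)))
```

The relative error of this estimator shrinks roughly like $1/\sqrt{k}$, so accuracy $\varepsilon$ needs about $1/\varepsilon^2$ stored hash values, which is the dependence the rest of the talk shows to be unavoidable.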

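  6. Idea Behind Lower Bounds
     Alice: $x \in \{0,1\}^m$, stream $s(x)$, runs the $(1 \pm \varepsilon)$ $F_0$ algorithm $A$
     $S$: internal state of $A$, passed from Alice to Bob
     Bob: $y \in \{0,1\}^m$, stream $s(y)$, continues the $(1 \pm \varepsilon)$ $F_0$ algorithm $A$
     • Compute $(1 \pm \varepsilon) F_0(s(x) \circ s(y))$ w.p. $> 3/4$
     • Idea: if this lets Bob decide $f(x,y)$ w.p. $> 3/4$, the space used by $A$ is at least $f$'s randomized 1-way comm. complexity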
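
The reduction pattern on this slide can be sketched as follows. As a simplifying assumption, the exact bit-vector algorithm stands in for the $(1 \pm \varepsilon)$ approximation algorithm $A$; the class and function names are hypothetical. The point is only that the single message Alice sends is $A$'s internal state, so the message length equals the space $A$ uses.

```python
class BitVectorF0:
    """Stand-in for algorithm A: exact F0 with an m-bit state."""

    def __init__(self, m, state=None):
        self.m = m
        self.bits = 0 if state is None else int.from_bytes(state, "big")

    def process(self, stream):
        for a in stream:
            self.bits |= 1 << a

    def state(self):
        # Serialize the internal state S into ceil(m / 8) bytes.
        return self.bits.to_bytes((self.m + 7) // 8, "big")

    def estimate(self):
        return bin(self.bits).count("1")


def stream_of(bits):
    """A stream s(x) whose characteristic vector is the bit vector x."""
    return [j for j, b in enumerate(bits) if b]


def alice(x):
    A = BitVectorF0(len(x))
    A.process(stream_of(x))
    return A.state()                   # the only message sent


def bob(message, y):
    A = BitVectorF0(len(y), message)   # resume A from Alice's state
    A.process(stream_of(y))
    return A.estimate()                # F0 of s(x) followed by s(y)


x = [1, 0, 1, 1, 0, 0]
y = [0, 0, 1, 0, 0, 1]
msg = alice(x)
print(bob(msg, y), len(msg))  # 4 1
```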

  7. Randomized 1-way comm. complexity
     • Boolean function $f: X \times Y \to \{0,1\}$
     • Alice has $x \in X$, Bob has $y \in Y$. Bob wants $f(x,y)$
     • Only 1 message is sent: it must be from Alice to Bob
     • Comm. cost of a protocol = expected length of the longest message sent, over all inputs
     • The $\delta$-error randomized 1-way comm. complexity of $f$, $R_\delta(f)$, is the comm. cost of the optimal protocol computing $f$ w.p. $\ge 1 - \delta$
     • How do we lower bound $R_\delta(f)$?

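  8. The VC Dimension [KNR]
     • $\mathcal{F} = \{f : X \to \{0,1\}\}$ is a family of Boolean functions
     • Each $f \in \mathcal{F}$ is a length-$|X|$ “bit string”
     • For $S \subseteq X$, the shatter coefficient $SC(f_S)$ of $S$ is $|\{f|_S\}_{f \in \mathcal{F}}|$ = # distinct bit strings when $\mathcal{F}$ is restricted to $S$
     • $SC(\mathcal{F}, p) = \max_{S \subseteq X,\, |S| = p} SC(f_S)$
     • If $SC(f_S) = 2^{|S|}$, $S$ is shattered by $\mathcal{F}$
     • VC dimension of $\mathcal{F}$, $VCD(\mathcal{F})$ = size of the largest $S$ shattered by $\mathcal{F}$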
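
These definitions can be checked by brute force on a tiny family. In this sketch each function is represented as a tuple of bits indexed by the elements of $X$; the function names and the threshold-function example are illustrative assumptions, not part of the talk.

```python
from itertools import combinations


def shatter_coefficient(family, S):
    """Number of distinct bit strings when the family is restricted to S."""
    return len({tuple(f[i] for i in S) for f in family})


def sc(family, p):
    """SC(F, p): the largest shatter coefficient over all subsets of size p."""
    n = len(family[0])
    return max(shatter_coefficient(family, S)
               for S in combinations(range(n), p))


def vc_dimension(family):
    """Size of the largest subset on which all 2^|S| patterns appear."""
    n = len(family[0])
    d = 0
    for p in range(1, n + 1):
        if sc(family, p) == 2 ** p:
            d = p
    return d


# Threshold functions on X = {0, 1, 2, 3}: f_t(i) = 1 iff i >= t.
thresholds = [tuple(1 if i >= t else 0 for i in range(4)) for t in range(5)]
print(sc(thresholds, 2), vc_dimension(thresholds))  # 3 1
```

On any two points the thresholds realise only the patterns 11, 01 and 00, so no pair is shattered and the VC dimension is 1.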

  9. Shatter Coefficient Theorem
     • Notation: for $f: X \times Y \to \{0,1\}$, define $f_X = \{ f_x : Y \to \{0,1\} \mid x \in X \}$, where $f_x(y) = f(x,y)$
     • Theorem [BJKS]: for every $f: X \times Y \to \{0,1\}$ and every $p \ge VCD(f_X)$, $R_{1/4}(f) = \Omega(\log SC(f_X, p))$

  10. The (1/) Lower Bound [Bar-Yossef] • Alice has x 2R {0,1}m, wt(x) = m/2 • Bob has y 2R {0,1}m, wt(y) = m and: • Either wt(x Æ y) = 0 OR wt(x Æ y) = m f(x,y) = 0 f(x,y) = 1 • R1/4(f) = (VCD(fX)) = (1/) [Bar-Yossef] • s(x), s(y) any streams w/char. vectors x, y • f(x,y) = 1 ! F0(s(x) ± s(y)) = m/2 • f(x,y) = 0 ! F0(s(x) ± s(y)) = m/2 + m • (1+’)m/2 < (1 - ’)(m/2 + m) for ’ = () • Hence, can decide f ! F0 alg. uses (1/) space

  11. Our Results
      • Remainder of talk: $\Omega(1/\varepsilon^2)$ lower bound for $\varepsilon = \Omega(m^{-1/(9+k)})$, for any $k > 0$
      • $\Rightarrow$ the $O((\log\log m)/\varepsilon^2 + \log m \cdot \log(1/\varepsilon))$ upper bound is almost optimal
      • IDEA: reduce from a protocol for computing the dot product

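  12. The Promise Problem
      • $t = \Theta(1/\varepsilon^2)$, $Y$ = basis of unit vectors of $\mathbb{R}^t$
      • Alice has $x \in [0,1]^t$ with $\|x\| = 1$; Bob has $y \in Y$
      • Promise problem $\Pi$: either $\langle x,y \rangle = 0$ (then $f(x,y) = 0$) or $\langle x,y \rangle = 2/t^{1/2}$ (then $f(x,y) = 1$)
      • $X = \{x \in [0,1]^t : \|x\| = 1 \text{ and } \exists\, y \in Y \text{ s.t. } (x,y) \in \Pi\}$
      • We lower bound $R_{1/4}(f)$ via $SC(f_X, t)$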

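  13. Bounding $SC(f_X, t)$
      • Theorem: $SC(f_X, t/4) = 2^{\Omega(t)}$
      • Proof:
        • $\forall\, T \subset Y$ s.t. $|T| = t/4$, put $x_T = (2/t^{1/2}) \cdot \sum_{e \in T} e$
        • Define $X_1 \subset X$ as $X_1 = \{x_T \mid T \subset Y, |T| = t/4\}$
        • Claim: $\forall\, s \in \{0,1\}^t$ with $\mathrm{wt}(s) = t/4$, $s$ is in the truth table of $f_{X_1}$
        • Proof of claim:
          • Let $s \in \{0,1\}^t$ have 1s in positions $i_1, \dots, i_{t/4}$
          • Put $T = \{e_{i_1}, \dots, e_{i_{t/4}}\}$. $\forall\, e \in T$, $\langle e, x_T \rangle = 2/t^{1/2} = \Theta(\varepsilon)$
          • $\forall\, e \in Y - T$, $\langle e, x_T \rangle = 0$
        • There are $2^{\Omega(t)}$ such $s$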
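
The construction in this proof can be verified numerically for a small $t$. The sketch below uses the hypothetical value $t = 8$ and identifies the basis vector $e_i$ with its index $i$, so that $\langle e_i, x \rangle$ is simply the $i$-th coordinate of $x$; the function names are illustrative.

```python
from itertools import combinations
from math import isclose, sqrt


def x_T(T, t):
    """x_T = (2 / sqrt(t)) * (sum of the unit vectors e_i with i in T)."""
    return [2 / sqrt(t) if i in T else 0.0 for i in range(t)]


def truth_row(x, t):
    """Row of the truth table for x: entry i is 1 iff <e_i, x> = 2 / sqrt(t)."""
    return tuple(1 if isclose(x[i], 2 / sqrt(t)) else 0 for i in range(t))


t = 8
subsets = [set(T) for T in combinations(range(t), t // 4)]
rows = {truth_row(x_T(T, t), t) for T in subsets}

# Every x_T has unit norm, and each weight-t/4 string s appears as a row.
print(all(isclose(sum(c * c for c in x_T(T, t)), 1.0) for T in subsets))  # True
print(len(rows))  # 28, i.e. C(8, 2)
```

The number of rows is $\binom{t}{t/4}$, which grows as $2^{\Omega(t)}$.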

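  14. Bounding $R_{1/4}(f)$
      • Corollary: $R_{1/4}(f) = \Omega(\log SC(f_X, t/4)) = \Omega(t) = \Omega(1/\varepsilon^2)$
      • Reduction: we need a protocol computing $f$ with communication = space used by any $(1 \pm \varepsilon)$ $F_0$ approx. alg.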

  15. Reduction
      • Recall:
        • $\langle x,y \rangle = 0$ if $f(x,y) = 0$
        • $\langle x,y \rangle = 2/t^{1/2}$ if $f(x,y) = 1$
      • Goal: reduce the “separation” of $\langle x,y \rangle$ to a separation of $F_0(s(x) \circ s(y))$ for streams $s(x), s(y)$ that Alice/Bob can derive from $x, y$
      • Use the relation: $\|y-x\|^2 = \|y\|^2 + \|x\|^2 - 2\langle x, y \rangle$
      • $f(x,y) = 0 \Rightarrow \|y-x\| = 2^{1/2}$
      • $f(x,y) = 1 \Rightarrow \|y-x\| < 2^{1/2}(1 - 1/t^{1/2}) = 2^{1/2}(1 - \Theta(\varepsilon))$
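
The two distance cases can be checked on a concrete instance. This sketch assumes $t = 16$ and takes $x$ to be a vector of the form $(2/t^{1/2}) \sum_{e \in T} e$ with $T$ the first $t/4$ basis vectors; the variable names are illustrative.

```python
from math import isclose, sqrt


def dist(x, y):
    """Euclidean distance between two vectors given as lists."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


t = 16
x = [2 / sqrt(t)] * (t // 4) + [0.0] * (t - t // 4)  # unit vector, support T
y_in = [1.0] + [0.0] * (t - 1)    # y = e_1 in T:      <x, y> = 2/sqrt(t), f = 1
y_out = [0.0] * (t - 1) + [1.0]   # y = e_t not in T:  <x, y> = 0,         f = 0

d0 = dist(x, y_out)
d1 = dist(x, y_in)
print(isclose(d0, sqrt(2)))             # True: ||y - x|| = 2^(1/2) when f = 0
print(d1 < sqrt(2) * (1 - 1 / sqrt(t))) # True: strictly smaller when f = 1
```

Here $\|y - x\|^2 = 2 - 4/t^{1/2} = 1$ in the $f = 1$ case, against the bound $2^{1/2}(1 - 1/4) \approx 1.06$.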

  16. Overview of Reduction
      Input: Alice holds $x \in [0,1]^t$ with $\|x\| = 1$; Bob holds $y \in E$
      1. Low-distortion embedding $\phi: \ell_2^t \to \ell_1^{\mathrm{poly}(t)}$, giving $\phi(x)$ and $\phi(y)$
      2. Rational approximation
      3. Scale rationals to integers (map $s$)
      4. Convert integer coords to unary to get $\{0,1\}$ vectors $x'$, $y'$
      Alice runs the $F_0$ alg. on $s(x')$ and passes its state to Bob, who continues it on $s(y')$
      $F_0(s(x') \circ s(y'))$ can decide $f(x,y)$ w.p. $\ge 3/4$

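  17. Embedding $\ell_2^t$ into $\ell_1^{\mathrm{poly}(t)}$
      • A $(1+\gamma)$-distortion embedding $\phi: \ell_2^t \to \ell_1^d$ is a mapping s.t. $\forall\, p, q \in \ell_2^t$: $\|p - q\|_2 \le \|\phi(p) - \phi(q)\|_1 \le (1+\gamma)\|p - q\|_2$
      • Theorem [FLM77]: $\forall\, \gamma > 0$ $\exists$ a $(1+\gamma)$-distortion embedding $\phi: \ell_2^t \to \ell_1^d$ with $d = O(t \cdot \log(1/\gamma)/\gamma^2)$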

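  18. Embedding $\ell_2^t$ into $\ell_1^d$
      Alice holds $x \in [0,1]^t$ with $\|x\| = 1$; Bob holds $y \in E$; both apply the low-distortion embedding $\phi: \ell_2^t \to \ell_1^d$ to get $\phi(x)$, $\phi(y)$
      • Using Theorem [FLM77], Alice/Bob get $\phi(x), \phi(y) \in \mathbb{R}^d$ with $d = O(t \cdot \log(1/\gamma)/\gamma^2)$, whose $\ell_1$ distance approximates $\|x - y\|_2$ within a factor $1+\gamma$
      • $\gamma$ specified later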

  19. Rational Approximation
      • $z = z(t): \mathbb{N} \to \mathbb{N}$; assume $z \ge d$
      • Approximate each coord. of the embedding's output by an integer multiple of $1/z$, giving $\tilde\phi(x)$ and $\tilde\phi(y)$

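  20. Scaling
      • Alice (resp. Bob) multiplies each coord. of the rational approximation $\tilde\phi(x)$ (resp. $\tilde\phi(y)$) by $z$
      • Obtains $s(\tilde\phi(x))$ (resp. $s(\tilde\phi(y))$)
      • Claim: coords. are integers in the range $[-2z, 2z]$
      • Proof:
        • $|\tilde\phi(\cdot)| \le |\phi(\cdot)| + d/z \le 2$
        • $|s(\tilde\phi(\cdot))| = z\,|\tilde\phi(\cdot)|$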

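  21. Converting to Unary
      • For $i = 1$ to $d$:
        • $j \leftarrow s(\tilde\phi(x))_i$ (Alice's $i$-th scaled integer coordinate)
        • Replace $s(\tilde\phi(x))_i$ with $1^{2z+j}\,0^{2z-j}$
      • Bob does the same for $s(\tilde\phi(y))$
      • $x'$, $y'$ denote the new length-$4dz$ bitstrings
      • $\mathrm{wt}(x') = |s(\tilde\phi(x))|$, $\mathrm{wt}(y') = |s(\tilde\phi(y))|$
      • $\Delta(x',y') = |s(\tilde\phi(x)) - s(\tilde\phi(y))|$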
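
The rounding, scaling and unary steps can be sketched together. The input vectors `u` and `v` below are hypothetical stand-ins for embedded vectors (no actual $\ell_2 \to \ell_1$ embedding is computed here), and $z = 8$ is an arbitrary small value; the sketch checks only that the Hamming distance of the unary strings equals the $\ell_1$ distance of the integer vectors.

```python
def round_and_scale(v, z):
    """Round each coordinate to the nearest multiple of 1/z, then multiply by z."""
    return [round(c * z) for c in v]   # integer j stands for the rational j / z


def to_unary(ints, z):
    """Encode each integer j in [-2z, 2z] as 2z + j ones followed by 2z - j zeros."""
    bits = []
    for j in ints:
        if not -2 * z <= j <= 2 * z:
            raise ValueError("coordinate out of range [-2z, 2z]")
        bits += [1] * (2 * z + j) + [0] * (2 * z - j)
    return bits


def hamming(a, b):
    """Number of positions where the two bit lists differ."""
    return sum(p != q for p, q in zip(a, b))


z = 8
u = [0.30, -0.45, 0.10]    # hypothetical embedded vector for Alice
v = [0.05, 0.20, -0.60]    # hypothetical embedded vector for Bob

su, sv = round_and_scale(u, z), round_and_scale(v, z)
x_prime, y_prime = to_unary(su, z), to_unary(sv, z)

print(su, sv)                     # [2, -4, 1] [0, 2, -5]
print(len(x_prime))               # 96 = 4 * d * z with d = 3
print(hamming(x_prime, y_prime))  # 14 = |2-0| + |-4-2| + |1-(-5)|
```

Each coordinate contributes a block of $4z$ bits, and two blocks encoding $j$ and $j'$ differ in exactly $|j - j'|$ positions, which is why Hamming distance tracks the $\ell_1$ distance.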

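  22. Reducing $\Delta(x',y')$ to $F_0$
      • Alice (Bob) chooses a stream $a_{x'}$ ($a_{y'}$) with char. vector $x'$ ($y'$)
      • Lemma: if $\lambda_1 < \mathrm{wt}(x'), \mathrm{wt}(y') < \lambda_2$, then $\lambda_1 + \Delta(x',y')/2 < F_0(a_{x'} \circ a_{y'}) < \lambda_2 + \Delta(x',y')/2$
      • Follows from the fact: $F_0(a_{x'} \circ a_{y'}) = \mathrm{wt}(x' \vee y') = (\mathrm{wt}(x') + \mathrm{wt}(y') + \Delta(x',y'))/2$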
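
The fact behind the lemma is easy to confirm on an example. The bit vectors and the bounds 3 and 5 on the weights below are hypothetical values chosen for illustration.

```python
def stream_of(bits):
    """A stream whose characteristic vector is the given bit list."""
    return [i for i, b in enumerate(bits) if b]


def f0(stream):
    """Number of distinct elements in the stream."""
    return len(set(stream))


def wt(bits):
    """Hamming weight of a bit list."""
    return sum(bits)


def hamming(a, b):
    """Hamming distance between two bit lists of equal length."""
    return sum(p != q for p, q in zip(a, b))


x_prime = [1, 1, 0, 1, 0, 0, 1, 0]
y_prime = [0, 1, 1, 1, 0, 1, 0, 0]

distinct = f0(stream_of(x_prime) + stream_of(y_prime))
delta = hamming(x_prime, y_prime)

print(distinct)                                        # 6
print((wt(x_prime) + wt(y_prime) + delta) // 2)        # 6
print(3 + delta / 2 < distinct < 5 + delta / 2)        # True (both weights are 4)
```

Positions set in both vectors are counted twice by the weights and not at all by $\Delta$, while positions set in exactly one vector are counted once by each, so the sum is twice the size of the union.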

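  23. Reducing $\Delta(x',y')$ to $F_0$
      • Use the lemma to show that $F_0(a_{x'} \circ a_{y'})$ is separated between the cases $f(x,y) = 0$ and $f(x,y) = 1$
      • Set $\gamma = \Theta(\varepsilon)$, $z = \Theta(1/\varepsilon^5 \log 1/\varepsilon)$ so that the two cases are distinguished by a $(1 \pm \Theta(\varepsilon))$ $F_0$ alg.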

  24. Conclusions
      • $a_{x'}$, $a_{y'}$ must be over a universe of size $\ge 4zd = \Theta(\log(1/\varepsilon)/\varepsilon^9)$
      • Reduction only valid if $4zd \le m$
      • $\Omega(1/\varepsilon^2)$ bound for $\varepsilon = \Omega(m^{-1/(9+k)})$, $\forall\, k > 0$
      • Recently the lower bound was improved to:
        • $\Omega(1/\varepsilon^2)$ for $\varepsilon \ge m^{-1/2}$, which is optimal
        • Find the set of vectors directly in Hamming space via an involved prob. method argument
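
One way to see where the exponent $9 + k$ comes from, taking the universe-size requirement $4zd \le m$ on this slide as given. The constants $c$ and $C_k$ are introduced here for illustration only: $c$ is the constant hidden in the bound on $4zd$, and $C_k$ is any constant with $\log(1/\varepsilon) \le C_k\, \varepsilon^{-k}$ for all $\varepsilon \in (0,1)$, which exists for every fixed $k > 0$.

```latex
\[
4zd \;\le\; c\,\frac{\log(1/\varepsilon)}{\varepsilon^{9}}
    \;\le\; \frac{c\,C_k}{\varepsilon^{9+k}} ,
\qquad\text{and}\qquad
\frac{c\,C_k}{\varepsilon^{9+k}} \le m
\;\iff\;
\varepsilon \ge \left(\frac{c\,C_k}{m}\right)^{1/(9+k)} .
\]
```

So the condition $4zd \le m$ holds whenever $\varepsilon = \Omega(m^{-1/(9+k)})$, matching the range stated for the $\Omega(1/\varepsilon^2)$ bound.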
