Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error

T.S. Jayram and David Woodruff, IBM Almaden

Presentation Transcript


  1. Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error
     T.S. Jayram, David Woodruff
     IBM Almaden

  2. Data Stream Model
     • Have a stream of $m$ updates to an $n$-dimensional vector $v$, each of the form “add $x$ to coordinate $i$”
        • Insertion model: all updates $x$ are positive
        • Turnstile model: $x$ can be positive or negative
        • The stream length and the update magnitudes are at most $\mathrm{poly}(n)$
     • Estimate statistics of $v$
        • Number of distinct elements $F_0$
        • $L_p$-norm $|v|_p = \left(\sum_i |v_i|^p\right)^{1/p}$
        • Entropy, and so on
     • Goal: output a $(1+\varepsilon)$-approximation with limited memory
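
As a point of reference for the model above, here is a minimal Python sketch of the naive exact baseline: it stores $v$ explicitly (linear memory, which streaming algorithms are designed to avoid) and evaluates the listed statistics. The function names are illustrative, and defining entropy over the distribution $|v_i|/|v|_1$ is an assumption, since the slide does not fix a definition.

```python
import math
from collections import defaultdict


def apply_stream(updates):
    """Apply turnstile updates (i, x), meaning "add x to coordinate i", to a sparse vector."""
    v = defaultdict(int)
    for i, x in updates:
        v[i] += x
        if v[i] == 0:
            del v[i]  # store only the non-zero coordinates
    return dict(v)


def f0(v):
    """Number of distinct elements = number of non-zero coordinates."""
    return len(v)


def lp_norm(v, p):
    """|v|_p = (sum_i |v_i|^p)^(1/p)."""
    return sum(abs(c) ** p for c in v.values()) ** (1.0 / p)


def entropy(v):
    """Shannon entropy (bits) of |v_i| / |v|_1; this normalization is an assumption."""
    total = sum(abs(c) for c in v.values())
    if total == 0:
        return 0.0
    return -sum((abs(c) / total) * math.log2(abs(c) / total) for c in v.values())


# Turnstile stream: the update to coordinate 1 is later cancelled.
v = apply_stream([(1, 3), (2, 5), (1, -3), (7, 2)])
print(f0(v), lp_norm(v, 1), round(lp_norm(v, 2), 4))  # prints: 2 7.0 5.3852
```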

  3. Lots of “Optimal” Papers
     • Lots of “optimal” results
        • “An optimal algorithm for the distinct elements problem” [KNW]
        • “Fast moment estimation in optimal space” [KNPW]
        • “A near-optimal algorithm for estimating entropy of a stream” [CCM]
        • “Optimal approximations of the frequency moments of data streams” [IW]
        • “A near-optimal algorithm for L1-difference” [NW]
        • “Optimal space lower bounds for all frequency moments” [W]
     • This paper
        • Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error

  4. What Is Optimal?
     • $F_0$ = number of non-zero entries in $v$
     • “For a stream of indices in $\{1, \ldots, n\}$, our algorithm computes a $(1+\varepsilon)$-approximation using an optimal $O(\varepsilon^{-2} + \log n)$ bits of space with 2/3 success probability… This probability can be amplified by independent repetition.”
     • If we want high probability, say $1 - 1/n$, this increases the space by a multiplicative $\log n$ factor
     • So “optimal” algorithms are only optimal among algorithms with constant success probability
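
The amplification mentioned on this slide is usually done with the median trick: run independent copies and output the median, which can be wrong only if at least half of the copies are wrong. The sketch below (illustrative function names) computes that binomial tail exactly and searches for the number of copies needed; since the space is multiplied by the number of copies, taking $\delta = 1/n$ costs a factor that grows like $\log n$.

```python
import math


def median_failure_bound(k, p_fail=1/3):
    """Upper bound on Pr[median of k independent estimates is bad], for odd k.

    Each estimate is bad independently with probability p_fail. If more than
    half of the estimates lie in the good interval, so does the median, so a
    bad median needs at least ceil(k/2) bad estimates.
    """
    need = math.ceil(k / 2)
    return sum(math.comb(k, i) * p_fail**i * (1 - p_fail) ** (k - i)
               for i in range(need, k + 1))


def repetitions_needed(delta, p_fail=1/3):
    """Smallest odd k whose median failure bound is at most delta."""
    k = 1
    while median_failure_bound(k, p_fail) > delta:
        k += 2  # odd k keeps the median uniquely defined
    return k


for n in (10**3, 10**6, 10**9):
    print(n, repetitions_needed(1 / n))  # copies (and space factor) for delta = 1/n
```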

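  5. Can We Improve the Lower Bounds?
     • Alice holds $x \in \{0,1\}^{\varepsilon^{-2}}$, Bob holds $y \in \{0,1\}^{\varepsilon^{-2}}$
     • Gap-Hamming: either $\Delta(x,y) > 1/2 + \varepsilon$ or $\Delta(x,y) < 1/2 - \varepsilon$, where $\Delta$ is the normalized Hamming distance
     • Lower bound of $\Omega(\varepsilon^{-2})$ with 1/3 error probability
     • But an upper bound of $\varepsilon^{-2}$ bits with 0 error probability (Alice sends all of $x$)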

  6. Our Results

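  7. Streaming Results
     • Independent repetition is optimal!
     • Estimating the $L_p$-norm in the turnstile model up to $1+\varepsilon$ w.p. $1-\delta$
        • $\Omega(\varepsilon^{-2} \log n \log(1/\delta))$ bits for any $p$
        • [KNW] get $O(\varepsilon^{-2} \log n \log(1/\delta))$ for $0 \le p \le 2$
     • Estimating $F_0$ in the insertion model up to $1+\varepsilon$ w.p. $1-\delta$
        • $\Omega(\varepsilon^{-2} \log(1/\delta) + \log n)$ bits
        • [KNW] get $O(\varepsilon^{-2} \log(1/\delta))$ for $\varepsilon^{-2} > \log n$
     • Estimating entropy in the turnstile model up to $1+\varepsilon$ w.p. $1-\delta$
        • $\Omega(\varepsilon^{-2} \log n \log(1/\delta))$ bits
        • Improves the $\Omega(\varepsilon^{-2} \log n)$ bound [KNW]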

  8. Johnson-Lindenstrauss Transforms
     • Let $A$ be a random matrix such that, for any fixed $q \in \mathbb{R}^d$, with probability $1-\delta$: $|Aq|_2 = (1 \pm \varepsilon)|q|_2$
     • [JL] $A$ can be an $O(\varepsilon^{-2} \log(1/\delta)) \times d$ matrix
        • Gaussian or random sign entries work
     • [Alon] $A$ needs $\Omega(\varepsilon^{-2} \log(1/\delta) / \log(1/\varepsilon))$ rows
     • Our result: $A$ needs $\Omega(\varepsilon^{-2} \log(1/\delta))$ rows
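
A minimal sketch of the [JL] upper bound with random sign entries, one of the two options the slide mentions. The constant `c` in the row count is a hypothetical choice for illustration, not a value from the talk; the check draws a fresh matrix per trial because the guarantee holds for each fixed $q$ with probability $1-\delta$ over $A$.

```python
import math
import random


def jl_rows(eps, delta, c=4.0):
    """k = ceil(c * eps^-2 * ln(1/delta)); c is a hypothetical constant."""
    return math.ceil(c * math.log(1 / delta) / eps**2)


def sign_jl_matrix(k, d, rng):
    """k x d matrix with i.i.d. entries +-1/sqrt(k), so E|Aq|_2^2 = |q|_2^2."""
    s = 1 / math.sqrt(k)
    return [[s if rng.random() < 0.5 else -s for _ in range(d)] for _ in range(k)]


def matvec(A, q):
    return [sum(a * b for a, b in zip(row, q)) for row in A]


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def failure_rate(eps, delta, d, trials, seed=0):
    """Fraction of trials in which |Aq|_2 falls outside (1 +- eps)|q|_2."""
    rng = random.Random(seed)
    k = jl_rows(eps, delta)
    bad = 0
    for _ in range(trials):
        q = [rng.gauss(0, 1) for _ in range(d)]
        A = sign_jl_matrix(k, d, rng)  # fresh A for each fixed q
        ratio = norm(matvec(A, q)) / norm(q)
        bad += not (1 - eps <= ratio <= 1 + eps)
    return bad / trials


print(jl_rows(0.25, 0.05), failure_rate(0.25, 0.05, d=50, trials=50))
```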

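  9. Communication Complexity Separation
     • Alice holds $x$, Bob holds $y$, and Bob outputs $f(x,y) \in \{0,1\}$ (figure: the 0/1 communication matrix, rows indexed by $x$, columns by $y$)
     • $D_{\rho,1/3}(f)$ = communication of the best 1-way deterministic protocol that errs w.p. at most 1/3 on distribution $\rho$
     • [KNR]: $R^{\|}_{1/3}(f) = \max_{\text{product distributions } \mu \times \lambda} D_{\mu \times \lambda, 1/3}(f)$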

  10. Communication Complexity Separation
     • VC-dimension: the maximum number $r$ of columns such that all $2^r$ row patterns occur when the communication matrix is restricted to these columns
     • [KNR]: $R^{\|}_{1/3}(f) = \Theta(\text{VC-dimension}(f))$
     • Our result: there exist $f$ and $g$, both of VC-dimension $k$, with $R^{\|}_\delta(f) = \Theta(k \log(1/\delta))$ while $R^{\|}_\delta(g) = \Theta(k)$
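
The VC-dimension definition above can be checked by brute force on small communication matrices. In this illustrative sketch, rows are Alice's inputs $x$ and columns are Bob's inputs $y$; the two toy functions (indexing on 3 bits and greater-than) only show how the definition behaves and are not the $f$ and $g$ from the result.

```python
from itertools import combinations, product


def vc_dimension(M):
    """Largest r such that some r columns of the 0/1 matrix M are shattered,
    i.e. all 2^r row patterns appear when M is restricted to those columns."""
    n_cols = len(M[0])
    best = 0
    for r in range(1, n_cols + 1):
        shattered = any(
            len({tuple(row[c] for c in cols) for row in M}) == 2**r
            for cols in combinations(range(n_cols), r)
        )
        if not shattered:
            break  # subsets of a shattered set are shattered, so no larger r works
        best = r
    return best


# Indexing on 3 bits: Alice has x in {0,1}^3, Bob has j, f(x, j) = x_j.
index_matrix = [list(x) for x in product((0, 1), repeat=3)]
# Greater-than on {0,...,4}: f(x, y) = 1 iff x > y.
gt_matrix = [[int(x > y) for y in range(5)] for x in range(5)]
print(vc_dimension(index_matrix), vc_dimension(gt_matrix))  # prints: 3 1
```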

  11. Our Techniques

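  12. Lopsided Set Intersection (LSI)
     • Universe size $U = \varepsilon^{-2} \cdot \delta^{-1}$
     • Alice holds $S \subset \{1, 2, \ldots, U\}$ with $|S| = \varepsilon^{-2}$; Bob holds $T \subset \{1, 2, \ldots, U\}$ with $|T| = 1/\delta$
     • Is $S \cap T = \emptyset$?
     • Alice cannot describe $S$ with $o(\varepsilon^{-2} \log U)$ bits
     • If $S$ and $T$ are uniformly random, then $S \cap T = \emptyset$ with constant probability
     • $R^{\|}_{1/3}(\mathrm{LSI}) \ge D_{\text{uniform},1/3}(\mathrm{LSI}) = \Omega(\varepsilon^{-2} \log(1/\delta))$, since the uniform distribution on pairs $(S, T)$ is a product distribution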
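
The two quantitative facts on this slide can be checked numerically. The sketch below (illustrative helper names) computes the exact probability that uniformly random $S$ and $T$ are disjoint, which stays near $1/e$ because $\mathbb{E}|S \cap T| = |S||T|/U = 1$, and the number of bits $\log_2\binom{U}{|S|}$ needed to name $S$ exactly, which is at least $\varepsilon^{-2}\log_2(1/\delta)$ since $\binom{U}{s} \ge (U/s)^s$.

```python
import math


def lsi_params(eps, delta):
    """|S| = eps^-2, |T| = 1/delta, U = |S| * |T| (rounded to integers)."""
    s, t = round(1 / eps**2), round(1 / delta)
    return s, t, s * t


def prob_disjoint(U, s, t):
    """Exact Pr[S ∩ T = ∅] for independent uniform S, T of sizes s, t in [U]."""
    return math.comb(U - s, t) / math.comb(U, t)


def description_bits(U, s):
    """log2 of the number of possible sets S, i.e. bits to specify S exactly."""
    return math.log2(math.comb(U, s))


s, t, U = lsi_params(eps=0.1, delta=0.01)
print(s, t, U, round(prob_disjoint(U, s, t), 3), round(description_bits(U, s)))
```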

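  13. Lopsided Set Intersection (LSI2)
     • Universe size $U = \varepsilon^{-2} \cdot \delta^{-1}$
     • Alice holds $S \subset \{1, 2, \ldots, U\}$ with $|S| = \varepsilon^{-2}$; Bob holds $T \subset \{1, 2, \ldots, U\}$ with $|T| = 1$
     • Is $S \cap T = \emptyset$?
     • $R^{\|}_{\delta/3}(\mathrm{LSI2}) \ge R^{\|}_{1/3}(\mathrm{LSI}) = \Omega(\varepsilon^{-2} \log(1/\delta))$
     • Union bound over the set elements in the LSI instance: Bob reuses Alice's single message to answer LSI2 for each of the $1/\delta$ elements of his LSI set, so the total error is at most $(1/\delta) \cdot (\delta/3) = 1/3$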
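
The union-bound step can be sketched as follows. Because the protocol is one-way, Alice's message does not depend on Bob's input, so Bob can run the LSI2 decision once for every element of his LSI set using the same message. The toy LSI2 protocol here (Alice sends $S$ itself) is a hypothetical zero-error stand-in used only to exercise the reduction; `lsi2_bob` and `exact_bob` are illustrative names.

```python
def lsi_via_lsi2(message, T, lsi2_bob):
    """Answer LSI ("is S ∩ T empty?") from a single LSI2 message.

    lsi2_bob(message, j) is Bob's LSI2 decision: True iff he believes j ∈ S.
    If each call errs w.p. at most delta/3, a union bound over the |T| = 1/delta
    calls bounds the total error by 1/3.
    """
    return not any(lsi2_bob(message, j) for j in T)


def union_bound(num_queries, per_query_error):
    return min(1.0, num_queries * per_query_error)


def exact_bob(message, j):
    """Hypothetical zero-error LSI2 decision for the toy protocol below."""
    return j in message


message = frozenset({1, 5, 9})  # toy protocol: Alice's message is S itself
print(lsi_via_lsi2(message, {2, 3}, exact_bob), lsi_via_lsi2(message, {2, 9}, exact_bob))
print(union_bound(num_queries=100, per_query_error=0.01 / 3))
```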

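  14. Low-Error Inner Product
     • Universe size $U = \varepsilon^{-2} \cdot \delta^{-1}$
     • Alice holds $x \in \{0, \varepsilon\}^U$ with $|x|_2 = 1$; Bob holds $y \in \{0, 1\}^U$ with $|y|_2 = 1$
     • Does $\langle x, y \rangle = 0$?
     • Estimating $\langle x, y \rangle$ up to additive error $c\varepsilon$ (any constant $c < 1/2$ works) w.p. $1-\delta$ solves LSI2 w.p. $1-\delta$: take $x$ to be $\varepsilon$ times the indicator vector of $S$ and $y$ the indicator vector of Bob's single element $j$, so $\langle x, y \rangle = \varepsilon$ if $j \in S$ and $0$ otherwise
     • $R^{\|}_\delta(\text{inner-product}_\varepsilon) = \Omega(\varepsilon^{-2} \log(1/\delta))$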
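
The encoding behind this reduction can be written out directly: $x$ is $\varepsilon$ times the indicator of Alice's set $S$ (so $|x|_2 = 1$ exactly when $|S| = \varepsilon^{-2}$) and $y$ is the indicator of Bob's element $j$. In this sketch the threshold $\varepsilon/2$ is one valid choice of decision rule, and the function names are illustrative.

```python
import math


def encode_lsi2(S, j, U, eps):
    """x = eps * indicator(S), y = indicator({j}); both have unit L2 norm when |S| = eps^-2."""
    x = [eps if k in S else 0.0 for k in range(U)]
    y = [1.0 if k == j else 0.0 for k in range(U)]
    return x, y


def inner(x, y):
    return sum(a * b for a, b in zip(x, y))


def decide(ip_estimate, eps):
    """<x, y> is eps if j ∈ S and 0 otherwise, so additive error < eps/2 suffices."""
    return ip_estimate > eps / 2


eps, delta = 0.5, 0.25
U = round(eps**-2 / delta)   # U = eps^-2 * delta^-1 = 16
S = {0, 3, 7, 11}            # |S| = eps^-2 = 4
x, y = encode_lsi2(S, 3, U, eps)
print(math.sqrt(inner(x, x)), inner(x, y), decide(inner(x, y) - 0.2, eps))  # prints: 1.0 0.5 True
```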

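  15. $L_2$-estimation$_\varepsilon$
     • The $\log(1/\delta)$ factor is new, but we want an $\Omega(\varepsilon^{-2} \log n \log(1/\delta))$ lower bound
     • A known trick gives an extra $\log n$ factor
     • Alice holds $x \in \{0, \varepsilon\}^U$ with $|x|_2 = 1$; Bob holds $y \in \{0, 1\}^U$ with $|y|_2 = 1$; $U = \varepsilon^{-2} \cdot \delta^{-1}$
     • What is $|x - y|_2$?
     • $|x-y|_2^2 = |x|_2^2 + |y|_2^2 - 2\langle x, y \rangle = 2 - 2\langle x, y \rangle$
     • Estimating $|x-y|_2$ up to a $(1+\Theta(\varepsilon))$-factor solves inner-product$_\varepsilon$
     • So $R^{\|}_\delta(L_2\text{-estimation}_\varepsilon) = \Omega(\varepsilon^{-2} \log(1/\delta))$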
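
The step from $|x-y|_2$ back to $\langle x, y \rangle$ is a one-line inversion of the identity above. As a worked choice of the $\Theta(\varepsilon)$ constant (an assumption, not from the talk): if the estimate $\tilde d$ satisfies $\tilde d = (1 \pm \gamma)|x-y|_2$ with $\gamma = \varepsilon/5$, then since $|x-y|_2^2 \le 2$ the recovered inner product is off by at most $2\gamma + \gamma^2 < \varepsilon/2$ for $\varepsilon \le 1$, which is enough to tell $0$ from $\varepsilon$.

```python
import math


def inner_from_distance(dist_estimate):
    """Invert |x - y|_2^2 = 2 - 2<x, y>, valid when |x|_2 = |y|_2 = 1."""
    return (2 - dist_estimate**2) / 2


def member_from_distance(dist_estimate, eps):
    """Decide j ∈ S from an estimate of |x - y|_2 (true <x, y> is eps or 0)."""
    return inner_from_distance(dist_estimate) > eps / 2


eps = 0.2
gamma = eps / 5  # assumed relative accuracy of the distance estimate
for true_ip in (eps, 0.0):
    d = math.sqrt(2 - 2 * true_ip)  # exact |x - y|_2
    answers = [member_from_distance(d * f, eps) for f in (1 - gamma, 1 + gamma)]
    print(true_ip, answers)
```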

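  16. Augmented Lopsided Set Intersection (ALSI2)
     • Universe $[U] = [\varepsilon^{-2} \cdot \delta^{-1}]$
     • Alice holds $S_1, \ldots, S_r \subset [U]$ with $|S_i| = \varepsilon^{-2}$ for all $i$
     • Bob holds $j \in [U]$, $i^* \in \{1, 2, \ldots, r\}$, and $S_{i^*+1}, \ldots, S_r$
     • Is $j \in S_{i^*}$?
     • $R^{\|}_{\delta}(\mathrm{ALSI2}) = \Omega(r \varepsilon^{-2} \log(1/\delta))$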

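  17. Reduction of ALSI2 to $L_2$-estimation$_\varepsilon$
     • Set $r = \Theta(\log n)$, so that the scale factors $10^i$ stay polynomial in $n$
     • $R^{\|}_\delta(L_2\text{-estimation}_\varepsilon) = \Omega(\varepsilon^{-2} \log n \log(1/\delta))$
     • Streaming space $\ge R^{\|}_\delta(L_2\text{-estimation}_\varepsilon)$ (Alice runs the streaming algorithm on her updates and sends its memory state to Bob)
     • Alice turns $S_1, \ldots, S_r$ into vectors $x_1, \ldots, x_r$ and forms $x = \sum_{i=1}^{r} 10^i x_i$; Bob knows $j$ and $S_{i^*+1}, \ldots, S_r$, so he forms $y_{i^*}$ and $x_{i^*+1}, \ldots, x_r$ and sets $y = 10^{i^*} y_{i^*} + \sum_{i=i^*+1}^{r} 10^i x_i$
     • Then $y - x = 10^{i^*} y_{i^*} - \sum_{i=1}^{i^*} 10^i x_i$
     • $|y - x|_2$ is dominated by $10^{i^*} |y_{i^*} - x_{i^*}|_2$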
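
A small numerical check of this reduction, under one concrete reading of the figure that is an assumption here: each scaled vector $10^i x_i$ occupies its own block of $U$ coordinates. With that layout the blocks below $i^*$ contribute exactly $\sum_{i<i^*} 10^{2i}$ to $|y-x|_2^2$ (each $|x_i|_2 = 1$), which Bob knows and can subtract before reading off $\langle x_{i^*}, y_{i^*} \rangle$. All function names are illustrative.

```python
def alice_vector(sets, U, eps, base=10):
    """x: block i (i = 1..r) holds base^i * eps * indicator(S_i)."""
    x = []
    for i, S in enumerate(sets, start=1):
        x.extend(base**i * eps if k in S else 0.0 for k in range(U))
    return x


def bob_vector(i_star, j, later_sets, U, eps, r, base=10):
    """y: zeros below i*, base^i* * e_j in block i*, and Alice's known blocks above i*."""
    y = []
    for i in range(1, r + 1):
        if i < i_star:
            y.extend([0.0] * U)
        elif i == i_star:
            y.extend(float(base**i) if k == j else 0.0 for k in range(U))
        else:
            S = later_sets[i]  # Bob is given S_{i*+1}, ..., S_r
            y.extend(base**i * eps if k in S else 0.0 for k in range(U))
    return y


def decode(sq_dist, i_star, eps, base=10):
    """Recover <x_{i*}, y_{i*}> from (an estimate of) |y - x|_2^2 and threshold it."""
    known = sum(base ** (2 * i) for i in range(1, i_star))  # blocks below i*
    ip = (2 - (sq_dist - known) / base ** (2 * i_star)) / 2
    return ip > eps / 2


eps, U, r = 0.5, 16, 3
sets = [{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}]  # |S_i| = eps^-2 = 4
x = alice_vector(sets, U, eps)
for j in (5, 12):  # 5 ∈ S_2, 12 ∉ S_2
    y = bob_vector(2, j, {3: sets[2]}, U, eps, r)
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    print(j, decode(sq, 2, eps))  # prints "5 True", then "12 False"
```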

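  18. Lower Bounds for Johnson-Lindenstrauss
     • Alice holds $x \in \{-n^{O(1)}, \ldots, n^{O(1)}\}^t$; Bob holds $y \in \{-n^{O(1)}, \ldots, n^{O(1)}\}^t$
     • Use public randomness to agree on a JL matrix $A$
     • Alice sends $Ax$; Bob computes $Ax - Ay = A(x-y)$ and outputs $|A(x-y)|_2$
     • This estimates $|x-y|_2$ up to $1+\varepsilon$ w.p. $1-\delta$
     • Each entry of $Ax$ can be communicated with $O(\log n)$ bits, so $\#\text{rows}(A) = \Omega(r \varepsilon^{-2} \log(1/\delta) / \log n)$
     • Set $r = \Theta(\log n)$, giving $\#\text{rows}(A) = \Omega(\varepsilon^{-2} \log(1/\delta))$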
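
The protocol on this slide can be simulated with a shared seed standing in for public randomness. This sketch assumes a random sign matrix and has Alice send the integer vector of signed sums, scaling by $1/\sqrt{k}$ only at the end; for integer inputs with $|x_\ell| \le M$ each entry fits in about $\log_2(dM) + 1$ bits, which is $O(\log n)$ when $d$ and $M$ are $\mathrm{poly}(n)$. The helper names and the bit accounting are illustrative.

```python
import math
import random


def shared_signs(seed, k, d):
    """Public randomness: both players rebuild the same k x d sign matrix from the seed."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(k)]


def sketch(signs, v):
    """Integer sketch: row-wise signed sums of v (A v up to the 1/sqrt(k) scaling)."""
    return [sum(s * a for s, a in zip(row, v)) for row in signs]


def bob_estimate(signs, message, y):
    """Bob computes Ax - Ay = A(x - y) and returns its scaled L2 norm."""
    k = len(signs)
    diff = [m - b for m, b in zip(message, sketch(signs, y))]
    return math.sqrt(sum(t * t for t in diff) / k)


def message_bits(k, d, M):
    """Bits for k integers of magnitude at most d*M: magnitude bits plus a sign bit."""
    return k * ((d * M).bit_length() + 1)


rng = random.Random(1)
d, k, M = 30, 400, 100
x = [rng.randint(-M, M) for _ in range(d)]
y = [rng.randint(-M, M) for _ in range(d)]
signs = shared_signs(seed=7, k=k, d=d)
estimate = bob_estimate(signs, sketch(signs, x), y)
exact = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
print(round(estimate / exact, 3), message_bits(k, d, M))
```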

  19. Low-Error Hamming Distance
     • Universe $= [n]$; Alice holds $x \in \{0,1\}^n$, Bob holds $y \in \{0,1\}^n$
     • $\Delta(x,y)$ = Hamming distance between $x$ and $y$
     • $R^{\|}_\delta(\Delta(x,y)_\varepsilon) = \Omega(\varepsilon^{-2} \log(1/\delta) \log n)$
        • Reduction from ALSI2
        • Gap-Hamming to LSI2 reductions with low error
     • Implies our lower bounds for estimating
        • Any $L_p$-norm
        • Distinct elements
        • Entropy
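
One way to see why a Hamming-distance lower bound carries over to the listed problems: for 0/1 vectors, $\Delta(x,y)$ equals $|x-y|_p^p$ for every $p > 0$, equals the number of non-zeros of $x - y$ in a turnstile stream that inserts $x$ and deletes $y$, and equals $2F_0 - |x|_1 - |y|_1$ where $F_0$ counts distinct elements of an insertion stream containing both supports. The sketch below only checks these identities on a small example; it does not reproduce the paper's reductions (entropy in particular is not covered).

```python
def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def lp_power(x, y, p):
    """|x - y|_p^p; for 0/1 vectors each non-zero |x_i - y_i| equals 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y))


def f0_turnstile(x, y):
    """Distinct elements (non-zero coordinates) after inserting x and deleting y."""
    return sum(1 for a, b in zip(x, y) if a - b != 0)


def f0_insertion(x, y):
    """Distinct elements after inserting the supports of both x and y."""
    return sum(1 for a, b in zip(x, y) if a or b)


x = [1, 0, 1, 1, 0]
y = [0, 0, 1, 0, 1]
print(hamming(x, y), lp_power(x, y, 0.5), lp_power(x, y, 2), f0_turnstile(x, y),
      2 * f0_insertion(x, y) - sum(x) - sum(y))  # prints: 3 3.0 3 3 3
```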

  20. Conclusions
     • Prove the first streaming space lower bounds that depend on the error probability $\delta$
        • Optimal for $L_p$-norms and distinct elements
        • Improve the lower bound for entropy
     • Optimal dimensionality bound for JL transforms
     • Add several twists to augmented indexing proofs
        • Augmented indexing with a small set in a large domain
        • Proof builds upon lopsided set disjointness lower bounds
        • Uses multiple Gap-Hamming to Indexing reductions that handle low error

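  21. ALSI2 to Hamming Distance
     • Embed multiple copies by duplicating coordinates at different scales
     • Alice holds $S_1, \ldots, S_r \subset [U] = [\varepsilon^{-2} \cdot \delta^{-1}]$ with $|S_i| = \varepsilon^{-2}$ for all $i$; Bob holds $j \in [U]$, $i^* \in \{1, 2, \ldots, r\}$, and $S_{i^*+1}, \ldots, S_r$
     • For a single copy with set $S$:
        • Let $t = \varepsilon^{-2} \log(1/\delta)$
        • Use the public coin to generate $t$ random strings $b_1, \ldots, b_t \in \{0,1\}^U$
        • Alice sets $x_i = \mathrm{majority}_{k \in S}\, b_{i,k}$
        • Bob sets $y_i = b_{i,j}$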
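
The single-copy encoding can be simulated directly, and its agreement probabilities explain the choice of $t$. If $j \in S$ and $|S|$ is odd, $x_i = y_i$ with probability exactly $1/2 + \binom{|S|-1}{(|S|-1)/2}/2^{|S|}$, which is $1/2 + \Theta(1/\sqrt{|S|}) = 1/2 + \Theta(\varepsilon)$; if $j \notin S$, $x_i$ and $y_i$ are independent fair bits. Telling a $\Theta(\varepsilon)$ bias from none with error $\delta$ takes $\Theta(\varepsilon^{-2}\log(1/\delta))$ samples, matching $t$. The seed, the threshold and the helper names below are illustrative assumptions.

```python
import math
import random


def agreement_prob(s):
    """Exact Pr[x_i = y_i] when j ∈ S and |S| = s is odd."""
    return 0.5 + math.comb(s - 1, (s - 1) // 2) / 2**s


def encode(S, j, U, t, seed):
    """Public coin: t random strings b_i in {0,1}^U (bit k of b_i is (b_i >> k) & 1)."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(t):
        b = rng.getrandbits(U)
        ones = sum((b >> k) & 1 for k in S)
        x.append(int(2 * ones > len(S)))  # Alice: majority of b_{i,k} over k ∈ S
        y.append((b >> j) & 1)            # Bob: b_{i,j}
    return x, y


def normalized_hamming(x, y):
    return sum(a != b for a, b in zip(x, y)) / len(x)


S, U, t = set(range(0, 90, 10)), 90, 2000  # |S| = 9, i.e. eps = 1/3
threshold = 0.5 - (agreement_prob(len(S)) - 0.5) / 2  # midpoint of the two expected distances
for j in (10, 5):  # 10 ∈ S, 5 ∉ S
    x, y = encode(S, j, U, t, seed=3)
    print(j, round(normalized_hamming(x, y), 3), normalized_hamming(x, y) < threshold)
```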
