1+eps-Approximate Sparse Recovery


Presentation Transcript


  1. 1+eps-Approximate Sparse Recovery. Eric Price (MIT), David Woodruff (IBM Almaden)

  2. Compressed Sensing • Choose an r x n matrix A • Given x ∈ R^n • Compute Ax • Output a vector y so that |x - y|_p ≤ (1+ε) |x - x_{top k}|_p • x_{top k} is the k-sparse vector of largest magnitude coefficients of x • p = 1 or p = 2 • Minimize the number r = r(n, k, ε) of “measurements” • Pr_A[recovery succeeds] > 2/3
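A minimal sketch of the benchmark quantity in this guarantee, in Python; the helper name and array handling are illustrative, not from the paper:

```python
import numpy as np

# |x - x_top k|_p: the error of the best k-sparse approximation of x.
def topk_error(x, k, p):
    if k == 0:
        return np.linalg.norm(x, ord=p)
    tail_idx = np.argsort(np.abs(x))[:-k]   # everything except the k largest-magnitude coords
    return np.linalg.norm(x[tail_idx], ord=p)

# The recovery guarantee asks for |x - y|_p <= (1 + eps) * topk_error(x, k, p).
```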

  3. Previous Work • p = 1 [IR, …]: r = O(k log(n/k) / ε) (deterministic A) • p = 2 [GLPS]: r = O(k log(n/k) / ε) • In both cases, r = Ω(k log(n/k)) [DIPW] • What is the dependence on ε?

  4. Why 1+ε is Important • Suppose x = e_i + u • e_i = (0, 0, …, 0, 1, 0, …, 0) • u is a random unit vector orthogonal to e_i • Consider y = 0^n • |x - y|_2 = |x|_2 = 2^{1/2} · |x - e_i|_2 • So the trivial solution y = 0 is already a 2^{1/2}-approximation • (1+ε)-approximate recovery fixes this • In some applications, can have 1/ε = 100, log n = 32
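A quick numeric check of this example; the dimension n below is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n, i = 1000, 0
u = rng.standard_normal(n)
u[i] = 0.0
u /= np.linalg.norm(u)        # random unit vector orthogonal to e_i
x = np.zeros(n)
x[i] = 1.0
x += u                        # x = e_i + u

best_1sparse = np.zeros(n)
best_1sparse[i] = x[i]        # keep the single largest-magnitude coordinate
print(np.linalg.norm(x))                  # |x - 0|_2 is about sqrt(2)
print(np.linalg.norm(x - best_1sparse))   # |x - x_top 1|_2 = |u|_2 = 1
# The all-zeros output is already a sqrt(2)-approximation; only a (1+eps)
# guarantee forces the recovery to actually identify e_i.
```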

  5. Our Results Vs. Previous Work • p = 1: previous [IR, …] r = O(k log(n/k) / ε); ours r = O(k log(n/k) · log^2(1/ε) / ε^{1/2}) (randomized) and r = Ω(k log(1/ε) / ε^{1/2}) • p = 2: previous [GLPS] r = O(k log(n/k) / ε); ours r = Ω(k log(n/k) / ε) • Previous lower bounds: Ω(k log(n/k)) • Our lower bounds are for randomized, constant-probability schemes

  6. Comparison to Deterministic Schemes • We get r = O~(k/ε^{1/2}) as a randomized upper bound for p = 1 • We show r = Ω(k log(n/k) / ε) for p = 1 for deterministic schemes • So randomized schemes are easier than deterministic ones

  7. Our Sparse-Output Results • Output a vector y from Ax so that |x - y|_p ≤ (1+ε) |x - x_{top k}|_p • Sometimes want y to be k-sparse: then r = Ω~(k/ε^p) • Both results tight up to logarithmic factors • Recall that for non-sparse output r = Θ~(k/ε^{p/2})

  8. Talk Outline • O~(k / ε^{1/2}) upper bound for p = 1 • Lower bounds

  9. Simplifications • Want O~(k/ε^{1/2}) for p = 1 • Replace k with 1 • Sample a 1/k fraction of coordinates • Solve the problem for k = 1 on the sample • Repeat O~(k) times independently • Combine the solutions found • (Figure: a signal with k values ε/k and the rest 1/n; after sampling at rate 1/k, roughly one ε/k value remains: ε/k, 1/n, …, 1/n)
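A schematic of this reduction; solve_k1 stands for a hypothetical single-heavy-coordinate recovery routine (e.g. the CountSketch scheme sketched on later slides), not code from the paper:

```python
import numpy as np

def recover_topk(x, k, reps, solve_k1, rng):
    """Reduce k-sparse recovery to roughly O~(k) independent k = 1 instances."""
    found = {}
    for _ in range(reps):                        # repeat O~(k) times
        mask = rng.random(len(x)) < 1.0 / k      # keep each coordinate w.p. 1/k
        sub = np.where(mask, x, 0.0)
        hit = solve_k1(sub)                      # k = 1 recovery on the sample
        if hit is not None:
            i, value = hit
            found[i] = value                     # combine the solutions found
    return found
```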

  10. k = 1 • Assume |x - x_top|_1 = 1 and x_top = ε • First attempt: use CountMin [CM] • Randomly partition coordinates into B buckets, maintain the sum in each bucket • The expected l_1-mass of “noise” in a bucket is 1/B • If B = Θ(1/ε), most buckets have count < ε/2, but the bucket that contains x_top has count > ε/2 • Repeat O(log n) times
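A minimal CountMin-style sketch of this first attempt; the parameters are illustrative, and the estimator takes a median across repetitions so it also tolerates signed noise:

```python
import numpy as np

def countmin_sketch(x, B, reps, rng):
    """reps independent random partitions into B buckets; each bucket stores a sum."""
    n = len(x)
    h = rng.integers(0, B, size=(reps, n))   # bucket of coordinate i in repetition r
    tables = np.zeros((reps, B))
    for r in range(reps):
        np.add.at(tables[r], h[r], x)        # sum of the coordinates hashed to each bucket
    return tables, h

def estimate(tables, h, i):
    # Median over repetitions of the bucket containing i.  With B = Theta(1/eps),
    # most buckets carry < eps/2 noise, so the bucket holding x_top stands out.
    return float(np.median(tables[np.arange(len(tables)), h[:, i]]))
```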

  11. Second Attempt • But we wanted O~(1/ε^{1/2}) measurements • Error in a bucket is 1/B, need B ≈ 1/ε • What about CountSketch? [CCF-C] • Give each coordinate i a random sign σ(i) ∈ {-1, 1} • Randomly partition coordinates into B buckets via h, maintain Σ_{i: h(i) = j} σ(i)·x_i in the j-th bucket • Bucket error is (Σ_{i ≠ top} x_i^2 / B)^{1/2} • Is this better?
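A minimal single-repetition CountSketch matching this description (the full scheme repeats it O(log n) times and takes a median of the per-repetition estimates):

```python
import numpy as np

def countsketch(x, B, rng):
    n = len(x)
    h = rng.integers(0, B, size=n)           # random bucket h(i)
    sigma = rng.choice([-1.0, 1.0], size=n)  # random sign sigma(i)
    table = np.zeros(B)
    np.add.at(table, h, sigma * x)           # bucket j holds the sum of sigma(i)*x_i over {i: h(i)=j}
    return table, h, sigma

def estimate(table, h, sigma, i):
    return sigma[i] * table[h[i]]            # unbiased estimate of x_i
```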

  12. CountSketch • Bucket error Err = (Σ_{i ≠ top} x_i^2 / B)^{1/2} • All |x_i| ≤ ε and |x - x_top|_1 = 1 • Σ_{i ≠ top} x_i^2 ≤ (1/ε)·ε^2 = ε • So Err ≤ (ε/B)^{1/2}, which needs to be at most ε • Solving, B ≥ 1/ε • CountSketch isn't better than CountMin

  13. Main Idea • We insist on using CountSketch with B = 1/ε^{1/2} • Suppose Err = (Σ_{i ≠ top} x_i^2 / B)^{1/2} = ε • This means Σ_{i ≠ top} x_i^2 = ε^{3/2} • Forget about x_top! • Let's make up the mass another way

  14. Main Idea • We have: Σ_{i ≠ top} x_i^2 = ε^{3/2} • Intuition: suppose all x_i, i ≠ top, are equal or 0 • Then: (# non-zero)·value = 1 and (# non-zero)·value^2 = ε^{3/2} • Hence, value = ε^{3/2} and # non-zero = 1/ε^{3/2} • Sample an ε-fraction of coordinates uniformly at random! • value = ε^{3/2} and # non-zero sampled = 1/ε^{1/2}, so the l_1-contribution is ε • Find all non-zeros with O~(1/ε^{1/2}) measurements
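A quick numeric check of this counting; eps below is an illustrative value:

```python
import numpy as np

eps = 1e-4
rng = np.random.default_rng(1)
num_nonzero = int(1 / eps**1.5)          # 1/eps^{3/2} tail coordinates ...
tail = np.full(num_nonzero, eps**1.5)    # ... each of value eps^{3/2}, total l1 mass 1
keep = rng.random(num_nonzero) < eps     # sample an eps-fraction
print(keep.sum())                        # about 1/eps^{1/2} = 100 survivors
print(tail[keep].sum())                  # l1 contribution about eps = 1e-4
```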

  15. General Setting • Σ_{i ≠ top} x_i^2 = ε^{3/2} • S_j = {i | 1/4^j < x_i^2 ≤ 1/4^{j-1}} • Σ_{i ≠ top} x_i^2 = ε^{3/2} implies there is a j for which |S_j|/4^j = Ω~(ε^{3/2}) • (Figure: the tail values grouped into geometric levels …, 16ε^{3/2}, 4ε^{3/2}, ε^{3/2})

  16. General Setting • If |S_j| < 1/ε^{1/2}, then 1/4^j > ε^2, so 1/2^j > ε, which can't happen • Else, sample at rate 1/(|S_j| ε^{1/2}) to get 1/ε^{1/2} elements of S_j • The l_1-mass of S_j in the sample is > ε • Can we find the sampled elements of S_j? Use Σ_{i ≠ top} x_i^2 = ε^{3/2} • The l_2^2 of the sample is about ε^{3/2} · 1/(|S_j| ε^{1/2}) = ε/|S_j| • Using CountSketch with 1/ε^{1/2} buckets: bucket error = sqrt(ε^{1/2} · ε^{3/2} · 1/(|S_j| ε^{1/2})) = sqrt(ε^{3/2}/|S_j|) < 1/2^j since |S_j|/4^j > ε^{3/2}

  17. Algorithm Wrapup • Sub-sample O(log 1/ε) times in powers of 2 • In each level of sub-sampling, maintain a CountSketch with O~(1/ε^{1/2}) buckets • Find as many heavy coordinates as you can! • Intuition: if CountSketch fails, there are many heavy elements that can be found by sub-sampling • Wouldn't work for CountMin, as the bucket error could be ε because of n-1 items each of value ε/(n-1)
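A schematic of the whole k = 1 scheme described above; the bucket count, number of levels, and repetition count are placeholder choices rather than the paper's exact parameters:

```python
import numpy as np

def build_sketches(x, eps, rng, reps=5):
    """O(log 1/eps) sub-sampling levels, each feeding a small CountSketch."""
    n = len(x)
    B = max(2, int(np.ceil(1 / np.sqrt(eps))))        # roughly O~(1/eps^{1/2}) buckets
    num_levels = int(np.ceil(np.log2(1 / eps))) + 1   # sub-sample in powers of 2
    levels = []
    for lvl in range(num_levels):
        sub = np.where(rng.random(n) < 2.0 ** (-lvl), x, 0.0)
        sketches = []
        for _ in range(reps):
            h = rng.integers(0, B, size=n)
            sigma = rng.choice([-1.0, 1.0], size=n)
            table = np.zeros(B)
            np.add.at(table, h, sigma * sub)
            sketches.append((table, h, sigma))
        levels.append(sketches)
    return levels
# A decoder would report every coordinate whose (median) estimate looks heavy at
# some level: if the top level misses x_top, many heavy sampled coordinates show
# up at the sparser levels instead.
```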

  18. Talk Outline • O~(k / ε^{1/2}) upper bound for p = 1 • Lower bounds

  19. Our Results • General results: • Ω~(k / ε^{1/2}) for p = 1 • Ω(k log(n/k) / ε) for p = 2 • Sparse output: • Ω~(k/ε) for p = 1 • Ω~(k/ε^2) for p = 2 • Deterministic: • Ω(k log(n/k) / ε) for p = 1

  20. Simultaneous Communication Complexity • (Figure: Alice holds x and sends a message M_A(x) to the referee; Bob holds y and sends M_B(y); the referee must answer “What is f(x,y)?”) • Alice and Bob send a single message to the referee, who outputs f(x,y) with constant probability • Communication cost CC(f) is the maximum message length, over the randomness of the protocol and all possible inputs • Parties share randomness

  21. Reduction to Compressed Sensing • Shared randomness decides the matrix A • Alice sends Ax to the referee • Bob sends Ay to the referee • Referee computes A(x+y) = Ax + Ay and runs the compressed sensing recovery algorithm • If the output of the algorithm solves f(x,y), then (# rows of A) × (# bits per measurement) > CC(f)
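The reduction only needs the sketch to be linear, so the referee can add the two messages; a tiny check (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
r, n = 20, 100
A = rng.standard_normal((r, n))   # shared randomness fixes A
x = rng.standard_normal(n)        # Alice's input
y = rng.standard_normal(n)        # Bob's input
# Alice sends A @ x, Bob sends A @ y; the referee obtains a sketch of x + y.
assert np.allclose(A @ x + A @ y, A @ (x + y))
```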

  22. A Unified View • General results (Direct-Sum Gap-l_1): • Ω~(k / ε^{1/2}) for p = 1 • Ω~(k / ε) for p = 2 • Sparse output (Indexing): • Ω~(k/ε) for p = 1 • Ω~(k/ε^2) for p = 2 • Deterministic (Equality): • Ω(k log(n/k) / ε) for p = 1 • Tighter log factors achievable by looking at Gaussian channels

  23. General Results: k = 1, p = 1 • Alice and Bob have x, y, respectively, in R^m • There is a unique i* for which (x+y)_{i*} = d • For all j ≠ i*, (x+y)_j ∈ {0, c, -c}, where |c| < |d| • Finding i* requires Ω(m/(d/c)^2) communication [SS, BJKS] • m = 1/ε^{3/2}, c = ε^{3/2}, d = ε • Need Ω(1/ε^{1/2}) communication

  24. General Results: k = 1, p = 1 • But the compressed sensing algorithm doesn't need to find i* • If it doesn't, then it needs to transmit a lot of information about the tail • Tail: a random low-weight vector in {0, ε^{3/2}, -ε^{3/2}}^{1/ε^3} • Uses a distributional lower bound and RS codes • Send a vector y within 1-ε of the tail in l_1-norm • Needs 1/ε^{1/2} communication

  25. General Results: k = 1, p = 2 • Same argument, different parameters • Ω(1/ε) communication • What about general k?

  26. Handling General k • Bounded Round Direct Sum Theorem [BR] (with slight modification): given k copies of a function f, with input pairs independently drawn from μ, solving a 2/3 fraction needs communication Ω(k·CC_μ(f)) • (Figure: the hard instance for p = 1 is k blocks, each with one coordinate of value ε^{1/2} and a tail of coordinates of value ε^{3/2}, …, ε^{3/2})

  27. Handling General k • CC = Ω(k/ε^{1/2}) for p = 1 • CC = Ω(k/ε) for p = 2 • What is implied about compressed sensing?

  28. Rounding Matrices [DIPW] • A is a matrix of real numbers • Can assume orthonormal rows • Round the entries of A to O(log n) bits, obtaining a matrix A' • Careful: A'x = A(x+s) for a “small” s • But s depends on A, so there is no guarantee recovery works • Can be fixed by looking at A(x+s+u) for a random u
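A sketch of just the rounding step; the bit budget below is an illustrative stand-in for “O(log n) bits”, not the paper's exact choice:

```python
import numpy as np

def round_matrix(A, n):
    """Round each entry of A (orthonormal rows, entries in [-1, 1]) to O(log n) bits."""
    bits = int(np.ceil(np.log2(n))) + 4   # placeholder for the O(log n) bit budget
    scale = 2.0 ** bits
    return np.round(A * scale) / scale    # A' with A'x = A(x + s) for a small s
```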

  29. Lower Bounds for Compressed Sensing • (# rows of A) × (# bits per measurement) > CC(f) • By rounding, # bits per measurement = O(log n) • In our hard instances, the universe size is poly(k/ε) • So (# rows of A) · O(log(k/ε)) > CC(f) • # rows of A = Ω~(k/ε^{1/2}) for p = 1 • # rows of A = Ω~(k/ε) for p = 2

  30. Sparse-Output Results • Sparse output (Indexing): • Ω~(k/ε) for p = 1 • Ω~(k/ε^2) for p = 2

  31. Sparse Output Results - Indexing • (Figure: one player holds x ∈ {0,1}^n, the other holds an index i ∈ {1, 2, …, n}; the referee must answer “What is x_i?”) • CC(Indexing) = Ω(n)

  32. Ω(1/ε) Bound for k = 1, p = 1 • Generalizes to k > 1 to give Ω~(k/ε) • Generalizes to p = 2 to give Ω~(k/ε^2) • Inputs: x ∈ {-ε, ε}^{1/ε} and y = e_i • Consider x+y • If the output is required to be 1-sparse, it must place mass on the i-th coordinate • The mass must be 1+ε if x_i = ε, otherwise 1-ε

  33. Deterministic Results • Deterministic (Equality): • Ω(k log(n/k) / ε) for p = 1

  34. Deterministic Results - Equality • (Figure: the two players hold x ∈ {0,1}^n and y ∈ {0,1}^n; the referee must answer “Is x = y?”) • Deterministic CC(Equality) = Ω(n)

  35. Ω(k log(n/k) / ε) for p = 1 • Choose log n signals x_1, …, x_{log n}, each with k/ε values equal to ε/k • x = Σ_{i=1}^{log n} 10^i x_i • Choose log n signals y_1, …, y_{log n}, each with k/ε values equal to ε/k • y = Σ_{i=1}^{log n} 10^i y_i • Consider x-y • The compressed sensing output is 0^n iff x = y

  36. General Results – Gaussian Channels (k = 1, p = 2) • Alice has a signal x = ε^{1/2} e_i for a random i ∈ [n] • Alice transmits x over a noisy channel with independent N(0, 1/n) noise on each coordinate • Consider any row vector a of A • Channel output = <a,x> + <a,y>, where <a,y> is N(0, |a|_2^2/n) • E_i[<a,x>^2] = ε |a|_2^2/n • Shannon-Hartley Theorem: I(i; <a,x>+<a,y>) = I(<a,x>; <a,x>+<a,y>) ≤ ½ log(1+ε) = O(ε)
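Spelled out, the Shannon-Hartley step just identifies the signal and noise powers of this per-measurement channel; the block below is a worked restatement of the slide's bound, not an additional result:

```latex
% Signal power S = E_i[<a,x>^2] = eps * |a|_2^2 / n,
% noise power  N = Var(<a,y>)   =       |a|_2^2 / n,   so S/N = eps.
I\bigl(i;\ \langle a,x\rangle + \langle a,y\rangle\bigr)
  = I\bigl(\langle a,x\rangle;\ \langle a,x\rangle + \langle a,y\rangle\bigr)
  \le \tfrac{1}{2}\log\Bigl(1 + \tfrac{S}{N}\Bigr)
  = \tfrac{1}{2}\log(1 + \varepsilon)
  = O(\varepsilon).
```

Intuitively, each measurement then reveals only O(ε) bits about i, while identifying i requires about log n bits, which is where the tighter log factors mentioned earlier come from.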

  37. Summary of Results • General results: • Θ~(k/ε^{p/2}) • Sparse output: • Θ~(k/ε^p) • Deterministic: • Θ(k log(n/k) / ε) for p = 1
