1+eps-Approximate Sparse Recovery


Presentation Transcript


  1. 1+eps-Approximate Sparse Recovery. Eric Price (MIT), David Woodruff (IBM Almaden)

  2. Compressed Sensing • Choose an r x n matrix A • Given x ∈ R^n • Compute Ax • Output a vector y so that |x - y|_p ≤ (1+ε) |x - x_{top k}|_p • x_{top k} is the k-sparse vector of largest magnitude coefficients of x • p = 1 or p = 2 • Minimize the number r = r(n, k, ε) of “measurements” • Pr_A[recovery succeeds] > 2/3
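A minimal sketch of the benchmark quantity in this guarantee, in Python; the helper name and array handling are illustrative, not from the paper:

```python
import numpy as np

# |x - x_top k|_p: the error of the best k-sparse approximation of x.
def topk_error(x, k, p):
    if k == 0:
        return np.linalg.norm(x, ord=p)
    tail_idx = np.argsort(np.abs(x))[:-k]   # everything except the k largest-magnitude coords
    return np.linalg.norm(x[tail_idx], ord=p)

# The recovery guarantee asks for |x - y|_p <= (1 + eps) * topk_error(x, k, p).
```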

  3. Previous Work • p = 1 [IR, …]: r = O(k log(n/k) / ε) (deterministic A) • p = 2 [GLPS]: r = O(k log(n/k) / ε) • In both cases, r = Ω(k log(n/k)) [DIPW] • What is the dependence on ε?

  4. Why 1+ε is Important • Suppose x = e_i + u • e_i = (0, 0, …, 0, 1, 0, …, 0) • u is a random unit vector orthogonal to e_i • Consider y = 0^n • |x - y|_2 = |x|_2 = 2^{1/2} · |x - e_i|_2 • So the trivial solution y = 0 is already a 2^{1/2}-approximation • (1+ε)-approximate recovery fixes this • In some applications, can have 1/ε = 100, log n = 32
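A quick numeric check of this example; the dimension n below is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n, i = 1000, 0
u = rng.standard_normal(n)
u[i] = 0.0
u /= np.linalg.norm(u)        # random unit vector orthogonal to e_i
x = np.zeros(n)
x[i] = 1.0
x += u                        # x = e_i + u

best_1sparse = np.zeros(n)
best_1sparse[i] = x[i]        # keep the single largest-magnitude coordinate
print(np.linalg.norm(x))                  # |x - 0|_2 is about sqrt(2)
print(np.linalg.norm(x - best_1sparse))   # |x - x_top 1|_2 = |u|_2 = 1
# The all-zeros output is already a sqrt(2)-approximation; only a (1+eps)
# guarantee forces the recovery to actually identify e_i.
```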

  5. Our Results Vs. Previous Work • p = 1: previous [IR, …] r = O(k log(n/k) / ε); ours r = O(k log(n/k) · log^2(1/ε) / ε^{1/2}) (randomized) and r = Ω(k log(1/ε) / ε^{1/2}) • p = 2: previous [GLPS] r = O(k log(n/k) / ε); ours r = Ω(k log(n/k) / ε) • Previous lower bounds: Ω(k log(n/k)) • Our lower bounds are for randomized, constant-probability schemes

  6. Comparison to Deterministic Schemes • We get r = O~(k/ε^{1/2}) as a randomized upper bound for p = 1 • We show r = Ω(k log(n/k) / ε) for p = 1 for deterministic schemes • So randomized schemes are easier than deterministic ones

  7. Our Sparse-Output Results • Output a vector y from Ax so that |x - y|_p ≤ (1+ε) |x - x_{top k}|_p • Sometimes want y to be k-sparse: then r = Ω~(k/ε^p) • Both results tight up to logarithmic factors • Recall that for non-sparse output r = Θ~(k/ε^{p/2})

  8. Talk Outline • O~(k / ε^{1/2}) upper bound for p = 1 • Lower bounds

  9. Simplifications • Want O~(k/ε^{1/2}) for p = 1 • Replace k with 1 • Sample a 1/k fraction of coordinates • Solve the problem for k = 1 on the sample • Repeat O~(k) times independently • Combine the solutions found • (Figure: a signal with k values ε/k and the rest 1/n; after sampling at rate 1/k, roughly one ε/k value remains: ε/k, 1/n, …, 1/n)
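A schematic of this reduction; solve_k1 stands for a hypothetical single-heavy-coordinate recovery routine (e.g. the CountSketch scheme sketched on later slides), not code from the paper:

```python
import numpy as np

def recover_topk(x, k, reps, solve_k1, rng):
    """Reduce k-sparse recovery to roughly O~(k) independent k = 1 instances."""
    found = {}
    for _ in range(reps):                        # repeat O~(k) times
        mask = rng.random(len(x)) < 1.0 / k      # keep each coordinate w.p. 1/k
        sub = np.where(mask, x, 0.0)
        hit = solve_k1(sub)                      # k = 1 recovery on the sample
        if hit is not None:
            i, value = hit
            found[i] = value                     # combine the solutions found
    return found
```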

  10. k = 1 • Assume |x - x_top|_1 = 1 and x_top = ε • First attempt: use CountMin [CM] • Randomly partition coordinates into B buckets, maintain the sum in each bucket • The expected l_1-mass of “noise” in a bucket is 1/B • If B = Θ(1/ε), most buckets have count < ε/2, but the bucket that contains x_top has count > ε/2 • Repeat O(log n) times
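A minimal CountMin-style sketch of this first attempt; the parameters are illustrative, and the estimator takes a median across repetitions so it also tolerates signed noise:

```python
import numpy as np

def countmin_sketch(x, B, reps, rng):
    """reps independent random partitions into B buckets; each bucket stores a sum."""
    n = len(x)
    h = rng.integers(0, B, size=(reps, n))   # bucket of coordinate i in repetition r
    tables = np.zeros((reps, B))
    for r in range(reps):
        np.add.at(tables[r], h[r], x)        # sum of the coordinates hashed to each bucket
    return tables, h

def estimate(tables, h, i):
    # Median over repetitions of the bucket containing i.  With B = Theta(1/eps),
    # most buckets carry < eps/2 noise, so the bucket holding x_top stands out.
    return float(np.median(tables[np.arange(len(tables)), h[:, i]]))
```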

  11. Second Attempt • But we wanted O~(1/ε^{1/2}) measurements • Error in a bucket is 1/B, need B ≈ 1/ε • What about CountSketch? [CCF-C] • Give each coordinate i a random sign σ(i) ∈ {-1, 1} • Randomly partition coordinates into B buckets via h, maintain Σ_{i: h(i) = j} σ(i)·x_i in the j-th bucket • Bucket error is (Σ_{i ≠ top} x_i^2 / B)^{1/2} • Is this better?
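A minimal single-repetition CountSketch matching this description (the full scheme repeats it O(log n) times and takes a median of the per-repetition estimates):

```python
import numpy as np

def countsketch(x, B, rng):
    n = len(x)
    h = rng.integers(0, B, size=n)           # random bucket h(i)
    sigma = rng.choice([-1.0, 1.0], size=n)  # random sign sigma(i)
    table = np.zeros(B)
    np.add.at(table, h, sigma * x)           # bucket j holds the sum of sigma(i)*x_i over {i: h(i)=j}
    return table, h, sigma

def estimate(table, h, sigma, i):
    return sigma[i] * table[h[i]]            # unbiased estimate of x_i
```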

  12. CountSketch • Bucket error Err = (Σ_{i ≠ top} x_i^2 / B)^{1/2} • All |x_i| ≤ ε and |x - x_top|_1 = 1 • Σ_{i ≠ top} x_i^2 ≤ (1/ε)·ε^2 = ε • So Err ≤ (ε/B)^{1/2}, which needs to be at most ε • Solving, B ≥ 1/ε • CountSketch isn't better than CountMin

  13. Main Idea • We insist on using CountSketch with B = 1/ε^{1/2} • Suppose Err = (Σ_{i ≠ top} x_i^2 / B)^{1/2} = ε • This means Σ_{i ≠ top} x_i^2 = ε^{3/2} • Forget about x_top! • Let's make up the mass another way

  14. Main Idea • We have: Σ_{i ≠ top} x_i^2 = ε^{3/2} • Intuition: suppose all x_i, i ≠ top, are equal or 0 • Then: (# non-zero)·value = 1 and (# non-zero)·value^2 = ε^{3/2} • Hence, value = ε^{3/2} and # non-zero = 1/ε^{3/2} • Sample an ε-fraction of coordinates uniformly at random! • value = ε^{3/2} and # non-zero sampled = 1/ε^{1/2}, so the l_1-contribution is ε • Find all non-zeros with O~(1/ε^{1/2}) measurements
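A quick numeric check of this counting; eps below is an illustrative value:

```python
import numpy as np

eps = 1e-4
rng = np.random.default_rng(1)
num_nonzero = int(1 / eps**1.5)          # 1/eps^{3/2} tail coordinates ...
tail = np.full(num_nonzero, eps**1.5)    # ... each of value eps^{3/2}, total l1 mass 1
keep = rng.random(num_nonzero) < eps     # sample an eps-fraction
print(keep.sum())                        # about 1/eps^{1/2} = 100 survivors
print(tail[keep].sum())                  # l1 contribution about eps = 1e-4
```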

  15. General Setting • Σ_{i ≠ top} x_i^2 = ε^{3/2} • S_j = {i | 1/4^j < x_i^2 ≤ 1/4^{j-1}} • Σ_{i ≠ top} x_i^2 = ε^{3/2} implies there is a j for which |S_j|/4^j = Ω~(ε^{3/2}) • (Figure: the tail values grouped into geometric levels …, 16ε^{3/2}, 4ε^{3/2}, ε^{3/2})

  16. General Setting • If |S_j| < 1/ε^{1/2}, then 1/4^j > ε^2, so 1/2^j > ε, which can't happen • Else, sample at rate 1/(|S_j| ε^{1/2}) to get 1/ε^{1/2} elements of S_j • The l_1-mass of S_j in the sample is > ε • Can we find the sampled elements of S_j? Use Σ_{i ≠ top} x_i^2 = ε^{3/2} • The l_2^2 of the sample is about ε^{3/2} · 1/(|S_j| ε^{1/2}) = ε/|S_j| • Using CountSketch with 1/ε^{1/2} buckets: bucket error = sqrt(ε^{1/2} · ε^{3/2} · 1/(|S_j| ε^{1/2})) = sqrt(ε^{3/2}/|S_j|) < 1/2^j since |S_j|/4^j > ε^{3/2}

  17. Algorithm Wrapup • Sub-sample O(log 1/ε) times in powers of 2 • In each level of sub-sampling, maintain a CountSketch with O~(1/ε^{1/2}) buckets • Find as many heavy coordinates as you can! • Intuition: if CountSketch fails, there are many heavy elements that can be found by sub-sampling • Wouldn't work for CountMin, as the bucket error could be ε because of n-1 items each of value ε/(n-1)
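A schematic of the whole k = 1 scheme described above; the bucket count, number of levels, and repetition count are placeholder choices rather than the paper's exact parameters:

```python
import numpy as np

def build_sketches(x, eps, rng, reps=5):
    """O(log 1/eps) sub-sampling levels, each feeding a small CountSketch."""
    n = len(x)
    B = max(2, int(np.ceil(1 / np.sqrt(eps))))        # roughly O~(1/eps^{1/2}) buckets
    num_levels = int(np.ceil(np.log2(1 / eps))) + 1   # sub-sample in powers of 2
    levels = []
    for lvl in range(num_levels):
        sub = np.where(rng.random(n) < 2.0 ** (-lvl), x, 0.0)
        sketches = []
        for _ in range(reps):
            h = rng.integers(0, B, size=n)
            sigma = rng.choice([-1.0, 1.0], size=n)
            table = np.zeros(B)
            np.add.at(table, h, sigma * sub)
            sketches.append((table, h, sigma))
        levels.append(sketches)
    return levels
# A decoder would report every coordinate whose (median) estimate looks heavy at
# some level: if the top level misses x_top, many heavy sampled coordinates show
# up at the sparser levels instead.
```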

  18. Talk Outline • O~(k / ε^{1/2}) upper bound for p = 1 • Lower bounds

  19. Our Results • General results: • Ω~(k / ε^{1/2}) for p = 1 • Ω(k log(n/k) / ε) for p = 2 • Sparse output: • Ω~(k/ε) for p = 1 • Ω~(k/ε^2) for p = 2 • Deterministic: • Ω(k log(n/k) / ε) for p = 1

  20. Simultaneous Communication Complexity • (Figure: Alice holds x and sends a message M_A(x) to the referee; Bob holds y and sends M_B(y); the referee must answer “What is f(x,y)?”) • Alice and Bob send a single message to the referee, who outputs f(x,y) with constant probability • Communication cost CC(f) is the maximum message length, over the randomness of the protocol and all possible inputs • Parties share randomness

  21. Reduction to Compressed Sensing • Shared randomness decides the matrix A • Alice sends Ax to the referee • Bob sends Ay to the referee • Referee computes A(x+y) = Ax + Ay and runs the compressed sensing recovery algorithm • If the output of the algorithm solves f(x,y), then (# rows of A) × (# bits per measurement) > CC(f)
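The reduction only needs the sketch to be linear, so the referee can add the two messages; a tiny check (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
r, n = 20, 100
A = rng.standard_normal((r, n))   # shared randomness fixes A
x = rng.standard_normal(n)        # Alice's input
y = rng.standard_normal(n)        # Bob's input
# Alice sends A @ x, Bob sends A @ y; the referee obtains a sketch of x + y.
assert np.allclose(A @ x + A @ y, A @ (x + y))
```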

  22. A Unified View • General results (Direct-Sum Gap-l_1): • Ω~(k / ε^{1/2}) for p = 1 • Ω~(k / ε) for p = 2 • Sparse output (Indexing): • Ω~(k/ε) for p = 1 • Ω~(k/ε^2) for p = 2 • Deterministic (Equality): • Ω(k log(n/k) / ε) for p = 1 • Tighter log factors achievable by looking at Gaussian channels

  23. General Results: k = 1, p = 1 • Alice and Bob have x, y, respectively, in R^m • There is a unique i* for which (x+y)_{i*} = d • For all j ≠ i*, (x+y)_j ∈ {0, c, -c}, where |c| < |d| • Finding i* requires Ω(m/(d/c)^2) communication [SS, BJKS] • m = 1/ε^{3/2}, c = ε^{3/2}, d = ε • Need Ω(1/ε^{1/2}) communication

  24. General Results: k = 1, p = 1 • But the compressed sensing algorithm doesn't need to find i* • If it doesn't, then it needs to transmit a lot of information about the tail • Tail: a random low-weight vector in {0, ε^{3/2}, -ε^{3/2}}^{1/ε^3} • Uses a distributional lower bound and RS codes • Send a vector y within 1-ε of the tail in l_1-norm • Needs 1/ε^{1/2} communication

  25. General Results: k = 1, p = 2 • Same argument, different parameters • Ω(1/ε) communication • What about general k?

  26. Handling General k • Bounded Round Direct Sum Theorem [BR] (with slight modification): given k copies of a function f, with input pairs independently drawn from μ, solving a 2/3 fraction needs communication Ω(k·CC_μ(f)) • (Figure: the hard instance for p = 1 is k blocks, each with one coordinate of value ε^{1/2} and a tail of coordinates of value ε^{3/2}, …, ε^{3/2})

  27. Handling General k • CC = Ω(k/ε^{1/2}) for p = 1 • CC = Ω(k/ε) for p = 2 • What is implied about compressed sensing?

  28. Rounding Matrices [DIPW] • A is a matrix of real numbers • Can assume orthonormal rows • Round the entries of A to O(log n) bits, obtaining a matrix A' • Careful: A'x = A(x+s) for a “small” s • But s depends on A, so there is no guarantee recovery works • Can be fixed by looking at A(x+s+u) for a random u
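A sketch of just the rounding step; the bit budget below is an illustrative stand-in for “O(log n) bits”, not the paper's exact choice:

```python
import numpy as np

def round_matrix(A, n):
    """Round each entry of A (orthonormal rows, entries in [-1, 1]) to O(log n) bits."""
    bits = int(np.ceil(np.log2(n))) + 4   # placeholder for the O(log n) bit budget
    scale = 2.0 ** bits
    return np.round(A * scale) / scale    # A' with A'x = A(x + s) for a small s
```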

  29. Lower Bounds for Compressed Sensing • (# rows of A) × (# bits per measurement) > CC(f) • By rounding, # bits per measurement = O(log n) • In our hard instances, the universe size is poly(k/ε) • So (# rows of A) · O(log(k/ε)) > CC(f) • # rows of A = Ω~(k/ε^{1/2}) for p = 1 • # rows of A = Ω~(k/ε) for p = 2

  30. Sparse-Output Results • Sparse output (Indexing): • Ω~(k/ε) for p = 1 • Ω~(k/ε^2) for p = 2

  31. Sparse Output Results - Indexing • (Figure: one player holds x ∈ {0,1}^n, the other holds an index i ∈ {1, 2, …, n}; the referee must answer “What is x_i?”) • CC(Indexing) = Ω(n)

  32. Ω(1/ε) Bound for k = 1, p = 1 • Generalizes to k > 1 to give Ω~(k/ε) • Generalizes to p = 2 to give Ω~(k/ε^2) • Inputs: x ∈ {-ε, ε}^{1/ε} and y = e_i • Consider x+y • If the output is required to be 1-sparse, it must place mass on the i-th coordinate • The mass must be 1+ε if x_i = ε, otherwise 1-ε

  33. Deterministic Results • Deterministic (Equality): • Ω(k log(n/k) / ε) for p = 1

  34. Deterministic Results - Equality • (Figure: the two players hold x ∈ {0,1}^n and y ∈ {0,1}^n; the referee must answer “Is x = y?”) • Deterministic CC(Equality) = Ω(n)

  35. Ω(k log(n/k) / ε) for p = 1 • Choose log n signals x_1, …, x_{log n}, each with k/ε values equal to ε/k • x = Σ_{i=1}^{log n} 10^i x_i • Choose log n signals y_1, …, y_{log n}, each with k/ε values equal to ε/k • y = Σ_{i=1}^{log n} 10^i y_i • Consider x-y • The compressed sensing output is 0^n iff x = y

  36. General Results – Gaussian Channels (k = 1, p = 2) • Alice has a signal x = ε^{1/2} e_i for a random i ∈ [n] • Alice transmits x over a noisy channel with independent N(0, 1/n) noise on each coordinate • Consider any row vector a of A • Channel output = <a,x> + <a,y>, where <a,y> is N(0, |a|_2^2/n) • E_i[<a,x>^2] = ε |a|_2^2/n • Shannon-Hartley Theorem: I(i; <a,x>+<a,y>) = I(<a,x>; <a,x>+<a,y>) ≤ ½ log(1+ε) = O(ε)
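Spelled out, the Shannon-Hartley step just identifies the signal and noise powers of this per-measurement channel; the block below is a worked restatement of the slide's bound, not an additional result:

```latex
% Signal power S = E_i[<a,x>^2] = eps * |a|_2^2 / n,
% noise power  N = Var(<a,y>)   =       |a|_2^2 / n,   so S/N = eps.
I\bigl(i;\ \langle a,x\rangle + \langle a,y\rangle\bigr)
  = I\bigl(\langle a,x\rangle;\ \langle a,x\rangle + \langle a,y\rangle\bigr)
  \le \tfrac{1}{2}\log\Bigl(1 + \tfrac{S}{N}\Bigr)
  = \tfrac{1}{2}\log(1 + \varepsilon)
  = O(\varepsilon).
```

Intuitively, each measurement then reveals only O(ε) bits about i, while identifying i requires about log n bits, which is where the tighter log factors mentioned earlier come from.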

  37. Summary of Results • General results: • Θ~(k/ε^{p/2}) • Sparse output: • Θ~(k/ε^p) • Deterministic: • Θ(k log(n/k) / ε) for p = 1
