
expander codes and pseudorandom subspaces of R^N



  1. expander codes and pseudorandom subspaces of R^N. James R. Lee, University of Washington. [joint with Venkatesan Guruswami (Washington) and Alexander Razborov (IAS/Steklov)]

  2. Classical high-dimensional geometry [Kashin 77, Figiel-Lindenstrauss-Milman 77]: random sections of the cross-polytope. For a random subspace X ⊆ R^N with dim(X) = N/2 (e.g., choose X = span{v_1, ..., v_{N/2}} where the v_i are i.i.d. on the unit sphere), almost surely every x ∈ X satisfies ||x||_1 ≥ c √N ||x||_2 for a universal constant c > 0. In other words, every x ∈ X has its L2 mass very "spread out": this holds not only for each v_i, but for every linear combination.
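
As a quick numerical sanity check of this spreading phenomenon, the following NumPy sketch samples a random half-dimensional subspace and measures ||x||_1 / (√N ||x||_2) for random x in it. Sampling only probes typical vectors, not the worst case that the theorem controls, so this illustrates rather than certifies the claim.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 512
# Orthonormal basis for the span of N/2 i.i.d. Gaussian vectors; the span
# has the same distribution as span{v_1, ..., v_{N/2}} on the sphere.
Q, _ = np.linalg.qr(rng.standard_normal((N, N // 2)))

# Compare ||x||_1 against its maximum possible value sqrt(N) * ||x||_2.
ratios = []
for _ in range(1000):
    x = Q @ rng.standard_normal(N // 2)
    ratios.append(np.linalg.norm(x, 1) / (np.sqrt(N) * np.linalg.norm(x)))
print(f"min ratio = {min(ratios):.3f}, max ratio = {max(ratios):.3f}")
# Both extremes land near sqrt(2/pi) ~ 0.80 for typical vectors: the L1
# norm stays within a constant factor of its maximum possible value.
```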

  3. Classical high-dimensional geometry [Kashin 77, Figiel-Lindenstrauss-Milman 77]: random sections of the cross-polytope. For a random subspace X ⊆ R^N with dim(X) = N/2 (e.g., choose X = span{v_1, ..., v_{N/2}} where the v_i are i.i.d. on the unit sphere), almost surely every x ∈ X satisfies ||x||_1 ≥ c √N ||x||_2.

  4. An existential crisis. This is a prominent example of the (now ubiquitous) use of the probabilistic method in asymptotic convex geometry. Geometric functional analysts face a dilemma we know well: almost every subspace satisfies this property, but we can't pinpoint even one. [Szarek, ICM'06; Milman, GAFA'01; Johnson-Schechtman, Handbook'01] asked: can we find an explicit subspace on which the L1 and L2 norms are equivalent? Related questions about explicit, high-dimensional constructions arose (concurrently) in CS:
- explicit embeddings of L2 into L1 for nearest-neighbor search (Indyk)
- explicit compressed sensing matrices M : R^N → R^n for n ≪ N (DeVore)
- explicit Johnson-Lindenstrauss (dimension reduction) transforms (Ailon-Chazelle)
Why do analysts / computer scientists care about explicit high-dimensional constructions?

  5. Distortion. For a subspace X ⊆ R^N, we define the distortion of X by Δ(X) = √N · max_{x ∈ X, x ≠ 0} ||x||_2 / ||x||_1. By Cauchy-Schwarz, we always have N^{1/2} ≥ Δ(X) ≥ 1. Random construction: a random X ⊆ R^N satisfies dim(X) = (1-ε)N and Δ(X) = O_ε(1) [Kashin 77], or dim(X) = Ω_ε(N) and Δ(X) ≤ 1+ε [Figiel-Lindenstrauss-Milman 77]. Example (Hadamard): let X = ker(first N/2 rows of a Hadamard matrix); then Δ(X) ≈ N^{1/4}.
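
The N^{1/4} behavior can be exhibited concretely. The sketch below uses the closely related "graph of the Hadamard transform" realization X = {(x, Hx/√n) : x ∈ R^n} ⊆ R^{2n} (one standard way to instantiate the Hadamard example; the slide's one-liner leaves the row selection unspecified, and this choice of realization is ours). A subgroup indicator is sparse on both sides of the transform and certifies a distortion lower bound of order N^{1/4}.

```python
import numpy as np
from scipy.linalg import hadamard

k = 8                        # n = 2^k coordinates on each side
n = 2 ** k
Hn = hadamard(n) / np.sqrt(n)

# Indicator of the subgroup {u : high k/2 bits of u are 0}; its Hadamard
# transform is the indicator of the dual subgroup, so z is sparse on both
# halves -- the worst case for the L1/L2 ratio.
x = np.zeros(n)
x[: 2 ** (k // 2)] = 1.0
z = np.concatenate([x, Hn @ x])      # a vector of X = {(x, Hx/sqrt(n))}

N = 2 * n
ratio = np.sqrt(N) * np.linalg.norm(z) / np.linalg.norm(z, 1)
print(f"certified: Delta(X) >= {ratio:.2f};  N**0.25 = {N ** 0.25:.2f}")
```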

  6. Distortion, dimension, applications. Viewed as an embedding of L2 into L1: O(1) distortion with Ω(N) dimension serves compressive sensing, coding in characteristic zero, and geometric functional analysis; 1+ε distortion with a small blowup in dimension (which Milman believes impossible) would serve nearest-neighbor search. Compressed sensing: want a map A : R^N → R^n with n ≪ N, such that any r-sparse signal x ∈ R^N (vector with at most r non-zero entries) can be uniquely and efficiently recovered from Ax. Relation to distortion [Kashin-Temlyakov]: one can uniquely and efficiently recover any r-sparse signal for r ≤ N/Δ(ker(A))^2. (This even tolerates additional "noise" in the "non-sparse" parts of the signal.)

  7. Sensing and distortion. Want a map A : R^N → R^n such that any r-sparse signal x ∈ R^N (vector with at most r non-zero entries) can be uniquely and efficiently recovered from Ax. Want to solve: (P0) given the compressed signal y, minimize ||x||_0 subject to Ax = y. This is a highly non-convex optimization problem, NP-hard for general A. Basis Pursuit: (P1) given the compressed signal y, minimize ||x||_1 subject to Ax = y. Can use linear programming! [Lots of work has been done here: Donoho et al.; Candes-Tao-Romberg; etc.] [KT07]: if y = Av and v has at most N/[2Δ(ker(A))]^2 non-zero coordinates, then (P0) and (P1) give the same answer. Let's prove this.
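
Basis Pursuit becomes an ordinary linear program after the standard split x = x_pos - x_neg. A minimal SciPy sketch (the helper name basis_pursuit is ours):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve (P1): minimize ||x||_1 subject to Ax = y.

    Standard LP reformulation: x = xp - xm with xp, xm >= 0, so that
    sum(xp) + sum(xm) equals ||x||_1 at the optimum.
    """
    n, N = A.shape
    c = np.ones(2 * N)            # objective: sum(xp) + sum(xm)
    A_eq = np.hstack([A, -A])     # A @ (xp - xm) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, method="highs")  # vars >= 0 by default
    assert res.success, res.message
    return res.x[:N] - res.x[N:]
```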

  8. Sensing and distortion. [KT07]: if y = Av and v has at most N/[2Δ(ker(A))]^2 non-zero coordinates, then (P0) and (P1) give the same answer. For x ∈ R^N and S ⊆ [N], let x_S be x restricted to the coordinates in S. If x ∈ ker(A) and |S| ≤ N/[2Δ(ker(A))]^2, then ||x_S||_1 ≤ |S|^{1/2} ||x_S||_2 ≤ |S|^{1/2} ||x||_2 ≤ |S|^{1/2} Δ(ker(A)) ||x||_1 / N^{1/2} ≤ ||x||_1 / 2: at most half the L1 mass of any kernel vector lives on S. So if v is supported on such a set S and x ∈ ker(A), then ||v + x||_1 ≥ ||v||_1 + (||x_{[N]\S}||_1 - ||x_S||_1) ≥ ||v||_1: no kernel perturbation can decrease the L1 norm, and (P1) returns v.
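
One can watch exact recovery happen numerically, reusing the basis_pursuit sketch above. The Gaussian sensing matrix and the sizes are illustrative choices: for these parameters basis pursuit succeeds with high probability (the Kashin-Temlyakov sparsity bound itself is pessimistic for random A).

```python
import numpy as np
# Reuses basis_pursuit() from the previous sketch.

rng = np.random.default_rng(2)
n, N, r = 60, 120, 5
A = rng.standard_normal((n, N)) / np.sqrt(n)    # illustrative sensing matrix
v = np.zeros(N)
v[rng.choice(N, size=r, replace=False)] = rng.standard_normal(r)  # r-sparse

x = basis_pursuit(A, A @ v)
print("exact recovery:", np.allclose(x, v, atol=1e-6))
```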

  9. Previous results: explicit. Sub-linear dimension: Rudin '60 (and later LLR '94) achieves dim(X) ≈ N^{1/2} and Δ(X) ≤ 3 (X = span of 4-wise independent vectors). Indyk '00 achieves dim(X) ≈ exp((log N)^{1/2}) and Δ(X) = 1+o(1). Indyk '07 achieves dim(X) ≈ N/2^{(log log N)^2} and Δ(X) = 1+o(1). Our result: we construct an explicit subspace X ⊆ R^N with dim(X) = (1-o(1))N and Δ(X) ≤ (log N)^{O(log log log N)}. In our constructions, X = ker(explicit sign matrix).

  10. Previous results: partial derandomization. Let A_{k,N} be a random k × N sign matrix (entries are ±1 i.i.d.). Kashin's technique shows that almost surely Δ(ker(A_{k,N})) = O(1) for k = Ω(N) (and dim(ker(A_{k,N})) ≥ N - k). The randomness can be reduced to O(N log^2 N) random bits [Indyk '00], to O(N log N) random bits [Artstein-Milman '06], and to O(N) random bits [Lovett-Sodin '07]. Our result: with N^{o(1)} random bits, we get Δ(X) ≤ polylog(N); with N^δ random bits, for any δ > 0, we get Δ(X) = O(1). [Guruswami-L-Wigderson]

  11. The expander code construction. Let G = ([N], [n], E) be a bipartite graph with n ≪ N, d-right-regular, and let L ⊆ R^d be a subspace. Define X(G,L) = { x ∈ R^N : x_{Γ(j)} ∈ L for every j ∈ [n] }, where x_S ∈ R^{|S|} is x restricted to the coordinates in S ⊆ [N] and Γ(j) is the neighborhood of the right vertex j. [Figure: left vertices x_1, x_2, x_3, ..., x_N; each right vertex j ∈ [n] has degree d.] This resembles the constructions of Gallager and Tanner (L is the "inner" code). Following Tanner and Sipser-Spielman, we will show that if L is "good" and G is an "expander", then X(G,L) is even better (in some parameters).
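
In matrix terms, X(G,L) is the kernel of a block constraint matrix that applies a parity-check matrix B of the inner space (so L = ker(B)) to the coordinates Γ(j), for each right vertex j. A dense, illustration-only sketch; the representation of G as a list of neighborhoods and the helper name are our choices:

```python
import numpy as np
from scipy.linalg import null_space

def tanner_subspace_basis(neighborhoods, B, N):
    """Orthonormal basis of X(G,L) = {x in R^N : x_(Gamma(j)) in L for all j}.

    neighborhoods -- for each right vertex j, the d-tuple Gamma(j)
    B             -- parity-check matrix of the inner space, L = ker(B)
    Stacks a copy of B acting on coordinates Gamma(j) for every j, giving
    the constraint matrix M with X(G,L) = ker(M).
    """
    r, d = B.shape
    M = np.zeros((len(neighborhoods) * r, N))
    for j, gamma in enumerate(neighborhoods):
        M[j * r : (j + 1) * r, list(gamma)] = B
    return null_space(M)

# Toy example: L = {x in R^3 : x_1 + x_2 + x_3 = 0} on a 3-right-regular graph.
X = tanner_subspace_basis([(0, 1, 2), (2, 3, 4), (4, 5, 0)], np.ones((1, 3)), 6)
print(X.shape)   # (6, 3): three independent constraints cut R^6 down to dim 3
```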

  12. Some quantitative matters. Say that a subspace L ⊆ R^d is (t, ε)-spread if every x ∈ L and every S ⊆ [d] with |S| ≤ t satisfy ||x_{[d]\S}||_2 ≥ ε ||x||_2. If L is (Ω(d), ε)-spread, then Δ(L) = O(1/ε); conversely, if Δ(L) = O(1), then L is (Ω(d), Ω(1))-spread. For a bipartite graph G = ([N], [n], E), the expansion profile of G is Λ_G(q) = min { |Γ(S)| : S ⊆ [N], |S| = q }. (This is expansion from left to right.)
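
For small dimensions the (t, ε)-spread property can be checked by brute force: with an orthonormal basis V of L, the mass surviving off a set S is at least the smallest singular value of V with the rows of S deleted. A demo-only sketch (the subset enumeration is exponential in d; the helper name is ours):

```python
import numpy as np
from itertools import combinations

def spread_epsilon(V, t):
    """Largest eps such that span(V) is (t, eps)-spread (V orthonormal cols).

    For x = V @ c with ||c|| = 1, the mass off S is at least the smallest
    singular value of V with the rows in S deleted; minimizing over all
    |S| = t gives eps (sets smaller than t are only easier).
    """
    d = V.shape[0]
    return min(
        np.linalg.svd(np.delete(V, S, axis=0), compute_uv=False).min()
        for S in combinations(range(d), t)
    )

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))
print(spread_epsilon(Q, 1))                  # random subspace: bounded below
print(spread_epsilon(np.eye(8)[:, :3], 1))   # contains e_1: not spread at all
```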

  13. Spread-boosting theorem. Setup: G = ([N], [n], E) a bipartite graph, d-right-regular and with left degree ≤ D; L ⊆ R^d a (t, ε)-spread subspace. Conclusion: if X(G,L) is (T, δ)-spread, then X(G,L) is (roughly) (T·t, Ω(δε))-spread, with the losses governed by D and the expansion profile of G. How to apply: assume D = O(1) and Λ_G(q) = Ω(q) for all q ∈ [N] (impossible to achieve). Then X(G,L) is (1/2, 1)-spread ⇒ (t, ε)-spread ⇒ (t^2, ε^2)-spread ⇒ … ⇒ (Ω(N), ε^{log_t(N)})-spread ⇒ Δ(X(G,L)) ≲ (1/ε)^{log_t(N)}.
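
The "how to apply" chain is simple bookkeeping, which these few lines replay under the idealized assumptions above: each round multiplies the spread size by t and the spread constant by ε. The concrete numbers are arbitrary.

```python
import math

# Idealized iteration: (T, delta) -> (T*t, delta*eps) per round, starting
# from the trivially (1/2, 1)-spread subspace, until Omega(N)-sized sets
# are covered.
N, t, eps = 10**9, 10**3, 0.5
T, delta, rounds = 1.0, 1.0, 0
while T < N:
    T, delta, rounds = T * t, delta * eps, rounds + 1
print(f"rounds = {rounds}, distortion bound ~ 1/delta = {1/delta:.0f}, "
      f"formula (1/eps)^log_t(N) = {(1/eps) ** math.log(N, t):.0f}")
```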

  14. Spread-boosting theorem: proof idea. Setup: G = ([N], [n], E) a bipartite graph, d-right-regular and with left degree ≤ D; L ⊆ R^d a (t, ε)-spread subspace. Conclusion: if X(G,L) is (T, δ)-spread, then its spread improves as on the previous slide. Why: a set S carrying much of the L2 mass should "leak" mass outside itself (since L is spreading and G is an expander), unless most of the mass in S is concentrated on a much smaller subset B ⊆ S, which is impossible by the inductive assumption. [Figure: a set S ⊆ [N] with a small core B ⊆ S.]

  15. When L is random. Let H be a (non-bipartite) d-regular graph with second eigenvalue λ = O(d^{1/2}) (explicit constructions exist by Margulis and Lubotzky-Phillips-Sarnak). Let G be the edge-vertex incidence graph of H: the left vertices of G are the edges of H, the right vertices are the nodes of H, and an edge is connected to its two endpoints. The Alon-Chung lemma bounds the number of edges of H falling inside any small set of nodes, which controls the expansion profile of G. A random subspace L ⊆ R^d is (Ω(d), Ω(1))-spread. Letting d = N^{1/4}, the spread-boosting theorem gives: if X(G,L) is (T, δ)-spread, then it is spread on polynomially larger sets with only a constant-factor loss in δ. It takes O(log log N) steps to reach Ω(N)-sized sets ⇒ poly(log N) distortion.
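
For concreteness, here is how the edge-vertex incidence graph is assembled. A networkx random regular graph stands in for an explicit Ramanujan graph (Margulis, LPS): it shares the degree structure, though not the certified eigenvalue bound.

```python
import networkx as nx

# Edge-vertex incidence graph G of a d-regular graph H: left vertices are
# the N = n*d/2 edges of H, right vertices are the n nodes of H, and each
# edge is joined to its two endpoints (right degree d, left degree D = 2).
d, n = 8, 1000
H = nx.random_regular_graph(d, n, seed=0)
edges = list(H.edges())

# Gamma(j) = indices of the edges incident to node j: exactly the
# neighborhoods consumed by the X(G, L) construction on slide 11.
neighborhoods = [[i for i, e in enumerate(edges) if j in e] for j in H.nodes()]
assert all(len(g) == d for g in neighborhoods)
print(f"N = {len(edges)} left vertices, n = {n} right vertices")
```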

  16. Explicit construction: ingredients for L. Spectral Lemma: let A be any k × d matrix whose columns a_1, …, a_d ∈ R^k are unit vectors such that |⟨a_i, a_j⟩| ≤ α for every i ≠ j. Then ker(A) is (Ω(1/α), Ω(1))-spread. Instantiating A with Kerdock codes (aka Mutually Unbiased Bases) [Kerdock '72, Cameron-Seidel '73] gives (Ω(d^{1/2}), Ω(1))-spread subspaces of dimension (1-ε)d for every ε > 0.
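
The quantity driving the Spectral Lemma is the coherence of the column system, max over i ≠ j of |⟨a_i, a_j⟩|. The sketch below measures it for random unit columns; explicit Kerdock/MUB systems achieve coherence 1/√k with on the order of k^2 vectors, beating the roughly √(log d / k) behavior of random columns, which is part of why explicitness helps here. Numbers are illustrative.

```python
import numpy as np

def coherence(A):
    """max over i != j of |<a_i, a_j>| for the unit columns a_i of A."""
    G = A.T @ A
    return np.abs(G - np.diag(np.diag(G))).max()

rng = np.random.default_rng(3)
k, d = 64, 1024
A = rng.standard_normal((k, d))
A /= np.linalg.norm(A, axis=0)          # normalize columns to unit length
print(f"random coherence = {coherence(A):.3f}, "
      f"Kerdock/MUB benchmark 1/sqrt(k) = {1 / np.sqrt(k):.3f}")
```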

  17. Boosting L with sum-product expanders. Kerdock + Spectral Lemma gives (Ω(d^{1/2}), Ω(1))-spread subspaces of dimension (1-ε)d for every ε > 0. Problem: if G = Ramanujan construction and L = Kerdock, the spread-boosting theorem gives nothing. (Ramanujan loses d^{1/2} and Kerdock gains only d^{1/2}.) Solution: produce L' = X(G,L) where L = Kerdock and G = a sum-product expander. Sum-product theorems [Bourgain-Katz-Tao, …]: for A ⊆ F_p with |A| ≤ p^{0.99}, we have max(|A+A|, |A·A|) ≥ |A|^{1+δ} for some absolute constant δ > 0.
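
A tiny experiment with the sum-product phenomenon itself (parameters arbitrary): for a random small A ⊆ F_p, both the sum set and the product set blow up.

```python
import random

# For a random A, both A+A and A*A are much larger than A.  The theorem
# only guarantees the max must grow: arithmetic progressions minimize
# |A+A|, geometric progressions minimize |A*A|, and no set does both.
p = 10007
random.seed(4)
A = random.sample(range(1, p), 50)
sums = {(a + b) % p for a in A for b in A}
prods = {(a * b) % p for a in A for b in A}
print(f"|A| = {len(A)}, |A+A| = {len(sums)}, |A*A| = {len(prods)}")
```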

  18. Boosting L with sum-product expanders. Kerdock + Spectral Lemma gives (Ω(d^{1/2}), Ω(1))-spread subspaces of dimension (1-ε)d for every ε > 0. Problem: if G = Ramanujan construction and L = Kerdock, the spread-boosting theorem gives nothing. (Ramanujan loses d^{1/2} and Kerdock gains only d^{1/2}.) Solution: produce L' = X(G,L) where L = Kerdock and G = a sum-product expander. Using [Barak-Impagliazzo-Wigderson / BKSSW] and the spread-boosting theorem, L' is (d^{1/2+c}, Ω(1))-spread for some c > 0.

  19. Boosting L with sum-product expanders. Solution: produce L' = X(G,L) where L = Kerdock and G = a sum-product expander. Using [Barak-Impagliazzo-Wigderson / BKSSW] and the spread-boosting theorem, L' is (d^{1/2+c}, Ω(1))-spread for some c > 0. Now we can plug L' into G = Ramanujan and get non-trivial boosting. (Almost done…)

  20. Some open questions.
• Improve the current bounds: a first target would be O(1) distortion with sub-linear randomness.
• Improve the dependence on the co-dimension (important for compressed sensing): if dim(X) ≥ (1-ε)N, we get distortion dependence (1/ε)^{O(log log N)}; one could hope for much better.
• Stronger pseudorandom properties: the Restricted Isometry Property. [T. Tao's blog] Find an explicit collection of unit vectors v_1, v_2, …, v_N ∈ R^n with N ≫ n so that every small enough sub-collection is "nearly orthogonal."
• Breaking the diameter bound: show that the kernel of a random {0,1} matrix with only 100 ones per row has small distortion, or prove that sparse matrices cannot work.

  21. Some open questions (continued).
• Refuting random subspaces with high distortion: give efficiently computable certificates for Δ(X) being small, or for the Restricted Isometry Property, that exist almost surely for random X ⊆ R^N.
• Linear-time expander decoding? Are there recovery schemes that run faster than Basis Pursuit?
