Sparse Approximations

Nick Harvey, University of British Columbia

Approximating Dense Objects by Sparse Objects
Floor joists: Wood Joists, Engineered Joists

Approximating Dense Objects by Sparse Objects
Bridges: Masonry Arch, Truss Arch

Approximating Dense Objects by Sparse Objects
Bones: Human Femur, Robin Bone

Mathematically
• Can an object with many pieces be approximately represented by fewer pieces?
• Independent random sampling usually does well
• Theme of this talk: When can we beat random sampling?
[Figure: Dense Matrix and Sparse Matrix; Dense Graph and Sparse Graph]

Talk Outline
• Vignette #1: Discrepancy theory
• Vignette #2: Singular values and eigenvalues
• Vignette #3: Graphs
• Theorem on “Spectrally Thin Trees”

Discrepancy
• Given vectors $v_1,\dots,v_n \in \mathbb{R}^d$ with $\|v_i\|_p$ bounded. Want $y \in \{-1,1\}^n$ with $\|\sum_i y_i v_i\|_q$ small.
• Eg 1: If $\|v_i\|_\infty \le 1$ then $E\|\sum_i y_i v_i\|_\infty \le$ [bound missing].
• Eg 2: If $\|v_i\|_\infty \le 1$ then $\exists y$ s.t. $\|\sum_i y_i v_i\|_\infty \le$ [bound missing].
Non-algorithmic:
  Spencer ‘85: Partial Coloring + Entropy Method
  Gluskin ‘89: Sidak’s Lemma
  Giannopoulos ‘97: Partial Coloring + Sidak
Algorithmic:
  Bansal ‘10: Brownian Motion + Semidefinite Program
  Bansal-Spencer ‘11: Brownian Motion + Potential function
  Lovett-Meka ‘12: Brownian Motion

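Discrepancy
• Given vectors $v_1,\dots,v_n \in \mathbb{R}^d$ with $\|v_i\|_p$ bounded. Want $y \in \{-1,1\}^n$ with $\|\sum_i y_i v_i\|_q$ small.
• Eg 1: If $\|v_i\|_\infty \le 1$ then $E\|\sum_i y_i v_i\|_\infty \le$ [bound missing].
• Eg 2: If $\|v_i\|_\infty \le 1$ then $\exists y$ s.t. $\|\sum_i y_i v_i\|_\infty \le$ [bound missing].
• Eg 3: If $\|v_i\|_\infty \le \beta$, $\|v_i\|_1 \le \delta$, and $\|\sum_i v_i\|_\infty \le 1$, then $\exists y$ with $\|\sum_i y_i v_i\|_\infty \le$ [bound missing].
  Harvey ’13: Using the Lovász Local Lemma.
  Question: Can the $\log(\delta/\beta^2)$ factor be improved?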
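
The gap between Eg 1 (a random signing) and Eg 2 (the best signing) can be seen on a tiny instance. The sketch below is only an illustration, not one of the proof techniques named on the slides: it draws a hypothetical 0/1 instance, evaluates one uniformly random signing, and brute-forces all $2^n$ signings. The helper names `discrepancy` and `best_signing` are made up for this example.

```python
import itertools
import numpy as np

def discrepancy(vectors, signs):
    """Infinity norm of sum_i signs[i] * vectors[i] (rows of `vectors` are the v_i)."""
    return float(np.max(np.abs(signs @ vectors)))

def best_signing(vectors):
    """Brute-force the signing with the smallest discrepancy (feasible for small n only)."""
    n = vectors.shape[0]
    best_value, best_signs = np.inf, None
    for candidate in itertools.product((-1, 1), repeat=n):
        signs = np.array(candidate)
        value = discrepancy(vectors, signs)
        if value < best_value:
            best_value, best_signs = value, signs
    return best_value, best_signs

rng = np.random.default_rng(0)
n, d = 12, 12
vectors = rng.choice([0.0, 1.0], size=(n, d))   # assumed test instance, entries bounded by 1
random_signs = rng.choice([-1, 1], size=n)      # one uniformly random signing
random_value = discrepancy(vectors, random_signs)
best_value, best_signs = best_signing(vectors)
print("random signing:", random_value, "best signing:", best_value)
```

Brute force takes $2^n$ evaluations, so it only serves to make the existence statement concrete; the algorithmic results listed above are what make good signings findable at scale.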

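Partitioning sums of rank-1 matrices
• Let $v_1,\dots,v_n \in \mathbb{R}^d$ satisfy $\sum_i v_i v_i^T = I$ and $\|v_i\|_2^2 \le \delta$. Want $y \in \{-1,1\}^n$ with $\|\sum_i y_i v_i v_i^T\|_2$ small.
• Random sampling: $E\|\sum_i y_i v_i v_i^T\|_2 \le$ [bound missing].
  Rudelson ’96: Proofs using majorizing measures, then nc-Khintchine
• Marcus-Spielman-Srivastava ’13: $\exists y \in \{-1,1\}^n$ with $\|\sum_i y_i v_i v_i^T\|_2 \le$ [bound missing].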
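
A small numerical sketch of the random-sampling baseline on this slide. It builds vectors in isotropic position, $\sum_i v_i v_i^T = I$, by taking the rows of a matrix with orthonormal columns, then measures the spectral norm of one random signed sum. The construction via QR and the names `isotropic_vectors` and `signed_sum_norm` are assumptions made for illustration.

```python
import numpy as np

def isotropic_vectors(n, d, rng):
    """Return an n x d array whose rows v_i satisfy sum_i v_i v_i^T = I (needs n >= d)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, d)))  # q has orthonormal columns
    return q

def signed_sum_norm(vectors, signs):
    """Spectral norm of sum_i signs[i] * v_i v_i^T."""
    signed = (vectors * signs[:, None]).T @ vectors
    return float(np.linalg.norm(signed, 2))

rng = np.random.default_rng(0)
n, d = 40, 5
vectors = isotropic_vectors(n, d, rng)
delta = float(np.max(np.sum(vectors ** 2, axis=1)))   # max_i ||v_i||_2^2
signs = rng.choice([-1.0, 1.0], size=n)
norm = signed_sum_norm(vectors, signs)
print("delta:", delta, "norm of random signed sum:", norm)
```

Because $-I \preceq \sum_i y_i v_i v_i^T \preceq I$ for every signing, the norm is always at most 1; the results on the slide concern how far below 1 it can be pushed as a function of $\delta$.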

Partitioning sums of matrices
• Given $d \times d$ symmetric matrices $M_1,\dots,M_n \in \mathbb{R}^{d \times d}$ with $\sum_i M_i = I$ and $\|M_i\|_2 \le \delta$. Want $y \in \{-1,1\}^n$ with $\|\sum_i y_i M_i\|_2$ small.
• Random sampling: $E\|\sum_i y_i M_i\|_2 \le$ [bound missing].
  Also follows from nc-Khintchine.
  Ahlswede-Winter ’02: Using matrix moment generating function.
  Tropp ‘12: Using matrix cumulant generating function.

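Partitioning sums of matrices
• Given $d \times d$ symmetric matrices $M_1,\dots,M_n \in \mathbb{R}^{d \times d}$ with $\sum_i M_i = I$ and $\|M_i\|_2 \le \delta$. Want $y \in \{-1,1\}^n$ with $\|\sum_i y_i M_i\|_2$ small.
• Random sampling: $E\|\sum_i y_i M_i\|_2 \le$ [bound missing].
• Question: $\exists y \in \{-1,1\}^n$ with $\|\sum_i y_i M_i\|_2 \le$ [bound missing]? False!
• Conjecture: Suppose $\sum_i M_i = I$ and $\|M_i\|_{\text{Sch-1}} \le \delta$. $\exists y \in \{-1,1\}^n$ with $\|\sum_i y_i M_i\|_2 \le$ [bound missing]?
• MSS ’13: Rank-one case is true
• Harvey ’13: Diagonal case is true (ignoring a $\log(\cdot)$ factor)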

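Partitioning sums of matrices
• Given $d \times d$ symmetric matrices $M_1,\dots,M_n \in \mathbb{R}^{d \times d}$ with $\sum_i M_i = I$ and $\|M_i\|_2 \le \delta$. Want $y \in \{-1,1\}^n$ with $\|\sum_i y_i M_i\|_2$ small.
• Random sampling: $E\|\sum_i y_i M_i\|_2 \le$ [bound missing].
• Question: Suppose only that $\|M_i\|_2 \le 1$. $\exists y \in \{-1,1\}^n$ with $\|\sum_i y_i M_i\|_2 \le$ [bound missing]?
• Spencer/Gluskin: Diagonal case is true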

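Column-subset selection
• Given vectors $v_1,\dots,v_n \in \mathbb{R}^d$ with $\|v_i\|_2 = 1$. Let $\text{st.rank} = n / \|\sum_i v_i v_i^T\|_2$. Let $k$ = [definition missing]. $\exists y \in \{0,1\}^n$ s.t. $\sum_i y_i = k$ and $(1-\epsilon)^2 \le \lambda_k(\sum_i y_i v_i v_i^T)$.
  Spielman-Srivastava ’09: Potential function argument
• Youssef ’12: Let $k$ = [definition missing]. $\exists y \in \{0,1\}^n$ s.t. $\sum_i y_i = k$, $(1-\epsilon)^2 \le \lambda_k(\sum_i y_i v_i v_i^T)$ and $\lambda_1(\sum_i y_i v_i v_i^T) \le (1+\epsilon)^2$.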
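
The stable rank used on this slide is easy to compute directly. The sketch below evaluates $n / \|\sum_i v_i v_i^T\|_2$ for two extreme cases, assuming the vectors are stored as unit-norm rows of an array; the function name `stable_rank` is chosen for this example.

```python
import numpy as np

def stable_rank(vectors):
    """n / ||sum_i v_i v_i^T||_2 for an n x d array whose rows are unit vectors."""
    n = vectors.shape[0]
    gram_sum = vectors.T @ vectors          # equals sum_i v_i v_i^T
    return n / float(np.linalg.norm(gram_sum, 2))

# An orthonormal basis: the sum is the identity, so the stable rank is d.
basis = np.eye(4)
# Five copies of the same unit vector: the sum has norm 5, so the stable rank is 1.
repeated = np.tile(np.array([1.0, 0.0, 0.0]), (5, 1))

print(stable_rank(basis), stable_rank(repeated))
```

The two cases bracket the possible values: the stable rank is large when the vectors are spread evenly over many directions and collapses to 1 when they all line up.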

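Column-subset selection up to the stable rank
• Given vectors $v_1,\dots,v_n \in \mathbb{R}^d$ with $\|v_i\|_2 = 1$. Let $\text{st.rank} = n / \|\sum_i v_i v_i^T\|_2$. Let $k$ = [definition missing]. For $y \in \{0,1\}^n$ s.t. $\sum_i y_i = k$, can we control $\lambda_k(\sum_i y_i v_i v_i^T)$ and $\lambda_1(\sum_i y_i v_i v_i^T)$?
• $\lambda_k$ can be very small, say $O(1/d)$.
• Rudelson’s theorem: can get $\lambda_1 \le O(\log d)$ and $\lambda_k > 0$.
• Harvey-Olver ’13: $\lambda_1 \le O(\log d / \log\log d)$ and $\lambda_k > 0$.
• MSS ‘13: If $\sum_i v_i v_i^T = I$, can get $\lambda_1 \le O(1)$ and $\lambda_k > 0$.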

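Graph Laplacian
[Figure: example graph on vertices a, b, c, d with edge weights $u$, shown next to its Laplacian matrix]
Laplacian matrix: $L_u = D - A$. The diagonal entry for node $c$ is the weighted degree of $c$; the off-diagonal entry for the pair $(a,c)$ is the negative of $u(ac)$.
Effective resistance from $s$ to $t$: the voltage difference when each edge $e$ is a $(1/u_e)$-ohm resistor and a 1-amp current source is placed between $s$ and $t$. It equals $(e_s - e_t)^T L_u^{\dagger} (e_s - e_t)$.
Effective conductance: $c_{st} = 1 / (\text{effective resistance from } s \text{ to } t)$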
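
A minimal sketch of these definitions, assuming vertices are numbered from 0 and edges are given as `(i, j, weight)` triples (a representation chosen for this example). It builds $L_u = D - A$ and evaluates $(e_s - e_t)^T L_u^{\dagger} (e_s - e_t)$ with the Moore-Penrose pseudoinverse, then checks two circuits whose resistances are known from the series and parallel rules.

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Build L = D - A from (i, j, weight) triples on vertices 0..n-1."""
    L = np.zeros((n, n))
    for i, j, weight in weighted_edges:
        L[i, i] += weight      # weighted degree on the diagonal
        L[j, j] += weight
        L[i, j] -= weight      # negative edge weight off the diagonal
        L[j, i] -= weight
    return L

def effective_resistance(L, s, t):
    """(e_s - e_t)^T L^+ (e_s - e_t), using the Moore-Penrose pseudoinverse."""
    indicator = np.zeros(L.shape[0])
    indicator[s], indicator[t] = 1.0, -1.0
    return float(indicator @ np.linalg.pinv(L) @ indicator)

# Path 0 - 1 - 2 with unit weights: two 1-ohm resistors in series.
path = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0)])
# Triangle with unit weights: 1 ohm in parallel with a 2-ohm path.
triangle = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)])

r_path = effective_resistance(path, 0, 2)
r_triangle = effective_resistance(triangle, 0, 1)
print(r_path, r_triangle, "conductance:", 1.0 / r_triangle)
```

For large graphs one would solve a Laplacian linear system instead of forming the dense pseudoinverse; the dense version is used here only because it mirrors the formula on the slide.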

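Spectral approximation of graphs
[Figure: a graph with edge weights $u$ and Laplacian $L_u$, next to a graph with edge weights $w$ and Laplacian $L_w$]
$\alpha$-spectral sparsifier: $L_u \preceq L_w \preceq \alpha \cdot L_u$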
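
One way to test the condition $L_u \preceq L_w \preceq \alpha \cdot L_u$ numerically is to look at the eigenvalues of $L_u^{\dagger/2} L_w L_u^{\dagger/2}$ on the range of $L_u$: the condition holds exactly when they all lie in $[1, \alpha]$. The sketch below assumes a connected graph $u$, and the example (a weight-2 four-cycle approximating the unweighted complete graph $K_4$) is chosen only for illustration.

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Build L = D - A from (i, j, weight) triples on vertices 0..n-1."""
    L = np.zeros((n, n))
    for i, j, weight in weighted_edges:
        L[i, i] += weight
        L[j, j] += weight
        L[i, j] -= weight
        L[j, i] -= weight
    return L

def relative_eigenvalues(L_u, L_w):
    """Eigenvalues of L_u^{+/2} L_w L_u^{+/2} restricted to the range of L_u."""
    values, vectors = np.linalg.eigh(L_u)
    keep = values > 1e-9                              # drop the null space (all-ones vector)
    inv_sqrt = vectors[:, keep] / np.sqrt(values[keep])
    return np.linalg.eigvalsh(inv_sqrt.T @ L_w @ inv_sqrt)

n = 4
complete = [(i, j, 1.0) for i in range(n) for j in range(i + 1, n)]   # K_4, 6 edges
cycle = [(i, (i + 1) % n, 2.0) for i in range(n)]                     # C_4, 4 edges of weight 2
L_u, L_w = laplacian(n, complete), laplacian(n, cycle)

ratios = relative_eigenvalues(L_u, L_w)
alpha = float(ratios.max())
print("relative eigenvalues:", ratios, "alpha:", alpha)
```

Here the relative eigenvalues are 1, 1 and 2, so the reweighted cycle is a 2-spectral sparsifier of $K_4$ with 4 edges instead of 6.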

Ramanujan Graphs
• Suppose $L_u$ is the complete graph on $n$ vertices ($u_e = 1\ \forall e$).
• Lubotzky-Phillips-Sarnak ’86: For infinitely many $d$ and $n$, $\exists w \in \{0,1\}^E$ such that $\sum_e w_e = dn/2$ (actually $L_w$ is $d$-regular) and [bound missing].
• MSS ‘13: Holds for all $d \ge 3$, and all $n = c \cdot 2^k$.
• Friedman ‘04: If $L_w$ is a random $d$-regular graph, then $\forall \epsilon > 0$ [bound missing] with high probability.

Spectrally-thin trees
• Question: Let $G$ be an unweighted graph with $n$ vertices. Let $C = \min_e$ (effective conductance of edge $e$). Want a subtree $T$ of $G$ with [bound missing].
• Equivalent to [formula missing].
• Goddyn’s Conjecture ‘85: There is a subtree $T$ with [bound missing].
• Relates to conjectures of Tutte (‘54) on nowhere-zero flows, and to approximations of the traveling salesman problem.

Spectrally-thin trees
• Question: Let $G$ be an unweighted graph with $n$ vertices. Let $C = \min_e$ (effective conductance of edge $e$). Want a subtree $T$ of $G$ with [bound missing].
• Rudelson’s theorem: Easily gives $\alpha = O(\log n)$.
• Harvey-Olver ‘13: $\alpha = O(\log n / \log\log n)$. Moreover, there is an efficient algorithm to find such a tree.
• MSS ’13: $\alpha = O(1)$, but not algorithmic.

Spectrally Thin Trees
Given an (unweighted) graph $G$ with eff. conductances $\ge C$. Can find an unweighted tree $T$ with [bound missing].
• Proof overview:
• Show independent sampling gives spectral thinness, but not a tree.
  ► Sample every edge $e$ independently with prob. $x_e = 1/c_e$
• Show dependent sampling gives a tree, and spectral thinness still works.

Matrix Concentration
Theorem [Tropp ‘12]: Let $Y_1,\dots,Y_m$ be independent PSD matrices of size $n \times n$. Let $Y = \sum_i Y_i$ and $Z = E[Y]$. Suppose $Y_i \preceq R \cdot Z$ a.s. Then [bound missing].

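Independent sampling
Define sampling probabilities $x_e = 1/c_e$. It is known that $\sum_e x_e = n - 1$. (Properties of conductances used.)
Claim: Independent sampling gives $T \subseteq E$ with $E[|T|] = n - 1$ and [bound missing].
Theorem [Tropp ‘12]: Let $M_1,\dots,M_m$ be $n \times n$ PSD matrices. Let $D(x)$ be a product distribution on $\{0,1\}^m$ with marginals $x$. Let $Z = \sum_i x_i M_i$. Suppose $M_i \preceq Z$. Then [bound missing].
Define $M_e = c_e \cdot L_e$, where $L_e$ is the Laplacian of the single edge $e$. Then $Z = L_G$ and $M_e \preceq Z$ holds.
Setting $\alpha = 6 \log n / \log\log n$, we get [bound missing] whp.
But $T$ is not a tree!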
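
The two properties of conductances used here can be checked numerically. The sketch below takes the unweighted complete graph $K_5$ as an assumed example, computes $x_e = 1/c_e$ (the effective resistance of edge $e$), verifies $\sum_e x_e = n - 1$ and $M_e = c_e L_e \preceq L_G$, and then draws one independent sample. The sample has $n - 1$ edges in expectation, but nothing forces it to be a tree.

```python
import itertools
import numpy as np

def edge_laplacian(n, i, j):
    """Laplacian L_e of the single unweighted edge {i, j}."""
    b = np.zeros(n)
    b[i], b[j] = 1.0, -1.0
    return np.outer(b, b)

n = 5
edges = list(itertools.combinations(range(n), 2))       # K_5 (assumed example graph)
edge_laps = [edge_laplacian(n, i, j) for i, j in edges]
L_G = sum(edge_laps, np.zeros((n, n)))
L_pinv = np.linalg.pinv(L_G)

# x_e = 1/c_e = b_e^T L_G^+ b_e, computed as trace(L_G^+ L_e).
x = np.array([float(np.trace(L_pinv @ L_e)) for L_e in edge_laps])

# Smallest eigenvalue of L_G - M_e with M_e = L_e / x_e; nonnegative means M_e <= L_G.
psd_margins = [float(np.linalg.eigvalsh(L_G - L_e / x_e).min())
               for L_e, x_e in zip(edge_laps, x)]

# Sample each edge independently with probability x_e.
rng = np.random.default_rng(1)
sampled = rng.random(len(edges)) < x
L_T = sum((L_e for L_e, keep in zip(edge_laps, sampled) if keep), np.zeros((n, n)))
print("sum of x_e:", x.sum(), "edges sampled:", int(sampled.sum()))
```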

Spectrally Thin Trees
Given an (unweighted) graph $G$ with eff. conductances $\ge C$. Can find an unweighted tree $T$ with [bound missing].
• Proof overview:
• Show independent sampling gives spectral thinness, but not a tree.
  ► Sample every edge $e$ independently with prob. $x_e = 1/c_e$
• Show dependent sampling gives a tree, and spectral thinness still works.
  ► Run pipage rounding to get tree $T$ with $\Pr[e \in T] = x_e = 1/c_e$

Pipage rounding
[Ageev-Sviridenko ‘04, Srinivasan ‘01, Calinescu et al. ‘07, Chekuri et al. ‘09]
Let $P$ be any matroid polytope, e.g., the convex hull of characteristic vectors of spanning trees.
• Given fractional $x$
• Find coordinates $a$ and $b$ s.t. the line $z \mapsto x + z(e_a - e_b)$ stays in the current face
• Find the two points where the line leaves $P$
• Randomly choose one of those points s.t. the expectation is $x$
• Repeat until $x = \chi_T$ is integral
$x$ is a martingale: the expectation of the final $\chi_T$ is the original fractional $x$.
[Figure: polytope with vertices $\chi_{T_1},\dots,\chi_{T_6}$ and a fractional point $x$]
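
The steps above can be sketched on the simplest matroid polytope. The code below is a simplification, not the spanning-tree version used in the talk: it assumes the uniform matroid, whose base polytope is $\{x \in [0,1]^m : \sum_i x_i = k\}$, so any two fractional coordinates can be swapped without leaving the face. The function name `pipage_round_uniform` is made up for this example.

```python
import numpy as np

def pipage_round_uniform(x, rng, tol=1e-9):
    """Round a point of {x in [0,1]^m : sum x = k} to a 0/1 vector with the same sum."""
    x = np.array(x, dtype=float)
    while True:
        fractional = np.flatnonzero((x > tol) & (x < 1 - tol))
        if len(fractional) < 2:
            break
        a, b = fractional[:2]
        up = min(1 - x[a], x[b])      # largest z >= 0 with x + z(e_a - e_b) still in P
        down = min(x[a], 1 - x[b])    # largest z >= 0 with x - z(e_a - e_b) still in P
        # Move by +up with probability down/(up+down), else by -down: the expected move is 0.
        if rng.random() < down / (up + down):
            x[a] += up
            x[b] -= up
        else:
            x[a] -= down
            x[b] += down
    return np.round(x).astype(int)

rng = np.random.default_rng(0)
x0 = np.array([0.2, 0.8, 0.5, 0.5])                      # fractional point with k = 2
rounded = pipage_round_uniform(x0, rng)
trials = np.array([pipage_round_uniform(x0, rng) for _ in range(4000)])
empirical_marginals = trials.mean(axis=0)
print(rounded, empirical_marginals)
```

Every output has exactly $k$ ones, and the empirical marginals track $x_0$, which is the martingale property in action. For spanning trees the choice of $a$ and $b$ additionally has to respect the tight constraints of the current face, which this sketch does not attempt.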

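Pipage rounding and concavity
Say $f : \mathbb{R}^m \to \mathbb{R}$ is concave under swaps if $z \mapsto f(x + z(e_a - e_b))$ is concave $\forall x \in P$, $\forall a, b \in [m]$.
Let $X_0$ be the initial point and $\chi_T$ be the final point visited by pipage rounding.
Claim: If $f$ is concave under swaps then $E[f(\chi_T)] \le f(X_0)$. [Jensen]
Let $\mathcal{E} \subseteq \{0,1\}^m$ be an event. Let $g : [0,1]^m \to \mathbb{R}$ be a pessimistic estimator for $\mathcal{E}$, i.e., [definition missing].
Claim: Suppose $g$ is concave under swaps. Then $\Pr[\chi_T \in \mathcal{E}] \le g(X_0)$.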
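
The Jensen step behind the first claim can be spelled out as follows. This is a sketch: $X_k$ denotes the point after $k$ rounding steps and $Z_k$ the random step size, both notation introduced here. For the second claim the sketch assumes the usual meaning of a pessimistic estimator, namely that $g(x)$ upper-bounds the probability of $\mathcal{E}$ under the product distribution with marginals $x$, so that at integral points $g(\chi_T) \ge \mathbf{1}[\chi_T \in \mathcal{E}]$.

```latex
% One step of pipage rounding: X_{k+1} = X_k + Z_k (e_a - e_b) with E[Z_k | X_k] = 0.
\begin{align*}
E\big[f(X_{k+1}) \mid X_k\big]
  &= E\big[f\big(X_k + Z_k(e_a - e_b)\big) \mid X_k\big] \\
  &\le f\big(X_k + E[Z_k \mid X_k]\,(e_a - e_b)\big)
     && \text{(Jensen, $f$ concave under swaps)} \\
  &= f(X_k).
\end{align*}
% Taking expectations and iterating over all steps:
\[
E[f(\chi_T)] \le E[f(X_{k})] \le \dots \le f(X_0).
\]
% Second claim, under the stated assumption on g:
\[
\Pr[\chi_T \in \mathcal{E}]
  = E\big[\mathbf{1}[\chi_T \in \mathcal{E}]\big]
  \le E[g(\chi_T)]
  \le g(X_0).
\]
```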

Chernoff Bound
• Chernoff Bound: Fix any $w, x \in [0,1]^m$ and let $\mu = w^T x$.
• Define $g_{t,\theta}$ = [definition missing]. Then, [bound missing].
• Claim: $g_{t,\theta}$ is concave under swaps. [Elementary calculus]
• Let $X_0$ be the initial point and $\chi_T$ be the final point visited by pipage rounding.
• Let $\mu = w^T X_0$. Then [bound missing].
• Bound achieved by independent sampling also achieved by pipage rounding
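
The definition of $g_{t,\theta}$ did not survive on this slide. The sketch below assumes the textbook moment-generating-function estimator $g_{t,\theta}(x) = e^{-t\theta} \prod_i \big(1 + x_i(e^{t w_i} - 1)\big)$ for $t \ge 0$, which may differ from the slide's exact form. Under that assumption it checks numerically that $g$ upper-bounds the tail $\Pr[w^T \xi \ge \theta]$ for independent Bernoulli bits, and that its second difference along every swap direction $e_a - e_b$ is nonpositive.

```python
import itertools
import numpy as np

def g(x, w, t, theta):
    """Assumed estimator: exp(-t*theta) * prod_i (1 + x_i * (exp(t*w_i) - 1))."""
    return float(np.exp(-t * theta) * np.prod(1.0 + x * (np.exp(t * w) - 1.0)))

def exact_tail(x, w, theta):
    """Pr[w^T xi >= theta] for independent Bernoulli(x_i) bits, by enumeration."""
    total = 0.0
    for bits in itertools.product((0, 1), repeat=len(x)):
        bits = np.array(bits)
        if w @ bits >= theta:
            total += np.prod(np.where(bits == 1, x, 1.0 - x))
    return float(total)

def second_difference(x, w, t, theta, a, b, h=1e-3):
    """g(x + h(e_a - e_b)) - 2 g(x) + g(x - h(e_a - e_b)); <= 0 for a concave slice."""
    step = np.zeros(len(x))
    step[a], step[b] = h, -h
    return g(x + step, w, t, theta) - 2 * g(x, w, t, theta) + g(x - step, w, t, theta)

x = np.array([0.3, 0.6, 0.5, 0.2])
w = np.array([1.0, 0.5, 0.8, 0.4])
t, theta = 1.0, 2.0

tail = exact_tail(x, w, theta)
estimate = g(x, w, t, theta)
swap_curvatures = [second_difference(x, w, t, theta, a, b)
                   for a in range(len(x)) for b in range(len(x)) if a != b]
print("tail:", tail, "estimator:", estimate, "max curvature:", max(swap_curvatures))
```

Along a swap direction only two factors of the product change, and their product is a quadratic in $z$ with leading coefficient $-(e^{t w_a} - 1)(e^{t w_b} - 1) \le 0$, which is the elementary calculus the slide refers to.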

Matrix Pessimistic Estimators
Theorem [Tropp ‘12]: Let $M_1,\dots,M_m$ be $n \times n$ PSD matrices. Let $D(x)$ be a product distribution on $\{0,1\}^m$ with marginals $x$. Let $Z = \sum_i x_i M_i$. Suppose $M_i \preceq Z$. Let $g_{t,\theta}$ = [definition missing] (the pessimistic estimator). Then [bound missing] and [bound missing].
• Main Theorem: $g_{t,\theta}$ is concave under swaps.
• Bound achieved by independent sampling also achieved by pipage rounding

Matrix Analysis
Matrix concentration inequalities are usually proven via sophisticated inequalities in matrix analysis.
• Rudelson: non-commutative Khintchine inequality
• Ahlswede-Winter: Golden-Thompson inequality: if $A$, $B$ symmetric, then $\operatorname{tr}(e^{A+B}) \le \operatorname{tr}(e^A e^B)$.
• Tropp: Lieb’s concavity inequality [1973]: if $A$, $B$ Hermitian and $C$ is PD, then $z \mapsto \operatorname{tr}\exp(A + \log(C + zB))$ is concave.
• Key technical result: new variant of Lieb’s theorem: if $A$ Hermitian, $B_1$, $B_2$ are PSD, and $C_1$, $C_2$ are PD, then $z \mapsto \operatorname{tr}\exp(A + \log(C_1 + zB_1) + \log(C_2 - zB_2))$ is concave.
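
The Golden-Thompson inequality is easy to sanity-check numerically. The sketch below draws random real symmetric matrices (an assumed test distribution), computes matrix exponentials through the eigendecomposition, and records the gap $\operatorname{tr}(e^A e^B) - \operatorname{tr}(e^{A+B})$, which should never be negative. It is a spot check, not a proof, and it does not attempt the Lieb-type concavity statements.

```python
import numpy as np

def sym_expm(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    values, vectors = np.linalg.eigh(S)
    return (vectors * np.exp(values)) @ vectors.T

def random_symmetric(n, rng):
    """Random real symmetric n x n matrix (assumed test distribution)."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / 2

rng = np.random.default_rng(0)
gaps = []
for _ in range(100):
    A, B = random_symmetric(4, rng), random_symmetric(4, rng)
    lhs = np.trace(sym_expm(A + B))
    rhs = np.trace(sym_expm(A) @ sym_expm(B))
    gaps.append(float(rhs - lhs))

# When A and B commute (here B = 2A) the inequality holds with equality.
A = random_symmetric(4, rng)
commuting_gap = float(np.trace(sym_expm(A) @ sym_expm(2 * A)) - np.trace(sym_expm(3 * A)))
print("smallest gap:", min(gaps), "commuting gap:", commuting_gap)
```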

Questions
• Can the Spencer/Gluskin theorem be extended to matrices?
• Can MSS ’13 be made algorithmic?
• Can MSS ’13 be extended to large-rank matrices?
• O(1)-spectrally thin trees exist. Can one be found algorithmically?
• Are O(1)-spectrally thin trees helpful for Goddyn’s conjecture?
