
Numerical Linear Algebra in the Streaming Model



Presentation Transcript


  1. Numerical Linear Algebra in the Streaming Model Ken Clarkson - IBM David Woodruff - IBM

  2. The Problems Given an n x d matrix A and an n x d' matrix B, we want estimators for • The matrix product A^T B • The matrix X* minimizing ||AX-B|| • A slightly generalized version of least-squares regression • Given integer k, the matrix A_k of rank k minimizing ||A-A_k|| • We consider the Frobenius matrix norm: the square root of the sum of squares of the entries
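
For reference, all three problems have standard exact (non-streaming) solutions. The NumPy sketch below, with made-up dimensions, computes them directly; it is only the baseline that the streaming algorithms approximate, not part of the talk's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d2, k = 1000, 20, 15, 5          # made-up sizes (d2 plays the role of d')
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d2))

# Exact matrix product A^T B.
AtB = A.T @ B

# Exact regression: X* = A^- B, with A^- the pseudo-inverse, minimizes ||AX - B||_F.
X_star = np.linalg.pinv(A) @ B

# Exact best rank-k approximation A_k, via the truncated SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

print(np.linalg.norm(A @ X_star - B), np.linalg.norm(A - A_k))
```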

  3. General Properties of Our Algorithms • 1 pass over matrix entries, given in any order (allow multiple updates) • Maintain compressed versions or “sketches” of matrices • Do small work per entry to maintain the sketches • Output result using the sketches • Randomized approximation algorithms

  4. Matrix Compression Methods • In a line of similar efforts… • Element-wise sampling [AM01], [AHK06] • Sketching / Random Projection: maintain a small number of random linear combinations of rows or columns [S06] • Row / column sampling [DK01], [DKM04], [DMM08] • Usually more than 1 pass • Here: sketching

  5. Outline • Matrix Product • Regression • Low-rank approximation

  6. Approximate Matrix Product • A and B have n rows; we want to estimate A^T B • Let S be an n x m sign (Rademacher) matrix • Each entry is +1 or -1 with probability ½ • m is small, to be specified • Entries are O(log 1/δ)-wise independent • Sketches are S^T A and S^T B • Our estimate of A^T B is [A^T S][S^T B]/m • Easy to maintain sketches given updates • O(m) time per update, O(mc log(nc)) bits of space for S^T A and S^T B • Output [A^T S][S^T B]/m using fast rectangular matrix multiplication
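
A minimal NumPy illustration of this estimator, under simplifying assumptions: the sign matrix S below has fully independent entries (the streaming algorithm only needs O(log 1/δ)-wise independence), the sketches are formed in one shot rather than entry by entry over a stream, and all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, d2, m = 2000, 10, 8, 400          # made-up sizes

A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d2))

# Sign (Rademacher) sketching matrix; fully independent entries here.
S = rng.choice([-1.0, 1.0], size=(n, m))

# The sketches S^T A and S^T B.  In a stream, an update (i, j, delta) to A
# adds delta * S[i, :] to column j of S^T A, i.e. O(m) work per update.
StA = S.T @ A          # m x d
StB = S.T @ B          # m x d'

estimate = (StA.T @ StB) / m            # [A^T S][S^T B] / m
exact = A.T @ B
rel_err = np.linalg.norm(estimate - exact) / (np.linalg.norm(A) * np.linalg.norm(B))
print(f"error relative to ||A|| ||B||: {rel_err:.4f}")
```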

  7. Expected Error and a Tail Estimate • Using linearity of expectation, E[A^T S S^T B / m] = A^T E[S S^T] B / m = A^T B • Moreover, for δ, ε > 0, there is an m = O(log(1/δ)) ε^-2 so that Pr[||A^T S S^T B / m - A^T B|| > ε ||A|| ||B||] ≤ δ (again ||C|| = [Σ_{i,j} C_{i,j}^2]^{1/2}) • This tail estimate seems to be new • Follows from bounding the O(log 1/δ)-th moment of ||A^T S S^T B / m - A^T B|| • Improves the space of Sarlós' 1-pass algorithm by a log c factor

  8. Matrix Product Lower Bound • Our algorithm is space-optimal for constant δ • a new lower bound • Reduction from a communication game • Augmented Indexing, players Alice and Bob • Alice has random x ∈ {0,1}^s • Bob has random i ∈ {1, 2, …, s} • also x_{i+1}, …, x_s • Alice sends Bob one message • Bob should output x_i with probability at least 2/3 • Theorem [MNSW]: Message must be Ω(s) bits on average

  9. Lower Bound Proof • Set s := Θ(c ε^-2 log(cn)) • Alice makes matrix U • Uses x_1…x_s • Bob makes matrix U' and B • Uses i and x_{i+1}, …, x_s • Alg input will be A := U + U' and B • A and B are n x c/2 • Alice: • Runs streaming matrix product Alg on U • Sends Alg state to Bob • Bob continues Alg with A := U + U' and B • A^T B determines x_i with probability at least 2/3 • By choice of U, U', B • Solving Augmented Indexing • So space of Alg must be Ω(s) = Ω(c ε^-2 log(cn)) bits

  10. Lower Bound Proof • U = U^(1); U^(2); …; U^(log(cn)); 0s • U^(k) is a Θ(ε^-2) x c/2 submatrix with entries in {-10^k, 10^k} • U^(k)_{i,j} = 10^k if the matched entry of x is 0, else U^(k)_{i,j} = -10^k • Bob's index i corresponds to U^(k*)_{i*,j*} • U' is such that A = U + U' = U^(1); U^(2); …; U^(k*); 0s • U' is determined from x_{i+1}, …, x_s • A^T B is the i*-th row of U^(k*) • ||A|| ≈ ||U^(k*)|| since the entries of A grow geometrically • ε^2 ||A||^2 · ||B||^2, the squared error, is small, so most entries of the approximation to A^T B have the correct sign

  11. Linear Regression • The problem: min_X ||AX-B|| • The X* minimizing this is X* = A^- B, where A^- is the pseudo-inverse of A • The algorithm: • Maintain S^T A and S^T B • Return X' solving min_X ||S^T(AX-B)|| • Main claim: if A has rank k, there is an m = O(k ε^-1 log(1/δ)) so that with probability at least 1-δ, ||AX'-B|| ≤ (1+ε)||AX*-B|| • That is, the relative error of X' is small • A new lower bound (omitted here) shows our space is optimal
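
The following is a minimal sketch of this regression algorithm under the same simplifying assumptions as before (fully independent sign matrix, sketches formed in memory, made-up sizes): keep only S^T A and S^T B, then solve the small sketched least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, d2, m = 5000, 15, 3, 600          # made-up sizes

A = rng.standard_normal((n, d))
B = A @ rng.standard_normal((d, d2)) + 0.1 * rng.standard_normal((n, d2))

S = rng.choice([-1.0, 1.0], size=(n, m))
StA, StB = S.T @ A, S.T @ B             # the only state the algorithm keeps

# X' solves the small sketched problem min_X ||S^T (A X - B)||.
X_sketch = np.linalg.lstsq(StA, StB, rcond=None)[0]
X_opt = np.linalg.lstsq(A, B, rcond=None)[0]      # exact X* for comparison

print(np.linalg.norm(A @ X_sketch - B), "vs optimal", np.linalg.norm(A @ X_opt - B))
```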

  12. Regression Analysis • Why should X' be good? • First reduce to showing that ||A(X*-X')|| is small • Use the normal equation A^T A X* = A^T B • Implies ||AX'-B||^2 = ||AX*-B||^2 + ||A(X'-X*)||^2, a Pythagorean identity: AX*-B is orthogonal to the column space of A

  13. Regression Analysis Continued • Bound ||A(X'-X*)||^2 using several tools: • The normal equation (S^T A)^T (S^T A) X' = (S^T A)^T (S^T B) • Our tail estimate for matrix product • Subspace JL: for m = O(k ε^-1 log(1/δ)), S^T approximately preserves the lengths of all vectors in a k-dimensional subspace • Overall, ||A(X'-X*)||^2 = O(ε) ||AX*-B||^2 • But ||AX'-B||^2 = ||AX*-B||^2 + ||A(X'-X*)||^2

  14. Best Low-Rank Approximation • For any matrix A and integer k, there is a matrix A_k of rank k that is closest to A among all matrices of rank k • Applications: LSI, PCA, recommendation systems, clustering • The sketch S^T A holds information about A • There is a rank-k matrix A_k' in the rowspace of S^T A so that ||A-A_k'|| ≤ (1+ε)||A-A_k||

  15. Low-Rank Approximation via Regression • Why is there such an A_k'? Apply the regression results with A → A_k, B → A • The X' minimizing ||S^T(A_k X - A)|| has ||A_k X'-A|| ≤ (1+ε)||A_k X*-A|| • But here X* = I, and X' = (S^T A_k)^- S^T A • So the matrix A_k X' = A_k (S^T A_k)^- S^T A: • Has rank k • Is in the rowspace of S^T A • Is within 1+ε of the smallest distance of any rank-k matrix • Problem: this seems to require 2 passes

  16. 1 Pass Algorithm • Suppose R is a d x m sign matrix. By the regression results transposed, the columnspace of AR contains a good rank-k approximation to A • The X' minimizing ||ARX-A|| has ||ARX'-A|| ≤ (1+ε)||A-A_k|| • Apply the regression results with A → AR and B → A, and X'' minimizing ||S^T(ARX-A)|| • So X'' = (S^T AR)^- S^T A has ||ARX''-A|| ≤ (1+ε)||ARX'-A|| ≤ (1+ε)^2 ||A-A_k|| • The algorithm maintains AR and S^T A, and computes ARX'' = AR(S^T AR)^- S^T A • Compute the best rank-k approximation to ARX'' in the columnspace of AR (ARX'' has rank k ε^-2)
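
A minimal offline sketch of this 1-pass algorithm, with made-up sizes and with sketch widths m_r and m_s standing in for the O(k ε^-1) and O(k ε^-2) bounds: maintain AR and S^T A, form AR(S^T AR)^- S^T A, and take its best rank-k approximation, which automatically lies in the columnspace of AR.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 1500, 60, 5
m_r, m_s = 40, 120                      # made-up sketch sizes for R and S

# A low-rank-plus-noise test matrix.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))

R = rng.choice([-1.0, 1.0], size=(d, m_r))
S = rng.choice([-1.0, 1.0], size=(n, m_s))

AR, StA = A @ R, S.T @ A                    # the two sketches kept in one pass
ARX = AR @ np.linalg.pinv(S.T @ AR) @ StA   # AR X'' = AR (S^T A R)^- S^T A

# Best rank-k approximation to AR X''; since AR X'' lies in the columnspace of
# AR, its rank-k truncation does too.  (A real implementation would work with
# the small factors instead of forming ARX densely.)
U, s, Vt = np.linalg.svd(ARX, full_matrices=False)
A_k_approx = (U[:, :k] * s[:k]) @ Vt[:k, :]

U2, s2, Vt2 = np.linalg.svd(A, full_matrices=False)
A_k_best = (U2[:, :k] * s2[:k]) @ Vt2[:k, :]
print(np.linalg.norm(A - A_k_approx), "vs best", np.linalg.norm(A - A_k_best))
```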

  17. Concluding Remarks • Space bounds are tight for product and regression • Get 1 pass and O(k ε^-2 (n + d ε^-2) log(nd)) space for low-rank approximation • First sublinear 1-pass streaming algorithm with relative error • Show our space is almost optimal via a new lower bound (omitted) • Total work is O(N k ε^-4), where N is the number of non-zero entries of A • In the next talk, [NDT] give a faster algorithm for dense matrices • Improve the dependence on ε • Look at tight bounds for multiple passes

  18. A Lower Bound [Diagram: Alice's binary string x defines the matrix A; Bob holds index i and x_{i+1}, x_{i+2}, …. A has k rows of blocks with k ε^-1 columns per block, followed by n-k rows of 0s; block entries are ±10, ±100, ±1000, ±10000, growing geometrically from block to block.] • Error is now dominated by the block of interest • Bob also inserts a k x k identity submatrix into the block of interest

  19. Lower Bound Details [Diagram: the block of interest has k ε^-1 columns; its k rows contain 0s, then P*I_k, then 0s, with n-k zero rows below. Bob inserts the k x k identity submatrix scaled by a large value P.] • Show any rank-k approximation must err on all of the shaded region • So a good rank-k approximation likely has the correct sign on Bob's entry
