Fast Regression Algorithms Using Spectral Graph Theory


  1. Fast Regression Algorithms Using Spectral Graph Theory Richard Peng

  2. Outline • Regression: why and how • Spectra: fast solvers • Graphs: tree embeddings

  3. Learning / Inference • Find (hidden) pattern in (noisy) data • (figure: input signal s and recovered output)

  4. Regression • Minimize |x|_p subject to constraints on x • p ≥ 1: convex • Convex constraints, e.g. linear equalities

  5. Application 0: LASSO • [Tibshirani `96]: min |x|_1 s.t. Ax = s • Widely used in practice: • Structured output • Robust to noise
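
A minimal sketch of this LASSO-style problem, not the talk's algorithm: min |x|_1 s.t. Ax = s can be recast as a linear program via x = u - v with u, v ≥ 0 and solved with scipy. The matrix A and signal s below are illustrative placeholders.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
A = rng.standard_normal((12, 30))      # wide system: more unknowns than equations
x_true = np.zeros(30)
x_true[[4, 17]] = [1.5, -2.0]          # sparse hidden signal (placeholder)
s = A @ x_true

n = A.shape[1]
c = np.ones(2 * n)                     # objective: sum(u) + sum(v) = |x|_1
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=s)   # default bounds give u, v >= 0
x = res.x[:n] - res.x[n:]
print(np.round(x, 3))                  # typically recovers the sparse x_true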

  6. Application 1: Images • Min Σ_{i~j∈E} (x_i - x_j - s_ij)^2 • Poisson image processing • No bears were harmed in the making of these slides
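
A toy sketch of this gradient-fitting objective: given target differences s_ij on graph edges, minimize the sum over edges of (x_i - x_j - s_ij)^2; the normal equations form a Laplacian system. A tiny path graph stands in for an image grid, and scipy's lsqr stands in for a fast solver.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import lsqr

n = 6
edges = [(i, i + 1) for i in range(n - 1)]       # path graph stand-in for a grid
s = np.array([1.0, 1.0, -2.0, 0.5, 0.5])         # target difference per edge

# Signed edge-vertex incidence matrix B: row for edge (i, j) has +1 at i, -1 at j.
rows, cols, vals = [], [], []
for e, (i, j) in enumerate(edges):
    rows += [e, e]; cols += [i, j]; vals += [1.0, -1.0]
B = csr_matrix((vals, (rows, cols)), shape=(len(edges), n))

x = lsqr(B, s)[0]                  # least squares: solves (B^T B) x = B^T s
print(np.round(x - x[0], 3))       # solution unique only up to an additive shift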

  7. Application 2: Min cut • Min Σ_{ij∈E} |x_i - x_j| • s.t. x_s = 0, x_t = 1 • (figure: vertices on the s side labeled 0, on the t side labeled 1) • Fractional solution = integral solution • Remove fewest edges to separate vertices s and t

  8. Regression Algorithms • Convex optimization: • 1940~1960: simplex, tractable • 1960~1980: ellipsoid, poly time • 1980~2000: interior point, efficient • Õ(m^(1/2)) interior steps • m = # non-zeros • Õ hides log factors

  9. Efficiency Matters • m > 10^6 for most images • Even bigger (10^9): • Videos • 3D medical data

  10. Key Subroutine • Each step of interior point algorithms finds a step direction by solving a linear system Ax = b • Õ(m^(1/2)) such solves in total

  11. More Reasons for Fast Solvers • [Boyd-Vandenberghe `04], Figure 11.20: The growth in the average number of Newton iterations (on randomly generated SDPs)… is very small

  12. Linear System Solvers • [1st century CE] Gaussian Elimination: O(m^3) • [Strassen `69] O(m^2.8) • [Coppersmith-Winograd `90] O(m^2.3755) • [Stothers `10] O(m^2.3737) • [Vassilevska Williams `11] O(m^2.3727) • Total: > m^2

  13. Not fast → not used • Preferred in practice: coordinate descent, subgradient methods • Solution quality traded for time

  14. Fast Graph-Based L2 Regression [Spielman-Teng `04] (more in 12 slides) • Input: linear system Ax = b where A is related to a graph, vector b • Output: solution to Ax = b • Runtime: nearly linear, Õ(m)

  15. Graphs Using Algebra • Fast convergence + low cost per step = state-of-the-art algorithms

  16. Laplacian Paradigm • [Daitch-Spielman `08]: min cost flow • [Christiano-Kelner-Mądry-Spielman-Teng `11]: approx maximum flow / min cut

  17. Extension 1 [Chin-Mądry-Miller-P `12]: regression, image processing, grouped L2

  18. Extension 2 • [Kelner-Miller-P `12]: k-commodity flow • Dual: k-variate labeling of graphs

  19. Extension 3 [Miller-P `13]: faster for structured images / separable graphs

  20. Need: Fast Linear System Solvers • Implications of fast solvers: • Fast regression routines • Parallel, work-efficient graph algorithms

  21. Other Applications • [Tutte `66]: planar embedding • [Boman-Hendrickson-Vavasis `04]: PDEs • [Orecchia-Sachdeva-Vishnoi `12]: balanced cut / graph separator

  22. Outline • Regression: why and how • Spectra: Linear system solvers • Graphs: tree embeddings

  23. Problem • Given: n-by-n matrix A with m non-zeros, vector b • Solve: Ax = b

  24. Special Structure of A • A = Deg – Adj • Deg: diag(degree) • Adj: weighted adjacency matrix • A_ij = deg(i) if i = j, -w(ij) otherwise • [Gremban-Miller `96]: extensions to SDD matrices
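
A small sketch of this structure: build A = Deg - Adj for a weighted graph with scipy sparse matrices. The edge list is an arbitrary example.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags

n = 4
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 3.0), (0, 3, 1.0)]  # (u, v, weight)

rows, cols, vals = [], [], []
for u, v, w in edges:
    rows += [u, v]; cols += [v, u]; vals += [w, w]
Adj = csr_matrix((vals, (rows, cols)), shape=(n, n))          # symmetric adjacency
Deg = diags(np.asarray(Adj.sum(axis=1)).ravel())              # diag(weighted degree)
A = Deg - Adj                                                 # graph Laplacian

print(A.toarray())   # each row sums to zero; off-diagonal entries are -w(ij)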

  25. Unstructured Graphs • Social network • Intermediate systems of other algorithms are almost adversarial

  26. Nearly Linear Time Solvers [Spielman-Teng `04] • Input: n-by-n graph Laplacian A with m non-zeros, vector b • Where: b = Ax for some x • Output: approximate solution x' s.t. |x - x'|_A < ε|x|_A • Runtime: nearly linear, O(m log^c n log(1/ε)) expected • Runtime is cost per bit of accuracy • Error in the A-norm: |y|_A = √(yᵀAy)

  27. How Many Logs • Runtime: O(m log^c n log(1/ε)) • Value of c: I don't know • [Spielman]: c ≤ 70 • [Miller]: c ≤ 32 • [Koutis]: c ≤ 15 • [Teng]: c ≤ 12 • [Orecchia]: c ≤ 6 • When n = 10^6, log^6 n > 10^6

  28. Practical Nearly Linear Time Solvers [Koutis-Miller-P `10] • Input: n-by-n graph Laplacian A with m non-zeros, vector b • Where: b = Ax for some x • Output: approximate solution x' s.t. |x - x'|_A < ε|x|_A • Runtime: O(m log^2 n log(1/ε))

  29. Practical Nearly Linear Time Solvers [Koutis-Miller-P `11] • Input: n-by-n graph Laplacian A with m non-zeros, vector b • Where: b = Ax for some x • Output: approximate solution x' s.t. |x - x'|_A < ε|x|_A • Runtime: O(m log n log(1/ε))
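
A sketch of the solver interface specified in the last few slides, with scipy's conjugate gradient standing in for the cited nearly-linear-time solvers: build a small cycle-graph Laplacian, pick b = Ax so the system is consistent, and measure error in the A-norm |y|_A = √(yᵀAy).

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import cg

n = 50
edges = [(i, (i + 1) % n) for i in range(n)]       # unit-weight cycle graph
rows = [e[0] for e in edges] + [e[1] for e in edges]
cols = [e[1] for e in edges] + [e[0] for e in edges]
Adj = csr_matrix((np.ones(2 * n), (rows, cols)), shape=(n, n))
A = diags(np.asarray(Adj.sum(axis=1)).ravel()) - Adj

x_true = np.random.default_rng(1).standard_normal(n)
x_true -= x_true.mean()        # Laplacians are singular; work orthogonal to all-ones
b = A @ x_true                 # ensures b = Ax for some x, as the spec requires

x, info = cg(A, b)             # CG stand-in; iterates stay orthogonal to all-ones
a_norm = lambda y: np.sqrt(y @ (A @ y))
print(a_norm(x - x_true) / a_norm(x_true))   # relative A-norm error ε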

  30. Stages of The Solver • Iterative Methods • Spectral Sparsifiers • Low Stretch Spanning Trees

  31. Iterative Methods Numerical analysis: Can solve systems in A by iteratively solving spectrally similar, but easier, B
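
A sketch of this idea: solve Ax = b by iterating with an easier, spectrally similar B. Here A is a diagonally dominant SPD matrix and B is its diagonal (a Jacobi-style stand-in), purely to illustrate the template; in the talk, B is itself a graph.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
E = rng.standard_normal((n, n))
A = 10.0 * np.eye(n) + 0.1 * (E + E.T) / 2   # SPD, strongly diagonally dominant
B = np.diag(np.diag(A))                      # "easier" B: trivial to solve
b = rng.standard_normal(n)

x = np.zeros(n)
for _ in range(50):
    r = b - A @ x                   # current residual
    x = x + np.linalg.solve(B, r)   # one solve in B per iteration
print(np.linalg.norm(A @ x - b))    # residual shrinks because B ≈ A spectrally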

  32. What is Spectrally Similar? • A ≺ B ≺ kA for some small k • Ideas from scalars hold! • A ≺ B: for any vector x, |x|_A^2 < |x|_B^2 • [Vaidya `91]: Since A comes from a graph G, B should come from a graph H!

  33. `Easier' H • Ways of being easier: • Fewer vertices • Fewer edges • Can reduce vertex count if edge count is small • Goal: H with fewer edges that's similar to G

  34. Graph Sparsifiers Sparse equivalents of graphs that preserve something • Spanners: distance, diameter. • Cut sparsifier: all cuts. • What we need: spectrum

  35. What We Need: Ultrasparsifiers • [Spielman-Teng `04]: ultrasparsifiers with n - 1 + O(m log^p n / k) edges imply solvers with O(m log^p n) running time • Given: G with n vertices, m edges, parameter k • Output: H with n vertices, n - 1 + O(m log^p n / k) edges • Goal: G ≺ H ≺ kG

  36. Example: Complete Graph • O(n log n) random edges (with scaling) suffice w.h.p.

  37. General Graph Sampling Mechanism • For edge e, flip coin with Pr(keep) = P(e) • Rescale to maintain expectation • Number of edges kept: ∑_e P(e) • Also need to prove concentration
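
A sketch of this mechanism: keep edge e with probability P(e) and rescale its weight by 1/P(e), so every edge's weight is preserved in expectation. The keep-probabilities here are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 3.0), (0, 3, 1.0), (1, 3, 2.0)]
P = [0.9, 0.5, 0.9, 0.5, 0.5]          # placeholder P(e) for each edge

kept = []
for (u, v, w), p in zip(edges, P):
    if rng.random() < p:               # flip coin: Pr(keep) = P(e)
        kept.append((u, v, w / p))     # rescale to maintain expectation

print(kept)                            # E[# edges kept] = sum_e P(e)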

  38. Effective Resistance • View the graph as a circuit • R(u,v) = pass 1 unit of current from u to v, measure resistance of circuit

  39. EE101 • Effective resistance in general: solve Gx = e_uv, where e_uv is the indicator vector; R(u,v) = x_u – x_v
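
A sketch of this computation: solve Gx = e_uv (here via the Laplacian pseudoinverse, since G is singular) and read off R(u,v) = x_u - x_v. The example graph, a unit-weight 4-cycle, is arbitrary.

```python
import numpy as np

n = 4
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 1.0)]  # unit 4-cycle
G = np.zeros((n, n))
for u, v, w in edges:
    G[u, u] += w; G[v, v] += w
    G[u, v] -= w; G[v, u] -= w

def effective_resistance(u, v):
    e_uv = np.zeros(n)
    e_uv[u], e_uv[v] = 1.0, -1.0          # indicator vector e_uv
    x = np.linalg.pinv(G) @ e_uv          # solve Gx = e_uv
    return x[u] - x[v]

print(effective_resistance(0, 1))   # resistance-1 edge parallel to a 3-edge
                                    # path: 1 * 3 / (1 + 3) = 0.75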

  40. (Remedial?) EE101 • Single edge: R(e) = 1/w(e) • Two edges in series: R(u, v) = 1/w_1 + 1/w_2 • In general, series: R(u, v) = R(e_1) + … + R(e_l)

  41. Spectral Sparsification by Effective Resistance • [Spielman-Srivastava `08]: Setting P(e) to W(e)R(e)·O(log n) gives G ≺ H ≺ 2G* • [Foster `49]: ∑_e W(e)R(e) = n - 1 • Spectral sparsifier with O(n log n) edges • Ultrasparsifier? Solver??? • *Ignoring probabilistic issues
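
A sketch of this sampling rule: P(e) proportional to W(e)R(e) with an O(log n) oversampling factor, kept edges rescaled by 1/P(e). The constant C below is an arbitrary placeholder for the O(log n) factor.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
edges = [(u, v, 1.0) for u in range(n) for v in range(u + 1, n)]  # complete graph
G = np.zeros((n, n))
for u, v, w in edges:
    G[u, u] += w; G[v, v] += w; G[u, v] -= w; G[v, u] -= w
Gp = np.linalg.pinv(G)

C = 2 * np.log(n)                 # placeholder for the O(log n) factor
H, total = [], 0.0
for u, v, w in edges:
    R = Gp[u, u] + Gp[v, v] - 2 * Gp[u, v]   # effective resistance of edge e
    total += w * R
    p = min(1.0, C * w * R)       # P(e) = W(e) R(e) O(log n), capped at 1
    if rng.random() < p:
        H.append((u, v, w / p))   # rescaled kept edge

print(round(total, 6))            # Foster: sum of W(e)R(e) = n - 1
print(len(edges), "->", len(H), "edges kept")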

  42. The Chicken and Egg Problem How to find effective resistance? • [Spielman-Srivastava `08]: use solver • [Spielman-Teng `04]: need sparsifier

  43. Our Workaround • Use upper bounds on effective resistance, R'(u,v) • Modify the problem

  44. Rayleigh's Monotonicity Law • R(u,v) can only increase when edges are removed • So: calculate effective resistance w.r.t. a spanning tree T; it upper-bounds R(u,v)

  45. Sampling Probabilities According to Tree • Sampling probability: edge weight times effective resistance of its tree path, aka the stretch of the edge • Goal: small total stretch
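
A sketch of this stretch computation: for an off-tree edge e = (u, v), stretch(e) = w(e) · R_T(u, v), where R_T is the series resistance of the unique u-v path in the tree T. A path tree 0-1-2-3 keeps tree paths trivial; all numbers are illustrative.

```python
tree_w = [1.0, 2.0, 1.0]                 # weight of tree edge (i, i+1)
off_tree = [(0, 3, 1.0), (1, 3, 2.0)]    # off-tree edges (u, v, w)

def stretch(u, v, w):
    lo, hi = min(u, v), max(u, v)
    r_path = sum(1.0 / tree_w[i] for i in range(lo, hi))  # series resistance
    return w * r_path

for u, v, w in off_tree:
    print((u, v), "stretch =", stretch(u, v, w))
# (0, 3): 1.0 * (1 + 1/2 + 1) = 2.5;  (1, 3): 2.0 * (1/2 + 1) = 3.0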

  46. Good Trees Exist (more in 12 slides, again!) • Every graph has a spanning tree with total stretch O(m log n) • Hiding log log n factors • ∑_e W(e)R'(e) = O(m log n) • Sampling then keeps O(m log^2 n) edges (stretch times the O(log n) oversampling), too many!

  47. ‘Good’ Tree??? • Unit weight case: stretch ≥ 1 for all edges • Example in figure: an off-tree edge whose tree path has two unit edges has stretch 1 + 1 = 2

  48. What Are We Missing? • Haven't used k! • Need: G ≺ H ≺ kG with n - 1 + O(m log^p n / k) edges • Generated: G ≺ H ≺ 2G with n - 1 + O(m log^2 n) edges

  49. Use k, Somehow • Increase weights of tree edges by a factor of k, giving G' • G ≺ G' ≺ kG • The tree is still good!

  50. Result • Tree is heavier by a factor of k • Tree effective resistances decrease by a factor of k • Stretch in the example becomes 1/k + 1/k = 2/k
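
A sketch of this effect: multiplying every tree-edge weight by k divides each tree-path resistance, and hence each off-tree stretch, by k. This reuses the illustrative path tree from the earlier stretch sketch.

```python
k = 10.0
tree_w = [1.0, 1.0, 1.0]                   # unit-weight path tree 0-1-2-3
heavy_w = [k * w for w in tree_w]          # tree heavier by a factor of k

def path_resistance(ws):                   # series resistance of the tree path
    return sum(1.0 / w for w in ws)

w_e = 1.0                                  # unit off-tree edge spanning the path
print(w_e * path_resistance(tree_w))       # stretch before: 3.0
print(w_e * path_resistance(heavy_w))      # stretch after: 3.0 / k = 0.3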
