
Multiplicative updates for L1-regularized regression


Presentation Transcript


  1. Multiplicative updates for L1-regularized regression Prof. Lawrence Saul Dept of Computer Science & Engineering UC San Diego (Joint work with Fei Sha & Albert Park)

  2. Trends in data analysis • Larger data sets • In 1990s: thousands of examples • In 2000+: millions or billions • Increased dimensionality • High resolution, multispectral images • Large vocabulary text processing • Gene expression data

  3. How do we scale? • Faster computers: • Moore’s law is not enough. • Data acquisition is too fast. • Massive parallelism: • Effective, but expensive. • Not always easy to program. • Brain over brawn: • New, better algorithms. • Intelligent data analysis.

  4. Searching for sparse models • Less is more: Number of nonzero parameters should not scale with size or dimensionality. • Models with sparse solutions: • Support vector machines • Nonnegative matrix factorization • L1-norm regularized regression

  5. An unexpected connection • Different problems • large margin classification • high dimensional data analysis • linear and logistic regression • Similar learning algorithms • Multiplicative vs additive updates • Guarantees of monotonic convergence

  6. This talk I. Multiplicative updates • Unusual form • Attractive properties II. Sparse regression • L1 norm regularization • Relation to quadratic programming III. Experimental results • Sparse solutions • Convex duality • Large-scale problems

  7. Part I. Multiplicative updates Be fruitful and multiply.

  8. Nonnegative quadratic programming (NQP) • Optimization: minimize a quadratic objective over nonnegative variables (see the sketch below). • Solutions • Cannot be found analytically. • Tend to be sparse.
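
As a sketch of the problem on this slide (the original equation is not reproduced in this transcript, so the notation below is assumed), NQP minimizes a quadratic objective over the nonnegative orthant:

```latex
\min_{v \in \mathbb{R}^d} \; F(v) \;=\; \tfrac{1}{2}\, v^{\top} A v \;+\; b^{\top} v
\qquad \text{subject to } v_i \ge 0 \ \text{for all } i .
```

The gradient of this objective is Av + b, consistent with the fixed-point condition quoted on slide 11.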

  9. Matrix decomposition • Quadratic form • Nonnegative components: split the matrix of the quadratic form into a difference of two matrices with only nonnegative entries, A = A+ - A-.
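
One standard way to realize this split (the slide's figure is not reproduced here, so take this as an assumed elementwise construction) is

```latex
A \;=\; A^{+} - A^{-}, \qquad
A^{+}_{ij} = \max(A_{ij},\, 0), \qquad
A^{-}_{ij} = \max(-A_{ij},\, 0),
```

so that both A+ and A- contain only nonnegative entries.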

  10. Multiplicative update • Matrix-vector products: form (A+ v) and (A- v); by construction, these vectors are nonnegative. • Iterative update (a sketch follows below) • multiplicative • elementwise • no learning rate • enforces nonnegativity
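
A minimal NumPy sketch of an update with these properties, assuming the elementwise split A = A+ - A- from the previous slide; the function name, iteration count, and the small constant guarding the denominator are mine, not the slide's:

```python
import numpy as np

def nqp_multiplicative_update(A, b, v0, n_iter=100):
    """Sketch: minimize 0.5 * v'Av + b'v subject to v >= 0 elementwise."""
    A_plus, A_minus = np.maximum(A, 0.0), np.maximum(-A, 0.0)  # A = A+ - A-
    v = np.asarray(v0, dtype=float).copy()
    for _ in range(n_iter):
        a = A_plus @ v        # (A+ v)_i, nonnegative by construction
        c = A_minus @ v       # (A- v)_i, nonnegative by construction
        # Elementwise multiplicative factor: no learning rate, and a
        # nonnegative v stays nonnegative after the update.
        v *= (-b + np.sqrt(b * b + 4.0 * a * c)) / (2.0 * a + 1e-12)
    return v
```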

  11. Fixed points • v_i = 0: when the multiplicative factor is less than unity, the element decays quickly to zero. • v_i > 0: when the multiplicative factor equals unity, the partial derivative vanishes: (Av + b)_i = 0.

  12. Attractive properties for NQP • Theoretical guarantees Objective decreases at each iteration. Updates converge to global minimum. • Practical advantages • No learning rate. • No constraint checking. • Easy to implement (and vectorize).

  13. Part II. Sparse regression Feature selection via L1 norm regularization…

  14. Linear regression • Training examples • vector inputs • scalar outputs • Model fitting • tractable: least squares • ill-posed: if the dimensionality exceeds the number of examples n
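
For concreteness (notation assumed, not taken from the slide), with inputs x_i and outputs y_i the least-squares fit is

```latex
\min_{w} \; \tfrac{1}{2} \sum_{i=1}^{n} \bigl( y_i - w^{\top} x_i \bigr)^{2},
```

which has a closed-form solution, but becomes ill-posed once the input dimensionality exceeds the number of examples n.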

  15. Regularization • L2 norm • L1 norm What is the difference?
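
A sketch of the two penalized objectives being contrasted (the regularization weight γ and the factor of one half are my conventions, not the slide's):

```latex
\text{L2:}\quad \min_{w}\; \tfrac{1}{2}\,\|y - Xw\|_2^{2} + \gamma\,\|w\|_2^{2},
\qquad\qquad
\text{L1:}\quad \min_{w}\; \tfrac{1}{2}\,\|y - Xw\|_2^{2} + \gamma\,\|w\|_1 .
```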

  16. L2 versus L1 • L2 norm • Differentiable • Analytically tractable • Favors small (but nonzero) weights. • L1 norm • Non-differentiable, but convex • Requires iterative solution. • Estimated weights are sparse!

  17. Reformulation as NQP • L1-regularized regression • Change of variables • Separate out +/- elements of w. • Introduce nonnegativity constraints.
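
A sketch of the change of variables described here (notation mine): write each weight as a difference of two nonnegative parts,

```latex
w \;=\; u - v, \qquad u \ge 0,\;\; v \ge 0,
\qquad \|w\|_1 \;=\; \textstyle\sum_j (u_j + v_j) \ \text{at the optimum},
```

so the non-differentiable L1 penalty becomes a linear function of the nonnegative variables (u, v).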

  18. L1 norm as NQP • Under this change of variables, the two problems are equivalent! (A worked sketch follows below.)
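
Putting the pieces together, here is a minimal NumPy sketch of L1-regularized linear regression solved as an NQP with the multiplicative update from Part I. All names, the regularization weight gamma, the iteration count, and the guard constant are assumptions, not the authors' code:

```python
import numpy as np

def l1_regression_nqp(X, y, gamma, n_iter=500):
    """Sketch: min_w 0.5*||y - Xw||^2 + gamma*||w||_1, recast as an NQP over
    z = [u; v] with w = u - v and u, v >= 0."""
    d = X.shape[1]
    Q = X.T @ X
    A = np.block([[Q, -Q], [-Q, Q]])         # quadratic term over z = [u; v]
    b = np.concatenate([gamma - X.T @ y,     # linear term acting on u
                        gamma + X.T @ y])    # linear term acting on v
    A_plus, A_minus = np.maximum(A, 0.0), np.maximum(-A, 0.0)
    z = np.ones(2 * d)                       # strictly positive starting point
    for _ in range(n_iter):
        a, c = A_plus @ z, A_minus @ z
        z *= (-b + np.sqrt(b * b + 4.0 * a * c)) / (2.0 * a + 1e-12)
    u, v = z[:d], z[d:]
    return u - v                             # recovered weight vector w
```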

  19. Why reformulate? • Differentiability: simpler to optimize a smooth function, even with constraints. • Multiplicative updates • Well-suited to NQP. • Monotonic convergence. • No learning rate. • Enforce nonnegativity.

  20. Logistic regression • Training examples • vector inputs • binary (0/1) outputs • L1-regularized model fitting: solve the optimization via a sequence of L1-regularized linear regressions.
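
For reference, an assumed form of the L1-regularized logistic objective with binary labels y_i in {0, 1} (notation mine):

```latex
\min_{w}\; -\sum_{i=1}^{n} \Bigl[ y_i \log \sigma(w^{\top} x_i)
   + (1 - y_i) \log\bigl(1 - \sigma(w^{\top} x_i)\bigr) \Bigr] + \gamma \|w\|_1,
\qquad \sigma(z) = \frac{1}{1 + e^{-z}} .
```

Consistent with the slide, each outer step can replace the log-loss by a quadratic surrogate, so the inner problem is again an L1-regularized linear regression solvable by the updates above.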

  21. Part III. Experimental results

  22. Convergence to sparse solution Evolution of the weight vector under multiplicative updates for L1-regularized linear regression.

  23. Primal-dual convergence • The convex dual of NQP is NQP! • Multiplicative updates can also solve the dual. • The duality gap bounds the error of intermediate solutions.
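
A sketch of why this holds (assuming A is positive definite; the derivation is not on the slide): introducing multipliers λ ≥ 0 for the constraint v ≥ 0 and minimizing the Lagrangian over v gives, up to an additive constant,

```latex
\max_{\lambda \ge 0}\; -\tfrac{1}{2}\,(\lambda - b)^{\top} A^{-1} (\lambda - b)
\;\;\Longleftrightarrow\;\;
\min_{\lambda \ge 0}\; \tfrac{1}{2}\,\lambda^{\top} A^{-1} \lambda - (A^{-1} b)^{\top} \lambda ,
```

which is again a nonnegative quadratic program, so the same multiplicative updates apply, and the gap between primal and dual objectives bounds the suboptimality of intermediate iterates.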

  24. Large-scale implementation L1-regularized logistic regression on n=19K documents and d=1.2M features (70/20/10 split for train/test/dev)

  25. Discussion • Related work based on: • auxiliary functions • iterative least squares • nonnegativity constraints • Strengths of our approach: • simplicity • scalability • modularity • insights from related models
