
Multiplicative updates for the LASSO




  1. Multiplicative updates for the LASSO. Morten Mørup and Line Harder Clemmensen, Informatics and Mathematical Modeling, Technical University of Denmark.

  2. Overview
  • Multiplicative updates (MU)
  • Non-negative matrix factorization (NMF)
  • Convergence of MU
  • Semi-NMF
  • MU for the LASSO
  • Results obtained analyzing one small-scale and two large-scale bioinformatics data sets

  3. Multiplicative updates. For a cost function $C(\boldsymbol{\theta})$ whose gradient is split into non-negative parts, $\nabla C = \nabla^{+}C - \nabla^{-}C$, each parameter is updated multiplicatively, $\theta_i \leftarrow \theta_i \bigl(\nabla_i^{-}C / \nabla_i^{+}C\bigr)^{\eta}$, where $\eta$ is a step size parameter.
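
  A minimal Python sketch of this generic step (the function name and the gradient-split arguments are mine; the caller supplies the non-negative gradient parts for the cost at hand):

    import numpy as np

    def mu_step(theta, grad_plus, grad_minus, eta=1.0, eps=1e-12):
        # One multiplicative update: scale each parameter by the ratio of the
        # negative to the positive gradient part, raised to the step size eta.
        # A non-negative theta stays non-negative under this rescaling; eps
        # guards against division by zero.
        return theta * (grad_minus / (grad_plus + eps)) ** eta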

  4. Non-negative matrix factorization (NMF) (Lee & Seung, 2001): $\mathbf{V} \approx \mathbf{W}\mathbf{H}$ with $\mathbf{W}, \mathbf{H} \ge \mathbf{0}$. NMF gives a parts-based representation (Lee & Seung, Nature 1999).
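
  For reference, the classic Lee-Seung multiplicative updates for the Euclidean NMF cost $\|\mathbf{V} - \mathbf{W}\mathbf{H}\|_F^2$, as a self-contained NumPy sketch (the random initialization, iteration count, and eps safeguard are arbitrary choices on my part):

    import numpy as np

    def nmf_mu(V, K, n_iter=200, eps=1e-12):
        # Lee & Seung (2001) multiplicative updates for V ~ W @ H, W, H >= 0.
        rng = np.random.default_rng(0)
        W = rng.random((V.shape[0], K))
        H = rng.random((K, V.shape[1]))
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)  # H-step
            W *= (V @ H.T) / (W @ H @ H.T + eps)  # W-step
        return W, H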

  5. Proof of convergence for $\eta = 1$ by auxiliary functions (Lee & Seung, 2001).
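
  The auxiliary-function argument, restated since the slide's equations did not survive transcription (standard material from Lee & Seung, 2001): $G(h, h')$ is an auxiliary function for $F(h)$ if

    \[ G(h, h') \ge F(h) \quad \text{and} \quad G(h, h) = F(h). \]

  Minimizing the auxiliary function then forces monotone descent, since with $h^{t+1} = \arg\min_h G(h, h^t)$:

    \[ F(h^{t+1}) \le G(h^{t+1}, h^t) \le G(h^t, h^t) = F(h^t). \]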

  6. Multiplicative updates can also be used for semi-NMF (Ding et al., 2006), where one factor is unconstrained and the other is kept non-negative: (A) MU, (B) MUqp (Sha et al., 2003).
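
  To the best of my reading of Ding et al. (2006), semi-NMF factorizes $\mathbf{X} \approx \mathbf{F}\mathbf{G}^\top$ with $\mathbf{F}$ unconstrained and $\mathbf{G} \ge \mathbf{0}$, alternating a closed-form solve for $\mathbf{F}$ with a multiplicative step for $\mathbf{G}$, where $\mathbf{Z}^{+}$ and $\mathbf{Z}^{-}$ denote elementwise positive and negative parts:

    \[ \mathbf{F} = \mathbf{X}\mathbf{G}(\mathbf{G}^\top\mathbf{G})^{-1}, \qquad
       G_{ik} \leftarrow G_{ik}\,\sqrt{\frac{(\mathbf{X}^\top\mathbf{F})^{+}_{ik} + [\mathbf{G}(\mathbf{F}^\top\mathbf{F})^{-}]_{ik}}{(\mathbf{X}^\top\mathbf{F})^{-}_{ik} + [\mathbf{G}(\mathbf{F}^\top\mathbf{F})^{+}]_{ik}}}. \]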

  7. The least absolute shrinkage and selection operator, LASSO (Tibshirani, 1996), also known as basis pursuit denoising, BPD (Chen et al., 1999).
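
  The defining objective, restated for completeness (notation mine):

    \[ \min_{\mathbf{x}} \; \tfrac{1}{2}\,\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 + \lambda\,\|\mathbf{x}\|_1, \]

  where the $\ell_1$ penalty shrinks the coefficients and drives many of them exactly to zero.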

  8. The LASSO problem is in general highly overcomplete: $\mathbf{Y}_{I \times J} = \mathbf{A}_{I \times M}\,\mathbf{X}_{M \times J}$, with far more dictionary columns than rows. LASSO is based on a sparse coding principle, the principle of parsimony: the simplest solution is also the best solution.

  9. LASSO by quadratic programming. (Other approaches: LARS, the homotopy method (Drori et al., 2006), the Dantzig selector (Friedlander and Saunders).)
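
  The recast itself, stated in my notation since the slide's equations were lost: write $\mathbf{x} = \mathbf{u} - \mathbf{v}$ with $\mathbf{u}, \mathbf{v} \ge \mathbf{0}$, so that $\|\mathbf{x}\|_1 = \mathbf{1}^\top(\mathbf{u}+\mathbf{v})$ at the optimum, giving the non-negatively constrained quadratic program

    \[ \min_{\mathbf{u},\mathbf{v} \ge \mathbf{0}} \; \tfrac{1}{2}\,\|\mathbf{y} - \mathbf{A}(\mathbf{u}-\mathbf{v})\|_2^2 + \lambda\,\mathbf{1}^\top(\mathbf{u}+\mathbf{v}). \]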

  10. This recast problem can naturally be solved by multiplicative updates.
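
  For reference, the closed-form multiplicative update of Sha et al. for a generic non-negative quadratic program $\min_{\mathbf{v} \ge \mathbf{0}} \tfrac{1}{2}\mathbf{v}^\top\mathbf{H}\mathbf{v} + \mathbf{c}^\top\mathbf{v}$, with $\mathbf{H}^{+}$ and $\mathbf{H}^{-}$ the elementwise positive and negative parts of $\mathbf{H}$ (notation mine):

    \[ v_i \leftarrow v_i \, \frac{-c_i + \sqrt{c_i^2 + 4\,(\mathbf{H}^{+}\mathbf{v})_i\,(\mathbf{H}^{-}\mathbf{v})_i}}{2\,(\mathbf{H}^{+}\mathbf{v})_i}. \]

  Stacking $\mathbf{w} = [\mathbf{u}; \mathbf{v}]$ from the recast LASSO into this form yields the MUqp variant on the next slide.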

  11. Multiplicative updates for the LASSO: (A) MU, (B) MUqp.
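
  A runnable NumPy sketch of variant (A), obtained by splitting the gradient of the recast problem into positive and negative parts. This is my reconstruction under the standard split, not the authors' exact code (see their Mulasso package in the references for that); y is assumed to be a 1-D vector, and the initialization, iteration count, and eps are my choices:

    import numpy as np

    def lasso_mu(A, y, lam, n_iter=2000, eps=1e-12):
        # Multiplicative updates on the split x = u - v, u, v >= 0, for
        # 0.5*||y - A(u - v)||^2 + lam*sum(u + v). Each factor is the
        # negative over the positive part of the corresponding gradient,
        # so u and v remain non-negative throughout.
        AtA, Aty = A.T @ A, A.T @ y
        P, N = np.maximum(AtA, 0), np.maximum(-AtA, 0)    # AtA = P - N
        cp, cn = np.maximum(Aty, 0), np.maximum(-Aty, 0)  # Aty = cp - cn
        u = np.full(A.shape[1], 1e-2)
        v = np.full(A.shape[1], 1e-2)
        for _ in range(n_iter):
            u *= (N @ u + P @ v + cp) / (P @ u + N @ v + cn + lam + eps)
            v *= (P @ u + N @ v + cn) / (N @ u + P @ v + cp + lam + eps)
        return u - v

  At a fixed point with $u_i > 0$ the numerator and denominator of the u-ratio agree, which is exactly the stationarity condition $(\mathbf{A}^\top\mathbf{A}\mathbf{x} - \mathbf{A}^\top\mathbf{y} + \lambda)_i = 0$ for the active positive coefficients.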

  12. Proof of convergence of the updates: (A) follows from bounds derived in (Ding et al., 2006); (B) follows directly from the proof given in (Sha et al., 2003).

  13. Small-scale data set (M < J). Prostate cancer: the study examines the correlation between the level of prostate-specific antigen and 8 clinical measures (M = 8). The clinical measures were taken on 97 men (J = 97) who were about to receive a radical prostatectomy. QP: MATLAB's standard QP solver; BP: the BPD algorithm from www.sparselab.stanford.edu. Data taken from (Stamey et al., 1989), also used as an example in (Hastie et al., 2001).

  14. Large-scale data sets (M >> J). Microarray data taken from (Pochet et al., 2004). Dataset 1 (Alon et al., 1999): colon cancer, 2000 genes (40 tumor, 22 normal; train: 27/13, test: 13/9).

  15. Dataset 2 (Hedenfalk et al., 2001): breast cancer, 3226 genes; BRCA1 mutation, BRCA2 mutation, and sporadic cases of breast cancer. We considered separating tissues with BRCA1 mutations from tissues with BRCA2 mutations or sporadic mutations (7 tumor, 15 normal; train: 4/10, test: 3/5).

  16. Conclusion
  • Multiplicative updates form simple algorithms for solving the LASSO, and the approach also generalizes to unconstrained optimization through the split $\mathbf{x} = \mathbf{x}^{+} - \mathbf{x}^{-}$.
  • The updates are guaranteed to converge, in the sense of monotonically decreasing the objective function.
  • The MU algorithms devised for the LASSO are more stable than traditional QP solvers such as MATLAB's standard QP solver, but not as fast as state-of-the-art algorithms such as the BPD solver given at www.sparselab.stanford.edu.
  • The MU algorithms can easily be extended to the elastic net and the fused lasso, and they form a general optimization framework.

  17. References
  U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine, "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays," Proc. Natl. Acad. Sci. USA, 1999.
  S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comp., vol. 20, no. 1, pp. 33–61, 1999.
  C. Ding, T. Li, and M. I. Jordan, "Convex and semi-nonnegative matrix factorizations," LBNL Tech Report 60428, 2006.
  I. Drori and D. L. Donoho, "Solution of l1 minimization problems by LARS/homotopy methods," IEEE International Conference on Acoustics, Speech, and Signal Processing, 2006.
  B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," Annals of Statistics, vol. 32, no. 2, pp. 407–499, 2004.
  M. P. Friedlander and M. A. Saunders, "Discussion of 'The Dantzig selector' by Candès and Tao," submitted to Annals of Statistics.
  V. Guigue, A. Rakotomamonjy, and S. Canu, "Kernel basis pursuit," European Conference on Machine Learning, Porto, 2005.
  T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, 2001.
  I. Hedenfalk, D. Duggan, Y. Chen, M. Radmacher, M. Bittner, R. Simon, P. Meltzer, B. Gutsterson, M. Esteller, M. Raffeld, Z. Yakhini, A. Ben-Dor, E. Dougherty, J. Kononen, L. Bubendorf, W. Fehrle, S. Pittaluga, S. Gruvberger, N. Loman, O. Johannsson, H. Olsson, B. Wilfond, G. Sauter, O.-P. Kallioniemi, A. Borg, and J. Trent, "Gene-expression profiles in hereditary breast cancer," The New England Journal of Medicine, vol. 344, pp. 539–548, 2001.
  D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
  D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," Advances in Neural Information Processing Systems, pp. 556–562, 2000.
  M. Mørup and L. H. Clemmensen, "Mulasso," http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/5235/zip/imm5235.zip.
  M. R. Osborne, B. Presnell, and B. A. Turlach, "A new approach to variable selection in least squares problems," IMA Journal of Numerical Analysis, vol. 20, no. 3, pp. 389–403, 2000.
  N. Pochet, F. De Smet, J. A. K. Suykens, and B. L. R. De Moor, "Systematic benchmarking of microarray data classification: assessing the role of nonlinearity and dimensionality reduction," Bioinformatics, vol. 20, no. 17, pp. 3185–3195, 2004.
  S. S. Chen and D. L. Donoho, "Basis pursuit," 28th Asilomar Conference on Signals, Systems and Computers, 1994.
  F. Sha, L. K. Saul, and D. D. Lee, "Multiplicative updates for nonnegative quadratic programming in support vector machines," Advances in Neural Information Processing Systems 15, 2002.
  T. Stamey, J. Kabalin, J. McNeal, I. Johnstone, H. Freiha, E. Redwine, and N. Yang, "Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate II. Radical prostatectomy treated patients," Journal of Urology, vol. 16, pp. 1076–1083, 1989.
  R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.
  R. Tibshirani and M. A. Saunders, "Sparsity and smoothness via the fused lasso," J. R. Statist. Soc. B, vol. 67, no. 1, pp. 91–108, 2005.
  H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," J. R. Statist. Soc. B, vol. 67, no. 2, pp. 301–320, 2005.
