Presentation Transcript

  1. Bolasso: Model Consistent Lasso Estimation through the Bootstrap (Bach, ICML 2008). Presented by Sheehan Khan to the “Beyond Lasso” reading group, April 9, 2009

  2. Outline • Follows the structure of the paper • Define Lasso • Comments on scaling the penalty • Bootstrapping/Bolasso • Results • A few afterthoughts on the paper • Synopsis of the 2009 extended tech report • Discussion

  3. Problem formulation • Standard Lasso formulation: ŵ ∈ argmin_w (1/2n)‖Y − Xw‖₂² + μn‖w‖₁ • New notation (consistent with ICML08) • Response vector Y ∈ R^n (n samples) • Design matrix X ∈ R^(n×p) (n samples x p features) • Generative model Y = Xw + σε, with w sparse
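A minimal sketch of this setup, assuming the formulation above: data are generated from a sparse loading vector and the Lasso is fit with scikit-learn, whose objective (1/(2n))‖y − Xw‖² + alpha‖w‖₁ matches the slide's with alpha playing the role of μn. The sizes and names (true_w, n_active) are illustrative, not from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, n_active = 1000, 16, 8                  # sizes like those on the synthetic-data slides

# Generative model: Y = X w + sigma * eps, with only the first n_active loadings nonzero
X = rng.standard_normal((n, p))
true_w = np.zeros(p)
true_w[:n_active] = rng.uniform(0.5, 1.5, size=n_active)
y = X @ true_w + 0.5 * rng.standard_normal(n)

# sklearn's Lasso minimizes (1/(2n))||y - Xw||^2 + alpha * ||w||_1,
# so alpha corresponds to the regularization parameter mu_n on the slide
mu_n = 1.0 / np.sqrt(n)                       # the n^{-1/2} scaling discussed on the next slides
lasso = Lasso(alpha=mu_n).fit(X, y)
print("selected features:", np.flatnonzero(lasso.coef_ != 0))
```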

  4. How should we set μn? • Shows 5 mutually exclusive possibilities for how μn scales with n • If μn does not tend to zero, the penalty survives in the limit of the objective we minimize, which implies ŵ is not a consistent estimate of w

  5. How should we set μn? • μn going to zero slower than n^(-1/2) requires the consistency condition ‖Q_{Jᶜ,J} Q_{J,J}⁻¹ sign(w_J)‖_∞ ≤ 1, where J denotes the active set • μn going to zero faster than n^(-1/2) loses the sparsifying effect of the l1 penalty • We saw similar arguments in Adaptive Lasso

  6. How should we set μn? • With μn = μ0 n^(-1/2), we can state Prop 1 and Prop 2: every active feature is selected with probability tending to one, while each non-active feature is selected with a strictly positive limiting probability • *Dependence on Q omitted in the body of the paper but appears in the appendix
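For reference, a hedged summary of the scaling regimes behind slides 4-6, written from the standard Lasso asymptotics the talk draws on (the precise assumptions and constants are in Bach's paper and are not reproduced here):

\[
\mu_n \to 0,\ \mu_n\sqrt{n} \to \infty:\quad \text{correct model selection requires the consistency condition } \big\| Q_{J^c J}\, Q_{JJ}^{-1}\, \mathrm{sign}(w_J) \big\|_\infty \le 1,
\]
\[
\mu_n\sqrt{n} \to 0:\quad \hat{w} \text{ is a consistent estimate of } w\text{, but the } \ell_1 \text{ penalty loses its sparsifying effect},
\]
\[
\mu_n = \mu_0\, n^{-1/2},\ \mu_0 > 0:\quad P\big(J \subseteq \hat{J}\big) \to 1 \text{ exponentially fast, while } P\big(j \in \hat{J}\big) \to q_j \in (0,1) \text{ for each } j \notin J.
\]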

  7. So what? • Props 1 & 2 tell us that asymptotically: • We have positive probability of selecting the active features • We have vanishing probability of missing active features • We may or may not get additional non-active features, depending on the dataset • With many independent datasets, the features common to all of them must be the active set
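A back-of-the-envelope version of the last bullet (my own illustration, not a statement from the paper): suppose we had m independent datasets and the Lasso on any one of them selects a fixed non-active feature j with limiting probability q_j < 1. Then

\[
P\Big( j \in \bigcap_{k=1}^{m} \hat{J}_k \Big) \;=\; \prod_{k=1}^{m} P\big( j \in \hat{J}_k \big) \;\longrightarrow\; q_j^{\,m},
\]

which is driven towards zero by taking m large, while every active feature remains in each \(\hat{J}_k\) with probability tending to one. The bootstrap replications on the following slides stand in for these independent datasets.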

  8. Bootstrap • In practice we do not get many datasets • We can instead use m bootstrap replications of the given set • For now we use bootstrap pairs; later we will use centered residuals
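A minimal sketch of one bootstrap-pairs replication, assuming the standard resampling scheme the slide describes (the helper name resample_pairs and the toy data are mine):

```python
import numpy as np

def resample_pairs(X, y, rng):
    """One bootstrap replication: draw n (x_i, y_i) pairs uniformly with replacement."""
    n = X.shape[0]
    idx = rng.integers(0, n, size=n)   # n row indices sampled with replacement
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 16))
y = X[:, :8] @ np.ones(8) + 0.5 * rng.standard_normal(1000)
X_boot, y_boot = resample_pairs(X, y, rng)
```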

  9. Bolasso
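A compact sketch of the Bolasso procedure as described in the surrounding slides: run the Lasso on m bootstrap-pairs replicates, take the intersection of the selected supports, and refit the coefficients on that support (here by ordinary least squares, which is one natural reading of the final step). The defaults, including the n^(-1/2) scaling of mu_n, are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def bolasso(X, y, m=128, mu_n=None, seed=0):
    """Bolasso sketch: intersect Lasso supports over m bootstrap-pairs replicates."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu_n = 1.0 / np.sqrt(n) if mu_n is None else mu_n
    support = np.ones(p, dtype=bool)            # running intersection of supports
    for _ in range(m):
        idx = rng.integers(0, n, size=n)        # one bootstrap-pairs replicate
        coef = Lasso(alpha=mu_n).fit(X[idx], y[idx]).coef_
        support &= (coef != 0)                  # keep only features selected every time so far
    w = np.zeros(p)
    if support.any():                           # refit by least squares on the intersection
        w[support] = LinearRegression().fit(X[:, support], y).coef_
    return support, w
```

For example, `support, w = bolasso(X, y, m=128)` on the synthetic data from the earlier sketch should recover (roughly) the first 8 features.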

  10. Asymptotic Error • Prop 3: given the n^(-1/2) scaling of μn, an upper bound on the probability that the Bolasso intersection is not exactly the active set • Can be tightened under additional assumptions
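The rough shape of such a bound can be seen from a union-bound argument (my own sketch of the intuition, not the paper's statement or constants): the intersection fails either because some replicate misses an active feature, or because some non-active feature survives all m replicates,

\[
P\big( \hat{J}_{\mathrm{bolasso}} \neq J \big) \;\le\; \sum_{k=1}^{m} P\big( J \not\subseteq \hat{J}_k \big) \;+\; \sum_{j \notin J} P\big( j \in \hat{J}_k \ \text{for all } k \big).
\]

The first term grows with m, which is why a bound of this kind eventually discourages very large m, while the second term shrinks with m; Prop 3 quantifies this trade-off.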

  11. Results on Synthetic Data [plots] • 1000 samples • 16 features (first 8 active) • Averaged over 256 datasets • Legend: lasso (black), bolasso (red); bolasso shown with m=128 and with m=2,4,8,…,256

  12. Results on Synthetic Data [plots, continued] • Same setup: 1000 samples, 16 features (first 8 active), averaged over 256 datasets • Legend: lasso (black), bolasso (red); m=128 and m=2,4,8,…,256

  13. Results on Synthetic Data [plot] • 64 features (8 active) • Error is the squared distance between sparsity-pattern vectors, averaged over 32 datasets • Legend: lasso (black), bolasso (green), forward greedy (magenta), thresholded LS (red), adaptive lasso (blue)

  14. Results on Synthetic Data [table] • 64 samples • 32 features (8 active) • Bolasso-S uses a soft intersection (features kept in at least 90% of the replicates) • Reported prediction MSE of 1.24???
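A small variation on the earlier Bolasso sketch for the soft intersection (Bolasso-S): keep the features whose Lasso coefficient is nonzero in at least a given fraction of the bootstrap replicates, 90% on this slide. The function name and default threshold are illustrative, not from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bolasso_soft_support(X, y, m=128, mu_n=None, threshold=0.9, seed=0):
    """Bolasso-S sketch: keep features selected in at least `threshold` of m bootstrap Lasso fits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu_n = 1.0 / np.sqrt(n) if mu_n is None else mu_n
    counts = np.zeros(p)
    for _ in range(m):
        idx = rng.integers(0, n, size=n)                        # one bootstrap-pairs replicate
        counts += (Lasso(alpha=mu_n).fit(X[idx], y[idx]).coef_ != 0)
    return counts / m >= threshold                              # boolean support vector
```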

  15. Results on UCI data [table] • Reports mean squared prediction error

  16. Some thoughts • Why do they compare bolasso variable-selection error to lasso, forward greedy, threshold LS, and adaptive lasso, but then compare mean squared prediction error to lasso, ridge and bagging? • All of these results use low-dimensional data, while we are interested in problems with large numbers of features • This is considered in the 2009 tech report • Based on the plots it seems best to use m as large as possible (in contrast to Prop 3) • Is there any insight into the size of the positive constants, which have a huge impact? • Based on the results, it seems we really want to use bolasso in problems where we know this bound to be loose

  17. 2009 Tech Report • Main extensions • Fills in the math details omitted previously • Discusses bootstrapping pairs vs. residuals • Proves both are consistent for low-dimensional data • Shows empirical results favouring residuals in high-dimensional data • New upper and lower bounds for selecting the active components in low-dimensional data • Proposes a similar method for high dimensions • Lasso with a high regularization parameter • Then bootstrap within the supports • Discusses implementation details

  18. Bootstrap Recap • Previously we sampled uniformly from the given dataset with replacement to generate each bootstrap set • This can be done in parallel • Bootstrapping can also be done sequentially • We saw this when reviewing Boosting

  19. Bootstrap Residuals • Compute residual errors from the Lasso fit on the current dataset: e_i = y_i − x_iᵀŵ • Compute the centered residuals ẽ_i = e_i − (1/n) Σ_j e_j • Create a new dataset from the pairs (x_i, x_iᵀŵ + e*_i), where the e*_i are drawn with replacement from the centered residuals
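A minimal sketch of one residual-bootstrap replication as described on this slide, assuming the usual centered-residual scheme (the function name and the choice of mu_n are mine):

```python
import numpy as np
from sklearn.linear_model import Lasso

def resample_residuals(X, y, mu_n, rng):
    """Residual-bootstrap sketch: rebuild responses from resampled centered Lasso residuals."""
    n = X.shape[0]
    fitted = Lasso(alpha=mu_n).fit(X, y).predict(X)   # Lasso fit on the current dataset
    resid = y - fitted
    resid -= resid.mean()                             # centered residuals
    e_star = rng.choice(resid, size=n, replace=True)  # resample residuals with replacement
    return X, fitted + e_star                         # new dataset: same X, new responses
```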

  20. Synthetic Results in High Dimensional Data • 64 samples, 128 features (8 active)

  21. Varying Replications in High Dimensional Data [plot] • Legend: lasso (black), bolasso with m={2,4,8,16,32,64,128,256} (red), m=512 (blue)

  22. The End • Thanks for your attention and participation • Questions/Discussion???