Variable Selection: Penalized Regression Last Time

Find the combination of variables that explains the most variability in the simplest possible model. Automated selection procedures can help, but use them with caution. Principal components can serve as explanatory variables in a regression model to predict ratings. Stepwise (mixed) regression and best subsets are two techniques for variable selection, and AIC and BIC are two criteria for comparing the candidate models. Also covered: how to handle insignificant terms, and the trade-off between parsimony and adjusted R². A practice problem and lab assignment are included.


Presentation Transcript


  1. Stat 324 – Day 25 Penalized Regression

  2. Last Time - Variable selection • Want to find the combination of variables that explains the most variability in the simplest possible model • Look for variables that explain a higher percentage of the remaining unexplained variation (partial correlation coefficients) • Can use automated procedures … with caution

  3. Principal components • Example: Have ranked communities on 9 variables. What best distinguishes the communities? • Climate and Terrain (higher scores are better) • Housing (lower scores are better) • Health Care & the Environment (higher) • Crime (lower scores are better) • Transportation (higher) • Education (higher) • The Arts (higher) • Recreation (higher) • Economics (higher)

  4. Example • The first principal component formula: • Could then be used as an explanatory variable in a regression model to predict rating • Second component can also be used with the bonus of being orthogonal to the first • *probably should standardize first

  5. Example • Here is how the original variables correlate with the first three principal components • Five variables have a strong correlation with PC1 (communities with better housing tend to have better health, etc.) • PC1 is really about quality of arts • PC2 is about health • PC3 suggests places with high crime tend to also have better recreation facilities
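
The principal components steps on slides 3 to 5 (standardize, form the first component, check how each variable correlates with it) can be sketched in a few lines. This is only an illustration under stated assumptions: the three variables and their values are made up, not the nine-variable community data from the slides, and the function names are hypothetical. Power iteration is used here simply because it needs no external library; statistical software would normally do a full eigendecomposition.

```python
import math
import random

def standardize(columns):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        out.append([(v - mean) / sd for v in col])
    return out

def correlation_matrix(z):
    """Correlation matrix of columns that are already standardized."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def leading_eigenvector(matrix, iterations=500):
    """Unit-length leading eigenvector found by power iteration."""
    w = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, w)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w

# Made-up scores for 40 hypothetical communities (NOT the slide's data).
random.seed(0)
n = 40
base = [random.gauss(0, 1) for _ in range(n)]
arts = [b + random.gauss(0, 0.5) for b in base]
health = [b + random.gauss(0, 0.5) for b in base]
crime = [random.gauss(0, 1) for _ in range(n)]

z = standardize([arts, health, crime])
weights = leading_eigenvector(correlation_matrix(z))

# PC1 score for each community: a weighted sum of its standardized values.
pc1 = [sum(w * col[i] for w, col in zip(weights, z)) for i in range(n)]

# Correlation of each original variable with PC1.
pc1_sd = math.sqrt(sum(s * s for s in pc1) / (n - 1))
loadings = [sum(a * s for a, s in zip(col, pc1)) / ((n - 1) * pc1_sd) for col in z]
print(dict(zip(["arts", "health", "crime"], [round(v, 2) for v in loadings])))
```

The `pc1` scores could then be used as a single explanatory variable in a regression, in place of the correlated originals.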

  6. Stepwise Regression (Mixed)

  7. Best Subsets

  8. Last Time

  9. Last Time: AIC vs. BIC • AIC: tyer 311.1, tiyer 311.9, typer 312.7, tiyper 313.9 • BIC: tyer 322.4, te 322.7, tye 324.2, ter 324.6 • The idea behind these measures is similar, but BIC has a larger penalty for the number of variables, so it tends to be a bit more conservative (often choosing smaller, less complex models)
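
To make the difference in penalties concrete, here is a minimal sketch using the common least-squares forms AIC = n·ln(RSS/n) + 2k and BIC = n·ln(RSS/n) + k·ln(n), where k counts estimated parameters. Software packages may add different constants, so these will not reproduce the values on the slide. The sample size, RSS values, and model labels below are hypothetical.

```python
import math

def aic(rss, n, k):
    """AIC for a least-squares fit, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """BIC for a least-squares fit, up to an additive constant."""
    return n * math.log(rss / n) + k * math.log(n)

n = 50  # hypothetical sample size
# Hypothetical candidates: label -> (residual sum of squares, number of parameters)
candidates = {
    "small": (120.0, 3),
    "large": (105.0, 6),
}

aic_scores = {name: aic(rss, n, k) for name, (rss, k) in candidates.items()}
bic_scores = {name: bic(rss, n, k) for name, (rss, k) in candidates.items()}

# Lower is better for both criteria.
best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
print("AIC picks:", best_by_aic)
print("BIC picks:", best_by_bic)
```

With these made-up numbers the larger model fits a little better, which is enough to win under AIC's penalty of 2 per parameter but not under BIC's penalty of ln(50) per parameter.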

  10. Other notes • Insignificant terms • Doesn’t really hurt to leave them in the model as long as you clarify that they are not significant • vs. parsimony, adjusted R² • Could keep in by request of subject matter expert or for sake of completeness (e.g., lower order terms of polynomial, set of indicator variables, indicators in presence of interactions)

  11. Today • Another method, developed to deal with multicollinearity, is increasingly popular as a form of variable selection as well

  12. To Do • Practice problem • Wednesday/Thursday: Lab Assignment • Email Dr. Chance questions!
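
The "Today" slide does not name the method. One penalized method that was developed with multicollinearity in mind is ridge regression, so the sketch below assumes that is what is meant; the lasso is a related penalty that can set coefficients exactly to zero. The data are made up, the penalty weight `lam` and the function names are hypothetical, and the two-predictor closed form is used only to keep the example free of external libraries.

```python
import math
import random

def centered(values):
    """Subtract the mean so no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def ridge_two_predictors(x1, x2, y, lam):
    """Minimize sum((y - b1*x1 - b2*x2)^2) + lam*(b1^2 + b2^2) for centered inputs.

    Solves the 2x2 system (X'X + lam*I) b = X'y directly.
    """
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

# Made-up data with two nearly collinear predictors.
random.seed(1)
n = 30
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.1) for a in x1]
y = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]
x1c, x2c, yc = centered(x1), centered(x2), centered(y)

# lam = 0 is ordinary least squares; larger lam shrinks the coefficients.
for lam in (0.0, 1.0, 10.0):
    b1, b2 = ridge_two_predictors(x1c, x2c, yc, lam)
    print(f"lambda={lam:5.1f}  b1={b1:7.3f}  b2={b2:7.3f}  size={math.hypot(b1, b2):.3f}")
```

In practice the predictors are usually standardized as well as centered, so that the penalty treats each coefficient on the same scale.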
