
Model Selection and Estimation in Regression with Grouped Variables


Presentation Transcript


  1. Model Selection and Estimation in Regression with Grouped Variables

  2. Remember…..
  • Consider fitting a simple additive model with arbitrary explanatory variables X1, X2, X3 and a continuous response Y, where each Xj enters the model through a set of derived variables (e.g., polynomial terms).
  • If we want to determine whether X1, X2, X3 are predictive of Y, we need to take into account the whole groups of derived variables that come from X1, X2, X3.
  • 2nd example: ANOVA, where the dummy variables of a factor form the groups.

  3. Remember…..
  Group LARS proceeds in two steps:
  1) A solution path indexed by a tuning parameter λ is built. (The solution path is just the trajectory of the estimated coefficients as a function of λ.)
  2) The final model is selected on the solution path by some "minimal risk" criterion.

  4. Notation
  • Model form: Y = X1β1 + X2β2 + … + XJβJ + ε.
  • Assume we have J factors/groups of variables.
  • Y is (n x 1).
  • ε ~ MVN(0, σ²I).
  • pj is the number of variables in group j.
  • Xj is the (n x pj) design matrix for group j.
  • βj is the coefficient vector for group j.
  • Each Xj is centered and orthonormalized, and Y is centered (a sketch of this preprocessing follows).
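The following is a minimal sketch of the preprocessing step described on this slide: center the response, then center and orthonormalize each group's design matrix. The function name `prepare_groups` is hypothetical, and the QR-based orthonormalization assumes each Xj has full column rank; it is one convenient way to obtain Xj'Xj = I, not necessarily the authors' implementation.

```python
import numpy as np

def prepare_groups(Y, X_groups):
    """Center the response and center/orthonormalize each group's design matrix.

    Y        : (n,) response vector
    X_groups : list of (n, p_j) design matrices, one per factor/group
    Returns the centered response and a list of group matrices with
    orthonormal columns (so that Xj' Xj = I).
    """
    Y_c = Y - Y.mean()
    X_ortho = []
    for Xj in X_groups:
        Xj_c = Xj - Xj.mean(axis=0)          # center each column
        Q, _ = np.linalg.qr(Xj_c)            # orthonormal columns spanning Xj_c
        X_ortho.append(Q)
    return Y_c, X_ortho
```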

  5. Remember…..
  Group LARS Solution Path Algorithm (refresher):
  1) Compute the current "most correlated set" (A) by adding the factor that maximizes the "correlation" between the current residual and the factor, accounting for factor size (see the sketch below).
  2) Move the coefficient vector (β) in the direction of the projection of the current residual onto the factors in (A).
  3) Continue along this direction until a factor outside (A) has the same correlation as the factors in (A); add that new factor to (A).
  4) Repeat steps 2-3 until no more factors can be added to (A).
  (Note: the solution path is piecewise linear, so it is computationally efficient to compute!)
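As a small illustration of step 1, here is a sketch of the size-adjusted "correlation" measure ||Xj'r||²/pj and of picking the factor that maximizes it. The function names are hypothetical, and the full path construction (the projection direction in step 2 and the breakpoint where a new factor ties in step 3) is deliberately omitted.

```python
import numpy as np

def group_correlation(Xj, r):
    """Size-adjusted 'correlation' between factor j and the current residual r:
    ||Xj' r||^2 / p_j, where p_j is the number of columns in the group."""
    return np.sum((Xj.T @ r) ** 2) / Xj.shape[1]

def most_correlated_factor(X_groups, r):
    """Step 1 of the path algorithm: the factor maximizing the size-adjusted
    correlation with the current residual enters the active set (A)."""
    corrs = [group_correlation(Xj, r) for Xj in X_groups]
    return int(np.argmax(corrs)), corrs
```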

  6. Cp Criterion (How to Select a Final Model)
  • In Gaussian regression problems, an unbiased estimate of the "true risk" E||μ̂ − μ||²/σ² is Cp(μ̂) = ||Y − μ̂||²/σ² − n + 2·df, where df is the degrees of freedom of the fit.
  • When the full design matrix X is orthonormal, it can be shown that an unbiased estimate of df is df̃ = Σj I(||β̂j|| > 0) + Σj (||β̂j|| / ||β̂j^LS||)·(pj − 1), where β̂j^LS is the least-squares coefficient vector for group j.
  • Note that in the orthonormal case the Group LARS solution is a group-wise shrunken version of each β̂j^LS, so the ratio ||β̂j|| / ||β̂j^LS|| measures how far each selected group has been shrunk.

  7. Degrees-of-Freedom Calculation (Intuition)
  • When the full design matrix X is orthonormal, the unbiased df estimate counts one degree of freedom for each selected group, plus a fraction of the remaining (pj − 1) degrees of freedom for that group.
  • That fraction is the shrinkage ratio ||β̂j|| / ||β̂j^LS||, because the orthonormal Group LARS solution shrinks each group's least-squares coefficient vector β̂j^LS toward zero.
  • The general formula for "df" (used beyond the orthonormal case) is the same expression: df ≈ Σj I(||β̂j|| > 0) + Σj (||β̂j|| / ||β̂j^LS||)·(pj − 1). A sketch of this calculation, together with Cp, follows.
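The sketch below evaluates the approximate df formula and the Cp criterion exactly as written above. The function names are hypothetical; in practice σ² would have to be supplied or estimated (e.g., from the full least-squares fit), which is an assumption of this sketch rather than something stated on the slide.

```python
import numpy as np

def approx_df(beta_groups, beta_ls_groups):
    """Approximate degrees of freedom of a grouped fit:
    one df per selected group, plus (p_j - 1) times the shrinkage
    fraction ||beta_j|| / ||beta_j^LS|| for that group."""
    df = 0.0
    for bj, bj_ls in zip(beta_groups, beta_ls_groups):
        norm_bj, norm_ls = np.linalg.norm(bj), np.linalg.norm(bj_ls)
        if norm_bj > 0 and norm_ls > 0:
            df += 1.0 + (norm_bj / norm_ls) * (len(bj) - 1)
    return df

def cp_criterion(Y, Y_hat, sigma2, df):
    """Cp-style unbiased risk estimate: ||Y - Y_hat||^2 / sigma^2 - n + 2*df."""
    n = len(Y)
    return np.sum((Y - Y_hat) ** 2) / sigma2 - n + 2.0 * df
```

The final model on the solution path is then the one minimizing `cp_criterion` across the candidate values of λ.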

  8. Real Dataset Example
  • Famous birthweight dataset from Hosmer/Lemeshow.
  • Y = baby birthweight; 2 continuous predictors (age and weight of the mother), 6 categorical predictors.
  • For the continuous predictors, 3rd-order polynomials form the "factors".
  • For the categorical predictors, dummy variables form the factors, with the final level excluded as the reference category.
  • 75%/25% train/test split.
  • Methods compared: Group LARS and backward stepwise selection (variable-wise LARS is not appropriate here, since the dummy variables of a categorical factor should enter or leave the model together). A sketch of the factor construction follows.
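Here is a minimal sketch of how the factors and the train/test split could be built. The helper names and the predictor names mentioned in the comments (e.g., mother's age, mother's weight, a smoking indicator) are hypothetical placeholders for the dataset's actual columns, and this is one plausible construction rather than the presenters' exact code.

```python
import numpy as np

def polynomial_factor(x, degree=3):
    """Third-order polynomial 'factor' for a continuous predictor: [x, x^2, x^3]."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def dummy_factor(x):
    """Dummy-variable 'factor' for a categorical predictor,
    dropping the last level as the reference category."""
    levels = np.unique(x)
    return np.column_stack([(x == lev).astype(float) for lev in levels[:-1]])

# Hypothetical usage: polynomial_factor(mother_age), polynomial_factor(mother_weight),
# dummy_factor(smoking_status), etc., followed by a 75%/25% train/test split.
rng = np.random.default_rng(0)
n = 189                                  # the low-birthweight data has 189 births
idx = rng.permutation(n)
train, test = idx[: int(0.75 * n)], idx[int(0.75 * n):]
```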

  9. Real Dataset Example (figure: the solution path, with the minimal-Cp model indicated)

  10. Real Dataset Example
  Factors selected:
  • Group LARS: all factors except Number of Physician Visits during the First Trimester.
  • Backward stepwise: all factors except Number of Physician Visits during the First Trimester and Mother's Weight.

  11. Real Dataset Example

  12. Simulation Example #1
  • 17 random variables Z1, Z2, …, Z16, W were independently drawn from a Normal(0,1).
  • Xi = (Zi + W) / SQRT(2), for i = 1, …, 16.
  • Y = X3³ + X3² + X3 + (1/3)·X6³ − X6² + (2/3)·X6 + ε
  • ε ~ N(0, 2²)
  • Each simulated dataset has 100 observations; 200 simulations were run.
  • Methods compared: Group LARS, LARS, least squares, backward stepwise.
  • All 3rd-order main effects are considered (each Xi contributes a factor of polynomial terms up to degree 3). A data-generation sketch follows.
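The following sketch generates one dataset under this design, using the formulas exactly as stated on the slide. The function name is hypothetical and 0-based indexing is used (X[:, 2] is X3, X[:, 5] is X6).

```python
import numpy as np

def simulate_example_1(n=100, sigma=2.0, rng=None):
    """One dataset for Simulation Example #1:
    Z1..Z16, W ~ iid N(0,1), Xi = (Zi + W)/sqrt(2),
    Y = X3^3 + X3^2 + X3 + (1/3)X6^3 - X6^2 + (2/3)X6 + eps, eps ~ N(0, 2^2)."""
    rng = np.random.default_rng() if rng is None else rng
    Z = rng.standard_normal((n, 16))
    W = rng.standard_normal((n, 1))
    X = (Z + W) / np.sqrt(2.0)               # 16 mutually correlated predictors
    x3, x6 = X[:, 2], X[:, 5]                # X3 and X6
    Y = (x3 ** 3 + x3 ** 2 + x3
         + (1.0 / 3.0) * x6 ** 3 - x6 ** 2 + (2.0 / 3.0) * x6
         + sigma * rng.standard_normal(n))
    return X, Y
```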

  13. Simulation Example #1

  14. Simulation Example #2
  • 20 random variables X1, X2, …, X20 were generated as in Example #1.
  • X11, X12, …, X20 were trichotomized as 0, 1, or 2: 0 if below the 33rd percentile of a Normal(0,1), 1 if above the 66th percentile, and 2 if in between.
  • Y = X3³ + X3² + X3 + (1/3)·X6³ − X6² + (2/3)·X6 + 2·I(X11 = 0) + I(X11 = 1) + ε
  • ε ~ N(0, 2²)
  • Each simulated dataset has 100 observations; 200 simulations were run.
  • Methods compared: Group LARS, LARS, least squares, backward stepwise.
  • All 3rd-order main effects for the continuous variables and dummy-variable factors for the categorical variables are considered. A data-generation sketch follows.
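This sketch extends the Example #1 generator with the trichotomization step described above. The function names are hypothetical; it assumes SciPy is available for the standard normal quantiles and again uses 0-based indexing (X_cat[:, 0] is X11).

```python
import numpy as np
from scipy.stats import norm

def trichotomize(x):
    """Recode N(0,1) variables as 0 / 1 / 2: 0 below the 33rd percentile,
    1 above the 66th percentile, 2 in between (the coding on the slide)."""
    lo, hi = norm.ppf(1.0 / 3.0), norm.ppf(2.0 / 3.0)
    return np.where(x < lo, 0, np.where(x > hi, 1, 2))

def simulate_example_2(n=100, sigma=2.0, rng=None):
    """One dataset for Simulation Example #2: X1..X20 as in Example #1,
    X11..X20 trichotomized, plus two dummy terms for X11 in the mean."""
    rng = np.random.default_rng() if rng is None else rng
    Z = rng.standard_normal((n, 20))
    W = rng.standard_normal((n, 1))
    X = (Z + W) / np.sqrt(2.0)
    X_cat = trichotomize(X[:, 10:])           # X11..X20 become categorical
    x3, x6, x11 = X[:, 2], X[:, 5], X_cat[:, 0]
    Y = (x3 ** 3 + x3 ** 2 + x3
         + (1.0 / 3.0) * x6 ** 3 - x6 ** 2 + (2.0 / 3.0) * x6
         + 2.0 * (x11 == 0) + 1.0 * (x11 == 1)
         + sigma * rng.standard_normal(n))
    return X[:, :10], X_cat, Y
```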

  15. Simulation Example #2

  16. Conclusion
  • Group LARS provides an improvement over traditional backward stepwise selection + OLS, but it still over-selects factors.
  • In the simulations, stepwise selection tends to under-select factors relative to Group LARS, and it performs more poorly.
  • Simulation #1 suggests that LARS over-selects factors because it enters individual derived variables into the model rather than whole factors.
  • Group LARS is also computationally efficient thanks to its piecewise-linear solution path algorithm.
  • ||Xj'r||² / pj is the formula for the "correlation" between a factor j and the current residual r; because it averages over the whole group, a factor may be selected even when only a couple of its derived inputs are predictive and the rest are redundant.

  17. The End
