
Discussion of “Least Angle Regression” by Weisberg



  1. Discussion of “Least Angle Regression” by Weisberg Mike Salwan November 2, 2006 Stat 882

  2. Introduction • “Notorious” problem of automatic model building algorithms for linear regression • Implicit Assumption • Replacing Y by something without loss of info • Selecting variables • Summary

  3. Implicit Assumption • We have an n x m matrix X and an n-vector Y • P is the projection onto the column space of X • LARS assumes we can replace Y with Ŷ = PY; in large samples F(y|x) = F(y|x’β) for some β • We estimate the residual variance by σ̂² = ||Y − PY||² / (n − m) • If this assumption does not hold, LARS is unlikely to produce useful results
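A minimal numpy sketch (illustrative only, not from the discussion) of the quantities above: Ŷ = PY is simply the least squares fit, and the residual variance estimate uses n − m degrees of freedom.

import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 5
X = rng.normal(size=(n, m))
Y = X @ rng.normal(size=m) + rng.normal(size=n)

# Y_hat = PY: the projection of Y onto the column space of X, via least squares
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ beta_hat

# Residual variance estimate with n - m degrees of freedom
sigma2_hat = np.sum((Y - Y_hat) ** 2) / (n - m)
print(sigma2_hat)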

  4. Implicit Assumption (cont) • Alternative: let F(y|x) = F(y|x’B), where B is an m x d matrix of rank d; the smallest such d is called the structural dimension of the regression problem • The R package dr can be used to estimate d with methods such as sliced inverse regression • One can then fit a smooth function of the resulting d projections x’B • In the paper the predictors are expanded from 10 to 64 (the quadratic model) so that F(y|x) = F(y|x’β) can hold
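The dr package mentioned above is an R tool; as a rough illustration of what sliced inverse regression computes, here is a hedged numpy sketch (the function name, slicing scheme, and defaults are my own, not taken from the discussion or from dr).

import numpy as np

def sir_directions(X, y, n_slices=10, d=1):
    # Sliced inverse regression: estimate d dimension-reduction directions,
    # i.e. columns of a matrix B with, approximately, F(y|x) = F(y|x'B).
    n, m = X.shape
    Xc = X - X.mean(axis=0)
    # Whiten the predictors: Z = Xc Sigma^{-1/2}
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / n)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt
    # Slice the response into roughly equal-count slices
    slices = np.array_split(np.argsort(y), n_slices)
    # Weighted covariance of the within-slice means of Z
    M = np.zeros((m, m))
    for idx in slices:
        mh = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(mh, mh)
    # Leading eigenvectors of M, mapped back to the original predictor scale
    _, vecs = np.linalg.eigh(M)
    return inv_sqrt @ vecs[:, -d:]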

  5. Implicit Assumption (cont) • LARS relies too much on correlations • Correlation measures the degree of linear association (obviously) • Using it sensibly requires linearity in the conditional distributions of y and of a’x and b’x for all a and b; otherwise bizarre results can arise • Any method that replaces Y by PY cannot be sensitive to nonlinearity (illustrated below)
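A tiny illustration (my own, not from the discussion) of the last point: a purely nonlinear signal has essentially zero correlation with the predictor, so any method built on PY sees nothing.

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y = x ** 2 + 0.1 * rng.normal(size=5000)   # strong dependence, no linear trend

print(np.corrcoef(x, y)[0, 1])   # roughly 0
print(np.polyfit(x, y, 1)[0])    # least squares slope, also roughly 0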

  6. Implicit Assumption (cont) • Methods based on PY alone can be strongly influenced by outliers and high-leverage cases • Consider Cp(μ̂) = ||Y − μ̂||²/σ² − n + 2 Σᵢ cov(μ̂ᵢ, yᵢ)/σ² • Estimate σ² by σ̂² = ||Y − PY||²/(n − m) • Thus the ith term is given by (yᵢ − ŷᵢ)²/σ̂² − 1 + 2hᵢ • ŷᵢ is the ith element of PY and hᵢ is the ith leverage, a diagonal element of P

  7. Implicit Assumption (cont) • From the simulation in the article, we can approximate the covariance term by Σᵢ uᵢ, where uᵢ is the ith diagonal of the projection matrix onto the columns of (1, X) at the current step of the algorithm • Thus Cp ≈ Σᵢ [(yᵢ − ŷᵢ)²/σ̂² − 1 + 2uᵢ] • This is the same formula as in an earlier paper by Weisberg, where the fit ŷᵢ is computed from LARS instead of a projection

  8. Implicit Assumption (cont) • The value of the ith term depends on the agreement between yᵢ and ŷᵢ, on the leverage in the subset model, and on the difference in leverage between the full and subset models • Neither of the latter two quantities has much to do with the problem of interest (the study of the conditional distribution of y given x); they are determined by the predictors only
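A hedged sketch of the per-case allocation as reconstructed above (the formula (yᵢ − ŷᵢ)²/σ̂² − 1 + 2uᵢ is my reading of the missing equations; the function name and toy numbers are placeholders, not from the discussion).

import numpy as np

def cp_terms(y, yhat, u, sigma2_hat):
    # Per-case contributions to Cp; unusually large values flag cases that
    # dominate the criterion, often high-leverage or outlying observations.
    return (y - yhat) ** 2 / sigma2_hat - 1.0 + 2.0 * u

y = np.array([1.0, 2.0, 0.5, 3.0])
yhat = np.array([0.9, 2.2, 0.7, 1.5])      # fitted values at the current step
u = np.array([0.10, 0.15, 0.05, 0.60])     # leverages; the last case is extreme
print(cp_terms(y, yhat, u, sigma2_hat=0.25))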

  9. Selecting Variables • We want to decompose x into two parts, xu and xa, where xa represents the active predictors • We want the smallest xa such that F(y|x) = F(y|xa), usually chosen by optimizing some selection criterion • Standard methods are too greedy • LARS permits highly correlated predictors to be used

  10. Selecting Variables (cont) • Example illustrating a problem with LARS • Nine new variables were added by multiplying original variables by 2.2 and rounding to the nearest integer • The LARS method was applied to both sets of variables • LARS selects two of the rounded variables; for one predictor (BP) both the original and the rounded version are selected (a rough reconstruction of this experiment follows below)
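The sketch below is a rough, hedged reconstruction of this experiment (not the authors' code), using scikit-learn's copy of the diabetes data: the 2.2 multiplier and rounding follow the slide, while rounding all ten predictors, standardizing the columns, and using lars_path are my own simplifications; scaled=False needs scikit-learn 1.1 or later.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import lars_path

# Raw (unscaled) diabetes predictors, 442 x 10
X, y = load_diabetes(return_X_y=True, scaled=False)

# Rounded near-copies of the predictors, as described on the slide
X_round = np.round(2.2 * X)

# Candidate set: the originals followed by their rounded versions
X_all = np.column_stack([X, X_round])
X_all = (X_all - X_all.mean(0)) / X_all.std(0)   # put everything on one scale

alphas, active, coefs = lars_path(X_all, y, method="lar")
print("entry order:", active)   # indices 10-19 are the rounded copies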

  11. Selecting Variables (cont) • Inclusion or exclusion of a variable depends on the marginal distribution of x as much as on the conditional distribution of y|x • Example: two variables are highly correlated and LARS selects one of them for its active set • Now modify the other so that it is uncorrelated with the first • This does not change y|x; it changes only the marginal distribution of x • Yet it could change the set of active predictors selected by LARS, or by any other method that uses correlation (simulated below)
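An illustrative simulation (my own construction, not from the discussion): the conditional law of y given the predictors is unchanged, but altering the marginal design of the second predictor changes where it enters the LARS path.

import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95 ** 2) * rng.normal(size=n)   # corr(x1, x2) ~ 0.95
y = x1 + x2 + rng.normal(size=n)

# Reparameterize: replace x2 by its component orthogonal to x1.  This changes
# only the marginal design of the predictors, not the information carried by
# x, so y | x is unchanged.
b = np.dot(x1, x2) / np.dot(x1, x1)
x2_perp = x2 - b * x1

for label, second in [("correlated x2", x2), ("decorrelated x2", x2_perp)]:
    X = np.column_stack([x1, second])
    X = (X - X.mean(0)) / X.std(0)
    alphas, active, _ = lars_path(X, y, method="lar")
    # In the correlated design both predictors enter near the top of the path;
    # after decorrelating, the second predictor enters much further down, so a
    # stopping rule may treat it very differently even though y|x is the same.
    print(label, "entry order:", active, "breakpoints:", np.round(alphas, 2))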

  12. Selecting Variables (cont) • LARS results are invariant under rescaling, but not under reparameterization of related predictors • Scaling the predictors first and then adding all cross-products and quadratics gives a different model than forming those terms first and then scaling • This could be addressed by considering the related terms simultaneously, but doing so is self-defeating for subset selection (see the sketch below)
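A small sketch (my own construction, not from the discussion) of the reparameterization point: standardizing the predictors before forming quadratics and cross-products gives a different design matrix, and possibly a different LARS entry order, than forming the terms first and standardizing afterwards.

import numpy as np
from sklearn.linear_model import lars_path

def standardize(A):
    return (A - A.mean(0)) / A.std(0)

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(loc=1.0, scale=2.0, size=n)
y = x1 + x1 * x2 + rng.normal(size=n)

raw = np.column_stack([x1, x2])

# (a) standardize first, then add the cross-product and quadratics
Z = standardize(raw)
design_a = np.column_stack([Z, Z[:, 0] * Z[:, 1], Z ** 2])

# (b) add the cross-product and quadratics first, then standardize everything
design_b = standardize(np.column_stack([raw, x1 * x2, raw ** 2]))

for label, D in [("scale, then expand", design_a), ("expand, then scale", design_b)]:
    _, active, _ = lars_path(D, y, method="lar")
    print(label, "entry order:", active)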

  13. Summary • Problems gain notoriety because their solution is elusive but of wide interest • Neither LARS nor any other automatic model-selection method considers the context of the problem • There seems to be no foreseeable solution to this problem
