Translating the S-Plus/R Least Angle Regression package to Mata

Translating the S-Plus/R Least Angle Regression package to Mata Adrian Mander MRC-Human Nutrition Research Unit, Cambridge SUG London 2007 Least Angle Regression

Outline • LARS package • Lasso (the constrained OLS) • Forward Stagewise regression • Least Angle Regression • Translating Hastie & Efron’s code from R to Mata • The lars Stata command SUG London 2007 Least Angle Regression

Lasso Let y be the dependent variable and xj be the m covariates The usual linear predictor Want to minimise the squared differences Subject to this constraint, large t gives OLS solution N.B. Ridge regression does constraint on L2 norm SUG London 2007 Least Angle Regression

Lasso graphically • The constraints can be seen below. One property of this constraint is that there will be coefficients =0 for a subset of variables SUG London 2007 Least Angle Regression

Ridge Regression • The constraints can be seen below. The coefficients are shrunk but does not have the property of parsimony SUG London 2007 Least Angle Regression

Forward Stagewise • Using constraints • The function of current correlations is • Move the mean in the direction of the greatest correlation for some small ε • FORWARD STEPWISE is greedy and selects SUG London 2007 Least Angle Regression

Least Angle Regression • The LARS (S suggesting LaSso and Stagewise) • Starts like classic Forward Selection • Find predictor xj1 most correlated with the current residual • Make a step (epsilon) large enough until another predictor xj2 has as much correlation with the current residual • LARS – now step in the direction equiangular between two predictors until xj3 earns its way into the “correlated set” SUG London 2007 Least Angle Regression

Least Angle Regression Geometrically • Two covariates x1 and x2 and the space L(x1 ,x2) that is spanned by them x2 x2 • Start at μ0 =0 • y2 is the projection of y onto L(x1 ,x2) y2 y1 μ0 μ1 x1 SUG London 2007 Least Angle Regression

Continued… The current correlations only depend on the projection of y on L(x1 ,x2) I.e. y2 SUG London 2007 Least Angle Regression

Programming similarities The code comparing Splus to Mata looks incredibly similar SUG London 2007 Least Angle Regression

Programming similarities There are some differences though Array of arrays… beta[[k]] = array Indexing on the left hand side… beta[positive] = beta0 Being able to “join” null matrices. Row and column vectors are not very strict in Splus. Being able to use the minus sign in indexing beta[-positive] “Local”-ness of mata functions within mata functions? Local is from the first call of Mata Not the easiest language to debug when you don’t know what you are doing (thanks to statalist/Kit to push start me). SUG London 2007 Least Angle Regression

Stata command LARS is very simple to use lars y <varlist>, a(lar) lars y <varlist>, a(lasso) lars y <varlist>, a(stagewise) Not everything in the Splus package is implemented because I didn’t have all the data required to test all the code SUG London 2007 Least Angle Regression

Stata command SUG London 2007 Least Angle Regression

Graph output SUG London 2007 Least Angle Regression

Conclusions • Mata could be a little easier to use • Translating Splus code is pretty simple • Least Angle Regression/Lasso/Forward Stagewise are all very attractive algorithms and certainly an improvement over Stepwise. SUG London 2007 Least Angle Regression

Translating the S-Plus/R Least Angle Regression package to Mata