Anareg week11


Presentation Transcript


  1. Anareg week11 • Regression diagnostics

  2. Regression Diagnostics • Partial regression plots • Studentized deleted residuals • Hat matrix diagonals • DFFITS, Cook’s D, DFBETAS • Variance inflation factor • Tolerance

  3. NKNW Example • NKNW p 389, section 11.1 • Y is amount of life insurance • X1 is average annual income • X2 is a risk aversion score • n = 18 managers

  4. Partial regression plots • Also called added variable plots or adjusted variable plots • One plot for each Xi

  5. Partial regression plots (2) • Consider X1 • Use the other X’s to predict Y • Use the other X’s to predict X1 • Plot the residuals from the first regression vs the residuals from the second regression

  6. Partial regression plots (3) • These plots can detect • Nonlinear relationships • Heterogeneous variances • Outliers

  7. Output
       Source     DF   F Value   Pr > F
       Model       2    542.33   <.0001
       Error      15
       C Total    17
       Root MSE 12.66267   R-Square 0.9864

  8. Output (2)
       Variable    Parameter Estimate   Standard Error   t Value   Pr > |t|
       Intercept   -205.72              11               -18.06    <.0001
       income      6.288                0.20             30.80     <.0001
       risk        4.738                1.3              3.44      0.0037

  9. Plot the residuals vs each independent variable • From the regression of Y on X1 and X2, we plot the residuals against each independent variable. • The plot of the residuals against X1 indicates a curvilinear effect. • Therefore, we need to check further by looking at the partial regression plots.

  10. Plot the residuals vs Risk

  11. Plot the residuals vs income

  12. The partial regression plots • To generate the partial regression plots • Regress Y and X1 each on X2. • Get the residual from each regression namely e(Y|X2) and e(X1|X2) • Plot e(Y|X2) against e(X1|X2) • Do the same for Y and X2 each on X1.
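The steps on this slide can be sketched in a few lines of NumPy. This is only an illustration: the data are synthetic stand-ins (not the NKNW insurance data), and the helper name `residuals` is made up for the example. Plotting `e_y_x2` against `e_x1_x2` gives the partial regression plot for X1; the slope of a line through those points matches the X1 coefficient from the full regression.

```python
import numpy as np

def residuals(y, X):
    """Residuals from a least-squares fit of y on X (X includes the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Synthetic stand-in data (assumed for illustration, NOT the NKNW data).
rng = np.random.default_rng(0)
n = 18
x1 = rng.uniform(30, 80, n)   # plays the role of income
x2 = rng.uniform(1, 10, n)    # plays the role of risk aversion
y = -200 + 6 * x1 + 5 * x2 + rng.normal(0, 10, n)

ones = np.ones(n)
e_y_x2 = residuals(y, np.column_stack([ones, x2]))    # e(Y|X2)
e_x1_x2 = residuals(x1, np.column_stack([ones, x2]))  # e(X1|X2)

# Slope of the least-squares line through the partial regression plot for X1.
partial_slope = (e_x1_x2 @ e_y_x2) / (e_x1_x2 @ e_x1_x2)

# Coefficient of X1 in the full regression of Y on X1 and X2.
b_full, *_ = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)
```

Swapping the roles of `x1` and `x2` gives the plot for X2.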

  13. The partial regression plots (2)

  14. The partial regression plots(3)

  15. Residuals • There are several versions • Residuals: ei = Yi – Ŷi • Semistudentized residuals: ei / √MSE • Deleted residuals: di = ei / (1 – hii), where hii is the leverage • Studentized deleted residuals: di* = di / s(di) • where s²(di) = MSE(i) / (1 – hii) • or equivalently di* = ei / √(MSE(i) (1 – hii))

  16. Residuals (2) • We use the notation (i) to indicate that case i has been deleted from the computations • X(i) is the X matrix with case i deleted • MSE(i) is the MSE with case i deleted
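A rough sketch of these residual versions, on synthetic data (an assumption; not the NKNW data). It uses the standard identity SSE(i) = SSE – ei² / (1 – hii), so that MSE(i) = SSE(i) / (n – p – 1) is available without refitting, and then checks case 0 against an actual refit with that case deleted.

```python
import numpy as np

# Synthetic stand-in data (assumed for illustration).
rng = np.random.default_rng(1)
n, p = 18, 3
X = np.column_stack([np.ones(n), rng.uniform(30, 80, n), rng.uniform(1, 10, n)])
y = X @ np.array([-200.0, 6.0, 5.0]) + rng.normal(0, 10, n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
h = np.diag(H)                          # leverages h_ii
e = y - H @ y                           # ordinary residuals e_i
sse = e @ e
mse = sse / (n - p)

e_over_root_mse = e / np.sqrt(mse)               # e_i / sqrt(MSE)
d = e / (1 - h)                                  # deleted residuals d_i
mse_del = (sse - e**2 / (1 - h)) / (n - p - 1)   # MSE(i) for every case
t = e / np.sqrt(mse_del * (1 - h))               # studentized deleted residuals

# Brute-force check for case 0: refit with the case removed.
keep = np.arange(n) != 0
b_0, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
d_brute = y[0] - X[0] @ b_0
res_0 = y[keep] - X[keep] @ b_0
mse_brute = res_0 @ res_0 / (n - 1 - p)
```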

  17. Residuals (3) • When we examine the residuals we are looking for • Outliers • Non normal error distributions • Influential observations

  18. Hat matrix diagonals • hii is a measure of how much Yi is contributing to the prediction Ŷi • Ŷ1 = h11Y1 + h12Y2 + h13Y3 + … • hii is sometimes called the leverage of the ith observation

  19. Hat matrix diagonals (2) • 0 ≤ hii ≤ 1 • Σhii = p, the number of regression parameters (including the intercept) • We would like hii to be small • The average value is p/n • Values far from this average point to cases that should be examined carefully
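The properties above are easy to verify numerically. The sketch below uses synthetic predictors (an assumption) and obtains the leverages from a thin QR decomposition, since H = QQᵀ. The 2p/n cutoff is a commonly quoted rule of thumb, not something stated on the slide.

```python
import numpy as np

# Synthetic design matrix (assumed for illustration): intercept plus two predictors.
rng = np.random.default_rng(2)
n = 18
X = np.column_stack([np.ones(n), rng.uniform(30, 80, n), rng.uniform(1, 10, n)])
p = X.shape[1]

# Thin QR: H = Q Q^T, so h_ii is the squared length of row i of Q.
Q, _ = np.linalg.qr(X)
h = np.sum(Q**2, axis=1)

mean_leverage = p / n
# Cases whose leverage exceeds twice the average (rule of thumb, an assumption here).
flagged = np.where(h > 2 * mean_leverage)[0]
```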

  20. Hat diagonals
        Obs   Hat Diag H
        1     0.0693
        2     0.1006
        3     0.1890
        4     0.1316
        5     0.0756

  21. DFFITS • A measure of the influence of case i on Ŷi • It is a standardized version of the difference between Ŷi computed with and without case i • It is closely related to hii
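As a sketch of the link to hii: the usual definition is DFFITSi = (Ŷi – Ŷi(i)) / √(MSE(i) hii), which reduces to the studentized deleted residual times √(hii / (1 – hii)). The code below (synthetic data, hypothetical function name `dffits`) computes the shortcut and compares it with the definition for case 0.

```python
import numpy as np

def dffits(X, y):
    """DFFITS for every case, via t_i * sqrt(h_ii / (1 - h_ii))."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    e = y - H @ y
    mse_del = (e @ e - e**2 / (1 - h)) / (n - p - 1)   # MSE(i)
    t = e / np.sqrt(mse_del * (1 - h))                 # studentized deleted residuals
    return t * np.sqrt(h / (1 - h))

# Synthetic stand-in data (assumed for illustration).
rng = np.random.default_rng(3)
n, p = 18, 3
X = np.column_stack([np.ones(n), rng.uniform(30, 80, n), rng.uniform(1, 10, n)])
y = X @ np.array([-200.0, 6.0, 5.0]) + rng.normal(0, 10, n)

values = dffits(X, y)

# Definition-based check for case 0: fit with and without the case.
keep = np.arange(n) != 0
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b_0, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
res_0 = y[keep] - X[keep] @ b_0
mse_0 = res_0 @ res_0 / (n - 1 - p)
h_0 = X[0] @ np.linalg.solve(X.T @ X, X[0])
direct = (X[0] @ b - X[0] @ b_0) / np.sqrt(mse_0 * h_0)
```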

  22. Cook’s Distance • A measure of the influence of case i on all of the Ŷi’s • It is a standardized version of the sum of squares of the differences between the predicted values computed with and without case i
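In symbols, the usual form is Di = Σj (Ŷj – Ŷj(i))² / (p · MSE), which simplifies to ei² hii / (p · MSE · (1 – hii)²). A minimal sketch on synthetic data (the function name `cooks_distance` is made up for the example), with a definition-based check for case 0:

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's D for every case, using the closed form in e_i and h_ii."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    e = y - H @ y
    mse = e @ e / (n - p)
    return e**2 * h / (p * mse * (1 - h) ** 2)

# Synthetic stand-in data (assumed for illustration).
rng = np.random.default_rng(4)
n, p = 18, 3
X = np.column_stack([np.ones(n), rng.uniform(30, 80, n), rng.uniform(1, 10, n)])
y = X @ np.array([-200.0, 6.0, 5.0]) + rng.normal(0, 10, n)

D = cooks_distance(X, y)

# Definition-based check for case 0: compare all fitted values with and without it.
keep = np.arange(n) != 0
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b_0, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
shift = X @ b - X @ b_0                 # change in every fitted value
resid = y - X @ b
mse = resid @ resid / (n - p)
D_direct = shift @ shift / (p * mse)
```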

  23. DFBETAS • A measure of the influence of case i on each of the regression coefficients • It is a standardized version of the difference between the regression coefficient computed with and without case i
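A brute-force sketch of this definition, assuming the usual standardization DFBETASk(i) = (bk – bk(i)) / √(MSE(i) ckk), where ckk is the kth diagonal element of (X′X)⁻¹. The data are synthetic and the function name `dfbetas` is hypothetical; refitting n times is fine for a data set of this size.

```python
import numpy as np

def dfbetas(X, y):
    """Leave-one-out DFBETAS; row i holds the values for case i deleted."""
    n, p = X.shape
    c = np.diag(np.linalg.inv(X.T @ X))            # c_kk
    b, *_ = np.linalg.lstsq(X, y, rcond=None)      # coefficients from all cases
    out = np.empty((n, p))
    for i in range(n):
        keep = np.arange(n) != i
        b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        res_i = y[keep] - X[keep] @ b_i
        mse_i = res_i @ res_i / (n - 1 - p)        # MSE(i)
        out[i] = (b - b_i) / np.sqrt(mse_i * c)
    return out

# Synthetic stand-in data (assumed for illustration).
rng = np.random.default_rng(5)
n, p = 18, 3
X = np.column_stack([np.ones(n), rng.uniform(30, 80, n), rng.uniform(1, 10, n)])
y = X @ np.array([-200.0, 6.0, 5.0]) + rng.normal(0, 10, n)

table = dfbetas(X, y)   # columns: intercept, first predictor, second predictor
```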

  24. Variance Inflation Factor • The VIF is related to the variance of the estimated regression coefficients • We calculate it for each explanatory variable • One suggested rule is that a value of 10 or more indicates excessive multicollinearity

  25. Tolerance • TOL = 1 – R²k • where R²k is the squared multiple correlation obtained in a regression where all other explanatory variables are used to predict Xk • TOL = 1/VIF • Described in comment on p 411

  26. Output (Tolerance)
        Variable    Tolerance
        Intercept   .
        income      0.93524
        risk        0.93524
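The tolerance definition can be sketched directly: regress each Xk on the other explanatory variables and take 1 – R²k. The data below are synthetic (an assumption) and the function name `tolerance` is made up. With only two predictors, both tolerances equal 1 minus the squared correlation between X1 and X2, which is why income and risk share one value in the output; a tolerance of 0.93524 corresponds to a VIF of about 1.07.

```python
import numpy as np

def tolerance(X_pred):
    """TOL_k = 1 - R^2_k for each column of X_pred (predictors only, no intercept column)."""
    n, k = X_pred.shape
    tol = np.empty(k)
    for j in range(k):
        target = X_pred[:, j]
        others = np.column_stack([np.ones(n), np.delete(X_pred, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        sst = np.sum((target - target.mean()) ** 2)
        tol[j] = resid @ resid / sst   # SSE / SSTO = 1 - R^2_k
    return tol

# Synthetic, mildly correlated predictors (assumed for illustration).
rng = np.random.default_rng(6)
n = 18
x1 = rng.uniform(30, 80, n)
x2 = 0.05 * x1 + rng.uniform(1, 10, n)

tol = tolerance(np.column_stack([x1, x2]))
vif = 1.0 / tol
```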

  27. Last slide • Read NKNW Chapter 11
