Adjusted R2, Residuals, and Review

Presentation Transcript


  1. Adjusted R2, Residuals, and Review • Adjusted R2 • Residual Analysis • Stata Regression Output revisited • The Overall Model • Analyzing Residuals • Review for Exam 2

  2. Exercise Review • Use the caschool.dta dataset • Run a model in Stata using Average Income (avginc) to predict Average Test Scores (testscr) • Examine the univariate distributions of both variables and the residuals • Walk through the entire interpretation • Build a Stata do-file as you go
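The exercise steps can be sketched as a short do-file. This is a minimal sketch rather than the instructor's own file: it assumes caschool.dta sits in the current working directory, and the graph names (h_testscr, h_avginc, h_e) are illustrative. The variable names yhat and e are chosen to match the listing on slide 16.

```stata
* Exercise sketch: regress average test scores on average income.
* Assumption: caschool.dta is in the current working directory.
use caschool.dta, clear

* Univariate distributions of both variables
summarize testscr avginc, detail
histogram testscr, normal name(h_testscr, replace)
histogram avginc, normal name(h_avginc, replace)

* Fit the bivariate model
regress testscr avginc

* Save fitted values and residuals
predict yhat, xb
predict e, residuals

* Univariate distribution of the residuals
summarize e, detail
histogram e, normal name(h_e, replace)
```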

  3. Exercise Review, continued

  4. Exercise Review, continued

  5. Adjusted R2: An Alternative “Goodness of Fit” Measure • Recall that R2 is calculated as: $R^2 = \mathrm{ESS}/\mathrm{TSS}$, where ESS is the “explained sum of squares” and TSS is the “total sum of squares” • Hypothetically, as K approaches n, R2 approaches one (why?) – “degrees of freedom” • Adjusted R2 compensates for that tendency
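One way to check the ratio is to rebuild R2 from the sums of squares that Stata stores after regress. The sketch below assumes caschool.dta is in the working directory. In Stata's stored results, e(mss) is the model (explained) sum of squares and e(rss) is the residual sum of squares, so their sum is the total. On the "why?": with as many estimated coefficients as observations, the model can pass through every data point, so the residual sum of squares shrinks toward zero.

```stata
* Assumption: caschool.dta is in the current working directory.
use caschool.dta, clear
quietly regress testscr avginc

* R2 rebuilt as explained sum of squares over total sum of squares
display "ESS / TSS  = " e(mss) / (e(mss) + e(rss))

* R2 as reported by regress
display "R2 (Stata) = " e(r2)
```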

  6. Calculating Adjusted R2 • The bigger the sample size (n), the smaller the adjustment • The more complex the model (the bigger K is), the larger the adjustment • The bigger R2 is, the smaller the adjustment
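The slide's own formula did not survive in the transcript. A standard form that is consistent with all three bullets is given below, under the assumption that K counts every estimated coefficient, including the intercept (some texts count only the slopes, which turns the denominator into n - K - 1).

```latex
\[
\bar{R}^2 \;=\; 1 - (1 - R^2)\,\frac{n - 1}{n - K}
          \;=\; R^2 \;-\; \underbrace{(1 - R^2)\,\frac{K - 1}{n - K}}_{\text{adjustment}}
\]
```

Read off the adjustment term: it shrinks as n grows, grows as K grows, and shrinks as R2 approaches one. After regress, Stata reports this quantity as "Adj R-squared" and stores it in e(r2_a).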

  7. Residual Analysis: Troubleshooting • Conceptual use of residuals • e, or what the model can’t explain • Visual Diagnostics • Ideal: a “Sneeze plot” • Diagnostics using Residual Plots: • Checking for heteroscedasticity • Checking for non-linearity • Checking for outliers • Saving and Analyzing Residuals in Stata

  8. Review: Assumptions Necessary for Estimating Linear Models 1. Errors have identical distributions: zero mean, same variance, across the range of X 2. Errors are independent of X and other ei 3. Errors are normally distributed [Figure axes: ei against X, with a line labeled ei=0]
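The errors themselves are never observed, so in practice these assumptions are examined through the residuals. The following is a rough sketch of such checks, not a procedure taken from the slides; it assumes caschool.dta is in the working directory and uses the residual name e from slide 16.

```stata
* Assumption: caschool.dta is in the current working directory.
use caschool.dta, clear
quietly regress testscr avginc
predict e, residuals

* With a constant in the model, OLS residuals average to zero by construction,
* so the skewness and kurtosis are the informative parts of this summary
summarize e, detail

* Normality: quantile-normal plot and Shapiro-Wilk test
qnorm e
swilk e

* Same spread across the range of X: residuals against the predictor
scatter e avginc, yline(0)
```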

  9. The Ideal: Sneeze Splatter [Plot: e against Predicted Y] Problems: It is possible to “over-interpret” residual plots; it is also possible to miss patterns when there are large numbers of observations

  10. Heteroscedasticity [Plot: e against Predicted Y] Problem: Error variance is not constant, so the usual standard errors are unreliable and hypothesis tests are invalid

  11. Non-Linearity [Plot: e against Predicted Y] Problem: Biased estimated coefficients, inefficient model

  12. Checking for Outliers [Plots of e against Predicted Y: “Residuals for model using all data,” annotated “Possible Outliers,” and “Residuals for model with outliers deleted”] Problem: Under-specified model; measurement error

  13. Stata Regression Model: Regressing “testscr” onto “avginc”

  14. Regression Plot (again)
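The plot itself is not part of the transcript. One way to draw a comparable scatterplot with the fitted line is sketched here, assuming caschool.dta is in the working directory; the axis titles are illustrative.

```stata
* Assumption: caschool.dta is in the current working directory.
use caschool.dta, clear

* Scatterplot of the data with the OLS fitted line overlaid
twoway (scatter testscr avginc) (lfit testscr avginc), ///
    ytitle("Average test score (testscr)") ///
    xtitle("Average income (avginc)")
```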

  15. Residual Plot

  16. Examination of Residuals: Use the case ID number to find the relevant observation in the data set. Sort with gsort e (ascending) or gsort -e (descending); the listing below puts the largest positive residuals first, which is the descending order.

        gsort -e
        list observat testscr avginc yhat e in 1/5

             +---------------------------------------------------+
             | observat   testscr   avginc       yhat          e |
             |---------------------------------------------------|
          1. |      393     683.4   13.567   650.8699   32.53016 |
          2. |      386     681.6   14.177   652.0157    29.5842 |
          3. |      419     672.2    9.952   644.0789   28.12111 |
          4. |      366     675.7   11.834   647.6143   28.08568 |
          5. |      371    676.95   12.934   649.6807   27.26921 |
             +---------------------------------------------------+

  17. Residuals v. Predicted Values Using an “ocular test,” non-linearity seems probable, but heteroscedasticity is not obvious here. But should we trust our eyeballs?
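A residual-versus-fitted plot of this kind can be produced right after regress. The sketch below assumes caschool.dta is in the working directory. The lowess smooth is an addition that is not on the slide, offered only as an aid to the eye when judging curvature.

```stata
* Assumption: caschool.dta is in the current working directory.
use caschool.dta, clear
quietly regress testscr avginc

* Residuals against predicted values, with a reference line at zero
rvfplot, yline(0)

* Optional visual aid: smooth the residuals over the fitted values
predict yhat, xb
predict e, residuals
lowess e yhat
```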

  18. Formal Test for Non-linearity: Omitted Variables. Tests whether adding 2nd, 3rd and 4th powers of X will improve the fit of the model: $Y = b_0 + b_1 X + b_2 X^2 + b_3 X^3 + b_4 X^4 + e$
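In Stata this description appears to match the Ramsey RESET test run with the rhs option, which uses powers of the right-hand-side variables; without that option, estat ovtest uses powers of the fitted values instead. The sketch below assumes caschool.dta is in the working directory, and the variable names avginc_p2 to avginc_p4 are made up for illustration. The second half spells out the same idea by hand.

```stata
* Assumption: caschool.dta is in the current working directory.
use caschool.dta, clear
quietly regress testscr avginc

* Ramsey RESET using powers of the right-hand-side variable
estat ovtest, rhs

* By hand: add the 2nd, 3rd and 4th powers of avginc and test them jointly
generate avginc_p2 = avginc^2
generate avginc_p3 = avginc^3
generate avginc_p4 = avginc^4
regress testscr avginc avginc_p2 avginc_p3 avginc_p4
test avginc_p2 avginc_p3 avginc_p4
```

A small p-value on the joint test suggests the linear specification is missing curvature.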

  19. Formal Tests for Heteroscedasticity. Tests to see whether the squared standardized residuals are linearly related to the predicted value of Y: $\mathrm{std}(e^2) = b_0 + b_1(\text{Predicted } Y)$
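The description appears to correspond to the Breusch-Pagan / Cook-Weisberg test, which estat hettest runs by default against the fitted values of the dependent variable. That mapping is an inference from the slide text, since the slide's output is not in the transcript. The sketch assumes caschool.dta is in the working directory. White's test is included as a common alternative that the slide does not mention.

```stata
* Assumption: caschool.dta is in the current working directory.
use caschool.dta, clear
quietly regress testscr avginc

* Breusch-Pagan / Cook-Weisberg test; H0: constant error variance
estat hettest

* Alternative not mentioned on the slide: White's general test
estat imtest, white
```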

  20. Case-wise Influence Analysis The Leverage versus Squared Residual Plot
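The plot named on this slide is available as a postestimation command. The sketch below assumes caschool.dta is in the working directory. Labeling points with observat follows the case ID used on slide 16. The variable names lev and cook, and the follow-up listing by Cook's distance, are additions for illustration.

```stata
* Assumption: caschool.dta is in the current working directory.
use caschool.dta, clear
quietly regress testscr avginc

* Leverage versus squared normalized residuals, labeled by case ID
lvr2plot, mlabel(observat)

* Save leverage and Cook's distance, then list the most influential cases
predict lev, leverage
predict cook, cooksd
gsort -cook
list observat testscr avginc lev cook in 1/5
```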

  21. What to Do? • Nonlinearity • Polynomial regression: try X and $X^2$ • Variable transformation: logged variables • Use non-OLS regression (curve fitting) • Heteroscedasticity • Re-specify model • Omitted variables? • Use non-OLS regression (WLS) • Use robust standard errors • Influential and Deviant Cases • Evaluate the cases • Run with controls (multivariate model) • Omit cases (last option)
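Several of these remedies take only a line or two in Stata. The sketch below assumes caschool.dta is in the working directory; the variable names avginc_sq and ln_avginc are illustrative. Case 393 is simply the largest-residual case from the slide 16 listing, used here as an example rather than as a case known to be influential. WLS is left out because it needs a weight variable that the slides do not specify.

```stata
* Assumption: caschool.dta is in the current working directory.
use caschool.dta, clear

* Nonlinearity: polynomial regression with X and X^2
generate avginc_sq = avginc^2
regress testscr avginc avginc_sq

* Nonlinearity: variable transformation (natural log of income)
generate ln_avginc = ln(avginc)
regress testscr ln_avginc

* Heteroscedasticity: same OLS coefficients, robust standard errors
regress testscr avginc, vce(robust)

* Deviant cases (last option): refit without one case and compare results
* 393 is an illustrative choice taken from the slide 16 listing
regress testscr avginc if observat != 393
```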

  22. Next Week • Review regression diagnostics • Introduction to Matrix Algebra • Review for Exam
