Multiple Regression (SW Ch. 6)


Presentation Transcript


  1. Multiple Regression (SW Ch. 6)
     • Omitted variable bias
     • Causality and regression analysis
     • Multiple regression and OLS
     • Measures of fit
     • Sampling distribution of the OLS estimator
     • Multicollinearity

  2. The Least Squares Assumptions

  3. LSA #1: E(u|X = x) = 0

  4. LSA #2: (Xᵢ, Yᵢ), i = 1,…,n are i.i.d.
     LSA #3: E(X⁴) < ∞ and E(Y⁴) < ∞

  5. Sampling Distribution of β̂₁

  6. Measures of Fit

  7. Measures of Fit

  8. Measures of Fit: example

  9. Measures of Fit
     • Akaike’s Information Criterion (AIC) is an alternative method for adjusting the residual sum of squares for the sample size (n) and number of covariates (k)
     • Is the improved fit “worth” it?
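One way to make the adjustment concrete is sketched below. It assumes the Gaussian log-likelihood form of a linear regression and the convention AIC = −2 ln L + 2p, where p = k + 1 counts the slope coefficients plus the intercept. The symbols SSR (sum of squared residuals), L, and p are notation introduced here, not taken from the slide.

```latex
\begin{align*}
\ln L &= -\frac{n}{2}\left[\ln(2\pi) + \ln\!\left(\frac{SSR}{n}\right) + 1\right] \\
\mathrm{AIC} &= -2\ln L + 2p
  = n\ln\!\left(\frac{SSR}{n}\right) + 2(k+1) + n\left[\ln(2\pi) + 1\right]
\end{align*}
```

Under these assumptions, adding a covariate lowers the first term (SSR cannot rise) but raises the penalty by 2, so AIC falls only when the reduction in n ln(SSR/n) exceeds 2.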

  10. Example: caschool.dta

      . reg testscr str, rob

      Linear regression                                      Number of obs =     420
                                                             F(  1,   418) =   19.26
                                                             Prob > F      =  0.0000
                                                             R-squared     =  0.0512
                                                             Root MSE      =  18.581

      ------------------------------------------------------------------------------
                   |               Robust
           testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
             _cons |    698.933   10.36436    67.44   0.000     678.5602    719.3057
      ------------------------------------------------------------------------------

      . estat ic

      -----------------------------------------------------------------------------
             Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
      -------------+---------------------------------------------------------------
                 . |    420   -1833.296    -1822.25      2     3648.499     3656.58
      -----------------------------------------------------------------------------

  11. Example: caschool.dta

      . reg testscr str el_pct, rob

      Linear regression                                      Number of obs =     420
                                                             F(  2,   417) =  223.82
                                                             Prob > F      =  0.0000
                                                             R-squared     =  0.4264
                                                             Root MSE      =  14.464

      ------------------------------------------------------------------------------
                   |               Robust
           testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               str |  -1.101296   .4328472    -2.54   0.011     -1.95213   -.2504616
            el_pct |  -.6497768   .0310318   -20.94   0.000     -.710775   -.5887786
             _cons |   686.0322   8.728224    78.60   0.000     668.8754     703.189
      ------------------------------------------------------------------------------

      . estat ic

      -----------------------------------------------------------------------------
             Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
      -------------+---------------------------------------------------------------
                 . |    420   -1833.296   -1716.561      3     3439.123    3451.243
      -----------------------------------------------------------------------------
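The AIC and BIC columns in the two `estat ic` tables can be checked by hand. The sketch below is in Python rather than Stata and assumes the usual definitions AIC = −2·ll + 2·df and BIC = −2·ll + df·ln(N). It also rebuilds the log-likelihood from the reported Root MSE, assuming a Gaussian likelihood and Root MSE² = SSR/(n − df). The function names are illustrative, and because Root MSE is rounded in the output, the rebuilt log-likelihood only matches to about two decimals.

```python
import math


def aic(ll, df):
    """Akaike information criterion: -2*ll + 2*df (assumed definition)."""
    return -2.0 * ll + 2.0 * df


def bic(ll, df, n):
    """Bayesian information criterion: -2*ll + df*ln(n) (assumed definition)."""
    return -2.0 * ll + df * math.log(n)


def gaussian_loglik(root_mse, n, df):
    """Gaussian log-likelihood of a linear regression, rebuilt from Root MSE.

    Assumes root_mse**2 = SSR / (n - df), so SSR = root_mse**2 * (n - df).
    """
    ssr = root_mse ** 2 * (n - df)
    return -0.5 * n * (math.log(2.0 * math.pi) + math.log(ssr / n) + 1.0)


N_OBS = 420
# ll, df and Root MSE copied from the two Stata outputs above.
MODELS = {
    "testscr ~ str": {"ll": -1822.25, "df": 2, "root_mse": 18.581},
    "testscr ~ str + el_pct": {"ll": -1716.561, "df": 3, "root_mse": 14.464},
}

if __name__ == "__main__":
    for name, m in MODELS.items():
        print(
            name,
            "AIC:", round(aic(m["ll"], m["df"]), 3),
            "BIC:", round(bic(m["ll"], m["df"], N_OBS), 3),
            "ll from Root MSE:", round(gaussian_loglik(m["root_mse"], N_OBS, m["df"]), 2),
        )
```

Both criteria fall by roughly 200 when el_pct is added, far more than the extra penalty for one more coefficient, so by these measures the improved fit is "worth" it.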

  12. The Least Squares Assumptions for Multiple Regression

  13. “.” treated as +∞ in STATA

      . gen incq1 = 1 if avginc <10.639
      (314 missing values generated)
      . replace incq1 = 0 if avginc>=10.639 & avginc < .
      (314 real changes made)
      . gen incq2 = 1 if avginc < 13.727 & avginc >=10.639
      (316 missing values generated)
      . replace incq2 = 0 if avginc < 10.639 & avginc >= 13.727 & avginc < .
      (0 real changes made)
      . replace incq2 = 0 if avginc < 10.639 | (avginc >= 13.727 & avginc < .)
      (316 real changes made)
      . gen incq3 = 1 if avginc < 17.638 & avginc >=13.727
      (315 missing values generated)
      . replace incq3 = 0 if avginc < 13.727 | (avginc >= 17.638 & avginc < .)
      (315 real changes made)
      . gen incq4 = 1 if avginc >= 17.638 & avginc < .
      (315 missing values generated)
      . replace incq4 = 0 if avginc < 17.638
      (315 real changes made)
      . gen testdum = incq1 + incq2 + incq3 + incq4
      . sum avginc inc* testdum

          Variable |       Obs        Mean    Std. Dev.       Min        Max
      -------------+--------------------------------------------------------
            avginc |       420    15.31659     7.22589      5.335     55.328
             incq1 |       420     .252381    .4348967          0          1
             incq2 |       420     .247619    .4321441          0          1
             incq3 |       420         .25    .4335291          0          1
             incq4 |       420         .25    .4335291          0          1
      -------------+--------------------------------------------------------
           testdum |       420           1           0          1          1
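The reason for the `& avginc < .` guards can be illustrated outside Stata. The Python sketch below uses `math.inf` as a stand-in for Stata's missing value ".", on the assumption (taken from the slide title) that "." compares as larger than every number. The cut points are copied from the log. The function names and the sample values 12.0 and 15.0 are hypothetical; 5.335 and 55.328 are the reported minimum and maximum of avginc.

```python
import math

# Assumption: math.inf stands in for Stata's numeric missing value ".",
# which Stata orders above every nonmissing number.
MISSING = math.inf

# Quartile cut points of avginc, copied from the slide.
CUTS = (10.639, 13.727, 17.638)


def top_quartile_unguarded(avginc):
    """Mimics `gen incq4 = 1 if avginc >= 17.638` without `& avginc < .`."""
    return 1 if avginc >= CUTS[2] else None


def quartile_dummies(avginc):
    """Return (incq1, incq2, incq3, incq4); all None when avginc is missing."""
    if avginc == MISSING:
        return (None, None, None, None)
    q1, q2, q3 = CUTS
    return (
        int(avginc < q1),
        int(q1 <= avginc < q2),
        int(q2 <= avginc < q3),
        int(avginc >= q3),
    )


if __name__ == "__main__":
    for value in (5.335, 12.0, 15.0, 55.328, MISSING):
        print(value, top_quartile_unguarded(value), quartile_dummies(value))
```

Without the guard, a missing income would be coded into the top quartile. With the guard, every nonmissing observation gets exactly one dummy equal to 1, which is what the `testdum` row (mean 1, standard deviation 0) confirms for the 420 districts.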

  14. Dummy Variable Trap

      . reg testscr str incq1 incq2 incq3 incq4, robust
      note: incq3 omitted because of collinearity

      Linear regression                                      Number of obs =     420
                                                             F(  4,   415) =   72.03
                                                             Prob > F      =  0.0000
                                                             R-squared     =  0.4468
                                                             Root MSE      =   14.24

      ------------------------------------------------------------------------------
                   |               Robust
           testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               str |  -1.417963    .400663    -3.54   0.000    -2.205545   -.6303814
             incq1 |  -16.97711   1.953708    -8.69   0.000    -20.81751   -13.13672
             incq2 |  -6.795768    1.83231    -3.71   0.000    -10.39753   -3.194003
             incq3 |  (omitted)
             incq4 |   16.17749   1.880508     8.60   0.000     12.48098    19.87399
             _cons |    683.929   8.136528    84.06   0.000     667.9351     699.923
      ------------------------------------------------------------------------------

      • Solution #1 is to …
      • Interpretation is then …

  15. Dummy Variable Trap

      . reg testscr str incq1 incq2 incq3 incq4, robust noconstant

      Linear regression                                      Number of obs =     420
                                                             F(  5,   415) =       .
                                                             Prob > F      =  0.0000
                                                             R-squared     =  0.9995
                                                             Root MSE      =   14.24

      ------------------------------------------------------------------------------
                   |               Robust
           testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               str |  -1.417963    .400663    -3.54   0.000    -2.205545   -.6303814
             incq1 |   666.9519   7.862759    84.82   0.000     651.4961    682.4077
             incq2 |   677.1333   7.931178    85.38   0.000      661.543    692.7236
             incq3 |    683.929   8.136528    84.06   0.000     667.9351     699.923
             incq4 |   700.1065   8.014253    87.36   0.000     684.3529    715.8601
      ------------------------------------------------------------------------------

      • Solution #2 is to …
      • Interpretation is then …
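The two Dummy Variable Trap outputs appear to describe the same fitted model in two parameterizations: the `str` coefficient and Root MSE are identical, and each no-constant dummy coefficient equals `_cons` plus the matching coefficient from the regression with a constant. The Python sketch below checks that arithmetic using only the reported coefficients. Treating the omitted `incq3` as having coefficient 0 (the base group) is an interpretation of the output, not something the slides state, and the function names are illustrative.

```python
# Coefficients copied from the regression with a constant (incq3 omitted).
WITH_CONSTANT = {
    "_cons": 683.929,
    "incq1": -16.97711,
    "incq2": -6.795768,
    "incq3": 0.0,  # omitted by Stata; read here as the base group (assumption)
    "incq4": 16.17749,
}

# Coefficients copied from the regression run with `noconstant`.
NO_CONSTANT = {
    "incq1": 666.9519,
    "incq2": 677.1333,
    "incq3": 683.929,
    "incq4": 700.1065,
}


def implied_group_intercepts(coefs):
    """Add the constant to each dummy coefficient to get a per-group intercept."""
    base = coefs["_cons"]
    return {name: base + value for name, value in coefs.items() if name != "_cons"}


def max_abs_gap(a, b):
    """Largest absolute difference between two coefficient dictionaries."""
    return max(abs(a[key] - b[key]) for key in b)


if __name__ == "__main__":
    implied = implied_group_intercepts(WITH_CONSTANT)
    for name in sorted(NO_CONSTANT):
        print(name, round(implied[name], 4), NO_CONSTANT[name])
    print("largest gap:", max_abs_gap(implied, NO_CONSTANT))
```

On this reading, the dummy coefficients in the regression with a constant are differences from the omitted quartile, while the no-constant coefficients are the quartile-specific intercepts themselves.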

  16. The Sampling Distribution of the OLS Estimator in Multiple Regression

  17. Imperfect Multicollinearity

  18. Detection and Remedies for Imperfect Multicollinearity
      • Detection
        • calculate all the pairwise correlation coefficients
          • > .7 or .8 is some cause for concern
        • Variance Inflation Factors (VIFs) can be calculated
        • Hallmark is high R² but insignificant t-statistics
      • Remedy
        • Do nothing
        • Drop a variable
        • Transform multicollinear variables
          • need to have same sign and magnitudes
        • Get more data (i.e., increase the sample size)
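The two detection tools can be sketched for the simplest case of two regressors, where the auxiliary R² from regressing one regressor on the other equals the squared pairwise correlation, so VIF = 1 / (1 − r²). The Python below is a minimal illustration under that assumption. The data are made up, the function names are hypothetical, and the 0.7 default threshold comes from the slide's rule of thumb.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_regressors(x1, x2):
    """VIF when there are exactly two regressors: 1 / (1 - r**2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


def is_cause_for_concern(x1, x2, threshold=0.7):
    """Flag a pair whose absolute correlation exceeds the rule-of-thumb threshold."""
    return abs(pearson(x1, x2)) > threshold


if __name__ == "__main__":
    # Hypothetical, nearly collinear regressors.
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
    print("r   =", pearson(x1, x2))
    print("VIF =", vif_two_regressors(x1, x2))
    print("flag:", is_cause_for_concern(x1, x2))
```

With more than two regressors, each VIF needs the R² from regressing that regressor on all the others, which this sketch does not cover.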
