
Methods Seminar: Heteroskedasticity & Autocorrelation

Kira R. Fabrizio, Fall 2008




Presentation Transcript


  1. Methods Seminar: Heteroskedasticity & Autocorrelation Kira R. Fabrizio Fall 2008

  2. Today’s Agenda • Introduce Heteroskedasticity (H) and Autocorrelation (AC). • For each: • What is it? • Why do we care? • Why does it occur? • What can we do about it? • Application: Data exercise for next class (Nov. 17th)

  3. What are H & AC? • Basic OLS model: Y_i = β_0 + β_1·X_i + e_i • We assume: Var(e_i) = σ² for all i (or E(e_i²) = σ²), and Cov(e_i, e_j) = 0 (or E(e_i·e_j) = 0) for i ≠ j

  4. What are H & AC? • Basic OLS model: Y_i = β_0 + β_1·X_i + e_i • Heteroskedasticity violates the constant-variance assumption: Var(e_i) = σ_i² (or E(e_i²) = σ_i²), with the variance differing across observations

  5. What are H & AC? • Basic OLS model: Y_i = β_0 + β_1·X_i + e_i • Autocorrelation violates the zero-covariance assumption: Cov(e_i, e_j) ≠ 0 (or E(e_i·e_j) ≠ 0) for some i ≠ j

  6. Why do we care? • OLS estimator no longer BLU • Heteroskedasticity: • OLS not the best estimator • Unbiased, but inefficient • Estimated SEs biased (too small if error variance increases w/ X) • Autocorrelation: • OLS not the best estimator • Unbiased, but inefficient • Will underestimate the true variance • SEs smaller than they should be • Reject H0 when you should not • BOTH: Invalid hypothesis testing.
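
The claim that OLS standard errors come out too small can be illustrated with a small Monte Carlo, a sketch with a made-up data-generating process (all numbers below are invented for illustration, not from the slides):

```python
# Monte Carlo sketch: when Var(e_i) grows with X, the classical OLS
# standard error understates the true sampling variation of the slope,
# so t-tests reject H0 too often. The DGP is invented for illustration.
import math
import random

random.seed(42)
x = [i / 10 for i in range(1, 101)]            # fixed regressor, 0.1..10
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

slopes, classical_ses = [], []
for _ in range(500):
    # True model: y = 1 + 2x + e, with sd(e_i) = x_i^2 / 10 (heteroskedastic)
    y = [1 + 2 * xi + random.gauss(0, xi * xi / 10) for xi in x]
    ybar = sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(ei * ei for ei in e) / (n - 2)    # classical sigma^2 estimate
    slopes.append(b)
    classical_ses.append(math.sqrt(s2 / sxx))

mean_b = sum(slopes) / len(slopes)
true_sd = math.sqrt(sum((bi - mean_b) ** 2 for bi in slopes) / len(slopes))
avg_se = sum(classical_ses) / len(classical_ses)
print(f"sd of slope estimates across draws: {true_sd:.4f}")
print(f"average classical SE:               {avg_se:.4f}")
```

With this setup the average reported SE falls below the actual spread of the slope estimates, which is exactly the "SEs smaller than they should be" problem described above: the slope is still unbiased, but the reported precision is overstated.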

  7. When is H likely? • Heteroskedasticity: • When the variance of the dependent variable varies across observations. • Ex: Savings as a function of income. • Average savings increases with income. • But the variability of savings may also increase with income. • Omitted variables that are uncorrelated with the included variables but have differing orders of magnitude across (groups of) observations. • Ex: Cross-sectional data on units of different size (e.g. states, cities). Omitted variables may be larger for more populous states / cities.

  8. Example of H: Professor Pay • Data on 222 professors from 7 schools • Years of experience • Salary • Graph of salary versus years of experience • Graph of ln(salary) versus years of experience

  9. Example of H: Professor Pay • Estimate equation: Y = β_0 + β_1·X + e, where Y = ln(salary) and X = years of work experience • Regression output and graph (not reproduced)

  10. What can we do about it? • In general • Assume nothing is wrong, run OLS model • Examine the residuals • Plot squared residuals against the independent variable(s) – any evidence of a relationship? Noise?

  11. Dealing w/ Heteroskedasticity • Tests: • Breusch-Pagan test / White test • Glejser test • Harvey-Godfrey test • Corrections: • When σ_i is known • When σ_i is not known

  12. Dealing w/ Heteroskedasticity • Tests: • Breusch-Pagan test • Glejser test • Harvey-Godfrey test

  13. Test Procedure • Estimate by OLS, obtain residuals e_i, i = 1,…,n • Estimate a linear regression of e_i² (BP), |e_i| (G), or ln e_i² (HG) on a constant and a vector of Z’s, and compute R² • Compute the test statistic LM = nR² for H0: α_2 = 0, …, α_L = 0 [chi-squared dist. w/ L-1 df]
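
A minimal sketch of this procedure for a single regressor, using only the Python standard library (the data are invented, and Z is simply the original regressor):

```python
# Breusch-Pagan-style LM test for one regressor. Data invented for
# illustration; in practice e_i comes from your actual OLS regression.

def ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    return a, b, [yi - a - b * xi for xi, yi in zip(x, y)]

def r_squared(x, y):
    """R^2 from the OLS regression of y on a constant and x."""
    _, _, resid = ols(x, y)
    ybar = sum(y) / len(y)
    return 1 - sum(e * e for e in resid) / sum((yi - ybar) ** 2 for yi in y)

# Step 1: estimate by OLS and keep the residuals e_i
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.5, 7.8, 11.0, 11.5, 15.2, 14.8, 19.5, 18.9]
_, _, e = ols(x, y)

# Step 2: auxiliary regression of e_i^2 (the BP variant) on a constant and Z
lm = len(x) * r_squared(x, [ei * ei for ei in e])

# Step 3: compare LM = n*R^2 with the chi-squared critical value
# (the 5% critical value with 1 df is about 3.84)
print(f"LM = {lm:.3f}, reject homoskedasticity at 5%: {lm > 3.84}")
```

Using |e_i| in step 2 instead of e_i² gives the Glejser variant, and ln e_i² gives the Harvey-Godfrey variant; the rest of the procedure is unchanged.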

  14. Test Procedure • Example of BP test: Z_2 = years, Z_3 = years² • Example of White test: add Z_4 = years × years² • In Stata: estat hettest

  15. Dealing w/ Heteroskedasticity • Corrections when σ_i is known • Transform the model to get rid of the heteroskedasticity • Weighted least squares (GLS): • Give less weight to the observations with greater variance and more weight to the observations with less variance. • Generates BLU estimators
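
A sketch of the weighting idea for a single regressor, assuming σ_i is known (the data and σ values below are made up):

```python
# Weighted least squares via the GLS transformation: dividing each
# observation by sigma_i equalizes the error variances, so OLS on the
# transformed data is BLU. Equivalently, weight each term by 1/sigma_i^2.
# sigma is assumed known here; the data are invented.

def wls(x, y, sigma):
    """WLS of y on (constant, x), solving the weighted normal equations."""
    w = [1.0 / (s * s) for s in sigma]          # weight = 1 / sigma_i^2
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    a = (swxx * swy - swx * swxy) / det         # intercept
    b = (sw * swxy - swx * swy) / det           # slope
    return a, b

# Error sd grows with x, so high-x observations get less weight
x = [1, 2, 3, 4, 5]
y = [2.2, 3.9, 6.4, 7.5, 10.8]
sigma = [0.5 * xi for xi in x]
a, b = wls(x, y, sigma)
print(f"intercept = {a:.3f}, slope = {b:.3f}")
```

When all σ_i are equal, the weights cancel and WLS reduces to ordinary OLS, which is one quick sanity check on an implementation.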

  16. Dealing w/ Heteroskedasticity • Corrections when σ_i is not known • Run OLS, but correct the SE estimates: • White heteroskedasticity-consistent SEs • Corrects the var-cov matrix for differences in variance • Use the “robust” option in Stata • If you suspect the variance differs by group (e.g. states, firms), use the “cluster(var)” option in Stata • Example comparison of results (not reproduced)
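
For a single regressor the White (HC0) correction has a simple closed form; a sketch with invented data:

```python
# White heteroskedasticity-consistent SE for the slope in a simple
# regression: each observation's own squared residual enters the
# var-cov formula, instead of one pooled sigma^2. Data are invented.
import math

def ols_with_white_se(x, y):
    """OLS slope with classical and White (HC0) standard errors,
    for y regressed on a constant and one x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Classical SE: assumes one common sigma^2 for every observation
    se_classical = math.sqrt(sum(ei * ei for ei in e) / (n - 2) / sxx)
    # White/HC0 SE: sqrt( sum (x_i - xbar)^2 e_i^2 ) / sxx
    se_white = math.sqrt(sum((xi - xbar) ** 2 * ei * ei
                             for xi, ei in zip(x, e))) / sxx
    return b, se_classical, se_white

# Invented data whose scatter widens as x grows
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.3, 5.5, 8.9, 9.0, 13.8, 12.5, 18.1]
b, se_c, se_w = ols_with_white_se(x, y)
print(f"slope = {b:.3f}, classical SE = {se_c:.3f}, White SE = {se_w:.3f}")
```

The point estimate of the slope is identical under both; only the standard error changes, which is what a comparison of "robust" and plain regression output in Stata shows as well.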

  17. What are H & AC? • Basic OLS model: Y_i = β_0 + β_1·X_i + e_i • Autocorrelation violates the zero-covariance assumption: Cov(e_i, e_j) ≠ 0 (or E(e_i·e_j) ≠ 0) for some i ≠ j

  18. When is AC likely? • Autocorrelation: • Omitted variable • Mis-specified functional form (e.g. straight line fitted where curve should be) • Spatial or time pattern to the data • Ex: Observation at t correlated with t-1

  19. What can we do about it? • In general • Assume nothing is wrong, run OLS model • Examine the residuals • Plot residuals over time [or against the independent variable(s)] – any evidence of a relationship? Noise?

  20. Example: Patent data • Panel data, 546 technology classes for the 21 years 1980-2000 (11,466 obs). • # patents in class-year (#Patsk,t) • # university patents in class-year (#UnivPatk,t) • Average # citations to “science” in class-year (#Sciencek,t) • What is the relationship between the number of citations to “science” and the number of university patents in a tech class?

  21. Example: Patent data • Panel data: tsset the data in Stata • Tech class fixed effects model • Examine residuals • Residual scatterplots (3 graphs, not reproduced)

  22. Regression Result

  . xi: xtreg NumUnivPats NumPats avgscience i.appyear, fe i(ipc_num)
  i.appyear    _Iappyear_1980-2000    (naturally coded; _Iappyear_1980 omitted)

  Fixed-effects (within) regression           Number of obs      =     11466
  Group variable (i): ipc_num                 Number of groups   =       546
  R-sq:  within  = 0.1369                     Obs per group: min =        21
         between = 0.1743                                    avg =      21.0
         overall = 0.1213                                    max =        21
                                              F(22,10898)        =     78.58
  corr(u_i, Xb)  = -0.4119                    Prob > F           =    0.0000

  ------------------------------------------------------------------------------
   NumUnivPats |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
       NumPats |   .0092758   .0002828    32.80   0.000     .0087214    .0098301
    avgscience |  -.9008765   .0640646   -14.06   0.000    -1.026455   -.7752982
         _cons |  -.7809801   .2910149    -2.68   0.007    -1.351422   -.2105379
  -------------+----------------------------------------------------------------
       sigma_u |  3.6099338
       sigma_e |  6.7349259
           rho |  .22317918   (fraction of variance due to u_i)
  ------------------------------------------------------------------------------
  F test that all u_i=0:     F(545, 10898) = 3.92       Prob > F = 0.0000

  **Year variables excluded from output in the interest of space.

  23. Dealing w/ AC • Tests: • Durbin-Watson statistic (estat dwatson) for 1st-order serial correlation. • Breusch-Godfrey test (estat bgodfrey) for higher-order serial correlation. • In panel data: Wooldridge test for serial correlation (xtserial).

  24. Dealing w/ AC • Test: Durbin-Watson statistic DW = Σ_{t=2..n} (e_t - e_{t-1})² / Σ_{t=1..n} e_t² • When N is large, the first two terms in the numerator are almost equal, so DW ≈ 2(1 - ρ̂) • For strong positive AC, ρ = 1 and DW = 0 • For strong negative AC, ρ = -1 and DW = 4 • For no autocorrelation, ρ = 0 and DW = 2
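
The statistic itself is a one-liner; a sketch with contrived residual series chosen to hit the endpoints described above:

```python
# Durbin-Watson statistic on a residual series:
# DW = sum over t of (e_t - e_{t-1})^2, divided by sum of e_t^2.

def durbin_watson(e):
    """Compute DW from a list of regression residuals."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et * et for et in e)
    return num / den

print(durbin_watson([1.0] * 6))       # identical residuals: DW = 0
print(durbin_watson([1, -1] * 50))    # alternating residuals: DW = 3.96, near 4
```

A residual series that never changes sign or size gives DW = 0 (extreme positive autocorrelation), while one that flips sign every period pushes DW toward 4, matching the endpoints on the slide.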

  25. Dealing w/ AC • Test for Panel Data: Wooldridge test for serial correlation (xtserial). Stata Output Wooldridge test for autocorrelation in panel data H0: no first order autocorrelation F( 1, 545) = 1096.592 Prob > F = 0.0000

  26. Dealing w/ AC • Corrections: • Transform model with estimated ρ, run FGLS • Durbin-Watson method • Cochrane-Orcutt method

  27. Dealing w/ AC • Corrections: transform the model with an estimated ρ • Durbin-Watson method: ρ̂ = 1 - DW/2 • Cochrane-Orcutt method: estimate e_t = ρ·e_{t-1} + v_t with OLS, then obtain ρ̂ from the fitted coefficient
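
One Cochrane-Orcutt step can be sketched as follows (the residual series is contrived to be exactly AR(1) so the estimate is easy to check):

```python
# One Cochrane-Orcutt step: estimate rho from the residuals, then
# quasi-difference the data. Iterating (re-estimate, re-transform)
# gives the full method. Example data are contrived.

def cochrane_orcutt_step(x, y, resid):
    """Estimate rho from e_t = rho * e_{t-1} + v_t by OLS through the
    origin, then quasi-difference x and y with the estimated rho."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    rho = num / den
    # Transformed model: (y_t - rho*y_{t-1}) regressed on (x_t - rho*x_{t-1});
    # its errors are the serially uncorrelated v_t
    y_star = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    x_star = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    return rho, x_star, y_star

# Residuals that follow e_t = 0.9 * e_{t-1} exactly
e = [0.9 ** t for t in range(10)]
x = list(range(10))
y = [2.0 * xi for xi in x]
rho, x_star, y_star = cochrane_orcutt_step(x, y, e)
print(f"estimated rho = {rho:.3f}")    # 0.900
```

Note that the quasi-differenced series lose the first observation; the Prais-Winsten estimator on the next slide differs mainly by transforming and keeping that first observation.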

  28. Dealing w/ AC • Corrections: • Prais-Winsten (GLS) estimation (prais in Stata), with consistent errors in the presence of AR(1) serial correlation. • Has an option for Cochrane-Orcutt estimation

  29. Prais-Winsten Regression

  . prais NumUnivPats NumPats avgscience appyear*
  Number of gaps in sample:  545  (gap count includes panel changes)
  (note: computations for rho restarted at each gap)

  Prais-Winsten AR(1) regression -- iterated estimates

        Source |       SS       df       MS              Number of obs =   11466
  -------------+------------------------------           F( 22, 11443) =   43.46
         Model |  12933.8561    22  587.902549           Prob > F      =  0.0000
      Residual |  154801.107 11443  13.5280177           R-squared     =  0.0771
  -------------+------------------------------           Adj R-squared =  0.0753
         Total |  167734.963 11465  14.6301756           Root MSE      =   3.678

  ------------------------------------------------------------------------------
   NumUnivPats |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
       NumPats |   .0075549   .0003296    22.92   0.000     .0069088     .008201
    avgscience |  -.3079257   .0327598    -9.40   0.000    -.3721406   -.2437108
         _cons |  -.7945959   .7144436    -1.11   0.266    -2.195028    .6058359
  -------------+----------------------------------------------------------------
           rho |   .9753461
  ------------------------------------------------------------------------------
  Durbin-Watson statistic (original)    0.272618
  Durbin-Watson statistic (transformed) 1.378562

  30. Prais-Winsten Regression

  . prais DNumUnivPats DNumPats Davgscience Dappyear*
  Number of gaps in sample:  545  (gap count includes panel changes)
  (note: computations for rho restarted at each gap)

  Prais-Winsten AR(1) regression -- iterated estimates

        Source |       SS       df       MS              Number of obs =   11466
  -------------+------------------------------           F( 22, 11443) =   43.45
         Model |  12833.4183    22  583.337194           Prob > F      =  0.0000
      Residual |   153628.16 11443  13.4255143           R-squared     =  0.0771
  -------------+------------------------------           Adj R-squared =  0.0753
         Total |  166461.578 11465  14.5191085           Root MSE      =  3.6641

  ------------------------------------------------------------------------------
  DNumUnivPats |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
      DNumPats |   .0074327   .0003337    22.27   0.000     .0067785    .0080868
   Davgscience |  -.3304883   .0333152    -9.92   0.000    -.3957917   -.2651849
         _cons |  -2.61e-09   .3323198    -0.00   1.000    -.6514038    .6514038
  -------------+----------------------------------------------------------------
           rho |   .9272602
  ------------------------------------------------------------------------------
  Durbin-Watson statistic (original)    0.323533
  Durbin-Watson statistic (transformed) 1.343161

  31. Application: Data • Panel data: Annual (1981-1999) data on 278 US electricity generating plants (5282 obs). • Inputs: fuel, employees • Output: Megawatt hours (Mwhs) of electricity • Plant characteristics: MW size, fuel (gas, coal, etc) • Goal: Estimate labor productivity (Mwhs per employee) at the plant, controlling for plant characteristics (MW size and fuel type) and year effects.

  32. Application: Data • Regression: • Expectations?

  33. Application: Assignment • Get to know the structure of the data. Do you expect heteroskedasticity and / or autocorrelation in this data? Why or why not? • TEST for the presence of each – what do your tests indicate? • Run regression (w/o correction) • Assuming H and/or AC are present, what impact are they having on your results? • Modify regression to deal with H / AC • How do results change? • How do you know whether you solved the problem(s)?

  34. Next Class • Please bring • Written notes from the assignment • Printout of the “log” from Stata & graphs • THOUGHTS about what you did • QUESTIONS about this and other applications
