
Heteroskedasticity


Presentation Transcript


  1. Heteroskedasticity Objectives • What is heteroskedasticity? • What are the consequences? • How is heteroskedasticity identified? • How is heteroskedasticity corrected? ECON 7710, 2010

  2. Main empirical model for Unit 10: $\text{foodexp}_i = \beta_0 + \beta_1 \text{income}_i + \varepsilon_i$, where foodexp is family food expenditure and income is family income. Least squares estimates, US data (UE_Tab0301). Is this the best estimated equation?

  3. 1. The Nature of Heteroskedasticity. In a regression about firms, for the same mistake, … [Figure: million vs. billion]

  4. Heteroskedasticity is a problem that occurs when the error term does not have a constant variance. Under the classical linear regression model (CLRM), each error term is drawn from the same probability distribution, so heteroskedasticity means that Assumption CLRM.5 is violated.

  5. cov(i, j|X1i,X2i,X1j,X2j) = 0 no autocorrelation: i = j Regression Model Yi = b0 + b1X1i + b2X2i + i E(i|X1i,X2i) = 0 zero mean: var(i|X1i,X2i) = s2 homoskedasticity: ECON 7710, 2010

  6. Identical distributions for observations i and j. [Figure: distribution of $\varepsilon_i$ and distribution of $\varepsilon_j$]

  7. Homoskedasticity: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\operatorname{var}(\varepsilon_i \mid X_i) = \sigma^2$ for all $i$. [Figure: conditional distributions $f(Y)$ at $X_1, \dots, X_4$]

  8. Heteroskedasticity: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\operatorname{var}(\varepsilon_i \mid X_i) = \sigma_i^2$ for all $i$. [Figure: conditional distributions of Y given X]


  11. Pure heteroskedasticity: the error term has different variances even though the population regression function (PRF) is correctly specified. Impure heteroskedasticity: the error term has different variances because of a specification error.

  12. 2. Detecting Heteroskedasticity. 2.1 Graphical Method: plot foodexp against income (with one regressor). Example 1: Food expenditure, US data (UE_Tab0301).

  13. Example 1: Food expenditure, US data (UE_Tab0301). Plot the residuals e against income, and plot the squared residuals e² against income.
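The graphical check can also be approximated numerically. The snippet below is a minimal sketch on simulated data (the UE_Tab0301 values are not reproduced here; the variable names and data-generating numbers are hypothetical). Instead of drawing the scatter plots, it fits OLS and compares the average squared residual across income terciles; a rising pattern corresponds to the fan shape one looks for in a plot of e² against income.

```python
import numpy as np

# Hypothetical data: the error standard deviation grows with income.
rng = np.random.default_rng(0)
n = 300
income = rng.uniform(20.0, 100.0, n)
foodexp = 40.0 + 0.13 * income + rng.normal(0.0, 0.08 * income)

# OLS fit of foodexp on a constant and income.
X = np.column_stack([np.ones(n), income])
beta, *_ = np.linalg.lstsq(X, foodexp, rcond=None)
e = foodexp - X @ beta  # residuals: the quantity one would plot against income

# Numeric stand-in for the plot: mean of e^2 within income terciles.
order = np.argsort(income)
terciles = np.array_split(order, 3)
mean_e2 = [float(np.mean(e[idx] ** 2)) for idx in terciles]

print("OLS estimates (b0, b1):", beta)
print("mean e^2 by income tercile (low, mid, high):", mean_e2)
```

With a real data set one would normally draw the scatter plot as well; the tercile summary is only a quick text-mode substitute.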

  14. Example 2: Textbook data (Woody3).

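  15. 2.2 Park Test. Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + \varepsilon_i$, $i = 1, \dots, N$ (*). Suppose $\operatorname{var}(\varepsilon_i)$ is suspected to depend on $Z_i$ in the form $\operatorname{var}(\varepsilon_i) = \sigma_i^2 = \sigma^2 Z_i^{\alpha_1} e^{v_i}$, so that $\ln \sigma_i^2 = \ln \sigma^2 + \alpha_1 \ln Z_i + v_i$. H0: $\alpha_1 = 0$ (homoskedastic errors); HA: $\alpha_1 \neq 0$ (heteroskedastic errors).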

  16. Step 1: Estimate equation (*) by OLS and obtain the residuals $e_i$. Step 2: Regress the natural log of the squared residuals on the natural log of a possible proportionality factor $Z_i$: $\ln(e_i^2) = \alpha_0 + \alpha_1 \ln Z_i + v_i$, where $v_i$ is an error term satisfying all the classical assumptions.

  17. Step 3: If the coefficient of ln Z is significantly different from zero, the residuals show a heteroskedastic pattern with respect to Z; otherwise, homoskedasticity cannot be rejected. Example 3: Park test, US data (UE_Tab0301): $\widehat{\ln(e^2)} = -7.46 + 2.07^{**} \ln(\text{income})$, t = 2.28, p-value = 0.0284.
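The three Park-test steps can be sketched in NumPy as follows. This is an illustrative sketch, not the course's software output: `ols` and `park_test` are hypothetical helper names, and the data are simulated with $\operatorname{var}(\varepsilon_i)$ proportional to $\text{income}_i^2$ (so the true $\alpha_1$ is 2).

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and residuals; X must already contain a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def park_test(y, X, z):
    """Park test against a positive proportionality factor z.
    Returns (alpha1_hat, t statistic for H0: alpha1 = 0)."""
    _, e = ols(y, X)                                   # Step 1: OLS residuals
    Z = np.column_stack([np.ones_like(z), np.log(z)])
    alpha, v = ols(np.log(e ** 2), Z)                  # Step 2: ln(e^2) on ln(z)
    n, k = Z.shape
    s2 = (v @ v) / (n - k)                             # residual variance of auxiliary regression
    se = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
    return alpha[1], alpha[1] / se                     # Step 3: t test on alpha1

# Hypothetical data: sd(eps_i) = 0.08 * income_i, so var(eps_i) is proportional to income_i^2.
rng = np.random.default_rng(1)
income = rng.uniform(20.0, 100.0, 200)
foodexp = 40.0 + 0.13 * income + rng.normal(0.0, 0.08 * income)
X = np.column_stack([np.ones_like(income), income])

alpha1, t_stat = park_test(foodexp, X, income)
print(f"alpha1_hat = {alpha1:.2f}, t = {t_stat:.2f}")
```

A |t| well above about 2 points to heteroskedasticity with respect to income, as in Example 3.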

  18. Advantages of the Park test: • The test is simple. • It provides information about the variance structure. Limitations of the Park test: • The distribution of the dependent variable, ln(e²), is problematic. • It assumes a specific functional form. • It does not work when the variance depends on two or more variables. • The correct variable with which to order the observations must be identified first. • It cannot handle partitioned data.

  19. 2.3 White's Test. Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$, $i = 1, \dots, N$ (*). Suppose heteroskedasticity is suspected but its functional form is unknown. H0: the conditional variance of $\varepsilon_i$ is constant. HA: the conditional variance of $\varepsilon_i$ is not constant.

  20. Step 1: Estimate equation (*) by OLS and obtain the residuals. Step 2: Regress the squared residuals on all explanatory variables, the square of each explanatory variable, and all cross-product terms: $e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2 + \alpha_5 X_{1i} X_{2i} + v_i$.

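  21. Step 3: Test the overall significance of the auxiliary equation in Step 2. The statistic is $NR^2_{\text{white}} \sim \chi^2_{df}$, where $R^2_{\text{white}}$ comes from the Step 2 regression and df is the number of regressors in that regression (excluding the constant). Reject homoskedasticity if $NR^2_{\text{white}}$ exceeds the critical value $\chi^2_{df,\alpha}$. Example 4: White test, US data (UE_Tab0301): $\hat{e}^2 = 1924 - 7.4\,\text{income} + 0.0088^{*}\,\text{income}^2$, $R^2 = 0.3646$, N = 40, $NR^2 = 14.58$; critical value $\chi^2_{2,\,0.01} = 9.21$. Since 14.58 > 9.21, homoskedasticity is rejected at the 1% level.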
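White's procedure can be sketched as below. The function name `white_test` and the simulated data are hypothetical; the sketch builds the auxiliary regressors as levels, squares and cross products (so one regressor gives income and income², df = 2, as in Example 4, and two regressors give the five terms of Step 2). The critical value 9.21 is taken from the slide rather than computed. For the slide's numbers, $NR^2 = 40 \times 0.3646 \approx 14.58$.

```python
import itertools
import numpy as np

def white_test(y, X):
    """White test. X holds the explanatory variables WITHOUT a constant column.
    Returns (N * R^2 of the auxiliary regression, its degrees of freedom)."""
    n, k = X.shape
    ones = np.ones(n)
    Xc = np.column_stack([ones, X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    e2 = (y - Xc @ beta) ** 2                       # Step 1: squared OLS residuals
    terms = [X[:, j] for j in range(k)]             # levels
    terms += [X[:, a] * X[:, b]                     # squares and cross products
              for a, b in itertools.combinations_with_replacement(range(k), 2)]
    A = np.column_stack([ones] + terms)
    g, *_ = np.linalg.lstsq(A, e2, rcond=None)      # Step 2: auxiliary regression
    u = e2 - A @ g
    r2 = 1.0 - (u @ u) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2, A.shape[1] - 1                   # Step 3: NR^2 ~ chi2(df) under H0

# Hypothetical heteroskedastic data with a single regressor.
rng = np.random.default_rng(2)
income = rng.uniform(20.0, 100.0, 500)
foodexp = 40.0 + 0.13 * income + rng.normal(0.0, 0.08 * income)

stat, df = white_test(foodexp, income.reshape(-1, 1))
print(f"NR^2 = {stat:.2f}, df = {df}, reject at 1% (cv 9.21): {stat > 9.21}")
```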

  22. Advantages of the White test: • It does not assume a specific functional form. • It is applicable when the variance depends on two or more variables. Limitations of the White test: • It is a large-sample test. • It provides no information about the variance structure. • It loses many degrees of freedom when there are many regressors. • It cannot handle partitioned data. • It also picks up specification errors, so a rejection does not necessarily indicate pure heteroskedasticity.

  23. 3. Consequences of Heteroskedasticity. If heteroskedasticity is present but OLS is used for estimation, how are the OLS estimates affected? Unaffected: OLS estimators are still linear and unbiased because, on average, overestimates are as likely as underestimates.

  24. 3.1 OLS estimators are inefficient. Some fluctuations of the error term are attributed to variation in the independent variables. There are other linear unbiased estimators with smaller variances than the OLS estimator.
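A small Monte Carlo experiment can illustrate both points: under heteroskedasticity OLS stays unbiased, yet another linear unbiased estimator has a smaller sampling variance. This is a sketch under assumed settings (hypothetical coefficients 2.0 and 0.5, error standard deviation proportional to x); the competing estimator divides every term by x, which is the weighted least squares remedy of Section 4.2.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 50, 2000
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
Xw = X / x[:, None]                 # WLS design: every term divided by x

b_ols = np.empty(reps)
b_wls = np.empty(reps)
for r in range(reps):
    # Assumed data-generating process: Y = 2 + 0.5 X + eps, sd(eps_i) = 0.3 * x_i
    y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3 * x)
    b_ols[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    b_wls[r] = np.linalg.lstsq(Xw, y / x, rcond=None)[0][1]

# Expect both means near 0.5 (unbiased) and a smaller variance for WLS.
print(f"mean slope: OLS {b_ols.mean():.3f}, WLS {b_wls.mean():.3f}")
print(f"variance:   OLS {b_ols.var():.5f}, WLS {b_wls.var():.5f}")
```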

  25. 3.2 Unreliable hypothesis testing: the OLS standard errors are biased, which leads to unreliable testing conclusions.

  26. 4. Remedies. 4.1 Heteroskedasticity-Corrected Standard Errors. Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$ with heteroskedasticity, $\operatorname{var}(\varepsilon_i) = \sigma_i^2$. The OLS estimators are unbiased, but the OLS standard errors are biased.

  27. A heteroskedasticity-consistent (HC) standard error of an estimated coefficient is a standard error adjusted for heteroskedasticity. a. HC standard errors are consistent for any type of heteroskedasticity. b. Hypothesis tests based on HC standard errors are valid in large samples. c. Typically, HC standard errors are larger than OLS standard errors.

  28. Example 5: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\operatorname{var}(\varepsilon_i \mid X_i) = \sigma_i^2$. Incorrect variance formula vs. correct variance formula.
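In their standard textbook form, which is presumably what Example 5 compares, the two variance formulas for the slope follow from writing the OLS slope as a weighted sum of the errors (assuming, as in the CLRM, no autocorrelation):

```latex
\hat\beta_1 = \beta_1 + \sum_i k_i \varepsilon_i, \qquad
k_i = \frac{X_i - \bar X}{\sum_j (X_j - \bar X)^2}

\text{Incorrect (usual OLS formula): }\quad
\operatorname{var}(\hat\beta_1) = \frac{\sigma^2}{\sum_i (X_i - \bar X)^2}

\text{Correct when } \operatorname{var}(\varepsilon_i \mid X_i) = \sigma_i^2 \text{: }\quad
\operatorname{var}(\hat\beta_1 \mid X) = \sum_i k_i^2 \sigma_i^2
  = \frac{\sum_i (X_i - \bar X)^2 \, \sigma_i^2}{\left[\sum_i (X_i - \bar X)^2\right]^2}
```

The two expressions coincide only when $\sigma_i^2 = \sigma^2$ for every $i$.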

  29. HC estimator of the variance of the slope coefficient in a simple regression model. Example 6: HC standard errors, US data (UE_Tab0301).
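White's HC estimator (often labelled HC0) replaces each unknown $\sigma_i^2$ in the correct slope-variance formula by the squared OLS residual $e_i^2$: $\widehat{\operatorname{var}}(\hat\beta_1) = \sum_i (X_i - \bar X)^2 e_i^2 / [\sum_i (X_i - \bar X)^2]^2$. The sketch below computes it next to the conventional OLS standard error on simulated, hypothetical data (not UE_Tab0301); `slope_standard_errors` is an illustrative name. Statistical packages often apply a small-sample factor such as N/(N − 2) (HC1), so their figures can differ slightly.

```python
import numpy as np

def slope_standard_errors(x, y):
    """Return (b1, conventional OLS se, HC0 se) for the simple regression y = b0 + b1 x + e."""
    d = x - x.mean()
    sxx = d @ d
    b1 = (d @ y) / sxx
    b0 = y.mean() - b1 * x.mean()
    e = y - b0 - b1 * x
    s2 = (e @ e) / (len(x) - 2)                           # estimate of a common sigma^2
    se_ols = np.sqrt(s2 / sxx)                            # valid only under homoskedasticity
    se_hc = np.sqrt(np.sum(d ** 2 * e ** 2) / sxx ** 2)   # sigma_i^2 replaced by e_i^2
    return b1, se_ols, se_hc

# Hypothetical heteroskedastic data.
rng = np.random.default_rng(3)
income = rng.uniform(20.0, 100.0, 200)
foodexp = 40.0 + 0.13 * income + rng.normal(0.0, 0.08 * income)

b1, se_ols, se_hc = slope_standard_errors(income, foodexp)
print(f"b1 = {b1:.4f}, OLS se = {se_ols:.4f}, HC se = {se_hc:.4f}")
```

In matrix form this is the slope element of the "sandwich" $(X'X)^{-1} X' \operatorname{diag}(e_i^2) X (X'X)^{-1}$.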

  30. E(i) = 0 cov(t, s) = 0 t=s var(i) = si2 4.2 Weighted Least Squares Yi = b0 + b1X1i + b2X2i + i The variance is assumed to be proportional to the value of Zi2 si2 = cZi 2 ECON 7710, 2010

  31. Step 1: Decide which variable Z is proportional to the heteroskedasticity. Step 2: Divide every term in the original model by that variable, $Z_i$. The transformed error $\varepsilon_i / Z_i$ then has the constant variance $\sigma_i^2 / Z_i^2 = c$.

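  32. Step 3: Run least squares on the transformed model, whose variables are the original ones divided by $Z_i$. Note that the transformed model has an intercept only if Z is one of the explanatory variables. For example, if $Z_i = X_{2i}$, then $Y_i / X_{2i} = \beta_0 (1 / X_{2i}) + \beta_1 (X_{1i} / X_{2i}) + \beta_2 + u_i$, where $u_i = \varepsilon_i / X_{2i}$, so $\beta_2$ becomes the intercept.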
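The three WLS steps can be sketched for the food-expenditure model, assuming (as in the slide's setup) $\sigma_i^2 = c\,\text{income}_i^2$, i.e. $Z_i = \text{income}_i$. Dividing through gives $\text{foodexp}_i/\text{income}_i = \beta_0 (1/\text{income}_i) + \beta_1 + \varepsilon_i/\text{income}_i$, so the constant of the transformed regression estimates $\beta_1$ and the coefficient on $1/\text{income}$ estimates $\beta_0$. The helper `wls_by_division` and the simulated data are hypothetical.

```python
import numpy as np

def wls_by_division(y, X, z):
    """WLS assuming var(eps_i) = c * z_i**2: divide every term, including the
    constant column of X, by z_i and run OLS. The coefficients returned belong to
    the ORIGINAL model, since each still multiplies its own (divided) column."""
    Xt = X / z[:, None]
    yt = y / z
    beta, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
    return beta

# Hypothetical data with sd(eps_i) proportional to income_i.
rng = np.random.default_rng(4)
income = rng.uniform(20.0, 100.0, 200)
foodexp = 40.0 + 0.13 * income + rng.normal(0.0, 0.08 * income)
X = np.column_stack([np.ones_like(income), income])  # columns: constant, income

b0, b1 = wls_by_division(foodexp, X, income)
# Transformed columns: constant/income = 1/income carries b0;
# income/income = 1 is the new intercept and carries b1.
print(f"WLS: b0 = {b0:.3f}, b1 = {b1:.4f}")
```

The same estimates come from `np.polyfit(income, foodexp, 1, w=1/income)`, which weights each residual by 1/income before squaring.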

  33. Example 7: WLS, US data (UE_Tab0301). What are the values of the estimated coefficients of the original model? Has the problem of heteroskedasticity been solved?

  34. Comparing different estimates: US data (UE_Tab0301). The WLS estimates improve upon those of OLS.

  35. Other possibilities: • $\operatorname{var}(\varepsilon_i) = cZ_i$ • $\operatorname{var}(\varepsilon_i) = cZ_i$ • $\operatorname{var}(\varepsilon_i) = c(a_1 X_{1i} + a_2 X_{2i})$

  36. In large samples, HC standard errors are consistent for any type of heteroskedasticity, so confidence intervals and t-tests based on them are valid.

  37. 4.3 Re-specifying the Regression Model. The heteroskedasticity may be impure. 4.3.1 Use another functional form, e.g., a double-log model, which shows less variation. Example 8: US data (UE_Tab0301). The hypothesis of constant variance can be rejected.

  38. Example 9: India data (Food_India55). Empirical model: $\text{foodexp}_i = \beta_0 + \beta_1 \text{totexp}_i + \varepsilon_i$. The hypothesis of homoskedasticity can be rejected by the Park and White tests.

  39. Which model is best: double-log, HC, or WLS?

  40. 4.3.2 Other reformulations, e.g., taking averages of variables related to the size of the observed units, or adding more variables. Example 10: data set "Concert", the concert tour of a singer in the US: $\text{revenue} = \beta_0 + \beta_1 \text{adv} + \beta_2 \text{stad} + \beta_3 \text{cd} + \beta_4 \text{radio} + \beta_5 \text{weekend} + \varepsilon$.

  41. [Table: estimates for models (1), (2) and (3)]

  42. Remarks: • The variable Z is difficult to identify, and the functional relationship between the error and Z is not known, so use WLS as a last resort. • With correct WLS, we expect the standard errors of the regression coefficients to be smaller than their OLS counterparts. • A log transformation usually reduces the degree of heteroskedasticity. • The hypothesis of homoskedasticity should not be rejected in the new model.

  43. 5. A Complete Example. Sources: Section 8.2.2 (pp. 255–256) and Section 10.5 (pp. 369–376). Empirical regression model: $\text{pcon}_i = \beta_0 + \beta_1 \text{reg}_i + \beta_2 \text{tax}_i + \beta_3 \text{uhm}_i + \varepsilon_i$, where pcon_i is petroleum consumption in the ith state; reg_i is motor vehicle registrations in the ith state (thousands); tax_i is the gasoline tax rate in the ith state (cents per gallon); uhm_i is urban highway miles within the ith state.

  44. Equation 1: $\widehat{\text{pcon}} = 389.57^{***} - 0.061\,\text{reg} - 36.47^{***}\,\text{tax} + 60.76^{***}\,\text{uhm}$; (se, VIF) for reg, tax, uhm: (0.04, 24.3), (13.15, 1.1), (10.26, 24.9); adj. R² = 0.9192, N = 50. Equation 2: $\widehat{\text{pcon}} = 551.69^{***} + 0.19^{***}\,\text{reg} - 53.59^{***}\,\text{tax}$; se for reg, tax: (0.012), (16.86); adj. R² = 0.8607, N = 50.

  45. Graphical investigation

  46. Park test: $\widehat{\ln(e^2)} = 1.65 + 0.95^{***} \ln(\text{REG})$, se = (0.3083), R² = 0.1657, N = 50. White test: $\hat{e}^2 = 11{,}098{,}291 + 140\,\text{REG} - 0.0005\,\text{REG}^2 - 12.84\,\text{REG}\cdot\text{TAX} - 237{,}873\,\text{TAX} + 12{,}347\,\text{TAX}^2$, R² = 0.6645, N = 50, NR² = 33.22. Checking for other specifications: double-log, quadratic.

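  47. Equation 2 with HC standard errors: $\widehat{\text{pcon}} = 551.69^{***} + 0.19^{***}\,\text{reg} - 53.59^{***}\,\text{tax}$; HC se for reg, tax: (0.022), (23.90); R² = 0.8664, N = 50. [Table: models (4), (5) and (6)]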

  48. Selected exercises, Ch. 10: Q. 1, 3, 4, 5, 8, 10, 12, 14

  49. cov(i, j|X1i,X2i,X1j,X2j) = 0 no autocorrelation: i = j var(i|X1i,X2i) = si2 heteroskedasticity: Regression Model Yi = b0 + b1X1i + b2X2i + i E(i|X1i,X2i) = 0 zero mean: var(i|X1i,X2i) = s2 homoskedasticity: ECON 7710, 2010

  50. Heteroskedasticity: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\operatorname{var}(\varepsilon_i \mid X_i) = \sigma_i^2$ for all $i$. [Figure: conditional distributions $f(Y)$ at $X_1, X_2, X_3$]
