IX.  Heteroscedasticity

IX. Heteroscedasticity PowerPoint PPT Presentation


  • 104 Views
  • Uploaded on
  • Presentation posted in: General

Download Presentation

IX. Heteroscedasticity

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


1. Proportional (multiplicative) ht vs. Partitioned(groupwise) ht Estimated GLS/Weighted Least Squares…..Proportional (multiplicative) ht vs. Partitioned(groupwise) ht Estimated GLS/Weighted Least Squares…..

2. We start with MR3, var of yt and et are all same across t because sigma square is Not t-specific! “homoscedasticity” assumption.We start with MR3, var of yt and et are all same across t because sigma square is Not t-specific! “homoscedasticity” assumption.

3. Let’s consider the milk demand function, where yt is the milk consumption and Xt is people’s income level. The graph indicates that the residual must remain in absolute value as income grows. As income grows, the observed yt tend to have the same deviation from the fitted line. IN other words, the coefficient b2 explains milk consumption pattern across different income groups. For example, since we have a constant parameter model here, if high income people have a different actual consumption pattern, yt, compared to low income people, then et square between high and low income people would NOT the same! Let’s consider the milk demand function, where yt is the milk consumption and Xt is people’s income level. The graph indicates that the residual must remain in absolute value as income grows. As income grows, the observed yt tend to have the same deviation from the fitted line. IN other words, the coefficient b2 explains milk consumption pattern across different income groups. For example, since we have a constant parameter model here, if high income people have a different actual consumption pattern, yt, compared to low income people, then et square between high and low income people would NOT the same!

4. In fact, in general, income is less important as an explanatory variable for food expenditure of high-income families. Therefore, estimate b2 does NOT explain very well the milk consumption pattern for the higher income people. This is well explained in the graph! SO, we got a problem! One of the OLS assumptions is violated!In fact, in general, income is less important as an explanatory variable for food expenditure of high-income families. Therefore, estimate b2 does NOT explain very well the milk consumption pattern for the higher income people. This is well explained in the graph! SO, we got a problem! One of the OLS assumptions is violated!

5. f(y1|x1), f(y2|x2),……, the probability density function f(y1|x1) is such that y1 will be close to E(y1), or in other words, the conditional pdf of y1 given x1 for errors does NOT change.f(y1|x1), f(y2|x2),……, the probability density function f(y1|x1) is such that y1 will be close to E(y1), or in other words, the conditional pdf of y1 given x1 for errors does NOT change.

6. Researchers often face the heteroscedastic problem when using cross-sectional data. Why Not in time-series data?Researchers often face the heteroscedastic problem when using cross-sectional data. Why Not in time-series data?

8. A. Least squares estimators still unbiased and consistent. 2. What is the Consequences of Heteroskedasticity? Claim: if an estimator is asymptotically unbiased and have variance that converges to zero, the estimator is consistent by Chevysheff theorem Two matrices tend to be finite positive definite as long as other OLS assumptions are met. Non convergence case P360 of Green: Var(y)=(sigma squared/T)(1-p-Tp)=>the limit of this expression becomes p multiplied by sigma squared Claim: if an estimator is asymptotically unbiased and have variance that converges to zero, the estimator is consistent by Chevysheff theorem Two matrices tend to be finite positive definite as long as other OLS assumptions are met. Non convergence case P360 of Green: Var(y)=(sigma squared/T)(1-p-Tp)=>the limit of this expression becomes p multiplied by sigma squared

9. Note that we do Not assume the normality or any assumption on the mean of the sampled data. Prob of x bar exist around nu within pus minus epsilon. As T increases delta becomes smaller and smaller The prob of x bar lying some small interval around nu can be as close to one as we increase the sample size sufficiently large The sample mean is “consistent estimator” of the population mean nu, and this process is called “convergence in prob”Note that we do Not assume the normality or any assumption on the mean of the sampled data. Prob of x bar exist around nu within pus minus epsilon. As T increases delta becomes smaller and smaller The prob of x bar lying some small interval around nu can be as close to one as we increase the sample size sufficiently large The sample mean is “consistent estimator” of the population mean nu, and this process is called “convergence in prob”

10. Illustration of consistency vs. biasedness and efficiency graphically (p447 of GHJ) Illustration of consistency vs. biasedness and efficiency graphically (p447 of GHJ)

12. How the pdf of the sample mean, x bar, behaves the increasing T? The limiting prob of the statistic is less than or equal to some value y is represented as the standard normal with N(0, 1) Xt is asymptotically normally distributed with mean mu and variance sigma squared /T Show the graph p 56 of D and J or Y’s lecture noteHow the pdf of the sample mean, x bar, behaves the increasing T? The limiting prob of the statistic is less than or equal to some value y is represented as the standard normal with N(0, 1) Xt is asymptotically normally distributed with mean mu and variance sigma squared /T Show the graph p 56 of D and J or Y’s lecture note

14. How do I know if heteroscdasticity might be a problem got my model and data? Is there any way to detect this problem so that I can use GLS technique. We are going to discuss two ways to do this: residual plot and Goldfeld-Quandt test.How do I know if heteroscdasticity might be a problem got my model and data? Is there any way to detect this problem so that I can use GLS technique. We are going to discuss two ways to do this: residual plot and Goldfeld-Quandt test.

15. Residual Plots

16. . If heteroskedasticty exists, some observations will have large variances and others will have small variances. How would you separate high variance subsample vs. low variance subsample? For example, in the food expenditure model, we expect the variances are related to xt. Then, we should sort the data according to the level of xt; the T/2 obs with the largest values of xt; the rest should be T/2 obs with the smallest values of xt. 1.If a null hypothesis of equal variances is not true, we expect sigma1 square/sigma2 square to be large. . If heteroskedasticty exists, some observations will have large variances and others will have small variances. How would you separate high variance subsample vs. low variance subsample? For example, in the food expenditure model, we expect the variances are related to xt. Then, we should sort the data according to the level of xt; the T/2 obs with the largest values of xt; the rest should be T/2 obs with the smallest values of xt. 1.If a null hypothesis of equal variances is not true, we expect sigma1 square/sigma2 square to be large.

17. ·       We overcome this problem by making a further assumption about the . Our earlier inspection of the least squares residuals suggested that the error variance increases as income increases. A reasonable model for such a variance relationship is ·       Under heteroskedasticity the least squares estimator is not the best linear unbiased estimator. One way of overcoming this dilemma is to change or transform our statistical model into one with homoskedastic errors. Leaving the basic structure of the model intact, it is possible to turn the heteroskedastic error model into a homoskedastic error model. Once this transformation has been carried out, application of least squares to the transformed model gives a best linear unbiased estimator. ·       We overcome this problem by making a further assumption about the . Our earlier inspection of the least squares residuals suggested that the error variance increases as income increases. A reasonable model for such a variance relationship is ·       Under heteroskedasticity the least squares estimator is not the best linear unbiased estimator. One way of overcoming this dilemma is to change or transform our statistical model into one with homoskedastic errors. Leaving the basic structure of the model intact, it is possible to turn the heteroskedastic error model into a homoskedastic error model. Once this transformation has been carried out, application of least squares to the transformed model gives a best linear unbiased estimator.

18. REMARK: The above test is a one-sided test because the alternative hypothesis suggested which sample partition will have the larger variance. If we suspect that two sample partitions could have different variances, but we do not know which variance is potentially larger, then a two-sided test with alternative hypothesis sigma1 square is not equal to sigma2 square is more appropriate. To perform a two-sided test at the 5 percent significance level, we have to look up 2.5 percent table. I.e P[F>Fc]=0.025 REMARK: The above test is a one-sided test because the alternative hypothesis suggested which sample partition will have the larger variance. If we suspect that two sample partitions could have different variances, but we do not know which variance is potentially larger, then a two-sided test with alternative hypothesis sigma1 square is not equal to sigma2 square is more appropriate. To perform a two-sided test at the 5 percent significance level, we have to look up 2.5 percent table. I.e P[F>Fc]=0.025

21. The joint prob density function f(y1…yT) is….The joint prob density function f(y1…yT) is….

23. Why not (?) because we know E(e’e)=T-k sigma squared Gradient vectorWhy not (?) because we know E(e’e)=T-k sigma squared Gradient vector

26. The limitation of GQ test: (1) it cannot accommodate situations where several variables jointly cause heteroscedasticity, I.e., GQ test has problem when we have an error variance that is related to more than one variable as indicated in above slide, because it is impossible to order the observations according to several variables at the same time.The limitation of GQ test: (1) it cannot accommodate situations where several variables jointly cause heteroscedasticity, I.e., GQ test has problem when we have an error variance that is related to more than one variable as indicated in above slide, because it is impossible to order the observations according to several variables at the same time.

27. B-P test: SSR/2 is close to Trsquared or use F-test for the model significance. R-squared is from auxiliary regressionB-P test: SSR/2 is close to Trsquared or use F-test for the model significance. R-squared is from auxiliary regression

29. Limitation of LM/GQ tests: require a prior knowledge of what might be causing heteroscedastic problemLimitation of LM/GQ tests: require a prior knowledge of what might be causing heteroscedastic problem

30. Include yhatt and y hat t squared or simply exclude cross products. Limitation of white test: no indication of the form of the heteroscedasticity, therefore, no guide to an appropriate GLS estimator White’s test tends to perform poorly with small sample size.Include yhatt and y hat t squared or simply exclude cross products. Limitation of white test: no indication of the form of the heteroscedasticity, therefore, no guide to an appropriate GLS estimator White’s test tends to perform poorly with small sample size.

31. Based on asymptotic normality of the MLE R? determines determines hypotheses to test with the known (kx1) vector r; q hypotheses to test here.Based on asymptotic normality of the MLE R? determines determines hypotheses to test with the known (kx1) vector r; q hypotheses to test here.

32. The restricted likelihood maximum cannot exceed the unrestricted LM, but the null is true, two values would be close. We expect to reject if lambda is small. Therefore, if LR is big enough compared to chi-squared critical value at df q, reject null against alternative hypothesis.The restricted likelihood maximum cannot exceed the unrestricted LM, but the null is true, two values would be close. We expect to reject if lambda is small. Therefore, if LR is big enough compared to chi-squared critical value at df q, reject null against alternative hypothesis.

33. We proved that the denominator is equal to SSE®-SSE(U) ad sgima hat squared is SSE(U)/T Reject H0 if W stat is greater than chi squared critical value at df q. Do you want to teach the extension of Wald test to nonlinear functions of beta? See Yoder’s note. Do you want to teach the extension of all Wald, LM, and LR tests for simple regression case see Maddalla, p117-121We proved that the denominator is equal to SSE®-SSE(U) ad sgima hat squared is SSE(U)/T Reject H0 if W stat is greater than chi squared critical value at df q. Do you want to teach the extension of Wald test to nonlinear functions of beta? See Yoder’s note. Do you want to teach the extension of all Wald, LM, and LR tests for simple regression case see Maddalla, p117-121

34. See Appendix 5.2 and 5.3 of D&J If e tilda doesn’t have zero mean, Rsquare is the uncentered Rsquare. However, the mean of e tilda maintains “zero” because most restrictions are on slope coefficients allowing the intercept to adjust to meet the OLS assumption on error term. Therefore, Rsquare here becomes the usual centered R-square. => Run restricted OLS, then regress e tilda on X to get R-square See Appendix 5.2 and 5.3 of D&J If e tilda doesn’t have zero mean, Rsquare is the uncentered Rsquare. However, the mean of e tilda maintains “zero” because most restrictions are on slope coefficients allowing the intercept to adjust to meet the OLS assumption on error term. Therefore, Rsquare here becomes the usual centered R-square. => Run restricted OLS, then regress e tilda on X to get R-square

35. In reality we frequently have some information outside the sample. This nonsample information may come from many places, such as economic principles, experience, or empirical observations. When the information is available, it seems appropriate to find a way to use it. This should increase the precision of model estimates. In reality we frequently have some information outside the sample. This nonsample information may come from many places, such as economic principles, experience, or empirical observations. When the information is available, it seems appropriate to find a way to use it. This should increase the precision of model estimates.

36. P380 of GHJP380 of GHJ

40. Incorrect OLS estimators tend to show “overly accurate,” incorrectly smaller than GLS estimatorsIncorrect OLS estimators tend to show “overly accurate,” incorrectly smaller than GLS estimators

41. Can we handle this non-spherical error problem using least squares? Yeah, it is GLS.Can we handle this non-spherical error problem using least squares? Yeah, it is GLS.

42. ML would have the same estimators asymptotically. The sigma squared hat is different from the biased ML estimator by T/(T-k) We will talk about the characteristics of omega: heteroscedasticity, AR; and how to detect “omega” in the next chapter.ML would have the same estimators asymptotically. The sigma squared hat is different from the biased ML estimator by T/(T-k) We will talk about the characteristics of omega: heteroscedasticity, AR; and how to detect “omega” in the next chapter.

43. 1. Decide which variable is proportional to the heteroskedasticity (xt in previous example). 2. Divide all terms in the original model by the square root of that variable (divide by xt ) including intercept 1. 3. Run least squares on the transformed model which has new, yt, xt1 and xt2 variables. That is we weighted our variables 1 over square root of xt. Remark: In the transformed model That is, the variable associated with the intercept parameter is no longer equal to “1”. Since least squares software usually automatically inserts a “1” for the intercept, when dealing with transformed variables you will need to learn how to turn this option off. If you use a “weighted” or “generalized” least squares option on your software, the computer will do both the transforming and the estimating. In this case suppressing the constant will not be necessary. That is we weighted our variables 1 over square root of xt. Remark: In the transformed model That is, the variable associated with the intercept parameter is no longer equal to “1”. Since least squares software usually automatically inserts a “1” for the intercept, when dealing with transformed variables you will need to learn how to turn this option off. If you use a “weighted” or “generalized” least squares option on your software, the computer will do both the transforming and the estimating. In this case suppressing the constant will not be necessary.

44. ·       We overcome this problem by making a further assumption about the . Our earlier inspection of the least squares residuals suggested that the error variance increases as income increases. A reasonable model for such a variance relationship is ·       Under heteroskedasticity the least squares estimator is not the best linear unbiased estimator. One way of overcoming this dilemma is to change or transform our statistical model into one with homoskedastic errors. Leaving the basic structure of the model intact, it is possible to turn the heteroskedastic error model into a homoskedastic error model. Once this transformation has been carried out, application of least squares to the transformed model gives a best linear unbiased estimator. ·       We overcome this problem by making a further assumption about the . Our earlier inspection of the least squares residuals suggested that the error variance increases as income increases. A reasonable model for such a variance relationship is ·       Under heteroskedasticity the least squares estimator is not the best linear unbiased estimator. One way of overcoming this dilemma is to change or transform our statistical model into one with homoskedastic errors. Leaving the basic structure of the model intact, it is possible to turn the heteroskedastic error model into a homoskedastic error model. Once this transformation has been carried out, application of least squares to the transformed model gives a best linear unbiased estimator.

45. ·       The transformed error term will retain the properties et star = 0 and zero correlation between different observations, cov(ei, ej)=0 for i ? j. ·       As a consequence, we can apply least squares to the transformed variables, and to obtain the best linear unbiased estimator for ?1 and ?2. ·       The transformed model is linear in the unknown parameters ?1 and ?2. These are the original parameters that we are interested in estimating. ·       The transformed model satisfies the conditions of the Gauss-Markov Theorem, and the least squares estimators defined in terms of the transformed variables are B.L.U.E. ·       The estimator obtained in this way is called a generalized least squares estimator. ·       The transformed error term will retain the properties et star = 0 and zero correlation between different observations, cov(ei, ej)=0 for i ? j. ·       As a consequence, we can apply least squares to the transformed variables, and to obtain the best linear unbiased estimator for ?1 and ?2. ·       The transformed model is linear in the unknown parameters ?1 and ?2. These are the original parameters that we are interested in estimating. ·       The transformed model satisfies the conditions of the Gauss-Markov Theorem, and the least squares estimators defined in terms of the transformed variables are B.L.U.E. ·       The estimator obtained in this way is called a generalized least squares estimator.

46. ·       It is important to recognize that the interpretations for ?1 and ?2 are the same in the transformed model in (11.3.5) as they are in the untransformed model in (11.3.1). The standard errors in (11.3.8), namely se(b1) = 17.986 and se(b2) = 0.0270 are both lower than their least squares counterparts that were calculated from White's estimator, namely se(b1) = 23.704 and se(b2) = 0.0382. Since generalized least squares is a better estimation procedure than least squares, we do expect the generalized least squares standard errors to be lower. ·    The smaller standard errors have the advantage of producing narrower more informative confidence intervals. For example, using the generalized least squares results, a 95% confidence interval for ?2 is given by = 0.1410 ? 2.024(0.0270) = [0.086, 0.196] The least squares confidence interval computed using White's standard errors was [0.051, 0.206]. ·       It is important to recognize that the interpretations for ?1 and ?2 are the same in the transformed model in (11.3.5) as they are in the untransformed model in (11.3.1). The standard errors in (11.3.8), namely se(b1) = 17.986 and se(b2) = 0.0270 are both lower than their least squares counterparts that were calculated from White's estimator, namely se(b1) = 23.704 and se(b2) = 0.0382. Since generalized least squares is a better estimation procedure than least squares, we do expect the generalized least squares standard errors to be lower. ·    The smaller standard errors have the advantage of producing narrower more informative confidence intervals. For example, using the generalized least squares results, a 95% confidence interval for ?2 is given by = 0.1410 ? 2.024(0.0270) = [0.086, 0.196] The least squares confidence interval computed using White's standard errors was [0.051, 0.206].

47. Quantity supplied depends upon the production technology of the firm, on the price of wheat or expectations about the price of wheat, and on weather conditions. ·    Because there is no obvious index of production technology, some kind of proxy needs to be used for this variable. We use a simple linear time-trend, a variable that takes the value 1 in year 1, 2 in year 2, and so on, up to 26 in year 26. ·    An obvious weather variable is also unavailable; thus, in our statistical model, weather effects will form part of the random error term. Using these considerations, we specify the linear supply function Qt is the quantity of wheat produced in year t, Pt is the price of wheat guaranteed for year t, t is a trend variable introduced to capture changes in production technology, and et is a random error term that includes, among other things, the influence of weather. Quantity supplied depends upon the production technology of the firm, on the price of wheat or expectations about the price of wheat, and on weather conditions. ·    Because there is no obvious index of production technology, some kind of proxy needs to be used for this variable. We use a simple linear time-trend, a variable that takes the value 1 in year 1, 2 in year 2, and so on, up to 26 in year 26. ·    An obvious weather variable is also unavailable; thus, in our statistical model, weather effects will form part of the random error term. Using these considerations, we specify the linear supply function Qt is the quantity of wheat produced in year t, Pt is the price of wheat guaranteed for year t, t is a trend variable introduced to capture changes in production technology, and et is a random error term that includes, among other things, the influence of weather.

49. If the test shows the existence of heteroscedasticity, then need to conduct “partitioned GLS”If the test shows the existence of heteroscedasticity, then need to conduct “partitioned GLS”

50. This can be applied to many groups. I.e., you can divide each group’s data by each se (sigma) and proceed OLS, or directly use MLE Note that there is NO INTERCEPT term here.This can be applied to many groups. I.e., you can divide each group’s data by each se (sigma) and proceed OLS, or directly use MLE Note that there is NO INTERCEPT term here.

51. Slide figure 11.3 T1=T2=13, K=3Slide figure 11.3 T1=T2=13, K=3

53. Substituting these estimates for the true values causes no difficulties in large samples. Therefore, the variance for the transformed model should be close to one in large sample case. Remark: A word of warning about calculation of the standard errors is necessary. As demonstrated below (11.5.5), the transformed errors in (11.5.5) have a variance equal to one. However, when you transform your variables using and , and apply least squares to the transformed variables for the complete sample, your computer program will automatically estimate a variance for the transformed errors. This estimate will not be exactly equal to one. The standard errors in (R11.8) were calculated by forcing the computer to use one as the variance of the transformed errors. Most software packages will have options that let you do this, but it is not crucial if your package does not; the variance estimate will usually be close to one anyway. Substituting these estimates for the true values causes no difficulties in large samples. Therefore, the variance for the transformed model should be close to one in large sample case. Remark: A word of warning about calculation of the standard errors is necessary. As demonstrated below (11.5.5), the transformed errors in (11.5.5) have a variance equal to one. However, when you transform your variables using and , and apply least squares to the transformed variables for the complete sample, your computer program will automatically estimate a variance for the transformed errors. This estimate will not be exactly equal to one. The standard errors in (R11.8) were calculated by forcing the computer to use one as the variance of the transformed errors. Most software packages will have options that let you do this, but it is not crucial if your package does not; the variance estimate will usually be close to one anyway.

54. FGLS-Estimated GLS GLS divide all variables by square root of zt (when alpha2 is 2, divide by zt) and do OLS or MLE using V hat.FGLS-Estimated GLS GLS divide all variables by square root of zt (when alpha2 is 2, divide by zt) and do OLS or MLE using V hat.

55. Do either GLS or MLE WLS and ML may produce different estimators because there is no more “homoscedasticity” assumption, and e’e should be divided by above equation which is a function of Z (or X).Do either GLS or MLE WLS and ML may produce different estimators because there is no more “homoscedasticity” assumption, and e’e should be divided by above equation which is a function of Z (or X).

59. Make sure that you include 1/deflatorMake sure that you include 1/deflator

  • Login