
Lecture # 8


Presentation Transcript


  1. Lecture # 8. Studenmund (2006): Chapter 8, Multicollinearity. Objectives: • Perfect and imperfect multicollinearity • Effects of multicollinearity • Detecting multicollinearity • Remedies for multicollinearity

  2. The nature of multicollinearity. Perfect multicollinearity: an exact functional relationship exists among the independent variables, that is, Σ λiXi = 0, or λ1X1 + λ2X2 + λ3X3 + … + λiXi = 0, with not all λi equal to zero. For example, λ1X1 + λ2X2 = 0 implies X1 = -(λ2/λ1)X2. If multicollinearity is perfect, the regression coefficients of the Xi variables, the β̂i's, are indeterminate and their standard errors, se(β̂i)'s, are infinite.
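A minimal numerical sketch of this (Python with numpy assumed; the data are simulated and the relation X1 = -2X2 is just one instance of the slide's form with λ2/λ1 = 2): the indeterminacy shows up directly as a singular X'X matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x2 = rng.normal(size=n)
x1 = -2.0 * x2                        # exact relation of the slide's form
X = np.column_stack([np.ones(n), x1, x2])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))     # 2, not 3: X'X is singular
print(np.linalg.cond(XtX))            # astronomically large condition number
# The normal equations X'X b = X'y then have no unique solution, so the
# coefficient estimates are indeterminate.
```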

  3. Example: the 3-variable case, Y = β0 + β1X1 + β2X2 + ε. In deviation form, the OLS estimators are
β̂1 = [(Σyx1)(Σx2²) - (Σyx2)(Σx1x2)] / [(Σx1²)(Σx2²) - (Σx1x2)²]
β̂2 = [(Σyx2)(Σx1²) - (Σyx1)(Σx1x2)] / [(Σx1²)(Σx2²) - (Σx1x2)²]
If x2 = λx1, substitution gives
β̂1 = [(Σyx1)(λ²Σx1²) - (λΣyx1)(λΣx1²)] / [(Σx1²)(λ²Σx1²) - λ²(Σx1²)²] = 0/0
and similarly β̂2 = 0/0: both estimators are indeterminate.

  4. (yx1)(2x12 + 2 ) - ( yx1 + y)( x1x1+x1) = (x12)(2 x12 + 2 ) - ( x1x1 +x1)2 ^  0 1 = 0 (Why?) If multicollinearity is imperfect, x2 = 1x1+ where  is a stochastic error (or x2 = 0+ 1x1+ ) Then the regression coefficients, although determinate, possess large standard errors, which means the coefficients can be estimated but with lessaccuracy.

  5. Example: production function Yi = β0 + β1X1i + β2X2i + β3X3i + εi, where Y is output, X1 capital, X2 labor, and X3 land. If X1 = 5X2 holds exactly in the sample, the regressors are perfectly collinear.

  6. Examples of perfect multicollinearity:
a. Dummy-variable trap: suppose D1, D2, D3 and D4 = 1 for spring, summer, autumn and winter, respectively, and estimate Yi = β0 + α1D1i + α2D2i + α3D3i + α4D4i + β1X1i + εi. Since D1 + D2 + D3 + D4 = 1, the dummies are perfectly collinear with the intercept (a fix is sketched below).
b. Yi = β0 + β1X1i + β2X2i + β3X3i + εi, where X1 is the nominal interest rate, X2 the real interest rate, and X3 the CPI inflation rate: since the nominal rate is the real rate plus inflation, X1 ≈ X2 + X3.
c. Yt = β0 + β1Xt + β2ΔXt + β3Xt-1 + εt, where ΔXt = Xt - Xt-1 is called the "first difference": ΔXt is an exact linear combination of Xt and Xt-1.
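A quick sketch of case (a) (numpy; the quarterly data are simulated and the names illustrative), showing the trap and the standard fix of dropping one dummy:

```python
import numpy as np

seasons = np.tile([0, 1, 2, 3], 10)                   # 40 quarterly observations
D = (seasons[:, None] == np.arange(4)).astype(float)  # D1..D4: one-hot season dummies
const = np.ones((40, 1))

X_trap = np.hstack([const, D])          # intercept plus all four dummies
X_ok = np.hstack([const, D[:, 1:]])     # intercept plus three dummies (D1 dropped)

print(np.linalg.matrix_rank(X_trap), "of", X_trap.shape[1])  # 4 of 5: trapped
print(np.linalg.matrix_rank(X_ok), "of", X_ok.shape[1])      # 4 of 4: fine
```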

  7. Imperfect multicollinearity: Yi = β0 + β1X1i + β2X2i + … + βKXKi + εi. When some independent variables are linearly correlated but the relationship is not exact, there is imperfect multicollinearity: α0 + α1X1i + α2X2i + … + αKXKi + ui = 0, where u is a random error term and αk ≠ 0 for some k. When will it be a problem?

  8. Consequences of imperfect multicollinearity (these can be detected from regression results):
1. The estimated coefficients are still BLUE; however, the OLS estimators have large variances and covariances, making estimation less precise.
2. Confidence intervals tend to be much wider, leading one to accept the "zero" null hypothesis more readily.
3. The t-statistics of the coefficients tend to be statistically insignificant.
4. The R² can nevertheless be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes in the data.

  9. Why are OLS estimators still BLUE under imperfect multicollinearity? Remarks:
• Unbiasedness is a repeated-sampling property, not a property of the estimates in any given sample.
• Minimum variance among linear unbiased estimators does not mean the variance is small.
• Imperfect multicollinearity is just a sample phenomenon, not a defect of the estimator.

  10. Effects of imperfect multicollinearity. Unaffected:
• OLS estimators are still BLUE.
• The overall fit of the equation.
• The estimation of the coefficients of non-multicollinear variables.

  11. The variances of the OLS estimators increase with the degree of multicollinearity. In the model Yi = β0 + β1X1i + β2X2i + εi, Var(β̂1) = σ² / [Σx1i²(1 - r12²)], where r12 is the simple correlation between X1 and X2. High correlation between X1 and X2 makes it difficult to isolate the effects of X1 and X2 from each other.

  12. A closer relationship between X1 and X2 means a larger r12², hence a larger VIF and larger variances, where VIFk = 1/(1 - Rk²), k = 1, …, K, and Rk² is the coefficient of determination from regressing Xk on the other (K - 1) explanatory variables.
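A one-liner sketch of how fast VIFk = 1/(1 - Rk²) grows with Rk² (pure arithmetic, no data needed):

```python
for r2 in (0.0, 0.50, 0.80, 0.90, 0.99):
    print(f"Rk^2 = {r2:4.2f}  ->  VIF = {1 / (1 - r2):6.1f}")
# Rk^2 = 0.90 gives VIF = 10: Var(beta_k) is ten times what it would be
# if Xk were uncorrelated with the other regressors.
```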

  13. A larger Var(β̂k) means se(β̂k) tends to be large:
a. More likely to get unexpected signs.
b. Larger variances tend to increase the standard errors of the estimated coefficients.
c. Larger standard errors → lower t-values.

  14. d. Larger standard errors → wider confidence intervals → less precise interval estimates.

  15. Detection of multicollinearity. Example: data set CONS8 (pp. 254-255): COi = β0 + β1Ydi + β2LAi + εi, where CO is annual consumption expenditure, Yd annual disposable income, and LA liquid assets.

  16. Results (Studenmund (2006), Eq. 8.9, p. 254): high R² and adjusted R², but less significant t-values. Since LA (liquid assets, savings, etc.) is highly related to Yd (disposable income), drop one of the two variables.

  17. OLS estimates and standard errors can be sensitive to the specification and to small changes in the data. Specification changes: adding or dropping variables. Small changes: adding or dropping some observations, or changing some data values. A simulation of this sensitivity follows.
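A sketch of that sensitivity (numpy, simulated data): with a nearly collinear pair of regressors, re-estimating after dropping a handful of observations can move the coefficient estimates noticeably.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # nearly collinear pair
y = 1.0 + x1 + x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

full = np.linalg.lstsq(X, y, rcond=None)[0]
sub = np.linalg.lstsq(X[5:], y[5:], rcond=None)[0]   # drop the first 5 observations
print("full sample:", np.round(full, 2))
print("subsample:  ", np.round(sub, 2))   # the collinear coefficients can shift a lot
```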

  18. High simple correlation coefficients. Remark: a high rij for some pair i, j is a sufficient indicator of multicollinearity but not a necessary one: multicollinearity can exist even when all pairwise correlations are modest.
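This check is one line with pandas (the series below are simulated stand-ins for the CONS8 variables, since the data are not reproduced here):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
yd = rng.normal(50.0, 10.0, size=30)           # stand-in for disposable income
la = 0.9 * yd + rng.normal(0.0, 2.0, size=30)  # liquid assets track income closely
df = pd.DataFrame({"Yd": yd, "LA": la})
print(df.corr())   # an off-diagonal |r| near 1 flags the collinear pair
```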

  19. The variance inflation factor (VIF) method. Procedure: regress each Xk on the other explanatory variables, obtain Rk², and compute VIFk = 1/(1 - Rk²); a sketch follows. Rule of thumb: VIF > 5 suggests multicollinearity. Notes: (a) using the VIF is not a statistical test; (b) the cutoff point is arbitrary.
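The procedure sketched with statsmodels (simulated data; statsmodels' variance_inflation_factor runs the Rk² auxiliary regression internally):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
n = 30
yd = rng.normal(50.0, 10.0, size=n)            # stand-in for disposable income
la = 0.9 * yd + rng.normal(0.0, 2.0, size=n)   # liquid assets tracking income
X = sm.add_constant(np.column_stack([yd, la])) # the constant must be included

for k, name in [(1, "Yd"), (2, "LA")]:
    print(name, round(variance_inflation_factor(X, k), 1))
# Both VIFs come out far above the slide's rule-of-thumb cutoff of 5.
```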

  20. Remedial measures. 1. Drop the redundant variable, using theory to pick which variable(s) to drop. Do not drop a variable that is strongly supported by theory (danger of specification error).

  21. Example: since the money supply measures M1 and M2 are highly related, including both leaves each insignificant. Other examples of near-redundant pairs: CPI and WPI; the CD rate and the TB rate; GDP, GNP, and GNI.

  22. Checks after dropping variables:
• The estimated coefficients of the other variables are not affected. (necessary)
• R² does not fall much when the collinear variables are dropped. (necessary)
• More significant t-values and smaller standard errors. (likely)

  23. 2. Redesigning the regression model. There is no definite rule for this method. Example (Studenmund (2006), p. 268): Ft = average pounds of fish consumed per capita; PFt = price index for fish; PBt = price index for beef; Ydt = real per capita disposable income; N = the number of Catholics; P = dummy, = 1 after the Pope's 1966 decision, = 0 otherwise.

  24. High correlations: VIF(PF) = 43.4, VIF(lnYd) = 23.3, VIF(PB) = 18.9, VIF(N) = 18.5, VIF(P) = 4.4. Some signs are unexpected and most t-values are insignificant.

  25. Use the relative price RPt = PFt/PBt: Ft = β0 + β1RPt + β2lnYdt + β3Pt + εt. Merely dropping N did not improve the results; replacing the two price indexes with the relative price did.

  26. Using the lagged term RPt-1 to allow for a lagged effect, Ft = β0 + β1RPt-1 + β2lnYdt + β3Pt + εt, improves the results much further.

  27. 3. Using a priori information from previous empirical work. For example, in Consi = β0 + β1Incomei + β2Wealthi + εi, suppose a priori information gives β2 = 0.1. Construct the new variable Cons*i = Consi - 0.1 Wealthi, then run OLS on Cons*i = β0 + β1Incomei + εi.
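A sketch of this restriction in code (numpy; Cons, Income, and Wealth are simulated, and β2 = 0.1 is the slide's assumed prior, not an estimate):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
income = rng.normal(100.0, 15.0, size=n)
wealth = 5.0 * income + rng.normal(0.0, 10.0, size=n)  # wealth collinear with income
cons = 10.0 + 0.8 * income + 0.1 * wealth + rng.normal(0.0, 3.0, size=n)

cons_star = cons - 0.1 * wealth               # impose the prior beta2 = 0.1
X = np.column_stack([np.ones(n), income])
b0, b1 = np.linalg.lstsq(X, cons_star, rcond=None)[0]
print(f"beta0 = {b0:.2f}, beta1 = {b1:.2f}")  # beta1 now estimated without Wealth
```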

  28. 4. Transformation of the model, e.g., taking first differences of time-series data. Original regression model: Yt = β0 + β1X1t + β2X2t + εt. First-differenced model: ΔYt = β'0 + β'1ΔX1t + β'2ΔX2t + ut, where ΔYt = Yt - Yt-1 (Yt-1 is called a lagged term), ΔX1t = X1t - X1,t-1, and ΔX2t = X2t - X2,t-1.
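The transformation in code (numpy; the series are simulated stand-ins sharing a common trend, which is what typically creates the collinearity in levels):

```python
import numpy as np

rng = np.random.default_rng(6)
T = 50
t = np.arange(T, dtype=float)
x1 = t + rng.normal(size=T)    # two series sharing a common trend are
x2 = t + rng.normal(size=T)    # highly collinear in levels
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=T)

dy, dx1, dx2 = np.diff(y), np.diff(x1), np.diff(x2)   # first differences
X = np.column_stack([np.ones(T - 1), dx1, dx2])
beta = np.linalg.lstsq(X, dy, rcond=None)[0]
print("corr in levels:     ", round(float(np.corrcoef(x1, x2)[0, 1]), 3))    # near 1
print("corr in differences:", round(float(np.corrcoef(dx1, dx2)[0, 1]), 3))  # much lower
print("differenced-model estimates:", np.round(beta, 2))
```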

  29. 5. Collect more data (expand the sample size): a larger sample size means smaller variances for the estimators. 6. Doing nothing: leave the model alone unless the multicollinearity causes serious problems and a change of specification gives clearly better results.
