AAEC 4302 ADVANCED STATISTICAL METHODS IN AGRICULTURAL RESEARCH


Chapter 13.3

Multicollinearity

- When two or more independent variables in a regression model are highly correlated with each other, it is difficult to determine whether each of these variables, individually, has an effect on Y, and to quantify the magnitude of that effect
- Intuitively, for example, if all farms in a sample that use a lot of fertilizer also apply large amounts of pesticides (and vice versa), it would be hard to tell if the observed increase in yield is due to higher fertilizer or to higher pesticide use

- In economics, when interest and inflation rates are used as independent variables, it is often hard to quantify their individual effects on the dependent variable because they are highly correlated with each other
- Mathematically, this occurs because, everything else being constant, the standard error associated with a given OLS parameter estimate will be higher if the corresponding independent variable is more highly correlated with the other independent variables in the model
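The effect on the standard error can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the data are simulated, the error standard deviation is fixed at 1, and the helper names `make_design` and `coef_std_errors` are hypothetical. It compares the standard error of the coefficient on x1 when x1 and x2 are roughly uncorrelated with the case where their correlation is about 0.95.

```python
import numpy as np

def make_design(rho, n=200, seed=0):
    """Simulated design matrix [1, x1, x2] where corr(x1, x2) is roughly rho."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    x1 = z1
    x2 = rho * z1 + np.sqrt(1.0 - rho**2) * z2
    return np.column_stack([np.ones(n), x1, x2])

def coef_std_errors(X, sigma=1.0):
    """OLS standard errors: sigma * sqrt(diag((X'X)^-1)), with sigma assumed known."""
    xtx_inv = np.linalg.inv(X.T @ X)
    return sigma * np.sqrt(np.diag(xtx_inv))

se_low = coef_std_errors(make_design(rho=0.0))    # x1, x2 nearly uncorrelated
se_high = coef_std_errors(make_design(rho=0.95))  # x1, x2 highly correlated

print("s.e. of coefficient on x1, low correlation: ", se_low[1])
print("s.e. of coefficient on x1, high correlation:", se_high[1])
```

With the same sample size and the same error variance, the second standard error should come out several times larger than the first, which is the mechanism described above.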

- This potential problem is known as multicollinearity
- It could be the reason why independent variables that are believed to be key in determining the value of a given dependent variable do not turn out to be statistically significant when conducting the basic significance test

- It is not a mistake in the model specification, but due to the nature of the data at hand
- It is more common in time-series models because time often affects the values taken by many of the independent variables in these types of models, causing them to be highly correlated with each other

- Perfect multicollinearity occurs when there is a perfect linear correlation between two or more independent variables, i.e. when an independent variable is actually a linear function of one or more of the others

- For example, when total annual rainfall is included together with rainfall during each of the seasons as independent variables in an (annual) time series model
- Or when an independent variable takes a constant value in all observations

- If there is perfect multicollinearity, the OLS method cannot produce parameter estimates
- All cases of perfect multicollinearity are the result of a mistake in specifying the model and can be easily corrected by properly specifying the model
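The rainfall example can be checked numerically. The following is a minimal sketch with made-up (simulated) seasonal rainfall figures, not data from the course. It shows that adding total annual rainfall to the four seasonal variables leaves the design matrix short of full column rank, so X'X is singular and the OLS normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 30

# Hypothetical seasonal rainfall (four seasons per year), whole numbers in mm.
seasonal = rng.integers(50, 300, size=(n_years, 4)).astype(float)
total = seasonal.sum(axis=1)  # an exact linear function of the four seasons

# Misspecified model: intercept + four seasons + annual total.
X_bad = np.column_stack([np.ones(n_years), seasonal, total])
# Corrected model: drop the redundant total.
X_ok = np.column_stack([np.ones(n_years), seasonal])

rank_bad = np.linalg.matrix_rank(X_bad)
rank_ok = np.linalg.matrix_rank(X_ok)

print("columns:", X_bad.shape[1], "rank:", rank_bad)  # rank below column count
print("columns:", X_ok.shape[1], "rank:", rank_ok)    # full column rank
```

Dropping the redundant variable restores full column rank, which is the sense in which perfect multicollinearity is corrected by properly specifying the model.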

- A certain degree of correlation (multicollinearity) between the independent variables is normal and expected in most cases
- Multicollinearity is considered severe and becomes a problem when this correlation is high and interferes with the estimation of the model’s parameters at the desired level of statistical certainty

- The following, when taken together, are considered symptoms of a multicollinearity problem:
- Independent variable(s) considered critical in explaining the model’s dependent variable are not statistically significant according to the tests

- High R², highly significant F-test, but few or no statistically significant t-tests
- Parameter estimates change drastically in value and become statistically significant when some independent variables are excluded from the regression

- A simple test for multicollinearity is to conduct “artificial” regressions between each independent variable (as the “dependent” variable) and the remaining independent variables
- Variance Inflation Factors (VIFj) are calculated as: VIFj = 1 / (1 − Rj²)
- where Rj² is the R² of the artificial regression with the jth independent variable as a “dependent” variable

- VIFj = 2, for example, means that the variance of the estimate of Bj is twice what it would be if Xj were not affected by multicollinearity
- A VIFj>10 is clear evidence that the estimation of Bj is being affected by multicollinearity
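The artificial regressions can be sketched in a few lines of NumPy. This is an illustrative sketch, not the course's own code: the function name `vif` and the simulated fertilizer, pesticide and rainfall variables are assumptions. It uses the standard definition VIFj = 1 / (1 − Rj²), where Rj² comes from regressing the jth independent variable on an intercept and the remaining independent variables.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (n x k array of independent variables,
    WITHOUT an intercept column), via one auxiliary regression per column."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]                                   # jth variable as "dependent"
        others = np.delete(X, j, axis=1)              # remaining variables
        Z = np.column_stack([np.ones(n), others])     # add an intercept
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated data: pesticide use closely tracks fertilizer use; rainfall does not.
rng = np.random.default_rng(0)
n = 200
fertilizer = rng.normal(100.0, 20.0, n)
pesticide = 0.5 * fertilizer + rng.normal(0.0, 2.0, n)
rainfall = rng.normal(500.0, 50.0, n)

vifs = vif(np.column_stack([fertilizer, pesticide, rainfall]))
print(dict(zip(["fertilizer", "pesticide", "rainfall"], vifs)))
```

In this simulated sample the fertilizer and pesticide VIFs should land well above 10, while the rainfall VIF should stay close to 1.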

- Although it is useful to be aware of the presence of multicollinearity, it is not easy to remedy severe (non-perfect) multicollinearity
- If possible, adding observations or taking a new sample might help lessen multicollinearity

- One option is to exclude the independent variables that appear to be causing the problem
- However, recall that omitting relevant independent variables from the model will make the OLS estimators biased
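The cost of dropping a relevant variable can be seen in a small simulation. The sketch below rests on assumptions chosen for illustration only: the true coefficients (1, 2, 3), a correlation of about 0.9 between x1 and x2, and the helper name `ols` are all hypothetical. It fits the full model and then a model that omits x2.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficient estimates."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(1)
n = 2000
x1 = rng.standard_normal(n)
x2 = 0.9 * x1 + np.sqrt(1.0 - 0.9**2) * rng.standard_normal(n)  # corr about 0.9

# Assumed true model: y = 1 + 2*x1 + 3*x2 + error
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.standard_normal(n)

ones = np.ones(n)
beta_full = ols(np.column_stack([ones, x1, x2]), y)  # correctly specified
beta_short = ols(np.column_stack([ones, x1]), y)     # x2 omitted

print("coefficient on x1, full model:", beta_full[1])
print("coefficient on x1, x2 omitted:", beta_short[1])
```

In the short regression the coefficient on x1 absorbs part of the effect of x2, so it should come out near 2 + 3 × 0.9 = 4.7 rather than near the assumed true value of 2.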

- Modifying the model specification sometimes helps, for example:
- using real instead of nominal economic data
- using a reciprocal instead of a polynomial specification on a given independent variable

- Remember that there is usually a certain degree of multicollinearity in any model, and one should not worry about it as long as it does not prevent the statistical inferences of interest