Assumptions of Regression Analysis

• (1) The independent variables do not form a linearly dependent set, i.e., the explanatory variables are not perfectly correlated.
• (2) Homoscedasticity: the probability distributions of the error term have a constant variance for all values of the independent variables (the Xi's).

Perfect multicollinearity is a violation of assumption (1). Heteroscedasticity is a violation of assumption (2).

Multicollinearity is a problem with time series regression
Suppose we wanted to estimate the following specification using quarterly time series data:

$\text{Auto Sales}_t = \beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Prices}_t$

where $\text{Income}_t$ is (nominal) income in quarter t and $\text{Prices}_t$ is an index of auto prices in quarter t. The data reveal a strong (positive) correlation between nominal income and car prices.

[Figure: Approximate linear relationship between explanatory variables. Axes: car prices and (nominal) income.]

Why is multicollinearity a problem?
• In the case of perfectly collinear explanatory variables, OLS does not work: the coefficients cannot be estimated separately.
• In the case where there is an approximate linear relationship among the explanatory variables (the Xi's), the estimates of the coefficients are still unbiased, but you run into the following problems:
  • High standard errors of the coefficient estimates, and therefore low t-ratios.
  • Co-mingling of the effects of the explanatory variables.
  • Estimates of the coefficients tend to be "unstable."
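The points above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated, not the auto sales data, and the function names (`ols_standard_errors`, `simulate`), the coefficient values, the sample size, and the correlation levels are all hypothetical choices made for illustration.

```python
import numpy as np

def ols_standard_errors(X, y):
    """Return OLS coefficient estimates and their conventional standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # estimated error variance
    return beta, np.sqrt(np.diag(sigma2 * XtX_inv))

def simulate(rho, n=200, seed=0):
    """Simulate sales data where income and prices have correlation about rho."""
    rng = np.random.default_rng(seed)
    income = rng.normal(size=n)
    noise = rng.normal(size=n)
    prices = rho * income + np.sqrt(1 - rho**2) * noise
    # Assumed "true" coefficients: intercept 1, income 2, prices -1.
    sales = 1.0 + 2.0 * income - 1.0 * prices + rng.normal(size=n)
    X = np.column_stack([np.ones(n), income, prices])
    return X, sales

X_low, y_low = simulate(rho=0.0)     # uncorrelated regressors
X_high, y_high = simulate(rho=0.99)  # approximate linear relationship

_, se_low = ols_standard_errors(X_low, y_low)
_, se_high = ols_standard_errors(X_high, y_high)
print("SE of income coefficient, rho = 0.00:", se_low[1])
print("SE of income coefficient, rho = 0.99:", se_high[1])

# Perfect collinearity: prices is an exact multiple of income, so the
# design matrix loses a rank and X'X cannot be reliably inverted.
income = X_low[:, 1]
X_perfect = np.column_stack([np.ones(len(income)), income, 2.0 * income])
rank_perfect = np.linalg.matrix_rank(X_perfect)
print("Columns:", X_perfect.shape[1], "Rank:", rank_perfect)
```

With the same error draws, the standard error on income should come out several times larger in the highly correlated case, and the perfectly collinear design matrix has fewer independent columns than coefficients to estimate.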

What to do about multicollinearity
• Increase the sample size
• Delete one or more explanatory variables
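A standard textbook expression for the sampling variance of an OLS coefficient helps show why both remedies can work. The notation here is an assumption, not taken from the slides: $\sigma^2$ is the error variance, $\mathrm{SST}_j$ is the total sample variation in explanatory variable $X_j$, and $R_j^2$ is the R-squared from regressing $X_j$ on the other explanatory variables (in a model with an intercept and a constant-variance error term).

```latex
\[
\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{\mathrm{SST}_j \,(1 - R_j^2)},
\qquad
\mathrm{SST}_j = \sum_{i=1}^{n} \left( X_{ij} - \bar{X}_j \right)^2
\]
```

A larger sample raises $\mathrm{SST}_j$, and deleting a variable that is highly correlated with $X_j$ lowers $R_j^2$; either change shrinks the variance. As $R_j^2$ approaches 1 the variance grows without bound, which is the perfect-collinearity case.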

Understanding heteroscedasticity
This problem commonly arises when using cross-sectional data.

Consider a model in which Yi is made up of a "determined" part, which depends on the explanatory variables, and an error term εi. Remember that in regression we assume E(εi) = 0.

[Figure: Two distributions with the same mean and different variances. Jar #1 holds the values 4, -4, 0, -2, 2 (mean = 0); Jar #2 holds the values 400, -400, 0, -200, 200 (mean = 0).]
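The two jars can be checked with a few lines of arithmetic. This sketch uses the jar values shown on the slide and treats each jar as a complete population, which is an assumption made so the population variance applies; the variable names are illustrative.

```python
from statistics import mean, pvariance

jar_1 = [4, -4, 0, -2, 2]
jar_2 = [400, -400, 0, -200, 200]

# Both jars are centered on zero, like an error term with E(e) = 0.
mean_1, mean_2 = mean(jar_1), mean(jar_2)

# Population variance: the average squared deviation from the mean.
var_1, var_2 = pvariance(jar_1), pvariance(jar_2)

print("Jar #1: mean", mean_1, "variance", var_1)
print("Jar #2: mean", mean_2, "variance", var_2)
```

Both means are zero, but the variance of Jar #2 is 10,000 times that of Jar #1, because every value is 100 times larger and variance scales with the square.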

[Figure: The disturbance distributions of heteroscedasticity. Axes: X, Y, and f(x), with the disturbance distribution drawn at different values of X.]

[Figure: Scatter diagram of ascending heteroscedasticity. Axes: spending for electronics and household income.]

Why is heteroscedasticity a problem?
• Heteroscedasticity does not give us biased estimates of the coefficients. However, it does make the reported standard errors of the estimates unreliable. Typically, as with ascending heteroscedasticity, we will understate the standard errors.
• Because of this, t-tests cannot be trusted. We run the risk of rejecting a null hypothesis that should not be rejected.
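A small Monte Carlo experiment can make both claims concrete. This is a sketch under stated assumptions, not an analysis of real data: household income and electronics spending are simulated, the error spread is assumed to grow with the square of income, and the coefficient values, sample size, and helper name `slope_and_naive_se` are hypothetical.

```python
import numpy as np

def slope_and_naive_se(x, y):
    """Fit y = b0 + b1*x by OLS; return b1 and its conventional standard error."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)   # assumes one common error variance
    se = np.sqrt(np.diag(sigma2 * XtX_inv))
    return beta[1], se[1]

rng = np.random.default_rng(0)
true_slope = 3.0
n, replications = 200, 500

slopes, naive_ses = [], []
for _ in range(replications):
    income = rng.uniform(1.0, 10.0, size=n)
    # Ascending heteroscedasticity: error spread rises with income.
    errors = rng.normal(size=n) * 0.5 * income**2
    spending = 2.0 + true_slope * income + errors
    b1, se1 = slope_and_naive_se(income, spending)
    slopes.append(b1)
    naive_ses.append(se1)

mean_slope = np.mean(slopes)            # close to true_slope if unbiased
actual_sd = np.std(slopes, ddof=1)      # how much the estimate really varies
mean_naive_se = np.mean(naive_ses)      # what the regression output reports

print("Average slope estimate:", mean_slope)
print("Actual spread of slope estimates:", actual_sd)
print("Average reported standard error:", mean_naive_se)
```

Under this setup the average slope estimate should sit near the assumed true value of 3, while the reported standard error should fall short of the actual spread of the estimates, so t-ratios look larger than they should.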
