Week 4: Bivariate Regression, Least Squares and Hypothesis Testing

Lecture Outline • Method of Least Squares • Assumptions • Normality assumption • Goodness of fit • Confidence Intervals • Tests of Significance • alpha versus p IS 620 Spring 2006

Recall . . . • Regression curve as “line connecting the mean values” of y for a given x • No necessary reason for such a construction to be a line • Need more information to define a function

Method of Least Squares • Goal: describe the functional relationship between y and x • Assume linearity (in the parameters) • What is the best line to explain the relationship? • Intuition: the line that is “closest” to, or “fits best,” the data

“Best” line, n = 2

“Best” line, n > 2?

“Best” line, n > 2

Least squares: intuition (figure showing the residuals u1, u2, u3)

Least squares, n > 2

Why sum of squares? • Sum of residuals may be zero even for a poorly fitting line (positive and negative residuals cancel) • Squaring emphasizes residuals that are far away from the regression line • Better describes spread of residuals
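A small sketch of the first point, using made-up data (the numbers and the helper name `residuals_for_slope` are illustrative, not from the lecture). Any line through the point of means has residuals that sum to zero, so the plain sum cannot single out a best line, while the sum of squares can.

```python
# Illustrative data (invented for this sketch, not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)


def residuals_for_slope(slope):
    """Residuals from the line with this slope that passes through (x_bar, y_bar)."""
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Three very different lines: flat, the least-squares slope for this data, and a steep one.
results = {}
for slope in (0.0, 0.6, 2.0):
    u = residuals_for_slope(slope)
    results[slope] = (sum(u), sum(ui ** 2 for ui in u))
    print(f"slope={slope:.1f}  sum={results[slope][0]:.6f}  sum of squares={results[slope][1]:.3f}")
```

All three sums of residuals are zero up to rounding, but the sums of squares differ, and the smallest belongs to the least-squares slope.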

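Least-squares estimates • Effect of x on y (slope): $\hat{\beta}_2 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) \,/\, \sum_i (x_i - \bar{x})^2$ • Intercept: $\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}$ • Residuals: $\hat{u}_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i$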
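The estimates named on this slide can be computed directly from the standard bivariate formulas. The following is a minimal sketch with invented data; the function name `least_squares` is a placeholder chosen for illustration.

```python
def least_squares(x, y):
    """Return (intercept, slope, residuals) for the bivariate model y = b1 + b2*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation in x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    # Intercept: forces the fitted line through the point of means.
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


# Illustrative data (invented for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b1, b2, u_hat = least_squares(x, y)
print(f"intercept={b1:.3f}  slope={b2:.3f}")
print("residuals:", [round(u, 3) for u in u_hat])
```

With an intercept in the model, the least-squares residuals always sum to zero, which is a quick sanity check on any implementation.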

Gauss-Markov Theorem • Least-squares method produces best linear unbiased estimators (BLUE) • “Best” means most efficient (minimum variance) among linear unbiased estimators • Provided the classical assumptions hold

Classical Assumptions • Focus on #3, #4, and #5 in Gujarati • Implications of violations for the estimators • Skim over #1, #2, #6 through #10

#3: Zero mean value of ui • Disturbances are randomly distributed around the regression line • Expected value of ui is zero for any given value of x • NOTE: Equivalent to assuming the model is fully specified (nothing that systematically affects y is left out)

#3: Zero mean value of ui

Violation of #3 • Estimated betas will be • Biased and • Inconsistent when the mean of ui varies with x • If the mean of ui is a nonzero constant, only the intercept is biased • May arise from • Systematic measurement error • Nonlinear relationships (Phillips curve)

#4: Homoscedasticity • The variance of the disturbances is the same for all observations, irrespective of the value of x • “Equal variance” • NOTE: #3 and #4 together imply that ui has mean zero and constant variance (see “Normality Assumption”)

#4: Homoscedasticity

Violation of #4 • Estimated betas will be • Unbiased • Consistent but • Inefficient • Commonly arises in • Cross-sectional data

#5: No autocorrelation • The correlation between any two disturbances is zero • The disturbance for observation i is unrelated to the disturbance for observation j (i ≠ j)

#5: No autocorrelation

Violations of #5 • Estimated betas will be • Unbiased • Consistent but • Inefficient • Commonly arise in • Time-series data • Spatially correlated data
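One common way to check for first-order autocorrelation is the Durbin-Watson statistic, which the slides do not name; it is brought in here only as an illustration. Values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. The residual series below are invented.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Invented residual series, ordered in time.
drifting = [1.0, 0.8, 0.6, 0.2, -0.3, -0.7, -0.9, -0.7]      # neighbours look alike
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # neighbours flip sign

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)
print(f"drifting:    DW = {dw_drifting:.3f}")
print(f"alternating: DW = {dw_alternating:.3f}")
```

The drifting series gives a value well below 2 and the alternating series a value well above 2, matching the positive and negative autocorrelation built into them.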

Other Assumptions (1) • Assumption 6: zero covariance between xi and ui • Violations may be a cause of heteroscedasticity • Hence may violate #4 • Assumption 9: model correctly specified • Violations may violate #1 (linearity) • May also violate #3: omitted variables?

Other Assumptions (2) • #7: n must be greater than the number of parameters to be estimated • Key in multivariate regression • King, Keohane and Verba’s (1994) critique of small-n designs

Normality Assumption • Distribution of the disturbance is unknown • Needed for hypothesis tests on the independent variables • The estimates are a function of ui, so their distribution depends on the distribution of ui • Assuming normality permits exact inference • Equivalent to assuming model is completely specified

Normality Assumption • Central Limit Theorem: M&Ms • A linear transformation of a normal variable is itself normal • Simple distribution: fully described by two parameters (mu, sigma) • Matters most in small samples
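The Central Limit Theorem point can be illustrated with a short simulation. This sketch does not reproduce the M&Ms classroom exercise; it assumes uniform random draws as a stand-in, and the seed, sample size, and number of samples are arbitrary choices.

```python
import random
import statistics

random.seed(620)  # arbitrary seed so the sketch is reproducible


def sample_means(sample_size, n_samples):
    """Draw n_samples samples of uniform(0, 1) values and return each sample's mean."""
    return [
        statistics.fmean([random.random() for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


means = sample_means(sample_size=30, n_samples=2000)
center = statistics.fmean(means)
spread = statistics.stdev(means)
# For a normal distribution, about 95% of values lie within 1.96 standard deviations.
within = sum(abs(m - center) <= 1.96 * spread for m in means) / len(means)

print(f"mean of sample means:   {center:.4f}")
print(f"st. dev. of means:      {spread:.4f}")
print(f"share within 1.96 s.d.: {within:.3f}")
```

Although each underlying draw is uniform rather than normal, the sample means cluster around 0.5 and roughly 95% of them fall within 1.96 standard deviations of the center, as a normal distribution would predict.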

Assumptions, Distilled • Linearity • DV is continuous, interval-level • Independent variables are non-stochastic, with no perfect correlation among them • Residuals are independently and identically distributed (iid) • Mean of zero • Constant variance

If so, . . . • Least-squares method produces estimators that are BLUE

Goodness of Fit • How “well” the least-squares regression line fits the observed data • Alternatively: how well the function describes the effect of x on y • How much of the observed variation in y have we explained?

Coefficient of determination • Commonly referred to as “r²” • Simply, the ratio of explained variation in y to the total variation in y

Components of variation (figure: total variation split into explained and residual parts)

Components of variation • TSS: total sum of squares • ESS: explained sum of squares • RSS: residual sum of squares • TSS = ESS + RSS
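The three sums of squares and the coefficient of determination can be computed as follows. This is a minimal sketch with invented data; the variable names are illustrative.

```python
# Illustrative data (invented for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit of y on x.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variation in y
ess = sum((fi - y_bar) ** 2 for fi in fitted)            # variation explained by the line
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # variation left in the residuals

r_squared = ess / tss

print(f"TSS={tss:.3f}  ESS={ess:.3f}  RSS={rss:.3f}")
print(f"r-squared = {r_squared:.3f}")
```

Because TSS = ESS + RSS for a least-squares fit with an intercept, the same value can also be obtained as 1 - RSS/TSS.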

Hypothesis Testing • Confidence Intervals • Tests of significance • ANOVA • Alpha versus p-value

Confidence Intervals • Two components • Estimate • Expression of uncertainty • Interpretation: • Gujarati, p. 121: “The probability of constructing an interval that contains Beta is 1-alpha” • NOT: “The probability that Beta is in the interval is 1-alpha”
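The interpretation can be made concrete with a repeated-sampling simulation: the true slope stays fixed, the interval changes from sample to sample, and about 1-alpha of the constructed intervals contain the true value. This is a sketch under assumed values; the true parameters, the seed, and the number of replications are invented, and the t critical value 2.306 (95%, 8 degrees of freedom) is taken from standard tables.

```python
import random

random.seed(4)  # arbitrary seed so the sketch is reproducible

# Assumed "true" model for the simulation (invented values).
TRUE_INTERCEPT, TRUE_SLOPE, SIGMA = 1.0, 2.0, 1.5
X = [float(i) for i in range(1, 11)]  # n = 10, held fixed across samples
T_CRIT = 2.306                        # t critical value: 95%, df = n - 2 = 8


def slope_interval(x, y, t_crit):
    """Confidence interval for the slope of a bivariate least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = (rss / (n - 2) / s_xx) ** 0.5
    return slope - t_crit * se_slope, slope + t_crit * se_slope


n_reps = 2000
hits = 0
for _ in range(n_reps):
    # A fresh sample of y for the same x values: a new interval each time.
    y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + random.gauss(0, SIGMA) for xi in X]
    lower, upper = slope_interval(X, y, T_CRIT)
    hits += lower <= TRUE_SLOPE <= upper

coverage = hits / n_reps
print(f"share of intervals containing the true slope: {coverage:.3f}")
```

The share should land close to 0.95. Any single interval either contains the true slope or does not; the 95% describes the procedure, not one realized interval.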

C.I.s for regression • Depend upon our knowledge or assumption about the sampling distribution • Width of interval proportional to standard error of the estimators • Typically we assume • The t distribution for Betas (rather than the normal, because the true standard error is unknown and must be estimated) • The chi-square distribution for variances
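A worked sketch of both intervals for a single small data set. The data are invented, and the critical values are hard-coded from standard tables for 3 degrees of freedom at the 95% level (t = 3.182; chi-square = 0.216 and 9.348); with a different n they would need to be looked up again.

```python
import math

# Illustrative data (invented for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
df = n - 2  # two parameters estimated: intercept and slope

# Critical values from standard tables for df = 3 and 95% confidence.
T_CRIT = 3.182
CHI2_LOWER = 0.216   # 2.5th percentile of chi-square(3)
CHI2_UPPER = 9.348   # 97.5th percentile of chi-square(3)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
intercept = y_bar - slope * x_bar
rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

sigma2_hat = rss / df                     # estimated disturbance variance
se_slope = math.sqrt(sigma2_hat / s_xx)   # estimated standard error of the slope

# Interval for the slope: estimate plus or minus t times its standard error.
slope_ci = (slope - T_CRIT * se_slope, slope + T_CRIT * se_slope)
# Interval for the variance: df * sigma2_hat divided by the chi-square bounds.
var_ci = (df * sigma2_hat / CHI2_UPPER, df * sigma2_hat / CHI2_LOWER)

print(f"slope = {slope:.3f}, 95% CI = ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")
print(f"variance = {sigma2_hat:.3f}, 95% CI = ({var_ci[0]:.3f}, {var_ci[1]:.3f})")
```

The slope interval is symmetric around the estimate and its width scales with the standard error, while the variance interval is asymmetric because the chi-square distribution is skewed.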
