Week 4

Bivariate Regression, Least Squares and Hypothesis Testing


Lecture Outline

  • Method of Least Squares

  • Assumptions

  • Normality assumption

  • Goodness of fit

  • Confidence Intervals

  • Tests of Significance

  • alpha versus p

IS 620 Spring 2006


Recall . . .

  • Regression curve as “line connecting the mean values” of y for a given x

    • No necessary reason for such a construction to be a line

    • Need more information to define a function

Method of Least Squares

  • Goal: describe the functional relationship between y and x

    • Assume linearity (in the parameters)

  • What is the best line to explain the relationship?

  • Intuition: the line that is “closest” to, or “fits best,” the data

“Best” line, n = 2

“Best” line, n > 2

?



Least squares: intuition

[Figure: residuals u1, u2, u3]

Least squares, n > 2

Why sum of squares?

  • The sum of the residuals can be zero even when the fit is poor, because positive and negative residuals cancel

  • Squaring emphasizes residuals that are far from the regression line

  • Better describes the spread of the residuals
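A minimal sketch of the first point, using made-up data that is not from the lecture. Every line drawn through the point of means has residuals that sum to zero, so the plain sum cannot rank candidate lines, while the sum of squares can. The helper name `sum_and_sum_sq` is hypothetical.

```python
# Illustrative data (invented for this sketch, not from the lecture).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)


def sum_and_sum_sq(slope):
    """Return (sum of residuals, sum of squared residuals) for the line
    with the given slope that passes through (x_bar, y_bar)."""
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return sum(resid), sum(u * u for u in resid)


# Three very different slopes: the residual sums are all (numerically) zero,
# but the sums of squares differ and are smallest for the middle slope.
for slope in (0.0, 0.6, 2.0):
    total, total_sq = sum_and_sum_sq(slope)
    print(f"slope={slope:.1f}  sum={total:+.6f}  sum of squares={total_sq:.4f}")
```

For this data the slope 0.6 happens to be the least-squares slope, which is why it gives the smallest sum of squares of the three.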

Least-squares estimates

  • Intercept

  • Residuals

  • Effect of x on y (slope)
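The equation these labels annotated did not survive in the transcript. As a sketch, assuming the usual bivariate model y = intercept + slope * x + residual, the closed-form least-squares estimates can be computed as follows. The function name `ols_fit` and the data are illustrative assumptions, not taken from the lecture.

```python
def ols_fit(xs, ys):
    """Closed-form least-squares estimates for y = intercept + slope * x + u.

    Returns (intercept, slope, residuals).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    # Intercept: forces the fitted line through the point of means.
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, residuals


# Illustrative data (invented for this sketch).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
intercept, slope, residuals = ols_fit(xs, ys)
print(intercept, slope, residuals)
```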

Gauss-Markov Theorem

  • The least-squares method produces the best linear unbiased estimators (BLUE)

  • “Best” means most efficient: minimum variance among linear unbiased estimators

  • Provided the classical assumptions hold

Classical Assumptions

  • Focus on #3, #4, and #5 in Gujarati

    • Implications of violations for the estimators

  • Skim over #1, #2, #6 through #10

#3: Zero mean value of ui

  • Residuals are randomly distributed around the regression line

  • Expected value is zero for any given observation of x

  • NOTE: Equivalent to assuming the model is fully specified

Violation of #3

  • Estimated betas will be

    • Unbiased but

    • Inconsistent

    • Inefficient

  • May arise from

    • Systematic measurement error

    • Nonlinear relationships (Phillips curve)

#4: Homoscedasticity

  • The variance of the residuals is the same for all observations, irrespective of the value of x

  • “Equal variance”

  • NOTE: #3 and #4 imply (see “Normality Assumption”)

Violation of #4

  • Estimated betas will be

    • Unbiased

    • Consistent but

    • Inefficient

  • Arise from

    • Cross-sectional data

#5: No autocorrelation

  • The correlation between any two residuals is zero

  • The residual for observation i is unrelated to the residual for observation j

Violations of #5

  • Estimated betas will be

    • Unbiased

    • Consistent

    • Inefficient

  • Arise from

    • Time-series data

    • Spatial correlation

Other Assumptions (1)

  • Assumption 6: zero covariance between xi and ui

    • Violations are a cause of heteroscedasticity

    • Hence violates #4

  • Assumption 9: model correctly specified

    • Violations may violate #1 (linearity)

    • May also violate #3: omitted variables?

Other Assumptions (2)

  • #7: n must be greater than number of parameters to be estimated

    • Key in multivariate regression

    • King, Keohane and Verba’s (1994) critique of small n designs

Normality Assumption

  • Distribution of disturbance is unknown

  • Necessary for hypothesis testing of I.V.s

    • Estimates are a function of ui

  • Assumption of normality is necessary for inference

  • Equivalent to assuming model is completely specified

Normality Assumption

  • Central Limit Theorem: M&Ms

  • Linear transformation of a normal variable itself is normal

  • Simple distribution (mu, sigma)

  • Small samples

Assumptions, Distilled

  • Linearity

  • DV is continuous, interval-level

  • Non-stochastic: No correlation between independent variables

  • Residuals are independently and identically distributed (iid)

    • Mean of zero

    • Constant variance

If so, . . .

  • Least-squares method produces BLUE estimators

Goodness of Fit

  • How “well” the least-squares regression line fits the observed data

  • Alternatively: how well the function describes the effect of x on y

  • How much of the observed variation in y have we explained?

Coefficient of determination

  • Commonly referred to as “r²”

  • Simply, the ratio of explained variation in y to the total variation in y

Components of variation

[Figure: total variation split into explained and residual components]


  • TSS: total sum of squares

  • ESS: explained sum of squares

  • RSS: residual sum of squares
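As a sketch of how the three sums of squares relate, the snippet below fits a least-squares line to made-up data and checks that TSS = ESS + RSS, with r² taken as ESS / TSS. The function name `variation_components` and the data are assumptions for illustration only.

```python
def variation_components(xs, ys):
    """Return (TSS, ESS, RSS) for a bivariate least-squares fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]

    tss = sum((y - y_bar) ** 2 for y in ys)               # total variation in y
    ess = sum((f - y_bar) ** 2 for f in fitted)           # explained by the line
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left in the residuals
    return tss, ess, rss


# Illustrative data (invented for this sketch).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
tss, ess, rss = variation_components(xs, ys)
r_squared = ess / tss  # equivalently 1 - rss / tss
print(tss, ess, rss, r_squared)
```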

Hypothesis Testing

  • Confidence Intervals

  • Tests of significance

  • ANOVA

  • Alpha versus p-value

Confidence Intervals

  • Two components

    • Estimate

    • Expression of uncertainty

  • Interpretation:

    • Gujarati, p. 121: “The probability of constructing an interval that contains Beta is 1-alpha”

    • NOT: “The probability that Beta is in the interval is 1-alpha”

C.I.s for regression

  • Depend upon our knowledge or assumption about the sampling distribution

  • Width of interval proportional to standard error of the estimators

  • Typically we assume

    • The t distribution for Betas

    • The chi-square distribution for variances

    • Due to unknown true standard error
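A minimal sketch of a confidence interval for the slope, assuming the textbook bivariate formulas: the error variance is estimated as RSS / (n - 2), and the standard error of the slope is the square root of that variance divided by the variation in x. The critical t value is passed in from a t table rather than computed. The function name, the data, and the choice of a 95% interval are illustrative assumptions.

```python
import math


def slope_confidence_interval(xs, ys, t_crit):
    """Return (slope, standard error, (lower, upper)) for a bivariate fit.

    t_crit is the two-sided critical value from a t table for n - 2
    degrees of freedom at the chosen confidence level.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = rss / (n - 2)              # estimated error variance
    se_slope = math.sqrt(sigma2_hat / s_xx)
    half_width = t_crit * se_slope          # width scales with the standard error
    return slope, se_slope, (slope - half_width, slope + half_width)


# Illustrative data (invented for this sketch): n = 5, so 3 degrees of freedom.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT_95_DF3 = 3.182  # two-sided 95% value for 3 d.f., from a standard t table
slope, se, (low, high) = slope_confidence_interval(xs, ys, T_CRIT_95_DF3)
print(slope, se, low, high)
```

With so few observations the interval is wide and includes zero, which ties the interval to the test of the zero null hypothesis.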

Confidence Intervals in IR

  • Examples?

The worst weatherman in the world

  • “Three-degree guarantee”

  • If his forecast high is off by more than three degrees, someone wins an umbrella

  • Woo hoo

How Many Umbrellas?

  • Data: mean daily temperature in February for Washington, DC

    • Daily observations from 1995 to 2005 (n = 311)

    • Mean: 47.91 degrees F

    • Standard deviation: 10.58

  • The interval: +/- 3.5 degrees F

    • Due to rounding

    • Note: spread of seven (eight?) degrees

The t value

  • We don’t know alpha: level of confidence

  • Assume t distribution

The answer

  • From the t table:

Tom will give away an umbrella on average about once every 26,695,141 days.

Thanks, Tom.

Tests of Significance

  • A hypothesis about a point value rather than an interval

    • Does the observed sample value differ from the hypothesized value?

  • Null hypothesis (H0): no difference

  • Alternative hypothesis (Ha): significant difference

Regression Interpretation

  • Is the hypothesized causal effect (beta) significantly different from zero?

    • H0: no effect (β = 0)

    • Ha: effect (β ≠ 0)

  • The “zero” null hypothesis



Two-tail v. One-tail tests

  • Two-tail

    • Ha is not concerned with direction of difference

    • Exploratory

    • Theory in disagreement

    • Critical regions on both ends

  • One-tail

    • Ha specifies a direction of effect

    • Theory well developed

    • Critical regions only on one end

The 2-t rule

  • Gujarati, p. 134: zero null hypothesis can be rejected if |t| > 2

    • D.F. > 20

    • Level of significance = 0.05

    • Recall Weatherman Tom: t = 5.62!
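The rule of thumb can be sketched as a small check. The function name `two_t_rule` and the example numbers are hypothetical. The sketch assumes the rule as stated on the slide: a zero null hypothesis, more than 20 degrees of freedom, and a 0.05 significance level.

```python
def two_t_rule(estimate, std_error, df):
    """Apply the 2-t rule of thumb to a zero null hypothesis.

    Returns (t statistic, rule applicable?, reject H0?). The rule is only
    treated as applicable when df > 20; otherwise consult a t table.
    """
    t_stat = estimate / std_error      # hypothesized value is zero
    applicable = df > 20
    reject = applicable and abs(t_stat) > 2
    return t_stat, applicable, reject


# Hypothetical estimates, chosen only to exercise the rule.
print(two_t_rule(1.5, 0.5, df=30))    # |t| = 3: reject the zero null
print(two_t_rule(-0.5, 0.5, df=30))   # |t| = 1: do not reject
print(two_t_rule(1.5, 0.5, df=10))    # too few d.f. for the rule of thumb
```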



Alpha versus p-values

  • Alpha

    • Conventional

    • Findings reported at 0.05, 0.1, 0.01

    • Accessible, intuitive

    • Arbitrary

    • Makes assumptions about Type I, II errors

  • P-value

    • “The lowest significance at which a null hypothesis can be rejected”

    • Widely accepted today

  • Know your readers!
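To make the contrast concrete, here is a rough sketch that reports a two-sided p-value and then compares it with several conventional alpha levels. It assumes a large sample, so the standard normal distribution stands in for the t distribution. That substitution is an assumption of this sketch, not the lecture's method, and the function names are hypothetical.

```python
import math


def two_sided_p_value_normal(t_stat):
    """Two-sided p-value under a standard normal approximation.

    For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    """
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


def decide(t_stat, alpha):
    """Return (p-value, reject H0 at this alpha?)."""
    p = two_sided_p_value_normal(t_stat)
    return p, p < alpha


# One test statistic, three conventional alpha levels: the p-value stays the
# same, but the reject / do-not-reject verdict depends on the alpha chosen.
for alpha in (0.10, 0.05, 0.01):
    p, reject = decide(2.5, alpha)
    print(f"alpha={alpha:.2f}  p={p:.4f}  reject={reject}")
```

Reporting the p-value lets readers apply whichever alpha they prefer.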

ANOVA

  • Intuitively similar to r²

    • Identical output for bivariate regression

  • A good test of the zero null hypothesis

  • In multivariate regression, tests the null hypotheses for all betas

    • Check F statistic before checking betas!
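A sketch of why ANOVA and the t test agree in the bivariate case: with one regressor, the F statistic ESS / (RSS / (n - 2)) equals the square of the slope's t statistic. The data and the function name `anova_f_and_t` are illustrative assumptions.

```python
import math


def anova_f_and_t(xs, ys):
    """Return (F statistic, slope t statistic) for a bivariate regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    tss = sum((y - y_bar) ** 2 for y in ys)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ess = tss - rss

    mean_sq_resid = rss / (n - 2)       # residual mean square, n - 2 d.f.
    f_stat = ess / mean_sq_resid        # explained mean square has 1 d.f.
    t_stat = slope / math.sqrt(mean_sq_resid / s_xx)
    return f_stat, t_stat


# Illustrative data (invented for this sketch).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
f_stat, t_stat = anova_f_and_t(xs, ys)
print(f_stat, t_stat ** 2)  # the two values agree up to rounding
```

In multivariate regression the equality no longer holds, because F then tests all slope coefficients jointly.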

Limits of ANOVA

  • Harder to interpret

  • Does not provide information on direction or magnitude of effect for independent variables

ANOVA output from SPSS
