AP Statistics

# AP Statistics - PowerPoint PPT Presentation

AP Statistics. Linear Regression Inference. Hypothesis Tests: Slopes. Given: Observed slope relating Education to Job Prestige = 2.47 Question: Can we generalize this to the population of all Americans?

## PowerPoint Slideshow about 'AP Statistics' - vanna-wilkinson

Presentation Transcript

### AP Statistics

Linear Regression Inference

Hypothesis Tests: Slopes
• Given: Observed slope relating Education to Job Prestige = 2.47
• Question: Can we generalize this to the population of all Americans?
• How likely is it that this observed slope was actually drawn from a population with slope = 0?
• Solution: Conduct a hypothesis test
• Notation: sample slope = b, population slope = β
• H0: Population slope β = 0
• H1: Population slope β ≠ 0 (two-tailed test)
Review: Slope Hypothesis Tests
• What information lets us do a hypothesis test?
• Answer: Estimates of a slope (b) have a sampling distribution, like any other statistic
• It is the distribution of every value of the slope, based on all possible samples (of size N)
• If certain assumptions are met, the sampling distribution approximates the t-distribution
• Thus, we can assess the probability that a given value of b would be observed, if β = 0
• If probability is low – below alpha – we reject H0

Sampling distribution of the slope (figure: distribution of b, centered at 0)
• If β = 0, observed slopes should commonly fall near zero, too
• If the observed slope falls very far from 0, it is improbable that β is really equal to zero. Thus, we can reject H0.

Review: Slope Hypothesis Tests
• Visually: If the population slope (β) is zero, then the sampling distribution would center at zero
• Since the sampling distribution is a probability distribution, we can identify the likely values of b if the population slope is zero
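The idea of a sampling distribution centered at zero can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the population values (an intercept of 40, an error standard deviation of 10, X ranging over 0 to 20 years, samples of size 50) are assumptions chosen only for illustration. The one number taken from the slides is the observed slope of 2.47.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

def simulate_null_slopes(n_samples=2000, n=50, seed=0):
    """Draw many samples from a population whose true slope is 0
    and return the slope observed in each sample."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        xs = [rng.uniform(0, 20) for _ in range(n)]      # assumed range of X
        ys = [40 + rng.gauss(0, 10) for _ in range(n)]   # Y does not depend on X
        slopes.append(ols_slope(xs, ys))
    return slopes

slopes = simulate_null_slopes()
mean_slope = sum(slopes) / len(slopes)
# Share of simulated slopes at least as far from 0 as the observed 2.47
share_extreme = sum(abs(b) >= 2.47 for b in slopes) / len(slopes)
print(f"mean of simulated slopes: {mean_slope:.3f}")
print(f"share with |b| >= 2.47:   {share_extreme:.4f}")
```

Under these assumed population values, the simulated slopes cluster around zero, and a slope as large as 2.47 essentially never appears by chance. How rare it would be in real data depends on the actual error variance and sample size.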
Bivariate Regression Assumptions
• Assumptions for bivariate regression hypothesis tests:
• 1. Random sample
• Ideally N > 20
• But different rules of thumb exist. (10, 30, etc.)
• 2. Variables are linearly related
• i.e., the mean of Y increases linearly with X
• Check scatter plot for general linear trend
• Watch out for non-linear relationships (e.g., U-shaped)
Bivariate Regression Assumptions
• 3. Y is normally distributed for every outcome of X in the population
• “Conditional normality”
• Ex: Years of Education = X, Job Prestige = Y
• Suppose we look only at a sub-sample: X = 12 years of education
• Is a histogram of Job Prestige approximately normal?
• What about for people with X = 4? X = 16?
• If all are roughly normal, the assumption is met

Bivariate Regression Assumptions
• Normality: Examine sub-samples at different values of X. Make histograms and check for normality.
• Figure: example histograms labeled "Good" and "Not very good"
Bivariate Regression Assumptions
• 4. The variances of prediction errors are identical at different values of X
• Recall: Error is the deviation from the regression line
• Is dispersion of error consistent across values of X?
• Definition: “homoskedasticity” = error dispersion is consistent across values of X
• Opposite: “heteroskedasticity”, errors vary with X
• Test: Compare errors for X=12 years of education with errors for X=2, X=8, etc.
• Are the errors around line similar? Or different?
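The comparison described above can be sketched in a few lines of code: fit the line, group the prediction errors by value of X, and compare their spread. The function name and the data below are hypothetical, made up so that the error spread visibly grows with X. This is an informal check, not a formal test.

```python
from collections import defaultdict

def residual_variance_by_x(xs, ys):
    """Fit a least-squares line, then return
    {x value: mean squared prediction error at that x}."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                # slope
    a = y_bar - b * x_bar        # intercept
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y - (a + b * x))   # error = deviation from the line
    return {x: sum(e ** 2 for e in errs) / len(errs)
            for x, errs in sorted(groups.items())}

# Hypothetical data: errors of +/-1 at X=8, +/-3 at X=12, +/-6 at X=16
xs = [8, 8, 8, 8, 12, 12, 12, 12, 16, 16, 16, 16]
ys = [35, 37, 35, 37, 41, 47, 41, 47, 46, 58, 46, 58]
variances = residual_variance_by_x(xs, ys)
print(variances)  # prints {8: 1.0, 12: 9.0, 16: 36.0}
```

Roughly equal values across X would be consistent with homoskedasticity. Values that climb with X, as in this made-up data, point to heteroskedasticity.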
Bivariate Regression Assumptions
• Homoskedasticity: Equal Error Variance
• Here, things look pretty good.
Bivariate Regression Assumptions
• Heteroskedasticity: Unequal Error Variance
• At higher values of X, error variance increases a lot.

Bivariate Regression Assumptions
• 1. Overall, regression is robust to violations of assumptions
• It often gives fairly reasonable results, even when assumptions aren’t perfectly met
• 2. Variations of regression can handle situations where assumptions aren’t met
• 3. But, there are also further diagnostics to help ensure that results are meaningful…
Regression Hypothesis Tests
• If assumptions are met, the sampling distribution of the slope (b) approximates a t-distribution
• Standard deviation of the sampling distribution is called the standard error of the slope ($\sigma_b$)
• Population formula of standard error: $\sigma_b = \sqrt{\dfrac{\sigma_e^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}}$
• Where $\sigma_e^2$ is the variance of the regression error
Regression Hypothesis Tests
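• Estimating the error variance $\sigma_e^2$ lets us estimate the standard error: $s_e^2 = \dfrac{\sum_{i=1}^{N}(Y_i - \hat{Y}_i)^2}{N-2}$
• Now we can estimate the S.E. of the slope: $s_b = \sqrt{\dfrac{s_e^2}{\sum_{i=1}^{N}(X_i - \bar{X})^2}}$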
Regression Hypothesis Tests
• Finally: A t-value can be calculated: $t_{N-2} = \dfrac{b - 0}{s_b} = \dfrac{b}{s_b}$
• It is the slope divided by the standard error
• Where $s_b$ is the sample point estimate of the standard error
• The t-value is based on N-2 degrees of freedom
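The steps on these slides (estimate the error variance, estimate the standard error of the slope, divide the slope by it) can be sketched as a single function. It uses the standard least-squares formulas. The function name and the five data points are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math

def slope_t_test(xs, ys):
    """Return (b, s_b, t, df) for a bivariate least-squares regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                       # sample slope
    a = y_bar - b * x_bar               # intercept
    # Sum of squared prediction errors around the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e2 = sse / (n - 2)                # estimated error variance
    s_b = math.sqrt(s_e2 / sxx)         # estimated standard error of the slope
    t = b / s_b                         # t-value for H0: population slope = 0
    return b, s_b, t, n - 2

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, s_b, t, df = slope_t_test(xs, ys)
print(f"b = {b:.2f}, s_b = {s_b:.3f}, t = {t:.2f}, df = {df}")
# prints: b = 0.60, s_b = 0.283, t = 2.12, df = 3
```

The resulting t-value would then be compared with the critical value from a t-table at N-2 degrees of freedom for the chosen alpha.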
Regression Confidence Intervals
• You can also use the standard error of the slope to estimate confidence intervals: $\text{C.I.} = b \pm t_{N-2}\, s_b$
• Where $t_{N-2}$ is the critical t-value with N-2 degrees of freedom for a two-tailed test at the desired α-level
• Example: Observed slope = 2.5, S.E. = .10
• 95% t-value for 102 d.f. is approximately 2
• 95% C.I. = 2.5 +/- 2(.10)
• Confidence Interval: 2.3 to 2.7
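The arithmetic in this example can be checked with a short sketch. The function name is hypothetical. The critical value of 2 is the approximation used on the slide, not an exact t-table lookup.

```python
def slope_confidence_interval(b, s_b, t_crit):
    """Confidence interval for a slope: b plus or minus t_crit times s_b."""
    margin = t_crit * s_b
    return b - margin, b + margin

# Values from the slide: slope 2.5, S.E. 0.10, t-value of about 2 at 102 d.f.
low, high = slope_confidence_interval(b=2.5, s_b=0.10, t_crit=2.0)
print(f"95% C.I.: {low:.1f} to {high:.1f}")  # prints: 95% C.I.: 2.3 to 2.7
```

Because the interval from 2.3 to 2.7 does not contain 0, it agrees with rejecting H0 at the 5% level in a two-tailed test.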