AP Statistics - PowerPoint PPT Presentation

PowerPoint Slideshow about 'AP Statistics' - vanna-wilkinson


AP Statistics

Linear Regression Inference

Hypothesis Tests: Slopes
  • Given: Observed slope relating Education to Job Prestige = 2.47
  • Question: Can we generalize this to the population of all Americans?
    • How likely is it that this observed slope was actually drawn from a population with slope = 0?
  • Solution: Conduct a hypothesis test
  • Notation: sample slope = b, population slope = β
  • H0: Population slope β = 0
  • H1: Population slope β ≠ 0 (two-tailed test)
Review: Slope Hypothesis Tests
  • What information lets us do a hypothesis test?
  • Answer: Estimates of a slope (b) have a sampling distribution, like any other statistic
    • It is the distribution of every value of the slope, based on all possible samples (of size N)
  • If certain assumptions are met, the sampling distribution approximates the t-distribution
    • Thus, we can assess the probability that a value of b at least as extreme as the one observed would occur, if β = 0
    • If that probability is low (below alpha), we reject H0
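The idea of a sampling distribution can be illustrated with a small simulation. The sketch below is only an illustration under assumed conditions: the population (X uniform on 0 to 20, Y with mean 40 and standard deviation 10, unrelated to X) and the sample size of 50 are made up, not taken from the slides.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope: sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(n_samples=2000, n=50, seed=1):
    """Draw repeated samples from a hypothetical population whose true slope is 0."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        xs = [rng.uniform(0, 20) for _ in range(n)]      # made-up X values
        ys = [40 + rng.gauss(0, 10) for _ in range(n)]   # Y does not depend on X
        slopes.append(ols_slope(xs, ys))
    return slopes

slopes = simulate_slopes()
mean_slope = sum(slopes) / len(slopes)
share_extreme = sum(abs(b) >= 1.0 for b in slopes) / len(slopes)
print(f"mean of simulated slopes: {mean_slope:.3f}")
print(f"share of slopes at least 1.0 away from zero: {share_extreme:.4f}")
```

In this made-up population the simulated slopes cluster around zero, so a sample slope far from zero would be a rare outcome if the population slope really were zero.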
Review: Slope Hypothesis Tests
  • Visually: If the population slope (β) is zero, then the sampling distribution would center at zero
    • Since the sampling distribution is a probability distribution, we can identify the likely values of b if the population slope is zero
  • [Figure: Sampling distribution of the slope]
    • If β = 0, observed slopes should commonly fall near zero, too
    • If the observed slope falls very far from 0, it is improbable that β is really equal to zero. Thus, we can reject H0.
Bivariate Regression Assumptions
  • Assumptions for bivariate regression hypothesis tests:
  • 1. Random sample
    • Ideally N > 20
    • But different rules of thumb exist. (10, 30, etc.)
  • 2. Variables are linearly related
    • i.e., the mean of Y changes at a constant rate as X increases
    • Check scatter plot for general linear trend
    • Watch out for non-linear relationships (e.g., U-shaped)
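Besides the scatter plot, one rough numeric check is to compare the mean of Y at each value of X. The sketch below assumes X takes a few repeated values; the function names and both data sets are hypothetical. If the relationship is linear, the change in mean Y per unit of X should be roughly constant; a sign flip hints at a U-shape.

```python
from collections import defaultdict

def mean_y_by_x(xs, ys):
    """Mean of Y within each distinct value of X."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return {x: sum(v) / len(v) for x, v in sorted(groups.items())}

def changes_per_unit_x(means):
    """Change in mean Y per unit of X between neighboring X values."""
    points = sorted(means.items())
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]

xs = [1, 1, 2, 2, 3, 3, 4, 4]
linear_ys = [10, 12, 13, 15, 16, 18, 19, 21]    # made-up, roughly linear
u_shaped_ys = [20, 22, 11, 13, 10, 12, 19, 21]  # made-up, U-shaped

linear_changes = changes_per_unit_x(mean_y_by_x(xs, linear_ys))
u_changes = changes_per_unit_x(mean_y_by_x(xs, u_shaped_ys))
print("linear:  ", linear_changes)
print("U-shaped:", u_changes)
```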
Bivariate Regression Assumptions
  • 3. Y is normally distributed at every value of X in the population
    • “Conditional normality”
  • Ex: Years of Education (X), Job Prestige (Y)
  • Suppose we look only at a sub-sample: X = 12 years of education
    • Is a histogram of Job Prestige approximately normal?
    • What about for people with X = 4? X = 16?
  • If all are roughly normal, the assumption is met
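The sub-sample check described above can be sketched in a few lines. This is a minimal illustration, not a formal normality test: the education and prestige values are invented, and the helper names and the bin width of 10 are arbitrary choices.

```python
from collections import defaultdict

def group_by_x(xs, ys):
    """Collect the Y values observed at each distinct value of X."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return dict(groups)

def text_histogram(values, bin_width=10):
    """Crude text histogram: one row per bin, one '#' per observation."""
    counts = defaultdict(int)
    for v in values:
        counts[(v // bin_width) * bin_width] += 1
    rows = []
    for start in sorted(counts):
        rows.append(f"{start:>4}-{start + bin_width - 1:<4} {'#' * counts[start]}")
    return "\n".join(rows)

# Hypothetical data: years of education (X) and job prestige (Y)
education = [12, 12, 12, 12, 12, 12, 12, 16, 16, 16, 16, 16, 16, 16]
prestige = [28, 35, 41, 44, 47, 52, 63, 39, 48, 52, 55, 58, 61, 72]

groups = group_by_x(education, prestige)
for x in sorted(groups):
    print(f"X = {x} years of education")
    print(text_histogram(groups[x]))
```

Each printed histogram is then judged by eye for a roughly bell-shaped, symmetric pattern, as on the slide.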
Bivariate Regression Assumptions
  • Normality:
    • Examine sub-samples at different values of X. Make histograms and check for normality.
  • [Figure: histograms of sub-samples at different values of X, annotated “Not very good”]
Bivariate Regression Assumptions
  • 4. The variances of prediction errors are identical at different values of X
    • Recall: Error is the deviation from the regression line
    • Is dispersion of error consistent across values of X?
    • Definition: “homoskedasticity” = error dispersion is consistent across values of X
    • Opposite: “heteroskedasticity” = error dispersion varies with X
  • Test: Compare errors for X=12 years of education with errors for X=2, X=8, etc.
    • Are the errors around the line similar? Or different?
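The comparison of errors at different values of X can be sketched as follows. The data are made up so that the spread grows with X, and the helper names are hypothetical; the spread measure used here is simply the mean squared error around the fitted line within each X group.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def error_spread_by_x(xs, ys):
    """Mean squared deviation from the regression line at each value of X."""
    a, b = fit_line(xs, ys)
    errors = defaultdict(list)
    for x, y in zip(xs, ys):
        errors[x].append(y - (a + b * x))
    return {x: sum(e * e for e in errs) / len(errs) for x, errs in sorted(errors.items())}

# Hypothetical data: the spread of Y around the line grows with X
xs = [2, 2, 2, 8, 8, 8, 12, 12, 12]
ys = [19, 20, 21, 30, 35, 40, 30, 45, 60]

spread = error_spread_by_x(xs, ys)
ratio = max(spread.values()) / min(spread.values())
for x, s in spread.items():
    print(f"X = {x}: mean squared error = {s:.2f}")
print(f"largest / smallest spread: {ratio:.0f}")
```

Roughly equal values across X would suggest homoskedasticity; here the spread at X = 12 is far larger than at X = 2, which is the heteroskedastic pattern.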
Bivariate Regression Assumptions
  • Homoskedasticity: Equal Error Variance
    • Examine error at different values of X. Is it roughly equal?
  • [Figure: error at different values of X]
    • Here, things look pretty good.

Bivariate Regression Assumptions
  • Heteroskedasticity: Unequal Error Variance
  • [Figure: error at different values of X]
    • At higher values of X, error variance increases a lot.
    • This looks pretty bad.

Bivariate Regression Assumptions
  • Notes/Comments:
  • 1. Overall, regression is fairly robust to modest violations of assumptions
    • It often gives fairly reasonable results, even when assumptions aren’t perfectly met
  • 2. Variations of regression can handle situations where assumptions aren’t met
  • 3. But, there are also further diagnostics to help ensure that results are meaningful…
Regression Hypothesis Tests
  • If assumptions are met, the sampling distribution of the slope (b) approximates a t-distribution
  • Standard deviation of the sampling distribution is called the standard error of the slope ($\sigma_b$)
  • Population formula of standard error:
    • $\sigma_b = \sqrt{\dfrac{\sigma_e^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2}}$
  • Where $\sigma_e^2$ is the variance of the regression error
Regression Hypothesis Tests
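  • Estimating $\sigma_e^2$ lets us estimate the standard error:
    • $\hat{\sigma}_e^2 = \dfrac{\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2}{N - 2}$
  • Now we can estimate the S.E. of the slope:
    • $s_b = \sqrt{\dfrac{\hat{\sigma}_e^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2}}$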
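A small worked example may help here. The sketch below uses the standard least-squares estimates (squared errors summed and divided by N-2, then divided by the sum of squared deviations of X); the five data points and the function name are made up for illustration.

```python
import math

def slope_standard_error(xs, ys):
    """Return (b, estimated error variance, estimated S.E. of the slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Sum of squared errors around the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    error_variance = sse / (n - 2)            # estimate of the error variance
    s_b = math.sqrt(error_variance / sxx)     # estimated S.E. of the slope
    return b, error_variance, s_b

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, error_variance, s_b = slope_standard_error(xs, ys)
print(f"b = {b:.2f}, error variance = {error_variance:.2f}, s_b = {s_b:.3f}")
```

For these points the slope is 0.6, the sum of squared errors is 2.4, so the error variance estimate is 2.4 / 3 = 0.8 and the standard error is the square root of 0.8 / 10, about 0.283.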
Regression Hypothesis Tests
  • Finally: A t-value can be calculated:
    • It is the slope divided by the standard error
    • $t_{N-2} = \dfrac{b}{s_b}$
  • Where $s_b$ is the sample point estimate of the standard error
  • The t-value is based on N-2 degrees of freedom
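As a quick illustration, the sketch below borrows the numbers from the confidence-interval example in this deck (slope 2.5, S.E. .10, 102 d.f., critical t of approximately 2). The sample size of 104 is inferred from 102 = N-2, and the function name is hypothetical.

```python
def slope_t_test(b, s_b, n, t_critical):
    """Two-tailed test of H0: population slope = 0."""
    t_value = b / s_b                    # slope divided by its standard error
    df = n - 2                           # degrees of freedom
    reject_h0 = abs(t_value) > t_critical
    return t_value, df, reject_h0

# t_critical = 2.0 is the approximate 95% two-tailed value for 102 d.f.
t_value, df, reject_h0 = slope_t_test(b=2.5, s_b=0.10, n=104, t_critical=2.0)
print(f"t = {t_value:.1f} on {df} d.f.; reject H0: {reject_h0}")
```

A t-value of 25 is far beyond the critical value, so with these numbers H0 would be rejected.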
Regression Confidence Intervals
  • You can also use the standard error of the slope to estimate confidence intervals:
    • $\text{C.I.} = b \pm s_b \, t_{N-2}$
  • Where $t_{N-2}$ is the t-value for a two-tailed test given a desired α-level
  • Example: Observed slope = 2.5, S.E. = .10
    • 95% t-value for 102 d.f. is approximately 2
  • 95% C.I. = 2.5 +/- 2(.10)
    • Confidence Interval: 2.3 to 2.7
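The arithmetic of this example can be reproduced in a few lines. The function name is hypothetical; the inputs are the slide's own numbers, with the critical t rounded to 2 as above.

```python
def slope_confidence_interval(b, s_b, t_critical):
    """Confidence interval: b plus or minus t_critical times the standard error."""
    margin = t_critical * s_b
    return b - margin, b + margin

low, high = slope_confidence_interval(b=2.5, s_b=0.10, t_critical=2.0)
contains_zero = low <= 0.0 <= high
print(f"95% C.I.: {low:.1f} to {high:.1f}; contains zero: {contains_zero}")
```

Because the interval does not contain zero, it agrees with a two-tailed test at the .05 level rejecting H0: β = 0.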