1 / 18

Testing Relationships between Variables

Statistics 111 - Lecture 17. Testing Relationships between Variables. Administrative Notes. Administrative Notes. Homework 5 due tomorrow Lecture on Wednesday will be review of entire course. Final Exam. Thursday from 10:40-12:10 It’ll be right here in this room

marly
Download Presentation

Testing Relationships between Variables

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Statistics 111 - Lecture 17 Testing Relationships between Variables Lecture 17 - Regression Testing

  2. Administrative Notes Administrative Notes • Homework 5 due tomorrow • Lecture on Wednesday will be review of entire course Lecture 17 - Regression Testing

  3. Final Exam • Thursday from 10:40-12:10 • It’ll be right here in this room • Calculators are definitely needed! • Single 8.5 x 11 cheat sheet (two-sided) allowed • I’ve put a sample final on the course website Lecture 17 - Regression Testing

  4. Outline • Review of Regression coefficients • Hypothesis Tests • Confidence Intervals • Examples Lecture 17 - Regression Testing

  5. Two Continuous Variables • Visually summarize the relationship between two continuous variables with a scatterplot • Numerically, we focus on best fit line (regression) Education and Mortality Draft Order and Birthday Mortality = 1353.16 - 37.62 · Education Draft Order = 224.9 - 0.226 · Birthday Lecture 17 - Regression Testing

  6. Best values for Regression Parameters • The best fit line has these values for the regression coefficients: • Also can estimate the average squared residual: Best estimate of slope  Best estimate of intercept  Stat 111 - Lecture 16 - Regression

  7. Significance of Regression Line • Does the regression line show a significant linear relationship between the two variables? • If there is not a linear relationship, then we would expect zero correlation (r = 0) • So the slope b should also be zero • Therefore, our test for a significant relationship will focus on testing whether our slope  is significantly different from zero H0 :  = 0 versus Ha :   0 Lecture 17 - Regression Testing

  8. Linear Regression • Best fit line is called Simple Linear Regression Model: • Coefficients:is the intercept and  is the slope • Other common notation: 0 for intercept, 1 for slope • Our Y variable is a linear function of the X variable but we allow for error (εi) in each prediction • We approximate the error by using the residual Observed Yi Predicted Yi = + Xi Stat 111 - Lecture 16 - Regression

  9. Test Statistic for Slope • Our test statistic for the slope is similar in form to all the test statistics we have seen so far: • The standard error of the slope SE(b) has a complicated formula that requires some matrix algebra to calculate • We will not be doing this calculation manually because the JMP software does this calculation for us! Stat 111 - Lecture 16 - Regression

  10. Example: Education and Mortality Stat 111 - Lecture 16 - Regression

  11. Confidence Intervals for Coefficients • JMP output also gives the information needed to make confidence intervals for slope and intercept • 100·C % confidence interval for slope  : b +/- tn-2* SE(b) • The multiple t* comes from a t distribution with n-2 degrees of freedom • 100·C % confidence interval for intercept  : a +/- tn-2* SE(a) • Usually, we are less interested in intercept  but it might be needed in some situations Lecture 17 - Regression Testing

  12. Confidence Intervals for Example • We have n = 60, so our multiple t* comes from a t distribution with d.f. = 58. For a 95% C.I., t* = 2.00 • 95 % confidence interval for slope  : -37.6 ± 2.0*8.307 = (-54.2,-21.0) Note that this interval does not contain zero! • 95 % confidence interval for intercept  : 1353± 2.0*91.42 = (1170,1536) Lecture 17 - Regression Testing

  13. Another Example: Draft Lottery • Is the negative linear association we see between birthday and draft order statistically significant? p-value Lecture 17 - Regression Testing

  14. Another Example: Draft Lottery • p-value < 0.0001 so we reject null hypothesis and conclude that there is a statistically significant linear relationship between birthday and draft order • Statistical evidence that the randomization was not done properly! • 95 % confidence interval for slope  : -.23±1.98*.05 = (-.33,-.13) • Multiple t* = 1.98 from t distribution with n-2 = 363 d.f. • Confidence interval does not contain zero, which we expected from our hypothesis test Lecture 17 - Regression Testing

  15. Education Example • Dataset of 78 seventh-graders: relationship between IQ and GPA • Clear positive association between IQ and grade point average Lecture 17 - Regression Testing

  16. Education Example • Is the positive linear association we see between GPA and IQ statistically significant? p-value Lecture 17 - Regression Testing

  17. Education Example • p-value < 0.0001 so we reject null hypothesis and conclude that there is a statistically significant positive relationship between IQ and GPA • 95 % confidence interval for slope  : .101±1.99*.014 = (.073,.129) • Multiple t* = 1.99 from t distribution with n-2 = 76 d.f. • Confidence interval does not contain zero, which we expected from our hypothesis test Lecture 17 - Regression Testing

  18. Next Class - Lecture 18 • Review of course material Lecture 17 - Regression Testing

More Related