
Understanding Linear Regressions: Models, Slopes, and Predictions

Learn about linear regression models, the slope, predicted values, residuals, and how to choose the best line. Explore interpolation, extrapolation, and the quality of the line through residual plots and R-squared. Discover the conditions for regression and how to test the quality of a regression.

schreck


Presentation Transcript


  1. LINEAR REGRESSIONS: • About lines • Line as a model: • Understanding the slope • Predicted values • Residuals • How to pick a line? • Least squares criterion • “Point of Means” • Slope is really just correlation • Variance and what happens to it • Variance of original y, predicted y, residuals • “R squared” • Residual plots • More examples

  2. About Lines • y = mx + b • y = a + bx, where slope = b and y-intercept = a • Slope: when x goes up 1 unit, y goes up b units.

  3. What can we do with the line? • Understand the relationship • ( Study the slope! ) • Predict values of Y based on values of X • Interpolation • Extrapolation

  4. Extrapolation • Interpolation: Using a model to estimate Y • for an X value within the range on which the model was based. • Extrapolation: Estimating based on an X value outside the range. • Interpolation Good, Extrapolation Bad.

  5. How good is the line? • RESIDUALS: • For each actual point, we can compute • Actual Y • Predicted Y • Residual ( = actual minus predicted ) • Small residuals → GOOD • No pattern in residuals → GOOD
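A minimal sketch of the residual computation described on this slide. The data points and the candidate line y = 1 + 2x are made up for illustration, and the helper name `residuals` is hypothetical.

```python
def residuals(xs, ys, a, b):
    """Return actual minus predicted for each point, using the line y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data and a hypothetical candidate line y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3.5, 4.5, 7.5, 8.5]

res = residuals(xs, ys, a=1.0, b=2.0)
for x, y, r in zip(xs, ys, res):
    print(f"x={x}  actual={y}  predicted={y - r}  residual={r}")
```

These residuals are all the same size but alternate in sign, which is the kind of pattern a residual plot is meant to expose.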

  6. How to choose the line • Theory: Least Squares Criterion • Choose the line with the smallest SUM OF SQUARES of the residuals. • Or: Make the STANDARD DEVIATION of the residuals be as small as possible. • Practice: • Line of means • Slope is really just r: • When X goes up one SD, Y goes up r SDs • Really, b = r s_y / s_x
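The "practice" recipe on this slide can be sketched as follows: compute r, set b = r s_y / s_x, and choose a so the line passes through the point of means. The data set and the function names are assumptions made for illustration. The last lines check that nudging the line in any direction increases the sum of squared residuals.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson correlation r, using sample standard deviations."""
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (stdev(xs) * stdev(ys))

def least_squares_line(xs, ys):
    """Return (a, b) for y = a + b*x using b = r * s_y / s_x."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)  # forces the line through the point of means
    return a, b

def sum_sq_residuals(xs, ys, a, b):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_sq_residuals(xs, ys, a, b)
print(f"a = {a:.3f}, b = {b:.3f}, sum of squares = {best:.3f}")

# Any nearby line should do worse under the least squares criterion.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    print(da, db, sum_sq_residuals(xs, ys, a + da, b + db) > best)
```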

  7. Linear models and non-linear models • Model A: y = a + bx + error • Model B: y = a x^(1/2) + error • Model B has smaller errors. Is it a better model?

  8. aa opas asl poasie ;aaslkf 4-9043578 • y = 453209)_(*_n &*^(*LKH l;j;)(*&)(*& + error • This model has even smaller errors. In fact, zero errors. • Tradeoff: Small errors vs. complexity. • (We’ll only consider linear models.)
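One way to see the tradeoff concretely is a sketch that uses a polynomial through every data point as a stand-in for the "complicated" model. This stand-in is an assumption, not the model on the slide. With n points that have distinct x values, a polynomial of degree n - 1 leaves zero residuals, but nothing guarantees a sensible value at a new x. The data are hypothetical.

```python
def interpolating_predict(xs, ys, x):
    """Evaluate at x the polynomial of degree len(xs) - 1 that passes
    through every (xs[i], ys[i]) (Lagrange form; xs must be distinct)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = float(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Residuals at the observed points: the complex model fits them exactly.
fit_residuals = [y - interpolating_predict(xs, ys, x) for x, y in zip(xs, ys)]
print(fit_residuals)

# Prediction just outside the observed range, for comparison with the data.
print(interpolating_predict(xs, ys, 6))
```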

  9. The “Regression” Effect • A preschool program attempts to boost children’s reading scores. • Children are given a pre-test and a post-test. • Pre-test: mean score ≈ 100, SD ≈ 10 • Post-test: mean score ≈ 100, SD ≈ 10 • The program seems to have no effect.

  10. A closer look at the data shows a surprising result: • Children who were below average on the pre-test tended to gain about 5-10 points on the post-test • Children who were above average on the pre-test tended to lose about 5-10 points on the post-test. • Maybe we should provide the program only for children whose pre-test scores are below average?

  11. Fact: • In most test–retest and analogous situations, the bottom group on the first test will on average tend to improve, while the top group on the first test will on average tend to do worse. • Other examples: • Students who score high on the midterm tend on average to score high on the final, but not as high. • An athlete who has a good rookie year tends to slump in his or her second year. (“Sophomore jinx”, “Sports Illustrated Jinx”) • Tall fathers tend to have sons who are tall, but not as tall. (Galton’s original example!)

  12. It works the other way, too: • Students who score high on the final tend to have scored high on the midterm, but not as high. • Tall sons tend to have fathers who are tall, but not as tall. • Students who did well on the post-test showed improvements, on average, of 5-10 points, while students who did poorly on the post-test dropped an average of 5-10 points.

  13. Students can do well on the pretest… • -- because they are good readers, or • -- because they get lucky. • The good readers, on average, do exactly as well on the post-test. The lucky group, on average, score lower. • Students can get unlucky, too, but fewer of that group are among the high-scorers on the pre-test. • So the top group on the pre-test, on average, tends to score a little lower on the post-test.
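The skill-plus-luck explanation can be checked with a small simulation. This is a sketch under assumed numbers: each score is a fixed ability (mean 100, SD 8) plus independent luck (SD 6) on each test, so each test has SD of about 10 and the program itself has no effect. The split between ability and luck is an assumption, so the size of the gains and losses will not necessarily match the 5-10 points quoted in the slides.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable
N = 10_000

# Assumed model: score = ability + luck, with fresh luck on each test.
ability = [rng.gauss(100, 8) for _ in range(N)]
pre = [a + rng.gauss(0, 6) for a in ability]
post = [a + rng.gauss(0, 6) for a in ability]

pre_mean = mean(pre)
low_change = mean(q - p for p, q in zip(pre, post) if p < pre_mean)
high_change = mean(q - p for p, q in zip(pre, post) if p >= pre_mean)

print(f"overall change:            {mean(post) - mean(pre):+.2f}")
print(f"below average on pre-test: {low_change:+.2f}")
print(f"above average on pre-test: {high_change:+.2f}")
```

Swapping the roles of `pre` and `post` in the grouping reproduces the "works the other way, too" observation.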

  14. Conditions for regression • “Straight enough” condition (linearity) • Errors are mostly independent of X • Errors are mostly independent of anything else you can think of • Errors are more-or-less normally distributed

  15. How to test the quality of a regression: • Plot the residuals. • Pattern bad, no pattern good • R² • How sure are you of the coefficients?
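A sketch of how R² relates to the variances listed in the outline (original y, predicted y, residuals). For a least squares line, the variance of y splits into the variance of the predicted values plus the variance of the residuals, and R² is the predicted share. The data are hypothetical and the variable names are illustrative.

```python
from statistics import mean, pvariance

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares fit of y = a + b*x.
mx, my = mean(xs), mean(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

predicted = [a + b * x for x in xs]
resid = [y - p for y, p in zip(ys, predicted)]

var_y = pvariance(ys)            # variance of the original y
var_pred = pvariance(predicted)  # variance of the predicted y
var_resid = pvariance(resid)     # variance of the residuals

r_squared = var_pred / var_y
print(f"var(y) = {var_y:.3f} = {var_pred:.3f} + {var_resid:.3f}")
print(f"R squared = {r_squared:.3f}")
```

To look for patterns, plot `resid` against `xs`.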

  16. (5, 5.68) • (5, 4.74) • (5, 5.73) • (8, 6.89)
