
Regression Wisdom


kalkin

Presentation Transcript


  1. Regression Wisdom Getting to Know Your Scatterplot and Residuals

  2. Important Terms • Extrapolation (203) • Outlier (205) • Leverage (206) • Influential Point (206) • Lurking Variable (208)

  3. Residuals • Recall – Residuals are the difference between data values and the corresponding values predicted by the regression model • Residual = Observed Value – Predicted Value: $e = y - \hat{y}$ (page 172)
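As a minimal sketch of the definition, the code below fits a least-squares line $\hat{y} = b_0 + b_1 x$ and computes $e = y - \hat{y}$ for each point. The data values and the helper names `fit_line` and `residuals` are hypothetical, made up for illustration only.

```python
# Minimal sketch: least-squares line y_hat = b0 + b1*x and its residuals.
# The data below are hypothetical, made up purely for illustration.


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(xs, ys):
    """Residual = observed value - predicted value, i.e. e = y - y_hat."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
e = residuals(xs, ys)
print([round(r, 2) for r in e])
# Least-squares residuals always sum to (essentially) zero.
print(round(sum(e), 10))
```

A positive residual means the model under-predicted that point; a negative one means it over-predicted.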

  4. When Residuals Aren’t Random • We want our plot of residuals to be boring • It should have no structure, direction, or shape of any kind. • When it does, something else is going on in the data: the model has missed a pattern that helps explain how the two variables vary together.
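To see what a residual plot that is *not* boring looks like, here is a minimal sketch that fits a straight line to hypothetical data following the curve $y = x^2$ (not data from the slides). The residuals come out positive at both ends and negative in the middle, a U-shape that a plot of residuals against x would make obvious.

```python
# Minimal sketch: fit a straight line to curved (hypothetical) data and
# inspect the residuals for structure.


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


xs = list(range(7))            # 0, 1, ..., 6
ys = [x ** 2 for x in xs]      # a curved relationship, y = x^2
b0, b1 = fit_line(xs, ys)
e = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1)  # -5.0 6.0
print(e)       # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]

# A crude text "residual plot": the signs trace out a U-shape,
# exactly the kind of structure we do not want to see.
for x, r in zip(xs, e):
    print(x, "+" if r > 0 else ("-" if r < 0 else "0"))
```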

  5. Sifting Residuals for Groups - We can split the data into subsets (groups within the same population) to try to achieve a better analysis of the data. - Often the easiest way to spot such groups is to examine a plot or histogram of the residuals

  6. Sifting and Subsets • You can perform regression analysis on each subset of the larger population, noting correlation and all appropriate summary statistics for each subset.
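Here is a minimal sketch of sifting into subsets: group the records by a label, then fit a separate line and compute r within each group. The group labels "A" and "B" and all values are hypothetical. For contrast it also fits one pooled line; because the two groups have very different intercepts, the pooled correlation is much weaker than the within-group correlations.

```python
# Minimal sketch: run a separate regression on each subset (group).
# Group labels and values are hypothetical, made up for illustration.
import math
from collections import defaultdict


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def correlation(xs, ys):
    """Pearson correlation r between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


records = [
    ("A", 1, 2.0), ("A", 2, 4.1), ("A", 3, 5.9), ("A", 4, 8.0),
    ("B", 1, 10.0), ("B", 2, 10.9), ("B", 3, 12.1), ("B", 4, 13.0),
]

# Sift the records into subsets by group label.
groups = defaultdict(lambda: ([], []))
for label, x, y in records:
    groups[label][0].append(x)
    groups[label][1].append(y)

summaries = {}
for label, (xs, ys) in sorted(groups.items()):
    b0, b1 = fit_line(xs, ys)
    r = correlation(xs, ys)
    summaries[label] = (b0, b1, r)
    print(f"group {label}: intercept {b0:.2f}, slope {b1:.2f}, r = {r:.3f}")

# For contrast: one regression on everything mixed together.
all_x = [x for _, x, _ in records]
all_y = [y for _, _, y in records]
pooled_r = correlation(all_x, all_y)
print(f"pooled: r = {pooled_r:.3f}")
```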

  7. Extrapolation • Extrapolations require the very questionable assumption that nothing about the relationship between x and y changes, even at extreme values of x and beyond • Our linear model: $\hat{y} = b_0 + b_1 x$ • Plug in a new x, and it gives you a predicted $\hat{y}$ • But the farther the new x-value is from $\bar{x}$, the less trust we can place in the predicted y-value. • Once we venture into new x territory, such a prediction is called an extrapolation
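A minimal sketch of why extrapolation is risky, assuming hypothetical data from the gentle curve $y = x + 0.02x^2$, which looks nearly straight for x between 0 and 10. The fitted line predicts well at x = 5 (right at $\bar{x}$) but misses badly at x = 50, many standard deviations beyond the data. The helper `truth` is a hypothetical stand-in for the real, unknown relationship.

```python
# Minimal sketch: a line that fits well inside the data range can be badly
# wrong far outside it. Data are hypothetical: y = x + 0.02*x^2.
import statistics


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def truth(x):
    """Hypothetical 'true' relationship, slightly curved."""
    return x + 0.02 * x ** 2


xs = list(range(11))                 # observed x-values 0..10
ys = [truth(x) for x in xs]
b0, b1 = fit_line(xs, ys)

x_bar = statistics.mean(xs)
s_x = statistics.stdev(xs)

errors = {}
for x_new in (5, 50):
    y_hat = b0 + b1 * x_new
    z = (x_new - x_bar) / s_x        # distance from x-bar in SDs of x
    errors[x_new] = y_hat - truth(x_new)
    print(f"x = {x_new}: {z:.1f} SDs from x-bar, predicted {y_hat:.1f}, "
          f"actual {truth(x_new):.1f}, error {errors[x_new]:.1f}")
```

Nothing in the observed data warns that the line will fail at x = 50; only the distance from $\bar{x}$ does.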

  8. Extrapolation • If your x variable is Time, extrapolation becomes a prediction about the future! • Example: In the mid-1970s, oil cost $17 a barrel in 2005 dollars • This is what it had cost for about 20 years! • But suddenly, within a few years, the price skyrocketed to over $40 a barrel • If you fit your model to data that included the spike, you might be predicting oil prices today of hundreds upon hundreds of dollars per barrel; if you had done your analysis before the spike, you might still be predicting around $17 a barrel.

  9. Outliers, Leverage, Influence • Outliers can have big impacts on your fitted regression line. • Points with large residuals always deserve special attention. • A data point whose x-value is unusually far from the mean of x is said to have high leverage • High leverage doesn’t necessarily mean the point changes the overall picture. • If the point lines up with the pattern of the other points, including it doesn’t change our estimate of the line • But by sitting so far from $\bar{x}$, it may strengthen the relationship, inflating the correlation and $R^2$
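Leverage in simple linear regression can be measured by the standard formula $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$. A minimal sketch follows; the x-values and the helper name `leverages` are hypothetical. Note that leverage depends only on the x-values, never on y.

```python
# Minimal sketch: leverage in simple linear regression,
#   h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2
# The x-values are hypothetical; the last one sits far from the rest.


def leverages(xs):
    """Return the leverage h_i of each x-value."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


xs = [1, 2, 3, 4, 5, 20]
h = leverages(xs)
for x, lev in zip(xs, h):
    print(f"x = {x:>2}: leverage {lev:.3f}")

# The leverages always add up to 2 (one per fitted coefficient, b0 and b1).
print(round(sum(h), 6))
```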

  10. Outliers, Leverage, Influence • A point is influential if omitting it from the analysis gives a very different model • Influence depends on both leverage and residual • A case with high leverage whose y-value sits right on the line is not influential. • Removing this point may not change the slope, but it may change $R^2$
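A minimal sketch of this case, using hypothetical data: fit a line to five points, then add a sixth point far out in x whose y-value lies exactly on that line. Because its residual is zero, the least-squares line does not move, yet $R^2$ rises sharply.

```python
# Minimal sketch: add a high-leverage point that sits exactly on the
# fitted line. The slope stays the same, but R^2 jumps.
# All data are hypothetical.


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r_squared(xs, ys):
    """R^2 = 1 - SS_residual / SS_total for the least-squares line."""
    b0, b1 = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
ys = [3, 2, 6, 5, 9]
b0, b1 = fit_line(xs, ys)

# New point far out in x, with its y-value right on the existing line.
x_far = 20
xs_plus = xs + [x_far]
ys_plus = ys + [b0 + b1 * x_far]

slope_before = b1
slope_after = fit_line(xs_plus, ys_plus)[1]
r2_before = r_squared(xs, ys)
r2_after = r_squared(xs_plus, ys_plus)
print(f"slope: {slope_before:.3f} -> {slope_after:.3f}")
print(f"R^2:   {r2_before:.3f} -> {r2_after:.3f}")
```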

  11. Outliers, Leverage, Influence • A point is influential if omitting it from the analysis gives a very different model • Influence depends on both leverage and residual • A case with modest leverage but a very large residual can be influential. • With enough leverage, a point can pull the regression line right to it. Then it is highly influential but has a small residual

  12. Outliers, Leverage, Influence • A point is influential if omitting it from the analysis gives a very different model • Influence depends on both leverage and residual • The safest approach is to do your analysis twice: • Once with the point • Once omitting the point
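A minimal sketch of doing the analysis twice, on hypothetical data: the last point sits far out in x and well below the pattern of the others. Dropping it flips the sign of the slope, so it is clearly influential. It also illustrates the previous slide: the line is pulled toward this point, so its own residual ends up smaller than the residuals of some ordinary points.

```python
# Minimal sketch of "do the analysis twice": fit with and without a
# suspicious point and compare. All data are hypothetical.


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


xs = [1, 2, 3, 4, 5, 20]
ys = [3, 2, 6, 5, 9, 0]      # the last point is far out in x and off the pattern
suspect = len(xs) - 1

b0_with, b1_with = fit_line(xs, ys)
b0_without, b1_without = fit_line(xs[:suspect], ys[:suspect])
print(f"with the point:    intercept {b0_with:.2f}, slope {b1_with:.2f}")
print(f"without the point: intercept {b0_without:.2f}, slope {b1_without:.2f}")

# With enough leverage the line is pulled toward the point, so its own
# residual can be smaller than those of ordinary points.
resid_with = [y - (b0_with + b1_with * x) for x, y in zip(xs, ys)]
print("residual of the suspect point:", round(resid_with[suspect], 2))
print("largest other |residual|:",
      round(max(abs(r) for r in resid_with[:suspect]), 2))
```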

  13. Does the unusual point have high leverage, a large residual, and is it influential? • High leverage, small residual: not influential • Not high leverage, large residual: not influential • High leverage, residual not large: influential

  14. Lurking Variables, Causation • No matter how strong the association • No matter how large the $R^2$ value • No matter how straight the line • There is NO way to conclude from a regression alone that one variable causes the other. • There may always be a lurking variable that causes the apparent association
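A minimal simulation sketch of a lurking variable, with entirely hypothetical data: a variable z drives both x and y, and y is generated without ever using x. Even so, x and y come out strongly correlated. With the noise levels chosen here the theoretical correlation is 1 / 1.25 = 0.8.

```python
# Minimal sketch: a hypothetical lurking variable z drives both x and y.
# y is generated without ever using x, yet x and y are strongly correlated.
import math
import random


def correlation(xs, ys):
    """Pearson correlation r between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)                       # fixed seed so the run is repeatable
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]      # the lurking variable
x = [zi + rng.gauss(0, 0.5) for zi in z]     # depends only on z
y = [zi + rng.gauss(0, 0.5) for zi in z]     # also depends only on z

r_xy = correlation(x, y)
print(f"r(x, y) = {r_xy:.2f}")   # theoretical value is 1 / 1.25 = 0.8
```

A regression of y on x here would look convincing, but changing x would do nothing to y.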

  15. Lurking Variable Example • The scatterplot shows the Life Expectancy of men and women in 41 different countries • These values are plotted against the square root of Doctors per person in that country.

  16. Lurking Variable Example • There is a strong positive correlation. • This seems to confirm our expectation that more doctors per person improves healthcare, leading to longer lifetimes and greater life expectancy.

  17. Lurking Variable Example • Can we conclude, though, that doctors cause greater life expectancy? Perhaps, but increasing numbers of doctors and greater life expectancy may both be results of some broader underlying change.

  18. Lurking Variable Example • Here is a similar-looking scatterplot, now comparing life expectancy to the square root of TVs per person. • This is an even stronger association!

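  19. A Final Note • Beware of scatterplots of statistics summarized over groups (such as averages): summaries hide the variability of the individual values, so the association can look stronger than it really is. • For example,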
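A small simulation (hypothetical data, not the example the slide originally showed) illustrates the warning. Individual y-values scatter widely around x, giving a modest correlation of about 0.43 in theory. Averaging y within each of five groups removes most of that scatter, and the correlation of the group averages comes out much closer to 1.

```python
# Minimal sketch: correlation of group averages vs. correlation of the
# individual values. Data are simulated (hypothetical), not from the slides.
import math
import random


def correlation(xs, ys):
    """Pearson correlation r between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
xs, ys = [], []
for x in range(1, 6):                   # five groups, x = 1..5
    for _ in range(200):                # 200 individuals per group
        xs.append(x)
        ys.append(x + rng.gauss(0, 3))  # lots of individual scatter

r_individual = correlation(xs, ys)

# Summarize: one average y per group, then correlate the averages.
group_x = list(range(1, 6))
group_mean_y = [sum(y for x, y in zip(xs, ys) if x == g) / 200 for g in group_x]
r_means = correlation(group_x, group_mean_y)

print(f"individual values: r = {r_individual:.2f}")  # about 0.43 in theory
print(f"group averages:    r = {r_means:.2f}")       # much closer to 1
```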

  20. Homework Pg 214, #1, 3, 4, 8, 10
