
# Regression Wisdom - PowerPoint PPT Presentation



## PowerPoint Slideshow about 'Regression Wisdom' - kalkin


### Regression Wisdom

Getting to Know Your Scatterplot and Residuals

• Extrapolation (203)

• Outlier (205)

• Leverage (206)

• Influential Point (206)

• Lurking Variable (208)

• Recall – Residuals are the difference between data values and the corresponding values predicted by the regression model

• Residual = Observed Value – Predicted Value

$e = y - \hat{y}$

(page 172)
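As a minimal sketch of the residual definition above, the following fits a least-squares line to a handful of made-up points and computes each residual as observed minus predicted. The data and the helper name `fit_line` are assumptions for illustration only; they do not come from the slides.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = fit_line(xs, ys)
predicted = [b0 + b1 * x for x in xs]

# Residual = Observed Value - Predicted Value
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.2f}")
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), which is a quick sanity check on the calculation.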

• We want our plot of residuals to be boring

• It should have no structure, direction, shape, none of that stuff.

• When it does, there is something else going on in the data that explains the variation of the two variables.

- We can form subsets of the same population to try and achieve a better analysis of the data.

- Sometimes the easiest way to achieve this is to examine a plot or histogram of residuals

• You can perform regression analysis on each subset of the larger population, noting correlation and all appropriate summary statistics for each subset.
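One way to sketch the subset idea: fit one line to the pooled data, then fit a separate line to each subset and compare slope and correlation. The two groups below are made up, and the group labels and the helper `summarize` are hypothetical names chosen for this illustration.

```python
from math import sqrt

def summarize(xs, ys):
    """Return (intercept, slope, correlation) for a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / sqrt(sxx * syy)

# Made-up (x, y) values keyed by a hypothetical group label.
groups = {
    "A": ([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]),
    "B": ([1, 2, 3, 4, 5], [10, 9, 8, 7, 6]),
}

# Pool every group into one data set and fit a single line.
pooled_x = [x for xs, _ in groups.values() for x in xs]
pooled_y = [y for _, ys in groups.values() for y in ys]
pooled = summarize(pooled_x, pooled_y)

# Fit each subset separately.
by_group = {name: summarize(xs, ys) for name, (xs, ys) in groups.items()}

print("pooled: slope = %.2f, r = %.2f" % (pooled[1], pooled[2]))
for name, (b0, b1, r) in by_group.items():
    print("group %s: slope = %.2f, r = %.2f" % (name, b1, r))
```

In this contrived case the pooled fit shows a weak association, while each subset on its own follows a clear line, which is the kind of structure a residual plot of the pooled model would hint at.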

• Extrapolations require the very questionable assumption that nothing about the relationship between x and y changes even at extreme values of x and beyond

• Our Linear Model: $\hat{y} = b_0 + b_1 x$

• Plug in a new x, and it gives you a predicted $\hat{y}$

• But the farther the new x-value is from $\bar{x}$, the less trust we can place in the predicted y value.

• Once we venture into new x territory, such a prediction is called an extrapolation.

• If your x variable is Time, extrapolation becomes a prediction about the future!

• Example: In the mid-1970s, oil cost \$17 a barrel in 2005 dollars

• But suddenly, within a few years, the price skyrocketed to over \$40 a barrel

• If you fit your model to data from the spike, you might predict oil prices today of hundreds of dollars per barrel; if you had done your analysis before the spike, you might still be predicting around \$17 a barrel.
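A small sketch of how a prediction routine might flag extrapolation: it returns the predicted value together with a flag that is true whenever the new x lies outside the range of x-values used to fit the line. The data are made up (they are not oil prices), and `predict_with_flag` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def predict_with_flag(xs, ys, x_new):
    """Return (predicted y, True if x_new is outside the observed x range)."""
    b0, b1 = fit_line(xs, ys)
    y_hat = b0 + b1 * x_new
    is_extrapolation = x_new < min(xs) or x_new > max(xs)
    return y_hat, is_extrapolation

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

inside = predict_with_flag(xs, ys, 3.5)   # within the observed x range
outside = predict_with_flag(xs, ys, 20)   # far beyond the observed x range

print(inside)
print(outside)
```

The model happily produces a number in both cases; the flag is only a reminder that the second prediction rests on the assumption that the relationship keeps its shape well beyond the data.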

• Outliers can have big impacts on your fitted regression line.

• Points with large residuals always deserve special attention.

• A data point whose x-value is unusually far from the mean of the x-values is said to have high leverage

• High leverage doesn’t mean the point changes the overall picture.

• If the point lines up with the pattern of the other points, including it doesn’t change our estimate of the line

• But by sitting so far from $\bar{x}$, it may strengthen the relationship, inflating the correlation and R-Squared
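The slides describe leverage only in words. As a sketch, the standard leverage formula for a one-predictor regression, $h_i = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$, can be computed directly. The formula is textbook material but does not appear in the slides, and the data and the helper name `leverages` are made up for this illustration.

```python
def leverages(xs):
    """Leverage of each point in a simple (one-predictor) regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Leverage depends only on how far x sits from the mean of the x-values.
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

# Made-up x-values; the last one sits far from the rest.
xs = [1, 2, 3, 4, 5, 20]
h = leverages(xs)

for x, h_i in zip(xs, h):
    print(f"x = {x:>2}: leverage = {h_i:.3f}")
```

Leverage involves only the x-values, which matches the point above: a high-leverage case need not change the fitted line at all if its y-value follows the pattern.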

• A point is influential if omitting it from the analysis gives a very different model

• Influence depends on both leverage and residual

• A case with high leverage whose y-value sits right on the line is not influential.

• Removing this point may not change the slope but may change R-Squared


• A case with modest leverage but a very large residual can be influential.

• With enough leverage, a point can pull the regression line right to it. Then it is highly influential but has a small residual


• The only thing to do is to do your analysis twice:

• Once with the point

• Once omitting the point
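The "do it twice" advice can be sketched as follows: fit the line with every point, fit it again with the suspect point removed, and compare. Both data sets are made up; in one the far-out point lies on the pattern of the others, in the other it does not. `fit_with_and_without` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def fit_with_and_without(xs, ys, index):
    """Return ((b0, b1) with all points, (b0, b1) omitting point `index`)."""
    full = fit_line(xs, ys)
    xs_without = xs[:index] + xs[index + 1:]
    ys_without = ys[:index] + ys[index + 1:]
    return full, fit_line(xs_without, ys_without)

# Made-up data: the last point has a far-out x-value (high leverage).
xs = [1, 2, 3, 4, 5, 20]
ys_off_pattern = [2, 4, 6, 8, 10, 5]    # last point far below the pattern
ys_on_pattern = [2, 4, 6, 8, 10, 40]    # last point continues the pattern

off_full, off_omit = fit_with_and_without(xs, ys_off_pattern, 5)
on_full, on_omit = fit_with_and_without(xs, ys_on_pattern, 5)

print("off pattern: slope with = %.2f, without = %.2f" % (off_full[1], off_omit[1]))
print("on pattern:  slope with = %.2f, without = %.2f" % (on_full[1], on_omit[1]))
```

In the off-pattern case the slope changes drastically when the point is omitted, so the point is influential; in the on-pattern case the same x-value has just as much leverage but the fitted line does not move.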

Does the unusual point have high leverage, a large residual, and is it influential?

• High leverage, small residual: not influential

• Not high leverage, large residual: not influential

• High leverage, not a large residual: influential

Lurking Variables and Causation

• No matter how strong the association

• No matter how large the R-Squared value

• No matter how straight the line

• There is NO way to conclude from regression alone that one variable causes the other.

• There may always be a lurking variable that causes the apparent association
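A simulated sketch of a lurking variable: a hidden quantity `z` drives both `x` and `y`, while neither `x` nor `y` is computed from the other. The simulation setup, the noise levels, and the name `pearson_r` are all assumptions made for this illustration; none of it comes from the slides' data.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hidden (lurking) variable.
z = [rng.uniform(0, 10) for _ in range(200)]

# x and y each depend on z plus independent noise, never on each other.
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)
print(f"correlation between x and y: {r_xy:.2f}")
```

The correlation between `x` and `y` comes out strongly positive even though, by construction, changing `x` would do nothing to `y`. A regression of `y` on `x` alone cannot reveal that.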

Lurking Variable Example

• The scatterplot shows the Life Expectancy of men and women in 41 different countries

• These values are plotted against the square root of Doctors per person in that country.


• There is a strong positive correlation,

• This seems to confirm our expectation that more doctors per person improves health care, leading to longer lifetimes and greater life expectancy.


• Can we conclude though that doctors cause greater life expectancy? Perhaps, but increasing numbers of doctors and greater life expectancy may both be results of a larger change.


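• Here is a similar-looking scatterplot, now comparing life expectancy to the square root of TVs per person.

• This is an even stronger association! It is hard to argue that TVs lengthen lives, so a lurking variable is the more plausible explanation.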

A Final Note

• Beware of scatterplots of summary statistics: averages vary less than individual values, so the association can look stronger than it is for individuals.

• For example,

### Homework

Pg 214, #1, 3, 4, 8, 10