Regression wisdom
PowerPoint Slideshow about 'Regression Wisdom' - kalkin


Regression Wisdom

Getting to Know Your Scatterplot and Residuals


Important Terms

  • Extrapolation (203)

  • Outlier (205)

  • Leverage (206)

  • Influential Point (206)

  • Lurking Variable (208)


Residuals

  • Recall – Residuals are the difference between data values and the corresponding values predicted by the regression model

  • Residual = Observed Value – Predicted Value

    $e = y - \hat{y}$

    (page 172)
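As a minimal sketch, assume an ordinary least-squares line $\hat{y} = b_0 + b_1 x$. Residuals can then be computed in Python as below. The data values and the helper names `fit_line` and `residuals` are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(xs, ys):
    """Residual e = observed y minus predicted y-hat, for each data point."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical example data
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
e = residuals(xs, ys)
print([round(r, 2) for r in e])
```

For a least-squares line with an intercept, the residuals always sum to zero (up to rounding). A positive residual means the point lies above the line; a negative one means it lies below.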


When Residuals Aren’t Random

  • We want our plot of residuals to be boring: a patternless scatter around zero.

  • It should have no structure, no direction, and no shape.

  • When it does, something else is going on in the data (such as curvature or distinct groups) that helps explain the variation in the two variables.
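The hypothetical Python sketch below shows what a residual plot that is not boring looks like. A straight line is fit to data that actually follow a curve ($y = x^2$). The residuals come out positive at both ends and negative in the middle instead of scattering randomly. The data and the helper `fit_line` are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical curved data: y = x^2, which no straight line can capture
xs = list(range(-3, 4))
ys = [x * x for x in xs]

b0, b1 = fit_line(xs, ys)
e = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print("slope:", b1)
print("residuals:", e)
# residuals: [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]  (a U shape, not random scatter)
```

The fitted slope here is 0, even though y is completely determined by x. Only the pattern in the residuals reveals the curve.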


Sifting Residuals for Groups

  • We can split the data into subsets (groups within the same population) to get a better analysis.

  • Often the easiest way to spot such groups is to examine a plot or histogram of the residuals.
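The sketch below sifts residuals for groups, assuming each observation carries a group label. The labels "A" and "B" and all data values are hypothetical. It fits one line to everyone, then averages the residuals within each group. A group whose residuals sit consistently above or below zero is a candidate subset.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data as (group, x, y); group "A" sits 3 units above group "B"
data = [
    ("A", 1, 5.0), ("A", 2, 6.0), ("A", 3, 7.0), ("A", 4, 8.0),
    ("B", 1, 2.0), ("B", 2, 3.0), ("B", 3, 4.0), ("B", 4, 5.0),
]
xs = [x for _, x, _ in data]
ys = [y for _, _, y in data]
b0, b1 = fit_line(xs, ys)  # one line fit to all points together

# Collect each point's residual under its group label
by_group = defaultdict(list)
for group, x, y in data:
    by_group[group].append(y - (b0 + b1 * x))

mean_residual = {g: sum(r) / len(r) for g, r in by_group.items()}
print(mean_residual)
```

Here every "A" residual is positive and every "B" residual is negative. That is a sign the two groups should be analyzed separately.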


Sifting and Subsets

  • You can then perform a separate regression analysis on each subset, reporting the correlation and all appropriate summary statistics for each one.
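The Python sketch below fits each subset separately and then fits the pooled data for comparison. The data are hypothetical, and `summarize` is an illustrative helper, not a library function.

```python
import math


def summarize(xs, ys):
    """Return (b0, b1, r): intercept, slope, and correlation of a least-squares fit."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1, sxy / math.sqrt(sxx * syy)


# Hypothetical subsets of one population (same x-values, different levels)
subsets = {
    "A": ([1, 2, 3, 4], [5.0, 6.0, 7.0, 8.0]),
    "B": ([1, 2, 3, 4], [2.0, 3.0, 4.0, 5.0]),
}

# One regression per subset
for name, (xs, ys) in subsets.items():
    b0, b1, r = summarize(xs, ys)
    print(f"{name}: y-hat = {b0:.2f} + {b1:.2f}x, r = {r:.3f}")

# The same analysis on the pooled data, for comparison
all_x = subsets["A"][0] + subsets["B"][0]
all_y = subsets["A"][1] + subsets["B"][1]
print(f"pooled: r = {summarize(all_x, all_y)[2]:.3f}")
```

Each subset here has r = 1.000, but the pooled data give r of only about 0.598. The gap between the groups shows up as unexplained scatter when they are lumped together.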


Extrapolation

  • Extrapolation requires the very questionable assumption that nothing about the relationship between x and y changes, even at extreme values of x and beyond.

  • Our linear model: $\hat{y} = b_0 + b_1 x$

    • Plug in a new x, and it gives you a predicted $\hat{y}$.

  • But the farther the new x-value is from $\bar{x}$, the less trust we can place in the predicted y-value.

  • Once we venture into new x territory, such a prediction is called an extrapolation.
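In the Python sketch below, a hypothetical helper makes the prediction but also reports two things: whether the new x falls outside the observed range, and how many standard deviations it lies from $\bar{x}$. Using "outside the observed range" as the cutoff is a simplifying assumption. The point above is that trust declines gradually the farther x moves from $\bar{x}$. The data are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def predict_with_warning(xs, ys, x_new):
    """Return (y_hat, is_extrapolation, z) for a new x-value.

    is_extrapolation is True when x_new lies outside the observed x range;
    z is how many standard deviations x_new sits from the mean of xs.
    """
    b0, b1 = fit_line(xs, ys)
    y_hat = b0 + b1 * x_new
    outside = not (min(xs) <= x_new <= max(xs))
    z = (x_new - statistics.mean(xs)) / statistics.stdev(xs)
    return y_hat, outside, z


# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
for x_new in (4.5, 20):
    y_hat, outside, z = predict_with_warning(xs, ys, x_new)
    print(f"x = {x_new}: y-hat = {y_hat:.1f}, extrapolation = {outside}, z = {z:.2f}")
```

The flag does not make an extrapolation trustworthy. It only reminds you that the model is being asked about x territory it never saw.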


Extrapolation

  • If your x variable is Time, extrapolation becomes a prediction about the future!

  • Example: in the mid-1970s, oil cost $17 a barrel in 2005 dollars.

  • It had cost about that much for roughly 20 years!

  • But suddenly, within a few years, the price skyrocketed to over $40 a barrel.

  • A model fit to data that include the spike might predict today's prices in the hundreds of dollars per barrel, while a model fit before the spike might still predict around $17 a barrel.


Outliers, Leverage, Influence

  • Outliers can have big impacts on your fitted regression line.

  • Points with large residuals always deserve special attention.

  • A data point whose x-value is unusually far from the mean of x is said to have high leverage.

  • High leverage doesn't necessarily mean the point changes the overall picture.

  • If the point lines up with the pattern of the other points, including it doesn't change our estimate of the line.

  • But because it sits so far from the other x-values, it may strengthen the apparent relationship, inflating the correlation and $R^2$.
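For simple linear regression, the leverage of point $i$ has a standard formula that grows with the point's distance from $\bar{x}$:

$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$

The Python sketch below computes it for hypothetical x-values.

```python
def leverages(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2 for each x."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Hypothetical x-values: five clustered points and one far to the right
xs = [1, 2, 3, 4, 5, 20]
h = leverages(xs)
for x, h_i in zip(xs, h):
    print(f"x = {x:>2}: leverage = {h_i:.3f}")
```

Leverage depends only on x, not on y. That is why a high-leverage point can be harmless or highly influential, depending on where its y-value falls. In simple regression the leverages always add up to 2, one for each coefficient $b_0$ and $b_1$.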


Outliers, Leverage, Influence

  • A point is influential if omitting it from the analysis gives a very different model.

  • Influence depends on both leverage and residual

  • A case with high leverage whose y-value sits right on the line is not influential.

  • Removing such a point may not change the slope, but it may change $R^2$.
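The Python sketch below illustrates the "sits right on the line" case with hypothetical data. It adds a far-away point placed exactly on the fitted line and refits. The slope and intercept stay the same, because the new point has a zero residual and the least-squares conditions still hold. But $R^2$ jumps, because the point stretches the spread of y.

```python
def fit(xs, ys):
    """Return (b0, b1, r_squared) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1, sxy ** 2 / (sxx * syy)


# Hypothetical data with a fair amount of scatter
xs = [1, 2, 3, 4, 5]
ys = [3.0, 2.0, 5.0, 4.0, 6.0]
b0, b1, r2 = fit(xs, ys)

# Add a high-leverage point placed exactly on the fitted line
x_far = 30
y_far = b0 + b1 * x_far
b0_new, b1_new, r2_new = fit(xs + [x_far], ys + [y_far])

print(f"without: slope = {b1:.3f}, R^2 = {r2:.3f}")
print(f"with:    slope = {b1_new:.3f}, R^2 = {r2_new:.3f}")
```

With these numbers the slope stays at 0.8, while $R^2$ rises from 0.64 to about 0.99.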


Outliers, Leverage, Influence

  • A case with modest leverage but a very large residual can be influential.

  • With enough leverage, a point can pull the regression line right to it. Then it is highly influential, yet it has a small residual.
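The hypothetical Python sketch below shows a point pulling the line to itself. Five points lie exactly on $y = 2x$, and one far-right point at (30, 0) does not. With that point included, the slope flips sign. Yet its residual is small compared with most of the others, which is why residuals alone can hide influence.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]   # exactly on y = 2x
x_far, y_far = 30, 0.0            # hypothetical point far from x-bar, off the pattern

_, slope_without = fit_line(xs, ys)
all_x, all_y = xs + [x_far], ys + [y_far]
b0, slope_with = fit_line(all_x, all_y)

# Residuals for every point under the fit that includes the far point
e = [y - (b0 + slope_with * x) for x, y in zip(all_x, all_y)]

print(f"slope without the point: {slope_without:.3f}")
print(f"slope with the point:    {slope_with:.3f}")
print("residuals:", [round(r, 2) for r in e])
```

The slope drops from 2 to about -0.19. Meanwhile the far point's residual (about -0.81) is smaller in size than the residuals of the points at x = 1, 2, 4, and 5.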


Outliers, Leverage, Influence

  • The only way to be sure is to do your analysis twice:

    • Once with the point

    • Once omitting the point
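The "do it twice" advice can be applied to every point in turn: refit with each point left out and see how much the slope moves. The Python sketch below does this for hypothetical data. The helper names are illustrative, and slope change is only one way to measure influence; the intercept, $R^2$, or predictions could be compared too.

```python
def slope(xs, ys):
    """Least-squares slope b1 of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def slope_changes(xs, ys):
    """Return the full-data slope and, for each point, the slope change when it is omitted."""
    full = slope(xs, ys)
    changes = []
    for i in range(len(xs)):
        xs_without = xs[:i] + xs[i + 1:]
        ys_without = ys[:i] + ys[i + 1:]
        changes.append(slope(xs_without, ys_without) - full)
    return full, changes


# Hypothetical data: the last point has a large residual
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 2.1, 2.9, 4.2, 5.0, 12.0]
full, changes = slope_changes(xs, ys)
print(f"slope with all points: {full:.3f}")
for x, y, d in zip(xs, ys, changes):
    print(f"omit ({x}, {y}): slope changes by {d:+.3f}")

most_influential = max(range(len(xs)), key=lambda i: abs(changes[i]))
print("most influential point:", (xs[most_influential], ys[most_influential]))
```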


Does the unusual point have high leverage, a large residual, and is it influential?

  • High leverage, small residual: not influential

  • Not high leverage, large residual: not influential

  • High leverage, residual not large: influential


Lurking Variables and Causation

  • No matter how strong the association

  • No matter how large the $R^2$ value

  • No matter how straight the line

  • There is NO way to conclude from regression alone that one variable causes the other.

  • There may always be a lurking variable that causes the apparent association
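The small simulation below makes the point concrete; all numbers are hypothetical. A lurking variable z is used to build both x and y. y is never computed from x, yet x and y come out strongly correlated. Fixed "noise" patterns stand in for random variation so that the result is reproducible.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical lurking variable
z = [1, 2, 3, 4, 5, 6, 7, 8]
# Fixed "noise" patterns stand in for random variation
noise_x = [1, -1, 1, -1, 1, -1, 1, -1]
noise_y = [1, 1, -1, -1, 1, 1, -1, -1]

# Both variables are built from z; y is never computed from x
x = [2 * zi + nx for zi, nx in zip(z, noise_x)]
y = [3 * zi + ny for zi, ny in zip(z, noise_y)]

print(f"r(x, y) = {correlation(x, y):.3f}")
```

Here the correlation between x and y is about 0.96. Even so, changing x would do nothing to y in this setup, because only z drives y.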


Lurking Variable Example

  • The scatterplot shows the Life Expectancy of men and women in 41 different countries

  • These values are plotted against the square root of Doctors per person in that country.


Lurking Variable Example

  • There is a strong positive correlation.

  • This seems to confirm our expectation that more doctors per person improve health care, leading to longer lifetimes and greater life expectancy.


Lurking Variable Example

  • Can we conclude, though, that doctors cause greater life expectancy? Perhaps, but more doctors and greater life expectancy may both be results of a larger change, such as a rising standard of living.


Lurking Variable Example

  • Here is a similar-looking scatterplot, now comparing life expectancy to the square root of TVs per person.

  • This is an even stronger association! Yet no one would expect sending TVs to a country to lengthen its citizens' lives.


A Final Note

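  • Beware of scatterplots of summarized data (such as averages). Summary statistics vary less than individual data values, so these plots tend to show a stronger association than the individual data would.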

  • For example,
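The hypothetical illustration below summarizes three groups of individuals by their group averages. The individual points are only moderately correlated, but the averages fall on a perfect line, because averaging removes the within-group scatter. All numbers are made up for the sketch.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical individuals grouped by x; each group has a lot of scatter in y
groups = {
    1: [0.0, 2.0, 4.0],
    2: [1.0, 3.0, 5.0],
    3: [2.0, 4.0, 6.0],
}
ind_x = [x for x, ys in groups.items() for _ in ys]
ind_y = [y for ys in groups.values() for y in ys]

# Summarize each group by its mean y
mean_x = list(groups)
mean_y = [sum(ys) / len(ys) for ys in groups.values()]

r_individuals = correlation(ind_x, ind_y)
r_means = correlation(mean_x, mean_y)
print(f"r for individuals: {r_individuals:.3f}")
print(f"r for group means: {r_means:.3f}")
```

Here r is about 0.447 for the individuals but 1.000 for the group means, even though both come from the same data.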


Homework

Pg 214, #1, 3, 4, 8, 10

