Correlation and regression
Statistics 2126

Introduction

  • Means, etc., are of course useful

  • We might also wonder, “how do variables go together?”

  • IQ is a great example

  • It goes together with so much stuff

A scatterplot
  • You tend to put the predictor on the x axis and the predicted on the y, though this is not a hard and fast rule

  • A scatterplot is a pretty good EDA tool too eh

  • Pick an appropriate scale for your axes

  • Plot the (x,y) pairs

So what does it mean
  • If, as one variable increases, the other variable increases we have a positive association

  • If, as one goes up, the other goes down, we have a negative association

  • There could be no association at all

Linear relationships
  • BTW, I am only talking about straight line relationships

  • Not curvilinear

  • Say, like the Yerkes-Dodson law: as far as the stuff we will talk about goes, there is no relationship, yet we know there is
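
To see the curvilinear point in numbers, here is a minimal sketch. The data are made up (an inverted U, loosely in the spirit of Yerkes-Dodson), and `pearson_r` is a hypothetical helper written for this illustration, not something from the course.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up inverted-U data: performance peaks at moderate arousal.
arousal = [1, 2, 3, 4, 5, 6, 7]
performance = [-(a - 4) ** 2 + 9 for a in arousal]  # 0, 5, 8, 9, 8, 5, 0

r_curve = pearson_r(arousal, performance)
print(r_curve)  # essentially zero, even though y depends entirely on x
```

The relationship here is perfect but not a straight line, so r misses it completely.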

The strength is important too
  • The more the points cluster around a line, the stronger the relationship is

  • Compare height and weight with height in cm and height in inches: the second pair falls exactly on a line

  • We need something that ignores the units though, so if I did IQ and your income in real money or IQ and your income in that worthless stuff they use across the river, the numbers would be the same
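
A small sketch of the "ignores the units" idea. The heights and weights below are invented for illustration, and `pearson_r` is a hypothetical helper; the conversion factors (2.54 cm per inch, roughly 2.20462 lb per kg) are the usual ones.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, not real measurements.
heights_cm = [150, 160, 165, 172, 180, 191]
weights_kg = [52, 58, 63, 70, 77, 90]

# Same people, different units.
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]

r_metric = pearson_r(heights_cm, weights_kg)
r_imperial = pearson_r(heights_in, weights_lb)
r_same_variable = pearson_r(heights_cm, heights_in)

print(r_metric, r_imperial)   # the two values agree up to rounding error
print(r_same_variable)        # cm vs inches is a perfect line, so r is 1
```

Changing units only rescales the axes; it does not change how tightly the points cluster around a line.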

Properties of r
  • -1.00 <= r <= +1.00

  • The sign indicates ONLY the direction (think of it as going uphill or downhill)

  • |r| indicates the strength

  • So, r = -.77 is a stronger correlation than r = .40
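
The properties above can be checked with a short sketch. It assumes the usual definition of r as the sum of products of z-scores divided by n - 1 (the slides do not spell the formula out), and the function names and data are made up for illustration.

```python
import math

def z_scores(values):
    """Standardize values using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def r_from_z_scores(xs, ys):
    """r as the sum of z-score products divided by n - 1 (assumed definition)."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Made-up data: y_down is y_up flipped, so only the direction changes.
x = [1, 2, 3, 4, 5, 6]
y_up = [2, 1, 4, 3, 6, 5]
y_down = [-v for v in y_up]

r_up = r_from_z_scores(x, y_up)
r_down = r_from_z_scores(x, y_down)
print(r_up, r_down)  # same size, opposite sign

# Strength is judged by |r|, so -.77 beats .40.
stronger = max([-0.77, 0.40], key=abs)
print(stronger)
```

Flipping y turns uphill into downhill but leaves the strength alone, which is why the sign and |r| are read separately.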

Check these out
  • All of these have the same correlation

  • r = .7 in each case

  • Note the problem of outliers

  • Note the problem of two subpopulations
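
The outlier problem is easy to reproduce. This is only a sketch with invented numbers (not the data behind the plots on the slide), and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: a loose cloud of six points with a weak association.
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 4, 2, 5, 3]
r_without_outlier = pearson_r(x, y)

# Add a single extreme point far from the rest.
r_with_outlier = pearson_r(x + [20], y + [20])

print(r_without_outlier)  # weak
print(r_with_outlier)     # much stronger, driven by one point
```

One point is doing nearly all the work in the second correlation, which is why you look at the scatterplot before trusting r.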

Remember this
  • Correlation is not causation

  • I said, correlation is not causation

  • Let me say it again, correlation is not causation

  • Birth control and the toaster method

Wouldn't it be nice
  • If we could predict y from x

  • You know, like an equation

  • Remember that in school, you would get an equation, plug in the x and get the y

  • Well surprise surprise, there is a method like this in statistics

If we are going to predict with a line
  • Well, we will make mistakes

  • We will want to minimize those mistakes

There is a problem, a common problem
  • Those prediction errors or residuals (e) sum to 0

  • Damn

  • Though guess what we could do…

  • Why, square them of course

  • So we get a line that minimizes squared residuals
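
Here is a minimal sketch of why summing the residuals is not enough and squaring them helps. The data are made up, `fit_line` and `residuals` are hypothetical helpers, and the slope and intercept formulas are the standard least-squares ones (assumed here, since the slides do not show them).

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Observed y minus predicted y for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = fit_line(xs, ys)
e_best = residuals(xs, ys, intercept, slope)

# A flat line through the mean of y: clearly a worse fit.
mean_y = sum(ys) / len(ys)
e_flat = residuals(xs, ys, mean_y, 0.0)

print(sum(e_best), sum(e_flat))          # both sums are (essentially) zero
sse_best = sum(e ** 2 for e in e_best)
sse_flat = sum(e ** 2 for e in e_flat)
print(sse_best, sse_flat)                # squared residuals tell them apart
```

Both lines have residuals that cancel out, so the plain sum cannot pick a winner; the sum of squared residuals can.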

In general the equation of the line is
  ŷ = a + bx

  • ŷ (y hat) is the predicted y

  • a is the y intercept

  • b is the slope

Correlation and regression

  • With a regression line you can predict y from x

  • Just because the equation says that y equals a linear function of x, it does not mean that there is necessarily a causal link

  • Don't go outside the range of the x values you used to fit the line (no extrapolating)

  • Linear only
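
Putting the last few points together, a hedged sketch of predicting y from x while refusing to go outside the observed range. The data are made up, and `fit_line` and `predict` are hypothetical helpers using the standard least-squares formulas; raising an error on extrapolation is just one way to enforce the warning.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(x_new, xs, intercept, slope):
    """Predict y for x_new, but only inside the range of observed x values."""
    if not (min(xs) <= x_new <= max(xs)):
        raise ValueError(f"x = {x_new} is outside the observed range")
    return intercept + slope * x_new

# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = fit_line(xs, ys)

print(predict(3, xs, intercept, slope))  # inside the range: fine

try:
    predict(10, xs, intercept, slope)    # outside the range: refused
except ValueError as err:
    print("No prediction:", err)
```

The line only describes the data it was fit to; nothing here says the pattern continues past x = 5, and nothing says x causes y.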