
Correlation and Regression


Statistics 2126

- Means, etc., are of course useful
- We might also wonder, “how do variables go together?”
- IQ is a great example
- It goes together with so much stuff

- You tend to put the predictor on the x axis and the predicted on the y, though this is not a hard and fast rule
- A scatterplot is a pretty good EDA tool too eh
- Pick an appropriate scale for your axes
- Plot the (x,y) pairs

- If, as one variable increases, the other variable increases we have a positive association
- If, as one goes up, the other goes down, we have a negative association
- There could be no association at all

- BTW, I am only talking about straight line relationships
- Not curvilinear
- Take, say, the Yerkes-Dodson Law: as far as the stuff we will talk about is concerned, there is no relationship, yet we know there is one
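Here is a minimal sketch of that point. The arousal and performance numbers are made up to form an inverted U (the shape the Yerkes-Dodson Law describes), and `pearson_r` is a helper written just for this illustration. The relationship is perfectly systematic, yet the straight-line correlation comes out at essentially zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical inverted-U data: performance peaks at moderate arousal.
arousal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
performance = [16 - (a - 5) ** 2 for a in arousal]

r_curved = pearson_r(arousal, performance)
print(f"r for the inverted U: {r_curved:.3f}")  # essentially zero
```

So a value of r near 0 says "no straight-line relationship", not "no relationship". Look at the scatterplot first.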

- The more the points cluster around a line, the stronger the relationship is
- Compare height vs weight with height in cm vs height in inches: the second pair falls exactly on a line
- We need something that ignores the units though, so if I did IQ and your income in real money or IQ and your income in that worthless stuff they use across the river, the numbers would be the same
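A quick sketch of why the units do not matter. The IQ scores, the incomes, and the 0.75 exchange rate below are invented for illustration, and `pearson_r` is a helper written for this example. Converting income to another currency just multiplies every value by a constant, and r does not budge.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: IQ scores and yearly incomes in one currency.
iq = [85, 95, 100, 105, 115, 130]
income_home = [30000, 42000, 39000, 52000, 61000, 80000]

# Hypothetical exchange rate; any positive constant gives the same r.
EXCHANGE_RATE = 0.75
income_other = [EXCHANGE_RATE * dollars for dollars in income_home]

r_home = pearson_r(iq, income_home)
r_other = pearson_r(iq, income_other)
print(f"r with home currency:  {r_home:.4f}")
print(f"r with other currency: {r_other:.4f}")
```

The two printed values agree because r is computed from deviations divided by their spread, so the scale factor cancels.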

- -1.00 <= r <= +1.00
- The sign indicates ONLY the direction (think of it as going uphill or downhill)
- |r| indicates the strength
- So, r = -.77 is a stronger correlation than r = .40

- All of these have the same correlation
- r = .7 in each case
- Note the problem of outliers
- Note the problem of two subpopulations
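To make the outlier problem concrete, here is a small sketch with invented numbers; `pearson_r` is a helper defined only for this example. Six points show almost no straight-line relationship, and then one extreme point is added.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data with no real straight-line trend.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0]
r_without = pearson_r(x, y)

# Same data plus a single extreme point far from the rest.
x_out = x + [20]
y_out = y + [10.0]
r_with = pearson_r(x_out, y_out)

print(f"r without the outlier: {r_without:.2f}")
print(f"r with the outlier:    {r_with:.2f}")
```

One point is enough to turn a weak correlation into what looks like a very strong one, which is another reason to plot the data before trusting r.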

- Correlation is not causation
- I said, correlation is not causation
- Let me say it again, correlation is not causation
- Birth control and the toaster method

- What if we could predict y from x?
- You know, like an equation
- Remember that in school, you would get an equation, plug in the x and get the y
- Well surprise surprise, there is a method like this in statistics

- Well, we will make mistakes
- We will want to minimize those mistakes

- Those prediction errors or residuals (e) sum to 0
- Damn
- Though guess what we could do…
- Why, square them of course
- So we get a line that minimizes squared residuals

ŷ = a + bx, where ŷ (y hat) is the predicted y, a is the y intercept, and b is the slope
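A minimal sketch of fitting that line by least squares, using made-up data. The function names `fit_line` and `sum_squared_residuals` are invented for this example; the slope and intercept formulas are the standard ones for simple linear regression. It also checks the two claims above: the residuals sum to (about) zero, and nudging the slope away from the fitted value makes the squared residuals bigger.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared prediction errors for a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up (x, y) pairs for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

sse_best = sum_squared_residuals(xs, ys, intercept, slope)
# Any other line does worse; here the slope is nudged by 0.1.
sse_other = sum_squared_residuals(xs, ys, intercept, slope + 0.1)

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"sum of residuals = {sum(residuals):.6f}")
print(f"squared residuals: fitted line {sse_best:.2f}, nudged line {sse_other:.2f}")
```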

- With a regression line you can predict y from x
- Just because the equation says y equals a linear combination of numbers, it does not mean that there is necessarily a causal link
- Don’t predict outside the range of x values you actually observed
- Linear only
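One way to keep yourself honest about the range is to have the prediction refuse to extrapolate. This is only a sketch: `predict_within_range` is a hypothetical helper, and the intercept, slope, and observed x range are assumed values for illustration.

```python
def predict_within_range(x, intercept, slope, x_min, x_max):
    """Predict y from x, but only inside the observed range of x."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]"
        )
    return intercept + slope * x

# Assumed values: a fitted line and the range of x used to fit it.
intercept, slope = 2.2, 0.6
x_min, x_max = 1, 5

y_hat = predict_within_range(3, intercept, slope, x_min, x_max)
print(f"predicted y at x = 3: {y_hat:.2f}")

try:
    predict_within_range(50, intercept, slope, x_min, x_max)
except ValueError as err:
    print(f"refused to extrapolate: {err}")
```

The line was only ever checked against the data between x_min and x_max, so nothing tells you it stays straight beyond that.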