Correlation and regression

Statistics 2126





Introduction

  • Means etc. are of course useful

  • We might also wonder, “how do variables go together?”

  • IQ is a great example

  • It goes together with so much stuff


A scatterplot

  • You tend to put the predictor on the x axis and the predicted on the y, though this is not a hard and fast rule

  • A scatterplot is a pretty good exploratory data analysis (EDA) tool too, eh

  • Pick an appropriate scale for your axes

  • Plot the (x,y) pairs


So what does it mean

  • If, as one variable increases, the other variable increases we have a positive association

  • If, as one goes up, the other goes down, we have a negative association

  • There could be no association at all
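
As a rough sketch of how direction can be checked numerically, the snippet below uses the sign of the sample covariance. Covariance is not named on the slides, and the data and the helper names `covariance` and `direction` are made up for illustration.

```python
def covariance(xs, ys):
    """Sample covariance: summed products of deviations from the means, over n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def direction(xs, ys):
    """Label the association by the sign of the covariance."""
    c = covariance(xs, ys)
    if c > 0:
        return "positive"
    if c < 0:
        return "negative"
    return "none"


# Hypothetical data, invented for this example.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 61, 70, 78]   # tends to rise with hours studied
errors_made = [9, 7, 6, 4, 2]       # tends to fall with hours studied

print(direction(hours_studied, exam_score))
print(direction(hours_studied, errors_made))
```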


Linear relationships

  • BTW, I am only talking about straight line relationships

  • Not curvilinear

  • Say, like the Yerkes-Dodson law: as far as the stuff we will talk about goes, there is no relationship, yet we know there is
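
To make the curvilinear point concrete, here is a minimal sketch with an inverted-U pattern of the kind the Yerkes-Dodson law describes. The numbers are invented, and `pearson_r` is a hand-written helper assumed for this example, not something from the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical inverted-U data: performance peaks at moderate arousal.
arousal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
performance = [16 - (a - 5) ** 2 for a in arousal]

r = pearson_r(arousal, performance)
# The relationship is perfectly predictable, yet the linear measure sees nothing.
print("r is essentially zero:", abs(r) < 1e-9)
```

The rising half and the falling half cancel each other, so a straight-line measure reports no relationship even though performance is completely determined by arousal here.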


The strength is important too

  • The more the points cluster around a line, the stronger the relationship is

  • Compare height and weight with height in cm and height in inches

  • We need something that ignores the units though, so if I did IQ and your income in real money or IQ and your income in that worthless stuff they use across the river, the numbers would be the same
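
A small sketch of the "ignores the units" idea: the correlation between height and weight comes out the same whether height is recorded in centimetres or inches. The data are made up, and `pearson_r` is a helper assumed for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements, invented for this example.
heights_cm = [150, 160, 165, 172, 180, 188]
weights_kg = [52, 58, 63, 70, 77, 90]

# Same heights, different unit (1 inch = 2.54 cm).
heights_in = [h / 2.54 for h in heights_cm]

r_cm = pearson_r(heights_cm, weights_kg)
r_in = pearson_r(heights_in, weights_kg)
print("same r in both units:", math.isclose(r_cm, r_in))
```

Multiplying one variable by a positive constant, such as an exchange rate, rescales the spread but leaves r unchanged.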



Properties of r

  • -1.00 <= r <= +1.00

  • The sign indicates ONLY the direction (think of it as going uphill or downhill)

  • |r| indicates the strength

  • So, r = -.77 is a stronger correlation than r = .40
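
The slides do not spell out how r is computed. For reference, the standard definition of the Pearson correlation in terms of standardized scores is sketched below; the symbols n, x̄, ȳ, s_x and s_y (sample size, means and standard deviations) are notation assumed here rather than taken from the slides.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each deviation is divided by its own standard deviation, so the units cancel, and the sign of r depends on whether the paired deviations tend to share a sign.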




Check these out

  • All of these have the same correlation

  • r = .7 in each case

  • Note the problem of outliers

  • Note the problem of two subpopulations
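
The original plots did not survive, so here is a hedged sketch of the outlier problem using invented numbers: a single extreme point can turn a weak correlation into what looks like a strong one. The helper `pearson_r` and all data are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with only a weak upward trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 4, 2, 5, 3]
r_without = pearson_r(xs, ys)

# Add one point far from the rest of the cloud.
r_with = pearson_r(xs + [20], ys + [15])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

This is why the number should always be read alongside the scatterplot.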


Remember this

  • Correlation is not causation

  • I said, correlation is not causation

  • Let me say it again, correlation is not causation

  • Birth control and the toaster method


Wouldn’t it be nice

  • If we could predict y from x

  • You know, like an equation

  • Remember that in school, you would get an equation, plug in the x and get the y

  • Well surprise surprise, there is a method like this in statistics


If we are going to predict with a line

  • Well, we will make mistakes

  • We will want to minimize those mistakes


There is a problem, a common problem

  • Those prediction errors or residuals (e) sum to 0, because the positive and negative errors cancel

  • Damn

  • Though guess what we could do…

  • Why, square them of course

  • So we get a line that minimizes squared residuals
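
The idea can be sketched in a few lines: fit the least-squares line, check that the residuals sum to roughly zero, and confirm that nudging the line makes the sum of squared residuals worse. The data and the helper names `fit_line` and `residuals` are made up for this example; the slope and intercept formulas are the usual closed-form least-squares ones.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Observed y minus predicted y for every point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data, invented for this example.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
sse = sum(e * e for e in res)

# Two nudged lines: one shifted up, one tilted.
sse_shifted = sum(e * e for e in residuals(xs, ys, a + 0.5, b))
sse_tilted = sum(e * e for e in residuals(xs, ys, a, b + 0.2))

print("residuals sum to about zero:", abs(sum(res)) < 1e-9)
print("fitted line has the smallest squared error:",
      sse < sse_shifted and sse < sse_tilted)
```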



In general the equation of the line is…

  • ŷ = a + bx

  • ŷ (y hat) is the predicted y

  • a is the y intercept

  • b is the slope



So….

  • With a regression line you can predict y from x

  • Just because the equation says that some value equals a linear combination of numbers does not mean there is necessarily a causal link

  • Don’t predict outside the range of x values you actually observed

  • Linear only
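
One way to respect the "stay inside the range" rule in practice is to make the prediction function refuse to extrapolate. This is only a sketch: the function name `predict`, its parameters, and the coefficients below are hypothetical, not values from the slides.

```python
def predict(x, intercept, slope, x_min, x_max):
    """Predict y from x, but only inside the range of x used to fit the line."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]"
        )
    return intercept + slope * x


# Hypothetical line fitted on x values between 1 and 5.
intercept, slope = 1.0, 2.0

print(predict(3, intercept, slope, x_min=1, x_max=5))

try:
    predict(10, intercept, slope, x_min=1, x_max=5)
except ValueError as err:
    print("refused:", err)
```

Raising an error is a deliberate choice here; a warning would also work if extrapolated values are sometimes wanted.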