Week 11 Correlation & Linear Regression

PowerPoint slideshow by lysandra-foley


Presentation Transcript
Administrative Tasks
  • Turn in your HW
  • Sample Research Paper
  • Buckle your seatbelts: we have a lot to cover
Scatterplots
  • Y is the vertical axis
  • X is the horizontal axis
  • Dots are observations
    • The intersection of the IV & DV for each unit of analysis
  • What can you infer about the IV/DV relationship from the scatterplot on the left?
  • Presentation can be deceiving
Three factors to consider when analyzing scatterplots
  • Directionality
    • Do the dots appear to flow in a particular direction?
    • Or, does the scatterplot look more like white noise (like a TV when it doesn’t have a signal)?
    • The more it looks like white noise, the weaker the relationship between the two variables
  • Clustering
    • Are the majority of the dots in a small area of the graph?
    • How would this impact our confidence in predictions outside of this area?
  • Outliers
    • Are there cases that differ markedly from the overall pattern of the dots, i.e., far from the cluster or contrary to the directionality?
    • Which observations are these? How influential are these cases?
Pearson’s correlation coefficient: quantifying scatter
  • To estimate the strength of the linear relationship between two interval-level variables, we can calculate Pearson’s correlation coefficient (r)
    • Values range from -1 to +1
      • -1 = perfectly negative association
      • +1 = perfectly positive association
      • 0 = no linear association
  • The perks of Pearson:
    • Direction
    • Magnitude (of predictive power)
    • Impervious to the units in which the variables are measured
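To see that last perk in action, here is a minimal Python sketch that computes r for the same observations recorded in inches/pounds and in centimeters/kilograms. The heights and weights are made-up illustrative numbers, not course data, and `pearson_r` is a hypothetical helper name. Converting units just rescales each variable, so r comes out the same both times.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, scaled by each variable's spread
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: heights in inches and weights in pounds
heights_in = [60, 62, 65, 68, 71, 74]
weights_lb = [115, 120, 140, 155, 170, 190]

# The same observations in metric units (1 in = 2.54 cm, 1 lb is about 0.4536 kg)
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]

r_imperial = pearson_r(heights_in, weights_lb)
r_metric = pearson_r(heights_cm, weights_kg)
print(round(r_imperial, 4), round(r_metric, 4))  # the two values match
```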
Calculating r
  • “That’s scarier than anything I saw on Halloween!”
  • 1. For each observation, subtract x’s mean from its x value and multiply the result by the difference between its y value and the mean of y
  • 2. Do that for every observation and add all the products together
  • 3. Divide that sum by (n - 1) times the standard deviation of x times the standard deviation of y
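In symbols, the three steps give r = Σ(xᵢ - x̄)(yᵢ - ȳ) / [(n - 1)·s_x·s_y], where s_x and s_y are the sample standard deviations. Here is a short Python sketch that follows the steps literally; the five-point dataset and the name `pearson_r_steps` are made up for illustration.

```python
import statistics

def pearson_r_steps(xs, ys):
    """Pearson's r computed exactly as the three steps describe."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Steps 1 and 2: multiply each x deviation by the matching y deviation, then sum
    sum_of_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Step 3: divide by (n - 1) times the sample standard deviations of x and y
    return sum_of_products / ((n - 1) * statistics.stdev(xs) * statistics.stdev(ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r_steps(xs, ys)
print(round(r, 4))  # 0.7746
```

The sum of products here is 6 and (n - 1)·s_x·s_y is about 7.746, so r is roughly 0.77: a fairly strong positive association.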
The real good news? You don’t have to actually do that
  • Excel is your best buddy & will do all the hard work for you
  • Not only that, but Excel will also allow you to create a scatterplot and show you a line depicting this relationship
  • Let’s check it out:
“Tell me more about this line!”
  • Slope:
    • Change in Y divided by change in X
  • Intercept
    • Value of Y when X = 0
  • Error
    • Vertical distance between the line’s Y value and a data point’s Y value
  • The line minimizes the sum of all the squared errors
  • “I love line!”
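A quick way to convince yourself that "the line minimizes the sum of all the squared errors" is to fit the least-squares line and then nudge its slope or intercept: every nudged line should have a larger sum of squared errors (SSE). This is a minimal Python sketch on made-up data; `fit_line` and `sse` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sse(xs, ys, slope, intercept):
    """Sum of squared errors: each error is a point's Y minus the line's Y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
best = sse(xs, ys, slope, intercept)

# Nudge the slope or the intercept; each alternative line should fit worse
alternatives = [(slope + 0.1, intercept), (slope - 0.1, intercept),
                (slope, intercept + 0.5), (slope, intercept - 0.5)]
alt_sse = [sse(xs, ys, b, a) for b, a in alternatives]
print(round(slope, 2), round(intercept, 2), round(best, 2))  # 0.6 2.2 2.4
print(all(best < s for s in alt_sse))  # True
```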
Recipe for creating the line
  • The line rarely, if ever, passes through every point
  • There is an error component
  • Thus, the actual values of Y can be explained by the formula:
  • Y = α + βX + ε
    • α (alpha): an intercept component representing the model’s value for Y when X = 0
    • β (beta): a coefficient that loosely denotes the nature of the relationship between Y and X and, more specifically, the slope of the line that specifies the model
    • ε (epsilon): a term that represents the errors associated with the model
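Once the line is estimated from data, it helps to separate the model's predicted value from the observed value for each observation i. The hats (estimates) and the residual symbol eᵢ below are standard notation added for illustration, not notation taken from the slides:

```latex
Y_i = \alpha + \beta X_i + \varepsilon_i
\qquad
\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i
\qquad
e_i = Y_i - \hat{Y}_i
\qquad
(\hat{\alpha}, \hat{\beta}) = \arg\min_{a,\, b} \sum_{i=1}^{n} \left( Y_i - a - b X_i \right)^2
```

Each residual eᵢ is the vertical distance from a point to the fitted line, and OLS picks the intercept and slope that make the sum of the squared residuals as small as possible.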
This is ordinary least squares (OLS) or linear regression
  • The Goal:
    • Minimize the sum of the squared errors
  • Consider the impact of outliers: because the errors are squared, a single point far from the line can pull the line toward itself
  • How many ways can a line be created?
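To see how strongly one unusual case can move the line, here is a minimal Python sketch that fits the OLS slope to a small made-up dataset, then refits after adding a single hypothetical outlier well above the overall pattern. `ols_slope` is a hypothetical helper name.

```python
def ols_slope(xs, ys):
    """Least-squares slope: co-variation of X and Y divided by variation in X."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope_clean = ols_slope(xs, ys)

# One hypothetical outlier at X = 6 with a Y far above the rest
slope_outlier = ols_slope(xs + [6], ys + [20])

print(round(slope_clean, 3), round(slope_outlier, 3))  # 0.6 2.629
```

A single extra point more than quadruples the slope, which is why it pays to ask which observations the outliers are and how influential they are.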
“Not gonna do it. Wouldn’t be prudent.”
  • You know the trick:
    • It’s not as hard as it looks
  • You are really comparing Y’s deviations from its mean alongside X’s deviations from its mean
    • See the formula at the bottom of pg. 331
  • Ideally, the Xi’s deviations from their mean move in sync with the Yi’s deviations from theirs
    • This is covariation
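The standard OLS slope for one predictor is the covariance of X and Y divided by the variance of X, and the intercept is ȳ minus the slope times x̄. This may be written differently from the formula on pg. 331, which isn't reproduced here, but it is algebraically the usual least-squares result. The Python sketch below (made-up data) computes it from the co-variation directly and checks that it also equals r × (s_y / s_x), which ties the slope back to Pearson's r.

```python
import statistics

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)

# Co-variation: do X's deviations from its mean move in sync with Y's?
cross_products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
covariance = sum(cross_products) / (n - 1)   # sample covariance of X and Y
variance_x = statistics.variance(xs)         # sample variance of X

beta = covariance / variance_x               # OLS slope
alpha = mean_y - beta * mean_x               # the line passes through (mean_x, mean_y)

# The same slope expressed through Pearson's r
s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
r = covariance / (s_x * s_y)
beta_from_r = r * s_y / s_x

print(round(beta, 2), round(alpha, 2), round(beta_from_r, 2))  # 0.6 2.2 0.6
```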
The Really Good News? Excel does it all for you!
  • Enter the data into Excel
  • Click the “Data” tab at the top
  • In the Data tab, look all the way to the right and click “Data Analysis” (if it isn’t there, the Analysis ToolPak add-in has to be enabled first)
  • In the Analysis Tools menu, click Regression and hit OK
  • Select the appropriate columns for the “Input Y Range” & “Input X Range” fields
  • Check the “Labels” option if your first row holds column names & hit OK
  • Instant regression results!
What to look for when examining regression output
  • Beta coefficient:
    • Directionality
    • Size of the coefficient
    • Standard Error
    • Statistical Significance
  • Constant
    • Far less important than Beta
    • When X = 0, what would we expect Y to be?
      • Is X ever 0?
  • Goodness of fit
    • How much of the variation in Y is actually explained by X?
    • How “good” does your model “fit” the actual values of Y?
    • R-squared (the coefficient of determination) provides an estimate
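For a single predictor, the main numbers on this checklist can be computed by hand. The Python sketch below (made-up data) gets the beta coefficient, its standard error using n - 2 residual degrees of freedom, the t statistic used to judge significance, and R-squared as 1 - SSE/SST. These should correspond to the "Coefficients", "Standard Error", "t Stat" and "R Square" entries in Excel's Regression output, though the p-value is left out here to avoid needing a statistics library.

```python
import math

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

ss_x = sum((x - mean_x) ** 2 for x in xs)
beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ss_x
constant = mean_y - beta * mean_x

residuals = [y - (constant + beta * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)          # variation in Y the line misses
sst = sum((y - mean_y) ** 2 for y in ys)       # total variation in Y around its mean

# Standard error of beta with one predictor (n - 2 residual degrees of freedom)
se_beta = math.sqrt(sse / (n - 2) / ss_x)
t_stat = beta / se_beta                        # compare to a t distribution with n - 2 df
r_squared = 1 - sse / sst                      # share of Y's variation explained by X

print(round(beta, 3), round(se_beta, 3), round(t_stat, 3), round(r_squared, 3))
# 0.6 0.283 2.121 0.6
```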
r Strikes Back!
  • Recall that r, Pearson’s correlation coefficient, measures the degree to which two variables co-vary
  • With OLS the:
    • Constant tells us where the line crosses the Y axis (the value of Y when X = 0)
    • Beta tells us how the line slopes
    • R-squared tells us the % of the variation in Y our model explains
      • Range 0-1
        • 0 = Explains none of the variation
        • 1 = Explains all of the variation
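With only one X, r and R-squared are directly linked: R-squared equals r squared. The Python sketch below (made-up data, hypothetical helper names) checks this for a positive relationship and for its mirror image. Both give the same R-squared, which is a reminder that R-squared describes fit but not direction; the sign of r or of beta tells you that.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def ols_r_squared(xs, ys):
    """R-squared of the one-predictor OLS line: 1 - SSE / SST."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    alpha = my - beta * mx
    sse = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]
results = []
for ys in ([2, 4, 5, 4, 5], [-2, -4, -5, -4, -5]):  # positive, then negative relationship
    r = pearson_r(xs, ys)
    results.append((r, r ** 2, ols_r_squared(xs, ys)))
    print(round(r, 4), round(r ** 2, 4), round(ols_r_squared(xs, ys), 4))
# 0.7746 0.6 0.6
# -0.7746 0.6 0.6
```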