

Correlation & Regression


Correlation

Correlation

  • T-tests and ANOVA examine mean differences between two or more levels of one or more IV’s on a DV

    • e.g. differences between males and females (2 levels of the IV “gender”) on exam scores

  • What if instead of average differences we were more interested in the relationship between two variables?

    • “relationship” = how one variable changes as a function of another variable



Correlation

  • e.g. the relationship between a patient’s anxiety prior to a medical procedure and their post-op recovery

  • This type of question concerns what is called a correlation

    • Correlation = relationship between two variables

      • NOTE – if we were looking at average post-op recovery (the DV) in groups high and low in pre-op anxiety (2 levels of the IV anxiety), we would be looking at mean differences, and an ANOVA would be more appropriate than a correlation



    Correlation

    • The easiest means of representing this relationship/correlation is a scatterplot

      • Scatterplot = a graph in which the individual data points are plotted in two dimensions (see the sketch below)
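As a concrete illustration, here is a minimal Python sketch (not from the slides) that draws a scatterplot with a best-fit line; it assumes NumPy and matplotlib, and the "depression"/"pessimism" scores are simulated, not the slide's data.

```python
# Minimal scatterplot sketch with a best-fit line (illustrative data only).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
depression = rng.uniform(0, 49, size=50)                        # x-axis variable
pessimism = 1 + 0.1 * depression + rng.normal(0, 0.7, size=50)  # y-axis variable

b, a = np.polyfit(depression, pessimism, deg=1)  # slope and intercept of best-fit line
xs = np.linspace(0, 49, 100)

plt.scatter(depression, pessimism)               # each point = one participant
plt.plot(xs, b * xs + a)                         # the regression/best-fit line
plt.xlabel("Depression (predictor)")
plt.ylabel("Pessimism (criterion)")
plt.show()
```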



    Correlation

    • Predictor Variable = traditionally the variable on the x-axis (in this case “Depression”)

    • Criterion Variable = traditionally the variable on the y-axis (in this case “Pessimism”)

    • Best-Fit Line/Regression Line = the line that each data point is, on average, minimally distant from – the line that best represents the data



    Correlation

    • Regression Line

      • “Best” fit = the line that minimizes the (squared) deviations of all data points from the line (i.e. the residuals)

      • Residual = the amount by which a data point deviates from this line



    Correlation

    • It is important to note that although the predictor is usually the variable on the x-axis and the criterion the variable on the y-axis, these conventions are often not adhered to, and the variables are assigned arbitrarily

    • Also, calling one variable the predictor does not mean that it “predicts” the criterion in the sense that it can tell you what the criterion will be before it occurs

      • i.e. to say that depression predicts pessimism does not mean that depression comes first and causes you to be pessimistic!



    Correlation

    • Correlation does not equal causation!

      • the only way that you can say that one variable predicts another in time is through the design of your experiment

        • if depression were assessed in January and pessimism were assessed in December, and the two were found to be related, then you can say that one predicts the other in time

        • statistical prediction ≠ “prediction”

      • if the two variables were measured at the same time, we do not know which one caused the other one



    Correlation

    • to determine causation (that one variable caused another) we need to show several things:

      • that the predictor preceded the criterion in time (this also shows that the criterion did not cause the predictor)

      • that other variables did not cause both the criterion and the predictor at the same time, resulting in their relationship

        [Diagram: a third variable (Var 1) causing both the IV and the DV, producing their correlation]



    Correlation

    • e.g. if we were studying the relationship (correlation) between two variables: the length of grass and ice cream consumption

      • If they were measured simultaneously it would be impossible to tell which caused which

      • If both were measured at two time points, July and December, we would find that they both increase and decrease at the same time (i.e. they simply co-vary; we still cannot tell which causes the other) – no causation established

      • If we measured temperature as well, we would find that the two are correlated because increases in temperature cause both, which explains why they increase and decrease at the same time



    Correlation

    • Correlation is represented by the Pearson Product-Moment Correlation Coefficient (r)

      • can range from -1 to 1, where 1 represents a perfect positive relationship, -1 a perfect negative relationship, and 0 no relationship between the two variables

        • both strong positive and strong negative relationships are nonetheless robust and generally meaningful – a negative relationship is not bad

      • only used when the two variables are continuous/dimensional



    Correlation

    • Positive Relationship (r = .82)

      • As BDI2TOT increases, MASQGDD also increases



    Correlation

    • Negative Relationship (r = -.679)

      • As MASQAD increases, TMMSREP decreases



    Correlation

    • No Relationship (r = .00)

      • Information about Explanatory Flexibility tells you nothing about Emotional Insight



    Correlation

    • Pearson’s r is heavily reliant on the covariance

    • covxy = Σ(x − x̄)(y − ȳ) / (N − 1)

    • If variance = s² = Σ(x − x̄)² / (N − 1)…

    • …then cov is just average variability in both x and y
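A small sketch of the covariance formula above, assuming NumPy; the data values are made up for illustration.

```python
# Sample covariance computed from the definition, checked against NumPy.
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 5.0, 4.0, 9.0])
n = len(x)

# cov_xy = sum of co-deviations from the means, divided by N - 1
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov uses the same N - 1 normalization by default
assert np.isclose(cov_xy, np.cov(x, y)[0, 1])
print(cov_xy)
```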



    Correlation

    • Error variance = average amount each point deviates from the best-fit line = standard error of the estimate = sy.x

    • sy.x = √[ Σ(Y − Ŷ)² / (N − 2) ]

    • If Ŷ is the point on the best-fit line (the predicted value of Y), then sy.x = the standard deviation of the residuals, and sy.x² = the variance of the residuals/error = error variance
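A hedged sketch of sy.x computed from the residuals around a fitted line (NumPy assumed; illustrative data).

```python
# Standard error of the estimate: SD of residuals around the best-fit line.
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 5.0, 4.0, 9.0])

b, a = np.polyfit(x, y, deg=1)   # slope and intercept of the best-fit line
y_hat = b * x + a                # predicted values (points on the line)
residuals = y - y_hat            # deviation of each point from the line

# s_y.x = sqrt( SS_residual / (N - 2) )
s_yx = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2))
print(s_yx)
```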



    Correlation

    • Pearson’s r = covxy / (sx · sy)

    • Correlation = shared variability / √(total variability of x × total variability of y)

      • Since it works like a proportion, r ranges in magnitude from 0 to 1.00, with a positive or negative sign

      • In fact, squaring r (r²) gives the % of variability that is shared between x and y

        • Previous example of BDI2 and MASQGDD: r = .82; r² = .67 → 67% of the variance in BDI2 is predicted by MASQGDD
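The same relationship in code: a sketch computing r as covxy/(sx·sy) and r² as shared variance, checked against SciPy (illustrative data, not the BDI2/MASQGDD dataset).

```python
# Pearson's r from covariance and standard deviations, plus r-squared.
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 5.0, 4.0, 9.0])
n = len(x)

cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))   # r = cov_xy / (s_x * s_y)

r_check, _ = stats.pearsonr(x, y)              # library equivalent
assert np.isclose(r, r_check)

print(r, r ** 2)   # r, and the proportion of shared variance
```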



    Correlation

    • Hypotheses in Correlation:

      • H0: ρ = 0

        • ρ (rho) = the correlation in the population (a parameter)

      • H1: ρ ≠ 0



    Correlation

    • Assumptions of Correlation (Pearson’s r)

      • Nonlinear/Curvilinear Relationships

        • If the relationship between the two variables is not linear – if it is instead U-shaped or bell-shaped (like our normal distribution) – our attempts at finding a best-fit line will fail, and it will seem as though our two variables are unrelated (r will approximate 0) when in fact a relationship exists but is nonlinear (see the sketch below)
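A quick demonstration of this point (my own sketch, not from the slides): a perfectly U-shaped relationship yields an r near 0.

```python
# Pearson's r misses a perfect curvilinear (U-shaped) relationship.
import numpy as np
from scipy import stats

x = np.linspace(-5, 5, 101)
y = x ** 2                      # y is completely determined by x

r, p = stats.pearsonr(x, y)
print(r)                        # ~0: the linear best-fit line is flat
```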



    Correlation

    • This slide’s scatterplot shows a curvilinear relationship: although the two variables are clearly related, their correlation is only r = -.205

      • Note how the best-fit line does not represent the data points well



    Correlation

    • Assumptions of Correlation (Pearson’s r)

      • Normality

        • Both variables must be normally distributed, otherwise correlation will appear smaller than it is

        • If our data are non-normal, correlation coefficients other than r can be used



    Correlation

    • We can also calculate a correlation if our data are ordinal instead of continuous/dimensional

      • Remember: data on an ordinal scale is ranked, which means that we can tell that one number is higher than another, but not how much higher (interval scales have this), and there is no zero point (ratio scales have this) – i.e. 1st place, 2nd place, etc. = ordinal data

      • Correlation here is represented by Spearman’s rs

        • The difference between r and rs is that rs requires the data to be monotonic, or constantly rising or falling – if data are arranged in rank order they can only go up or down; you can’t go from 1st place to 9th place to 2nd place if the places are arranged in order (see the sketch below)
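A minimal sketch of Spearman’s rs on ranked data, assuming SciPy; the race results are invented.

```python
# Spearman's r_s on ordinal (ranked) data.
from scipy import stats

race1 = [1, 2, 3, 4, 5]   # finishing places in race 1
race2 = [2, 1, 3, 5, 4]   # finishing places for the same runners in race 2

rs, p = stats.spearmanr(race1, race2)
print(rs, p)
```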



    Correlation

    • Other correlation coefficients

      • The Point Biserial Correlation coefficient (rpb) - If one variable is continuous/dimensional and the other dichotomous (a nominal scale where the variable can take only two possible values)

        • Dichotomous variables – e.g. Gender (Male/Female), Yes/No answers, Race (if it is coded as Caucasian or Minority), etc.
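A sketch of the point-biserial coefficient with SciPy; the group codes and scores below are hypothetical.

```python
# Point-biserial correlation: one dichotomous (0/1) and one continuous variable.
from scipy import stats

group = [0, 0, 0, 1, 1, 1, 1, 0]                  # e.g., a yes/no answer coded 0/1
score = [3.1, 2.8, 3.5, 4.2, 4.8, 4.0, 4.5, 3.0]  # continuous measure

r_pb, p = stats.pointbiserialr(group, score)
print(r_pb, p)
```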



    Correlation

    • Other correlation coefficients

      • Phi (Φ) – when both variables are dichotomous
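Phi can be computed as Pearson’s r applied to two 0/1-coded variables; a brief sketch with invented data:

```python
# Phi coefficient: Pearson's r on two dichotomous (0/1) variables.
import numpy as np
from scipy import stats

x = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y = np.array([0, 1, 1, 1, 0, 0, 1, 0])

phi, _ = stats.pearsonr(x, y)
print(phi)
```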



    Correlation

    • Factors that bias correlation coefficients:

      • Range Restriction

        • Typically, restricting range reduces correlations

          [Scatterplots: Full dataset (r = .82) vs. only BDI > 30 (r = .490)]
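A simulation sketch (assumed parameters, not the slide’s dataset) showing how restricting range typically shrinks r:

```python
# Range restriction: correlating only high scorers reduces r.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(20, 10, size=500)
y = 0.8 * x + rng.normal(0, 6, size=500)   # true linear relationship

r_full, _ = stats.pearsonr(x, y)

mask = x > 30                              # keep only the top of the range
r_restricted, _ = stats.pearsonr(x[mask], y[mask])

print(r_full, r_restricted)                # restricted r is noticeably smaller
```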



    Correlation

    • However, restricting range can increase correlations if the relationship is curvilinear, because it makes the retained portion of the relationship linear

      [Scatterplots: Full dataset (r = -.205) vs. only Var1 ≥ 5 (r = -.982)]



    Correlation

    • Problems of range restriction are common in psychological research, because researchers want their groups to be as different from each other as possible to increase the effect sizes they obtain

      • Remember: the effect size for ANOVA (Cohen’s d) = (mean of Group 1 − mean of Group 2) / sp, the pooled standard deviation

    • To get highly different groups, researchers sample those high and low on a particular variable

      • I.e. comparing those highest on aggression to those lowest on aggression

      • This is identical to looking only at BDI2 scores higher than 30; correlations will be more accurate when the full range of scores is examined



    Correlation

    • Factors that bias correlation coefficients:

      • Heterogeneous Subsamples

        • This is a problem when there is an interaction present (i.e. our age by gender interaction mentioned in the discussion of Factorial ANOVA)


    Correlation

    • If males’ performance increases as they age and women’s performance remains the same, then when the two genders are combined and age and performance are correlated regardless of gender, the correlation will be smaller

      • A strong correlation of age and performance for males + a weak correlation of age and performance for females = a biased correlation when the two groups are combined
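A brief simulation sketch of this pooling effect (the numbers are invented):

```python
# Heterogeneous subsamples: pooling groups with different slopes biases r.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
age_m = rng.uniform(20, 60, size=200)
perf_m = 0.5 * age_m + rng.normal(0, 3, size=200)   # males: performance rises with age

age_f = rng.uniform(20, 60, size=200)
perf_f = 20 + rng.normal(0, 3, size=200)            # females: flat relationship

r_m, _ = stats.pearsonr(age_m, perf_m)
r_f, _ = stats.pearsonr(age_f, perf_f)
r_pooled, _ = stats.pearsonr(np.concatenate([age_m, age_f]),
                             np.concatenate([perf_m, perf_f]))

print(r_m, r_f, r_pooled)   # pooled r falls between the two subgroup r's
```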



    Correlation

    • Factors that bias correlation coefficients:

      • Outliers

        [Scatterplots: No outliers (r = .989) vs. with an outlier (r = .522)]
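A short sketch of the outlier effect with invented numbers: a single extreme point sharply lowers r.

```python
# One outlier can drastically change a correlation.
import numpy as np
from scipy import stats

x = np.arange(10.0)
y = 2 * x + 1                      # perfectly linear, r = 1.0
r_clean, _ = stats.pearsonr(x, y)

x_out = np.append(x, 20.0)         # add one extreme, off-trend point
y_out = np.append(y, -30.0)
r_outlier, _ = stats.pearsonr(x_out, y_out)

print(r_clean, r_outlier)
```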



    Correlation

    • Testing correlations for significance

      • just like t- and F-statistics, r-statistics can be tested for significance

      • just like t- and F-statistics, with increasing sample size (n), smaller correlations (r’s) will be significant

        • with 25 people, r ≥ .396 is significant at p < .05, with 1000 people you only need an r ≥ .062 (see Table E.2, page 515 in your text)
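These cutoffs can be reproduced (approximately) with the usual t-test for a correlation, t = r√(n − 2)/√(1 − r²); a sketch assuming SciPy:

```python
# Two-tailed p-value for a correlation via its t-statistic.
import numpy as np
from scipy import stats

def r_p_value(r, n):
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    return 2 * stats.t.sf(abs(t), df=n - 2)   # two-tailed p

print(r_p_value(0.396, 25))     # ~.05 with 25 people
print(r_p_value(0.062, 1000))   # ~.05 with 1000 people
```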



    Correlation

    • Testing correlations for significance

      • the r-statistic is also its own, built-in effect size statistic

        • Cohen’s conventions for r: .1 = small, .3 = medium, and .5 = large effects

      • by squaring r (r2), you also get a relatively unbiased effect size estimate that is interpreted identically to η2 and ω2

        • Remember: η2 and ω2 represent the percent of variability in one variable accounted for by the other



    Correlation

    • Testing correlations for significance

      • Therefore, if:

        • r = .5, p = .00001, you can state that your two variables are strongly (effect size) and reliably (p-value) related

        • r = .5, p = .65, you can conclude that your two variables are strongly related, but that you probably didn’t have enough subjects for this to be represented in your p-value

        • r = .1, p = .00001, you can conclude that your large sample size drove the p-value down, and that your variables, while reliably related, are only weakly related

        • r = .1, p = .65, you can conclude that your two variables are neither strongly nor reliably related



    Regression

    • The best-fit line allows us to make educated guesses about what a score is on one variable given a score on the other

      • Extrapolate = make an educated guess about a score that is higher or lower than any score actually obtained

      • Interpolate = make an educated guess about a score that is within the range of the obtained scores but was not actually obtained



    Regression

    • Range of scores on Depression = 0 – 49

    • Range of scores on Pessimism = 1 – 7

    • Extrapolation – What pessimism score would be associated with a depression score of 50? (~6.8)

    • Interpolation – What pessimism score would be associated with a depression score of 45? (~5.5)
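A sketch of both operations using a fitted line; the data are simulated, so the predictions will not exactly match the slide’s ~5.5 and ~6.8.

```python
# Interpolating and extrapolating from a best-fit line.
import numpy as np

rng = np.random.default_rng(2)
depression = rng.uniform(0, 49, size=60)                        # observed range: 0-49
pessimism = 1 + 0.1 * depression + rng.normal(0, 0.5, size=60)

b, a = np.polyfit(depression, pessimism, deg=1)

def predict(score):
    return b * score + a

print(predict(45))   # interpolation: inside the observed range
print(predict(50))   # extrapolation: beyond any observed score
```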



    Regression

    • Interested in linear relationship between 2 variables = use correlation

    • Interested in linear relationship(s) between 3+ dimensional variables = regression

      • DV = Symptoms of paranoia

      • IV = Treatment vs. Control groups → ANOVA

        • IV discrete (dichotomous/polychotomous)

      • IV = # of sessions of treatment → Regression

        • IV dimensional/continuous



    Regression

    • DV = Criterion, IV’s = Predictors

    • Criterion = b1x1 + b2x2 + b3x3 + … + a

      • x1 = predictor #1; b1 = slope relating x1 to the DV; a = intercept

      • Slope = rate of change

        • b = .75 → a 1-pt. increase in the IV is associated with a .75-pt. increase in the DV

        • i.e. for every 1-pt. increase in pessimism, depression increases by .75 pts.
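A minimal least-squares sketch of the equation above with two predictors; the coefficients and data are invented.

```python
# Criterion = b1*x1 + b2*x2 + a, estimated by least squares.
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 0.75 * x1 + 0.30 * x2 + 2.0 + rng.normal(0, 0.5, size=100)

X = np.column_stack([x1, x2, np.ones_like(x1)])   # last column carries the intercept
(b1, b2, a), *_ = np.linalg.lstsq(X, y, rcond=None)

print(b1, b2, a)   # recovers roughly 0.75, 0.30, 2.0
```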



    Regression

    • Slope

      • Slope w/ raw data = b

        • i.e. b = .45 in the prediction of GPA from IQ → a 1-pt. increase in IQ is associated with roughly a half-point increase in GPA

      • Slope w/ standardized data = β

        • Standardize data (i.e. convert to z-score) to compare slopes between experiments

        • β = b × (sx / sy)

        • i.e. β = .53 → a 1-s.d. increase in IQ is associated with roughly a half-s.d. increase in GPA

        • b more interpretable if scale of variables is meaningful

    • Intercept = value of DV when IV = 0

      • In previous ex., Pess = ~3 when Dep = 0, so a = ~3
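A sketch of the β formula above: the raw slope rescaled by the standard deviations equals the slope obtained on z-scored data (the IQ/GPA numbers here are invented).

```python
# Standardized slope: beta = b * (s_x / s_y) = slope after z-scoring.
import numpy as np

rng = np.random.default_rng(4)
iq = rng.normal(100, 15, size=200)
gpa = 2.0 + 0.02 * iq + rng.normal(0, 0.3, size=200)

b, a = np.polyfit(iq, gpa, deg=1)            # raw slope and intercept
beta = b * iq.std(ddof=1) / gpa.std(ddof=1)  # standardized slope

def z(v):
    return (v - v.mean()) / v.std(ddof=1)

beta_check, _ = np.polyfit(z(iq), z(gpa), deg=1)  # slope on z-scored data
assert np.isclose(beta, beta_check)
print(b, beta)
```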



    Regression

    • Regression can test:

      • The overall ability of all of your IV’s to predict your criterion (overall model/omnibus R2)

      • The ability of each IV to predict your criterion (b or β)

        • Each of these statistics is associated with a p-value & tested for significance

      • Can also be used to make predictions based on best-fit/regression line (less common)
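As an illustration of these tests, a hedged sketch using statsmodels (assuming it is available; the data are invented): the fit reports an omnibus R² with its F-test p-value, plus a slope and p-value for each IV.

```python
# Regression significance tests: omnibus R^2 (F-test) and per-predictor b's (t-tests).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 0.75 * x1 + 0.30 * x2 + 2.0 + rng.normal(0, 0.5, size=100)

X = sm.add_constant(np.column_stack([x1, x2]))   # intercept + two predictors
fit = sm.OLS(y, X).fit()

print(fit.rsquared, fit.f_pvalue)   # overall model R^2 and its p-value
print(fit.params)                   # intercept a, slopes b1 and b2
print(fit.pvalues)                  # significance of each coefficient
```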



    Regression

    • Hypotheses in Regression:

      • H0: b/β/R² (in population) = 0

      • H1: b/β/R² (in population) ≠ 0



    Regression

    • Assumptions of Regression

      • Linearity of Regression

        • Variables linearly related to one another

      • Normality in Arrays

        • Actual values of DV normally distributed around predicted values (i.e. regression line) – AKA regression line is good approximation of population parameter

      • Homogeneity of Variance in Arrays

        • Assumes that variance of criterion is equal for all levels of predictor(s)

        • Sound familiar?

          • Variance of DV equal for all levels of IV(s)



    Correlation/Regression

    • Correlation & Regression can also answer other kinds of questions:

      • Can test the difference between 2 independent r’s/b’s

        • Is r(a,b) > r(c,d)?

        • Is the correlation between depression and anxiety using the BDI and BAI larger than the same correlation using the MASQ-AD and MASQ-AA subscales?
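One standard way to test two independent correlations (a hedged sketch; the text may use a different, table-based procedure) is Fisher’s r-to-z transformation:

```python
# z-test for the difference between two independent correlations.
import numpy as np
from scipy import stats

def independent_r_test(r1, n1, r2, n2):
    z1, z2 = np.arctanh(r1), np.arctanh(r2)      # Fisher r-to-z transform
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # SE of the difference
    z = (z1 - z2) / se
    return z, 2 * stats.norm.sf(abs(z))          # two-tailed p

print(independent_r_test(0.60, 100, 0.40, 120))  # invented r's and n's
```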



    Correlation/Regression

    • Can test the difference between 2 dependent r’s/b’s

      • Is r(a,b) > r(b,c)?

      • Is the correlation between rumination and depression as high as between rumination and generalized anxiety?

      • Is the correlation between rumination and depression @ Time 1 the same at Time 2, 4 weeks later?

  • Don’t worry about how to do calculations by hand

