
Correlation and Regression

Correlation and Regression. Sporiš Goran, PhD. http://kif.hr/predmet/mki http://www.science4performance.com/. Correlation and Regression. Correlation : measure of the strength of an association (relationship) between continuous variables


Presentation Transcript


  1. Correlation and Regression Sporiš Goran, PhD. http://kif.hr/predmet/mki http://www.science4performance.com/

  2. Correlation and Regression
     • Correlation: a measure of the strength of an association (relationship) between continuous variables
     • Regression: predicting the value of a continuous dependent variable (y) from the value of a continuous independent variable (x)

  3. Correlation statistic: r
     • Values of r range from -1 to +1
     • -1 is a perfect negative association (correlation), meaning that as the scores of one variable increase, the scores of the other variable decrease at a constant rate, so every point falls exactly on a downward-sloping straight line
     • +1 is a perfect positive association, meaning that both variables go up or down together, in lock-step
     • Values of r close to zero indicate a weak linear relationship or none
     • An r of exactly zero (never seen in real-life data) means no linear relationship: the variables do not change or “vary” together except by chance
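
The range of r described on this slide can be checked with a few lines of code. The following is a minimal sketch, not part of the original slides: the function name `pearson_r` and the three small data sets are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # goes up in lock-step with x
falling = [10, 8, 6, 4, 2]   # goes down in lock-step with x
unrelated = [2, 1, 4, 1, 2]  # no linear trend

print(pearson_r(x, rising))     # 1.0
print(pearson_r(x, falling))    # -1.0
print(pearson_r(x, unrelated))  # 0.0
```

An exact zero appears here only because the third data set was constructed to produce it; real samples almost never do.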

  4. Two “scattergrams”, each with a “cloud” of dots
     [Figure: two scattergrams of Y against X, one showing r = +1 and one showing r = -1]
     • NOTE: Dependent variable (Y) is always placed on the vertical axis
     • NOTE: Independent variable (X) is always placed on the horizontal axis
     • Can changes in one variable be predicted by changes in the other?

  5. Can changes in one variable be predicted by changes in the other?
     [Figure: scattergram of Y against X showing r = 0]

  6. “Line of best fit”
     • To arrive at a value of “r”, a straight line is placed through the cloud of dots (the actual “observed” data)
     • A linear relationship between the variables is assumed
     • This line is placed so that the overall distance between itself and the dots is minimized; in ordinary least squares, that means minimizing the sum of the squared vertical distances
     [Figure: scattergram of Y against X with a fitted line]

  7. “Line of best fit”
     • To place this line in the cloud of dots, it is necessary to compute a and b from the observed (known) values of x and y
       a = where the line crosses the y axis
       b = “slope”, or number of units that the value of y changes when x changes one unit
     • When x is the “independent variable”:
       $$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}$$
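
The two formulas on this slide translate directly into code. This is a sketch under stated assumptions: the helper name `fit_line` and the five data points are hypothetical, chosen only to show the arithmetic.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    # a = mean_y - b * mean_x, so the line passes through the point of means
    a = mean_y - b * mean_x
    return a, b

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 4.2, 4.8, 6.0]

a, b = fit_line(x, y)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 1.09, b = 0.97
```

Note that b is computed first because a depends on it.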

  8. y = a + bx
     a = where the line crosses the y axis
     b = “slope”, or number of units that y changes when x changes one unit
     [Figure: a line drawn on Y against X axes, with the intercept a and the slope b marked]

  9. How closely will a straight line fit the “observed” (actual) data?
     [Figure: two scattergrams of Y against X, one with r = +1.0 and one with r = -1.0]
     A perfect fit yields an r of +1 or -1

  10. An intermediate fit yields an intermediate value of r
      [Figure: scattergram of Y against X with r = +.65]

  11. A poor fit yields a low value of r
      [Figure: scattergram of Y against X with r = -.19]

  12. “Line of best fit”
      • The line of best fit predicts a value for one variable given the value of the other variable
      • There will be a difference between these estimated values and the actual, known (“observed”) values. This difference is called a “residual” or an “error of the estimate.”
      • As the error between the known and predicted values decreases (as the dots cluster more tightly around the line), the absolute value of r (whether + or -) increases
      [Figure: scattergram of Y against X with a fitted line, annotated “if x = .5, y = 2.3” and “if y = 5, x = 3.4”]
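
The link between residuals and r can be illustrated by fitting a line to a tightly clustered data set and to a loosely clustered one. This is a rough sketch: the function `fit_and_score` and both data sets are made up for illustration and do not come from the slides.

```python
import math

def fit_and_score(xs, ys):
    """Fit y = a + b*x by least squares; return (residuals, sum of squared residuals, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual = observed value minus the value predicted by the line
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    r = sxy / math.sqrt(sxx * syy)
    return residuals, sse, r

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
tight = [2.1, 2.9, 4.2, 4.8, 6.0]  # dots close to a straight line
loose = [3.5, 1.5, 5.5, 3.0, 6.5]  # dots scattered widely

_, sse_tight, r_tight = fit_and_score(x, tight)
_, sse_loose, r_loose = fit_and_score(x, loose)

print(f"tight: squared error = {sse_tight:.3f}, r = {r_tight:.2f}")
print(f"loose: squared error = {sse_loose:.3f}, r = {r_loose:.2f}")
```

With these numbers, the tightly clustered set has the smaller squared error and the larger absolute r, which is the pattern the slide describes.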

  13. R-squared, the coefficient of determination
      • Proportion of the variation in the dependent variable (also known as the “effect” variable) that is accounted for by variation in the independent variable (also known as the “predictor” variable)
      • Obtained by squaring the correlation coefficient (r)
      • “Big” R squared (R²) combines the effects of multiple independent/predictor variables
      • “Little” r squared (r²) is the contribution of a single independent/predictor variable
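
As a quick arithmetic check, squaring the two r values reported on slides 15 and 16 (.72 and .35) reproduces the r squared values shown there. The helper name `r_squared` is hypothetical.

```python
def r_squared(r):
    """Coefficient of determination for a single predictor: the square of r."""
    return r ** 2

# r values as reported on slides 15 and 16
for r in (0.72, 0.35):
    print(f"r = {r:.2f}  ->  r squared = {r_squared(r):.2f}")
# r = 0.72  ->  r squared = 0.52
# r = 0.35  ->  r squared = 0.12
```

Read this way, an r of .72 accounts for roughly half of the variation in the dependent variable, while an r of .35 accounts for only about an eighth.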

  14. Class exercise
      Hypothesis 1: Height → Weight
      Hypothesis 2: Age → Weight
      • Use this data to build two scattergrams
      • Be sure to place the independent and dependent variables on the correct axes
      • Estimate a possible value for the r statistic

  15. r = .72, r² = .52

  16. r = .35, r² = .12

  17. Changing the level of measurement from continuous to categorical
      [Figure: scattergram of WEIGHT (vertical axis, 100 to 240) against HEIGHT (horizontal axis, 58 to 76), divided into four quadrants with the counts below]

                SHORT   TALL
      HEAVY       3       7
      LIGHT      12       4

  18. Some other correlation techniques
      • “Partial correlation” (see next slide)
        • Using a control variable to assess its potential influence on a bivariate (two-variable) relationship when all variables are continuous
        • Analogous to using tables for categorical variables
      • “Spearman’s r” (Spearman’s rank correlation, also written rho)
        • Assesses correlation between two ordinal (ranked) variables
      • Logistic (“logit”) regression
        • Used when a dependent variable is dichotomous. It is coded as a binary 0/1 (e.g., 0 means “no”; 1 means “yes”)
        • Can use continuous and categorical independent variables
        • Results are given as an odds ratio (the exponentiated log-odds coefficient), which indicates how the odds of the dependent (“effect”) variable being 1 change with an independent (“predictor”) variable. A result of 1 means there is no relationship; results less than 1 or greater than 1 imply a relationship.
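
For a single dichotomous predictor, an odds ratio can be computed straight from a 2×2 table without fitting a model. The sketch below uses the four quadrant counts from slide 17 (3, 7, 12, 4); the assignment of counts to cells is an assumption based on the flattened figure, and the function name `odds_ratio` is hypothetical.

```python
def odds_ratio(group1_yes, group1_no, group2_yes, group2_no):
    """Odds of 'yes' in group 1 divided by the odds of 'yes' in group 2."""
    odds_group1 = group1_yes / group1_no
    odds_group2 = group2_yes / group2_no
    return odds_group1 / odds_group2

# Counts from slide 17; the cell layout is an assumed reading of the figure
heavy_short, heavy_tall = 3, 7
light_short, light_tall = 12, 4

# Odds of being HEAVY for TALL people relative to SHORT people
or_tall_vs_short = odds_ratio(heavy_tall, light_tall, heavy_short, light_short)
print(or_tall_vs_short)  # 7.0

# Equal odds in both groups give an odds ratio of 1: no relationship
print(odds_ratio(5, 5, 5, 5))  # 1.0
```

Under that reading of the table, the odds of being heavy are seven times higher for tall people than for short people, well away from the “no relationship” value of 1.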

  19. Partial correlation
      • Instead of height → weight, is it possible that a variable related to height (age) is the real cause of changes in weight? Why or why not?

      Zero-order correlations
                HEIGHT  WEIGHT  AGE
      HEIGHT    1.00     .72     .04
      WEIGHT     .72    1.00     .34
      AGE        .04     .34    1.00

      First-order partial correlations, controlling for AGE
                HEIGHT  WEIGHT
      HEIGHT    1.00     .75
      WEIGHT     .75    1.00
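
The first-order partial correlation on this slide can be reproduced from the three zero-order correlations using the standard partial correlation formula. This is a minimal sketch; the function name `partial_r` is hypothetical, and the inputs are the values shown on the slide.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for a third variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Zero-order correlations from the slide
r_height_weight = 0.72
r_height_age = 0.04
r_weight_age = 0.34

result = partial_r(r_height_weight, r_height_age, r_weight_age)
print(round(result, 2))  # 0.75
```

Because age is almost uncorrelated with height (.04), controlling for it barely changes the height and weight correlation, which moves only from .72 to .75.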
