# Correlation

Hal Whitehead, BIOL4062/5062


### Correlation

BIOL4062/5062

• The correlation coefficient
• Tests
• Non-parametric correlations
• Partial correlation
• Multiple correlation
• Autocorrelation
• Many correlation coefficients

### The correlation coefficient

Linked observations: $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$

Mean: $\bar{x} = \sum x_i / n$, $\bar{y} = \sum y_i / n$

Variance: $S^2(x) = \sum (x_i - \bar{x})^2 / (n-1)$, $S^2(y) = \sum (y_i - \bar{y})^2 / (n-1)$

Standard deviation: $S(x) = \sqrt{S^2(x)}$, $S(y) = \sqrt{S^2(y)}$

Covariance: $S^2(x,y) = \sum (x_i - \bar{x})(y_i - \bar{y}) / (n-1)$

Correlation coefficient (“Pearson” or “product-moment”):

$r = \{\sum (x_i - \bar{x})(y_i - \bar{y}) / (n-1)\} / \{S(x) \cdot S(y)\} = S^2(x,y) / \{S(x) \cdot S(y)\}$

Properties of the correlation coefficient:

$-1 \le r \le +1$

If no linear relationship: $r = 0$

$r^2$: proportion of variance accounted for by linear regression
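The definitions above translate almost line for line into code. The following is a minimal sketch using only the Python standard library; the function name `pearson_r` and the data are hypothetical and chosen for illustration, not taken from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, built from the slide definitions."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be linked observations with n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample variances and covariance, each with the (n - 1) divisor
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    # r = S^2(x,y) / {S(x) * S(y)}
    return cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print("r =", round(r, 3), " r^2 =", round(r ** 2, 3))
```

Because the (n - 1) divisors cancel between numerator and denominator, the same value results from using raw sums of squares and cross-products.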

### Tests on Correlation Coefficients

• Assume:
  • Independence
  • Bivariate Normality
• Then:

$z = \ln[(1+r)/(1-r)]/2$ is approximately normally distributed with variance $1/(n-3)$

And, if $\rho$ (true population value of $r$) $= 0$:

$r \cdot \sqrt{n-2} / \sqrt{1-r^2}$ is distributed as Student's t with $n-2$ degrees of freedom

We can test:

a) r ≠ 0

b) r > 0 or r < 0

c) r = constant

d) r(x,y) = r(z,w)

Also confidence intervals for r
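As a rough sketch of how the z-transform yields a confidence interval and a test against a constant, the code below uses only the standard library. The function names and the sample size `n = 30` are assumptions for illustration (the slides do not state n), and the test against a constant uses the normal approximation for z rather than the t distribution.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """z = ln[(1 + r) / (1 - r)] / 2."""
    return 0.5 * math.log((1 + r) / (1 - r))

def r_confidence_interval(r, n, level=0.95):
    """Confidence interval for r: build it on the z scale, then back-transform."""
    se = 1 / math.sqrt(n - 3)                  # sqrt of variance 1/(n-3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    z = fisher_z(r)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

def t_statistic(r, n):
    """t = r * sqrt(n-2) / sqrt(1 - r^2), compared with Student's t on n-2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def p_value_vs_constant(r, rho0, n):
    """Two-sided P for 'r = constant' (test c), via the normal approximation."""
    stat = (fisher_z(r) - fisher_z(rho0)) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(stat)))

# Hypothetical values: n is an assumption, not taken from the slides
r, n = 0.75, 30
low, high = r_confidence_interval(r, n)
print("95% C.I.:", round(low, 2), "to", round(high, 2))
print("t =", round(t_statistic(r, n), 2), "with", n - 2, "df")
print("P (r = 0.5):", round(p_value_vs_constant(r, 0.5, n), 3))
```

The interval is asymmetric around r because it is symmetric on the z scale and then mapped back with tanh.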

Example: r = 0.75 (SE = 0.15; 95% C.I. 0.47-0.89)

Tests:

r ≠ 0: P = 0.0001

r > 0: P = 0.00005

More sexually dimorphic species have relatively larger melons

Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)
• Correlations among mammals
• Log brain size with
  • Log muscle mass: r = 0.984
  • Log fat mass: r = 0.942
• Are these significantly different?
  • Hotelling-Williams test: t = 5.50; df = 36; P < 0.01
• Brain mass is more closely related to muscle than fat
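Comparing two correlations that share a variable and come from the same sample needs the third correlation as well (here, muscle with fat), which the slides do not report. The sketch below uses one commonly cited form of Williams's t statistic with n - 3 degrees of freedom; the function name and all input values are hypothetical, and it should be read as an illustration of the idea rather than as the exact calculation behind the slide.

```python
import math

def williams_t(r_xy, r_xz, r_yz, n):
    """Williams's t for comparing r(x,y) with r(x,z) in one sample.

    r_yz is the correlation between the two competing variables.
    Returns (t, degrees of freedom), with df = n - 3.
    """
    # Determinant of the 3 x 3 correlation matrix
    det = 1 - r_xy**2 - r_xz**2 - r_yz**2 + 2 * r_xy * r_xz * r_yz
    r_bar = (r_xy + r_xz) / 2
    denom = 2 * det * (n - 1) / (n - 3) + r_bar**2 * (1 - r_yz) ** 3
    t = (r_xy - r_xz) * math.sqrt((n - 1) * (1 + r_yz) / denom)
    return t, n - 3

# Hypothetical correlations and sample size, for illustration only
t, df = williams_t(r_xy=0.6, r_xz=0.4, r_yz=0.5, n=50)
print("t =", round(t, 2), "df =", df)
```

The statistic is then compared with Student's t on the returned degrees of freedom.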

### Non-Parametric Correlation

• If one variable normally distributed
  • can test r = 0 as before.
• If neither normally distributed:
  • Spearman's rS rank correlation coefficient (replace values by ranks), or:
  • Kendall's τ correlation coefficient
• Use Spearman's when there is less certainty about the close rankings

Example: r = 0.75; rS = 0.62; τ = 0.47
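To make "replace values by ranks" concrete, here is a small standard-library sketch of both rank coefficients. The helper names are hypothetical, the Kendall version is the simple tau-a (no correction for ties), and the data are invented for illustration.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman's rS: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs / all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)

# Hypothetical data: two adjacent pairs swapped in y
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
print(spearman_rs(x, y), kendall_tau(x, y))
```

In this toy example the two swaps of adjacent ranks lower τ more than rS, which illustrates how the two coefficients weight close rankings differently.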

### Partial Correlation

• Correlation between X and Y controlling for Z

$r(X,Y|Z) = \dfrac{r(X,Y) - r(X,Z) \cdot r(Y,Z)}{\sqrt{(1 - r(X,Z)^2) \cdot (1 - r(Y,Z)^2)}}$

• Correlation between X and Y controlling for W,Z

$r(X,Y|W,Z) = \dfrac{r(X,Y|W) - r(X,Z|W) \cdot r(Y,Z|W)}{\sqrt{(1 - r(X,Z|W)^2) \cdot (1 - r(Y,Z|W)^2)}}$

Tests use n-2-c degrees of freedom (c is number of control variables)
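The two formulas above are recursive: the second-order partial is the first-order formula applied to first-order partials. A minimal sketch, with hypothetical function names and invented correlation values:

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation r(X,Y|Z)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def partial_r_two_controls(r_xy, r_xw, r_xz, r_yw, r_yz, r_wz):
    """Second-order partial r(X,Y|W,Z), built from partials given W."""
    r_xy_w = partial_r(r_xy, r_xw, r_yw)   # r(X,Y|W)
    r_xz_w = partial_r(r_xz, r_xw, r_wz)   # r(X,Z|W)
    r_yz_w = partial_r(r_yz, r_yw, r_wz)   # r(Y,Z|W)
    return partial_r(r_xy_w, r_xz_w, r_yz_w)

def partial_df(n, c):
    """Degrees of freedom for testing a partial correlation: n - 2 - c."""
    return n - 2 - c

# Hypothetical correlations, for illustration only
print(round(partial_r(0.6, 0.5, 0.4), 3))
print(partial_df(n=40, c=1))
```

If Z is uncorrelated with both X and Y, the partial correlation reduces to the ordinary r(X,Y).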

Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)
• Correlations among mammals
• Log brain size with
  • Log muscle mass, controlling for Log body mass: r = 0.466
  • Log fat mass, controlling for Log body mass: r = -0.299
• Fatter species have relatively smaller brains and more muscular species relatively larger brains

Semi-partial Correlation Coefficient
• Correlation between X & Y controlling Y for Z

$r(X,(Y|Z)) = \dfrac{r(X,Y) - r(X,Z) \cdot r(Y,Z)}{\sqrt{1 - r(Y,Z)^2}}$

Example:

Correlation: r = 0.75

Partial correlation: r(SSD,MA|L) = 0.73

Semi-partial correlations: r(SSD,(MA|L)) = 0.69; r((SSD|L),MA) = 0.71
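The semi-partial formula differs from the partial one only in its denominator, which removes Z from Y but not from X. A short sketch with hypothetical names and invented values shows the relationship between the two:

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """r(X,Y|Z): Z removed from both X and Y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def semipartial_r(r_xy, r_xz, r_yz):
    """r(X,(Y|Z)): Z removed from Y only."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz**2)

# Hypothetical correlations, for illustration only
r_xy, r_xz, r_yz = 0.6, 0.5, 0.4
sp = semipartial_r(r_xy, r_xz, r_yz)
p = partial_r(r_xy, r_xz, r_yz)
print(round(sp, 3), round(p, 3))
```

Since the semi-partial equals the partial multiplied by sqrt(1 - r(X,Z)^2), it can never exceed the partial in absolute value.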

### Multiple Correlation

Multiple Correlation Coefficient
• Correlation between one dependent variable and its best estimate from a regression on several independent variables: $r(Y \cdot X_1, X_2, X_3, \dots)$
• Square of multiple correlation coefficient is the proportion of variance accounted for by multiple regression
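For the special case of two independent variables, the squared multiple correlation can be written directly in terms of the three pairwise correlations, which avoids fitting the regression explicitly. The sketch below assumes that two-predictor case only; the function name and values are hypothetical.

```python
import math

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation r(Y.X1,X2) from pairwise correlations.

    r_y1, r_y2: correlations of Y with X1 and with X2
    r_12:       correlation between X1 and X2 (must not be +1 or -1)
    """
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)

# Hypothetical correlations, for illustration only
R = multiple_r_two_predictors(0.6, 0.5, 0.4)
print("R =", round(R, 3), " R^2 =", round(R**2, 3))
```

With uncorrelated predictors (r_12 = 0) the squared multiple correlation is simply the sum of the two squared correlations; with more predictors a regression fit or matrix formulation is needed.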

### Autocorrelation

• Purposes
  • Examine time series
  • Look at (serial) independence

Data (e.g. feeding rate on consecutive days, plankton biomass at each station on a transect):

1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6

Autocorrelation of lag=1 is correlation between:

1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7

1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6

r = 0.508

Autocorrelation of lag=2 is correlation between:

1.5 1.7 4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9

4.3 5.4 5.7 6.2 3.9 4.4 5.2 4.8 3.9 3.7 3.6

r = -0.053

…….
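The values quoted above (0.508 and -0.053) are reproduced by the usual sample autocorrelation estimator, which measures deviations from the mean of the whole series and divides by the total sum of squares. Pearson's r computed on just the 12 overlapping pairs gives about 0.62 for lag 1, so the two approaches are close in spirit but not numerically identical. A minimal sketch (the function name is hypothetical; the data are the series from the slide):

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at a given lag.

    Uses the overall mean and the total sum of squares of the full series.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    total_ss = sum(d * d for d in dev)
    lagged_products = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return lagged_products / total_ss

data = [1.5, 1.7, 4.3, 5.4, 5.7, 6.2, 3.9, 4.4, 5.2, 4.8, 3.9, 3.7, 3.6]
for lag in (1, 2):
    print(lag, round(autocorrelation(data, lag), 3))
# prints:
# 1 0.508
# 2 -0.053
```

Plotting these values against lag gives the correlogram commonly used to examine serial independence.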

### Many Correlation Coefficients

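Listwise deletion, n=40; + P<0.10; * P<0.05; uncorrected

| | NGR25L | SST | SHITR | LSPEED | APROP | SOCV | SHR2 | LFMECS | LAERR |
|---|---|---|---|---|---|---|---|---|---|
| NGR25L | 1.00 | | | | | | | | |
| SST | 0.12 | 1.00 | | | | | | | |
| SHITR | -0.21 | -0.33* | 1.00 | | | | | | |
| LSPEED | 0.10 | -0.28+ | 0.06 | 1.00 | | | | | |
| APROP | -0.15 | -0.34* | 0.07 | 0.18 | 1.00 | | | | |
| SOCV | -0.05 | 0.08 | -0.16 | -0.01 | -0.33* | 1.00 | | | |
| SHR2 | -0.18 | -0.12 | 0.01 | -0.20 | 0.19 | -0.03 | 1.00 | | |
| LFMECS | 0.08 | 0.14 | -0.13 | -0.12 | -0.22 | 0.29+ | -0.18 | 1.00 | |
| LAERR | -0.10 | 0.03 | -0.21 | -0.24 | -0.02 | 0.24 | -0.08 | 0.23 | 1.00 |

Expected no. of the 36 coefficients significant by chance alone: with P<0.10 = 3.6; with P<0.05 = 1.8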

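Listwise deletion, n=40; + P<0.10; * P<0.05; Bonferroni corrected

| | NGR25L | SST | SHITR | LSPEED | APROP | SOCV | SHR2 | LFMECS | LAERR |
|---|---|---|---|---|---|---|---|---|---|
| NGR25L | 1.00 | | | | | | | | |
| SST | 0.12 | 1.00 | | | | | | | |
| SHITR | -0.21 | -0.33 | 1.00 | | | | | | |
| LSPEED | 0.10 | -0.28 | 0.06 | 1.00 | | | | | |
| APROP | -0.15 | -0.34 | 0.07 | 0.18 | 1.00 | | | | |
| SOCV | -0.05 | 0.08 | -0.16 | -0.01 | -0.33 | 1.00 | | | |
| SHR2 | -0.18 | -0.12 | 0.01 | -0.20 | 0.19 | -0.03 | 1.00 | | |
| LFMECS | 0.08 | 0.14 | -0.13 | -0.12 | -0.22 | 0.29 | -0.18 | 1.00 | |
| LAERR | -0.10 | 0.03 | -0.21 | -0.24 | -0.02 | 0.24 | -0.08 | 0.23 | 1.00 |

P=1.0 for all coefficients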

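Pairwise deletion, n=59-118; + P<0.10; * P<0.05; uncorrected

| | NGR25L | SST | SHITR | LSPEED | APROP | SOCV | SHR2 | LFMECS | LAERR |
|---|---|---|---|---|---|---|---|---|---|
| NGR25L | 1.00 | | | | | | | | |
| SST | 0.11 | 1.00 | | | | | | | |
| SHITR | -0.17+ | -0.46* | 1.00 | | | | | | |
| LSPEED | 0.05 | -0.17 | 0.05 | 1.00 | | | | | |
| APROP | -0.05 | -0.20+ | 0.04 | 0.31* | 1.00 | | | | |
| SOCV | -0.00 | -0.05 | -0.06 | -0.02 | -0.25* | 1.00 | | | |
| SHR2 | -0.15 | -0.13 | 0.07 | -0.14 | 0.05 | 0.01 | 1.00 | | |
| LFMECS | 0.01 | 0.07 | -0.02 | -0.14 | -0.25* | 0.43* | -0.26+ | 1.00 | |
| LAERR | -0.06 | 0.06 | 0.09 | -0.27* | -0.20+ | 0.06 | -0.06 | 0.21+ | 1.00 |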

Many Correlation Coefficients
• Missing values:
• Listwise deletion (comparability), or
• Pairwise deletion (power)
• P-values:
• Uncorrected: type 1 errors
• Bonferroni, etc.: type 2 errors
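The trade-off between type 1 and type 2 errors can be seen with a little arithmetic. Nine variables give 36 distinct coefficients, so some are expected to look significant by chance, while a Bonferroni correction multiplies each P-value by 36. The sketch below is illustrative only: the function names are hypothetical, and the P-value uses the normal approximation to the z-transform rather than the exact t distribution, so it will differ slightly from what statistical software reports.

```python
import math
from statistics import NormalDist

def number_of_pairs(k):
    """Distinct correlation coefficients among k variables."""
    return k * (k - 1) // 2

def approx_p_value(r, n):
    """Two-sided P for r = 0, normal approximation on the z scale."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def bonferroni(p, m):
    """Bonferroni-corrected P-value for m tests, capped at 1."""
    return min(1.0, p * m)

m = number_of_pairs(9)
print(m, round(m * 0.10, 1), round(m * 0.05, 1))   # → 36 3.6 1.8

# Strongest listwise coefficient in the table: r = -0.34 with n = 40
p = approx_p_value(-0.34, 40)
print(round(p, 3), bonferroni(p, m))
```

Even the largest listwise coefficient ends up with a corrected P of 1.0, which shows how conservative the correction is when many coefficients are tested on a small sample.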

### Beware!

Correlation ≠ Causation

[Figure: diagrams of alternative causal structures linking variables Y1 to Y6]