
T tests, ANOVAs and regression

Tom Jenkins

Ellen Meierotto

SPM Methods for Dummies 2007



Why do we need t tests?



Objectives

  • Types of error

  • Probability distribution

  • Z scores

  • T tests

  • ANOVAs



Error

  • Null hypothesis

  • Type 1 error (α): false positive

  • Type 2 error (β): false negative



Normal distribution



Z scores

  • Standardised normal distribution

  • µ = 0, σ = 1

  • Z scores: 0, 1, 1.65, 1.96

  • Need to know population standard deviation

Z = (x – μ) / σ for a single observation compared to the population
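A minimal Python sketch of this formula (not from the original slides; the population values are hypothetical, chosen so that z lands on 1.65 from the list above):

```python
from scipy import stats

mu, sigma = 100, 15        # assumed population mean and standard deviation
x = 124.75                 # a single observation

z = (x - mu) / sigma       # Z = (x - mu) / sigma
p = stats.norm.sf(z)       # one-tailed P(Z >= z) under the standard normal

print(z, p)                # z = 1.65, p ~ 0.049
```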



T tests

  • Comparing means

  • 1 sample t

  • 2 sample t

  • Paired t
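A minimal sketch of these three tests on hypothetical data, using scipy.stats (the data and effect sizes are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre = rng.normal(10, 2, size=20)        # hypothetical scores before treatment
post = pre + rng.normal(1, 1, size=20)  # the same subjects after treatment
controls = rng.normal(10, 2, size=25)   # an independent comparison group

# 1 sample t: does the mean of `pre` differ from a hypothesised value (10)?
t1, p1 = stats.ttest_1samp(pre, popmean=10)

# 2 sample t: do two independent groups differ in their means?
t2, p2 = stats.ttest_ind(post, controls)

# Paired t: same subjects measured twice, so test the within-subject change
t3, p3 = stats.ttest_rel(post, pre)

print(p1, p2, p3)
```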



Different sample variances



2 sample t tests

Pooled standard error of the mean:

SE = sp √(1/n1 + 1/n2), where sp² = [(n1 – 1)s1² + (n2 – 1)s2²] / (n1 + n2 – 2)



1 sample t test

t = (x̄ – μ) / (s / √n), with n – 1 degrees of freedom



The effect of degrees of freedom on t distribution



Paired t tests



T tests in SPM: Did the observed signal change occur by chance or is it statistically significant?

  • Recall the GLM: Y = Xβ + ε

  • β1 is an estimate of signal change over time attributable to the condition of interest

  • Set up a contrast cT = [1 0 … 0] for β1: cTβ = 1×β1 + 0×β2 + … + 0×βn

  • Null hypothesis: cTβ = 0, i.e. no significant effect at each voxel for condition β1

  • Contrast [1 –1]: is the difference between 2 conditions significantly non-zero?

  • t = cTβ / sd[cTβ] – one-sided
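A minimal sketch of this contrast t test on simulated data (plain numpy, not actual SPM code; the design matrix, betas and noise are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.standard_normal(n)             # regressor for the condition of interest
X = np.column_stack([x1, np.ones(n)])   # design matrix: [condition, constant]
Y = 0.5 * x1 + rng.standard_normal(n)   # simulated signal with true beta1 = 0.5

beta, rss, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least-squares estimate of beta
sigma2 = rss[0] / (n - X.shape[1])                 # residual variance estimate

c = np.array([1.0, 0.0])                # contrast picking out beta1
se = np.sqrt(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)
t = c @ beta / se                       # t = c'beta / sd[c'beta]
print(t)
```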



ANOVA

  • Variances, not means

  • Total variance = model variance + error variance

  • Results in an F score, corresponding to a p value

F test = model variance / error variance
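A minimal sketch of a one-way ANOVA on hypothetical groups; scipy.stats.f_oneway returns exactly this F score and its p value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(10, 2, size=30)    # three hypothetical groups
g2 = rng.normal(11, 2, size=30)
g3 = rng.normal(12, 2, size=30)

F, p = stats.f_oneway(g1, g2, g3)  # F = model variance / error variance
print(F, p)
```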


Partitioning the variance

[Figure: data for Group 1 and Group 2 partitioned into components]

Total = Model (between groups) + Error (within groups)



T vs F tests

  • F tests detect any differences between multiple groups, including interactions

  • Where the differences lie has to be determined post hoc

  • SPM t tests are one-tailed (con)

  • SPM F tests are two-tailed (ess)



Conclusions

  • T tests describe how unlikely it is that experimental differences are due to chance

  • The higher the t score, the smaller the p value and the less likely the result is due to chance

  • Can compare a sample with a population, or 2 samples, paired or unpaired

  • ANOVA/F tests are similar but use variances instead of means, and can be applied to more than 2 groups and other more complex scenarios



Acknowledgements

  • MfD slides 2004-2006

  • Van Belle, Biostatistics

  • Human Brain Function

  • Wikipedia



Correlation and Regression



Topics Covered:

  • Is there a relationship between x and y?

  • What is the strength of this relationship?

    • Pearson’s r

  • Can we describe this relationship and use it to predict y from x?

    • Regression

  • Is the relationship we have described statistically significant?

    • F- and t-tests

  • Relevance to SPM

    • GLM



Relationship between x and y

  • Correlation describes the strength and direction of a linear relationship between two variables

  • Regression tells you how well a certain independent variable predicts a dependent variable

  • CORRELATION ≠ CAUSATION

    • In order to infer causality: manipulate independent variable and observe effect on dependent variable


Scattergrams

[Figure: scattergrams of y against x showing positive correlation, negative correlation and no correlation]


Variance vs. Covariance

  • Do two variables change together?

Variance ~ Δx × Δx (deviations of one variable multiplied by themselves)

Covariance ~ Δx × Δy (deviations of two variables multiplied together)



Covariance

  • When X and Y increase together: cov(x, y) is positive

  • When X increases as Y decreases: cov(x, y) is negative

  • When there is no constant relationship: cov(x, y) = 0


Example Covariance

cov(x, y) = Σ(xi – x̄)(yi – ȳ) / (n – 1)

    x    y    xi – x̄    yi – ȳ    (xi – x̄)(yi – ȳ)
    0    3    –3         0          0
    2    2    –1        –1          1
    3    4     0         1          0
    4    0     1        –3         –3
    6    6     3         3          9

x̄ = 3, ȳ = 3, Σ(xi – x̄)(yi – ȳ) = 7, so cov(x, y) = 7 / 4 = 1.75

What does this number tell us?
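A minimal numpy sketch checking the arithmetic of this example:

```python
import numpy as np

x = np.array([0, 2, 3, 4, 6])
y = np.array([3, 2, 4, 0, 6])

# Sum of the products of deviations, divided by n - 1
cov_manual = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)
cov_numpy = np.cov(x, y)[0, 1]   # off-diagonal entry of the 2x2 covariance matrix

print(cov_manual, cov_numpy)     # both 1.75 (= 7 / 4)
```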



Example of how covariance value relies on variance



Pearson’s R

  • Covariance does not really tell us much on its own, because it depends on the scales of x and y

    • Solution: standardise this measure

  • Pearson’s R: standardise by dividing the covariance by both standard deviations:

    r = cov(x, y) / (sx sy)
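A minimal sketch (reusing the example data above) showing that dividing the covariance by the two standard deviations reproduces scipy's Pearson r:

```python
import numpy as np
from scipy import stats

x = np.array([0, 2, 3, 4, 6])
y = np.array([3, 2, 4, 0, 6])

cov = np.cov(x, y)[0, 1]
r_manual = cov / (x.std(ddof=1) * y.std(ddof=1))  # r = cov(x, y) / (sx * sy)
r_scipy, p = stats.pearsonr(x, y)

print(r_manual, r_scipy)   # both 0.35
```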



Basic assumptions

  • Normal distributions

  • Variances are constant and not zero

  • Independent sampling – no autocorrelations

  • No errors in the values of the independent variable

  • All causation in the model is one-way (not necessary mathematically, but essential for prediction)



Pearson’s R: degree of linear dependence



Limitations of r

  • The r we calculate is actually r̂:

    • r = the true r of the whole population

    • r̂ = an estimate of r based on the data

  • r is very sensitive to extreme values:



In the real world…

  • r is never 1 or –1

  • Interpretations for correlations in psychological research (Cohen):

    Correlation   Negative          Positive
    Small         –0.29 to –0.10    0.10 to 0.29
    Medium        –0.49 to –0.30    0.30 to 0.49
    Large         –1.00 to –0.50    0.50 to 1.00



Regression

  • Correlation tells you if there is an association between x and y, but it doesn’t describe the relationship or allow you to predict one variable from the other.

  • To do this we need REGRESSION!


Best-fit Line

  • Aim of linear regression is to fit a straight line, ŷ = ax + b, to data that gives the best prediction of y for any value of x (a = slope, b = intercept)

  • This will be the line that minimises the distance between the data and the fitted line, i.e. the residuals ε = y – ŷ, where y is the true value and ŷ is the predicted value



Least Squares Regression

  • To find the best line we must minimise the sum of the squares of the residuals (the vertical distances from the data points to our line)

    Model line: ŷ = ax + b (a = slope, b = intercept)

    Residual: ε = y – ŷ

    Sum of squares of residuals: Σ(y – ŷ)²

  • We must find the values of a and b that minimise Σ(y – ŷ)²
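A minimal numpy sketch of this idea on the earlier example data: a brute-force search over candidate lines lands on the same a and b as the closed-form least-squares fit (the grid ranges are arbitrary choices for illustration):

```python
import numpy as np

x = np.array([0.0, 2, 3, 4, 6])
y = np.array([3.0, 2, 4, 0, 6])

# Evaluate the sum of squared residuals for a grid of candidate lines
a_grid = np.linspace(-2, 2, 401)
b_grid = np.linspace(-2, 6, 801)
A, B = np.meshgrid(a_grid, b_grid)
S = ((y - (A[..., None] * x + B[..., None])) ** 2).sum(axis=-1)
i, j = np.unravel_index(S.argmin(), S.shape)
print(A[i, j], B[i, j])           # the grid point minimising the sum of squares

a, b = np.polyfit(x, y, deg=1)    # closed-form least-squares fit
print(a, b)                       # a = 0.35, b = 1.95
```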


Finding b

  • First we find the value of b that gives the minimum sum of squares

  • Trying different values of b is equivalent to shifting the line up and down the scatter plot


Finding a

  • Now we find the value of a that gives the minimum sum of squares

  • Trying out different values of a is equivalent to changing the slope of the line, while b stays constant


Minimising sums of squares

  • Need to minimise Σ(y – ŷ)²

  • ŷ = ax + b

  • So need to minimise: Σ(y – ax – b)²

  • If we plot the sums of squares for all different values of a and b we get a parabola, because it is a squared term

  • So the minimum sum of squares is at the bottom of the curve, where the gradient is zero

[Figure: sum of squares S plotted against values of a and b; min S is where the gradient = 0]



The maths bit

  • So we can find the a and b that give the minimum sum of squares by taking partial derivatives of Σ(y – ax – b)² with respect to a and b separately

  • Then we set these derivatives to zero and solve, which gives the values of a and b that minimise the sum of squares
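A minimal symbolic sketch of this derivation using sympy (assumed available), applied to the example data used earlier:

```python
import sympy as sp

a, b = sp.symbols('a b')
xs = [0, 2, 3, 4, 6]   # example x values from the covariance slide
ys = [3, 2, 4, 0, 6]   # example y values

# Sum of squared residuals as a symbolic function of a and b
S = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))

# Set both partial derivatives to zero and solve for a and b
solution = sp.solve([sp.diff(S, a), sp.diff(S, b)], [a, b])
print(solution)   # {a: 7/20, b: 39/20}, i.e. a = 0.35, b = 1.95
```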


The solution

  • Doing this gives the following equation for a:

    a = r sy / sx

    r = correlation coefficient of x and y
    sy = standard deviation of y
    sx = standard deviation of x

  • You can see that:

    • A low correlation coefficient gives a flatter slope (small value of a)

    • Large spread of y, i.e. high standard deviation, results in a steeper slope (high value of a)

    • Large spread of x, i.e. high standard deviation, results in a flatter slope (small value of a)


The solution cont.

  • Our model equation is ŷ = ax + b

  • This line must pass through the mean, so ȳ = a x̄ + b, which gives:

    b = ȳ – a x̄

  • We can put our equation for a into this, giving:

    b = ȳ – (r sy / sx) x̄

    r = correlation coefficient of x and y
    sy = standard deviation of y
    sx = standard deviation of x

  • The smaller the correlation, the closer the intercept is to the mean of y
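A minimal sketch checking these formulas on the earlier example data: a = r sy / sx and b = ȳ – a x̄ match the direct least-squares fit:

```python
import numpy as np
from scipy import stats

x = np.array([0.0, 2, 3, 4, 6])
y = np.array([3.0, 2, 4, 0, 6])

r, _ = stats.pearsonr(x, y)
a = r * y.std(ddof=1) / x.std(ddof=1)  # a = r sy / sx
b = y.mean() - a * x.mean()            # the line passes through (xbar, ybar)

print(a, b)                 # 0.35 and 1.95
print(np.polyfit(x, y, 1))  # the same values from the direct fit
```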



Back to the model

  • We can calculate the regression line for any data, but the important question is:

    How well does this line fit the data, or how good is it at predicting y from x?


How good is our model?

  • Total variance of y:

    sy² = Σ(y – ȳ)² / (n – 1) = SSy / dfy

  • Variance of the predicted y values (ŷ):

    sŷ² = Σ(ŷ – ȳ)² / (n – 1) = SSpred / dfŷ

This is the variance explained by our regression model

  • Error variance:

    serror² = Σ(y – ŷ)² / (n – 2) = SSer / dfer

This is the variance of the error between our predicted y values and the actual y values, and thus is the variance in y that is NOT explained by the regression model


How good is our model cont.

  • Total variance = predicted variance + error variance:

    sy² = sŷ² + ser²

  • Conveniently, via some complicated rearranging:

    sŷ² = r² sy²

    r² = sŷ² / sy²

  • So r² is the proportion of the variance in y that is explained by our regression model


How good is our model cont.

  • Insert r² sy² into sy² = sŷ² + ser² and rearrange to get:

    ser² = sy² – r² sy² = sy² (1 – r²)

  • From this we can see that the greater the correlation, the smaller the error variance, so the better our prediction
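A minimal sketch verifying on the earlier example data that the variance explained by the model, as a proportion of the total, is r²:

```python
import numpy as np
from scipy import stats

x = np.array([0.0, 2, 3, 4, 6])
y = np.array([3.0, 2, 4, 0, 6])

a, b = np.polyfit(x, y, deg=1)
y_hat = a * x + b                          # predicted values from the model

ss_total = ((y - y.mean()) ** 2).sum()     # total sum of squares
ss_pred = ((y_hat - y.mean()) ** 2).sum()  # sum of squares explained by the model

r, _ = stats.pearsonr(x, y)
print(ss_pred / ss_total, r ** 2)          # both 0.1225 (= 0.35 squared)
```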


Is the model significant?

  • i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?

  • F-statistic:

    F(dfŷ, dfer) = sŷ² / ser² = … (complicated rearranging) … = r² (n – 2) / (1 – r²)

  • And it follows that (because F = t²):

    t(n – 2) = r √(n – 2) / √(1 – r²)

  • So all we need to know are r and n!
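A minimal sketch computing F and t from r and n alone (hypothetical values; for simple regression the F test has 1 and n – 2 degrees of freedom):

```python
import numpy as np
from scipy import stats

r, n = 0.5, 30   # hypothetical correlation and sample size

F = r**2 * (n - 2) / (1 - r**2)             # F = r^2 (n - 2) / (1 - r^2)
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)  # t(n-2); note F = t^2

p_F = stats.f.sf(F, 1, n - 2)        # p value from the F distribution
p_t = 2 * stats.t.sf(abs(t), n - 2)  # two-tailed p value from the t distribution

print(F, t**2)    # equal
print(p_F, p_t)   # equal
```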



General Linear Model

  • Linear regression is actually a form of the General Linear Model where the parameters are a, the slope of the line, and b, the intercept.

    y = ax + b + ε

  • A General Linear Model is just any model that describes the data in terms of a straight line



Multiple regression

  • Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y

  • The different x variables are combined in a linear way and each has its own regression coefficient:

    y = a1x1 + a2x2 + … + anxn + b + ε

  • The a parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y.

  • i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
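A minimal numpy sketch of multiple regression on simulated data: each column of the design matrix is one x variable, plus a constant column for b (all coefficients here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = 2.0 * x1 - 1.0 * x2 + 0.5 + rng.standard_normal(n)  # true a1, a2 and b

X = np.column_stack([x1, x2, np.ones(n)])     # one column per x, plus constant
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimates

print(coef)   # approximately [2.0, -1.0, 0.5]
```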



SPM

  • Linear regression is a GLM that models the effect of one independent variable, x, on ONE dependent variable, y

  • Multiple Regression models the effect of several independent variables, x1, x2 etc., on ONE dependent variable, y

  • Both are types of General Linear Model

  • GLM can also allow you to analyse the effects of several independent x variables on several dependent variables, y1, y2, y3 etc., in a linear combination

  • This is what SPM does and will be explained soon…

