# T tests, ANOVAs and regression

### T tests, ANOVAs and regression

Tom Jenkins

Ellen Meierotto

SPM Methods for Dummies 2007

Why do we need t tests?

Objectives

• Types of error

• Probability distribution

• Z scores

• T tests

• ANOVAs

Error

• Null hypothesis

• Type 1 error (α): false positive

• Type 2 error (β): false negative

Z scores

• Standardised normal distribution

• µ = 0, σ = 1

• Z scores: 0, 1, 1.65, 1.96

• Need to know population standard deviation

Z = (x – μ) / σ for one data point compared to the population
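As a quick illustration (a minimal sketch of my own, not from the slides; μ, σ and x are made-up values), the z score and its one-tailed p value:

```python
from scipy import stats

mu, sigma = 100, 15      # hypothetical population mean and standard deviation
x = 124.75               # one observed data point

z = (x - mu) / sigma                   # z = (x - mu) / sigma, as on the slide
p_one_tailed = 1 - stats.norm.cdf(z)   # P(Z >= z) under the standard normal

print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.3f}")   # z = 1.65, p ~ 0.05
```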

T tests

• Comparing means

• 1 sample t

• 2 sample t

• Paired t
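To make the three variants just listed concrete, here is a minimal scipy.stats sketch; the group data are made up for illustration:

```python
import numpy as np
from scipy import stats

group1 = np.array([4.2, 5.1, 6.0, 5.5, 4.8])   # e.g. condition A
group2 = np.array([5.9, 6.4, 7.1, 6.0, 6.8])   # e.g. condition B

# 1-sample t: is the mean of group1 different from a hypothesised value (e.g. 5)?
t1, p1 = stats.ttest_1samp(group1, popmean=5.0)

# 2-sample (independent) t: do the two group means differ?
t2, p2 = stats.ttest_ind(group1, group2)

# Paired t: the same subjects measured twice, so the samples must be paired
t3, p3 = stats.ttest_rel(group1, group2)

print(t1, p1, t2, p2, t3, p3)
```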

Pooled standard error of the mean

T tests in SPM: did the observed signal change occur by chance, or is it statistically significant?

• Recall the GLM: Y = Xβ + ε

• β1 is an estimate of the signal change over time attributable to the condition of interest

• Set up a contrast (cT) of [1 0] for β1: 1×β1 + 0×β2 + … + 0×βn, divided by its standard deviation

• Null hypothesis: cTβ = 0, i.e. no significant effect at each voxel for condition β1

• Contrast [1 –1]: is the difference between 2 conditions significantly non-zero?

• t = cTβ / sd[cTβ] – one-sided
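A rough numpy sketch of the contrast t statistic t = cTβ / sd[cTβ] for a toy design matrix; the regressor, data and variable names are my own simulated example, not SPM output:

```python
import numpy as np
from scipy import stats

# Toy GLM: y = X beta + error, one condition regressor plus a constant term
n = 20
rng = np.random.default_rng(0)
x_cond = np.tile([1.0, 0.0], n // 2)                  # boxcar-like condition regressor
X = np.column_stack([x_cond, np.ones(n)])             # design matrix [condition, constant]
y = 2.0 * x_cond + 1.0 + rng.normal(0, 1, n)          # simulated signal

beta, res, rank, sv = np.linalg.lstsq(X, y, rcond=None)   # least-squares beta estimates
resid = y - X @ beta
dof = n - rank
sigma2 = resid @ resid / dof                          # error variance estimate

c = np.array([1.0, 0.0])                              # contrast [1 0] picking out beta1
var_c = sigma2 * c @ np.linalg.pinv(X.T @ X) @ c      # variance of c'beta
t = c @ beta / np.sqrt(var_c)                         # t = c'beta / sd[c'beta]
p = 1 - stats.t.cdf(t, dof)                           # one-sided p, as for SPM t contrasts

print(f"t({dof}) = {t:.2f}, one-sided p = {p:.4f}")
```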

ANOVA

• Compares variances, not means

• Total variance = model variance + error variance

• Results in an F score, corresponding to a p value

Variance

F test = Model variance / Error variance

Partitioning the variance

[Illustration: data from Group 1 and Group 2, with the variance partitioned as Total = Model (between groups) + Error (within groups)]
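A minimal one-way ANOVA sketch using made-up data for three groups, showing the same partitioning: F = model (between-group) variance / error (within-group) variance. scipy's f_oneway gives the same F directly.

```python
import numpy as np
from scipy import stats

groups = [np.array([4.1, 5.0, 4.6, 5.3]),     # group 1 (hypothetical data)
          np.array([6.2, 5.8, 6.9, 6.4]),     # group 2
          np.array([5.1, 4.9, 5.6, 5.4])]     # group 3

all_y = np.concatenate(groups)
grand_mean = all_y.mean()

# Partition the variance: total = model (between groups) + error (within groups)
ss_model = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_model = len(groups) - 1
df_error = len(all_y) - len(groups)

F = (ss_model / df_model) / (ss_error / df_error)    # model variance / error variance
p = 1 - stats.f.cdf(F, df_model, df_error)

print(F, p)
print(stats.f_oneway(*groups))                       # should agree with F and p above
```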

T vs F tests

• F tests: any differences between multiple groups, including interactions

• Have to determine where the differences are post hoc

• SPM t contrast: one-tailed (con)

• SPM F contrast: two-tailed (ess)

Conclusions

• T tests describe how unlikely it is that experimental differences are due to chance

• The higher the t score, the smaller the p value, and the less likely the difference is to be due to chance

• Can compare sample with population or 2 samples, paired or unpaired

• ANOVA/F tests are similar but use variances instead of means and can be applied to more than 2 groups and other more complex scenarios

Acknowledgements

• MfD slides 2004-2006

• Van Belle, Biostatistics

• Human Brain Function

• Wikipedia

### Correlation and Regression

Topics Covered:

• Is there a relationship between x and y?

• What is the strength of this relationship?

• Pearson’s r

• Can we describe this relationship and use it to predict y from x?

• Regression

• Is the relationship we have described statistically significant?

• F- and t-tests

• Relevance to SPM

• GLM

Relationship between x and y

• Correlation describes the strength and direction of a linear relationship between two variables

• Regression tells you how well a certain independent variable predicts a dependent variable

• CORRELATION ≠ CAUSATION

• In order to infer causality: manipulate independent variable and observe effect on dependent variable

Scattergrams

[Scatter plots of y against x illustrating positive correlation, negative correlation, and no correlation]

Variance vs. Covariance

• Do two variables change together?

Variance ~ ΔX * ΔX

Covariance ~ ΔX * ΔY

Covariance

• When X and Y increase together: cov(x, y) is positive

• When X increases as Y decreases: cov(x, y) is negative

• When there is no consistent relationship: cov(x, y) = 0

Example Covariance

cov(x, y) = Σ(xᵢ – x̄)(yᵢ – ȳ) / (n – 1)

| xᵢ | yᵢ | xᵢ – x̄ | yᵢ – ȳ | (xᵢ – x̄)(yᵢ – ȳ) |
|----|----|---------|---------|---------------------|
| 0  | 3  | –3      | 0       | 0                   |
| 2  | 2  | –1      | –1      | 1                   |
| 3  | 4  | 0       | 1       | 0                   |
| 4  | 0  | 1       | –3      | –3                  |
| 6  | 6  | 3       | 3       | 9                   |

x̄ = 3, ȳ = 3, Σ(xᵢ – x̄)(yᵢ – ȳ) = 7, so cov(x, y) = 7 / 4 = 1.75

What does this number tell us?

Example of how the covariance value relies on the variance

Pearson's R

• Covariance on its own does not really tell us much, because its size depends on the spread of x and y

• Solution: standardise this measure

• Pearson's R: standardise by dividing the covariance by the standard deviations of x and y:

r = cov(x, y) / (sx sy)
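Using the example data from the covariance slide (x = 0, 2, 3, 4, 6; y = 3, 2, 4, 0, 6), a short sketch of the covariance and its standardised version:

```python
import numpy as np
from scipy import stats

x = np.array([0, 2, 3, 4, 6], dtype=float)   # example data from the covariance slide
y = np.array([3, 2, 4, 0, 6], dtype=float)

cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)   # = 7 / 4 = 1.75
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))                      # standardise by the s.d.s

print(cov_xy, r)                  # 1.75, 0.35
print(stats.pearsonr(x, y))       # same r, plus a p value
```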

Basic assumptions

• Normal distributions

• Variances are constant and not zero

• Independent sampling – no autocorrelations

• No errors in the values of the independent variable

• All causation in the model is one-way (not necessary mathematically, but essential for prediction)

Pearson's R: degree of linear dependence

Limitations of r

• The r we calculate is actually r̂, an estimate

• r = the true correlation of the whole population

• r̂ = the estimate of r based on the sample data

• r is very sensitive to extreme values:

In the real world…

• r is never 1 or –1

• Interpretations for correlations in psychological research (Cohen):

| Correlation | Negative       | Positive     |
|-------------|----------------|--------------|
| Small       | –0.29 to –0.10 | 0.10 to 0.29 |
| Medium      | –0.49 to –0.30 | 0.30 to 0.49 |
| Large       | –1.00 to –0.50 | 0.50 to 1.00 |

Regression

• Correlation tells you if there is an association between x and y but it doesn’t describe the relationship or allow you to predict one variable from the other.

• To do this we need REGRESSION!

Best-fit Line

ŷ = ax + b, where a = slope, b = intercept, ŷ = predicted value, yᵢ = true value, ε = residual error

• The aim of linear regression is to fit a straight line, ŷ = ax + b, to data that gives the best prediction of y for any value of x

• This will be the line that minimises the distance between the data and the fitted line, i.e. the residuals

Least Squares Regression

• To find the best line we must minimise the sum of the squares of the residuals (the vertical distances from the data points to our line)

Model line: ŷ = ax + b

a = slope, b = intercept

Residual (ε) = y – ŷ

Sum of squares of residuals = Σ(y – ŷ)²

• We must find values of a and b that minimise Σ(y – ŷ)²

Finding b

• First we find the value of b that gives the minimum sum of squares

• Trying different values of b is equivalent to shifting the line up and down the scatter plot

Finding a

• Now we find the value of a that gives the minimum sum of squares

• Trying out different values of a is equivalent to changing the slope of the line, while b stays constant

Minimising sums of squares

[Plot of the sum of squares S against the values of a and b: a parabola whose lowest point, min S, is where the gradient is zero]

• Need to minimise Σ(y – ŷ)²

• ŷ = ax + b

• So we need to minimise: Σ(y – ax – b)²

• If we plot the sum of squares for all different values of a and b we get a parabola, because it is a squared term

• So the minimum sum of squares is at the bottom of the curve, where the gradient is zero

The maths bit

• So we can find a and b that give min sum of squares by taking partial derivatives of Σ(y - ax - b)2 with respect to a and b separately

• Then we solve these for 0 to give us the values of a and b that give the min sum of squares
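The slides leave out the algebra; as a reference sketch (standard calculus, not from the slides), setting the partial derivatives to zero gives:

```latex
S(a,b) = \sum_i (y_i - a x_i - b)^2

\frac{\partial S}{\partial b} = -2\sum_i (y_i - a x_i - b) = 0
  \;\Rightarrow\; b = \bar{y} - a\bar{x}

\frac{\partial S}{\partial a} = -2\sum_i x_i (y_i - a x_i - b) = 0
  \;\Rightarrow\; a = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}
                   = \frac{\mathrm{cov}(x,y)}{s_x^2} = r\,\frac{s_y}{s_x}
```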

The solution

• Doing this gives the following equations for a and b:

a = r × sy / sx

r = correlation coefficient of x and y

sy = standard deviation of y

sx = standard deviation of x

• You can see that:

• A low correlation coefficient gives a flatter slope (small value of a)

• A large spread of y, i.e. a high standard deviation of y, results in a steeper slope (high value of a)

• A large spread of x, i.e. a high standard deviation of x, results in a flatter slope (small value of a)

The solution cont.

• Our model equation is ŷ = ax + b

• This line must pass through the mean, so: ȳ = ax̄ + b, which gives b = ȳ – ax̄

• We can put our equation for a into this, giving:

b = ȳ – (r × sy / sx) × x̄

r = correlation coefficient of x and y

sy = standard deviation of y

sx = standard deviation of x

• The smaller the correlation, the closer the intercept is to the mean of y
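A small check of these formulas on the covariance example data (x̄ = ȳ = 3, r = 0.35), compared against numpy's own least-squares fit:

```python
import numpy as np

x = np.array([0, 2, 3, 4, 6], dtype=float)   # same example data as before
y = np.array([3, 2, 4, 0, 6], dtype=float)

r = np.corrcoef(x, y)[0, 1]                  # Pearson's r
a = r * y.std(ddof=1) / x.std(ddof=1)        # slope: a = r * sy / sx
b = y.mean() - a * x.mean()                  # intercept: b = ybar - a * xbar

print(a, b)                                  # 0.35, 1.95
print(np.polyfit(x, y, 1))                   # least-squares fit gives the same a, b
```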

Back to the model

• We can calculate the regression line for any data, but the important question is:

How well does this line fit the data, or how good is it at predicting y from x?

How good is our model?

• Total variance of y:

sy² = Σ(y – ȳ)² / (n – 1) = SSy / dfy

• Variance of the predicted y values (ŷ):

sŷ² = Σ(ŷ – ȳ)² / (n – 1) = SSpred / dfŷ

This is the variance explained by our regression model

• Error variance:

serror² = Σ(y – ŷ)² / (n – 2) = SSer / dfer

This is the variance of the error between our predicted y values and the actual y values, and thus is the variance in y that is NOT explained by the regression model

How good is our model cont.

• Total variance = predicted variance + error variance

sy² = sŷ² + ser²

• Conveniently, via some complicated rearranging:

sŷ² = r² sy²

r² = sŷ² / sy²

• So r² is the proportion of the variance in y that is explained by our regression model

How good is our model cont.

• Insert r² sy² into sy² = sŷ² + ser² and rearrange to get:

ser² = sy² – r² sy² = sy² (1 – r²)

• From this we can see that the greater the correlation, the smaller the error variance, so the better our prediction

Is the model significant?

• i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?

• F-statistic:

F(dfŷ, dfer) = sŷ² / ser² = …(complicated rearranging)… = r²(n – 2) / (1 – r²)

• And it follows that:

t(n – 2) = r √(n – 2) / √(1 – r²)   (because F = t²)

So all we need to know are r and n!
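A sketch of that shortcut for the example data (r = 0.35, n = 5), checked against scipy's linregress p value:

```python
import numpy as np
from scipy import stats

x = np.array([0, 2, 3, 4, 6], dtype=float)
y = np.array([3, 2, 4, 0, 6], dtype=float)

n = len(x)
r = np.corrcoef(x, y)[0, 1]

F = r**2 * (n - 2) / (1 - r**2)              # F from r and n alone
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)   # t(n-2); note F = t**2
p = 2 * (1 - stats.t.cdf(abs(t), n - 2))     # two-sided p for the slope

print(F, t, p)
print(stats.linregress(x, y))                # its pvalue matches p above
```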

General Linear Model

• Linear regression is actually a form of the General Linear Model where the parameters are a, the slope of the line, and b, the intercept.

y = ax + b +ε

• A General Linear Model is just any model that describes the data in terms of a straight line

Multiple regression

• Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y

• The different x variables are combined in a linear way and each has its own regression coefficient:

y = a1x1+ a2x2 +…..+ anxn + b + ε

• The a parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y.

• i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
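A minimal sketch of this (the variable names and simulated data are my own, purely illustrative) using an explicit design matrix, which is exactly the GLM form described above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)                                   # first independent variable
x2 = rng.normal(size=n)                                   # second independent variable
y = 2.0 * x1 - 0.5 * x2 + 1.0 + rng.normal(0, 0.3, n)     # simulated dependent variable

# Design matrix: one column per x variable plus a constant column for the intercept b
X = np.column_stack([x1, x2, np.ones(n)])

# Least-squares estimates of [a1, a2, b] in y = a1*x1 + a2*x2 + b + error
params, *_ = np.linalg.lstsq(X, y, rcond=None)
print(params)                                             # approximately [2.0, -0.5, 1.0]
```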

SPM

• Linear regression is a GLM that models the effect of one independent variable, x, on ONE dependent variable, y

• Multiple Regression models the effect of several independent variables, x1,x2 etc, on ONE dependent variable, y

• Both are types of General Linear Model

• The GLM also allows you to analyse the effects of several independent x variables on several dependent variables, y1, y2, y3 etc., in a linear combination

• This is what SPM does and will be explained soon…