T tests, ANOVAs and regression

Tom Jenkins

Ellen Meierotto

SPM Methods for Dummies 2007

Objectives
  • Types of error
  • Probability distribution
  • Z scores
  • T tests
  • ANOVAs
Error
  • Null hypothesis
  • Type 1 error (α): false positive
  • Type 2 error (β): false negative
Z scores
  • Standardised normal distribution
  • µ = 0, σ = 1
  • Z scores: 0, 1, 1.65, 1.96
  • Need to know population standard deviation

Z = (x − μ) / σ, for one data point compared to the population
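As a quick illustration of the formula, a minimal sketch in Python (the numbers and the use of scipy are illustrative assumptions, not from the slides):

```python
from scipy import stats

# Hypothetical population parameters and one observed value
mu, sigma = 100.0, 15.0
x = 124.75

z = (x - mu) / sigma                 # (x - mu) / sigma = 1.65
p_one_tailed = stats.norm.sf(z)      # area above z; about 0.05 for z = 1.65

print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.3f}")
```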

T tests
  • Comparing means
  • 1 sample t
  • 2 sample t
  • Paired t
2 sample t tests

Pooled standard error of the mean difference: SE = s_p · √(1/n₁ + 1/n₂), where s_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)
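For illustration, a minimal two-sample t test sketch in Python (hypothetical data; scipy's pooled test is compared against the t statistic built by hand from the pooled standard error):

```python
import numpy as np
from scipy import stats

# Hypothetical samples
group1 = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7])
group2 = np.array([4.2, 4.6, 4.0, 4.8, 4.4, 4.1])

# Pooled (equal-variance) two-sample t test
t, p = stats.ttest_ind(group1, group2, equal_var=True)

# The same t statistic from the pooled standard error of the mean difference
n1, n2 = len(group1), len(group2)
sp2 = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
t_manual = (group1.mean() - group2.mean()) / se

print(t, t_manual, p)
```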

T tests in SPM: Did the observed signal change occur by chance, or is it statistically significant?
  • Recall the GLM: Y = Xβ + ε
  • β1 is an estimate of the signal change over time attributable to the condition of interest
  • Set up a contrast (cᵀ) of [1 0 … 0] for β1: (1×β1 + 0×β2 + … + 0×βn) / s.d.
  • Null hypothesis: cᵀβ = 0, i.e. no significant effect at each voxel for condition β1
  • Contrast [1 −1]: is the difference between 2 conditions significantly non-zero?
  • t = cᵀβ / sd(cᵀβ), a one-sided test
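As a rough sketch of that t contrast outside SPM (toy data and a made-up design matrix, not SPM's actual machinery):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Toy design matrix: condition of interest, a confound, and a constant term
X = np.column_stack([rng.normal(size=n), rng.normal(size=n), np.ones(n)])
Y = X @ np.array([2.0, 0.0, 5.0]) + rng.normal(size=n)    # simulated signal Y = X beta + error

# Least-squares estimate of beta
beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)

# Contrast [1 0 0] picks out beta1
c = np.array([1.0, 0.0, 0.0])
resid = Y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])                 # residual variance
sd_contrast = np.sqrt(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)

t = c @ beta / sd_contrast                                # t = c'beta / sd(c'beta)
print(f"t = {t:.2f}")
```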
ANOVA
  • Compares variances, not means
  • Total variance = model variance + error variance
  • Results in an F score, which corresponds to a p value

F test = Model variance / Error variance

Partitioning the variance

[Figure: data points from Group 1 and Group 2 illustrating how the total variance partitions into model variance (between groups) plus error variance (within groups)]
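A small sketch of that partitioning in Python (hypothetical groups; scipy's f_oneway is compared against the by-hand between/within split):

```python
import numpy as np
from scipy import stats

# Hypothetical data for three groups
g1 = np.array([4.1, 5.0, 4.7, 5.3, 4.9])
g2 = np.array([5.8, 6.1, 5.5, 6.4, 6.0])
g3 = np.array([4.9, 5.2, 5.6, 5.1, 5.4])

F, p = stats.f_oneway(g1, g2, g3)

# The same F by partitioning the variance by hand
groups = [g1, g2, g3]
all_data = np.concatenate(groups)
grand_mean = all_data.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # model
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)              # error
df_between, df_within = len(groups) - 1, len(all_data) - len(groups)
F_manual = (ss_between / df_between) / (ss_within / df_within)

print(F, F_manual, p)
```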

T vs F tests
  • F tests: test for any differences between multiple groups, and for interactions
  • Where the differences lie then has to be determined post hoc
  • SPM t tests are one-tailed (con images)
  • SPM F tests are two-tailed (ess images)
Conclusions
  • T tests describe how unlikely it is that experimental differences are due to chance
  • The higher the t score, the smaller the p value, and the less likely the result is to be due to chance
  • Can compare sample with population or 2 samples, paired or unpaired
  • ANOVA/F tests are similar but use variances instead of means and can be applied to more than 2 groups and other more complex scenarios
Acknowledgements
  • MfD slides 2004-2006
  • Van Belle, Biostatistics
  • Human Brain Function
  • Wikipedia
Topics Covered:
  • Is there a relationship between x and y?
  • What is the strength of this relationship?
    • Pearson’s r
  • Can we describe this relationship and use it to predict y from x?
    • Regression
  • Is the relationship we have described statistically significant?
    • F- and t-tests
  • Relevance to SPM
    • GLM
Relationship between x and y
  • Correlation describes the strength and direction of a linear relationship between two variables
  • Regression tells you how well a certain independent variable predicts a dependent variable
  • CORRELATION ≠ CAUSATION
    • In order to infer causality: manipulate independent variable and observe effect on dependent variable
Scattergrams

[Figure: three scatterplots of Y against X illustrating positive correlation, negative correlation, and no correlation]

Variance vs. Covariance
  • Do two variables change together?
  • Variance: how much x varies about its own mean, ~ Σ(x_i − x̄)(x_i − x̄)
  • Covariance: how much x and y vary together, ~ Σ(x_i − x̄)(y_i − ȳ)
Covariance
  • When X and Y increase together: cov(x, y) is positive
  • When X increases as Y decreases: cov(x, y) is negative
  • When there is no consistent relationship: cov(x, y) = 0

Example Covariance

x_i    y_i    x_i − x̄    y_i − ȳ    (x_i − x̄)(y_i − ȳ)
0      3      −3          0           0
2      2      −1         −1           1
3      4       0          1           0
4      0       1         −3          −3
6      6       3          3           9

x̄ = 3, ȳ = 3, Σ(x_i − x̄)(y_i − ȳ) = 7

What does this number tell us?
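The same numbers can be checked in Python (np.cov uses the sample convention of dividing by n − 1, which is an assumption of this sketch rather than something stated on the slide):

```python
import numpy as np

x = np.array([0, 2, 3, 4, 6])
y = np.array([3, 2, 4, 0, 6])

# Sum of products of deviations, as in the table above
sum_products = ((x - x.mean()) * (y - y.mean())).sum()   # 7.0

# Sample covariance (divides by n - 1)
cov_manual = sum_products / (len(x) - 1)                 # 1.75
cov_numpy = np.cov(x, y)[0, 1]                           # also 1.75

print(sum_products, cov_manual, cov_numpy)
```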

Pearson’s R
  • Covariance does not really tell us anything
    • Solution: standardise this measure
  • Pearson’s R: standardise by adding std to equation:
Basic assumptions
  • Normal distributions
  • Variances are constant and not zero
  • Independent sampling – no autocorrelations
  • No errors in the values of the independent variable
  • All causation in the model is one-way (not necessary mathematically, but essential for prediction)
Limitations of r
  • The r we calculate is actually r̂, an estimate of the true correlation:
    • r = the true r of the whole population
    • r̂ = the estimate of r based on our sample data
  • r is very sensitive to extreme values
In the real world…
  • r is never 1 or –1
  • Interpretation guidelines for correlations in psychological research (Cohen):

Correlation    Negative            Positive
Small          −0.29 to −0.10      0.10 to 0.29
Medium         −0.49 to −0.30      0.30 to 0.49
Large          −1.00 to −0.50      0.50 to 1.00

Regression
  • Correlation tells you if there is an association between x and y but it doesn’t describe the relationship or allow you to predict one variable from the other.
  • To do this we need REGRESSION!
Best-fit Line
  • The aim of linear regression is to fit a straight line, ŷ = ax + b (a = slope, b = intercept), to data that gives the best prediction of y for any value of x
  • This will be the line that minimises the distance between the data and the fitted line, i.e. the residuals: ε = y − ŷ, the difference between the true value y and the predicted value ŷ

[Figure: scatterplot with the fitted line ŷ = ax + b, marking the slope, the intercept, and a residual ε between a true value y and its predicted value ŷ]
Least Squares Regression
  • To find the best line we must minimise the sum of the squares of the residuals (the vertical distances from the data points to our line)
  • Model line: ŷ = ax + b (a = slope, b = intercept)
  • Residual (ε) = y − ŷ
  • Sum of squares of residuals = Σ(y − ŷ)²
  • So we must find the values of a and b that minimise Σ(y − ŷ)²

Finding b
  • First we find the value of b that gives the minimum sum of squares
  • Trying different values of b is equivalent to shifting the line up and down the scatter plot

[Figure: the fitted line shifted up and down as b varies, with the residuals ε changing accordingly]
Finding a
  • Now we find the value of a that gives the minimum sum of squares
  • Trying out different values of a is equivalent to changing the slope of the line, while b stays constant

[Figure: the fitted line pivoting as the slope a varies, with the intercept b held constant]
Minimising sums of squares
  • Need to minimise Σ(y − ŷ)²
  • ŷ = ax + b, so we need to minimise Σ(y − ax − b)²
  • If we plot the sums of squares for all different values of a and b we get a parabola, because it is a squared term
  • So the minimum sum of squares is at the bottom of the curve, where the gradient is zero

[Figure: plot of the sum of squares S against values of a and b, with the minimum (min S) at the point where the gradient = 0]
The maths bit
  • So we can find a and b that give min sum of squares by taking partial derivatives of Σ(y - ax - b)2 with respect to a and b separately
  • Then we set each derivative to zero and solve, giving the values of a and b that minimise the sum of squares
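For reference, a sketch of that derivation in standard notation (writing x̄, ȳ for the sample means; this is the textbook least-squares result rather than anything taken verbatim from the slides):

```latex
\begin{aligned}
S(a,b) &= \textstyle\sum_i (y_i - a x_i - b)^2 \\
\frac{\partial S}{\partial b} &= -2 \textstyle\sum_i (y_i - a x_i - b) = 0
  \;\Rightarrow\; b = \bar{y} - a\bar{x} \\
\frac{\partial S}{\partial a} &= -2 \textstyle\sum_i x_i (y_i - a x_i - b) = 0
  \;\Rightarrow\; a = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
  = \frac{r\, s_y}{s_x}
\end{aligned}
```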
The solution
  • Doing this gives the following equations for a and b:

a = r · s_y / s_x

where r = correlation coefficient of x and y, s_y = standard deviation of y, s_x = standard deviation of x

  • You can see that:
    • A low correlation coefficient gives a flatter slope (small value of a)
    • A large spread of y, i.e. a high standard deviation, results in a steeper slope (high value of a)
    • A large spread of x, i.e. a high standard deviation, results in a flatter slope (small value of a)
The solution cont.
  • Our model equation is ŷ = ax + b
  • This line must pass through the mean, so ȳ = a·x̄ + b, i.e. b = ȳ − a·x̄
  • We can substitute our equation for a into this, giving:

b = ȳ − (r · s_y / s_x) · x̄

where r = correlation coefficient of x and y, s_y = standard deviation of y, s_x = standard deviation of x

  • The smaller the correlation, the closer the intercept is to the mean of y
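A short sketch verifying these two formulas against numpy's own least-squares fit, reusing the example data from the covariance slide:

```python
import numpy as np

x = np.array([0.0, 2.0, 3.0, 4.0, 6.0])
y = np.array([3.0, 2.0, 4.0, 0.0, 6.0])

r = np.corrcoef(x, y)[0, 1]
a = r * y.std(ddof=1) / x.std(ddof=1)   # slope: a = r * s_y / s_x
b = y.mean() - a * x.mean()             # intercept: b = y_bar - a * x_bar

# Same answer from numpy's least-squares polynomial fit
a_np, b_np = np.polyfit(x, y, deg=1)

print(a, b, a_np, b_np)
```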
Back to the model
  • We can calculate the regression line for any data, but the important question is:

How well does this line fit the data, or how good is it at predicting y from x?

How good is our model?
  • Total variance of y:

s_y² = Σ(y − ȳ)² / (n − 1) = SS_y / df_y

  • Variance of the predicted y values (ŷ):

s_ŷ² = Σ(ŷ − ȳ)² / (n − 1) = SS_pred / df_ŷ

This is the variance explained by our regression model

  • Error variance:

s_er² = Σ(y − ŷ)² / (n − 2) = SS_er / df_er

This is the variance of the error between our predicted y values and the actual y values, and thus is the variance in y that is NOT explained by the regression model

How good is our model cont.
  • Total variance = predicted variance + error variance:

s_y² = s_ŷ² + s_er²

  • Conveniently, via some complicated rearranging:

s_ŷ² = r² · s_y², and so r² = s_ŷ² / s_y²

  • So r² is the proportion of the variance in y that is explained by our regression model
How good is our model cont.
  • Insert r²·s_y² into s_y² = s_ŷ² + s_er² and rearrange to get:

s_er² = s_y² − r²·s_y² = s_y² (1 − r²)

  • From this we can see that the greater the correlation, the smaller the error variance, so the better our prediction
Is the model significant?
  • i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?
  • F-statistic:

F(df_ŷ, df_er) = s_ŷ² / s_er² = … (complicated rearranging) … = r² (n − 2) / (1 − r²)

  • And it follows that (because F = t²):

t(n − 2) = r √(n − 2) / √(1 − r²)

So all we need to know are r and n!

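A small sketch of this in Python, reusing the example data from earlier (scipy.stats.linregress is used only as a cross-check; it is not the slides' method):

```python
import numpy as np
from scipy import stats

x = np.array([0.0, 2.0, 3.0, 4.0, 6.0])
y = np.array([3.0, 2.0, 4.0, 0.0, 6.0])
n = len(x)

r = np.corrcoef(x, y)[0, 1]

# t and F for the regression, computed from r and n alone
t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
F = r ** 2 * (n - 2) / (1 - r ** 2)          # equals t**2

# Two-tailed p value from the t distribution with n - 2 degrees of freedom
p = 2 * stats.t.sf(abs(t), df=n - 2)

# Cross-check against scipy's built-in simple regression
result = stats.linregress(x, y)
print(t, F, p, result.pvalue)
```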
General Linear Model
  • Linear regression is actually a form of the General Linear Model where the parameters are a, the slope of the line, and b, the intercept.

y = ax + b + ε

  • A General Linear Model is just any model that describes the data in terms of a straight line
Multiple regression
  • Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y
  • The different x variables are combined in a linear way and each has its own regression coefficient:

y = a1x1 + a2x2 + … + anxn + b + ε

  • The a parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y.
  • i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
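A minimal multiple-regression sketch with made-up data, estimating the a parameters and b by least squares:

```python
import numpy as np

# Hypothetical data: two predictors x1, x2 and one dependent variable y
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 * x1 - 0.8 * x2 + 2.0 + rng.normal(scale=0.5, size=n)

# Design matrix with a column of ones for the intercept b
X = np.column_stack([x1, x2, np.ones(n)])

# Least-squares estimates of a1, a2 and b
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
a1, a2, b = coef
print(a1, a2, b)
```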
SPM
  • Linear regression is a GLM that models the effect of one independent variable, x, on ONE dependent variable, y
  • Multiple Regression models the effect of several independent variables, x1,x2 etc, on ONE dependent variable, y
  • Both are types of General Linear Model
  • GLM can also allow you to analyse the effects of several independent x variables on several dependent variables, y1, y2, y3 etc., in a linear combination
  • This is what SPM does and will be explained soon…