Statistics for Linguistics Students

Statistics for Linguistics Students Michaelmas 2004 Week 6 Bettina Braun www.phon.ox.ac.uk/~bettina/teaching.html

Overview • Recap • X2-test for frequency data • Introduction to Analysis of variance (ANOVA) • One-factor between-subjects design • Two-factor between-subjects design

What do we report from the results table? Differences in the mean: There is a significant difference (t = 2.94, df = 15, p = 0.01)

What are frequency data? • Frequency count • Number of subjects/events in a given category (e.g. number of high and low accents) • What about the number of correct responses of different subjects? This is not frequency data! • X2-test for: • Test of deviation from expected frequencies: tests whether the observed frequencies deviate from the expected frequencies (e.g. when rolling a die, there is an a priori chance of 16.67% for each number) • Test of association: tests for a relationship between two or more independent variables (e.g. is there a relation between gender and the use of high or low accents?)

X2-test for deviation from expected frequencies • Null-hypothesis: there is no difference between expected and observed frequencies • Example data • Calculation: X2 = Σ (fo – fe)² / fe = 5.8 (the totals of the observed and the expected frequencies have to be identical)
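The calculation can be sketched in a few lines of Python. The die counts below are hypothetical (they are not the example data from the slide), and the function name `chi_square` is only illustrative.

```python
def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    # Totals of observed and expected frequencies have to be identical.
    assert abs(sum(observed) - sum(expected)) < 1e-9
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Hypothetical data: 60 rolls of a die, a priori 1/6 of the rolls per face.
observed = [5, 8, 9, 8, 10, 20]
expected = [sum(observed) / 6] * 6   # 10 per face

chi2 = chi_square(observed, expected)
df = len(observed) - 1               # one independent variable: df = a - 1
print(f"X2 = {chi2:.1f}, df = {df}")
```

For these made-up counts X2 is 13.4 with df = 5. The tabled critical value for df = 5 at p = 0.05 is 11.07, so the deviation from the expected frequencies would count as significant.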

Looking up the p-value • The calculated value for X2 must be larger than the critical value found in the table to be significant • Degrees of freedom: • If there is one independent variable: df = a – 1 • If there are two independent variables: df = (a – 1)(b – 1) (a, b: number of levels of each variable)

Further Notes • The X2 test for the deviation from expected frequencies can be used for one independent variable only • If the independent variable has only two levels (e.g. high vs. low accent), a correction for continuity has to be used: X2 = Σ (|fo – fe| – 0.5)² / fe
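A minimal sketch of the correction for continuity, assuming the slide refers to the usual (Yates) form in which 0.5 is subtracted from each absolute difference before squaring. The accent counts are hypothetical.

```python
def chi_square_yates(observed, expected):
    """X2 with the continuity correction: sum of (|fo - fe| - 0.5)^2 / fe."""
    return sum((abs(fo - fe) - 0.5) ** 2 / fe
               for fo, fe in zip(observed, expected))

# Hypothetical counts: 30 high accents and 20 low accents, expected 25 each.
observed = [30, 20]
expected = [25, 25]

uncorrected = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
corrected = chi_square_yates(observed, expected)
print(f"uncorrected X2 = {uncorrected:.2f}, corrected X2 = {corrected:.2f}")
```

The corrected value (1.62) is smaller than the uncorrected one (2.00), so the correction makes the test more conservative.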

X2 as test of association • Calculation of expected frequencies: expected cell frequency = (row total × column total) / grand total

Useful checks • The sum of the expected frequencies must equal the sum of the observed frequencies: Σfo = Σfe = N • The sum of the observed frequencies minus the expected frequencies must equal zero: Σ(fo – fe) = 0
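As a sketch, the expected frequencies and both checks can be computed for a small table. The 2 x 2 counts (gender by accent type) are hypothetical and only serve to illustrate the arithmetic.

```python
# Hypothetical 2 x 2 table: rows = gender, columns = accent type (high, low).
observed = [[30, 20],
            [10, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected cell frequency = row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Useful checks: the totals agree, and the differences sum to zero.
sum_fe = sum(sum(row) for row in expected)
sum_diff = sum(fo - fe
               for obs_row, exp_row in zip(observed, expected)
               for fo, fe in zip(obs_row, exp_row))
assert abs(sum_fe - grand_total) < 1e-9
assert abs(sum_diff) < 1e-9

chi2 = sum((fo - fe) ** 2 / fe
           for obs_row, exp_row in zip(observed, expected)
           for fo, fe in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, round(chi2, 2), df)
```

Here every expected frequency (20 or 30) is greater than 5 and the total number of observations (100) is greater than 20.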

X2-test • Limitations: • All raw data for X2 must be frequencies (not percentages!) • Each subject or event is counted only once, i.e. contributes to only one cell value (strictly between-subjects) • The total number of observations should be greater than 20 • The expected frequency in any cell should be greater than 5

An Example • You want to test how well non-Chinese speaking students can learn Chinese characters using different kinds of mnemonic. There are three groups of subjects, one with no mnemonic, one with mnemonic 1 and one with mnemonic 2. You count how many characters were correctly recalled. • What are the independent and dependent variables? IV: kind of mnemonic, DV: recall • How many levels does the IV have? 3 • What is the type of the dependent variable? Interval

ANOVA: general • Analysis of variance • Test the null-hypothesis that all the samples are taken from the same population • compares the variances within the samples (random error) to the variance between the samples (systematic error) • If the variances between the samples are larger than the variances within the samples, we can reject the null-hypothesis

ANOVA: limitations • All samples must be selected randomly • The scores must be interval • The scores in the samples must be normally distributed • The variances of the samples must be homogeneous • There needs to be an equal number of scores in each sample

ANOVA: general • Conventions • Independent variables are called “factors” • ANOVA calculates an F-statistic that determines whether the null-hypothesis can be rejected or not • In SPSS, you find the ANOVAs in Analyze => General Linear Model • “univariate”: analysis of one DV (between) • “multivariate”: for more than one DV (between) • “repeated measures”: within-subjects designs

F-statistic • The F-statistic is the ratio of the between-group variance to the within-group variance. It has to be larger than a critical value in a table • The p-value of the F-statistic depends on two df-values: F(dfn, dfd) = value • Df of the numerator: dfn = k – 1 • Df of the denominator: dfd = N – k (N: total number of scores, k: number of groups)
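The ratio can be computed by hand for a small data set. This is a sketch with hypothetical scores. The helper name `one_way_f` is made up for illustration, and the two variances are estimated in the usual way as sums of squares divided by their df.

```python
def one_way_f(groups):
    """Return (F, dfn, dfd) for a one-factor between-subjects design."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # N: total number of scores
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    dfn, dfd = k - 1, n_total - k
    f_value = (ss_between / dfn) / (ss_within / dfd)
    return f_value, dfn, dfd

# Hypothetical recall scores for three groups of three subjects each.
groups = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]
f_value, dfn, dfd = one_way_f(groups)
print(f"F({dfn},{dfd}) = {f_value:.1f}")
```

For these scores the between-group variance is 12 and the within-group variance is 1, giving F(2,6) = 12.0.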

Reporting the F-value • As the p-value of the F-statistic depends on two df-values, you have to report them • Suppose we have 3 groups (3 levels of an independent variable) and 12 scores in total (4 per group); then dfn = 3 – 1 = 2 and dfd = 12 – 3 = 9, and we report the F-statistic as follows (similarly to the t-value, the df, and the p-value for t-tests!): F(2,9) = 2.9, p = ???

Critical values for the F-statistic …

One factor between-subjects ANOVA • If the independent variable has two levels, the results are comparable to an independent t-test (F = t²) • If we have more than two levels, we could in principle run multiple independent t-tests • BUT: This increases our risk of a Type I error • With one test we can be 95% sure our conclusion is correct • With two tests, this percentage drops to 0.95 * 0.95 ≈ 0.90 (we can only be 90% sure of our conclusion) • With even more tests …
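The drop in confidence can be tabulated for more groups. This sketch assumes, as the multiplication above does, that the tests are independent and each uses a significance level of 0.05. The function name is illustrative.

```python
from math import comb

def familywise_confidence(n_tests, alpha=0.05):
    """Probability of making no Type I error in n_tests independent tests."""
    return (1 - alpha) ** n_tests

for n_groups in (2, 3, 4, 5):
    n_tests = comb(n_groups, 2)   # number of pairwise comparisons
    sure = familywise_confidence(n_tests)
    print(f"{n_groups} groups: {n_tests} t-tests, {sure:.0%} sure, "
          f"Type I risk {1 - sure:.0%}")
```

With three groups there are already three pairwise t-tests, and with five groups there are ten, so the risk grows quickly with the number of levels.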

One factor between-subjects ANOVA • A one-factor ANOVA corrects for this increased risk of a Type I error • There are fixed factors and random factors: • If you choose the IV to be a fixed factor, the model is calculated for just the levels of the independent variable you have (e.g. gender, accentedness) • If you choose the IV to be a random factor, you want to generalise from the levels of your independent variable to other levels (e.g. the IV contains three different degrees of blood alcohol, but you want to generalise the effect on e.g. speech control to other levels)

SPSS output • There is a significant effect of the kind of mnemonic on the number of characters recalled: F(2,27) = 17.7, p < 0.001 • But between which of the groups?

Post-hoc tests • If the IV has more than 2 levels, we have to do post-hoc tests to find out which of the groups are significantly different • Scheffé test: • Suitable for pair-wise comparison between all groups • Corrects for the increased risk of a Type I error (most conservative post-hoc test) • Dunnett test: • Useful for “planned comparisons”, e.g. comparing two different groups against a control group • Less stringent than Scheffé

Post-hoc tests with SPSS

SPSS output for multiple comparisons (here Scheffé test) • Significant differences between “no mnemonic” and the other two groups.

SPSS output for homogeneous subsets (Scheffé test) • There are two subsets

Two factor between-subject ANOVA • In an ANOVA, you can also investigate the effect of more than one independent variable • This is called a factorial design • Example: You would like to investigate how two different speech rates affect the duration of words in sentence-initial, -medial, and -final position. • What are the IVs and the DV? IVs: speech rate and position, DV: word duration • How many levels do the IVs have? 2 and 3 • What is the type of the DV? Interval

Factorial Design: Example • Every level of each factor is combined with every level of the other factor • Factorial designs have to be completely randomised, i.e. every group contributes data to only one cell

Factorial Design: Main effects and interaction • For every variable we can find significant main effects. Would you expect main effects here? • We would expect to find a main effect of speech rate on duration (i.e. higher speech rate => shorter durations) • Also, there might be a main effect of position (final segments undergo phrase-final lengthening, initial and medial ones don’t) • An interaction would indicate that the effect of one IV is different in the conditions of another IV

Factorial Design: Hypothesised results • Does this graph show an interaction? [Figure: duration (y-axis) by position (initial, medial, final; x-axis), with one line for slow speech and one for fast speech] • Non-parallel lines always show interactions!!!
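What "non-parallel" means can be sketched numerically: the lines are parallel only if the effect of speech rate is the same at every position. The mean durations below are hypothetical and are not read off the graph.

```python
# Hypothetical mean word durations in ms (not taken from the slides).
positions = ["initial", "medial", "final"]
slow = [300, 300, 400]
fast = [250, 250, 300]

# Effect of speech rate at each position: slow minus fast.
rate_effect = [s - f for s, f in zip(slow, fast)]

# Parallel lines mean the same rate effect at every position.
lines_parallel = len(set(rate_effect)) == 1

for pos, diff in zip(positions, rate_effect):
    print(f"{pos}: slow - fast = {diff} ms")
print("parallel lines:", lines_parallel)
```

In this made-up pattern the rate effect is twice as large in final position as elsewhere, so the lines are not parallel. Whether such a difference is significant is decided by the F-test for the interaction.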

Factorial Design: Degrees of freedom • Degrees of freedom are different for stating main effects and for interactions • Numerator (the first value in round brackets): • For main effects: df = k – 1 (number of levels of the IV – 1) • For interactions: df = (k – 1)*(j – 1) (k, j: number of levels of the two IVs) • Denominator (second value in round brackets): • For both: df = N – j*k (N: total number of scores) • Note: df for denominator is always found in the row labelled “error”
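These rules can be collected in a small helper. This is a sketch for the speech rate by position example. The function name and the total of 60 scores are hypothetical.

```python
def factorial_df(levels_a, levels_b, n_total):
    """Degrees of freedom for a two-factor between-subjects ANOVA."""
    return {
        "main effect A": levels_a - 1,
        "main effect B": levels_b - 1,
        "interaction": (levels_a - 1) * (levels_b - 1),
        "error": n_total - levels_a * levels_b,
    }

# 2 speech rates x 3 positions; the total of 60 scores is hypothetical.
df = factorial_df(levels_a=2, levels_b=3, n_total=60)
print(df)
```

With these numbers the main effect of speech rate would be reported as F(1,54), and both the main effect of position and the interaction as F(2,54).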

Output for factorial design: Please interpret • Significant main effect of task: F(2,126) = 132.9, p < 0.001 • Significant interaction: F(4,126) = 6.3, p < 0.001 From http://www.uvm.edu/~dhowell/gradstat/psych341/lectures/Factorial1Folder/class4.html

Why are inhomogeneous variances a problem? • Assume that the means and the variances are correlated (i.e. a higher mean in one sample co-occurs with a higher variance in that sample – possibly caused by outliers and extreme values) • Then the mean is very unreliable • Since the ANOVA compares the variances and the means, you might get a significant difference which is not actually in the data!
