MA in English Linguistics Experimental design and statistics II

Sean Wallis • Survey of English Usage, University College London • s.wallis@ucl.ac.uk

Outline • Plotting data with Excel™ • The idea of a confidence interval • Binomial → Normal → Wilson • Interval types • 1 observation • The difference between 2 observations • From intervals to significance tests

Plotting graphs with Excel™ • Microsoft Excel is a very useful tool for • collecting data together in one place • performing calculations • plotting graphs • Key concepts of spreadsheet programs: • worksheet - a page of cells (rows x columns) • you can use a part of a page for any table • cell - a single item of data, a number or text string • referred to by a letter (column), number (row), e.g. A15 • each cell can contain: • a string: e.g. ‘Speakers • a number: 0, 23, -15.2, 3.14159265 • a formula: =A15, =$A15+23, =SQRT($A$15), =SUM(A15:C15)

Plotting graphs with Excel™ • Importing data into Excel: • Manually, by typing • Exporting data from ICECUP • Manipulating data in Excel to make it useful: • Copy, paste: columns, rows, portions of tables • Creating and copying functions • Formatting cells • Creating and editing graphs: • Several different types (bar chart, line chart, scatter, etc.) • Can plot confidence intervals as well as points • You can download a useful spreadsheet for performing statistical tests: • www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls

Recap: the idea of probability • A way of expressing chance: 0 = cannot happen, 1 = must happen • Used in (at least) three ways last week: • P = true probability (rate) in the population • p = observed probability in the sample • α = probability of p being different from P • sometimes called probability of error, pe • found in confidence intervals and significance tests

The idea of a confidence interval • All observations are imprecise • Randomness is a fact of life • Our abilities are finite: • to measure accurately or • reliably classify into types • We need to express caution in citing numbers • Example (from Levin 2013): • “77.27% of uses of think in 1920s data have a literal (‘cogitate’) meaning.” Really? Not 77.28, or 77.26? • “77% of uses of think in 1920s data have a literal (‘cogitate’) meaning.” Sounds defensible. But how confident can we be in this number? • “77% (66-86%*) of uses of think in 1920s data have a literal (‘cogitate’) meaning.” Finally we have a credible range of values. It needs a footnote* to explain how it was calculated.

Binomial → Normal → Wilson • Binomial distribution • Expected pattern of observations found when repeating an experiment for a given P (here, P = 0.5) • Based on combinatorial mathematics • Other values of P have different expected distribution patterns • Binomial → Normal • Simplifies the Binomial distribution (tricky to calculate) to two variables: • mean P • P is the most likely value • standard deviation S • S is a measure of spread • Normal → Wilson • The Normal distribution predicts observations p given a population value P • We want to do the opposite: predict the true population value P from an observation p • We need a different interval, the Wilson score interval • [Figure: frequency F against observed p (axis 0.1 to 0.9), showing the distribution about P = 0.5, with further curves labelled 0.3, 0.1 and 0.05]
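The step from Binomial to Normal can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and the sample size n = 10 is an assumption chosen for readability, not a value from the slides.

```python
from math import comb, sqrt


def binomial_pmf(k, n, P):
    """Probability of observing exactly k successes in n trials at true rate P."""
    return comb(n, k) * P**k * (1 - P) ** (n - k)


def binomial_distribution(n, P):
    """Expected distribution: one probability for each possible count 0..n."""
    return [binomial_pmf(k, n, P) for k in range(n + 1)]


def normal_parameters(n, P):
    """Mean and standard deviation of the Normal approximation, on the p scale."""
    return P, sqrt(P * (1 - P) / n)


if __name__ == "__main__":
    n = 10  # hypothetical sample size, not taken from the slides
    for P in (0.5, 0.3, 0.1, 0.05):
        dist = binomial_distribution(n, P)
        mean, S = normal_parameters(n, P)
        print(f"P = {P}: mean = {mean}, S = {S:.3f}")
        for k, prob in enumerate(dist):
            # Crude text histogram of the expected pattern of observations
            print(f"  p = {k / n:.1f}  {'#' * round(prob * 50)}")
```

Printing the histograms for several values of P shows how the shape shifts and becomes skewed as P moves away from 0.5, which is one reason the Normal approximation is weaker near 0 and 1.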

Binomial → Normal • Any Normal distribution can be defined by only two variables and the Normal function z • population mean P • standard deviation $S = \sqrt{P(1 - P)/n}$ • With more data in the experiment, S will be smaller • 95% of the curve is within ~2 standard deviations of the expected mean • the correct figure is 1.95996! • this is the critical value of z for an error level of 0.05 • The ‘tail areas’ • For a 95% interval, the two tails total 5% (2.5% each) • [Figure: Normal curve, F against p, centred on P, with 95% of the area within P ± z·S and 2.5% in each tail]
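As a rough check on the figure of 1.95996, the critical value of z and the Normal interval about P can be computed with Python's standard library. The function names and the example values (P = 0.5, n = 100 and n = 400) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def critical_z(alpha=0.05):
    """Two-tailed critical value of z: alpha/2 of the area lies in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def normal_interval(P, n, alpha=0.05):
    """Range about P within which observations p are expected to fall."""
    S = sqrt(P * (1 - P) / n)  # standard deviation on the probability scale
    z = critical_z(alpha)
    return P - z * S, P + z * S


if __name__ == "__main__":
    print(f"z for an error level of 0.05: {critical_z():.5f}")
    for n in (100, 400):  # hypothetical sample sizes
        lower, upper = normal_interval(0.5, n)
        print(f"n = {n}: ({lower:.3f}, {upper:.3f})")
```

Quadrupling n halves S, so the interval for n = 400 is half as wide as the one for n = 100.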

The single-sample z test... • Is an observation p more than z standard deviations from the expected (population) mean P? • If yes, p is significantly different from P • [Figure: Normal curve about P with 2.5% tails beyond P ± z·S and an observation p marked]
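The test can be written as a short function. This is a sketch under stated assumptions: the function name is hypothetical, the critical value is the two-tailed 0.05 figure quoted earlier, and the example numbers are invented for illustration.

```python
from math import sqrt

Z_CRIT = 1.959964  # two-tailed critical value of z for an error level of 0.05


def single_sample_z_test(p, P, n, z_crit=Z_CRIT):
    """Return (z, significant) for an observation p against a population mean P."""
    S = sqrt(P * (1 - P) / n)  # standard deviation of the Normal about P
    z = (p - P) / S            # distance of p from P in standard deviations
    return z, abs(z) > z_crit


if __name__ == "__main__":
    # Hypothetical data: the same observed p = 0.7 against P = 0.5, two sample sizes
    for n in (20, 50):
        z, significant = single_sample_z_test(0.7, 0.5, n)
        print(f"n = {n}: z = {z:.3f}, significant = {significant}")
```

The same observed proportion can be non-significant with a small sample and significant with a larger one, because S shrinks as n grows.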

...gives us a “confidence interval” • The interval about p is called the Wilson score interval (w–, w+) • This interval reflects the Normal interval about P: • If P is at the upper limit of p, p is at the lower limit of P (Wallis, 2013) • [Figure: observation p with its Wilson interval (w–, w+), and a Normal curve about P whose lower 2.5% tail begins at p]

...gives us a “confidence interval” • The Wilson score interval (w–, w+) has a difficult formula to remember: • $p' = \dfrac{p + z^2/2n}{1 + z^2/n}$ • $s' = \dfrac{z\sqrt{p(1 - p)/n + z^2/4n^2}}{1 + z^2/n}$ • $(w^-, w^+) = (p' - s',\; p' + s')$ • You do not need to know this formula! • You can use the 2x2 spreadsheet! • www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls • [Figure: observation p with its Wilson interval (w–, w+), and P at the upper bound with a 2.5% tail]
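For readers who prefer code to a spreadsheet, the formula can be sketched as a short Python function. The function name is hypothetical, and the sample size n = 66 in the example is an assumption: the slides do not state n for the think data, and 51 out of 66 is simply one count that yields 77.27%.

```python
from math import sqrt


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval (w-, w+) for an observed proportion p out of n cases."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom                            # p'
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom  # s'
    return centre - spread, centre + spread


if __name__ == "__main__":
    # ASSUMPTION: n = 66 is hypothetical; only the percentage appears in the slides
    lower, upper = wilson_interval(51 / 66, 66)
    print(f"p = {51 / 66:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

Under that assumed n the interval rounds to roughly 66% to 86%. Unlike the Normal interval, it is not symmetric about p and cannot go below 0 or above 1.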

An example: uses of think • Magnus Levin (2013) examined uses of think in the TIME corpus in three time periods • This is the graph we created in Excel • Not an alternation study • Categories are not “choices” • The graph plots the probability of reading different uses of the word think (given the writer used the word) • http://corplingstats.wordpress.com/2012/04/03/plotting-confidence-intervals/

An example: uses of think • The graph has Wilson score intervals for each point • It is easy to spot where intervals overlap • A quick test for significant difference • No overlap = significant • Overlaps point = ns • Otherwise test fully • http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/

A quick test for significant difference • No overlap = significant • Overlaps point (one interval contains the other observation) = not significant (ns) • Otherwise test fully • [Figure: two observations with their Wilson intervals: p1 (observed probability) with upper bound w1+ and lower bound w1–, and p2 with bounds w2+ and w2–] • http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/
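The three rules can be sketched as a function that returns one of three outcomes. The function names are hypothetical, and every sample size in the example is invented, since the slides give only percentages.

```python
from math import sqrt


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval (w-, w+) for an observed proportion p out of n cases."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread


def quick_overlap_test(p1, n1, p2, n2):
    """Apply the three overlap rules to two observed proportions."""
    lo1, hi1 = wilson_interval(p1, n1)
    lo2, hi2 = wilson_interval(p2, n2)
    if hi1 < lo2 or hi2 < lo1:
        return "significant"   # the intervals do not overlap at all
    if lo1 <= p2 <= hi1 or lo2 <= p1 <= hi2:
        return "ns"            # one interval contains the other point
    return "test fully"        # partial overlap: a more precise test is needed


if __name__ == "__main__":
    # Hypothetical sample sizes, for illustration only
    print(quick_overlap_test(0.77, 500, 0.59, 500))
    print(quick_overlap_test(0.77, 66, 0.75, 66))
    print(quick_overlap_test(0.77, 66, 0.59, 66))
```

The third outcome is the one that sends you to Newcombe's test or the 2 x 2 chi-square test.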

Test 1: Newcombe’s test • This test is used when data is drawn from different populations (different years, groups, text categories) • We calculate a new Newcombe-Wilson interval (W–, W+): • $W^- = -\sqrt{(p_1 - w_1^-)^2 + (w_2^+ - p_2)^2}$ • $W^+ = \sqrt{(w_1^+ - p_1)^2 + (p_2 - w_2^-)^2}$ • We then compare W– < (p2 – p1) < W+: if the difference lies outside this range, it is significant • (p2 – p1) < 0 = fall • We only need to check the inner interval (Newcombe, 1998) • [Figure: two observations p1 and p2 with their Wilson intervals (w1–, w1+) and (w2–, w2+)] • http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/
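A minimal sketch of the Newcombe-Wilson comparison follows, using the formulas for W– and W+ given on this slide. The function names are hypothetical, and the sample sizes (66 per period) are assumptions, as the slides report only the proportions 77% and 59%.

```python
from math import sqrt


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval (w-, w+) for an observed proportion p out of n cases."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread


def newcombe_wilson_test(p1, n1, p2, n2):
    """Return (d, W-, W+, significant) for the difference d = p2 - p1."""
    lo1, hi1 = wilson_interval(p1, n1)
    lo2, hi2 = wilson_interval(p2, n2)
    w_minus = -sqrt((p1 - lo1) ** 2 + (hi2 - p2) ** 2)  # bound that matters for a fall
    w_plus = sqrt((hi1 - p1) ** 2 + (p2 - lo2) ** 2)    # bound that matters for a rise
    d = p2 - p1
    significant = not (w_minus < d < w_plus)
    return d, w_minus, w_plus, significant


if __name__ == "__main__":
    # ASSUMPTION: n = 66 in each period is hypothetical
    d, w_minus, w_plus, significant = newcombe_wilson_test(0.77, 66, 0.59, 66)
    print(f"d = {d:.3f}, interval = ({w_minus:.3f}, {w_plus:.3f}), significant = {significant}")
```

Because d is negative here (a fall), only W– needs checking, which is the sense in which only the inner interval matters.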

Test 2: 2 x 2 chi-square • This test is used when data is drawn from the same population of speakers (e.g. grammar → grammar) • We put the data into a 2 x 2 table • www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls • The test uses the formula $\chi^2 = \sum \dfrac{(o - e)^2}{e}$ • where $e = r \times c / n$ (o = observed cell frequency, e = expected frequency, r and c = row and column totals, n = grand total) (Wallis, 2013) • http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/
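The formula can be sketched directly in Python without a statistics library. The function name and the example table are hypothetical. The critical value 3.841459 for one degree of freedom at the 0.05 error level is the square of z = 1.959964.

```python
CHI2_CRIT_1DF = 3.841459  # critical value: 1 degree of freedom, error level 0.05


def chi_square_2x2(table):
    """Chi-square for a 2 x 2 table of observed frequencies [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency e = r x c / n
            chi2 += (o - e) ** 2 / e
    return chi2


if __name__ == "__main__":
    table = [[10, 20], [20, 10]]  # hypothetical frequencies
    chi2 = chi_square_2x2(table)
    print(f"chi-square = {chi2:.4f}, significant = {chi2 > CHI2_CRIT_1DF}")
```

The sketch applies no continuity correction, so its result may differ slightly from a tool that does.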

Expressing change • Percentage difference is a very common idea: • “X has grown by 50%” or “Y has fallen by 10%” • We can calculate percentage difference by • d% = d / p1, where d = p2 – p1 • We can put Wilson confidence intervals on d% • BUT percentage difference can be very misleading • It depends heavily on the starting point p1 (might be 0) • What does it mean to say • something has increased by 100%? • it has decreased by 100%? • It is better to simply say that • “the rate of ‘cogitate’ uses of think fell from 77% to 59%” • http://corplingstats.wordpress.com/2012/08/14/plotting-confidence-intervals-2/
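A small sketch shows why d% depends so heavily on the starting point. The function name is hypothetical. The first pair of values comes from the slide, and the other two pairs are invented for contrast.

```python
def percentage_difference(p1, p2):
    """Return d% = (p2 - p1) / p1 as a fraction; undefined when p1 is 0."""
    if p1 == 0:
        raise ValueError("percentage difference is undefined when p1 is 0")
    return (p2 - p1) / p1


if __name__ == "__main__":
    # 0.77 -> 0.59 is from the slide; the other two pairs are hypothetical
    for p1, p2 in ((0.77, 0.59), (0.01, 0.02), (0.40, 0.80)):
        print(f"{p1:.2f} -> {p2:.2f}: d = {p2 - p1:+.2f}, d% = {percentage_difference(p1, p2):+.1%}")
```

The two hypothetical pairs both show an increase of 100%, although the absolute change is 0.01 in one case and 0.40 in the other.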

Summary • We analyse results to help us report them • Graphs are extremely useful! • You can include graphs and tables in your essays • If a result is not significant, say so and move on… • Don’t say it is “nearly significant” or “indicative” • An error level of 0.05 (or 95% correct) is OK • Some people use 0.01 (99%) but this is not really better • Wilson confidence intervals tell us • Where the true value is likely to be • Which differences between observations are likely to be significant • If intervals partially overlap, perform a more precise test

Summary • Always say which test you used, e.g. • “We compared ‘cogitate’ uses of think with other uses, between the 1920s and 1960s periods, and this was significant according to χ² at the 0.05 error level.” • Tell your reader that you have plotted (e.g.) “95% Wilson confidence intervals” in a footnote to the graph. • For advice on deciding which test to use, see • http://corplingstats.wordpress.com/2012/04/11/choosing-right-test/ • The tests you will need in one spreadsheet: • www.ucl.ac.uk/english-usage/statspapers/2x2chisq.xls

References • Levin, M. 2013. The progressive in modern American English. In Aarts, B., J. Close, G. Leech and S.A. Wallis (eds). The Verb Phrase in English: Investigating recent language change with corpora. Cambridge: CUP. • Newcombe, R.G. 1998. Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine 17: 873-890. • Wallis, S.A. 2013. z-squared: The origin and application of χ². Journal of Quantitative Linguistics 20: 350-378. • Wilson, E.B. 1927. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association 22: 209-212. • Assorted statistical tests: • www.ucl.ac.uk/english-usage/staff/sean/resources/2x2chisq.xls
