
GRS LX 865 Topics in Linguistics

GRS LX 865 Topics in Linguistics. Week 6: Statistics etc. An update on our sentence processing experiment, with a quick graph of reaction time per region.


Presentation Transcript


  1. GRS LX 865 Topics in Linguistics. Week 6. Statistics etc.

  2. Update on our sentence processing experiment… • Quick graph of reaction time per region

  3. Update • Seems nice; there’s a difference in region 5 (where the NP, I, they, John were) and also in region 6. From slowest to fastest, John, he, NP, I. • Something like what we expected—but wait…

  4. Update • Two further things that this didn’t account for: • Different people read at different speeds. • I is a lot shorter than the photographer. Might it go faster? • To take account of people’s reading speeds, tried average RT per character on the fillers.

  5. Subject RT/c • Average RT per character was pretty much all over the map. • So at least it seemed worth factoring out. • Overhead?
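
The per-character normalization described above can be sketched in Python. The filler sentences, subject labels, and reaction times below are invented for illustration; the idea is just to estimate each subject's reading rate from the fillers and subtract out the expected time for a region of a given length:

```python
from statistics import mean

# Hypothetical filler data: (subject, region text, raw RT in ms).
fillers = [
    ("s1", "the photographer", 620),
    ("s1", "admired", 390),
    ("s2", "the photographer", 870),
    ("s2", "admired", 560),
]

# Average RT per character for each subject, estimated from the fillers.
rt_per_char = {}
for subj in {s for s, _, _ in fillers}:
    rates = [rt / len(text) for s, text, rt in fillers if s == subj]
    rt_per_char[subj] = mean(rates)

def normalized_rt(subj, text, raw_rt):
    """Residual RT: positive means slower than this subject's usual rate."""
    expected = rt_per_char[subj] * len(text)
    return raw_rt - expected

print(normalized_rt("s1", "I", 300))
```

A residual near zero means the subject read the region at their usual speed; a large positive residual is a slowdown even after accounting for region length and the subject's overall pace.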

  6. Items? • It’s also important to look at the items. Were any always incorrect? Those might have been too hard or had something else wrong with them. • (Not clear that we actually care whether the answer was right)
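
The item check described above can be sketched in a few lines of Python. The data here are made up for illustration (item names and the 0/1 correctness flags are hypothetical):

```python
from statistics import mean

# Hypothetical comprehension-question results: item -> 0/1 correct flags,
# one flag per subject who saw that item.
answers = {
    "item01": [1, 1, 0, 1, 1],
    "item02": [0, 0, 0, 0, 0],   # always wrong: a suspect item
    "item03": [1, 0, 1, 1, 0],
}

# Items that nobody ever answered correctly are candidates for exclusion.
suspect = [item for item, flags in answers.items() if mean(flags) == 0]
print(suspect)
```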

  7. End result so far? • So, taking that all into account, I ended up with this… Not what we were going for.

  8. So… • There’s still work to be done. Since I’m not sure exactly what work that is, once again… no lab work to do. • Instead, we’ll talk about statistics generally… • Places to go: • http://davidmlane.com/hyperstat/ • http://www.stat.sc.edu/webstat/

  9. Measuring things • When we go out into the world and measure something like reaction time for reading a word, we’re trying to investigate the underlying phenomenon that gives rise to the reaction time. • When we measure reaction time of reading I vs. they, we are trying to find out whether there is a real, systematic difference between them (such that I is generally faster).

  10. Measuring things • So, suppose for any given person, it takes A ms to read I and B ms to read they. • If our measurement worked perfectly, we’d get A whenever we measure for I and B whenever we measure for they. • But it’s a noisy world.

  11. Measuring things • Measurement never works perfectly. • There is always additional noise of some kind or another. You’re likely to get a value near A when you measure I, but you’re not guaranteed to get A. • Similarly, there are differences between subjects, differences between items, differences of still other sorts…

  12. A common goal • Commonly what we’re after is an answer to the question: are these two things that we’re measuring actually different? • So, we measure for I and for they. Of the measurements we’ve gotten, I seems to be around A, they seems to be around B, and B is a bit longer than A. The question is: given the inherent noise of measurement, how likely is it that we got that different just by chance?
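
The point about noise can be made concrete with a tiny simulation (all numbers here are invented): even when the true mean RT is identical for both words, the sample means will differ somewhat just by chance.

```python
import random
from statistics import mean

random.seed(1)

# Toy model: the "true" RT is A for BOTH words (no real difference),
# but every measurement adds Gaussian noise.
A = 350.0        # hypothetical true RT in ms
NOISE_SD = 60.0  # hypothetical measurement noise

sample_I    = [random.gauss(A, NOISE_SD) for _ in range(20)]
sample_they = [random.gauss(A, NOISE_SD) for _ in range(20)]

# Even with identical true means, the sample means differ by chance:
diff = mean(sample_they) - mean(sample_I)
print(f"observed difference: {diff:.1f} ms")
```

The statistical question is exactly the one on the slide: is an observed difference bigger than what this kind of chance variation would plausibly produce?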

  13. Some stats talk • There are two major uses for statistics: • Describing a set of data in some comprehensible way • Drawing inferences from a sample about a population. • That last one is the useful one for us; by picking some random representative sample of the population, we can estimate characteristics of the whole population by measuring things in our sample.

  14. Normally… • Many things we measure, with their noise taken into account, can be described (at least to a good approximation) by this “bell-shaped” normal distribution. • Often as we do statistics, we implicitly assume that this is the case…

  15. First some descriptive stuff • Central tendency: • What’s the usual value for this thing we’re measuring? • Various ways to do it, most common way is by using the arithmetic mean (“average”). • Average is determined by adding up the measurements and dividing by the number of measurements.
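
With made-up reaction times, the mean is a one-liner in Python:

```python
from statistics import mean

# Hypothetical reaction times (ms) for one region:
rts = [412, 388, 455, 401, 430]
print(mean(rts))  # (412 + 388 + 455 + 401 + 430) / 5 = 417.2
```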

  16. Descriptive stats • Spread • How often is the measurement right around the mean? How far out does it get? • Range (maximum - minimum), kind of basic. • Variance, standard deviation: a more sophisticated measure of the width of the measurement distribution. • You describe a normal distribution in terms of two parameters, mean and standard deviation.
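
The spread measures above, computed on the same kind of made-up data (note the sample vs. population distinction: `statistics.stdev` divides by n−1, `pstdev` by n):

```python
from statistics import pstdev, pvariance

rts = [412, 388, 455, 401, 430]   # hypothetical RTs in ms

spread_range = max(rts) - min(rts)   # 455 - 388 = 67
var = pvariance(rts)                 # mean squared deviation from the mean
sd = pstdev(rts)                     # square root of the variance
print(spread_range, var, sd)
```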

  17. Interesting facts about stdev • About 68% of the observations will be within one standard deviation of the mean. • About 95% of the observations will be within two standard deviations of the mean. • Example: with mean 80 and stdev 5, a score of 75 (one stdev below the mean) falls at about the 15.9th percentile.
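
Both facts, including the percentile example from the slide, can be checked with Python's built-in `statistics.NormalDist`:

```python
from statistics import NormalDist

d = NormalDist(mu=80, sigma=5)

# About 68% of observations fall within one stdev of the mean:
within_1sd = d.cdf(85) - d.cdf(75)
print(round(within_1sd, 3))          # ≈ 0.683

# A score of 75 is one stdev below the mean of 80:
print(round(d.cdf(75) * 100, 1))     # ≈ 15.9th percentile
```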

  18. So, more or less, … • If we knew the actual mean of the variable we’re measuring and the standard deviation, we can be 95% sure that any given measurement we do will land within two standard deviations of that mean—and 68% sure that it will be within one. • Of course, we can’t know the actual mean. But we’d like to.

  19. Confidence intervals • It turns out that you can run this logic in reverse as well, coming up with a confidence interval (I won’t tell you how precisely, but here’s the idea): • Given where you see the measurements coming up, they must be 68% likely to be within one standard deviation of the (unknown) true mean, and 95% likely to be within two, so the more measurements you have, the better a guess you can make. • A 95% CI like 209.9 < µ < 523.4 means “we’re 95% confident that the real mean is in there”.
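
A rough sketch of that reverse logic, using the normal approximation (mean ± 1.96 standard errors; for small samples a t critical value would be more accurate, and all the RTs below are invented):

```python
from math import sqrt
from statistics import mean, stdev

rts = [412, 388, 455, 401, 430, 398, 441, 419, 377, 462]  # hypothetical
n = len(rts)
m = mean(rts)
se = stdev(rts) / sqrt(n)   # standard error of the mean

# Normal-approximation 95% confidence interval for the true mean:
lo, hi = m - 1.96 * se, m + 1.96 * se
print(f"{lo:.1f} < mu < {hi:.1f}")
```

Notice that `se` shrinks as n grows, so more measurements give a narrower interval, which is exactly the "better guess" on the slide.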

  20. Hypothesis testing • Testing to see if the means generating two distributions are actually different. • The idea is to determine how likely it is that we could get the difference we observe by chance. After all, you could roll 25 6s in a row; it’s just very unlikely: (1/6)^25. (Null hypothesis = chance.) • Once you estimate the sample means and standard deviations, this is something you basically look up (t-test, based on the number of observations you make). This is what you see reported as p. • “p < 0.05” means there’s only a 5% chance this happened by accident.
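
In practice you would look the t-test up in a stats package, but the logic can be shown without one: a permutation test asks how often random relabelings of the data produce a difference as big as the one observed. The RT samples below are invented:

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical RTs (ms) for the critical region:
rts_I    = [310, 295, 330, 342, 301, 318, 288, 325]
rts_they = [355, 362, 340, 371, 348, 359, 333, 366]

observed = mean(rts_they) - mean(rts_I)

# Permutation test: under the null hypothesis (no real difference), the
# labels I/they are arbitrary, so shuffling them shouldn't matter.  The
# p-value is the share of shuffles with a difference at least this big.
pooled = rts_I + rts_they
n = len(rts_I)
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    if abs(mean(pooled[n:]) - mean(pooled[:n])) >= abs(observed):
        count += 1
p = count / trials
print(f"p = {p:.4f}")
```

Here the two samples barely overlap, so almost no shuffle reproduces the observed gap and p comes out far below 0.05.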

  21. Significance • Generally, 0.05 is taken to be the level of “significance”—if the difference you measure only has a 5% chance of having arisen by pure accident, then that difference is significant. • There’s no real magic about 0.05, it’s just a convention. Hard to say that 0.055 and 0.045 are seriously qualitatively different.

  22. ANOVA • Analysis of variance—same as the t-test, except for more than two means at once. Still trying to discover if there are differences in the underlying distributions of several means that are unlikely to have arisen just by chance. • I hope to come back to this. Perhaps it can be tacked on to a different lab.
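
A one-way ANOVA boils down to comparing between-group variation to within-group variation. A minimal sketch with invented RTs for the four NP types (a stats package would also give the p-value for F):

```python
from statistics import mean

# Hypothetical RTs (ms) for four NP types in the critical region:
groups = {
    "I":    [301, 315, 298, 322],
    "they": [318, 330, 309, 325],
    "NP":   [340, 355, 333, 348],
    "John": [360, 372, 351, 366],
}

grand = mean(v for g in groups.values() for v in g)
k = len(groups)                            # number of groups
N = sum(len(g) for g in groups.values())   # total observations

# Between-group vs. within-group sums of squares:
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
ss_within  = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

F = (ss_between / (k - 1)) / (ss_within / (N - k))
print(f"F({k - 1}, {N - k}) = {F:.2f}")
```

A large F means the group means differ by more than the within-group noise would lead you to expect by chance.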

  23. Statistical power • In general, the more samples you get, the better off you are—the more statistical power your analysis has. • Power also depends on the variance (lower variance means more power) and on the significance level you’ve chosen. • Technically, statistical power is the probability that you will correctly reject a false null hypothesis.
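
Both dependencies can be illustrated with an approximate power formula for a two-group comparison of means (the effect size and noise level below are invented; this normal approximation ignores the negligible lower tail of the two-sided test):

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()

def power(true_diff, sd, n, alpha_z=1.96):
    """Approximate power of a two-sided test for a difference of means
    between two groups of n observations each."""
    se = sd * sqrt(2 / n)                  # SE of the difference of means
    return 1 - z.cdf(alpha_z - true_diff / se)

# More samples -> more power:
for n in (10, 20, 40):
    print(n, round(power(true_diff=30, sd=60, n=n), 2))

# Lower variance -> more power:
print(round(power(true_diff=30, sd=30, n=10), 2))
```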

  24. Correlation and Chi square • Correlation between two measured variables is often measured in terms of (Pearson’s) r. • If r is close to 1 or -1, the value of one variable can predict the value of the other quite accurately. If r is close to 0, predictive power is low. • The Chi-square test is supposed to help us decide if two conditions/factors are independent of one another or not. (Does knowing one help predict the effect of the other?)
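
Both statistics can be computed from their definitions; the word lengths, RTs, and the 2×2 table of answer counts below are all invented for illustration:

```python
from math import sqrt
from statistics import mean

# Pearson's r, straight from the definition:
def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

word_length = [1, 4, 7, 12, 16]           # hypothetical region lengths
rt          = [300, 340, 390, 455, 510]   # hypothetical RTs (ms)
r = pearson_r(word_length, rt)
print(round(r, 3))   # close to 1: length predicts RT well here

# Chi-square statistic for a 2x2 contingency table of observed counts:
#                 answered right   answered wrong
# condition A          40               10
# condition B          25               25
obs = [[40, 10], [25, 25]]
row_tot = [sum(row) for row in obs]
col_tot = [sum(col) for col in zip(*obs)]
total = sum(row_tot)
expected = [[row_tot[i] * col_tot[j] / total for j in range(2)]
            for i in range(2)]
chi2 = sum((obs[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))
print(round(chi2, 2))
```

A large chi-square value means the observed counts are far from what independence of the two factors would predict.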

  25. Much more to it… • Mainly I just wanted you to see some terminology. I hope to get some workable data from some experiment or lab we do that we can put into a stats program, perhaps just WebStat. • …

