Experimental Design for Linguists

Charles Clifton, Jr.

University of Massachusetts Amherst

Slides available at http://people.umass.edu/cec/teaching.html


Goals of Course

  • Why should linguists do experiments?

  • How should linguists do experiments?

    • Part 1: General principles of experimental design

    • Part 2: Specific techniques for (psycho)linguistic experiments

Schütze, C. (1996). The empirical basis of linguistics. Chicago: University of Chicago Press.

Cowart, W. (1997). Experimental syntax: Applying objective methods to sentence judgments. Thousand Oaks, CA: Sage Publications Inc.

Myers, J. L., & Well, A. D. (in preparation). Research design and statistical analysis (3d ed.). Mahwah, NJ: Erlbaum.


1. Acceptability judgments

  • Check theorists’ intuitions about acceptability of sentences

    • Acceptability, grammaticality, naturalness, comprehensibility, felicity, appropriateness…

  • Aren’t theorists’ intuitions solid?


Example of acceptability judgment: Cowart, 1997

  • Subject extraction: I wonder who you think (that) likes John.

  • Object extraction: I wonder who you think (that) John likes.


Stability of ratings (Cowart, 1997)


2. Sometimes linguists are wrong…

  • Superiority effects

    • I’d like to know who hid it where.

    • *I’d like to know where who hid it.

  • Ameliorated by a third wh-phrase?

    • ?I’d like to know where who hid it when.


…maybe. Paired-comparison preference judgments

  • a. I’d like to know who hid it where.

  • b. (*)I’d like to know where who hid it.

  • c. (*)I’d like to know where who hid it when.

  • d. I’d like to know who hid it where when.

  • Percentage of choices in each pair: a 86% vs. b 14%; b 76% vs. c 24%; c 49% vs. d 51%

  • a-b basic superiority violation

  • b-c head-on comparison: the extra wh-phrase “when” hurts, doesn’t help

  • c-d the “ameliorated” superiority violation, c, seems good when compared to its non-superiority-violation counterpart

  • Clifton, C., Jr., Fanselow, G., & Frazier, L. (2006). Amnestying superiority violations: Processing multiple questions. Linguistic Inquiry, 37, 51-68.


    Another instance…

    • Question: is the antecedent of an ellipsis a syntactic or a semantic object? Why is (a) good and (b) bad?

    • a. The problem was to have been looked into, but obviously nobody did.

    • b. #The problem was looked into by John, and Bob did too.

    • Andrew Kehler’s suggestion: semantic objects for cause-effect discourse relations, syntactic objects for resemblance relations. Corpus data bear his suggestion out.


    But an experimental approach…

    Kim looked into the problem even though Lee did. (causal, syntactic parallel)

    Kim looked into the problem just like Lee did. (resemblance)

    The problem was looked into by Kim even though Lee did. (causal, nonparallel)

    The problem was looked into by Kim just like Lee did. (resemblance)

    Frazier, L., & Clifton, C. J. (2006). Ellipsis and discourse coherence. Linguistics and Philosophy, 29, 315-346.


    Context effects

    • Linguists: think of minimal pairs

    • The contrast between a pair may affect judgments

    • Hirotani: Production of Japanese sentences

      • The experimental context in which sentences are produced affects their prosody


    Hirotani experiment

    • a. Embedded wh-question (ka associated with na’ni-o) (# = Major phrase boundary)

    • Mi’nako-san-wa Ya’tabe-kun-ga na’ni-o moyasita’ka (#) gumon-sita’-nokai?

    • Minako-Ms.-TOP Yatabe-Mr.-NOM what-ACC burned-Q stupid question-did-Q (-wh)

    • ‘Did Minako ask stupidly what Yatabe burned?’ (‘Yes, it seems (she) asked such a question.’)

    • b. Matrix wh-question (ndai associated with na’ni-o)

    • Mi’nako-san-wa Ya’tabe-kun-ga na’ni-o moyasita’ka (#) gumon-sita’-ndai?

    • Minako-Ms.-TOP Yatabe-Mr.-NOM what-ACC burned-Q stupid question-did-Q (+wh)

    • ‘What did Minako ask stupidly whether Yatabe burned?’ (‘The letters (he) received from (his) ex-girlfriend.’)


    Hirotani results

    Percentage of insertion of MaP before phrase with question particle

    Hirotani, Mako. (submitted). Prosodic phrasing of wh-questions in Japanese


    3. Unacceptable grammaticality

    • Old multiple self-embedding sentence experiments

      • Miller & Isard 1964: sentence recall, right-branching vs. self-embedded (1-4)

        • She liked the man that visited the jeweler that made the ring that won the prize that was given at the fair.

        • The prize that the ring that the jeweler that the man that she liked visited made won was given at the fair.

        • Median trial of first perfect recall: 2.25 vs never

      • Stolz 1967, clausal paraphrases: subjects never understood the self-embedded sentences anyway

    Miller, G. A., & Isard, S. (1964). Free recall of self-embedded English sentences. Information and Control, 7, 292-303.

    Stolz, W. (1967). A study of the ability to decode grammatically novel sentences. Journal of Verbal Learning and Verbal Behavior, 6, 867-873.


    3’. Acceptable ungrammaticality

    • Speeded acceptability judgment (% OK) and acceptability rating

    • A. OK

    • None of the astronomers saw the comet, but John did. (83% OK; rating 4.36)

    • B. Embedded VP

    • Seeing the comet was nearly impossible, but John did. (66% OK; rating 3.71)

    • C. VP w/ trace

    • The comet was nearly impossible to see, / but John did. (44% OK; rating 3.27)

    • D. Neg adj

    • The comet was nearly unseeable, / but John did. (17% OK; rating 2.21)

    Arregui, A., Clifton, C. J., Frazier, L., & Moulton, K. (2006). Processing elided verb phrases with flawed antecedents: The recycling hypothesis. Journal of Memory and Language, 55, 232-246.


    4. Provide additional evidence about linguistic structure

    • A direct experimental reflex of structure would be nice

      • But we don’t have one

    • Are traces real?

      • Filled gap effect: reading slowed at “us” in My brother wanted to know who Ruth will bring (t) us home to at Christmas.

      • Compared to My brother wanted to know if Ruth will bring us home to Mom at Christmas.

    Stowe, L. (1986). Parsing wh-constructions: Evidence for on-line gap location. Language and Cognitive Processes, 1, 227-246.


    Are traces real, cont.

    • Pickering and Barry. “no.”

    • Possible evidence

      • That’s the pistol with which the heartless killer shot the hapless man yesterday afternoon t.

      • That’s the garage with which the heartless killer shot the hapless man yesterday afternoon t.

    • Reading disrupted at shot in the second example, far before the trace position

      • But who’s to say that the parser has to wait to project the trace?

    Pickering, M., & Barry, G. (1991). Sentence processing without empty categories. Language and Cognitive Processes, 6, 229-259.

    Traxler, M. J., & Pickering, M. J. (1996). Plausibility and the processing of unbounded dependencies: An eye-tracking study. Journal of Memory and Language, 35, 454-475.


    5. Is grammatical knowledge used?

    • Serious question early on

      • “psychological reality” experiments

    • Direct experimental attack did not succeed

      • Derivational theory of complexity

    • Indirect experimental attack has succeeded

      • Build experimentally-based theory of processing


    6. Test theories of how grammatical knowledge is used

    • Moving beyond modularity debate – more articulated questions about real-time use of grammar

    • Phillips: parasitic gaps, self-paced reading

      • The superintendent learned which schools/students the plan to expand _ … overburdened _. (slowed at expand after students – plausibility effect)

      • The superintendent learned which schools/students the plan that expanded _ … overburdened _. (no differential slowing at expand – no plausibility effect)

    Phillips, C. (2006) The real-time status of island phenomena. Language, 82, 795-823.


    II: How to do experiments. Part 1, General design principles

    • Dictum 1: Formulate your question clearly

    • Dictum 2: Keep everything constant that you don’t want to vary

    • Dictum 3: Know how to deal with unavoidable extraneous variability

    • Dictum 4: Have enough power in your experiment

    • Dictum 5: Pay attention to your data, not just your statistical tests


    Dictum 1: Formulate your question clearly

    • Independent variable: variation controlled by the experimenter, not by what the subject does

    • Dependent variable: variation observed in subject’s behavior, perhaps dependent on IV

    • Operationalization of variables


    Formulate your question

    • Question: Do you identify a focused word faster than a non-focused word?

      • Must clarify: Syntactic focus? Prosodic focus? Semantic focus?

      • Must operationalize

        • Syntactic focus – Clefting? Fronting? Other device?

        • Prosodic focus – Natural speech? Manipulated speech? Synthetic speech? Target word or context?


    Formulate your question

    • Question: does discourse context guide or filter parsing decisions?

      • Clarify question: does discourse satisfy reference? establish plausibility? set up pragmatic implications? create syntactic structure biases?

      • Operationalize IV: Lots of choices here

        • But also have to worry about dependent variable…


    Choose appropriate task, DV

    • Question about focus: need measure of speed of word identification

      • Conventional possibilities: lexical decision, naming, phoneme detection, reading time

    • Question about “guide vs filter:” probably need explicit theory of your task

      • Tanenhaus: linking hypothesis

      • E.g. eye movements in reading: tempting to think that “guide” implicates “early measures,” “filter” implicates “late measures.”

        • But what’s early, what’s late? Need model of eye movement control in parsing.


    Subdictum A: Never leave your subjects to their own devices

    • It may not matter a lot

      • Cowart example: 5-point acceptability rating

        • A. “…base your responses solely on your gut reaction”

        • B. “…would you expect the professor to accept this sentence [for a term paper in an advanced English course]?”

    • But sometimes it does matter…


    Cowart 1997


    Dictum 2: Try to keep everything constant except what you want to vary

    • Try to hold extraneous variables constant through norms, pretests, corpora…

    • When you can’t hold them constant, make sure they are not associated (confounded) with your IV


    An example: Staub, in press

    Eyetracking: does the reader honor intransitivity? Compare unaccusative (a), unergative (b), and optionally transitive (c) verbs.

    a. When the dog arrived the vet₁ and his new assistant took off the muzzle₂.

    b. When the dog struggled the vet₁ and his new assistant took off the muzzle₂.

    c. When the dog scratched the vet₁ and his new assistant took off the muzzle₂.

    Critical regions: held constant (the vet…; took off the muzzle).

    Manipulated variable (verb): conditions equated on average length and average word frequency of occurrence.

    Better: match on additional factors (number of stressed syllables, concreteness, plausibility as intransitive, ….)

    Better: don’t just have overall match, but match the items in each triple.

    Staub, A. (in press). The parser doesn't ignore intransitivity, after all. Journal of Experimental Psychology: Learning, Memory, and Cognition.


    Another example: NP vs S-comp bias

    • Kennison (2001), eyetracking during reading of sentences like:

    • The athlete admitted/revealed (that) his problem worried his parents….

    • The athlete admitted/revealed his problem because his parents worried…

    • Conflicting results from previous research (Ferreira & Henderson, 1990; Trueswell, Tanenhaus, & Kello, 1993): does a bias toward use as S-complement (admit) reduce the disruption at the disambiguating word worried?

    • Problems in previous research: plausibility of direct object analysis not controlled (e.g., Trueswell et al., ambiguous NP (his problem) rated as implausible as direct object of S-biased verb)

    • Kennison, normed material, equated plausibility of subject-verb-object fragment for NP- and S-comp biased verbs; found reading disrupted equally at disambiguating verb worried for both types of verbs.

    Kennison, S. M. (2001). Limitations on the use of verb information during sentence comprehension. Psychonomic Bulletin & Review, 8, 132-137.


    What happens when there is unavoidable variation?

    • Subdictum B: When in doubt, randomize

      • Random assignment of subjects to conditions

      • Questionnaire: order of presentation of items?

        • Single randomization: problems

        • Different randomization for each subject

        • Constrained randomizations

    • Equate confounds by balancing and counterbalancing

      • Alternative to random assignment of subject to conditions: match squads of subjects
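The randomization options above can be sketched in Python. This is only an illustration under stated assumptions: the item names, the two condition labels, and the constraint used here (no more than two items from the same condition in a row) are all hypothetical, chosen to show what a different, constrained randomization for each subject might look like.

```python
import random


def longest_run(labels):
    """Length of the longest stretch of identical adjacent labels."""
    best = run = 0
    previous = object()  # sentinel that never equals a real label
    for label in labels:
        run = run + 1 if label == previous else 1
        best = max(best, run)
        previous = label
    return best


def randomize_for_subject(items, subject_id, max_run=2, base_seed=2006):
    """Return a separate, reproducible item order for one subject.

    Hypothetical constraint: at most `max_run` consecutive items from the
    same condition. Orders that violate it are discarded and redrawn.
    """
    rng = random.Random(base_seed * 100003 + subject_id)
    order = list(items)
    while True:
        rng.shuffle(order)
        if longest_run([cond for _, cond in order]) <= max_run:
            return order


# Hypothetical materials: 12 items, half in condition A and half in B.
items = [(f"item{i:02d}", "A" if i % 2 == 0 else "B") for i in range(12)]
orders = {subject: randomize_for_subject(items, subject) for subject in range(1, 4)}
```

Seeding the generator from the subject number makes each subject's order reproducible, which helps when a list has to be regenerated after the fact.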


    Counterbalancing of materials

    • Counterbalancing

      • Ensure that each item is tested equally often in each condition.

      • Ensure that each subject receives an equal number of items in each condition.

    • Why is it necessary?

      • Since items and subjects may differ in ways that affect your DV, you can’t have some items (or subjects) contribute more to one level of your IV than another level.


    Sometimes you don’t have to counterbalance

    • If you can test each subject on each item in each condition, life is sweet

    • E.g., Ganong effect (identification of consonant in context)

      • Vary VOT in 8 5-ms steps

        • /dais/ - /tais/

        • /daip/ - /taip/

      • Classify initial segment as /d/ or /t/

        • Present each of the 80 items to each subject 10 times

        • Ganong effect: biased toward /t/ in “type,” /d/ in “dice”

    Connine, C. M., & Clifton, C., Jr. (1987). Interactive use of information in speech perception. Journal of Experimental Psychology: Human Perception and Performance, 13, 291-299.


    If you have to counterbalance…

    • Simple example

      • Questionnaire, 2 conditions, N items

      • Need 2 versions, each with N items, N/2 in condition 1, remaining half in condition 2

        • Versions 1 and 2, opposite assignment of items to conditions

    • More general version

      • M conditions, need some multiple of M items, and need M different versions

        • Embarrassing if you have 15 items, 4 conditions…

        • That means that some subjects contributed more to some conditions than others did; bad, if there are true differences among subjects
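A minimal sketch of the general version, assuming conditions are simply numbered 0 to M-1 and that rotating the assignment by one condition per version is acceptable. The function name and the choice of 16 items are illustrative, not part of the original slides.

```python
def build_versions(n_items, n_conditions):
    """Assign items to conditions across M versions by rotation.

    Version v presents item i in condition (i + v) % n_conditions, so each
    item is tested once in every condition across the set of versions, and
    each version contains the same number of items per condition.
    """
    if n_items % n_conditions != 0:
        # The "15 items, 4 conditions" case: it cannot be balanced.
        raise ValueError("number of items must be a multiple of the number of conditions")
    return [
        [(i + v) % n_conditions for i in range(n_items)]
        for v in range(n_conditions)
    ]


# 4 conditions, 16 items: 4 versions, each with 4 items per condition.
versions = build_versions(n_items=16, n_conditions=4)
```

With two conditions this reduces to the simple example: versions 1 and 2 have opposite assignments of items to conditions.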


    Counterbalancing things besides items

    • Order of testing

      • Don’t test all Ss in one condition, then the next condition…

      • At least, cycle through one condition before testing a second subject

      • Fancier: Latin square

        • Avoid minor confound if always test cond 1 before cond 2 etc.

        • N x N square, sequence x squad, containing condition numbers, such that each condition occurs once in each column and once in each order position

    • Location of testing

      • E.g., 2 experiment stations
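One simple way to build such a square is cyclic rotation. In the sketch below, reading rows as squads and columns as positions in the testing order is an assumption about how the square would be used. A cyclic square balances position only; it does not balance which condition immediately precedes which.

```python
def latin_square(n):
    """Cyclic n x n Latin square of condition numbers 1..n.

    Row r is the testing order for squad r. Every condition appears exactly
    once in each row and exactly once in each column (ordinal position).
    """
    return [[(row + col) % n + 1 for col in range(n)] for row in range(n)]


for squad, order in enumerate(latin_square(4), start=1):
    print(f"squad {squad}: test conditions in order {order}")
```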


    Experimental Design for Linguists

    Charles Clifton, Jr.

    University of Massachusetts Amherst

    Slides available at http://people.umass.edu/cec/teaching.html

    and at

    http://coursework.stanford.edu


    So how do you randomize?

    • E-mail me ([email protected]) and I’ll send you a powerful program

    • But for most purposes, check out

      http://www-users.york.ac.uk/~mb55/guide/randsery.htm

      Or

      http://www.randomizer.org/index.htm


    Factor out confounds

    • Factorial design

      • An example, discussed earlier: Arregui et al., 2006

      • Initial experiment contained a confound; corrected in second experiment by adding a second factor


    Arregui et al., rating study

    • Acceptability rating, and rating of clause 1

    • A. OK

    • None of the astronomers saw the comet, but John did. (rating 4.36; clause 1 rating 4.53)

    • B. Embedded VP

    • Seeing the comet was nearly impossible, but John did. (rating 3.71; clause 1 rating 4.41)

    • C. VP w/ trace

    • The comet was nearly impossible to see, but John did. (rating 3.27; clause 1 rating 4.81)

    • D. Neg adj

    • The comet was nearly unseeable, but John did. (rating 2.21; clause 1 rating 4.39)

    Arregui, A., Clifton, C. J., Frazier, L., & Moulton, K. (2006). Processing elided verb phrases with flawed antecedents: The recycling hypothesis. Journal of Memory and Language, 55, 232-246.


    Factorial Design

    Factor 1: syntactic form of initial clause (4 levels)

    Factor 2: presence or absence of ellipsis (2 levels)


    An interaction

    Interaction: The size of the effect of one factor differs among the different levels of the other factor.
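To make the definition concrete, the sketch below takes the mean ratings from the Arregui et al. rating slide and computes the effect of ellipsis separately at each level of the syntactic-form factor. It assumes that the second rating in each pair is the no-ellipsis (clause 1) condition. The dictionary keys are labels invented for this illustration.

```python
# Mean ratings copied from the Arregui et al. rating slide.
# Assumption: the second number in each pair is the rating without ellipsis.
ratings = {
    "A (OK)":          {"ellipsis": 4.36, "no_ellipsis": 4.53},
    "B (Embedded VP)": {"ellipsis": 3.71, "no_ellipsis": 4.41},
    "C (VP w/ trace)": {"ellipsis": 3.27, "no_ellipsis": 4.81},
    "D (Neg adj)":     {"ellipsis": 2.21, "no_ellipsis": 4.39},
}


def ellipsis_cost(cell):
    """Effect of Factor 2 (ellipsis) at one level of Factor 1 (syntactic form)."""
    return round(cell["no_ellipsis"] - cell["ellipsis"], 2)


costs = {form: ellipsis_cost(cell) for form, cell in ratings.items()}
for form, cost in costs.items():
    print(f"{form:16s} cost of ellipsis = {cost:.2f}")
```

If there were no interaction, the four differences would be about equal. Here they grow from condition A to condition D, which is the pattern an interaction describes. Whether the differences are reliable is a separate, statistical question.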


    Factorial Designs in Hypothesis Testing

    • Cowart (1997), that-trace effect

      • Question: is it bad to extract a subject over that?

        • ?I wonder who you think (that) t likes John.

      • Acceptability judgment: worse with that

    • But: underlying theory talks just about extracting a subject.

      • Does acceptability suffer with extraction of object over that?

        • I wonder who you think (that) John likes t.

      • Need to do factorial experiment

        • Factor 1: presence vs. absence of that

        • Factor 2: subject vs. object extraction


    The results (from before)

    A clear interaction.


    A worry about scales

    • Interactions of the form “the effect of Factor A is bigger at Level 1 than at Level 2 of Factor B”

      • Cowart, effect of that bigger at subject than object extraction

    • Types of scales

      • Ratio: true zero, equal intervals, can talk about ratios (time, distance, weight)

      • Interval: equal intervals, but no true zero (temperature, dates on a calendar)

      • Ordinal: only more or less (ratings on rating scale, measures of acceptability, measures of difficulty)
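Why the type of scale matters for such interactions can be shown with a small sketch. The cell means are hypothetical, and the log transform stands in for any order-preserving rescaling. On an ordinal scale, a rescaling like this is as legitimate as the raw numbers.

```python
import math

# Hypothetical cell means (not data from any study), keyed by (level of A, level of B).
means = {
    ("A1", "B1"): 4.0, ("A2", "B1"): 8.0,
    ("A1", "B2"): 2.0, ("A2", "B2"): 4.0,
}


def effect_of_a(cells, level_b, transform=lambda x: x):
    """Difference between A2 and A1 at one level of B, after an optional rescaling."""
    return transform(cells[("A2", level_b)]) - transform(cells[("A1", level_b)])


raw_effects = [effect_of_a(means, b) for b in ("B1", "B2")]
log_effects = [effect_of_a(means, b, math.log2) for b in ("B1", "B2")]

print("raw scale:", raw_effects)   # effect of A is bigger at B1 than at B2
print("log scale:", log_effects)   # same ordering of cells, but equal effects
```

The apparent interaction on the raw scale disappears after the rescaling, even though the order of the four cells is unchanged. Crossover interactions do not disappear this way, because no order-preserving transformation can undo a reversal.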


    Is there really an interaction?


    Disordinal and crossover interactions


    An example of an important but problematic experiment: Frazier & Rayner, 1982

    Closure:

    LC: Since Jay always jogs a mile and a half this seems like a short distance to him.

    40, 40 ms/ch

    EC: Since Jay always jogs a mile and a half seems like a very short distance to him.

    35, 54 ms/ch

    Attachment:

    MA: The lawyers think his second wife will claim the entire family inheritance.

    36 ms/ch

    NMA: The second wife will claim the entire family inheritance belongs to her.

    37, 51 ms/ch

    Data shown: ms/character first pass times for the colored regions.

    Problems???

    Frazier, L., & Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14, 178-210.


    Dictum 3: Know how to deal with unavoidable extraneous variability

    • i.e., know some statistics

    • Measures of central tendency (“typical”)

      • Mean (average, sum/N)

      • Median (middle value)

      • Mode (most frequent value)

    • Measures of variability

      • Variance (Average squared deviation from mean)

      • Average deviation (Average absolute deviation from median)


    Computation of Variance
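The computation did not survive in this transcript. As a stand-in, here is a minimal sketch of the two variability measures as they were defined above (average squared deviation from the mean; average absolute deviation from the median), applied to hypothetical scores.

```python
from statistics import mean, median


def variance(values):
    """Average squared deviation from the mean (divides by N).

    An estimate of population variance from a sample would divide by
    N - 1 (the degrees of freedom) instead.
    """
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def average_deviation(values):
    """Average absolute deviation from the median."""
    med = median(values)
    return sum(abs(x - med) for x in values) / len(values)


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
print("mean:", mean(scores), "median:", median(scores))
print("variance:", variance(scores))
print("average deviation:", average_deviation(scores))
```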


    Variance in an experiment

    • Systematic variance: variability due to manipulation of IV and other variables you can identify

    • Random variance: variability whose origin you’re ignorant of

    • Point of inferential statistics: is there really variability associated with IV, on top of other variability?

      • Is there a signal in the noise?


    Best way to deal with extraneous variability: Minimize it!

    • Keep everything constant

      • Reduce experimental noise

        • See the signal more easily

      • Keep environment, instructions, distractions, experimenter, response manipulanda, etc. constant

      • Pretest subjects and select homogeneous ones, if that suits your purposes


    One way to minimize extraneous variance: Within-subject designs

    • Subjects differ

      • …a lot, in some measures, e.g., reading speed, reaction time

    • Present all levels of your IV to each subject

      • Assume the subject effect is a constant across all the levels.

      • Differences among conditions thus abstracted from subject differences

    • Counterbalancing necessary

      • Test each item in each condition for an equal number of subjects.

    • Worry about experience changing what your subject did

      • E.g., will reading an unreduced relative clause (The horse that was raced past the barn fell) affect reading of a reduced relative clause sentence?


    Statistical tests/statistical inference

    • Never expect observed condition means to be exactly the same

      • Just noise? Or signal + noise?

    • Statistical inference: is there really a signal?

      • p value: the probability you’d obtain a difference among the means that is as large as what you observed, if the true signal is zero

      • “null hypothesis” test


    Basic logic of statistical tests (t, F, etc.)

    • Get one estimate of the variability due to noise + any signal

      • Estimate from the variation among the observed mean values in the different conditions

    • Get another estimate of the variability due to noise alone

      • Estimate from how much variation there is among subjects, within a condition

    • If signal = 0, ratio is expected to be 1

      • If it’s enough bigger than 1, then the signal is likely to be non-zero


    Underlying model

    • Subjects are a random sample from some population

    • You can make inferences about variability in the population from the observed variability in the sample

    • Logical inference: “if the size of the signal in the population is zero, the probability of getting a difference among the means that is as big as we observed is p” where p is the level of significance

      • If p is small enough, reject the proposal that the population signal is zero


    Between-subject design

    • Estimate of signal + noise: variability among the condition means

    • Estimate of noise alone: variability among the subject means in each condition

    • F = MS(between conditions) / MS(within condition)

      • MS, not exactly variance; must divide sum of squares by df, not by N
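A minimal sketch of the between-subject F ratio, written out by hand so that each mean square is visibly a sum of squares divided by its df. The scores are hypothetical, and the function name is just for this illustration.

```python
def one_way_f(groups):
    """F ratio for a between-subject design.

    `groups` is a list of score lists, one list per condition, with
    different subjects in each condition.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    # Signal + noise: variability among the condition means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Noise alone: variability among subjects within each condition.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores: three conditions, four different subjects in each.
f_ratio, df1, df2 = one_way_f([[4, 5, 6, 5], [6, 7, 8, 7], [9, 8, 10, 9]])
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```

The resulting F is then compared against the F distribution with the two df values to obtain p.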


    Within-subject design

    • Estimate of signal + noise: variability between the condition means

    • Estimate of noise alone

      • Get a measure of the variability among condition means for each subject

      • Calculate the variability among these measures

      • Subjects x treatment interaction

        • How much the size of the treatment effect differs among subjects is an estimate of error variability.

    • F = MS(between conditions) / MS(subjects x treatments)
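The within-subject version can be sketched the same way. The error term is what is left of each score after removing that subject's overall mean and that condition's overall mean, which is the subjects x treatments interaction. The reading times below are hypothetical.

```python
def repeated_measures_f(scores):
    """F ratio for a one-factor within-subject design.

    `scores[s][c]` is subject s's score in condition c; every subject
    contributes one score to every condition.
    """
    n_subj, n_cond = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_subj * n_cond)
    cond_means = [sum(row[c] for row in scores) / n_subj for c in range(n_cond)]
    subj_means = [sum(row) / n_cond for row in scores]

    ss_cond = n_subj * sum((m - grand) ** 2 for m in cond_means)
    # Residual after removing subject and condition effects.
    ss_sxt = sum(
        (scores[s][c] - subj_means[s] - cond_means[c] + grand) ** 2
        for s in range(n_subj)
        for c in range(n_cond)
    )
    df_cond = n_cond - 1
    df_sxt = (n_subj - 1) * (n_cond - 1)
    return (ss_cond / df_cond) / (ss_sxt / df_sxt), df_cond, df_sxt


# Hypothetical reading times (ms): 4 subjects x 2 conditions.
times = [[400, 430], [520, 560], [310, 330], [450, 500]]
f_ratio, df1, df2 = repeated_measures_f(times)
print(f"F({df1}, {df2}) = {f_ratio:.1f}")
```

In these made-up data the subjects differ greatly in overall speed, but every subject shows an effect in the same direction, so the error term stays small. With two conditions this F equals the square of the paired t statistic.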


    Advanced topics

    • Multi-factor designs, tests for interactions

    • Treat counterbalancing factors as factors in ANOVA

      • E.g., if have 4 conditions, 4 counterbalancing groups, differing in assignment of items to conditions, you can treat groups as a between-subject factor and pull out variability due to items from the subjects x treatment error term

    • Statistical accommodation of extraneous variation

      • Analysis of covariance

      • Multi-level, hierarchical designs

    Pollatsek, A., & Well, A. D. (1995). On the use of counterbalanced designs in cognitive research: A suggestion for a better and more powerful analysis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 785-794.

    Forthcoming special issue of the Journal of Memory and Language on new and alternative data analyses.


    Dictum 4: Have enough power to overcome extraneous variability

    • Add more data!

      • Minimizes noise component of differences among condition means

    • Law of large numbers

      • The larger the sample size, the more probable it is that the sample mean comes arbitrarily close to the population mean

      • If you’re (almost) looking at population means, any differences have to be real – not sampling error


    Law of large numbers

    • Imagine a population with a variance v².

    • Imagine you take a bunch of independent samples from this population, each sample of size N.

    • Each sample will have a mean value.

    • These mean values will have a variance, which turns out to be v²/N.

    • This variance will be smaller as N gets larger.
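The claim can be checked with a small simulation. Everything here is an assumption made for illustration: the population is 10,000 simulated "reading times" with a mean near 500 ms, and samples are drawn with replacement so that the draws are independent.

```python
import random
from statistics import mean, pvariance


def variance_of_sample_means(population, n, n_samples, rng):
    """Draw `n_samples` independent samples of size `n`; return the variance of their means."""
    means = [mean(rng.choices(population, k=n)) for _ in range(n_samples)]
    return pvariance(means)


rng = random.Random(42)
population = [rng.gauss(500, 100) for _ in range(10_000)]  # hypothetical reading times (ms)
v2 = pvariance(population)  # the population variance

for n in (5, 25):
    observed = variance_of_sample_means(population, n, 2_000, rng)
    print(f"N = {n:2d}   variance of means = {observed:8.1f}   predicted = {v2 / n:8.1f}")
```

The observed variance of the sample means should land close to the population variance divided by N, and so should be roughly five times smaller at N = 25 than at N = 5.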


    A sampling simulation

    • The effect of sample size on the variability of sample means

      • Bigger samples, smaller variability

      • Standard deviation = square root of variance

    N = 5

    N = 25

    http://onlinestatbook.com/stat_sim/index.html


    Means from larger Ns have less noise

    • Holds for subject means

      • More subjects, means reflect vagaries of sample less; means have less noise

    • Holds for item means too

      • More items, means less affected by peculiarity of individual items

    • OK, you can have too many items and burn out your subjects


    Have enough power….

    • Back to holding everything constant

      • First reason: don’t want variables confounded with our independent variable

      • Second reason: minimize noise. Less noise, more power.


    Dictum 5: Pay attention to your data, not just your statistical tests

    • Look at your data, graph them, try to make sense out of them

      • Don’t just look for p < .05!

    • Examine confidence intervals

    • Look at your data distributions

      • Stem and leaf graphs

      • By subjects…


    Confidence intervals

    • Confidence intervals (of means over items and subjects)

      • If you have a sample mean and you know the true standard deviation of the sample mean, σM, you can say that there is a 95% chance that the true population mean is within +/- 1.96 * σM of your sample mean.

      • But of course you don’t know σM so you have to estimate it from your data and use the t distribution.

      • But then you can present your means as X +/- CI

    • A simulation: http://onlinestatbook.com/stat_sim/index.html
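Both cases can be sketched with the Python standard library. The reading times are hypothetical. Because the standard library has no t distribution, the critical value of t is passed in from a t table (2.365 is the two-tailed .05 value for 7 df).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def ci_known_sigma(values, sigma_m, level=0.95):
    """Interval when the standard deviation of the sample mean is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    m = mean(values)
    return m - z * sigma_m, m + z * sigma_m


def ci_estimated(values, t_crit):
    """Interval when the standard deviation must be estimated from the data.

    `t_crit` is the two-tailed critical value of t for len(values) - 1 df.
    """
    m = mean(values)
    half_width = t_crit * stdev(values) / sqrt(len(values))
    return m - half_width, m + half_width


# Hypothetical reading times (ms) from 8 subjects.
times = [412, 455, 390, 501, 478, 433, 420, 467]
low, high = ci_estimated(times, t_crit=2.365)
print(f"mean = {mean(times):.1f} ms, 95% CI = [{low:.1f}, {high:.1f}]")
```

For a given standard error the t-based interval is wider than the 1.96-based one, which reflects the extra uncertainty from estimating the standard deviation.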


    Confidence Intervals

    • Do you want to look at individual item data?

      • Don’t make too much of the tea leaves

      • Consider getting a confidence interval on the individual item means

      • Example: Self-paced reading time

        • Cond 1: This table is slightly dirty and the manager wants it removed. (minimum standard adjective)

        • Cond 2: This table is slightly clean and the manager wants it removed. (maximum standard adjective)

        • Reading time, clause 2, slower for maximum than minimum standard adjective

      • Are some items more effective than others?


    Confidence intervals, individual items

    • Each item has 12 different observations (different subjects) in each condition.

    • Can measure the variability among these subject data points for max std and min std adjective

      • And from that, estimate the variability of the difference, and from that, the confidence interval of the difference
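
    A sketch of that calculation for a single item, assuming that the 12 readings in each condition come from different subjects and can be treated as independent samples. The reading times are hypothetical, and 2.074 is the tabled two-tailed 95% critical t for 22 degrees of freedom, which further assumes roughly equal variances in the two conditions.

```python
import math
import statistics

def item_difference_ci(cond_a, cond_b, t_crit):
    """CI for mean(cond_b) - mean(cond_a) for one item, treating the two
    sets of readings as independent samples."""
    diff = statistics.mean(cond_b) - statistics.mean(cond_a)
    # Variability of the difference: sum of the two squared standard errors
    se = math.sqrt(statistics.variance(cond_a) / len(cond_a)
                   + statistics.variance(cond_b) / len(cond_b))
    return diff - t_crit * se, diff + t_crit * se

# Hypothetical clause-2 reading times (ms) for one item, 12 subjects each
min_std = [1510, 1620, 1480, 1705, 1390, 1550, 1660, 1425, 1580, 1495, 1630, 1540]
max_std = [1690, 1580, 1750, 1820, 1465, 1700, 1610, 1590, 1775, 1640, 1720, 1685]
T_CRIT_DF22 = 2.074  # two-tailed 95% critical t for 22 df, from a t table

low, high = item_difference_ci(min_std, max_std, T_CRIT_DF22)
print(f"max - min standard: {low:.0f} to {high:.0f} ms")
```

    An item whose interval comfortably excludes zero is showing the effect; an interval that straddles zero is a reminder not to read too much into that item's mean.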


    Dictum 5: Pay attention to your data, not just your statistical tests

    • Graph your data

    • Examine confidence intervals

    • Look at the distributions of your means

      • Stem and leaf graphs

      • By subjects…and by items


    Maria asked Bob to invite Fred or Sam to the barbecue. She didn't have enough room to invite both.

    Maria asked Bob not to invite Fred or Sam to the barbecue. She didn't have enough room to invite both.

    Frequency Stem & Leaf

    1.00 0 . 9

    13.00 1 . 1112223334444

    22.00 1 . 5555555666677777777788

    9.00 2 . 000111344

    1.00 2 . 7

    2.00 Extremes (>=3120)

    Stem width: 1000.00

    Each leaf: 1 case(s)

    Frequency Stem & Leaf

    3.00 0 . 778

    17.00 1 . 01111222333334444

    15.00 1 . 556677788889999

    11.00 2 . 00011122333

    1.00 2 . 7

    1.00 Extremes (>=3286)

    Stem width: 1000.00

    Each leaf: 1 case(s)
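
    Displays of this kind are easy to generate. The sketch below assumes, based on the output above, a stem width of 1000, leaves that are the truncated hundreds digit, and each stem split into one row for leaves 0-4 and one for leaves 5-9. It does not reproduce the separate "Extremes" rows, and the sample values are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values, stem_width=1000):
    """Return the lines of a stem-and-leaf display, two rows per stem."""
    leaf_unit = stem_width // 10
    rows = defaultdict(list)
    for v in sorted(values):
        stem = int(v) // stem_width
        leaf = (int(v) % stem_width) // leaf_unit  # truncated, not rounded
        # Key is (stem, upper-half flag): leaves 0-4 sort before leaves 5-9
        rows[(stem, leaf >= 5)].append(str(leaf))
    return [f"{len(leaves):.2f} {stem} . {''.join(leaves)}"
            for (stem, _upper), leaves in sorted(rows.items())]

# Hypothetical mean reading times (ms)
sample = [950, 1120, 1180, 1250, 1530, 1575, 1690, 2040, 2710]
for line in stem_and_leaf(sample):
    print(line)
```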


    Maria asked Bob to invite Fred or Sam to the barbecue. She didn't have enough room to invite both.

    Maria asked Bob not to invite Fred or Sam to the barbecue. She didn't have enough room to invite both.

    By items

    Frequency Stem & Leaf

    7.00 1 . 1122234

    12.00 1 . 556666667788

    4.00 2 . 1123

    1.00 Extremes (>=2629)

    Stem width: 1000

    Each leaf: 1 case(s)

    Frequency Stem & Leaf

    1.00 Extremes (=<950)

    4.00 1 . 3334

    14.00 1 . 56666777777888

    4.00 2 . 1222

    1.00 Extremes (>=2562)

    Stem width: 1000.00

    Each leaf: 1 case(s)


    Variation among items

    • Treat items as a random sample from some population.

      • Just like we treat subjects as a random sample

    • Then do statistical tests to generalize to this population of items.

      • “F1” and “F2”

    • Criticisms

      • Should generalize simultaneously to subjects and items, using F’.

        • But must estimate F’ unless you have data from every subject on every condition of every item (min F’; Clark, 1973)

      • We’re fooling ourselves when we view items as anything like a random sample from a population.
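
    For reference, min F’ is computed directly from F1 and F2. The sketch below follows the formula in Clark (1973); the F values in the example are hypothetical, and the error degrees of freedom (47 and 23) simply assume 48 subjects and 24 items in a two-condition design.

```python
def min_f_prime(f1, df_err1, f2, df_err2):
    """Clark's (1973) min F' from the by-subjects F1 and by-items F2.

    df_err1 and df_err2 are the error (denominator) degrees of freedom of
    F1 and F2. Returns (min_f, denominator_df); the numerator degrees of
    freedom are the same as for F1 and F2.
    """
    min_f = (f1 * f2) / (f1 + f2)
    denom_df = (f1 + f2) ** 2 / (f1 ** 2 / df_err2 + f2 ** 2 / df_err1)
    return min_f, denom_df

# Hypothetical values: F1(1, 47) = 8.0, F2(1, 23) = 6.0
min_f, denom_df = min_f_prime(8.0, 47, 6.0, 23)
print(f"min F'(1, {denom_df:.0f}) = {min_f:.2f}")
```

    Because min F’ is always smaller than both F1 and F2, an effect can be significant by subjects and by items and still fall short on min F’.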


    Alternatives to F1 and F2

    • Some conventional ANOVA designs do permit generalization to subjects and items without full data

      • But generally lack power

    • Coming trend: multilevel, hierarchical designs

      • Complex regression-based analyses of individual data points, not subject- or item-means.


    But what if you recognize that random sampling from a population of items is nutty?

    • What you really want is to show that your effects hold for most or all of your items and aren’t due to a couple of oddballs

      • F2 tests are a crude attempt to do this.

    • People are struggling to find a better way.

      • One possibility, from Ken Forster: plot effect size vs effect rank, see if it is pleasingly regular.


    Forster, “What is F2 good for”

    • Plot effect size (difference between two conditions) against rank of effect size (suggested by Peter Killeen)

      • Both cases: a 5 msec mean effect size

      • Left panel: a limited effect (add 100 ms to 5 items)

      • Right panel: a general effect (add 5 ms to 100 items)

    Forster, K. (2007). What is F2 good for? Round 2. Unpublished ms, University of Arizona.
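
    The two cases on this slide can be simulated to see why the mean alone does not distinguish them. This is only an illustrative sketch: the item noise (standard deviation 20 ms), the random seed, and the helper name ranked_effects are assumptions, not part of Forster's manuscript.

```python
import random

def ranked_effects(effects):
    """Pair each item's effect size with its rank (1 = smallest effect)."""
    return list(enumerate(sorted(effects), start=1))

random.seed(1)
n_items = 100
# Hypothetical item-to-item noise in the condition difference (ms)
noise = [random.gauss(0, 20) for _ in range(n_items)]

# Limited effect: 100 ms added to 5 items. General effect: 5 ms added to all.
limited = [e + (100 if i < 5 else 0) for i, e in enumerate(noise)]
general = [e + 5 for e in noise]

for name, effects in (("limited", limited), ("general", general)):
    ranked = ranked_effects(effects)
    mean = sum(effects) / len(effects)
    top = [round(e) for _rank, e in ranked[-5:]]
    print(f"{name}: mean effect {mean:.1f} ms, five largest effects {top}")
```

    Both sets have the same mean effect; plotting the (rank, effect) pairs from ranked_effects is what shows whether the effect is carried by a few items or spread across all of them.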


    Bogartz, 2007

    • Effect size vs. rank effect size, Clifton et al. JML 2003

      • Effect of ambiguity (absence of relative pronoun) on sentences with relative clauses (The man [who was] paid by the parents was unreasonable)

      • Contrasted with Monte Carlo data based on same mean and variance as experimental data

    Bogartz, R. (2007). Fixed vs. random effects, extrastatistical inference, and multilevel modeling. Unpublished manuscript, University of Massachusetts.


    III. How to do experiments, Part 2: Experimental procedures

    • Acceptability judgment

    • Interpretive choices

    • Stops making sense

    • Self-paced reading

    • Eyetracking during reading

    • ERP

    • Secondary tasks

    • Speed-accuracy tradeoff tasks

    • Eyetracking during listening (“visual world”)


    Choose task that is appropriate for your question

    • Is this really a sentence of English?

    • Does some variable affect how a sentence is understood?

    • Is there some difficulty in understanding this sentence?

    • Just where in the sentence does the difficulty appear?

    • Where in processing does the difficulty appear?

    • Can we observe consequences of processing other than difficulty?

    • and more….


    Acceptability judgment

    • Simple written questionnaire

      • See Schütze, Cowart for lots of examples

      • Worry about instructions

      • Rating scales

        • Is seven the magical number?

    • Magnitude estimation

      • Basis in psychophysics – attempt to build an interval scale


    Magnitude estimation: an example

    Which man did you wonder when to meet?

    Assign an arbitrary number to that item, greater than zero.

    Now, for each of the following items, assign a number. If the item is better than the first one, use a larger number; if it’s worse, smaller. Make the number proportional to how much better or worse the item is than the original – if twice as good, make the number 2x the start; if 1/3 as good, make the number 1/3 as big as the start.


    Magnitude estimation : an example

    • Which man did you wonder when to meet?

      • Assign an arbitrary number, greater than 0, to this first item.

      • Now, for each successive item, assign a number – bigger if the item is better, smaller if worse, and proportional – if the item is 2x as good, make the number 2x the original; if ¼ as good, make the number ¼ as big as the original.

    • Which book would you recommend reading?

    • When do you know the man whom Mary invited?

    • This is a paper that we need someone who understands.

    • With which pen do you wonder when to write.

    • Who did Bill buy the car to please?

    Bard, E. G., Robertson, D., & Sorace, A. (1996). Magnitude estimation of linguistic acceptability. Language, 72.
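
    Because each participant picks an arbitrary starting number, raw magnitude estimates have to be put on a common scale before they are averaged. One common approach, assumed here rather than taken from the slides, is to divide each estimate by the participant's reference number and take the logarithm of the ratio. The ratings below are hypothetical.

```python
import math

def log_ratio_scores(reference_rating, ratings):
    """Convert one participant's raw estimates to log10 ratios relative to
    the number they gave the first (reference) item."""
    if reference_rating <= 0 or any(r <= 0 for r in ratings.values()):
        raise ValueError("magnitude estimates must be greater than zero")
    return {item: math.log10(r / reference_rating)
            for item, r in ratings.items()}

# Hypothetical participant who gave the reference item a 20
ratings = {
    "Which book would you recommend reading?": 40,  # judged 2x as good
    "Who did Bill buy the car to please?": 5,       # judged 1/4 as good
}
scores = log_ratio_scores(20, ratings)
for item, score in scores.items():
    print(f"{score:+.2f}  {item}")
```

    On this scale 0 means "as good as the reference", and a doubling and a halving are equally far from 0 in opposite directions.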


    On-line and web-based questionnaires

    • WebExp: http://www.webexp.info

    • Subject scheduling systems option

    • Advantages: Big N, easy, broader population

    • Disadvantages: you have to worry about control


    Speeded acceptability judgment

    • Time pressure; discourage navel-examining

    • Measure reaction time and acceptability

    • Example: is given-new order more acceptable than new-given?

      • Maybe so. Maybe not always.


    • Given-New: DefNP-IndefNP

    • a. All the players were watching an umpire. The pitcher threw the umpire a ball.

    • New-Given: IndefNP-DefNP

    • b. The catcher tossed a ball to the mound. The pitcher threw an umpire the ball.

    • Given-New: DefNP-IndefPP

    • c. The catcher tossed a ball to the mound. The pitcher threw the ball to an umpire.

    • New-Given: IndefNP-DefPP

    • d. All the players were watching an umpire. The pitcher threw a ball to the umpire.

    Clifton, C. J., & Frazier, L. (2004). Should given information come before new? Yes and no. Memory & Cognition, 32, 886-895.


    Choice of interpretation

    • Paper and pencil or speeded

    • Multiple-choice or paraphrase

    • Example: interpretation of ellipsis

      • Full stop effect

    • Auditory questionnaire

      • Relative size of intonational phrase boundary

    • Strengths: does indicate whether a variable has an effect or not

    • Weaknesses: don’t know when the effect operates

      • Worst case: subject says sentence to self, mulls it over, reacts to the prosody s/he happened to impose


    Example of interpretation questionnaire: VPE

    John said Fred went to Europe and Mary did too.

    What did Mary do?

    …went to Europe: 60%

    …said Fred went to Europe: 40%

    John said Fred went to Europe. Mary did too.

    What did Mary do?

    …went to Europe: 45%

    …said Fred went to Europe: 55%

    Frazier, L., & Clifton, C. Jr. (2005). The syntax-discourse divide: Processing ellipsis. Syntax, 8, 154-207.


    Who arrived? Johnny and Sharon’s [ip] inlaws. (0 ip)

    Who arrived? Johnny [ip] and Sharon’s [ip] inlaws. (ip ip)

    Who arrived? Johnny [IPh] and Sharon’s [ip] inlaws. (IPh ip)

    Alternative answers: Sharon’s inlaws and Johnny; Sharon’s and Johnny’s inlaws

    Clifton, C. J., Carlson, K., & Frazier, L. (2002). Informative prosodic boundaries. Language and Speech, 45, 87-114.


    Stops-making-sense task

    • Word-by-word, self-paced, but at each word the subject makes one of two responses: OK, BAD

    • Get cumulative proportion of BAD responses and OK RT

    • Sensitive to point of difficulty in a sentence
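
    The cumulative proportion of BAD responses can be computed as sketched below. The data layout is an assumption made for illustration: each trial is coded by the word position at which the subject pressed BAD, or None if every word was accepted. The trial data are hypothetical.

```python
def cumulative_bad(reject_positions, n_words):
    """Proportion of trials rejected at or before each word position.

    reject_positions: one entry per trial, the 1-based position of the word
    at which BAD was pressed, or None if the sentence was never rejected.
    """
    n_trials = len(reject_positions)
    proportions = []
    for k in range(1, n_words + 1):
        rejected = sum(1 for p in reject_positions
                       if p is not None and p <= k)
        proportions.append(rejected / n_trials)
    return proportions

# Hypothetical trials for one condition of a 6-word sentence
trials = [None, 3, 5, 3, None, 4, None, 5]
print(cumulative_bad(trials, 6))
```

    A sharp rise at one word position, in one condition but not the other, is what marks the point of difficulty.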


    Example of SMS

    Which client/prize did the salesman visit while in the city? (transitive)

    Which child/movie did your brother remind to watch the show? (object control)

    Boland, J., Tanenhaus, M., Garnsey, S., & Carlson, G. (1995). Verb argument structure in parsing and interpretation: Evidence from wh-questions. Journal of Memory and Language, 34, 774-806.


    Stops-making-sense task

    • Strengths

      • Begins to address processing dynamics questions

      • Can get both time and choice as relevant data

    • Weaknesses

      • Very slow reading time – 500 to 900 ms/word typically

      • Permits more analysis than is done in normal reading


    Self-paced reading

    • Word by word self-paced reading

      • Generally noncumulative

      • Sometimes in place (“RSVP”), sometimes moving across screen

      • Time strongly affected by length of word, frequency of word

        • Can statistically adjust

    • Variant: phrase by phrase self-paced reading.
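
    The statistical adjustment for word length is often done by regressing reading time on length and analyzing the residuals. The sketch below shows that idea with a hand-written least-squares fit; the reading times are hypothetical, and in practice the fit would usually be done separately for each subject.

```python
def residual_reading_times(lengths, rts):
    """Fit rt = intercept + slope * length by least squares.

    Returns (slope, intercept, residuals), where each residual is the
    observed time minus the time predicted from word length.
    """
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(rts) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(lengths, rts)]
    return slope, intercept, residuals

# Word lengths for "Most cowboys hate to live in houses so they"
lengths = [4, 7, 4, 2, 4, 2, 6, 2, 4]
rts = [352, 431, 360, 301, 349, 310, 405, 298, 355]  # hypothetical ms

slope, intercept, residuals = residual_reading_times(lengths, rts)
print(f"{slope:.1f} ms per character; residuals:",
      [round(r) for r in residuals])
```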


    SPR methods

    • Computer programs

      • E-prime (www.pstnet.com)

      • Dmastr/DMDX (http://www.u.arizona.edu/~kforster/dmastr/dmastr.htm)

      • Others (PsyScope, Superlab, various home-made systems)


    SPR Evaluation

    • Cheap and effective

      • Don Mitchell, trailblazing technique

    • Slower than normal reading

      • Perhaps 180 words per minute reading

      • Unless reader clicks fast and buffers….

    • Often get effect on word following critical word

      • Spillover

    • Phrase-by-phrase: overcomes these difficulties, but you lose precision

    Mitchell, D. C. (2004). On-line methods in language processing: Introduction and historical review. In M. Carreiras & C. J. Clifton (Eds.), The on-line study of sentence comprehension: Eyetracking, ERPs, and beyond. Brighton, UK: Psychology Press.


    More SPR evaluation

    • Does SPR hide subtle details?

      • Maybe: Clifton, Speer, & Abney 1991 JML; Schütze & Gibson 1999 JML

        • Verb attachment: The man expressed his interest in a hurry during the storewide sale… (VP adjunct)

        • NP attachment: The man expressed his interest in a wallet during the storewide sale… (NP argument)

      • Clifton et al: eyetracking, slow first-pass time in NP-attached PP (followed by faster reading for argument than adjunct)

      • Schütze & Gibson, word by word SPR, only the argument advantage

        • Better materials

        • Worse technique


    Even more SPR evaluation

    • Does SPR introduce unnatural effects?

      • Maybe: Tabor, Galantucci, Richardson, 2004, local coherence effects

      • The coach smiled at the player tossed/thrown the frisbee by the…

        • Result: slowed reading at tossed as if reader considering grammatically illegal main clause interpretation of “the player tossed the…”

      • But: scuttlebutt, may not show up in eyetracking

        • Global SPR reading speed, 412 ms/word, 145 wpm


    Eyetracking during reading

    • Eye movement measurement

      • Fixations and saccades

      • Reading time affected by word length, frequency, other lexical factors


    Word-based measures of eye movements (ms)

    Most cowboys hate to live in houses so they

    1 2 3 4 6 5 7

    223 235 178 301 179 267 199

             cowboys    hate      houses
    SFD:     -          301 ms    267 ms
    FFD:     235 ms     301 ms    267 ms
    GAZE:    413 ms     301 ms    267 ms
    Go-P:    413 ms     301 ms    436 ms
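
    The four measures can be stated precisely as code. The sketch below uses the fixation durations from this slide, but the assignment of fixations to words is an assumption read off the fixation order: fixations 2 and 3 on "cowboys", 4 on "hate", 5 on "houses", 6 a regression back to "live", and 7 on "so". Under that assumption the code reproduces the slide's values except go-past time for "houses", where it gives 267 + 179 = 446 ms rather than the 436 ms shown.

```python
WORDS = "Most cowboys hate to live in houses so they".split()

def word_measures(fixations, word):
    """fixations: list of (word_index, duration_ms) in temporal order.

    Returns SFD, FFD, GAZE and Go-P for the word at index `word`, or None
    if the word was not fixated before the eyes had moved past it.
    """
    start = next((i for i, (w, _) in enumerate(fixations) if w == word), None)
    if start is None or any(w > word for w, _ in fixations[:start]):
        return None
    # First pass: consecutive fixations on the word after first entering it
    first_pass = []
    i = start
    while i < len(fixations) and fixations[i][0] == word:
        first_pass.append(fixations[i][1])
        i += 1
    # Go-past: everything from first entry until the eyes move to the right
    go_past = 0
    j = start
    while j < len(fixations) and fixations[j][0] <= word:
        go_past += fixations[j][1]
        j += 1
    return {
        "SFD": first_pass[0] if len(first_pass) == 1 else None,
        "FFD": first_pass[0],
        "GAZE": sum(first_pass),
        "Go-P": go_past,
    }

# Durations from the slide; word assignments are assumed (see lead-in)
fixations = [(0, 223), (1, 235), (1, 178), (2, 301),
             (6, 267), (4, 179), (7, 199)]
for name in ("cowboys", "hate", "houses"):
    print(name, word_measures(fixations, WORDS.index(name)))
```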


    Region-based measures

    While Mary/ was mending/ the sock/ fell off/

    * * * * * * * * *

    1 2 3 6 4 7 5 8 9

    277 213 233 277 445 289

    401 233 314

                    the sock    fell off
    First pass:     510 ms      445 ms
    Second pass:    401 ms      547 ms
    Go-Past:        510 ms      1393 ms
    Total Time:     911 ms      992 ms
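
    The region-based measures follow the same logic. In the sketch below the durations are those on the slide, and the mapping of fixations to regions is inferred from the fixation order (an assumption): fixations 3, 4 and 6 fall on "the sock", fixations 5, 7 and 8 on "fell off", and fixation 9 lands beyond it. With that mapping the code reproduces all eight values above.

```python
def region_measures(fixations, region):
    """fixations: list of (region_index, duration_ms) in temporal order.

    Assumes the region was fixated at least once.
    """
    start = next(i for i, (r, _) in enumerate(fixations) if r == region)
    # First pass: from first entering the region until leaving it
    first_pass = 0
    i = start
    while i < len(fixations) and fixations[i][0] == region:
        first_pass += fixations[i][1]
        i += 1
    # Go-past: from first entry until a fixation to the right of the region,
    # including any regressions to earlier regions
    go_past = 0
    j = start
    while j < len(fixations) and fixations[j][0] <= region:
        go_past += fixations[j][1]
        j += 1
    total = sum(d for r, d in fixations if r == region)
    return {"first_pass": first_pass, "second_pass": total - first_pass,
            "go_past": go_past, "total": total}

# Regions: 0 While Mary / 1 was mending / 2 the sock / 3 fell off / 4 beyond
fixations = [(0, 277), (1, 213), (2, 233), (2, 277), (3, 445),
             (2, 401), (3, 233), (3, 314), (4, 289)]
print("the sock:", region_measures(fixations, 2))
print("fell off:", region_measures(fixations, 3))
```

    The large go-past time on "fell off" comes from the regression to "the sock" (401 ms) and the re-reading that follows it, all of which happen before the eyes move on.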


    Interpretation of the measures

    • “Early” vs. “late” measures

      • Debates about modularity

      • Some measures clearly late – second pass time

      • But early: need explicit model of eye movement control

    • Rayner, Pollatsek, Reichle, colleagues – EZ Reader

      • Good model of lexical effects

      • Says little or nothing about parsing & interpretation


    ERP (event-related potentials)

    • Measure electrical activity on scalp

      • Reflect electrical activity of bundles of cortical neurons

      • Good time resolution, questionable spatial resolution

    • Standard effects: LAN, N400, P600

      • Typical peak time, polarity


    N400

    The cat will EAT

    The cat will BAKE

    P600

    The cat will EAT

    *The cat will EATING

    “Standard” ERP effects (Osterhout, 2004)

    Osterhout, L. et al. (2004). Sentences in the brain…. In M. Carreiras and C. Clifton, Jr., The on-line study of sentence comprehension. New York: Psychology Press, pp 271-308.


    Secondary tasks: Load effects

    • Limited capacity models

    • Desire: measure of auditory processing difficulty

    • Phoneme monitoring

      • E.g., Cutler & Fodor, 1979

        • Which man was wearing the hat? The man on the corner was wearing the blue hat.

        • Which hat was the man wearing? The man on the corner was wearing the blue hat.

        • Target: /k/ or /b/; when target started focused word, 360 ms; when started non-focused word, 403 ms.

      • Interpretive difficulties


    Secondary tasks: Load effects II

    • Lexical decision (or naming, or semantic decision, or….)

      • Word unrelated to sentence; measure of available capacity

      • Piñango et al., auditory presentation, visual probe

        • The man examined the little bundle of fur for a long time aspect to see if it was… 743 ms

        • The man kicked the little bundle of fur for a long time aspect to see if it was… 782 ms

    Piñango, M. M., Zurif, E., & Jackendoff, R. (1999). Real-time processing implications of enriched composition at the syntax-semantics interface. Journal of Psycholinguistic Research, 28, 395-414.


    Secondary tasks: Probe for activation

    • Auditory (or visual) presentation

      • Probe semantically related to word in sentence whose activation you want to measure

    • E.g., activation of “filler” at “gap” in long-distance dependency

      • The policeman saw the boy who the crowd at the party [1] accused [2] of the crime.

        • Present probe girl or matched unrelated word at point 1 or 2; girl faster at 2.

      • Worries, criticisms…

    Nicol, J., Swinney, D., Love, T., & Hald, L. (2006). The on-line study of sentence comprehension: An examination of dual task paradigms. Journal of Psycholinguistic Research, 35, 215-231.


    Speed-accuracy tradeoff

    • Present sentence (usually RSVP), subject to make judgment (grammaticality, etc.)

    • But judgment is made in response to a signal that is presented some time after a critical point.

    • Accuracy increases with time after the critical point.

    • Note, current procedure, multiple signals and multiple responses, e.g., every 350 ms

      • Early procedure: just one signal, one response, per trial

    McElree, B., Pylkkanen, L., Pickering, M., & Traxler, M. (2006). A time course analysis of enriched composition. Psychonomic Bulletin & Review, 13, 53-59.


    McElree et al. data

    Best fit: coercion lowered asymptote and lowered rate of approach to asymptote.
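
    The slide does not give the fitted function. A form widely used in speed-accuracy tradeoff work is an exponential approach to a limit, with three parameters: asymptote, rate, and intercept. The sketch below assumes that form, and the parameter values are hypothetical, chosen only to mimic the pattern described (lower asymptote and lower rate under coercion).

```python
import math

def sat_curve(t, asymptote, rate, intercept):
    """Accuracy (d') at processing time t, in seconds.

    Accuracy is 0 until the intercept, then rises exponentially toward the
    asymptote at the given rate.
    """
    if t <= intercept:
        return 0.0
    return asymptote * (1.0 - math.exp(-rate * (t - intercept)))

# Hypothetical parameters: (asymptote, rate, intercept)
CONTROL = (3.0, 4.0, 0.3)
COERCED = (2.6, 3.2, 0.3)

for t in (0.35, 0.7, 1.05, 1.4, 3.0):
    print(f"t = {t:.2f} s  control d' = {sat_curve(t, *CONTROL):.2f}  "
          f"coerced d' = {sat_curve(t, *COERCED):.2f}")
```

    A lower asymptote means less accurate final judgments; a lower rate means that accuracy takes longer to get there.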


    Visual World (“head-mounted eyetracking”)

    • Measure where you look when you are listening to speech.

      • Cooper, 1974. About 40% probability of fixating on referent, 30% fixating on related picture

        • About 10% in control group.

    • Permits on-line measure of processing during listening.

      • Not just difficulty – actual content

      • Both incidental looks and controlled reaching

    Cooper, R. M. (1974). The control of eye fixation by the meaning of spoken language: A new methodology for the real-time investigation of speech perception, memory, and language processing. Cognitive Psychology, 6, 84-107.


    Cooper, 1974

    While on a photographic safari in Africa, I managed to get a number of breathtaking shots of the wild terrain. These included pictures of rugged mountains and forests as well as muddy streams winding their way through big game country. One of my best shots though was ruined by my scatterbrained dog Scotty. Just as I had slowly wormed my way on my stomach to within range of a flock….


    Allopenna, Magnuson & Tanenhaus (1998)

    Eye camera

    Scene camera

    Pick up the beaker

    Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language, 38, 419-439.


    Allopenna et al. Results

    200 ms after coarticulatory information in vowel


    Thanks! Enjoy the Institute!

