
Collecting and interpreting acceptability judgments using Magnitude Estimation

Caroline Heycock with Zakaris Svabo Hansen and Antonella Sorace

University of Edinburgh

NLVN-course/NORMS-seminar

Tórshavn, Faroe Islands, 8–16 August 2008

Outline
  • Why do we need acceptability judgments?
  • What are the problems with acceptability judgments?
  • How can Magnitude Estimation help with any of these problems?
  • Exemplification from ongoing studies on Faroese (and related languages)
Why do we need judgment data?

Need Problems ME Examples

  • There is no direct way to access I-language (the speaker’s knowledge of their language), so we need to triangulate from all available sources of data.
  • Corpus data typically
    • aggregate across speakers
    • include performance errors
    • allow no straightforward distinction between non-occurring and ungrammatical
    • may not exist
Outline
  • Why do we need acceptability judgments?
  • What are the problems with acceptability judgments?
  • How can Magnitude Estimation help with any of these problems?
  • Exemplification from ongoing studies on Faroese (and related languages)
Validity

Judgments are also a type of behaviour, known to be affected by

  • processing constraints
  • personality and mental state
  • presentation (order, context, mode)
  • absolute vs relative task
  • linguistic training
Reliability

  • Interspeaker variation
    • This may or may not be considered a problem of reliability, depending on assumptions about individuals’ grammars, but it is at least a methodological problem
  • Intraspeaker inconsistency
Conventional measurements of acceptability

  • Judgments of linguistic acceptability usually form category scales (ok/*) or limited ordinal scales (ok/?/?*/*), (1,2,3,4,5)
  • These scales require absolute rating judgments, rather than relative ranking judgments
  • Ordinal scales provide no information about the relative distance between adjacent points on the scale
Problems arising with conventional scales for acceptability judgments

  • Limited in their range of values
  • Lack of statistical power
    • These scales cannot be analysed using parametric statistics, because this type of analysis requires the data to be on at least an interval scale.
  • Inconsistency
    • Even trained linguists use diacritics in different ways. Comparison between different studies is extremely difficult.
  • Uninterpretability
    • What do the middle points on a rating scale actually mean?
    • How can we distinguish between lack of certainty and intermediate acceptability?
Outline
  • Why do we need acceptability judgments?
  • What are the problems with acceptability judgments?
  • How can Magnitude Estimation help with any of these problems?
  • Exemplification from ongoing studies on Faroese (and related languages)
M[agnitude] E[stimation] in psychophysics

  • ME is an experimental technique used to determine quickly and easily how much of a given sensation a person is having.
  • In an ME experiment subjects are presented with a standard stimulus (a modulus) and are asked to express the magnitude by a number.
  • They are then presented with a series of stimuli that vary in intensity and are asked to assign each of the stimuli a number relative to the modulus.
ME in psychophysics

  • Subjects assign a number:
    • to the modulus to reflect magnitude of pertinent characteristics (length, loudness, brightness)
    • to each successive stimulus to indicate apparent magnitude relative to the first (or to a previous stimulus)
ME in psychophysics: Scaling

  • Scaling in ME is not about absolute accuracy of judgments;
  • Scaling is about the relative relationships between judgments of stimuli of different intensities.
ME in psychophysics: modalities

  • The numerical modality is the most common but other modalities are possible (e.g. line length).
  • Other modalities can be more user-friendly particularly if you are testing people who (think they) are numerically-challenged.
ME in psychophysics: can people do it?

  • Many magnitude estimation experiments use a control condition in which subjects are asked to perform magnitude estimations of the length of a line.
  • Magnitude estimations of line length have been shown to be proportional to the actual length of the lines.
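
One way to check this proportionality in a line-length control condition is sketched below. It assumes that estimates follow a power law (estimate = k × length^b), so that a straight-line fit in log-log space gives the exponent b; a value of b close to 1 indicates proportional estimates. The function name and the data are illustrative and do not come from any of the studies mentioned here.

```python
import math

def fit_power_law_exponent(lengths, estimates):
    """Least-squares fit of log(estimate) on log(length).

    Returns (slope, intercept); the slope is the power-law exponent b
    and exp(intercept) is the scaling constant k.
    """
    xs = [math.log(v) for v in lengths]
    ys = [math.log(v) for v in estimates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: line lengths in mm and one subject's estimates.
# These estimates are exactly proportional (estimate = 0.5 * length).
lengths = [10, 20, 40, 80, 160]
estimates = [5, 10, 20, 40, 80]
exponent, log_k = fit_power_law_exponent(lengths, estimates)
```

With real data the exponent will only approximate 1, and a subject whose exponent is far from 1 may not have understood the proportional task.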
ME in Linguistics

  • Unlike other dimensions, linguistic acceptability has no obvious “physical” continuum to plot against subjects’ impressions.
  • However, Bard, Robertson & Sorace (1996) applied standard cross-modality matching techniques and were able to show that the technique is reliable.
Typical instructions

  • Here’s an example of what the instructions look like...
Instructions

The purpose of this exercise is to get you to judge the acceptability of some English sentences. You will see a series of sentences on the screen. These sentences are all different. Some will seem perfectly okay to you, but others will not. What we're after is not what you think of the meaning of the sentence, but what you think of the way it's constructed.

Your task is to judge how good or bad each sentence is by assigning a number to it.
  • You can use any number that seems appropriate to you. For each sentence after the first, assign a number to show how good or bad that sentence is in proportion to the reference sentence.

For example, if the first sentence was:

(1) cat the mat on sat the.

and you gave it a 1, and if the next example:

(2) the dog the bone ate.

seemed 20 times better, you'd give it twenty. If it seems half as good as the reference sentence, give it the number 0.5.

You can use any range of positive numbers you like including, if necessary, fractions or decimals.
  • You should not restrict your responses to, say, an academic marking scale.
  • You may not use minus numbers or zero, of course, because they aren't proper multiples or fractions of positive numbers.
  • If you forget the reference sentence don't worry; if each of your judgments is in proportion to the first, you can judge the new sentence relative to any of them that you do remember.

There are no 'correct' answers, so whatever seems right to you is a valid response. Nor is there a 'correct' range of answers or a 'correct' place to start.
  • Any convenient positive number will do for the reference.
  • We are interested in your first impressions, so don't spend too long thinking about your judgment.

Remember:
  • Use any number you like for the first sentence.
  • Judge each sentence in proportion to the reference sentence.
  • Use any positive numbers you think appropriate.
Choices about the modulus: face validity

  • The experimenter has the option of assigning a fixed number to the modulus.
  • Another option is to leave the modulus in sight throughout the experiment.
  • This option has good face validity, but it isn’t clear to what extent it affects the ultimate reliability of the estimates.
  • People don’t need to remember the modulus; if they are making judgments proportionally, the reference point shifts as they move on.
Advantages of quasi-randomization

  • The experimenter can impose constraints on the randomization to prevent certain experimental items from occurring consecutively.
  • The modulus can be chosen to represent an intermediate degree of acceptability.
  • A number (or a line) of intermediate size can be assigned to the modulus by the experimenter.
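
The constraint on consecutive items mentioned in the first point above can be sketched with simple rejection sampling: shuffle, check the constraint, and reshuffle if it is violated. The item list, the condition labels, and the function name are assumptions made for illustration; experiment software may well implement this differently.

```python
import random

def quasi_randomize(items, key, max_tries=10_000, rng=None):
    """Return a shuffled copy of items in which no two consecutive
    items share the same key (e.g. the same experimental condition)."""
    if rng is None:
        rng = random.Random()
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(key(a) != key(b) for a, b in zip(order, order[1:])):
            return order
    raise ValueError("no valid order found; relax the constraint or add fillers")

# Hypothetical item list: (item id, condition label), four items per condition.
items = [(i, cond) for i, cond in enumerate(["V-Adv", "Adv-V", "filler"] * 4)]
order = quasi_randomize(items, key=lambda item: item[1], rng=random.Random(1))
```

Rejection sampling is easy to verify but can be slow when the constraint is hard to satisfy, for example when one condition makes up more than half of the items.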
Timed vs untimed ME

  • Timing the intervals between sentences may reduce the likelihood that people consult metalinguistic or prescriptive knowledge.
  • Intervals have to be different for non-native speakers: they have to be piloted carefully.
Varying the instructions

  • There is a tendency in some people to use a fixed (usually 10-point) scale. This is possibly because of familiarity with school marking systems.
  • If the instructions contain an explicit warning against using a restricted range of numbers, the tendency is much reduced.
  • People are very sensitive to instructions: these have to be as explicit and clear as possible.
  • A detailed practice session is essential!
Advantages

  • ME yields interval scales, which allow the use of parametric statistics
  • Mathematical operations can be applied to the estimates, allowing:
    • a direct indication of the speaker’s ability to discriminate between more or less acceptable sentences
    • a direct measure of the strength of speakers’ preferences
Advantages

  • Informants are enabled to express their intuitions without any restrictions of the judgment scale.
  • They are asked to provide purely comparative judgments: these are relative both to a reference item and the individual subject’s own previous judgments.
  • At no point is an absolute criterion of grammaticality applied.
  • The subjects themselves fix the value of the reference item relative to which subsequent judgments are made.
Advantages

  • The scale used by informants is open-ended and has no minimum division: subjects can always add a further highest score or produce an additional intermediate rating.
  • The result is that subjects are able to produce judgments which distinguish all and only the differences they perceive.
Data analysis: normalisation

ME data need to be normalized because people use different ranges of estimates.

  • Raw magnitude values are often transformed into logs in order to yield a normal distribution.
  • Each number is divided by the modulus that the subject had assigned to the reference sentence, or alternatively the z-scores are used.
  • Any statistical package can easily do these transformations.
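
As a rough illustration of these transformations, the sketch below uses only the Python standard library. The data layout, the subject labels, and the choice of base-10 logs and the sample standard deviation are assumptions; the order in which the steps are combined may differ from study to study.

```python
import math
from statistics import mean, stdev

def normalise_by_modulus(estimates, modulus):
    """Divide each estimate by the subject's modulus, then take log10."""
    return [math.log10(e / modulus) for e in estimates]

def z_scores(values):
    """Standardise values using their own mean and sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical raw data: two subjects with the same relative judgments
# but very different numerical ranges.
raw = {
    "s1": {"modulus": 10, "estimates": [5, 10, 20, 40]},
    "s2": {"modulus": 100, "estimates": [50, 100, 200, 400]},
}

# Option 1: log of the ratio to the modulus.
log_ratios = {
    subj: normalise_by_modulus(d["estimates"], d["modulus"])
    for subj, d in raw.items()
}

# Option 2: per-subject z-scores of the log estimates.
z_by_subject = {
    subj: z_scores([math.log10(e) for e in d["estimates"]])
    for subj, d in raw.items()
}
```

In this invented example both options map the two subjects onto the same values, which is the point of normalising: differences in the range of numbers used disappear, while the relative judgments remain.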
Outline
  • Why do we need acceptability judgments?
  • What are the problems with acceptability judgments?
  • How can Magnitude Estimation help with any of these problems?
  • Exemplification from ongoing studies on Faroese (and related languages)
Faroese

Some questions:

  • Do current speakers of Faroese have V-to-I as part of their competence grammar(s)? That is, do they allow the order Finite Verb > Negation in all types of subordinate clause?
  • Do current speakers of Faroese allow “generalised embedded Verb Second” (V2)? That is, do they allow a wide range of subordinate clauses to begin with something other than the subject?
  • With respect to these phenomena, how is Faroese situated with respect to Icelandic and Danish?
How acceptable is V-to-I in Faroese?

We looked at the effect of two variables and their interaction (2 within-subjects variables, 2 and 3 levels):

  • Order
    • Verb-Adverb
    • Adverb-Verb
  • Type of “adverb”
    • Negation (ikki)
    • “High” adverb (kanska)
    • “Low” adverb (ofta)

These orders were all contained in relative clauses.
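
Crossing the two variables gives the six conditions of this design. A minimal sketch of the crossing, with label strings chosen here purely for illustration:

```python
from itertools import product

# The two within-subjects variables and their levels, as listed above.
orders = ["Verb-Adverb", "Adverb-Verb"]
adverb_types = ["Negation (ikki)", "High adverb (kanska)", "Low adverb (ofta)"]

# Full crossing: every order combined with every adverb type (2 x 3 = 6 cells).
conditions = list(product(orders, adverb_types))

for order, adverb in conditions:
    print(f"{order:12} | {adverb}")
```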

Examples
  • Adverb: Negation; Order: V-Adv

Hatta er filmurin, sum Hanus hevur ikki sæð
That is film-def that Hanus has neg seen

  • Adverb: Negation; Order: Adv-V

Hetta er brævið, sum Elin ikki hevur lisið
That is letter-def that Elin neg has read

  • Adverb: Low Adv; Order: V-Adv

Hetta er lagið, sum Teitur hevur ofta spælt
That is piece-the that Teitur has often played

  • Adverb: Low Adv; Order: Adv-V

Hatta er sangurin, sum Eivør ofta hevur sungið
That is song-def that Eivør often has sung

How “generalized” is V2 in Faroese?

We looked at the effect of two variables and their interaction (2 within-subjects variables, 2 and 5 levels):

  • Order
    • Subject-Initial
    • Adjunct-Initial
  • Clause type
    • Main clause
    • “Bridge verb” complement
    • Nonbridge verb A complement (regret, admit)
    • Nonbridge verb B complement (deny, doubt, be proud)
    • Indirect question
Examples
  • Clause Type: Bridge; Order: Subject-Initial

Lív segði, at hon kom seint til arbeiðis í gjár
Lív said that she came late to work yesterday

  • Clause Type: Bridge; Order: Adjunct-Initial

Beinir segði, at í morgin kemur hann seint til arbeiðis
Beinir said that tomorrow comes he late to work

  • Clause Type: NonBridge B; Order: Subject-Initial

Sámal noktaði, at hann hevði verið alla náttina á barrini í fleiri førum
Sámal denied that he had been all night in bar-def frequently

  • Clause Type: NonBridge B; Order: Adjunct-Initial

Einar noktaði, at í fleiri førum hevði hann drukkið alla náttina á barrini
Einar denied that frequently had he drunk all night in bar-def

Faroese 1 vs Faroese 2: geographic?
  • In Jonas 1996 it is argued that there are two distinct “dialects” in Faroese:
    • Faroese 1, which optionally allows V-to-I
    • Faroese 2, which does not allow V-to-I
  • Jonas suggests that these two dialects may correlate both with age and with dialect area: Faroese 1 more common in the southern islands, and among older speakers.
  • We investigated the geographic dialect suggestion by collecting data from 25 subjects from Tórshavn (North) and 22 subjects from Suðuroy (South). Subjects were, as much as possible, matched for age.
No geographic dialect difference
  • The main effect of dialect group was not significant
  • There was no significant interaction between dialect group and position of verb, or between dialect group and type of adverb
  • We did not find any evidence for a geographic dialect difference with respect to V-to-I in our subjects
Comparison with Danish, Icelandic
  • There is a significant interaction between language and order of the verb with respect to Negation/Adverb.
  • I.e. the effect of the different orders is different, depending on the language...
Comparing Verb/Adverb orders
  • To see whether there is any difference between the different adverbs in terms of whether or not the verb can move past them, we can look at the difference between the Verb-Adverb and Adverb-Verb orders with respect to each of the three adverbs
  • We’d expect no difference between verb movement over the three adverbs in Icelandic (all should be good) and in Danish (all should be bad)
  • If Faroese is just intermediate between Icelandic and Danish, we’d also expect no effect of the different adverb types here.
Comparing Verb/Adverb orders
  • Our Faroese subjects dispreferred the order Finite Verb - Negation in an unambiguously non-V2 context to the same extent that the Danish subjects did.
  • However, our Faroese subjects found Verb-Adverb orders better than Verb-Negation orders (this effect was found neither in Danish nor in Icelandic).
  • It is possible that to the extent that IP-internal verb movement is still grammatical in Faroese, for some speakers it is to an intermediate position.
Looking at the effect of V2

The best measure of the effect of V2 is to look at the difference between the Subject-Initial and Adjunct-Initial order, for each clause type:

That is, what is the difference between the scores for sentences of type (a) and type (b) for each clause type?

(a) Order: Subject-Initial

Lív segði, at hon kom seint til arbeiðis í gjár
Lív said that she came late to work yesterday

(b) Order: Adjunct-Initial

Beinir segði, at í morgin kemur hann seint til arbeiðis
Beinir said that tomorrow comes he late to work
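
The difference measure described above can be sketched as follows. The sketch assumes normalised judgments stored per clause type and order; the numbers are invented for illustration and are not results from the study, and the function name is hypothetical.

```python
from statistics import mean

def v2_effect(scores):
    """Map each clause type to mean(Subject-Initial) - mean(Adjunct-Initial).

    scores: dict keyed by (clause_type, order) holding lists of
    normalised judgments.
    """
    clause_types = sorted({clause_type for clause_type, _ in scores})
    return {
        ct: mean(scores[(ct, "Subject-Initial")]) - mean(scores[(ct, "Adjunct-Initial")])
        for ct in clause_types
    }

# Invented normalised scores for two of the five clause types.
scores = {
    ("Main clause", "Subject-Initial"): [0.25, 0.5],
    ("Main clause", "Adjunct-Initial"): [0.25, 0.25],
    ("Indirect question", "Subject-Initial"): [0.5, 0.5],
    ("Indirect question", "Adjunct-Initial"): [-0.5, -0.25],
}
effects = v2_effect(scores)
```

A larger difference means that the adjunct-initial (V2) order is judged worse relative to the subject-initial order in that clause type.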

The effect of V2: Danish
  • In Danish there was a significant difference between the effect of V2 in a main clause and after the second category of “nonbridge” verbs (deny, doubt, be proud).
  • There was however no significant difference between the effect of V2 in a main clause and after the first category of “nonbridge” verbs (regret, admit).
  • Taken together, this suggests that for this language Vikner’s original categorisation of “bridge” verbs for V2 is not correct; instead these results are more consistent with the proposals in Bentzen et al (2007) or Julien (2007).
The effect of V2: Faroese and Icelandic
  • In Faroese and Icelandic, however, there is no significant difference between the effect of V2 in a main clause and after the second category of “nonbridge” verbs.
  • This suggests that V2 in these languages targets a different projection than in Danish (and the other mainland Scandinavian languages?)
Is apparent V-to-I really V2?

V2:

  • Clause Type: Nonasserted; Order: Subject-Initial

Ronaldo noktar, at hann hevur skrivað undir sáttmála við Liverpool næsta ár
Ronaldo denies that he has signed contract with Liverpool next year

  • Clause Type: Nonasserted; Order: Adjunct-Initial

Næmingarnir noktaðu, at í fríkorterinum høvdu teir roykt á vesinum
Students-def denied that in breaks had they smoked in toilets-def

“V-to-I”

  • Clause Type: Nonasserted; Order: Negation-Verb

Handilskvinnan noktaði, at hon ikki hevði læst handilin í gjárkvøldið
Shopkeeper denied that she not had locked shop-def yesterday evening

  • Clause Type: Nonasserted; Order: Verb-Negation

Sámal noktaði, at hann hevði ikki latið sjálvuppgávuna inn til tíðina
Sámal denied that he had not handed assignment in on time

Conclusion
  • Judgment data are important for linguistic analysis, especially where corpora are not available, but even where they are.
  • In investigating language we are always dealing with behaviour, when we want to learn about knowledge. Investigating different types of behaviour may help us to narrow down the range of possibilities
  • Magnitude Estimation is a method for gathering judgment data that allows for a wider range of analytical tools than many other techniques

All data collected by Zakaris Svabo Hansen for the project

Verb movement in contemporary Faroese
http://www.ling.ed.ac.uk/~heycock/faroese-project.shtml

Project funded by the Arts and Humanities Research Council

Some References
  • Bard, E.G., Robertson, D. and Sorace, A. 1996. Magnitude estimation of linguistic acceptability. Language 72: 32–68.
  • Featherston, S. 2005. Magnitude estimation and what it can do for your syntax: Some wh-constraints in German. Lingua 115: 1525–1550.
  • Featherston, S. 2007. Data in generative grammar: the stick and the carrot. Theoretical Linguistics 33(3): 269–318.
  • Keller, F. 2003. A psychophysical law for linguistic judgments. Proceedings of the 25th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum.
  • Sorace, A. 1996. The use of acceptability judgments in second language research. In V. T. Bhatia and W. Ritchie (eds.) Handbook of Second Language Acquisition. New York: Academic Press, pp. 375–409.
  • Sorace, A. and Keller, F. in press. Gradience in linguistic data. To appear in Lingua.
  • Sprouse, J. 2007. A program for experimental syntax: Finding the relationship between acceptability and grammatical knowledge. PhD thesis, University of Maryland, College Park.