Fieldwork – consultation and elicitation methods ELDP Training 2007 Friederike Lüpke
Structure of the talk • Motivation for methodological considerations in the field of language documentation • Overview of Himmelmann’s types of communicative events • Illustration of data resulting from different types of communicative events, with a focus on staged communicative events • Presentation and classification of different types of stimuli • Potential problems • Links
Why bother • So far, the field of language documentation has focussed on the shape that a language documentation should take, but not on what data should be included, how they should be collected and to whom they should be of use. • A step toward this is a systematic investigation of the goals of language documentation, of the data collection methods associated with them, and the usability of the data resulting from them. • Oral vs. written • I-language vs. E-language • Oral vs. nonverbal • Text vs. performance • Quality vs. quantity • …
“For description, the main concern is the production of grammars and dictionaries whose primary audience are linguists… In these products language data serves essentially as exemplification and support for the linguist’s analysis.” (Austin 2006: 87) “[…] Language documentation, on the other hand, places data at the center of its concerns.” (Austin 2006: 87) The (new?) role of data
One view of language documentation Corpus (Himmelmann 1998) Elicitations (paradigms, results of tests...) Observed communicative events (conversation, narratives…) Staged communicative events (descriptions of picture and video stimuli...) Qualitative analyses (occurrence of x in context y) Quantitative analyses (weighting of occurrence of x in context y throughout speakers, texts and genres…)
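A minimal sketch of what the quantitative side of this picture could look like, assuming the corpus has already been annotated as records of item, context, speaker and genre. The token records, the field layout and the function `weight_by` are hypothetical illustrations; they are not taken from any existing documentation tool or from real corpus counts.

```python
from collections import Counter

# Hypothetical annotated tokens: (item, context, speaker, genre).
# These records are invented for illustration only.
TOKENS = [
    ("kolon", "transitive", "speaker_A", "narrative"),
    ("kolon", "transitive", "speaker_B", "conversation"),
    ("kolon", "passive", "speaker_A", "narrative"),
    ("kolon", "transitive", "speaker_A", "stimulus description"),
]


def weight_by(tokens, item, context, dimension):
    """Count occurrences of `item` in `context`, split by speaker or genre."""
    index = {"speaker": 2, "genre": 3}[dimension]
    return Counter(t[index] for t in tokens if t[0] == item and t[1] == context)


by_speaker = weight_by(TOKENS, "kolon", "transitive", "speaker")
by_genre = weight_by(TOKENS, "kolon", "transitive", "genre")
print(by_speaker)
print(by_genre)
```

Splitting the same count by speaker and by genre is one simple way to see whether an occurrence of x in context y is spread across the corpus or comes from a single speaker or text type.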
Speech in cultural context Information about the content, format and structure of data Transcription, annotation and analysis of data Data types in the corpus Corpus Written data Metadata Video, audio and image data
“A language documentation […] conceived of as a lasting, multipurpose record of a language […] should contain a large set of primary data which provide evidence for the language(s) used at a given time in a given community” (Himmelmann 2006: 7) “The main goal of a language documentation is to make primary data available for a broad group of users.” (Himmelmann 2006: 15) But exactly what data? Which audience(s)? Which community/ies? Which language/s?
Do we document a snapshot or the production, transmission, maintenance and change of linguistic and cultural behaviour? Homogeneity vs. heterogeneity Do we aim at a monolithic record or at documenting variation (of what)? How do we establish representativeness?
What status for negative evidence? • “With regard to the usual way of obtaining negative evidence (i.e. asking one or two speakers whether examples x, y, z, are “okay”), it is doubtful whether this really makes a difference in quality compared to evidence provided by the fact that the structure in question is not attested in a large corpus. Elicited evidence is only superior here if it is very carefully elicited, paying adequate attention to the sample of speakers interviewed, potential biases in presenting the material, and the like.” (Himmelmann 2006: 23) How much methodological and theoretical awareness can we expect in language documentation? Which methods are robust and widely accepted?
Data for whom? • We are aware of the disciplines that also have language as a centre of interest – but do we cater for their needs? • We want to create data relevant for the speech community/ies, but we have little evidence for the use of our electronic corpora. How can we create a true multipurpose record of a language?
The (new?) role of the consultant • “…some older field manuals give advice on what kind of questions to ask or not to ask, … . In this manner, such manuals quite automatically assign a passive role to the speaker. If we regard fieldwork as a mutual teaching-learning event, this approach is no longer acceptable.” (Mosel 2006: 75) What roles do we assume for ourselves and our consultants?
What’s left? Data and methodology • “The major discovery of post-1957 “syntactic theory” is not “theoretical”, but methodological: That a huge amount of generalizations can best be found by adopting an “experimental” approach…What remains of the published body of research is the empirical part. So all the papers that are neatly divided into a “data/generalizations” part and an “analysis” part have a good chance of continuing to be useful”. (Haspelmath 2006: Linguistlist 17.2304) How can we reach maximal transparency and explicitness in providing information about how and why we collected our data? If it’s data that is central, how can we assure that our data are, and will be, relevant?
Your turn • Please take 5 minutes to: • Think about the main goals and users of your research project. • Think about how you have collected and/or intend to collect data in the field. • What kind of methods of data collection (i.e. word lists, questionnaires, stimuli…) do you use? • We will discuss your findings and concerns in the plenary.
PRO: Have a high degree of ecological validity. Yield phonologically, semantically and syntactically natural utterances. Give insight into the culture, if thematically balanced. Show high-frequency phenomena. CON: Can seem natural but factually aren’t because the cultural settings are not respected. Can contain pragmatic oddities. Are not very controlled. Many features are not quantifiable because each text is a unique performance of one speaker. Don’t offer negative evidence and are not good for low-frequency phenomena. Data resulting from monologues “This lecture is about the fascinating theory on...”
PRO: Often seen as the non-plus-ultra in naturalness. Yields data that are naturalistic in every respect. Also gives important information about the culture. CON: Is not controlled at all. Is very difficult to get. Is tedious and time-consuming to transcribe. Is even more time-consuming to analyse. Doesn’t offer negative evidence and insight into low-frequency phenomena. Data resulting from conversation A: “How do you like the ELDP training so far?” B: “All I can say is they start too early and don’t give us enough breaks!”
Representativeness of a LDD corpus – Jalonke high frequency verb kolon ‘know’ Causative Reciprocal Complement Passive Perfect Many transitive uses
Representativeness of a LDD corpus – Jalonke low frequency verb Attested: Past, NP subject, Goal PP. All uses are intransitive. Unattested: Causative? Perfect? Transitive uses? Passive?
Summary • Observed communicative events that are investigated in a qualitative way allow us to • Get a first impression of the most frequent syntactic environments of the most frequent verbs. • Formulate hypotheses and prepare elicitation sessions. But: these data don’t tell us anything about markedness, about the full distributional range, about low frequency items and constructions, and about their semantic properties.
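The step from corpus observation to elicitation planning can be sketched as a simple gap table. This is an illustrative sketch only: the verb labels, the construction inventory and the function `elicitation_targets` are assumptions made for the example, loosely modelled on the contrast between the high-frequency and the low-frequency verb on the preceding slides.

```python
# Hypothetical inventory of constructions to check for each verb.
CONSTRUCTIONS = ["transitive", "causative", "passive", "perfect"]

# Hypothetical record of which constructions are attested in the corpus.
ATTESTED = {
    "high_frequency_verb": {"transitive", "causative", "passive", "perfect"},
    "low_frequency_verb": set(),  # only intransitive uses attested so far
}


def elicitation_targets(attested, constructions):
    """Return, per verb, the constructions not attested in the corpus."""
    return {
        verb: [c for c in constructions if c not in seen]
        for verb, seen in attested.items()
    }


targets = elicitation_targets(ATTESTED, CONSTRUCTIONS)
for verb, gaps in targets.items():
    print(verb, "->", gaps or "no gaps")
```

A gap in this table is a question for an elicitation session, not negative evidence: the corpus only shows that the construction has not been observed.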
PRO: Yield phonologically natural utterances. Can be quantified to some extent. Are highly controlled, or at least seem to be. Offer negative evidence. CON: Results depend heavily on the creativity of the researcher and the receptiveness of the consultant. Easily lead to misunderstandings that go unnoticed. Can thus yield syntactically/semantically/pragmatically odd utterances. Data resulting from contextualising elicitation “How do you greet in the morning?”
PRO: Are easy when starting work on an unknown language. Give good data to work on phoneme inventory, basic lexicon, and for lexical comparison. Are quantifiable and highly controlled. Offer negative evidence. CON: Yield phonologically odd utterances. Can easily lead to misunderstandings due to the lack of context. Translatable items are limited in number. Hyper-cooperative consultants may create neologisms and produce calques to be helpful. Data resulting from translational equivalent elicitation “How do you say ‘bee’ in Dida?”
PRO: Are controlled and quantifiable. Can give results for domains that are difficult to cover otherwise. Give comparable results for many fields. Offer negative evidence. CON: Very often do not test acceptability of the utterance, but rather of the context provided for it. Can therefore very often be contradicted by the same and/or different speakers. Data resulting from acceptability judgements “Can I say ‘this book’ when the book is lying over there?”
Summary • Elicited data that are inspected in a qualitative way allow us to • Get the full distributional range of a given item/construction. • Test the semantic properties of that item/construction. • Provide negative evidence, i.e. information on unattested structures/uses, ungrammaticality, etc. But: these data are often influenced by the metalanguage/elicitation method and not naturalistic at all.
Your turn • Please take five minutes to think about other data collection methods you use, in particular about stimuli-based data: • Which media do you use if you collect data based on non-verbal stimuli? • How do you rate the quality of the data obtained with stimuli? • Have you encountered any problems when working with stimuli? • Do you have recommendations to make regarding specific stimuli that worked well? • We will compare your observations in the plenary.
Types of stimuli • Static stimuli: • Comics • Picture books • Photos • Dynamic stimuli: • Acted videos • Animated videos • Staged life events • Interactive stimuli: • Puzzle tasks • Map tasks • Matching games
Static stimuli • Picture books • Topological relations picture book • Frog story • Photos • Positional verbs picture book • Comics • Calvin & Hobbes • Tintin • Asterix & Obelix
Dynamic stimuli • Acted videos: • Staged events • Cut & Break • Pear film • Animated videos: • Fish film • Event triads • ECOM clips
Interactive stimuli • Matching/sorting games: • Basic colour terms Munsell chips • Men and tree • Cluedo • Puzzles: • Eisenbeiss/Matsuo puzzle • Map tasks/route descriptions: • HCRC map task • Table top route description task
PRO: Are highly controlled, quantifiable and comparable. Yield phonologically, semantically and syntactically accurate data. Are free from linguistic interference of the metalanguage and from misunderstandings of context. Can be used for nonlinguistic categorisation tasks. CON: Validity of the data depends on coverage of the domain under inspection by the stimulus. If gaps in parameters, data can be severely flawed. Cross-cultural applicability can be limited. Use is limited to visually depictable scenes. Data resulting from static stimuli
PRO: Yield phonologically, syntactically and semantically quantifiable and comparable data etc. (see previous slide). Can be used for nonlinguistic categorisation tasks. CON: See previous slide and: Require the use of high-tech, which is complicated if not impossible in many field settings. Depending on the abstractness of the stimulus and the purpose of the elicitation, misunderstandings can occur. Data resulting from dynamic stimuli
PRO: Allow controlled interaction of two or more speakers. Yield quantifiable and comparable data. Can be used for nonlinguistic categorisation tasks. CON: May create culturally inappropriate or strange situations. Since the true purpose of the interaction is normally not known to the consultants, misunderstandings occur easily. Data resulting from interactive stimuli
Posture verbs in stative positions (Ameka, de Witte & Wilkins 1999)
Goemai: The stick is hanging on the tree trunk.
Jalonke: Tam-<< kiran-xi wurixuntun-na ma.
stick-DEF lean-PF tree trunk-DEF at
‘The stick is leaning against the tree trunk.’
English/Dutch: The bottle is standing on the rock.
Jalonke: Biniir-<< d$$-xi g<m-<< fari.
bottle-DEF sit-PF rock-DEF on
‘The bottle is sitting on the rock.’
Event segmentation: ECOM clips (Bohnemeyer & Caelen 1999) English: The ball rolled from the square past the house to the triangle. Yukatek: The ball is at the square, and it goes rolling, and then it passes the house, and then it arrives at the triangle. Single clause; single verb Multiple independent clauses
Posture verbs in caused positions (Hellwig & Lüpke 1999) Differences between stative and caused positions: Same posture verb used: Jalonke. Different verbs with same extension used: Goemai. Different verbs with different extensions used: English and Dutch. Semantic differences: In Jalonke and Goemai, objects with a base sit/are ‘sat’, even when their longest axis is vertical. In English and Dutch, they stand, but are put (English) or ‘sat’ (Dutch). English: She puts the bottle on the table. Jalonke/Goemai/Dutch: She ‘sits’ the bottle on the table.
Cut & break verbs (Bohnemeyer, Bowerman & Brown 2001) English: cut (with scissors) Dutch: knippen ‘cut with scissors’ Jalonke: cut-iterative (because cloth has already been cut). English: cut (with knife) Dutch: snijden ‘cut with a knife’ Jalonke: cut (because fish hasn’t been cut yet).
The Puzzle Task (Eisenbeiss & Matsuo 2003) • Children have to describe puzzle pieces in order to have the pieces handed to them. • The pictures are selected in order to elicit descriptions of external possession and to ‘force’ the children to verbalise all the relevant contrasts.
The HCRC map task (HCRC Edinburgh) The instruction giver’s map The instruction follower’s map Crucial: landmarks on both maps are not identical in order to increase motivation to communicate.
The men and tree matching game (MPI Nijmegen) • Two consultants, a ‘director’ and a ‘matcher’, have identical sets of photos with similar scenes. • The director describes a photo to the matcher, who has to find the matching picture. • The photos are selected to uncover the categories triggering the choice of the matching photos – in this case, intrinsic vs. absolute frames of reference.
Ad hoc stimuli • New technologies enable fieldworkers to create stimuli ‘ad hoc’ in the field: • Digital photos • Video clips • Animations • Although generally not usable for cross-linguistic comparison, these stimuli can yield interesting data difficult to get otherwise.
Action descriptions (Lüpke 2005, ms.) • Videos recorded in the field that are described by consultants. • PRO: • Yield fine-grained event descriptions difficult to obtain otherwise. • Can be used to cover semantic domains not attested so far in the corpus. • CON: • Don’t constitute a ‘speech event’ in the sense of Hymes.
Photos and Powerpoint animations • Useful for ethnobotany • Sequences of stills from digital video or ppt animations can be used to elicit stages of an event