Corpus linguistics and language teaching: The next nexus? Doug Biber, Northern Arizona University
Goals of the talk • Introduce corpus linguistics • Present case studies illustrating the surprising findings that emerge from corpus-based research • Discuss the application of corpus research to classroom teaching and materials development
What is corpus linguistics? • A research approach for describing language use: How do speakers and writers actually use the vocabulary and grammar resources available in a language?
What is a corpus? • A large, principled collection of ‘natural’ texts stored on computer • A corpus should ‘represent’ particular language varieties or registers (e.g., conversation or university textbooks) • Design is important: texts must be sampled from particular target registers • Size is equally important: Some language features are rare but still have systematic patterns of use
Characteristics of corpus-based analysis (I) • Relies on computer-assisted techniques • Concordancers (‘KWIC’ displays = ‘Key Word In Context’) • Computer programs • Automatic (e.g., grammatical ‘taggers’) • Interactive (to code grammatical variants)
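A concordancer's KWIC display can be approximated in a few lines. The following is a minimal sketch, not any particular tool's implementation: the function name kwic, the width parameter, and the sample text (assembled from example sentences quoted elsewhere in this talk) are illustrative assumptions.

```python
import re

def kwic(text, keyword, width=30):
    """Return one aligned line per whole-word, case-insensitive match of keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keyword lines up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Hypothetical mini-text; a real study would read files from a designed corpus.
sample = ("That was delicious. I can't sleep like that. "
          "This means that the accountant records revenues as they are earned.")

for line in kwic(sample, "that", width=20):
    print(line)
```

Aligning the keyword in a single column is what lets an analyst scan the left and right contexts for recurring patterns.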
Characteristics of corpus-based analysis (II) • Analyses are empirical • Uses both quantitative and qualitative / interpretive techniques • Meaningful analyses must be motivated by linguistic research questions (not simply by the availability of a corpus)
So what is corpus linguistics? • A research approach – A way of thinking about language • Shines the spotlight on language use: registers and language for specific purposes • Allows investigation of language choice: Why does a speaker use a particular word or grammatical form rather than alternatives? • Allows investigation of meaning in context: Why are synonyms usually not interchangeable? • Allows investigation of language preference: What forms are rare? What is especially common?
Corpus descriptions capture the complexities of actual use • Language use is often systematic but complex • Corpus-based studies can consider the range of relevant factors and the interactions among factors • Corpus analysis describes the patterns of use, but it cannot directly determine how those findings are relevant for language learning • That is, corpus analyses provide the basis for informed decisions by teachers – not necessarily the immediate content of our language teaching
Case studies • Vocabulary • Grammar • Lexico-grammar
Corpus-based descriptions of vocabulary: Selected reference works • Learner dictionaries based on corpora: Longman Dictionary of Contemporary English (LDOCE); Collins COBUILD English Dictionary • Vocabulary textbooks based on corpora: McCarthy and O’Dell, Basic Vocabulary in Use; Thornbury, Natural Grammar • Academic studies of collocation: Sinclair 1991; Partington 1998
Case studies on vocabulary • Corpus-based dictionaries • Collocation • Semantic prosody
Case studies on vocabulary (1): Corpus-based dictionaries • The order of meanings reflects use, e.g., LDOCE entry for concerned: Meaning 1: ‘involved in something’ (reach an agreement with all concerned); Meaning 2: ‘worried’ (concerned about how little I eat) • Identifies common words and register differences: Words moderately common in speech (not writing; LDOCE): flood, hopefully, messy, potato, shave, underneath. Words moderately common in writing (not speech; LDOCE): focus, glance, moreover, pollution, scope, underlying
Case studies on vocabulary (2): Collocations For example: • large (‘quantity’): number(s), scale, proportion, amount versus • great (‘impressive’): deal (of), importance, majority (see Firth 1957; Sinclair 1991; Partington 1998; Biber, Conrad, Reppen 1998)
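Collocational contrasts of this kind are found by counting which words occur next to a node word. The sketch below counts only the word immediately to the right; the function name right_collocates is an assumption, and the toy text is invented for illustration, not taken from any corpus. Published studies use much larger corpora and usually an association measure rather than raw counts.

```python
import re
from collections import Counter

def right_collocates(text, node):
    """Count the word that immediately follows each occurrence of node."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair every token with its successor and keep pairs that start with node.
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == node)

# Invented toy text (NOT corpus data), built from the collocates listed above.
toy = ("A large number of students took part. A great deal of work remains. "
       "A large proportion agreed. It is of great importance. "
       "A large amount was spent on a large scale.")

print(right_collocates(toy, "large").most_common())
print(right_collocates(toy, "great").most_common())
```

Running the same function over texts sampled from different registers would show whether a collocation is general or register-specific.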
Case studies on vocabulary (3): Semantic prosody Copular verbs that mean ‘become’: • turn: black, red, white, pale • come: alive, loose, true, unstuck • go: crazy, mad, wrong, bad (Longman Grammar of Spoken and Written English, 444-445) (cf. Partington 1998)
Corpus-based studies of grammar • Demonstrative pronouns: this versus that • Word classes: nouns, verbs, pronouns • Dependent clauses: that-clauses versus to-clauses • (From the Longman Grammar of Spoken and Written English)
Case studies on grammar (1) The grammar of individual words: Demonstrative pronouns this versus that • The traditional description of the difference: • This refers to something near the speaker • That refers to something that is not near the speaker
The grammar of individual words (cont.) Demonstrative pronouns that versus this
Demonstrative pronouns that versus this (cont.) • Examples of that in conversation (vague or situational reference) That was delicious. A: I was, I was flat on my back. B: Uh, I can't sleep like that • Examples of this in academic writing (text deixis) GAAP requires that a business use the accrual basis. This means that the accountant records revenues as they are earned…
Case studies on grammar (2) The register distribution of grammatical classes: Nouns, verbs, personal pronouns
Case studies on grammar (3) Syntactic features Dependent clauses are common in writing but rare in speech: Contrasting intuitions with actual use
• Verb + that-clause in conversation: I know (that) I told you. I think (that) we picked it up. • Extraposed to-clauses in academic prose: It is important to specify the states … It is difficult to maintain a consistent level … It is impossible to liquefy a gas …
Corpus-based studies of lexico-grammar Case studies from the Longman Grammar of Spoken and Written English: • The grammatical ‘patterns’ of individual words: tell and promise (cf. Hunston and Francis 2000; Thornbury 2004) • Passive verbs: common and rare • Common verbs with that-clauses in conversation
Case studies on lexico-grammar (1) The grammar of words: tell versus promise • Both verbs have identical valency patterns: • They can occur as monotransitive verbs (with a direct object) • or as ditransitive verbs (with a direct object and an indirect object)
• Example of TELL in newspapers – expressing both the addressee AND the content of the message: Cheney told [Navy Secretary H. Lawrence Garrett] [that he would cancel the $50 billion project] … • Example of PROMISE in newspapers – expressing only the content of the promise: The company promised [to donate about $500,000 to the cause] …
Case studies on lexico-grammar (2) The words of grammar: Verbs with passive voice
Verbs with passive voice • Selected verbs that almost always occur with passive voice in academic prose (over 70% of the time): • Verbs of scientific methodology: be analyzed, be calculated, be collected, be measured, be tested • Example: Their occurrence is measured in a few parts per million. • Verbs expressing logical relations and interpretations: be based (on), be associated (with), be attributed (to), be interpreted (as), be regarded (as) • Example: Their presence must be regarded as especially undesirable.
Verbs with passive voice (2) • Selected transitive verbs that almost never occur in the passive voice: agree, guess, have, like, love, quit, reply, try, want, watch, wish, wonder
Case studies on lexico-grammar (3) Verbs controlling that-clauses versus to-clauses
Verbs that control that-clauses • Almost 200 verbs attested in the LSWE Corpus (e.g., feel, realize, hear, assume, suggest, ensure, indicate, imply, propose) • Only 4 verbs are extremely common in conversation: think, say, know, guess
Language for specific purposes • Language use is mediated by register • That is, notions like ‘common’, ‘rare’, and ‘typical’ are usually not meaningful for general English. • Rather, language features and patterns are typical of particular registers. • Case study of modal verbs in university registers
Why are there so many prediction modals in class management? These usually serve (indirect) directive functions: • I'd like you to review your quizzes • I would encourage you to add this to your stack of materials • and then assignment six will be due Tuesday
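Register comparisons like this one rest on normalized rates, since texts from different registers differ in length. The sketch below computes a rate per 1,000 words. The modal set (will, would, shall; contracted forms such as 'd and 'll are ignored) and the function name are simplifying assumptions, and the input is just two of the examples above, so the resulting figure illustrates the arithmetic only and is not a finding.

```python
import re

# Assumed set of prediction modals; contractions ('d, 'll) are not counted here.
PREDICTION_MODALS = {"will", "would", "shall"}

def rate_per_thousand(text, targets):
    """Occurrences of target words per 1,000 running words of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in targets)
    return 1000 * hits / len(tokens)

# Two class-management examples from the slide, joined into one short text.
class_management = ("I would encourage you to add this to your stack of materials "
                    "and then assignment six will be due Tuesday")

print(rate_per_thousand(class_management, PREDICTION_MODALS))
```

Applying the same function to samples from each university register (for example, class management talk versus textbooks) would give directly comparable rates.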
Students using corpora in the classroom • The student as researcher: Data-driven learning (e.g., article use) (Johns – e.g., 1991, ELR Journal) • LSP applications: student concordancing based on a specialized corpus (see, e.g., Donley and Reppen 2001, TESOL Journal; Gavioli and Aston 2001) • Do students benefit? Yes: enhances vocabulary learning and transfer of word knowledge (Cobb 1997, System; 1999, CALL)
General considerations for curricula, materials development, and lesson planning • What language features and grammatical topics to include / exclude • What vocabulary to include • Sequencing • Providing meaningful practice
Using corpus-based materials in the classroom: Issues (1) • How to adapt corpus-based research findings? • What kinds of corpus findings are useful for learners? • How to adapt natural text for classroom use? • What kinds of gains in proficiency should we expect from corpus-based materials?
Developing corpus-based materials for the classroom: Issues (2) • How important is frequency / typicality? What about representation of specific target registers? • Difficulty and learnability of the construction; interlanguage sequences – natural order of acquisition • To what extent are current practices actually informed by research on acquisition? • Unreliability of intuitions
Future research directions • Need for empirical research on the translation of corpus research findings to classroom materials: • Overall distribution of grammatical features: issues of inclusion and sequencing • Collocation and lexico-grammatical patterns: issues of word choice and practice within a lesson • Discourse factors influencing grammatical variation and choice: presentation and practice within a lesson • What kinds of gains in proficiency, in response to what kinds of materials?