The Armchair and the Machine
Corpus-Assisted Discourse Studies
Alan Partington
Lorient 14/09/07
Corpus-Assisted Discourse Studies (CADS)
What does CADS do?
Examples (politics & media) & Types of research questions / methodologies
Teaching material?
“two types of linguist”
Corpus-Assisted Discourse Studies
the Armchair linguist …
“sits in a deep soft comfortable armchair, with his eyes closed and his hands clasped behind his head.
Once in a while he opens his eyes, sits up abruptly shouting, “Wow, what a neat fact!”, grabs his pencil, and writes something down.
Then he paces around for a few hours in the excitement of having come still closer to knowing what language is really like.”
the Corpus linguist …
“has all the primary facts that he needs, in the form of approximately one zillion running words, and he sees his job as that of deriving secondary facts from his primary facts.
At the moment he is busy determining the relative frequencies of the eleven parts of speech as the first word of a sentence”
“These two don’t speak to each other very often,
but when they do the corpus linguist says to the armchair linguist, ‘Why should I think that what you tell me is true?’,
and the armchair linguist says to the corpus linguist, ‘Why should I think that what you tell me is interesting?’”
…corpus linguists have so far contributed little to answering classic questions of cognitive and social theory; they have hardly considered the relevance of corpus evidence to questions about the mental lexicon and the construction of the social world (though one of Halliday’s central topics)
(Stubbs 2006: 15)
…could be related …may be reducible… may also be internally related … seems to show … might also provide … show how we could do real ‘ordinary language philosophy’ …
New ways of observing
New ways of thinking
New ways of observing = astronomy
New ways of thinking = model of universe
New ways of observing = radio-telescopy
New ways of thinking = theory of creation
New ways of observing = inductive data-driven
New ways of thinking = lexical grammar
Investigate (and compare) discourse types (DTs):
to “not get caught in using corpora just to tell you more about what you know already”
(Sinclair 2004: 183)
Statistical OVERVIEW (very quickly)
“Quantitative” approach (“general” language dictionaries, grammars)
DETAILED analysis, even single texts
Compare DT(a) – DT(b) – DT(n)
Compare DT(a) – BNC / BoE
Corpus: “Black box” – Keep out!
Detailed knowledge of DT:
Stubbs (1996; 2001)
Newspool: Partington, Morley & Haarman (eds) 2004
CorDis: Morley & Bayley (eds) forthcoming
“I’ve been doing CADS for years and never knew it”
(Geoffrey Williams, Siena 2006)
Berlusconi’s election speeches (Garzone & Santulli 2004)
Word lists (WordSmith):
Italia; stato; libertà
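A word list of the kind WordSmith produces is, at its simplest, a frequency count of word forms. The following is a minimal Python sketch of that idea, not WordSmith's own procedure; the function name `word_list` and the sample sentence are invented for illustration and are not taken from the speeches analysed.

```python
import re
from collections import Counter

def word_list(text, top=10):
    """Return the `top` most frequent word forms in `text`, lowercased."""
    # \w+ matches runs of Unicode letters and digits, so accented
    # forms such as "libertà" are kept as single tokens.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens).most_common(top)

# Invented toy sample (an assumption, NOT data from the election speeches).
sample = "Lo stato deve garantire la libertà. L'Italia è la libertà. Lo stato è amico."

for word, freq in word_list(sample, top=5):
    print(word, freq)
```

A real word list would normally also separate out or filter grammatical words (lo, la, è) so that content words such as Italia, stato and libertà stand out.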
Lo stato when it is run by the Left:
autoritario, burocratico, invasivo, moloch, padrone, stato-partito (authoritarian, bureaucratic, invasive, moloch, bossy, a party-state)
Lo stato when treated to the Forza Italia cure becomes:
amico, civile, di diritto, liberale, moderno (friend, civilised, lawful, liberal, modern)
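The two sets of adjectives above are collocates of stato: words that occur close to it in the text. A rough sketch of how such collocates might be collected, assuming a simple fixed window of tokens on either side of the node word, is given below. The function `collocates`, the `span` parameter and the sample sentence are hypothetical; concordancers offer more refined options.

```python
import re
from collections import Counter

def collocates(text, node, span=3):
    """Count word forms found within `span` tokens either side of `node`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]      # up to `span` tokens before the node
            right = tokens[i + 1:i + 1 + span]     # up to `span` tokens after the node
            counts.update(left + right)
    return counts

# Invented toy sample (an assumption, NOT data from the election speeches).
sample = "Uno stato burocratico e invasivo. Vogliamo uno stato amico, uno stato moderno."

print(collocates(sample, "stato", span=2).most_common(5))
```

Sorting the resulting collocates by who is said to run the state (the Left or Forza Italia) remains a manual, interpretive step.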
Libertà is the third most frequent noun;
but it is rarely attached to an individual in the co-text. Whose liberty?
How does P achieve G with language?
What does this tell us about P?
Comparative: how do P1 and P2 differ?
world (468 - 136):
global dimension, attack on the international community, not just USA
war (351 - 60)
Reaction must be: declare war on terrorism, launch an international war
enemy (106 - 20)
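Paired figures such as 468 - 136 can only be compared fairly once the sizes of the two corpora are taken into account. One common way of doing this in corpus work is the log-likelihood statistic; the sketch below assumes the paired figures are raw frequencies in two corpora and uses a purely hypothetical corpus size, since the slides do not give the sizes.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-cell log-likelihood (G2) for one word's frequency in two corpora."""
    total = size_a + size_b
    # Expected frequencies if the word were spread evenly across both corpora.
    expected_a = size_a * (freq_a + freq_b) / total
    expected_b = size_b * (freq_a + freq_b) / total
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# HYPOTHETICAL corpus size: the real sizes are not stated in the slides.
ASSUMED_SIZE = 100_000

# Frequencies taken from the slide; the sizes are assumptions.
for word, freq_a, freq_b in [("world", 468, 136), ("war", 351, 60), ("enemy", 106, 20)]:
    score = log_likelihood(freq_a, ASSUMED_SIZE, freq_b, ASSUMED_SIZE)
    print(word, round(score, 1))
```

A score above about 3.84 is conventionally read as significant at p < 0.05. With real corpus sizes the scores would differ from those produced under the assumed size.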
Collocates: semantic preference for the unknown
in- and un- words: