Harnessing Corpora for real and virtual ELT purposes
IFELT, FLUP, 10/11.2003


Belinda Maia



What is a corpus?

  • CORPUS (13c.: from Latin corpus, ‘body’; plural corpora)

  • A body of texts, utterances or other specimens considered more or less representative of a language, stored as an electronic database.

  • A corpus may store many millions of running words

  • A corpus can be tagged to identify and classify words and other formations

  • A corpus can be searched using concordancing programmes
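"Running words" are word tokens, as opposed to distinct word forms (types). A minimal Python sketch of counting both in a plain text; the regular-expression tokenizer is a simplifying assumption (ASCII letters only, straight apostrophes), since real corpus tools treat punctuation, hyphens and accented letters more carefully.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase runs of ASCII letters, optionally
    # joined by a straight apostrophe or hyphen ("don't", "stop-start").
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def corpus_stats(text: str) -> tuple[int, int, Counter]:
    """Return (running words, distinct word forms, frequency list)."""
    tokens = tokenize(text)
    freq = Counter(tokens)
    return len(tokens), len(freq), freq

sample = "They could cook vegetables and meat simply, deal with eggs and bacon."
n_tokens, n_types, freq = corpus_stats(sample)
print(n_tokens, n_types)      # 12 11
print(freq.most_common(1))    # [('and', 2)]
```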

An example of concordancing (from the BNC)

A0R 2231 Maybe with twists of bacon.

A35 256 This substantial, 15-minute orchestral movement was inspired by three paintings of Innocent X by Francis Bacon, themselves based on Velasquez.

A6N 1311 They could cook vegetables and meat simply, deal with eggs and bacon and porridge, and they were able to bake and housekeep, learning as they went along.

AAX 286 Sir Richard Body, MP Hirohito, shy god who liked bacon & eggs.

ABB 67 Remembering bacon and ham, the versatility of the pig can be stretched to pies, sandwiches and ham, egg and chips.

ABB 236 The Smoked Trout & Parma Ham Mousse (see p18) is merely decorated with slices of the ham and the Carbonnade of Beef is enriched by using diced ham instead of bacon.
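The lines above come from the BNC's own search interface. The core idea of a concordancer, keyword in context (KWIC), can be sketched in a few lines of Python; the function name `kwic` and the fixed character window are illustrative choices, not a description of how the BNC software works.

```python
import re

def kwic(sentences: list[str], keyword: str, width: int = 30) -> list[str]:
    """Keyword-in-context lines: up to `width` characters of left context,
    the match, then up to `width` characters of right context.
    Matching is case-insensitive and on whole words only."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for sentence in sentences:
        for m in pattern.finditer(sentence):
            left = sentence[:m.start()][-width:]
            right = sentence[m.end():][:width]
            # Right-align the left context so every keyword starts in the same column.
            lines.append(f"{left:>{width}}{m.group()}{right}")
    return lines

sentences = [
    "Maybe with twists of bacon.",
    "They could cook vegetables and meat simply, deal with eggs and bacon and porridge.",
    "Hirohito, shy god who liked bacon & eggs.",
]
for line in kwic(sentences, "bacon"):
    print(line)
```

Aligning the keyword in one column is what makes patterns to its left and right (here "eggs and bacon", "bacon & eggs") easy to spot by eye.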


  • Example – courtesy Catherine Ball at: http://www.georgetown.edu/faculty/ballc/corpora/tutorial2.html#RTFToC16

  • A01 2 ^ *'_*' stop_VB electing_VBG life_NN peers_NNS **'_**' ._.
    A01 3 ^ by_IN Trevor_NP Williams_NP ._.
    A01 4 ^ a_AT move_NN to_TO stop_VB \0Mr_NPT Gaitskell_NP from_IN
    A01 4 nominating_VBG any_DTI more_AP labour_NN
    A01 5 life_NN peers_NNS is_BEZ to_TO be_BE made_VBN at_IN a_AT meeting_NN
    A01 5 of_IN labour_NN \0MPs_NPTS tomorrow_NR ._.
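Tagged text in this word_TAG format is straightforward to process by program. A minimal sketch, assuming each record starts with a text id and a line number, that "^" marks a sentence start, and that every other token is word_TAG split at its last underscore; the function name `parse_tagged` is hypothetical.

```python
from collections import Counter

def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for line in text.splitlines():
        for token in line.split()[2:]:       # skip the text id and line number
            if token == "^" or "_" not in token:
                continue                     # sentence-start marker or untagged token
            word, tag = token.rsplit("_", 1) # "._." -> (".", ".")
            pairs.append((word, tag))
    return pairs

# Two records copied from the tagged example above (raw string keeps "\0").
sample = r"""A01 3 ^ by_IN Trevor_NP Williams_NP ._.
A01 4 ^ a_AT move_NN to_TO stop_VB \0Mr_NPT Gaitskell_NP from_IN"""

pairs = parse_tagged(sample)
print(Counter(tag for _, tag in pairs).most_common(2))   # [('NP', 3), ('IN', 2)]
# Common nouns only (tags beginning NN, e.g. NN, NNS).
print([w for w, t in pairs if t.startswith("NN")])       # ['move']
```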

Types of Corpora

  • Monolingual corpora - in which the texts are all in the same language

  • Parallel and/or aligned corpora - in which originals and translations are aligned so that both texts appear on the screen together and you can see how the translator has translated the original.

  • Comparable corpora - in which a selection of original texts has been made in two or more languages dealing with the same subject or genre.
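A parallel (aligned) corpus lets you search the originals and see each hit next to its translation. A minimal sketch, assuming the alignment is already available as a list of (original, translation) sentence pairs; the three English/Portuguese pairs are hypothetical toy data, and real aligned corpora such as COMPARA store and search their alignments in their own ways.

```python
import re

# Hypothetical toy data: one aligned (original, translation) pair per item.
ALIGNED = [
    ("I read the book.", "Li o livro."),
    ("The book is on the table.", "O livro está na mesa."),
    ("She writes every day.", "Ela escreve todos os dias."),
]

def bilingual_concordance(pairs: list[tuple[str, str]], keyword: str) -> list[tuple[str, str]]:
    """Return the aligned pairs whose original sentence contains `keyword`
    as a whole word (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in pairs if pattern.search(src)]

for src, tgt in bilingual_concordance(ALIGNED, "book"):
    print(f"{src}  |  {tgt}")
```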

Types of Corpora

  • Specialized corpora - texts on specialized subjects, used to extract terminology and complementary explanatory material (definitions, explanations, etc.)

  • Concurrent corpora - a term used for newspaper texts on the same subject from approximately the same dates.

  • ‘Do-it-yourself’ or ‘disposable’ corpora - small specialized corpora built for teaching translation or language
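The slides do not say how terminology is extracted from a specialized corpus. One simple heuristic, assumed here and not presented as the author's method, is to rank words by how much more frequent they are in the specialized texts than in a general reference corpus. A minimal sketch with hypothetical toy texts and hypothetical function names:

```python
import re
from collections import Counter

def words(text: str) -> list[str]:
    # Crude tokenizer (assumption): lowercase runs of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())

def term_candidates(special: str, reference: str, top: int = 5) -> list[tuple[str, float]]:
    """Rank words of the specialized text by relative frequency ratio:
    (share of specialized tokens) / (share of reference tokens).
    Simple add-one smoothing keeps words absent from the reference finite."""
    spec, ref = Counter(words(special)), Counter(words(reference))
    n_spec, n_ref = sum(spec.values()), sum(ref.values())
    scores = {w: (c / n_spec) / ((ref[w] + 1) / (n_ref + 1)) for w, c in spec.items()}
    # Highest ratio first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Hypothetical toy texts standing in for a specialized and a general corpus.
special = "The enzyme binds the substrate and the enzyme changes shape."
reference = "The cat sat on the mat and the dog sat too."
for word, score in term_candidates(special, reference, top=3):
    print(f"{word:10} {score:.2f}")
```

Frequent general words such as "the" score low because they are common in both texts; with realistic corpus sizes, statistical keyness measures are usually preferred over this raw ratio.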

Corpora and Lexicography

  • COBUILD = Collins Publishers + University of Birmingham – 1980s

    • Corpus work that revolutionised lexicography

  • Today, all serious lexicography uses corpora, e.g.:

    • Oxford English Dictionary http://www.oed.com/

    • Academia das Ciências de Lisboa

Corpora & Grammar

  • The Longman Grammars of English (Quirk, Greenbaum, Svartvik, Leech and others)

    • Based on corpora – the classical corpora are now available on CD-ROM through ICAME

    • http://www.hd.uib.no/icame.html

  • BIBER, D., S. JOHANSSON, G. LEECH, S. CONRAD & E. FINEGAN. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson Education Ltd. 

The corpora debate

  • The bigger the corpus, the better

  • The carefully chosen ‘representative’ corpora

  • Chomsky argued that the average educated speaker was a better source

  • Big corpora are not necessarily representative – e.g. the Hansard corpus, a record of parliamentary debates

  • Any selection of texts is, after all, only a selection


  • Very large corpora exist and are very useful

  • Much research work nowadays is done with small selected corpora for studying:

    • different registers

    • special subjects

Using official corpora - EN

  • British National Corpus at: http://sara.natcorp.ox.ac.uk/lookup.html - 50 examples of any word or expression for free on-line

  • CD-ROM of 100 million words available

  • The COBUILD project: http://titania.cobuild.collins.co.uk/form.html

  • 40 examples on-line

Using official corpora - PT

  • AC/DC, CetemPúblico – Portuguese monolingual corpora

  • COMPARA – aligned English/Portuguese corpus

  • All at http://www.linguateca.pt

Language Learning/Teaching and corpora

  • How can a language teacher use corpora?

  • Why should a language learner need to know about corpora?

  • What can be learnt?

How can a language teacher use corpora?

  • The teacher can:

    • find an enormous amount of material for use in class and for exercises

    • check on real usage and compare it to textbooks used

  • BUT:

  • Must be aware that corpora sometimes prove the textbook wrong!
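Checking a textbook rule against real usage often starts with counting competing forms in a corpus. A minimal sketch; the function name, the three phrases and the three-sentence "corpus" are hypothetical placeholders, and with a real corpus the counts should be read alongside the concordance lines, since raw frequencies say nothing about register or region.

```python
import re

def phrase_counts(text: str, phrases: list[str]) -> dict[str, int]:
    """Count case-insensitive, whole-phrase occurrences of each phrase,
    allowing any whitespace (including line breaks) between its words."""
    counts = {}
    for phrase in phrases:
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

corpus = ("Their answer was different from ours. "
          "This one is different to the last. "
          "Results were quite different from the textbook.")
print(phrase_counts(corpus, ["different from", "different to", "different than"]))
# {'different from': 2, 'different to': 1, 'different than': 0}
```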

What can be learnt?

  • Corpora as reference material for:

    • Lexical work

    • Syntactic study

    • Textual analysis

    • Observing language ‘in action’

    • Learning about a wide variety of areas

The student

  • Can be trained to search autonomously for information of all kinds

    • Finding texts that supply real knowledge

    • Finding texts that serve as models for style and register

    • Finding correct collocations of individual words
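A first step towards finding collocations is to count which words occur near a given node word. A minimal sketch, assuming a window of two words either side and ignoring sentence boundaries; the function name and toy text are hypothetical, and dedicated tools normally rank collocates with association measures (such as mutual information or t-score) rather than raw counts.

```python
import re
from collections import Counter

def collocates(text: str, node: str, window: int = 2) -> Counter:
    """Count words occurring within `window` tokens either side of each
    occurrence of `node` (sentence boundaries are ignored)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Every token in the window except the node occurrence itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

text = ("They ate eggs and bacon. Smoked bacon and eggs again. "
        "Crispy bacon with eggs.")
print(collocates(text, "bacon").most_common(2))   # [('eggs', 3), ('and', 2)]
```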

Do-it-yourself corpora

  • Suggestion:

  • Train students to make and use their own corpora by:

    • Collecting texts off the Internet

    • Using the ‘Find’ function in Word

    • Broadening their vocabulary
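The ‘Find’ function searches one document at a time; a short script can run the same search over a whole folder of collected texts. A minimal sketch, assuming the texts were saved as UTF-8 `.txt` files in one folder; the function names and the crude sentence splitting on `.`, `!` and `?` are assumptions for illustration.

```python
import re
import tempfile
from pathlib import Path

def load_corpus(folder: str) -> dict[str, str]:
    """Read every .txt file in `folder` into {filename: text} (UTF-8 assumed)."""
    return {p.name: p.read_text(encoding="utf-8")
            for p in sorted(Path(folder).glob("*.txt"))}

def find_all(corpus: dict[str, str], keyword: str) -> list[tuple[str, str]]:
    """Return (filename, sentence) for every sentence containing `keyword`
    as a whole word, case-insensitively."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    hits = []
    for name, text in corpus.items():
        # Rough sentence split: after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if pattern.search(sentence):
                hits.append((name, sentence.strip()))
    return hits

# Demo: two throwaway files standing in for texts collected off the Internet.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("Corpora help learners. A corpus is a body of texts.", encoding="utf-8")
    Path(tmp, "b.txt").write_text("Search the corpus for collocations!", encoding="utf-8")
    hits = find_all(load_corpus(tmp), "corpus")

for name, sentence in hits:
    print(name, "|", sentence)
```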

Useful sites

Catherine N. Ball:

Tutorial: Concordances and Corpora

  • http://www.georgetown.edu/faculty/ballc/corpora/tutorial.html

  • Tim Johns’ Data-driven learning at: http://web.bham.ac.uk/johnstf/

Useful sites

  • Concordance the whole Web at: http://www.webcorp.org.uk/

  • And, of course, Google at:

  • http://www.google.com