A Brief Intro to Corpus Techniques in ELT Research


  1. A Brief Intro to Corpus Techniques in ELT Research By Erkan Karabacak

  2. Several Important Buzzwords corpus: corpora: concordancer: keyword in context (KWIC):

  3. Several Important Buzzwords • corpus: a collection of texts • corpora: the plural of corpus • concordancer: a search engine for texts • keyword in context (KWIC): a list of the occurrences of a search word, each shown in its context

  4. Examples of Corpora • The Oxford Text Archive www.ota.ox.ac.uk • Warwick Centre for Applied Linguistics http://www2.warwick.ac.uk/fac/soc/al/ • Open American National Corpus http://americannationalcorpus.org/OANC/OANC-1.0.1-UTF8.zip

  5. Examples of Concordancers • Monoconc • Wordsmith Tools • Concordance • Simple Concordance Program • WConcord • TextStat • AntConc

  6. KWIC (Keyword in Context)
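
To make the idea of a KWIC display concrete, here is a minimal sketch in Python. It is not how AntConc or any other concordancer listed above works internally; the function name `kwic`, the `width` parameter, and the sample sentence (taken from the BASE excerpt) are chosen only for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per hit, with `width` characters of context on each side."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

sample = "only a joke man he said with mocking reassurance only a joke i sat on the bus"
for line in kwic(sample, "joke"):
    print(line)
```

Each printed line centers one occurrence of the search word, which is what makes patterns to its left and right easy to scan.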

  7. Today AntConc is our concordancer. We will use BASE and texts from our students as our corpora. We will do some simple analyses to answer some language-related questions.

  8. How to use AntConc? Open the read-me file online and read it. http://www.antlab.sci.waseda.ac.jp/software/README_antconc3.2.1.txt

  9. How to install AntConc? By Laurence Anthony, Waseda University, Tokyo. http://www.antlab.sci.waseda.ac.jp/software/antconc3.2.1w.exe Open Google and search for “download AntConc”.

  10. What will we analyze? We need a collection of texts (corpus) of an adequate size.

  11. The British Academic Spoken English Corpus (BASE) • developed at the Universities of Warwick and Reading • a collection of transcripts of lectures and seminars recorded at two universities in the UK during the period 1998-2005 • recorded in a variety of university departments • organized into four broad disciplinary groups, each represented by 40 lectures and 10 seminars

  12. These groups are: • Arts and Humanities • Life and Medical Sciences • Physical Sciences • Social Studies and Sciences.

  13. Today we will use: • Arts and Humanities, 40 text files, untagged • Life and Medical Sciences, 40 text files, untagged • Physical Sciences • Social Studies and Sciences

  14. An excerpt: …now what are you reading now he asked as i put down the book and reached for my jacket i was labouring over Troilus and Criseyde reading an essay on Criseyde's character you love this rubbish eh he laughed you'll end up an old professor wanking by the fireside putting aside your pipe and warming up your hand first i should say this is not a autobiographical work [laughter] in any s-, in any way right [laughter] er sm0003: you've said that before [laughter] nm0001: warming up your hand first [laughter] i looked at him sternly only a joke man he said with mocking reassurance only a joke i sat on the bus deep in thought trying to work out why she should have betrayed him so easily why after all those pure shy exchanges the secret glances

  15. How will we analyze this corpus? Open AntConc → File → Open Files → Select the files you would like to analyze (hold Ctrl or Shift while clicking with your mouse’s left button) → Open. You will see the selected files in the left window (titled “Corpus Files”).

  16. Word List. Let’s get an idea of our corpus. What is the size of the corpus? (How many words (tokens) are there?) How many different words (types) are there? Click “Word List” → Make your selections → Start.
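
The difference between tokens and types can be checked by hand on a tiny sample. The sketch below is only an illustration: the tokenizer (lowercase letter sequences, with an optional apostrophe part) is an assumption, and AntConc's own counts depend on the token definition chosen in its settings.

```python
import re
from collections import Counter

def word_list(text):
    """Count how often each word form occurs (lowercased)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)

sample = "only a joke man he said with mocking reassurance only a joke"
freq = word_list(sample)

n_tokens = sum(freq.values())  # every running word counts
n_types = len(freq)            # each distinct word counts once
print("tokens:", n_tokens, "types:", n_types)

# The most frequent words, like the top of a word list.
for word, count in freq.most_common(3):
    print(count, word)
```

Here "only", "a" and "joke" each occur twice, so the sample has more tokens than types.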

  17. Concordance Let’s search for a single word.

  18. Activity 1: Some fun questions: Which lectures are the most fun? Which lectures did not have a lesson plan? What part of speech mostly follows a pause?

  19. How to analyze tagged corpora <struct type="tok" from="36" to="40"> <feat name="msd" value="DT" /> <feat name="base" value="this" /> <feat name="affix" value="" /> </struct> <struct type="tok" from="41" to="43"> <feat name="msd" value="VBZ" /> <feat name="base" value="be" /> <feat name="affix" value="s" /> </struct> <struct type="tok" from="29" to="34"> <feat name="base" value="right" /> <feat name="msd" value="NN" /> </struct> <struct type="tok" from="34" to="35"> <feat name="base" value="," /> <feat name="msd" value="," /> </struct>
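
Annotation like this is easier to work with once it is read into a program. The sketch below uses Python's standard `xml.etree.ElementTree` on two of the `struct` elements from the slide. It assumes that `from` and `to` are positions in the original text, that `msd` holds the part-of-speech tag and `base` the base form; the wrapper element `<root>` and the function name `read_tokens` are added only for illustration.

```python
import xml.etree.ElementTree as ET

fragment = """
<struct type="tok" from="36" to="40">
  <feat name="msd" value="DT" />
  <feat name="base" value="this" />
  <feat name="affix" value="" />
</struct>
<struct type="tok" from="41" to="43">
  <feat name="msd" value="VBZ" />
  <feat name="base" value="be" />
  <feat name="affix" value="s" />
</struct>
"""

def read_tokens(xml_fragment):
    """Return (start position, base form, tag) for every token, in text order."""
    # The fragment has several top-level elements, so wrap it in one root.
    root = ET.fromstring("<root>" + xml_fragment + "</root>")
    tokens = []
    for struct in root.iter("struct"):
        if struct.get("type") != "tok":
            continue
        feats = {f.get("name"): f.get("value") for f in struct.findall("feat")}
        tokens.append((int(struct.get("from")), feats.get("base"), feats.get("msd")))
    return sorted(tokens)

tokens = read_tokens(fragment)
for start, base, tag in tokens:
    print(start, base, tag)
```

With the tokens in this form, a question such as "which part of speech follows X?" becomes a matter of looking at neighbouring entries in the list.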

  20. Activity 2: Keyword Analysis Let’s say we want to create a dictionary of medical terms. Our analysis corpus is BAWE Life and Medical Sciences
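
A keyword analysis compares the analysis corpus with a reference corpus and ranks words that are unusually frequent in the analysis corpus. One common statistic for this is log-likelihood; the sketch below assumes that statistic and uses two invented one-sentence "corpora", so it shows the idea rather than reproducing the output of any particular tool.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Keyness of a word seen a times in a corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the analysis corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_tokens, reference_tokens, top=10):
    """Rank words that are relatively more frequent in the target corpus."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in target.items():
        b = reference[word]
        if a / c > b / d:  # keep only over-represented words
            scored.append((log_likelihood(a, b, c, d), word))
    return sorted(scored, reverse=True)[:top]

# Invented toy data, not taken from any real corpus.
target = "the patient had a fever and the patient had a cough".split()
reference = "the lecture had a plan and the lecture had a joke".split()
for score, word in keywords(target, reference):
    print(f"{score:6.2f}  {word}")
```

Function words such as "the" and "a" drop out because they are equally frequent in both corpora, which is why a keyword list is a reasonable starting point for a specialized dictionary.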

  21. Activity 3: Action Research What are the 10 lexical bundles most frequently used by American students? What are the 10 lexical bundles most frequently used by Chinese students?
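
Lexical bundles are recurring word sequences, so finding them comes down to counting n-grams. The sketch below assumes four-word sequences and requires a sequence to occur in at least two texts; both thresholds, the function name `lexical_bundles`, and the two sample "essays" are illustrative choices, not a fixed definition.

```python
from collections import Counter

def lexical_bundles(texts, n=4, min_texts=2, top=10):
    """Return the most frequent n-word sequences found in at least min_texts texts."""
    freq = Counter()    # total occurrences of each sequence
    spread = Counter()  # number of texts each sequence occurs in
    for text in texts:
        tokens = text.lower().split()
        grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        spread.update(set(grams))
    kept = [(count, gram) for gram, count in freq.items() if spread[gram] >= min_texts]
    return sorted(kept, reverse=True)[:top]

# Invented sample texts, standing in for student essays.
essays = [
    "on the other hand the results show that on the other hand",
    "it is clear that on the other hand we see",
]
bundles = lexical_bundles(essays)
for count, bundle in bundles:
    print(count, bundle)
```

Running the same function on one group's essays and then on the other's gives two ranked lists that can be compared side by side.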

  22. What else can we do? Of course, AntConc is not enough for every type of analysis. An applied linguist who wishes to analyze large amounts of language data should not only know several application programs but also learn a programming language, such as Perl.

  23. We can create a diachronic corpus from our students’ papers and observe their development. We can tag texts for their part of speech or for other information. We can automatically compile corpora from online sources.

  24. We can do all of the above for other languages (Turkish, Chinese, Russian, and so on). We can do EVERYTHING a linguist might need to do with texts.
