
Corpus analysis (2)



Presentation Transcript


  1. Corpus analysis (2) Corpus Linguistics Richard Xiao lancsxiaoz@googlemail.com

  2. Outline of the session • Lecture • Keyword • Reference corpus • Key keyword • Lab • WST keyword • AntConc keyword • Wmatrix keyword / key concept

  3. What is a keyword? • Keywords are words whose frequency is exceptionally high (positive keywords) or exceptionally low (negative keywords) in comparison with a reference corpus • "Keywords" usually refers to positive keywords • But negative keywords are equally interesting (see Xiao and McEnery 2005) • In WordSmith, negative keywords appear at the very end of the keyword listing, in a different colour • They are omitted automatically from a keywords database and a keyword plot
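
The positive/negative distinction can be made concrete with a small sketch. The slides do not say which statistic is used, so this assumes log-likelihood, one of the measures commonly used for keyness; the function name `keyness` and the sign convention are illustrative choices, not something taken from WordSmith.

```python
import math

def keyness(freq_target, size_target, freq_ref, size_ref):
    """Signed log-likelihood keyness (assumed statistic, see lead-in).

    Positive result: the word is relatively more frequent in the target
    corpus (positive keyword). Negative result: relatively less frequent
    (negative keyword). The magnitude is the log-likelihood score.
    """
    total = freq_target + freq_ref
    # Expected frequencies if the word were equally common in both corpora
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)

    ll = 0.0
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    ll *= 2

    # Flip the sign when the word is under-represented in the target corpus
    if freq_target / size_target < freq_ref / size_ref:
        ll = -ll
    return ll
```

For example, a word occurring 50 times in a 1,000-word text but only 100 times in a 100,000-word reference corpus gets a positive score, while a word that is proportionally rarer in the target text gets a negative one.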

  4. Why keyword analysis? • Keywords indicate the ‘aboutness’ (Scott 1999) of a particular text or corpus • Content analysis, discourse analysis • They also reveal salient features that are functionally related to a particular genre (Xiao and McEnery 2005) • Genre analysis, stylistic analysis

  5. How to do keyword analysis • Make a wordlist of the target corpus • Locate or make a wordlist of a reference corpus • Scott (2005) “In search of a bad reference corpus” • http://www.methodsnetwork.ac.uk/redist/pdf/es1_05scott.pdf • The reference corpus is usually larger than the target corpus • The appropriateness of a reference corpus depends on your research questions! • Compare the frequency of each item in the two wordlists to extract keywords – this is done automatically by the tool • Analyse and interpret the keywords – this part is up to you!
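
The steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not what WordSmith does internally: the tokeniser is a simple regular expression, the statistic is assumed to be log-likelihood, the cut-off of 3.84 (the p < 0.05 critical value for one degree of freedom) is an arbitrary choice, and the two miniature "corpora" are invented for the example.

```python
import math
import re
from collections import Counter

def make_wordlist(text):
    """Steps 1 and 2: a frequency wordlist (crude lowercase tokeniser)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def log_likelihood(a, b, c, d):
    """a, b = frequency in target / reference; c, d = corpus sizes."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    return 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o > 0)

def extract_keywords(target, reference, min_ll=3.84):
    """Step 3: compare the two wordlists; returns positive keywords only."""
    n_t, n_r = sum(target.values()), sum(reference.values())
    rows = []
    for word, a in target.items():
        b = reference.get(word, 0)
        ll = log_likelihood(a, b, n_t, n_r)
        # Keep words that are over-represented in the target corpus
        if ll >= min_ll and a / n_t > b / n_r:
            rows.append((word, a, b, round(ll, 2)))
    return sorted(rows, key=lambda r: r[3], reverse=True)

# Invented toy data, for illustration only
target = make_wordlist("pension reform pension reform the public the pension")
reference = make_wordlist("the cat sat on the mat and the dog sat on the log " * 20)
keywords = extract_keywords(target, reference)
for word, f_target, f_ref, ll in keywords:
    print(word, f_target, f_ref, ll)
```

Note that "the" is frequent in the toy target text but does not come out as a keyword, because it is just as frequent in the reference wordlist. Step 4, interpretation, stays with the analyst.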

  6. Keywords in the Blair text • Target corpus – just one text • ‘Why Blair is so determined not to run into sands’ (The Times, 16th November 2005) • http://www.timesonline.co.uk/tol/news/politics/article590683.ece • Local copy available • Reference corpus • The 100-million-word BNC • Tool • WST Keyword

  7. Wordlists of the Blair text and the BNC • BNC list: www.lexically.net/downloads/version4/BNC_World.zip

  8. Creating keyword list

  9. Keyword extraction in progress • Warning: It can take time if you have loaded two large wordlists

  10. Keywords in the Blair text

  11. … our choice of “reference corpus” is important …!

  12. Keyword: Plot view
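
A plot view shows where in a text each keyword occurs. A rough text-only approximation is sketched below; the function name `dispersion` and the bar rendering are hypothetical and are not meant to reproduce WordSmith's display.

```python
def dispersion(tokens, keyword, width=40):
    """Return the token positions of `keyword` and a crude text 'plot'.

    The plot is a string of `width` cells; a '|' marks each cell of the
    text in which the keyword occurs at least once.
    """
    if not tokens:
        return [], "." * width
    positions = [i for i, tok in enumerate(tokens) if tok == keyword]
    bar = ["."] * width
    for i in positions:
        # Scale the token position to a cell index in the bar
        bar[min(i * width // len(tokens), width - 1)] = "|"
    return positions, "".join(bar)

tokens = "a b public c d public".split()
positions, plot = dispersion(tokens, "public", width=6)
print(positions, plot)
```

A keyword whose marks cluster in one part of the bar is local to that part of the text; marks spread across the whole bar suggest a topic that runs throughout.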

  13. What companies do keywords keep?

  14. Companies of “public” • The most frequent company of the keyword “public” is...?
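
The "company" a keyword keeps can be approximated by counting the words that occur within a few tokens of it. The sketch below is a minimal illustration; the span of 5 words on each side is an assumed setting, and the sample sentence is invented.

```python
from collections import Counter

def collocates(tokens, node, span=5):
    """Count words occurring within `span` tokens either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            # Another occurrence of the node inside the window is counted too
            counts.update(left + right)
    return counts

# Invented sample, for illustration only
tokens = "the public services and the public sector need public services".split()
company = collocates(tokens, "public", span=1)
print(company.most_common())
```

Raw co-occurrence counts favour very frequent words such as "the", so in practice a stop list or an association measure is usually applied before interpreting the list.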

  15. Key clusters

  16. Key keywords • A key keyword is one which is "key" in more than one of a number of related texts • The more texts it is "key" in, the more "key key" it is • Key keywords avoid extracting words which are unusually frequent in only a small number of files • They can be created automatically and are as simple to extract as keywords • N.B. Negative keywords are omitted automatically from a key keyword list
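
Given one keyword list per text, key keywords are a matter of counting how many texts each word is key in. The sketch below assumes the keyword lists have already been extracted; the file names, the lists themselves, and the threshold `min_texts` are invented for illustration.

```python
from collections import Counter

def key_keywords(keyword_lists, min_texts=2):
    """Return (word, number of texts in which it is key), most 'key key' first.

    `keyword_lists` maps a text name to that text's (positive) keywords.
    """
    counts = Counter()
    for keywords in keyword_lists.values():
        # set() so that a word counts once per text
        counts.update(set(keywords))
    return [(w, n) for w, n in counts.most_common() if n >= min_texts]

# Invented keyword lists, for illustration only
keyword_lists = {
    "text1.txt": ["public", "reform", "blair"],
    "text2.txt": ["public", "tax"],
    "text3.txt": ["public", "reform"],
}
print(key_keywords(keyword_lists))
```

Here "blair" and "tax" drop out because each is key in only one text, which is exactly the filtering effect described on the slide.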

  17. Making a batch wordlist

  18. Batch making keyword lists

  19. Batch making keyword lists

  20. Making a KW database

  21. Key keywords • An "associate" is a keyword that appears in the same text • Key coverage of the corpus
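
Following the slide's definition, the associates of a key keyword can be found by looking at which other words are key in the same texts. This is a sketch of that definition only; the tool's own computation may differ, and the keyword lists below are invented.

```python
from collections import Counter

def associates(keyword_lists, node):
    """Count, for each other keyword, the texts in which it is key alongside `node`."""
    counts = Counter()
    for keywords in keyword_lists.values():
        keywords = set(keywords)
        if node in keywords:
            counts.update(keywords - {node})
    return counts

# Invented keyword lists, for illustration only
keyword_lists = {
    "text1.txt": ["public", "reform", "blair"],
    "text2.txt": ["public", "tax"],
    "text3.txt": ["public", "reform"],
}
public_associates = associates(keyword_lists, "public")
print(public_associates.most_common())
```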

  22. Keyword in AntConc • Target corpus • Reference corpus

  23. Keyword in AntConc • Blair text against "Hard Cash"

  24. Wmatrix: Keywords and key concepts • POS and semantic tagging were covered in session 4 • Keyword / key concept analysis of the manifestoes of Labour and Libdem • Labour • http://ucrel.lancs.ac.uk/wmatrix/tutorial/labour%20manifesto%202005.pdf • Libdem • http://ucrel.lancs.ac.uk/wmatrix/tutorial/libdem%20manifesto%202005.pdf • Saved as plain text files (local copies available) • Log in with your account • http://ucrel.lancs.ac.uk/wmatrix2.html

  25. Tagging Wizard

  26. Tagging in progress

  27. Tagging result

  28. Labour frequency list

  29. KWIC concordance
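
For readers without access to the tool, a key-word-in-context (KWIC) display can be approximated as follows. This is a minimal sketch with an assumed context width of four tokens and an invented sample sentence; it does not reproduce the Wmatrix output.

```python
def kwic(tokens, node, width=4):
    """Return one concordance line per occurrence of `node` (case-insensitive)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the node words line up
            lines.append(f"{left:>30}  [{tok}]  {right}")
    return lines

# Invented sample, for illustration only
sample = "We will invest in public services and reform the public sector".split()
concordance = kwic(sample, "public")
for line in concordance:
    print(line)
```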

  30. “My folders” • Upload and tag the Libdem text • …and click on “My folders” • Warning: Your folder view may look different!

  31. Open Labour folder and select libdem in “keyword compared to” dropdown box

  32. Keyword list to download!

  33. Keyword cloud – even more interesting!

  34. New Labour’s key concepts (“Key concepts compared to”)
