Corpus analysis (2)

Corpus Linguistics

Richard Xiao



Outline of the session

  • Lecture

    • Keyword

    • Reference corpus

    • Key keyword

  • Practical

    • WST keyword

    • AntConc keyword

    • Wmatrix keyword / key concept

    • Extra: keyword analysis with CQPweb


What is a keyword?

  • Keywords are those words whose frequency is exceptionally high (positive keywords) or low (negative keywords) in comparison with a reference corpus

    • Keywords usually refer to positive keywords

    • But negative keywords are equally interesting (see Xiao and McEnery 2005)

      • They appear at the very end of your listing, in a different colour in WordSmith

      • They are omitted automatically from a keywords database for key keyword analysis and a keyword plot


Why keyword analysis?

  • Indicating the ‘aboutness’ (Scott 1999) of a particular text or corpus

    • Content analysis, discourse analysis

  • Also revealing the salient features which are functionally related to a particular genre (Xiao and McEnery 2005)

    • Genre analysis, stylistic analysis


How to do keyword analysis

  • Make a wordlist of the target corpus

  • Locate or make a word list of a reference corpus

    • Scott (2005) “In search of a bad reference corpus”

      • http://www.methodsnetwork.ac.uk/redist/pdf/es1_05scott.pdf

    • The reference corpus is usually larger than the target corpus

    • The appropriateness of a reference corpus depends on your research questions!

  • Compare the frequency of each item in the two wordlists to extract keywords – done automatically

  • Analyse and interpret keywords – you will do it!
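The comparison step above can be sketched in Python. This is a minimal illustration, not what WordSmith does internally: it assumes log-likelihood as the keyness statistic (the slides do not name one), uses a naive regex tokeniser, and runs on two tiny invented texts. The helper names `tokenize`, `log_likelihood` and `keywords` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately naive)."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word occurring a times in a target corpus
    of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tokens, reference_tokens):
    """Return (word, keyness, polarity) tuples sorted by descending keyness."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c = sum(target.values())
    d = sum(reference.values())
    rows = []
    for word in set(target) | set(reference):
        a, b = target[word], reference[word]
        score = log_likelihood(a, b, c, d)
        # Positive: relatively more frequent in the target than in the reference.
        # Words with identical relative frequency score 0 and are not key either way.
        polarity = "positive" if a / c > b / d else "negative"
        rows.append((word, score, polarity))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented mini-texts standing in for the target and reference corpora
target = tokenize("Britain needs leadership. Leadership for Britain, leadership for the economy.")
reference = tokenize("The cat sat on the mat. The dog sat on the rug and the economy went on.")
top = keywords(target, reference)
for word, score, polarity in top[:3]:
    print(f"{word}\t{score:.2f}\t{polarity}")
```

In real use the reference wordlist would come from a large corpus such as the BNC, and only words whose score passes a significance threshold would be reported as keywords.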


Keywords in the party speeches

  • Target corpus – just one text

    • David Cameron's speech at the Conservative conference (10 October 2012, Manchester)

      • http://www.bbc.co.uk/news/uk-politics-15189614

      • Local copy available (David_speech Unicode text) - download and unzip the file into a file folder:

        www.fass.lancs.ac.uk/projects/corpus/data/workshop3texts.zip

  • Reference corpus

    • The 100-million-word BNC: download and unzip (local copy available)

      www.lexically.net/downloads/version4/BNC_World.zip

  • Tool

    • WST Keyword


Wordlist of David’s speech


Creating keyword list


Keyword extraction in progress

Warning: It can take time if you have loaded two large wordlists


Keywords in David’s speech

What do these keywords tell us?

Negative keyword


Keyword: Plot view


What companies do keywords keep?


Why “marriage”?


Key clusters

Similar to word clusters, but only keywords are used.


Key keywords

  • A key keyword is one which is "key" in more than one of a number of related texts

    • The more texts it is "key" in, the more "key key" it is

    • Can avoid extracting keywords which are unusually frequent in only a small number of files

  • Can be created automatically and are as simple to extract as keywords

  • n.b. Negative keywords are omitted automatically from a key keyword list
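As a rough sketch of the idea above, a key keyword list can be built by counting, for each word, the number of texts in which it is a positive keyword. The keyword sets below are invented for illustration, and `key_keywords` and `min_texts` are hypothetical names rather than WordSmith settings.

```python
from collections import Counter


def key_keywords(keywords_by_text, min_texts=2):
    """Count in how many texts each word is key and keep words that are key
    in at least min_texts texts.

    keywords_by_text maps a text name to the set of its positive keywords
    (negative keywords are assumed to have been left out already).
    """
    counts = Counter()
    for kws in keywords_by_text.values():
        counts.update(set(kws))  # each text contributes at most once per word
    kept = [(word, n) for word, n in counts.items() if n >= min_texts]
    # Most widely key words first; ties broken alphabetically for stable output
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Invented keyword sets for three related texts
example = {
    "speech_a.txt": {"britain", "leadership", "deficit"},
    "speech_b.txt": {"britain", "deficit", "nhs"},
    "speech_c.txt": {"britain", "jobs"},
}
result = key_keywords(example)
print(result)  # [('britain', 3), ('deficit', 2)]
```

Words that are key in only one text ("leadership", "nhs", "jobs" here) drop out, which is the filtering effect described in the bullets above.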


Making a batch wordlist

Specify a folder where you can write


Batch making keyword lists


Batch making keyword lists

Specify a folder where you can write


Making a KW database


Key keywords

An "associate" is a keyword that appears in the same text as a given key keyword

key coverage of the corpus
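The notion of an associate can be illustrated with a short sketch: for a chosen key keyword, collect the other keywords of every text in which it is key and count how often each co-occurs. The data and the helper name `associates` are invented for illustration and do not reflect WordSmith's internal procedure.

```python
from collections import Counter


def associates(keywords_by_text, key_keyword):
    """Count how often other keywords are key in the same texts as key_keyword."""
    counts = Counter()
    for kws in keywords_by_text.values():
        if key_keyword in kws:
            counts.update(word for word in set(kws) if word != key_keyword)
    # Most frequent associates first; ties broken alphabetically
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented keyword sets for three related texts
example = {
    "speech_a.txt": {"britain", "leadership", "deficit"},
    "speech_b.txt": {"britain", "deficit", "nhs"},
    "speech_c.txt": {"britain", "jobs"},
}
deficit_associates = associates(example, "deficit")
print(deficit_associates)  # [('britain', 2), ('leadership', 1), ('nhs', 1)]
```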


Keyword in AntConc

target corpus

reference corpus


Keyword in AntConc

Keywords in David's speech (in relation to Ed's speech)


Wmatrix: Keywords and key concepts

  • POS and semantic tagging

  • Keyword / key concept analysis in Cameron’s speech in comparison with Miliband’s speech

  • Copy and paste the speeches into two separate text files

    • http://www.bbc.co.uk/news/uk-politics-15189614

    • http://www.labour.org.uk/ed-milibands-speech-to-labour-party-conference

  • Save the two texts as David_speech.txt and Ed_speech.txt

    www.fass.lancs.ac.uk/projects/corpus/data/workshop3texts.zip


Wmatrix: Keywords and key concepts

  • Log in using your zhejiangxx account

    • http://ucrel.lancs.ac.uk/wmatrix3.html


Tagging Wizard


Tagging in progress


Tagging result


Labour frequency list


KWIC concordance


“My folders”

Upload and tag Ed’s speech

…and click on “My folders”

Warning: Your folder view may look different!


Open David_speech folder and select Ed_speech in “Keyword compared to” dropdown box


Keyword list to download!


Keyword cloud – even more interesting!


David’s key concepts (“Key concepts compared to”)


Keyword analysis in online corpora

  • Using Lancaster’s CQPweb to compare British English (LOB+FLOB) and American English (Brown + Frown)

  • Log in to CQPweb

    • http://cqpweb.lancs.ac.uk

  • Similar analysis can be done at BFSU’s CQPweb corpus hub (different corpora)

    • http://124.193.83.252/cqp/

    • Account: ID=pass=test


Creating subcorpora


Creating subcorpus BrE


Creating subcorpus AmE


Making wordlists


Wordlist available now


Computing keywords

You can adjust the statistical measure, cut-off point, and minimum frequency according to your research purposes.
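The cut-off point and minimum frequency settings can be pictured as filters over a scored word list. The sketch below is an illustration under stated assumptions, not CQPweb's implementation: it presumes log-likelihood scores have already been computed, uses the standard chi-square critical values for one degree of freedom as cut-off points, and works on invented candidate rows. `filter_keywords` and its parameters are hypothetical names.

```python
# Chi-square critical values (1 degree of freedom), commonly used as
# log-likelihood cut-off points for the given significance levels
CRITICAL_VALUES = {0.05: 3.84, 0.01: 6.63, 0.001: 10.83, 0.0001: 15.13}


def filter_keywords(rows, p=0.01, min_freq=3):
    """Keep rows whose score reaches the cut-off for p and whose frequency
    in the target corpus is at least min_freq.

    Each row is (word, target_frequency, score).
    """
    cutoff = CRITICAL_VALUES[p]
    return [row for row in rows if row[2] >= cutoff and row[1] >= min_freq]


# Invented candidate rows: (word, frequency in target corpus, keyness score)
candidates = [
    ("colour", 120, 95.2),
    ("whilst", 40, 12.1),
    ("towards", 2, 30.5),  # high score but too rare
    ("the", 5000, 1.3),    # frequent but not significant
]
strict = filter_keywords(candidates, p=0.0001, min_freq=3)
lenient = filter_keywords(candidates, p=0.01, min_freq=1)
print([row[0] for row in strict])
print([row[0] for row in lenient])
```

A stricter cut-off and a higher minimum frequency give a shorter, more conservative keyword list; looser settings surface more candidates at the cost of more noise.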


Keywords in BrE and AmE

