Using Corpus Tools in Discourse Analysis

theola
Presentation Transcript

  1. Using Corpus Tools in Discourse Analysis • Discourse and Pragmatics, Week 12

  2. What is a corpus? • A collection of a large number of texts of a particular type, in digital format, that can be easily searched and manipulated with computer programs • What is corpus linguistics? • The analysis of collections of texts (corpora) with computer tools in order to detect grammatical, lexical or discourse-level patterns, often with the aim of comparing those patterns with those found in other collections of texts

  3. Examples of corpus-assisted discourse analysis • Flowerdew (1997, 2002): analysis of the speeches of Gov. Chris Patten and CE Tung Chee-hwa; common themes: free market economy, freedom of the individual, rule of law; divergent themes: democracy, stability and harmony • Rey (2001): Star Trek characters from 1966 to 1993; female language has shifted from being more relational to more informational; male language has shifted from being more informational to more relational

  4. Advantages of using corpora • Easily detecting grammatical and lexical patterns in a large number of texts • Reducing researcher bias • Efficiently detecting differences among varieties, registers, genres, and Discourses • Corpus-based (deductive) vs. corpus-driven (inductive) analysis

  5. Disadvantages of using corpora • Separation of discourse from its social context • Corpus data usually confined to text (cannot account for images, non-verbal behavior and other aspects of multimodal discourse) • Frequency does not equal importance (sometimes very important messages are implicit or ‘taken for granted’ rather than explicit) • ‘People don’t say what they mean and people don’t mean what they say’ • Words have multiple meanings and word meanings change over time and according to the context in which they are used

  6. Tools for corpus analysis • Online corpora and concordancers: Collins Bank of English, British National Corpus, Corpus of Contemporary American English, International Corpus of English • General vs. specialized corpora • Software tools: AntConc, ConcApp, WordSmith Tools

  7. Preparing corpora • Collecting data (Internet? Scanning files?) • Saving as plain-text (.txt) files • Separate files for different texts • ‘Cleaning’ files • ‘Tagging’
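
The preparation steps above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the presenter's procedure: the function names `clean_text` and `load_corpus` are hypothetical, and "cleaning" is assumed here to mean stripping HTML-style tags and collapsing whitespace. A real project may need quite different rules.

```python
import re
from pathlib import Path


def clean_text(raw: str) -> str:
    """Assumed cleaning rule: drop HTML-style tags, collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def load_corpus(folder: str) -> dict[str, str]:
    """Read every .txt file in `folder`, one corpus text per file."""
    return {
        path.name: clean_text(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.txt"))
    }
```

Keeping one text per file, as the slide suggests, makes it possible later to see how a word is spread across texts rather than only how often it occurs overall.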

  8. Procedures in corpus analysis • Type token ratio • Dispersion plots • Frequency lists • Concordance data • Collocation calculations • Keyword calculations

  9. Example • Lady Gaga’s lyrics • Total of 59 songs • Reference corpus: 100 top songs from November 2010

  10. Type-Token Ratio • Number of types (distinct word forms) divided by the number of tokens (running words)

  11. Type-Token Ratio • A low ratio indicates a narrow range of subjects, lack of variety or frequent repetition • A high ratio indicates a wide range of subjects, great variation, less frequent repetition • BNC Written = 45.53 • BNC Spoken = 32.96 • Baker’s Holiday Pamphlets = 40.03 • 100 Song Corpus = 9.07 • Gaga Corpus = 11.4
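
As a rough sketch of the calculation, the ratio can be computed as below. The figures on the slide look like percentages, so this example multiplies by 100; that scaling is an assumption, as are the simple regular-expression tokenizer and the function names. The sample sentence is invented for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Types divided by tokens, scaled to a percentage (assumed scaling)."""
    if not tokens:
        return 0.0
    return 100 * len(set(tokens)) / len(tokens)


tokens = tokenize("Oh oh oh baby, I want your love")
print(type_token_ratio(tokens))  # 8 tokens, 6 types -> 75.0
```

The raw ratio tends to fall as a text gets longer, because common words keep recurring. Comparisons between corpora of different sizes are therefore often made with a standardized version calculated over equal-sized chunks of text.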

  12. Frequency lists

  13. Frequency • Function words (articles, prepositions, conjunctions, pronouns, etc.) • Useful in answering questions about style, register • Pronouns can be particularly important • Content words (nouns, verbs, adjectives, adverbs) • Useful in answering questions about topics/ Discourses
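
A frequency list of the kind discussed here can be approximated in a few lines. The sketch below is illustrative only: `frequency_list` is a hypothetical name, the tokenizer is a simplification, and the sample text is invented. Each entry pairs a word with its raw count and its share of all tokens as a percentage.

```python
import re
from collections import Counter


def frequency_list(text: str, top_n: int = 5) -> list[tuple[str, int, float]]:
    """Return (word, count, percent of all tokens) for the top_n words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return [
        (word, count, round(100 * count / total, 2))
        for word, count in counts.most_common(top_n)
    ]


print(frequency_list("I want you and I want love", top_n=2))
# [('i', 2, 28.57), ('want', 2, 28.57)]
```

Separating function words from content words, as the slide does, would additionally need a stop-word list or part-of-speech tags, which this sketch leaves out.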

  14. Top 5 function words • 100 Song Corpus: I, you, the, and, it • Gaga Corpus: I, you, the, oh, me • I = 5.09%, me = 1.3%; I = 4.4%, me = 2.03% • Murphey 1992: ‘The word count revealed that the total referents in first person (I, me, my, mine, etc.) amounted to 10% of the total words’

  15. ’t (not) • 100 Song Corpus: ranked 7 (1.3%) • Gaga Corpus: ranked 9 (1.59%)

  16. Top 5 content words • 100 Song Corpus: like, no, can, baby, know; (love = 0.42%) • Gaga Corpus: love (0.98%), baby, can, want, know

  17. Concordances

  18. Concordances • Can reveal contexts of frequent words • Sorting strategies • Searching for patterns
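
A concordance in the key-word-in-context (KWIC) style can be sketched as below. This is a simplified illustration rather than how AntConc or WordSmith Tools work internally: the function name `concordance`, the `width` parameter and the invented sample text are all assumptions. Sorting on the right-hand context is shown as one possible sorting strategy.

```python
import re


def concordance(text: str, node: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, node word, right context) for each hit."""
    tokens = re.findall(r"[a-z']+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


hits = concordance("I want your love and I need your love tonight", "love", width=2)
# Sort by the words to the right of the node so similar patterns line up.
for left, node, right in sorted(hits, key=lambda line: line[2]):
    print(f"{left:>12} | {node} | {right}")
```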

  19. Concordances

  20. Collocation • ‘Co-location’ • The frequency with which words appear close to other words • ‘You shall know a word by the company it keeps.’ (Firth 1957) • Span (xL, xR): the number of words to the left and to the right of the search word that are examined
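
Counting collocates within a span can be sketched as follows. The parameters `left` and `right` stand in for the xL and xR of the span; the function name and the invented sample text are assumptions. The sketch ranks collocates by raw co-occurrence count only, whereas corpus tools commonly also offer statistical association measures.

```python
import re
from collections import Counter


def collocates(text: str, node: str, left: int = 1, right: int = 1) -> Counter:
    """Count words occurring within `left`/`right` tokens of the node word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts


result = collocates("I want love and I want you but I know", "i", left=1, right=1)
print(result.most_common(1))  # [('want', 2)]
```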

  21. Top 5 collocates for ‘I’ • 100 Song Corpus: ’m, and, can, know, ’ll • Gaga Corpus: ’m, want, ’ll, don’t, can • Span: 1L, 1R

  22. Top 5 collocates of ‘love’ • 100 Song Corpus: I, you, my, me, the • Gaga Corpus: I, fu, want, ’t, revenge • Span: 5L, 5R

  23. Keywords • The frequency of words in a corpus in relation to another corpus • The statistical significance of a keyword's frequency in a given corpus, relative to a reference corpus.
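
The slide does not say which statistic is used for keyness. One widely used option is the log-likelihood measure, and the sketch below assumes it. The function names and toy word lists are hypothetical, and both corpora are assumed to be non-empty. Here `a` and `b` are a word's frequencies in the study and reference corpora, and `c` and `d` are the corpus sizes in tokens.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood keyness of one word (assumed statistic, see lead-in)."""
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:
        total += a * math.log(a / expected_study)
    if b > 0:
        total += b * math.log(b / expected_ref)
    return 2 * total


def keywords(study: list[str], reference: list[str], top_n: int = 10):
    """Words relatively more frequent in `study`, ranked by log-likelihood."""
    study_counts, ref_counts = Counter(study), Counter(reference)
    c, d = len(study), len(reference)  # assumes both corpora are non-empty
    scored = []
    for word, a in study_counts.items():
        b = ref_counts[word]
        if a / c > b / d:  # keep only words overused in the study corpus
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


study = "love love love fame oh baby".split()
reference = "baby baby the and it love".split()
print(keywords(study, reference))
```

A word with the same relative frequency in both corpora scores 0. The larger the score, the less likely the difference in frequency is due to chance.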

  24. Keywords

  25. Keywords: semantic domains • lover* • romance* • love* • loves • fame* • fancy* • ribbons* • glitter • fashion • vanity • rich • presents • famous • retro* • bang* • shake* • dirty* • grease* • bad* • teeth • monster • filthy • oh* • eh*

  26. What does this analysis tell us about Lady Gaga’s lyrics? • Style and texture • Who’s doing what • Discourses and ideology