
Corpora in sociolinguistic studies

Corpora in sociolinguistic studies. Corpus Linguistics Richard Xiao lancsxiaoz@googlemail.com. Aims of this session. Lecture Corpora vs. sociolinguistics Some examples of corpus-based sociolinguistic studies Case study of amplifiers in the BNC Lab session



  1. Corpora in sociolinguistic studies Corpus Linguistics Richard Xiao lancsxiaoz@googlemail.com

  2. Aims of this session • Lecture • Corpora vs. sociolinguistics • Some examples of corpus-based sociolinguistic studies • Case study of amplifiers in the BNC • Lab session • Using BNCweb to explore sociolinguistic variation in the BNC

  3. Corpora vs. sociolinguistics • Sociolinguistics has traditionally focused on phonological and grammatical variations in terms of “features and rules” (de Beaugrande 1998: 133) • The use of corpus data can bring sociolinguistics “some interesting prospects” (de Beaugrande 1998: 137) • “Real data also indicate that much of the socially relevant variation within a language does not concern the phonological and syntactic variations” (de Beaugrande 1998: 133)

  4. Corpora vs. sociolinguistics • “Corpus can help sociolinguistics engage with issues and variations in usage that are less tidy and abstract than phonetics, phonology, and grammar, and more proximate to the socially vital issues of the day…corpus data can help us monitor the ongoing collocational approximation and contestation of terms that refer to the social conditions themselves and discursively position these in respect to the interests of various social groups” (de Beaugrande 1998: 135)

  5. Corpora vs. sociolinguistics • Sociolinguistics has traditionally been based upon empirical data (e.g. fieldwork), but the use of standard corpora in this field has been limited • the need to operationalize sociolinguistic theory into appropriate research questions that can be addressed using measurable categories suitable for corpus research • the lack of sociolinguistic metadata encoded in currently available corpora (with the possible exception of the BNC) • the lack of sociolinguistically rigorous sampling in corpus construction • Corpus-based sociolinguistic studies have so far largely been restricted to the area of gender studies at the lexical level (e.g. lexical items referring to men/women)

  6. Some examples • Caldas-Coulthard and Moon (1999) • Women were frequently modified by adjectives indicating physical appearance (e.g. beautiful, pretty, lovely) whereas men were frequently modified by adjectives indicating importance (e.g. key, big, great, main) • Hunston (1999) • “Right” modifying “man” is work-related (‘the right man for the job’) whereas the typical meaning of “right” co-occurring with “woman” is man-related (‘the right woman for this man’)

  7. Some examples • Holmes and Sigley (2002) used Brown/LOB and Frown/FLOB/WWC (Wellington Corpus of Written New Zealand English) to track social change in patterns of gender marking between 1961 and 1991 • “while women continue to be the linguistically marked gender, there is some evidence to support a positive interpretation of many of the patterns identified in the most recent corpora, since the relevant marked contexts reflect inroads made by women into occupational domains previously considered as exclusively male.” (ibid: 261)

  8. Some examples • Baker (2004) undertook a corpus-based keyword analysis of the debates over a Bill to equalize the age of sexual consent for gay men with the age of consent for heterosexual sex at sixteen years in the House of Lords in the UK between 1998 and 2000 • ‘homosexual’ was associated with acts whereas ‘gay’ was associated with identities • Those who argued for the reform focused on equality and tolerance; those who argued against it linked homosexuality to danger, ill health, crime and unnatural behaviour

  9. Case study • How do Britons use amplifiers? A sociolinguistic perspective • Based on Xiao, R. and Tao, H. (2006) A corpus-based sociolinguistic study of amplifiers in British English. Sociolinguistic Studies 1(2): 241-273

  10. Amplifiers in sociolinguistic studies • Amplifiers such as very, so, absolutely and totally are a common type of intensifier whose semantic function is to increase intensity • Earlier studies have largely concentrated on the structural and semantic properties of amplifiers • Since the 1970s numerous studies focusing on amplifiers have been conducted in the areas of gendered language and language and power

  11. Amplifiers in sociolinguistic studies • Stoffel (1901:101) and Jespersen (1922:250) observed, impressionistically, that the use of amplifiers was characteristic of women’s speech • Robin Lakoff, a pioneer in research on language and gender, draws attention to women’s use of amplifiers (and hedges) as a prominent feature of “powerless language” (Lakoff 1973, 1975, 1990) • Women often use expressions such as I like him so much so as to “weasel on” or be vague about the intensity of their emotions, and this effect is achieved through the semantic vagueness of amplifiers such as so (Lakoff 1975: 55)

  12. Case study of amplifiers • Xiao and Tao (2006) provide a comprehensive account of 33 common amplifiers in British English as mirrored by the attested language use in the 100-million-word BNC • How differently, if at all, do men and women use amplifiers, in quantitative and qualitative terms? • Do speakers’ age, social class and level of education affect their use of amplifiers? • Are the gender and age of audience relevant parameters of gendered language? • How are amplifiers used differently in different discourse modes and registers? • In what way has the use of amplifiers developed over the past decades?

  13. Which gender uses amplifiers more often? • The question of which gender uses amplifiers more frequently has been a long-standing issue since Stoffel (1901), Jespersen (1922) and especially Lakoff’s work in the 1970s • A number of competing and conflicting answers have been proposed to this question • Different authors have offered different answers largely because different amplifiers are studied and different databases are used • A large balanced corpus like the BNC and the wider range of amplifiers under investigation will produce more reliable results

  14. Distribution of amplifiers across genders (Frequency per M words; significant LL scores highlighted)

  15. Distribution of amplifiers across genders (Frequency per M words; significant LL scores highlighted)

  16. Distribution of amplifiers across genders • Our data provide mixed results for the observations that link amplifiers to female language use • There are internal variations in gendered preferences between speech and writing, and among individual amplifiers • When all amplifiers are taken as a whole, the difference between men and women in speech is not statistically significant (LL=0.002, 1 d.f., p=0.965) • In writing women use amplifiers significantly more frequently than men (3766.81 and 3078.10 instances per million words respectively)
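The two statistics quoted on this slide, the frequency per million words and the log-likelihood (LL) score with 1 d.f., can be sketched as follows. This is a minimal illustration, assuming the two-corpus log-likelihood formula commonly used in corpus linguistics; the slides do not state which implementation the authors used, and the function names and raw counts below are hypothetical.

```python
import math

def per_million(count, corpus_size):
    """Normalized frequency: occurrences per million words."""
    return count / corpus_size * 1_000_000

def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-corpus log-likelihood (G2) for one item.

    Expected counts assume the item is spread across both
    sub-corpora in proportion to their sizes.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def p_value_1df(g2):
    """Upper-tail chi-square probability for 1 degree of freedom."""
    return math.erfc(math.sqrt(g2 / 2))

# Hypothetical raw counts in two sub-corpora of one million words each
# (the real sub-corpus sizes are not given on the slides).
female_hits, female_words = 3767, 1_000_000
male_hits, male_words = 3078, 1_000_000

ll = log_likelihood(female_hits, female_words, male_hits, male_words)
print(per_million(female_hits, female_words), per_million(male_hits, male_words))
print(ll, p_value_1df(ll))
```

With 1 d.f., an LL score above 3.84 corresponds to p < 0.05, which is why the speech result (LL=0.002) counts as non-significant.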

  17. Distribution of amplifiers across genders • Why do women use amplifiers much more frequently than men only in writing? • A puzzle that cannot be resolved easily by Jespersen’s (1922:250) comments • “The fondness of women for hyperbole will very often lead the fashion with regard to adverbs of intensity.” • A closer look at the genre variations in the written BNC might cast some light

  18. Male and female use of amplifiers in writing • Similar in most written genres • Male use is even more frequent in Biographies and Institutional documents • Higher frequencies of use by men in these two genres are offset by exceptionally high frequency of female use in Instructional writing

  19. Male and female use of amplifiers in writing • Instructional writing is largely procedural and has an informational focus • Hardly surprising that amplifiers are relatively infrequent in instructional texts such as manuals • Women’s exceptionally high frequency of amplifier use in this genre may be associated with “their greater emotional expressiveness and sociability” (Carli 1990)

  20. Male and female preferences • The higher normalized frequency for women in writing is mainly the result of three amplifiers (really, very, quite) • Men and women demonstrate different preferences for individual amplifiers, supporting the findings reported in some earlier works (e.g. Bradac et al. 1995, Stenström 1999)

  21. Typical male and female usages • Some amplifiers (perfectly, pretty, totally, very) are used more frequently by men in speech only while the same items are used more frequently by women in writing only • Men use more maximizers indicating the upper extreme of a scale (e.g. absolutely, completely, utterly) while women are likely to use more boosters denoting a high degree on a scale (e.g. badly, greatly, so)

  22. Amplifiers vs. power • Since Lakoff (1975), intensifiers (amplifiers and hedges) have become one of the important linguistic features in research of gendered language as well as in studies of language and power • Is the frequent use of amplifiers an indicator of powerful or powerless language? • Debatable and without a simple answer • Any judgement of the contribution of amplifiers to a powerful or powerless language style must be subjective

  23. Amplifiers vs. power • In the tradition established by Lakoff (1975) and followed by Holmes (2001), amplifiers and hedges, among other types of intensifiers, are indicators of tentativeness and hesitancy, which characterize a powerless language style • However, there are also many studies which regard amplifiers as an indicator of powerful language • e.g. McEwen and Greenberg (1970), Newcombe and Arnkoff (1979), Bradac, Schneider, Hemphill and Tardy (1980), and Bradac and Mulac (1984)

  24. Amplifiers vs. power • Apart from such different interpretations of the role of intensifiers in language vs. power, the dynamic interaction between linguistic features such as intensifiers with other constraints makes it even more difficult to judge the contribution of amplifiers to power and language • Amplifiers do not appear to display consistent behaviour in affecting a powerful or powerless language style • Inconsistent behaviour of amplifiers is the result of “a danger of seeing what you want to see” (Swann 1992: 198), depending upon “the observer’s situation and expectations” (Mizokami 2001: 143) • Conclusion: Inclusion of intensifiers in powerful or powerless speech style messages may produce spurious, unreliable results (cf. Hosman 1989: 402)

  25. Distribution of amplifiers across age groups • Poynton (1990) notes that the use of amplifiers is generational, e.g., middle-aged and elderly people use amplifiers more frequently than young people • Singh (2005) also notes a qualitative difference • Elderly people frequently use very, the mid-generation uses really, while teenagers prefer so

  26. Distribution of amplifiers across age groups • Young people (aged 15-24 & 25-34) generally use amplifiers more often than older people in both speech and writing • The two discourse modes also display a marked contrast for some age groups (e.g. 0-14)

  27. Distribution of amplifiers across age groups • In speech, those aged 15–24 and 25–34 are the most frequent users of amplifiers while children below 15 use amplifiers least frequently • Children below 15 are the most frequent users of amplifiers in the written BNC, though young people aged 15–24 and 25–34 still use amplifiers as frequently as they do in speech • While young children’s (aged 0–14) under-use of amplifiers in speech might be explained from a developmental perspective, this approach cannot explain their over-use in writing • Stoffel (1901:102) referred to children as “ladies’ men”, meaning that like ladies – and as a result of being influenced by ladies, who are likely to spend more time with children – children also prefer the use of amplifiers • This observation is only partly supported by our data, i.e. only in the written BNC

  28. Cross-tab of age and gender in writing • A cross-tabulation of age and gender provides a partial explanation as to why children under 15 are the most frequent users of amplifiers in writing • The high frequency of use by this age group is mostly contributed by female writers (with a female/male ratio of 2.01)

  29. Distribution of amplifiers across age groups • The exceptionally high frequency of use by children in the written BNC is also the result of skewed distribution of amplifiers across genres • Amplifiers are most common in personal letters (8231.71 per M words), school essays (6353.65), and email (4402.83), in comparison with an average of 2822.4 for all written genres in the BNC • Samples of writing by children below 15 are only found in two genres, school essays and miscellaneous writing (with NF = 2867.1, also above the average), whereas they are not represented in genres with a low frequency of amplifiers (e.g. administrational texts)

  30. Qualitative difference across ages • The three most common amplifiers (very, quite, really) are used by all age groups in both speech and writing • Absolutely and bloody are used frequently by most age groups in speech whereas particularly is used frequently by most age groups in writing only • Adolescents’ use of amplifiers is largely restricted to a small handful of the most commonly used items whereas they have a larger number of infrequent items in comparison with other age groups • Children under 15 do not appear to use amplifiers other than very, often and really frequently, especially in speech • In writing, there are 13 infrequent items with a frequency below 10 for this age group while the corresponding figures for other age groups range from 4 to 7

  31. Social classes in the BNC • AB = (upper) middle class • managerial and professional • C1 = lower middle class • supervisory and clerical • C2 = skilled working class • skilled manual • DE = working class/underclass • unskilled manual and unemployed

  32. Distribution of amplifiers across social classes • The frequency of amplifiers declines steadily from AB to C1, and then to C2/DE, but the contrast between C1 and DE is not as marked as those for AB versus C1, and for C1 versus C2 • DE shows a higher frequency than C2 because the swearword included in this study, bloody, is much more frequent in the data for DE than for other classes • bloody is a frequent amplifier (over 100 instances per M words) used by all social classes except AB, with its NF increasing steadily from higher to lower class • AB (70.24), C1 (169.89), C2 (226.47), DE (300.98) LL=612.64 for 3 d.f., p<0.001
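A log-likelihood score over four social classes has 3 d.f. because it compares four observed counts with four expected counts. The sketch below shows one way such a k-group score could be computed. It is an illustration only: the slides give normalized frequencies but not the raw counts or sub-corpus sizes, so the inputs here (rounded frequencies treated as counts in equal-sized sub-corpora of one million words) are hypothetical and will not reproduce LL=612.64.

```python
import math

def log_likelihood_k(counts, sizes):
    """Log-likelihood (G2) across k sub-corpora; d.f. = k - 1."""
    total_count = sum(counts)
    total_size = sum(sizes)
    g2 = 0.0
    for observed, size in zip(counts, sizes):
        # Expected count if the item were spread evenly by corpus size
        expected = size * total_count / total_size
        if observed > 0:
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Standard chi-square critical values for 3 degrees of freedom
CRITICAL_3DF = {0.05: 7.81, 0.01: 11.34, 0.001: 16.27}

# Hypothetical inputs: classes AB, C1, C2, DE, one million words each
counts = [70, 170, 226, 301]
sizes = [1_000_000] * 4

ll = log_likelihood_k(counts, sizes)
print(ll, ll > CRITICAL_3DF[0.001])
```

Any score above 16.27 is significant at p < 0.001 with 3 d.f., which is the threshold the reported value comfortably exceeds.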

  33. Tag questions: Sociolinguistic variation • Forms of tag questions • It's a joke isn't it? • Still, I can't win them all, can I? • Pragmatic functions of tag questions (Holmes 1983) • Facilitating • Softening • Challenging

  34. BNCWeb signup / login http://bncweb.lancs.ac.uk/bncwebSignup/user/login.php

  35. Extracting tag questions in the BNC • Search pattern: (n't | _VM0 | _V[BDH][BDZ]) _PNP \? • N.B. Mind the white spaces in the search pattern!
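The search pattern asks for three consecutive tokens: an operator (the word n't, a modal tagged VM0, or a form of be/do/have tagged VB*, VD* or VH*), then a personal pronoun (PNP), then a literal question mark. The sketch below mimics that logic over a hand-tagged token list. It is a rough approximation for illustration, not BNCweb's implementation; the function names are hypothetical and the tags are assigned by hand on the assumption that they follow the BNC's CLAWS5 tagset.

```python
import re

# Mirrors _V[BDH][BDZ]: be/do/have in base, past or -s form
BE_DO_HAVE = re.compile(r"V[BDH][BDZ]")

def is_operator(word, tag):
    """First slot of the pattern: n't, a modal, or be/do/have."""
    return (
        word.lower() == "n't"
        or tag == "VM0"
        or BE_DO_HAVE.fullmatch(tag) is not None
    )

def find_tag_questions(tokens):
    """Return start indices of operator + PNP + '?' sequences.

    tokens is a list of (word, tag) pairs.
    """
    hits = []
    for i in range(len(tokens) - 2):
        (w1, t1), (_, t2), (w3, _) = tokens[i:i + 3]
        if is_operator(w1, t1) and t2 == "PNP" and w3 == "?":
            hits.append(i)
    return hits

# Hand-tagged versions of the two example sentences (tags are assumptions)
joke = [("It", "PNP"), ("'s", "VBZ"), ("a", "AT0"), ("joke", "NN1"),
        ("is", "VBZ"), ("n't", "XX0"), ("it", "PNP"), ("?", "PUN")]
win = [("Still", "AV0"), (",", "PUN"), ("I", "PNP"), ("ca", "VM0"),
       ("n't", "XX0"), ("win", "VVI"), ("them", "PNP"), ("all", "DT0"),
       (",", "PUN"), ("can", "VM0"), ("I", "PNP"), ("?", "PUN")]

print(find_tag_questions(joke))  # match starts at "n't"
print(find_tag_questions(win))   # match starts at "can"
```

In the first sentence the match begins at n't rather than at is, because the pattern only needs one operator token directly before the pronoun.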

  36. BNCWeb Distribution

  37. Encoded metadata

  38. Speaker gender

  39. Speaker age

  40. Speaker social class For a full study of tag questions, see Tottie, Gunnel and Hoffmann, Sebastian (2006) Tag Questions in British and American English. Journal of English Linguistics. 34(4):283-311

  41. Extra practice with BNCweb • Some facts (Daily Mail, 19/01/2009) • 87% of Britons swear on a daily basis • They swear 14 times a day on average

  42. Extra practice with the BNCweb • Questions • Do female speakers swear more often than males? • Which age groups swear frequently and which tend to avoid swearing? • Is swearing related to the social class of the speaker? • Do people swear more often when they speak or when they write?

  43. Extra practice with the BNCweb • Case study: sociolinguistic variation of f**k in the BNC • “one of the most interesting and colourful words in the English language today” that can be used to describe pain, pleasure, hatred and even love (Andersson and Trudgill, 1992: 60) • Tip: search for {fuck} • so-called “lemma” search • For a full study of the swear word, see McEnery, Tony and Xiao, Richard (2004) Swearing in Modern British English: The Case of Fuck in the BNC. Language and Literature 13: 235-268
