
Investigating speech, thought and writing presentation in a corpus of spoken British English

An AHRB-funded project under the supervision of Mick Short, Elena Semino and Tony McEnery. Research Assistants: John Heywood and Dan McIntyre.


Project outline

  • To compare speech, thought and writing presentation in spoken and written English.

  • To build a new corpus of 260,000 words of spoken British English to compare with the ST&WP Written English Corpus (1995-99).

  • To investigate the presentation of speech, thought and writing in the ST&WP Spoken Corpus by tagging with the Leech and Short (1981) category set.

  • To further test and adapt the Leech and Short (1981) model of S&TP.

  • The project is funded until February 2003.


Construction of the corpus

  • 120 texts - approximately 260,000 words.

  • Texts rich in ST&WP taken from the British National Corpus (BNC) and the Centre for North West Regional Studies (CNWRS) oral history archives at Lancaster University.

  • CNWRS interview tapes digitised to be time-aligned with text.


Number and distribution of NWRS files in the corpus

NWRS Archive

Family and Social Life Archive Childhood and Schooling Archive

Male Female Male Female

1890-1940 1940-1970 1890-1940 1940-1970

7 records 7 records 8 records 8 records 15 records 15 records

i.e. 60 files with an equal balance of male and female speakers in each age-range


Number and distribution of BNC files in the corpus

BNC spoken data

Spoken Demographic, Spoken Context-Governed

Male and Female speakers, each in six age ranges: 0-14, 15-24, 25-34, 35-44, 45-59, 60+

5 files per sex in each age range

i.e. 60 files with an equal balance of male and female speakers in each age-range
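
The arithmetic behind this grid can be checked with a minimal sketch. The counts and age ranges are taken from the slide; the variable names are illustrative assumptions, not part of the project's own tooling.

```python
from itertools import product

# Sampling grid as stated on the slide; names here are illustrative only.
SEXES = ["male", "female"]
AGE_RANGES = ["0-14", "15-24", "25-34", "35-44", "45-59", "60+"]
FILES_PER_CELL = 5

# One cell per (sex, age range) combination.
cells = {cell: FILES_PER_CELL for cell in product(SEXES, AGE_RANGES)}

total_files = sum(cells.values())
files_by_sex = {
    sex: sum(n for (s, _), n in cells.items() if s == sex) for sex in SEXES
}

print(total_files)   # 60
print(files_by_sex)  # {'male': 30, 'female': 30}
```

Twelve cells of 5 files each give the 60 files stated above, split evenly between male and female speakers.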


The development of the tag-set

Leech & Short (1981)

The ST&WP Written Project (1995…)

3 main genres: Fiction, Biography & Autobiography, and Newspaper Journalism: each divided into Serious/Popular sections.

embedded, hypothetical, inferred, quote


The development of the tag-set: new tags

The ST&WP Spoken Project (2001)

BNC spoken demographic data and NWRS oral history interviews

embedded, negative / absence, hypothetical, inferred, quote, reiterated, interrogative, imperative, uncompleted, 2 / 3 / 4
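
To see which labels are new to the spoken project, the two lists from the tag-set slides can be compared directly. This is only a sketch: the labels are copied from the slides, the variable names are illustrative, and it assumes the four labels shown for the written project are the full set intended for comparison.

```python
# Attribute labels copied from the two tag-set slides; the variable names
# are illustrative assumptions.
WRITTEN_ATTRIBUTES = ["embedded", "hypothetical", "inferred", "quote"]
SPOKEN_ATTRIBUTES = [
    "embedded", "negative / absence", "hypothetical", "inferred", "quote",
    "reiterated", "interrogative", "imperative", "uncompleted", "2 / 3 / 4",
]

# Labels listed for the spoken project but not for the written one,
# kept in slide order.
new_in_spoken = [a for a in SPOKEN_ATTRIBUTES if a not in WRITTEN_ATTRIBUTES]

print(len(SPOKEN_ATTRIBUTES))  # 10
print(new_in_spoken)
```

Under that assumption, the six labels printed are the additions: negative / absence, reiterated, interrogative, imperative, uncompleted, and 2 / 3 / 4.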


A 15-field tag-set: 5 main categories


A 15-field tag-set: 10 category attributes


Issues arising

  • Technical issues:

    • Legibility.

    • Comparability between NWRS and BNC data.

  • Tagging issues:

    • Comparability between written and spoken corpora.

    • What counts as ST&WP?

    • Functional and formal criteria.

    • Embedding.

    • Repetition (e.g. he said he said well he said).

    • Report of ‘mention’.

    • Reading, hearing, listening and singing dogs!

