
Annotating the HKCSE Pragmatically

Martin Weisser, Visiting Professor, School of English and Education, Guangdong University of Foreign Studies. Mail: weissermar@gmail.com. Web: martinweisser.org. Outline: The Conversion Process; Pre-processing Requirements; Annotation & Post-processing.


Presentation Transcript


  1. Annotating the HKCSE Pragmatically. Martin Weisser, Visiting Professor, School of English and Education, Guangdong University of Foreign Studies. mail: weissermar@gmail.com; web: martinweisser.org

  2. Outline
     • The Conversion Process
     • Pre-processing Requirements
     • Annotation & Post-processing
     • Searching & Exploring the Corpus
     • Conclusion

  3. The Conversion Process I – Issues
     • how to convert to DART XML format?
       • identify original conventions
         • some documented in Cheng et al. (2008)
         • some undocumented
       • use tone unit marking?
         • unfortunately, tone units in Brazil’s system for ‘discourse intonation’ ≠ C-units
         • → no ‘sentence’ intonation inferable directly
       • remove prosodic information, apart from stress and tone movements, to ensure readability
       • handle overlap
         • exact extent not marked or inferable
         • → better to delete
       • etc.
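The clean-up decisions on this slide (delete overlap marking, drop tone-unit marking, keep stress and tone movements) can be illustrated with a minimal Python sketch. The input notation used here is an assumption made purely for illustration and is not the documented HKCSE convention: braces for tone units, square and angle brackets for other prosodic detail, asterisks for overlap, capitals for stress, and `\`, `/`, `=` for tone movements. The function name `simplify_line` is likewise hypothetical.

```python
import re

# Assumed, hypothetical input notation (not the documented HKCSE conventions):
#   { ... }      tone unit boundaries      -> removed
#   [ ] and < >  further prosodic detail   -> removed
#   * and **     overlap markers           -> removed
#   CAPITALS     stressed syllables        -> kept
#   \ / =        tone movements            -> kept

def simplify_line(line: str) -> str:
    """Strip overlap and bracket markup, keeping stress and tone symbols."""
    line = re.sub(r"\*+", "", line)          # delete overlap markers
    line = re.sub(r"[{}\[\]<>]", "", line)   # delete tone-unit and other brackets
    return re.sub(r"\s+", " ", line).strip() # normalise leftover whitespace

print(simplify_line(r"{ \ [ WELL ] } * { = i < THINK > so } **"))
# prints: \ WELL = i THINK so
```

A real conversion script would need one rule per documented convention, plus further rules for the undocumented ones mentioned above.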

  4. The Conversion Process II – Original Format

  5. The Conversion Process III – the Conversion Editor
     • save output
     • original input file
     • conversion result view
     • conversion script editor

  6. The Conversion Process IV – Conversion Results
     • converted to DART XML format
     • retained stress marking
     • converted & moved tone marking
     • converted ‘non-speech’ to comments
     • added gender attribute
     • added speaker type attribute
     • moved pauses to next turn
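To make the listed results more concrete, the following Python sketch builds a single turn carrying gender and speaker-type attributes, with a non-speech event stored as an XML comment. The element name `turn`, the attribute names, and the helper `make_turn` are assumptions for illustration only; the actual DART XML schema is not shown in the slides and may differ.

```python
import xml.etree.ElementTree as ET

def make_turn(n, speaker, gender, speaker_type, text, non_speech=None):
    """Build one turn element (hypothetical element and attribute names)."""
    turn = ET.Element("turn", {
        "n": str(n),
        "speaker": speaker,
        "gender": gender,        # added gender attribute
        "type": speaker_type,    # added speaker type attribute
    })
    turn.text = text             # stress (capitals) is retained in the text
    if non_speech:
        # non-speech events become XML comments rather than text
        turn.append(ET.Comment(f" {non_speech} "))
    return turn

turn = make_turn(1, "a", "f", "native", "i THINK so", non_speech="laugh")
print(ET.tostring(turn, encoding="unicode"))
```

Storing non-speech events as comments keeps them in the file while hiding them from any processing that reads only element text.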

  7. Pre-processing Requirements
     • creating new resources in/for DART
       • adapt DART modules to handle mixed case
       • ‘synthesise’ domain-specific lexicon
       • create domain-specific topic ‘thesaurus’
     • pre-processing
       • fix conversion errors
       • identify/mark incomplete words
       • split turns
       • add punctuation, partly based on original prosodic features
       • etc.
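One of the pre-processing steps, identifying and marking incomplete words, might be sketched as follows. This assumes, purely for illustration, that a truncated word is transcribed with a trailing hyphen (as in `wh-`) and that it is wrapped in a hypothetical `<incomplete>` tag; neither convention is confirmed by the slides.

```python
import re

# Assumption: a truncated word ends in a hyphen that is followed by
# whitespace or the end of the line; hyphenated compounds are left alone.
INCOMPLETE = re.compile(r"\b([A-Za-z]+)-(?=\s|$)")

def mark_incomplete(text: str) -> str:
    """Wrap truncated words in a hypothetical <incomplete> tag."""
    return INCOMPLETE.sub(r"<incomplete>\1</incomplete>", text)

print(mark_incomplete("i wh- what about the pre-processing"))
# prints: i <incomplete>wh</incomplete> what about the pre-processing
```

The lookahead is what keeps ordinary hyphenated words such as "pre-processing" untouched.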

  8. Annotation & Post-processing I – Steps
     • annotation in DART
       • fully automated
       • less than 80 sec for
         • 24 files
         • ~72,100 words
         • ~10,300 C-units
     • post-processing to fix potential errors on the levels of
       • syntax: potentially missing syntax rules
       • pragmatics: missing inferencing rules or modes (‘IFIDs’)
       • semantics: incorrectly identified topics

  9. Annotation & Post-processing II – Annotation Result
     • automatically split off DM
     • identified syntactic category
     • annotated identifiable speech acts

  10. Searching the Corpus
      • easily searchable via DART
        • speech act stats hyperlinked to concordancer
        • formulaic patterns or disfluencies via n-grams
        • manual searches in concordancer for specific
          • speech acts
          • syntactic categories + speech acts
          • speech acts + speaker types
          • speech acts + gender
          • responses to questions
        • searches for specific tone features
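Two of the search types listed above can be approximated outside DART with a short Python sketch: counting n-grams to surface formulaic patterns, and combining a speech act with speaker gender. The sample XML, its element names (`dialogue`, `turn`, `q-wh`, `decl`, `dm`), the `sp-act` attribute, and the speech-act labels are illustrative assumptions rather than the confirmed DART annotation scheme, and both function names are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical annotated sample; names and labels are assumptions.
SAMPLE = """<dialogue>
<turn n="1" speaker="a" gender="f"><q-wh sp-act="reqInfo">where is the lobby</q-wh></turn>
<turn n="2" speaker="b" gender="m"><decl sp-act="answer">on the ground floor</decl></turn>
<turn n="3" speaker="a" gender="f"><dm sp-act="acknowledge">okay</dm></turn>
</dialogue>"""

def ngram_counts(tokens, n):
    """Count all contiguous n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def find_units(root, speech_act, gender):
    """Return (turn number, text) for units matching speech act and gender."""
    hits = []
    for turn in root.findall(f".//turn[@gender='{gender}']"):
        for unit in turn:  # each child element is one annotated unit
            if unit.get("sp-act") == speech_act:
                hits.append((turn.get("n"), unit.text))
    return hits

print(ngram_counts("you know i mean you know".split(), 2).most_common(1))
print(find_units(ET.fromstring(SAMPLE), "reqInfo", "f"))
```

Repeated n-grams such as "you know" are candidates for formulaic patterns or disfluencies, while the attribute-based filter mirrors the "speech acts + gender" search.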

  11. Conclusion
      • DART annotation enriches the HKCSE through
        • adding syntactic and pragmatic annotation
        • ability to analyse features based on (functional) C-units, rather than intonation units
        • new search options based on the above features

  12. References
      • Cheng, W., Greaves, C. and Warren, M. 2008. A Corpus-driven Study of Discourse Intonation: the Hong Kong Corpus of Spoken English (prosodic). Amsterdam/Philadelphia: John Benjamins.
      • Weisser, M. 2010. Annotating Dialogue Corpora Semi-Automatically: a Corpus-Linguistic Approach to Pragmatics. Unpublished Habilitation (professorial) thesis, University of Bayreuth.
      • Weisser, M. 2012; forthcoming 2014. Pragmatic annotation. In: Aijmer, K. & Rühlemann, C. (Eds.). Corpus Pragmatics: a Handbook. Cambridge: CUP.
      • Weisser, M. 2014. The DART Manual.
      • Weisser, M. (in progress). DART – the Dialogue Annotation and Research Tool.
