
Labelling Emotional User States in Speech: Where's the Problems, where's the Solutions?


Presentation Transcript


  1. Labelling Emotional User States in Speech: Where's the Problems, where's the Solutions?
     Anton Batliner, Stefan Steidl, University of Erlangen
     HUMAINE WP5-WS, Belfast, December 2004

  2. Overview
     • decisions to be made
     • mapping data onto labels
     • the human factor
     • and later on: again some mapping
     • illustrations
     • the sparse data problem
     • new dimensions
     • new measure for labeller agreement
     • and afterwards: what to do with the data?
     • handling of databases
     • we proudly present: CEICES
     • some statements

  3. Mapping data onto labels I
     • catalogue of labels
       • data-driven (selection from "HUMAINE" catalogue?)
       • should be a semi-open class
     • unit of annotation
       • word (phrase or turn) in speech
       • in video?
       • alignment of video time stamps with speech data necessary
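For the last point, the alignment amounts to mapping each annotated word's start and end time onto video frame indices. A minimal sketch in Python, assuming word-level time stamps in seconds and a hypothetical fixed frame rate (not the actual tooling used for the database):

```python
from dataclasses import dataclass

@dataclass
class WordLabel:
    word: str
    start: float   # seconds, from the speech segmentation
    end: float
    label: str     # emotional user state assigned to this word

def frames_for_word(w: WordLabel, fps: float = 25.0) -> range:
    """Map a word-level annotation onto the video frames it spans."""
    first = int(w.start * fps)
    last = int(w.end * fps)
    return range(first, last + 1)

# hypothetical example: one word labelled as motherese
w = WordLabel("Aibo", start=3.48, end=3.92, label="motherese")
print(list(frames_for_word(w)))   # frames 87..98 at 25 fps
```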

  4. Mapping data onto labels II
     • categorical (hard/soft) labelling vs. dimensions
     • formal vs. functional labelling *
       • functional: holistic user states
       • formal: prosody, voice quality, syntax, lexicon, FAPs, ...
     • reference baseline
       • speaker-/user-specific
       • neutral phase at beginning of interaction
       • sliding window
     * emotion content vs. signs of emotion?
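The sliding-window variant of the reference baseline can be sketched as follows; this is only an illustration, assuming one prosodic feature value per word (e.g. mean F0) and a hypothetical window size, not the procedure actually used:

```python
import numpy as np

def sliding_baseline(feature_per_word, window=50):
    """Deviation of each word's feature value (e.g. mean F0) from a
    speaker-specific running reference over the preceding `window` words."""
    x = np.asarray(feature_per_word, dtype=float)
    deviations = np.zeros_like(x)
    for i in range(1, len(x)):
        reference = x[max(0, i - window):i].mean()
        deviations[i] = x[i] - reference
    return deviations
```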

  5. The human factor I
     • expert labellers vs. naïve labellers
       • experts:
         • experienced, i.e. consistent
         • with (theoretical) bias
         • expensive
         • few
       • "naïve" labellers:
         • maybe less consistent
         • no bias, i.e., ground truth?
         • less expensive
         • more
     • representative data = many data = high effort
     • are there "bad" labellers?
     • does high interlabeller agreement really mean "good" labellers?

  6. The human factor II: evaluation of annotations
     [diagram relating WP3 (kappa etc.), past engineering practice, and WP9]
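As a reminder of what the chance-corrected agreement measures referred to here compute, a minimal two-labeller Cohen's kappa in Python (the textbook formula, not the evaluation scheme of the talk):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labellers (Cohen's kappa)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    dist_a = Counter(labels_a)
    dist_b = Counter(labels_b)
    expected = sum(dist_a[c] * dist_b[c] for c in dist_a) / (n * n)
    return (observed - expected) / (1 - expected)

# toy example with five words and four agreements
print(cohens_kappa(["N", "N", "E", "A", "N"],
                   ["N", "E", "E", "A", "N"]))   # ≈ 0.69
```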

  7. and later on
     • mapping of labels onto cover classes
     • sparse data
     • classification performance
     • embedding into the application task
     • small number of alternatives
     • criteria?
     • dimensional labels adequate? *
     • human processing
     • system restrictions
     * cf. the story of 33 vs. 2 levels of accentuation
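Mapping fine-grained labels onto a handful of cover classes is in the end just a lookup table plus a rest class; the grouping below is purely illustrative, not the mapping proposed in the talk:

```python
# illustrative mapping of fine-grained labels onto a few cover classes;
# the concrete grouping is a design decision, not prescribed here
COVER = {
    "angry": "Anger", "touchy": "Anger", "reprimanding": "Anger",
    "emphatic": "Emphatic",
    "motherese": "Motherese",
    "neutral": "Neutral",
}

def to_cover_class(label, default="Other"):
    """Look up the cover class; rare labels fall into a rest class."""
    return COVER.get(label, default)

print(to_cover_class("touchy"))     # Anger
print(to_cover_class("helpless"))   # Other
```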

  8. The Sparse Data Problem
     • un-balanced distribution (Pareto?)
     • (too) few for robust training
     • down- or up-sampling necessary for testing
     • looking for "interesting" ("provocative"?) data: does this mean to beg the question?
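Down-sampling to counter the unbalanced distribution can be sketched as follows; the default class size and the seed are arbitrary illustration choices:

```python
import random

def downsample(items, labels, per_class=None, seed=0):
    """Down-sample so that every class contributes at most `per_class`
    items (default: the size of the smallest class)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(items, labels):
        by_class.setdefault(y, []).append(x)
    if per_class is None:
        per_class = min(len(v) for v in by_class.values())
    sampled = []
    for y, xs in by_class.items():
        xs = list(xs)
        rng.shuffle(xs)
        sampled.extend((x, y) for x in xs[:per_class])
    rng.shuffle(sampled)
    return sampled
```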

  9. The Sparse Data Problem: some frequencies, word-based
     [table: non-neutral words in %, per scenario, with/without emphatic: -/23, 27/9.6, 10.3/4.6, 15.4/8]
     • scenario-specific: ironic vs. motherese/reprimanding
     • emphatic: in-between
     • rare birds in AIBO: surprised, helpless, bored
     $ consensus labelling

  10. Towards New Dimensions
     • from categories to dimensions
     • confusion matrices = similarities
     Non-Metrical Multi-Dimensional Scaling (NMDS)
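Reading a confusion matrix as a similarity matrix and projecting it into two dimensions can be approximated with non-metric MDS from scikit-learn; a sketch under the assumption that mutual confusion is symmetrised and turned into a dissimilarity (an illustration, not the exact analysis behind the following slides):

```python
import numpy as np
from sklearn.manifold import MDS   # non-metric MDS as a stand-in for NMDS

def to_dissimilarity(confusion):
    """confusion[i, j]: how often label i was perceived as label j (in %)."""
    c = np.asarray(confusion, dtype=float)
    sim = (c + c.T) / 2.0            # symmetrise: mutual confusability
    sim = sim / sim.max()
    d = 1.0 - sim                    # high confusion -> small distance
    np.fill_diagonal(d, 0.0)
    return d

def nmds_2d(confusion, seed=0):
    """Project the labels into a 2-dimensional space."""
    mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
              random_state=seed, n_init=10)
    return mds.fit_transform(to_dissimilarity(confusion))
```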

  11. 11 emotional user state labels, data-driven, word-based *
     • joyful
     • surprised
     • motherese
     • neutral (default)
     • rest (waste-paper-basket, non-neutral)
     • bored
     • helpless, hesitant
     • emphatic (possibly indicating problems)
     • touchy (= irritated)
     • angry
     • reprimanding
     * effort: 10-15 times real-time

  12. confusion matrix: majority voting 3/5 vs. rest; if 2/2/1, both 2/2 as maj. vot. ("pre-emphasis")

               A   T   R   J   M   E   N   W   S   B   H
     Angry     43  13  12  00  00  12  18  00  00  00  00
     Touchy    04  42  11  00  00  13  23  00  00  02  00
     Reprim.   03  15  45  00  01  14  18  00  00  00  00
     Joyful    00  00  01  54  02  07  32  00  00  00  00
     Mother.   00  00  01  00  61  04  30  00  00  00  00
     Emph.     01  05  06  00  01  53  29  00  00  00  00
     Neutral   00  02  01  00  02  13  77  00  00  00  00
     Waste-p.  00  07  06  00  08  19  21  32  00  01  01
     Surpr.    00  00  00  00  00  20  40  00  40  00  00
     Bored     00  14  01  00  01  12  28  01  00  39  00
     Helpl.    00  01  00  02  00  12  37  03  00  00  41

     R: reprimanding, W: waste-paper-basket category
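One plausible reading of the voting rule in the slide title, as a sketch for five labellers (the exact tie handling behind the matrix above may differ):

```python
from collections import Counter

def majority_vote(votes):
    """Votes of 5 labellers for one word. Returns the labels treated as
    the majority:
    - a label with at least 3 of 5 votes wins outright
    - in a 2/2/1 split, both labels with 2 votes are kept ("pre-emphasis")
    - otherwise there is no majority (the word goes to the rest class)."""
    top = Counter(votes).most_common()
    if top[0][1] >= 3:
        return {top[0][0]}
    if top[0][1] == 2 and top[1][1] == 2:
        return {top[0][0], top[1][0]}
    return set()

print(majority_vote(["angry", "angry", "touchy", "touchy", "neutral"]))
# {'angry', 'touchy'}
```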

  13. "traditional" emotional dimensions in feeltrace: VALENCE and AROUSAL

  14. NMDS: 2-dimensional solution with 7 labels, "relative" majority with "pre-emphasis"
     [plot; tentative interpretation of the two dimensions: orientation?, valence?]

  15. and back
     • from categories to dimensions
     • what about the way back?
     • automatic clustering?
     • thresholds
     • ...
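The "way back" via automatic clustering could look like this minimal sketch: k-means over per-word (valence, arousal) coordinates, with the number of clusters k as a free, assumed parameter:

```python
import numpy as np
from sklearn.cluster import KMeans

def dimensions_to_categories(points, k=4, seed=0):
    """Cluster per-word (valence, arousal) coordinates, e.g. from a
    FEELtrace-style annotation, back into k categorical classes."""
    km = KMeans(n_clusters=k, random_state=seed, n_init=10)
    return km.fit_predict(np.asarray(points, dtype=float))
```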

  16. Towards New Quality Measures → Stefan Steidl: Entropy-Based Evaluation of Decoders
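One building block of an entropy-based view on annotations and decoders is the entropy of the label distribution a single word receives from its labellers; a minimal sketch (not Steidl's full evaluation measure):

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy (bits) of the label distribution one word received:
    0 = full agreement among labellers, higher = more ambiguous case."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(vote_entropy(["E", "E", "E", "E", "E"]))   # 0.0
print(vote_entropy(["E", "E", "N", "N", "A"]))   # ≈ 1.52
```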

  17. Handling of Databases
     • http://www.phonetik.uni-muenchen.de/Forschung/BITS/index.html
     • Publications
       • The Production of Speech Corpora (ISBN: 3-8330-0700-1)
       • The Validation of Speech Corpora (ISBN: 3-8330-0700-1)
     The Production of Speech Corpora
     Florian Schiel, Christoph Draxler, Angela Baumann, Tania Ellbogen, Alexander Steffen
     Version 2.5: June 1, 2004

  18. CEICES
     • Combining Efforts for Improving automatic Classification of Emotional user States, a "forced co-operation" initiative under the guidance of HUMAINE
     • evaluation of annotations
     • assessment of F0 extraction algorithms
     • assessment of the impact of single features (feature classes)
     • improvement of classification performance via sharing of features

  19. Ingredients of CEICES
     • speech data: German AIBO database
     • annotations:
       • functional, emotional user states, word-based
       • (prosodic peculiarities, word-based)
     • manually corrected
       • segment boundaries for words
       • F0
     • specifications of Train/Vali/Test, etc.
     • reduction of effort: ASCII file sharing via portal
     • forced co-operation via agreement
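The file sharing and the fixed Train/Vali/Test specification could be handled with something as simple as the following sketch; the column names, delimiter, and speaker ids are hypothetical placeholders, not the actual CEICES format:

```python
import csv

def load_shared_file(path):
    """Read a shared ASCII file: one word per row with speaker id,
    word id, label, and the site-specific feature values (assumed layout)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter=";"))

# a fixed, speaker-disjoint split everybody agrees on (ids are made up)
TRAIN = {"spk01", "spk02", "spk03"}
VALI = {"spk04"}
TEST = {"spk05"}

def split(rows):
    return ([r for r in rows if r["speaker"] in TRAIN],
            [r for r in rows if r["speaker"] in VALI],
            [r for r in rows if r["speaker"] in TEST])
```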

  20. [screenshot: corrected F0 vs. automatic F0, word boundaries, pitch labels]

  21. Agreement
     • open for non-HUMAINE partners
     • nominal fee for distribution and handling
     • commitments
       • to share labels and extracted feature values
       • to use specified sub-samples
     • expected outcome
       • assessment of F0 extraction, impact of features, ...
       • set of feature classes/vectors with evaluation
       • common publication(s)

  22. some statements
     • annotation has to be data-driven
     • there are no bad labellers
     • classification results have to be used for labelling assessment
     • automatic labelling is not good enough - or, maybe you should call it "extraction"
     • each label type has to be mapped onto very few categorical classes at the end of the day

  23. Thank you for your attention
