WP3: Speech and Emotion (Analysis & Recognition)
Human Language Technologies (HLT)

Databases and Annotations
UERLN: SympaFly
  • Fully automatic speech dialogue telephone system for flight reservation and booking, different system stages; 270 Dialogues.
  • Annotations: word-based emotional user states, prosodic and conversational peculiarities; dialogue (step) success; the distribution of emotional user states follows a nested Pareto (80/20) principle
UERLN: AIBO
  • Children's interaction (age 10-12, 51 children, 9.2 hours of speech) with SONY's AIBO robot, Wizard-of-Oz scenario; cf. WP5 (plus English and read speech)
  • Annotations: word-based emotional user states (holistic, 5 labellers) and prosodic peculiarities; alignment of children's utterances with AIBO's actions; manual correction of F0, labelling of voice quality. Emotional user states for the English data.
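With five labellers assigning holistic word-level states, a single reference label per word has to be derived somehow; the majority-vote sketch below is one common way to do this, offered as an illustration (the project's actual decision scheme is not given on this slide):

```python
from collections import Counter

def majority_label(labels, default="neutral"):
    """Return the most frequent label among the labellers;
    fall back to `default` when there is no clear majority."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return default  # tie between the top labels
    return counts[0][0]

# Five labellers judging one word of a child's utterance:
print(majority_label(["motherese", "motherese", "neutral", "motherese", "emphatic"]))
# -> motherese
```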
AIBO disobedient: from motherese to angry

(word-level labels in the transcript: M = motherese, E = emphatic, A = angry)

g'radeaus Aibolein ja M fein M gut M machst M du M *da M | *tz läufst du mal bitte nach links | stopp E Aibo stopp | nach links E umdrehen | nein M <*ne> nein M <*ne> nein M <*ne> so M weit M *simma M noch M nicht M aufstehen M Schlafmütze M komm M hoch M | ja M so M ist M es M <*is> guter M Hund M lauf mal jetzt nach links | nach links Aibo | Aibolein M aufstehen M *son M sonst M werd' M ich M böse M hoch E | nach A links A | Aibo A nach A links A | Aibolein A ganz A böser A Hund A jetzt A stehst A du A auf A | hoch A | dreh dich ein bisschen | ja M so ist es <*is> gut stopp Aibo stopp | *tz lauf g'radeaus |

UERLN: Different Conceptualizations

Two conceptualizations: remote-control tool vs. pet dog

Straight on little Aibo ok great you're doing fine now please to the left stop Aibo stop turn to the left no no no we aren't that far yet get up sleepyhead get up yes that's a good dog now go left left Aibo little Aibo get up else I'm getting angry get up Aibo left little Aibo bad boy now get up turn a little ok that's fine stop Aibo stop straight on Aibo straight on stop Aibo stop turn round to the left Aibo get up turn round to the left Aibo get up turn round, to the left Aibo get up get up Aibo now go left now straight on Aibo st' straight on

ITC: Targhe
  • Fully automatic speech dialogue telephone system
      • 15.6 hours of Italian natural speech
      • 9,444 files (turns), of which 450 are emotionally rich
    • Word-level
      • Orthographic transcription and word segmentation
      • Prosodic peculiarities annotated
    • Turn-level
      • Holistic emotion labels
  • SympaFly (cf. UERLN) for comparison and benchmarking
UKA: LDC2002S28
  • Elicited emotional speech database; native speakers of American English
  • labels: 1 of 15 holistic speaker states per utterance; used in algorithm and feature set development
UKA: ISL Meeting Corpus
  • 18 recordings of multi-party meetings (mean 5.1 participants; mean duration 35 minutes); American English
  • Annotations: orthographic transcription; Verbmobil II dialogue-act and discourse-level annotations.
Assessment of Data Collection:
  • focus on
    • spontaneous, realistic data
    • important/new types of dialogues/interaction
    • evaluation of annotations
  • considerable percentage of realistic (processed and available) databases world-wide
UERLN: Features
  • large feature vector for a context of ±2 words:
    • 95 prosodic (duration, energy, F0, pauses)
    • 80 spectral (HNR, formant-based frequencies and energies)
    • 24 MFCC
    • 30 POS
  • Language Models & dialogue-based features
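As a rough illustration of the prosodic portion of such a feature vector, per-word statistics over F0 and energy tracks plus durations and pauses might look like this (the function, frame layout, and particular statistics are assumptions for illustration, not the UERLN implementation):

```python
def prosodic_features(f0, energy, word_dur, pause_before, pause_after):
    """Toy per-word prosodic features from frame-wise F0/energy tracks.

    f0, energy: lists of frame values for the word (f0 == 0.0 in
    unvoiced frames). The real UERLN set comprises 95 such features.
    """
    voiced = [v for v in f0 if v > 0.0]
    n = len(voiced)
    return {
        "dur": word_dur,
        "pause_before": pause_before,
        "pause_after": pause_after,
        "energy_mean": sum(energy) / len(energy),
        "energy_max": max(energy),
        "f0_mean": (sum(voiced) / n) if n else 0.0,
        "f0_range": (max(voiced) - min(voiced)) if n else 0.0,
    }

f = prosodic_features([0.0, 210.0, 230.0, 220.0], [0.2, 0.8, 0.9, 0.5],
                      word_dur=0.31, pause_before=0.12, pause_after=0.05)
print(f["f0_mean"], f["f0_range"])  # 220.0 20.0
```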
ITC: Features
  • Baseline feature set
    • 96 features
    • Based on energy, duration, and pitch
  • Final feature set
    • 273 features (many redundant)
    • Based on energy, duration, pitch, and pauses
    • Different pitch extractors tried
      • Normalized Cross Correlation
      • Weighted Auto Correlation
      • UERLN PDA (pitch determination algorithm)
    • Different subsets compared
    • Different tests to reduce the feature space
      • Principal component analysis
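Principal component analysis, listed above as one way to shrink the 273-dimensional (partly redundant) feature space, can be sketched in a few lines of NumPy; this is a generic PCA, not ITC's actual pipeline:

```python
import numpy as np

def pca_reduce(X, k):
    """Project feature vectors X (n_samples x n_features) onto the
    top-k principal components of the (centered) data."""
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # feature covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)     # eigenvalues in ascending order
    top = eigvec[:, np.argsort(eigval)[::-1][:k]]
    return Xc @ top

# Redundant 8-dim data (second half nearly duplicates the first):
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 4))
X = np.hstack([base, base + 0.01 * rng.normal(size=(100, 4))])
Z = pca_reduce(X, 4)
print(Z.shape)  # (100, 4)
```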
UKA: 133 Acoustic Features
  • pitch, voiced/unvoiced energy, quartiles (15)
  • voice quality, Praat metrics (11)
  • harmonicity, quartiles (5) and Praat metrics (3)
  • zero-crossing rate vs energy, histogram (20)
  • correlation/regression, coefficients (36)
  • vocal tract volume, quartiles (25)
  • duration/timing, Verbmobil features (18)
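The joint zero-crossing-rate/energy histogram group can be sketched as follows; frame length, bin counts, and normalisation are assumptions chosen so that 5 × 4 bins yield the 20 features named above:

```python
import math

def zcr_energy_histogram(signal, frame_len=80, zcr_bins=5, energy_bins=4):
    """Toy 5x4 = 20-bin joint histogram of per-frame zero-crossing
    rate and energy, flattened to a 20-dim feature vector."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    zcrs, energies = [], []
    for fr in frames:
        zc = sum(1 for a, b in zip(fr, fr[1:]) if (a < 0) != (b < 0))
        zcrs.append(zc / (len(fr) - 1))
        energies.append(sum(x * x for x in fr) / len(fr))
    def bin_idx(v, vmax, nbins):
        return min(int(nbins * v / vmax) if vmax > 0 else 0, nbins - 1)
    zmax, emax = max(zcrs), max(energies)
    hist = [0.0] * (zcr_bins * energy_bins)
    for z, e in zip(zcrs, energies):
        hist[bin_idx(z, zmax, zcr_bins) * energy_bins + bin_idx(e, emax, energy_bins)] += 1
    return [h / len(frames) for h in hist]  # normalise to sum 1

sig = [math.sin(0.3 * t) for t in range(800)]
h = zcr_energy_histogram(sig)
print(len(h), round(sum(h), 6))  # 20 1.0
```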
Classifiers
  • UERLN: Linear Discriminant Analysis (LDA), Decision Trees (CART), Neural Networks (NN), Support Vector Machines (SVM), Gaussian Mixtures (GM), Language Models (LM)
  • ITC: Decision Trees (CART), Neural Networks (NN)
  • UKA: linear classifiers, Neural Networks (NN), Support Vector Machines (SVM)
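As a minimal stand-in for the linear classifiers in these lists, a nearest-class-mean rule (which yields linear decision boundaries) fits in a few lines; it is an illustration only, not any site's actual system:

```python
def fit_class_means(X, y):
    """Mean feature vector per emotion class."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: [sum(col) / len(v) for col in zip(*v)] for c, v in groups.items()}

def predict(means, x):
    """Assign x to the class whose mean is closest (squared Euclidean)."""
    return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(means[c], x)))

# Toy 2-dim features for two classes:
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]]
y = ["neutral", "neutral", "angry", "angry"]
m = fit_class_means(X, y)
print(predict(m, [0.95, 0.95]))  # angry
```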
UERLN classification I: SympaFly

  • GM/NN, 2 classes, neutral vs. problem, l≠t (learn ≠ test)
  • LDA, 4 classes
  • SVM/CART, 2 classes, loo (leave-one-out)
  • dialogue step success, 2 classes, SVM: CL 82.5
  • dialogue success, 2 classes, CART: CL 85.4

RR: overall recognition rate; CL: class-wise averaged recognition rate
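The difference between the two rates matters when the class distribution is skewed, as with SympaFly's few problem turns: a classifier that mostly hits the majority class scores a high RR but a much lower CL. Both can be computed from a confusion matrix:

```python
def rr_cl(confusion):
    """confusion[i][j]: count of class-i items recognised as class j.
    Returns (RR, CL): overall recognition rate and class-wise
    averaged recognition rate."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    rr = correct / total
    cl = sum(confusion[i][i] / sum(confusion[i])
             for i in range(len(confusion))) / len(confusion)
    return rr, cl

# 90 neutral vs. 10 problem turns; classifier favours the majority class:
conf = [[85, 5],   # neutral: 85 correct, 5 missed
        [6, 4]]    # problem: 6 missed, 4 correct
rr, cl = rr_cl(conf)
print(round(rr, 3), round(cl, 3))  # 0.89 0.672
```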

UERLN classification II: AIBO
  • joyful
  • surprised
  • motherese
  • neutral (default)
  • rest (non-neutral)
  • bored
  • helpless, hesitant
  • emphatic
  • touchy (=irritated)
  • angry
  • reprimanding

4 classes "AMEN" (Angry, Motherese, Emphatic, Neutral), NN

ITC Classification II:
  • Final feature set
    • 273 (acoustic/temporal) features
    • 2-class problem (neutral vs. non-neutral)

RR = overall rec. rate; CL = class-wise averaged rec. rate

N = neutral turns; NN = non-neutral turns

UKA Classification II:

133 utterance-level prosodic features, 15 classes, acted speech, 8 speakers.

Assessment of Features
  • a pool of many different features/feature groups implemented/compared
  • prosodic features perform better (more consistently) than "spectral" features on realistic speech
  • combination of knowledge sources improves performance
  • relevance of single features (feature classes)?
Assessment of Classifications
  • not much difference between different classifiers in classification performance (linear classifiers highly competitive in speaker-independent classification)
  • large differences between speaker-dependent and speaker-independent classification
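Speaker-independent results are usually obtained by holding out all data of one speaker at a time (leave-one-speaker-out cross-validation); a generic sketch of the split logic, with the classifier itself left abstract:

```python
def leave_one_speaker_out(samples):
    """samples: list of (speaker_id, features, label) tuples.
    Yields (held_out_speaker, train, test) splits in which every
    utterance of one speaker is held out, so the classifier is
    never tested on a speaker it saw in training."""
    speakers = sorted({s for s, _, _ in samples})
    for held_out in speakers:
        train = [x for x in samples if x[0] != held_out]
        test = [x for x in samples if x[0] == held_out]
        yield held_out, train, test

data = [("spk1", [0.1], "N"), ("spk1", [0.2], "A"),
        ("spk2", [0.3], "N"), ("spk3", [0.4], "A")]
for spk, train, test in leave_one_speaker_out(data):
    print(spk, len(train), len(test))  # spk1 2 2 / spk2 3 1 / spk3 3 1
```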

Categories & Dimensions

cf. also tomorrow

UKA: Meeting Annotation

Meeting audio appears to be rich in non-neutral speech.

Open-set holistic labelling of 5 meetings by 3 labellers

UKA: towards new dimensions for social interaction in meetings, denoting conflict, building community, skepticism, etc.:

  • power: weak ↔ strong
  • support: self ↔ group

Assessment of Categories & Dimensions
  • new categories, new dimensions, new consistency measure
  • prototypical "full-blown" emotions are rare
  • labels depend on the type of data (call center, human-robot, different types of multi-party meeting)
  • new dimensions that model not emotions but the interaction between participants in communication
  • new entropy-based consistency measure
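The idea behind an entropy-based consistency measure can be illustrated per item: the distribution of the labellers' labels has zero entropy when they fully agree and grows as they disagree (a sketch of the idea, not the project's exact measure):

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (bits) of one item's label distribution
    across labellers; 0.0 means the labellers fully agree."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

print(label_entropy(["angry"] * 5))  # 0.0 (full agreement)
print(round(label_entropy(["angry", "angry", "angry", "neutral", "touchy"]), 3))  # 1.371
```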