
Nathan Imse, Kelly Peterson

Using Emotion Recognition and Dialog Analysis to Detect Trouble in Communication in Spoken Dialog Systems. Nathan Imse, Kelly Peterson. Goal -- Detecting Trouble. Problems in communication grow quickly when the system does not recognize that an error has occurred.


Presentation Transcript


  1. Using Emotion Recognition and Dialog Analysis to Detect Trouble in Communication in Spoken Dialog Systems • Nathan Imse, Kelly Peterson

  2. Goal -- Detecting Trouble
     • Problems in communication grow quickly when the system does not recognize that an error has occurred.
     • Spikes in emotions like anger and frustration are highly correlated with problems in such systems.
     • In realistic data, 'pure' emotions may not be present, so we are looking at 'troubles in communication' (Batliner et al., 2003).
     • This is a binary distinction.
     • Dialog patterns that break from the norm are also highly indicative of trouble; being able to detect those breaks would help keep dialogs from getting out of hand.

  3. Proposals
     • Our proposals:
       • Module additions and enhancements
       • Architecture enhancements
     • Data for our analysis
       • DARPA Communicator 2000/2001 data (dialog acts)
       • Ang et al. (2002) annotations of emotion (anger, frustration, neutral)

  4. Acoustic Features
     • Features pertaining to the audio signal
       • pitch
       • intensity
       • duration
       • voice quality
     • Huge feature space
       • Feature selection/pruning is critical
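
The slides do not say which selection method was used. As one possible illustration only, the sketch below ranks features by the absolute Pearson correlation between each feature and a binary trouble label, then keeps the top k. The function names, the feature names (`pitch`, `dur`, `inten`), and the toy values are all made up for this example and do not come from the presentation or its data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(rows, labels, k):
    """Keep the k features most correlated (in absolute value) with the labels.

    rows   -- list of dicts mapping feature name -> numeric value
    labels -- list of 0/1 labels (1 = trouble), same length as rows
    """
    names = sorted(rows[0])
    scores = {
        name: abs(pearson([row[name] for row in rows], labels))
        for name in names
    }
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Toy, invented values: one row per utterance.
rows = [
    {"pitch": 1.0, "dur": 0.5, "inten": 3.0},
    {"pitch": 2.0, "dur": 0.5, "inten": 1.0},
    {"pitch": 1.0, "dur": 0.5, "inten": 2.0},
    {"pitch": 2.0, "dur": 0.5, "inten": 2.5},
]
labels = [0, 1, 0, 1]
selected, scores = select_features(rows, labels, k=2)
```

A univariate filter like this is cheap but ignores interactions between features; with a feature space as large as the slide describes, a wrapper or embedded method might be a better fit.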

  5. Linguistic Features (aka non-acoustic)
     • Pretty much any feature that isn't directly extracted from the audio signal
     • Usually based on text
       • ASR - easy/fast, but error-prone
       • Transcription - precise, but slow/expensive
         • cannot be done in real time
     • Shallow
       • n-grams, POS tagging, etc.
     • Deep
       • dialog acts, specialized grammars, etc.
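
To make the "shallow" category concrete, here is a minimal sketch of word n-gram counting over a text string, which could be either ASR output or a manual transcription. The function name `ngram_counts`, the whitespace tokenization, and the example utterance are assumptions made for illustration, not something specified in the slides.

```python
from collections import Counter


def ngram_counts(text, n_max=2):
    """Count word n-grams of length 1..n_max in a whitespace-tokenized string."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        # Slide a window of width n across the token list.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Invented example utterance.
features = ngram_counts("no no not Boston")
```

The resulting counts can be used directly as sparse features for a classifier; repeated corrections such as "no no" show up as their own bigram.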

  6. Module Proposals
     • Emotion Grammar => build numbers from statistical training
       • normalize for depth/complexity of sentences?
     • Dialog Acts => build a statistical model of dialog act sequences; use perplexity to detect problems
       • how far should the horizon be?
     • System Repetition => don't let the system repeat itself too many times
       • simple flag => set and forget; minimal computing power
       • classifier => captures exceptions; flexible
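
The Dialog Acts proposal can be sketched as follows, under several assumptions that the slides leave open: a bigram model (a horizon of one previous act), add-alpha smoothing, and invented dialog act labels that are not the DARPA Communicator tag set. A dialog whose act sequence the model finds surprising gets a higher perplexity, which a downstream component could compare against a threshold.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-dialog marker


def train_bigram(sequences):
    """Collect bigram and context counts from lists of dialog act labels."""
    bigrams = Counter()
    contexts = Counter()
    vocab = set()
    for seq in sequences:
        prev = START
        for act in seq:
            bigrams[(prev, act)] += 1
            contexts[prev] += 1
            vocab.add(act)
            prev = act
    return bigrams, contexts, vocab


def perplexity(seq, model, alpha=1.0):
    """Perplexity of one act sequence under an add-alpha smoothed bigram model."""
    if not seq:
        raise ValueError("sequence must contain at least one dialog act")
    bigrams, contexts, vocab = model
    vocab_size = len(vocab) + 1  # one extra slot for unseen acts
    log_prob = 0.0
    prev = START
    for act in seq:
        p = (bigrams[(prev, act)] + alpha) / (contexts[prev] + alpha * vocab_size)
        log_prob += math.log(p)
        prev = act
    return math.exp(-log_prob / len(seq))


# Invented act labels and dialogs, for illustration only.
normal = ["greet", "request_info", "provide_info", "confirm", "close"]
model = train_bigram([normal] * 3)
stuck = ["greet", "request_info", "request_info", "request_info", "request_info"]
ppl_normal = perplexity(normal, model)
ppl_stuck = perplexity(stuck, model)
```

Here the dialog that keeps repeating the same request scores a higher perplexity than the one that follows the trained pattern. Extending the horizon would mean conditioning on more than one previous act, at the cost of sparser counts.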

  7. Architecture starting point (Batliner et al., 2003)

  8. Architecture Proposal

  9. Dryas Monkey Approves! See? He's smiling!
