Exploiting Word-level Features for Emotion Prediction

1. Why predict emotions?

  • Affective computing: a direction for improving spoken dialogue systems
    • Emotion detection (prediction)
    • Emotion handling

Poster by Greg Nicholas. Adapted from a paper by Greg Nicholas, Mihai Rotaru, and Diane Litman.

2. Feature granularity levels

Detecting emotion: train a classifier on features extracted from user turns.

Types of features: lexical, pitch, amplitude, and duration, computed at one of two granularity levels:

  • Turn level: one feature set for the whole turn. Efficient, but offers only a coarse approximation of the pitch contour.
  • Word level: one feature set per word. Offers a better approximation of the pitch contour (e.g. captures the big pitch changes in uttering the word "great").

Previous work uses mostly features computed over the entire turn; [1] uses pitch features computed at the word level. We concentrate on pitch features to detect uncertainty (a sketch of the two granularities follows below).
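
To make the granularity difference concrete, here is a minimal sketch (not the poster's actual feature extractor; the data layout, frame rate, and choice of statistics are assumptions) of how turn-level versus word-level pitch statistics could be computed from a frame-level F0 track and word time alignments:

```python
# Minimal sketch, assuming a frame-level F0 track (Hz, 0 = unvoiced) and word
# time alignments are available, e.g. from a pitch tracker and forced alignment.
import numpy as np

def pitch_stats(f0_frames):
    """Summarize a stretch of F0 values with a few coarse statistics."""
    voiced = np.asarray([f for f in f0_frames if f > 0], dtype=float)
    if voiced.size == 0:
        return {"pitch_min": 0.0, "pitch_max": 0.0, "pitch_mean": 0.0}
    return {"pitch_min": float(voiced.min()),
            "pitch_max": float(voiced.max()),
            "pitch_mean": float(voiced.mean())}

def turn_level_pitch_features(f0_track):
    """Turn level: one feature set per turn, a coarse view of the pitch contour."""
    return pitch_stats(f0_track)

def word_level_pitch_features(f0_track, word_spans, frame_rate=100):
    """Word level: one feature set per word, a finer view of the pitch contour.

    word_spans: list of (word, start_sec, end_sec) tuples (hypothetical format).
    """
    features = []
    for word, start, end in word_spans:
        lo, hi = int(start * frame_rate), int(end * frame_rate)
        features.append((word, pitch_stats(f0_track[lo:hi])))
    return features
```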

3. Problems classifying the overall turn emotion

  • Turn-level classification is simple:
    • Labeling granularity = turn
    • One set of features per turn
  • Word-level classification is more complicated (see the sketch below):
    • Label granularity mismatch: labels are at the turn level, features at the word level
    • Variable number of feature sets per turn
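
To illustrate the mismatch, a minimal sketch of the two data layouts (the type names and fields are hypothetical, not taken from the paper): at the turn level there is exactly one feature set per labeled turn, while at the word level a single turn-level label has to cover a variable-length list of per-word feature sets.

```python
from dataclasses import dataclass
from typing import Dict, List

FeatureSet = Dict[str, float]          # e.g. {"pitch_max": 210.0, "pitch_mean": 180.5}

@dataclass
class TurnLevelInstance:
    label: str                         # "uncertain" / "non-uncertain", one per turn
    features: FeatureSet               # exactly one feature set per turn

@dataclass
class WordLevelTurn:
    label: str                         # still annotated once, for the whole turn
    word_features: List[FeatureSet]    # variable length: one feature set per word
```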

4. Techniques to solve this problem

Technique 1: Word-level emotion model (WLEM)

  • Train: a word-level model, using the turn's emotion label for every word in the turn
  • Predict: an emotion label for each word
  • Combine: majority voting over the word-level predictions

Technique 2: Predefined subset of sub-turn units (PSSU)

  • Combine: concatenate features from 3 words (first, middle, last) into a conglomerate feature set
  • Train & predict: a turn-level model with the turn's emotion label

Example student turn: "The force of the truck"

  • Turn-level: extract one turn-level feature set from the whole turn and predict a single overall turn label (one prediction).
  • WLEM: extract a word-level feature set for each word ("the", "force", "of", "the", "truck": five sets), predict a label for each word, then combine the word-level predictions by majority voting. In the example, three of the five words are predicted non-uncertain, so the overall turn prediction is non-uncertain (3/5).
  • PSSU: extract word-level feature sets for the first, middle, and last words ("the", "of", "truck"), concatenate them into one PSSU feature set, and predict a single overall turn label from it (one prediction).
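
A minimal sketch of the two combination schemes applied to a turn like the example above. The single-instance classifier interface and the feature extraction are placeholders (assumptions, not a specific library API), but the combination logic follows the poster: WLEM takes a majority vote over per-word predictions, while PSSU concatenates the first, middle, and last words' features into one conglomerate set before making a single prediction.

```python
from collections import Counter
from typing import Dict, List

FeatureSet = Dict[str, float]

def wlem_predict(word_features: List[FeatureSet], word_classifier) -> str:
    """WLEM: one prediction per word, combined by majority voting.

    word_classifier is assumed to expose predict(features) -> label; at training
    time it would have seen every word labeled with its turn's emotion label.
    """
    votes = [word_classifier.predict(feats) for feats in word_features]
    label, _count = Counter(votes).most_common(1)[0]       # e.g. non-uncertain (3/5)
    return label

def pssu_features(word_features: List[FeatureSet]) -> FeatureSet:
    """PSSU: concatenate the first, middle, and last words' features."""
    picked = {"first": word_features[0],
              "middle": word_features[len(word_features) // 2],  # "of" in a 5-word turn
              "last": word_features[-1]}
    return {f"{position}_{name}": value
            for position, feats in picked.items()
            for name, value in feats.items()}

def pssu_predict(word_features: List[FeatureSet], turn_classifier) -> str:
    """PSSU: a single turn-level prediction from the conglomerate feature set."""
    return turn_classifier.predict(pssu_features(word_features))
```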

  • WLEM issues:
    • Turn → word-level labeling assumption (every word inherits its turn's label)
    • Majority voting is a very simple combination scheme
  • PSSU issues:
    • Might lose details from the discarded words

5. Experimental Results

Prior work: [1] showed that the WLEM method works better than turn-level prediction; a similar sub-turn approach was used in [2] at the breath-group level, but not at the word level.

Recall/Precision: comparison of recall and precision for predicting uncertain turns

  • Turn-level: medium recall, medium precision
  • WLEM: best recall, lowest precision
    • Tends to over-generalize
  • PSSU: good recall, best precision
    • Much less over-generalization; the overall best choice
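
For reference, a minimal sketch of how this comparison could be computed with scikit-learn, assuming lists of gold and predicted turn labels (an illustration, not the study's evaluation script):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate(gold_labels, predicted_labels, positive="uncertain"):
    """Recall and precision for the uncertain class, plus overall accuracy."""
    return {
        "recall": recall_score(gold_labels, predicted_labels, pos_label=positive),
        "precision": precision_score(gold_labels, predicted_labels, pos_label=positive),
        "accuracy": accuracy_score(gold_labels, predicted_labels),
    }

# Hypothetical usage: evaluate(gold, wlem_predictions) vs. evaluate(gold, pssu_predictions)
```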

Corpus

  • ITSPOKE dialogues
  • Domain: qualitative physics tutoring
  • Backend: WHY2-Atlas, Sphinx2 speech recognition, Cepstral text-to-speech

(Table: corpus comparison with previous study [1].)

Overall prediction accuracy

  • Baseline: 77.79%
  • WLEM word-level slightly improves upon turn-level (+0.56%)
  • PSSU word-level shows a much better improvement (+2.14%)
    • Overall, PSSU is best according to this metric as well

6. Future work

  • Many alterations could further improve these techniques:

  • Annotate each individual word for certainty instead of whole turns

  • Include the other features listed above: lexical, amplitude, etc.

  • Try predicting in a human-human dialogue context

  • Better combination techniques (e.g. confidence weighting; see the sketch after this list)

  • More selective choices for PSSU than the middle word of the turn (e.g. longest word in the turn, ensuring the word chosen has domain-specific content)
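
As a sketch of the confidence-weighting idea mentioned in the list above (purely illustrative; the weighting scheme is an assumption, not the authors' method), each word's vote could be weighted by the classifier's confidence rather than counted equally:

```python
from collections import defaultdict
from typing import Dict, List

def confidence_weighted_vote(word_distributions: List[Dict[str, float]]) -> str:
    """Combine per-word label-probability distributions by summing confidences.

    word_distributions: one {label: probability} dict per word (hypothetical
    output of a word-level classifier that reports confidence scores).
    """
    totals = defaultdict(float)
    for distribution in word_distributions:
        for label, probability in distribution.items():
            totals[label] += probability
    return max(totals, key=totals.get)

# Hypothetical example: one confident "uncertain" word outweighs two weak votes.
print(confidence_weighted_vote([
    {"uncertain": 0.9, "non-uncertain": 0.1},
    {"uncertain": 0.4, "non-uncertain": 0.6},
    {"uncertain": 0.4, "non-uncertain": 0.6},
]))  # -> "uncertain" (1.7 vs 1.3), although a plain majority vote would say non-uncertain
```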


[1] M. Rotaru and D. Litman, "Using Word-level Pitch Features to Better Predict Student Emotions during Spoken Tutoring Dialogues," Proceedings of Interspeech, 2005.

[2] J. Liscombe, J. Hirschberg, and J. J. Venditti, "Detecting Certainness in Spoken Tutorial Dialogues," Proceedings of Interspeech, 2005.

