
Robust Recognition of Emotion from Speech



Presentation Transcript


  1. Robust Recognition of Emotion from Speech Mohammed E. Hoque Mohammed Yeasin Max M. Louwerse {mhoque, myeasin, mlouwerse}@memphis.edu Institute for Intelligent Systems University of Memphis

  2. Presentation Overview • Motivation • Methods • Database • Results • Conclusion

  3. Motivations • Animated agents should recognize emotion in e-Learning environments. • Agents need to be sensitive and adaptive to learners’ emotions.

  4. Methods • Our method is partially motivated by the work of Lee and Narayanan [1], who first introduced the notion of salient words.

  5. Shortcomings of Lee and Narayanan’s work Lee et al. argued that there is a one-to-one correspondence between a word and a positive or negative emotion. This is NOT true in every case.

  6. Examples Figure 1: Pictorial depiction of the word “okay” uttered with different intonations (confusion, flow, normal, delight) to express different emotions.

  7. More examples… “Scar!!” vs. “Scar??”

  8. More examples… “Two months!!” vs. “Two months??”

  9. Our Hypothesis • Lexical information extracted from combined prosodic and acoustic features that correspond to the intonation patterns of “salient words” will yield robust recognition of emotion from speech. • It also provides a framework for signal-level analysis of speech for emotion.

  10. Creation of Database

  11. Details on the Database • 15 utterances were selected for four emotion categories: confusion/uncertain, delight, flow (confident, encouragement), and frustration [2]. • Utterances were stand-alone, ambiguous expressions in conversations whose meaning depended on context. • Examples are “Great”, “Yes”, “Yeah”, “No”, “Ok”, “Good”, “Right”, “Really”, “What”, “God”.

  12. Details on the Database… • Three graduate students listened to the audio clips. • They successfully distinguished between the positive and negative emotions 65% of the time. • No specific instructions were given as to what intonation patterns to listen to.

  13. High-Level Diagram Figure 2. High-level description of the overall emotion recognition process: word-level utterances → feature extraction → data projection → classifiers → positive/negative decision.
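
A minimal sketch of the Figure 2 stages, assuming each word-level utterance has already been reduced to a fixed-length feature vector (see slide 15). The slides do not name a toolchain; scikit-learn's PCA and SVC stand in here for the "data projection" and "classifiers" boxes.

```python
# Sketch of the Figure 2 pipeline: feature extraction -> data projection -> classifier.
# X is a placeholder feature matrix; in practice each row would hold the
# prosodic/acoustic features of one word-level utterance.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

X = np.random.rand(60, 20)          # placeholder: 60 utterances x 20 features
y = np.random.randint(0, 2, 60)     # placeholder labels: 0 = negative, 1 = positive

pipeline = Pipeline([
    ("scale", StandardScaler()),        # normalize feature ranges
    ("project", PCA(n_components=10)),  # the "data projection" stage
    ("classify", SVC(kernel="rbf")),    # binary positive/negative classifier
])
pipeline.fit(X, y)
print(pipeline.predict(X[:5]))
```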

  14. Hierarchical Classifiers Figure 3. Design of the hierarchical binary classifiers: emotion → positive vs. negative; positive → delight vs. flow; negative → confusion vs. frustration.
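
A minimal sketch of the two-stage design in Figure 3 (not the authors' implementation): one binary classifier separates positive from negative, and two further binary classifiers refine each branch into the four categories. LogisticRegression is an arbitrary stand-in for whichever binary classifiers were actually used.

```python
# Hierarchical binary classification as depicted in Figure 3.
import numpy as np
from sklearn.linear_model import LogisticRegression

POSITIVE = {"delight", "flow"}

class HierarchicalEmotionClassifier:
    def __init__(self):
        self.valence = LogisticRegression(max_iter=1000)    # positive vs. negative
        self.pos_stage = LogisticRegression(max_iter=1000)  # delight vs. flow
        self.neg_stage = LogisticRegression(max_iter=1000)  # confusion vs. frustration

    def fit(self, X, labels):
        X, labels = np.asarray(X), np.asarray(labels)
        is_pos = np.isin(labels, list(POSITIVE))
        self.valence.fit(X, is_pos)                          # stage 1
        self.pos_stage.fit(X[is_pos], labels[is_pos] == "delight")      # stage 2a
        self.neg_stage.fit(X[~is_pos], labels[~is_pos] == "confusion")  # stage 2b
        return self

    def predict(self, X):
        out = []
        for x, pos in zip(np.asarray(X), self.valence.predict(X)):
            if pos:
                out.append("delight" if self.pos_stage.predict([x])[0] else "flow")
            else:
                out.append("confusion" if self.neg_stage.predict([x])[0] else "frustration")
        return out
```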

  15. Emotion Models using Lexical Information
  • Pitch: minimum, maximum, mean, standard deviation, absolute value, quantile, ratio between voiced and unvoiced frames.
  • Duration: ε_time, ε_height.
  • Intensity: minimum, maximum, mean, standard deviation, quantile.
  • Formants: first, second, third, fourth, and fifth formants; second-to-first formant ratio; third-to-first formant ratio.
  • Rhythm: speaking rate.
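
A hedged sketch of the pitch and intensity statistics listed above. The slides do not specify an extraction toolchain; librosa is assumed here, with RMS energy as an intensity proxy. Formant and speaking-rate features are omitted, since they would typically require a tool such as Praat.

```python
# Pitch and intensity statistics for one word-level utterance (illustrative only).
import numpy as np
import librosa

def prosodic_features(wav_path):
    y, sr = librosa.load(wav_path, sr=None)
    # F0 contour with voicing decisions; unvoiced frames come back as NaN.
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C7"), sr=sr)
    f0v = f0[voiced & np.isfinite(f0)]   # keep voiced frames only
    rms = librosa.feature.rms(y=y)[0]    # frame-wise intensity proxy
    return {
        "pitch_min": f0v.min(), "pitch_max": f0v.max(),
        "pitch_mean": f0v.mean(), "pitch_std": f0v.std(),
        "pitch_q25": np.quantile(f0v, 0.25), "pitch_q75": np.quantile(f0v, 0.75),
        "voiced_unvoiced_ratio": voiced.mean() / max(1e-9, 1 - voiced.mean()),
        "intensity_min": rms.min(), "intensity_max": rms.max(),
        "intensity_mean": rms.mean(), "intensity_std": rms.std(),
    }
```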

  16. Duration Features Figure 4. Measures of F0 for computing the parameters (ε_time, ε_height), which correspond to the rising and falling of intonation. Including both height and time accounts for possible low or high pitch accents.
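
One plausible reading of (ε_time, ε_height), sketched below: the time span and F0 excursion of the final pitch movement in a word's contour. The exact definition in Figure 4 is not reproduced in this transcript, so the turning-point heuristic here is an assumption.

```python
# Illustrative duration parameters from an F0 contour (assumed definition).
import numpy as np

def rise_parameters(f0, frame_dt=0.01):
    """f0: voiced-frame F0 contour in Hz; frame_dt: frame step in seconds."""
    f0 = np.asarray(f0, dtype=float)
    turning = np.argmin(f0)                         # lowest point before the final movement
    eps_time = (len(f0) - 1 - turning) * frame_dt   # duration of the rise/fall
    eps_height = f0[-1] - f0[turning]               # F0 excursion (negative = fall)
    return eps_time, eps_height

# e.g. a rising "okay??": the contour dips, then rises sharply at the end.
print(rise_parameters([180, 160, 150, 170, 220, 260]))  # -> (0.03, 110.0)
```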

  17. Types of Classifiers

  18. Shortcomings of Lee and Narayanan’s work (2004)

  19. Results

  20. Summary of Results

  21. Classifiers on positive and negative emotions.

  22. Limitations and Future Work • Algorithm: feature selection; discourse information; future efforts will include fusion of video and audio data in a signal-level framework. • Database: clipping arbitrary words from a conversation may be ineffective in various cases; we may need to consider words in sequence.

  23. Acknowledgments • This research was partially supported by grant NSF-IIS-0416128 awarded to the third author. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding institution.

  24. Questions?
