
Multimodal Information Analysis for Emotion Recognition



Presentation Transcript


  1. Multimodal Information Analysis for Emotion Recognition (Tele Health Care Application) Malika Meghjani, Gregory Dudek, and Frank P. Ferrie

  2. Content • Our Goal • Motivation • Proposed Approach • Results • Conclusion

  3. Our Goal • Automatic emotion recognition using audio-visual information analysis. • Create video summaries by automatically labeling the emotions in a video sequence.

  4. Motivation • Map the emotional states of the patient to nursing interventions. • Evaluate the role of nursing interventions in improving the patient's health.

  5. Proposed Approach (block diagram) Visual Feature Extraction → Visual-based Emotion Classification; Audio Feature Extraction → Audio-based Emotion Classification; Data Fusion (Feature-Level Fusion and Decision-Level Fusion) → Recognized Emotional State

  6. Visual Analysis • Face Detection (Viola-Jones face detector using Haar wavelets and a boosting algorithm) • Feature Extraction (Gabor filters: 5 spatial frequencies and 4 orientations, 20 filter responses for each frame) • Feature Selection (select the most discriminative features across all emotional classes) • SVM Classification (classification with probability estimates)
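As a rough illustration of the detection step (not the authors' code), the Viola-Jones detector named above ships with OpenCV as a pretrained Haar cascade. The file name, parameters, and 96×96 crop size below are assumptions:

```python
# Minimal Viola-Jones face detection sketch using OpenCV's bundled
# Haar cascade; parameters and crop size are illustrative assumptions.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("frame.png")               # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns (x, y, w, h) boxes; taking the largest box is
# a reasonable heuristic when a single subject faces the camera.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
if len(faces) > 0:
    x, y, w, h = max(faces, key=lambda b: b[2] * b[3])
    face = cv2.resize(gray[y:y + h, x:x + w], (96, 96))  # normalized crop
```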

  7. Visual Feature Extraction (figure) Frequency-domain Gabor filter bank: 5 frequencies × 4 orientations = 20 filters per frame, followed by feature selection and automatic emotion classification.
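A minimal sketch of such a 5 × 4 Gabor bank, assuming OpenCV's getGaborKernel; the wavelengths, kernel size, and mean/std pooling are illustrative choices, not the authors' settings:

```python
# Sketch of a 5-frequency x 4-orientation Gabor filter bank (20 responses
# per frame). Wavelengths, kernel size, and pooling are assumptions.
import cv2
import numpy as np

def gabor_features(face_gray):
    feats = []
    for lambd in (4, 6, 8, 12, 16):              # 5 spatial frequencies
        for theta in np.arange(4) * np.pi / 4:   # 4 orientations
            kernel = cv2.getGaborKernel(
                ksize=(15, 15), sigma=3.0, theta=theta,
                lambd=lambd, gamma=0.5)
            resp = cv2.filter2D(face_gray.astype(np.float32),
                                cv2.CV_32F, kernel)
            # Pool each response map to simple statistics so every frame
            # yields a fixed-length feature vector.
            feats += [resp.mean(), resp.std()]
    return np.array(feats)
```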

  8. Audio Analysis • Audio Pre-Processing (Remove leading and trailing edges) • Feature Extraction (Statistics of Pitch and Intensity contours and Mel Frequency Cepstral Coefficients) • Feature Normalization (Remove inter speaker variability) • SVM Classification (Classification with probability estimates)
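A sketch of this audio pipeline using librosa (trim edges, then summarize pitch, intensity, and MFCC contours). The sample rate, pitch range, number of coefficients, and summary statistics are assumptions:

```python
# Audio feature extraction sketch with librosa; parameter values are
# illustrative assumptions, not the authors' settings.
import librosa
import numpy as np

def audio_features(path):
    y, sr = librosa.load(path, sr=16000)
    y, _ = librosa.effects.trim(y)            # remove leading/trailing silence

    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"), sr=sr)
    f0 = f0[~np.isnan(f0)]                    # keep voiced frames only
    rms = librosa.feature.rms(y=y)[0]         # intensity contour
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

    stats = lambda v: [v.mean(), v.std(), v.min(), v.max()]
    return np.array(stats(f0) + stats(rms)
                    + [s for c in mfcc for s in stats(c)])
```

Inter-speaker variability can then be reduced by z-scoring each speaker's feature vectors, i.e. subtracting that speaker's mean and dividing by that speaker's standard deviation.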

  9. Audio Feature Extraction (figure) Audio Signal → Speech Rate, Pitch, Intensity, and Spectrum Analysis via Mel Frequency Cepstral Coefficients (short-term power spectrum of sound) → Automatic Emotion Classification

  10. SVM Classification
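The deck does not show the classifier setup. A minimal sketch with scikit-learn, assuming an RBF kernel and default regularization; probability=True enables the Platt-scaled per-class estimates that the fusion stage consumes:

```python
# SVM classification with probability estimates via scikit-learn.
# Kernel choice, C, and the placeholder data are assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_train = np.random.rand(100, 40)             # placeholder feature vectors
y_train = np.random.randint(0, 6, 100)        # six emotion classes

clf = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", C=1.0, probability=True))
clf.fit(X_train, y_train)

# predict_proba maps SVM margins to per-class probability estimates
# (Platt scaling), as needed for decision-level fusion.
probs = clf.predict_proba(np.random.rand(5, 40))
```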

  11. Feature Selection • The feature selection method is similar to SVM classification (a wrapper method). • It generates a separating plane by minimizing the weighted sum of distances of misclassified data points to two parallel bounding planes. • It suppresses as many components of the normal to the separating plane as possible while keeping classification results consistent. (figure: average error count vs. number of selected features; separating plane with bounding planes and distances)
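A common stand-in for this kind of plane-normal sparsification, shown here as a sketch rather than the authors' exact method, is an L1-penalized linear SVM: the L1 penalty drives components of the normal vector to zero, and the surviving nonzero weights identify the selected features.

```python
# Feature selection sketch via an L1-penalized linear SVM; zeroed weights
# correspond to suppressed components of the separating plane's normal.
# This is a stand-in, not necessarily the authors' exact formulation.
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

X = np.random.rand(200, 20)                   # e.g. 20 Gabor responses/frame
y = np.random.randint(0, 6, 200)

sparse_svm = LinearSVC(penalty="l1", dual=False, C=0.1).fit(X, y)
selector = SelectFromModel(sparse_svm, prefit=True)
X_selected = selector.transform(X)            # keeps nonzero-weight features
```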

  12. Data Fusion 1. Decision Level: • Obtain probability estimates for each emotional class using the SVM margins. • Multiply the probability estimates from the two modalities and re-normalize to give the final decision-level emotion classification. 2. Feature Level: Concatenate the audio and visual features and repeat the feature selection and SVM classification process.
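The decision-level rule described above (multiply, then re-normalize) is a product rule over the two modalities; a minimal sketch:

```python
# Decision-level fusion sketch: multiply per-class probabilities from the
# two modalities and re-normalize, as described on the slide.
import numpy as np

def fuse(p_audio, p_visual):
    """p_audio, p_visual: (n_samples, n_classes) probability estimates."""
    p = p_audio * p_visual
    p /= p.sum(axis=1, keepdims=True)         # re-normalize per sample
    return p.argmax(axis=1), p                # fused labels and estimates

# Feature-level fusion is simply np.hstack([audio_feats, visual_feats]),
# followed by the same feature selection and SVM training.
```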

  13. Database and Training Database: • Visual-only posed database • Audio-visual posed database Training: • Audio segmentation based on the minimum window required for feature extraction. • Extraction of the corresponding visual key frame within each segmented window. • Image-based training and audio-segment-based training (see the sketch below).
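As a rough sketch of this segmentation and pairing, under assumed values for the window length and video frame rate: split the audio into minimum-length windows and take the video frame nearest each window's center as the key frame.

```python
# Audio segmentation / key-frame pairing sketch; the 0.5 s window and the
# fps value passed in are illustrative assumptions.
def segment(y, sr, fps, win_sec=0.5):
    hop = int(win_sec * sr)                   # minimum window for features
    for start in range(0, len(y) - hop + 1, hop):
        chunk = y[start:start + hop]
        # Video key frame nearest the center of this audio window:
        key_frame = int(round((start + hop / 2) / sr * fps))
        yield chunk, key_frame
```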

  14. Experimental Results

  15. Time Series Plot (figure: time series of classifications for Surprise, Sad, Angry, Disgust, and Happy) 75% leave-one-subject-out cross-validation results (*Cohn-Kanade database, posed visual-only database)

  16. Feature Level Fusion (*eNTERFACE 2005, posed audio-visual database)

  17. Decision Level Fusion (*eNTERFACE 2005, posed audio-visual database)

  18. Confusion Matrix
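The slide shows a confusion matrix figure. As a generic illustration (the class ordering and toy labels are assumptions), such a matrix can be computed directly from true and predicted labels:

```python
# Generic confusion-matrix computation for the six emotion classes;
# the label ordering and toy data are assumptions.
from sklearn.metrics import confusion_matrix

labels = ["Anger", "Disgust", "Fear", "Happy", "Sad", "Surprise"]
y_true = ["Happy", "Sad", "Happy", "Anger", "Fear", "Surprise"]
y_pred = ["Happy", "Anger", "Happy", "Anger", "Sad", "Surprise"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
# cm[i, j] counts samples of true class i predicted as class j.
```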

  19. Demo

  20. Conclusion • Combining the two modalities (audio and visual) improves overall recognition rates by 11% with Decision-Level Fusion and by 6% with Feature-Level Fusion. • Emotions where vision wins: Disgust, Happy, and Surprise. • Emotions where audio wins: Anger and Sadness. • Fear was equally well recognized by the two modalities. • These results indicate that automated multimodal emotion recognition is effective, with decision-level fusion giving the larger gain.

  21. Things to do… • Inference based on temporal relations between instantaneous classifications. • Tests on a natural audio-visual database (ongoing).
