multimodal emotion recognition

Presentation Transcript

  1. multimodal+emotion+recognition • a.k.a. ‘better than the sum of its parts’ • Kostas Karpouzis, Associate Researcher, ICCS/NTUA

  2. multimodal+emotion+recognition • Three very different (and interesting!) problems • What is ‘multimodal’, why do we need it, what do we earn from that? • What is ‘emotion’ in HCI applications? • What can we recognize and, better yet, what should we recognize?

  3. multimodal+emotion+recognition • In terms of R&D, emotion/affect-aware human-computer interaction is a hot topic • Novel, interesting application for existing algorithms • Demanding test bed for feature extraction and recognition tasks • …and just wait until we bring humans in the picture!

  4. multimodal+emotion+recognition • In terms of R&D, emotion/affect-aware human-computer interaction is a hot topic • Dedicated conferences (e.g. ACII, IVA, etc.) and planned journals • Humaine Network of Excellence → Humaine Association • Integrated Projects (CALLAS, Companions, LIREC, Feelix Growing, etc.)

  5. yours truly • Associate researcher at ICCS/NTUA, Athens • Completed post-doc within Humaine • Signals to signs of emotion • Co-editor of the Humaine Handbook • Member of the EC of the Humaine Association • Emotion modelling and development in the CALLAS and Feelix Growing FP6 projects

  6. what next • first we define ‘emotion’ • terminology • semantics and representations • computational models • emotion in interaction • emotion in natural interaction

  7. what next • then ‘multimodal’ • modalities related to emotion and interaction • fusing modalities (how?, why?) • handling uncertainty, noise, etc. • which features from each modality? • semantics of fusion

  8. what next • and ‘recognition’ • from individual modalities (uni-modal) • across modalities (multi-modal) • static vs. dynamic recognition • what can we recognize? • can we extend/enrich that? • context awareness

  9. what next • affect and emotion aware applications • can we benefit from knowing a user’s emotional state? • missing links • open research questions for the following years

  10. defining emotion

  11. terminology • Emotions, mood, personality • Can be distinguished by • time (short-term vs. long-term) • influence (unnoticed vs. dominant) • cause (specific vs. diffuse) • Affect classified by time • short-term: emotions (dominant, specific) • medium-term: moods (unnoticed, diffuse) • and long-term: personality (dominant)

  12. terminology • what we perceive is the expressed emotion at a given time • on top of a person’s current mood, which may change over time, but not drastically • and on top of their personality • usually considered a base line level • which may differ from what a person feels • e.g. we despise someone, but are forced to be polite

  13. terminology • Affect is an innately structured, non-cognitive evaluative sensation that may or may not register in consciousness • Feeling is defined as affect made conscious, possessing an evaluative capacity that is not only physiologically based, but that is often also psychologically oriented. • Emotion is psychosocially constructed, dramatized feeling

  14. how it all started • Charles Darwin, 1872 • Ekman et al. since the 60s • Mayer and Salovey, papers on emotional intelligence, 90s • Goleman’s book: Emotional Intelligence: Why It Can Matter More Than IQ • Picard’s book: Affective Computing, 1997

  15. why emotions? • “Shallow” improvement of subjective experience • Reason about emotions of others • To improve usability • Get a handle on another aspect of the "human world" • Affective user modeling • Basis for adaptation of software to users

  16. name that emotion • so, we know what we’re after • but we have to assign it a name • one we all agree upon • and that means the same thing for all (most?) of us • different emotion representations • different context • different applications • different conditions/environments

  17. emotion representations • most obvious: labels • people use them in everyday life • ‘happy’, ‘sad’, ‘ironic’, etc. • may be extended to include user states, e.g. ‘tired’, which are not emotions • CS people like them • good match for classification algorithms

  18. labels • but… • we have to agree on a finite set • if we don’t, we’ll have to change the structure of our neural nets with each new label • labels don’t work well with measurements • is ‘joy’ << ‘exhilaration’ and in what scale? • do scales mean the same to the expresser and all perceivers?
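The fixed-label-set problem described above can be sketched in a few lines. This is an illustrative example, not code from the presentation: a classifier's output layer is sized to the agreed label set, so every new label (say, adding ‘contempt’) means changing the network structure and retraining.

```python
# Illustrative sketch: a classifier committed to a fixed label set.
# The label list and scores below are hypothetical examples.

EKMAN_LABELS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

def argmax_label(scores, labels=EKMAN_LABELS):
    """Map a vector of per-class scores to its label.

    The output layer size is tied to the label set: feeding it a
    7-way score vector (e.g. after adding 'contempt') fails outright.
    """
    if len(scores) != len(labels):
        raise ValueError("output layer size must match the label set")
    return labels[max(range(len(scores)), key=lambda i: scores[i])]

print(argmax_label([0.1, 0.0, 0.2, 0.6, 0.05, 0.05]))  # -> joy
```

The scale problem from the slide (is ‘joy’ &lt;&lt; ‘exhilaration’?) is invisible here: argmax over labels carries no notion of intensity at all, which is one motivation for the dimensional representations discussed later.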

  19. labels • Ekman’s set is the most popular • ‘anger’, ‘disgust’, ‘fear’, ‘joy’, ‘sadness’, and ‘surprise’ • added ‘contempt’ in the process • Main difference to other sets of labels: • universally recognizable across cultures • when confronted with a smile, all people will recognize ‘joy’

  20. from labels to machine learning • when reading the claim that ‘there are six facial expressions recognized universally across cultures’… • …CS people misunderstood, causing a whole lot of issues that still dominate the field

  21. strike #1 • ‘we can only recognize these six expressions’ • as a result, all video databases used to contain images of sad, angry, happy or fearful people • a while later, the same authors discussed ‘contempt’ as a possible universal, but CS people weren’t listening

  22. strike #2 • ‘only these six expressions exist in human expressivity’ • as a result, more sad, angry, happy or fearful people, even when data involved HCI • can you really be afraid when using your computer?

  23. strike #3 • ‘we can only recognize extreme emotions’ • now, happy people grin, sad people cry, and fearful people are scared to death • however, extreme emotions are scarce in everyday life • so, subtle emotions and additional labels were out of the picture

  24. labels are good, but… • don’t cover subtle emotions and natural expressivity • many more emotions occur in everyday life, and they are usually masked • hence the need for alternative emotion representations • can’t capture dynamics • can’t capture magnitude • extreme joy is not defined

  25. other sets of labels • Plutchik • Acceptance, anger, anticipation, disgust, joy, fear, sadness, surprise • Relation to adaptive biological processes • Frijda • Desire, happiness, interest, surprise, wonder, sorrow • Forms of action readiness • Izard • Anger, contempt, disgust, distress, fear, guilt, interest, joy, shame, surprise

  26. other sets of labels • James • Fear, grief, love, rage • Bodily involvement • McDougall • Anger, disgust, elation, fear, subjection, tender-emotion, wonder • Relation to instincts • Oatley and Johnson-Laird • Anger, disgust, anxiety, happiness, sadness • Do not require propositional content

  27. going 2D • vertical: activation (active/passive) • horizontal: evaluation (negative/positive)

  28. going 2D • emotions correspond to points in 2D space • evidence that some vector operations are valid, e.g. ‘fear’ + ‘sadness’ = ‘despair’

  29. going 2D • quadrants useful in some applications • e.g. need to detect extreme expressivity in a call-centre application

  30. going 3D • Plutchik adds another dimension • vertical → intensity, circle → degrees of similarity • four pairs of opposites

  31. going 3D • Mehrabian considers pleasure, arousal and dominance • Again, emotions are points in space
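Since Mehrabian's model also treats emotions as points in space, recognition can be sketched as a nearest-point lookup in (pleasure, arousal, dominance). The coordinates here are rough, hypothetical placements; the useful property they illustrate is that ‘anger’ and ‘fear’ sit close in pleasure/arousal and are separated mainly by the dominance axis:

```python
# Sketch: nearest-label classification in PAD space.
# Coordinates are illustrative, not from the presentation.
import math

PAD = {                      # (pleasure, arousal, dominance)
    "anger": (-0.5,  0.6,  0.6),
    "fear":  (-0.6,  0.6, -0.6),
    "joy":   ( 0.7,  0.5,  0.4),
    "bored": (-0.4, -0.6, -0.2),
}

def classify(p, a, d):
    """Return the label whose PAD point is nearest to the observation."""
    return min(PAD, key=lambda k: math.dist(PAD[k], (p, a, d)))

# With dominance observed, anger and fear separate cleanly:
print(classify(-0.55, 0.6,  0.5))   # -> anger
print(classify(-0.55, 0.6, -0.5))   # -> fear
```

Dropping the third coordinate would collapse these two observations onto nearly the same 2D point, which is one argument for the extra dimension.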

  32. what about interaction? • these models describe the emotional state of the user • no insight as to what happened, why the user reacted and how the user will react • action selection • OCC (Ortony, Clore, Collins) • Scherer’s appraisal checks

  33. OCC (Ortony, Clore, Collins) • each event, agent and object has properties • used to predict the final outcome/expressed emotion/action

  34. OCC (Ortony, Clore, Collins)

  35. OCC (Ortony, Clore, Collins) • Appraisals • Assessments of events, actions, objects • Valence • Whether emotion is positive or negative • Arousal • Degree of physiological response • Generating appraisals • Domain-specific rules • Probability of impact on agent’s goals
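The "domain-specific rules" point above can be made concrete with a deliberately minimal appraisal sketch. This is a simplified illustration of the OCC idea, not the full model; the event structure, goal set, and joy/distress mapping are all hypothetical:

```python
# Minimal OCC-style appraisal sketch (hypothetical rule and data):
# an event is appraised against the agent's active goals, yielding a
# valenced emotion plus an intensity proportional to goal impact.

def appraise_event(event, goals):
    """Domain-specific rule: desirability = summed impact on active goals."""
    desirability = sum(
        impact for goal, impact in event["impacts"].items() if goal in goals
    )
    if desirability > 0:
        return "joy", desirability          # desirable event -> positive emotion
    if desirability < 0:
        return "distress", -desirability    # undesirable -> negative emotion
    return "neutral", 0.0

goals = {"finish_report", "stay_healthy"}
event = {"name": "deadline_extended",
         "impacts": {"finish_report": 0.6, "go_home_early": -0.2}}
print(appraise_event(event, goals))  # -> ('joy', 0.6)
```

Note how the impact on a goal the agent does not hold (`go_home_early`) is ignored: the same event appraised against different goals can yield a different emotion, which is the point of appraisal-based models over purely perceptual ones.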

  36. Scherer’s appraisal checks • 2 theoretical approaches: • “Discrete emotions” (Ekman, 1992; Ekman & Friesen, 1975: EMFACS) • “Appraisal theory” of emotion (Scherer, 1984, 1992)

  37. Scherer’s appraisal checks • Componential approach • Emotions are elicited by a cognitive evaluation of antecedent events • The patterning of reactions is shaped by this appraisal process; appraisal dimensions are used to evaluate the stimulus adaptively as it changes • Appraisal dimensions: evaluation of the significance of the event, coping potential, and compatibility with social norms

  38. Autonomic responses contribute to the intensity of the emotional experience • stimulus (‘Bang!’, loud) → general autonomic arousal (heart races) + perception/interpretation of context (danger) → particular emotion experienced (fear) • the emotion experienced will affect future interpretations of stimuli and continuing autonomic arousal

  39. Scherer’s appraisal checks • 2 theories, 2 sets of predictions: the example of Anger

  40. summary on emotion • perceived emotions are usually short-lasting events across modalities • labels and dimensions are used to annotate perceived emotions • pros and cons for each • additional requirements for interactive applications

  41. multimodal interaction

  42. a definition • Raisamo, 1999 • “Multimodal interfaces combine many simultaneous input modalities and may present the information using synergistic representation of many different output modalities”

  43. Twofold view • A Human-Centered View • common in psychology • often considers human input channels, i.e., computer output modalities, and most often vision and hearing • applications: a talking head, audio-visual speech recognition, ... • A System-Centered View • common in computer science • a way to make computer systems more adaptable

  44. Twofold view

  45. going multimodal • ‘multimodal’ is this decade’s ‘affective’! • plethora of modalities available to capture and process • visual, aural, haptic… • ‘visual’ can be broken down into ‘facial expressivity’, ‘hand gesturing’, ‘body language’, etc. • ‘aural’ into ‘prosody’, ‘linguistic content’, etc.
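Combining the modalities listed above is usually done at one of two levels. The sketch below contrasts them; the per-modality scores and weights are made-up illustrations, not results from the presentation:

```python
# Sketch of the two common fusion strategies for multimodal recognition:
# feature-level (early) vs decision-level (late) fusion.
# All numbers below are hypothetical.

def feature_fusion(face_feats, voice_feats):
    """Feature-level: concatenate modality features before classification."""
    return face_feats + voice_feats  # one long vector for a single classifier

def decision_fusion(scores_by_modality, weights):
    """Decision-level: weighted average of per-modality class scores;
    a weight can encode confidence in (or noisiness of) that modality."""
    labels = next(iter(scores_by_modality.values())).keys()
    total = sum(weights.values())
    return {
        lbl: sum(w[lbl] for w in
                 ({k: weights[m] * s[k] for k in s}
                  for m, s in scores_by_modality.items())) / total
        for lbl in labels
    }

scores = {"face":  {"joy": 0.7, "anger": 0.3},
          "voice": {"joy": 0.4, "anger": 0.6}}
fused = decision_fusion(scores, {"face": 0.8, "voice": 0.2})  # trust face more
print(max(fused, key=fused.get))  # -> joy
```

Feature-level fusion lets the classifier exploit cross-modal correlations but needs synchronized features; decision-level fusion tolerates asynchronous, noisy modalities (each gets its own classifier and a confidence weight), which connects to the "handling uncertainty, noise" point in the outline.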

  46. multimodal design Adapted from [Maybury and Wahlster, 1998]

  47. paradigms for multimodal user interfaces • Computer as a tool • multiple input modalities are used to enhance direct manipulation behavior of the system • the machine is a passive tool and tries to understand the user through all different input modalities that the system recognizes • the user is always responsible for initiating the operations • follows the principles of direct manipulation [Shneiderman, 1982; 1983]

  48. paradigms for multimodal user interfaces • Computer as a dialogue partner • the multiple modalities are used to increase the anthropomorphism in the user interface • multimodal output is important: talking heads and other human-like modalities • speech recognition is a common input modality in these systems • can often be described as an agent-based conversational user interface

  49. why multimodal? • well, why not? • recognition from traditional unimodal databases had reached its ceiling • new kinds of data available • what’s in it for me? • have recognition rates improved? • or have we just introduced more uncertain features?

  50. essential reading • Communications of the ACM,Nov. 1999, Vol. 42, No. 11, pp. 74-81