multimodal+emotion+recognition

multimodal+emotion+recognition a.k.a. ‘better than the sum of its parts’ Kostas Karpouzis Assoc. researcher ICCS/NTUA http://www.image.ntua.gr

multimodal+emotion+recognition • Three very different (and interesting!) problems • What is ‘multimodal’, why do we need it, what do we earn from that? • What is ‘emotion’ in HCI applications? • What can we recognize and, better yet, what should we recognize?

multimodal+emotion+recognition • In terms of R&D, emotion/affect-aware human-computer interaction is a hot topic • Novel, interesting application for existing algorithms • Demanding test bed for feature extraction and recognition tasks • …and just wait until we bring humans in the picture!

multimodal+emotion+recognition • In terms of R&D, emotion/affect-aware human-computer interaction is a hot topic • Dedicated conferences (e.g. ACII, IVA, etc.) and planned journals • Humaine Network of Excellence  Humaine Association • http://emotion-research.net • Integrated Projects (CALLAS, Companions, LIREC, Feelix Growing, etc.)

yours truly • Associate researcher at ICCS/NTUA, Athens • Completed post-doc within Humaine • Signals to signs of emotion • Co-editor of Humaine Handbook • Member of the EC of the Humaine Association • Emotion modelling and development in Callas, Feelix Growing FP6 Projects

what next • first we define ‘emotion’ • terminology • semantics and representations • computational models • emotion in interaction • emotion in natural interaction

what next • then ‘multimodal’ • modalities related to emotion and interaction • fusing modalities (how?, why?) • handling uncertainty, noise, etc. • which features from each modality? • semantics of fusion

what next • and ‘recognition’ • from individual modalities (uni-modal) • across modalities (multi-modal) • static vs. dynamic recognition • what can we recognize? • can we extend/enrich that? • context awareness

what next • affect and emotion aware applications • can we benefit from knowing a user’s emotional state? • missing links • open research questions for the following years

defining emotion

terminology • Emotions, mood, personality • Can be distinguished by • time (short-term vs. long-term) • influence (unnoticed vs. dominant) • cause (specific vs. diffuse) • Affect classified by time • short-term: emotions (dominant, specific) • medium-term: moods (unnoticed, diffuse) • and long-term: personality (dominant)

terminology • what we perceive is the expressed emotion at a given time • on top of a person’s current mood, which may change over time, but not drastically • and on top of their personality • usually considered a base line level • which may differ from what a person feels • e.g. we despise someone, but are forced to be polite

terminology • Affect is an innately structured, non-cognitive evaluative sensation that may or may not register in consciousness • Feeling is defined as affect made conscious, possessing an evaluative capacity that is not only physiologically based, but that is often also psychologically oriented. • Emotion is psychosocially constructed, dramatized feeling

how it all started • Charles Darwin, 1872 • Ekman et al. since the 60s • Mayer and Salovey, papers on emotional intelligence, 90s • Goleman’s book: Emotional Intelligence: Why It Can Matter More Than IQ • Picard’s book: Affective Computing, 1997

why emotions? • “Shallow” improvement of subjective experience • Reason about emotions of others • To improve usability • Get a handle on another aspect of the "human world" • Affective user modeling • Basis for adaptation of software to users

name that emotion • so, we know what we’re after • but we have to assign it a name • in which we all agree upon • and means the same thing for all (most?) of us • different emotion representations • different context • different applications • different conditions/environments

emotion representations • most obvious: labels • people use them in everyday life • ‘happy’, ‘sad’, ‘ironic’, etc. • may be extended to include user states, e.g. ‘tired’, which are not emotions • CS people like them • good match for classification algorithms

labels • but… • we have to agree on a finite set • if we don’t, we’ll have to change the structure of our neural nets with each new label • labels don’t work well with measurements • is ‘joy’ << ‘exhilaration’ and in what scale? • do scales mean the same to the expresser and all perceivers?

labels • Ekman’s set is the most popular • ‘anger’, ‘disgust’, ‘fear’, ‘joy’, ‘sadness’, and ‘surprise’ • added ‘contempt’ in the process • Main difference to other sets of labels: • universally recognizable across cultures • when confronted with a smile, all people will recognize ‘joy’

from labels to machine learning • when reading the claim that ‘there are six facial expressions recognized universally across cultures’… • …CS people misunderstood, causing a whole lot of issues that still dominate the field

strike #1 • ‘we can only recognize these six expressions’ • as a result, all video databases used to contain images of sad, angry, happy or fearful people • a while later, the same authors discussed ‘contempt’ as a possible universal, but CS people weren’t listening

strike #2 • ‘only these six expressions exist in human expressivity’ • as a result, more sad, angry, happy or fearful people, even when data involved HCI • can you really be afraid when using your computer?

strike #3 • ‘we can only recognize extreme emotions’ • now, happy people grin, sad people cry or are scared to death when afraid • however, extreme emotions are scarce in everyday life • so, subtle emotions and additional labels were out of the picture

labels are good, but… • don’t cover subtle emotions and natural expressivity • more emotions are available in everyday life and usually masked • hence the need for alternative emotion representations • can’t approach dynamics • can’t approach magnitude • extreme joy is not defined

other sets of labels • Plutchik • Acceptance, anger, anticipation, disgust, joy, fear, sadness, surprise • Relation to adaptive biological processes • Frijda • Desire, happiness, interest, surprise, wonder, sorrow • Forms of action readiness • Izard • Anger, contempt, disgust, distress, fear, guilt, interest, joy, shame, surprise

other sets of labels • James • Fear, grief, love, rage • Bodily involvement • McDougall • Anger, disgust, elation, fear, subjection, tender-emotion, wonder • Relation to instincts • Oatley and Johnson-Laird • Anger, disgust, anxiety, happiness, sadness • Do not require propositional content

going 2D • vertical: activation (active/passive) • horiz.: evaluation (negative/positive)

going 2D • emotions correspond to points in 2D space • evidence that some vector operations are valid, e.g. ‘fear’ + ‘sadness’ = ‘despair’

going 2D • quadrants useful in some applications • e.g. need to detect extreme expressivity in a call-centre application

going 3D • Plutchik adds another dimension • vertical  intensity, circle  degrees ofsimilarity • four pairs of opposites

going 3D • Mehrabian considers pleasure, arousal and dominance • Again, emotions are points in space

what about interaction? • these models describe the emotional state of the user • no insight as to what happened, why the user reacted and how the user will react • action selection • OCC (Ortony, Clore, Collins) • Scherer’s appraisal checks

OCC (Ortony, Clore, Collins) • each event, agent and object has properties • used to predict the final outcome/expressed emotion/action

OCC (Ortony, Clore, Collins)

OCC (Ortony, Clore, Collins) • Appraisals • Assessments of events, actions, objects • Valence • Whether emotion is positive or negative • Arousal • Degree of physiological response • Generating appraisals • Domain-specific rules • Probability of impact on agent’s goals

Scherer’s appraisal checks 2 theoretical approaches: • “Discrete emotions” (Ekman, 1992; Ekman & Frisen, 1975: EMFACS) • “Appraisal theory” of emotion (Scherer, 1984, 1992)

Scherer’s appraisal checks • Componential Approach • Emotions are elicited by a cognitive evaluation of antecedent events. • Patterning of reactions are shaped by this appraisal process. Appraisal dimensions are used to evaluate stimulus, in an adaptive way to the changes. • Appraisal Dimensions: Evaluation of significance of event, coping potential, and compatibility with the social norms

Autonomic responses contribute to the intensity of the emotional experience. Stimulus (Bang!) Stimulus (loud) General autonomic Arousal (heart races) Perception/ Interpretation Context (danger) Particular emotion experienced (fear) Emotion experienced will affect future interpretations Of stimuli and continuing autonomic arousal

Scherer’s appraisal checks • 2 theories, 2 sets of predictions:the example of Anger

summary on emotion • perceived emotions are usually short-lasting events across modalities • labels and dimensions are used to annotate perceived emotions • pros and cons for each • additional requirements for interactive applications

multimodal interaction

a definition • Raisamo, 1999 • “Multimodal interfaces combine many simultaneous input modalities and may present the information using synergistic representation of many different output modalities”

Twofold view • A Human-Centered View • common in psychology • often considers human input channels, i.e., computer output modalities, and most often vision and hearing • applications: a talking head, audio-visual speech recognition, ... • A System-Centered View • common in computer science • a way to make computer systems more adaptable

Twofold view

going multimodal • ‘multimodal’ is this decade’s ‘affective’! • plethora of modalities available to capture and process • visual, aural, haptic… • ‘visual’ can be broken down to ‘facial expressivity’, ‘hand gesturing’, ‘body language’, etc. • ‘aural’ to ‘prosody’, ‘linguistic content’, etc.

multimodal design Adapted from [Maybury and Wahlster, 1998]

paradigms for multimodal user interfaces • Computer as a tool • multiple input modalities are used to enhance direct manipulation behavior of the system • the machine is a passive tool and tries to understand the user through all different input modalities that the system recognizes • the user is always responsible for initiating the operations • follows the principles of direct manipulation [Shneiderman, 1982; 1983]

paradigms for multimodal user interfaces • Computer as a dialogue partner • the multiple modalities are used to increase the anthropomorphism in the user interface • multimodal output is important: talking heads and other human-like modalities • speech recognition is a common input modality in these systems • can often be described as an agent-based conversational user interface

why multimodal? • well, why not? • recognition from traditional unimodal databases had reached its ceiling • new kinds of data available • what’s in it for me? • have recognition rates improved? • or just introduced more uncertain features

essential reading • Communications of the ACM,Nov. 1999, Vol. 42, No. 11, pp. 74-81

multimodal+emotion+recognition

multimodal+emotion+recognition

Presentation Transcript

Emotion Recognition from Physiological Measurement (Biosignal)

Automatic Facial Emotion Recognition

multimodal emotion recognition and expressivity analysis ICME 2005 Special Session

Human Social Interaction perspectives from neuroscience

Emotion Recognition using the GSR Signal on Android Devices

Multimodal Emotion Recognition

Multimodal Deep Learning

Emotion Recognition on the Basis of Peripheral Physiological Data

Multimodal Information Analysis for Emotion Recognition

Emotion Recognition from Electromyography and Skin Conductance

Class 11: Emotions and Cognition I

m ultimodal emotion recognition

PROGRESS ON EMOTION RECOGNITION

An Emotion Recognition Journey

Error Correction of Continuous Handwriting Recognition by Multimodal Fusion

Emotion - Outline

Emotion Detection and Recognition Market to 2025-Industry Analysis, Applications, Opportunities and Trends |The Insight

Emotion Detection and Recognition Market

Global Emotion Detection & Recognition Market to reach $29.2 billion by 2022

Emotion Recognition using Image Processing

Multimodal Interfaces