Conveying Affectiveness in Leading-edge Living Adaptive Systems

The CALLAS Project
http://www.callas-newmedia.eu
Fiore Basile, Metaware, f.basile@metaware.it
Diego Arnone, Engineering Ingegneria Informatica, diego.arnone@eng.it
Glasgow, May 24th, 2007

Partners
• Engineering (ITA)
• VTT Electronics (FI)
• BBC (UK)
• MetaWare (ITA)
• Studio Azzurro (ITA)
• XIM (UK)
• Digital Video (ITA)
• Humanware (ITA)
• Nexture (ITA)
• University of Augsburg (DE)
• ICCS / NTUA (GR)
• University of Mons (BE)
• University of Teesside (UK)
• Helsinki Univ. Technology (FI)
• Paris 8 (FR)
• Scuola Normale Sup. (IT)
• University of Reading (UK)
• Fondazione Teatro Massimo (IT)
• HIT Laboratory (NZ)

Overview
• Emotions and affectiveness: a fundamental element for rich human-machine interaction and communication
• Affective interfaces are central to the new media experience, especially for entertainment:
  • Digital Theatre
  • Interactive TV
  • Augmented Reality Art
  • Interactive public performances

Objectives
• Develop specific re-usable technologies for the multimodal processing of the emotional experience associated with Arts and Entertainment
• Handle new and innovative categories of emotions, and improve performance for existing modalities at the input level
• Promote technology transfer of these results, in particular towards SMEs (Small and Medium Enterprises) in the new media sector

Working Areas
• CALLAS Shelf
• CALLAS Framework
• CALLAS Showcases

The CALLAS Shelf

The CALLAS Shelf
• A set of components, each processing one or more modalities and taking into account affective aspects of the interaction
• Many input modalities are foreseen: speech, gesture, sound, facial expressions, sensors, etc.
• All components aim to be reusable and capable of real-time processing
• The Shelf components will be aggregated according to commonly used fusion patterns, and composed through a visual authoring environment
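
Illustration only: the slides do not define a component API. A minimal sketch of what a reusable Shelf component could look like, assuming each component takes one sample of a modality and returns at most one affect-annotated event. Every name here (`AffectEvent`, `ShelfComponent`, `KeywordEmotionComponent`) is hypothetical and not part of CALLAS.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class AffectEvent:
    """Hypothetical output record of a Shelf component."""
    modality: str      # e.g. "speech", "gesture"
    label: str         # e.g. "joy", "anger"
    confidence: float  # 0.0 .. 1.0


class ShelfComponent(Protocol):
    """Hypothetical interface: one sample in, at most one event out."""
    modality: str

    def process(self, sample: str) -> Optional[AffectEvent]: ...


class KeywordEmotionComponent:
    """Toy stand-in for a speech component: spots emotion keywords in text."""
    modality = "speech"

    def __init__(self, lexicon: Dict[str, str]) -> None:
        self.lexicon = lexicon

    def process(self, sample: str) -> Optional[AffectEvent]:
        # Return an event for the first word found in the lexicon.
        for word in sample.lower().split():
            label = self.lexicon.get(word)
            if label is not None:
                return AffectEvent(self.modality, label, confidence=1.0)
        return None


component = KeywordEmotionComponent({"great": "joy", "awful": "disgust"})
event = component.process("That was great")
```

A shared interface of this kind is one way to keep components interchangeable when they are later aggregated by fusion patterns.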

The CALLAS Shelf: Input Shelf Components

FPMS I-Component (University of Mons)
ESR: Emotional Speech Recognition
Input: speech
Output: the word uttered and a corresponding emotion (active - passive) from a small list of Ekmanian and non-Ekmanian emotions
Main Concrete Elements of Work (*)
• Emotion robustness: proposing and implementing approaches for robust emotional speech recognition
• Emotion recognition: implementing speech-related features to support the extraction of emotional information

VTT I-Components (1/3) (VTT Electronics)
SCA: Sound Capture and Analysis
Input: audio streaming
Output: speech, music, constant noise, environmental sound, silence
Main Concrete Elements of Work
• Sound capture: designed for different situations and events in order to provide high-quality audio capture
• Sound analysis: provides low- and high-level information from the audio input (e.g. mapping the states to emotional states)

VTT I-Components (2/3) (VTT Electronics)
VFE: Video Features Extraction
Input: video streaming on wide spaces
Output: fast/slow, lot of/little amount, upper/lower body movement
Main Concrete Elements of Work
• Video features: implementing and analysing a set of low-level video features
• Audio features: implementing and analysing a set of low-level features to support the extraction of contextual and emotional information

VTT I-Components (3/3) (VTT Electronics)
GBMT: Gesture and Body Motion Tracking
Input: sensors (*), video streaming
Output: hand movement
Main Concrete Elements of Work
• Gesture recognition: data acquisition with different sensors and positions
• Emotion recognition: outlining relevant emotion-related gestures and body motions

UOA I-Components (1/2) (University of Augsburg)
ESR-speech: Emotional Speech Recognition (based on acoustic features)
Input: audio files
Output: emotional class recognition
Main Concrete Elements of Work
• Emotion recognition: analysing the acoustic features in order to recognize emotional aspects

UOA I-Components (2/2) (University of Augsburg)
ESR-linguistic: Emotional Speech Recognition (based on linguistic features)
Input: text files
Output: emotional class recognition: 2-9 emotional classes (*)
Main Concrete Elements of Work
• Implementation of a fusion model for linguistic and acoustic features

ICCS I-Components (1/4) (ICCS/NTUA)
GzR: Gaze Recognition
Input: high-resolution images of frontal faces and signals coming from many sensors
Output: what the user is looking at: up, down, right, left, OR degree of direction (by means of sensors)

ICCS I-Components (2/4) (ICCS/NTUA)
FFD: Facial Features Detection
Input: static images, videos?
Output: coordinates of interest points of the face
R-bFER: Rule-based Facial Expression Recognition
Output: Whissell's quadrant or expressions:
• Neutral
• Anger
• Disgust
• Fear
• Sadness
• Joy
• Surprise

ICCS I-Components (3/4) (ICCS/NTUA)
HHDT: Hands and Head Detection and Tracking
Input: static images, videos
Output: coordinates of hands and head
GR: Gesture Recognition
Output: one of six output states representing recognized gestures

ICCS I-Components (4/4) (ICCS/NTUA)
HHDT: Hands and Head Detection and Tracking
Input: static images, videos
Output: coordinates of hands and head
GEA: Gesture Expressivity Analysis
Output: expressivity features for performed gestures

ICCS I-Components
Main Concrete Elements of Work
• Try to overcome issues introduced by imperfect recording conditions and personalized expressivity
• Enhance existing feature extraction with the introduction of measures of confidence on the feature values and the final emotion estimation
• Evaluate features with different emotional models according to specific application requirements on expected results
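
Illustration only: the slides do not say how confidence measures are computed or combined. One simple possibility, assumed here purely for the example, is to sum the confidences of all estimates per emotion label and report the winning label together with its share of the total. The function `fuse` and the "neutral" fallback are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Estimate = Tuple[str, float]  # (emotion label, confidence in 0..1)


def fuse(estimates: Iterable[Estimate]) -> Tuple[str, float]:
    """Sum confidences per label; return the best label and its share."""
    scores: Dict[str, float] = defaultdict(float)
    for label, confidence in estimates:
        scores[label] += confidence
    total = sum(scores.values())
    if total == 0:
        # No usable evidence: fall back to a neutral estimate (assumption).
        return ("neutral", 0.0)
    best = max(scores, key=scores.get)
    return (best, scores[best] / total)


label, confidence = fuse([("joy", 0.5), ("joy", 0.25), ("surprise", 0.25)])
```

Carrying a confidence value on the final estimate lets an application ignore weak results, for example those produced under poor recording conditions.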

Output Shelf Components

ICCS O-Component (ICCS/NTUA)
ES: Expressivity Synthesis
Input: image sequences, sensors, history and personality details
Output: an expressive model of the user's behaviour (by ECA)

UOA O-Component (University of Augsburg)
NLG: Natural Language Generation
Input: attribute-value pairs describing the emotions
Output: a text containing the desired utterance

PAR8 O-Component
EA-ECA: Emotional Attentive ECA
Diagram labels: eye / head / gaze directions; virtual env.; APML; expressive gestures
Main Concrete Elements of Work
• Develop an ECA that is sensitive to, and expressive through, aspects relating to emotion and attention

HIT O-Component
ARToolKit is a software library for building Augmented Reality (AR) applications
Main Concrete Elements of Work
• Develop an interface for visual programming of AR applications
• Add support for speech and gesture
• Extend ARToolKit to include natural feature tracking

The CALLAS Framework

The CALLAS Framework
Flexible application framework, based on the following approach:
• Theory-neutral in terms of modality integration
• Supporting the semantic processing of modalities
• Supporting the development of applications featuring:
  • Blackboards
  • Custom pre-defined multimodal fusion models
  • Fusion of affective modalities
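
Illustration only: the slides name blackboards as a supported application pattern but give no design. A minimal sketch of the general blackboard idea, assuming components post their latest result per modality to a shared store and subscribers (for example a fusion step) are notified on every update. `Blackboard`, `majority` and the modality names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Snapshot = Dict[str, str]  # modality -> latest emotion label


class Blackboard:
    """Hypothetical shared store: writers post, subscribers are notified."""

    def __init__(self) -> None:
        self.latest: Snapshot = {}
        self.subscribers: List[Callable[[Snapshot], None]] = []

    def subscribe(self, callback: Callable[[Snapshot], None]) -> None:
        self.subscribers.append(callback)

    def post(self, modality: str, label: str) -> None:
        # Keep only the most recent label per modality, then notify
        # each subscriber with a copy of the current state.
        self.latest[modality] = label
        for callback in self.subscribers:
            callback(dict(self.latest))


def majority(snapshot: Snapshot) -> str:
    """Toy fusion model: the label reported by the most modalities."""
    return Counter(snapshot.values()).most_common(1)[0][0]


fused: List[str] = []
board = Blackboard()
board.subscribe(lambda snapshot: fused.append(majority(snapshot)))
board.post("speech", "joy")
board.post("gesture", "anger")
board.post("face", "joy")
```

Because writers and readers only share the board, the fusion step can be swapped without touching the components, which fits the theory-neutral aim stated above.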

The CALLAS Framework
The CALLAS Framework aims to:
• Ease the development of a specific kind of application targeted at entertainment and arts
• Ease the aggregation of Shelf components into easy-to-reuse building blocks
• Provide an intuitive metaphor suitable for non-technical users (mainly artists) willing to adapt and repurpose the CALLAS Showcases applications or their high-level components

The CALLAS Showcases

The CALLAS Showcases
• The CALLAS Showcases are an on-going laboratory for experimenting with the fusion of affective modalities and their impact on digital arts
• The target audience includes digital arts, entertainment and digital theatre
• They also serve as testbeds for the CALLAS Shelf components and the CALLAS Framework

The CALLAS Showcases
• Augmented Reality for Art, Entertainment, and Digital Theatre
  • Support the development of Augmented Reality Art installations in which user interaction is mediated by the detection of user emotions
  • Demonstrate how real-time detection of the mood and the affective state of the people involved in a live performance (directors, actors, audience) can generate a new genre of “Digitally-enhanced performances”
• Interactive Installations for Public Spaces
• Next-Generation Interactive Television

The CALLAS Showcases
• Augmented Reality for Art, Entertainment, and Digital Theatre
• Interactive Installations for Public Spaces
  • Explore how emotional states of members of a group can be conveyed to other members through mixed reality configurations and traces
  • Explore, in intensive group experiences, the implications of adding awareness of emotional states of remote or collocated members in addition to other contextual features
  • Develop mixed reality applications combining sensors and user-controlled mechanisms
• Next-Generation Interactive Television

The CALLAS Showcases
• Augmented Reality for Art, Entertainment, and Digital Theatre
• Interactive Installations for Public Spaces
• Next-Generation Interactive Television
  • Develop the concept of affective Interactive TV
  • Based on the generation of affective content by an ECA
  • The ECA gets inputs from the broadcast content and the user's perceived viewing experience

Some examples

Emotional Tree (e-Tree)
• Dynamic growth, a function of perceived affective relation
• ARToolKit
• Table-top installation
• Multimodal interaction: motion, non-verbal behaviour, interaction history, spoken comments …
• Artistic concept by Maurice Benayoun

Interactive TV

Technical Approach
Diagram labels:
• User
• Virtual spectator
• Interactive Story
• Emotional Speech
• Keyword spotting
• Affective categories
• Paralinguistic speech
• Non-verbal (body attitude)
• Interactive Storytelling Engine
• Multimodal Affective Analysis


Collaboration ideas
• CALLAS results will be very focused on Digital Arts and Entertainment
• The project may become a viable channel for experimenting with other projects' results in these sectors
• CALLAS is open for collaboration on both the technical and evaluation sides

THANK YOU
Fiore Basile, Metaware, Pisa, Italy. Email: f.basile@metaware.it
Diego Arnone, Engineering I. I., Roma, Italy. Email: diego.arnone@eng.it
