
SmartKom: Fusion and Fission of Speech, Gestures, and Facial Expressions


Presentation Transcript


  1. SmartKom: Fusion and Fission of Speech, Gestures, and Facial Expressions International Workshop on Man-Machine Symbiotic Systems Kyoto, 26 November 2002, p. 213 Wolfgang Wahlster German Research Center for Artificial Intelligence DFKI GmbH Stuhlsatzenhausweg 3 66123 Saarbruecken, Germany phone: (+49 681) 302-5252/4162 fax: (+49 681) 302-5341 e-mail: wahlster@dfki.de WWW: http://www.dfki.de/~wahlster

  2. SmartKom: Merging Various User Interface Paradigms: Graphical User Interfaces, Gestural Interaction, Spoken Dialogue, Facial Expressions, Biometrics → Multimodal Interaction

  3. Symbolic and Subsymbolic Fusion of Multiple Modes: Facial Expression Recognition, Speech Recognition, Prosody Recognition, Gesture Recognition, Lip Reading; Subsymbolic Fusion and Symbolic Fusion (Graph Unification, Bayesian Networks, Neural Networks, Hidden Markov Models); Reference Resolution and Disambiguation; Modality-Free Semantic Representation

  4. Outline of the Talk • 1. Using all Human Senses for Symbiotic Man-Machine Interaction • 2. SmartKom: Multimodal, Multilingual and Multidomain Dialogues • 3. Modality Fusion in SmartKom • 4. Multimodal Discourse Processing • 5. Plan-based Modality Fission in SmartKom • 6. Conclusions

  5. SmartKom: A Highly Portable Multimodal Dialogue System SmartKom-Mobile SmartKom-Public SmartKom-Home/Office Application Layer MM Dialogue Backbone Public: Cinema, Phone, Fax, Mail, Biometrics Mobile: Car and Pedestrian Navigation Home: Consumer Electronics EPG

  6. SmartKom: Intuitive Multimodal Interaction Project Budget: € 25.5 million, funded by BMBF (Dr. Reuse) and industry Project Duration: 4 years (September 1999 – September 2003) The SmartKom Consortium: Main Contractor Scientific Director W. Wahlster DFKI Saarbrücken MediaInterface Saarbrücken Berkeley Dresden European Media Lab Univ. of Munich Univ. of Stuttgart Heidelberg Univ. of Erlangen Munich Stuttgart Ulm Aachen

  7. SmartKom's SDDP Interaction Metaphor SDDP = Situated Delegation-oriented Dialogue Paradigm: the user specifies a goal and delegates the task to a personalized interaction agent, which cooperates on problems, asks questions, presents results, and draws on Web services (Service 1, Service 2, Service 3). Anthropomorphic Interface = Dialogue Partner. cf. Wahlster et al. 2001, Eurospeech

  8. Multimodal Input and Output in the SmartKom System Where would you like to sit?

  9. Symbiotic Interaction with a Life-like Character I'd like to reserve tickets for this performance. Where would you like to sit? I'd like these two seats. Smartakus Output: Speech, Gesture, and Facial Expressions User Input: Speech, Gesture, and Facial Expressions

  10. Multimodal Input and Output in SmartKom: Fusion and Fission of Multiple Modalities. Speech, Gesture, and Facial Expressions serve both as input by the user and as output by the presentation agent.

  11. SmartKom's Data Collection of Multimodal Dialogues: Face-tracking Camera with Microphone, Bird's-eye Camera, SIVIT Camera, LCD Beamer, Screen, Microphone Array, Side-view Camera, Loudspeaker, User, Projected Webpage, Environmental Noise

  12. Personalized Interaction with WebTVs via SmartKom (DFKI with Sony, Philips, Siemens) Example: Multimodal Access to Electronic Program Guides for TV User: Switch on the TV. Smartakus: Okay, the TV is on. User: Which channels are presenting the latest news right now? Smartakus: CNN and NTV are presenting news. User: Please record this news channel on a videotape. Smartakus: Okay, the VCR is now recording the selected program.

  13. Using Facial Expression Recognition for Affective Personalization Processing ironic or sarcastic comments: (1) Smartakus: Here you see the CNN program for tonight. (2) User: That's great. [negative facial expression] (3) Smartakus: I'll show you the program of another channel for tonight. (2') User: That's great. [positive facial expression] (3') Smartakus: Which of these features do you want to see?
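To make the affective-personalization idea on slide 13 concrete, here is a minimal Python sketch of how a recognized facial expression could be fused with the literal polarity of an utterance to flag ironic praise. The function name, score ranges, and threshold are illustrative assumptions, not SmartKom's actual classifier.

```python
# Hypothetical sketch: combining utterance polarity with a facial
# expression score to detect ironic praise, as in the "That's great"
# example above. Names and thresholds are illustrative only.

def interpret_feedback(utterance_polarity: float,
                       facial_expression: float) -> str:
    """Both inputs range from -1.0 (negative) to +1.0 (positive)."""
    if utterance_polarity > 0 and facial_expression < -0.3:
        return "ironic"            # positive words, negative face
    if utterance_polarity > 0:
        return "sincere-positive"  # positive words, neutral/positive face
    return "negative"

# Slide 13: "That's great" with a negative facial expression is read
# as irony, so Smartakus offers an alternative instead of continuing.
print(interpret_feedback(+0.8, -0.6))  # -> "ironic"
```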

  14. Recognizing Affect: A Negative Facial Expression of the User (image panels: neutral vs. negative)

  15. The SmartKom Demonstrator System Multimodal Control of TV-Set Camera for Gestural Input Multimodal Control of VCR/DVD Player Microphone Camera for Facial Analysis

  16. Combination of Speech and Gesture in SmartKom This one I would like to see. Where is it shown?

  17. Multimodal Input and Output in SmartKom Please show me where you would like to be seated.

  18. Getting Driving and Walking Directions via SmartKom SmartKom can be used for Multimodal Navigation Dialogues in a Car User: I want to drive to Heidelberg. Smartakus: Do you want to take the fastest or the shortest route? User: The fastest. Smartakus: Here you see a map with your route from Saarbrücken to Heidelberg.

  19. Getting Driving and Walking Directions via SmartKom Smartakus: You are now in Heidelberg. Here is a sightseeing map of Heidelberg. User: I would like to know more about this church! Smartakus: Here is some information about St. Peter's Church. User: Could you please give me walking directions to this church? Smartakus: In this map, I have highlighted your walking route.

  20. SmartKom: Multimodal Dialogues with a Hybrid Navigation System

  21. Salient Characteristics of SmartKom • Seamless integration and mutual disambiguation of multimodal input and output on semantic and pragmatic levels • Situated understanding of possibly imprecise, ambiguous, or incomplete multimodal input • Context-sensitive interpretation of dialog interaction on the basis of dynamic discourse and context models • Adaptive generation of coordinated, cohesive and coherent multimodal presentations • Semi- or fully automatic completion of user-delegated tasks through the integration of information services • Intuitive personification of the system through a presentation agent

  22. The High-Level Control Flow of SmartKom

  23. SmartKom's Multimodal Dialogue Backbone Communication Blackboards Data Flow Context Dependencies Analyzers • Speech • Gestures • Facial Expressions • Speech • Graphics • Gestures Generators Dialogue Manager Modality Fusion Discourse Modeling Action Planning Modality Fission External Services

  24. Unification of Scored Hypothesis Graphs for Modality Fusion in SmartKom: Clause and Sentence Boundaries with Prosodic Scores, Scored Hypotheses about the User's Emotional State, a Gesture Hypothesis Graph with Scores of Potential Reference Objects, and a Word Hypothesis Graph with Acoustic Scores feed into Modality Fusion (Mutual Disambiguation, Reduction of Uncertainty), producing an Intention Hypothesis Graph from which the Intention Recognizer selects the Most Likely Interpretation.
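As a rough illustration of slide 24, the sketch below combines scored speech and gesture hypotheses and selects the most likely joint interpretation. The toy lattices (plain scored lists rather than real hypothesis graphs) and the additive log-score combination are assumptions for illustration; SmartKom's actual scoring and unification machinery is richer.

```python
# Minimal sketch of cross-modal fusion: speech and gesture hypotheses
# each carry scores; all cross-modal pairs are scored jointly and the
# best joint interpretation is kept (mutual disambiguation).
import math
from itertools import product

speech_hyps = [("reserve this seat", math.log(0.7)),
               ("reserve these seats", math.log(0.3))]
gesture_hyps = [("seat_12A", math.log(0.6)),
                ("seat_12B", math.log(0.4))]

def fuse(speech, gesture):
    # Score every cross-modal pairing; log-scores add.
    scored = [((s, g), s_score + g_score)
              for (s, s_score), (g, g_score) in product(speech, gesture)]
    return max(scored, key=lambda pair: pair[1])

best, score = fuse(speech_hyps, gesture_hyps)
print(best, math.exp(score))  # ('reserve this seat', 'seat_12A') ~0.42
```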

  25. SmartKom's Computational Mechanisms for Modality Fusion and Fission M3L: Modality-Free Semantic Representation Ontological Inferences Modality Fission Modality Fusion Planning Unification Overlay Operations Constraint Propagation

  26. The Overlay Operation Versus the Unification Operation Overlay is a nonmonotonic and noncommutative unification-like operation that inherits (non-conflicting) background information. Two sources of conflicts: conflicting atomic values, where the background (old) is overwritten with the covering (new), and type clashes, where the background is assimilated to the type of the covering and the operation recurses. cf. J. Alexandersson, T. Becker 2001
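A minimal sketch of the overlay operation described on slide 26, following the intuition of Alexandersson and Becker 2001: non-conflicting background information is inherited, and on a conflict the covering (new) value wins. Modeling the typed feature structures as plain Python dicts is a simplification.

```python
# Sketch of overlay: a nonmonotonic, noncommutative unification-like
# operation. Old (background) information survives unless the new
# (covering) structure says otherwise.

def overlay(covering, background):
    if not isinstance(covering, dict) or not isinstance(background, dict):
        # Atomic-value conflict (or type clash): covering overwrites.
        return covering
    result = dict(background)          # inherit background information
    for key, value in covering.items():
        if key in background:
            result[key] = overlay(value, background[key])  # recurse
        else:
            result[key] = value
    return result

# Mirrors slide 28: "films on TV tonight" overlaid with "go to the
# movies" keeps the time and genre but switches the medium.
old = {"act": "watch", "genre": "film", "medium": "TV", "time": "tonight"}
new = {"act": "watch", "medium": "cinema"}
print(overlay(new, old))
# {'act': 'watch', 'genre': 'film', 'medium': 'cinema', 'time': 'tonight'}
```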

  27. Overlay Operations Using the Discourse Model: Augmentation and Validation. Each intention hypothesis is compared with a number of previous discourse states: consistent information is filled in, and a score is computed for each hypothesis-background pair via Overlay(covering, background). From the Intention Hypothesis Lattice (covering) and the discourse states (background), the best Augmented Hypothesis Sequence is selected.
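Slide 27's augmentation-and-validation step can be sketched as scoring every (covering, background) pair and keeping the best fit. The conflict-counting score below is a stand-in assumption; the published scoring function for overlay is considerably more elaborate.

```python
# Sketch: overlay each hypothesis on several previous discourse
# states, score each pair, and select the best augmented result.

def overlay_scored(covering, background):
    """Return (result, conflicts): overlay plus a count of overwrites."""
    if not isinstance(covering, dict) or not isinstance(background, dict):
        return covering, (0 if covering == background else 1)
    result, conflicts = dict(background), 0
    for key, value in covering.items():
        if key in background:
            result[key], c = overlay_scored(value, background[key])
            conflicts += c
        else:
            result[key] = value
    return result, conflicts

def select_best(hypotheses, discourse_states):
    pairs = [overlay_scored(h, b) for h in hypotheses for b in discourse_states]
    # Fewer conflicts means the hypothesis fits that context better.
    return min(pairs, key=lambda rc: rc[1])[0]

hyp = {"act": "watch", "medium": "cinema"}
states = [{"act": "watch", "genre": "film", "medium": "TV"},
          {"act": "reserve", "medium": "phone"}]
print(select_best([hyp], states))
# {'act': 'watch', 'genre': 'film', 'medium': 'cinema'}
```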

  28. An Example of the Overlay Operation: Films on TV tonight. U: What films are shown on TV tonight? .... U: I'd rather go to the movies. Generalisation and Specialisation: Go to the movies

  29. SmartKom's Three-Tiered Discourse Model Domain Layer DomainObject2 DomainObject1 Discourse Layer DO2 DO10 DO11 DO12 DO1 DO3 DO9 . . . Modality Layer VO1 LO4 LO5 LO6 LO2 GO1 LO3 . . . reserve ticket first list heidelberg System: This [] is a list of films showing in Heidelberg. User: Please reserve a ticket for the first one. DO = Discourse Object, LO = Linguistic Object, GO = Gestural Object, VO = Visual Object cf. M. Löckelt et al. 2002, N. Pfleger 2002
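The three-tiered model on slide 29 can be sketched as linked objects: modality-layer mentions (linguistic, gestural, visual) point to discourse objects, which in turn point to domain objects, so "the first one" can be resolved across modalities. The class names and the resolution walk below are illustrative assumptions.

```python
# Sketch of the three tiers: modality objects -> discourse objects
# -> domain objects (cf. Löckelt et al. 2002, Pfleger 2002).
from dataclasses import dataclass, field

@dataclass
class DomainObject:          # domain layer, e.g. a concrete film
    entity_key: str

@dataclass
class DiscourseObject:       # discourse layer (DO)
    domain: DomainObject
    mentions: list = field(default_factory=list)  # modality-layer refs

films = [DomainObject("film_1"), DomainObject("film_2")]
dos = [DiscourseObject(f) for f in films]

dos[0].mentions.append(("VO", "list entry 1"))    # visual object
dos[0].mentions.append(("LO", "the first one"))   # linguistic object

# "Please reserve a ticket for the first one" resolves through the
# discourse object DO1 to the domain object film_1.
print(dos[0].domain.entity_key)  # -> film_1
```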

  30. The High-Level Control Flow of SmartKom

  31. Smartakus is a Self-Animated Interface Agent Presentation Navigation Idle Time System State Smartakus uses body language to notify the user that it is waiting for his input, that it is listening to him, that it has problems understanding his input, or that it is trying hard to find an answer to his question.

  32. Some Complex Behavioural Patterns of the Interaction Agent Smartakus

  33. M3L Representation of the Multimodal Discourse Context Blackboard with Presentation Context of the Previous Dialogue Turn <?xml version="1.0"?> <presentationContent> [...] <abstractPresentationContent> <movieTheater structId="pid1234"> <entityKey> cinema_17a </entityKey> <name> Europa </name> <geoCoordinate> <x> 225 </x> <y> 230 </y> </geoCoordinate> </movieTheater> </abstractPresentationContent> [...] <panelElement> <map structId="PM23"> <boundingShape> <leftTop> <x> 0.5542 </x> <y> 0.1950 </y> </leftTop> <rightBottom> <x> 0.9892 </x> <y> 0.7068 </y> </rightBottom> </boundingShape> <contentReference> pid1234 </contentReference> </map> </panelElement> [...] </presentationContent>
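The presentation context above is what makes deictic gestures resolvable: the contentReference of a hit panel element (here pid1234) links back to the abstract content (the Europa movie theater). The sketch below shows one way such a lookup could work on well-formed M3L of this shape; the hit-test function itself is an assumption, not SmartKom's actual gesture analyzer.

```python
# Sketch: resolve a tap at normalized screen coordinates against the
# bounding shapes in the presentation context, returning the
# contentReference of the panel element that was hit.
import xml.etree.ElementTree as ET

def resolve_gesture(m3l: str, x: float, y: float) -> str:
    root = ET.fromstring(m3l)
    for panel in root.iter("map"):
        lt = panel.find("./boundingShape/leftTop")
        rb = panel.find("./boundingShape/rightBottom")
        if (float(lt.find("x").text) <= x <= float(rb.find("x").text) and
                float(lt.find("y").text) <= y <= float(rb.find("y").text)):
            return panel.find("contentReference").text.strip()
    return "no-referent"

# For the context on slide 33, a tap at (0.7, 0.4) falls inside the
# map panel, so the gesture resolves to pid1234 (the Europa cinema).
```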

  34. M3L Specification of a Presentation Task <presentationTask> <subTask> <presentationGoal> <inform> ... </inform> <abstractPresentationContent> ... <result> <broadcast id="bc1"> <channel> <name>EuroSport</name> </channel> <beginTime> <time> <at>2000-12-05T14:00:00</at> </time> </beginTime> <endTime> <time> <at>2000-12-05T15:00:00</at> </time> </endTime> <avMedium> <title>Sport News</title> <avType>sport</avType> ... </abstractPresentationContent> <interactionMode>leanForward</interactionMode> <goalID>APGOAL3000</goalID> <source>generatorAction</source> <realizationType>GraphicsAndSpeech</realizationType>

  35. SmartKom's Presentation Planner The Presentation Planner generates a Presentation Plan by applying a set of Presentation Strategies to the Presentation Goal. GlobalPresent Present AddSmartakus .... DoLayout EvaluatePersonaNode ... PersonaAction ... Inform ... Speak SendScreenCommand Smartakus Actions TryToPresentTVOverview ShowTVOverview ShowTVOverview SetLayoutData ... SetLayoutData Generation of Layout ShowTVOverview GenerateText SetLayoutData ... SetLayoutData cf. J. Müller, P. Poller, V. Tschernomas 2002
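To illustrate the plan-based fission on slide 35, here is a toy strategy expansion: presentation strategies decompose a goal into subgoals until only directly executable actions (layout, text generation, speech, persona actions) remain. The strategy names are taken from the slide, but the decomposition table and algorithm are illustrative assumptions (cf. Müller, Poller & Tschernomas 2002).

```python
# Sketch of plan-based modality fission: strategies expand a
# presentation goal depth-first into basic, executable actions.
STRATEGIES = {
    "GlobalPresent":  ["Present", "DoLayout"],
    "Present":        ["AddSmartakus", "ShowTVOverview", "Speak"],
    "ShowTVOverview": ["GenerateText", "SetLayoutData"],
}

def expand(goal, plan=None):
    """Depth-first expansion of a presentation goal into basic actions."""
    plan = [] if plan is None else plan
    for sub in STRATEGIES.get(goal, []):
        expand(sub, plan)
    if goal not in STRATEGIES:      # leaf = directly executable action
        plan.append(goal)
    return plan

print(expand("GlobalPresent"))
# ['AddSmartakus', 'GenerateText', 'SetLayoutData', 'Speak', 'DoLayout']
```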

  36. SmartKom's Use of Semantic Web Technology Three Layers of Annotations: M3L (high: content), XML (medium: structure), HTML (low: layout), yielding a Personalized Presentation. cf. Dieter Fensel, James Hendler, Henry Lieberman, Wolfgang Wahlster (eds.): Spinning the Semantic Web, MIT Press, November 2002

  37. Conclusions • Various types of unification, overlay, constraint processing, planning and ontological inferences are the fundamental processes involved in SmartKom's modality fusion and fission components. • The key function of modality fusion is the reduction of the overall uncertainty and the mutual disambiguation of the various analysis results, based on a three-tiered representation of multimodal discourse. • We have shown that a multimodal dialogue system must not only understand and represent the user's input, but also its own multimodal output.

  38. First International Conference on Perceptive & Multimodal User Interfaces (PMUI'03) November 5-7, 2003, Delta Pinnacle Hotel, Vancouver, B.C., Canada Conference Chair: Sharon Oviatt, Oregon Health & Science Univ., USA Program Chairs: Wolfgang Wahlster, DFKI, Germany; Mark Maybury, MITRE, USA PMUI'03 is sponsored by ACM and will be co-located in Vancouver with ACM's UIST'03. This meeting follows three successful Perceptive User Interface Workshops (with PUI'01 held in Florida) and three International Multimodal Interface Conferences initiated in Asia (with ICMI'02 held in Pittsburgh).
