
Media Coordination in SmartKom

This document provides an overview of the SmartKom Consortium project, which focuses on media coordination and fusion in multimodal interaction. It discusses the situated delegation-oriented dialog paradigm, media coordination issues, media processing, media fusion, and media design. The document concludes with a summary of the project and its implementation.


Presentation Transcript


  1. Dagstuhl Seminar “Coordination and Fusion in Multimodal Interaction” Media Coordination in SmartKom Norbert Reithinger, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Stuhlsatzenhausweg 3, Geb. 43.1, 66123 Saarbrücken. Tel.: (0681) 302-5346, Email: bert@dfki.de, www.smartkom.org, www.dfki.de/~bert

  2. Overview • Situated Delegation-oriented Dialog Paradigm • More About the System Software • Media Coordination Issues • Media Processing: The Data Flow • Processing the User's State • Media Fusion • Media Design • Conclusion

  3. The SmartKom Consortium • Project budget: €25.5 million • Project duration: 4 years (September 1999 – September 2003) • Main contractor: DFKI Saarbrücken • Partners include MediaInterface, the European Media Lab, Univ. of Munich, Univ. of Stuttgart, and Univ. of Erlangen • The slide's map marks sites in Saarbrücken, Berkeley, Dresden, Heidelberg, Munich, Stuttgart, Ulm, and Aachen

  4. Situated Delegation-oriented Dialog Paradigm • Diagram: the user specifies a goal to Smartakus, the personalized interaction agent, and delegates the task to it; the agent cooperates with the user on problems and asks questions, passes the task on to IT services (Service 1, Service 2, Service 3), and presents the results

  5. More About the System

  6. More About the System • Modules realized as independent processes • Not all modules must be present (critical path: speech or graphic input to speech or graphic output) • (Mostly) independent of display size • Pool Communication Architecture (PCA) based on PVM for Linux and NT • Modules know only their I/O pools • Literature: Andreas Klüter, Alassane Ndiaye, Heinz Kirchmann: Verbmobil From a Software Engineering Point of View: System Design and Software Integration. In Wolfgang Wahlster (ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Springer, 2000. • Data exchanged as M3L documents • All modules and pools are visualized here ...
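
A minimal sketch of the pool idea behind the PCA, in Python: modules publish to and subscribe from named data pools instead of addressing each other directly. The class and pool names here are invented for illustration; the real system runs as distributed processes over PVM.

```python
from collections import defaultdict

class PoolBus:
    """Illustrative in-process stand-in for the PVM-based pool architecture."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, pool_name, callback):
        # A module registers interest in one of its input pools.
        self.subscribers[pool_name].append(callback)

    def publish(self, pool_name, m3l_document):
        # A module writes to one of its output pools; it never needs
        # to know which modules consume the document.
        for callback in self.subscribers[pool_name]:
            callback(m3l_document)

bus = PoolBus()
bus.subscribe("gesture.lattice", lambda doc: print("fusion received:", doc))
bus.publish("gesture.lattice", "<gestureLattice>...</gestureLattice>")
```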

  7. Media Coordination Issues • Input: • Speech • Words • Prosody: boundaries, stress, emotion • Mimics (facial expression): neutral, anger • Gesture: • Touch-free (public scenario) • Touch-sensitive screen • Output: • Display objects • Speech • Agent: posture, gesture, lip movement

  8. Media Processing: The Data Flow • Diagram: mimics (neutral or anger), speech, prosody (emotion), and gesture feed media fusion; interaction modeling in the dialog core draws on the user state, domain information, and system state; presentation (media design) outputs speech, the agent's posture and behaviour, and display objects with reference IDs and locations

  9. The Input/Output Modules

  10. Processing the User's State

  11. Processing the User's State • User state: neutral and anger • Recognized using mimics and prosody • In case of anger, the dynamic help in the Dialog Core Engine is activated • Elmar Nöth will hopefully tell you more about this in his talk Modeling the User State - The Role of Emotions
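
A hedged sketch of that decision, with invented scores and threshold (the slides do not specify how the two channels are combined): anger evidence from mimics and prosody is merged into one user-state label that can trigger the dynamic help.

```python
def user_state(mimics_anger: float, prosody_anger: float,
               threshold: float = 0.5) -> str:
    # Average the anger scores of the two channels; the real
    # combination scheme is not specified on the slides.
    score = (mimics_anger + prosody_anger) / 2.0
    return "anger" if score >= threshold else "neutral"

if user_state(0.7, 0.6) == "anger":
    print("activating dynamic help in the Dialog Core Engine")
```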

  12. Media Fusion

  13. Gesture Processing • Objects on the screen are tagged with IDs • Gesture input: • Natural, touch-free gestures recognized by SIVIT • Touch-sensitive screen • Gesture recognition: • Location • Type of gesture: pointing, tarrying, encircling • Gesture analysis: • Reference objects on the display are described as XML domain-model (sub-)objects (M3L schemata) • Bounding box • Output: gesture lattice with hypotheses
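
As an illustration of the analysis step, a small Python sketch (object names and the scoring scheme are invented): a pointing location is tested against the bounding boxes of the tagged display objects, and every object near the point becomes a scored hypothesis in the resulting gesture lattice.

```python
from dataclasses import dataclass

@dataclass
class DisplayObject:
    ref_id: str
    box: tuple  # (x_min, y_min, x_max, y_max)

def point_hypotheses(x, y, objects, margin=20.0):
    hypotheses = []
    for obj in objects:
        x0, y0, x1, y1 = obj.box
        # Distance is 0 inside the box, otherwise distance to its border.
        dx = max(x0 - x, 0, x - x1)
        dy = max(y0 - y, 0, y - y1)
        dist = (dx * dx + dy * dy) ** 0.5
        if dist <= margin:
            hypotheses.append({"refID": obj.ref_id,
                               "type": "pointing",
                               "score": 1.0 - dist / margin})
    return sorted(hypotheses, key=lambda h: -h["score"])

screen = [DisplayObject("movie_17", (100, 100, 300, 200)),
          DisplayObject("movie_18", (320, 100, 520, 200))]
# A point between two objects yields several competing hypotheses:
print(point_hypotheses(310, 150, screen))
```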

  14. Speech Processing • Speech Recognizer produces word lattice • Prosody inserts boundary and stress information • Speech analysis creates intention hypotheses with markers for deictic expressions
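
A sketch of what the analysis output might look like (format and vocabulary invented): each word hypothesis carries time stamps, and deictic expressions are marked as placeholders for media fusion to fill.

```python
# Illustrative deictic vocabulary; the real analysis works on German input.
DEICTICS = {"this", "that", "here", "there"}

def analyze(words):
    # words: list of (token, start_ms, end_ms) from the recognizer.
    intention = []
    for token, start, end in words:
        item = {"word": token, "start": start, "end": end}
        if token.lower() in DEICTICS:
            item["deictic_placeholder"] = True  # to be bound to a gesture
        intention.append(item)
    return intention

print(analyze([("show", 0, 250), ("me", 250, 350), ("this", 350, 600),
               ("movie", 600, 950)]))
```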

  15. Media Fusion • Integrates gesture hypotheses into the intention hypotheses of speech analysis • Information restriction possible from both media • Possible, but not necessary, correspondence of gestures and placeholders (deictic expressions/anaphora) in the intention hypothesis • Necessary: time coordination of gesture and speech information • Time stamps in ALL M3L documents!! • Output: sequence of intention hypotheses
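
A simplified fusion step in Python (window size and data structures invented), showing why the shared time stamps matter: each deictic placeholder is bound to the best-scoring gesture hypothesis that falls close to it in time.

```python
def fuse(intention, gestures, window_ms=500):
    for item in intention:
        if not item.get("deictic_placeholder"):
            continue
        # Candidate gestures are those whose time stamp lies within the
        # window around the deictic word's onset.
        nearby = [g for g in gestures
                  if abs(g["time"] - item["start"]) <= window_ms]
        if nearby:
            best = max(nearby, key=lambda g: g["score"])
            item["refID"] = best["refID"]  # restrict the interpretation
    return intention

speech = [{"word": "this", "start": 350, "end": 600,
           "deictic_placeholder": True}]
gestures = [{"refID": "movie_17", "type": "pointing",
             "time": 400, "score": 0.8}]
print(fuse(speech, gestures))
```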

  16. Media Design (Media Fission)

  17. Media Design • Starts with action planning • Definition of an abstract presentation goal • Presentation planner: • Selects presentation, style, media, and the agent's general behaviour • Activates the natural language generator, which activates speech synthesis, which returns audio data and a time-stamped phoneme/viseme sequence • Character animation realizes the agent's behaviour • Synchronized presentation of audio and visual information
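
The control flow of that pipeline, sketched in Python with placeholder functions and dummy data (none of these names are SmartKom APIs):

```python
def present(abstract_goal):
    text = generate_language(abstract_goal)  # natural language generator
    audio, visemes = synthesize(text)        # speech synthesis
    schedule_lip_movements(visemes)          # character animation
    play_synchronized(audio)                 # audio and visuals together

def generate_language(goal):
    return f"Here are the results for {goal}."

def synthesize(text):
    # Returns dummy audio bytes and (time_ms, viseme) pairs for illustration.
    return b"...pcm...", [(0, "sil"), (80, "E"), (200, "r")]

def schedule_lip_movements(visemes):
    for t, v in visemes:
        print(f"{t:4d} ms -> viseme {v}")

def play_synchronized(audio):
    print(f"playing {len(audio)} bytes of audio")

present("the cinema program")
```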

  18. Lip Synchronization with Visemes • Goal: present a speech prompt as naturally as possible • Visemes: elementary lip positions • Correspondence of visemes and phonemes • Examples:
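
The slide's example table is not preserved in the transcript; the following coarse, invented mapping illustrates the idea of pairing phonemes with elementary lip positions:

```python
# Invented subset of a phoneme-to-viseme table: several phonemes share
# one lip position, so the viseme inventory is much smaller.
PHONEME_TO_VISEME = {
    "p": "closed_lips", "b": "closed_lips", "m": "closed_lips",
    "f": "lip_teeth",   "v": "lip_teeth",
    "a": "open_wide",   "o": "rounded",     "u": "rounded",
}

def visemes_for(phonemes):
    # Map each time-stamped phoneme to an elementary lip position.
    return [(t, PHONEME_TO_VISEME.get(p, "neutral")) for t, p in phonemes]

print(visemes_for([(0, "m"), (90, "a"), (210, "p"), (300, "u")]))
```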

  19. Behavioural Schemata • Goal: Smartakus is always active to signal the state of the system • Four main states: • Wait for user's input • User's input • Processing • System presentation • Current body movements: • 9 vital, 2 processing, 9 presentation (5 pointing, 2 movements, 2 face/mouth) • About 60 basic movements
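
A minimal sketch of the idea that every system state maps to ongoing body movements (movement names are invented; only the four main states come from the slide):

```python
import random

# Each main state has a repertoire of basic movements to draw from,
# so the agent is never static and always signals the system state.
MOVEMENTS = {
    "wait_for_input": ["blink", "shift_weight", "look_around"],
    "user_input":     ["nod", "attend"],
    "processing":     ["tap_fingers", "look_at_watch"],
    "presentation":   ["point_left", "point_right", "smile"],
}

def idle_movement(state):
    return random.choice(MOVEMENTS[state])

for state in MOVEMENTS:
    print(state, "->", idle_movement(state))
```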

  20. Conclusion • Three implemented systems (Public, Home, Mobile) • Media coordination implemented • "Backbone" uses declarative knowledge sources and is rather flexible • A lot remains to be done: • Robustness • Complex speech expressions • Complex gestures (shape and timing) • Implementation of all user states • .... • Reuse of modules in other contexts, e.g. in MIAMM
