
Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents

Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents. S. Kawamoto, et al. October 27, 2004. Agenda: Introduction, Toolkit Design and Outline, Speech recognition module, Speech synthesis module, Facial image synthesis module, Agent manager, Virtual machine model


Presentation Transcript


  1. Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents. S. Kawamoto, et al. October 27, 2004

  2. Agenda • Introduction • Toolkit Design and Outline • Speech recognition module • Speech synthesis module • Facial image synthesis module • Agent manager • Virtual machine model • Task manager • Prototyping tools • Prototype Systems • Conclusions

  3. Introduction • An anthropomorphic spoken dialog agent (ASDA) is one of the next-generation human-computer interfaces • Many ASDA systems have been developed, but developing a high-quality ASDA system is still challenging • The aim is an unlimited number of life-like agent characters with different faces and voices, just like humans • For this reason, Galatea has been developed to provide a platform to build next-generation ASDA systems

  4. Introduction: Features of the Toolkit • Easy customization • Model-based approaches • Once the model parameters are trained, facial expressions and voice quality can be controlled easily • Key techniques for natural spoken dialog • Incremental speech recognition, synchronization between speech and facial animation, etc. • Modularity of functional units • Simple architecture to manage each functional unit • Users can develop, improve, debug, etc. • Open-source free software

  5. Toolkit Design and Outline • The agent manager works as an inter-module communication manager • A new function is added by adding a new module for the function and connecting the module to the agent manager • Devices are directly managed by the modules which utilize them

  6. Toolkit Design and Outline: Speech Recognition Module (SRM) • Major interfaces of the SRM are as follows: • Outputs • Recognition result (XML format) • Engine status (“busy”, “waiting”, ...) • Control commands • Reload grammar, change the settings of the speech recognition engine • Grammar representation • Transforms the XML grammar into a format that is accepted by the speech recognition engine [Figure labels: Command Interpreter (Request, Response), Grammar Transformer, Grammar, Speech Recognition Engine, Speech input]
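
The grammar transformation step on the SRM slide can be illustrated with a minimal sketch. The slides do not show Galatea's actual XML grammar schema or the engine's target format, so the element names (`grammar`, `rule`, `one-of`, `item`), the output notation, and the function name `xml_grammar_to_rules` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML grammar; the real Galatea schema is not shown in the slides.
GRAMMAR_XML = """
<grammar root="greeting">
  <rule id="greeting">
    <one-of>
      <item>hello</item>
      <item>good morning</item>
    </one-of>
  </rule>
</grammar>
"""


def xml_grammar_to_rules(xml_text):
    """Flatten each <rule> into one 'name : alt | alt' line.

    The output notation is an assumed stand-in for whatever format
    the speech recognition engine actually accepts.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for rule in root.iter("rule"):
        # Collect the text of every <item> below this rule as an alternative.
        alternatives = [item.text.strip() for item in rule.iter("item")]
        lines.append(f"{rule.get('id')} : {' | '.join(alternatives)}")
    return "\n".join(lines)


print(xml_grammar_to_rules(GRAMMAR_XML))
```

Keeping the transformer separate from the engine, as the slide's figure labels suggest, means the XML grammar can stay engine-independent while only this conversion step changes per engine.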

  7. Toolkit Design and Outline: Speech Synthesis Module (SSM) • Accepts arbitrary Japanese text • Synthesizes speech with a human voice • An HMM-based speech synthesis method is employed • Synchronizes lip movement with speech • SSM can interrupt speech output to cope with any interruption by the user [Figure labels: Command Interpreter, Dictionary, Text Analyzer, Acoustic Models, Waveform Generation Engine, Speech Output]

  8. Toolkit Design and Outline: Facial Image Synthesis Module (FSM) • Supports high-quality facial image synthesis, animation control, and precise lip-sync with voice • A GUI is provided to fit a generic face wireframe model onto a full-face snapshot image • Facial action control • Mouth shape • Facial expression

  9. Toolkit Design and Outline: Agent Manager (AM) • Integrator of all the modules of the ASDA system • Plays a central role in communication • Synchronization manager between SSM and FSM to achieve precise lip-sync [Figure labels: Macro-command interpreter, Dispatcher]
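
The dispatcher and macro-command interpreter roles named on the AM slide might look roughly like the sketch below. This is not Galatea's implementation: the class `AgentManager`, its methods, the `say` macro, and the slot names `Text` and `LipSync` are hypothetical. Only the module names SSM and FSM and the command `set Speak = now` come from the slides. The sketch shows routing and macro expansion only, not real-time lip-sync timing.

```python
class AgentManager:
    """Hypothetical hub that routes commands between registered modules."""

    def __init__(self):
        self.modules = {}  # module name -> callable accepting a command string
        self.macros = {}   # macro name -> list of (module name, command template)

    def register(self, name, handler):
        """Connect a module to the manager under the given name."""
        self.modules[name] = handler

    def define_macro(self, name, steps):
        """Store a macro as an ordered list of (target, template) steps."""
        self.macros[name] = list(steps)

    def dispatch(self, target, command):
        """Forward one command to one module (the dispatcher role)."""
        if target not in self.modules:
            raise KeyError(f"no module registered as {target!r}")
        return self.modules[target](command)

    def run_macro(self, name, text):
        """Expand a macro into per-module commands (the interpreter role)."""
        for target, template in self.macros[name]:
            self.dispatch(target, template.replace("{text}", text))


received = []
am = AgentManager()
am.register("SSM", lambda cmd: received.append(("SSM", cmd)))
am.register("FSM", lambda cmd: received.append(("FSM", cmd)))
am.define_macro("say", [
    ("SSM", "set Text = {text}"),   # hypothetical slot name
    ("FSM", "set LipSync = on"),    # hypothetical slot name
    ("SSM", "set Speak = now"),     # command quoted in the slides
])
am.run_macro("say", "hello")
print(received)
```

Because modules only ever talk to the manager, adding a new function amounts to one more `register` call, which matches the modularity argument made in the deck.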

  10. Toolkit Design and Outline: Virtual Machine Model • Module interface is modeled as a machine with slots • Each slot indicates machine status • Slot values are changed by a common command set • “set Speak = now” means starting voice synthesis of a given text immediately
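
As a rough illustration of the slot model, the sketch below treats a module as a dictionary of named slots driven by text commands. The `set <slot> = <value>` form follows the example on the slide. The class name `SlotMachine`, the `get` command, the reply strings, and the initial slot values are assumptions and may differ from Galatea's real command set.

```python
class SlotMachine:
    """Hypothetical module interface: a machine whose state is a set of slots."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = dict(slots)  # slot name -> current value (machine status)

    def handle(self, command):
        """Apply one text command and return a short reply string."""
        verb, _, rest = command.strip().partition(" ")
        if verb == "set":
            # Form taken from the slide: set <slot> = <value>
            slot, sep, value = rest.partition("=")
            slot, value = slot.strip(), value.strip()
            if not sep or slot not in self.slots:
                return "error"
            self.slots[slot] = value
            return "ok"
        if verb == "get":
            # Assumed query command, not taken from the slides.
            slot = rest.strip()
            if slot not in self.slots:
                return "error"
            return f"{slot} = {self.slots[slot]}"
        return "error"


ssm = SlotMachine("SSM", {"Text": "", "Speak": "stop"})
print(ssm.handle("set Speak = now"))
print(ssm.handle("get Speak"))
```

Giving every module the same small command vocabulary is what lets a single manager drive speech recognition, synthesis, and face animation without module-specific glue code.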

  11. Toolkit Design and Outline: Task Manager (TM) • Defines the dialog as a set of interactions which can be represented by a dialog description language • The goal in developing the TM is that the system can use several types of dialog description languages • VoiceXML • High-level language, task-oriented information and the intentions of the participants • PDOC (primitive dialog operation commands) • Low-level language, device events and sequence control

  12. Toolkit Design and Outline: Prototyping Tools • “Galatea Interaction Builder (IB)” [Figure labels: Design Scenario, Interaction Builder, Create XISL Document, XISL File, web site, Download and Execute XISL, Application Developer, Check, Galatea MMI System]

  13. Prototype Systems

  14. Prototype Systems: Echo-back task

  15. Conclusions • A human-like spoken dialog agent is one of the promising man-machine interfaces for the next generation • Galatea is a software toolkit to develop a human-like spoken dialog agent • Because of the high modularity and simple communication architecture, it will speed up the research and application development based on ASDA
