Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents

S. Kawamoto, et al. October 27, 2004

Agenda • Introduction • Toolkit Design and Outline • Speech recognition module • Speech synthesis module • Facial image synthesis module • Agent manager • Virtual machine model • Task manager • Prototyping tools • Prototype Systems • Conclusions

Introduction • An anthropomorphic spoken dialog agent (ASDA) is one of the next-generation human-computer interfaces • Many ASDA systems have been developed, but developing a high-quality ASDA system is still challenging • One challenge: supporting an unlimited number of life-like agent characters with different faces and voices, just like humans • For this reason, Galatea has been developed to provide a platform for building next-generation ASDA systems

Introduction Features of the Toolkit • Easy customization • Model-based approaches • Once the model parameters are trained, facial expressions and voice quality can be controlled easily • Key techniques for natural spoken dialog • Incremental speech recognition, synchronization between speech and facial animation, etc. • Modularity of functional units • Simple architecture to manage each functional unit • Users can develop, improve, and debug each unit • Open-source free software

Toolkit Design and Outline • The agent manager works as an inter-module communication manager • A new function is added by adding a new module for the function and connecting the module to the agent manager • Devices are directly managed by the modules which utilize them

Toolkit Design and Outline Speech Recognition Module (SRM) • Major interfaces of the SRM are as follows: • Outputs • Recognition result (XML format) • Engine status (“busy”, “waiting”, ...) • Control commands • Reload the grammar, change the settings of the speech recognition engine • Grammar representation • Transforms the XML grammar into a format that is accepted by the speech recognition engine [Figure: SRM block diagram. Command Interpreter (request/response), Grammar Transformer, grammar, Speech Recognition Engine, speech input]
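The grammar transformation step can be illustrated with a minimal sketch. The slides do not show the XML grammar schema or the format the recognition engine accepts, so the `<grammar>`, `<rule>`, and `<item>` elements, the `id` attribute, and the `name : a | b` output lines below are all hypothetical, chosen only to show the idea of rewriting an XML grammar into a flat text form.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML grammar; the real Galatea schema is not shown in the slides.
GRAMMAR_XML = """
<grammar>
  <rule id="greeting">
    <item>hello</item>
    <item>good morning</item>
  </rule>
  <rule id="farewell">
    <item>goodbye</item>
  </rule>
</grammar>
"""


def transform(xml_text):
    """Rewrite each <rule> as one 'id : alt1 | alt2' line (assumed target format)."""
    root = ET.fromstring(xml_text)
    lines = []
    for rule in root.findall("rule"):
        # Collect the text of every <item> as one alternative of the rule.
        alternatives = [item.text.strip() for item in rule.findall("item")]
        lines.append(f"{rule.get('id')} : {' | '.join(alternatives)}")
    return lines


if __name__ == "__main__":
    for line in transform(GRAMMAR_XML):
        print(line)
```

Keeping the transformer separate from the engine, as the slide describes, means the XML grammar can stay the same if the engine-side format changes.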

Toolkit Design and Outline Speech Synthesis Module (SSM) • Accepts arbitrary Japanese text • Synthesizes speech with a human voice • An HMM-based speech synthesis method is employed • Synchronizes lip movement with speech • The SSM can interrupt speech output to cope with any interruption by the user [Figure: SSM block diagram. Command Interpreter, Text Analyzer, dictionary, Waveform Generation Engine, acoustic models, speech output]

Toolkit Design and Outline Facial Image Synthesis Module (FSM) • Supports high-quality facial image synthesis, animation control, and precise lip-sync with voice • A GUI is provided to fit a generic face wireframe model onto a full-face snapshot image • Facial action control • Mouth shape • Facial expression

Toolkit Design and Outline Agent Manager (AM) • Integrator of all the modules of the ASDA system • Plays a central role in communication • Acts as the synchronization manager between the SSM and FSM to achieve precise lip-sync [Figure: AM components. Macro-command interpreter, dispatcher]
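The AM's two components, a dispatcher and a macro-command interpreter, can be sketched roughly as follows. This is an illustration under assumptions, not the toolkit's actual interface: the class and method names, the `say` macro, and the `Text` and `Mouth` slot names are hypothetical, and real modules would run as separate processes rather than as in-process callables.

```python
class AgentManager:
    """Toy hub that routes command strings to registered modules."""

    def __init__(self):
        self.modules = {}  # module name -> callable accepting a command string
        self.macros = {}   # macro name -> function returning (module, command) pairs

    def register_module(self, name, handler):
        self.modules[name] = handler

    def register_macro(self, name, expand):
        self.macros[name] = expand

    def dispatch(self, module, command):
        # Dispatcher: forward one command to exactly one module.
        if module not in self.modules:
            raise KeyError(f"unknown module: {module}")
        self.modules[module](command)

    def run_macro(self, name, argument):
        # Macro-command interpreter: expand one macro into several
        # module-level commands and dispatch them in order.
        for module, command in self.macros[name](argument):
            self.dispatch(module, command)


log = []
am = AgentManager()
am.register_module("SSM", lambda command: log.append(("SSM", command)))
am.register_module("FSM", lambda command: log.append(("FSM", command)))

# Hypothetical macro: prepare the text, prepare the face, then start speaking.
am.register_macro("say", lambda text: [
    ("SSM", f"set Text = {text}"),
    ("FSM", "set Mouth = sync"),
    ("SSM", "set Speak = now"),
])
am.run_macro("say", "Hello")
```

Routing everything through one hub is what lets a new module be added by registering it with the AM, without changing the existing modules.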

Toolkit Design and Outline Virtual Machine Model • Each module interface is modeled as a machine with slots • Each slot indicates the machine status • Slot values are changed by a common command set • “set Speak = now” means starting voice synthesis of a given text immediately
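The slot model can be illustrated with a small sketch. Only the command “set Speak = now” comes from the slides; the `SlotMachine` class, the `Text` slot, the initial value `idle`, and the parsing rules are assumptions made for this example.

```python
import re


class SlotMachine:
    """Hypothetical module interface: named slots changed by 'set' commands."""

    def __init__(self, slots):
        self.slots = dict(slots)

    def execute(self, command):
        # Accept commands of the form "set <Slot> = <value>".
        match = re.fullmatch(r"\s*set\s+(\w+)\s*=\s*(.*?)\s*", command)
        if match is None:
            raise ValueError(f"unsupported command: {command!r}")
        name, value = match.groups()
        if name not in self.slots:
            raise KeyError(f"unknown slot: {name}")
        self.slots[name] = value
        return name, value


# A speech synthesis module modeled as a machine with two slots.
ssm = SlotMachine({"Text": "", "Speak": "idle"})
ssm.execute("set Text = Hello")
ssm.execute("set Speak = now")
print(ssm.slots)
```

Because every module exposes the same slot-and-command surface, a controller only needs to know slot names, not each module's internals.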

Toolkit Design and Outline Task Manager (TM) • Defines the dialog as a set of interactions which can be represented by a dialog description language • The goal in developing the TM is for the system to support several types of dialog description languages • VoiceXML • High-level language: task-oriented information and the intentions of the participants • PDOC (primitive dialog operation commands) • Low-level language: device events and sequence control

Toolkit Design and Outline Prototyping Tools • “Galatea Interaction Builder (IB)” [Figure: Interaction Builder workflow. Design scenario; create XISL document; XISL file on web site; download and execute XISL; application developer checks on the Galatea MMI system]

Prototype Systems

Prototype Systems Echo-back task

Conclusions • A human-like spoken dialog agent is one of the promising man-machine interfaces for the next generation • Galatea is a software toolkit to develop a human-like spoken dialog agent • Because of the high modularity and simple communication architecture, it will speed up the research and application development based on ASDA
