

Multimodal Interaction for Distributed Interactive Simulation

Philip R. Cohen, Michael Johnston, David McGee, Sharon Oviatt, Jay Pittman, Ira Smith, Liang Chen and Josh Clow

Center for Human Computer Communication

Oregon Graduate Institute of Science and Technology

http://www.cse.ogi.edu/CHCC

Presenter: Keita Fujii



Overview

  • Background: Military simulation

    • LeatherNet

  • QuickSet: Multimodal interface for simulation

    • Architecture

    • Technical Issues

      • Gesture recognition

      • Multimodal integration

      • Agent infrastructure

    • Lessons learned



Background

  • U.S. government is developing large-scale military simulation capabilities

    • >50,000 entities (e.g., a vehicle or a person) in a simulation

  • LeatherNet

    • Virtual simulation system for training platoon leaders and company commanders

    • Based on ModSAF (Modular Semi-Automated Forces) simulator

    • Supports CommandVu

      • Wall-sized virtual reality display



Required Interface

  • Simulation interface should provide the following operations

    • Create entities

    • Supply their initial behavior

    • Interact with the entities

    • Review the results

  • Simulation interface should be

    • Multimodal: # of entities is large

    • On a portable-sized device: for mobility and affordability



QuickSet

  • QuickSet

    • Multimodal interface for LeatherNet

    • Offers speech and pen-based gesture input

    • Runs on a 3-lb hand-held PC

    • Based on the Open Agent Architecture (OAA)


Architecture

[Architecture diagram: the QuickSet interface and the ModSAF simulator sit above a collection of agents (speech recognition, gesture recognition, natural language, multimodal integration, simulation, CommandVu, Web display, and application bridge) connected through the Open Agent Architecture; a CORBA bridge agent links the OAA to a CORBA architecture.]


Architecture

  • QuickSet interface

    • Draws map, icons, entities

    • Activates the speech and gesture recognition agents when the pen is placed on the screen

  • Speech recognition agent

    • IBM’s VoiceType Application Factory

  • Gesture recognition agent

    • Analyzes pen input and produces an N-best list of possible interpretations



Architecture

  • Natural language agent

    • Analyzes natural language input from the speech recognition agent and produces typed feature structures

  • Multimodal integration agent

    • Accepts typed feature structures from the language agent and the gesture agent, unifies them, and produces a multimodal interpretation



Architecture

  • Simulation agent

    • Serves as the communication channel between OAA agents and the ModSAF simulation system

  • CommandVu agent

    • Wraps CommandVu as an agent so that the same multimodal interface (speech and gesture) can also be used with CommandVu



Architecture

  • Application bridge agent

    • Bridges the APIs of the various applications, such as ModSAF and CommandVu

  • Web display agent

    • Allows a user to manipulate the ModSAF simulation through a Java applet in a WWW browser

  • CORBA bridge agent

    • Converts OAA messages to CORBA IIOP/GIOP




Gesture recognition

  • QuickSet’s pen-based gesture recognizer

    • Consists of a neural network and hidden Markov models

    • Combines the results from the two recognizers

      • To yield probabilities for each of the possible interpretations (see the sketch below)

[Diagram: a pen gesture is scored by both a neural network and a hidden Markov model; each recognizer produces probabilities for interpretations such as route, area, or tank (e.g., Route 0.7, Area 0.1, Tank 0.1), and the two score lists are combined into a single ranked list.]
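
The slides do not spell out how the two recognizers' scores are merged, so the following is a minimal sketch, assuming each recognizer returns per-class probabilities that are simply averaged and re-ranked; the function name `combine_nbest` and the averaging rule are illustrative only, not QuickSet's actual combination method.

```python
# Minimal sketch (assumption: scores are averaged; QuickSet's actual
# combination rule may weight or normalize the recognizers differently).

def combine_nbest(nn_scores, hmm_scores):
    """Merge per-class probabilities from the neural net and the HMM
    recognizer into a single ranked N-best list."""
    classes = set(nn_scores) | set(hmm_scores)
    combined = {c: (nn_scores.get(c, 0.0) + hmm_scores.get(c, 0.0)) / 2.0
                for c in classes}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Example scores in the spirit of the diagram above.
nn = {"route": 0.7, "area": 0.1, "tank": 0.1}     # neural-net output
hmm = {"route": 0.6, "area": 0.1, "tank": 0.01}   # hidden-Markov-model output
print(combine_nbest(nn, hmm))
# [('route', 0.65), ('area', 0.1), ('tank', 0.055)]
```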



Multimodal integration

  • Based on a unification operation over typed feature structures

    • If two pieces of partial information can be combined without losing consistency, they are combined into a single result (see the sketch below)

[Diagram: speech recognition contributes a feature structure with operation draw_line and an unspecified line; gesture recognition contributes spatial interpretations such as Point (10,10) and Line (10,10)-(20,20); unification merges the compatible pieces into a single structure: operation draw_line with Line (10,10)-(20,20).]
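
A minimal sketch of the unification step, modeling feature structures as nested dictionaries; real QuickSet uses typed feature structures with a type hierarchy, so treat the `unify` function below as illustrative rather than the paper's algorithm.

```python
# Toy unification over feature structures represented as nested dicts.
# (Assumption: no type hierarchy; atomic values must match exactly.)

def unify(fs1, fs2):
    """Return the unification of two feature structures, or None on conflict."""
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for key, val in fs2.items():
            if key in result:
                merged = unify(result[key], val)
                if merged is None:
                    return None            # inconsistent values: unification fails
                result[key] = merged
            else:
                result[key] = val
        return result
    return fs1 if fs1 == fs2 else None     # atoms must be identical

# Speech contributes the operation but leaves the line underspecified;
# the pen gesture contributes the coordinates (values from the diagram above).
speech  = {"operation": "draw_line", "line": {}}
gesture = {"line": {"coords": [(10, 10), (20, 20)]}}
print(unify(speech, gesture))
# {'operation': 'draw_line', 'line': {'coords': [(10, 10), (20, 20)]}}
```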



Agent infrastructure

  • Open Agent Architecture

    • All communication among the agents takes place through the facilitator agent

      • When an agent registers with the facilitator agent, it supplies a list of goals it can solve

      • Agents post goals to be solved to the facilitator agent

      • The facilitator agent forwards the goals to the agents that can solve them (sketched below)

    • Uses ICL (Interagent Communication Language)

      • Similar to KQML (Knowledge Query and Manipulation Language) and KIF (Knowledge Interchange Format)

[Diagram: agents register with the facilitator the goals they can solve; agents request work by posting goals to the facilitator, which forwards them to the registered solver agents.]
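
A rough, in-process sketch of the facilitator pattern described on this slide. The real Open Agent Architecture exchanges ICL messages over the network and does richer goal matching; the class and method names here are illustrative only.

```python
# Illustrative facilitator: agents register solvable goals, other agents
# post goals, and the facilitator routes each goal to its registered solvers.

class Facilitator:
    def __init__(self):
        self.registry = {}                 # goal name -> list of solver callables

    def register(self, goal, solver):
        """An agent registers a goal it can solve, with a callback to solve it."""
        self.registry.setdefault(goal, []).append(solver)

    def post(self, goal, **args):
        """An agent posts a goal; the facilitator forwards it to every solver."""
        return [solve(**args) for solve in self.registry.get(goal, [])]

facilitator = Facilitator()
# The simulation agent registers a goal it can solve.
facilitator.register("create_entity", lambda kind, at: f"created {kind} at {at}")
# The multimodal integration agent posts a goal after fusing speech + gesture.
print(facilitator.post("create_entity", kind="platoon", at=(10, 10)))
```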



Lessons learned

  • Open Agent Architecture

    • Does not provide features for authentication or locking

      • Locking would be needed to prevent one user’s speech from being combined with another user’s gesture

    • Does not support multithreading

      • Cannot support a large number of users

    • The facilitator agent’s centralized architecture

      • Is not scalable

  • Multimodal interface

    • QuickSet shows that multimodal interaction offers the possibility of more robust recognition



ANIMATED CONVERSATION: Rule-based Generation of Facial Expression, Gesture & Spoken Intonation for Multiple Conversational Agents

Justine Cassell, Catherine Pelachaud, Norman Badler, Mark Steedman, Brett Achorn, Tripp Becket, Brett Douville, Scott Prevost, Matthew Stone

Department of Computer & Information Science

University of Pennsylvania

Presenter: Keita Fujii



Overview

  • Introduction

  • Background

    • Facial expression

    • Hand gesture

  • System Architecture

    • Speech generation

    • Gesture Generation

    • Facial Expression Generation



Introduction

  • This paper presents

    “automatically animating conversations between multiple human-like agents”

    • With speech, intonation, facial expressions, and hand gestures

    • Those expressions are synthesized to make the agents look more realistic



Facial expression

  • Facial expression can perform

    • Syntactic functions

      • Accompanies the flow of speech

        • E.g., nodding the head, blinking

    • Semantic functions

      • Emphasizes a word

      • Substitutes for a word

      • Refers to an emotion

        • E.g., smiling while saying “it is a NICE DAY.”

    • Dialogic functions

      • Regulates the flow of speech

        • E.g., mutual gaze for smooth conversational turn-taking



Hand gesture

  • Hand gestures can be categorized as

    • Iconics

      • Represents some feature of the word

        • E.g., making a rectangular shape while saying “a CHECK”

    • Metaphorics

      • Represents an abstract feature/concept

        • E.g., form a jaw-like shape with a hand and pull it while saying “I can WITHDRAW fifty dollars”

    • Deictics

      • Indicates a point in space

        • E.g., point to the ground and say “THIS bank”

    • Beats

      • Hand waves that occur with emphasized words etc

        • E.g., wave a hand while saying “all right”

  • Hand gestures, facial expressions, eye gaze and speech need to be synchronized



System Architecture

[System architecture diagram. Pipeline as drawn: Dialog Planner (using a World and Agent Model) → Symbolic Gesture Specification and Symbolic Intonation Specification → Speech Synthesizer → Phoneme Timings → Gesture and Utterance Synchronization → Gesture PaT-Net and Facial PaT-Net → Movement Specification → Animation System → Sound and Graphic Output.]



Speech Generation

  • Dialog planner

    • Generates dialogs

      • Based on common knowledge, the agent’s goals, and its beliefs

    • Dialog includes

      • The timing of the phonemes and pauses

      • The type and place of the accents

      • The type and place of the gestures

  • Speech Synthesizer

    • Generates sound data from the dialogs
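
The slide lists what the planner's output contains (phoneme and pause timings, accent placement, gesture placement) but not how it is represented; the following is a hypothetical data layout, with field names invented for illustration.

```python
# Hypothetical layout of the dialog planner's annotated output
# (field names are illustrative, not taken from the paper).

from dataclasses import dataclass, field

@dataclass
class AnnotatedUtterance:
    words: list                                           # the utterance, word by word
    phoneme_timings: list = field(default_factory=list)   # (phoneme, start_ms, duration_ms)
    accents: list = field(default_factory=list)           # (word_index, accent_type)
    gestures: list = field(default_factory=list)          # (word_index, gesture_type)

utt = AnnotatedUtterance(
    words=["I", "can", "WITHDRAW", "fifty", "dollars"],
    phoneme_timings=[("AY", 0, 120), ("K", 120, 60)],     # ...and so on
    accents=[(2, "pitch_accent")],
    gestures=[(2, "metaphoric")],
)
print(utt.gestures)   # [(2, 'metaphoric')]
```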



Gesture generation

  • Gesture is generated through three steps

    • Symbolic Gesture Specification

      • Decides what type of gesture to use for each word

    • PaT-Nets (Parallel Transition Networks)

      • Determines shape, position, transition and timing of gestures

    • Gesture Generator

      • Generates actual motion from the information sent by PaT-Nets



Symbolic Gesture Specification

  • Determines the type of gesture

    • Words with literally spatial content (“check”) → iconic

    • Words with metaphorically spatial content (“account”) → metaphoric

    • Words with physically spatializable content (“this bank”) → deictic

    • Other new references → beat

    • Also based on the annotations from the dialog planner and the classification of the reference (new to speaker and listener, new to speaker but not to listener, or old)
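
The mapping above can be read as a small set of rules; here is a minimal sketch, where the content annotations (`literally_spatial`, etc.) are invented labels standing in for whatever the dialog planner actually produces.

```python
# Illustrative gesture-type decision: map a word's content annotation and
# newness to a gesture category (annotation labels are made up).

def gesture_type(content, newly_referenced):
    if content == "literally_spatial":          # e.g., "check"
        return "iconic"
    if content == "metaphorically_spatial":     # e.g., "account"
        return "metaphoric"
    if content == "spatializable":              # e.g., "this bank"
        return "deictic"
    if newly_referenced:                        # other new references get a beat
        return "beat"
    return None                                 # old reference, no gesture

print(gesture_type("literally_spatial", True))  # iconic
print(gesture_type("other", True))              # beat
```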


PaT-Nets

  • A PaT-Net is a finite state machine

    • Each state represents an action to be invoked

    • State transitions are made either conditionally or probabilistically

    • Thus, traversing the network generates a sequence of actions

      • The Gesture PaT-Net generates gestures and the Facial PaT-Net generates facial expressions (a minimal sketch follows the diagram below)

[State diagram: from a parsing state, when gesture info is found the net collects it; when the gesture info is complete it is sent to the gesture PaT-Net; when a beat is signaled it is sent to the beat PaT-Net.]
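
A minimal finite-state-machine sketch in the spirit of a PaT-Net: states carry actions and transitions fire either on a condition or with a probability. The real PaT-Net formalism (parallel networks, hierarchical invocation) is richer; the class and state names here are illustrative.

```python
import random

# Toy PaT-Net-style FSM: each state has an entry action; transitions fire
# conditionally or probabilistically (illustrative only).

class PaTNet:
    def __init__(self, start):
        self.state = start
        self.actions = {start: lambda: None}   # state -> action invoked on entry
        self.transitions = {}                  # state -> list of (test, next_state)

    def add_state(self, name, action=lambda: None):
        self.actions[name] = action

    def add_transition(self, src, dst, condition=None, prob=None):
        test = condition if condition is not None else (lambda: random.random() < prob)
        self.transitions.setdefault(src, []).append((test, dst))

    def step(self):
        for test, dst in self.transitions.get(self.state, []):
            if test():
                self.state = dst
                self.actions[dst]()            # entering a state invokes its action
                return dst
        return self.state                      # no transition fired

net = PaTNet("parsing")
net.add_state("get_gesture_info", action=lambda: print("collect gesture info"))
net.add_transition("parsing", "get_gesture_info", condition=lambda: True)
net.step()   # enters get_gesture_info and runs its action
```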



Coarticulation

  • The structure of PaT-Nets allows coarticulation

    • Two gestures occur without an intermediary relaxation

      • I.e., the next gesture starts without waiting for the first one to finish

    • Coarticulation occurs when there is not enough time to finish a gesture

[State diagram: states for starting and finishing gestures A and B, plus a pausing state; gesture B can begin before gesture A has fully finished.]
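
A sketch of the timing rule implied above: if the gap between two gestures is shorter than the time needed to relax the arm, skip the relaxation and blend directly into the next gesture. The relaxation constant is an assumed value, not from the paper.

```python
RELAX_TIME = 0.5   # seconds assumed necessary to return the arm to rest

def plan_transition(current_gesture_end, next_gesture_start):
    """Decide whether to relax between gestures or coarticulate."""
    gap = next_gesture_start - current_gesture_end
    if gap >= RELAX_TIME:
        return "relax_then_start_next"   # enough time for an intermediary relaxation
    return "coarticulate"                # blend straight into the next gesture

print(plan_transition(1.0, 2.0))   # relax_then_start_next
print(plan_transition(1.0, 1.2))   # coarticulate
```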



Gesture Generator

  • The animation of a gesture is created as a combination of

    • Hand shape

    • Wrist control

    • Arm positioning

  • The system tries to get as close as possible to the gesture goals, but may fail because of coarticulation effects



Facial expression generation

  • Facial expression is generated through the same steps as gesture

    • Symbolic Facial Expression/Gaze Specification

      • Decides what type of expression to use for each word

    • Facial/Gaze PaT-Nets

      • Determines shape, position, transition and timing of the facial expressions and gaze

    • Facial Expression/Gaze Generator

      • Generates actual motion from the information sent by PaT-Nets



Symbolic Facial Expression/Gaze Specification

  • Symbolic Facial Expression Specification

    • Generates facial expressions connected to intonation

  • Symbolic Gaze Specification

    • Generates the following types of gaze expression

      • Planning

        • E.g., look away while organizing thought

      • Comment

        • E.g., look toward the listener when asking a question

      • Control

        • E.g., gaze at the listener when ending speech

      • Feedback

        • E.g., look toward the listener to obtain feedback


PaT-Nets

  • Facial expression PaT-Net

    • No information in the paper

  • Gaze PaT-Net

    • Each node is characterized by a probability

      • A node’s action is invoked probabilistically (sketched below)

[Gaze PaT-Net diagram: a gaze node with planning, comment, control, and feedback sub-networks; transitions are triggered by dialogue events such as beginning of turn, within turn, end of turn, short turn, turn request, back channel, accent, utterance question, utterance answer, and configuration signal.]
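
A sketch of the probabilistic node behavior described above: when a dialogue event reaches a gaze node, its action fires only with some probability. The events come from the diagram; the probabilities and action names are made up for illustration.

```python
import random

# Illustrative gaze nodes: dialogue event -> (gaze action, firing probability).
# Probabilities are invented; the paper's values are not given in the slides.
GAZE_NODES = {
    "beginning_of_turn": ("look_away_from_listener", 0.70),
    "end_of_turn":       ("look_toward_listener",    0.75),
    "back_channel":      ("look_toward_listener",    0.60),
}

def gaze_for(event):
    action, p = GAZE_NODES[event]
    return action if random.random() < p else None   # the node may simply not fire

print(gaze_for("end_of_turn"))
```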



Facial Expression/Gaze Generator

  • Facial expression generator

    • Classifies an expression into functional groups

      • Lip shape, conversational signal, punctuator, manipulator and emblem

    • Uses FACS (the Facial Action Coding System)

      • Represents an expression as a pair of timing and type

  • Gaze and head motion generator

    • Generates motion of eye and head

      • Based on the direction of gaze, timing, and duration
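
The generator represents each expression as a pairing of timing and type; a hypothetical track of such pairs might look like the following (the functional groups are the ones listed above, while the specific actions and timings are invented).

```python
# Hypothetical facial-expression track: (start_ms, duration_ms, group, action).

facial_track = [
    (0,   400, "conversational_signal", "eyebrow_raise"),
    (400, 200, "punctuator",            "blink"),
    (600, 800, "lip_shape",             "smile"),
]

for start, dur, group, action in facial_track:
    print(f"{start:>5} ms  +{dur:<4} ms  {group:<22} {action}")
```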



Direct Manipulation vs. Interface Agents

Ben Shneiderman and Pattie Maes

Interactions, Nov. and Dec. 1997

Presenter: Keita Fujii



Introduction

  • This article is about a debate session in IUI* 97 and CHI** 97

  • Topic

    • Direct Manipulation vs. Interface Agents

  • Speakers

    • Ben Shneiderman

      • From University of Maryland, Human-Computer Interaction Lab

      • Proponent of Direct Manipulation

    • Pattie Maes

      • MIT Media Laboratory

      • Proponent of Intelligent Agent

* Intelligent User Interface Workshop **Conference on Human Factors in Computing Systems



Overview

  • Direct Manipulation

  • Software Agent

    • Benefits

    • Criticisms

    • Misconceptions

  • Objections to agent system

  • Agreement

  • Q & A



Direct Manipulation (Ben Shneiderman)

  • A user interface based on information visualization techniques that provides

    • Overview

      • How much / what kind of information is in the system

    • Great control

      • E.g., zoom in, scroll, filter out

    • Predictability

      • The user can anticipate what will happen next

    • Details-on-demand

  • Benefits

    • Reduces errors and encourages exploration



Examples of Direct Manipulation

  • FilmFinder

    • Organizes movies on a 2D plane by year and popularity

  • LifeLines

    • Shows a case history graphically

  • Visible Human Explorer

    • Displays coronal and cross sections of a human body



Software Agent (Pattie Maes)

  • A software agent is a program that is

    • Personalized

      • Knows the individual user’s habits, preferences, and interests

    • Proactive

      • Provides or suggests information to user before being requested

    • Long-lived

      • Keeps running autonomously

    • Adaptive

      • Monitors the user’s interests as they change over time

    • Delegate

      • User can delegate some task to the agent

      • Agent acts on the user’s behalf



Examples of Software Agent

  • Letizia

    • Pre-loads web pages that the user may be interested in

  • Remembrance Agent

    • Remembers who sent an email and whether it has been replied to

  • Firefly

    • Personal filters / personal critics

  • Yenta

    • Matchmaking agent

    • Introduces another user who shares the same interests



Benefits of software agent (Pattie Maes)

  • Software agents are necessary because

    • Computer systems are becoming more complex, unstructured, and dynamic

      • E.g., WWW

    • The users are becoming more naïve

      • End users are not trained to use computers

    • The number of tasks to be managed with computers is increasing

      • Some tasks need to be delegated to somebody



Criticisms of Agents (Pattie Maes)

  • Well-designed interfaces are better

    • Even if the interface is perfect, you may simply not want to do some tasks yourself and would rather delegate them

  • Agents make the user dumb

    • Yes, it’s true. But as long as there’s always an agent available, it’s not a problem

  • Using agents implies giving up all control

    • You don’t have to have full control. As long as your task is satisfactorily done, that’s fine

    • However, the system must allow the user to choose between direct manipulation and delegating the task to the agent



Misconceptions about Agents (Pattie Maes)

  • Agents replace the user interface

  • Agents need to be personified or anthropomorphized

  • Agents need to rely on traditional AI

     → All of these are NOT true



Objections to Agent Systems and Responses to the Objections (Both)

  • “Agent” is not a realistic solution for making a good user interface because

    • Agents cannot be smart and fast enough to make intelligent decisions for a human user

  • Direct manipulation is for

    • Professional users, not end users

    • Very well structured and organized domains, not ill-structured and dynamic domains

  • Agent system can cooperate with Direct Manipulation

    • E.g., FilmFinder with an agent making movie suggestions



  • Anthropomorphic interfaces/representations are not appropriate

    • Agents do not have to be visible

  • There are no “agents” on the Firefly Web site

    • “Agent” has a broader meaning than “software agent,” so you need to distinguish different types of “agents”

      • Autonomous robots, synthetic characters, software agents etc



Direct Manipulation & Agent System (Both)

  • Agents are NOT an alternative to direct manipulation (the interface) but a complementary technique

    • Agent system needs a good user interface that provides good understanding (overview) and control

    • Agent designer must pay attention to user-interface issues such as understanding and control

  • Two-layer model

    • The user interface level

      • Predictable and controllable

    • The agent level

      • Adaptive, proactive system to increase usability



Q & A

  • Q. How do speech technologies affect direct manipulation and agent system?

    • A. Speech won’t be a generally usable tool because

      • It disrupts cognitive processes

      • Low-bandwidth communication

      • Ambiguous

    • A. Speech can be used as a supportive medium



  • Q. How can user interfaces and/or agent systems support time-critical decision-support environments where mistakes are costly?

    • A. Agent systems are not suitable for such environments because it is very hard to make agents that never make a mistake



  • Q. How can we build a direct manipulation system for vision-challenged or blind users?

    • A. Direct manipulation can be used to make an interface for such users because it depends on spatial relationships, and blind users are often strong at spatial processing

  • Q. What is it about agents that you dislike? (to Ben Shneiderman)

    • A. The “intelligent agent” notion avoids dealing with interface issues, but this will change



So, where did they cheat???

