
Multimodal Interfaces: Robust interaction where graphical user interfaces fear to tread

Explores the use of natural communication modalities in interface design, which offer advantages over GUI and unimodal systems: easier, faster, and more efficient interaction. Covers potential application areas and challenges in designing multimodal interfaces.



Presentation Transcript


  1. Multimodal Interfaces: Robust interaction where graphical user interfaces fear to tread. Philip R. Cohen, Professor and Co-Director, Center for Human-Computer Communication, Oregon Health and Science Univ., http://www.cse.ogi.edu/CHCC, and Natural Interaction Systems, LLC

  2. Team Effort. Co-PI: Sharon Oviatt. Xiao Huang, Ed Kaiser, Sanjeev Kumar, Rebecca Lunsford, Richard Wesson, Rajah Annamalai, Alex Arthur, Paulo Barthelmess, Rachel Coulston, Marisa Flecha-Garcia. Multidisciplinary research.

  3. Multimodal Interaction • Use of one or more natural communication modalities, e.g., speech, gesture, sketch • Advantages over GUI and unimodal systems • Easier to use; less training • Robust, flexible • Preferred by users • Faster, more efficient • Supports new functionality • Applies to many different environments and form factors that challenge GUIs, especially mobile ones

  4. Potential Application Areas • Architecture and Design • Geographical Information Systems • Emergency Operations • Field-based Operations • Mobile Computing and Telecommunications • Virtual/Augmented Reality • Pervasive/Ubiquitous Computing • Computer-Supported Collaborative Work • Education • Entertainment

  5. Challenges for multimodal interface design • More than 2 modes, e.g., spoken, gestural, facial expression, gaze; various sensors • Inputs are uncertain (vs. keyboard/mouse) • Corrupted by noise • Multiple people • Recognition is probabilistic • Meaning is ambiguous. Design for uncertainty.

  6. Approach. Gain robustness via • Fusion of inputs from multiple modalities • Using strengths of one mode to compensate for weaknesses of others, at design time and run time • Avoiding/correcting errors • Statistical architecture • Confirmation • Dialogue context • Simplification of language in a multimodal context • Output affecting/channeling input

  7. Demo. Started with 50 and 100 MHz 486s.

  8. Multimodal Architecture

  9. Late MM Integration • Parallel recognizers and "understanders" • Time-stamped meaning fragments for each stream • Common framework for meaning representation: typed feature structures • Meaning fusion operation: unification • Process for determining a joint interpretation (subject to semantic and spatiotemporal constraints) • Statistical ranking • Flexible asynchronous architecture • Must handle unimodal and multimodal input

  10. Example: fusing speech and sketch. From speech (one of many hypotheses), "Evacuation route" yields a create_line fragment whose object is a line_obj with color: green and label: "Evacuation route", and whose location is underspecified. From sketch, a line fragment supplies location: coordlist [(95302, 94360), (95305, 94365), …]. Unifying the two fragments produces a complete create_line command.
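The fusion step on this slide can be sketched as unification over nested dictionaries. This is an illustration only, not the deck's actual implementation: the real system uses typed feature structures with a type hierarchy, and all the names and the dict representation below are assumptions.

```python
# Toy unification of meaning fragments represented as nested dicts.
FAIL = object()  # sentinel marking a unification clash

def unify(a, b):
    """Unify two fragments; None marks an underspecified slot."""
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, val in b.items():
            merged = unify(out.get(key), val)
            if merged is FAIL:
                return FAIL
            out[key] = merged
        return out
    return a if a == b else FAIL

# Speech hypothesis: "Evacuation route" gives color and label,
# but no coordinates yet.
speech = {"cmd": "create_line",
          "object": {"type": "line", "color": "green",
                     "label": "Evacuation route"},
          "location": None}

# Sketch hypothesis: the drawn line supplies the coordinate list.
sketch = {"cmd": "create_line",
          "location": {"coordlist": [(95302, 94360), (95305, 94365)]}}

command = unify(speech, sketch)  # a complete create_line command
```

Fragments with clashing values (say, a create_line from speech against a create_area from sketch) fail to unify, which is what filters out incompatible joint interpretations before ranking.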

  11. Mutual Disambiguation • Each input mode provides a set of scored recognition hypotheses (e.g., speech hypotheses s1..s3 and gesture hypotheses g1..g4, combined into candidate multimodal interpretations mm1..mm4) • MD derives the best joint interpretation by unification of meaning representation fragments • PMM = αPS + βPG + C; learn α, β, and C over a multimodal corpus • MD stabilizes system performance in challenging environments
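The slide's linear model PMM = αPS + βPG + C can be sketched as a ranking over compatible hypothesis pairs. The weights and hypothesis lists below are made-up values for illustration; in the system described here, α, β, and C are learned over a multimodal corpus, and the `unifiable` predicate stands in for the semantic and spatiotemporal constraints.

```python
# Sketch of mutual-disambiguation ranking: P_MM = alpha*P_S + beta*P_G + C.
ALPHA, BETA, C = 0.6, 0.4, 0.0  # assumed values, not learned

def rank_joint(speech_hyps, gesture_hyps, unifiable):
    """Score every compatible (speech, gesture) pair and sort best-first."""
    scored = [(ALPHA * ps + BETA * pg + C, s, g)
              for s, ps in speech_hyps
              for g, pg in gesture_hyps
              if unifiable(s, g)]
    return sorted(scored, reverse=True)

# Mutual disambiguation in action: the top speech hypothesis is a
# misrecognition, but it unifies with no gesture, so the joint ranking
# recovers the intended command.
speech = [("irritation route", 0.6), ("evacuation route", 0.4)]
gesture = [("line", 0.9), ("area", 0.1)]

def compatible(s, g):
    # Toy constraint: only "evacuation route" unifies with a gesture here.
    return s == "evacuation route"

best_score, best_speech, best_gesture = rank_joint(speech, gesture, compatible)[0]
```

The point of the example is the stabilizing effect the slide claims: even when one recognizer ranks the wrong hypothesis first, the fused score over unifiable pairs can pull the correct interpretation to the top.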

  12. Benefits of mutual disambiguation. [Table with columns Application, RER, Reference; row data not preserved in the transcript.]

  13. Efficiency Benefits. CPOF: multimodal input 16x faster for lines and areas (NIS).

  14. Demonstration • CMU: speech • MIT: body tracking • OHSU: multimodal fusion (speech + writing/sketch, 3D gesture) • Stanford: NLP, dialogue

  15. Tangible Multimodal Systems for Safety-Critical Applications. What's missing? A Division Command Post during an exercise. McGee et al., CHI '02; Cohen & McGee, CACM '04

  16. What they use

  17. Many work practices rely on paper: ATC (Mackay '98), ICU (Gorman et al., 2000)

  18. Why do they use paper? • Already know the interface • Poor computer interfaces • Fail-safe; robust to power outages • High resolution • Large/small scale • Cheap • Lightweight • Portable • Collaboration

  19. Clinical Data Entry. "Perhaps the single greatest challenge that has consistently confronted every clinical system developer is to engage clinicians in direct data entry" (IOM, 1997, p. 125). "To make it simple for the practitioner to interact with the record, data entry must be almost as easy as writing." (IOM, 1997, p. 88)

  20. Multimodal Interaction with Paper (NIS). Based on Anoto technology.

  21. Benefits • Most people (incl. kids, seniors) know how to use a pen • Portability (works over a cell phone) • Ubiquity: paper is everywhere • Collaborative: multiple simultaneous pens • Next: use for note-taking, alone or in meetings; fuse with ongoing speech • Many new applications, e.g., architecture, engineering, education, field data capture

  22. Elementary Science Education Sharon Oviatt

  23. Quiet Interfaces that Help People Think Sharon Oviatt oviatt@cse.ogi.edu http://www.cse.ogi.edu/CHCC/
