
Multimodal Interfaces: Robust interaction where graphical user interfaces fear to tread

Explores the use of natural communication modalities in interface design, which offer advantages over GUI and unimodal systems: easier, faster, and more efficient interaction. Covers potential application areas and challenges in designing multimodal interfaces.



Presentation Transcript


  1. Multimodal Interfaces: Robust interaction where graphical user interfaces fear to tread. Philip R. Cohen, Professor and Co-Director, Center for Human-Computer Communication, Oregon Health and Science Univ., http://www.cse.ogi.edu/CHCC, and Natural Interaction Systems, LLC

  2. Team Effort. Co-PI: Sharon Oviatt. Xiao Huang, Ed Kaiser, Sanjeev Kumar, Rebecca Lunsford, Richard Wesson, Rajah Annamalai, Alex Arthur, Paulo Barthelmess, Rachel Coulston, Marisa Flecha-Garcia. Multidisciplinary research.

  3. Multimodal Interaction • Use of one or more natural communication modalities, e.g., speech, gesture, sketch • Advantages over GUI and unimodal systems • Easier to use; less training • Robust, flexible • Preferred by users • Faster, more efficient • Supports new functionality • Applies to many different environments and form factors that challenge GUIs, especially mobile ones

  4. Potential Application Areas • Architecture and Design • Geographical Information Systems • Emergency Operations • Field-based Operations • Mobile Computing and Telecommunications • Virtual/Augmented Reality • Pervasive/Ubiquitous Computing • Computer-Supported Collaborative Work • Education • Entertainment

  5. Challenges for multimodal interface design • More than 2 modes, e.g., spoken, gestural, facial expression, gaze; various sensors • Inputs are uncertain (vs. keyboard/mouse) • Corrupted by noise • Multiple people • Recognition is probabilistic • Meaning is ambiguous. Design for uncertainty.

  6. Approach. Gain robustness via • Fusion of inputs from multiple modalities • Using strengths of one mode to compensate for weaknesses of others, at design time and run time • Avoiding/correcting errors • Statistical architecture • Confirmation • Dialogue context • Simplification of language in a multimodal context • Output affecting/channeling input

  7. Demo. Started with 50 and 100 MHz 486s.

  8. Multimodal Architecture

  9. Late MM Integration • Parallel recognizers and "understanders" • Time-stamped meaning fragments for each stream • Common framework for meaning representation: typed feature structures • Meaning fusion operation: unification • Process for determining a joint interpretation (subject to semantic and spatiotemporal constraints) • Statistical ranking • Flexible asynchronous architecture • Must handle unimodal and multimodal input

  10. Example: fusing speech and sketch. From speech (one of many hypotheses), "Evacuation route" yields a create_line fragment whose object is a line_obj with color: green and label: "Evacuation route", and whose location is underspecified. From sketch, a line fragment supplies location: coordlist [(95302, 94360), (95305, 94365), …]. Unifying the two fragments produces a complete create_line command.
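The fusion step on this slide can be sketched as unification over nested dictionaries. This is an illustration only, not the deck's actual implementation: the real system uses typed feature structures with a type hierarchy, and all the names and the dict representation below are assumptions.

```python
# Toy unification of meaning fragments represented as nested dicts.
FAIL = object()  # sentinel marking a unification clash

def unify(a, b):
    """Unify two fragments; None marks an underspecified slot."""
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, val in b.items():
            merged = unify(out.get(key), val)
            if merged is FAIL:
                return FAIL
            out[key] = merged
        return out
    return a if a == b else FAIL

# Speech hypothesis: "Evacuation route" gives color and label,
# but no coordinates yet.
speech = {"cmd": "create_line",
          "object": {"type": "line", "color": "green",
                     "label": "Evacuation route"},
          "location": None}

# Sketch hypothesis: the drawn line supplies the coordinate list.
sketch = {"cmd": "create_line",
          "location": {"coordlist": [(95302, 94360), (95305, 94365)]}}

command = unify(speech, sketch)  # a complete create_line command
```

Fragments with clashing values (say, a create_line from speech against a create_area from sketch) fail to unify, which is what filters out incompatible joint interpretations before ranking.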

  11. Mutual Disambiguation • Each input mode provides a set of scored recognition hypotheses (e.g., speech hypotheses s1..s3 and gesture hypotheses g1..g4, combined into candidate multimodal interpretations mm1..mm4) • MD derives the best joint interpretation by unification of meaning representation fragments • PMM = αPS + βPG + C; learn α, β, and C over a multimodal corpus • MD stabilizes system performance in challenging environments
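The slide's linear model PMM = αPS + βPG + C can be sketched as a ranking over compatible hypothesis pairs. The weights and hypothesis lists below are made-up values for illustration; in the system described here, α, β, and C are learned over a multimodal corpus, and the `unifiable` predicate stands in for the semantic and spatiotemporal constraints.

```python
# Sketch of mutual-disambiguation ranking: P_MM = alpha*P_S + beta*P_G + C.
ALPHA, BETA, C = 0.6, 0.4, 0.0  # assumed values, not learned

def rank_joint(speech_hyps, gesture_hyps, unifiable):
    """Score every compatible (speech, gesture) pair and sort best-first."""
    scored = [(ALPHA * ps + BETA * pg + C, s, g)
              for s, ps in speech_hyps
              for g, pg in gesture_hyps
              if unifiable(s, g)]
    return sorted(scored, reverse=True)

# Mutual disambiguation in action: the top speech hypothesis is a
# misrecognition, but it unifies with no gesture, so the joint ranking
# recovers the intended command.
speech = [("irritation route", 0.6), ("evacuation route", 0.4)]
gesture = [("line", 0.9), ("area", 0.1)]

def compatible(s, g):
    # Toy constraint: only "evacuation route" unifies with a gesture here.
    return s == "evacuation route"

best_score, best_speech, best_gesture = rank_joint(speech, gesture, compatible)[0]
```

The point of the example is the stabilizing effect the slide claims: even when one recognizer ranks the wrong hypothesis first, the fused score over unifiable pairs can pull the correct interpretation to the top.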

  12. Benefits of mutual disambiguation. [Table with columns Application, RER, Reference; row data not preserved in the transcript.]

  13. Efficiency Benefits. CPOF: multimodal input 16x faster for lines and areas (NIS).

  14. Demonstration • CMU: speech • MIT: body tracking • OHSU: multimodal fusion (speech + writing/sketch, 3D gesture) • Stanford: NLP, dialogue

  15. Tangible Multimodal Systems for Safety-Critical Applications. What's missing? A Division Command Post during an exercise. McGee et al., CHI '02; Cohen & McGee, CACM '04

  16. What they use

  17. Many work practices rely on paper: ATC (Mackay '98), ICU (Gorman et al., 2000)

  18. Why do they use paper? • Already know the interface • Poor computer interfaces • Fail-safe; robust to power outages • High resolution • Large/small scale • Cheap • Lightweight • Portable • Collaboration

  19. Clinical Data Entry. "Perhaps the single greatest challenge that has consistently confronted every clinical system developer is to engage clinicians in direct data entry" (IOM, 1997, p. 125). "To make it simple for the practitioner to interact with the record, data entry must be almost as easy as writing." (IOM, 1997, p. 88)

  20. Multimodal Interaction with Paper (NIS). Based on Anoto technology.

  21. Benefits • Most people (incl. kids, seniors) know how to use a pen • Portability (works over a cell phone) • Ubiquity: paper is everywhere • Collaborative: multiple simultaneous pens • Next: use for note-taking, alone or in meetings; fuse with ongoing speech • Many new applications, e.g., architecture, engineering, education, field data capture

  22. Elementary Science Education Sharon Oviatt

  23. Quiet Interfaces that Help People Think Sharon Oviatt oviatt@cse.ogi.edu http://www.cse.ogi.edu/CHCC/
