Multi-Modal Dialogue in Personal Navigation Systems



  1. Multi-Modal Dialogue in Personal Navigation Systems Arthur Chan

  2. Introduction • The term “multi-modal” • A general description of an application that can be operated in multiple input/output modes • E.g. • Input: voice, pen, gesture, facial expression • Output: voice, graphical output

  3. Multi-modal Dialogue (MMD) in Personal Navigation Systems • Motivation of this presentation • Navigation systems provide an interesting scenario for MMD • a case for why MMD is useful • Structure of this presentation • 3 system papers • AT&T MATCH • speech and pen input with pen gestures • Speechworks Walking Directions System • speech and stylus input • Univ. of Saarland REAL • speech and pen input • both GPS and a magnetic tracker were used

  4. Multi-modal Language Processing for Mobile Information Access

  5. Overall Function • A working city guide and navigation system • Easy access to restaurant and subway information • Runs on a Fujitsu pen computer • Users are free to • give speech commands • draw on the display with a stylus

  6. Types of Inputs • Speech Input • “show cheap italian restaurants in chelsea” • Simultaneous Speech and Pen Input • Circle an area • Say “show cheap italian restaurants in neighborhood” at the same time • Functionalities include • Reviews • Subway routes

  7. Input Overview • Speech Input • Uses the AT&T Watson speech recognition engine • Pen Input (electronic ink) • Allows the use of pen gestures • A gesture can be a complex pen input • Special aggregation techniques are used for these gestures • Inputs are combined using lattice combination
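
The slide does not spell out the combination step. As a rough illustration, the Python sketch below does late fusion over flat n-best lists: each modality produces scored hypotheses, and compatible pairs are rescored jointly. MATCH itself combines weighted finite-state lattices; the n-best lists, the `combinable` grammar stand-in, and the 0.6 speech weight are illustrative assumptions, not AT&T's actual algorithm.

```python
import itertools

# Hypothetical weighted hypotheses from each recognizer: (interpretation, log-prob).
# Real systems like MATCH use weighted finite-state lattices; flat n-best lists
# are a simplification for illustration.
speech_nbest = [
    ("show_restaurants(cuisine=italian, price=cheap, loc=DEICTIC)", -1.2),
    ("show_restaurants(cuisine=italian, price=cheap, loc=chelsea)", -2.5),
]
gesture_nbest = [
    ("area(chelsea)", -0.4),        # user circled a region on the map
    ("point(restaurant_17)", -1.9),  # user tapped a single restaurant
]

def combinable(speech_sem, gesture_sem):
    """Stand-in for a multimodal grammar: a deictic slot needs a gesture,
    and an area gesture can also refine a fully spoken request."""
    return "DEICTIC" in speech_sem or gesture_sem.startswith("area")

def combine(speech_nbest, gesture_nbest, speech_weight=0.6):
    """Score every compatible (speech, gesture) pair by weighted log-prob
    and return the best joint interpretation."""
    joint = []
    for (s, s_lp), (g, g_lp) in itertools.product(speech_nbest, gesture_nbest):
        if combinable(s, g):
            score = speech_weight * s_lp + (1 - speech_weight) * g_lp
            joint.append((s.replace("DEICTIC", g), score))
    return max(joint, key=lambda x: x[1]) if joint else None

print(combine(speech_nbest, gesture_nbest))
```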

  8. Pen Gesture and Speech Input • For example: • U: “How do I get to this place?” • <user circles one of the restaurants displayed on the map> • S: “Where do you want to go from?” • U: “25th St & 3rd Avenue” • <user writes 25th St & 3rd Avenue> • <system computes the shortest route>
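
A minimal sketch of how such a turn could bind the deictic phrase “this place” to the most recent pen gesture. The `DialogueContext` class, slot names, and prompting logic are hypothetical, not MATCH's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GestureEvent:
    target: str        # e.g. the id of the circled restaurant
    timestamp: float

@dataclass
class DialogueContext:
    last_gesture: Optional[GestureEvent] = None
    origin: Optional[str] = None
    destination: Optional[str] = None

def handle_speech(ctx: DialogueContext, utterance: str) -> str:
    """Resolve 'this place' against the gesture context, then ask for
    whatever slot is still missing before planning the route."""
    if "this place" in utterance and ctx.last_gesture:
        ctx.destination = ctx.last_gesture.target
    if ctx.destination and not ctx.origin:
        return "Where do you want to go from?"
    return f"Computing shortest route {ctx.origin} -> {ctx.destination}"

ctx = DialogueContext(last_gesture=GestureEvent("restaurant_12", 3.4))
print(handle_speech(ctx, "How do I get to this place?"))
ctx.origin = "25th St & 3rd Ave"   # from the written pen input
print(handle_speech(ctx, "25th St & 3rd Ave"))
```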

  9. Summary • Interesting aspects of the system • Illustrates a real-life scenario where multi-modal inputs can be used • Design issue: • how should different inputs be used together? • Algorithmic issue: • how should different inputs be combined?

  10. Multi-modal Spoken Dialog with Wireless Devices

  11. Overview • Work by Speechworks • Jointly conducted by speech recognition and user interface folks • Two distinct elements • Speech recognition • In an embedded domain, which speech recognition paradigm should be used? • embedded speech recognition? • network speech recognition? • distributed speech recognition? • User interface • How to “situationalize” the application?

  12. Overall Function • Walking Directions Application • Assumes the user is walking in an unfamiliar city • Compaq iPAQ 3765 PocketPC • Users can • select a city and start/end addresses • display a map • control the display • display directions • display interactive directions as a list of steps • Accepts speech input and stylus input • but not pen gestures

  13. Choice of speech recognition paradigm • Embedded speech recognition • Only simple commands can be used due to computation limits • Network speech recognition • Requires bandwidth • The network connection can sometimes be cut off • Distributed speech recognition • Client takes care of the front-end • Server takes care of decoding • Issue: higher code complexity
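
To make the distributed option concrete, here is a rough sketch of the client/server split the slide describes: the client computes compact acoustic features and ships only those over the network, while the server runs the heavyweight decoder. The log-energy feature and zlib compression below are stand-ins for a real DSR front-end (e.g. the ETSI Aurora standard) and codec, and the server "decoder" is a placeholder.

```python
import zlib
import numpy as np

FRAME = 400   # 25 ms frames at 16 kHz
HOP = 160     # 10 ms hop

def client_front_end(samples: np.ndarray) -> bytes:
    """Client side: frame the audio and reduce it to compact feature
    vectors. Real DSR front-ends compute quantized cepstra; log frame
    energy stands in for that here."""
    frames = [samples[i:i + FRAME] for i in range(0, len(samples) - FRAME, HOP)]
    feats = np.array([[np.log(np.sum(f ** 2) + 1e-10)] for f in frames],
                     dtype=np.float32)
    return zlib.compress(feats.tobytes())   # far smaller than raw audio

def server_decode(payload: bytes) -> str:
    """Server side: decompress the features and decode. A real server
    would run an HMM/lattice search; this just reports what arrived."""
    feats = np.frombuffer(zlib.decompress(payload), dtype=np.float32)
    return f"decoded from {feats.size} feature values"

audio = np.random.randn(16000)   # one second of fake audio
payload = client_front_end(audio)
print(len(payload), "bytes over the wire")   # vs. 64 KB of raw 16-bit audio
print(server_decode(payload))
```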

  14. User Interface • Situationalization • Potential scenarios • Sitting at a desk • Getting out of a cab, building, or subway and preparing to walk somewhere • Walking somewhere with hands free • Walking somewhere carrying things • Driving somewhere in heavy traffic • Driving somewhere in light traffic • Being a passenger in a car • Being in a highly noisy environment

  15. Their conclusion • Balance of audio and visual information • Can be reduced to 4 complementary components • Single-modal • 1. Visual mode • 2. Audio mode • Multi-modal • 3. Visual dominant • 4. Audio dominant
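
A small sketch of how the scenarios from slide 14 could be mapped onto these four components. The situational flags and the scenario-to-mode rules below are assumptions for illustration, not Speechworks' published logic.

```python
# The four audio/visual balances from the slide.
VISUAL, AUDIO, VISUAL_DOMINANT, AUDIO_DOMINANT = (
    "visual-only", "audio-only", "visual-dominant", "audio-dominant")

def choose_output_mode(eyes_busy: bool, hands_busy: bool, noisy: bool) -> str:
    """Pick an output balance from simple situational flags
    (assumed rules, for illustration only)."""
    if noisy:
        return VISUAL          # audio output is unreliable; stay on screen
    if eyes_busy:
        return AUDIO           # e.g. driving in heavy traffic
    if hands_busy:
        return AUDIO_DOMINANT  # glanceable screen, mostly spoken directions
    return VISUAL_DOMINANT     # e.g. sitting at a desk

print(choose_output_mode(eyes_busy=True, hands_busy=False, noisy=False))
```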

  16. A glance at the UI

  17. Summary • Interesting aspects • Great discussion of • how speech recognition can be used in an embedded domain • how users would use the dialogue application

  18. Multi-modal Dialog in a Mobile Pedestrian Navigation System

  19. Overview • Pedestrian Navigation System • Two components: • IRREAL: indoor navigation system • uses a magnetic tracker • ARREAL: outdoor navigation system • uses GPS
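
One way such a hybrid could arbitrate between its two position sources is sketched below. The GPS-first fallback rule and the `read()` interface are assumptions; the slides do not describe how IRREAL and ARREAL actually hand off.

```python
from typing import Optional, Tuple

Position = Tuple[float, float]

class PositionProvider:
    """Arbitrates between an outdoor and an indoor position source."""
    def __init__(self, gps, tracker):
        self.gps = gps          # outdoor source (ARREAL)
        self.tracker = tracker  # indoor source (IRREAL)

    def current_position(self) -> Optional[Position]:
        """Prefer GPS when it has a fix; fall back to the magnetic
        tracker, which only works inside instrumented buildings."""
        fix = self.gps.read()        # assumed: returns None without satellites
        if fix is not None:
            return fix
        return self.tracker.read()   # assumed: returns None outside coverage

class StubSource:
    """Hypothetical sensor stub: returns a fixed reading (or None)."""
    def __init__(self, value):
        self.value = value
    def read(self):
        return self.value

indoors = PositionProvider(gps=StubSource(None),
                           tracker=StubSource((12.5, 3.0)))
print(indoors.current_position())   # no GPS fix, so the tracker answers
```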

  20. Speech Input/Output • Speech Input: • HTK, IBM ViaVoice Embedded, and Logox were evaluated • Speech Output: • Festival

  21. Visual output • Both 2D and 3D spatialization are supported

  22. Interesting aspects • Tailoring the system for elderly people • Speaker clustering • to improve the recognition rate for elderly people • Model selection • choose between two models based on likelihood • elderly models • normal adult models
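
The likelihood-based model selection lends itself to a short sketch: score the utterance under both acoustic models and keep the better-scoring one. The toy one-dimensional Gaussian below stands in for a full acoustic model, and the `log_likelihood` interface is an assumption.

```python
import math

def select_model(features, models):
    """Return the name of the model with the highest log-likelihood
    on the observed features."""
    scores = {name: model.log_likelihood(features)
              for name, model in models.items()}
    return max(scores, key=scores.get)

class GaussianModel:
    """Toy 1-D Gaussian standing in for a real acoustic model."""
    def __init__(self, mean, var):
        self.mean, self.var = mean, var
    def log_likelihood(self, xs):
        return sum(-0.5 * (math.log(2 * math.pi * self.var)
                           + (x - self.mean) ** 2 / self.var) for x in xs)

models = {"elderly": GaussianModel(1.0, 2.0),
          "adult": GaussianModel(0.0, 1.0)}
print(select_model([0.9, 1.3, 1.1], models))   # -> "elderly"
```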

  23. Conclusion • Aspects of multi-modal dialogue • What kinds of inputs should be used? • How can speech and other inputs be combined and interact? • How will users use the system? • How should the system respond to the users?
