Speech User Interfaces

CS 160, Spring 2002. Professor James Landay. February 20, 2002.


Presentation Transcript


  1. Speech User Interfaces CS 160, Spring 2002 Professor James Landay February 20, 2002

  2. UI Hall of Fame or Shame? • Dialog box • ask if you want to delete

  3. UI Hall of Shame! • Dialog box • ask if you want to delete • Problems? • use of color problematic • Yes (green), No (red) • R-G color deficiency • cultural mismatch • Western • green good • red bad • Eastern & others differ

  4. Speech User Interfaces CS 160, Spring 2002 Professor James Landay February 20, 2002

  5. Outline • Review • Motivation for speech UIs • Speech recognition • UI problems with speech UIs • SpeechActs: Guidelines for speech UIs • Announcements • Speech UI design tools • Multimodal UIs

  6. Review • Why do we prototype? • get feedback on our design from customers – faster & cheaper • Why use low-fi prototypes? • traditional methods take too long & focus designers & customers on the wrong (visual) issues • What is the Wizard of Oz technique? • faking the interaction • What is the advantage of using informal tools like SILK, DENIM, & SUEDE? • advantages of electronic medium (editing, reuse, distribution, etc.) • faster than traditional UI tools • do not focus designers/customers on the wrong issues • ability to support testing & analysis of resulting data

  7. Motivation for Speech UIs: Pervasive Information Access • Information & Services • I-Land vision by Streitz et al.

  8. I-Land vision by Streitz et al. UIs in the Pervasive Computing Era • Future computing devices won’t have the same UI as current PCs • wide range of devices • small or embedded in environment • often w/ “alternative” I/O & w/o screens • information appliances

  9. Information Access via Speech • “Read my important email”

  10. Speech UI Motivation • Smaller devices -> difficult I/O • people can talk at ~ 90 wpm -> high speed • “Virtually unlimited” set of commands • Freedom for other body parts • imagine you are working on your car & need to know something from the manual • Natural • evolutionarily selected for • reading, writing, & typing are not (too new)

  11. Why are Speech UIs Hard to Get Right? • Speech recognition far from perfect • imagine inputting commands w/ the mouse & getting the wrong result 5-20% of the time • Speech UIs have no visible state • can’t see what you have done before or what effect your commands have had • Speech UIs are hard to learn • how do you explore the interface? how do you find out what you can say?
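To make the 5-20% figure concrete, here is a rough worked estimate. It assumes each command is misrecognized independently with the same probability p, which is a simplification introduced here and not something the slide states. The chance that a task of n commands goes through with no misrecognition is then:

```latex
P(\text{all } n \text{ commands correct}) = (1 - p)^{n}

% For a 10-command task:
p = 0.05:\quad 0.95^{10} \approx 0.60
\qquad
p = 0.20:\quad 0.80^{10} \approx 0.11
```

Under that assumption, even the optimistic end of the range leaves roughly a 40% chance of at least one wrong result in a 10-command task.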

  12. Speech UIs Require • Speech recognition • the computer understanding what the customer is saying • Speech production (or synthesis) • the computer talking to the customer

  13. Speech Recognition • Continuous vs. non-continuous • Speaker independent vs. dependent • Speech often misunderstood by people • feedback via speech, facial expressions, & gesture • Recognizers trained with real samples • often get gender-based problems • Based on probabilities (HMMs - Bayes) • trigrams of sounds or words • Several popular recognizers • Nuance, SpeechWorks, IBM ViaVoice
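The slide only names "HMMs - Bayes" and "trigrams". The textbook formulation those terms usually refer to can be written as follows; this is a standard sketch, not necessarily the exact model used by the recognizers listed. Here A is the acoustic input, W = w_1 ... w_n is a candidate word sequence, P(A | W) is the acoustic model (typically an HMM), and P(W) is the language model.

```latex
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        = \arg\max_{W} P(A \mid W)\, P(W)

% Trigram approximation of the language model:
P(W) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-2},\, w_{i-1})
```

P(A) can be dropped because it does not depend on W. The trigram term is where limited context enters: each word is predicted from only the two words before it.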

  14. Speech Production • Three frequency regions of great intensity visible on oscilloscope • come from larynx, throat, mouth • Two needed for recognition but “tinny” • Can generate emotional affect in speech • Demo • anger, disgust, gladness, sadness, fear, & surprise http://cahn.www.media.mit.edu/people/cahn/emot-speech.html

  15. Recognition Problems • Poor recognition • humans < 1% error rate on dictation • top recognition systems get 5-10% error rates • computers don’t use much context • Background noise • even worse recognition rates (20-40% error) • Slow • simple matter of hardware getting faster • in 10 years, we have gone from requiring 5 high-end workstations to some speech systems running on laptops or even PDAs
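The slide does not say how these error rates are measured. One common metric is word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words said. The sketch below assumes that metric; the function name and the sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of four spoken words.
    print(word_error_rate("read my important email", "read my important male"))
```

Because insertions count as errors, WER can exceed 1.0 when the recognizer outputs more words than were spoken.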

  16. More Recognition Problems • Isolated, short words difficult • common words become short • Segmentation • silly versus sill lea • Spelling • mail vs. male -> need to understand language

  17. Speech UI Problems • Speech UI no-nos • modes (no feedback) • certain commands only work when in specific states • deep hierarchies (aka voice mail hell) • Verbose feedback wastes time/patience • only confirm consequential things • use meaningful, short cues • Interruption • half-duplex communication (i.e., no barge-in support) • Too much speech on the part of customer is tiring • Speech takes up space in working memory • can cause problems when problem solving

  18. SpeechActs: Guidelines for Speech UIs • Speech interface to computer tools • email, calendar, weather, stock quotes • Establish common ground & shared context • make sure people know where they are in the conversation • Pacing • recognition delays are unnatural, make it clear when this occurs • barge-in lets user interrupt like in real conversations • tapering of prompts • progressive assistance: short error messages at first, longer when user needs more help • implicit confirmation: include confirm in next command
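Progressive assistance can be sketched as a counter of consecutive recognition failures that selects increasingly detailed help. This is a minimal illustration, not the SpeechActs implementation; the prompt texts and the names `PROGRESSIVE_HELP` and `PromptState` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical help messages for a "which day?" step, shortest first.
PROGRESSIVE_HELP = [
    "Sorry?",
    "Sorry, what day?",
    "Please say a day, for example 'Monday' or 'March third'.",
]


@dataclass
class PromptState:
    consecutive_errors: int = 0

    def on_recognition_failure(self) -> str:
        """Return the next help message; it grows longer with repeated errors."""
        index = min(self.consecutive_errors, len(PROGRESSIVE_HELP) - 1)
        self.consecutive_errors += 1
        return PROGRESSIVE_HELP[index]

    def on_success(self) -> None:
        """A successful recognition resets the help level to the shortest message."""
        self.consecutive_errors = 0


if __name__ == "__main__":
    state = PromptState()
    for _ in range(4):
        print(state.on_recognition_failure())
```

Resetting on success keeps the common case brief, in line with the earlier point that verbose feedback wastes time and patience.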

  19. SpeechActs Video

  20. Announcements • Task analysis / Contextual inquiry HW • average = 79/100, stdev. 8.4 • Low-fi user test due Monday • questions • If you haven’t gotten a laptop yet, check with Wai-ling after class

  21. SUEDE: Low-fi Prototyping for Speech-based UIs • Supports design practice • example scripts • Wizard of Oz • error simulation • iterative design (design-test-analysis) • Informal user interface • no speech recognition/synthesis • need not be programming expert • fast & fluid design

  22. [Figure labels: “machine prompt”, “user response”]

  23. SUEDE Summary • SUEDE supports speech-based UI design • moving from concrete examples to abstractions • allows designer to accept responses that aren’t exactly what they originally had in mind • embeds iterative design w/ design-test-analyze • Designers using SUEDE need not be experts in speech recognition technology

  24. One Vision of Future User Interfaces • Star Trek style UI • verbally ask the computer for information • may be common in mobile/hands-busy situations • problem: hard to design, build, & use! • requires perfect speech recognition & language understanding

  25. Our Vision of Future User Interfaces • Multimodal, Context-aware UIs • multimodal • uses multiple input modalities (speech & gesture) to disambiguate • user says “move it to this screen” while pointing • context-aware • apps can be aware of location, user, what they are doing, … • people are talking -> don’t rely on speech I/O • Problem: how to prototype & test new ideas? • Informal UI Design Tools! • combine Wizard of Oz & informal storyboarding
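The “move it to this screen” example can be sketched as a small fusion step: a spoken phrase like “this screen” is resolved using the pointing gesture closest in time. This is a toy illustration under stated assumptions; the data types, the 1.5-second window, and the target names are all hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpokenCommand:
    text: str
    time_s: float  # when the utterance was recognized


@dataclass
class PointingGesture:
    target: str    # what the user pointed at
    time_s: float  # when the gesture was detected


# Assumed values for this sketch only.
DEICTIC_PHRASES = ("this screen", "that screen")
MAX_GAP_S = 1.5


def resolve_command(command: SpokenCommand,
                    gestures: List[PointingGesture]) -> Optional[str]:
    """Replace a deictic phrase with the target of the nearest-in-time gesture.

    Returns the text unchanged if it has no deictic phrase, and None if it
    has one but no gesture is close enough in time to resolve it.
    """
    phrase = next((p for p in DEICTIC_PHRASES if p in command.text), None)
    if phrase is None:
        return command.text
    nearby = [g for g in gestures
              if abs(g.time_s - command.time_s) <= MAX_GAP_S]
    if not nearby:
        return None  # still ambiguous; the UI should ask the user
    closest = min(nearby, key=lambda g: abs(g.time_s - command.time_s))
    return command.text.replace(phrase, closest.target)


if __name__ == "__main__":
    spoken = SpokenCommand("move it to this screen", time_s=10.0)
    pointing = [PointingGesture("the laptop", 3.0),
                PointingGesture("the wall display", 10.4)]
    print(resolve_command(spoken, pointing))
```

Returning None for an unresolved phrase leaves room for the interface to ask a follow-up question and avoids guessing.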

  26. Multimodal Error Correction • Dictation error correction study • found users are better at correcting recognition errors with a different input modality • recognizer got it wrong the first time -> it will get it wrong the second time • hyperarticulating aggravates the problem • Correct dictation errors with • vocal spelling, writing, typing, etc.

  27. Summary • Speech UIs • may permit more natural computer access • allow us to use computers in more situations • are hard to get to work well • lack of visible state, tax working memory, recognition problems, etc. • UI tools are needed for speech UI design • Multimodal UIs address some of the problems with pure speech UIs • help disambiguate • help w/ correction

  28. Next Time • Web Design • Reading • The Limits of Speech Recognition by Shneiderman • Optional: Designing SpeechActs: Issues in Speech User Interfaces by Nicole Yankelovich, Gina-Anne Levow, & Matt Marx
