Tutorial

Developing and Deploying Multimodal Applications

James A. Larson, Larson Technical Services, jim@larson-tech.com

SpeechTEK West, February 23, 2007


Developing and Deploying Multimodal Applications

  • What applications should be multimodal?

  • What is the multimodal application development process?

  • What standard languages can be used to develop multimodal applications?

  • What standard platforms are available for multimodal applications?

Capturing Input from the User

  Medium     Input Device            Mode
  Acoustic   Microphone              Speech
  Tactile    Keypad, Keyboard        Key
  Tactile    Pen                     Ink
  Tactile    Mouse, Joystick         GUI
  Visual     Scanner, Still camera   Photograph
  Visual     Video camera            Movie

Capturing Input From the User (Multimodal)

  Medium       Input Device                                Mode
  Acoustic     Microphone                                  Speech
  Tactile      Keypad, Keyboard                            Key
  Tactile      Pen                                         Ink
  Tactile      Mouse, Joystick                             GUI
  Visual       Scanner, Still camera                       Photograph
  Visual       Video camera, Gaze tracking, Gesture reco   Movie
  Electronic   RFID, Biometric, GPS                        Digital data

Presenting Output to the User

  Medium     Output Device   Mode
  Acoustic   Speaker         Speech
  Visual     Display         Text, Photograph, Movie
  Tactile    Joystick        Pressure

Presenting Output to the User (Multimedia)

  Medium     Output Device   Mode
  Acoustic   Speaker         Speech
  Visual     Display         Text, Photograph, Movie
  Tactile    Joystick        Pressure

Multimodal and Multimedia Application Benefits

  • Provide a natural user interface by using multiple channels for user interactions

  • Simplify interaction with small devices with limited keyboard and display, especially on portable devices

  • Leverage advantages of different modes in different contexts

  • Decrease error rates and time required to perform tasks

  • Increase accessibility of applications for special users

  • Enable new kinds of applications

Exercise 1

  • What new multimodal applications would be useful for your work?

  • What new multimodal applications would be entertaining to you, your family, or friends?

Voice as a “Third Hand”

  • Game Commander 3

    • http://www.gamecommander.com/

Voice-Enabled Games

  • ScanSoft’s VoCon Games Speech SDK

    • http://www.scansoft.com/games/

    • PlayStation® 2

    • Nintendo® GameCube™

    • http://www.omnipage.com/games/poweredby/

Education

Tucker Maxon School of Oral Education

http://www.tmos.org/

Education

Reading Tutor Project

http://cslr.colorado.edu/beginweb/reading/reading.html

Multimodal Applications Developed by PSU and OHSU Students

  • Hands-busy
    • Troubleshooting a car’s motor
    • Repairing a leaky faucet
    • Tuning musical instruments
    • Construction
    • Complex origami artifacts
    • Project book for children
    • Cooking—Talking recipe book
  • Entertainment
    • Child’s fairy tale book
    • Audio-controlled juke box
    • Games (Battleship, Go)

Multimodal Applications Developed by PSU and OHSU Students (continued)

  • Data collection
    • Buy a car
    • Collect health data
    • Buy movie tickets
    • Order meals from a restaurant
    • Conduct banking business
    • Locate a business
    • Order a computer
    • Choose homeless pets from an animal shelter
  • Authoring
    • Photo album tour
  • Education
    • Flash cards—Addition tables

Download Opera and the speech plug-in. Go to www.larson-tech.com/mm-Projects/Demos.htm

New Application Classes

  • Active listening
    • Verbal VCR controls: start, stop, fast forward, rewind, etc.
  • Virtual assistants
    • Listen for requests and immediately perform them
    • Violin tuner, TV controller, environmental controller, family-activity coordinator
  • Synthetic experiences
    • Synthetic interviews, speech-enabled games, education and training
  • Authoring content

Two General Uses of Multiple Modes of Input

  • Redundancy—One mode acts as backup for another mode

  • In noisy environments, use keypad instead of speech input.

  • In cold environments, use speech instead of keypad.

  • Complementary—One mode supplements another mode

  • Voice as a third hand

  • “Move that (point) to there (point)” (late fusion)

  • Lip reading = video + speech (early fusion)

Potential Problems with Multimodal Applications

  • Voice may make an application “noisy.”
    • Privacy and security concerns
    • Noise pollution
  • Sometimes speech and handwriting recognition systems fail.
  • False expectations of users wanting to use natural language (possible only on Star Trek).
  • Full “natural language” processing requires:
    • Knowledge of the outside world
    • History of the user-computer interaction
    • Sophisticated understanding of language structure
  • “Natural language-like” processing (incorrectly called “NLP”) simulates natural language for a small domain, short history, and specialized language structures.

Adding a New Mode to an Application

  • Only if…
    • The new mode enables new features not previously possible.
    • The new mode dramatically improves usability.
  • Always…
    • Redesign the application to take advantage of the new mode.
    • Provide backup for the new mode.
    • Test, test, and test some more.

Exercise 2

  • Where will multimodal applications be used?

  • A. At home

  • B. At work

  • C. “On the road”

  • D. Other?

Developing and Deploying Multimodal Applications

  • What applications should be multimodal?

  • What is the multimodal application development process?

  • What standard languages can be used to develop multimodal applications?

  • What standard platforms are available for multimodal applications?

The Playbill—Who’s Who on the Team

  • Users—Their lives will be improved by using the multimodal application

  • Interaction designer—Designs the dialog—when and how the user and system interchange requests and information

  • Multimodal programmer—Implements VUI 

  • Voice talent—Records spoken prompts and messages

  • Grammar writer—Specifies words and phrases the user may speak in response to a prompt

  • TTS specialist—Specifies verbal and audio sounds and inflections

  • Quality assurance specialist—Performs tests to validate the application is both useful and usable

  • Customer—Pays the bills

  • Program manager—Organizes the work and makes sure it is completed according to schedule and under budget

Development Process

  • Investigation Stage

  • Design Stage

  • Development Stage

  • Testing Stage

  • Sustaining Stage

Each stage involves users

Iterative refinement

Development Process (continued)

  • Investigation Stage

  • Design Stage

  • Development Stage

  • Testing Stage

  • Sustaining Stage

  • Identify the Application

  • Conduct ethnography studies

  • Identify candidate applications

  • Conduct focus groups

  • Select the application

Exercise 3

  • What will be the “killer” consumer multimodal applications?

Development Process (continued)

  • Investigation Stage

  • Design Stage

  • Development Stage

  • Testing Stage

  • Sustaining Stage

  • Specify the Application

  • Construct the conceptual model

  • Construct scenarios

  • Specify performance and preference requirements

Specify Performance and Preference Requirements

Performance: Is the application useful? Measure what the users actually accomplished. Validate that the users achieved success.

Preference: Is the application enjoyable? Measure users’ likes and dislikes. Validate that the users enjoyed the application and will use it again.

Performance Metrics

  User Task                               Measure                                                Typical Criteria
  Speak a command                         Word error rate                                        Less than 3%
  The caller supplies values into a form  Enters valid values into each field of a form          Less than 5 seconds per value
  Navigate a list                         The user successfully selects the specified option     Greater than 95%
  Purchase a product                      The user successfully completes the purchase option    Greater than 93%

Exercise 4

Specify performance metrics for the multimodal email application.

Preference Metrics

  Question                                                                Typical Criteria
  On a scale from 1 to 10, rate the help facility.                        The average caller score is greater than 8.
  On a scale from 1 to 10, rate the ease of use of this application.      The average caller score is greater than 8.
  Would you recommend using this voice portal to a friend?                Over 80% of callers respond by saying “yes.”
  What would you be willing to pay each time you use this application?    Over 80% of callers indicate that they are willing to pay $1.00 or more per use.

Exercise 5

Specify preference metrics for the multimodal email application.

Preference Metrics (Open-ended Questions)

  • What did you like the best about this voice-enabled application? (Do not change these features.)

  • What did you like the least about this voice-enabled application? (Consider changing these features.)

  • What new features would you like to have added? (Consider adding these features in this or a later release.)

  • What features do you think you will never use? (Consider deleting these features.)

  • Do you have any other comments and suggestions? (Pay attention to these responses. Callers frequently suggest very useful ideas.)

Development Process (continued)

  • Investigation Stage

  • Design Stage

  • Development Stage

  • Testing Stage

  • Sustaining Stage

  • Develop the Application

  • Specify the persona

  • Specify the modes and modalities

  • Specify the dialog script

UI Design Guidelines

  • Guidelines for Voice User Interfaces
    • Bruce Balentine and David P. Morgan. How to Build a Speech Recognition Application, Second Edition. http://www.eiginc.com
  • Guidelines for Graphical User Interfaces
    • Research-Based Web Design and Usability Guidelines. U.S. Department of Health and Human Services. http://www.usability.gov/pdfs/guidelines.html
  • Guidelines for Multimodal User Interfaces
    • Common Sense Guidelines for Developing Multimodal User Interfaces. W3C Working Group Note, 19 April 2006. http://www.w3.org/2002/mmi/Group/2006/Guidelines/

Common-sense Suggestions
1. Satisfy Real-World Constraints

  • Task-oriented Guidelines

  • 1.1. Guideline: For each task, use the easiest mode available on the device.

  • Physical Guidelines

  • 1.2. Guideline: If the user’s hands are busy, then use speech.

  • 1.3. Guideline: If the user’s eyes are busy, then use speech.

  • 1.4. Guideline: If the user may be walking, use speech for input.

  • Environmental Guidelines

  • 1.5. Guideline: If the user may be in a noisy environment, then use a pen, keys or mouse.

  • 1.6. Guideline: If the user’s manual dexterity may be impaired, then use speech.

Exercise 6

  • What input mode(s) should be used for each of the following tasks?

  • A. Selecting objects

  • B. Entering text

  • C. Entering symbols

  • D. Entering sketches or illustrations

Common-sense Suggestions
2. Communicate Clearly, Concisely, and Consistently with Users

  • Consistency Guidelines

  • 2.1. Phrase all prompts consistently.

  • 2.2. Enable the user to speak keyword utterances rather than natural language sentences.

  • 2.3. Switch presentation modes only when the information is not easily presented in the current mode.

  • 2.4. Make commands consistent.

  • 2.5. Make the focus consistent across modes.

  • Organizational Guidelines

  • 2.6. Use audio to indicate the verbal structure.

  • 2.7. Use pauses to divide information into natural “chunks.”

  • 2.8. Use animation and sound to show transitions.

  • 2.9. Use voice navigation to reduce the number of screens.

  • 2.10. Synchronize multiple modalities appropriately.

  • 2.11. Keep the user interface as simple as possible.

Common-sense Suggestions
3. Help Users Recover Quickly and Efficiently from Errors

  • Conversational Guidelines

  • 3.1. Users tend to use the same mode that was used to prompt them.

  • 3.2. If privacy is not a concern, use speech as output to provide commentary or help.

  • 3.3. Use directed user interfaces, unless the user is always knowledgeable and experienced in the domain.

  • 3.4 Always provide context-sensitive help for every field and command.

Common-sense Suggestions
3. Help Users Recover Quickly and Efficiently from Errors (continued)

  • Reliability Guidelines

  • Operational status

  • 3.5. The user always should be able to determine easily if the device is listening to the user.

  • 3.6. For devices with batteries, users always should be able to determine easily how much longer the device will be operational.

  • 3.7. Support at least two input modes so one input mode can be used when the other cannot.

  • Visual feedback

  • 3.8. Present words recognized by the speech recognition system on the display, so the user can verify they are correct.

  • 3.9. Display the n-best list to enable easy speech recognition error correction

  • 3.10. Try to keep response times less than 5 seconds. Inform the user of longer response times.

Common-sense Suggestions
4. Make Users Comfortable

  • Listening mode

  • 4.1. Speak after pressing a speak key, which automatically releases after the user finishes speaking.

  • System Status

  • 4.2. Always present the current system status to the user.

  • Human-memory Constraints

  • 4.3. Use the screen to ease stress on the user’s short-term memory.

Common-sense Suggestions
4. Make Users Comfortable (continued)

  • Social Guidelines

  • 4.4. If the user may need privacy, use a display rather than render speech.

  • 4.5. If the user may need privacy, use a pen or keys.

  • 4.6. If the device may be used during a business meeting, then use a pen or keys (with the keyboard sounds turned off).

  • Advertising Guidelines

  • 4.7. Use animation and sound to attract the user’s attention.

  • 4.8. Use landmarks to help the user know where he or she is.

Common-sense Suggestions
4. Make Users Comfortable (continued)

  • Ambience

  • 4.9 Use audio and graphic design to set the mood and convey emotion in games and entertainment applications.

  • Accessibility

  • 4.10 For each traditional output technique, provide an alternative output technique.

  • 4.11. Enable users to adjust the output presentation.

Books

  • Ramon Lopez-Cozar Delgado and Masahiro Araki. Spoken, Multilingual and Multimodal Dialog Systems—Development and Assessment. West Sussex, England: Wiley, 2005.

  • Julie A. Jacko and Andrew Sears (Editors) The Human-Computer Interaction Handbook—Fundamentals, Evolving technologies, and Emerging Applications. Mahwah, New Jersey: Lawrence Erlbaum Associates, 2003.

Development Process (continued)

  • Investigation Stage

  • Design Stage

  • Development Stage

  • Testing Stage

  • Sustaining Stage

  • Test The Application

  • Component test

  • Usability test

  • Stress test

  • Field test

Testing Resources

  • Jeffrey Rubin. Handbook of Usability Testing. New York: Wiley Technical Communication Library, 1994.

  • Peter and David Leppik. Gourmet Customer Service. Eden Prairie, MN: VocalLabs, 2005. sales@vocalabs.com

Development Process (continued)

  • Investigation Stage

  • Design Stage

  • Development Stage

  • Testing Stage

  • Sustaining Stage

  • Deploy and Monitor the Application

  • User Survey

  • Usage reports from log files

  • User feedback and comments

Developing and Deploying Multimodal Applications

  • What applications should be multimodal?

  • What is the multimodal application development process?

  • What standard languages can be used to develop multimodal applications?

  • What standard platforms are available for multimodal applications?

W3C Multimodal Interaction Framework

  • Recognition Grammar

  • Semantic Interpretation

  • Extensible MultiModal Annotation (EMMA)

  • Speech Synthesis

  • Interaction Managers

General description of speech application components and how they relate

W3C Multimodal Interaction Framework (continued)

[Diagram: the user exchanges input and output with the Interaction Manager, which connects to Application Functions and Telephony Properties.]

W3C Multimodal Interaction Framework (continued)

[Diagram: user input flows through ASR and Ink recognition into Semantic Interpretation and Information Integration, then to the Interaction Manager; output flows back through Language Generation and Media Planning to TTS, Display, and Audio; Telephony Functions and Application Functions also attach to the Interaction Manager.]

W3C Multimodal Interaction Framework (continued)

SRGS: Describes what the user may say at each point in the dialog

[Framework diagram repeated.]

Speech Recognition Engines

  Feature                          Low-end               High-end              Other
  Speaking mode                    Isolated (discrete)   Continuous            Keywords
  Enrollment                       Speaker dependent     Speaker independent   Adaptive
  Vocabulary size                  Small                 Large                 Switch vocabularies
  Speaking style                   Read                  Spontaneous
  Number of simultaneous callers   Single-threaded       Multi-threaded

Grammars

  • Describe what the user may say or handwrite at a point in the dialog

  • Enable the recognition engine to work faster and more accurately

  • Two types of grammars:

    • Structured Grammar

    • Statistical Grammar (N-grams)

Structured Grammars

  • Specifies words that a user may speak or write
  • Two representation formats:

    1. Augmented Backus-Naur Form (ABNF) production rules:

       Single_digit ::= zero | one | two | … | nine
       Zero_thru_ten ::= Single_digit | ten

       (a fuller SRGS ABNF sketch follows below)

    2. XML format, which can be processed by an XML validator
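A minimal sketch of the same grammar in the SRGS ABNF syntax; the #ABNF header and the language and root declarations come from the SRGS specification, and the $-prefixed rule names mirror the rule ids of the XML form on the next slide:

  #ABNF 1.0;
  language en-US;
  root $zero_thru_ten;

  $single_digit = zero | one | two | three | four | five | six | seven | eight | nine;
  $zero_thru_ten = $single_digit | ten;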

Example XML Grammar

  <grammar mode="voice" type="application/srgs+xml" root="zero_to_ten">

    <rule id="zero_to_ten">
      <one-of>
        <ruleref uri="#single_digit"/>
        <item> ten </item>
      </one-of>
    </rule>

    <rule id="single_digit">
      <one-of>
        <item> zero </item>
        <item> one </item>
        <item> two </item>
        <item> three </item>
        <item> four </item>
        <item> five </item>
        <item> six </item>
        <item> seven </item>
        <item> eight </item>
        <item> nine </item>
      </one-of>
    </rule>

  </grammar>

Exercise 7

  • Write a grammar that recognizes the digits zero through nineteen

  • (Hint: Modify the previous page)

Reusing Existing Grammars

  <grammar type="application/srgs+xml"
           root="size"
           src="http://www.example.com/size.grxml"/>
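An individual rule inside another grammar can also be referenced with <ruleref>; a minimal sketch, assuming size.grxml defines a rule named "size":

  <rule id="order">
    <item> a </item>
    <ruleref uri="http://www.example.com/size.grxml#size"/>
    <item> shirt </item>
  </rule>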

Exercise 8

  • Write a grammar for positive responses to a yes/no question (i.e., “yes,” “sure,” “affirmative,” and so forth)

When Is a Grammar Too Large?

[Chart: the trade-off between word coverage and response as a grammar grows.]

W3C Multimodal Interaction Framework (continued)

SISR: A procedural, JavaScript-like language for interpreting the text strings returned by the speech recognition engine

[Framework diagram repeated.]

Semantic Interpretation

  • Semantic interpretation scripts employ ECMAScript
  • Advantages (sketched below):
    • Translate aliases to vocabulary words
    • Perform calculations
    • Produce a rich structure rather than a text string
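A minimal sketch of the first two advantages, using SISR tags inside an SRGS rule; the quantity phrasings are illustrative:

  <rule id="quantity">
    <one-of>
      <item> twelve <tag> out.count = 12; </tag> </item>
      <item> a dozen <tag> out.count = 12; </tag> </item>
      <item> two dozen <tag> out.count = 2 * 12; </tag> </item>
    </one-of>
  </rule>

Here “a dozen” is an alias translated to the same value as “twelve,” and “two dozen” is computed rather than listed.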

Semantic Interpretation

[Diagram: the user says “Big white t-shirt”; the recognizer, constrained by a grammar, passes the text to semantic interpretation, which sends the normalized “Large white t-shirt” to the conversation manager.]

Semantic Interpretation (continued)

[Diagram: the recognizer hears “Big white t-shirt”; a grammar with semantic interpretation scripts drives the semantic interpretation processor, which delivers { size: large, color: white } to the conversation manager.]

  <rule id="action">
    <one-of>
      <item> small <tag> out.size = "small"; </tag> </item>
      <item> medium <tag> out.size = "medium"; </tag> </item>
      <item> large <tag> out.size = "large"; </tag> </item>
      <item> big <tag> out.size = "large"; </tag> </item>
    </one-of>
    <one-of>
      <item> green <tag> out.color = "green"; </tag> </item>
      <item> blue <tag> out.color = "blue"; </tag> </item>
      <item> white <tag> out.color = "white"; </tag> </item>
    </one-of>
  </rule>

Exercise 9
Modify this rule to return only “yes”

  <grammar type="application/srgs+xml" root="yes" mode="voice">
    <rule id="yes">
      <one-of>
        <item> yes </item>
        <item> sure </item>
        <item> affirmative </item>
        …
      </one-of>
    </rule>
  </grammar>

W3C Multimodal Interaction Framework (continued)

EMMA: A language for representing the semantic content from speech recognizers, handwriting recognizers, and other input devices

[Framework diagram repeated.]

EMMA

  • Extensible MultiModal Annotation markup language

  • A canonical structure for semantic interpretations of a variety of inputs, including (a sketch follows this list):

    • Speech

    • Natural language text

    • GUI

    • Ink
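A minimal sketch of a complete EMMA document wrapping one interpretation; the emma namespace URI is the one defined by the specification, while the travel payload is application-defined:

  <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
    <emma:interpretation id="int1" emma:mode="voice">
      <travel>
        <to> Las Vegas </to>
        <from> Portland </from>
      </travel>
    </emma:interpretation>
  </emma:emma>

(The slides that follow show bare <interpretation> elements for brevity.)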

EMMA (continued)

[Diagram: speech passes through speech recognition, driven by a grammar plus semantic interpretation instructions, to produce an EMMA document; keyboard input passes through keyboard interpretation, driven by its own interpretation instructions, to produce another EMMA document; merging/unification combines the two into a single EMMA document for applications.]



EMMA (continued)

Speech interpretation:

  <interpretation mode="speech">
    <travel>
      <to hook="ink"/>
      <from hook="ink"/>
      <day> Tuesday </day>
    </travel>
  </interpretation>

Ink interpretation:

  <interpretation mode="ink">
    <travel>
      <to> Las Vegas </to>
      <from> Portland </from>
    </travel>
  </interpretation>

Unified interpretation delivered to applications:

  <interpretation mode="intp1">
    <travel>
      <to> Las Vegas </to>
      <from> Portland </from>
      <day> Tuesday </day>
    </travel>
  </interpretation>

Exercise 10

Given the following two EMMA specifications, what is the unified EMMA specification?

  <interpretation mode="speech">
    <moneyTransfer>
      <sourceAcct hook="ink"/>
      <targetAcct hook="ink"/>
      <amount> 300 </amount>
    </moneyTransfer>
  </interpretation>

  <interpretation mode="ink">
    <moneyTransfer>
      <sourceAcct> savings </sourceAcct>
      <targetAcct> checking </targetAcct>
    </moneyTransfer>
  </interpretation>

Unified EMMA specification:

  <interpretation mode="intp1">
    <moneyTransfer>
      <sourceAcct> ______ </sourceAcct>
      <targetAcct> ______ </targetAcct>
      <amount> ______ </amount>
    </moneyTransfer>
  </interpretation>

W3C Multimodal Interaction Framework (continued)

SSML: A language for rendering text as synthesized speech

[Framework diagram repeated.]

Speech Synthesis Markup Language

[Pipeline: structure analysis → text normalization → text-to-phoneme conversion → prosody analysis → waveform production]

  • Structure analysis. Markup support: paragraph, sentence. Non-markup behavior: infer structure by automated text analysis.
  • Text normalization. Markup support: say-as for dates, times, etc. Non-markup behavior: automatically identify and convert constructs.
  • Text-to-phoneme conversion. Markup support: phoneme, say-as. Non-markup behavior: look up in a pronunciation dictionary.
  • Prosody analysis. Markup support: emphasis, break, prosody. Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax.
  • Waveform production.

Speech Synthesis Markup Language
Examples

  <phoneme alphabet="ipa" ph="wɪnɛfɛks"> WinFX </phoneme> is a great platform

  <prosody pitch="x-low"> Who’s been sleeping in my bed? </prosody> said papa bear.
  <prosody pitch="medium"> Who’s been sleeping in my bed? </prosody> said momma bear.
  <prosody pitch="x-high"> Who’s been sleeping in my bed? </prosody> said baby bear.
Developing & Delivering Multimodal Applications


Popular strategy
Popular Strategy (continued)

  • Develop dialogs using SSML

  • Usability test dialogs

  • Extract prompts

  • Hire voice talent to record prompts

  • Replace <prompt> with <audio>
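A minimal VoiceXML sketch of that last step, assuming a recorded file named welcome.wav; the inline text remains as a TTS fallback if the audio cannot be fetched:

  Before:
    <prompt> Welcome to the store. </prompt>

  After:
    <prompt>
      <audio src="welcome.wav"> Welcome to the store. </audio>
    </prompt>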

W3C Multimodal Interaction Framework (continued)

VoiceXML: A language for controlling the exchange of information and commands between the user and the system
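A minimal VoiceXML 2.0 form, to make that control flow concrete; city.grxml is the same illustrative grammar file used elsewhere in this tutorial:

  <?xml version="1.0" encoding="UTF-8"?>
  <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
    <form id="askCity">
      <field name="city">
        <prompt> Say a city name. </prompt>
        <grammar src="city.grxml" type="application/srgs+xml"/>
        <filled>
          <prompt> You said <value expr="city"/>. </prompt>
        </filled>
      </field>
    </form>
  </vxml>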

[Framework diagram repeated.]

Developing and Deploying Multimodal Applications

  • What applications should be multimodal?

  • What is the multimodal application development process?

  • What standard languages can be used to develop multimodal applications?

  • What standard platforms are available for multimodal applications?

Speech APIs and SDKs

  • JSAPI—Java Speech Application Program Interface

    • http://java.sun.com/products/java-media/speech/

    • http://developer.mozilla.org/en/docs/JSAPI_Reference

  • Nuance Mobile Speech Platform

    • http://www.nuance.com/speechplatform/components.asp

  • VSAPI—Voice Signal API

    • http://www.voicesignal.com/news/articles/2006-06-21-SymbianOne.htm

  • SALT

    • http://www.saltforum.org/

Interaction Manager Approaches

  Approach          Interaction Manager   Modality components
  Object-oriented   C#                    SAPI 5.3
  X+V               XHTML                 VoiceXML 2.0 modules, XHTML
  W3C               SCXML                 XHTML, VoiceXML 3.0, InkML

Interaction Manager Approaches (continued)

[Diagram repeated, highlighting the object-oriented approach: Interaction Manager (C#) over SAPI 5.3.]

SAPI 5.3 & Windows Vista™
Speech Synthesis

  • W3C Speech Synthesis Markup Language 1.0

    <speak>
      <phoneme alphabet="ipa" ph="wɪnɛfɛks"> WinFX </phoneme>
      is a great platform
    </speak>

  • Microsoft proprietary PromptBuilder

    myPrompt.AppendTextWithPronunciation("WinFX", "wɪnɛfɛks");
    myPrompt.AppendText("is a great platform.");

SAPI 5.3 & Windows Vista™
Speech Recognition

  • W3C Speech Recognition Grammar Specification 1.0

    <grammar type="application/srgs+xml" root="city" mode="voice">
      <rule id="city">
        <one-of>
          <item> New York City </item>
          <item> New York </item>
          <item> Boston </item>
        </one-of>
      </rule>
    </grammar>

  • Microsoft proprietary GrammarBuilder

    Choices cityChoices = new Choices();
    cityChoices.AddPhrase("New York City");
    cityChoices.AddPhrase("New York");
    cityChoices.AddPhrase("Boston");
    Grammar cityGrammar = new Grammar(new GrammarBuilder(cityChoices));

SAPI 5.3 & Windows Vista™
Semantic Interpretation

  • Augment SRGS grammar with JScript® for semantic interpretation

    <grammar type="application/srgs+xml" root="city" mode="voice">
      <rule id="city">
        <one-of>
          <item> New York City <tag> city="JFK" </tag> </item>
          <item> New York <tag> city="JFK" </tag> </item>
          <item> Portland <tag> city="PDX" </tag> </item>
        </one-of>
      </rule>
    </grammar>

  • User-specified “shortcuts”: the recognizer replaces a shortcut word by an expanded string
    • User says: my address
    • System: 1033 Smith Street, Apt. 7C, Bloggsville 00000

SAPI 5.3 & Windows Vista™
Dialog

  • Import the System.Speech.Recognition namespace

  • Instantiate a SpeechRecognizer object

  • Build a grammar

  • Attach an event handler

  • Load the grammar into the recognizer

  • When the recognizer hears something that fits the grammar, the SpeechRecognized event handler is invoked, which accesses the Result object and works with the recognized text

SAPI 5.3 & Windows Vista™
Dialog (continued)

  using System;
  using System.Windows.Forms;
  using System.ComponentModel;
  using System.Collections.Generic;
  using System.Speech.Recognition;

  namespace Reco_Sample_1
  {
      public partial class Form1 : Form
      {
          // Create a recognizer
          SpeechRecognizer _recognizer = new SpeechRecognizer();

          public Form1() { InitializeComponent(); }

          private void Form1_Load(object sender, EventArgs e)
          {
              // Create a pizza grammar
              Choices pizzaChoices = new Choices();
              pizzaChoices.AddPhrase("I'd like a cheese pizza");
              pizzaChoices.AddPhrase("I'd like a pepperoni pizza");
              pizzaChoices.AddPhrase("I'd like a large pepperoni pizza");
              pizzaChoices.AddPhrase("I'd like a small thin crust vegetarian pizza");

              Grammar pizzaGrammar = new Grammar(new GrammarBuilder(pizzaChoices));

              // Attach an event handler
              pizzaGrammar.SpeechRecognized +=
                  new EventHandler<RecognitionEventArgs>(PizzaGrammar_SpeechRecognized);

              _recognizer.LoadGrammar(pizzaGrammar);
          }

          void PizzaGrammar_SpeechRecognized(object sender, RecognitionEventArgs e)
          {
              MessageBox.Show(e.Result.Text);
          }
      }
  }

SAPI 5.3 & Windows Vista™
References

  • Speech API Overview
    • http://msdn2.microsoft.com/en-us/library/ms720151.aspx#API_Speech_Recognition
  • Microsoft Speech API (SAPI) 5.3
    • http://msdn2.microsoft.com/en-us/library/ms723627.aspx
  • “Exploring New Speech Recognition And Synthesis APIs In Windows Vista” by Robert Brown
    • http://msdn.microsoft.com/msdnmag/issues/06/01/speechinWindowsVista/default.aspx#Resources

Interaction Manager Approaches (continued)

[Diagram repeated, highlighting the X+V approach: Interaction Manager (XHTML) over the VoiceXML 2.0 modules.]

Step 1: Start with Standard VoiceXML and Standard XHTML

  • VoiceXML

    <form id="topform">
      <field name="city">
        <prompt>Say a name</prompt>
        <grammar src="city.grxml"/>   <!-- W3C grammar language -->
      </field>
    </form>

  • XHTML

    <form>
      Result: <input type="text" name="in1"/>
    </form>

Step 2: Combine

  <html xmlns="http://www.w3.org/1999/xhtml">

  <head>
    <form id="topform">
      <field name="city">
        <prompt>Say a name</prompt>
        <grammar src="city.grxml"/>
      </field>
    </form>
  </head>

  <body>
    <form>
      Result: <input type="text" name="in1"/>
    </form>
  </body>

  </html>

Step 3: Insert vxml Namespace

  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:vxml="http://www.w3.org/2001/vxml">

  <head>
    <vxml:form id="topform">
      <vxml:field name="city">
        <vxml:prompt>Say a name</vxml:prompt>
        <vxml:grammar src="city.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>

  <body>
    <form>
      Result: <input type="text" name="in1"/>
    </form>
  </body>

  </html>

Step 4: Insert event

  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:vxml="http://www.w3.org/2001/vxml"
        xmlns:ev="http://www.w3.org/2001/xml-events">

  <head>
    <vxml:form id="topform">
      <vxml:field name="city">
        <vxml:prompt>Say a name</vxml:prompt>
        <vxml:grammar src="city.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>

  <body>
    <form ev:event="load" ev:handler="#topform">
      Result: <input type="text" name="in1"/>
    </form>
  </body>

  </html>

Step 5: Insert <sync>

  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:vxml="http://www.w3.org/2001/vxml"
        xmlns:ev="http://www.w3.org/2001/xml-events"
        xmlns:xv="http://www.w3.org/2002/xhtml+voice">

  <head>
    <xv:sync xv:input="in1" xv:field="#result"/>
    <vxml:form id="topform">
      <vxml:field name="city" xv:id="result">
        <vxml:prompt>Say a name</vxml:prompt>
        <vxml:grammar src="city.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>

  <body>
    <form ev:event="load" ev:handler="#topform">
      Result: <input type="text" name="in1"/>
    </form>
  </body>

  </html>

XHTML plus Voice (X+V) References

  • Available on
    • ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003
      http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
    • Opera Software Multimodal Browser for Sharp Zaurus
      http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
    • Opera 9 for Windows
      http://www.opera.com/
  • Programmers Guide
    • ftp://ftp.software.ibm.com/software/pervasive/info/multimodal/XHTML_voice_programmers_guide.pdf
  • For a variety of small illustrative applications
    • http://www.larson-tech.com/MM-Projects/Demos.htm

Exercise 11

  • Specify the X+V notation for integrating the following VoiceXML and XHTML code by completing the code on the next page

  • VoiceXML

    <form id="stateForm">
      <field name="state">
        <prompt>Say a state name</prompt>
        <grammar src="state.grxml"/>
      </field>
    </form>

  • XHTML

    <form>
      Result: <input type="text" name="in1"/>
    </form>

Exercise 11 (continued)

  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:vxml="http://www.w3.org/2001/vxml"
        xmlns:ev="http://www.w3.org/2001/xml-events"
        xmlns:xv="http://www.w3.org/2002/xhtml+voice">

  <head>
    <xv:sync xv:input="________" xv:field="________"/>
    <vxml:form id="________">
      <vxml:field name="state" xv:id="________">
        <vxml:prompt>Say a state name</vxml:prompt>
        <vxml:grammar src="state.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>

  <body>
    <form ev:event="load" ev:handler="#________">
      Result: <input type="text" name="________"/>
    </form>
  </body>

  </html>

Interaction Manager Approaches (continued)

[Diagram repeated, highlighting the W3C approach: Interaction Manager (SCXML) over the XHTML, VoiceXML 3.0, and InkML modality components.]

MMI Architecture—4 Basic Components

  • Runtime Framework or Browser—initializes application and interprets the markup
  • Interaction Manager—coordinates modality components and provides application flow
  • Modality Components—provide modality capabilities such as speech, pen, keyboard, mouse
  • Data Model—handles shared data

[Diagram: Interaction Manager (SCXML) and Data Model above the XHTML, VoiceXML 3.0, and InkML modality components.]

Multimodal Architecture and Interfaces

  • A loosely-coupled, event-based architecture for integrating multiple modalities into applications
  • All communication is event-based
  • Based on a set of standard life-cycle events
  • Components can also expose other events as required
  • Encapsulation protects component data
  • Encapsulation enhances extensibility to new modalities
  • Can be used outside a Web environment

Specify Interaction Manager Using Harel State Charts

  • Extension of state transition systems
    • States
    • Transitions
    • Nested state-transition systems
    • Parallel state-transition systems
    • History

[State chart: PrepareState moves to StartState on prepareResponse (success) or to FailState on prepareResponse (fail); StartState moves to WaitState on startResponse (success) or to FailState on startResponse (fail); WaitState moves to EndState on done (success) or to FailState on done (fail).]

State Chart XML (SCXML)

  <state id="PrepareState">
    <send event="prepare" contentURL="hello.vxml"/>
    <transition event="prepareResponse" cond="status='success'" target="StartState"/>
    <transition event="prepareResponse" cond="status='failure'" target="FailState"/>
  </state>
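The next state in the chart can be sketched the same way; the event and attribute names follow this deck’s example rather than the final SCXML specification:

  <state id="StartState">
    <send event="start"/>
    <transition event="startResponse" cond="status='success'" target="WaitState"/>
    <transition event="startResponse" cond="status='failure'" target="FailState"/>
  </state>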

Example State Transition System

[State chart as above: PrepareState → StartState → WaitState → EndState, with failed prepare, start, or done responses leading to FailState.]

Example State Chart with Parallel States

[State chart: two parallel regions, Voice and GUI. Each region runs its own prepare → start → wait → end sequence (PrepareVoice/PrepareGUI, StartVoice/StartGUI, WaitVoice/WaitGUI, EndVoice/EndGUI), with failed prepare, start, or done responses leading to its fail state.]
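SCXML expresses charts like this with its <parallel> element; a skeletal sketch with illustrative state ids:

  <parallel id="run">
    <state id="voice">
      <state id="prepareVoice"> … </state>
      <state id="endVoice"> … </state>
    </state>
    <state id="gui">
      <state id="prepareGUI"> … </state>
      <state id="endGUI"> … </state>
    </state>
  </parallel>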

The Life Cycle Events

The Interaction Manager exchanges each request/response pair with both the GUI and VUI modality components:

  prepare → prepareResponse
  start   → startResponse
  cancel  → cancelResponse
  pause   → pauseResponse
  resume  → resumeResponse

More Life Cycle Events

  newContextRequest → newContextResponse   (between each modality component and the Interaction Manager)
  data                                     (flows in both directions)
  done                                     (a modality component notifies the Interaction Manager)
  clearContext                             (the Interaction Manager notifies the modality components)

Synchronization Using the Lifecycle Data Event

Intent-based events capture the underlying intent rather than the physical manifestation of user-interaction events, so they are independent of the physical characteristics of particular devices. The Interaction Manager exchanges data events with the GUI and VUI:

  data/reset    Reset one or more field values to null
  data/focus    Focus on another field
  data/change   Field value has changed

Lifecycle Events between Interaction Manager and Modality

[Sequence: the Interaction Manager sends prepare; the modality answers prepareResponse (success or failure). The manager then sends start; the modality answers startResponse (success or failure). During WaitState the two exchange data events, and the modality finally sends done, taking the chart to EndState; failure responses lead to FailState.]

MMI Architecture Principles

    • Runtime Framework communicates with Modality Components through asynchronous events

    • Modality Components don’t communicate directly with each other, but indirectly through the Runtime Framework

    • Components must implement basic life cycle events, may expose other events

    • Modality components can be nested (e.g. a Voice Dialog component like a VoiceXML <form>)

    • Components need not be markup-based

    • EMMA communicates users’ inputs to the Interaction Manager

Modalities

  • GUI Modality (XHTML)
    • An adapter converts lifecycle events to XHTML events, and XHTML events back to lifecycle events
  • Voice Modality (VoiceXML 3.0)
    • Lifecycle events are embedded into VoiceXML 3.0

[Diagram: Interaction Manager (SCXML) and Data Model above the XHTML and VoiceXML 3.0 modality components.]

Exercise 12

    • What should VoiceXML do when it receives each of the following events?

    • Reset

    • Change

    • Focus

Modalities

VoiceXML 3.0 will support lifecycle events:

  <form>
    <catch name="change">
      <assign name="city" value="data"/>
    </catch>

    <field name="city">
      <prompt> Blah </prompt>
      <grammar src="city.grxml"/>
      <filled>
        <send event="data.change" data="city"/>
      </filled>
    </field>
  </form>


Exercise 13

    • What should HTML do when it receives each of the following events?

    • Reset

    • Change

    • Focus

Modalities

XHTML is extended to support lifecycle events sent to the Interaction Manager:

  <head>
    …
    <ev:listener ev:event="onchange" ev:observer="app1" ev:handler="#onChangeHandler"/>
    <script type="text/javascript">
      function onChangeHandler() { post("data", data="city"); }
    </script>
    …
  </head>

  <body id="app1">
    <input type="text" id="city" value=""/>
  </body>


Modalities

XHTML is extended to support lifecycle events sent to a modality:

  <head>
    …
    <handler type="text/javascript" ev:event="data">
      if (event == "change") { document.app1.city.value = data.city; }
    </handler>
    …
  </head>

  <body id="app1">
    <input type="text" id="city" value=""/>
  </body>


References

    • SCXML
      • Second working draft available at http://www.w3.org/TR/2006/WD-scxml-20060124/
      • Open source available from http://jakarta.apache.org/commons/sandbox/scxml/
    • Multimodal Architecture and Interfaces
      • Working draft available at http://www.w3.org/TR/2006/WD-mmi-arch-20060414/
    • Voice Modality
      • First working draft of VoiceXML 3.0 scheduled for November 2007
    • XHTML
      • Full recommendation
      • Adapters must be hand-coded
    • Other modalities
      • TBD


Comparison

                         Object-oriented   X+V           W3C
  Standard languages     SRGS              VoiceXML      SCXML
                         SISR              SRGS          SRGS
                         SSML              SSML          VoiceXML
                                           SISR          SSML
                                           XHTML         SISR
                                                         XHTML
                                                         EMMA
                                                         CCXML
  Interaction manager    C#                XHTML         SCXML
  Modes                  GUI, Speech       GUI, Speech   GUI, Speech, Ink

Availability

    • SAPI 5.3
      • Microsoft Windows Vista®
    • X+V
      • ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003
        http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
      • Opera Software Multimodal Browser for Sharp Zaurus
        http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
      • Opera 9 for Windows
        http://www.opera.com/
    • W3C
      • First working draft of VoiceXML 3.0 not yet available
      • Working drafts of SCXML are available; some open-source implementations are available
    • Proprietary APIs
      • Available from vendor

Discussion Question

    • Should a developer insert SALT tags or X+V modules into an existing Web page without redesigning the Web page?

Conclusion

    • Multimodal applications offer benefits over today’s traditional GUIs.

    • Only use multimodal if there is a clear benefit.

    • Standard languages are available today to develop multimodal applications.

    • Don’t reinvent the wheel.

    • Creativity and lots of usability testing are necessary to create world-class multimodal applications.

Web Resources

    • http://www.w3.org/voice

      • Specification of grammar, semantic interpretation, and speech synthesis languages

    • http://www.w3.org/2002/mmi

      • Specification of EMMA and InkML languages

    • http://www.microsoft.com (and query SALT)

      • SALT specification and download instructions for adding SALT to Internet Explorer

    • http://www-306.ibm.com/software/pervasive/multimodal/

      • X+V specification; download Opera and ACCESS browsers

    • http://www.larson-tech.com/SALT/ReadMeFirst.html

      • Student projects using SALT to develop multimodal applications

    • http://www.larson-tech.com/MMGuide.html or http://www.w3.org/2002/mmi/Group/2006/Guidelines/

      • User interface guidelines for multimodal applications

Status of W3C Multimodal Interface Languages

  Recommendation             VoiceXML 2.0; Speech Recognition Grammar Specification (SRGS) 1.0; Speech Synthesis Markup Language (SSML) 1.0
  Proposed Recommendation    VoiceXML 2.1
  Candidate Recommendation   Semantic Interpretation for Speech Recognition (SISR) 1.0
  Last Call Working Draft    Extensible MultiModal Annotation (EMMA) 1.0
  Working Draft              State Chart XML (SCXML) 1.0
  Requirements               InkML 1.0

Questions?

Answer to Exercise 5

Answer to Exercise 7
Write a grammar for zero to nineteen

    <grammar type="application/srgs+xml" root="zero_to_19" mode="voice">

      <rule id="zero_to_19">
        <one-of>
          <ruleref uri="#single_digit"/>
          <ruleref uri="#teens"/>
        </one-of>
      </rule>

      <rule id="single_digit">
        <one-of>
          <item> zero </item>
          <item> one </item>
          <item> two </item>
          <item> three </item>
          <item> four </item>
          <item> five </item>
          <item> six </item>
          <item> seven </item>
          <item> eight </item>
          <item> nine </item>
        </one-of>
      </rule>

      <rule id="teens">
        <one-of>
          <item> ten </item>
          <item> eleven </item>
          <item> twelve </item>
          <item> thirteen </item>
          <item> fourteen </item>
          <item> fifteen </item>
          <item> sixteen </item>
          <item> seventeen </item>
          <item> eighteen </item>
          <item> nineteen </item>
        </one-of>
      </rule>

    </grammar>

Answer to Exercise 8

    <grammar type="application/srgs+xml" root="yes" mode="voice">
      <rule id="yes">
        <one-of>
          <item> yes </item>
          <item> sure </item>
          <item> affirmative </item>
          …
        </one-of>
      </rule>
    </grammar>

Answer to Exercise 9

    <grammar type="application/srgs+xml" root="yes" mode="voice">
      <rule id="yes">
        <one-of>
          <item> yes </item>
          <item> sure <tag> out = "yes" </tag> </item>
          <item> affirmative <tag> out = "yes" </tag> </item>
          …
        </one-of>
      </rule>
    </grammar>

Answer to Exercise 10

Given the following two EMMA specifications, what is the unified EMMA specification?

    <interpretation mode="speech">
      <moneyTransfer>
        <sourceAcct hook="ink"/>
        <targetAcct hook="ink"/>
        <amount> 300 </amount>
      </moneyTransfer>
    </interpretation>

    <interpretation mode="ink">
      <moneyTransfer>
        <sourceAcct> savings </sourceAcct>
        <targetAcct> checking </targetAcct>
      </moneyTransfer>
    </interpretation>

Unified EMMA specification:

    <interpretation mode="intp1">
      <moneyTransfer>
        <sourceAcct> savings </sourceAcct>
        <targetAcct> checking </targetAcct>
        <amount> 300 </amount>
      </moneyTransfer>
    </interpretation>

Answer to Exercise 11

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:vxml="http://www.w3.org/2001/vxml"
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xv="http://www.w3.org/2002/xhtml+voice">

    <head>
      <xv:sync xv:input="in4" xv:field="#answer"/>
      <vxml:form id="stateForm">
        <vxml:field name="state" xv:id="answer">
          <vxml:prompt>Say a state name</vxml:prompt>
          <vxml:grammar src="state.grxml"/>
        </vxml:field>
      </vxml:form>
    </head>

    <body>
      <form ev:event="load" ev:handler="#stateForm">
        Result: <input type="text" name="in4"/>
      </form>
    </body>

    </html>

Answer to Exercise 12

    • What should VoiceXML do when it receives each of the following events?

    • Reset
      • Reset the value
    • Change
      • Change the value
    • Focus
      • Prompt for the value now in focus


    Exercise 131
    Exercise 13 (continued)

    • What should HTML do when it receives each of the following events?
    • Reset
      • Reset the value
      • Author decides if cursor should be moved to the reset value
    • Change
      • Change the value
      • Author decides if cursor should be moved to the changed value
    • Focus
      • Move the cursor to the item in focus

