Tutorial

Developing and Deploying Multimodal Applications

James A. Larson, Larson Technical Services, jim@larson-tech.com

SpeechTEK West, February 23, 2007


Developing and Deploying Multimodal Applications

  • What applications should be multimodal?

  • What is the multimodal application development process?

  • What standard languages can be used to develop multimodal applications?

  • What standard platforms are available for multimodal applications?

Capturing Input from the User

  Medium     Input Device            Mode
  Acoustic   Microphone              Speech
  Tactile    Keypad, Keyboard        Key
  Tactile    Pen                     Ink
  Tactile    Mouse, Joystick         GUI
  Visual     Scanner, Still camera   Photograph
  Visual     Video camera            Movie

Capturing Input From the User (Multimodal)

  Medium       Input Device                                Mode
  Acoustic     Microphone                                  Speech
  Tactile      Keypad, Keyboard                            Key
  Tactile      Pen                                         Ink
  Tactile      Mouse, Joystick                             GUI
  Visual       Scanner, Still camera                       Photograph
  Visual       Video camera, Gaze tracking, Gesture reco   Movie
  Electronic   RFID, Biometric, GPS                        Digital data

Presenting Output to the User

  Medium     Output Device   Mode
  Acoustic   Speaker         Speech
  Visual     Display         Text, Photograph, Movie
  Tactile    Joystick        Pressure

Presenting Output to the User (Multimedia)

  Medium     Output Device   Mode
  Acoustic   Speaker         Speech
  Visual     Display         Text, Photograph, Movie
  Tactile    Joystick        Pressure

Multimodal and Multimedia Application Benefits

  • Provide a natural user interface by using multiple channels for user interactions

  • Simplify interaction with small devices with limited keyboard and display, especially on portable devices

  • Leverage advantages of different modes in different contexts

  • Decrease error rates and time required to perform tasks

  • Increase accessibility of applications for special users

  • Enable new kinds of applications

Exercise 1

  • What new multimodal applications would be useful for your work?

  • What new multimodal applications would be entertaining to you, your family, or friends?

Voice as a “Third Hand”

  • Game Commander 3

    • http://www.gamecommander.com/

Voice-Enabled Games

  • ScanSoft’s VoCon Games Speech SDK

    • http://www.scansoft.com/games/

    • PlayStation® 2

    • Nintendo® GameCube™

    • http://www.omnipage.com/games/poweredby/

Education

Tucker Maxon School of Oral Education

http://www.tmos.org/

Education

Reading Tutor Project

http://cslr.colorado.edu/beginweb/reading/reading.html

Multimodal Applications Developed by PSU and OHSU Students

  • Hands-busy
    • Troubleshooting a car’s motor
    • Repairing a leaky faucet
    • Tuning musical instruments
    • Construction
    • Complex origami artifacts
    • Project book for children
    • Cooking—Talking recipe book
  • Entertainment
    • Child’s fairy tale book
    • Audio-controlled juke box
    • Games (Battleship, Go)

Multimodal Applications Developed by PSU and OHSU Students (continued)

  • Data collection
    • Buy a car
    • Collect health data
    • Buy movie tickets
    • Order meals from a restaurant
    • Conduct banking business
    • Locate a business
    • Order a computer
    • Choose homeless pets from an animal shelter
  • Authoring
    • Photo album tour
  • Education
    • Flash cards—Addition tables

Download Opera and the speech plug-in. Go to www.larson-tech.com/mm-Projects/Demos.htm

New Application Classes

  • Active listening
    • Verbal VCR controls: start, stop, fast forward, rewind, etc.
  • Virtual assistants
    • Listen for requests and immediately perform them
    • Violin tuner, TV controller, environmental controller, family-activity coordinator
  • Synthetic experiences
    • Synthetic interviews, speech-enabled games, education and training
  • Authoring content

Two General Uses of Multiple Modes of Input

  • Redundancy—One mode acts as backup for another mode

  • In noisy environments, use keypad instead of speech input.

  • In cold environments, use speech instead of keypad.

  • Complementary—One mode supplements another mode

  • Voice as a third hand

  • “Move that (point) to there (point)” (late fusion)

  • Lip reading = video + speech (early fusion)

Potential Problems with Multimodal Applications

  • Voice may make an application “noisy.”
    • Privacy and security concerns
    • Noise pollution
  • Sometimes speech and handwriting recognition systems fail.
  • False expectations of users wanting to use natural language (possible only on Star Trek).
  • Full “natural language” processing requires:
    • Knowledge of the outside world
    • History of the user-computer interaction
    • Sophisticated understanding of language structure
  • “Natural language-like” processing (incorrectly called “NLP”) simulates natural language for a small domain, short history, and specialized language structures.

Adding a New Mode to an Application

  • Only if…
    • The new mode enables new features not previously possible.
    • The new mode dramatically improves usability.
  • Always…
    • Redesign the application to take advantage of the new mode.
    • Provide backup for the new mode.
    • Test, test, and test some more.

Exercise 2

  • Where will multimodal applications be used?

  • A. At home

  • B. At work

  • C. “On the road”

  • D. Other?

Developing and Deploying Multimodal Applications

  • What applications should be multimodal?

  • What is the multimodal application development process?

  • What standard languages can be used to develop multimodal applications?

  • What standard platforms are available for multimodal applications?

The Playbill—Who’s Who on the Team

  • Users—Their lives will be improved by using the multimodal application

  • Interaction designer—Designs the dialog—when and how the user and system interchange requests and information

  • Multimodal programmer—Implements VUI 

  • Voice talent—Records spoken prompts and messages

  • Grammar writer—Specifies words and phrases the user may speak in response to a prompt

  • TTS specialist—Specifies verbal and audio sounds and inflections

  • Quality assurance specialist—Performs tests to validate the application is both useful and usable

  • Customer—Pays the bills

  • Program manager—Organizes the work and makes sure it is completed according to schedule and under budget

Development Process

  • Investigation Stage

  • Design Stage

  • Development Stage

  • Testing Stage

  • Sustaining Stage

Each stage involves users

Iterative refinement

Development Process (continued)

  • Investigation Stage

  • Design Stage

  • Development Stage

  • Testing Stage

  • Sustaining Stage

  • Identify the Application

  • Conduct ethnography studies

  • Identify candidate applications

  • Conduct focus groups

  • Select the application

Exercise 3

  • What will be the “killer” consumer multimodal applications?

Development Process (continued)

  • Investigation Stage

  • Design Stage

  • Development Stage

  • Testing Stage

  • Sustaining Stage

  • Specify the Application

  • Construct the conceptual model

  • Construct scenarios

  • Specify performance and preference requirements

Specify Performance and Preference Requirements

Performance: Is the application useful? Measure what the users actually accomplished. Validate that the users achieved success.

Preference: Is the application enjoyable? Measure users’ likes and dislikes. Validate that the users enjoyed the application and will use it again.

Performance Metrics

  User Task                               Measure                                                Typical Criteria
  Speak a command                         Word error rate                                        Less than 3%
  The caller supplies values into a form  Enters valid values into each field of a form          Less than 5 seconds per value
  Navigate a list                         The user successfully selects the specified option     Greater than 95%
  Purchase a product                      The user successfully completes the purchase option    Greater than 93%

Exercise 4

Specify performance metrics for the multimodal email application.

Preference Metrics

  Question                                                                Typical Criteria
  On a scale from 1 to 10, rate the help facility.                        The average caller score is greater than 8.
  On a scale from 1 to 10, rate the ease of use of this application.      The average caller score is greater than 8.
  Would you recommend using this voice portal to a friend?                Over 80% of callers respond by saying “yes.”
  What would you be willing to pay each time you use this application?    Over 80% of callers indicate that they are willing to pay $1.00 or more per use.

Exercise 5

Specify preference metrics for the multimodal email application.

Preference Metrics (Open-ended Questions)

  • What did you like the best about this voice-enabled application? (Do not change these features.)

  • What did you like the least about this voice-enabled application? (Consider changing these features.)

  • What new features would you like to have added? (Consider adding these features in this or a later release.)

  • What features do you think you will never use? (Consider deleting these features.)

  • Do you have any other comments and suggestions? (Pay attention to these responses. Callers frequently suggest very useful ideas.)

Development Process (continued)

  • Investigation Stage

  • Design Stage

  • Development Stage

  • Testing Stage

  • Sustaining Stage

  • Develop the Application

  • Specify the persona

  • Specify the modes and modalities

  • Specify the dialog script

UI Design Guidelines

  • Guidelines for Voice User Interfaces
    • Bruce Balentine and David P. Morgan. How to Build a Speech Recognition Application, Second Edition. http://www.eiginc.com
  • Guidelines for Graphical User Interfaces
    • Research-Based Web Design and Usability Guidelines. U.S. Department of Health and Human Services. http://www.usability.gov/pdfs/guidelines.html
  • Guidelines for Multimodal User Interfaces
    • Common Sense Guidelines for Developing Multimodal User Interfaces. W3C Working Group Note, 19 April 2006. http://www.w3.org/2002/mmi/Group/2006/Guidelines/

Common-sense Suggestions
1. Satisfy Real-World Constraints

  • Task-oriented Guidelines

  • 1.1. Guideline: For each task, use the easiest mode available on the device.

  • Physical Guidelines

  • 1.2. Guideline: If the user’s hands are busy, then use speech.

  • 1.3. Guideline: If the user’s eyes are busy, then use speech.

  • 1.4. Guideline: If the user may be walking, use speech for input.

  • Environmental Guidelines

  • 1.5. Guideline: If the user may be in a noisy environment, then use a pen, keys or mouse.

  • 1.6. Guideline: If the user’s manual dexterity may be impaired, then use speech.

Exercise 6

  • What input mode(s) should be used for each of the following tasks?

  • A. Selecting objects

  • B. Entering text

  • C. Entering symbols

  • D. Entering sketches or illustrations

Common-sense Suggestions
2. Communicate Clearly, Concisely, and Consistently with Users

  • Consistency Guidelines

  • 2.1. Phrase all prompts consistently.

  • 2.2. Enable the user to speak keyword utterances rather than natural language sentences.

  • 2.3. Switch presentation modes only when the information is not easily presented in the current mode.

  • 2.4. Make commands consistent.

  • 2.5. Make the focus consistent across modes.

  • Organizational Guidelines

  • 2.6. Use audio to indicate the verbal structure.

  • 2.7. Use pauses to divide information into natural “chunks.”

  • 2.8. Use animation and sound to show transitions.

  • 2.9. Use voice navigation to reduce the number of screens.

  • 2.10. Synchronize multiple modalities appropriately.

  • 2.11. Keep the user interface as simple as possible.

Common-sense Suggestions
3. Help Users Recover Quickly and Efficiently from Errors

  • Conversational Guidelines

  • 3.1. Users tend to use the same mode that was used to prompt them.

  • 3.2. If privacy is not a concern, use speech as output to provide commentary or help.

  • 3.3. Use directed user interfaces, unless the user is always knowledgeable and experienced in the domain.

  • 3.4 Always provide context-sensitive help for every field and command.

Common-sense Suggestions
3. Help Users Recover Quickly and Efficiently from Errors (continued)

  • Reliability Guidelines

  • Operational status

  • 3.5. The user always should be able to determine easily if the device is listening to the user.

  • 3.6. For devices with batteries, users always should be able to determine easily how much longer the device will be operational.

  • 3.7. Support at least two input modes so one input mode can be used when the other cannot.

  • Visual feedback

  • 3.8. Present words recognized by the speech recognition system on the display, so the user can verify they are correct.

  • 3.9. Display the n-best list to enable easy speech recognition error correction

  • 3.10. Try to keep response times less than 5 seconds. Inform the user of longer response times.

Common-sense Suggestions
4. Make Users Comfortable

  • Listening mode

  • 4.1. Speak after pressing a speak key, which automatically releases after the user finishes speaking.

  • System Status

  • 4.2. Always present the current system status to the user.

  • Human-memory Constraints

  • 4.3. Use the screen to ease stress on the user’s short-term memory.

Common-sense Suggestions
4. Make Users Comfortable (continued)

  • Social Guidelines

  • 4.4. If the user may need privacy, use a display rather than render speech.

  • 4.5. If the user may need privacy, use a pen or keys.

  • 4.6. If the device may be used during a business meeting, then use a pen or keys (with the keyboard sounds turned off).

  • Advertising Guidelines

  • 4.7. Use animation and sound to attract the user’s attention.

  • 4.8. Use landmarks to help the user know where he or she is.

Common-sense Suggestions
4. Make Users Comfortable (continued)

  • Ambience

  • 4.9 Use audio and graphic design to set the mood and convey emotion in games and entertainment applications.

  • Accessibility

  • 4.10 For each traditional output technique, provide an alternative output technique.

  • 4.11. Enable users to adjust the output presentation.

Books

  • Ramon Lopez-Cozar Delgado and Masahiro Araki. Spoken, Multilingual and Multimodal Dialog Systems—Development and Assessment. West Sussex, England: Wiley, 2005.

  • Julie A. Jacko and Andrew Sears (Editors) The Human-Computer Interaction Handbook—Fundamentals, Evolving technologies, and Emerging Applications. Mahwah, New Jersey: Lawrence Erlbaum Associates, 2003.

Development Process (continued)

  • Investigation Stage

  • Design Stage

  • Development Stage

  • Testing Stage

  • Sustaining Stage

  • Test The Application

  • Component test

  • Usability test

  • Stress test

  • Field test

Testing Resources

  • Jeffrey Rubin. Handbook of Usability Testing. New York: Wiley Technical Communication Library, 1994.

  • Peter and David Leppik. Gourmet Customer Service. Eden Prairie, MN: VocalLabs, 2005. sales@vocalabs.com

Development Process (continued)

  • Investigation Stage

  • Design Stage

  • Development Stage

  • Testing Stage

  • Sustaining Stage

  • Deploy and Monitor the Application

  • User Survey

  • Usage reports from log files

  • User feedback and comments

Developing and Deploying Multimodal Applications

  • What applications should be multimodal?

  • What is the multimodal application development process?

  • What standard languages can be used to develop multimodal applications?

  • What standard platforms are available for multimodal applications?

W3C Multimodal Interaction Framework

  • Recognition Grammar

  • Semantic Interpretation

  • Extensible MultiModal Annotation (EMMA)

  • Speech Synthesis

  • Interaction Managers

General description of speech application components and how they relate

W3C Multimodal Interaction Framework (continued)

[Diagram: the user exchanges input and output with the Interaction Manager, which connects to Application Functions and Telephony Properties.]

W3C Multimodal Interaction Framework (continued)

[Diagram: user input flows through ASR and Ink recognition into Semantic Interpretation and Information Integration, then to the Interaction Manager; output flows back through Language Generation and Media Planning to TTS, Display, and Audio; Telephony Functions and Application Functions also attach to the Interaction Manager.]

W3C Multimodal Interaction Framework (continued)

SRGS: Describes what the user may say at each point in the dialog

[Framework diagram repeated.]

Speech Recognition Engines

  Feature                          Low-end               High-end              Other
  Speaking mode                    Isolated (discrete)   Continuous            Keywords
  Enrollment                       Speaker dependent     Speaker independent   Adaptive
  Vocabulary size                  Small                 Large                 Switch vocabularies
  Speaking style                   Read                  Spontaneous
  Number of simultaneous callers   Single-threaded       Multi-threaded

Grammars

  • Describe what the user may say or handwrite at a point in the dialog

  • Enable the recognition engine to work faster and more accurately

  • Two types of grammars:

    • Structured Grammar

    • Statistical Grammar (N-grams)

Structured Grammars

  • Specifies words that a user may speak or write
  • Two representation formats:

    1. Augmented Backus-Naur Form (ABNF) production rules:

       Single_digit ::= zero | one | two | … | nine
       Zero_thru_ten ::= Single_digit | ten

       (a fuller SRGS ABNF sketch follows below)

    2. XML format, which can be processed by an XML validator
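A minimal sketch of the same grammar in the SRGS ABNF syntax; the #ABNF header and the language and root declarations come from the SRGS specification, and the $-prefixed rule names mirror the rule ids of the XML form on the next slide:

  #ABNF 1.0;
  language en-US;
  root $zero_thru_ten;

  $single_digit = zero | one | two | three | four | five | six | seven | eight | nine;
  $zero_thru_ten = $single_digit | ten;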

Example XML Grammar

  <grammar mode="voice" type="application/srgs+xml" root="zero_to_ten">

    <rule id="zero_to_ten">
      <one-of>
        <ruleref uri="#single_digit"/>
        <item> ten </item>
      </one-of>
    </rule>

    <rule id="single_digit">
      <one-of>
        <item> zero </item>
        <item> one </item>
        <item> two </item>
        <item> three </item>
        <item> four </item>
        <item> five </item>
        <item> six </item>
        <item> seven </item>
        <item> eight </item>
        <item> nine </item>
      </one-of>
    </rule>

  </grammar>

Exercise 7

  • Write a grammar that recognizes the digits zero through nineteen

  • (Hint: Modify the previous page)

Reusing Existing Grammars

  <grammar type="application/srgs+xml"
           root="size"
           src="http://www.example.com/size.grxml"/>
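An individual rule inside another grammar can also be referenced with <ruleref>; a minimal sketch, assuming size.grxml defines a rule named "size":

  <rule id="order">
    <item> a </item>
    <ruleref uri="http://www.example.com/size.grxml#size"/>
    <item> shirt </item>
  </rule>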

Exercise 8

  • Write a grammar for positive responses to a yes/no question (i.e., “yes,” “sure,” “affirmative,” and so forth)

When Is a Grammar Too Large?

[Chart: the trade-off between word coverage and response as a grammar grows.]

W3C Multimodal Interaction Framework (continued)

SISR: A procedural, JavaScript-like language for interpreting the text strings returned by the speech recognition engine

[Framework diagram repeated.]

Semantic Interpretation

  • Semantic interpretation scripts employ ECMAScript
  • Advantages (sketched below):
    • Translate aliases to vocabulary words
    • Perform calculations
    • Produce a rich structure rather than a text string
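A minimal sketch of the first two advantages, using SISR tags inside an SRGS rule; the quantity phrasings are illustrative:

  <rule id="quantity">
    <one-of>
      <item> twelve <tag> out.count = 12; </tag> </item>
      <item> a dozen <tag> out.count = 12; </tag> </item>
      <item> two dozen <tag> out.count = 2 * 12; </tag> </item>
    </one-of>
  </rule>

Here “a dozen” is an alias translated to the same value as “twelve,” and “two dozen” is computed rather than listed.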

Semantic Interpretation

[Diagram: the user says “Big white t-shirt”; the recognizer, constrained by a grammar, passes the text to semantic interpretation, which sends the normalized “Large white t-shirt” to the conversation manager.]

Semantic Interpretation (continued)

[Diagram: the recognizer hears “Big white t-shirt”; a grammar with semantic interpretation scripts drives the semantic interpretation processor, which delivers { size: large, color: white } to the conversation manager.]

  <rule id="action">
    <one-of>
      <item> small <tag> out.size = "small"; </tag> </item>
      <item> medium <tag> out.size = "medium"; </tag> </item>
      <item> large <tag> out.size = "large"; </tag> </item>
      <item> big <tag> out.size = "large"; </tag> </item>
    </one-of>
    <one-of>
      <item> green <tag> out.color = "green"; </tag> </item>
      <item> blue <tag> out.color = "blue"; </tag> </item>
      <item> white <tag> out.color = "white"; </tag> </item>
    </one-of>
  </rule>

Exercise 9
Modify this rule to return only “yes”

  <grammar type="application/srgs+xml" root="yes" mode="voice">
    <rule id="yes">
      <one-of>
        <item> yes </item>
        <item> sure </item>
        <item> affirmative </item>
        …
      </one-of>
    </rule>
  </grammar>

W3C Multimodal Interaction Framework (continued)

EMMA: A language for representing the semantic content from speech recognizers, handwriting recognizers, and other input devices

[Framework diagram repeated.]

EMMA

  • Extensible MultiModal Annotation markup language

  • A canonical structure for semantic interpretations of a variety of inputs, including (a sketch follows this list):

    • Speech

    • Natural language text

    • GUI

    • Ink
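A minimal sketch of a complete EMMA document wrapping one interpretation; the emma namespace URI is the one defined by the specification, while the travel payload is application-defined:

  <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
    <emma:interpretation id="int1" emma:mode="voice">
      <travel>
        <to> Las Vegas </to>
        <from> Portland </from>
      </travel>
    </emma:interpretation>
  </emma:emma>

(The slides that follow show bare <interpretation> elements for brevity.)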

EMMA (continued)

[Diagram: speech passes through speech recognition, driven by a grammar plus semantic interpretation instructions, to produce an EMMA document; keyboard input passes through keyboard interpretation, driven by its own interpretation instructions, to produce another EMMA document; merging/unification combines the two into a single EMMA document for applications.]



EMMA (continued)

Speech interpretation:

  <interpretation mode="speech">
    <travel>
      <to hook="ink"/>
      <from hook="ink"/>
      <day> Tuesday </day>
    </travel>
  </interpretation>

Ink interpretation:

  <interpretation mode="ink">
    <travel>
      <to> Las Vegas </to>
      <from> Portland </from>
    </travel>
  </interpretation>

Unified interpretation delivered to applications:

  <interpretation mode="intp1">
    <travel>
      <to> Las Vegas </to>
      <from> Portland </from>
      <day> Tuesday </day>
    </travel>
  </interpretation>

Exercise 10

Given the following two EMMA specifications, what is the unified EMMA specification?

  <interpretation mode="speech">
    <moneyTransfer>
      <sourceAcct hook="ink"/>
      <targetAcct hook="ink"/>
      <amount> 300 </amount>
    </moneyTransfer>
  </interpretation>

  <interpretation mode="ink">
    <moneyTransfer>
      <sourceAcct> savings </sourceAcct>
      <targetAcct> checking </targetAcct>
    </moneyTransfer>
  </interpretation>

Unified EMMA specification:

  <interpretation mode="intp1">
    <moneyTransfer>
      <sourceAcct> ______ </sourceAcct>
      <targetAcct> ______ </targetAcct>
      <amount> ______ </amount>
    </moneyTransfer>
  </interpretation>

W3C Multimodal Interaction Framework (continued)

SSML: A language for rendering text as synthesized speech

[Framework diagram repeated.]

Speech Synthesis Markup Language

[Pipeline: structure analysis → text normalization → text-to-phoneme conversion → prosody analysis → waveform production]

  • Structure analysis. Markup support: paragraph, sentence. Non-markup behavior: infer structure by automated text analysis.
  • Text normalization. Markup support: say-as for dates, times, etc. Non-markup behavior: automatically identify and convert constructs.
  • Text-to-phoneme conversion. Markup support: phoneme, say-as. Non-markup behavior: look up in a pronunciation dictionary.
  • Prosody analysis. Markup support: emphasis, break, prosody. Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax.
  • Waveform production.

Speech Synthesis Markup Language
Examples

  <phoneme alphabet="ipa" ph="wɪnɛfɛks"> WinFX </phoneme> is a great platform

  <prosody pitch="x-low"> Who’s been sleeping in my bed? </prosody> said papa bear.
  <prosody pitch="medium"> Who’s been sleeping in my bed? </prosody> said momma bear.
  <prosody pitch="x-high"> Who’s been sleeping in my bed? </prosody> said baby bear.
Developing & Delivering Multimodal Applications


Popular strategy
Popular Strategy (continued)

  • Develop dialogs using SSML

  • Usability test dialogs

  • Extract prompts

  • Hire voice talent to record prompts

  • Replace <prompt> with <audio>
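A minimal VoiceXML sketch of that last step, assuming a recorded file named welcome.wav; the inline text remains as a TTS fallback if the audio cannot be fetched:

  Before:
    <prompt> Welcome to the store. </prompt>

  After:
    <prompt>
      <audio src="welcome.wav"> Welcome to the store. </audio>
    </prompt>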

W3C Multimodal Interaction Framework (continued)

VoiceXML: A language for controlling the exchange of information and commands between the user and the system
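A minimal VoiceXML 2.0 form, to make that control flow concrete; city.grxml is the same illustrative grammar file used elsewhere in this tutorial:

  <?xml version="1.0" encoding="UTF-8"?>
  <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
    <form id="askCity">
      <field name="city">
        <prompt> Say a city name. </prompt>
        <grammar src="city.grxml" type="application/srgs+xml"/>
        <filled>
          <prompt> You said <value expr="city"/>. </prompt>
        </filled>
      </field>
    </form>
  </vxml>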

[Framework diagram repeated.]

Developing and Deploying Multimodal Applications

  • What applications should be multimodal?

  • What is the multimodal application development process?

  • What standard languages can be used to develop multimodal applications?

  • What standard platforms are available for multimodal applications?

Speech APIs and SDKs

  • JSAPI—Java Speech Application Program Interface

    • http://java.sun.com/products/java-media/speech/

    • http://developer.mozilla.org/en/docs/JSAPI_Reference

  • Nuance Mobile Speech Platform

    • http://www.nuance.com/speechplatform/components.asp

  • VSAPI—Voice Signal API

    • http://www.voicesignal.com/news/articles/2006-06-21-SymbianOne.htm

  • SALT

    • http://www.saltforum.org/

Interaction Manager Approaches

  Approach          Interaction Manager   Modality components
  Object-oriented   C#                    SAPI 5.3
  X+V               XHTML                 VoiceXML 2.0 modules, XHTML
  W3C               SCXML                 XHTML, VoiceXML 3.0, InkML

Interaction Manager Approaches (continued)

[Diagram repeated, highlighting the object-oriented approach: Interaction Manager (C#) over SAPI 5.3.]

SAPI 5.3 & Windows Vista™
Speech Synthesis

  • W3C Speech Synthesis Markup Language 1.0

    <speak>
      <phoneme alphabet="ipa" ph="wɪnɛfɛks"> WinFX </phoneme>
      is a great platform
    </speak>

  • Microsoft proprietary PromptBuilder

    myPrompt.AppendTextWithPronunciation("WinFX", "wɪnɛfɛks");
    myPrompt.AppendText("is a great platform.");

SAPI 5.3 & Windows Vista™
Speech Recognition

  • W3C Speech Recognition Grammar Specification 1.0

    <grammar type="application/srgs+xml" root="city" mode="voice">
      <rule id="city">
        <one-of>
          <item> New York City </item>
          <item> New York </item>
          <item> Boston </item>
        </one-of>
      </rule>
    </grammar>

  • Microsoft proprietary GrammarBuilder

    Choices cityChoices = new Choices();
    cityChoices.AddPhrase("New York City");
    cityChoices.AddPhrase("New York");
    cityChoices.AddPhrase("Boston");
    Grammar cityGrammar = new Grammar(new GrammarBuilder(cityChoices));

SAPI 5.3 & Windows Vista™
Semantic Interpretation

  • Augment SRGS grammar with JScript® for semantic interpretation

    <grammar type="application/srgs+xml" root="city" mode="voice">
      <rule id="city">
        <one-of>
          <item> New York City <tag> city="JFK" </tag> </item>
          <item> New York <tag> city="JFK" </tag> </item>
          <item> Portland <tag> city="PDX" </tag> </item>
        </one-of>
      </rule>
    </grammar>

  • User-specified “shortcuts”: the recognizer replaces a shortcut word by an expanded string
    • User says: my address
    • System: 1033 Smith Street, Apt. 7C, Bloggsville 00000

SAPI 5.3 & Windows Vista™
Dialog

  • Import the System.Speech.Recognition namespace

  • Instantiate a SpeechRecognizer object

  • Build a grammar

  • Attach an event handler

  • Load the grammar into the recognizer

  • When the recognizer hears something that fits the grammar, the SpeechRecognized event handler is invoked, which accesses the Result object and works with the recognized text

SAPI 5.3 & Windows Vista™
Dialog (continued)

  using System;
  using System.Windows.Forms;
  using System.ComponentModel;
  using System.Collections.Generic;
  using System.Speech.Recognition;

  namespace Reco_Sample_1
  {
      public partial class Form1 : Form
      {
          // Create a recognizer
          SpeechRecognizer _recognizer = new SpeechRecognizer();

          public Form1() { InitializeComponent(); }

          private void Form1_Load(object sender, EventArgs e)
          {
              // Create a pizza grammar
              Choices pizzaChoices = new Choices();
              pizzaChoices.AddPhrase("I'd like a cheese pizza");
              pizzaChoices.AddPhrase("I'd like a pepperoni pizza");
              pizzaChoices.AddPhrase("I'd like a large pepperoni pizza");
              pizzaChoices.AddPhrase("I'd like a small thin crust vegetarian pizza");

              Grammar pizzaGrammar = new Grammar(new GrammarBuilder(pizzaChoices));

              // Attach an event handler
              pizzaGrammar.SpeechRecognized +=
                  new EventHandler<RecognitionEventArgs>(PizzaGrammar_SpeechRecognized);

              _recognizer.LoadGrammar(pizzaGrammar);
          }

          void PizzaGrammar_SpeechRecognized(object sender, RecognitionEventArgs e)
          {
              MessageBox.Show(e.Result.Text);
          }
      }
  }

SAPI 5.3 & Windows Vista™
References

  • Speech API Overview
    • http://msdn2.microsoft.com/en-us/library/ms720151.aspx#API_Speech_Recognition
  • Microsoft Speech API (SAPI) 5.3
    • http://msdn2.microsoft.com/en-us/library/ms723627.aspx
  • “Exploring New Speech Recognition And Synthesis APIs In Windows Vista” by Robert Brown
    • http://msdn.microsoft.com/msdnmag/issues/06/01/speechinWindowsVista/default.aspx#Resources

Interaction Manager Approaches (continued)

[Diagram repeated, highlighting the X+V approach: Interaction Manager (XHTML) over the VoiceXML 2.0 modules.]

Step 1: Start with Standard VoiceXML and Standard XHTML

  • VoiceXML

    <form id="topform">
      <field name="city">
        <prompt>Say a name</prompt>
        <grammar src="city.grxml"/>   <!-- W3C grammar language -->
      </field>
    </form>

  • XHTML

    <form>
      Result: <input type="text" name="in1"/>
    </form>

Step 2: Combine

  <html xmlns="http://www.w3.org/1999/xhtml">

  <head>
    <form id="topform">
      <field name="city">
        <prompt>Say a name</prompt>
        <grammar src="city.grxml"/>
      </field>
    </form>
  </head>

  <body>
    <form>
      Result: <input type="text" name="in1"/>
    </form>
  </body>

  </html>

Step 3: Insert vxml Namespace

  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:vxml="http://www.w3.org/2001/vxml">

  <head>
    <vxml:form id="topform">
      <vxml:field name="city">
        <vxml:prompt>Say a name</vxml:prompt>
        <vxml:grammar src="city.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>

  <body>
    <form>
      Result: <input type="text" name="in1"/>
    </form>
  </body>

  </html>

Step 4: Insert event

  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:vxml="http://www.w3.org/2001/vxml"
        xmlns:ev="http://www.w3.org/2001/xml-events">

  <head>
    <vxml:form id="topform">
      <vxml:field name="city">
        <vxml:prompt>Say a name</vxml:prompt>
        <vxml:grammar src="city.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>

  <body>
    <form ev:event="load" ev:handler="#topform">
      Result: <input type="text" name="in1"/>
    </form>
  </body>

  </html>

Step 5: Insert <sync>

  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:vxml="http://www.w3.org/2001/vxml"
        xmlns:ev="http://www.w3.org/2001/xml-events"
        xmlns:xv="http://www.w3.org/2002/xhtml+voice">

  <head>
    <xv:sync xv:input="in1" xv:field="#result"/>
    <vxml:form id="topform">
      <vxml:field name="city" xv:id="result">
        <vxml:prompt>Say a name</vxml:prompt>
        <vxml:grammar src="city.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>

  <body>
    <form ev:event="load" ev:handler="#topform">
      Result: <input type="text" name="in1"/>
    </form>
  </body>

  </html>

XHTML plus Voice (X+V) References

  • Available on
    • ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003
      http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
    • Opera Software Multimodal Browser for Sharp Zaurus
      http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
    • Opera 9 for Windows
      http://www.opera.com/
  • Programmers Guide
    • ftp://ftp.software.ibm.com/software/pervasive/info/multimodal/XHTML_voice_programmers_guide.pdf
  • For a variety of small illustrative applications
    • http://www.larson-tech.com/MM-Projects/Demos.htm

Exercise 11

  • Specify the X+V notation for integrating the following VoiceXML and XHTML code by completing the code on the next page

  • VoiceXML

    <form id="stateForm">
      <field name="state">
        <prompt>Say a state name</prompt>
        <grammar src="state.grxml"/>
      </field>
    </form>

  • XHTML

    <form>
      Result: <input type="text" name="in1"/>
    </form>

Exercise 11 (continued)

  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:vxml="http://www.w3.org/2001/vxml"
        xmlns:ev="http://www.w3.org/2001/xml-events"
        xmlns:xv="http://www.w3.org/2002/xhtml+voice">

  <head>
    <xv:sync xv:input="________" xv:field="________"/>
    <vxml:form id="________">
      <vxml:field name="state" xv:id="________">
        <vxml:prompt>Say a state name</vxml:prompt>
        <vxml:grammar src="state.grxml"/>
      </vxml:field>
    </vxml:form>
  </head>

  <body>
    <form ev:event="load" ev:handler="#________">
      Result: <input type="text" name="________"/>
    </form>
  </body>

  </html>

Interaction Manager Approaches (continued)

[Diagram repeated, highlighting the W3C approach: Interaction Manager (SCXML) over the XHTML, VoiceXML 3.0, and InkML modality components.]

MMI Architecture—4 Basic Components

  • Runtime Framework or Browser—initializes application and interprets the markup
  • Interaction Manager—coordinates modality components and provides application flow
  • Modality Components—provide modality capabilities such as speech, pen, keyboard, mouse
  • Data Model—handles shared data

[Diagram: Interaction Manager (SCXML) and Data Model above the XHTML, VoiceXML 3.0, and InkML modality components.]

Multimodal Architecture and Interfaces

  • A loosely-coupled, event-based architecture for integrating multiple modalities into applications
  • All communication is event-based
  • Based on a set of standard life-cycle events
  • Components can also expose other events as required
  • Encapsulation protects component data
  • Encapsulation enhances extensibility to new modalities
  • Can be used outside a Web environment

Specify Interaction Manager Using Harel State Charts

  • Extension of state transition systems
    • States
    • Transitions
    • Nested state-transition systems
    • Parallel state-transition systems
    • History

[State chart: PrepareState moves to StartState on prepareResponse (success) or to FailState on prepareResponse (fail); StartState moves to WaitState on startResponse (success) or to FailState on startResponse (fail); WaitState moves to EndState on done (success) or to FailState on done (fail).]

State Chart XML (SCXML)

  <state id="PrepareState">
    <send event="prepare" contentURL="hello.vxml"/>
    <transition event="prepareResponse" cond="status='success'" target="StartState"/>
    <transition event="prepareResponse" cond="status='failure'" target="FailState"/>
  </state>
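The next state in the chart can be sketched the same way; the event and attribute names follow this deck’s example rather than the final SCXML specification:

  <state id="StartState">
    <send event="start"/>
    <transition event="startResponse" cond="status='success'" target="WaitState"/>
    <transition event="startResponse" cond="status='failure'" target="FailState"/>
  </state>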

Example State Transition System

[State chart as above: PrepareState → StartState → WaitState → EndState, with failed prepare, start, or done responses leading to FailState.]

Example State Chart with Parallel States

[State chart: two parallel regions, Voice and GUI. Each region runs its own prepare → start → wait → end sequence (PrepareVoice/PrepareGUI, StartVoice/StartGUI, WaitVoice/WaitGUI, EndVoice/EndGUI), with failed prepare, start, or done responses leading to its fail state.]
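SCXML expresses charts like this with its <parallel> element; a skeletal sketch with illustrative state ids:

  <parallel id="run">
    <state id="voice">
      <state id="prepareVoice"> … </state>
      <state id="endVoice"> … </state>
    </state>
    <state id="gui">
      <state id="prepareGUI"> … </state>
      <state id="endGUI"> … </state>
    </state>
  </parallel>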

The Life Cycle Events

The Interaction Manager exchanges each request/response pair with both the GUI and VUI modality components:

  prepare → prepareResponse
  start   → startResponse
  cancel  → cancelResponse
  pause   → pauseResponse
  resume  → resumeResponse

More Life Cycle Events

  newContextRequest → newContextResponse   (between each modality component and the Interaction Manager)
  data                                     (flows in both directions)
  done                                     (a modality component notifies the Interaction Manager)
  clearContext                             (the Interaction Manager notifies the modality components)

Synchronization Using the Lifecycle Data Event

Intent-based events capture the underlying intent rather than the physical manifestation of user-interaction events, so they are independent of the physical characteristics of particular devices. The Interaction Manager exchanges data events with the GUI and VUI:

  data/reset    Reset one or more field values to null
  data/focus    Focus on another field
  data/change   Field value has changed

Lifecycle Events between Interaction Manager and Modality

[Sequence: the Interaction Manager sends prepare; the modality answers prepareResponse (success or failure). The manager then sends start; the modality answers startResponse (success or failure). During WaitState the two exchange data events, and the modality finally sends done, taking the chart to EndState; failure responses lead to FailState.]

MMI Architecture Principles

    • Runtime Framework communicates with Modality Components through asynchronous events

    • Modality Components don’t communicate directly with each other, but indirectly through the Runtime Framework

    • Components must implement basic life cycle events, may expose other events

    • Modality components can be nested (e.g. a Voice Dialog component like a VoiceXML <form>)

    • Components need not be markup-based

    • EMMA communicates users’ inputs to the Interaction Manager

Modalities

  • GUI Modality (XHTML)
    • An adapter converts lifecycle events to XHTML events, and XHTML events back to lifecycle events
  • Voice Modality (VoiceXML 3.0)
    • Lifecycle events are embedded into VoiceXML 3.0

[Diagram: Interaction Manager (SCXML) and Data Model above the XHTML and VoiceXML 3.0 modality components.]

Exercise 12

    • What should VoiceXML do when it receives each of the following events?

    • Reset

    • Change

    • Focus

Modalities

VoiceXML 3.0 will support lifecycle events:

  <form>
    <catch name="change">
      <assign name="city" value="data"/>
    </catch>

    <field name="city">
      <prompt> Blah </prompt>
      <grammar src="city.grxml"/>
      <filled>
        <send event="data.change" data="city"/>
      </filled>
    </field>
  </form>


Exercise 13

    • What should HTML do when it receives each of the following events?

    • Reset

    • Change

    • Focus

Modalities

XHTML is extended to support lifecycle events sent to the Interaction Manager:

  <head>
    …
    <ev:listener ev:event="onchange" ev:observer="app1" ev:handler="#onChangeHandler"/>
    <script type="text/javascript">
      function onChangeHandler() { post("data", data="city"); }
    </script>
    …
  </head>

  <body id="app1">
    <input type="text" id="city" value=""/>
  </body>


Modalities

XHTML is extended to support lifecycle events sent to a modality:

  <head>
    …
    <handler type="text/javascript" ev:event="data">
      if (event == "change") { document.app1.city.value = data.city; }
    </handler>
    …
  </head>

  <body id="app1">
    <input type="text" id="city" value=""/>
  </body>


References

    • SCXML
      • Second working draft available at http://www.w3.org/TR/2006/WD-scxml-20060124/
      • Open source available from http://jakarta.apache.org/commons/sandbox/scxml/
    • Multimodal Architecture and Interfaces
      • Working draft available at http://www.w3.org/TR/2006/WD-mmi-arch-20060414/
    • Voice Modality
      • First working draft of VoiceXML 3.0 scheduled for November 2007
    • XHTML
      • Full recommendation
      • Adapters must be hand-coded
    • Other modalities
      • TBD


Comparison

                         Object-oriented   X+V           W3C
  Standard languages     SRGS              VoiceXML      SCXML
                         SISR              SRGS          SRGS
                         SSML              SSML          VoiceXML
                                           SISR          SSML
                                           XHTML         SISR
                                                         XHTML
                                                         EMMA
                                                         CCXML
  Interaction manager    C#                XHTML         SCXML
  Modes                  GUI, Speech       GUI, Speech   GUI, Speech, Ink

Availability

    • SAPI 5.3
      • Microsoft Windows Vista®
    • X+V
      • ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003
        http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
      • Opera Software Multimodal Browser for Sharp Zaurus
        http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
      • Opera 9 for Windows
        http://www.opera.com/
    • W3C
      • First working draft of VoiceXML 3.0 not yet available
      • Working drafts of SCXML are available; some open-source implementations are available
    • Proprietary APIs
      • Available from vendor

Discussion Question

    • Should a developer insert SALT tags or X+V modules into an existing Web page without redesigning the Web page?

Conclusion

    • Multimodal applications offer benefits over today’s traditional GUIs.

    • Only use multimodal if there is a clear benefit.

    • Standard languages are available today to develop multimodal applications.

    • Don’t reinvent the wheel.

    • Creativity and lots of usability testing are necessary to create world-class multimodal applications.

Web Resources

    • http://www.w3.org/voice

      • Specification of grammar, semantic interpretation, and speech synthesis languages

    • http://www.w3.org/2002/mmi

      • Specification of EMMA and InkML languages

    • http://www.microsoft.com (and query SALT)

      • SALT specification and download instructions for adding SALT to Internet Explorer

    • http://www-306.ibm.com/software/pervasive/multimodal/

      • X+V specification; download Opera and ACCESS browsers

    • http://www.larson-tech.com/SALT/ReadMeFirst.html

      • Student projects using SALT to develop multimodal applications

    • http://www.larson-tech.com/MMGuide.html or http://www.w3.org/2002/mmi/Group/2006/Guidelines/

      • User interface guidelines for multimodal applications

Status of W3C Multimodal Interface Languages

  Recommendation             VoiceXML 2.0; Speech Recognition Grammar Specification (SRGS) 1.0; Speech Synthesis Markup Language (SSML) 1.0
  Proposed Recommendation    VoiceXML 2.1
  Candidate Recommendation   Semantic Interpretation for Speech Recognition (SISR) 1.0
  Last Call Working Draft    Extensible MultiModal Annotation (EMMA) 1.0
  Working Draft              State Chart XML (SCXML) 1.0
  Requirements               InkML 1.0

Questions?

Answer to Exercise 5

Answer to Exercise 7
Write a grammar for zero to nineteen

    <grammar type="application/srgs+xml" root="zero_to_19" mode="voice">

      <rule id="zero_to_19">
        <one-of>
          <ruleref uri="#single_digit"/>
          <ruleref uri="#teens"/>
        </one-of>
      </rule>

      <rule id="single_digit">
        <one-of>
          <item> zero </item>
          <item> one </item>
          <item> two </item>
          <item> three </item>
          <item> four </item>
          <item> five </item>
          <item> six </item>
          <item> seven </item>
          <item> eight </item>
          <item> nine </item>
        </one-of>
      </rule>

      <rule id="teens">
        <one-of>
          <item> ten </item>
          <item> eleven </item>
          <item> twelve </item>
          <item> thirteen </item>
          <item> fourteen </item>
          <item> fifteen </item>
          <item> sixteen </item>
          <item> seventeen </item>
          <item> eighteen </item>
          <item> nineteen </item>
        </one-of>
      </rule>

    </grammar>

Answer to Exercise 8

    <grammar type="application/srgs+xml" root="yes" mode="voice">
      <rule id="yes">
        <one-of>
          <item> yes </item>
          <item> sure </item>
          <item> affirmative </item>
          …
        </one-of>
      </rule>
    </grammar>

Answer to Exercise 9

    <grammar type="application/srgs+xml" root="yes" mode="voice">
      <rule id="yes">
        <one-of>
          <item> yes </item>
          <item> sure <tag> out = "yes" </tag> </item>
          <item> affirmative <tag> out = "yes" </tag> </item>
          …
        </one-of>
      </rule>
    </grammar>

Answer to Exercise 10

Given the following two EMMA specifications, what is the unified EMMA specification?

    <interpretation mode="speech">
      <moneyTransfer>
        <sourceAcct hook="ink"/>
        <targetAcct hook="ink"/>
        <amount> 300 </amount>
      </moneyTransfer>
    </interpretation>

    <interpretation mode="ink">
      <moneyTransfer>
        <sourceAcct> savings </sourceAcct>
        <targetAcct> checking </targetAcct>
      </moneyTransfer>
    </interpretation>

Unified EMMA specification:

    <interpretation mode="intp1">
      <moneyTransfer>
        <sourceAcct> savings </sourceAcct>
        <targetAcct> checking </targetAcct>
        <amount> 300 </amount>
      </moneyTransfer>
    </interpretation>

Answer to Exercise 11

    <html xmlns="http://www.w3.org/1999/xhtml"
          xmlns:vxml="http://www.w3.org/2001/vxml"
          xmlns:ev="http://www.w3.org/2001/xml-events"
          xmlns:xv="http://www.w3.org/2002/xhtml+voice">

    <head>
      <xv:sync xv:input="in4" xv:field="#answer"/>
      <vxml:form id="stateForm">
        <vxml:field name="state" xv:id="answer">
          <vxml:prompt>Say a state name</vxml:prompt>
          <vxml:grammar src="state.grxml"/>
        </vxml:field>
      </vxml:form>
    </head>

    <body>
      <form ev:event="load" ev:handler="#stateForm">
        Result: <input type="text" name="in4"/>
      </form>
    </body>

    </html>

Answer to Exercise 12

    • What should VoiceXML do when it receives each of the following events?

    • Reset
      • Reset the value
    • Change
      • Change the value
    • Focus
      • Prompt for the value now in focus


    Exercise 131
    Exercise 13 (continued)

    • What should HTML do when it receives each of the following events?
    • Reset
      • Reset the value
      • Author decides if cursor should be moved to the reset value
    • Change
      • Change the value
      • Author decides if cursor should be moved to the changed value
    • Focus
      • Move the cursor to the item in focus

