
Tutorial

Developing and Deploying Multimodal Applications

James A. Larson, Larson Technical Services, jim@larson-tech.com

SpeechTEK West, February 23, 2007
Developing and Deploying Multimodal Applications
  • What applications should be multimodal?
  • What is the multimodal application development process?
  • What standard languages can be used to develop multimodal applications?
  • What standard platforms are available for multimodal applications?

Developing & Delivering Multimodal Applications

Capturing Input from the User

  Medium   | Input Device          | Mode
  Acoustic | Microphone            | Speech
  Tactile  | Keypad, Keyboard      | Key
  Tactile  | Pen                   | Ink
  Tactile  | Mouse, Joystick       | GUI
  Visual   | Scanner, Still camera | Photograph
  Visual   | Video camera          | Movie

Capturing Input from the User (Multimodal)

  Medium     | Input Device                                     | Mode
  Acoustic   | Microphone                                       | Speech
  Tactile    | Keypad, Keyboard                                 | Key
  Tactile    | Pen                                              | Ink
  Tactile    | Mouse, Joystick                                  | GUI
  Visual     | Scanner, Still camera                            | Photograph
  Visual     | Video camera, Gaze tracking, Gesture recognition | Movie
  Electronic | RFID, Biometric, GPS                             | Digital data

Presenting Output to the User

  Medium   | Output Device | Mode
  Acoustic | Speaker       | Speech
  Visual   | Display       | Text, Photograph, Movie
  Tactile  | Joystick      | Pressure

Presenting Output to the User (Multimedia)

  Medium   | Output Device | Mode
  Acoustic | Speaker       | Speech
  Visual   | Display       | Text, Photograph, Movie
  Tactile  | Joystick      | Pressure

Multimodal and Multimedia Application Benefits
  • Provide a natural user interface by using multiple channels for user interactions
  • Simplify interaction with small devices with limited keyboard and display, especially on portable devices
  • Leverage advantages of different modes in different contexts
  • Decrease error rates and time required to perform tasks
  • Increase accessibility of applications for special users
  • Enable new kinds of applications

Exercise 1
  • What new multimodal applications would be useful for your work?
  • What new multimodal applications would be entertaining to you, your family, or friends?

Voice as a “Third Hand”
  • Game Commander 3
    • http://www.gamecommander.com/

Voice-Enabled Games
  • ScanSoft’s VoCon Games Speech SDK
    • http://www.scansoft.com/games/
    • PlayStation® 2
    • Nintendo® GameCube™
    • http://www.omnipage.com/games/poweredby/

Education

Tucker Maxon School of Oral Education

http://www.tmos.org/

Education

Reading Tutor Project

http://cslr.colorado.edu/beginweb/reading/reading.html

Multimodal Applications Developed by PSU and OHSU Students
  • Hands-busy
    • Troubleshooting a car’s motor
    • Repairing a leaky faucet
    • Tuning musical instruments
  • Construction
    • Complex origami artifacts
    • Project book for children
    • Cooking—talking recipe book
  • Entertainment
    • Child’s fairy tale book
    • Audio-controlled jukebox
    • Games (Battleship, Go)

Multimodal Applications Developed by PSU and OHSU Students (continued)
  • Data collection
    • Buy a car
    • Collect health data
    • Buy movie tickets
    • Order meals from a restaurant
    • Conduct banking business
    • Locate a business
    • Order a computer
    • Choose homeless pets from an animal shelter
  • Authoring
    • Photo album tour
  • Education
    • Flash cards—addition tables

Demos: download Opera and the speech plug-in, then go to www.larson-tech.com/mm-Projects/Demos.htm

New Application Classes
  • Active listening
    • Verbal VCR controls: start, stop, fast forward, rewind, etc.
  • Virtual assistants
    • Listen for requests and immediately perform them
    • Violin tuner, TV controller, environmental controller, family-activity coordinator
  • Synthetic experiences
    • Synthetic interviews, speech-enabled games, education and training
  • Authoring content

Two General Uses of Multiple Modes of Input
  • Redundancy—one mode acts as a backup for another mode
    • In noisy environments, use keypad instead of speech input.
    • In cold environments, use speech instead of keypad.
  • Complementary—one mode supplements another mode
    • Voice as a third hand
    • “Move that (point) to there (point)” (late fusion)
    • Lip reading = video + speech (early fusion)

Potential Problems with Multimodal Applications
  • Voice may make an application “noisy.”
    • Privacy and security concerns
    • Noise pollution
  • Speech and handwriting recognition systems sometimes fail.
  • Users may falsely expect full natural language understanding.
    • Full natural language processing (possible only on Star Trek) requires:
      • Knowledge of the outside world
      • A history of the user-computer interaction
      • A sophisticated understanding of language structure
    • “Natural language-like” processing (often incorrectly called “NLP”) simulates natural language for a small domain, a short history, and specialized language structures.

Adding a New Mode to an Application
  • Only if…
    • The new mode enables features not previously possible.
    • The new mode dramatically improves usability.
  • Always…
    • Redesign the application to take advantage of the new mode.
    • Provide a backup for the new mode.
    • Test, test, and test some more.

Exercise 2
  • Where will multimodal applications be used?
  • A. At home
  • B. At work
  • C. “On the road”
  • D. Other?

Developing and Deploying Multimodal Applications
  • What applications should be multimodal?
  • What is the multimodal application development process?
  • What standard languages can be used to develop multimodal applications?
  • What standard platforms are available for multimodal applications?

The Playbill—Who’s Who on the Team 
  • Users—Their lives will be improved by using the multimodal application
  • Interaction designer—Designs the dialog—when and how the user and system interchange requests and information
  • Multimodal programmer—Implements VUI 
  • Voice talent—Records spoken prompts and messages
  • Grammar writer—Specifies words and phrases the user may speak in response to a prompt
  • TTS specialist—Specifies verbal and audio sounds and inflections
  • Quality assurance specialist—Performs tests to validate the application is both useful and usable
  • Customer—Pays the bills
  • Program manager—Organizes the work and makes sure it is completed according to schedule and under budget

Development Process
  • Investigation Stage
  • Design Stage
  • Development Stage
  • Testing Stage
  • Sustaining Stage

Each stage involves users

Iterative refinement

Development Process
  • Investigation Stage — Identify the Application
    • Conduct ethnography studies
    • Identify candidate applications
    • Conduct focus groups
    • Select the application
  • Design Stage
  • Development Stage
  • Testing Stage
  • Sustaining Stage

Exercise 3
  • What will be the “killer” consumer multimodal applications?

Development Process
  • Investigation Stage
  • Design Stage — Specify the Application
    • Construct the conceptual model
    • Construct scenarios
    • Specify performance and preference requirements
  • Development Stage
  • Testing Stage
  • Sustaining Stage

Specify Performance and Preference Requirements

  • Performance — Is the application useful?
    • Measure what the users actually accomplished.
    • Validate that the users achieved success.
  • Preference — Is the application enjoyable?
    • Measure users’ likes and dislikes.
    • Validate that the users enjoyed the application and will use it again.

Performance Metrics

  User Task                              | Measure                                             | Typical Criteria
  Speak a command                        | Word error rate                                     | Less than 3%
  The caller supplies values into a form | Enters valid values into each field of the form     | Less than 5 seconds per value
  Navigate a list                        | The user successfully selects the specified option  | Greater than 95%
  Purchase a product                     | The user successfully completes the purchase option | Greater than 93%

Exercise 4

Specify performance metrics (user task, measure, typical criteria) for the multimodal email application.

Preference Metrics

  Question                                                             | Typical Criteria
  On a scale from 1 to 10, rate the help facility.                     | The average caller score is greater than 8.
  On a scale from 1 to 10, rate the ease of use of this application.   | The average caller score is greater than 8.
  Would you recommend this voice portal to a friend?                   | Over 80% of callers respond “yes.”
  What would you be willing to pay each time you use this application? | Over 80% of callers are willing to pay $1.00 or more per use.

Exercise 5

Specify preference metrics (question, typical criteria) for the multimodal email application.

Preference Metrics (Open-ended Questions)
  • What did you like the best about this voice-enabled application? (Do not change these features.)
  • What did you like the least about this voice-enabled application? (Consider changing these features.)
  • What new features would you like to have added? (Consider adding these features in this or a later release.)
  • What features do you think you will never use? (Consider deleting these features.)
  • Do you have any other comments and suggestions? (Pay attention to these responses. Callers frequently suggest very useful ideas.)

Development Process
  • Investigation Stage
  • Design Stage
  • Development Stage — Develop the Application
    • Specify the persona
    • Specify the modes and modalities
    • Specify the dialog script
  • Testing Stage
  • Sustaining Stage

UI Design Guidelines
  • Guidelines for Voice User Interfaces
    • Bruce Balentine and David P. Morgan. How to Build a Speech Recognition Application, Second Edition. http://www.eiginc.com
  • Guidelines for Graphical User Interfaces
    • Research-Based Web Design and Usability Guidelines. U.S. Department of Health and Human Services. http://www.usability.gov/pdfs/guidelines.html
  • Guidelines for Multimodal User Interfaces
    • Common Sense Guidelines for Developing Multimodal User Interfaces. W3C Working Group Note, 19 April 2006. http://www.w3.org/2002/mmi/Group/2006/Guidelines/

Common-sense Suggestions 1: Satisfy Real-World Constraints
  • Task-oriented Guidelines
  • 1.1. Guideline: For each task, use the easiest mode available on the device.
  • Physical Guidelines
  • 1.2. Guideline: If the user’s hands are busy, then use speech.
  • 1.3. Guideline: If the user’s eyes are busy, then use speech.
  • 1.4. Guideline: If the user may be walking, use speech for input.
  • Environmental Guidelines
  • 1.5. Guideline: If the user may be in a noisy environment, then use a pen, keys or mouse.
  • 1.6. Guideline: If the user’s manual dexterity may be impaired, then use speech.

Exercise 6
  • What input mode(s) should be used for each of the following tasks?
  • A. Selecting objects
  • B. Entering text
  • C. Entering symbols
  • D. Entering sketches or illustrations

Common-sense Suggestions 2: Communicate Clearly, Concisely, and Consistently with Users
  • Consistency Guidelines
  • 2.1. Phrase all prompts consistently.
  • 2.2. Enable the user to speak keyword utterances rather than natural language sentences.
  • 2.3. Switch presentation modes only when the information is not easily presented in the current mode.
  • 2.4. Make commands consistent.
  • 2.5. Make the focus consistent across modes.
  • Organizational Guidelines
  • 2.6. Use audio to indicate the verbal structure.
  • 2.7. Use pauses to divide information into natural “chunks.”
  • 2.8. Use animation and sound to show transitions.
  • 2.9. Use voice navigation to reduce the number of screens.
  • 2.10. Synchronize multiple modalities appropriately.
  • 2.11. Keep the user interface as simple as possible.

Common-sense Suggestions 3: Help Users Recover Quickly and Efficiently from Errors
  • Conversational Guidelines
  • 3.1. Users tend to use the same mode that was used to prompt them.
  • 3.2. If privacy is not a concern, use speech as output to provide commentary or help.
  • 3.3. Use directed user interfaces, unless the user is always knowledgeable and experienced in the domain.
  • 3.4 Always provide context-sensitive help for every field and command.

Common-sense Suggestions 3: Help Users Recover Quickly and Efficiently from Errors (Continued)
  • Reliability Guidelines
  • Operational status
  • 3.5. The user always should be able to determine easily if the device is listening to the user.
  • 3.6. For devices with batteries, users always should be able to determine easily how much longer the device will be operational.
  • 3.8. Support at least two input modes so one input mode can be used when the other cannot.
  • Visual feedback
  • 3.8. Present words recognized by the speech recognition system on the display, so the user can verify they are correct.
  • 3.9. Display the n-best list to enable easy speech recognition error correction
  • 3.10. Try to keep response times less than 5 seconds. Inform the user of longer response times.

Common-sense Suggestions 4: Make Users Comfortable
  • Listening mode
  • 4.1. Speak after pressing a speak key, which automatically releases after the user finishes speaking.
  • System Status
  • 4.2. Always present the current system status to the user.
  • Human-memory Constraints
  • 4.3. Use the screen to ease stress on the user’s short-term memory.

Common-sense Suggestions 4: Make Users Comfortable (Continued)
  • Social Guidelines
  • 4.4. If the user may need privacy, use a display rather than render speech.
  • 4.5. If the user may need privacy, use a pen or keys.
  • 4.6. If the device may be used during a business meeting, then use a pen or keys (with the keyboard sounds turned off).
  • Advertising Guidelines
  • 4.7. Use animation and sound to attract the user’s attention.
  • 4.8. Use landmarks to help the user know where he or she is.

Common-sense Suggestions 4: Make Users Comfortable (Continued)
  • Ambience
  • 4.9 Use audio and graphic design to set the mood and convey emotion in games and entertainment applications.
  • Accessibility
  • 4.10 For each traditional output technique, provide an alternative output technique.
  • 4.11. Enable users to adjust the output presentation.

Books
  • Ramon Lopez-Cozar Delgado and Masahiro Araki. Spoken, Multilingual and Multimodal Dialog Systems—Development and Assessment. West Sussex, England: Wiley, 2005.
  • Julie A. Jacko and Andrew Sears (Editors). The Human-Computer Interaction Handbook—Fundamentals, Evolving Technologies, and Emerging Applications. Mahwah, New Jersey: Lawrence Erlbaum Associates, 2003.

Development Process
  • Investigation Stage
  • Design Stage
  • Development Stage
  • Testing Stage — Test the Application
    • Component test
    • Usability test
    • Stress test
    • Field test
  • Sustaining Stage

Testing Resources
  • Jeffrey Rubin. Handbook of Usability Testing. New York: Wiley Technical Communication Library, 1994.
  • Peter and David Leppik. Gourmet Customer Service. Eden Prairie, MN: VocalLabs, 2005. [email protected]

Development Process
  • Investigation Stage
  • Design Stage
  • Development Stage
  • Testing Stage
  • Sustaining Stage — Deploy and Monitor the Application
    • User surveys
    • Usage reports from log files
    • User feedback and comments

Developing and Deploying Multimodal Applications
  • What applications should be multimodal?
  • What is the multimodal application development process?
  • What standard languages can be used to develop multimodal applications?
  • What standard platforms are available for multimodal applications?

W3C Multimodal Interaction Framework
  • Recognition grammars (SRGS)
  • Semantic interpretation (SISR)
  • Extensible MultiModal Annotation (EMMA)
  • Speech synthesis (SSML)
  • Interaction managers

General description of speech application components and how they relate

W3C Multimodal Interaction Framework

[Diagram: Input, Output, Telephony, and Properties components connected to an Interaction Manager and Application Functions.]


W3C Multimodal Interaction Framework

[Diagram: the user’s speech is processed by ASR and Semantic Interpretation, and ink by an Ink component; the results pass through Information Integration to the Interaction Manager, which invokes Application Functions. Output flows from the Interaction Manager through Media Planning and Language Generation to TTS, Display, and Audio components; Telephony Functions connect the user.]


W3C Multimodal Interaction Framework

SRGS: Describes what the user may say at each point in the dialog


Speech Recognition Engines

  Feature                        | Low-end             | High-end            | Other
  Speaking mode                  | Isolated (discrete) | Continuous          | Keywords
  Enrollment                     | Speaker dependent   | Speaker independent | Adaptive
  Vocabulary size                | Small               | Large               | Switch vocabularies
  Speaking style                 | Read                | Spontaneous         |
  Number of simultaneous callers | Single-threaded     | Multi-threaded      |


Grammars
  • Describe what the user may say or handwrite at a point in the dialog
  • Enable the recognition engine to work faster and more accurately
  • Two types of grammars:
      • Structured Grammar
      • Statistical Grammar (N-grams)

Structured Grammars
  • Specify the words and phrases that a user may speak or write
  • Two representation formats:

1. Augmented Backus-Naur Form (ABNF)

   Production rules:

   Single_digit ::= zero | one | two | … | nine
   Zero_thru_ten ::= Single_digit | ten

2. XML format, which can be processed by an XML validator
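As a sketch, the production rules above can also be written in SRGS’s ABNF syntax (the language declaration is illustrative):

```
#ABNF 1.0;
language en-US;
root $zero_thru_ten;

// Each ABNF rule corresponds to one production rule above
$single_digit = zero | one | two | three | four
              | five | six | seven | eight | nine;
$zero_thru_ten = $single_digit | ten;
```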

Example XML Grammar
<grammar mode = "voice" type = "application/srgs+xml" root = "zero_to_ten">
  <rule id = "zero_to_ten">
    <one-of>
      <ruleref uri = "#single_digit"/>
      <item> ten </item>
    </one-of>
  </rule>

  <rule id = "single_digit">
    <one-of>
      <item> zero </item>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>

Exercise 7
  • Write a grammar that recognizes the digits zero through nineteen
  • (Hint: Modify the previous page)
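One possible answer (a sketch — it keeps the single_digit rule from the example and adds the teens; the root and rule names are illustrative):

```
<grammar mode = "voice" type = "application/srgs+xml" root = "zero_to_nineteen">
  <rule id = "zero_to_nineteen">
    <one-of>
      <ruleref uri = "#single_digit"/>
      <ruleref uri = "#ten_to_nineteen"/>
    </one-of>
  </rule>

  <rule id = "ten_to_nineteen">
    <one-of>
      <item> ten </item>
      <item> eleven </item>
      <item> twelve </item>
      <item> thirteen </item>
      <item> fourteen </item>
      <item> fifteen </item>
      <item> sixteen </item>
      <item> seventeen </item>
      <item> eighteen </item>
      <item> nineteen </item>
    </one-of>
  </rule>

  <!-- single_digit: zero through nine, exactly as in the example grammar -->
  <rule id = "single_digit">
    <one-of>
      <item> zero </item>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>
```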

Reusing Existing Grammars
<grammar type = "application/srgs+xml"
         root = "size"
         src = "http://www.example.com/size.grxml"/>
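For context, a VoiceXML form might reference such an external grammar (a sketch; the form and field names and the prompt text are illustrative):

```
<form id = "order">
  <field name = "size">
    <!-- Reuse the externally hosted size grammar -->
    <grammar type = "application/srgs+xml"
             src = "http://www.example.com/size.grxml"/>
    <prompt> What size would you like? </prompt>
  </field>
</form>
```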

Exercise 8
  • Write a grammar for positive responses to a yes/no question (i.e., “yes,” “sure,” “affirmative,” and so forth)
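A possible starting point (a sketch; the word list would be tuned during usability testing):

```
<grammar mode = "voice" type = "application/srgs+xml" root = "yes">
  <rule id = "yes">
    <one-of>
      <item> yes </item>
      <item> sure </item>
      <item> affirmative </item>
      <item> yeah </item>
      <item> correct </item>
      <item> right </item>
      <item> ok </item>
    </one-of>
  </rule>
</grammar>
```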

When Is a Grammar Too Large?

[Figure: tradeoff between word coverage and response — as a grammar covers more words, recognition response degrades.]


W3C Multimodal Interaction Framework

SISR: A procedural JavaScript-like language for interpreting the text strings returned by the speech recognition engine


Semantic Interpretation
  • Semantic interpretation scripts employ ECMAScript
  • Advantages:
      • Translate aliases to vocabulary words
      • Perform calculations
      • Produce a rich structure rather than a text string


[Diagram: the user says “Big white t-shirt”; the recognizer, constrained by the grammar, passes the recognition result to the conversation manager, which receives “Large white t-shirt.”]

Semantic Interpretation

[Diagram: the user says “Big white t-shirt”; the recognizer passes the text to the semantic interpretation processor, which applies a grammar with semantic interpretation scripts and delivers { size: large, color: white } to the conversation manager.]

Grammar with semantic interpretation scripts:

<rule id = "action">
  <one-of>
    <item> small <tag> out.size = "small"; </tag> </item>
    <item> medium <tag> out.size = "medium"; </tag> </item>
    <item> large <tag> out.size = "large"; </tag> </item>
    <item> big <tag> out.size = "large"; </tag> </item>
  </one-of>
  <one-of>
    <item> green <tag> out.color = "green"; </tag> </item>
    <item> blue <tag> out.color = "blue"; </tag> </item>
    <item> white <tag> out.color = "white"; </tag> </item>
  </one-of>
</rule>

Exercise 9: Modify this rule to return only “yes”

<grammar type = "application/srgs+xml" root = "yes" mode = "voice">
  <rule id = "yes">
    <one-of>
      <item> yes </item>
      <item> sure </item>
      <item> affirmative </item>
      …
    </one-of>
  </rule>
</grammar>
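A possible solution (a sketch) attaches a semantic interpretation script after the one-of, so every synonym returns the same value:

```
<grammar type = "application/srgs+xml" root = "yes" mode = "voice">
  <rule id = "yes">
    <one-of>
      <item> yes </item>
      <item> sure </item>
      <item> affirmative </item>
    </one-of>
    <!-- Whatever synonym matched, return "yes" -->
    <tag> out = "yes"; </tag>
  </rule>
</grammar>
```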


W3C Multimodal Interaction Framework

EMMA: A language for representing the semantic content from speech recognizers, handwriting recognizers, and other input devices


EMMA
  • Extensible MultiModal Annotation markup language
  • Provides a canonical structure for semantic interpretations of a variety of inputs, including:
    • Speech
    • Natural language text
    • GUI
    • Ink

EMMA

[Diagram: speech input passes through speech recognition (driven by a grammar with semantic interpretation instructions), and keyboard input passes through keyboard interpretation (driven by interpretation instructions); each produces an EMMA document. The two EMMA documents are merged/unified, and the unified EMMA document is passed to applications.]


EMMA

[Diagram: the speech and ink interpretations below are merged/unified into the combined interpretation, which is passed to applications.]

Speech interpretation:

<interpretation mode = "speech">
  <travel>
    <to hook="ink"/>
    <from hook="ink"/>
    <day> Tuesday </day>
  </travel>
</interpretation>

Ink interpretation:

<interpretation mode = "ink">
  <travel>
    <to> Las Vegas </to>
    <from> Portland </from>
  </travel>
</interpretation>

Unified interpretation:

<interpretation mode = "interp1">
  <travel>
    <to> Las Vegas </to>
    <from> Portland </from>
    <day> Tuesday </day>
  </travel>
</interpretation>

Exercise 10

Given the following two EMMA specifications, what is the unified EMMA specification?

<interpretation mode = "speech">
  <moneyTransfer>
    <sourceAcct hook="ink"/>
    <targetAcct hook="ink"/>
    <amount> 300 </amount>
  </moneyTransfer>
</interpretation>

<interpretation mode = "ink">
  <moneyTransfer>
    <sourceAcct> savings </sourceAcct>
    <targetAcct> checking </targetAcct>
  </moneyTransfer>
</interpretation>

Unified EMMA specification:

<interpretation mode = "intp1">
  <moneyTransfer>
    <sourceAcct> ______ </sourceAcct>
    <targetAcct> _______ </targetAcct>
    <amount> ______ </amount>
  </moneyTransfer>
</interpretation>


W3C Multimodal Interaction Framework

SSML: A language for rendering text as synthesized speech


Speech Synthesis Markup Language

Processing pipeline, with SSML markup support at each stage:

  • Structure analysis — Markup: paragraph, sentence. Non-markup behavior: infer structure by automated text analysis.
  • Text normalization — Markup: say-as for dates, times, etc. Non-markup behavior: automatically identify and convert constructs.
  • Text-to-phoneme conversion — Markup: phoneme, say-as. Non-markup behavior: look up words in a pronunciation dictionary.
  • Prosody analysis — Markup: emphasis, break, prosody. Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax.
  • Waveform production

Speech Synthesis Markup LanguageExamples
  • <phoneme alphabet="ipa" ph="wɪnɛfɛks"> WinFX </phoneme> is a great platform.

  • <prosody pitch = "x-low"> Who’s been sleeping in my bed? </prosody> said papa bear.
    <prosody pitch = "medium"> Who’s been sleeping in my bed? </prosody> said momma bear.
    <prosody pitch = "x-high"> Who’s been sleeping in my bed? </prosody> said baby bear.
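These elements can be combined in a complete SSML document (a sketch; the prompt wording and date value are illustrative):

```
<?xml version="1.0"?>
<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <p>
    <!-- say-as normalizes the date before synthesis -->
    <s>Your flight leaves on
       <say-as interpret-as="date" format="mdy">2/23/2007</say-as>.
    </s>
    <s>
      <emphasis>Please arrive early.</emphasis>
      <break time="500ms"/>
      Thank you.
    </s>
  </p>
</speak>
```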

Developing & Delivering Multimodal Applications

popular strategy
Popular Strategy
  • Develop dialogs using SSML
  • Usability test dialogs
  • Extract prompts
  • Hire voice talent to record prompts
  • Replace <prompt> with <audio>
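The last step can be sketched in VoiceXML as follows; the recorded file name is illustrative, and the text inside <audio> serves as a fallback if the recording is unavailable:

```xml
<prompt>
  <audio src="prompts/welcome.wav">
    Welcome to the account transfer service.
  </audio>
</prompt>
```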

Developing & Delivering Multimodal Applications

slide79

W3C Multimodal Interaction Framework

VXML: A language for controlling the exchange of information and commands between the user and the system

[Diagram: W3C Multimodal Interaction Framework, as above, with the Interaction Manager highlighted.]

Developing & Delivering Multimodal Applications

developing and deploying multimodal applications3
Developing and Deploying Multimodal Applications
  • What applications should be multimodal?
  • What is the multimodal application development process?
  • What standard languages can be used to develop multimodal applications?
  • What standard platforms are available for multimodal applications?

Developing & Delivering Multimodal Applications

speech apis and sdks
Speech APIs and SDKs
  • JSAPI—Java Speech Application Program Interface
    • http://java.sun.com/products/java-media/speech/
  • Nuance Mobile Speech Platform
    • http://www.nuance.com/speechplatform/components.asp
  • VSAPI—Voice Signal API
    • http://www.voicesignal.com/news/articles/2006-06-21-SymbianOne.htm
  • SALT
    • http://www.saltforum.org/

Developing & Delivering Multimodal Applications

interaction manager approaches
Interaction Manager Approaches

Three approaches to building the interaction manager:

  • Object-oriented: interaction manager written in C#, on top of SAPI 5.3
  • X+V: interaction manager written in XHTML, with VoiceXML 2.0 modules
  • W3C: interaction manager written in SCXML, coordinating XHTML, VoiceXML 3.0, and InkML modality components

Developing & Delivering Multimodal Applications

interaction manager approaches1
Interaction Manager Approaches

[Diagram repeated: the object-oriented approach (interaction manager in C# over SAPI 5.3) is discussed next.]

Developing & Delivering Multimodal Applications

sapi 5 3 windows vista speech synthesis
SAPI 5.3 & Windows Vista™ Speech Synthesis
  • W3C Speech Synthesis Markup Language 1.0
  • <speak> <phoneme alphabet="ipa" ph="wɪnɛfɛks"> WinFX </phoneme> is a great platform </speak>
  • Microsoft proprietary PromptBuilder
  • myPrompt.AppendTextWithPronunciation ("WinFX", "wɪnɛfɛks"); myPrompt.AppendText("is a great platform.");


Developing & Delivering Multimodal Applications

sapi 5 3 windows vista speech recognition
SAPI 5.3 & Windows Vista™ Speech Recognition
  • W3C Speech Recognition Grammar Specification 1.0
  • <grammar type="application/srgs+xml" root= "city" mode="voice"> <rule id = "city"> <one-of> <item> New York City </item> <item> New York </item> <item> Boston </item> </one-of> </rule> </grammar>
  • Microsoft proprietary Grammar Builder
  • Choices cityChoices = new Choices(); cityChoices.AddPhrase ("New York City"); cityChoices.AddPhrase ("New York"); cityChoices.AddPhrase ("Boston"); Grammar cityGrammar = new Grammar (new GrammarBuilder(cityChoices));

Developing & Delivering Multimodal Applications

sapi 5 3 windows vista semantic interpretation
SAPI 5.3 & Windows Vista™ Semantic Interpretation
  • Augment SRGS grammar with Jscript® for semantic interpretation
  • <grammar type="application/srgs+xml" root= "city" mode="voice"> <rule id = "city"> <one-of> <item> New York City <tag> city="JFK" </tag></item> <item> New York <tag> city = "JFK" </tag> </item> <item> Portland <tag> city = "PDX" </tag></item> </one-of> </rule> </grammar>
  • User-specified “shortcuts”: the recognizer replaces a shortcut word with its expanded string
  • User says: my address
  • System: 1033 Smith Street, Apt. 7C, Bloggsville 00000

Developing & Delivering Multimodal Applications

sapi 5 3 windows vista dialog
SAPI 5.3 & Windows Vista™ Dialog
  • Import the System.Speech.Recognition namespace
  • Instantiate a SpeechRecognizer object
  • Build a grammar
  • Attach an event handler
  • Load the grammar into the recognizer
  • When the recognizer hears something that fits the grammar, the SpeechRecognized event handler is invoked, which accesses the Result object and works with the recognized text

Developing & Delivering Multimodal Applications

sapi 5 3 windows vista dialog1
SAPI 5.3 & Windows Vista™ Dialog

    using System;
    using System.Windows.Forms;
    using System.ComponentModel;
    using System.Collections.Generic;
    using System.Speech.Recognition;

    namespace Reco_Sample_1
    {
        public partial class Form1 : Form
        {
            // Create a recognizer
            SpeechRecognizer _recognizer = new SpeechRecognizer();

            public Form1() { InitializeComponent(); }

            private void Form1_Load(object sender, EventArgs e)
            {
                // Create a pizza grammar
                Choices pizzaChoices = new Choices();
                pizzaChoices.AddPhrase("I'd like a cheese pizza");
                pizzaChoices.AddPhrase("I'd like a pepperoni pizza");
                pizzaChoices.AddPhrase("I'd like a large pepperoni pizza");
                pizzaChoices.AddPhrase("I'd like a small thin crust vegetarian pizza");
                Grammar pizzaGrammar =
                    new Grammar(new GrammarBuilder(pizzaChoices));

                // Attach an event handler
                pizzaGrammar.SpeechRecognized +=
                    new EventHandler<RecognitionEventArgs>(
                        PizzaGrammar_SpeechRecognized);

                // Load the grammar into the recognizer
                _recognizer.LoadGrammar(pizzaGrammar);
            }

            void PizzaGrammar_SpeechRecognized(
                object sender, RecognitionEventArgs e)
            {
                MessageBox.Show(e.Result.Text);
            }
        }
    }

Developing & Delivering Multimodal Applications

sapi 5 3 windows vista references
SAPI 5.3 & Windows Vista™ References
  • Speech API Overview
  • http://msdn2.microsoft.com/en-us/library/ms720151.aspx#API_Speech_Recognition
  • Microsoft Speech API (SAPI) 5.3
  • http://msdn2.microsoft.com/en-us/library/ms723627.aspx
  • “Exploring New Speech Recognition And Synthesis APIs In Windows Vista” by Robert Brown
  • http://msdn.microsoft.com/msdnmag/issues/06/01/speechinWindowsVista/default.aspx#Resources

Developing & Delivering Multimodal Applications

interaction manager approaches2
Interaction Manager Approaches

[Diagram repeated: the X+V approach (interaction manager in XHTML with VoiceXML 2.0 modules) is discussed next.]

Developing & Delivering Multimodal Applications

step 1 start with standard voicexml and standard xhtml
Step 1: Start with Standard VoiceXML and Standard XHTML
  • VoiceXML
  • <form id="topform"> <field name="city"> <prompt>Say a name</prompt> <grammar src="city.grxml"/> </field> </form>
  • XHTML
  • <form> Result: <input type="text" name="in1"/> </form>

(The grammar file city.grxml is written in the W3C grammar language, SRGS.)

Developing & Delivering Multimodal Applications

step 2 combine
Step 2: Combine
  • <html xmlns="http://www.w3.org/1999/xhtml">
  • <head><form id="topform"> <field name="city"> <prompt>Say a name</prompt> <grammar src="city.grxml"/> </field></form></head>
  • <body> <form> Result: <input type="text" name="in1"/> </form></body>
  • </html>

Developing & Delivering Multimodal Applications

step 3 insert vxml namespace
Step 3: Insert vxml Namespace
  • <html xmlns="http://www.w3.org/1999/xhtml"
  • xmlns:vxml="http://www.w3.org/2001/vxml">
  • <head> <vxml:form id="topform"> <vxml:field name="city"> <vxml:prompt>Say a name</vxml:prompt> <vxml:grammar src="city.grxml"/> </vxml:field> </vxml:form></head>
  • <body> <form> Result: <input type="text" name="in1"/> </form></body>
  • </html>

Developing & Delivering Multimodal Applications

step 4 insert event
Step 4: Insert event
  • <html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/vxml" xmlns:ev="http://www.w3.org/2001/xml-events">
  • <head> <vxml:form id="topform"> <vxml:field name="city"> <vxml:prompt>Say a name</vxml:prompt> <vxml:grammar src="city.grxml"/> </vxml:field> </vxml:form></head>
  • <body> <form ev:event="load" ev:handler="#topform"> Result: <input type="text" name="in1"/> </form></body>
  • </html>

Developing & Delivering Multimodal Applications

step 5 insert sync
Step 5: Insert <sync>
  • <html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/vxml" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:xv="http://www.w3.org/2002/xhtml+voice">
  • <head> <xv:sync xv:input="in1" xv:field="#result"/> <vxml:form id="topform"> <vxml:field name="city" xv:id="result"> <vxml:prompt>Say a name</vxml:prompt> <vxml:grammar src="city.grxml"/> </vxml:field> </vxml:form></head>
  • <body> <form ev:event="load" ev:handler="#topform"> Result: <input type="text" name="in1"/> </form></body>
  • </html>

Developing & Delivering Multimodal Applications

xhtml plus voice x v references
XHTML plus Voice (X+V) References
  • Available on
      • ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003

http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb

      • Opera Software Multimodal Browser for Sharp Zaurus http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
      • Opera 9 for Windows http://www.opera.com/
  • Programmers Guide
      • ftp://ftp.software.ibm.com/software/pervasive/info/multimodal/XHTML_voice_programmers_guide.pdf
  • For a variety of small illustrative applications
      • http://www.larson-tech.com/MM-Projects/Demos.htm

Developing & Delivering Multimodal Applications

exercise 11
Exercise 11
  • Specify the X+V notation for integrating the following VoiceXML and XHTML code by completing the code on the next page
  • VoiceXML
  • <form id="stateForm"> <field name="state"> <prompt>Say a state name</prompt> <grammar src="city.grxml"/> </field> </form>
  • XHTML
  • <form> Result: <input type="text" name="in1"/> </form>

Developing & Delivering Multimodal Applications

exercise 11 continued
Exercise 11 (continued)
  • <html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/vxml" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:xv="http://www.w3.org/2002/xhtml+voice">
  • <head> <xv:sync xv:input="_______" xv:field="________"/> <vxml:form id="________"> <vxml:field name="state" xv:id="________"> <vxml:prompt>Say a state name</vxml:prompt> <vxml:grammar src="state.grxml"/> </vxml:field> </vxml:form></head>
  • <body> <form ev:event="load" ev:handler="#________"> Result: <input type="text" name="_______"/> </form></body>
  • </html>

Developing & Delivering Multimodal Applications

interaction manager approaches3
Interaction Manager Approaches

[Diagram repeated: the W3C approach (interaction manager in SCXML with XHTML, VoiceXML 3.0, and InkML modality components) is discussed next.]

Developing & Delivering Multimodal Applications

mmi architecture 4 basic components
MMI Architecture—4 Basic Components
  • Runtime Framework or Browser— initializes application and interprets the markup
  • Interaction Manager—coordinates modality components and provides application flow
  • Modality Components—provide modality capabilities such as speech, pen, keyboard, mouse
  • Data Model—handles shared data

[Diagram: Interaction Manager (SCXML) and Data Model coordinating XHTML, VoiceXML 3.0, and InkML modality components.]

Developing & Delivering Multimodal Applications

multimodal architecture and interfaces
Multimodal Architecture and Interfaces
  • A loosely-coupled, event-based architecture for integrating multiple modalities into applications
  • All communication is event-based
  • Based on a set of standard life-cycle events
  • Components can also expose other events as required
  • Encapsulation protects component data
  • Encapsulation enhances extensibility to new modalities
  • Can be used outside a Web environment


Developing & Delivering Multimodal Applications

specify interaction manager using harel state charts
Specify Interaction Manager Using Harel State Charts

  • Extension of state transition systems
    • States
    • Transitions
    • Nested state-transition systems
  • Parallel state-transition systems
  • History

[Diagram: example state chart. PrepareState moves on PrepareResponse (success) to StartState and on PrepareResponse (fail) to FailState. StartState moves on StartResponse to WaitState and on StartFail to FailState. WaitState moves on DoneSuccess to EndState and on DoneFail to FailState.]

Developing & Delivering Multimodal Applications

example state transition system
State Chart XML (SCXML)

<state id="PrepareState">

<send event="prepare" contentURL="hello.vxml"/>

<transition event="prepareResponse" cond="status='success'" target="StartState"/>

<transition event="prepareResponse" cond="status='failure'" target="FailState"/>

</state>

Example State Transition System

[Diagram: the same example state chart as on the previous slide, with PrepareState highlighted.]

Developing & Delivering Multimodal Applications

example state chart with parallel states
Example State Chart with Parallel States

[Diagram: two parallel state machines, one for the voice modality and one for the GUI modality. Each runs the lifecycle of the previous chart: PrepareVoice/PrepareGUI, then on PrepareResponse (success) StartVoice/StartGUI, then WaitVoice/WaitGUI, then on Done (success) EndVoice/EndGUI; PrepareResponse (fail), StartFail, and DoneFail lead to the corresponding fail states.]
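The parallel chart above might be written in SCXML using a <parallel> element. This is a hedged sketch against the 2006 working draft; the state and event names follow the diagram, and most states and failure handling are omitted:

```xml
<parallel id="modalities">
  <state id="Voice" initial="PrepareVoice">
    <state id="PrepareVoice">
      <transition event="prepareResponse" cond="status='success'" target="StartVoice"/>
      <transition event="prepareResponse" cond="status='failure'" target="FailVoice"/>
    </state>
    <state id="StartVoice"/>
    <state id="FailVoice"/>
  </state>
  <state id="GUI" initial="PrepareGUI">
    <state id="PrepareGUI">
      <transition event="prepareResponse" cond="status='success'" target="StartGUI"/>
      <transition event="prepareResponse" cond="status='failure'" target="FailGUI"/>
    </state>
    <state id="StartGUI"/>
    <state id="FailGUI"/>
  </state>
</parallel>
```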

Developing & Delivering Multimodal Applications

the life cycle events

The Life Cycle Events

Each life-cycle event is sent by the Interaction Manager to the modality components (GUI, VUI), and each component replies with the matching response event:

  • prepare / prepareResponse
  • start / startResponse
  • cancel / cancelResponse
  • pause / pauseResponse
  • resume / resumeResponse
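Each request/response pair can be handled in the Interaction Manager with SCXML transitions, in the same style as the PrepareState example shown earlier (the document URL and state names are illustrative):

```xml
<state id="StartState">
  <send event="start" contentURL="hello.vxml"/>
  <transition event="startResponse" cond="status='success'" target="WaitState"/>
  <transition event="startResponse" cond="status='failure'" target="FailState"/>
</state>
```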

Developing & Delivering Multimodal Applications

more life cycle events
More Life Cycle Events

  • newContextRequest / newContextResponse: a modality component requests a new interaction context from the Interaction Manager
  • data: carries data between the Interaction Manager and the modality components, in either direction
  • done: a modality component notifies the Interaction Manager that it has finished
  • clearContext: the Interaction Manager instructs the modality components to discard the current context

Developing & Delivering Multimodal Applications

synchronization using the lifecycle data event
Synchronization Using the Lifecycle Data Event

Intent-based events capture the underlying intent rather than the physical manifestation of user-interaction events, so they are independent of the physical characteristics of particular devices:

  • data/reset: reset one or more field values to null
  • data/focus: focus on another field
  • data/change: a field value has changed

[Diagram: the Interaction Manager exchanges data events with the GUI and VUI components.]
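A data/change notification from the VUI to the Interaction Manager might look roughly like this; the element and attribute names are illustrative, not taken from any specification:

```xml
<data source="vui" target="interactionManager" type="change">
  <field name="city" value="Boston"/>
</data>
```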

Developing & Delivering Multimodal Applications

lifecycle events between interaction manager and modality
Lifecycle Events between Interaction Manager and Modality

[Diagram: the Interaction Manager's state chart annotated with lifecycle events. Entering PrepareState sends prepare; a prepareResponse (success) leads to StartState and a prepareResponse (failure) to FailState. StartState sends start; a startResponse (success) leads to WaitState and a startResponse (failure) to FailState. Data events are exchanged in WaitState; done leads to EndState, and DoneFail to FailState.]

Developing & Delivering Multimodal Applications

mmi architecture principles
MMI Architecture Principles
  • Runtime Framework communicates with Modality Components through asynchronous events
  • Modality Components don’t communicate directly with each other, but indirectly through the Runtime Framework
  • Components must implement basic life cycle events, may expose other events
  • Modality components can be nested (e.g. a Voice Dialog component like a VoiceXML <form>)
  • Components need not be markup-based
  • EMMA communicates users’ inputs to the Interaction Manager
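For example, a voice modality component might report a recognized city to the Interaction Manager with an EMMA document along these lines (the application-specific element <city> and the confidence value are illustrative):

```xml
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="voice" emma:confidence="0.85">
    <city>Boston</city>
  </emma:interpretation>
</emma:emma>
```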

Developing & Delivering Multimodal Applications

modalities
Modalities

  • GUI Modality (XHTML)
      • An adapter converts lifecycle events to XHTML events, and XHTML events back into lifecycle events
  • Voice Modality (VoiceXML 3.0)
      • Lifecycle events are embedded into VoiceXML 3.0

[Diagram: Interaction Manager (SCXML) and Data Model coordinating the XHTML and VoiceXML 3.0 modality components.]

Developing & Delivering Multimodal Applications

exercise 12
Exercise 12
  • What should VoiceXML do when it receives each of the following events?
  • Reset
  • Change
  • Focus

Developing & Delivering Multimodal Applications

modalities1
VoiceXML 3.0 will support lifecycle events.

<form> <catch name="change"> <assign name="city" value="data"/> </catch>

<field name = "city"> <prompt> Blah </prompt> <grammar src="city.grxml"/> <filled><send event="data.change" data="city"/> </filled> </field>

</form>


Developing & Delivering Multimodal Applications

exercise 13
Exercise 13
  • What should HTML do when it receives each of the following events?
  • Reset
  • Change
  • Focus

Developing & Delivering Multimodal Applications

modalities2
XHTML is extended to support lifecycle events sent to a modality.

<head> … <ev:listener ev:event="onChange" ev:observer="app1" ev:handler="#onChangeHandler"/> … <script> function onChangeHandler() { post("data", data = "city"); } </script> </head>

<body id="app1"> <input type="text" id="city" value=""/> </body>


Developing & Delivering Multimodal Applications

modalities3
XHTML is extended to support lifecycle events sent to the interaction manager.

<head> … <handler type="text/javascript" ev:event="data"> if (event == "change") { document.app1.city.value = data.city; } </handler> … </head>

<body id="app1"> <input type="text" id="city" value=""/>

</body> …


Developing & Delivering Multimodal Applications

references
References
    • SCXML
      • Second working draft available at http://www.w3.org/TR/2006/WD-scxml-20060124/
      • Open Source available from http://jakarta.apache.org/commons/sandbox/scxml/
    • Multimodal Architecture and Interfaces
      • Working draft available at http://www.w3.org/TR/2006/WD-mmi-arch-20060414/
    • Voice Modality
      • First working draft of VoiceXML 3.0 scheduled for November 2007
    • XHTML
      • Full recommendation
      • Adapters must be hand-coded
    • Other modalities
      • TBD

Developing & Delivering Multimodal Applications

comparison
Comparison

Approach comparison:

  • Object-oriented: interaction manager in C#; standard languages: SRGS, SISR, SSML; modes: GUI, speech
  • X+V: interaction manager in XHTML; standard languages: VoiceXML, SRGS, SSML, SISR, XHTML; modes: GUI, speech
  • W3C: interaction manager in SCXML; standard languages: SCXML, SRGS, VoiceXML, SSML, SISR, XHTML, EMMA, CCXML; modes: GUI, speech, ink

Developing & Delivering Multimodal Applications

availability
Availability
  • SAPI 5.3
      • Microsoft Windows Vista®
  • X+V
      • ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003 http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
      • Opera Software Multimodal Browser for Sharp Zaurus http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
      • Opera 9 for Windows http://www.opera.com/
  • W3C
      • First working draft of VoiceXML 3.0 not yet available
      • Working drafts of SCXML are available; some open-source implementations are available
  • Proprietary APIs
      • Available from vendor

Developing & Delivering Multimodal Applications

discussion question
Discussion Question
  • Should a developer insert SALT tags or X+V modules into an existing Web page without redesigning the Web page?

Developing & Delivering Multimodal Applications

conclusion
Conclusion
  • Multimodal applications offer benefits over today’s traditional GUIs.
  • Only use multimodal if there is a clear benefit.
  • Standard languages are available today to develop multimodal applications.
  • Don’t reinvent the wheel.
  • Creativity and lots of usability testing are necessary to create world-class multimodal applications.

Developing & Delivering Multimodal Applications

web resources
Web Resources
  • http://www.w3.org/voice
    • Specification of grammar, semantic interpretation, and speech synthesis languages
  • http://www.w3.org/2002/mmi
    • Specification of EMMA and InkML languages
  • http://www.microsoft.com (and query SALT)
    • SALT specification and download instructions for adding SALT to Internet Explorer
  • http://www-306.ibm.com/software/pervasive/multimodal/
    • X+V specification; download Opera and ACCESS browsers
  • http://www.larson-tech.com/SALT/ReadMeFirst.html
    • Student projects using SALT to develop multimodal applications
  • http://www.larson-tech.com/MMGuide.html or http://www.w3.org/2002/mmi/Group/2006/Guidelines/
    • User interface guidelines for multimodal applications

Developing & Delivering Multimodal Applications

status of w3c multimodal interface languages
Status of W3C Multimodal Interface Languages

[Diagram: W3C multimodal interface languages arranged by maturity, from Requirements (least mature) up to Recommendation (most mature)]

  • Recommendation: VoiceXML 2.0; Speech Recognition Grammar Specification (SRGS) 1.0; Speech Synthesis Markup Language (SSML) 1.0
  • Proposed Recommendation: VoiceXML 2.1
  • Candidate Recommendation: Semantic Interpretation for Speech Recognition (SISR) 1.0
  • Last Call Working Draft: Extensible MultiModal Annotation (EMMA) 1.0
  • Working Draft: State Chart XML (SCXML) 1.0; InkML 1.0

Developing & Delivering Multimodal Applications

questions
Questions

?

Developing & Delivering Multimodal Applications

answer to exercise 5
Answer to Exercise 5

Developing & Delivering Multimodal Applications

answer to exercise 7 write a grammar for zero to nineteen
Answer to Exercise 7Write a grammar for zero to nineteen
  • <grammar type = "application/srgs+xml" root = "zero_to_19" mode = "voice"><rule id = "zero_to_19">       <one-of>              <ruleref uri = "#single_digit"/>
  •      <ruleref uri="#teens"/>
  • </one-of></rule>
  •  <rule id = "single_digit">        <one-of>               <item> zero </item>               <item> one </item>               <item> two </item>               <item> three </item>               <item> four </item>               <item> five </item>               <item> six </item>               <item> seven </item>               <item> eight </item>              <item> nine </item>         </one-of></rule>

<rule id = "#teens">  <one-of>             <item> ten</item> 

<item> eleven </item>         <item> twelve </item>         <item> thirteen </item>         <item> fourteen </item>             <item> fifteen </item>             <item> sixteen </item>             <item> seventeen </item>             <item> eighteen </item>             <item> nineteen </item>     </one-of> </rule>

</grammar>

Developing & Delivering Multimodal Applications

answer to exercise 8
Answer to Exercise 8
  • <grammar type = "application/srgs+xml" root = "yes" mode = "voice">
  • <rule id = "yes">        <one-of>              <item> yes </item>              <item> sure </item> <item> affirmative </item>
  • …  
  • </one-of> </rule>
  • </grammar>

Developing & Delivering Multimodal Applications

answer to exercise 9
Answer to Exercise 9
  • <grammar type = "application/srgs+xml" root = "yes" mode = "voice">
  • <rule id = "yes">       <one-of>              <item> yes </item>              <item> sure <tag> out = "yes" </tag> </item> <item> affirmative <tag> out = "yes" </tag> </item> …
  • </one-of> </rule>
  • </grammar>

Developing & Delivering Multimodal Applications

answer to exercise 10
<interpretation mode = "speech"> <moneyTransfer> <sourceAcct hook="ink"/> <targetAcct hook="ink"/> <amount> 300 </amount> </moneyTransfer></interpretation>

<interpretation mode = "ink"> <moneyTransfer> <sourceAcct> savings </sourceAcct> <targetAcct> checking</targetAcct> </moneyTransfer></interpretation>

Answer to Exercise 10

Given the following two EMMA specifications,

what is the unified EMMA specification?

  • <interpretation mode = "intp1"> <moneyTransfer> <sourceAcct> savings </sourceAcct> <targetAcct> checking</targetAcct> <amount> 300 </amount> </moneyTransfer></interpretation>

Developing & Delivering Multimodal Applications

answer to exercise 11
Answer to Exercise 11
  • <html xmlns= "http://www.w3.org/1999/xhtml" xmlns:vxml= "http://www.w3.org/2001/vxml" xmlns:ev= "http://www.w3.org/2001/xml-events" xmlns:xv="http://www.w3.org/2002/xhtml+voice">
  • <head> <xv:sync xv:input="in4" xv:field="#answer"/> <vxml:form id= "stateForm"> <vxml:field name= "state" xv:id= "answer"> <vxml:prompt>Say a state name</vxml:prompt> <vxml:grammar src = "state.grxml"/> </vxml:field> </vxml:form></head>
  • <body> <form ev:event="load" ev:handler="#stateForm"> Result: <input type="text" name="in4"/> </form></body>
  • </html>

Developing & Delivering Multimodal Applications

exercise 121
Answer to Exercise 12
  • What should VoiceXML do when it receives each of the following events?
  • Reset
      • Reset the value
  • Change
      • Change the value
  • Focus
      • Prompt for the value now in focus

Developing & Delivering Multimodal Applications

exercise 131
Answer to Exercise 13
  • What should HTML do when it receives each of the following events?
  • Reset
      • Reset the value
      • Author decides if cursor should be moved to the reset value
  • Change
      • Change the value
      • Author decides if cursor should be moved to the changed value
  • Focus
      • Move the cursor to the item in focus

Developing & Delivering Multimodal Applications
