Introduction to VoiceXML 2.0 - PowerPoint PPT Presentation

Presentation Transcript

  1. Introduction to VoiceXML 2.0 • Rob Marchand, Director of Product Management, VoiceGenie Technologies Inc.

  2. Introduction to VoiceXML • Audience • Managers and programmers with little VoiceXML experience • Attendees will learn • The basic principles of VoiceXML • Just enough syntax to design and code simple speech applications that use voice menus and voice forms

  3. VoiceXML in the Marketplace • VoiceXML 2.0 is now ratified as a Recommendation (i.e., an official standard) by the W3C • Hundreds of millions of VoiceXML calls are answered every day • VoiceXML is the standard for building speech-enabled applications

  4. W3C and VoiceXML Forum • W3C manages the technical evolution and development of the VoiceXML language • VoiceXML Forum focuses on providing best practices, certification testing, resources, and tools • Together, the W3C and VoiceXML Forum accelerate the adoption of VoiceXML-based speech applications

  5. Outline • Motivation for VoiceXML • W3C Speech Interface Framework Languages • Dialog—VoiceXML 2.0 • Speech Synthesis—SSML • Grammars—SRGS • Semantic Interpretation—SI • Call Control

  6. Motivation for Speech Applications • Users access Web sites from any telephone, anywhere, any time. • Speaking and listening are the natural usage modes for phones.

  7. Speech-enabled Applications Are Possible Now • Increased computing power at less expense • Due to improved chip design and manufacturing techniques • Improved speech recognition • Due to refinements to basic speech recognition algorithms • Improved dialog design using voice • Minimizes the number of words and phrases that the speech recognizer must process at any point during the dialog

  8. Strength of VoiceXML Applications • Traditional system-directed dialogs for novice users • Mixed initiative dialogs for experienced users • Novice users smoothly become experienced users at their own pace

  9. Limitations of VoiceXML Applications • No special analysis of speech input • Not suitable for training speech skills—Reading, ESL, singing, etc. • VUI conversational bandwidth is slower than GUI conversational bandwidth • Using a VUI is like drinking from Lake Superior with a straw

  10. Exercise 1 • Name or describe a speech application you could use at work. • Name or describe a speech application you or a family member could use at home.

  11. XML • XML = eXtensible Markup Language
      • Elements are surrounded by tags
          <prompt>Welcome to the voice system</prompt>
      • Elements may be nested
          <prompt>
            Welcome to Ajax Travel <break/> we have the cheapest fares
          </prompt>
      • Elements may have attributes
          <choice next="#boat">
          <grammar type="application/srgs+xml" version="1.0" root="by_boat" src="boat.grxml">
      • Because "<", ">", and "&" have special meanings in XML, write
        • "&lt;" in place of "<"
        • "&gt;" in place of ">"
        • "&amp;" in place of "&"
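
The escaping rule above matters most in URLs and company names. The following is a minimal sketch, not taken from the slides: the audio URL and the business name are hypothetical, chosen only to show where "&amp;" is needed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <!-- Hypothetical URL: the query string separator must be written as an entity -->
        <audio src="http://www.example.com/greeting?lang=en&amp;voice=1"/>
        <!-- In element text, the parser also turns the entity back into a plain ampersand -->
        Welcome to Smith &amp; Jones Travel.
      </prompt>
    </block>
  </form>
</vxml>
```

Writing a bare "&" in either place would make the document ill-formed, so an XML parser would be expected to reject it before any dialog runs.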

  12. Outline • Motivation for VoiceXML • W3C Speech Interface Framework Languages • Dialog—VoiceXML 2.0 • Speech Synthesis—SSML • Grammars—SRGS • Semantic Interpretation—SI • Call Control

  13. [Architecture diagram; labels:] Documents • Multimedia Files • HTML Scripts • VoiceXML Scripts • Web Browser • DB • Voice Browser • Capture Voice • Grammars • ASR • Database Server • DTMF • Replay Audio • Audio Files • TTS • Speech Server/Gateway • Web Server

  14. W3C Speech Interface Framework • VoiceXML 2.0 • Speech Synthesis • Call Control • Semantic Interpretation • Grammar • Other

  15. Status of W3C Speech Interface Languages • Recommendation: VoiceXML 2.0, Grammar, Synthesis • Other status levels shown: Proposed Recommendation, Candidate Recommendation, Last Call Working Draft, Working Draft, Requirements • Other languages shown: Semantic Interpretation, Call Control, VoiceXML 2.1, V3, PLS

  16. Outline • Motivation for VoiceXML • W3C Speech Interface Framework Languages • Dialog—VoiceXML 2.0 • Speech Synthesis—SSML • Grammars—SRGS • Semantic Interpretation—SI • Call Control


  19. VoiceXML 2.0 Fragment • Dialog Language (VoiceXML 2.0) • Speech Synthesis Markup Language (SSML) • Speech Recognition Grammar Specification (SRGS) • Semantic Interpretation (SI)
      <?xml version="1.0"?>
      <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
        <form>
          …
          <field>
            <prompt>
              Which account <break/>
              <emphasis> savings </emphasis> or <emphasis> checking </emphasis>
            </prompt>
            <grammar type="application/srgs+xml" version="1.0" root="account_type" mode="voice">
              <rule id="account_type">
                <one-of>
                  <item> savings </item>
                  <item> checking </item>
                  <item> CD </item>
                  <item> certificate of deposit <tag>$ = "CD"</tag> </item>
                </one-of>
              </rule>
            </grammar>
          </field>
          ….
        </form>
        …
      </vxml>

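
The inline grammar in the fragment above could also live in its own SRGS file, the way the later slides reference files such as month.grxml. The following is a sketch under that assumption; the file name account_type.grxml is hypothetical, and the content of `<tag>` simply follows the slide, since the exact tag syntax depends on the Semantic Interpretation version a platform supports.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical file: account_type.grxml -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US"
         mode="voice" root="account_type">
  <!-- scope="public" lets other grammars and documents reference this rule -->
  <rule id="account_type" scope="public">
    <one-of>
      <item>savings</item>
      <item>checking</item>
      <item>CD</item>
      <!-- Both "CD" and the long form are meant to return the same value -->
      <item>certificate of deposit <tag>$ = "CD"</tag></item>
    </one-of>
  </rule>
</grammar>
```

A field would then point at the file with something like `<grammar type="application/srgs+xml" src="account_type.grxml"/>` in place of the inline rule.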

  21. VoiceXML 2.0 features • Menus, forms, sub-dialogs: <menu>, <form>, <subdialog> • Inputs: speech recognition <grammar>; recording <record>; keypad <grammar mode="dtmf"> • Output: audio files <audio>; text-to-speech <prompt> • Variables: <var>, <script>, <assign> • Events: <nomatch>, <noinput>, <help>, <catch>, <throw> • Transition and submission: <goto>, <submit> • Telephony: connection control <transfer>, <disconnect>; telephony information • Platform • Objects • Performance • Fetch
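
As an illustration of the keypad input listed above, the following sketch shows a field that accepts a key press through an inline DTMF grammar. The form id, field name, prompt wording, and rule id are assumptions made for the example and do not come from the slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="account_menu">
    <!-- Hypothetical field: the names here are illustrative only -->
    <field name="account">
      <prompt>Press 1 for savings or 2 for checking.</prompt>
      <!-- mode="dtmf" makes the grammar match key presses instead of speech -->
      <grammar type="application/srgs+xml" mode="dtmf" version="1.0" root="digit">
        <rule id="digit">
          <one-of>
            <item>1</item>
            <item>2</item>
          </one-of>
        </rule>
      </grammar>
    </field>
  </form>
</vxml>
```

A voice grammar and a DTMF grammar can sit side by side in the same field, so callers may either speak or press a key.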

  22. A Typical Voice Menu
      <menu>
        <prompt>
          <audio src="http://www.ajax.com/three_blind_mice.wav"/>
          Do you want to listen, next, prior, buy, or exit?
        </prompt>
        <choice next="http://www.ajax.com/listen.vxml"> listen </choice>
        <choice next="http://www.ajax.com/next.vxml"> next </choice>
        <choice next="http://www.ajax.com/prior.vxml"> prior </choice>
        <choice next="http://www.ajax.com/buy.vxml"> buy </choice>
        <choice next="http://www.ajax.com/exit.vxml"> exit </choice>
      </menu>
      Exercise 2: Write a menu that asks the user a "yes/no" question to confirm that the user wants to buy the audio "three blind mice".

  23. Answer to Exercise 2: A "yes/no" menu
      <menu>
        <prompt> Do you want to buy three blind mice now? </prompt>
        <choice next="http://www.ajax.com/yes.vxml"> yes </choice>
        <choice next="http://www.ajax.com/no.vxml"> no </choice>
      </menu>

  24. Typical Form Fill-In
      <form>
        <block>
          <prompt> Welcome to the electronic payment system. </prompt>
        </block>
        <field name="card_number">
          <prompt> Please enter your credit card number. </prompt>
          <grammar src="http://www.ajax.com/credit_card_number.grxml"/>
        </field>
        <field name="date">
          <prompt> Please enter your expiration date. </prompt>
          <grammar src="http://www.ajax.com/credit_card_date.grxml"/>
        </field>
      </form>
      Exercise 3: Write a form that solicits the month, day, and year of the user's birth date.

  25. Answer to Exercise 3
      <form>
        <block>
          <prompt> When were you born? </prompt>
        </block>
        <field name="month">
          <prompt> What month? </prompt>
          <grammar src="http://www.ajax.com/month.grxml"/>
        </field>
        <field name="day">
          <prompt> What day of the month? </prompt>
          <grammar src="http://www.ajax.com/day.grxml"/>
        </field>
        <field name="year">
          <prompt> What year? </prompt>
          <grammar src="http://www.ajax.com/year.grxml"/>
        </field>
      </form>
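
Once every field in a form like the one above has a value, the collected answers usually need to go somewhere. The following sketch shows one way to do that with the `<filled>` and `<submit>` elements from the feature list; the form id and the submit URL http://www.ajax.com/birthdate are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="birth_date">
    <field name="month">
      <prompt>What month?</prompt>
      <grammar src="http://www.ajax.com/month.grxml"/>
    </field>
    <field name="day">
      <prompt>What day of the month?</prompt>
      <grammar src="http://www.ajax.com/day.grxml"/>
    </field>
    <field name="year">
      <prompt>What year?</prompt>
      <grammar src="http://www.ajax.com/year.grxml"/>
    </field>
    <!-- mode="all" runs this only after all three named fields are filled -->
    <filled namelist="month day year" mode="all">
      <prompt>Thank you.</prompt>
      <!-- Hypothetical server URL; namelist picks the variables to send -->
      <submit next="http://www.ajax.com/birthdate" namelist="month day year"/>
    </filled>
  </form>
</vxml>
```

The server at the submit URL is expected to answer with the next VoiceXML document, which is how a dialog moves from one page to another.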

  26. Event Handlers • Deal with exceptional or error conditions • Control mechanism for dialog turn retries • <catch event="noinput"> … </catch> • <catch event="nomatch"> … </catch> • <catch event="help"> … </catch> • Shorthand notation available • <noinput> … </noinput>, etc. • Scoped according to where they occur • <form>, <field>, etc.

  27. Adding Event Handlers
      <form>
        <block>
          <prompt> When were you born? </prompt>
        </block>
        <field name="month">
          <catch event="noinput"> ….. </catch>
          <catch event="nomatch"> ….. </catch>
          <prompt> What month? </prompt>
          <grammar src="http://www.ajax.com/month.grxml"/>
        </field>
        …..
      </form>


  30. Default Event Handlers
      <catch event="nomatch">
        <prompt> I did not understand, please try again </prompt>
      </catch>
      <catch event="help">
        <prompt> Sorry, no help is available. </prompt>
      </catch>
      <catch event="noinput">
        <prompt> I did not hear anything, please speak again </prompt>
      </catch>

  31. Exercise 4: Write event handlers for the month field
      <catch event="nomatch">
        <prompt> __________________________ </prompt>
      </catch>
      <catch event="help">
        <prompt> ____________________ </prompt>
      </catch>
      <catch event="noinput">
        <prompt> ___________________________________ </prompt>
      </catch>

  32. Answer to Exercise 4: Write event handlers for the month field
      <catch event="nomatch">
        <prompt> Which month, for example, January, February, or March? </prompt>
      </catch>
      <catch event="help">
        <prompt> In what month were you born? </prompt>
      </catch>
      <catch event="noinput">
        <prompt> Say the name of the month you were born in </prompt>
      </catch>
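
The handlers above can also be written with the shorthand elements and made to escalate on repeated failures. The following sketch assumes a two-step escalation for silence; the prompt wording and the form id are illustrative, while `count` and `<reprompt/>` are standard VoiceXML 2.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="birth_month">
    <field name="month">
      <prompt>What month?</prompt>
      <grammar src="http://www.ajax.com/month.grxml"/>
      <!-- First silence: short message, then play the field prompt again -->
      <noinput count="1">
        <prompt>I did not hear anything.</prompt>
        <reprompt/>
      </noinput>
      <!-- Second and later silences: give more guidance -->
      <noinput count="2">
        <prompt>Say the name of the month you were born in, for example, January.</prompt>
      </noinput>
      <!-- Shorthand for catch with event="nomatch" -->
      <nomatch>
        <prompt>I did not understand.</prompt>
        <reprompt/>
      </nomatch>
    </field>
  </form>
</vxml>
```

Because these handlers sit inside the field, they apply only to that field; the same elements placed directly under the form would cover every field in it.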

  33. Outline • Motivation for VoiceXML • W3C Speech Interface Framework Languages • Dialog—VoiceXML 2.0 • Speech Synthesis—SSML • Grammars—SRGS • Semantic Interpretation—SI • Call Control

  34. Speech Synthesis ML • Processing stages: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production • Structure Analysis • Markup support: p, s • Non-markup behavior: infer structure by automated text analysis

  35. Before and after Structure Analysis • Before structure analysis • Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass. • After structure analysis
      <p>
        <s> Dr. Smith lives at 214 Elm Dr. </s>
        <s> He weighs 214 lb. </s>
        <s> He plays bass guitar. </s>
        <s> He also likes to fish; last week he caught a 19 lb. bass. </s>
      </p>

  36. Speech Synthesis ML • Processing stages: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production • Structure Analysis • Markup support: p, s • Non-markup behavior: infer structure by automated text analysis • Text Normalization • Markup support: say-as for dates, times, etc.; sub for aliasing • Non-markup behavior: automatically identify and convert constructs

  37. After Text Normalization
      <p>
        <s> <sub alias="doctor">Dr.</sub> Smith lives at 214 Elm <sub alias="drive">Dr.</sub> </s>
        <s> He weighs 214 <sub alias="pounds">lb.</sub> </s>
        <s> He plays bass guitar. </s>
        <s> He also likes to fish; last week he caught a 19 <sub alias="pound">lb.</sub> bass. </s>
      </p>

  38. <p>
        <s> <sub alias="doctor">Dr.</sub> Smith lives at <say-as interpret-as="address">214</say-as> Elm <sub alias="drive">Dr.</sub> </s>
        <s> He weighs <say-as interpret-as="number">214</say-as> <sub alias="pounds">lb.</sub> </s>
        <s> He plays bass guitar. </s>
        <s> He also likes to fish; last week he caught a <say-as interpret-as="number">19</say-as> <sub alias="pound">lb.</sub> bass. </s>
      </p>

  39. Speech Synthesis ML • Processing stages: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production • Structure Analysis • Markup support: p, s • Non-markup behavior: infer structure by automated text analysis • Text Normalization • Markup support: say-as for dates, times, etc.; sub for aliasing • Non-markup behavior: automatically identify and convert constructs • Text-to-Phoneme Conversion • Markup support: phoneme, say-as • Non-markup behavior: look up in pronunciation dictionary

  40. After text-to-phoneme conversion
      <p>
        <s> <sub alias="doctor">Dr.</sub> Smith lives at <say-as interpret-as="address">214</say-as> Elm <sub alias="drive">Dr.</sub> </s>
        <s> He weighs <say-as interpret-as="number">214</say-as> <sub alias="pounds">lb.</sub> </s>
        <s> He plays <phoneme alphabet="ipa" ph="beɪs">bass</phoneme> guitar. </s>
        <s> He also likes to fish; last week he caught a <say-as interpret-as="number">19</say-as> <sub alias="pound">lb.</sub> <phoneme alphabet="ipa" ph="bæs">bass</phoneme>. </s>
      </p>

  41. Speech Synthesis ML • Processing stages: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production • Structure Analysis • Markup support: p, s • Non-markup behavior: infer structure by automated text analysis • Text Normalization • Markup support: say-as for dates, times, etc.; sub for aliasing • Non-markup behavior: automatically identify and convert constructs • Text-to-Phoneme Conversion • Markup support: phoneme, say-as • Non-markup behavior: look up in pronunciation dictionary • Prosody Analysis • Markup support: emphasis, break, prosody • Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax

  42. Prosody Analysis (initial text)
      <prompt>
        Environmental control menu. Do you want to adjust the lighting or temperature?
      </prompt>

  43. Prosody Analysis (add pause at phrase boundaries)
      <prompt>
        Environmental control menu <break strength="medium"/>
        Do you want to adjust the lighting or temperature?
      </prompt>

  44. Prosody Analysis (de-emphasize familiar words)
      <prompt>
        Environmental control menu <break strength="medium"/>
        <emphasis level="reduced"> Do you want to adjust </emphasis>
        the lighting or temperature?
      </prompt>

  45. Prosody Analysis (pause to let the listener catch up)
      <prompt>
        Environmental control menu <break/>
        <emphasis level="reduced"> do you want to adjust </emphasis>
        the lighting <break/> or temperature?
      </prompt>

  46. Prosody Analysis (add emphasis to focus listener's attention)
      <prompt>
        Environmental control menu <break/>
        <emphasis level="reduced"> do you want to adjust the </emphasis>
        <emphasis level="strong"> lighting </emphasis> <break/>
        or <emphasis level="strong"> temperature? </emphasis>
      </prompt>
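
The prosody walkthrough above uses `<break>` and `<emphasis>`; the `<prosody>` element named in the pipeline slide gives finer control over rate, pitch, and volume. The following is a sketch of the same prompt using it. The specific attribute values (a 500 ms pause, a slow rate) are assumptions chosen for illustration and would need tuning by ear on a real platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        Environmental control menu
        <!-- An explicit duration instead of a named strength -->
        <break time="500ms"/>
        do you want to adjust the
        <!-- Slow down on the two words the caller has to choose between -->
        <prosody rate="slow">lighting</prosody>
        <break/>
        or
        <prosody rate="slow">temperature?</prosody>
      </prompt>
    </block>
  </form>
</vxml>
```

How strongly a synthesizer honors these hints varies between engines, so the markup is best treated as a request rather than a guarantee.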

  47. Speech Synthesis ML • Processing stages: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production • Structure Analysis • Markup support: paragraph, sentence • Non-markup behavior: infer structure by automated text analysis • Text Normalization • Markup support: say-as for dates, times, etc.; sub for aliasing • Non-markup behavior: automatically identify and convert constructs • Text-to-Phoneme Conversion • Markup support: phoneme, say-as • Non-markup behavior: look up in pronunciation dictionary • Prosody Analysis • Markup support: emphasis, break, prosody • Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax • Waveform Production • Markup support: voice, audio (audio icons, branding, advertising)

  48. Waveform Production
      <prompt>
        <audio src="http://www.example.com/adjust.wav">
          Environmental control menu. Do you want to adjust the lighting or temperature
        </audio>
      </prompt>

  49. Exercise 5 (insert SSML commands)
      <prompt>
        Welcome to Ajax Bank do you want to withdraw or deposit funds?
      </prompt>