Voice XML and Speech Applications

bernad
Presentation Transcript


  1. Voice XML and Speech Applications

  2. Outline VoiceXML – Tellme.com, BeVocal.com Speech.NET – Microsoft

  3. What is Voice XML? • Language for specifying voice dialogs • Output: • Prerecorded audio and text-to-speech (TTS) • Input: • Touch-tone keys and Automatic Speech Recognition (ASR) • Extension of XML • Designed to interact with web-based applications

  4. VoiceXML’s History • 1995 • Phone Web project by AT&T Research • 1999 • Lucent and AT&T have incompatible dialects of Phone Markup Language • So, VoiceXML Forum created with AT&T, Lucent, Motorola, and IBM • Team develops VoiceXML 0.9, a first pass at standardization • 2000 • VoiceXML 1.0 was created and submitted to World-Wide Web Consortium (W3C) • 2001 • VoiceXML 2.0 by W3C’s Voice Browser Working Group

  5. XML in 30 seconds • Tags and body <cmu> <welcome>Welcome to CMU!</welcome> <ecom> <welcome>Welcome to the E-commerce!</welcome> </ecom> </cmu> • Zero or more attributes <welcome accent="texan">Welcome</welcome> <welcome accent="pittsburgh">Welcome</welcome> • Tag with no body <breath/> • XML is picky about syntax • Tag names are case-sensitive (VoiceXML’s are all lowercase), the quotes around attribute values are not optional, and a document can be validated against a Document Type Definition (DTD).
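To make the "picky about syntax" point concrete, here is a minimal sketch that uses Python's standard-library XML parser to check whether a snippet is well-formed. The helper name `is_well_formed` and the sample strings are made up for illustration; they are not part of the original deck, and this checks well-formedness only, not DTD validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Quoted attribute, matching tags, and an empty element written as <breath/>.
GOOD = '<cmu><welcome accent="texan">Welcome to CMU!</welcome><breath/></cmu>'
# Attribute value without quotes.
BAD_QUOTES = "<welcome accent=texan>Welcome</welcome>"
# Closing tags in the wrong order (improper nesting).
BAD_NESTING = "<cmu><welcome>Welcome</cmu></welcome>"
# Tag names differ only in case, so the closing tag does not match.
BAD_CASE = "<Welcome>Welcome</welcome>"

if __name__ == "__main__":
    for label, sample in [("good", GOOD), ("quotes", BAD_QUOTES),
                          ("nesting", BAD_NESTING), ("case", BAD_CASE)]:
        print(label, is_well_formed(sample))
```

Only the first sample parses; the other three each break one of the rules listed on the slide.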

  6. What is VoiceXML • VoiceXML ≈ XML ≈ XHTML ≈ HTML • What is Tellme.com (800.555.TELL) • A VoiceXML gateway which is easy!

  7. The Big Picture

  8. Dissecting a simple VXML program <vxml version="2.0"> <form> <block>Hello, world!</block> </form> </vxml>

  9. What you need to get started • Go to Tellme.com • Click on Studio for Developers in the lower right • Join and log in • You’ll see the VoiceXML scratchpad • Type this in and hit update: <vxml version="2.0"> <form> <block>Hello, world!</block> </form> </vxml> Congrats, you’ve written your first VoiceXML application. Call 1-800-555-VXML to try it out.

  10. Playing a Sound • The audio element • Must be contained within a block: <block> <audio src="ui/welcome.wav"> Welcome to the HCII </audio> </block> • The src value is a relative file reference

  11. Moving around • The goto element • Stop what you’re doing and go execute this other VoiceXML document. • Like clicking a link on a webpage. <block> <audio>Thanks for calling!</audio> <goto next="document2.vxml" /> </block>

  12. Getting user input • We can talk, we can go to a different page; we just need to know what the user wants! • Find out using fields • Fields are different than blocks: blocks just speak, fields listen • “But the computer doesn’t hear too good”, so tell it what to expect

  13. More on fields • Prompt – asks the user a question • Grammar – defines the possible answers • Can use built-in or custom • Name – name of the variable that stores what the user said • e.g. GoingtoDaytonaBeach • Instructions – what the program should do based on the input • If going, "that’s great!" otherwise "bummer maybe next year"

  14. Summary of VXML elements (1/2) • Input: • <form>, <field>, <prompt> • Output: • <audio> • Events: • <filled>, <noinput>, <nomatch>, <help>, <catch> • Transition: • <goto>, <submit>

  15. Summary of VXML elements (2/2) • Grammars: <grammar> <![CDATA[ [ [visa] {<element "visa">} [master card] {<element "mastercard">} [american express] {<element "amex">} ] ]]> </grammar> • Selection: • <menu>, <choice>, <option> • ECMA Scripting (i.e. JavaScript): • <script>, <var>, <foreach>, <if>

  16. Demo

  17. Application overview • Components: • VXML file (to prompt information) • ASP file (to retrieve balance) • Tellme Studio: • Get developer account (free) • Enable your Tellme extension (“free”) • Web Server • To host VXML and HTML files • .NET enabled for this demo • Not provided by Tellme
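The split between a VXML file that prompts and a server-side page that retrieves the balance can be sketched as a function that builds a VoiceXML document on the fly. The demo on the slide uses ASP; Python is swapped in here purely for illustration, and the `BALANCES` table, the account number, and the function name `balance_vxml` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical stand-in for the real account database behind the demo.
BALANCES = {"1234": 250.75}


def balance_vxml(account_id: str) -> str:
    """Build a small VoiceXML document that speaks the account balance."""
    balance = BALANCES.get(account_id)
    if balance is None:
        message = "Sorry, I could not find that account."
    else:
        message = f"Your balance is {balance:.2f} dollars."
    # escape() keeps characters such as & or < from breaking the markup.
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0">\n'
        "  <form>\n"
        f"    <block>{escape(message)}</block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


if __name__ == "__main__":
    print(balance_vxml("1234"))
```

A web server would return this string as the response body; the voice gateway fetches it over HTTP just as a browser would fetch an HTML page.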

  18. A simple example <vxml version="2.0"> <form> <field name="goingtoBeach" type="boolean"> <prompt> “Are you going to Daytona Beach this year?” </prompt> <filled>Ohh... <if cond="goingtoBeach">That’s great! <goto next="goingDocument.vxml" /> <else /> bummer maybe next year. <goto next="notgoingDocument.vxml" /> </if> </filled> </field> </form> </vxml>

  19. A simple example <vxml version="2.0"> <form> <field name="goingtoBeach" type="boolean"> <prompt> “Are you going to Daytona Beach this year?” </prompt> <filled>Ohh... <if cond="goingtoBeach">That’s great! <goto next="goingDocument.vxml" /> <else /> bummer maybe next year. <goto next="notgoingDocument.vxml" /> </if> </filled> </field> </form> </vxml> • Callout: the field’s type attribute and <prompt> are the grammar and prompt

  20. A simple example <vxml version="2.0"> <form> <field name="goingtoBeach" type="boolean"> <prompt> “Are you going to Daytona Beach this year?” </prompt> <filled>Ohh... <if cond="goingtoBeach">That’s great! <goto next="goingDocument.vxml" /> <else /> bummer maybe next year. <goto next="notgoingDocument.vxml" /> </if> </filled> </field> </form> </vxml> • Callout: <filled> executes if the answer is successfully recognized

  21. A simple example <vxml version="2.0"> <form> <field name="goingtoBeach" type="boolean"> <prompt> “Are you going to Daytona Beach this year?” </prompt> <filled>Ohh... <if cond="goingtoBeach">That’s great! <goto next="goingDocument.vxml" /> <else /> bummer maybe next year. <goto next="notgoingDocument.vxml" /> </if> </filled> </field> </form> </vxml> • Callout: if/else is used with goto to control flow
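The four parts of a field (name, grammar, prompt, instructions) can be picked out of the example above mechanically. The sketch below is a toy reader, not a VoiceXML interpreter: it assumes the document has exactly the shape shown on the slides, and the helper names `describe_field` and `next_document` are invented for this illustration.

```python
import xml.etree.ElementTree as ET

VXML = """<vxml version="2.0">
  <form>
    <field name="goingtoBeach" type="boolean">
      <prompt>Are you going to Daytona Beach this year?</prompt>
      <filled>Ohh...
        <if cond="goingtoBeach">That's great!
          <goto next="goingDocument.vxml"/>
        <else/>bummer maybe next year.
          <goto next="notgoingDocument.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>"""


def describe_field(vxml_text: str) -> dict:
    """Extract the name, built-in grammar type, prompt and goto targets."""
    field = ET.fromstring(vxml_text).find("./form/field")
    return {
        "name": field.get("name"),
        "grammar": field.get("type"),
        "prompt": field.findtext("prompt").strip(),
        "targets": [g.get("next") for g in field.iter("goto")],
    }


def next_document(vxml_text: str, answer: bool):
    """Pick the goto target for a yes/no answer.

    Assumption: the <if> tests the boolean field itself, so children before
    <else/> belong to the "yes" branch and children after it to the "no" branch.
    """
    if_el = ET.fromstring(vxml_text).find("./form/field/filled/if")
    in_yes_branch = True
    for child in if_el:
        if child.tag == "else":
            in_yes_branch = False
        elif child.tag == "goto" and in_yes_branch == answer:
            return child.get("next")
    return None


if __name__ == "__main__":
    print(describe_field(VXML))
    print(next_document(VXML, True), next_document(VXML, False))
```

The real gateway evaluates the cond expression as ECMAScript; the toy version skips that step and only mirrors the branch structure.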

  22. Custom Grammars • What if you want the user to be able to say something that’s not built-in? • e.g. Which hotel will you be staying at? The Hilton, the Hyatt, the Doubletree, or the Bates Motel?

  23. Custom Grammars • Which hotel will you be staying at? The Hilton, the Hyatt, the Doubletree, or the Bates Motel? <grammar type="application/x-gsl" mode="voice"> <![CDATA[ [ [ doubletree ] {<hotel "doubletree">} [ hilton (convention center) ] {<hotel "hilton">} [ (bates motel) ] {<hotel "bates">} [ (?the hyatt ?hotel) ] {<hotel "hyatt">} ] ]]> </grammar>

  24. Custom Grammars • At CHI which hotel will you be staying at? The Hilton, the Hyatt, the Doubletree, or the Motel 4? <grammar type="application/x-gsl" mode="voice"> <![CDATA[ [ [ doubletree ] {<hotel "doubletree">} [ hilton (convention center) ] {<hotel "hilton">} [ (bates motel) ] {<hotel "bates">} [ (?the hyatt ?hotel) ] {<hotel "hyatt">} ] ]]> </grammar> • Callouts: hotel is the field name; the bracketed phrases are what they can say

  25. Custom Grammars • At CHI which hotel will you be staying at? The Hilton, the Hyatt, the Doubletree, or the Motel 4? <grammar type="application/x-gsl" mode="voice"> <![CDATA[ [ [ doubletree ] {<hotel "doubletree">} [ hilton (convention center) ] {<hotel "hilton">} [ (bates motel) ] {<hotel "bates">} [ (?the hyatt ?hotel) ] {<hotel "hyatt">} ] ]]> </grammar> • What they can say: • Different options are separated by spaces • Options that are more than one word long are in ( )’s • Put a ? before optional words

  26. Custom Grammars • At CHI which hotel will you be staying at? The Hilton, the Hyatt, the Doubletree, or the Motel 4? <grammar type="application/x-gsl" mode="voice"> <![CDATA[ [ [ doubletree ] {<hotel "doubletree">} [ hilton (convention center) ] {<hotel "hilton">} [ (bates motel) ] {<hotel "bates">} [ (?the hyatt ?hotel) ] {<hotel "hyatt">} ] ]]> </grammar> • Result: the field variable "hotel" is set to the value for what the user says • We can use this result later in our if’s and goto’s
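The three grammar rules above (alternatives, multi-word options, optional words marked with ?) can be illustrated with a small matcher. This is a simplified sketch, not a GSL parser: the grammar is hand-transcribed into a hypothetical `HOTEL_GRAMMAR` list of (phrase, value) pairs, and the names `expand` and `recognize` are made up for this example.

```python
from itertools import product

# Each entry mirrors one alternative of the slide's grammar: the phrase the
# caller may say and the value stored in the "hotel" field.
HOTEL_GRAMMAR = [
    ("doubletree", "doubletree"),
    ("hilton", "hilton"),
    ("convention center", "hilton"),
    ("bates motel", "bates"),
    ("?the hyatt ?hotel", "hyatt"),
]


def expand(phrase: str) -> set:
    """Return every utterance a phrase accepts; a leading ? marks an optional word."""
    choices = []
    for word in phrase.split():
        if word.startswith("?"):
            choices.append(["", word[1:]])  # word may be absent or present
        else:
            choices.append([word])
    return {" ".join(w for w in combo if w) for combo in product(*choices)}


def recognize(utterance: str, grammar=HOTEL_GRAMMAR):
    """Return the field value for an utterance, or None (a 'nomatch')."""
    said = " ".join(utterance.lower().split())
    for phrase, value in grammar:
        if said in expand(phrase):
            return value
    return None


if __name__ == "__main__":
    print(sorted(expand("?the hyatt ?hotel")))
    print(recognize("The Hyatt"), recognize("Motel 4"))
```

Note that with this grammar "Motel 4" from the spoken prompt is not recognized, since only "bates motel" is listed; a real application would keep the prompt and the grammar in sync.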

  27. Grammar Tips • Grammar languages • W3C Speech Recognition Grammar Specification, XML form (GRXML) • and Nuance Grammar Specification Language (GSL) • Tools for testing your grammars • Syntax checker, Parse, Generate

  28. VoiceXML VXML is ideal for non-experts in speech recognition • The basics are easy to understand, enough to build simple apps • Could not do this with Speech.NET

  29. Microsoft Speech.NET

  30. Speech.NET • Voice • No (current) Voice Portal • Multimodal • Voice, mouse, stylus, etc. Compaq TabletPC

  31. Speech.NET Millions use Visual Studio

  32. Speech.NET Millions use Visual Studio a few new controls to “Speech enable” apps

  33. Speech.NET Millions use Visual Studio a few new controls to “Speech enable” apps Millions of potential speech developers

  34. SALT Speech.NET (ASP) compiles down to SALT • Speech Application Language Tags • Prompt, listen, record, dtmf • http://www.saltforum.org/

  35. Speech.NET BUT application developers never see SALT • Microsoft Speech.NET • ASP.NET web application • Visual (GUI) Controls • Speech Controls • Wav prompt database • Grammars

  36. Summary • General Impressions of Speech.NET • Where’s the logic? external JS file, HTML <script> block, in properties? • Forced to scroll/expand property window • Auto-complete • Prompt Editor is very nice

  37. Resources • Microsoft.public.netspeechsdk

  38. Benefits of VXML? • Brings web development paradigm to IVR market • Existing HTTP gateways to existing enterprise services/data built with Internet technology can be seamlessly extended to the phone • Anytime, anywhere access to the web via voice interface • Keypads and small displays are made moot • Great for the car (personal experience) • Standardized technology, high interoperability • Thin layer that sits on entire web technology stack • Interoperable with infrastructure, software, other standards for web deployment • Security- VPNs, SSL, cookies • Application Servers- Java Servlets, Perl, IBM WebSphere, MSFT Active Server Pages • Data abstraction- XML, XSL • Database connectivity- ODBC, SQL • Streaming media- WAV, Real, MP3 • Open Development • 15,000 developers at Tellme alone

  39. Business Applications with Tellme • Airlines- Flight information, flight delay notification, baggage tracking, employee reservations and more. • Banking- Telephone banking, bill payment, mortgage tools, ATM and branch locators and more. • Brokerages- Telephone trading applications, retirement account management, stock alerts, financial content and more. • Government-Travel hotlines, benefits management for government services, alerts and notifications for public announcements • 511 travel directory services to Utah Government, why? • Retail- Catalog shopping applications, store locators and more.

  40. Tellme and Nuance • Nuance • Speech recognition software/hardware • Nuance 8.0 speech recognition and natural language understanding server • Nuance Vocalizer- synthesizes text to speech • Nuance Verifier- identifies and authenticates callers based on their voiceprint (biometrics) • Both a partner and a competitor

  41. For more information • Tellme Studio Developer • http://studio.tellme.com/ • W3C VXML 2.0 Specification • http://www.w3.org/TR/voicexml20 • Example of a VXML application with Perl • http://www.webreference.com/perl/tutorial/20/ • Creating Voice applications with VXML and ASP.NET • http://www.devhood.com/Tutorials/tutorial_details.aspx?tutorial_id=147

  42. VoiceXML and XML • Based on XML Tag/Attribute Format • Elements must be properly nested! <element_name attribute_name="attribute_value"> ..contained elements.. </element_name> • All documents start with <?xml version="1.0"?> • All other instructions are enclosed within the <vxml> tag, called the “root element” <vxml version="1.0"> ..VoiceXML Instructions.. </vxml>
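The root-element rule on this slide is easy to check before uploading a document to a gateway. The sketch below is a minimal pre-flight check under stated assumptions: the function name `vxml_problems` is invented, the document is assumed to carry no XML namespace (as in the slide examples), and it only tests what the slide mentions rather than the full VoiceXML schema.

```python
import xml.etree.ElementTree as ET


def vxml_problems(text: str) -> list:
    """List basic structural problems; an empty list means none were found."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # Covers improper nesting, unquoted attributes and similar errors.
        return [f"not well-formed XML: {err}"]
    problems = []
    # Assumes no xmlns on the root; a namespaced document would report the
    # tag as "{namespace}vxml" and need a different comparison.
    if root.tag != "vxml":
        problems.append(f"root element is <{root.tag}>, expected <vxml>")
    if root.get("version") is None:
        problems.append("root element has no version attribute")
    return problems


if __name__ == "__main__":
    good = ('<?xml version="1.0"?>'
            '<vxml version="2.0"><form><block>Hello, world!</block></form></vxml>')
    print(vxml_problems(good))
    print(vxml_problems("<form><block>Hello</block></form>"))
```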
