
Listener-Control Navigation of VoiceXML



Presentation Transcript


  1. Listener-Control Navigation of VoiceXML

  2. Nuance Speech Analysis • 92% of customer service is conducted by phone. • 84% of industrialists believe speech is better than the web.

  3. History of VoiceXML • AT&T (’95): PML • Bell/Lucent (’98): PML • IBM (’98): SpeechML • HP (’98): TalkML • Motorola (’98): VoxML • VoiceXML Forum (’00): VoiceXML 1.0 • W3C (’02): VoiceXML 2.0

  4. VoiceXML • Open standard language for serving voice/audio documents. • VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations.

  5. VoiceXML (Cont’d) • VoiceXML documents can call scripts, CGIs, etc. • Can take input from the listener via speech (filling out forms as in HTML). • Used extensively for automated call handling. • Makes information accessible over (cell) phones. • The next revolution on the Web.

  6. Architectural Model

  7. Goals of VoiceXML • Bring web development and content delivery to voice response applications. • Minimize client/server interactions. • Separate user interaction code from service logic. • Shield application authors from platform-specific details.

  8. Voice Browser • Software platform running on a network server. • Supports the following features: • ASR (automatic speech recognition) • DTMF (touch-tone) input • Recognition grammars • Mixed-initiative dialog • TTS (text-to-speech) • Analogy: a voice browser is to VoiceXML what a web browser is to HTML.

  9. Voice Enabling

  10. Sample VoiceXML Code
      <vxml version="2.0">
        <form>
          <field name="rich">
            <grammar type="application/x-gsl" mode="voice">
              <![CDATA[
                [
                  [(yes)]{<option "yes">}
                  [(no)]{<option "no">}
                ]
              ]]>
            </grammar>
            <prompt>Would you like to get rich quick?</prompt>
            <filled>
              Gotcha.
              <if cond="rich=='yes'">
                You want to be rich!
                <goto next="rich.vxml" />
              <else />
                You don't want to be rich.
                <goto next="poor.vxml" />
              </if>
            </filled>
          </field>
        </form>
      </vxml>

  11. Problem with VoiceXML • Navigation of the voice document. • The author has to ask where the listener would like to go next. • The listener has absolutely no control over navigation. • Listening becomes tedious, and advanced applications are not possible. • Analogy: scroll vs. book.

  12. Solution • Allow users to control navigation interactively. • Using Voice Anchors.

  13. Voice Anchors • Speech labels that listeners can place on a dialog. • A listener can return to that dialog later by uttering the label. • Hard to implement, as free-form speech recognition is not possible. • Must be incorporated into the voice browser.
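The place-and-recall behaviour described on this slide can be pictured with a small Python sketch. It is only an illustration under assumptions: the `AnchorStore` class, its method names, and the dialog ids are hypothetical and do not come from the presentation, which implements anchors inside the voice browser.

```python
from typing import Dict, Optional


class AnchorStore:
    """Hypothetical store mapping spoken labels to dialog ids."""

    def __init__(self) -> None:
        self._anchors: Dict[str, str] = {}

    def place(self, label: str, dialog_id: str) -> None:
        # Normalise the label so "Intro" and "intro " name the same anchor.
        self._anchors[label.strip().lower()] = dialog_id

    def recall(self, label: str) -> Optional[str]:
        # Returns None when the label was never placed.
        return self._anchors.get(label.strip().lower())


store = AnchorStore()
store.place("Intro", "dialog_3")
print(store.recall("intro"))  # dialog_3
```

A plain dictionary is enough here because, in this simple form, one label points to exactly one dialog.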

  14. Voice Anchors • We developed a number of methods for attaching voice anchors. • Most practical method: Spelling. • Anchor as a whole word. • Default anchors • Default navigation strategies

  15. [Diagram. Labels: Initial VXML, Converter, Augmented VXML, Voice browser, Place Anchors, Creates a DB file, DB file, Recall Anchor, New VXML]

  16. Our Architecture
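The slides name a converter that turns the initial VXML into augmented VXML but do not say what the augmentation consists of. The Python sketch below assumes, purely for illustration, that it prepends a navigation grammar to every form; the `augment` function, the command list, and the grammar text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical navigation commands; the real converter's output is not shown in the slides.
NAV_COMMANDS = ("skip", "previous", "place anchor", "recall anchor")


def augment(vxml_text: str) -> str:
    """Return a copy of the document with a navigation grammar in each form."""
    root = ET.fromstring(vxml_text)
    for form in root.iter("form"):
        grammar = ET.Element(
            "grammar", {"type": "application/x-gsl", "mode": "voice"}
        )
        # One alternative per command, in GSL-like bracket notation.
        grammar.text = "[" + " ".join(f"({c})" for c in NAV_COMMANDS) + "]"
        form.insert(0, grammar)
    return ET.tostring(root, encoding="unicode")


initial = '<vxml version="2.0"><form id="a"><block>Hello</block></form></vxml>'
print(augment(initial))
```

The sketch ignores XML namespaces, so a document that declares the VoiceXML namespace would need namespace-qualified tag names.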

  17. Cumulative Anchors • Different dialogs can be marked with the same label. • Recalling the label reads out the corresponding dialogs. • A single document can hold multiple cumulative anchors. • Allows creation of subdocuments. • A hierarchy of subdocuments can be created.
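A cumulative anchor differs from a simple one in that a label collects several dialogs. A minimal Python sketch, assuming dialogs are read back in the order they were marked (the slides do not state the order); the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List


class CumulativeAnchors:
    """Hypothetical store where one label can mark many dialogs."""

    def __init__(self) -> None:
        self._by_label: DefaultDict[str, List[str]] = defaultdict(list)

    def place(self, label: str, dialog_id: str) -> None:
        dialogs = self._by_label[label]
        if dialog_id not in dialogs:  # marking the same dialog twice is a no-op
            dialogs.append(dialog_id)

    def recall(self, label: str) -> List[str]:
        # The returned list acts as a subdocument: the dialogs to read out.
        return list(self._by_label.get(label, []))


anchors = CumulativeAnchors()
anchors.place("metals", "dialog_2")
anchors.place("metals", "dialog_7")
anchors.place("gases", "dialog_4")
print(anchors.recall("metals"))  # ['dialog_2', 'dialog_7']
```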

  18. Grammar • Set of valid expressions. • Each dialog references one or more grammars. • Nuance Grammar Specification Language (GSL). • Inline grammar and offline grammar. • An offline grammar provides the following advantages: • Can be generated dynamically (via CGIs, ASPs). • Can be reused by multiple dialogs or applications. • Can be updated and modified without changing the source code. • Subgrammars and form-level grammar.
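To illustrate what "generated dynamically" could look like, the Python sketch below builds a grammar body from a table of options and phrases, imitating the bracketed GSL style of this deck's samples. The `build_gsl` function is hypothetical, and the output format is copied from the slides rather than checked against the Nuance GSL reference.

```python
from typing import Dict, List


def build_gsl(options: Dict[str, List[str]]) -> str:
    """Render {option: [phrases]} as a GSL-style list of alternatives."""
    lines = ["["]
    for option, phrases in options.items():
        alternatives = " ".join(f"({phrase})" for phrase in phrases)
        # Doubled braces produce literal { and } in the f-string.
        lines.append(f'  [{alternatives}]{{<option "{option}">}}')
    lines.append("]")
    return "\n".join(lines)


print(build_gsl({"skip": ["skip"], "mark": ["place anchor", "call mark"]}))
```

A server-side script could return such text as an offline grammar, so the phrase table can change without touching the dialogs that reference it.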

  19. Sample Grammar Code
      <grammar type="application/x-gsl" mode="voice">
        <![CDATA[
          [
            [(skip)]{<option "skip">}
            [(previous)]{<option "previous">}
            [(place anchor) (call mark) (begin mark)]{<option "mark">}
            [(recall mark) (recall anchor) (recall)]{<option "recall">}
          ]
        ]]>
      </grammar>
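The grammar on this slide lets several phrases select the same option. The Python sketch below makes that mapping explicit by extracting it with a regular expression. It is not a GSL parser: it only handles the flat pattern used on this slide, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional

GRAMMAR = """
[
  [(skip)]{<option "skip">}
  [(previous)]{<option "previous">}
  [(place anchor) (call mark) (begin mark)]{<option "mark">}
  [(recall mark) (recall anchor) (recall)]{<option "recall">}
]
"""

# One rule: a bracketed group of (phrase) items followed by an option slot.
RULE = re.compile(r'\[((?:\([^()]*\)\s*)+)\]\s*\{<option "([^"]*)">\}')


def phrase_table(grammar: str) -> Dict[str, str]:
    """Map every phrase in the grammar to the option value it fills."""
    table: Dict[str, str] = {}
    for phrases, option in RULE.findall(grammar):
        for phrase in re.findall(r"\(([^()]*)\)", phrases):
            table[phrase] = option
    return table


def interpret(utterance: str) -> Optional[str]:
    """Return the option for a recognised phrase, or None if out of grammar."""
    return phrase_table(GRAMMAR).get(utterance.strip().lower())


print(interpret("call mark"))      # mark
print(interpret("recall anchor"))  # recall
```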

  20. [Diagram. Labels: Initial HTML, Translator, Initial VXML, Converter, Augmented VXML, Voice browser, Reference to another link in Augmented VXML, Get the HTML page]

  21. Applications: Web access through voice. • This involves the following sequence of steps: • HTML -> VXML. • A translator written in Java had already been developed. • Navigation of VXML.
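The HTML -> VXML step can be pictured with a toy translation. The presentation's translator was written in Java and its rules are not shown, so the Python sketch below is only an assumption-laden illustration: it turns each HTML paragraph into one form containing a block of text, and all names in it are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from xml.sax.saxutils import escape


class ParagraphCollector(HTMLParser):
    """Collects the text of each <p> element, ignoring inline markup."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self._in_p = False
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def html_to_vxml(html: str) -> str:
    """Emit one <form> per paragraph so each can be navigated separately."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    forms = [
        f'  <form id="p{i}"><block>{escape(text)}</block></form>'
        for i, text in enumerate(collector.paragraphs, 1)
    ]
    return '<vxml version="2.0">\n' + "\n".join(forms) + "\n</vxml>"


print(html_to_vxml("<p>Hello <b>world</b></p><p>A &amp; B</p>"))
```

Giving each paragraph its own form id is one way to obtain units that later navigation commands could target.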

  22. Applications: Mathematics for the visually impaired. • This involves the following steps: • MathML -> VXML. • A translator was developed to convert MathML documents to VXML documents using XSLT. • Navigation of VXML.

  23. Conclusion & Future work • Designing default navigation strategies. • Unit of division for navigation. • Voice Scripting Languages. • Example: “repeat chlorine until exit”.

  24. Questions
