VoiceXML - Voice Markup Language

Presentation Transcript


  1. VoiceXML - Voice Markup Language
     Presented by Hongliang Xu

  2. Presentation Overview
     • What is VoiceXML?
     • Introduction to VoiceXML
     • Overview of VoiceXML
     • A Sample VoiceXML Application
     • Summary
     • History

  3. What is VoiceXML?
     • VoiceXML is a standard for voice-based communication. It is an XML language that plays the same role in voice applications that HTML plays in web applications.
     • Like other XML technologies, VoiceXML integrates with existing web-based technologies and can be used with any existing server-side technology, such as ASP or Java servlets.

  4. Goal of VoiceXML
     VoiceXML’s main goal is to bring the full power of web development and content delivery to voice response applications, and to free the authors of such applications from low-level programming and resource management.

  5. Concepts
     • Dialogs and Subdialogs
     • Sessions
     • Grammars
     • Events
     • Links
     • Application

  6. Introduction to VoiceXML
     • Two short examples
     • The advantages of using VoiceXML
     • The architectural design of VoiceXML
     • Elements of the VoiceXML Implementation

  7. “Hello World” Example
     Here are two short examples of VoiceXML. The first is the venerable “Hello World”:

         <?xml version="1.0"?>
         <vxml version="1.0">
           <form>
             <block>Hello World!</block>
           </form>
         </vxml>

     The top-level element is <vxml>, which is mainly a container for dialogs. There are two types of dialogs: forms and menus. Forms present information and gather input; menus offer choices of what to do next. This example has a single form, which contains a block that synthesizes and presents “Hello World!” to the user. Since the form does not specify a successor dialog, the conversation ends.

  8. Another Short Example
     The second example asks the user for a choice of drink and then submits it to a server script:

         <?xml version="1.0"?>
         <vxml version="1.0">
           <form>
             <field name="drink">
               <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
               <grammar src="drink.gram" type="application/x-jsgf"/>
             </field>
             <block>
               <submit next="http://www.drink.example/drink2.asp"/>
             </block>
           </form>
         </vxml>

  9. The advantages of using VoiceXML
     • Minimizes client/server interactions by specifying multiple interactions per document.
     • Shields application developers from low-level and platform-specific details.
     • Separates the user interaction code (which is given in VoiceXML) from service logic.
     • Promotes service portability across implementation platforms. VoiceXML is a common language for content providers, tool providers and platform providers.
     • Is easy to use for simple interactions, and yet provides language features to support complex dialogs.

  10. The architectural design of VoiceXML
      [Diagram not reproduced. Labels: User; Client (e.g. PC); Client (e.g. mobile phone); VoiceXML Gateway; Content Server; Translation process; Voice; VoiceXML]

  11. Elements of the VoiceXML Implementation
      [Diagram not reproduced. Labels: Document Server; Request; Document; Interpreter; Interpreter Context; Implementation Platform]

  12. Overview of VoiceXML
      • Grammar
      • Structure of VoiceXML Applications
      • Different Types of Forms
      • A Form Dialog
      • A Menu Dialog
      • Sub Dialogs within a Form
      • Transition between Dialogs
      • Variables in VoiceXML Dialogs
      • Event Handling in VoiceXML

  13. Grammar
      • A grammar defines the allowable inputs submitted by the user.
      • The <grammar> element is used to provide a speech grammar that
         • specifies a set of utterances that a user may speak to perform an action or supply information, and
         • provides a corresponding string value (in the case of a field grammar) or set of attribute-value pairs (in the case of a form grammar) to describe the information or action.
      • The <grammar> element is designed to accommodate any grammar format that meets these two requirements. At this time, VoiceXML does not specify a grammar format nor require support of a particular grammar format. This is similar to the situation with recorded audio formats for VoiceXML, and with media formats in general for HTML.
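
As a minimal sketch of a field grammar, the drink field from the earlier example could carry its grammar inline instead of referencing drink.gram. This assumes a platform that accepts JSGF (the application/x-jsgf type used elsewhere in these slides); since VoiceXML does not mandate a grammar format, other platforms may expect a different type and syntax.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <field name="drink">
      <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
      <!-- Inline field grammar: it lists the allowed utterances, and the
           recognized phrase supplies the string value of "drink".
           JSGF support is an assumption about the platform. -->
      <grammar type="application/x-jsgf">
        coffee | tea | milk | nothing
      </grammar>
    </field>
  </form>
</vxml>
```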

  14. Structure of VoiceXML applications
      [Diagram not reproduced. Labels: Root Document; Document1 to Document4; Dialog 1 to Dialog 3; Subdialog 1 and Subdialog 2]

  15. A form contains
      • A set of form items. Form items are subdivided into field items, which define the form’s field item variables, and control items.
      • Declarations of non-field item variables.
      • Event handlers.
      • “Filled” actions: blocks of procedural logic that execute when certain combinations of field items are filled in.
      • Form attributes, which are id and scope.
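
The parts of a form listed above can be sketched in one small document. This is an illustrative outline only: the form id, the variable names, and the grammar files size.gram and drink.gram are hypothetical.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="order" scope="dialog">
    <!-- Declaration of a non-field item variable -->
    <var name="attempts" expr="0"/>

    <!-- Form-level event handler, shared by the form items below -->
    <catch event="nomatch">
      <assign name="attempts" expr="attempts + 1"/>
      Sorry, I did not understand.
      <reprompt/>
    </catch>

    <!-- Control item -->
    <block>Welcome to the order line.</block>

    <!-- Field items; each one defines a field item variable -->
    <field name="size">
      <prompt>Small, medium, or large?</prompt>
      <grammar src="size.gram" type="application/x-jsgf"/>
    </field>
    <field name="drink">
      <prompt>Coffee, tea, or milk?</prompt>
      <grammar src="drink.gram" type="application/x-jsgf"/>
    </field>

    <!-- "Filled" action: runs once both listed fields have values -->
    <filled namelist="size drink">
      <prompt>You ordered a <value expr="size"/> <value expr="drink"/>.</prompt>
    </filled>
  </form>
</vxml>
```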

  16. Different Types of Forms
      • Directed Forms
      • Mixed Initiative Forms

  17. Directed Forms
      The simplest and most common type of form is one in which the form items are executed exactly once in sequential order to implement a computer-directed interaction. Here is a weather information service that uses such a form.

          <form id="weather_info">
            <block>Welcome to the weather information service.</block>
            <field name="state">
              <prompt>What state?</prompt>
              <grammar src="state.gram" type="application/x-jsgf"/>
              <catch event="help">
                Please speak the state for which you want the weather.
              </catch>
            </field>
            <field name="city">
              <prompt>What city?</prompt>

  18. Directed Forms (continued)

              <grammar src="city.gram" type="application/x-jsgf"/>
              <catch event="help">
                Please speak the city for which you want the weather.
              </catch>
            </field>
            <block>
              <submit next="/servlet/weather" namelist="city state"/>
            </block>
          </form>

      This dialog proceeds sequentially:

          C (computer): Welcome to the weather information service. What state?
          H (human): Help
          C: Please speak the state for which you want the weather.
          H: Georgia
          C: What city?
          H: Tblisi
          C: I did not understand what you said. What city?
          H: Macon
          C: The conditions in Macon Georgia are sunny and clear at 11 AM …

  19. Mixed Initiative Forms
      • Directed forms implement rigid, computer-directed conversations. To make a mixed initiative form, where both the computer and the human direct the conversation, the form must have one or more <initial> form items and one or more form-level grammars.
      • If a form has form-level grammars:
         • Its fields can be filled in any order.
         • More than one field can be filled as a result of a single user utterance.
      • Also, the form’s grammars can be active when the user is in other dialogs. If a document has two forms, say a car rental form and a hotel reservation form, and both forms have grammars that are active for that document, a user could respond to a request for hotel reservation information with information about the car rental, and thus direct the computer to talk about the car rental instead. The user can speak to any active grammar, and have fields set and actions taken in response.
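
A mixed initiative version of the weather form might look like the sketch below. It assumes a hypothetical form-level grammar file, cityandstate.gram, that returns attribute-value pairs named city and state; the prompts and the fallback behaviour are illustrative choices.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="weather_info">
    <!-- Form-level grammar: one utterance may fill "city", "state", or both.
         cityandstate.gram is a hypothetical file. -->
    <grammar src="cityandstate.gram" type="application/x-jsgf"/>

    <block>Welcome to the weather information service.</block>

    <!-- <initial> asks an open question before any field-by-field prompting -->
    <initial name="start">
      <prompt>For what city and state would you like the weather?</prompt>
      <help>Please say the name of the city and the state.</help>
      <nomatch count="2">
        I still did not understand. I will ask for each item separately.
        <!-- Setting the form item variable marks <initial> as done, so the
             form falls back to directed, field-by-field prompting -->
        <assign name="start" expr="true"/>
        <reprompt/>
      </nomatch>
    </initial>

    <field name="state">
      <prompt>What state?</prompt>
    </field>
    <field name="city">
      <prompt>What city?</prompt>
    </field>

    <filled namelist="city state">
      <submit next="/servlet/weather" namelist="city state"/>
    </filled>
  </form>
</vxml>
```

With this form, a caller who answers the opening question with both a city and a state fills both fields at once; a caller who gives only one is then prompted for the other.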

  20. A Form Dialog
      • Fields
      • Recording input
      • Blocks and objects

  21. Fields
      A field specifies an input item to be gathered from the user. The type attribute of <field> is used to specify a built-in grammar for one of the fundamental types, and also specifies how its value is to be spoken if subsequently used in a <value> element in a prompt. An example:

          <field name="lo_fat_meal" type="boolean">
            <prompt>
              Do you want a low fat meal on this flight?
            </prompt>
            <help>
              Low fat means less than 10 grams of fat, and under 250 calories.
            </help>
            <filled>
              <prompt>
                I heard <emp><value expr="lo_fat_meal"/></emp>.
              </prompt>
            </filled>
          </field>

  22. Fields (continued)
      In this example, the boolean type indicates that inputs are various forms of true and false. The value actually put into the field is either true or false. The field would be read “yes” or “no” in prompts.

      It is important that there be input conventions for each built-in type, so that, for instance, generic prompt and help messages can be written that apply to all implementations of VoiceXML. These are locale-dependent, and a certain amount of variability is allowed. For example, the boolean type’s grammar should minimally allow “yes” and “no” responses, but each implementation is free to add other choices, such as “yeah” and “nope”. In cases where an application requires a different behavior, it should use explicit field grammars.

      In addition, each built-in type has a convention for the format of the value returned. These are independent of locale and of the implementation. The return type for built-in fields is string except for the boolean field type.

  23. Recording input
      The <field> element is used for collecting user input. The <record> element is used to record the user input as voice. We can think of the <field> element as analogous to a human speaking with another human, while the <record> element is more analogous to a voice mailbox.

      In a typical use, the user is prompted for a greeting and then records it. The greeting is played back, and if the user approves it, it is sent to the server for storage using the HTTP POST method. Like other field items, <record> has prompts and catch elements. It may also have <filled> actions. If the platform supports simultaneous recognition and recording, form and document scoped grammars can be active while the recording is in progress.
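
The greeting scenario described above could be written roughly as follows. This is a sketch: the server script save_greeting.pl, the ten-second limit, and the audio type are illustrative assumptions, and platforms differ in the recording formats they support.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <!-- Record up to 10 seconds of audio into the variable "greeting" -->
    <record name="greeting" beep="true" maxtime="10s"
            finalsilence="4000ms" dtmfterm="true" type="audio/wav">
      <prompt>At the tone, please say your greeting.</prompt>
      <noinput>I did not hear anything, please try again.</noinput>
    </record>

    <!-- Play the recording back and ask for approval -->
    <field name="confirm" type="boolean">
      <prompt>Your greeting is <value expr="greeting"/>.</prompt>
      <prompt>To keep it, say yes. To discard it, say no.</prompt>
      <filled>
        <if cond="confirm">
          <!-- save_greeting.pl is a hypothetical server script -->
          <submit next="save_greeting.pl" method="post" namelist="greeting"/>
        </if>
        <!-- Not approved: reset the form so the user can record again -->
        <clear/>
      </filled>
    </field>
  </form>
</vxml>
```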

  24. Blocks
      The <block> element contains a sequence of procedural statements used for prompting and computation, but not for gathering input. It is a control item. A block has a (normally implicit) form item variable that is set to true just before the block is interpreted. Whether a block is executed depends on this variable: once it has been set, the block is not selected again, so a block typically runs only once per form invocation.
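
A small sketch of two blocks, one anonymous and one named with a guard condition. The form id, the variable visits, and the prompts are made up for illustration.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="greeting">
    <var name="visits" expr="1"/>

    <!-- Prompting and computation only; no input is gathered here.
         The block's implicit form item variable is set to true just
         before it runs, so it is not selected a second time. -->
    <block>
      Welcome.
      <assign name="visits" expr="visits + 1"/>
    </block>

    <!-- A named block: "tip" is its form item variable. The cond
         attribute adds a further guard on whether it is selected. -->
    <block name="tip" cond="visits &gt; 1">
      You can say help at any time.
    </block>
  </form>
</vxml>
```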

  25. Object
      Objects give a VoiceXML application access to platform-specific functionality. The <object> element is very similar to the HTML <OBJECT> element. The <object> item invokes a platform-specific "object" with various parameters. The result of the platform object is an ECMAScript Object with one or more properties. One platform object could be a built-in dialog that gathers credit card information. Another could gather a text message using some proprietary DTMF text entry method. There is no requirement for implementations to provide platform-specific objects, although support for the <object> element is required.
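
As a sketch of the credit card case mentioned above: everything platform-specific here is hypothetical, including the classid and data URIs, the parameter names, and the account property on the result. A real platform documents its own objects.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <var name="amt" expr="25"/>
  <var name="acct"/>

  <form id="payment">
    <!-- Invoke a hypothetical platform object that gathers card details.
         Its result is an ECMAScript object stored in the variable "debit". -->
    <object name="debit"
            classid="method://credit-card/gather_and_debit"
            data="http://www.recordings.example/prompts/credit/jesse.jar">
      <param name="amount" expr="document.amt"/>
    </object>

    <block>
      <!-- Read a property of the result; "account" is an assumed name -->
      <assign name="document.acct" expr="debit.account"/>
    </block>
  </form>
</vxml>
```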

  26. Menu Dialog
      The menu dialog in VoiceXML is intended to allow the user to make a selection from a set of choices. For example, in a VoiceXML music shopping portal, the application uses a menu to ask the user whether they want to buy classical, pop or rock titles. This menu leads the user to another dialog, which depends on their selection. A simple example is given below:

          <menu>
            <prompt>
              Welcome to Music Portal. Select the category you want to go to:
              <enumerate/>
            </prompt>
            <choice next="http://www.mobilestore.com/vxml/classical.vxml">
              Classical
            </choice>
            <choice next="http://www.mobilestore.com/vxml/pop.vxml">
              POP
            </choice>
            <choice next="http://www.mobilestore.com/vxml/rock.vxml">
              Rock
            </choice>

  27. Menu Dialog (continued)

            <noinput>
              Please make a choice
              <enumerate/>
            </noinput>
          </menu>

      The <enumerate/> element lists all available options to the user. This code essentially defines a prompt and enumerates the choices. In the example above, each choice directs the user to a different section of the application through a hyperlink. The final section defines the prompt to be spoken if the timeout value is exceeded: if the user does not speak within a specified time, the application reiterates the choices. The entire menu-type dialog is enclosed within a <menu> element. Inside the <menu> element there are other elements intended to prompt the user for input and to collect this input.

  28. Sub Dialogs within a Form
      A subdialog in a form is a dialog within the scope of another dialog. Subdialogs are helpful for giving structure to a dialog. For example, complex logic and input fields in a dialog can be split off into subdialogs. It is easier to manage many fields and complex entries if a subdialog mechanism is used. Subdialogs are linked to a main dialog using the <subdialog> element.
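
A minimal sketch of a confirmation step split off into a subdialog. The form ids, field name, and prompts are hypothetical. The called form hands its result back with <return>, and the caller reads it as a property of the subdialog's form item variable.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- Main dialog -->
  <form id="main">
    <!-- Call the form "confirm_order" in this document as a subdialog -->
    <subdialog name="confirmation" src="#confirm_order">
      <filled>
        <if cond="confirmation.answer">
          <prompt>Your order has been placed.</prompt>
        <else/>
          <prompt>Your order has been cancelled.</prompt>
        </if>
      </filled>
    </subdialog>
  </form>

  <!-- Subdialog: gathers one answer and returns it to the caller -->
  <form id="confirm_order">
    <field name="answer" type="boolean">
      <prompt>Shall I place the order?</prompt>
      <filled>
        <return namelist="answer"/>
      </filled>
    </field>
  </form>
</vxml>
```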

  29. Transition between Dialogs
      • A VoiceXML application consists of a set of dialogs, and dialogs can submit data to other dialogs or to a document. In practice, the information is passed between documents on the content server. The two elements used for transferring execution between forms or documents are <goto> and <submit>.
      • The <goto> element is used to:
         • transition to another form item in the current form,
         • transition to another dialog in the current document, or
         • transition to another document.
      • The attributes used by <goto> are listed below:
         • next: The URI to which to transition.
         • expr: An ECMAScript expression that yields the URI.
         • nextitem: The name of the next form item to visit in the current form.

  30. Transition between Dialogs
      A further <goto> attribute:
      • expritem: An ECMAScript expression that yields the name of the next form item to visit.

      The <submit> element is similar to <goto> in that it results in a new document being obtained. Unlike <goto>, it lets you submit a list of variables to the document server via an HTTP GET or POST request. Commonly used attributes besides next and expr are listed below:
      • namelist: The list of variables to submit. By default, all the named field item variables are submitted. If a namelist is supplied, it may contain individual variable references, which are submitted with the same qualification used in the namelist.
      • method: The request method: get (the default) or post.
      • enctype: The MIME encoding type of the submitted document. The default is application/x-www-form-urlencoded. Interpreters may support additional encoding types.
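
The <goto> and <submit> attributes described on the last two slides can be seen together in a short sketch. The grammar file city.gram, the servlet path, and operator.vxml are illustrative names.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="weather">
    <field name="city">
      <prompt>What city?</prompt>
      <grammar src="city.gram" type="application/x-jsgf"/>
    </field>

    <field name="confirmed" type="boolean">
      <prompt>Did you say <value expr="city"/>?</prompt>
      <filled>
        <if cond="confirmed">
          <!-- Send only "city" to the server, using HTTP POST -->
          <submit next="/servlet/weather" method="post" namelist="city"/>
        <else/>
          <!-- Forget both answers and revisit the "city" field
               in the current form -->
          <clear namelist="city confirmed"/>
          <goto nextitem="city"/>
        </if>
      </filled>
      <!-- After three failed recognitions, move to another document;
           operator.vxml is a hypothetical file -->
      <catch event="nomatch" count="3">
        <goto next="operator.vxml"/>
      </catch>
    </field>
  </form>
</vxml>
```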

  31. Variables in VoiceXML Dialogs
      Variables are declared by <var> elements. A variable is named using the name attribute and can be given a default or initial value using the expr attribute. Variables are also declared by form items, like <field> and <record>. VoiceXML variables are in all respects equivalent to ECMAScript variables. The variable naming convention is as in ECMAScript, but names beginning with the underscore character (“_”) are reserved for internal use.
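
A brief sketch of the two ways variables come into being, using made-up names. Note that expr holds an ECMAScript expression, so a string literal needs its own inner quotes.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- Declared with <var>; the inner quotes make this a string literal -->
  <var name="greeting" expr="'Hello'"/>

  <form id="trip">
    <var name="count" expr="0"/>
    <!-- No expr: the variable starts out undefined -->
    <var name="note"/>

    <!-- A form item also declares a variable, here "city" -->
    <field name="city">
      <prompt>What city?</prompt>
      <grammar src="city.gram" type="application/x-jsgf"/>
    </field>

    <block>
      <assign name="count" expr="count + 1"/>
      <prompt>
        <value expr="greeting"/>. You chose <value expr="city"/>.
      </prompt>
    </block>
  </form>
</vxml>
```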

  32. Variable Scope hierarchy
      The curved arrows in the scope diagram (not reproduced here) show that each scope contains a variable whose name is the same as the scope and that refers to the scope itself. This allows you, for example, in the anonymous, dialog, and document scopes to refer to a variable X in the document scope as document.X.
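
The scope-qualified form is mainly useful when an inner declaration hides an outer one, as in this sketch. The variable name region and its values are hypothetical.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- Document scope -->
  <var name="region" expr="'north'"/>

  <form id="report">
    <!-- Dialog scope: hides the document-level variable of the same name -->
    <var name="region" expr="'south'"/>

    <block>
      <prompt>
        <!-- Unqualified name: resolved from the innermost enclosing scope -->
        The dialog region is <value expr="region"/>.
        <!-- Qualified name: reaches the document-scope variable directly -->
        The document region is <value expr="document.region"/>.
      </prompt>
    </block>
  </form>
</vxml>
```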

  33. Event Handling in VoiceXML
      [Diagram not reproduced. Labels: User; Application; Interpreter Context; Platform]

  34. A Sample VoiceXML Application
      • This sample is intended to show how VoiceXML works and also to illustrate the difficulties of voice browsing.
      • The application’s function is to provide information about different geographical locations around the world.
      • The application does two things:
         • Finds a place if a latitude and longitude are entered
         • Finds the latitude and longitude of a particular place
      • The root document, geo.vxml, acts as a startup form:

          <?xml version="1.0"?>
          <vxml version="1.0">
            <form>
              <block>
                Welcome to Geoplanet
                <goto next="geomain.vxml"/>
              </block>
            </form>
          </vxml>

  35. A Sample VoiceXML Application

          <?xml version="1.0"?>
          <vxml version="1.0">
            <menu>
              <prompt>
                Please select the category you want to use
                <enumerate/>
              </prompt>
              <choice next="getlat.vxml">
                find a place
              </choice>
              <noinput>
                Please make a choice
                <enumerate/>
              </noinput>
            </menu>
          </vxml>

      After giving a greeting, the root document directs the user to geomain.vxml, where the user chooses whether to find a place by latitude and longitude or to get the latitude and longitude of a particular place.

  36. A Sample VoiceXML Application

          <?xml version="1.0"?>
          <vxml version="1.0">
            <form>
              <field name="latitude" type="number">
                <prompt>Please give the latitude</prompt>
              </field>
              <field name="longitude" type="number">
                <prompt>Please give the longitude</prompt>
              </field>
              <filled namelist="latitude longitude">
                <submit next="http://localhost:8080/servlet/get_place/"/>
              </filled>
            </form>
          </vxml>

      If the user decides to find a place by giving a latitude and longitude, they are sent to findplace.vxml.

  37. A Sample VoiceXML Application

          <?xml version="1.0"?>
          <!-- getlat.vxml -->
          <vxml version="1.0" application="geo1.vxml">
            <form>
              <field name="place">
                <prompt>Please say the name of a place</prompt>
                <grammar src="place.gram" type="application/x-jsgf"/>
              </field>
              <block>
                <submit next="http://localhost:8080/servlet/get_place/"/>
              </block>
            </form>
          </vxml>

  38. Summary
      • VoiceXML applications consist of many documents, each of which contains dialogs. These dialogs take the form of menus or forms. Forms can follow different schemes: directed or mixed initiative.
      • The implementation of VoiceXML can be either telephony-based or browser-based. In a telephony-based implementation, the user connects through the telephone to a VoiceXML gateway, which itself connects to a content server. In a browser-based implementation, the user interacts directly with a browser that connects to the content server. The browser or gateway interacts with a TTS (Text to Speech) engine and a speech recognition engine, which may be implemented in hardware or software, to render VoiceXML dialogs to the user.

  39. Next Generation Interface
      Voice Browsing: Design and limitations
      Voice browsing in its fullest sense may not be acceptable to users for a number of reasons: users cannot retain a set of menu items for long, and users will prefer familiar types of conversation to formal, automated ones. Hence, to be successful, a VoiceXML application should be based on natural language dialog rather than being a simple translation of the functionality of a graphical user interface into a voice-based interface.

  40. Revision History

          Version   Date          Description
          0.9       17 Aug 1999   Initial release. Provided as baseline in support of comment period from supporters.
          1.0 RC    02 Mar 2000   Release Candidate (released to Forum Supporters)
          1.0       07 Mar 2000   Released to public - editorial corrections from 1.0 RC

  41. Resources
      • VoiceXML Forum: http://www.voicexml.org/
      • VoiceXML Reference: http://studio.tellme.com/voicexmlref/
      • W3C VoiceXML version 1.0
      • Professional WAP, Wrox Press, 2000

  42. Thank You!
