PowerPoint Slideshow about 'New challenge: telephone' - charity-dunn

New challenge: telephone

Text To Speech & audio

Speech recognition

VoiceXML

Homework: sign up on Tellme Studio

Telephone

  • Caller to system: speech recognition,

    • using grammars (limited vocabulary, general audience, no training)

    • optional use of touch tones (numbers)

  • System to caller: recorded audio (wav files) plus TTS (text to speech)

  • Limited bandwidth, in comparison to other applications, but very familiar, ubiquitous medium

  • 800 long distance, some airline information systems, others?

Problems in context

  • Speech recognition: very difficult if

    • no restrictions on speakers

    • grammar for all of English with aim of 'natural language understanding'

  • Text to speech: a much easier problem (though English is more difficult than more fully phonetic languages like Spanish, or so I've been told).

    (More next class)

studio.tellme.com

  • Company that provides ‘engine’ for applications

  • Provides developing environment

    • We are doing the tellme version of VoiceXML, but it appears to be standard.

  • Register as a developer:

    • Provide your own ID; you are assigned a PIN

    • Scratchpad for quick testing

      • Put VoiceXML in ScratchPad place (no audio files)

      • 1-800-555-VXML (8965)

        • SAY id and then PIN.

    • Application URL for projects with multiple files

  • To look at someone else's project, change your Application URL

    • This is called pointing your account to a new source.


VoiceXML basics

  • XML document (VXML header)

  • VoiceXML has tags for flow-of-control and calculations.

    • Also can use <script> for JavaScript

  • Grammars come in different varieties. We will use the tellme way.

    • Grammars are included in CDATA tags to prevent XML interpretation.

    • Many grammars constructed for you.

      • <field name="answer" type="boolean"> … will listen for yes or no.

      • <field name="price" type="currency"> … will listen for currency.

    • <menu> containing <choice> elements, for a list

VoiceXML basics, continued

  • <form> element can contain

    • <block> elements, which can contain <audio>, <goto>, and other elements

    • <field> which can contain

      • <prompt>

      • <grammar> (if not one of built-in grammars)

      • <filled>

  • <var> tags can be at different levels (for example, document, block, or higher levels)

  • <if> <elseif><else> tags

  • <script> elements for JavaScript (which can also appear in expressions)

VoiceXML basics: typical case

  • a form element

    • <field>

      • <prompt>, made up of <audio>, with reference to recorded wav file and backup text

      • <grammar>, if NOT using built-in grammars designated by type attribute of field. This is a CDATA section.

      • <filled> with (follow-on) code using field

      • <catch> for nomatch, noinput cases


A form contains various elements, including a field. If a field has a grammar and the grammar is satisfied, control goes to the <filled> tag.

<?xml version="1.0"?>
<vxml version="2.0">
<form> <block>
<audio src="prompt1.wav">Hello, world </audio>
</block> </form>
</vxml>

prompt1.wav: recorded using Tellme Studio

"Hello, world" text: backup using TTS, just in case the src file is missing
Preparation: objects

  • JavaScript (and other languages) use classes and objects

  • Objects (aka object instances) are declared (created, instantiated) as members of a class

  • Objects have

    • properties ('the data')

    • methods (functions that you can use 'on' the objects)

    • static methods (called on the class itself rather than on an object), for example

      • Math.random
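The vocabulary above can be sketched in a few lines of JavaScript. The Counter class and its members are hypothetical names made up for this illustration; only Math.random comes from the slide.

```javascript
// Hypothetical class, used only to illustrate the vocabulary above.
class Counter {
  constructor(start) {
    this.count = start;        // property: the data held 'in' the object
  }
  increment() {                // method: a function used 'on' an object
    this.count += 1;
    return this.count;
  }
  static fromZero() {          // static method: called on the class itself
    return new Counter(0);
  }
}

const c = Counter.fromZero();  // static call, in the same style as Math.random()
c.increment();
c.increment();                 // c.count is now 2
const r = Math.random();       // built-in static method: 0 <= r < 1
```

Note that increment needs an object to work on, while fromZero and Math.random are called without one.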

Example: tm_date

  • var dt = new tm_date; creates a date/time object.

  • Use methods to extract/manipulate information held 'in' dt.

    var day = dt.get_day();

  • Use static methods supplied to do common tasks:

    var dn=tm_date.to_day_of_week_name(day);

    or directly:

    var dn=tm_date.to_day_of_week_name(dt.get_day());
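tm_date is specific to the Tellme environment, so the same pattern is sketched here with the standard JavaScript Date object as a stand-in. The helper toDayOfWeekName and the DAY_NAMES table are assumptions written for this sketch; they are not part of tm_date, and tm_date's own numbering of days may differ.

```javascript
// Stand-in for illustration only: standard Date, not tm_date.
const DAY_NAMES = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
                   'Thursday', 'Friday', 'Saturday'];

// Hypothetical helper playing the role of a static "to name" function.
function toDayOfWeekName(dayIndex) {
  return DAY_NAMES[dayIndex];
}

const dt = new Date(2024, 0, 1);   // 1 January 2024, local time
const day = dt.getDay();           // method on the object: 0 (Sunday) to 6 (Saturday)
const dn = toDayOfWeekName(day);   // 'Monday'
```

As in the slide, the two steps can be combined: toDayOfWeekName(dt.getDay()).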


  • Header stuff

  • script with external reference

  • script (code) encased in CDATA notation

  • Form/Block, with text to speech using value produced by script

  • Closing stuff

<?xml version="1.0"?>
<vxml version="2.0">
<!-- An external <script src=...> reference belongs here; its address is not preserved in this transcript. It supplies the date functions. -->
<script> <![CDATA[
var dt = new tm_date();
var monis = tm_date.to_month_name(dt.get_month());
var dateis = dt.get_date();
var dayis = tm_date.to_day_of_week_name(dt.get_day());
var yearis = tm_date.to_year_name(dt.get_full_year());
var houris = dt.get_hours() - 4; // brute force correction from GMT
var minutesis = dt.get_minutes();
var whole = 'The date is ' + monis + ' ' + dateis + '. It is ' + dayis + '. The time is ' + houris + ' ' + minutesis;
]]> </script>
<form> <block>
<value expr="whole"/>
Good bye.
</block> </form>
</vxml>

Can use block for audio
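One caveat about the brute force correction from GMT: if the hour is reported as 0 to 23, subtracting 4 gives a negative number between midnight and 4 a.m. GMT. A minimal sketch of a wrap-around fix follows; localHour is a hypothetical helper, and the fixed offset of -4 is taken from the example rather than from any time zone database.

```javascript
// Hypothetical helper: shift a GMT hour by a fixed offset and wrap into 0..23.
function localHour(gmtHour, offsetHours) {
  // The extra "+ 24) % 24" handles negative intermediate results.
  return ((gmtHour + offsetHours) % 24 + 24) % 24;
}

const afternoon = localHour(14, -4);  // 10
const earlyMorning = localHour(2, -4); // 22, not -2
```

The date and day of week would need the same kind of adjustment when the hour wraps past midnight; that is left out of this sketch.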

Example: my family

  • Directed responses to 3 family members:

    • Daniel,

      • question/response on activities

    • Aviva,

      • question/response on number of cranes

    • Esther

      • response

  • Calculations (arithmetic) done using variables

  • if tags

    • The cond attribute is a condition test.

  • limited error handling: exit on a nomatch or noinput event

    • alternative is to repeat prompt, generally using count attribute
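The count idea can be sketched in JavaScript. As I understand the VoiceXML 2.0 rule, the handler chosen is the one with the highest count that does not exceed the number of times the event has occurred. The handler list and the pickHandler function below are hypothetical and only model that selection; they are not platform code.

```javascript
// Sketch: choose the handler with the highest count <= occurrence.
// Ties go to the handler listed first.
function pickHandler(handlers, occurrence) {
  let best = null;
  for (const h of handlers) {
    if (h.count <= occurrence && (best === null || h.count > best.count)) {
      best = h;
    }
  }
  return best;
}

// Hypothetical policy: repeat the prompt twice, then give up.
const handlers = [
  { count: 1, action: 'reprompt' },
  { count: 3, action: 'exit' },
];

const first = pickHandler(handlers, 1).action;  // 'reprompt'
const second = pickHandler(handlers, 2).action; // 'reprompt'
const third = pickHandler(handlers, 3).action;  // 'exit'
```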

<vxml version="2.0">
<form>
<field name="childid">
<prompt>
<audio src="whosthis.wav">Hello. Who is calling?</audio>
</prompt>
<grammar type="application/x-gsl" mode="voice">
<![CDATA[ [
[dan daniel (daniel meyer) (dan meyer)] {<childid "daniel">}
[aviva (aviva meyer)] {<childid "aviva">}
[esther (esther minkin)] {<childid "esther">}
] ]]>
</grammar>
<catch event="noinput nomatch"> <audio src="sorry.wav">Sorry. I didn't get that.</audio> <exit/> </catch>
<filled>
<if cond="'daniel'==childid">
<goto next="#danfollowup"/>
<elseif cond="'aviva'==childid"/>
<goto next="#avivafollowup"/>
<elseif cond="'esther'==childid"/>
<goto next="#estherfollowup"/>
<else/>
<!-- never happens: the grammar only returns the three values tested above -->
</if>
</filled>
</field>
</form>

Note inner, single quote marks. Note double ='s
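What the grammar and the if/elseif chain do together can be modeled in plain JavaScript: several spoken phrases map to one childid value, and that value selects the follow-up form. This is an illustration only; the lookup tables and the route function are hypothetical and say nothing about how the Tellme recognizer works internally.

```javascript
// Illustration only: a lookup table standing in for the GSL grammar.
const PHRASES = new Map([
  ['dan', 'daniel'], ['daniel', 'daniel'],
  ['daniel meyer', 'daniel'], ['dan meyer', 'daniel'],
  ['aviva', 'aviva'], ['aviva meyer', 'aviva'],
  ['esther', 'esther'], ['esther minkin', 'esther'],
]);

// Targets taken from the <goto next="..."/> tags in the example.
const FOLLOWUPS = new Map([
  ['daniel', '#danfollowup'],
  ['aviva', '#avivafollowup'],
  ['esther', '#estherfollowup'],
]);

function route(utterance) {
  const childid = PHRASES.get(utterance.trim().toLowerCase());
  if (childid === undefined) {
    return 'nomatch';            // the <catch> handler would run here
  }
  return FOLLOWUPS.get(childid); // the <filled> branch for this caller
}

const forDan = route('Dan Meyer');   // '#danfollowup'
const forStranger = route('Bob');    // 'nomatch'
```

Because every phrase in the table maps to one of the three values, the final else branch of the VoiceXML version is never reached.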

<form id="danfollowup">
<field name="today">
<prompt>
<audio src="congratsdan.wav">Congratulations on the new job. Did you work on your thesis, or do aikido or jo today?</audio>
</prompt>
<grammar type="application/x-gsl" mode="voice">
<![CDATA[ [
[aikido (i key dough)] {<today "aikido">}
[thesis (work)] {<today "thesis">}
[jo (joe)] {<today "jo">}
[both (all) (everything) ((i key dough) jo)] {<today "both">}
[none nothing (sort of)] {<today "nothing">}
] ]]>
</grammar>
<catch event="noinput nomatch"> <audio>I didn't quite understand. Call or send e-mail.</audio> <exit/> </catch>
<filled>
<if cond="today=='aikido'">
<audio>Some aikido is fine. </audio>
<elseif cond="today=='thesis'"/>
<audio>Good, but do other things also.</audio>
<elseif cond="today=='jo'"/>
<audio>Don't get hit in the head.</audio>
<elseif cond="today=='both'"/>
<audio>Doing some of everything is best. </audio>
<elseif cond="today=='nothing'"/>
<audio> You deserve a break, but remember you want to be done by September. </audio>
<else/>
<audio> See you soon.</audio>
</if>
</filled> </field>
<block>
<audio> Good bye </audio> </block> </form>

<form id="avivafollowup">
<var name="rest" expr="1000"/>
<field name="bcount" type="number">
<prompt>
<audio src="howmanycranes.wav">Hello, Aviva. How many cranes have you made? </audio>
</prompt>
<grammar type="application/x-gsl" mode="voice">
<!-- grammar body not preserved in this transcript -->
</grammar>
<catch event="noinput nomatch"> <audio src="sorry.wav">Sorry. I didn't get that.</audio> <exit/> </catch>
<filled>
<assign name="rest" expr="1000-bcount"/>
<audio> <value expr="rest"/> </audio>
<audio src="togo.wav"> to go. </audio>
<!-- can't use < inside an attribute value: write &lt; instead -->
<if cond="rest&lt;200">
<audio src="homestretch.wav">You're in the home stretch </audio>
<elseif cond="rest&lt;500"/>
<audio src="morethanhalf.wav">More than half way </audio>
<elseif cond="rest&lt;800"/>
<audio src="goodstart.wav">Off to a good start </audio>
<else/>
<audio> Get a move on </audio>
</if>
<audio src="goodbye.wav">Good bye. </audio>
</filled> </field>
</form>
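The arithmetic and threshold tests in the Aviva form can be checked outside the phone system with a small JavaScript function. craneMessage is a hypothetical name; the target of 1000 and the cut-offs of 200, 500 and 800 are the ones used in the example.

```javascript
// Mirrors the <assign> and <if>/<elseif>/<else> logic of the form.
function craneMessage(bcount) {
  const rest = 1000 - bcount;   // cranes still to go
  if (rest < 200) {
    return "You're in the home stretch";
  } else if (rest < 500) {
    return 'More than half way';
  } else if (rest < 800) {
    return 'Off to a good start';
  }
  return 'Get a move on';
}

const nearlyDone = craneMessage(850); // rest is 150
const barelyBegun = craneMessage(100); // rest is 900
```

The order of the tests matters: each branch is reached only when the earlier, smaller thresholds have failed, so rest < 500 really means 200 <= rest < 500.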




<form id="estherfollowup">
<block>
<audio>Hello, Mommy. This is all I can do now. </audio>
</block> </form>
</vxml>

Application logic

  • VoiceXML elements (for example, <if> and <var>).

    • Note: more powerful than XSLT: the <assign> tag can change the value of a variable

  • JavaScript code in attributes (for example, cond, expr)

  • JavaScript code in <script> </script>

    • Encase in CDATA to avoid problems with certain characters

  • external JavaScript code, cited using <script src="file address"/>
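The problem with certain characters is that <, & and the quote mark have special meaning in XML, so JavaScript placed in an attribute such as cond has to be escaped (as in rest&lt;200), while code inside CDATA does not. The function below is a hypothetical helper, not part of VoiceXML or Tellme, that shows the substitution.

```javascript
// Hypothetical helper: escape a JavaScript expression for use in an
// XML attribute value delimited by double quotes.
function escapeForAttribute(expr) {
  return expr
    .replace(/&/g, '&amp;')   // must come first, so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/"/g, '&quot;');
}

const cond = escapeForAttribute('rest<200');       // 'rest&lt;200'
const both = escapeForAttribute('a<b && b<c');     // 'a&lt;b &amp;&amp; b&lt;c'
```

Single quotes are left alone, which is why the examples write string constants as 'daniel' inside double-quoted attributes.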

Class work

  • EVERYONE (who hasn't already): sign up tonight

  • Design simple application (you may work in groups):

    • Ask one question

    • Detect and respond to each of 2 or 3 answers

    • Use examples here for models

    • All text to speech

  • Pick (at least) one and implement.

  • (Do this a short time and then go on to next lecture. Resume after 9pm when minutes are free.)

Homework

  • (Majors requirement overdue: there will be a deduction but better late than never.)

  • Go to the Tellme Studio site and sign up as a developer.

    • try examples (using scratch pad)

    • record some voice samples

    • do tellme tutorials

  • ALSO try and report on

    • 800 long distance or some other commercial application