New challenge: telephone

PowerPoint Slideshow about ' New challenge: telephone' - charity-dunn


Text To Speech & audio

Speech recognition

VoiceXML

Homework: sign up on

Telephone

  • Caller to system: speech recognition,
    • using grammars (limited vocabulary, general audience, no training)
    • optional use of touch tones (numbers)
  • System to caller: recorded audio (wav files) plus TTS (text to speech)
  • Limited bandwidth, in comparison to other applications, but very familiar, ubiquitous medium
  • 800 long distance, some airline information systems, others?
Problems in context
  • Speech recognition: very difficult if
    • no restrictions on speakers
    • grammar for all of English with aim of 'natural language understanding'
  • Text to speech: much easier problem (but English is more difficult than more fully phonetic languages like Spanish, I've been told)

(More next class)

studio.tellme.com
  • Company that provides ‘engine’ for applications
  • Provides developing environment
    • We are doing the tellme version of VoiceXML, but it appears to be standard.
  • Register as a developer:
    • Provide your own id; assigned a PIN
    • Scratchpad for quick testing
      • Put VoiceXML in ScratchPad place (no audio files)
      • 1-800-555-VXML (8965)
        • SAY id and then PIN.
    • Application URL for projects with multiple files
  • To look at someone else's project, change your Application URL
    • This is called pointing your account to a new source.
VoiceXML basics
  • XML document (VXML header)
  • VoiceXML has tags for flow-of-control and calculations.
    • Also can use <script> for JavaScript
  • Grammars come in different varieties. We will use the tellme way.
    • Grammars are included in CDATA tags to prevent XML interpretation.
    • Many grammars constructed for you.
      • <field name="answer" type="boolean"> … will listen for yes or no.
      • <field name="price" type="currency"> … will listen for currency.
    • <menu> with <choice> elements for a list of options
VoiceXML basics, continued
  • <form> element can contain
    • <block> elements, which can contain <audio>, <go>, other
    • <field> which can contain
      • <prompt>
      • <grammar> (if not one of built-in grammars)
      • <filled>
  • <var> tags can be at different levels (for example, document, block, or higher levels)
  • <if> <elseif><else> tags
  • <script> elements for JavaScript (which can also appear in expressions)
VoiceXML basics: typical case
  • a form element
    • <field>
      • <prompt>, made up of <audio>, with reference to recorded wav file and backup text
      • <grammar>, if NOT using built-in grammars designated by type attribute of field. This is a CDATA section.
      • <filled> with (follow-on) code using field
      • <catch> for nomatch, noinput cases

A form contains various elements, including a field. If a field has a grammar and the grammar is satisfied, control goes to a filled tag.


<?xml version="1.0"?>
<vxml version="2.0">
<form>
<block>
<audio src="prompt1.wav">Hello, world </audio>
</block>
</form>
</vxml>

prompt1.wav: recorded using tellme studio

"Hello, world" text: backup using TTS, just in case src file missing

Preparation: objects
  • JavaScript (and other languages) use classes and objects
  • Objects (aka object instances) are declared (created, instantiated) as members of a class
  • Objects have
    • properties ('the data')
    • methods (functions that you can use 'on' the objects)
    • static methods
      • Math.random
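The vocabulary above can be illustrated with a minimal sketch in plain JavaScript. The Counter class and its members are hypothetical names invented for this illustration; only Math.random comes from the slide.

```javascript
// A hypothetical class, used only to illustrate the terms above.
class Counter {
  constructor(start) {
    this.count = start;        // property: the data held in the object
  }
  increment() {                // method: a function used "on" an object
    this.count += 1;
    return this.count;
  }
  static fromZero() {          // static method: called on the class itself
    return new Counter(0);
  }
}

const c = new Counter(5);      // create (instantiate) an object of the class
c.increment();                 // c.count is now 6
const z = Counter.fromZero();  // static call: no existing object is needed
const r = Math.random();       // built-in static method: 0 <= r < 1
```

Math.random is called on Math itself, never on an object created from it, which is what makes it a static method.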
Example: tm_date
  • var dt = new tm_date; creates a date/time object.
  • Use methods to extract/manipulate information held 'in' dt.

var day = dt.get_day();

  • Use static methods supplied to do common tasks:

var dn=tm_date.to_day_of_week_name(day);

or directly:

var dn=tm_date.to_day_of_week_name(dt.get_day());
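The tm_date object is specific to tellme, so the same pattern is sketched here with the standard JavaScript Date object as a stand-in. The DAY_NAMES table and the toDayOfWeekName helper are hypothetical and only play the role of tm_date.to_day_of_week_name; they are not part of any tellme library.

```javascript
// Standard JavaScript Date, used as a stand-in for the tellme tm_date object.
const DAY_NAMES = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
                   'Thursday', 'Friday', 'Saturday'];

// Hypothetical helper playing the role of tm_date.to_day_of_week_name.
function toDayOfWeekName(dayIndex) {
  return DAY_NAMES[dayIndex];
}

const dt = new Date(2024, 0, 1);           // 1 January 2024, local time (months are 0-based)
const day = dt.getDay();                   // instance method: 0 = Sunday ... 6 = Saturday
const dn = toDayOfWeekName(day);           // in two steps
const dn2 = toDayOfWeekName(dt.getDay());  // or directly
```

Both forms produce the same name; the direct form simply skips the intermediate variable.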

  • Header stuff
  • script with external reference
  • script (code) encased in CDATA notation
  • Form/Block, with text to speech using value produced by script
  • Closing stuff
<?xml version="1.0"?>
<vxml version="2.0">

<!-- Will make use of date functions (script with external reference) -->
<script src="…"/>

<script> <![CDATA[
var dt = new tm_date();
var monis = tm_date.to_month_name(dt.get_month());
var dateis = dt.get_date();
var dayis = tm_date.to_day_of_week_name(dt.get_day());
var yearis = tm_date.to_year_name(dt.get_full_year());
// brute force correction from GMT
var houris = dt.get_hours() - 4;
var minutesis = dt.get_minutes();
var whole = 'The date is ' + monis + ' ' + dateis + '. It is ' + dayis + '. The time is ' + houris + ' ' + minutesis;
]]> </script>

<form>
<block>
<value expr="whole"/>
Good bye.
</block>
</form>
</vxml>

Can use block for audio
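The script subtracts 4 from the GMT hour, which gives a negative number whenever the GMT hour is 0 through 3. A minimal sketch of a wrap-around version follows, in plain JavaScript; shiftHour is a hypothetical helper name, and the fixed offset of 4 hours is taken from the script.

```javascript
// Shift an hour value (0-23) by a whole number of hours, wrapping around midnight.
// The inner % can return a negative value in JavaScript, so 24 is added before
// the second % to bring the result back into the range 0-23.
function shiftHour(gmtHour, offsetHours) {
  return (((gmtHour + offsetHours) % 24) + 24) % 24;
}

const afternoon = shiftHour(14, -4);    // 10: same result as plain subtraction
const afterMidnight = shiftHour(2, -4); // 22: plain subtraction would give -2
```

When the hour wraps, the local date and day of week are also one day earlier than the GMT values, so a complete correction would adjust those as well. This sketch only covers the hour.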

Example: my family
  • Directed responses to 3 family members:
    • Daniel,
      • question/response on activities
    • Aviva,
      • question/response on number of cranes
    • Esther
      • response
  • Calculations (arithmetic) done using variables
  • if tags
    • The cond attribute is a condition test.
  • limited error handling: exit on a noinput or nomatch event
    • alternative is to repeat prompt, generally using count attribute
<vxml version="2.0">
<form>
<field name="childid">
<prompt>
<audio src="whosthis.wav">Hello. Who is calling?</audio>
</prompt>
<grammar type="application/x-gsl" mode="voice">
<![CDATA[
[
[dan daniel (daniel meyer) (dan meyer)] {<childid "daniel">}
[aviva (aviva meyer)] {<childid "aviva">}
[esther (esther minkin)] {<childid "esther">}
]
]]>
</grammar>
<catch event="noinput nomatch"> <audio src="sorry.wav">Sorry. I didn't get that.</audio> <exit/> </catch>
<filled>
<if cond="'daniel'==childid">
<goto next="#danfollowup"/>
<elseif cond="'aviva'==childid"/>
<goto next="#avivafollowup"/>
<elseif cond="'esther'==childid"/>
<goto next="#estherfollowup"/>
<else/>
<!-- never happens -->
</if>
</filled>
</field>
</form>

Note the inner single quote marks in the cond attributes. Note the double = signs.
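The effect of the grammar and the if/elseif chain can be sketched in plain JavaScript. The phrase lists and form ids are copied from the listing; the lookup itself (CHILD_GRAMMAR, matchChild, nextForm) is a hypothetical illustration and not how the tellme recognizer is implemented.

```javascript
// Each rule maps the phrases a caller may say to the value stored in childid.
const CHILD_GRAMMAR = [
  { phrases: ['dan', 'daniel', 'daniel meyer', 'dan meyer'], value: 'daniel' },
  { phrases: ['aviva', 'aviva meyer'], value: 'aviva' },
  { phrases: ['esther', 'esther minkin'], value: 'esther' },
];

// Returns the slot value for an utterance, or null to stand for a nomatch event.
function matchChild(utterance) {
  const said = utterance.trim().toLowerCase();
  for (const rule of CHILD_GRAMMAR) {
    if (rule.phrases.includes(said)) return rule.value;
  }
  return null;
}

// Mirrors the <if>/<elseif> chain: choose the form to go to next.
function nextForm(childid) {
  if ('daniel' == childid) return '#danfollowup';
  else if ('aviva' == childid) return '#avivafollowup';
  else if ('esther' == childid) return '#estherfollowup';
  return null; // never happens for values produced by the grammar
}
```

Because the grammar can only return one of the three values, the final branch is unreachable in practice, which matches the "never happens" note.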

<form id="danfollowup">
<field name="today">
<prompt>
<audio src="congratsdan.wav">Congratulations on the new job. Did you work on your thesis, or do aikido or jo today?</audio>
</prompt>
<grammar type="application/x-gsl" mode="voice">
<![CDATA[
[
[aikido (i key dough)] {<today "aikido">}
[thesis (work)] {<today "thesis">}
[jo (joe)] {<today "jo">}
[both (all) (everything) ((i key dough) jo)] {<today "both">}
[none nothing (sort of)] {<today "nothing">}
]
]]>
</grammar>
<catch event="noinput nomatch"> <audio>I didn't quite understand. Call or send e-mail.</audio> <exit/> </catch>
<filled>
<if cond="today=='aikido'">
<audio>Some aikido is fine. </audio>
<elseif cond="today=='thesis'"/>
<audio>Good, but do other things also.</audio>
<elseif cond="today=='jo'"/>
<audio>Don't get hit in the head.</audio>
<elseif cond="today=='both'"/>
<audio>Doing some of everything is best. </audio>
<elseif cond="today=='nothing'"/>
<audio> You deserve a break, but remember you want to be done by September. </audio>
<else/>
<audio> See you soon.</audio>
</if>
</filled> </field>
<block>
<audio> Good bye </audio> </block> </form>

<form id="avivafollowup">
<var name="rest" expr="1000"/>
<field name="bcount" type="number">
<prompt>
<audio src="howmanycranes.wav">Hello, Aviva. How many cranes have you made? </audio>
</prompt>
<grammar type="application/x-gsl" mode="voice"> … </grammar>
<catch event="noinput nomatch"> <audio src="sorry.wav">Sorry. I didn't get that.</audio> <exit/> </catch>
<filled>
<assign name="rest" expr="1000-bcount"/>
<audio> <value expr="rest"/> </audio>
<audio src="togo.wav"> to go. </audio>
<if cond="rest&lt;200">
<audio src="homestretch.wav">You're in the home stretch </audio>
<elseif cond="rest&lt;500"/>
<audio src="morethanhalf.wav">More than half way </audio>
<elseif cond="rest&lt;800"/>
<audio src="goodstart.wav">Off to a good start </audio>
<else/>
<audio> Get a move on </audio>
</if>
<audio src="goodbye.wav">Good bye. </audio>
</filled>
</field>
</form>

Note: can't use < inside the cond attribute; write &lt; instead.
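The arithmetic and the threshold chain in this form can be checked with a minimal sketch in plain JavaScript. cranesMessage is a hypothetical function name; the target of 1000 and the thresholds 200, 500 and 800 are taken from the listing. Plain JavaScript can use < directly, since the &lt; escape is only needed inside an XML attribute.

```javascript
// Mirrors <assign name="rest" expr="1000-bcount"/> and the <if>/<elseif>/<else/> chain.
function cranesMessage(bcount) {
  const rest = 1000 - bcount;   // cranes still to make
  let comment;
  if (rest < 200) comment = "You're in the home stretch";
  else if (rest < 500) comment = 'More than half way';
  else if (rest < 800) comment = 'Off to a good start';
  else comment = 'Get a move on';
  return rest + ' to go. ' + comment;
}

const nearlyDone = cranesMessage(850);   // rest is 150
const justStarted = cranesMessage(100);  // rest is 900
```

The order of the tests matters: each branch is only reached when the earlier, smaller thresholds have already failed, so a value such as 150 never falls through to the later messages.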




<form id="estherfollowup">
<block>
<audio>Hello, Mommy. This is all I can do now. </audio>
</block>
</form>
</vxml>




Application logic
  • VoiceXML elements (for example, <if> and <var>)
    • Note: more powerful than XSLT: <assign> tag
  • JavaScript code in attributes (for example, cond, expr)
  • JavaScript code in <script> </script>
    • Encase in CDATA to avoid problems with certain characters
  • external JavaScript code, cited using <script src=file address />
Class work
  • EVERYONE (who hasn't already) sign up tonight
  • Design simple application (you may work in groups):
    • Ask one question
    • Detect and respond to each of 2 or 3 answers
    • Use examples here for models
    • All text to speech
  • Pick (at least) one and implement.
  • (Do this a short time and then go on to next lecture. Resume after 9pm when minutes are free.)
  • (Majors requirement overdue: there will be a deduction but better late than never.)
  • Go to & signup as developer.
    • try examples (using scratch pad)
    • record some voice samples
    • do tellme tutorials
  • ALSO try and report on
    • 800 long distance or some other commercial application