VoiceXML Technology
Andrea Piras, Guido Zucconi
email@example.com@crs4.it
03/09/2001

Contents
VoiceXML
• What’s VoiceXML?
• Advantages by web
• Advantages by SR
• Advantages by phone
• Architectural Model
• VoiceXML enables …
• Voice apps
• VoiceXML history
• Now … W3C WG
• VoiceXML Techs
• VoiceXML Techs in Italy
VoiceXML Technology
• Nuance
• Nuance products
• SpeechObjects
• Installation
• Watcher
• Processes
• Access to Watcher
• Watchers
• Launchpad
• Some about Vocalizer
• Standard system
• Another system
• A complex system
• Good or bad?
• E-mate and VoiceXML?
• Links

What’s VoiceXML?
VoiceXML is a Web-based markup language for representing human-computer dialogs. It assumes audio output (computer-synthesized and/or recorded speech) and audio input (voice and/or keypad tones).

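To make the definition concrete, the sketch below holds a small VoiceXML 1.0 document in a Python string and lists the fields it would ask the caller for. The dialog itself (form id, field name, prompt text, the grammar file drink.gram) is hypothetical and only illustrates the markup; the Python part merely checks that the document is well-formed XML and is not a VoiceXML interpreter.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-question dialog. The form id, field name, prompt text and
# the grammar file "drink.gram" are illustrative assumptions.
VXML_DOC = """<?xml version="1.0"?>
<vxml version="1.0">
  <form id="drink_order">
    <field name="drink">
      <prompt>Would you like coffee or tea?</prompt>
      <grammar src="drink.gram" type="application/x-jsgf"/>
      <filled>
        <prompt>You chose <value expr="drink"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""


def list_fields(vxml_text):
    """Return (form id, field name, prompt text) for every field in the document."""
    root = ET.fromstring(vxml_text)  # raises ParseError if the markup is malformed
    rows = []
    for form in root.findall("form"):
        for field in form.findall("field"):
            prompt = field.findtext("prompt", default="").strip()
            rows.append((form.get("id"), field.get("name"), prompt))
    return rows


for row in list_fields(VXML_DOC):
    print(row)
```

The prompt is the audio output side (played through TTS or a recording) and the field with its grammar is the audio input side (filled by speech or keypad tones).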
Advantages by web
Advantages taken from the Web:
• improved web server capabilities
• more powerful browsers
• advanced web data representation (XML)
• more powerful web application development tools
• Internet infrastructure that keeps improving in performance, bandwidth, and quality of service
• the growth of the World-Wide Web and of its capabilities

Advantages by SR
Advantages taken from Speech Recognition:
• better algorithms and acoustic models
• less powerful hardware required
• speech synthesizers closer to human speech
• improvements in computer-based speech recognition and text-to-speech synthesis

Advantages by phone
Advantages taken from the phone:
• widespread
• portable
• instant-on
• usable while driving, with an earphone :-)

Architectural Model
• Document Server (e.g. a web server): processes requests from a client
• VoiceXML Interpreter: processes VoiceXML documents and conducts the dialog
• VoiceXML Interpreter Context: acquires VoiceXML documents, detects and answers calls
• Implementation Platform: controlled by the VoiceXML Interpreter Context and the VoiceXML Interpreter; generates events in response to user actions and system events; requires audio output (TTS, audio files) and audio input (SR, audio recording, DTMF)

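The division of roles in the architectural model can be sketched as a toy, in-memory model. All class and method names below are hypothetical, and the "interpreter" only prompts for each field in turn; grammars, events and real telephony are left out, so this illustrates who talks to whom rather than how a real platform works.

```python
import xml.etree.ElementTree as ET


class DocumentServer:
    """Stands in for a web server: maps a request path to a VoiceXML document."""

    def __init__(self, documents):
        self.documents = documents

    def get(self, path):
        return self.documents[path]


class Platform:
    """Stands in for the implementation platform: audio output and audio input."""

    def __init__(self, caller_inputs):
        self.caller_inputs = list(caller_inputs)  # scripted SR/DTMF results
        self.played = []                          # prompts "spoken" so far

    def play(self, text):
        self.played.append(text)

    def listen(self):
        return self.caller_inputs.pop(0)


class Interpreter:
    """Conducts the dialog: one prompt and one collected value per field."""

    def __init__(self, platform):
        self.platform = platform

    def run(self, vxml_text):
        results = {}
        for field in ET.fromstring(vxml_text).iter("field"):
            self.platform.play(field.findtext("prompt", default="").strip())
            results[field.get("name")] = self.platform.listen()
        return results


class InterpreterContext:
    """Answers the call, acquires the start document, hands it to the interpreter."""

    def __init__(self, server, platform):
        self.server = server
        self.platform = platform

    def handle_call(self, start_path):
        document = self.server.get(start_path)
        return Interpreter(self.platform).run(document)


# Hypothetical document and caller answer, for illustration only.
server = DocumentServer({
    "/start.vxml": "<vxml version='1.0'><form><field name='city'>"
                   "<prompt>Which city?</prompt></field></form></vxml>"
})
platform = Platform(["Cagliari"])
answers = InterpreterContext(server, platform).handle_call("/start.vxml")
print(answers, platform.played)
```

Keeping the platform behind two small methods (play, listen) mirrors the point of the model: the same document can drive a telephone board or a desktop sound card without changes.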
VoiceXML enables …
• Voice applications are easy to develop.
• Applications are easy to deliver because they do not require special web servers.
• They work with computers and telephones alike.

Voice apps
• Information retrieval
• Electronic transactions
• Telephone services - Call centers
• Voice e-mail
• Voice Access Control - Voice Recognition
• …

VoiceXML history
• 1995: AT&T Bell Labs, PML / PhoneWeb (Ramming, Rehor, Ladd, Tuckey)
• 1-2/1999: PML (AT&T), PML (Lucent), VoxML (Motorola), SpeechML (IBM)
• 3/1999: VoiceXML Forum founded
• 8/1999: VoiceXML 0.9
• 3/2000: VoiceXML 1.0
• 5/2000: accepted by the W3C

Now …
W3C Voice Browser Working Group:
• Speech Recognition Grammars
• Speech Synthesis Markup Language
• Natural Language Semantics Markup Language
• Multimodal Dialog Markup Language
• VoiceXML 2.0

VoiceXML Techs
• WebSphere Voice Server SDK (IBM): TTS, ASR, browser
• Mya Voice Platforms (Motorola): gateway, TTS, ASR, browser; only the Mobile Applications Development Kit can be downloaded
• Voice Web Application Platform (Telera): voice browser and voice web server; Telera developed TXML before adopting VoiceXML; after registering, VoiceXML code can be checked online; no download; California
• MagicTalk Voice Gateway (General Magic): integrates VoiceXML, speech recognition, and telephony technologies to enable voice access; no download; California
• Bevocal Cafe: after registering, VoiceXML code can be checked online; no download; California
• Enterprise VoiceXML Server (Tellme): after registering in Tellme Studio, VoiceXML code and grammars can be checked online and the application can be heard by phone; no download; California
• Natural Voices (AT&T Labs): high-quality TTS, testable online; no download
• Mosquito (Minde): voice platform; no download; Utah
• VoiceGenie: server, browser and applications; after registering in the Developer Workshop, VoiceXML code and grammars can be checked online and the application can be heard by phone; no download; Toronto

VoiceXML Techs in Italy
• VoxNauta (Loquendo): voice platform; no download
• VoceViva (Tiscali): voice platform, good TTS and SR; no download

Nuance
Californian software house with a complete suite of VoiceXML products. After registering, it is possible to download almost all products, test VoiceXML code online, access the discussion groups and read the support guides.

Nuance products
• Nuance 7.0: distributed-architecture platform used by the other Nuance products; supports 25 languages
• Vocalizer: TTS available in 9 languages
• V-Builder: graphical tool for easily creating VoiceXML applications
• Verifier: voiceprint identification software
• V-Optimizer: tool for analyzing and tuning deployed applications
• Voyager: voice browser, 80% compatible with VoiceXML 1.0
• Voice Web Server: web server containing a browser fully compatible with VoiceXML 1.0
• Grammar Builder: graphical tool that enables developers to create, view, edit, manage, and test grammars
• Nuance Foundation SpeechObjects: Nuance extension of SpeechObjects

SpeechObjects
Created within the V-Commerce Alliance to bring natural language to e-commerce, SpeechObjects (SO) are Java packages for voice applications. They define the speech channel and the handling of grammars. The source code is FREE.

Installation
Installing the platform requires:

Product                                              Installation   Installed
Nuance 7.0.4 - Service Pack 9 and Speech Object 1.1  308 Mbyte      461 Mbyte
Vocalizer 1.0 - Service Pack 1                       248 Mbyte      273 Mbyte
V-Builder 1.2                                        27 Mbyte       53 Mbyte
Voice Web Server 1.2                                 12 Mbyte       26 Mbyte
TOTAL INSTALLED                                                     813 Mbyte

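The totals can be re-derived with a few lines of arithmetic. The sketch assumes that the sizes on the slide are listed in the same order as the products; the sums themselves do not depend on that pairing, but the "without V-Builder" figure does.

```python
# (product, installation package in Mbyte, installed size in Mbyte)
# The pairing of sizes to products follows slide order and is an assumption.
PACKAGES = [
    ("Nuance 7.0.4 SP9 + Speech Object 1.1", 308, 461),
    ("Vocalizer 1.0 SP1", 248, 273),
    ("V-Builder 1.2", 27, 53),
    ("Voice Web Server 1.2", 12, 26),
]


def totals(packages):
    """Return (sum of installation packages, sum of installed sizes) in Mbyte."""
    package_total = sum(pkg for _, pkg, _ in packages)
    installed_total = sum(inst for _, _, inst in packages)
    return package_total, installed_total


package_total, installed_total = totals(PACKAGES)
runtime_only = [p for p in PACKAGES if not p[0].startswith("V-Builder")]
_, installed_without_builder = totals(runtime_only)

print(package_total, installed_total, installed_without_builder)
```

Under that pairing the installed total is 813 Mbyte, the installation packages add up to 595 Mbyte, and leaving out V-Builder gives 760 Mbyte, which matches the figure quoted on the E-mate slide. That match is an observation about the numbers, not something the slides state.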
Watcher
Watcher is a daemon/service (using port 7890) that can start, stop, quiesce and monitor the processes of the Nuance platform, and get and set their parameters. A Watcher process must run on each machine to be monitored. A Watcher can communicate with the other ones. The processes launched by default are: license manager, resource manager, recognition server, recognition client, compilation server.

Processes
• Recognition Server (RecServer.exe): listens for incoming connection requests from recognition clients; starts one thread per CPU; works on port 8200
• Compilation Server (compilation-server.exe): compiles the grammars; works on port 2527
• License Manager (nlm.exe): manages floating licenses across the machines in the network; works on port 8470
• Resource Manager (resource-manager.exe): manages the requests of the other processes; does not connect more than 1000 channels; works on port 7777
• Recognition Client (RecClient.exe): the point where the applications ‘enter’; performs audio playback and recording and controls telephony applications; supports the audio providers native (SB), dialogic (telephony board by Dialogic), nms (telephony hardware), aculab (telephony board by Aculab) and h323 (Voice over IP - VoIP); supports multiple applications and can run applications remotely; the maximum number of threads can be specified; 1 recclient every 10 ports and 1 thread every 4 channels; works on port 9200

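The recognition client sizing rule (1 recclient every 10 ports, 1 thread every 4 channels) can be turned into a small calculation. The sketch assumes that one telephone port corresponds to one channel and that the thread count is meant as a total across all recognition clients; neither reading is confirmed by the slides.

```python
PORTS_PER_RECCLIENT = 10   # "1 recclient each 10 ports"
CHANNELS_PER_THREAD = 4    # "1 thread each 4 channels"
MAX_CHANNELS = 1000        # resource manager limit quoted on the slide


def ceil_div(a, b):
    """Integer division rounded up, without going through floats."""
    return (a + b - 1) // b


def size_recclients(channels):
    """Return (number of recognition clients, total threads) for a channel count."""
    if not 0 < channels <= MAX_CHANNELS:
        raise ValueError(f"channels must be between 1 and {MAX_CHANNELS}")
    recclients = ceil_div(channels, PORTS_PER_RECCLIENT)
    threads = ceil_div(channels, CHANNELS_PER_THREAD)
    return recclients, threads


print(size_recclients(24))
```

For a 24-channel installation this reading gives 3 recognition clients and 6 threads in total.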
Access to Watcher 7080 7161 7023
Watchers
Each watcher can communicate with the other watchers present on the network.

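Before relying on watchers talking to each other, it can help to check that something is listening on the Watcher port (7890, as quoted on the Watcher slide) on each machine. The sketch below does a plain TCP connect; the host names are hypothetical, and a successful connect only shows that a process is listening on that port, not that it is a Watcher.

```python
import socket

WATCHER_PORT = 7890  # port quoted for the Watcher


def is_listening(host, port=WATCHER_PORT, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, unknown host
        return False


def reachable_watchers(hosts, port=WATCHER_PORT, timeout=1.0):
    """Map each host name to whether its Watcher port accepts connections."""
    return {host: is_listening(host, port, timeout) for host in hosts}


if __name__ == "__main__":
    # Hypothetical machine names, for illustration only.
    print(reachable_watchers(["recserver-1.example", "recclient-1.example"]))
```

The same check works for the other process ports listed on the Processes slides by passing a different port value.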
Launchpad
Launchpad is a graphical tool that can communicate with all watchers through the same interface. To start it:
>cd %Nuance%/java
>java -cp launchpad.jar nop.frontend.GUI.GUI

Some about Vocalizer
By default, Vocalizer works on port 32323. To run more than one TTS server on the same machine, each server must be given a name and its own port. Ex:
vocalizer tts.ResourceName=americanVoice
vocalizer -language italian tts.ResourceName=italianVoice tts.Port=32324
vocalizer -language french tts.ResourceName=frenchVoice tts.Port=32325
Good English and French, bad Italian.

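The pattern in the Vocalizer example (one resource name per server, consecutive ports starting from the default) can be generated rather than typed by hand. The sketch only builds the command strings; it uses no options beyond those shown on the slide (-language, tts.ResourceName, tts.Port), and the helper name and the rule of omitting tts.Port for the default port are assumptions made here to reproduce the example.

```python
DEFAULT_TTS_PORT = 32323  # default Vocalizer port quoted on the slide


def vocalizer_commands(voices, first_port=DEFAULT_TTS_PORT):
    """Build one vocalizer command line per (language, resource name) pair.

    A language of None means "use the default language", as in the first
    command of the slide. Ports are assigned consecutively from first_port.
    """
    commands = []
    for offset, (language, resource_name) in enumerate(voices):
        parts = ["vocalizer"]
        if language is not None:
            parts += ["-language", language]
        parts.append(f"tts.ResourceName={resource_name}")
        port = first_port + offset
        if port != DEFAULT_TTS_PORT:  # the default port needs no explicit setting
            parts.append(f"tts.Port={port}")
        commands.append(" ".join(parts))
    return commands


VOICES = [
    (None, "americanVoice"),
    ("italian", "italianVoice"),
    ("french", "frenchVoice"),
]

for line in vocalizer_commands(VOICES):
    print(line)
```

With the three voices above, the generated lines are the same three commands shown on the slide.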
Good or bad?
• High flexibility and scalability
• Completely FREE
• Many languages supported
• Use with telephone boards
• High number of ports used
• Hardware resources
• No telephone simulation
• Many variables
• Disk space
• Speech Recognition
• JAVA

E-mate and VoiceXML?
Each E-mate service can become a voice service by extending the Object Browser to use VoiceXML. At present, the only free voice platform is Nuance. Is it possible to install a platform supporting all of VoiceXML 1.0 during the E-mate installation? Not simple, but YES. Is it risky? YES: it requires installing 760 Mbyte of third-party software.

Links
• VoiceXML Forum: http://www.voicexml.org
• VoiceXML Central: http://www.voicexmlcentral.com
• General Magic: http://www.generalmagic.com
• IBM WebSphere Voice Server SDK Version 1.5: http://www-4.ibm.com/software/speech/enterprise/ep_11.html
• Mobile Application Development Toolkit: http://www.motorola.com/MIMS/ISG/spin/mix/
• Telera: http://www.telera.com
• Tellme: http://www.tellme.com, http://studio.tellme.com
• VoiceGenie: http://developer.voicegenie.com
• V-Commerce Alliance: http://www.v-commerce.org
• Natural Voice (AT&T Labs): http://www.naturalvoices.att.com
• Mosquito: http://www.minde.com
• Bevocal Cafe: http://cafe.bevocal.com
• Tiscali VoceViva: http://voceviva.tiscali.it
• Loquendo: http://www.loquendo.it
• Nuance: http://www.nuance.com, http://extranet.nuance.com
• Nuance Vocalizer 1.0, Nuance Vocalizer Developer’s Guide
• Nuance Voice Web Server Version 1.2, Installation Guide
• Nuance Voyager Version 1.0, Voice Browser Installation Guide
• Nuance Speech Recognition System, Application Developer's Guide
• Speech Object & VoiceXML