

Speech Service Creation

NY / NJ Chapter

December, 2006

An Overview of Speech Service Creation Tools

K. W. (Bill) Scholz

Agenda


  • Speech Applications – where we were and where we are

  • Building speech applications today

    • Methodologies and Tools

    • Reusable components & packaged applications

  • Summary of today’s leading VUI creation tools

    • Highlight / compare / contrast industry’s leading tools

What’s it take to build a speech app?

Requirements, Use Cases, Project Plan

Dialog Design & Test

Call flow, Implementation, & Test

Prompts, Grammars, & Test

Data / Back-end Integration, & Test

Unit Test, Integration Test, System Test

Pilot, Limited Deployment, Analysis

Full Deployment, Analysis

Where We’ve Come From: Building Speech Apps

  • Development toolkits designed for building DTMF applications were extended to support speech

  • Call flows had the sound-and-feel of DTMF apps

  • Grammars were constructed by hand

  • Back-end integration coded by hand, often targeting closed-architecture information stores

    • Screen scraping – ‘row 12, column 37, 9 characters’

    • Proprietary closed databases

  • Separate natural language processors driven by recognizer output required separate ‘NL’ grammars

  • Poor TTS quality generated need for recorded prompts
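The screen-scraping bullet above (‘row 12, column 37, 9 characters’) can be illustrated with a minimal sketch. It assumes the host screen is available as a list of fixed-width text lines and that rows and columns are counted from 1; the function name and the sample screen contents are hypothetical, not taken from any particular product.

```python
def extract_field(screen, row, column, length):
    """Return `length` characters starting at the 1-indexed (row, column)."""
    line = screen[row - 1]
    start = column - 1
    return line[start:start + length].strip()


# Hypothetical 24x80 host screen with a balance field at row 12, column 37.
screen = [" " * 80 for _ in range(24)]
screen[11] = screen[11][:36] + "  1234.56" + screen[11][45:]

balance = extract_field(screen, row=12, column=37, length=9)
```

Any change to the host screen layout silently breaks a lookup like this, which is one reason this style of back-end integration was fragile.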

Where We Are: Building speech apps today

  • Methodologies and Tools

    • Methodology: problem statement, use cases, dialog design, project management

  • Data / Back-end integration

  • Reusable components

    • OpenSpeech Dialog Modules

    • Reusable Dialog Components

  • Packaged applications

  • Testing & Analytics

Current Practice

Most applications use state-based dialogs

  • Easiest to design, debug and test for current simple applications

  • Natural fit with the directed dialogs that are easiest for novice users

  • Speech recognizer grammars are simpler to construct and therefore less error prone

  • As developers and users become exposed to more sophisticated dialog approaches, they will become less satisfied with state-based dialogs

    • Goal-directed

    • Conversational

    • Rule-based
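A state-based directed dialog of the kind described above can be sketched as a small finite-state machine. This is an illustrative sketch only: the state names, prompts, and utterances are invented, and a real application would sit on top of an ASR engine rather than scripted strings.

```python
# Each state has a prompt and a small grammar: the keys of "transitions" are
# the only utterances accepted in that state, each mapped to the next state.
DIALOG = {
    "main_menu": {
        "prompt": "Say balance or transfer.",
        "transitions": {"balance": "read_balance", "transfer": "get_amount"},
    },
    "get_amount": {
        "prompt": "How much? Say fifty or one hundred.",
        "transitions": {"fifty": "confirm", "one hundred": "confirm"},
    },
    "confirm": {
        "prompt": "Say yes to confirm or no to cancel.",
        "transitions": {"yes": "done", "no": "main_menu"},
    },
    "read_balance": {"prompt": "Reading your balance.", "transitions": {}},
    "done": {"prompt": "Transfer complete.", "transitions": {}},
}


def run_dialog(utterances, start="main_menu"):
    """Walk the state machine over scripted caller utterances.

    Returns the list of states visited. An utterance outside the current
    state's grammar is treated as a no-match and the state is re-entered.
    """
    state = start
    visited = [state]
    for utterance in utterances:
        transitions = DIALOG[state]["transitions"]
        if not transitions:  # terminal state: nothing more to collect
            break
        state = transitions.get(utterance.lower().strip(), state)
        visited.append(state)
    return visited
```

Because each state accepts only a handful of phrases, the per-state grammars stay small, which is the property the slide credits for state-based dialogs being easier to build and test.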

Tools for Building Speech Applications

Avaya Dialog Designer

IBM WebSphere

Intervoice InVision

Microsoft Speech .NET

NetByTel (TuVox)

Nortel MPS Developer (was PeriProducer)

Nuance OSD

Orange Nextfire OAVS

And others……

  • Dialog design, evaluation, call flow development, back-end integration, prototype, deployment, tuning, life cycle support.

  • Vendors

    • Active:

      • Audium: the ‘Audium Builder’

      • DBscape Vocabase

      • Fluency: ‘Voice Runner’

      • OpenMethods: ‘OpenVXML’

      • TuVox: ‘CVR’ (‘Producer’ + management & analytics)

      • Vicorp: ‘xMP’

      • VoiceObjects: ‘VoiceObjects X6’

    • Inactive:

      • Unisys: the ‘NL Speech Assistant’

      • Unveil: ‘Conversation Manager’

      • Vocalocity: ‘AppCenter’

    • Support:

      • Eclipse – Back-end integration

      • Microsoft: ‘Visio’ for call flow representation

      • Nuance: OSI – Tuning

SCE Tools: what to look for

  • Manipulable element – what the SCE assembles

  • Element detailing – how each is tailored for use

  • Business rule / back-end integration

  • Architectural model – underlying design pattern

  • Life cycle support – pre- and post-deployment management and testing

Visio to Represent Dialog Call flow

Source: Unisys ‘FFA’ design specification

Audium (Purchased by Cisco)

  • Audium Builder: a GUI that permits users to create and manage multiple applications

  • Visual elements include functions for managing databases, menus, dates and times, or phone transfers, as well as credit card or email processing.

  • Application creation is done by dragging elements to the workspace to construct the call flow

  • As elements are added their properties can be configured to load pre-recorded audio or TTS prompts, and configured to play naturally to callers.

  • Elements are interconnected using the GUI to assign ‘exit states’ to reach an end goal.

Source: Joe Oh, Audium, (private communication)


Application treeview


Object properties


DBscape Vocabase

The VocaBase “Dialog Map” represents the sequence of modules, sub-modules, and steps. Clicking on any element gives access to its detailed configuration.

Fluency ‘Voice Runner’

Key features of this tool are:

  • Visual component assembly

  • Integrated component assembly analysis & testing

  • One click assembly deployment

  • Library of process and rule components:

    • Address Collection

    • Credit Card Verification

VoiceObjects 6 Desktop

  • Tree structure to represent dialog design

  • Point-and-click authoring.

  • Layering includes system layers and user-built layers

  • Single click packages an application for deployment

  • Back-end integration: ‘connectors’ support both server-side scripting and J2EE code execution

  • Uses object-oriented concepts

Source: http://www.voiceobjects.com/

VoiceObjects Desktop – At a glance

List of all available VoiceObjects

Individual editor for voice object

Source: Tiemo Winterkamp, VoiceObjects (private communication)

VoiceObjects Desktop - Control Center

Source: Tiemo Winterkamp, VoiceObjects (private communication)

Vocalocity AppCenter

Source: Ken Rehor - 2005

Back-end Integration

  • Java, JSP, C#

  • Scripting languages

    • PERL

    • JSP / ASP

    • PHP

  • Databases

    • Oracle

    • Microsoft SQL Server

    • MySQL / PostgreSQL

  • Web Services

  • AJAX (Asynchronous JavaScript and XML)

Testing & Analytics


  • Unit – emulation

  • Callflow – WoZ or live

  • Usability – WoZ or live

  • Post deployment analytics


Modules and packaged applications

Modules: components and templates




Packaged application: a software program designed to perform a specific set of functions

Component: a piece of software that can be combined with other pieces to construct a program

Template: a pattern used to replicate objects

Source: Steve Erlich, Apptera (private communication)

SCE Analysis and Evaluation

  • Manipulable element – what the SCE assembles

    • Dialog state

    • Object module

    • Conversation step

  • Element detailing

    • Properties and values

    • Element attributes

    • Prompt and grammar management

  • Business rule / back-end integration

    • Built-in primitives

    • Integration with Java, Web Services, Databases

  • Architectural model

    • OO? FSM? SOA? MVC? Design patterns?

    • Visible dialog metalanguage?

  • Life cycle: Deployment and post-deployment support

    • Reuse: create, package, and integrate reusable components

    • Test capability; test script generation; WoZ capability

    • Analytics


  • Application Development assets

    • GUI is implemented using Eclipse. Visio-like view

    • Inline grammars can be generated directly by the Studio

    • Centralized prompt management capability; recording scripts generated

    • OSDM integration supported (but RDCs are not)

    • XML dialog meta-language documented and the DTD provided

    • Multiple ‘Form’ elements can be combined to generate mixed-initiative dialog

    • Multi-user collaboration is well supported and demonstrated at customer sites

  • Runtime assets

    • Applications published as XML; interpreted by a Java runtime engine

    • SNMP queries are generated

  • Liabilities

    • Layering is not distinct – common database and external component references

    • No 3rd party application support

    • No automatic test script generation

    • No dedicated form for mixed initiative

    • No runtime cluster or server management

    • No speaker verification or video service generation capability

    • Elements oriented towards programmers, not towards VUI designers


  • Application Development assets

    • Explicit separation of presentation layer from business objects layer

    • Visio-like presentation of application call flow.

    • Inline grammars with confidence levels generated from item lists

    • Prompt categories facilitate management of multiple personas and languages.

    • Invokes 3rd party applications by URI with arguments.

    • Directed dialog, mixed initiative, and sub dialogs are supported.

  • Runtime assets

    • Applications published as EAR files for execution on J2EE application server.

    • Service Management Console provided to manage server clusters.

  • Liabilities

    • No support for the generation of SSML for TTS

    • Internal XML dialog meta-language not exposed for use

    • No automatic testing of applications; no post-deployment analytics

    • No support for multi-user management or collaboration

    • Speaker verification and video service generation not shown

    • It is not possible to open multiple projects simultaneously and cut and paste between them.
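The idea of generating an inline grammar from an item list, mentioned among the assets above, can be sketched as follows. This is not the tool's actual output: it is a minimal illustration that emits a W3C SRGS XML grammar using Python's standard library, the function name and sample items are hypothetical, and per-item confidence levels are left out because the slide does not say how the tool represents them.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_inline_grammar(rule_id, items, lang="en-US"):
    """Build an SRGS XML grammar whose root rule accepts any one item."""
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": lang,
        "mode": "voice",
        "root": rule_id,
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id})
    one_of = ET.SubElement(rule, "one-of")
    for phrase in items:
        # One <item> alternative per entry in the item list.
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


grammar_xml = build_inline_grammar("city", ["Boston", "New York", "Newark"])
```

Generating the grammar from a list keeps the recognized phrases in step with the application data, instead of relying on a hand-edited grammar file.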


  • Application Development assets

    • Layering facilitates runtime prompt and persona remapping

    • Java extensions easily integrated as external resources

    • OSDM integration supported

    • Invokes 3rd party applications by URI with arguments.

    • XML dialog meta-language documented, DTD provided

    • Recording script generation by DB query

    • Multi-user collaboration supported: user logons with specific privileges

  • Runtime assets

    • Single runtime engine accesses all applications as data

    • Runtime data collection through ‘InfoStore’ and a mature Analytics package.

    • Extensive server cluster management, including SNMP

    • Support for multi-tenancy: separate JVMs launched for each tenant

  • Liabilities

    • Reusable Dialog Components are not supported

    • No explicit prompt management

    • Eclipse integration is incomplete

    • Confidence values not supported

    • No generation of SSML or recording scripts

    • No built-in application testing capability or test script generation capability

    • Natural language apps only supported by reference to external SLMs

    • External resources such as Java jar files are not managed by app dev environment.


Supported by Multiple Leading Vendors


  • Building speech applications today…..

    …..a bit like a marriage!

Something old, something new, something borrowed, .....

Dialog modules, Packaged apps

VUI built with tools

ASR and TTS subsystems


  • Overview of speech application creation process

  • Building speech applications today

    • Methodologies and Tools

    • Reusable components

    • Packaged applications

  • Where the field is going

    • Dialog description languages and tools: MI, Personalization, automatic call flow generation

    • SLMs, ASR & TTS improvements, Rule-Based and Case-Based Reasoning


Thank You.

K. W. (Bill) Scholz, Ph.D.

Home: +1 610.989.0989

Mobile: +1 610.212.8016