Speech Service Creation

NY / NJ Chapter

December, 2006

An Overview of Speech Service Creation Tools

K. W. (Bill) Scholz

Agenda


  • Speech Applications – where we were and where we are

  • Building speech applications today

    • Methodologies and Tools

    • Reusable components & packaged applications

  • Summary of today’s Leading VUI creation tools

    • Highlight / compare / contrast industry’s leading tools

What’s it take to build a speech app?

Requirements, Use Cases, Project Plan

Dialog Design & Test

Call flow, Implementation, & Test

Prompts, Grammars, & Test

Data / Back-end Integration, & Test

Unit Test, Integration Test, System Test

Pilot, Limited Deployment, Analysis

Full Deployment, Analysis

Where We’ve Come From: Building Speech Apps

  • Development toolkits designed for building DTMF applications were extended to support speech

  • Call flows had the sound-and-feel of DTMF apps

  • Grammars were constructed by hand

  • Back-end integration coded by hand, often targeting closed-architecture information stores

    • Screen scraping – ‘row 12, column 37, 9 characters’

    • Proprietary closed databases

  • Separate natural language processors driven by recognizer output required separate ‘NL’ grammars

  • Poor TTS quality created a need for recorded prompts
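To make the screen-scraping bullet concrete, here is a minimal sketch. It assumes a 24-by-80 character host screen held as a list of strings and 1-based row/column numbering; both are illustrative assumptions, not details from the presentation. The integration code reads a field purely by position, which is why a one-column change in the host layout silently corrupts the value.

```python
# Assumed host screen geometry; rows and columns are numbered from 1.
SCREEN_ROWS, SCREEN_COLS = 24, 80


def make_screen(fields):
    """Build a blank screen and write each text value at its (row, col) position."""
    grid = [[" "] * SCREEN_COLS for _ in range(SCREEN_ROWS)]
    for (row, col), text in fields.items():
        for offset, ch in enumerate(text):
            grid[row - 1][col - 1 + offset] = ch
    return ["".join(line) for line in grid]


def scrape(screen, row, col, length):
    """Read a field purely by position: 'row 12, column 37, 9 characters'."""
    line = screen[row - 1]
    return line[col - 1:col - 1 + length].strip()


# Hypothetical account screen with the balance at row 12, column 37.
screen = make_screen({(12, 20): "BALANCE:", (12, 37): "004512.75"})
print(scrape(screen, 12, 37, 9))  # 004512.75

# If the host layout shifts the field one column right, the same call truncates it.
shifted = make_screen({(12, 20): "BALANCE:", (12, 38): "004512.75"})
print(scrape(shifted, 12, 37, 9))  # 004512.7
```

Nothing in the scraping code can detect the shift; it only knows coordinates, which is the fragility that open back-end interfaces later removed.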

Where We Are: Building speech apps today

  • Methodologies and Tools

    • Methodology: problem statement, use cases, dialog design, project management

  • Data / Back-end integration

  • Reusable components

    • OpenSpeech Dialog Modules

    • Reusable Dialog Components

  • Packaged applications

  • Testing & Analytics

Current Practice

Most applications use state-based dialogs

  • Easiest to design, debug and test for current simple applications

  • Natural fit with the directed dialogs that are easiest for novice users

  • Speech recognizer grammars are simpler to construct and therefore less error prone

  • As developers and users become exposed to more sophisticated dialog approaches, they will become less satisfied with state-based dialogs

    • Goal-directed

    • Conversational

    • Rule-based
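A state-based dialog can be sketched as a small finite-state machine. Each state plays one directed prompt and accepts only the few phrases its transitions list, which is what keeps each state's grammar small. This is a minimal illustration with a hypothetical banking menu; the state names, phrases, and retry limit are assumptions, not taken from any of the tools discussed here.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    prompt: str
    # Recognized phrase -> name of next state (None ends the call).
    # The keys double as this state's small, directed grammar.
    transitions: dict = field(default_factory=dict)


# Hypothetical banking menu; names and phrases are illustrative only.
DIALOG = {
    "main_menu": State("Say balance or transfer.",
                       {"balance": "balance", "transfer": "transfer"}),
    "balance": State("Your balance is 100 dollars. Anything else? Say yes or no.",
                     {"yes": "main_menu", "no": None}),
    "transfer": State("Transfers are not available. Anything else? Say yes or no.",
                      {"yes": "main_menu", "no": None}),
}


def run(dialog, start, utterances, max_retries=2):
    """Drive the dialog with scripted caller utterances; return the prompts played."""
    played = []
    state_name = start
    inputs = iter(utterances)
    retries = 0
    while state_name is not None:
        state = dialog[state_name]
        played.append(state.prompt)
        heard = next(inputs, None)  # None models silence once the script runs out
        if heard in state.transitions:
            state_name = state.transitions[heard]
            retries = 0
        elif retries < max_retries:
            retries += 1  # no-match: stay in the same state and reprompt
        else:
            played.append("Transferring you to an agent.")
            break
    return played


print(run(DIALOG, "main_menu", ["balance", "no"]))
```

Because every state names its legal inputs explicitly, each path can be traced and tested by hand, which matches the "easiest to design, debug and test" point above; the same explicitness is what makes such dialogs feel rigid next to goal-directed or conversational designs.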

And others…

Avaya Dialog Designer

IBM WebSphere

Intervoice InVision

Microsoft Speech .NET

NetByTel (TuVox)

Nortel MPS Developer (was PeriProducer)

Nuance OSD

Orange Nextfire OAVS

Tools for Building Speech Applications

  • Dialog design, evaluation, call flow development, back-end integration, prototyping, deployment, tuning, and life cycle support.

  • Vendors

    • Active:

      • Audium: the ‘Audium Builder’

      • DBscape Vocabase

      • Fluency: ‘Voice Runner’

      • OpenMethods: ‘OpenVXML’

      • TuVox: ‘CVR’ (‘Producer’ + management & analytics)

      • Vicorp: ‘xMP’

      • VoiceObjects: ‘VoiceObjects X6’

    • Inactive:

      • Unisys: the ‘NL Speech Assistant’

      • Unveil: ‘Conversation Manager’

      • Vocalocity: ‘AppCenter’

    • Support:

      • Eclipse – Back-end integration

      • Microsoft: ‘Visio’ for call flow representation

      • Nuance: OSI – Tuning

SCE Tools: what to look for

  • Manipulable element – what the SCE assembles

  • Element detailing – how each is tailored for use

  • Business rule / back-end integration

  • Architectural model – underlying design pattern

  • Life cycle support – pre- and post-deployment management and testing

Visio to Represent Dialog Call flow

(Source: Unisys ‘FFA’ design specification)

Audium (purchased by Cisco)

  • Audium Builder: a GUI that permits users to create and manage multiple applications

  • Visual elements include functions for managing databases, menus, dates and times, or phone transfers, as well as credit card or email processing.

  • Application creation is done by dragging elements to the workspace to construct the call flow

  • As elements are added, their properties can be configured to load pre-recorded audio or TTS prompts and to play naturally to callers.

  • Elements are interconnected using the GUI to assign ‘exit states’ to reach an end goal.

Source: Joe Oh, Audium, (private communication)
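The exit-state wiring described for Audium Builder can be illustrated with a hypothetical Python model: each element runs, returns the name of an exit state, and a wiring table maps (element, exit state) to the next element. This is not Audium's API or file format; the element names, the 16-digit card check, and the `WIRING` table are invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, str]
Element = Callable[[Context], str]  # runs one step, may update context, returns an exit state


def get_card_number(ctx: Context) -> str:
    digits = ctx.get("caller_input", "")
    if digits.isdigit() and len(digits) == 16:
        ctx["card"] = digits
        return "done"
    return "invalid"


def play_goodbye(ctx: Context) -> str:
    ctx["last_prompt"] = "Thank you. Goodbye."
    return "done"


def transfer_to_agent(ctx: Context) -> str:
    ctx["last_prompt"] = "Please hold for an agent."
    return "done"


ELEMENTS: Dict[str, Element] = {
    "GetCard": get_card_number,
    "Goodbye": play_goodbye,
    "Transfer": transfer_to_agent,
}

# Each exit state is wired to the next element; None marks the end of the call.
WIRING: Dict[Tuple[str, str], Optional[str]] = {
    ("GetCard", "done"): "Goodbye",
    ("GetCard", "invalid"): "Transfer",
    ("Goodbye", "done"): None,
    ("Transfer", "done"): None,
}


def run_flow(start: str, ctx: Context) -> List[str]:
    """Follow exit states from element to element; return the path taken."""
    path: List[str] = []
    name: Optional[str] = start
    while name is not None:
        path.append(name)
        exit_state = ELEMENTS[name](ctx)
        name = WIRING[(name, exit_state)]  # KeyError here means an unwired exit state
    return path
```

Keeping the wiring in data rather than inside the elements is what lets a GUI redraw connections without touching element code.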

Application treeview


Object properties


DBscape Vocabase

The VocaBase “Dialog Map” represents the sequence of modules, sub-modules, and steps. Clicking on any element gives access to its detailed configuration.

Fluency ‘Voice Runner’

Key features of this tool are:

  • Visual component assembly

  • Integrated component assembly analysis & testing

  • One click assembly deployment

  • Library of process and rule components:

    • Address Collection

    • Credit Card Verification

Vicorp xMP

VoiceObjects 6 Desktop

  • Tree structure to represent dialog design

  • Point-and-click authoring.

  • Layering includes system layers and user-built layers

  • Single click packages an application for deployment

  • Back-end integration: ‘connectors’ support both server-side scripting and J2EE code execution

  • Uses object-oriented concepts

Source: http://www.voiceobjects.com/

List of all available VoiceObjects

Individual editor for voice object

VoiceObjects Desktop – At a glance





Source: Tiemo Winterkamp, VoiceObjects (private communication)

VoiceObjects Desktop - Control Center

Source: Tiemo Winterkamp, VoiceObjects (private communication)

Microsoft Speech (Visual Studio)

Unisys ‘NLSA’

NLSA Grammar Specification

Vocalocity AppCenter

Source: Ken Rehor - 2005

OpenVXML – Open Source SCE

Back-end Integration

  • Java, JSP, C#

  • Scripting languages

    • Perl

    • JSP / ASP

    • PHP

  • Databases

    • Oracle

    • Microsoft SQL Server

    • MySQL / PostgreSQL

  • Web Services

  • AJAX (Asynchronous JavaScript and XML)

Testing & Analytics



  • Unit – emulation

  • Callflow – WoZ or live

  • Usability – WoZ or live

  • Post deployment analytics
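Unit testing by emulation can be sketched as replaying scripted caller input against the call-flow logic, with no telephone line, recognizer, or live caller involved. The flow, the test scripts, and the rule that unrecognized input leaves the caller in the same state are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical call flow: state -> {recognized phrase: next state}.
CALL_FLOW: Dict[str, Dict[str, str]] = {
    "greeting": {"billing": "billing", "support": "support"},
    "billing": {"operator": "agent"},
    "support": {"operator": "agent"},
    "agent": {},
}


def emulate(flow, start, utterances):
    """Emulated caller: apply each utterance; unrecognized input leaves the state unchanged."""
    state = start
    for text in utterances:
        state = flow[state].get(text, state)
    return state


# Each test script pairs emulated caller input with the state the call should reach.
TEST_SCRIPTS: List[Tuple[List[str], str]] = [
    (["billing"], "billing"),
    (["support", "operator"], "agent"),
    (["mumble", "billing", "operator"], "agent"),  # no-match first, then recovery
]


def run_unit_tests(flow, start, scripts):
    """Return (input, expected, actual) for every script whose final state differs."""
    failures = []
    for utterances, expected in scripts:
        actual = emulate(flow, start, utterances)
        if actual != expected:
            failures.append((utterances, expected, actual))
    return failures


print(run_unit_tests(CALL_FLOW, "greeting", TEST_SCRIPTS))  # []
```

Call-flow and usability testing then replace the emulated caller with a Wizard-of-Oz operator or live callers, while the expected paths can stay the same.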

Modules and packaged applications

Modules: components and templates




A software program designed to perform a specific set of functions

A piece of software that can be combined with other pieces to construct a program

A pattern used to replicate objects

Source: Steve Erlich, Apptera (private communication)
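The three definitions above can be related in code: a hypothetical `CollectComponent` is a reusable piece, `digits_template` is a pattern that replicates such components, and `collect_all` combines them into a small program. This is a minimal sketch under those assumptions, not how OpenSpeech DialogModules or Reusable Dialog Components are actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CollectComponent:
    """A reusable component: one prompt, one validation rule, one result."""
    name: str
    prompt: str
    validate: Callable[[str], bool]

    def run(self, caller_input: str) -> Optional[str]:
        return caller_input if self.validate(caller_input) else None


def digits_template(name: str, what: str, length: int) -> CollectComponent:
    """A template: replicates CollectComponent objects for fixed-length digit strings."""
    return CollectComponent(
        name=name,
        prompt=f"Please say or key in your {length}-digit {what}.",
        validate=lambda s: s.isdigit() and len(s) == length,
    )


# Two components stamped out from the same template.
zip_code = digits_template("zip", "ZIP code", 5)
account = digits_template("account", "account number", 8)


def collect_all(components, inputs):
    """Combine components into a small program: run each on its matching input."""
    return {c.name: c.run(text) for c, text in zip(components, inputs)}
```

The template fixes the behavior once; each replicated component differs only in its parameters, which is what makes such pieces easy to package and reuse across applications.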

SCE Analysis and Evaluation

  • Manipulable element – what the SCE assembles

    • Dialog state

    • Object module

    • Conversation step

  • Element detailing

    • Properties and values

    • Element attributes

    • Prompt and grammar management

  • Business rule / back-end integration

    • Built-in primitives

    • Integration with Java, Web Services, Databases

  • Architectural model

    • OO? FSM? SOA? MVC? Design patterns?

    • Visible dialog metalanguage?

  • Life cycle: Deployment and post-deployment support

    • Reuse: create, package, and integrate reusable components

    • Test capability; test script generation; WoZ capability

    • Analytics


  • Application Development assets

    • GUI is implemented using Eclipse, with a Visio-like view

    • Inline grammars can be generated directly by the Studio

    • Centralized prompt management capability; recording scripts generated

    • OSDM integration supported (but RDCs are not)

    • XML dialog meta-language documented and the DTD provided

    • Multiple ‘Form’ elements can be combined to generate mixed-initiative dialog

    • Multi-user collaboration is well supported and demonstrated at customer sites

  • Runtime assets

    • Applications published as XML; interpreted by a Java runtime engine

    • SNMP queries are generated

  • Liabilities

    • Layering is not distinct – common database and external component references

    • No 3rd party application support

    • No automatic test script generation

    • No dedicated form for mixed initiative

    • No runtime cluster or server management

    • No speaker verification or video service generation capability

    • Elements oriented towards programmers, not towards VUI designers
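Generating an inline grammar from an item list might look like the following sketch. It assumes the W3C SRGS 1.0 XML format with semantics/1.0 tags; the tool evaluated above may use a different grammar format, and `items_to_srgs` is a hypothetical helper. Each spoken phrase becomes an `<item>` whose `<tag>` returns a normalized value.

```python
import json
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def items_to_srgs(rule_id, items, lang="en-US"):
    """Return an SRGS 1.0 XML grammar accepting exactly one phrase from `items`.

    `items` maps each spoken phrase to the semantic value returned when it is recognized.
    """
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": lang,
        "mode": "voice",
        "root": rule_id,
        "tag-format": "semantics/1.0",
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for phrase, value in items.items():
        item = ET.SubElement(one_of, "item")
        item.text = phrase
        tag = ET.SubElement(item, "tag")
        tag.text = f"out={json.dumps(value)};"  # ECMAScript string literal for the result
    return ET.tostring(grammar, encoding="unicode")


# Hypothetical item list: several spoken forms may share one semantic value.
print(items_to_srgs("city", {"new york": "NYC", "big apple": "NYC", "philadelphia": "PHL"}))
```

Generating the grammar from the same list the designer maintains keeps prompts, grammar, and back-end values in step, which is harder to guarantee with hand-written grammars.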


  • Application Development assets

    • Explicit separation of presentation layer from business objects layer

    • Visio-like presentation of application call flow.

    • Inline grammars with confidence levels generated from item lists

    • Prompt categories facilitate management of multiple personas and languages.

    • Invokes 3rd party applications by URI with arguments.

    • Directed dialog, mixed initiative, and sub dialogs are supported.

  • Runtime assets

    • Applications published as EAR files for execution on J2EE application server.

    • Service Management Console provided to manage server clusters.

  • Liabilities

    • No support for the generation of SSML for TTS

    • Internal XML dialog meta-language not exposed for use

    • No automatic testing of applications; no post-deployment analytics

    • No support for multi-user management or collaboration

    • Speaker verification and video service generation not shown

    • It is not possible to open multiple projects simultaneously and then cut and paste between them.
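Mixed-initiative dialog can be sketched as form filling: the system prompts for the first empty slot, but a single caller utterance may fill several slots at once, and a bare answer still fills the slot that was asked for. The keyword-based `parse` function is a toy stand-in for a real form-level grammar; the slot names and the city and day lists are hypothetical.

```python
SLOTS = ["origin", "destination", "date"]
CITIES = {"boston", "newark", "philadelphia"}
DAYS = {"monday", "tuesday", "friday"}


def parse(utterance):
    """Toy form-level grammar: extract every slot value present in one utterance."""
    words = utterance.lower().split()
    found = {}
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else ""
        if word in CITIES and prev == "from":
            found["origin"] = word
        elif word in CITIES and prev == "to":
            found["destination"] = word
        elif word in DAYS:
            found["date"] = word
    return found


def run_form(utterances):
    """Prompt for the first empty slot each turn; accept answers for any slot."""
    filled, prompts = {}, []
    turns = iter(utterances)
    while True:
        pending = [s for s in SLOTS if s not in filled]
        if not pending:
            return filled, prompts
        slot = pending[0]
        prompts.append(f"Please say the {slot}.")
        utterance = next(turns, None)
        if utterance is None:  # caller hung up or the script ended
            return filled, prompts
        values = parse(utterance)
        if not values:
            # A bare answer ("boston") fills the slot that was just prompted for.
            word = utterance.lower().strip()
            allowed = DAYS if slot == "date" else CITIES
            if word in allowed:
                values = {slot: word}
        for name, value in values.items():
            filled.setdefault(name, value)  # keep earlier answers; fill new slots in one turn
```

A caller who says "from boston to newark on friday" finishes in one turn, while a caller who answers one question at a time falls back to a directed dialog; both paths run through the same form.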


  • Application Development assets

    • Layering facilitates runtime prompt and persona remapping

    • Java extensions easily integrated as external resources

    • OSDM integration supported

    • Invokes 3rd party applications by URI with arguments.

    • XML dialog meta-language documented, DTD provided

    • Recording script generation by DB query

    • Multi-user collaboration supported: user logons with specific privileges

  • Runtime assets

    • Single runtime engine accesses all applications as data

    • Runtime data collection through ‘InfoStore’ and a mature Analytics package.

    • Extensive server cluster management, including SNMP

    • Support for multi-tenancy: separate JVMs launched for each tenant

  • Liabilities

    • Reusable Dialog Components are not supported

    • No explicit prompt management

    • Eclipse integration is incomplete

    • Confidence values not supported

    • No generation of SSML or recording scripts

    • No built-in application testing capability or test script generation capability

    • Natural language apps only supported by reference to external SLMs

    • External resources such as Java jar files are not managed by app dev environment.
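Recording-script generation by database query can be sketched with Python's built-in `sqlite3` module: the prompt catalog lives in a table, and the script for one voice persona and language is simply the set of prompts not yet recorded. The schema, persona names, and `.wav` file-naming convention are assumptions for illustration only.

```python
import sqlite3


def build_prompt_db():
    """Create an in-memory prompt catalog with a hypothetical schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE prompts (
               prompt_id TEXT NOT NULL,
               persona   TEXT NOT NULL,
               language  TEXT NOT NULL,
               wording   TEXT NOT NULL,
               recorded  INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (prompt_id, persona, language))"""
    )
    conn.executemany(
        "INSERT INTO prompts VALUES (?, ?, ?, ?, ?)",
        [
            ("welcome", "anna", "en-US", "Welcome to Example Bank.", 1),
            ("main_menu", "anna", "en-US", "Say balance or transfer.", 0),
            ("goodbye", "anna", "en-US", "Thanks for calling.", 0),
            ("main_menu", "ben", "en-US", "Say balance or transfer.", 0),
        ],
    )
    return conn


def recording_script(conn, persona, language):
    """Return one 'file<TAB>wording' line per prompt still to be recorded."""
    rows = conn.execute(
        "SELECT prompt_id, wording FROM prompts "
        "WHERE persona = ? AND language = ? AND recorded = 0 "
        "ORDER BY prompt_id",
        (persona, language),
    ).fetchall()
    return [f"{prompt_id}.wav\t{wording}" for prompt_id, wording in rows]


conn = build_prompt_db()
for line in recording_script(conn, "anna", "en-US"):
    print(line)
```

Keying prompts by persona and language in the same table is one way the persona remapping mentioned above could share a single catalog.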

Supported by Multiple Leading Vendors


  • Building speech applications today…..

    …..a bit like a marriage!

Something old, something new, something borrowed, .....

Dialog modules, Packaged apps

VUI built with tools

ASR and TTS subsystems


  • Overview of speech application creation process

  • Building speech applications today

    • Methodologies and Tools

    • Reusable components

    • Packaged applications

  • Where the field is going

    • Dialog description languages and tools: MI, Personalization, automatic call flow generation

    • SLMs, ASR & TTS improvements, Rule-Based and Case-Based Reasoning

Thank You.

K. W. (Bill) Scholz, Ph.D.

Home: +1 610.989.0989

Mobile: +1 610.212.8016

