
Technology Development at SpeechWorks



Presentation Transcript


  1. Technology Development at SpeechWorks (Mike Phillips)

  2. Intro to SpeechWorks
     • Founded 1994, public since Aug 2000 (SPWX)
     • Ongoing MIT relationship
     • 275+ people
     • Boston, NY, SF, Montreal, Mexico, Singapore, UK
     • Network-based speech products, tools and services

  3. Representative Customers
     • Telco: AT&T, BellSouth, Cellular One, Singapore Telecom
     • Banking: one of the top three, CIBC, First Union
     • Brokerage: E*TRADE, one of the top three, Discover (via Fiserv), three other discount brokers, Singapore Stock Exchange
     • Portals: AOL, HeyAnita, AudioPoint
     • Travel: United and Continental Airlines, Amtrak / MTRC (Hong Kong), Payless Car Rental
     • Distribution: FedEx (via NextLink), Roberts
     • Insurance: Guardian Life, Manulife Financial
     • Pharmaceuticals / Health: McKessonHBOC, Boston Medical Ctr
     • Manufacturing / IT: Apple Computer, Hewlett Packard

  4. SpeechWorks Architecture
     • Application
     • Application Development Environment
     • DialogModules: Application Building Blocks
     • Core Speech Technologies: ASR, TTS, Verification

  5. SpeechWorks Recognizer
     • Continuous & speaker-independent
     • Phonetic: type in words
     • Barge-in: used in all deployed systems
     • Large vocabulary: over 50,000 words
     • Natural language through BNF grammars
     • Dynamic vocabularies and grammars
     • Multilingual

  6. SpeechWorks Architecture
     • Application
     • Application Development Environment
     • DialogModules: Application Building Blocks
     • Core Speech Technologies: ASR, TTS, Verification

  7. Why Do We Need DialogModules?
     Each recognition step must handle:
     • Silence, or no speech detected
     • Utterances that cannot be understood (rejection)
     • Confidence interpretation
        • High (OK) and low (needs confirmation)
     • Touch-tone inputs (DTMF)
     Even after a confident recognition:
     • Disambiguation ("We have two Bills; which one do you mean?")
     • Making use of previous information
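The per-step checks listed on this slide can be sketched as a small decision function. This is a minimal illustration, not the DialogModules API: the `RecognitionResult` structure, the action names, and both confidence thresholds are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; a deployed system would tune these per recognition context.
HIGH_CONFIDENCE = 0.80
LOW_CONFIDENCE = 0.40


@dataclass
class RecognitionResult:
    """Illustrative recognizer output (not a real SpeechWorks structure)."""
    text: Optional[str] = None    # None when no speech was detected
    confidence: float = 0.0       # recognizer confidence score in [0, 1]
    dtmf: Optional[str] = None    # touch-tone digits, if the caller keyed them


def handle_step(result: RecognitionResult) -> tuple[str, Optional[str]]:
    """Map one recognition result to a hypothetical next dialog action."""
    if result.dtmf is not None:
        # Touch-tone input is unambiguous, so accept it directly.
        return ("accept", result.dtmf)
    if result.text is None:
        # Silence / no speech detected: play a "no input" reprompt.
        return ("reprompt_silence", None)
    if result.confidence < LOW_CONFIDENCE:
        # Utterance not understood: reject and reprompt.
        return ("reprompt_reject", None)
    if result.confidence < HIGH_CONFIDENCE:
        # Plausible but uncertain: ask the caller to confirm.
        return ("confirm", result.text)
    # High confidence: accept without confirmation.
    return ("accept", result.text)


if __name__ == "__main__":
    print(handle_step(RecognitionResult(text="checking", confidence=0.65)))
```

Packaging these branches once, rather than re-implementing them at every prompt, is the motivation the slide gives for reusable dialog components.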

  8. SADL: Speech Application Development Lifecycle

  9. Specification
     • Preliminary UI Design
     • Requirements Analysis
     • Project Plan
     • High Level System Design

  10. Development
     • Vocabularies ("Recognition Contexts")
     • User Interface ("Call Flow")
     • Database / Transaction Interface
     • Telephony / CTI Interface
     • Platform / Operations Interface
     • System Integration
     • Accuracy Testing, Functional Testing, Usability Testing

  11. Deployment
     • Performance & Operability Test
     • Pilot User Test, then UI & Context Tuning
     • Partial Deploy, then UI & Context Tuning
     • Full Deploy
     • Monitor & Tune: Pronunciations, Grammar / Vocabulary, Language Model, Acoustic Model

  12. Reporting and Analysis Tools
     • Logged Data: VRU, Recognition Events, DialogModule Events, Waveforms
     • Tuning tools and Call Playback
     • DM Success Summary, Performance Summary, Call Flow Summary
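A "DM Success Summary" of the kind listed here can be sketched by aggregating logged DialogModule events. The log format (`key=value` pairs per line), the module names, and the outcome labels below are all hypothetical; the point is only the aggregation step that turns raw events into a tuning report.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical log format: one DialogModule event per line as key=value pairs.
SAMPLE_LOG = """\
call=1 dm=GetAccountNumber outcome=success
call=1 dm=ConfirmTransfer outcome=success
call=2 dm=GetAccountNumber outcome=failure
call=3 dm=GetAccountNumber outcome=success
call=3 dm=ConfirmTransfer outcome=hangup
"""


def parse_event(line: str) -> dict[str, str]:
    """Split 'key=value key=value ...' into a dict."""
    return dict(field.split("=", 1) for field in line.split())


def dm_success_summary(log_text: str) -> dict[str, tuple[int, int]]:
    """Return {dialog module: (successes, attempts)} from the event log."""
    attempts = Counter()
    successes = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        event = parse_event(line)
        attempts[event["dm"]] += 1
        if event["outcome"] == "success":
            successes[event["dm"]] += 1
    return {dm: (successes[dm], n) for dm, n in attempts.items()}


def report(summary: dict[str, tuple[int, int]]) -> list[str]:
    """One line per module, lowest success rate first (likely tuning targets)."""
    rows = sorted(summary.items(), key=lambda item: item[1][0] / item[1][1])
    return [f"{dm}: {ok}/{n} ({100 * ok / n:.0f}%)" for dm, (ok, n) in rows]


if __name__ == "__main__":
    for row in report(dm_success_summary(SAMPLE_LOG)):
        print(row)
```

Sorting worst-first is a design choice: it points tuning effort at the modules where callers most often fail.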

  13. Using Web Infrastructure
     • Web model of application development
        • Applications on web servers
        • Markup (VoiceXML) for controlling telephony resources
     • Shares infrastructure with the Web
     • Need content and application management tools for multi-modal interfaces (Web, WAP, Speech)
        • Share application logic
        • Different user interfaces
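In the web model, an application server returns VoiceXML markup that the telephony platform interprets, much as a browser renders HTML. The sketch below generates a one-field form using VoiceXML 2.0-style elements (`form`, `field`, `prompt`, `option`, `filled`, `submit`); the version available in 2001 may differ in detail, and the form id, field name, and submit URL are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_field_form(form_id: str, field_name: str, prompt_text: str,
                     options: list[str], submit_url: str) -> str:
    """Render a one-field VoiceXML form that posts the caller's answer back."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": form_id})
    field = ET.SubElement(form, "field", {"name": field_name})
    ET.SubElement(field, "prompt").text = prompt_text
    for option in options:
        # Each <option> adds one spoken choice to the field's grammar.
        ET.SubElement(field, "option").text = option
    filled = ET.SubElement(field, "filled")
    # Once the field is filled, submit it to the (hypothetical) application server.
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": field_name})
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_field_form("main", "city", "Which city?", ["Boston", "Montreal"],
                           "http://example.com/app"))
```

Because the page is ordinary server-generated markup, the same application logic can also emit HTML or WAP pages, which is the sharing the slide describes.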

  14. Web-based Deployment Model
     • Public Telephone Network
     • Telephony Platform (VoiceXML client)
     • Internet
     • Application Servers (VoiceXML servers)

  15. Core Technologies: 2001

  16. ASR
     • Very Large Vocabularies
     • Support for VoiceXML/Portals/ASPs
     • Robustness
     • Accuracy

  17. Very Large Vocabulary
     • Based on FST (finite-state transducer) technology from AT&T
     • Vocabulary sizes >1 million words
        • Current practical limit 50–100K words
     • Less computation and memory
     • Enables a new generation of functionality
     • Demo
        • Accessing directory-based information
        • Dialog technologies used to disambiguate answers from a database
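The demo's dialog side (disambiguating answers from a database after the name is recognized) can be sketched as follows. This does not model the FST-based recognizer; the `Entry` record, the directory contents, and the action names are hypothetical, and asking for the department is just one possible follow-up question.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One row of a hypothetical company directory."""
    name: str
    department: str
    extension: str


def resolve(directory: list[Entry], name: str, department: str | None = None):
    """Decide the next dialog step after a (confident) name recognition."""
    matches = [e for e in directory if e.name.lower() == name.lower()]
    if department is not None:
        # The caller answered a follow-up question; narrow the candidates.
        matches = [e for e in matches if e.department.lower() == department.lower()]
    if not matches:
        return ("not_found", [])
    if len(matches) == 1:
        return ("connect", [matches[0].extension])
    departments = sorted({e.department for e in matches})
    if len(departments) > 1:
        # e.g. "We have two Bill Smiths: Engineering or Sales?"
        return ("ask_department", departments)
    # Same name in the same department: this question cannot separate them.
    return ("transfer_to_operator", [])


if __name__ == "__main__":
    directory = [Entry("Bill Smith", "Sales", "4101"),
                 Entry("Bill Smith", "Engineering", "4230")]
    print(resolve(directory, "bill smith"))
```

The larger the vocabulary, the more often a confident recognition still maps to several database rows, which is why the slide pairs very large vocabularies with dialog-based disambiguation.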

  18. Support for VoiceXML/Portals/ASPs
     • Fully dynamic grammars
     • Parallel grammars
     • Dramatic memory footprint reduction
     • JavaScript in grammars
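"Fully dynamic" and "parallel" grammars can be illustrated with a toy model: one grammar is built at call time from per-caller data, and an utterance is checked against every active grammar at once, with the result tagged by the grammar that matched. Exact phrase lookup stands in for real recognition here, and the grammar names, phrases, and ticker values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so phrases compare reliably."""
    return " ".join(text.lower().split())


@dataclass
class Grammar:
    """Toy grammar: a set of phrases, each mapped to a semantic value."""
    name: str
    phrases: dict[str, str] = field(default_factory=dict)

    def add(self, phrase: str, value: str) -> None:
        self.phrases[normalize(phrase)] = value


def build_dynamic_grammar(name: str, entries: dict[str, str]) -> Grammar:
    """Build a grammar at call time, e.g. from a caller's own portfolio."""
    grammar = Grammar(name)
    for phrase, value in entries.items():
        grammar.add(phrase, value)
    return grammar


def match_parallel(utterance: str, grammars: list[Grammar]) -> list[tuple[str, str]]:
    """Check the utterance against every active grammar and report all hits."""
    key = normalize(utterance)
    return [(g.name, g.phrases[key]) for g in grammars if key in g.phrases]


if __name__ == "__main__":
    commands = Grammar("commands")
    commands.add("main menu", "MENU")
    quotes = build_dynamic_grammar("quotes", {"Globex": "GLBX"})
    print(match_parallel("globex", [commands, quotes]))
```

Tagging each hit with its grammar name lets the application tell a global command apart from in-context data without merging the two grammars.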

  19. Robustness
     • Overall performance of ASR is very high
        • But various situations can result in reduced performance
        • Identify and improve significant cases
     • Wireless environment becoming more important
        • Significant performance improvements for wireless
        • Wide variety of wireless conditions: environment, coding, network

  20. Natural Language
     • Tools for easier NL application development
        • Grammar import/export
        • Wildcards in grammars
        • NL grammars/actions in parallel with DialogModules
     • High-level Application Framework
        • Reusable higher-level dialog components
        • Constructs for commonly used dialogs
        • Customizable for a particular application
     • Common User Interface Constructs
     • "How May I Help You" (HMIHY) technology

  21. "How May I Help You?"
     • What is "How May I Help You?"?
        • Automated handling of highly unconstrained customer input via interactive dialogue
        • More flexible than today's NL grammars
        • Leverages years of AT&T call center experience
     • Enables a new class of speech applications
        • Call routing
        • Help desk
        • Customer care

  22. How May I Help You? STEP 1: Training
     Example utterances and their call types:
     • "I was trying to call my sister in Italy but I got a wrong number" → CREDIT
     • "How do I dial direct to Tokyo?" → DIAL HELP
     • "I need to make a long distance call and charge to my home number" → CHARGE
     Learning: salient fragments, conceptual relevance

  23. How May I Help You? STEP 2: Classification
     • Input: "I need to make a long distance call and charge to my home number"
     • Fragment scores shown on the slide: 5.7, 0.5, 0.5, 0.5, 0.1, 0.4
     • Candidate call types: CREDIT, DIAL HELP, CHARGE
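Steps 1 and 2 (learn salient fragments from labeled calls, then classify new utterances by the fragments they contain) can be sketched with a toy model. This is not AT&T's or SpeechWorks' algorithm. It assumes word n-grams as "fragments", scores each fragment's salience as the KL divergence of P(call type | fragment) from the prior, and uses only the three example utterances and labels as read from the slides. The `min_salience` cutoff is a hypothetical parameter.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict

# The three example utterances and call types as read from the slides.
TRAINING = [
    ("I was trying to call my sister in Italy but I got a wrong number", "CREDIT"),
    ("How do I dial direct to Tokyo?", "DIAL HELP"),
    ("I need to make a long distance call and charge to my home number", "CHARGE"),
]


def fragments(text: str, max_n: int = 2) -> set[str]:
    """All word n-grams up to max_n, used here as candidate 'fragments'."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def train(examples):
    """Return {fragment: (salience, P(call type | fragment))}."""
    prior = Counter(label for _, label in examples)
    total = sum(prior.values())
    counts = defaultdict(Counter)
    for text, label in examples:
        for frag in fragments(text):
            counts[frag][label] += 1
    model = {}
    for frag, by_label in counts.items():
        n = sum(by_label.values())
        posterior = {c: k / n for c, k in by_label.items()}
        # Salience: KL divergence of P(c | fragment) from the prior P(c).
        salience = sum(p * math.log(p * total / prior[c]) for c, p in posterior.items())
        model[frag] = (salience, posterior)
    return model


def classify(model, text: str, min_salience: float = 0.5):
    """Score call types by the salient fragments found in the text."""
    scores = Counter()
    for frag in fragments(text):
        if frag in model and model[frag][0] >= min_salience:
            salience, posterior = model[frag]
            for label, p in posterior.items():
                scores[label] += salience * p
    return scores.most_common(1)[0][0] if scores else None


if __name__ == "__main__":
    model = train(TRAINING)
    print(classify(model, "please charge this call to my home number"))
```

Fragments shared evenly across call types (such as "to" or "i") get near-zero salience and are ignored, so classification rests on words like "charge" or "dial" that actually separate the classes.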

  24. How May I Help You? STEP 3: Dialog (disambiguation/completion questions)
     • Input: "I need to make a long distance call and charge to my home number" → CHARGE
     • CHARGE needs: # to call, plus billing either to home number (home #) or to credit card (credit card #)
     • Questions: "Which number do you want to call?" "Your home phone number?"
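Step 3 can be sketched as slot filling: once the call type is known, the dialog asks only for the information still missing. The slot names are hypothetical. The questions for the number to call and the home number come from the slide, while the billing and credit-card wordings are invented placeholders.

```python
from __future__ import annotations

QUESTIONS = {
    "number_to_call": "Which number do you want to call?",   # from the slide
    "billing": "Should I charge it to your home number or a credit card?",  # hypothetical
    "home_number": "Your home phone number?",                  # from the slide
    "credit_card_number": "Your credit card number?",          # hypothetical
}


def required_slots(filled: dict[str, str]) -> list[str]:
    """Slots a CHARGE request needs; the last one depends on the billing choice."""
    slots = ["number_to_call", "billing"]
    if filled.get("billing") == "home":
        slots.append("home_number")
    elif filled.get("billing") == "credit_card":
        slots.append("credit_card_number")
    return slots


def next_question(filled: dict[str, str]) -> str | None:
    """Return the next completion question, or None once the request is complete."""
    for slot in required_slots(filled):
        if slot not in filled:
            return QUESTIONS[slot]
    return None


if __name__ == "__main__":
    # "...charge to my home number" already supplies the billing choice.
    print(next_question({"billing": "home"}))
```

Because the caller's first sentence already fixed the billing method, the dialog skips that question and goes straight to the two asked on the slide.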

  25. TTS
     • TTS becoming more important
        • Dynamic information
        • Voice portals
        • Large vocabulary tasks
     • Quality now acceptable

  26. Comparing TTS Systems
     • Lucent
     • AcuVoice
     • Festival
     • L&H RealSpeak
     • Speechify female
     • Speechify male

  27. TTS
     • Product development
        • Increased densities
        • More platforms
     • New voices
        • SpeechWorks standard voices
        • Custom voice development
     • Application-specific improvements
        • Increased quality for application
        • Mix of TTS and recordings
     • New languages
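Mixing TTS and recordings can be sketched as prompt planning: fixed carrier phrases are played from recorded audio, and only the variable parts are synthesized. The recording table and file names are hypothetical. Merging adjacent TTS pieces is a design choice so the synthesizer can apply phrase-level prosody rather than speaking isolated fragments.

```python
from __future__ import annotations

# Hypothetical mapping from fixed prompt text to recorded audio files.
RECORDINGS = {
    "your balance is": "balance_is.wav",
    "dollars": "dollars.wav",
}


def plan_prompt(segments: list[str]) -> list[tuple[str, str]]:
    """Use a recording where one exists; synthesize everything else with TTS.

    Adjacent TTS segments are merged so the synthesizer receives a whole
    phrase instead of separate fragments.
    """
    plan: list[tuple[str, str]] = []
    for text in segments:
        recording = RECORDINGS.get(" ".join(text.lower().split()))
        if recording is not None:
            plan.append(("play", recording))
        elif plan and plan[-1][0] == "tts":
            plan[-1] = ("tts", plan[-1][1] + " " + text)
        else:
            plan.append(("tts", text))
    return plan


if __name__ == "__main__":
    print(plan_prompt(["Your balance is", "1,024", "dollars"]))
```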

  28. Core Technologies: The Next Four Years

  29. Hardware Density and Cost

  30. Hardware Density and Cost

  31. Product Accuracy

  32. Product Accuracy

  33. Speech Technology
     • Continued gains on raw technology (30–50% error rate reductions per year)
     • Supports more and more difficult tasks
     • Supports richer user interfaces
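To see what "30–50% error rate reductions per year" implies over the four-year horizon in this section, assume (as an interpretation, not a claim from the slides) that each year's reduction is relative to the previous year's error rate $E$, so the reductions compound with annual rate $r$:

```latex
\begin{aligned}
E_n &= E_0\,(1 - r)^n \\
r = 0.3:\quad E_4 &= 0.7^4\,E_0 = 0.2401\,E_0 \\
r = 0.5:\quad E_4 &= 0.5^4\,E_0 = 0.0625\,E_0
\end{aligned}
```

Under that reading, a hypothetical 10% starting error rate would fall to roughly 0.6–2.4% after four years if the trend held.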

  34. Natural Language
     • Drive NL tools and capabilities by user interface
     • Current state-of-the-art user interface is directed dialog + NL shortcuts + personalization
     • As users gain experience, evolve the user interface
        • SpeechWorks evolving tools to match
     • SpeechWorks participating in the DARPA Speech Program

  35. TTS
     • Continued quality increases
        • With application-specific tuning, should approach human quality
     • Hardware costs fall significantly
     • Reduced cost of custom voices
     • Broad language support

  36. SpeechWeb: The Next Year
     • High-performance VoiceXML platforms and applications
     • Open Source Browser/SpeechLinks
     • Server-side tools for easier application development

  37. SpeechWeb: Ongoing Development
     • Standardization is essential!
     • Open Source efforts
     • Critical mass of applications
     • Critical mass of users
     • Tools for combined Web/WAP/Speech development

  38. Networks: The Next Year
     • Voice over IP (VoIP)
        • Working with platform providers
        • Optimizing for this environment
     • Cellular
        • Main focus of ASR robustness work
        • Obtaining performance similar to landline in reasonable conditions
        • Increased mobile use
        • Environmental noise
        • Hands-free

  39. Changing the Network
     • Public Telephone Network
     • IVR Platform
     • Enterprise Server(s): SpeechWorks™ Recognition Engine, DialogModules™ Application Building Blocks, Database Connectivity

  40. Changing the Network
     • VoIP (TCP/IP)
     • IVR Platform
     • Enterprise Server(s): SpeechWorks™ Recognition Engine, DialogModules™ Application Building Blocks, Database Connectivity

  41. Changing the Network
     • VoIP (TCP/IP)
     • IVR Platform
     • Enterprise Server(s): SpeechWorks™ Recognition Engine, DialogModules™ Application Building Blocks, Database Connectivity
     • Devices: VoIP phones, PDAs, wireless, etc.

  42. Networks: Next Generations
     • VoIP
        • Network-based
        • Premise-based
     • DSR: Distributed Speech Recognition
        • Front-end processing in handset/mobile device/gateway
        • Better performance at reduced bandwidth
        • Wide range of devices
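In DSR, the device runs the recognizer's front end and sends compact per-frame features instead of audio. The sketch below assumes 8 kHz input, a 25 ms window with a 10 ms hop (common choices, not taken from the slides), and log energy as the only feature, where a real front end would send a cepstral vector per frame. The 48 bits per frame used in the example is a hypothetical figure. The 64 kbit/s baseline is standard 8-bit, 8 kHz PCM.

```python
from __future__ import annotations

import math

SAMPLE_RATE = 8000              # telephone-band audio, samples per second
FRAME_LEN = 200                 # 25 ms analysis window at 8 kHz
FRAME_SHIFT = 80                # 10 ms hop -> 100 frames per second
PCM_BITRATE = SAMPLE_RATE * 8   # 8-bit PCM (G.711-style): 64,000 bit/s


def frames(samples: list[float]):
    """Yield overlapping analysis windows over the signal."""
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_SHIFT):
        yield samples[start:start + FRAME_LEN]


def log_energy(frame: list[float]) -> float:
    """Log frame energy; the small floor avoids log(0) on digital silence."""
    return math.log(sum(x * x for x in frame) + 1e-10)


def front_end(samples: list[float]) -> list[float]:
    """Device-side processing: one compact feature per 10 ms frame.

    A real DSR front end computes a cepstral vector per frame; log energy
    alone keeps this sketch short.
    """
    return [log_energy(f) for f in frames(samples)]


def feature_bitrate(bits_per_frame: int) -> float:
    """Uplink bit rate if each frame's features are quantized to bits_per_frame."""
    return bits_per_frame * SAMPLE_RATE / FRAME_SHIFT


if __name__ == "__main__":
    print(feature_bitrate(48), "bit/s of features vs", PCM_BITRATE, "bit/s of PCM")
```

Sending features instead of speech also avoids running recognition on audio that has already been degraded by a lossy cellular codec, which is one reason DSR is framed here as better performance at reduced bandwidth.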

  43. Mobile Devices
     • Increasing use of mobile devices
        • With high-quality displays
        • With wireless networking
        • Too small for a keyboard
     • Speech + pointing in
     • Speech + display out

  44. Summary
     • Speech will play an increasing role as the UI of choice
        • Especially in mobile environments
     • Advances to make this possible include:
        • Continued progress on core technology
        • Evolution of speech user interfaces
        • Platforms designed and optimized for speech interfaces
        • Evolution of standards
