VISIONS, TECHNOLOGY, AND BUSINESS OF TALKING MACHINES

Roberto Pieraccini, CTO, Tell-Eureka Corporation, 535 West 34th Street, New York, NY 10001, +1 646 792 2744, roberto@telleureka.com, http://www.telleureka.com


Presentation Transcript


  1. VISIONS, TECHNOLOGY, AND BUSINESS OF TALKING MACHINES Roberto Pieraccini, CTO, Tell-Eureka Corporation 535 West 34th Street New York, NY 10001 +1 646 792 2744 roberto@telleureka.com http://www.telleureka.com

  2. The vision

  3. Recreating the Speech Chain [Diagram labels: INNER EAR, ACOUSTIC NERVE, VOCAL-TRACT ARTICULATORS; SPEECH RECOGNITION, SPOKEN LANGUAGE UNDERSTANDING, DIALOG MANAGEMENT, SPEECH SYNTHESIS; PHONETICS, MORPHOLOGY, LEXICON, SYNTAX, SEMANTICS, DIALOG]

  4. The technology

  5. Talking Machines: First Steps into Spoken Language Technology • Von Kempelen (1791) • Joseph Faber (1835) • Homer Dudley, Bell Labs (1939)

  6. Speech Recognition: the Early Years • 1952 – Automatic Digit Recognition (AUDREY) • Davis, Biddulph, Balashek (Bell Laboratories)

  7. 1960’s – Speech Processing and Digital Computers • AD/DA converters and digital computers start appearing in the labs • James Flanagan, Bell Laboratories

  8. The Illusion of Segmentation... or... Why Speech Recognition is so Difficult [Figure: a spoken telephone number shown at several levels: phonetic symbols, words (SEVEN, THREE, ZERO, FOUR, TWO, NINE, SEVEN, IS, MY, NUMBER), syntactic labels (NP, NP, VP), and the semantic representation (user:Roberto (attribute:telephone-num value:7360474))]

  9. The Illusion of Segmentation... or... Why Speech Recognition is so Difficult • Ellipses and anaphors • Limited vocabulary • Multiple interpretations • Speaker dependency • Word variations • Word confusability • Context-dependency • Coarticulation • Noise/reverberation • Intra-speaker variability [Figure: the same multi-level view of the utterance as on slide 8, ending in (user:Roberto (attribute:telephone-num value:7360474)), with "rules" and "errors" marked at each of the four levels]

  10. 1969 – Whither Speech Recognition? J. R. Pierce, Executive Director, Bell Laboratories • “[…] General purpose speech recognition seems far away. Special-purpose speech recognition is severely limited. It would seem appropriate for people to ask themselves why they are working in the field and what they can expect to accomplish. […] It would be too simple to say that work in speech recognition is carried out simply because one can get money for it. That is a necessary but not sufficient condition. We are safe in asserting that speech recognition is attractive to money. The attraction is perhaps similar to the attraction of schemes for turning water into gasoline, extracting gold from the sea, curing cancer, or going to the moon. One doesn’t attract thoughtlessly given dollars by means of schemes for cutting the cost of soap by 10%. To sell suckers, one uses deceit and offers glamour. […] Most recognizers behave, not like scientists, but like mad inventors or untrustworthy engineers. The typical recognizer gets it into his head that he can solve “the problem.” The basis for this is either individual inspiration (the “mad inventor” source of knowledge) or acceptance of untested rules, schemes, or information (the untrustworthy engineer approach).” • The Journal of the Acoustical Society of America, June 1969

  11. 1971-1976: The ARPA SUR project • In spite of the anti-speech-recognition campaign headed by the Pierce Commission, ARPA launches a 5-year program on Speech Understanding Research • REQUIREMENTS: 1000-word vocabulary, 90% understanding rate, near real time on a 100 MIPS machine • 4 systems built by the end of the program: SDC (24%), BBN’s HWIM (44%), CMU’s Hearsay II (74%), CMU’s HARPY (95% -- 80 times real time!) • HARPY was based on an engineering approach: search on a network representing all the possible utterances • Lack of a scientific evaluation approach • Speech Understanding: too early for its time • The project was not extended • LESSONS LEARNED: hand-built knowledge does not scale up; need for a global “optimization” criterion • Raj Reddy -- CMU

  12. Vintage Speech Recognition

  13. 1970’s – Dynamic Time Warping: The Brute Force of the Engineering Approach • T.K. Vyntsyuk (1969) • H. Sakoe, S. Chiba (1970) • Isolated Words • Speaker Dependent • Connected Words • Speaker Independent • Sub-Word Units [Figure: TEMPLATE (WORD 7) aligned against UNKNOWN WORD]
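
Note on slide 13: the template-versus-unknown-word alignment can be sketched roughly as below. This is a minimal illustration, assuming one-dimensional toy feature sequences and an absolute-difference local distance; recognizers of the period compared spectral feature vectors instead. The names `dtw_distance` and `recognize` and the toy templates are hypothetical, not taken from the presentation.

```python
def dtw_distance(template, unknown, dist=lambda a, b: abs(a - b)):
    """Cost of the cheapest time alignment between two feature sequences."""
    n, m = len(template), len(unknown)
    inf = float("inf")
    # cost[i][j] = best cost of aligning template[:i] with unknown[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(template[i - 1], unknown[j - 1])
            # Allowed steps: stretch the template, stretch the unknown, or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(templates, unknown):
    """Return the label of the stored template closest to the unknown word."""
    return min(templates, key=lambda word: dtw_distance(templates[word], unknown))


# Toy templates (made-up numbers standing in for acoustic features).
templates = {"seven": [1, 2, 3], "nine": [3, 2, 1]}
print(recognize(templates, [1, 1, 2, 3, 3]))  # prints "seven"
```

Every stored template is compared against the unknown word, so the cost grows with vocabulary size, which is one way to read "brute force" in the slide title.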

  14. 1980s -- The Statistical Approach • Based on work on Hidden Markov Models done by Leonard Baum at IDA, Princeton in the late 1960s • Purely statistical approach pursued by Fred Jelinek and Jim Baker at IBM T.J. Watson Research • Foundations of modern speech recognition engines • Acoustic HMMs and Word Tri-grams [Figure: three-state HMM with states S1, S2, S3 and transition probabilities a11, a12, a22, a23, a33; photos of Fred Jelinek and Jim Baker] • “No Data Like More Data” • “Whenever I fire a linguist, our system performance improves” (1988) • “Some of my best friends are linguists” (2004)
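
Note on slide 14: the figure labels (S1, S2, S3; a11, a12, a22, a23, a33) suggest a three-state left-to-right HMM. The sketch below scores an observation sequence against such a model with the forward algorithm. It assumes discrete observation symbols, and every probability in it is a made-up illustration value; `forward_likelihood` is a hypothetical name, not something from the presentation.

```python
def forward_likelihood(trans, emit, init, observations):
    """P(observations | model) via the forward algorithm for a discrete HMM.

    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    init[s]     : probability of starting in state s
    """
    n = len(init)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [init[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs]
            for s in range(n)
        ]
    return sum(alpha)


# Left-to-right topology: only a11, a12, a22, a23, a33 are non-zero.
trans = [
    [0.5, 0.5, 0.0],  # a11, a12
    [0.0, 0.5, 0.5],  # a22, a23
    [0.0, 0.0, 1.0],  # a33
]
# Deterministic toy emissions keep the arithmetic easy to follow.
emit = [
    {"a": 1.0, "b": 0.0},
    {"a": 0.0, "b": 1.0},
    {"a": 1.0, "b": 0.0},
]
init = [1.0, 0.0, 0.0]  # always start in S1

print(forward_likelihood(trans, emit, init, ["a", "b", "a"]))  # prints 0.25
```

In a recognizer built this way, each word or sub-word unit would have its own model, and the unit whose model gives the highest likelihood (weighted by the language model) would be chosen.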

  15. 1980-1990 – The statistical approach becomes ubiquitous • Lawrence Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, Vol. 77, No. 2, February 1989.

  16. 1980s-1990s – The Power of Evaluation • Pros and Cons of DARPA programs • + Continuous incremental improvement • - Loss of “bio-diversity” [Figure: timeline 1995-2005 of the SPOKEN DIALOG INDUSTRY; labels: MIT, SRI, SPEECHWORKS, NUANCE, TECHNOLOGY VENDORS, PLATFORM INTEGRATORS, APPLICATION DEVELOPERS, TOOLS, HOSTING, STANDARDS]

  17. The business of speech

  18. Voice User Interface (VUI) Design: the Quantum Leap in Dialog Systems • 1995 -- The WildFire Effect • Change of perspective: from technology driven to user centered • Research: free-form natural language • Commercial: task completion and usability • Persona: the personality of the application (TTS vs. recording) • Speech recognition accuracy is important, but success is determined by the VUI • The importance of a repeatable, streamlined, teachable development process

  19. The Speech Application Lifecycle [Diagram: ten numbered steps covering requirements, high level system design, VUI design, VUI development, system engineering, integration, usability, partial deployment, speech science, and full deployment; roles shown: Analyst, Project Manager, VUI Designer, Architect, App Developer, Engineer, Speech Scientist]

  20. Voice User Interface Design [Flowchart: Enter Transfer; Get Origin Account (origin account); Get Destination Account (destination account); Get Amount (amount); amount > origin account? YES: Play Wrong Amount Message; NO: Play Confirmation; confirmed? YES: Go to Main Menu; NO: What is wrong?]
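
Note on slide 20: the transfer call flow can be sketched as a small slot-filling loop. The transcript does not preserve the arrows, so two things here are assumptions: after "Play Wrong Amount Message" the dialog asks for the amount again, and the caller's answer to "What is wrong?" names the slot to collect again. The test "amount > origin account?" is read as a comparison against the origin account's balance. `run_transfer` and the simulated replies are hypothetical.

```python
PROMPTS = {
    "origin": "Get Origin Account",
    "destination": "Get Destination Account",
    "amount": "Get Amount",
}


def run_transfer(replies, balances):
    """Walk the transfer dialog using a scripted list of caller replies.

    Returns the filled slots and the sequence of dialog steps visited.
    """
    replies = iter(replies)
    log = []    # dialog steps, named as on the slide
    slots = {}

    def ask(prompt):
        log.append(prompt)
        return next(replies)

    def fill(slot):
        # Toy code: an unknown slot name raises KeyError.
        value = ask(PROMPTS[slot])
        slots[slot] = float(value) if slot == "amount" else value

    for slot in ("origin", "destination", "amount"):
        fill(slot)

    while True:
        if slots["amount"] > balances[slots["origin"]]:
            log.append("Play Wrong Amount Message")
            fill("amount")                 # assumed: go back to Get Amount
        elif ask("Play Confirmation") == "yes":
            log.append("Go to Main Menu")
            return slots, log
        else:
            fill(ask("What is wrong?"))    # assumed: re-collect the named slot


slots, log = run_transfer(
    ["checking", "savings", "900", "200", "yes"],
    {"checking": 500.0, "savings": 100.0},
)
print(slots)
print(log)
```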

  21. Speech Science: Tuning for performance [Diagram: decision tree classifying each recognition result. Branches: In Vocabulary (correctly recognized or misrecognized) and Out of Vocabulary; actions: Accept, Confirm, Reject. Outcomes: Correct acceptance, Correct confirmation, False acceptance - in, False confirmation, False rejection, Correct rejection, False acceptance - out]

  22. Speech Science: Tuning for performance [Table header: DM ACTION] • Utt# = number of utterances • Sub-err% = percent of in-voc utterances wrongly recognized • Fa-err% = percent of utterances wrongly accepted • Fr-err% = percent of utterances wrongly rejected • Rej% = total percent of all utterances rejected • OOV% = percent of out-voc utterances • Fa-oov% = percent of out-voc utterances wrongly accepted • Prioritize grammars that need improvement • Use transcriptions to improve grammars
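
Note on slide 22: the tuning figures could be computed from transcribed utterances roughly as follows. The slide does not spell out every denominator, so this sketch assumes Sub-err% is measured over in-vocabulary utterances, Fa-oov% over out-of-vocabulary utterances, and the remaining percentages over all utterances. The `Utterance` record and `tuning_report` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    in_vocab: bool   # transcription is covered by the grammar
    correct: bool    # recognizer output matches the transcription (False if out-of-vocab)
    accepted: bool   # the result was accepted (True) or rejected (False)


def pct(part, whole):
    """Percentage, returning 0.0 for an empty denominator."""
    return 100.0 * part / whole if whole else 0.0


def tuning_report(utts):
    n = len(utts)
    in_voc = [u for u in utts if u.in_vocab]
    oov = [u for u in utts if not u.in_vocab]
    return {
        "Utt#": n,
        # in-voc utterances wrongly recognized (assumed denominator: in-voc)
        "Sub-err%": pct(sum(not u.correct for u in in_voc), len(in_voc)),
        # accepted although the result was wrong or the input was out-of-vocab
        "Fa-err%": pct(sum(u.accepted and not (u.in_vocab and u.correct) for u in utts), n),
        # rejected although the result was right
        "Fr-err%": pct(sum(u.in_vocab and u.correct and not u.accepted for u in utts), n),
        "Rej%": pct(sum(not u.accepted for u in utts), n),
        "OOV%": pct(len(oov), n),
        # out-of-vocab utterances wrongly accepted (assumed denominator: out-of-vocab)
        "Fa-oov%": pct(sum(u.accepted for u in oov), len(oov)),
    }


# Made-up sample: 8 in-vocabulary and 2 out-of-vocabulary utterances.
sample = (
    [Utterance(True, True, True)] * 5
    + [Utterance(True, True, False)]
    + [Utterance(True, False, True)]
    + [Utterance(True, False, False)]
    + [Utterance(False, False, True)]
    + [Utterance(False, False, False)]
)
for name, value in tuning_report(sample).items():
    print(name, value)
```

Running such a report per grammar is one way to act on the slide's advice to prioritize the grammars that need improvement.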

  23. The Architectural Evolution of Spoken Dialog [Timeline: 1994, 1998, 2000, 2005; stages shown: Native Code, Proprietary IVR Systems, Standard Clients (VoiceXML), Standard Application servers]

  24. The Voice Web [Diagram components: Telephone, Telephony Platform, Voice Browser, ASR, TTS, Internet, Web Server; standards shown: CCXML, VoiceXML/SALT, MRCP, SSML, SRGF, SCXML?, EMMA?]

  25. The Evolution of the Interface and the Research-Industry Chasm [Figure: timeline 1994-2006 spanning "Spoken dialog as a tool" to "Spoken dialog as an anthropomorphic system"; labels: Small Vocabulary, Menu Based; Directed Dialog; Large Vocabulary, Dialog Modules; SLU: Statistical Language Understanding; Natural Language Research Systems a-la DARPA Communicator]

  26. The evolution of the market and the industry • 600 to 1,000M$ revenue • > 8000 apps worldwide • HOSTING • APPLICATION DEVELOPERS: PROFESSIONAL SERVICES • TOOLS: AUTHORING, TUNING, PREPACKAGED APPLICATIONS • PLATFORM INTEGRATORS: IVR, VoiceXML, CTI,… • TECHNOLOGY VENDORS: SPEECH RECOGNITION, TTS • New evolving standards guarantee interoperability of engines and platforms.

  27. Third generation dialog systems • 1st Generation, INFORMATIONAL (LOW complexity): PACKAGE TRACKING, FLIGHT STATUS • 2nd Generation, TRANSACTIONAL (MEDIUM complexity): BANKING, STOCK TRADING, FLIGHT/TRAIN RESERVATION • 3rd Generation, PROBLEM SOLVING (HIGH complexity): CUSTOMER CARE, TECHNICAL SUPPORT

  28. 2005 -- Spoken Dialog goes to Saturday Night Live
