Introduction and overview

Outline • A short history of the field • Speech synthesis (TTS) • Automatic speech recognition (ASR) • Dialog system architectures • Voice on the Web (perhaps show the Siri video) • Voice on the Web and W3C Standards • Relation to linguistic theory • A brief look at the course plan • What this course is not about • What this course could mean to you • Introduction to lab assignments and platforms • Designing and developing spoken dialog systems • Present project • Give home assignment 1: Call flow design and evaluation • Present lab assignment 1

A short history of the field • 1966, Joseph Weizenbaum, Eliza • Sundial, ATIS, Verbmobil • AIML NLP system • VXML ("Voice XML"), a dialog markup language (primarily for telephony) developed initially by AT&T, then administered by an industry consortium, and finally a W3C specification. • Voxeo, Tropo

Speech synthesis • Input: text • Output: speech

Speech recognition • Input: speech • Output: text (or some semantic representation)

Dialog management • Finite-state-based dialog management • Frame-based (form-based) dialog management • Information-state-based dialog management • Plan-based dialog management
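
As an illustration of the first approach in the list, here is a minimal sketch of a finite-state dialog manager. The states, prompts, and slot names (`ask_origin`, `origin`, and so on) are hypothetical and chosen only for this example; a real system would also need speech recognition, error handling, and confirmation loops.

```python
# Minimal finite-state dialog sketch. All state names, prompts and slots
# are illustrative assumptions, not part of any standard or platform.
STATES = {
    "ask_origin": {
        "prompt": "Where are you leaving from?",
        "slot": "origin",
        "next": "ask_destination",
    },
    "ask_destination": {
        "prompt": "Where are you going to?",
        "slot": "destination",
        "next": "confirm",
    },
    "confirm": {
        "prompt": "Booking a trip from {origin} to {destination}.",
        "slot": None,   # final state: nothing to collect
        "next": None,   # no successor, so the dialog ends here
    },
}


def run_dialog(user_turns):
    """Walk the state graph, consuming one user turn per slot-filling state."""
    state = "ask_origin"
    slots = {}
    transcript = []
    turns = iter(user_turns)
    while state is not None:
        node = STATES[state]
        # The system always speaks first in each state (system initiative).
        transcript.append(node["prompt"].format(**slots))
        if node["slot"] is not None:
            slots[node["slot"]] = next(turns)
        state = node["next"]
    return slots, transcript


slots, transcript = run_dialog(["Stockholm", "Gothenburg"])
for line in transcript:
    print(line)
```

The fixed order of states is what makes this design easy to build and test, and also what makes it rigid: the user cannot give both cities in one utterance, which is the limitation frame-based approaches address.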

Spoken dialogue system

Why voice • Wireless devices have small screens and limited input capabilities. • A telephone keypad can give users only a limited number of choices. • Speech technology is improving. • The exchange of information between a person and a computer is becoming more like a real conversation. • Users want hands-free or eyes-free use. • From a business viewpoint, voice applications open up a host of new revenue opportunities. • There exist many more telephones than computers with the potential to access the Internet.

Traditional Interactive Voice Response (IVR)

Speech versus Touch Tone

Applications • Information providing systems: • weather reports • stock quotes • timetables • Transaction-based systems: • calendar functions • shopping • financial transactions • travel reservations

Architecture 1

Architecture 2

Components • Natural language understanding • Proper name identification • part-of-speech tagging • parser • dialog manager • output generator • natural language generator • gesture generator • layout engine • input recognizer/decoder • automatic speech recognizer • gesture recognizer • handwriting recognizer • output renderer • text-to-speech engine • talking head • robot or avatar • multi-modal fusion

Types of systems • by modality • text-based • spoken dialog system • graphical user interface • multi-modal • by device • telephone-based systems • PDA systems • in-car systems • robot systems • desktop/laptop systems • native • in-browser systems • in-virtual machine • in-virtual environment • robots • by style • command-based • menu-driven • natural language • speech graffiti • by initiative • system initiative • user initiative • mixed initiative • by application • information service • command-and-control • entertainment • education/tutorial • edutainment • reminder systems • companion systems • healthcare • eldercare • assistive/access systems

Mobile voice apps • Voice on the Web • http://www.youtube.com/watch?v=OURZpqh-35A&eurl=&feature=player_embedded

Relation to other fields • Phonetics • Phonology • Syntax • Semantics • Pragmatics • spoken language understanding • psycholinguistics • human communication • discourse analysis • human-computer interaction • computational linguistics • NL-parsing • NL-generation • language modeling • multi-modal fusion • multi-modal fission • psychology • cognitive science • affective dialog • user modeling • embodied communication

A brief look at the course plan

What this course is not about • Sophisticated dialog management • Multi-modal systems • Non-spoken dialog systems

What this course could mean to you • Will prepare you for writing a thesis in the area of dialog systems (if you so choose) • Will prepare you for work in the industry • A link to the LinkedIn page

Is this something for a linguist?

Roles in the process • Dialog designer • VoiceXML programmer • Voice talent • Grammar writer • TTS specialist • Speech recognition specialist • Quality assurance specialist • Server specialist • Manager

Who are the big players in the area? • Google • http://googleblog.blogspot.com/2010/12/can-we-talk-better-speech-technology.html • Microsoft • http://gigaom.com/2010/12/06/microsoft-claims-its-place-in-a-voice-enabled-world/ • Apple • http://www.dailyfinance.com/story/company-news/apples-siri-purchase-heats-up-the-race-toward-a-voice-activated/19458344/ • IBM • http://www.ibm.com/news/in/en/2010/08/20/a896686u56875f96.html • Nuance • http://gigaom.com/2011/01/19/nuance-releases-mobile-sdk-to-speechify-apps/ • Voxeo • AT&T

The Emergence of Speech as a Mobile Platform • Market Trends • Speech-Enabled Mobile Apps Gaining Acceptance • Voice Control in a Mission-Critical Environment • Search Engine for Audio-Visual Content • Instantaneous Language Translation • IBM's Spoken Web • What's Driving Speech as a Mobile Platform? • Mobile Devices and Peripherals • Cloud Computing • Open Technologies • Mashups and the Programmable Web • Legislation • Closing the (Mobile) Digital Divide • An Overview of Emerging SaaP Applications • Current Speech-Equipped Devices are Merely the Tip of the Iceberg • SaaP Enables New Application Interaction • Spoken Alerts • Mobile Reminders • Synthesized Speech • Email and Text Messages • Speech-to-Text for Voicemail • SaaP Enables Voice User Interfaces • Speech Recognition: The Foundation of Speech-Enabled Apps • Constrained vs. Natural Language Processing • Automated vs. Hybrid Speech Recognition • Applications for Speech Recognition • Speaker Authentication • Email and Text Messages Composition • Launch and Control Mobile Apps • Special Case: Voice Activation

Call flow and call flow diagrams

Evaluating speech and dialog technology

W3C Speech Standards Torbjörn Lager

The big picture • HTML • Web browsers • Web servers

The place of speech technology • "… speech technology itself has a very long way to go. … the most important thing may turn out to be not the speech technology itself, but the way in which speech technology connects to all the other technologies." (Tim Berners-Lee)

The big picture • HTML • HTML browser • VoiceXML • Web servers • VoiceXML browser (ASR, TTS)

The What and Why of Standards • Software standards include terminology, languages and protocols specified by committees of experts for widespread use in the software industry. Software standards have both advantages and disadvantages. • Advantages: • developers can create applications using the standard languages that are portable across a variety of platforms; • products from different vendors are able to interact with each other; • a community of experts evolves around the standard and is available to develop products and services based on the standard. • Disadvantages: • some developers feel that standards may inhibit creativity and stall the introduction of superior technology. • However, in the area of speech, vendors are enthusiastic about standards and frequently complain that standards are not developed fast enough. • Emerging speech-technology standards could give a boost to an industry hampered by proprietary software and hardware.

World Wide Web Consortium http://www.w3.org/

W3C Speech Standards • Speech Recognition Grammar Specification (SRGS): what the user can say • Semantic Interpretation for Speech Recognition (SISR): what the user means • Speech Synthesis Markup Language (SSML): what the user hears • Pronunciation Lexicon Specification (PLS): how words are pronounced
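
To make the division of labor between SRGS ("what the user can say") and SISR ("what the user means") concrete, here is a rough sketch. The grammar is a small hand-written SRGS XML document with literal semantic tags; the city names, the codes `STO` and `GOT`, and the helper functions `load_phrases` and `interpret` are assumptions made up for this example. The Python code is only a toy lookup over the grammar's items, not a conforming SRGS or SISR processor.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# A tiny SRGS grammar: the <item> text is what the user can say (SRGS),
# the <tag> content is what it means (SISR, string-literal tag format).
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="city" mode="voice"
         tag-format="semantics/1.0-literals">
  <rule id="city" scope="public">
    <one-of>
      <item>Stockholm <tag>STO</tag></item>
      <item>Gothenburg <tag>GOT</tag></item>
    </one-of>
  </rule>
</grammar>"""


def load_phrases(grammar_xml):
    """Map each phrase in the root rule to its semantic tag (toy version)."""
    root = ET.fromstring(grammar_xml)
    root_rule_id = root.get("root")
    phrases = {}
    for rule in root.findall(f"{{{SRGS_NS}}}rule"):
        if rule.get("id") != root_rule_id:
            continue
        for item in rule.iter(f"{{{SRGS_NS}}}item"):
            tag = item.find(f"{{{SRGS_NS}}}tag")
            phrase = (item.text or "").strip().lower()
            # Fall back to the phrase itself when no <tag> is given.
            phrases[phrase] = tag.text.strip() if tag is not None else phrase
    return phrases


def interpret(utterance, phrases):
    """Return the semantic value for an utterance, or None if out of grammar."""
    return phrases.get(utterance.strip().lower())


phrases = load_phrases(GRAMMAR)
print(interpret("Stockholm", phrases))
print(interpret("Oslo", phrases))
```

An utterance outside the grammar yields `None` here; in a deployed voice application the corresponding situation is typically handled as a "no match" event.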

Intro to XML • Standard for storage and transportation of data • Maintained by W3C (w3.org/TR/REC-xml) • Elements and tags • Well-formedness • Validity • DTD • Editor (Textmate + XMLmate)
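
The difference between well-formedness and validity can be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` parser, which tests well-formedness only; it does not validate against a DTD, so validity would need a separate validating parser. The helper name `is_well_formed` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the string parses as well-formed XML, else False."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Properly nested elements and tags: well-formed.
print(is_well_formed("<prompt><emphasis>Hello</emphasis></prompt>"))

# The <emphasis> element is never closed: not well-formed.
print(is_well_formed("<prompt><emphasis>Hello</prompt>"))
```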

Speech synthesis

Speech synthesis • Input: text, lang, voice, persona • Output: speech
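
As a rough sketch of how text, language, and voice can be packaged for a synthesizer, the function below wraps plain text in an SSML `speak` element. The function name `to_ssml` and the voice name `"Anna"` are hypothetical; which voices exist depends entirely on the TTS engine, and the persona input is not modeled here.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def to_ssml(text, lang="en-US", voice_name=None):
    """Wrap plain text in a minimal SSML document (illustrative sketch)."""
    body = escape(text)  # escape &, < and > so the result stays well-formed
    if voice_name is not None:
        body = f"<voice name={quoteattr(voice_name)}>{body}</voice>"
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f"xml:lang={quoteattr(lang)}>{body}</speak>"
    )


ssml = to_ssml("Tea & coffee at 3 o'clock", lang="en-GB", voice_name="Anna")
print(ssml)

# Sanity check: the generated markup parses as well-formed XML.
ET.fromstring(ssml)
```

Generating the markup programmatically keeps user-supplied text from breaking the document, since characters such as `&` are escaped before they reach the synthesizer.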

A peek inside the black box • http://www.explainthatstuff.com/how-speech-synthesis-works.html
