Introduction and overview

Outline

  • A short history of the field
  • Speech synthesis (TTS)
  • Automatic speech recognition (ASR)
  • Dialog system architectures
  • Voice on the Web (perhaps show the Siri video)
  • Voice on the Web and W3C Standards
  • Relation to linguistic theory
  • A brief look at the course plan
  • What this course is not about
  • What this course could mean to you
  • Introduction to lab assignments and platforms
  • Designing and developing spoken dialog systems
  • Present project
  • Give home assignment 1: Call flow design and evaluation
  • Present lab assignment 1
A short history of the field
  • 1966, Joseph Weizenbaum, Eliza
  • Sundial, ATIS, Verbmobil
  • AIML NLP system
  • VXML ("Voice XML"), a dialog markup language (primarily for telephony), developed initially by AT&T, then administered by an industry consortium, and finally a W3C specification.
  • Voxeo, Tropo
Speech recognition

text (or some semantic representation)


Dialog management
  • Finite-state-based dialog management
  • Frame-based (form-based) dialog management
  • Information-state-based dialog management
  • Plan-based dialog management
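
As a rough illustration of the first approach in the list above, finite-state-based dialog management can be sketched as a graph of states, each with a prompt and a fixed successor. The call flow, the state names and the `run_dialog` helper below are hypothetical and chosen only for illustration; the slides do not specify an implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    prompt: str                 # what the system says in this state
    slot: Optional[str]         # slot filled by the user's answer, if any
    next_state: Optional[str]   # name of the following state; None ends the dialog


# Hypothetical three-state call flow for a timetable service.
FLOW = {
    "ask_origin": State("Where are you travelling from?", "origin", "ask_destination"),
    "ask_destination": State("Where are you travelling to?", "destination", "confirm"),
    "confirm": State("Thank you. Looking up your journey.", None, None),
}


def run_dialog(flow, start, user_turns):
    """Walk the state graph, consuming one user turn per slot-filling state."""
    turns = iter(user_turns)
    slots, transcript = {}, []
    name = start
    while name is not None:
        state = flow[name]
        transcript.append(state.prompt)
        if state.slot is not None:
            slots[state.slot] = next(turns)
        name = state.next_state
    return slots, transcript


slots, transcript = run_dialog(FLOW, "ask_origin", ["Stockholm", "Uppsala"])
print(slots)
```

In this sketch the system always holds the initiative: the user can only answer the current prompt, which is what makes the approach easy to design but rigid in use.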
Why voice
  • Wireless devices have small screens and limited input capabilities.
  • Telephone keypads can give users only a limited number of choices.
  • Speech technology is improving.
  • The exchange of information between a person and a computer is becoming more like a real conversation.
  • Users want hands-free or eyes-free use.
  • From a business viewpoint, voice applications open up a host of new revenue opportunities.
  • There exist many more telephones than computers with the potential to access the Internet.
  • Information providing systems:
    • weather reports
    • stock quotes
    • timetables
  • Transaction-based systems:
    • calendar functions
    • shopping
    • financial transactions
    • travel reservations
  • Natural language understanding
    • proper name identification
    • part-of-speech tagging
    • parser
  • dialog manager
  • output generator
    • natural language generator
    • gesture generator
    • layout engine
  • input recognizer/decoder
    • automatic speech recognizer
    • gesture recognizer
    • handwriting recognizer
  • output renderer
    • text-to-speech engine
    • talking head
    • robot or avatar
  • multi-modal fusion
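
The components listed above are often arranged as a pipeline from input recognizer to output renderer. The following is a minimal sketch of that arrangement for the speech-only case, assuming toy stand-ins for every stage: all function names, intents and templates are hypothetical, and the recognizer and renderer merely pass text along instead of handling audio.

```python
def recognize(audio: str) -> str:
    # Stand-in for the automatic speech recognizer: the "audio" is already text.
    return audio.lower().strip()


def understand(text: str) -> dict:
    # Toy natural language understanding: keyword spotting instead of tagging and parsing.
    if "weather" in text:
        return {"intent": "weather"}
    if "stock" in text:
        return {"intent": "stock_quote"}
    return {"intent": "unknown"}


def manage(meaning: dict) -> dict:
    # Dialog manager: map the user's intent to a system action.
    actions = {"weather": "report_weather", "stock_quote": "report_stock"}
    return {"action": actions.get(meaning["intent"], "ask_repeat")}


def generate(act: dict) -> str:
    # Natural language generator: template-based output.
    templates = {
        "report_weather": "Here is the weather report.",
        "report_stock": "Here are the stock quotes.",
        "ask_repeat": "Sorry, could you repeat that?",
    }
    return templates[act["action"]]


def render(text: str) -> str:
    # Stand-in for the text-to-speech engine: tag the text instead of synthesizing audio.
    return f"[TTS] {text}"


def respond(audio: str) -> str:
    """Run one user turn through the whole pipeline."""
    return render(generate(manage(understand(recognize(audio)))))


print(respond("What is the WEATHER like?"))
```

A multi-modal system would add further recognizers (gesture, handwriting) before a fusion step and further generators after the dialog manager; the sketch leaves those out.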
Types of systems
  • by modality
    • text-based
    • spoken dialog system
    • graphical user interface
    • multi-modal
  • by device
    • telephone-based systems
    • PDA systems
    • in-car systems
    • robot systems
    • desktop/laptop systems
      • native
      • in-browser systems
      • in-virtual machine
    • in-virtual environment
    • robots
  • by style
    • command-based
    • menu-driven
    • natural language
    • speech graffiti
  • by initiative
    • system initiative
    • user initiative
    • mixed initiative
  • by application
    • information service
    • command-and-control
    • entertainment
    • education/tutorial
    • edutainment
    • reminder systems
    • companion systems
    • healthcare
    • eldercare
    • assistive/access systems
Mobile voice apps
  • Voice on the Web
Relation to other fields
  • Phonetics
  • Phonology
  • Syntax
  • Semantics
  • Pragmatics
  • spoken language understanding
  • psycholinguistics
  • human communication
  • discourse analysis
  • human-computer interaction
  • computational linguistics
  • NL-parsing
  • NL-generation
  • language modeling
  • multi-modal fusion
  • multi-modal fission
  • psychology
  • cognitive science
  • affective dialog
  • user modeling
  • embodied communication
What this course is not about
  • Sophisticated dialog management
  • Multi-modal systems
  • Non-spoken dialog systems
What this course could mean to you
  • Will prepare you for writing a thesis in the area of dialog systems (if you so choose)
  • Will prepare you for work in the industry
  • A link to the LinkedIn page
Roles in the process
  • Dialog designer 
  • VoiceXML programmer 
  • Voice talent
  • Grammar writer
  • TTS specialist
  • Speech recognition specialist
  • Quality assurance specialist 
  • Server specialist
  • Manager
Who are the big players in the area?
  • Google
  • Microsoft
  • Apple
  • IBM
  • Nuance
  • Voxeo
  • AT&T

  • The Emergence of Speech as a Mobile Platform
  • Market Trends
  • Speech-Enabled Mobile Apps Gaining Acceptance
    • Voice Control in a Mission-Critical Environment
    • Search Engine for Audio-Visual Content
    • Instantaneous Language Translation
    • IBM's Spoken Web
  • What's Driving Speech as a Mobile Platform?
    • Mobile Devices and Peripherals
    • Cloud Computing
    • Open Technologies
    • Mashups and the Programmable Web
    • Legislation
  • Closing the (Mobile) Digital Divide
  • An Overview of Emerging SaaP Applications
  • Current Speech-Equipped Devices are Merely the Tip of the Iceberg
  • SaaP Enables New Application Interaction
    • Spoken Alerts
    • Mobile Reminders
    • Synthesized Speech
    • Email and Text Messages
    • Speech-to-Text for Voicemail
  • SaaP Enables Voice User Interfaces
  • Speech Recognition: The Foundation of Speech-Enabled Apps
  • Constrained vs. Natural Language Processing
  • Automated vs. Hybrid Speech Recognition
  • Applications for Speech Recognition
    • Speaker Authentication
    • Email and Text Message Composition
    • Launch and Control Mobile Apps
  • Special Case: Voice Activation

W3C Speech Standards

Torbjörn Lager

The big picture




The place of speech technology
  • … speech technology itself has a very long way to go. … the most important thing may turn out to be not the speech technology itself, but the way in which speech technology connects to all the other technologies.

Tim Berners-Lee

The big picture





VoiceXML browser (ASR, TTS)

The What and Why of Standards
  • Software standards include terminology, languages and protocols specified by committees of experts for widespread use in the software industry. Software standards have both advantages and disadvantages.
  • Advantages:
    • developers can create applications using the standard languages that are portable across a variety of platforms;
    • products from different vendors are able to interact with each other;
    • a community of experts evolves around the standard and is available to develop products and services based on the standard.
  • Disadvantages:
    • some developers feel that standards may inhibit creativity and stall the introduction of superior technology.
  • However, in the area of speech, vendors are enthusiastic about standards and frequently complain that standards are not developed fast enough.
  • Emerging speech-technology standards could give a boost to an industry hampered by proprietary software and hardware.
World Wide Web Consortium

W3C Speech Standards
  • Speech Recognition Grammar Specification (SRGS): what the user can say
  • Semantic Interpretation for Speech Recognition (SISR): what the user means
  • Speech Synthesis Markup Language (SSML): what the user hears
  • Pronunciation Lexicon Specification (PLS): how words are pronounced
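
To make "what the user can say" concrete, here is a small sketch that parses an SRGS grammar in its XML form and lists the phrases one rule accepts. The grammar content (a `city` rule with three city names) and the `allowed_phrases` helper are made up for illustration; the helper handles only a flat `<one-of>` list, not the full SRGS feature set.

```python
import xml.etree.ElementTree as ET

# Namespace of the SRGS XML form.
SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical grammar: the user may say one of three city names.
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="city">
  <rule id="city" scope="public">
    <one-of>
      <item>Stockholm</item>
      <item>Gothenburg</item>
      <item>Uppsala</item>
    </one-of>
  </rule>
</grammar>
"""


def allowed_phrases(grammar_xml: str, rule_id: str) -> list:
    """Return the alternatives of a rule consisting of a single flat <one-of>."""
    root = ET.fromstring(grammar_xml)
    ns = {"g": SRGS_NS}
    for rule in root.findall("g:rule", ns):
        if rule.get("id") == rule_id:
            return [item.text.strip() for item in rule.findall("g:one-of/g:item", ns)]
    return []


phrases = allowed_phrases(GRAMMAR, "city")
print(phrases)
```

A speech recognizer loaded with this grammar would be constrained to the three listed phrases; SISR tags could then be attached to each item to say what each phrase means.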
Intro to XML
  • Standard for storage and transportation of data
  • Maintained by W3C (
  • Elements and tags
  • Well-formedness
  • Validity
  • DTD
  • Editor (Textmate + XMLmate)
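
The difference between well-formedness and validity can be illustrated with a short check. The sketch below tests well-formedness only, using Python's standard `xml.etree.ElementTree` parser; the `is_well_formed` helper and the two sample documents are hypothetical. Validity (conformance to a DTD) is a separate, stricter check that this parser does not perform and that would need a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the document parses as XML, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Properly nested tags: well-formed.
good = "<prompt><emphasis>Welcome</emphasis> to the timetable service.</prompt>"
# Closing tags in the wrong order: not well-formed.
bad = "<prompt><emphasis>Welcome</prompt></emphasis>"

print(is_well_formed(good), is_well_formed(bad))
```

A document can be well-formed without being valid: for example, the first sample parses correctly even if no DTD allows an `<emphasis>` element inside `<prompt>`.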
Speech synthesis






A peek inside the black box