Speech recognition, understanding and conversational interfaces


Alexander Rudnicky

School of Computer Science

http://www.cs.cmu.edu/~air

Outline
  • Speech
  • Types of speech interfaces
  • Speech systems and their structure
  • Designing speech interfaces
  • Some applications
    • SpeechWear
    • Communicator
Speech as a signal
  • The difference between speech and sound
    • “CD” quality vs. intelligible quality
      • high-quality audio is sampled at 44.1 / 48 kHz
      • desirable speech bandwidth: 0-8 kHz, 16 bits/sample
        • at 16 bits/sample: 256 kbps (tethered mic)
        • telephone: 64 kbps (and lower)
    • Compression:
        • MPEG: 64kbps/channel and up (but not speech-optimal)
        • CELP: 16kbps … 2.4kbps (optimized for speech)
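The uncompressed bit rates on this slide follow from sampling rate times bits per sample. A minimal sketch of that arithmetic: the function name is illustrative, the 16 kHz rate assumes Nyquist sampling of a 0-8 kHz band, and the 8 kHz / 8-bit telephone figures are the conventional values rather than something stated on the slide.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# A 0-8 kHz bandwidth needs at least 16 kHz sampling (Nyquist).
wideband = pcm_bit_rate(16_000, 16)       # tethered mic
telephone = pcm_bit_rate(8_000, 8)        # assumed 8 kHz, 8 bits/sample
cd_stereo = pcm_bit_rate(44_100, 16, 2)   # "CD" quality, two channels

print(wideband // 1000, telephone // 1000, cd_stereo // 1000)  # kbps, rounded down
```

Against these figures, a 2.4 kbps CELP stream is roughly a hundredfold reduction from the 256 kbps wideband signal.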
Speech for communication
  • The difference between speech and language
  • Speech recognition and speech understanding
Computers and speech
  • Transcription
    • dictation, information retrieval
  • Command and control
    • data entry, device control, navigation
  • Information access
    • airline schedules, stock quotes
  • Problem solving
    • travel planning, logistics
Speech system architecture
  • SIGNAL PROCESSING
  • DECODING
  • UNDERSTANDING
  • DISCOURSE
  • ACTION
A generic speech system
  • [Diagram] Components, in processing order:
    • Input: speech
    • Signal processing
    • Decoder
    • Parser
    • Post parser
    • Dialog manager
    • Domain agents (three shown)
    • Language generator
    • Speech synthesizer
    • Output: speech, display, effector

Decoding speech
  • Signal processing
    • Reduce dimensionality of signal
    • noise conditioning
  • Decoder
    • Transcribe speech to words
    • Uses acoustic models and language models
    • Corpus-based statistical models
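A common way to describe how a decoder combines its two models is to pick the word sequence with the highest sum of acoustic and language-model log probabilities. The toy sketch below only rescores a fixed list of hypotheses; the scores are invented, the names are hypothetical, and it is not a description of how Sphinx or any particular decoder searches.

```python
import math

def pick_best(hypotheses, lm_weight=1.0):
    """Return the hypothesis with the highest combined log score.

    Each hypothesis is a tuple (words, acoustic_logprob, lm_logprob).
    """
    return max(hypotheses, key=lambda h: h[1] + lm_weight * h[2])

# Invented scores, for illustration only.
hypotheses = [
    ("recognize speech", math.log(0.30), math.log(0.020)),
    ("wreck a nice beach", math.log(0.35), math.log(0.001)),
]

best = pick_best(hypotheses)
acoustic_only = pick_best(hypotheses, lm_weight=0.0)
print(best[0], "|", acoustic_only[0])
```

With the language model switched off, the acoustically stronger but less plausible hypothesis wins, which is the reason the two models are used together.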

Creating models for recognition
  • Speech data → Transcribe* → Train → Acoustic models
  • Text data → Train → Language models
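As one illustration of training a language model from text data, the sketch below estimates bigram probabilities by counting. It is an unsmoothed maximum-likelihood estimate over a made-up three-sentence corpus; the function name is hypothetical, and practical systems add smoothing and use far more text.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Estimate P(word | previous word) by maximum likelihood from raw text."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Sentence boundary markers let the model learn starts and ends.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return {
        (prev, word): count / context_counts[prev]
        for (prev, word), count in bigram_counts.items()
    }

# Toy corpus, invented for this example.
corpus = [
    "I would like to fly to Boston",
    "I would like to fly to Pittsburgh",
    "I want to go to Boston",
]
model = train_bigram_model(corpus)
print(model[("to", "boston")])
```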

Understanding speech
  • Parser
    • Extract semantic content from utterance
    • Resource: Grammar (ontology design, language acquisition)
  • Post parser
    • Introduce context and world knowledge into interpretation
    • Resources: Context, Domain Agents (grounding, knowledge engineering)

Interacting with the user
  • Dialog manager
    • Guide interaction through task
    • Map user inputs and system state into actions
    • Resources: Task schemas (task analysis), Context
  • Domain agent (three shown)
    • Interact with back-end(s)
    • Interpret information using domain knowledge
    • Connected to: Database; Live data (e.g. Web); Domain expert (knowledge engineering)

Communicating with the user
  • Language Generator
    • Decide what to say to user (and how to phrase it)
  • Speech synthesizer
  • Display Generator
  • Action Generator

Speech recognition and understanding
  • Sphinx system
    • speaker-independent
    • continuous speech
    • large vocabulary
  • ATIS system
    • air travel information retrieval
    • context management
  • film clip
Command and control systems
  • Small vocabularies, fixed syntax
    • OPEN WINDOW
    • MOVE OBJECT to
    • Applications:
      • data entry (e.g., zip codes), process control (e.g., electron microscope, darkroom equipment)
  • Large vocabulary, fixed syntax
    • Web browsing (?)
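A small-vocabulary, fixed-syntax system can be pictured as exact matching of the recognized words against a handful of rules. The sketch below is an assumption-laden illustration: the rule table, the `<target>` slot, and the target word list are hypothetical and do not come from the slides.

```python
# Hypothetical grammar: each command is a fixed word sequence;
# "<target>" marks a slot filled from a closed word list.
GRAMMAR = {
    "open_window": ["open", "window"],
    "move_object": ["move", "object", "to", "<target>"],
}
TARGETS = {"left", "right", "top", "bottom"}  # assumed, not from the slides

def parse_command(utterance):
    """Return (command, target) if the words match a rule exactly, else None."""
    words = utterance.lower().split()
    for name, pattern in GRAMMAR.items():
        if len(words) != len(pattern):
            continue
        target = None
        matched = True
        for word, symbol in zip(words, pattern):
            if symbol == "<target>":
                if word not in TARGETS:
                    matched = False
                    break
                target = word
            elif word != symbol:
                matched = False
                break
        if matched:
            return name, target
    return None

print(parse_command("OPEN WINDOW"))
print(parse_command("please open the window"))
```

Anything outside the fixed syntax is rejected outright, which keeps recognition simple but places the burden of learning the command set on the user.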
SpeechWear
  • Vehicle inspection task
    • USMC mechanics, fixed inspection form
    • Wearable computer (COTS components)
    • html-based task representation
  • film clip
Information access
  • Moderate to very large vocabulary
    • IVR and frame based systems
  • Commercial systems:
    • Nuance: http://www.nuance.com/demo/index.html
    • SpeechWorks: http://www.speechworks.com/demos/demos.htm
    • lots of others..
IVR and frame-based systems
  • Interactive voice response (IVR)
    • interactions specified by a graph (typically a tree)
  • Frame systems
    • ergodic graphs
    • states defined by multi-item forms
Graph-based systems
  • Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan, ...
    • What type of loan are you interested in? Please say one of the following: Mortgage, Car, Personal, ...
      • . . . .
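The bank menu can be sketched as a tree of prompts in which each recognized keyword selects a child node. The two top prompts follow the slide; the leaf texts, the dictionary layout, and the repeat-on-error behavior are assumptions made for this example.

```python
def leaf(text):
    """A terminal node with a placeholder prompt and no further options."""
    return {"prompt": text, "options": {}}

MENU = {
    "prompt": "Welcome to Bank ABC! Please say one of the following: "
              "Balance, Hours, Loan",
    "options": {
        "balance": leaf("(balance information)"),
        "hours": leaf("(opening hours)"),
        "loan": {
            "prompt": "What type of loan are you interested in? "
                      "Please say one of the following: Mortgage, Car, Personal",
            "options": {
                "mortgage": leaf("(mortgage information)"),
                "car": leaf("(car loan information)"),
                "personal": leaf("(personal loan information)"),
            },
        },
    },
}

def run_ivr(menu, user_turns):
    """Walk the tree, one recognized keyword per turn; return the prompts spoken."""
    node = menu
    spoken = [node["prompt"]]
    for turn in user_turns:
        choice = turn.strip().lower()
        if choice in node["options"]:
            node = node["options"][choice]
        # On unrecognized input the node is unchanged, so its prompt repeats.
        spoken.append(node["prompt"])
    return spoken

for prompt in run_ivr(MENU, ["Loan", "Car"]):
    print(prompt)
```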

Frame-based systems
  • Example form:
    • Destination_City: Boston
    • Departure_Date: ______
    • Departure_Time: ______
    • Preferred_Airline: ______
    • ...
  • I would like to fly to Boston
    • I’d like to go to Boston on Friday, …
  • When would you like to fly?
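The exchange above can be sketched as slot filling: each utterance fills whatever slots it mentions, and the system then asks about the first slot that is still empty. Slot names come from the slide; the word lists, regular expressions, and all prompts other than "When would you like to fly?" are assumptions for illustration.

```python
import re

CITIES = ["boston", "pittsburgh", "chicago"]  # assumed vocabulary
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]

# Slot order determines which question is asked first.
PROMPTS = {
    "Destination_City": "Where would you like to go?",
    "Departure_Date": "When would you like to fly?",
    "Departure_Time": "What time would you like to leave?",
    "Preferred_Airline": "Which airline do you prefer?",
}

def new_frame():
    """An empty form: every slot unfilled."""
    return {slot: None for slot in PROMPTS}

def update_frame(frame, utterance):
    """Fill any slots mentioned in the utterance; leave the rest untouched."""
    text = utterance.lower()
    city = re.search(r"\bto (" + "|".join(CITIES) + r")\b", text)
    if city:
        frame["Destination_City"] = city.group(1).title()
    day = re.search(r"\bon (" + "|".join(DAYS) + r")\b", text)
    if day:
        frame["Departure_Date"] = day.group(1).title()
    return frame

def next_prompt(frame):
    """Ask about the first empty slot, or return None when the form is complete."""
    for slot, prompt in PROMPTS.items():
        if frame[slot] is None:
            return prompt
    return None

frame = update_frame(new_frame(), "I would like to fly to Boston")
print(next_prompt(frame))
```

Because one utterance may fill several slots at once, the user is not forced through the questions in a fixed order, which is the main difference from the graph-based menu.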
Frame-based systems
  • [Diagram] Five forms with placeholder field names (Zxfgdh_dxab, askjs, dhe, aa_hgjs_aa, ...), all fields empty
  • Transition on keyword or phrase

Some problems
  • IVR systems work great, but only for well-structured (& “shallow”) tasks
  • Frame systems are good for “tasks” that correspond to a single form leading to an action
  • Neither approach does well with more complex problem-solving activities
Dialog Systems
  • Problem solving activity; complex task
    • Order of progression through task depends on user goals (which can change) and system state (a back-end retrieval) and is not predictable.
  • Track progress and help task along
    • mixed-initiative dialog
  • Discourse phenomena
    • Users expect to “converse” with the system
Carnegie Mellon Communicator
  • A dialog system that supports complex problem solving in a travel planning domain
    • create an itinerary using air schedule, hotel and car information
    • 186 U.S. airports (>140k enplanements/yr)
      • currently: >500 world airports
  • Web-based data resources
    • Live and cached flight information
    • Airport, airline, etc. information
Value schema/handlers
  • [Diagram] Labels: receptors, transform, value, Domain Agent

Compound schema
  • [Diagram] Labels: Value_1, Value_2, Value_3; +; transform; value; e.g. SQL query; Domain Agent
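One possible reading of the compound-schema diagram is that several filled values are combined by a transform into a single request, such as an SQL query, for a domain agent to run. The sketch below illustrates that reading only: the table, column names, flight numbers, and dates are invented, and the Communicator's actual schema mechanism is not shown in the slides.

```python
import sqlite3

def build_flight_query(destination, date, time):
    """Combine three filled values into one parameterized SQL query."""
    sql = (
        "SELECT flight_no FROM flights "
        "WHERE destination = ? AND depart_date = ? AND depart_time >= ? "
        "ORDER BY depart_time"
    )
    return sql, (destination, date, time)

# Hypothetical in-memory table standing in for a back-end database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (flight_no TEXT, destination TEXT, "
    "depart_date TEXT, depart_time TEXT)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?)",
    [
        ("XX101", "BOS", "2000-03-10", "08:15"),
        ("XX205", "BOS", "2000-03-10", "17:40"),
        ("XX310", "ORD", "2000-03-10", "09:00"),
    ],
)

sql, params = build_flight_query("BOS", "2000-03-10", "12:00")
available = [row[0] for row in conn.execute(sql, params)]
print(available)
```

Passing the values as query parameters rather than pasting them into the SQL string keeps recognized speech from being interpreted as SQL.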

Schema ordering
  • [Diagram] Example labels, in order: Destination airport, Date, Time, Flight Leg, Database lookup, Available flights
  • [Diagram] General form labels: Schema i / Value i, Schema j / Value j, Schema k / Value k; transform; Value

Carnegie Mellon Communicator
  • CMU Communicator
    • Call: 268-5144
    • the information is accurate; you can use it for your own travel planning...
User-aware speech interfaces
  • Predictable behavior on the system’s part
  • Users communicate at different levels
  • http://www.speech.cs.cmu.edu/air/papers/InterfaceChars.html
User-aware speech interfaces
  • Content: task-centric utterances
  • Possibility: What can I do?
  • Orientation: Where are we?
  • Navigation: moving through the task space
  • Control: verbose/terse, listen!
  • Customization: define this word
Speech interface guidelines
  • Speech recognition is errorful
  • System state is often opaque to the user
  • http://www.speech.cs.cmu.edu/air/papers/SpInGuidelines/SpInGuidelines.html
Interface guidelines
  • State transparency
  • Input control
  • Error recovery
  • Error detection
  • Error correction
  • Log performance
  • Application integration
Summary
  • Speech and language communication
  • Dialog structure
  • Interface design