Speech recognition, understanding and conversational interfaces


Speech recognition, understanding and conversational interfaces. Alexander Rudnicky School of Computer Science http://www.cs.cmu.edu/~air. Outline. Speech Types of speech interfaces Speech systems and their structure Designing speech interfaces Some applications SpeechWear Communicator.

Presentation Transcript

Speech recognition, understanding and conversational interfaces

Alexander Rudnicky

School of Computer Science

http://www.cs.cmu.edu/~air


Outline

  • Speech

  • Types of speech interfaces

  • Speech systems and their structure

  • Designing speech interfaces

  • Some applications

    • SpeechWear

    • Communicator


Speech as a signal

  • The difference between speech and sound

    • “CD” quality vs. intelligible quality

      • high-quality audio is sampled at 44.1 / 48 kHz

      • desirable speech bandwidth: 0-8 kHz (16 kHz sampling), 16 bits

        • at 16 bits/sample: 256 kbps (tethered mic)

        • telephone: 64 kbps (and lower)

    • Compression:

      • MPEG: 64kbps/channel and up (but not speech-optimal)

      • CELP: 16kbps … 2.4kbps (optimized for speech)
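
The uncompressed bit rates quoted above follow from sampling rate × bits per sample. A minimal sketch, assuming 16 kHz sampling for the 0-8 kHz band (the Nyquist minimum) and 8 kHz, 8-bit coding for the telephone channel; the function name is illustrative and not from the slides:

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Uncompressed PCM bit rate in kbps (1 kbps = 1000 bits/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000

# 0-8 kHz bandwidth needs at least 16 kHz sampling (Nyquist).
wideband = pcm_bit_rate_kbps(16_000, 16)
# Telephone speech: assumed 8 kHz sampling at 8 bits/sample.
telephone = pcm_bit_rate_kbps(8_000, 8)
# "CD" quality for comparison: 44.1 kHz, 16 bits, two channels.
cd_stereo = pcm_bit_rate_kbps(44_100, 16, channels=2)

print(wideband, telephone, cd_stereo)
```

Against these figures, a CELP coder at 2.4-16 kbps is a large reduction relative to the 256 kbps wideband rate.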


Speech for communication

  • The difference between speech and language

  • Speech recognition and speech understanding


Computers and speech

  • Transcription

    • dictation, information retrieval

  • Command and control

    • data entry, device control, navigation

  • Information access

    • airline schedules, stock quotes

  • Problem solving

    • travel planning, logistics


Speech system architecture

  • SIGNAL PROCESSING

  • DECODING

  • UNDERSTANDING

  • DISCOURSE

  • ACTION



A generic speech system

  • Input: speech

  • Signal processing

  • Decoder

  • Parser

  • Post parser

  • Dialog manager

  • Domain agents (three shown)

  • Language Generator

  • Speech synthesizer

  • Outputs: speech, display, effector


Decoding speech

  • Transcribe speech to words

  • Signal processing → Decoder

  • Decoder draws on Acoustic models and Language models

  • Corpus-based statistical models


Creating models for recognition

  • Speech data → Transcribe* → Train → Acoustic models

  • Text data → Train → Language models
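
Training a language model from text data can be illustrated with a toy example. This is a minimal sketch of an unsmoothed bigram model estimated by counting; the three-sentence corpus and the function name are made up for illustration, and the slides do not specify which model type the system uses:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate P(word | previous word) by maximum likelihood (no smoothing)."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        histories.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))   # adjacent word pairs

    def prob(prev, word):
        # Unseen histories get probability 0.0 in this unsmoothed sketch.
        return bigrams[(prev, word)] / histories[prev] if histories[prev] else 0.0

    return prob

# Hypothetical training text.
corpus = ["fly to boston", "fly to pittsburgh", "go to boston"]
p = train_bigram_lm(corpus)
print(p("to", "boston"), p("fly", "to"))
```

A practical recognizer would add smoothing so that unseen word pairs do not receive zero probability.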


Understanding speech

  • Parser

    • Extract semantic content from utterance

  • Post parser

    • Introduce context and world knowledge into interpretation

  • Other diagram labels: Grammar (ontology design, language acquisition), Context, Domain Agents (grounding, knowledge engineering)


Interacting with the user

  • Dialog manager

    • Guide interaction through task

    • Map user inputs and system state into actions

  • Domain agent (three shown)

    • Interact with back-end(s)

    • Interpret information using domain knowledge

  • Other diagram labels: Task schemas (task analysis), Context, Database, Live data (e.g. Web), Domain expert (knowledge engineering)


Communicating with the user

  • Language Generator

    • Decide what to say to user (and how to phrase it)

  • Speech synthesizer

  • Display Generator

  • Action Generator


Speech recognition and understanding

  • Sphinx system

    • speaker-independent

    • continuous speech

    • large vocabulary

  • ATIS system

    • air travel information retrieval

    • context management

  • film clip


Command and control systems

  • Small vocabularies, fixed syntax

    • OPEN WINDOW <window_id>

    • MOVE OBJECT <object_id> to <position>

    • Applications:

      • data entry (e.g., zip codes), process control (e.g., electron microscope, darkroom equipment)

  • Large vocabulary, fixed syntax

    • Web browsing (?)
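
The small-vocabulary, fixed-syntax commands listed above can be matched with a handful of patterns. A minimal sketch that operates on the recognizer's text output; the slot value format (`\w+`) and the function name are assumptions, not details from the slides:

```python
import re

# Fixed-syntax commands from the slide; the value patterns are assumptions.
COMMANDS = {
    "open_window": re.compile(r"^OPEN WINDOW (?P<window_id>\w+)$", re.IGNORECASE),
    "move_object": re.compile(
        r"^MOVE OBJECT (?P<object_id>\w+) TO (?P<position>\w+)$", re.IGNORECASE
    ),
}

def parse_command(utterance: str):
    """Return (command_name, slots), or None if the utterance is out of grammar."""
    text = " ".join(utterance.split())  # collapse repeated whitespace
    for name, pattern in COMMANDS.items():
        match = pattern.match(text)
        if match:
            return name, match.groupdict()
    return None

print(parse_command("MOVE OBJECT box to left"))
print(parse_command("please open it"))
```

Anything outside the grammar is rejected outright, which is what keeps such systems simple and is also what limits them.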


SpeechWear

  • Vehicle inspection task

    • USMC mechanics, fixed inspection form

    • Wearable computer (COTS components)

    • HTML-based task representation

  • film clip


Information access

  • Moderate to very large vocabulary

    • IVR and frame based systems

  • Commercial systems:

    • Nuance: http://www.nuance.com/demo/index.html

    • SpeechWorks: http://www.speechworks.com/demos/demos.htm

    • many others


IVR and frame-based systems

  • Interactive voice response (IVR)

    • interactions specified by a graph (typically a tree)

  • Frame systems

    • ergodic graphs

    • states defined by multi-item forms


Graph-based systems

Welcome to Bank ABC!

Please say one of the following:

Balance, Hours, Loan, ...

What type of loan are you interested in?

Please say one of the following:

Mortgage, Car, Personal, ...

. . . .
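
The menu dialog above is a walk through a graph, one keyword per edge. A minimal sketch, assuming a tree with only the two nodes shown on the slide; the node names, the leaf behavior, and the stay-in-place handling of unrecognized input are illustrative choices:

```python
# Each node holds a prompt and the keywords that lead to child nodes.
# Only the Bank ABC prompts come from the slide; the rest is assumed.
MENU = {
    "root": {
        "prompt": "Welcome to Bank ABC! Please say one of the following: "
                  "Balance, Hours, Loan",
        "next": {"balance": "balance", "hours": "hours", "loan": "loan"},
    },
    "loan": {
        "prompt": "What type of loan are you interested in? Please say one of "
                  "the following: Mortgage, Car, Personal",
        "next": {"mortgage": "mortgage", "car": "car", "personal": "personal"},
    },
}

def step(state: str, heard: str) -> str:
    """Follow one edge of the menu graph; stay put on unrecognized input."""
    node = MENU.get(state)
    if node is None:  # leaf: no further choices defined in this sketch
        return state
    return node["next"].get(heard.strip().lower(), state)

def run(inputs, state="root"):
    """Apply a sequence of recognized keywords starting from `state`."""
    for heard in inputs:
        state = step(state, heard)
    return state

print(run(["Loan", "Car"]))
```

The user can only ever say what the current node offers, which is why this style suits well-structured, shallow tasks.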


Frame-based systems

  • Frame:

    • Destination_City: Boston

    • Departure_Date: ______

    • Departure_Time: ______

    • Preferred_Airline: ______

  • I would like to fly to Boston

    • I’d like to go to Boston on Friday, …

  • When would you like to fly?
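
The exchange above can be sketched as slot filling: each utterance fills whatever slots it mentions, and the system asks about the first slot still empty. In this minimal sketch the slot names and the date prompt come from the slide; the keyword patterns, the city list, and the other prompts are assumptions for illustration:

```python
import re

SLOTS = ["Destination_City", "Departure_Date", "Departure_Time", "Preferred_Airline"]

# Toy keyword spotters standing in for a real parser (assumed patterns).
PATTERNS = {
    "Destination_City": re.compile(r"\bto (Boston|Pittsburgh|Chicago)\b", re.IGNORECASE),
    "Departure_Date": re.compile(
        r"\bon (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b",
        re.IGNORECASE,
    ),
}

PROMPTS = {
    "Destination_City": "Where would you like to fly to?",
    "Departure_Date": "When would you like to fly?",
    "Departure_Time": "What time would you like to leave?",
    "Preferred_Airline": "Do you have a preferred airline?",
}

def new_frame():
    return {slot: None for slot in SLOTS}

def update_frame(frame, utterance):
    """Fill every slot whose pattern matches; one utterance may fill several."""
    for slot, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            frame[slot] = match.group(1)
    return frame

def next_prompt(frame):
    """Ask about the first unfilled slot, or return None when the form is complete."""
    for slot in SLOTS:
        if frame[slot] is None:
            return PROMPTS[slot]
    return None

frame = update_frame(new_frame(), "I would like to fly to Boston")
print(next_prompt(frame))  # When would you like to fly?
```

Unlike the menu graph, the user may supply several items at once ("to Boston on Friday") and the system simply skips the prompts it no longer needs.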


Frame-based systems

  • [Diagram: several forms with placeholder slot names, linked by transitions on keyword or phrase]


Some problems

  • IVR systems work great, but only for well-structured (& “shallow”) tasks

  • Frame systems are good for “tasks” that correspond to a single form leading to an action

  • Neither approach does well with more complex problem-solving activities


Dialog Systems

  • Problem solving activity; complex task

    • Order of progression through task depends on user goals (which can change) and system state (a back-end retrieval) and is not predictable.

  • Track progress and help task along

    • mixed-initiative dialog

  • Discourse phenomena

    • Users expect to “converse” with the system


Carnegie Mellon Communicator

  • A dialog system that supports complex problem solving in a travel planning domain

    • create an itinerary using air schedule, hotel and car information

    • 186 U.S. airports (>140k enplanements/yr)

      • currently: >500 world airports

  • Web-based data resources

    • Live and cached flight information

    • Airport, airline, etc. information


Value schema/handlers

  • Diagram labels: receptors, transform, value, Domain Agent


Compound schema

  • Diagram labels: Value_1, Value_2, Value_3, +, transform, value (e.g. SQL query), Domain Agent
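
One plausible reading of the compound schema is that several component values are combined by a transform into a single value, such as an SQL query, that is handed to a domain agent. A minimal sketch under that assumption; the table layout, the flight rows, and all function names are hypothetical and only serve to make the example runnable:

```python
import sqlite3

def compound_transform(destination, depart_date, earliest_time):
    """Combine three component values into one compound value: a parameterized query."""
    sql = (
        "SELECT flight_no FROM flights "
        "WHERE destination = ? AND depart_date = ? AND depart_time >= ? "
        "ORDER BY depart_time"
    )
    return sql, (destination, depart_date, earliest_time)

def domain_agent(conn, query):
    """Stand-in for a domain agent: run the query against its back-end."""
    sql, params = query
    return [row[0] for row in conn.execute(sql, params)]

# Hypothetical in-memory back-end with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (flight_no TEXT, destination TEXT, "
    "depart_date TEXT, depart_time TEXT)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?)",
    [
        ("XX100", "BOS", "2000-01-07", "08:00"),
        ("XX200", "BOS", "2000-01-07", "14:30"),
        ("XX300", "ORD", "2000-01-07", "09:15"),
    ],
)

flights = domain_agent(conn, compound_transform("BOS", "2000-01-07", "12:00"))
print(flights)
```

Passing the values as query parameters rather than splicing them into the SQL string keeps recognized text from being interpreted as SQL.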


Schema ordering

  • Destination airport, Date, Time → Flight Leg → Database lookup → Available flights

  • Schema i, Schema j, Schema k (Value i, Value j, Value k) → transform → Value
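
If the diagram is read as a dependency chain, where a schema can be evaluated only once the values it depends on are available, then a valid evaluation order is a topological sort. A minimal sketch under that assumption, using the labels from the slide and Python's standard `graphlib` module (Python 3.9+); the slides do not say how Communicator itself orders schemas:

```python
from graphlib import TopologicalSorter

# Map each item to the items it depends on (assumed from the diagram's layout).
DEPENDS_ON = {
    "Flight Leg": {"Destination airport", "Date", "Time"},
    "Available flights": {"Flight Leg"},  # produced by the database lookup
}

# static_order() yields every item after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

The three leaf values may come out in any relative order, which fits the idea that the user can supply them in whatever sequence suits them.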


Carnegie Mellon Communicator

  • CMU Communicator

    • Call: 268-5144

    • the information is accurate; you can use it for your own travel planning...


User-aware speech interfaces

  • Predictable behavior on the system’s part

  • Users communicate at different levels

  • http://www.speech.cs.cmu.edu/air/papers/InterfaceChars.html


User-aware speech interfaces

  • Content: task-centric utterances

  • Possibility: What can I do?

  • Orientation: Where are we?

  • Navigation: moving through the task space

  • Control: verbose/terse, listen!

  • Customization: define this word


Speech interface guidelines

  • Speech recognition is error-prone

  • System state is often opaque to the user

  • http://www.speech.cs.cmu.edu/air/papers/SpInGuidelines/SpInGuidelines.html


Interface guidelines

  • State transparency

  • Input control

  • Error recovery

  • Error detection

  • Error correction

  • Log performance

  • Application integration


Summary

  • Speech and language communication

  • Dialog structure

  • Interface design

