


Outline of Talk

  • A brief historical perspective

  • Cognitive User Interfaces

  • Statistical Dialogue Modelling

  • Scaling to the Real World

  • System Architecture

  • Some Examples and Results

  • Conclusions and future work.



Why Talk to Machines?

  • it should be an easy and efficient way of finding out information and controlling behaviour

  • sometimes it is the only way

    • hands-busy, e.g. surgeon, driver, package handler, etc.

    • no internet and no call centres, e.g. in parts of the third world

    • very small devices

  • one day it might be fun, cf. Project Natal's Milo



VODIS - circa 1985

[Slide diagram: VODIS system architecture]

  • Natural-language, mixed-initiative train-timetable inquiry service

  • Logos speech recogniser: 150-word DTW connected speech recognition running on 8 x 8086 processors

  • Frame-based dialogue manager running on a PDP11/45 (128k memory, 2 x 5 Mb disks), passing recognition grammars to the recogniser and receiving recognised words back

  • Text output spoken by a DecTalk synthesiser

Collaboration between BT, Logica and Cambridge U.



Some desirable properties of a Spoken Dialogue System

  • able to support reasoning and inference

    • interpret noisy inputs and resolve ambiguities in context

  • able to plan under uncertainty

    • clearly defined communicative goals

    • performance quantified as rewards

    • plans optimized to maximize rewards

  • able to adapt on-line

    • robust to speaker (accent, vocab, behaviour,..)

    • robust to environment (noise, location, ..)

  • able to learn from experience

    • progressively optimize models and plans over time

Together, these are the properties of a Cognitive User Interface.

S. Young (2010). "Cognitive User Interfaces." Signal Processing Magazine 27(3)



Essential Ingredients of a Cognitive User Interface (CUI)

  • Explicit representation of uncertainty using a probability model over dialogue states e.g. using Bayesian networks

  • Inputs regarded as observations used to update the posterior state probabilities via inference

  • Responses defined by plans which map internal states to actions

  • The system’s design objectives defined by rewards associated with specific state/action pairs

  • Plans optimized via reinforcement learning

  • Model parameters estimated via supervised learning and/or optimized via reinforcement learning

This framework is a Partially Observable Markov Decision Process (POMDP).
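For reference, these ingredients are exactly the elements of a POMDP; a minimal sketch in conventional notation (reconstructed, not copied from the slides):

\[
\langle S, A, O, T, Z, r \rangle, \qquad
T(s_t \mid s_{t-1}, a_{t-1}), \quad
Z(o_t \mid s_t), \quad
r(s_t, a_t),
\]

where the belief state \(b_t\) is the posterior over \(S\) given the action/observation history, and serves as a sufficient statistic for choosing the next action.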



A Framework for Statistical Dialogue Management

[Slide diagram: the statistical dialogue management loop]

The dialogue model, with distribution parameters λ, maintains a belief distribution over dialogue states s_t. After each observation o_t obtained by speech understanding of the user's input, the belief is updated:

\[ b_t = P(s_t \mid o_t, b_{t-1}; \lambda) \]

A stochastic policy with parameters θ then selects the next action,

\[ a_t \sim \pi(a_t \mid b_t; \theta), \]

which is passed to response generation. A reward function r assigns a reward r(b_t, a_t) to each belief/action pair, and the objective is the total reward

\[ R = \sum_t r(b_t, a_t). \]


Belief Tracking aka Belief Monitoring

Belief is updated following each new user input
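The update shown on the slide is the standard POMDP belief update; reconstructed here in conventional notation (which may differ slightly from the slide):

\[
b_t(s_t) \;=\; \eta \, P(o_t \mid s_t) \sum_{s_{t-1}} P(s_t \mid s_{t-1}, a_{t-1}) \, b_{t-1}(s_{t-1}),
\]

where \(\eta\) is a normalising constant and the sum runs over all possible previous dialogue states.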

However, the state space is huge and the above equation is intractable for practical systems. So we approximate:

  • Hidden Information State system (HIS): track just the N most likely states

  • Graphical Model system (GMS, aka BUDS): factorise the state space and ignore all but the major conditional dependencies

S. Young (2010). "The Hidden Information State Model" Computer Speech and Language 24(2)

B. Thomson (2010). "Bayesian update of dialogue state" Computer Speech and Language 24(4)



Dialogue State

  • Tourist Information Domain

  • type = bar, restaurant

  • food = French, Chinese, none

[Slide diagram: dynamic Bayesian network for the dialogue state. For each concept there are goal nodes (gtype, gfood), user-act nodes (utype, ufood), history/memory nodes (htype, hfood) and observation nodes (otype, ofood). The user's goal drives user behaviour (the user act), the user act generates the observation at time t via recognition/understanding errors, the history nodes provide memory, and the structure is repeated in the next time slice t+1.]

J. Williams (2007). ”POMDPs for Spoken Dialog Systems." Computer Speech and Language 21(2)



Dialogue Model Parameters

(ignoring history nodes for simplicity)

[Slide diagram: two time slices (t and t+1) of the network, showing goal nodes (gtype, gfood), user-act nodes (utype, ufood) and observation nodes (otype, ofood); the dialogue model parameters are the conditional probability tables that link these nodes within and across time slices.]
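A hedged reconstruction of the factorisation this network implies, written for a single concept in standard two-slice DBN notation (the exact conditioning used in BUDS may differ):

\[
P(g_{t+1}, u_{t+1}, o_{t+1} \mid g_t, a_t)
 = P(g_{t+1} \mid g_t, a_t)\; P(u_{t+1} \mid g_{t+1}, a_t)\; P(o_{t+1} \mid u_{t+1}),
\]

with one such set of factors per concept (type, food). The dialogue model parameters \(\lambda\) are the entries of these conditional distributions.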



Belief Monitoring (Tracking)

[Slide diagram: two time slices of the network with bar charts over gfood (French, Chinese, none) and gtype (bar, restaurant).]

t=1: user input inform(food=french) {0.9}; the system responds confirm(food=french)

t=2: user input affirm() {0.9}, which concentrates the belief further on food=french


Belief Monitoring (Tracking)

[Slide diagram: as before, two time slices with bar charts over gfood and gtype.]

t=1: user input inform(type=bar, food=french) {0.6} and inform(type=restaurant, food=french) {0.3}; the system responds confirm(type=restaurant, food=french)

t=2: user input affirm() {0.9}



Belief Monitoring (Tracking)

[Slide diagram: as before, two time slices with bar charts over gfood and gtype.]

t=1: user input inform(type=bar) {0.4}; the system responds select(type=bar, type=restaurant)

t=2: user input inform(type=bar) {0.4}


Choosing the next action – the Policy

[Slide diagram: from belief to action. The full belief over gtype and gfood (bar charts over French/Chinese/none and bar/restaurant) is quantized into a sparse summary belief vector over type and food. A policy vector maps each summary state to a distribution over all possible summary actions (inform, select, confirm, etc.); a summary action is sampled, here a = select, and then mapped back into the full action space, e.g. select(type=bar, type=restaurant) in response to the input inform(type=bar) {0.4}.]
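A minimal Python sketch of the quantize / sample / map idea for a single slot; the grid features, probabilities and mapping below are invented placeholders, not the actual HIS/BUDS policy representation.

```python
import random

def quantize(belief):
    """Map a full belief over one slot into a coarse summary feature."""
    ranked = sorted(belief.items(), key=lambda kv: -kv[1])
    (top_val, top_p), (_, second_p) = ranked[0], ranked[1]
    if top_p > 0.8:
        return "confident"
    if top_p - second_p < 0.1:
        return "ambiguous"
    return "uncertain"

# A toy stochastic policy pi(a | summary state); in the real system these
# probabilities are what the learned policy parameters theta encode.
POLICY = {
    "confident": {"inform": 0.9, "confirm": 0.1, "select": 0.0, "request": 0.0},
    "uncertain": {"confirm": 0.7, "request": 0.2, "inform": 0.1, "select": 0.0},
    "ambiguous": {"select": 0.8, "confirm": 0.2, "inform": 0.0, "request": 0.0},
}

def act(belief):
    summary = quantize(belief)                                        # quantize into summary space
    probs = POLICY[summary]
    a = random.choices(list(probs), weights=list(probs.values()))[0]  # sample a summary action
    ranked = sorted(belief.items(), key=lambda kv: -kv[1])
    if a == "select":                                                 # map back into the full space
        return f"select(type={ranked[0][0]}, type={ranked[1][0]})"
    return f"{a}(type={ranked[0][0]})"

print(act({"bar": 0.4, "restaurant": 0.35, "none": 0.25}))  # most often a select(...)
```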



Policy Optimization

Policy parameters θ are chosen to maximize the expected reward, and natural gradient ascent works well: the ordinary gradient is pre-multiplied by the inverse of the Fisher information matrix.

The gradient is estimated by sampling dialogues, and in practice the Fisher information matrix does not need to be explicitly computed.

This is the Natural Actor-Critic algorithm.
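In conventional notation (reconstructed, not copied from the slide), the objective and the natural-gradient update are:

\[
J(\theta) = \mathbb{E}\!\left[\sum_t r(b_t, a_t) \,\middle|\, \pi_\theta \right],
\qquad
\theta \leftarrow \theta + \alpha \, F(\theta)^{-1} \nabla_\theta J(\theta),
\]

where \(F(\theta)\) is the Fisher information matrix of the policy and \(\alpha\) is a step size.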

J. Peters and S. Schaal (2008). "Natural Actor-Critic." Neurocomputing 71(7-9)



Dialogue Model Parameter Optimization

Approximating the belief distribution via feature vectors prevents differentiating the policy with respect to the dialogue model parameters λ.

However, a trick can be used. Assume that λ is drawn from a prior p(λ | α) that is differentiable with respect to the prior parameters α. Then optimize the reward with respect to α and sample the prior to get λ.

This is the Natural Belief Critic Algorithm.
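A hedged sketch of the trick in formulas, with \(\alpha\) denoting the prior parameters (notation assumed, not taken from the slide):

\[
\lambda \sim p(\lambda \mid \alpha), \qquad
J(\alpha) = \mathbb{E}_{\lambda \sim p(\cdot \mid \alpha)}\big[\mathbb{E}[R \mid \lambda]\big], \qquad
\nabla_\alpha J(\alpha) = \mathbb{E}\big[R \, \nabla_\alpha \log p(\lambda \mid \alpha)\big],
\]

so the same natural-gradient machinery used for the policy can be applied to \(\alpha\).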

It is also possible to do maximum likelihood model parameter estimation using Expectation Propagation.

F. Jurcicek (2010). "Natural Belief-Critic" Interspeech 2010

B. Thomson (2010). "Parameter learning for POMDP spoken dialogue models. SLT 2010



Performance Comparison in Simulated TownInfo Domain

[Slide chart: mean reward in the simulated TownInfo domain for four configurations: handcrafted model and handcrafted policy, trained model and trained policy, handcrafted model and trained policy, handcrafted policy and trained model.]

Reward = 100 for success, minus 1 for each turn taken



Scaling up to Real World Problems

  • compact representation of dialogue state, e.g. HIS, BUDS

  • mapping belief states into summary states via quantisation, feature vectors, etc.

  • mapping actions in summary space back into full space

Several of the key ideas have already been covered

But inference itself is also a problem …



CamInfo Ontology

  • Many concepts

  • Many values per concept

  • Multiple nodes per concept

The result is a complex dialogue state.


Belief Propagation Times

[Slide plot: belief-propagation time versus network branching factor for three variants: standard loopy belief propagation (LBP), LBP with grouping, and LBP with grouping plus a constant probability of change.]

B. Thomson (2010). "Bayesian update of dialogue state" Computer Speech and Language 24(4)



Architecture of the Cambridge Statistical SDS

Run-time mode

[Slide diagram: run-time pipeline. Input speech y passes through Speech Recognition, p(w|y), to give words; the Semantic Decoder, p(v|y), converts these into dialogue-act hypotheses; the Dialogue Manager (HIS or BUDS) chooses a system action a; the Message Generator, p(m|a), turns the action into words and the Speech Synthesiser, p(x|a), turns these into output speech. Corpus data is used to train the components.]


Architecture of the Cambridge Statistical SDS

Training mode

[Slide diagram: training pipeline. The speech components are bypassed: the Dialogue Manager (HIS or BUDS) exchanges dialogue acts directly with a User Simulator, whose output is corrupted by an Error Model, p(v|y), to mimic recognition and understanding errors. The simulator and error model are trained from corpus data.]
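A short sketch of training against the simulator and error model; all interfaces are hypothetical placeholders, and the real system uses the Natural Actor-Critic update rather than this generic policy update.

```python
def train(manager, simulator, error_model, episodes=10000):
    """Train the dialogue manager's policy against a simulated user."""
    rewards = []
    for _ in range(episodes):
        manager.reset()
        simulator.reset()
        total = 0.0
        action = manager.initial_action()
        while not manager.dialogue_finished():
            clean_acts = simulator.respond(action)        # error-free user dialogue acts
            noisy_acts = error_model.corrupt(clean_acts)  # simulated ASR/SLU confusions
            action = manager.step(noisy_acts)             # belief update + policy
            total += manager.last_reward()                # e.g. -1 per turn, +100 on success
        manager.update_policy(total)                      # policy-gradient style update
        rewards.append(total)
    return rewards
```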



CMU Let’s Go Spoken Dialogue Challenge

  • Telephone-based spoken dialog system to provide bus schedule information for the City of Pittsburgh, PA (USA).

  • Based on existing system with real users.

  • Two stage evaluation process

    • Control Test with recruited subjects given specific known tasks

    • Live Test with competing implementations switched according to a daily schedule

  • Full results to be presented at a special session at SLT

Organised by the Dialog Research Center, CMU

See http://www.dialrc.org/sdc/



Let’s Go 2010 Control Test Results

[Slide plot: predicted success rate versus word error rate (WER) for all qualifying systems.]

  • System X: 65% success, 42% WER

  • System Y: 75% success, 34% WER

  • System Z: 89% success, 33% WER

  • Average over all qualifying systems: 64.8% success, 42.4% WER

B. Thomson "Bayesian Update of State for the Let's Go Spoken Dialogue Challenge." SLT 2010.



CamInfo Demo



Conclusions

  • End-to-end statistical dialogue systems can be built and are competitive

  • Core is a POMDP-based dialogue manager which provides an explicit representation of uncertainty with the following benefits

    • robust to recognition errors

    • objective measure of goodness via reward function

    • ability to optimize performance against objectives

    • reduced development costs – no hand-tuning, no complex design processes, easily ported to new applications

    • natural dialogue – say anything, any time

  • Still much to do

    • faster learning, off-policy learning, long term adaptation, dynamic ontologies, multi-modal input/output

  • Perhaps talking to machines is within reach ….



Credits

EU FP7 Project: Computational Learning in Adaptive Systems for Spoken Conversation

Spoken Dialogue Management using Partially Observable Markov Decision Processes

Past and Present Members of the CUED Dialogue Systems Group

Milica Gasic, Filip Jurcicek, Simon Keizer, Fabrice Lefevre,

Francois Mairesse, Jorge Prombonas, Jost Schatzmann,

Matt Stuttle, Blaise Thomson, Karl Weilhammer, Jason Williams,

Hui Ye, Kai Yu

