Online supervised learning of non-understanding recovery policies

Dan Bohus

www.cs.cmu.edu/~dbohus

dbohus@cs.cmu.edu

Computer Science Department

Carnegie Mellon University

Pittsburgh, PA 15213

with thanks to:

Alex Rudnicky

Brian Langner

Antoine Raux

Alan Black

Maxine Eskenazi


understanding errors in spoken dialog

MIS-understanding: system constructs an incorrect semantic representation of the user’s turn

S: Where are you flying from?

U: Birmingham

[BERLIN PM]

S:

  • Did you say Berlin?

  • from Berlin … where to?

NON-understanding: system fails to construct a semantic representation of the user’s turn

S: Where are you flying from?

U: Urbana Champaign

[OKAY IN THAT SAME PAY]

S:

  • Sorry, I didn’t catch that …

  • Can you repeat that?

  • Can you rephrase that?

  • Where are you flying from?

  • Please tell me the name of the city you are leaving from …

  • Could you please go to a quieter place?

  • Sorry, I didn’t catch that … tell me the state first …


recovery strategies

  • large set of strategies (“strategy” = 1-step action)

  • tradeoffs not well understood

  • some strategies are more appropriate at certain times

    • OOV -> ask repeat is not a good idea

    • door slam -> ask repeat might work well

S:

  • Sorry, I didn’t catch that …

  • Can you repeat that?

  • Can you rephrase that?

  • Where are you flying from?

  • Please tell me the name of the city you are leaving from …

  • Could you please go to a quieter place?

  • Sorry, I didn’t catch that … tell me the state first …


recovery policy

  • “policy” = method for choosing between strategies

  • difficult to handcraft

    • especially over a large set of recovery strategies

  • common approaches

    • heuristic

    • “three strikes and you’re out” [Balentine]

      • 1st non-understanding: ask user to repeat

      • 2nd non-understanding: provide more help, including examples

      • 3rd non-understanding: transfer to an operator
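The “three strikes” heuristic above is simple enough to state directly in code. A minimal sketch (the function and action names are illustrative, not from the talk):

```python
def three_strikes_policy(nonu_count: int) -> str:
    """Handcrafted 'three strikes and you're out' policy [Balentine]:
    escalate after each consecutive non-understanding."""
    if nonu_count <= 1:
        return "ask_repeat"            # 1st: ask the user to repeat
    elif nonu_count == 2:
        return "give_help"             # 2nd: provide more help, including examples
    else:
        return "transfer_to_operator"  # 3rd: hand off to a human operator
```

The learned approach in this talk replaces exactly this kind of fixed escalation with per-situation success estimates.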


this talk …

… an online, supervised method for learning a non-understanding recovery policy from data


overview

  • introduction

  • approach

  • experimental setup

  • results

  • discussion


overview

  • introduction

  • approach

  • experimental setup

  • results

  • discussion


intuition …

… if we knew the probability of success for each strategy in the current situation, we could easily construct a policy

S: Where are you flying from?

U: [OKAY IN THAT SAME PAY] Urbana Champaign

S:

  • Sorry, I didn’t catch that … [32%]

  • Can you repeat that? [15%]

  • Can you rephrase that? [20%]

  • Where are you flying from? [30%]

  • Please tell me the name of the city you are leaving from … [45%]

  • Could you please go to a quieter place? [25%]

  • Sorry, I didn’t catch that … tell me the state first … [43%]


two step approach

step 1: learn to estimate probability of success for each strategy, in a given situation

step 2: use these estimates to choose between strategies (and hence build a policy)


learning predictors for strategy success

  • supervised learning: logistic regression

    • target: whether the strategy recovered successfully or not

      • “success” = next turn is correctly understood

      • labeled semi-automatically

    • features: describe current situation

      • extracted from different knowledge sources

        • recognition features

        • language understanding features

        • dialog-level features [state, history]


logistic regression

  • well-calibrated class-posterior probabilities

    • predictions reflect empirical probability of success

    • x% of cases where P(S|F)=x are indeed successful

  • sample efficient

    • one model per strategy, so data will be sparse

  • stepwise construction

    • automatic feature selection

  • provide confidence bounds

    • very useful for online learning
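The two properties that matter here — calibrated probabilities and confidence bounds — can be sketched with a plain Newton–Raphson logistic regression, using the inverse Hessian as the weight covariance and mapping a logit-scale interval through the sigmoid. This is a from-scratch illustration, not the deck’s actual implementation (which also does stepwise feature selection):

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Fit logistic regression by Newton's method; return the weights and
    the estimated weight covariance (inverse Hessian of the NLL)."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # predicted success probabilities
        W = p * (1.0 - p)                       # per-sample Bernoulli variance
        H = X.T @ (X * W[:, None])              # Hessian of the negative log-likelihood
        w += np.linalg.solve(H, X.T @ (y - p))  # Newton step
    return w, np.linalg.inv(H)

def success_bounds(x, w, cov, z=1.96):
    """Predicted probability of success with an approximate 95% interval:
    normal interval on the logit scale, then squashed through the sigmoid."""
    x = np.concatenate([[1.0], x])
    mu = x @ w
    se = np.sqrt(x @ cov @ x)
    lo, mid, hi = (1.0 / (1.0 + np.exp(-(mu + d * se))) for d in (-z, 0.0, z))
    return lo, mid, hi
```

Because the interval is built on the logit scale, the bounds always stay inside (0, 1), and they widen exactly where a strategy has little data — which is what the online learning step exploits.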


two step approach

step 1: learn to estimate probability of success for each strategy, in a given situation

step 2: use these estimates to choose between strategies (and hence build a policy)


policy learning

  • choose strategy most likely to succeed

[chart: estimated probability of success (0–1) for strategies S1–S4]

  • BUT:

    • we want to learn online

    • we have to deal with the exploration / exploitation tradeoff


highest-upper-bound learning

  • choose strategy with the highest upper bound

    • proposed by [Kaelbling 93]

    • empirically shown to do well in various problems

  • intuition

[charts: estimated success (0–1) with confidence bounds for strategies S1–S4 — a wide bound on an uncertain strategy drives exploration; a tight bound on the best strategy drives exploitation]
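Once each strategy carries an upper confidence bound on its success probability, the selection rule itself is a one-liner. A toy sketch (the strategy names and numbers are invented):

```python
def choose_strategy(upper_bounds: dict) -> str:
    """Highest-upper-bound selection [Kaelbling 93]: an uncertain strategy
    (wide bound) can beat a well-estimated mediocre one, so it gets tried
    (exploration); as its bound tightens, the genuinely best strategy
    keeps winning (exploitation)."""
    return max(upper_bounds, key=upper_bounds.get)
```

No explicit exploration schedule is needed: the confidence bounds shrink as data accumulates, and the rule smoothly shifts from exploring to exploiting.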



overview

  • introduction

  • approach

  • experimental setup

  • results

  • discussion


system

  • Let’s Go! Public bus information system

  • connected to PAT customer service line during non-business hours

  • ~30-50 calls / night



constraints

  • don’t AREP more than twice in a row

  • don’t ARPH if #words <= 3

  • don’t ASA unless #words > 5

  • don’t ASO unless (4 nonu in a row) and (ratio.nonu > 50%)

  • don’t GUP unless (dialog > 30 turns) and (ratio.nonu > 80%)

  • capture expert knowledge; ensure the system doesn’t use an unreasonable policy

  • 4.2/11 strategies available on average

    • min=1, max=9
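The constraints above translate directly into a filter over the strategy set. A sketch — the state field names are invented for illustration, while the thresholds are the ones on the slide:

```python
# The 11 recovery strategies mentioned in the talk.
ALL_STRATEGIES = {"AREP", "ARPH", "ASA", "ASO", "GUP",
                  "RP", "HLP", "HLP_R", "MOVE", "IT", "SLL"}

def available_strategies(state: dict) -> set:
    """Apply the handcrafted constraints to get the strategies the
    system is allowed to try for the current non-understanding."""
    ok = set(ALL_STRATEGIES)
    if state["consecutive_arep"] >= 2:   # don't AREP more than twice in a row
        ok.discard("AREP")
    if state["n_words"] <= 3:            # don't ARPH if #words <= 3
        ok.discard("ARPH")
    if state["n_words"] <= 5:            # don't ASA unless #words > 5
        ok.discard("ASA")
    if not (state["consecutive_nonu"] >= 4 and state["ratio_nonu"] > 0.5):
        ok.discard("ASO")                # don't ASO unless 4 nonu in a row and ratio > 50%
    if not (state["n_turns"] > 30 and state["ratio_nonu"] > 0.8):
        ok.discard("GUP")                # don't GUP unless dialog > 30 turns and ratio > 80%
    return ok
```

The learned policy then only chooses among this filtered set, which is how expert knowledge and learning coexist.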


features

  • current non-understanding

    • recognition, lexical, grammar, timing info

  • current non-understanding segment

    • length, which strategies already taken

  • current dialog state and history

    • encoded dialog states

    • “how good things have been going”


learning

  • baseline period [2 weeks, 3/11 -> 3/25, 2006]

    • system randomly chose a strategy, while obeying the constraints

    • in effect, a heuristic / stochastic policy

  • learning period [5 weeks, 3/26 -> 5/5, 2006]

    • each morning, labeled the data from the previous night

    • retrained the likelihood-of-success predictors

    • installed them in the system for the next night
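The daily loop — label last night’s episodes, refit, redeploy — can be mimicked with a minimal stand-in where each per-strategy “predictor” is just an empirical success rate (the real predictors are feature-based logistic regressions; all names here are illustrative):

```python
from collections import defaultdict

class OnlineRecoveryLearner:
    """Minimal stand-in for the nightly retraining loop: accumulate
    labeled (strategy, success) outcomes and refit every strategy's
    success estimate each 'morning'."""

    def __init__(self):
        self.outcomes = defaultdict(list)   # strategy -> list of 0/1 successes
        self.estimates = {}                 # strategy -> current success estimate

    def record_night(self, labeled_episodes):
        """Add one night's semi-automatically labeled recovery episodes."""
        for strategy, success in labeled_episodes:
            self.outcomes[strategy].append(success)

    def retrain(self):
        """Refit every predictor on all data gathered so far and return
        the estimates to install for the next night."""
        for strategy, ys in self.outcomes.items():
            self.estimates[strategy] = sum(ys) / len(ys)
        return self.estimates
```

Because predictors are refit on the full history each morning, no example is ever discarded, and estimates for rarely used strategies still improve over time.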



overview

  • introduction

  • approach

  • experimental setup

  • results

  • discussion


results

  • average non-understanding recovery rate (ANNR)

  • improvement: 33.6% → 37.8% (p=0.03) (12.5% relative)

  • fitted learning curve parameters: A = 0.3385, B = 0.0470, C = 0.5566, D = -11.44


policy evolution

[chart legend: AREP, ARPH, ASA, HLP, HLP_R, IT, MOVE, RP, SLL]

  • MOVE, HLP, ASA engaged more often

  • AREP, ARPH engaged less often


overview

  • introduction

  • approach

  • experimental setup

  • results

  • discussion


are the predictors learning anything?

  • AREP(653), IT(273), SLL(300)

    • no informative features

  • ARPH(674), MOVE(1514)

    • 1 informative feature (#prev.nonu, #words)

  • ASA(637), RP(2532), HLP(3698), HLP_R(989)

    • 4 or more informative features in the model

      • dialog state (especially explicit confirm states)

      • dialog history


more features, more (specific) strategies

  • more features would be useful

    • day-of-week

    • clustered dialog states

    • (any ideas?)

  • more strategies / variants

    • approach might be able to filter out bad versions

    • more specific strategies, features

      • ask short answers worked well …

      • speak less loud didn’t … (why?)


“noise” in the experiment

  • ~15-20% of responses following non-understandings are non-user-responses

    • transient noises

    • secondary speech

    • primary speech not directed to the system

  • this might affect training; in a future experiment we want to eliminate these responses


unsupervised learning

  • supervised version

    • “success” = next turn is correctly understood [i.e. no misunderstanding, no non-understanding]

  • unsupervised version

    • “success” = next turn is not a non-understanding

    • “success” = confidence score of next turn

    • training labels automatically available

    • performance improvements might still be possible
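The difference between the two label definitions is small in code but large in cost: the supervised label needs a (semi-)manual misunderstanding annotation, while the unsupervised one comes straight from the system logs. A sketch with invented field names:

```python
def supervised_success(next_turn: dict) -> bool:
    """Supervised label: the next turn is correctly understood --
    neither a non-understanding nor a misunderstanding. The
    misunderstanding flag is what requires manual labeling."""
    return (not next_turn["non_understanding"]
            and not next_turn["misunderstanding"])

def unsupervised_success(next_turn: dict) -> bool:
    """Unsupervised variant: only require that the next turn is not a
    non-understanding; computable automatically from the logs. (A softer
    variant would use the turn's confidence score as the label.)"""
    return not next_turn["non_understanding"]
```

The unsupervised labels are noisier — a confidently misrecognized turn counts as a “success” — but they are free, so the predictors could keep improving for as long as the system is deployed.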