Belief Updating in Spoken Dialog Systems

Dan Bohus

www.cs.cmu.edu/~dbohus

Computer Science Department

Carnegie Mellon University

Pittsburgh, PA, 15217

problem
spoken language interfaces lack robustness when faced with understanding errors
  • stems mostly from speech recognition
  • spans most domains and interaction types

more concretely …

S: What city are you leaving from?

U: Urbana Champaign [OKAY IN THAT SAME PAY]

S: Sorry I’m not sure I understood what you said. What city are you leaving from?

U: Urbana Champaign [FOR MINUTE SINCE HEY]

S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?

U: Chicago [CHICAGO]

S: traveling from Chicago. Where would you like to go?

U: Huntsville [SEOUL]

S: traveling to Seoul. What day did you need to travel?

U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M]

S: traveling in the afternoon. Okay what day would you be departing Chicago?

U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]

S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?

U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]

S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ………

non- and misunderstandings
  • NON-understanding: the system fails to obtain any interpretation of the user turn, as in the two "Urbana Champaign" turns above
  • MIS-understanding: the system obtains an incorrect interpretation, as in "Huntsville" recognized as [SEOUL]

approaches for increasing robustness
  • fix recognition
  • gracefully handle errors through interaction
    • detect the problems
    • develop a set of recovery strategies
    • know how to choose between them (policy)
six not-so-easy pieces …
  • two error types: misunderstandings, non-understandings
  • three pieces for each: detection, strategies, policy
belief updating
  • piece addressed: detection of misunderstandings
  • construct more accurate beliefs by integrating information over multiple turns

S: Where would you like to go?

U: Huntsville [SEOUL / 0.65]

destination = {seoul/0.65}

S: traveling to Seoul. What day did you need to travel?

U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]

destination = {?}

belief updating: problem statement
  • given:
    • an initial belief P_initial(C) over concept C
    • a system action SA
    • a user response R
  • construct an updated belief:
    • P_updated(C) ← f(P_initial(C), SA, R)
  • in the example:
    • P_initial(C): destination = {seoul/0.65}
    • SA: S: traveling to Seoul. What day did you need to travel?
    • R: [THE TRAVELING TO BERLIN P_M / 0.60]
    • P_updated(C): destination = {?}
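
The problem statement can be read as a function signature. The sketch below is only an illustration of that reading: the names `Belief`, `UserResponse`, `BeliefUpdater` and `keep_initial` are hypothetical and do not come from the talk, and the updater shown is a do-nothing placeholder rather than the learned model.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A belief maps each candidate value of a concept to a confidence in [0, 1].
Belief = Dict[str, float]


@dataclass
class UserResponse:
    """Hypothetical container for what the recognizer returned for one turn."""
    hypothesis: str    # decoded text, e.g. "THE TRAVELING TO BERLIN P_M"
    confidence: float  # recognition confidence, e.g. 0.60


# f(P_initial, SA, R) -> P_updated
BeliefUpdater = Callable[[Belief, str, UserResponse], Belief]


def keep_initial(initial: Belief, system_action: str, response: UserResponse) -> Belief:
    """Placeholder updater: ignores the response and returns a copy of the belief."""
    return dict(initial)


# The example from the slide, with an assumed label for the system action.
p_initial = {"seoul": 0.65}
r = UserResponse("THE TRAVELING TO BERLIN P_M", 0.60)
p_updated = keep_initial(p_initial, "implicit_confirm", r)
```

Any real updater would replace `keep_initial` while keeping the same three inputs and one output.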

outline
  • related work
  • a restricted version
  • data
  • user response analysis
  • experiments and results
  • some caveats and future work

related work : restricted version : data : user response analysis : experiment & results : caveats & future work

confidence annotation + heuristic updates
  • confidence annotation
    • traditionally focused on word-level errors [Chase, Cox, Bansal, Ravishankar]
    • more recently: semantic confidence annotation [Walker, San-Segundo, Bohus]
      • machine learning approach
      • results fairly good, but not perfect
  • heuristic updates
    • explicit confirmation: no → don’t trust ; yes → trust
    • implicit confirmation: no → don’t trust ; otherwise → trust
    • suboptimal for several reasons
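
The heuristic rules above can be sketched in a few lines. This is a minimal sketch under stated assumptions: "trust" is mapped to confidence 1.0 and "don't trust" to 0.0, the action labels are hypothetical strings, and the slide does not say what happens after an explicit confirmation when the response is neither yes nor no, so the sketch leaves the confidence unchanged in that case.

```python
def heuristic_update(conf_initial: float, system_action: str, response_type: str) -> float:
    """Rule-based update of the top-hypothesis confidence.

    system_action: "explicit_confirm" or "implicit_confirm" (hypothetical labels)
    response_type: "YES", "NO" or "OTHER"
    """
    if system_action not in ("explicit_confirm", "implicit_confirm"):
        raise ValueError(f"unsupported system action: {system_action}")
    if response_type == "NO":
        return 0.0  # no -> don't trust (both confirmation types)
    if system_action == "explicit_confirm":
        # yes -> trust; anything else is unspecified on the slide (assumption: unchanged)
        return 1.0 if response_type == "YES" else conf_initial
    return 1.0  # implicit confirmation: anything but "no" -> trust
```

The rule collapses every outcome to full trust or full distrust, which is one way to see why it produces poorly calibrated beliefs.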

correction detection
  • detect if the user is trying to correct the system [Litman, Swerts, Hirschberg, Krahmer, Levow]
  • machine learning approach
    • features from different knowledge sources in the system
    • results fairly good, but not perfect

integration
  • confidence annotation and correction detection are useful tools
  • but separately, neither solves the problem
  • bridge the two in a unified approach that accurately tracks beliefs

belief updating: general form
  • given:
    • an initial belief P_initial(C) over concept C
    • a system action SA
    • a user response R
  • construct an updated belief:
    • P_updated(C) ← f(P_initial(C), SA, R)

restricted version: 2 simplifications
  • compact belief
    • system unlikely to “hear” more than 3 or 4 values
      • single vs. multiple recognition results
    • in our data: max = 3 values, only 6.9% have >1 value
    • confidence score of top hypothesis
  • updates after confirmation actions
  • reduced problem
    • ConfTop_updated(C) ← f(ConfTop_initial(C), SA, R)
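
The compact belief keeps only the top hypothesis and its confidence score. A minimal sketch of that reduction follows; the function name `conf_top` and the dictionary representation of a belief are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def conf_top(belief: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Reduce a belief over several values to (top value, its confidence)."""
    if not belief:
        return None, 0.0
    value = max(belief, key=belief.get)  # value with the highest confidence
    return value, belief[value]


# Hypothetical belief with two competing values for the destination concept.
top_value, top_conf = conf_top({"seoul": 0.65, "berlin": 0.20})
```

Since only 6.9% of the cases in the data had more than one value, little information is lost by this reduction.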

data
  • collected with RoomLine
    • a phone-based mixed-initiative spoken dialog system
    • conference room reservation
      • search and negotiation
  • explicit and implicit confirmations
    • confidence threshold model (+ some exploration)
  • unplanned implicit confirmations
    • e.g. “I found 10 rooms for Friday between 1 and 3 p.m. Would you like a small room or a large one?”

corpus
  • user study
    • 46 participants (naïve users)
    • 10 scenario-based interactions each
    • compensated per task success
  • corpus
    • 449 sessions, 8848 user turns
    • orthographically transcribed
    • rich annotation: correct concepts, corrections, etc.

user response types
  • following Krahmer and Swerts
    • study on Dutch train-table information system
  • 3 user response types
    • YES: yes, right, that’s right, correct, etc.
    • NO: no, wrong, etc.
    • OTHER
  • cross-tabulated against correctness of confirmations
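
The cross-tabulation can be sketched as follows. This is a toy illustration, not the annotation procedure used in the study: the keyword sets are taken from the examples on the slide, the rule that NO wins when both kinds of keyword appear is an assumption, and the sample turns are invented.

```python
import re
from collections import Counter

YES_WORDS = {"yes", "right", "correct"}  # "that's right" matches through "right"
NO_WORDS = {"no", "wrong"}


def response_type(transcript: str) -> str:
    """Classify a user turn as YES, NO or OTHER by keyword matching."""
    tokens = set(re.findall(r"[a-z]+", transcript.lower()))
    if tokens & NO_WORDS:  # assumption: NO takes priority over YES
        return "NO"
    if tokens & YES_WORDS:
        return "YES"
    return "OTHER"


def cross_tab(turns):
    """Count (confirmed value was correct, response type) pairs."""
    return Counter((correct, response_type(text)) for text, correct in turns)


# Invented sample: (user transcript, was the confirmed value correct?)
turns = [
    ("yes", True),
    ("that's right", True),
    ("no", False),
    ("no Birmingham", False),
    ("Birmingham", False),
    ("yes", False),
]
table = cross_tab(turns)
```

Each cell of `table` corresponds to one cell of the correct/incorrect by YES/NO/OTHER tables on the following slides.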

user responses to explicit confirmations
  • from transcripts [table not preserved; numbers in brackets from Krahmer & Swerts]
  • from decoded [table not preserved]
  • slide callout: ~10% [referent not preserved]

other responses to explicit confirmations
  • ~70% users repeat the correct value
  • ~15% users don’t address the question
    • attempt to shift conversation focus

user responses to implicit confirmations
  • from transcripts [table not preserved; numbers in brackets from Krahmer & Swerts]
  • from decoded [table not preserved]

ignoring errors in implicit confirmations
  • users correct later (40% of 118)
  • users interact strategically
    • correct only if essential

machine learning approach
  • need good probability outputs
  • low cross-entropy between model predictions and reality
    • cross-entropy = negative average log posterior
  • logistic regression
    • sample efficient
    • stepwise approach → feature selection
  • logistic model tree for each action
    • root splits on response-type
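
To make the cross-entropy criterion concrete, here is a small pure-Python sketch that fits a logistic regression by batch gradient descent and scores it with the negative average log posterior. It is an assumption-laden toy: the two features (initial confidence, whether the user said "no"), the data and the hyperparameters are invented, and neither the stepwise feature selection nor the logistic model tree is implemented.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(probs, labels) -> float:
    """Negative average log posterior assigned to the true label."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        total -= math.log(max(p_true, eps))
    return total / len(labels)


def predict(w, b, x) -> float:
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent on the cross-entropy loss."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented data: [initial confidence, user said "no"] -> was the value correct?
xs = [[0.9, 0], [0.8, 0], [0.7, 0], [0.5, 0], [0.6, 1], [0.4, 1], [0.3, 1], [0.7, 1]]
ys = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = fit_logistic(xs, ys)
ce_model = cross_entropy([predict(w, b, x) for x in xs], ys)
ce_uninformed = cross_entropy([0.5] * len(ys), ys)  # always predicting 0.5
```

A model that always outputs 0.5 scores ln 2 ≈ 0.693; lower values mean the predicted probabilities sit closer to what actually happened.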

features. target.
  • initial situation
    • initial confidence score
    • concept identity, dialog state, turn number
  • system action
    • other actions performed in parallel
  • features of the user response
    • acoustic / prosodic features
    • lexical features
    • grammatical features
    • dialog-level features
  • target: was the value correct?

baselines
  • initial baseline
    • accuracy of system beliefs before the update
  • heuristic baseline
    • accuracy of heuristic rule currently used in the system
  • oracle baseline
    • accuracy if we knew exactly when the user is correcting the system
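
The three baselines can be compared on a toy set of explicit confirmations. Everything specific in this sketch is an assumption: the records are invented, a belief is counted as "value is correct" when its confidence is at least 0.5, the heuristic leaves the confidence unchanged for responses that are neither yes nor no, and the oracle trusts the value exactly when the user is not correcting.

```python
def hard_error(beliefs, labels, threshold=0.5) -> float:
    """Fraction of cases where the thresholded belief disagrees with the truth."""
    wrong = sum((b >= threshold) != bool(y) for b, y in zip(beliefs, labels))
    return wrong / len(labels)


def heuristic(conf: float, decoded_type: str) -> float:
    if decoded_type == "NO":
        return 0.0
    if decoded_type == "YES":
        return 1.0
    return conf  # assumption: unchanged for other responses


# Invented records:
# (initial confidence, decoded response type, user is correcting, value was correct)
records = [
    (0.65, "NO", True, False),
    (0.80, "YES", False, True),
    (0.40, "YES", False, True),
    (0.70, "OTHER", True, False),
    (0.90, "YES", True, False),  # the user's correction was misrecognized as "yes"
    (0.55, "OTHER", False, True),
]
labels = [correct for _, _, _, correct in records]

initial_err = hard_error([conf for conf, _, _, _ in records], labels)
heuristic_err = hard_error([heuristic(conf, t) for conf, t, _, _ in records], labels)
oracle_err = hard_error([0.0 if corr else 1.0 for _, _, corr, _ in records], labels)
```

On real data the oracle would not necessarily reach zero error, since users do not always correct the system when it is wrong.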

results: explicit confirmation
  • [chart not preserved: hard error (%) and soft error]

results: implicit confirmation
  • [chart not preserved: hard error (%) and soft error]

results: unplanned implicit confirmation
  • [chart not preserved: hard error (%) and soft error]

informative features
  • initial confidence score
  • prosody features
  • barge-in
  • expectation match
  • repeated grammar slots
  • concept id

eliminate simplification 1
  • current restricted version
    • belief = confidence score of top hypothesis
    • only 6.9% of cases had more than 1 hypothesis
  • extend to
    • N hypotheses + 1 (other), where N is a small integer (2 or 3)
    • approach: multinomial generalized linear model
    • use information from multiple recognition hypotheses
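
One way to picture the extended belief is a softmax over the N hypotheses plus an "other" class. The sketch below uses a single weight vector shared across hypotheses, which is just one possible parameterization of a multinomial model and is not taken from the talk; the feature choice, the weights and the `<other>` label are all hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multinomial_belief(hypotheses, features, weights, other_score=0.0):
    """Distribution over the given hypotheses plus an "<other>" class.

    features[i] is the feature vector of hypotheses[i]; the same weights
    score every hypothesis, and "<other>" gets a fixed score.
    """
    scores = [sum(w * x for w, x in zip(weights, f)) for f in features]
    scores.append(other_score)
    return dict(zip(list(hypotheses) + ["<other>"], softmax(scores)))


# Hypothetical features: [confidence in initial belief, confidence in new response]
belief = multinomial_belief(
    hypotheses=["seoul", "berlin"],
    features=[[0.65, 0.0], [0.0, 0.60]],
    weights=[2.0, 3.0],  # invented weights
)
```

The "other" class reserves probability mass for the case where none of the recognized values is what the user meant, as with Birmingham in the running example.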

eliminate simplification 2
  • current restricted version
    • only updates following system confirmation actions
  • users might correct the system at any point
  • extend to
    • updates after all system actions

shameless self promotion
  • pieces: {misunderstandings, non-understandings} × {detection, strategies, policy}
  • rejection threshold adaptation, non-understanding impact on performance [Interspeech-05]
  • comparative analysis of 10 recovery strategies [SIGdial-05]
  • wizard experiment
  • towards learning non-understanding recovery policies [SIGdial-05]
shameless CMU promotion
  • Ananlada (Moss) Chotimongkol
    • automatic concept and task structure acquisition
  • Antoine Raux
    • turn-taking, conversation micro-management
  • Jahanzeb Sherwani
    • multimodal personal information management
  • Satanjeev Banerjee
    • meeting understanding
  • Stefanie Tomko
    • universal speech interface
  • Thomas Harris
    • multi-participant dialog
  • DoD / Young Researchers’ Roundtable
a more subtle caveat
  • distribution of training data
    • confidence annotator + heuristic update rules
  • distribution of run-time data
    • confidence annotator + learned model
  • always a problem when interacting with the world
  • hopefully, distribution shift will not cause large degradation in performance
    • remains to be validated empirically
    • maybe a bootstrap approach?