
Increased Robustness in Spoken Dialog Systems


Presentation Transcript


  1. Increased Robustness in Spoken Dialog Systems (roadmap to a thesis proposal) Dan Bohus, SPHINX Lunch, May 2003

  2. The problem
  S: Are you a registered user ?
  U: No I'm not. No [NO I'M NOT NO]
  S: What is your full name ?
  U: Adam Schumacher [ATHENS IN AKRON]
  S: There is an Athens in Georgia and in Greece. Which destination did you want ?
  U: Neither [NEITHER]
  S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want ?
  U: Georgia [GEORGIA]
  S: A flight from Athens... Where do you want to go ?
  U: Start over [START OVER]
  S: Do you really want to start over ?
  U: Yes [YES]
  S: What city are you leaving from ?
  U: Hamilton, Ontario [HILTON ONTARIO]
  S: Sorry, I'm not sure I understood what you said. Where are you leaving from ?
  U: Hamilton [HILTON]
  S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from ?
  U: Toronto [TORONTO]

  3. Is this really a big problem? • And some statistics • CMU Communicator • 66% of the sessions contain a serious misunderstanding • 26% result in a complete breakdown in interaction • Remaining 40% are still frustrating experiences … • Lots of anecdotal evidence [pie chart: 66% contain misunderstandings; 26% failed]

  4. More statistics … • USC study [Shin et al (1)] • Labeled errors and user behavior on the Communicator (multi-site) corpus • Average 1.66 error segments/session • 78% of error segments get back on track • 37% of the sessions have errors leading to a complete breakdown in interaction [pie chart: 37% failed]

  5. Yet more statistics … • Utterance level understanding error rates • CMU Communicator 32.4% → 66% of sess. [Rudnicky, Bohus et al (2)] • CU Communicator 27.5% → … of sess. [Segundo (3)] • HMIHY (ATT) 36.5% → … of sess. [Walker (4)] • Jupiter (MIT) 28.5% → … of sess. [Hazen (5)]

  6. It is a significant problem! • Roughly… • 60-70% of sessions contain misunderstandings • 10-30% lead to interaction breakdowns

  7. Goal of proposed work [diagram: reduce the fraction of sessions containing misunderstandings and of interaction breakdowns]

  8. Outline • The problem • Sources of the problem • The approach • Infrastructure: the RavenClaw framework • Proposed work, in detail • Discussion

  9. The problems … in more detail [the slide 2 dialog again, annotated at numbered points with: low accuracy of speech recognition; recognition errors; system incorrectly initiates a disambiguation; system unable to handle user's response to disambiguation; implicit verification is confusing; only one recovery strategy available to the system: ask user to repeat]

  10. Three contributing factors 1. Low accuracy of speech recognition 2. Inability to assess reliability of beliefs 3. Lack of efficient error recovery and prevention mechanisms

  11. Factor 1: Low recognition accuracy • ASR still imperfect at best • Variability: environmental, speaker • 10-30% WER in spoken language systems • Tradeoff: Accuracy vs. System Flexibility • Effect: Main source of errors in SDS • WER → most important predictor of user satisfaction [Walker et al (6,7)] • Users prefer less flexible, more accurate systems [Walker et al (8)]
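The WER figures above are the standard word error rate: word-level edit distance (substitutions + deletions + insertions) over the number of reference words. A minimal sketch (my own illustration, not code from the talk):

```python
def word_error_rate(ref, hyp):
    """WER = (S + D + I) / N, computed via word-level Levenshtein distance
    between the reference transcript and the recognizer hypothesis."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between the first i ref words and j hyp words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i                      # all deletions
    for j in range(len(h) + 1):
        dp[0][j] = j                      # all insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,   # substitution / match
                           dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1)          # insertion
    return dp[len(r)][len(h)] / len(r)
```

On the slide 2 example, recognizing "Hamilton Ontario" as "Hilton Ontario" is one substitution in two words, i.e. a WER of 0.5 on that utterance.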

  12. Factor 2: Inability to assess reliability of beliefs • Errors typically propagate to the upper levels of the system, leading to: • Non-understandings • Misunderstandings • Effect: Misunderstandings are taken as facts and acted upon • At best: extra turns, user-initiated repairs, frustration • At worst: complete breakdown in interaction

  13. Factor 3: Lack of recovery mechanisms • Small number of strategies • Implicit and explicit verifications most popular • Sub-optimal implementations • Triggered in an ad-hoc / heuristic manner • Problem is often regarded as an add-on • Non-uniform, domain-specific treatment • Effect: Systems prone to complete breakdowns in interaction

  14. Outline • The problem • Sources of the problem • The approach • Infrastructure: the RavenClaw framework • Proposed work, in detail • Discussion

  15. Three contributing factors … 1. Low accuracy of speech recognition 2. Inability to assess reliability of beliefs 3. Lack of efficient error recovery and prevention mechanisms

  16. Approach 1 1. Low accuracy of speech recognition 2. Inability to assess reliability of beliefs 3. Lack of efficient error recovery and prevention mechanisms

  17. Approach 2 1. Low accuracy of speech recognition 2. Inability to assess reliability of beliefs 3. Lack of efficient error recovery and prevention mechanisms

  18. Why not just fix ASR? • ASR performance is improving, but requirements are increasing too • ASR will not become perfect anytime soon • ASR is not the only source of errors • Approach 2: ensure robustness under a large variety of conditions

  19. Proposed solution • Assuming the inputs are unreliable: A. Make systems able to assess the reliability of their beliefs B. Optimally deploy a set of error prevention and recovery strategies

  20. Proposed solution – more precisely • Assuming the inputs are unreliable: 1. Compute grounding state indicators - reliability of beliefs (confidence annotation + updating) - correction detection - goodness-of-dialog metrics - other: user models, etc… B. Optimally deploy a set of error prevention and recovery strategies

  21. Proposed solution – more precisely • Assuming the inputs are unreliable: 1. Compute grounding state indicators - reliability of beliefs (confidence annotation + updating) - correction detection - goodness-of-dialog metrics - other: user models, etc… 2. Define the grounding actions - error prevention and recovery strategies 3. Create a grounding decision model - decides upon the optimal strategy to employ at a given point • Do it in a domain-independent manner!
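The three-part decomposition above (indicators → decision model → action) can be sketched in a few lines. This is a hypothetical illustration with threshold-based rules standing in for the learned grounding decision model the proposal calls for; the indicator names and thresholds are my own, not RavenClaw's API:

```python
def grounding_decision(indicators, accept_above=0.9, reject_below=0.3):
    """Map grounding state indicators to a grounding action.

    `indicators` holds a belief confidence score and an optional flag from
    a correction detector; the thresholds are illustrative stand-ins for a
    learned decision model."""
    if indicators.get("correction_detected"):
        return "explicit_verification"   # user appears to be correcting us
    conf = indicators["confidence"]
    if conf >= accept_above:
        return "accept"                  # belief reliable enough to act on
    if conf <= reject_below:
        return "rejection"               # treat as a non-understanding
    return "implicit_verification"       # middle ground: verify in passing
```

The point of the proposal is precisely to replace such hand-set thresholds with a model that weighs costs and success rates of each strategy.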

  22. Outline • The problem • Sources of the problem • The approach • Infrastructure: the RavenClaw framework • Proposed work, in detail • Discussion

  23. The RavenClaw DM framework • Dialog Management framework for complex, task-oriented dialog systems • Separation between Dialog Task and Generic Conversational Skills • Developer focuses only on Dialog Task description • Dialog Engine automatically ensures a minimum set of conversational skills • Dialog Engine automatically ensures the grounding behaviors

  24. RavenClaw architecture • Dialog Task implemented by a hierarchy of agents • Information captured in concepts: • Probability distributions over sets of values • Support for belief assessment & grounding mechanisms [diagram: agent hierarchy with Communicator over Welcome, Login, Travel, Locals, Bye; Login over AskRegistered, GreetUser, GetProfile (AskName); Travel over Leg1 with DepartLocation and ArriveLocation; concepts include Registered, UserName, Profile, Departure, Arrival]
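A concept holding "a probability distribution over a set of values", as described above, can be sketched as follows. This is an illustrative toy, not RavenClaw's actual implementation or update rule; here each new recognition hypothesis discounts the old ones by its confidence, and any leftover mass implicitly means "no reliable value yet":

```python
class Concept:
    """Toy concept: a set of candidate values with confidence mass."""

    def __init__(self):
        self.hyps = {}                       # value -> probability mass

    def update(self, value, conf):
        """Fold in a new hypothesis heard with confidence `conf` in [0, 1].

        Old hypotheses are discounted by (1 - conf); mass not assigned to
        any value represents 'unknown'. (Illustrative rule only.)"""
        for v in self.hyps:
            self.hyps[v] *= (1.0 - conf)
        self.hyps[value] = self.hyps.get(value, 0.0) + conf

    def top(self):
        """Most confident (value, probability) pair."""
        return max(self.hyps.items(), key=lambda kv: kv[1])
```

For instance, after hearing "Athens" at confidence 0.4 and then "Hamilton" at 0.8 for a departure-city concept, the top hypothesis becomes Hamilton while Athens retains a small residual mass.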

  25. Domain-Independent Grounding [diagram: the Dialog Task (RoomLine agents: Login, GetQuery, ExecuteQuery, DiscussResults, Bye) sits above a Grounding Level, where Grounding State Indicators feed a Grounding Decision Model that selects the optimal action from the available Strategies/Grounding Actions]

  26. RavenClaw-based systems • LARRI [Symphony] – Language-based Assistant for Retrieval of Repair Information • IPA [NASA Ames] – Intelligent Procedure Assistant • BusLine [Let’s Go!] – Pittsburgh bus route information • RoomLine – conference room reservation at CMU • TeamTalk [11-754] – spoken command and control for a team of robots

  27. Outline • The problem • Sources of the problem • The approach • Infrastructure: the RavenClaw framework • Proposed work, in detail • Discussion

  28. Previous/Proposed Work Overview 1. Compute grounding state indicators - reliability of beliefs (confidence annotation + updating) - correction detection - goodness-of-dialog metrics - other: user models, etc… 2. Define the grounding actions - error prevention and recovery strategies 3. Create a grounding decision model - decides upon the optimal strategy to employ at a given point

  29. Proposed Work, in Detail - Outline 1. Compute grounding state indicators - reliability of beliefs (confidence annotation + updating) - correction detection - goodness-of-dialog metrics - other: user models, etc… ✓ 2. Define the grounding actions - error prevention and recovery strategies 3. Create a grounding decision model - decides upon the optimal strategy to employ at a given point

  30. Reliability of beliefs • Continuously assess reliability of beliefs • Two sub-problems: • Computing the initial confidence in a concept • Confidence annotation problem • Update confidence based on events in the dialog • User reaction to implicit or explicit verifications • Domain reasoning

  31. Confidence annotation • Traditionally focused on ASR [Chase(9), …] • More recently, interest in CA geared towards use in SDS [Walker(4), Segundo(3), Hazen(5), Rudnicky, Bohus et al (2)] • Utterance-level, Concept-level CA • Integrating multiple features • ASR: acoustic & LM scores, lattice, n-best • Parser: various measures of parse goodness • Dialog Management: state, expectations, history, etc • 50% relative improvement in classification error
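A confidence annotator of the kind described above integrates heterogeneous ASR, parser, and dialog-manager features into a single score. A minimal logistic-regression-style sketch, with made-up feature names and hand-set weights standing in for parameters that would be learned from labeled data:

```python
import math

# Hypothetical features and weights; a real annotator learns these from
# labeled utterances (correct / incorrect understanding).
WEIGHTS = {
    "acoustic_score": 2.0,    # normalized ASR acoustic score
    "parse_coverage": 1.5,    # fraction of words covered by the parse
    "in_expectation": 1.0,    # 1 if the concept was expected by the DM
}
BIAS = -2.0

def annotate_confidence(features):
    """Combine multiple knowledge-source features into a [0, 1] confidence
    via a logistic model (illustrative, not the trained annotator)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))
```

Missing features default to 0, so an utterance with no supporting evidence gets a low score, while strong acoustic, parse, and dialog-expectation evidence pushes the score toward 1.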

  32. Confidence annotation – To Do List • Improve accuracy even more • More features / Less features / Better features • Study transferability across domains • Q: Can we identify a set of features that transfer well? • Q: Can we use un- or semi-supervised learning or bootstrap from little data and an annotator in a different domain?

  33. Confidence updating • To my knowledge, not really* studied yet!

  34. Confidence updating – approaches • Naïve Bayesian updating • Assumptions do not match reality • Analytical model • Set of heuristic / probabilistic rules • Data-driven model • Define events as features • Learning task: • Initial Conf. + E1 + E2 + E3 … → Current Conf. {1/0} • Bypass confidence updating • Keep all events as grounding state indicators (doesn’t lose that much information)
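The naïve Bayesian option above can be made concrete for the simplest dialog event, a yes/no answer to an explicit verification ("Did you say X?"). The sketch below is my own illustration; the likelihoods (how often users say "yes" when the hypothesis is right vs. wrong) are assumed values, and the independence assumptions are exactly the ones the slide warns "do not match reality":

```python
def bayes_update(prior, answered_yes,
                 p_yes_if_correct=0.9, p_yes_if_wrong=0.1):
    """Posterior confidence that a concept value is correct, after the user
    answers an explicit verification question.

    prior            : confidence before the verification
    answered_yes     : True if the user confirmed, False if they denied
    p_yes_if_correct : assumed P(user says yes | hypothesis correct)
    p_yes_if_wrong   : assumed P(user says yes | hypothesis wrong)
    """
    if answered_yes:
        num = prior * p_yes_if_correct
        den = num + (1.0 - prior) * p_yes_if_wrong
    else:
        num = prior * (1.0 - p_yes_if_correct)
        den = num + (1.0 - prior) * (1.0 - p_yes_if_wrong)
    return num / den
```

For example, a belief held at 0.6 that the user confirms rises to about 0.93, while a denial drops it to about 0.14, under the assumed likelihoods.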

  35. Proposed Work, in Detail - Outline 1. Compute grounding state indicators - reliability of beliefs (confidence annotation + updating) - correction detection - goodness-of-dialog metrics - other: user models, etc… ✓ 2. Define the grounding actions - error prevention and recovery strategies 3. Create a grounding decision model - decides upon the optimal strategy to employ at a given point

  36. Proposed Work, in Detail - Outline 1. Compute grounding state indicators - reliability of beliefs (confidence annotation + updating) - correction detection - goodness-of-dialog metrics - other: user models, etc… ✓✓ 2. Define the grounding actions - error prevention and recovery strategies 3. Create a grounding decision model - decides upon the optimal strategy to employ at a given point

  37. Correction Detection • Automatically detect at run-time correction sites or aware sites • Another data-driven classification task • Prosodic features, bag-of-words features, lexical markers [Litman(10), Bosch(11), Swerts(12), Levow(13)] • Useful for: • implementation of implicit / explicit verifications • belief assessment / updating • as a direct indicator for grounding decisions
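The lexical-marker component of such a detector can be sketched in a few lines. This is a toy illustration under my own assumptions: the marker list is invented, and a real detector (as the slide notes) would be a trained classifier combining prosodic, bag-of-words, and lexical features:

```python
# Illustrative correction markers; not a list from the talk.
CORRECTION_TOKENS = {"no", "not", "wrong", "nope"}
CORRECTION_PHRASES = ("i said", "start over", "that's wrong")

def looks_like_correction(utterance):
    """Flag an utterance as a likely correction site based on simple
    lexical markers (token matches plus a few multi-word phrases)."""
    text = utterance.lower()
    if any(tok in text.split() for tok in CORRECTION_TOKENS):
        return True
    return any(phrase in text for phrase in CORRECTION_PHRASES)
```

On the slide 2 dialog, a turn like "No I said Hamilton" would be flagged, letting the system treat the previous hypothesis with suspicion instead of acting on it.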

  38. Correction Detection – To Do List • Build an aware site detector • Q: Can we identify what the user is correcting? • Study transferability across domains • Q: Can we identify a set of features that transfer well? • Q: Can we use un- or semi-supervised learning or bootstrap from little data and a detector in a different domain?

  39. Proposed Work, in Detail - Outline 1. Compute grounding state indicators - reliability of beliefs (confidence annotation + updating) - correction detection - goodness-of-dialog metrics - other: user models, etc… ✓✓ 2. Define the grounding actions - error prevention and recovery strategies 3. Create a grounding decision model - decides upon the optimal strategy to employ at a given point

  40. Proposed Work, in Detail - Outline 1. Compute grounding state indicators - reliability of beliefs (confidence annotation + updating) - correction detection - goodness-of-dialog metrics - other: user models, etc… ✓✓✓ 2. Define the grounding actions - error prevention and recovery strategies 3. Create a grounding decision model - decides upon the optimal strategy to employ at a given point

  41. Goodness-of-dialog indicators • Assessing how well a conversation is advancing • Non-understandings • Q: Can we identify the cause? • Q: Can we relate a non-understood utterance to a dialog expectation? • Dialog State related indicators / Stay_Here • Q: Can we expand this to some “distance to optimal dialog trace”? • Overall confidence in beliefs within topic • Q: How to aggregate? Entropy-based measures? • Allow for task-specific metrics of goodness
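The "entropy-based measures" option for aggregating confidence within a topic can be made concrete: if each concept's hypotheses form a probability distribution, Shannon entropy measures how unsettled the belief is. A minimal sketch (my own illustration of the idea, not a measure proposed in the talk):

```python
import math

def belief_entropy(distribution):
    """Shannon entropy (in bits) of a concept's value distribution.

    0 bits means the belief is fully settled on one value; higher entropy
    means the system is less sure, suggesting grounding is needed."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)
```

A concept certain of "Athens" has entropy 0, while one split evenly between "Athens" and "Hamilton" has entropy 1 bit; summing or averaging such entropies over a topic's concepts gives one candidate goodness-of-dialog indicator.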

  42. Proposed Work, in Detail - Outline 1. Compute grounding state indicators - reliability of beliefs (confidence annotation + updating) - correction detection - goodness-of-dialog metrics - other: user models, etc… ✓✓✓ 2. Define the grounding actions - error prevention and recovery strategies 3. Create a grounding decision model - decides upon the optimal strategy to employ at a given point

  43. Proposed Work, in Detail - Outline 1. Compute grounding state indicators - reliability of beliefs (confidence annotation + updating) - correction detection - goodness-of-dialog metrics - other: user models, etc… ✓✓✓✓ 2. Define the grounding actions - error prevention and recovery strategies 3. Create a grounding decision model - decides upon the optimal strategy to employ at a given point ✓

  44. Grounding Actions • Design and evaluate a rich set of strategies for preventing and recovering from errors (both misunderstandings and non-understandings) • Current status: few strategies used / analyzed • Explicit verification: “Did you say Pittsburgh?” • Implicit verification: “traveling from Pittsburgh… when do you want to leave?”

  45. Explicit & Implicit Verifications • Analysis of user behavior following these two strategies [Krahmer(10), Swerts(11)] • User behavior is rich, correction detectors are important! • Design is important! • Did you say Pittsburgh? • Did you say Pittsburgh? Please respond ‘yes’ or ‘no’. • Do you want to fly from Pittsburgh? • Correct implementation & adequate support is important! • Users discovering errors through implicit confirmations are less likely to get back on track … hmm

  46. Strategies for misunderstandings • Explicit verification (w/ variants) • Implicit verification (w/ variants) • Disambiguation • “I’m sorry, are you flying out of Pittsburgh or San Francisco?” • Rejection • “I’m not sure I understood what you said. Can you tell me again where are you flying from?”

  47. Strategies for non-understandings - I • Lexically entrain • “Right now I need you to tell me the departure city… You can say for instance, ‘I’d like to fly from Pittsburgh’.” • Ask repeat • “I’m not sure I understood you. Can you repeat that please?” • Ask reformulate • “Can you please rephrase that?” • Diagnose • If non-understanding source can be known/estimated, give that information to the user • “I can’t hear you very well. Can you please speak closer to the microphone?”

  48. Strategies for non-understandings - II • Select alternative plan: Domain specific strategies • E.g. try to get state name first, then city name • Establish context (& Confirm context variant) • “Right now I’m trying to gather enough information to make a room reservation. So far I know you want a room on Tuesday. Now I need to know for what time you need the room.” • Give targeted help • Give help on the topic / focus of the conversation / estimated user goal • Constrain language model / recognition

  49. Strategies for non-understandings - III • Switch input modality (e.g. DTMF, pen, etc) • Restart topic / backup dialog • Start-over • Switch to operator • Terminate session • …

  50. Grounding Strategies – To Do List • Design, implement, analyze, iterate • Human-Human dialog analysis • Design the strategies, with variants and appropriate support • Implement in the RavenClaw framework • Perform data-driven analysis: • Q: User behaviors • Q: Applicability conditions • Q: Costs, Success rates
