Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems

Thesis Proposal
Dan Bohus
Carnegie Mellon University, January 2004
Thesis Committee: Alex Rudnicky (Chair), Roni Rosenfeld, Jeff Schneider, Eric Horvitz (Microsoft Research)

Problem: Lack of robustness when faced with understanding errors
• Spans most domains and interaction types
• Has a significant impact on performance

An example (recognizer output in brackets)
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry I’m not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay what day would you be departing chicago
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ………

Some Statistics …
• Corrections [Krahmer, Swerts, Litman, Levow]
  • 30% of utterances correct system mistakes
  • Corrections are 2-3 times more likely to be misrecognized
• Semantic error rates: ~25-35%

Significant Impact on Interaction
• CMU Communicator [chart of sessions: segments of 40%, 33%, and 26% (failed); annotated "contain understanding errors"]
• Multi-site Communicator Corpus [Shin et al] [chart of sessions: 37% failed, 63% not failed]

Outline • Problem • Approach • Infrastructure • Research Program • Summary & Timeline problem : approach : infrastructure : indicators : strategies : decision process : summary

Increasing Robustness …
• Increase the accuracy of speech recognition
• Assume recognition is unreliable, and create the mechanisms for acting robustly at the dialogue management level

Snapshot of Existing Work: Slide 1
• Theoretical models of grounding: Contribution Model [Clark], Grounding Acts [Traum]
  • Limitation: analytical/descriptive, not decision-oriented
• Practice: heuristic rules
  • Misunderstandings: threshold(s) on confidence scores
  • Non-understandings
  • Limitation: ad hoc, lack generality, not easy to extend

Snapshot of Existing Work: Slide 2
• Conversation as Action under Uncertainty [Paek and Horvitz]
  • Belief networks to model uncertainties
  • Decisions based on expected utility and value-of-information (VOI) analysis
• Reinforcement learning for dialogue control policies [Singh, Kearns, Litman, Walker, Levin, Pieraccini, Young, Scheffler, etc.]
  • Formulate dialogue control as a Markov decision process (MDP)
  • Learn optimal control policy from data
• Limitation: these approaches do not scale up to complex, real-world tasks

Thesis Statement
Develop a task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems
Approach: decision making under uncertainty

Three Components
0. Infrastructure
1. Error awareness: develop indicators that …
  • Assess reliability of information
  • Assess how well the dialogue is advancing
2. Error recovery strategies: develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process: develop a scalable reinforcement-learning-based architecture for making error handling decisions

Infrastructure
• RavenClaw (completed)
  • Modern dialogue management framework for complex, task-oriented domains
• RavenClaw spoken dialogue systems (completed)
  • Test-bed for evaluation

RavenClaw
[Figure: RavenClaw architecture illustrated with RoomLine. The dialogue task specification is a tree of agents (RoomLine → Login, GetQuery, GetResults, DiscussResults; Login → Welcome, AskRegistered, AskName, GreetUser; GetQuery → DateTime, Location, Properties; Properties → Network, Projector, Whiteboard) with concepts user_name, registered, query and results. It is executed by a domain-independent dialogue engine comprising a dialogue stack, an expectation agenda (e.g., registered: [No] → false, [Yes] → true; user_name: [UserName]; query.date_time: [DateTime]; query.location: [Location]; query.network: [Network]), error indicators, error handling strategies (e.g., ExplicitConfirm), and an error handling decision process.]
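The expectation agenda shown for RoomLine binds grammar slots to concept values, e.g. registered: [No] → false, [Yes] → true and user_name: [UserName]. The following Python sketch illustrates only that binding step. The class names, the first-match rule, and the slot-to-value functions are hypothetical and are not RavenClaw's actual API; a real agenda is also ordered by the dialogue stack and carries more structure than this.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Expectation:
    """One agenda entry: a grammar slot that, if present in a parse, binds a concept."""
    concept: str                   # e.g. "registered"
    slot: str                      # e.g. "[Yes]"
    value: Callable[[str], Any]    # maps the slot's surface text to a concept value


@dataclass
class ExpectationAgenda:
    # Hypothetical ordering: entries are checked in list order, first match wins.
    entries: List[Expectation] = field(default_factory=list)

    def bind(self, parse: Dict[str, str]) -> Dict[str, Any]:
        """Bind every concept whose expected slot appears in the parse."""
        bound: Dict[str, Any] = {}
        for e in self.entries:
            if e.slot in parse and e.concept not in bound:
                bound[e.concept] = e.value(parse[e.slot])
        return bound


agenda = ExpectationAgenda([
    Expectation("registered", "[No]", lambda _: False),
    Expectation("registered", "[Yes]", lambda _: True),
    Expectation("user_name", "[UserName]", lambda text: text),
    Expectation("query.date_time", "[DateTime]", lambda text: text),
])

print(agenda.bind({"[Yes]": "yes", "[UserName]": "dan bohus"}))
# {'registered': True, 'user_name': 'dan bohus'}
```

Keeping the slot-to-concept mapping in data rather than in task code is one way the engine can stay domain-independent while the task specification supplies the entries.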

RavenClaw-based Systems

Research Plan, next step: 1. Error awareness, developing indicators that assess the reliability of information

Existing Work
• Confidence annotation
  • Traditionally focused on the speech recognizer [Bansal, Chase, Cox, and others]
  • Recently, multiple sources of knowledge [San-Segundo, Walker, Bosch, Bohus, and others]: recognition, parsing, dialogue management
  • Detect misunderstandings: ~80-90% accuracy
• Correction and aware site detection [Swerts, Litman, Levow and others]
  • Multiple sources of knowledge
  • Detect corrections: ~80-90% accuracy

Proposed: Belief Updating
• Continuously assess beliefs in light of initial confidence and subsequent events
• An example:
  S: Where are you flying from?
  U: [CityName={Aspen/0.6; Austin/0.2}]
  S: Did you say you wanted to fly out of Aspen?
  U: [No/0.6] [CityName={Boston/0.8}]
• Initial belief + system action + user response → updated belief: [CityName={Aspen/?; Austin/?; Boston/?}]
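To make the update concrete, here is a deliberately naive Bayesian sketch. It is not the proposed model, whose parameters would come from data. It assumes a hand-set soft-evidence likelihood (an observation with confidence c multiplies hypotheses consistent with it by c and the others by 1 − c), treats the "No" and the "Boston" parse as independent evidence, and splits the prior mass not covered by the first recognition (0.2) evenly between Boston and an "other" bucket. All of these choices are assumptions for illustration.

```python
from typing import Callable, Dict

Belief = Dict[str, float]


def normalize(belief: Belief) -> Belief:
    total = sum(belief.values())
    return {h: p / total for h, p in belief.items()}


def soft_update(belief: Belief, consistent: Callable[[str], bool], conf: float) -> Belief:
    """Hand-set soft evidence: weight conf if h agrees with the observation, 1 - conf otherwise."""
    return normalize({h: p * (conf if consistent(h) else 1.0 - conf) for h, p in belief.items()})


# Initial belief from U: [CityName={Aspen/0.6; Austin/0.2}]. Splitting the leftover
# 0.2 between Boston and "other" is an assumption, not part of the example.
belief: Belief = {"Aspen": 0.6, "Austin": 0.2, "Boston": 0.1, "other": 0.1}

# S: Did you say you wanted to fly out of Aspen?   U: [No/0.6]
belief = soft_update(belief, lambda h: h != "Aspen", 0.6)
# The same user turn also yields [CityName={Boston/0.8}]
belief = soft_update(belief, lambda h: h == "Boston", 0.8)

for city, p in sorted(belief.items(), key=lambda kv: -kv[1]):
    print(f"{city}: {p:.2f}")
```

With these assumed numbers Aspen and Boston end up roughly tied at about 0.36 each, with Austin at about 0.18. The outcome hinges entirely on the hand-set likelihoods and prior split, which is exactly what a data-driven belief updating model is meant to replace.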

Belief Updating: Approach
• Model the update in a dynamic belief network
[Figure: two-slice network with nodes for the user concept at time t and at time t + 1 (1st, 2nd, 3rd hypotheses and their confidences), the system action, and the user response (contents, confidence, Yes/No, utterance length, positive markers, negative markers, correction).]

Research Plan, next step: 1. Error awareness, developing indicators that assess how well the dialogue is advancing

Is the Dialogue Advancing Normally?
Locally, turn-level: non-understanding indicators
• Non-understanding flag directly available
• Develop additional indicators from recognition, understanding, and interpretation
Globally, discourse-level: dialogue-on-track indicators
• Counts, averages of non-understanding indicators
• Rate of dialogue advance
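A minimal sketch of how discourse-level dialogue-on-track indicators could be computed from turn-level ones. Two assumptions are not from the proposal: the indicators are taken over a sliding window of the last five turns, and the "rate of dialogue advance" is approximated as concepts acquired per turn. The usage lines replay the opening of the first example dialogue: two non-understandings on "Urbana Champaign", then "Chicago" understood.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class TurnRecord:
    non_understanding: bool  # turn-level non-understanding flag
    concepts_acquired: int   # hypothetical proxy for how much the dialogue advanced


class OnTrackIndicators:
    """Discourse-level indicators over the most recent `window` turns."""

    def __init__(self, window: int = 5) -> None:
        self.turns: Deque[TurnRecord] = deque(maxlen=window)

    def observe(self, turn: TurnRecord) -> None:
        self.turns.append(turn)

    def non_understanding_count(self) -> int:
        return sum(t.non_understanding for t in self.turns)

    def non_understanding_rate(self) -> float:
        return self.non_understanding_count() / len(self.turns) if self.turns else 0.0

    def consecutive_non_understandings(self) -> int:
        run = 0
        for t in reversed(self.turns):
            if not t.non_understanding:
                break
            run += 1
        return run

    def advance_rate(self) -> float:
        if not self.turns:
            return 0.0
        return sum(t.concepts_acquired for t in self.turns) / len(self.turns)


ind = OnTrackIndicators()
for flag, acquired in [(True, 0), (True, 0), (False, 1)]:
    ind.observe(TurnRecord(flag, acquired))
print(ind.non_understanding_count(), ind.consecutive_non_understandings())  # 2 0
```

The window keeps the indicators local in time, so a dialogue that recovers (as it does on "Chicago") stops looking off-track after a few good turns.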

Research Plan, next step: 2. Error recovery strategies, developing and investigating an extended set of conversational error handling strategies

Error Recovery Strategies
• Identify: identify and define an extended set of error handling strategies
• Implement: construct task-decoupled implementations of a large number of strategies
• Evaluate: evaluate performance and refine the strategies further
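To illustrate what "task-decoupled" might mean in practice, here is a sketch in which each strategy sees only a generic context (the concept name, its current hypothesis, and the question the system asks in this turn) and never any domain-specific code. The `Context` fields, class names, and prompt wording are hypothetical; the notify-non-understanding wording is borrowed from the first example dialogue.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


@dataclass
class Context:
    """Generic, task-independent information a strategy is allowed to use."""
    concept: str      # e.g. "departure city"
    hypothesis: str   # current top hypothesis for the concept, e.g. "Chicago"
    question: str     # the question the system asks in this turn


class Strategy(ABC):
    name = "abstract"

    @abstractmethod
    def prompt(self, ctx: Context) -> str:
        """Return the system prompt that realizes this strategy."""


class ExplicitConfirmation(Strategy):
    name = "Explicit confirmation"

    def prompt(self, ctx: Context) -> str:
        return f"Did you say {ctx.hypothesis}?"


class ImplicitConfirmation(Strategy):
    name = "Implicit confirmation"

    def prompt(self, ctx: Context) -> str:
        # Echo the hypothesis and move on to the next question.
        return f"{ctx.concept}: {ctx.hypothesis}. {ctx.question}"


class NotifyNonUnderstanding(Strategy):
    name = "Notify non-understanding"

    def prompt(self, ctx: Context) -> str:
        return f"Sorry, I'm not sure I understood what you said. {ctx.question}"


REGISTRY: Dict[str, Strategy] = {
    s.name: s for s in (ExplicitConfirmation(), ImplicitConfirmation(), NotifyNonUnderstanding())
}

ctx = Context("departure city", "Chicago", "Where would you like to go?")
for name, strategy in REGISTRY.items():
    print(f"{name}: {strategy.prompt(ctx)}")
```

Because strategies depend only on `Context`, the same registry could in principle be reused across RavenClaw systems, which is what the reusability evaluation would test.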

List of Error Recovery Strategies
User-initiated:
• Help, Where are we?, Start over, Scratch concept value, Go back, Channel establishment, Suspend/Resume, Repeat, Summarize, Quit
System-initiated, to ensure that the system has reliable information (misunderstandings):
• Explicit confirmation, Implicit confirmation, Disambiguation, Ask repeat concept, Reject concept
System-initiated, to ensure that the dialogue is on track:
• Local problems (non-understandings): Switch input modality, SNR repair, Ask repeat turn, Ask rephrase turn, Notify non-understanding, Explicit confirm turn, Targeted help, WH-reformulation, Keep-a-word reformulation, Generic help, You can say
• Global problems (compounded, discourse-level problems): Restart subtask plan, Select alternative plan, Start over, Terminate session / Direct to operator

Error Recovery Strategies: Evaluation
• Reusability
  • Deploy in different spoken dialogue systems
• Efficiency of non-understanding strategies
  • Simple metric: is the next utterance understood?
  • Efficiency depends on the decision process
  • Construct upper and lower bounds for efficiency
    • Lower bound: a decision process that chooses uniformly at random
    • Upper bound: a human performs the decision process (Wizard-of-Oz study)
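The "is the next utterance understood?" metric reduces to simple counting. The sketch below assumes a hypothetical log format of (strategy used, next utterance understood) pairs. If the log was collected under the uniformly random decision process, the pooled rate estimates the lower bound; a Wizard-of-Oz log processed the same way would estimate the upper bound. The log entries are invented toy data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recovery_rates(log: Iterable[Tuple[str, bool]]) -> Tuple[Dict[str, float], float]:
    """Per-strategy and pooled rate at which the next user utterance is understood."""
    tries: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for strategy, next_understood in log:
        tries[strategy] += 1
        wins[strategy] += int(next_understood)
    per_strategy = {s: wins[s] / tries[s] for s in tries}
    total = sum(tries.values())
    pooled = sum(wins.values()) / total if total else 0.0
    return per_strategy, pooled


# Toy log entries (invented): (strategy used, was the next utterance understood?)
toy_log = [
    ("Ask repeat turn", True),
    ("Ask repeat turn", False),
    ("You can say", True),
    ("Notify non-understanding", False),
]
per_strategy, pooled = recovery_rates(toy_log)
print(per_strategy, pooled)
```

Per-strategy rates alone are confounded by when each strategy was chosen, which is why the slide frames efficiency relative to the decision process rather than per strategy in isolation.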

Research Plan, next step: 3. Error handling decision process, developing a scalable reinforcement-learning-based architecture for making error handling decisions

Previous Reinforcement Learning Work
• Dialogue control as a Markov Decision Process (MDP): states, actions, rewards
• Previous work: successes in small domains
  • NJFun [Singh, Kearns, Litman, Walker et al]
• Problems
  • Approach does not scale
  • Once learned, policies are not reusable

Proposed Approach
Overcome previous shortcomings:
• Focus learning only on error handling
  • Reduces the size of the learning problem
  • Favors reusability of learned policies
  • Lessens the system development effort
• Use a “divide-and-conquer” approach
  • Leverage independencies in dialogue

Decision Process Architecture
[Figure: the RoomLine task tree (RoomLine → Login → Welcome, AskRegistered, AskName, GreetUser) with small MDPs attached: topic MDPs on dialogue agents and concept MDPs on the concepts user_name and registered. Each MDP proposes an action (e.g., No Action, Explicit Confirmation), and a gating mechanism selects which one is executed.]
• Small-size models
• Parameters can be tied across models
• Accommodate dynamic task generation
• Favors reusability of policies
• Initial policies can be easily handcrafted
• Independence assumption
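A sketch of how many small MDPs might be instantiated over a task tree with tied parameters. Assumptions, all hypothetical: every agent gets a topic MDP (the figure may attach them more selectively), both concepts belong to Login (our reading of the figure), and "tying" is modeled as all MDPs of one kind sharing the same parameter table object. Re-running `instantiate` after the tree changes is one way to accommodate dynamic task generation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    """A node in the dialogue task tree (names taken from the RoomLine example)."""
    name: str
    concepts: List[str] = field(default_factory=list)
    children: List["Agent"] = field(default_factory=list)


@dataclass
class SmallMDP:
    kind: str                 # "topic" or "concept"
    owner: str                # agent or concept the MDP is attached to
    params: Dict[str, float]  # tied models share the same dict object


def instantiate(root: Agent, topic_params: Dict[str, float],
                concept_params: Dict[str, float]) -> List[SmallMDP]:
    """One topic MDP per agent, one concept MDP per concept; parameters tied by kind."""
    mdps: List[SmallMDP] = []
    stack = [root]
    while stack:
        agent = stack.pop()
        mdps.append(SmallMDP("topic", agent.name, topic_params))
        mdps.extend(SmallMDP("concept", c, concept_params) for c in agent.concepts)
        stack.extend(reversed(agent.children))  # depth-first, left to right
    return mdps


login = Agent("Login", ["user_name", "registered"],
              [Agent("Welcome"), Agent("AskRegistered"), Agent("AskName"), Agent("GreetUser")])
roomline = Agent("RoomLine", children=[login])

topic_params: Dict[str, float] = {}    # placeholder parameter tables
concept_params: Dict[str, float] = {}
mdps = instantiate(roomline, topic_params, concept_params)
print([(m.kind, m.owner) for m in mdps])
```

Because each model is small and the tables are shared, data gathered for any concept updates the parameters used by every concept MDP.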

Reward Structure & Learning
• Rewards based on any dialogue performance metric
• Atypical, multi-agent reinforcement learning setting
[Figure: several MDPs propose actions through the gating mechanism; they receive local rewards as well as global, post-gate rewards.]
• Multiple, standard RL problems
• Model-based approaches
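The slide names model-based approaches without fixing one. A common choice, shown here only as an assumption, is certainty-equivalence: estimate each small MDP's transition and reward model from logged transitions by counting, then solve it by value iteration. The toy log for a single concept MDP is invented (a cost of -1 per confirmation turn, -10 for acting on a wrong value).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

State = str
Action = str
SA = Tuple[State, Action]


def estimate_model(transitions: Iterable[Tuple[State, Action, float, State]]):
    """Maximum-likelihood transition probabilities and mean rewards from logged transitions."""
    counts: Dict[SA, Dict[State, int]] = defaultdict(lambda: defaultdict(int))
    reward_sum: Dict[SA, float] = defaultdict(float)
    for s, a, r, s2 in transitions:
        counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
    P: Dict[SA, Dict[State, float]] = {}
    R: Dict[SA, float] = {}
    for sa, nexts in counts.items():
        n = sum(nexts.values())
        P[sa] = {s2: k / n for s2, k in nexts.items()}
        R[sa] = reward_sum[sa] / n
    return P, R


def solve(P, R, gamma: float = 0.95, iters: int = 200):
    """Value iteration on the estimated model; states with no logged actions are terminal."""
    states = {s for s, _ in P} | {s2 for nexts in P.values() for s2 in nexts}
    actions: Dict[State, List[Action]] = defaultdict(list)
    for s, a in P:
        actions[s].append(a)

    def q(s: State, a: Action, V: Dict[State, float]) -> float:
        return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())

    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max((q(s, a, V) for a in actions.get(s, [])), default=0.0) for s in states}
    policy = {s: max(acts, key=lambda a: q(s, a, V)) for s, acts in actions.items()}
    return V, policy


toy_log = [
    ("LC", "ExplConf", -1.0, "done"),
    ("LC", "NoAct", -10.0, "done"),
    ("LC", "NoAct", 0.0, "done"),
    ("HC", "ExplConf", -1.0, "done"),
    ("HC", "NoAct", 0.0, "done"),
]
P, R = estimate_model(toy_log)
V, policy = solve(P, R)
print(policy)
```

Each MDP is solved as its own standard problem; how global, post-gate rewards are credited back to individual MDPs is left open here.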

Evaluation
• Performance
  • Compare learned policies with initial heuristic policies
  • Metrics: task completion; efficiency; number and lengths of error segments; user satisfaction
• Scalability
  • Deploy in a system operating with a sizable task
  • Theoretical analysis

Outline • Problem • Approach • Infrastructure • Research Program • Summary & Timeline

Summary of Anticipated Contributions
• Goal: develop a task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems
• Modern dialogue management framework
• Belief updating framework
• Investigation of an extended set of error handling strategies
• Scalable data-driven approach for learning error handling policies

Timeline (tracks: indicators, strategies, decisions, data)
• Now: thesis proposal, February 2004
  • Misunderstanding and non-understanding strategies
  • Investigate theoretical aspects of the proposed reinforcement learning model
• Up to milestone 1 (end of year 4, September 2004)
  • Evaluate non-understanding strategies; develop remaining strategies
  • Data collection for belief updating and WOZ study
• Up to milestone 2 (January 2005)
  • Develop and evaluate the belief updating models
  • Implement dialogue-on-track indicators
• Up to milestone 3 (end of year 5, September 2005)
  • Data collection for RL training
  • Error handling decision process: reinforcement learning experiments
  • Data collection for RL evaluation
• Up to the defense (December 2005; 5.5 years in total)
  • Contingency data collection efforts
  • Additional experiments: extensions or contingency work

Thank You! Questions & Comments

Additional Slides

Errors in Spoken Dialogue Systems
Understanding process: recognition → parsing → contextual interpretation. Possible outcomes:
• System acquires correct information: OK
• System acquires incorrect information: misunderstanding (addressed by belief updating / concept-level strategies)
• System does not acquire information: non-understanding (addressed by non-understanding indicators / turn-level strategies)
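The three outcomes can be written down as a small labeling function. This sketch is for offline labeling of logs: it needs the user's intended value, which the running system never observes (hence the need for indicators at run time). The names are hypothetical; the example turns come from the first example dialogue.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    OK = "system acquires correct information"
    MISUNDERSTANDING = "system acquires incorrect information"
    NON_UNDERSTANDING = "system does not acquire information"


# Which family of mechanisms addresses each error type, per the taxonomy on this slide.
ADDRESSED_BY = {
    Outcome.MISUNDERSTANDING: "belief updating / concept-level strategies",
    Outcome.NON_UNDERSTANDING: "non-understanding indicators / turn-level strategies",
}


def label_turn(acquired: Optional[str], intended: str) -> Outcome:
    """Label one turn given what the system acquired (None = nothing) and what the user meant."""
    if acquired is None:
        return Outcome.NON_UNDERSTANDING
    return Outcome.OK if acquired == intended else Outcome.MISUNDERSTANDING


# (acquired by the system, intended by the user) from the first example dialogue
for acquired, intended in [(None, "Urbana Champaign"), ("Chicago", "Chicago"), ("Seoul", "Huntsville")]:
    outcome = label_turn(acquired, intended)
    print(intended, "->", outcome.name, ADDRESSED_BY.get(outcome, ""))
```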

Structure of Individual MDPs
• Concept MDPs
  • State space: belief indicators
  • Action space: concept-scoped system actions
  [Figure: example concept MDP with states LC, MC, HC and 0; from each confidence state the actions ExplConf, ImplConf and NoAct are available.]
• Topic MDPs
  • State space: non-understanding and dialogue-on-track indicators
  • Action space: non-understanding actions, topic-level actions
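A handcrafted initial concept-MDP policy can be as simple as the threshold rules used in practice today. This sketch assumes LC/MC/HC stand for low, medium, and high confidence and picks hypothetical thresholds of 0.3 and 0.8; the proposal gives no values. The figure's state labeled 0 is not modeled here.

```python
from typing import Dict

# Hypothetical confidence thresholds; the proposal does not specify them.
LOW, HIGH = 0.3, 0.8


def confidence_state(conf: float) -> str:
    """Map a concept's top-hypothesis confidence to the LC / MC / HC states."""
    if conf < LOW:
        return "LC"
    if conf < HIGH:
        return "MC"
    return "HC"


# Handcrafted initial policy, to be replaced by a learned one.
INITIAL_POLICY: Dict[str, str] = {"LC": "ExplConf", "MC": "ImplConf", "HC": "NoAct"}


def initial_action(conf: float) -> str:
    return INITIAL_POLICY[confidence_state(conf)]


for conf in (0.1, 0.5, 0.9):
    print(conf, confidence_state(conf), initial_action(conf))
```

Because the policy is a three-entry table over a tiny state space, it gives learning a reasonable starting point without any data.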

Gating Mechanism
• Heuristic derived from domain-independent dialogue principles
  • Give priority to entities closer to the conversational focus
  • Give priority to topics over concepts
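The two gating principles can be sketched as a lexicographic choice among the MDPs' proposed actions. Assumptions: focus distance is an integer (0 = in focus), "NoAct" proposals are ignored, and closeness to focus takes precedence over the topic-before-concept rule. The slide lists both principles but does not say how they combine. The action name "AskRepeatTurn" is a hypothetical identifier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proposal:
    owner: str            # agent or concept the proposing MDP is attached to
    kind: str             # "topic" or "concept"
    focus_distance: int   # hypothetical: 0 = in focus, larger = further from it
    action: str           # e.g. "NoAct", "ExplConf"


def gate(proposals: List[Proposal]) -> Optional[Proposal]:
    """Pick one proposal to execute: closest to focus first, then topics before concepts."""
    active = [p for p in proposals if p.action != "NoAct"]
    if not active:
        return None
    # False sorts before True, so topics win ties on focus distance.
    return min(active, key=lambda p: (p.focus_distance, p.kind != "topic"))


proposals = [
    Proposal("Login", "topic", 0, "NoAct"),
    Proposal("user_name", "concept", 0, "ExplConf"),
    Proposal("registered", "concept", 1, "ExplConf"),
]
print(gate(proposals).owner)  # user_name
```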

Task-Independence / Reusability
[Figure: the RavenClaw architecture, with the dialogue task specification separated from the domain-independent dialogue engine]
• Argument: architecture
• Proof: deployment across multiple RavenClaw systems

Adaptable
[Figure: the decision process architecture, with topic and concept MDPs on the RoomLine task tree and a gating mechanism]
• Argument: reinforcement learning approach
• Proof: longer-term evaluation of adaptability (extension work item)

Scalable
[Figure: the decision process architecture, with topic and concept MDPs on the RoomLine task tree and a gating mechanism]
• Argument: architecture
• Proof: deployment and experiments with systems with large tasks

Scalability of Reinforcement Learning
• NJFun
  • 3 concepts, 7 state variables, 62 states
  • Learned a policy from 311 dialogues
• Consider a task with 12 concepts (for comparison, RoomLine has 20 and CMU Let’s Go! has 27)
  • 242 states
  • State space grows about 4 times
  • Number of parameters grows about 16 times
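The growth factors on this slide follow from the state counts. The sketch assumes the parameter count is dominated by one transition probability per (state, action, next state) triple, so for a fixed action set it grows with the square of the number of states; that reading of "# Parameters" is an assumption.

```python
NJFUN_STATES = 62
TWELVE_CONCEPT_STATES = 242  # the 12-concept task considered on the slide

state_growth = TWELVE_CONCEPT_STATES / NJFUN_STATES
# Assumption: parameters ~ |S| x |A| x |S| transition entries, so growth ~ |S| squared.
param_growth = state_growth ** 2
print(f"states: x{state_growth:.1f}, parameters: x{param_growth:.1f}")
```

This prints growth factors of 3.9 and 15.2, which the slide rounds to 4 and 16; the data needed to estimate those parameters grows accordingly, well beyond NJFun's 311 dialogues.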

Extension Work Items
• Portability of confidence annotation and belief updating schemes
  • Use domain-independent features
  • Train in one domain, test in another
• Use of unlabeled data
  • Self-training and co-training to improve performance in a new domain
• Evaluation of adaptability/reusability
  • Reusability: migrate policies from one domain to another
  • Adaptability: monitor model changes and system behavior throughout a period of extended use

Study by [Shin et al]
• Labeled error segments in 141 dialogues from multiple Communicator systems
• 1.66 error segments / session
• 22% of the error segments never get back on track
• 37% of sessions have breakdowns in interaction

Error Indicators, Strategies & Decision Process
• Error indicators
• Strategies
• Decision process
[Figure: the decision process architecture on the RoomLine task tree: topic MDPs on agents (e.g., RoomLine, Login) and concept MDPs on user_name and registered, each proposing an action (No Action, Explicit Confirmation), with a gating mechanism]

An example (recognizer output in brackets)
S: Where do you want to go to?
U: I wanna go to Grand Rapids Michigan on December first [I WANNA GO TO GRAND RAPIDS MICHIGAN FOLLOWING DECEMBER BOSTON]
S: traveling to Boston… departing Pittsburgh on what date?
U: traveling to Grand Rapids Michigan [TRAVELING TO GRAND RAPIDS MICHIGAN]
S: traveling to Grand Rapids… on what date would you like to travel?
U: December first [DECEMBER FOR SHIT]
S: sorry, I’m not sure I understood what you said. What day did you need to travel?
U: Friday December first [FRIDAY DECEMBER FOURTH]
S: a flight on Monday, December fourth… and what time would you like to leave Pittsburgh Monday, December fourth
U: traveling on Friday December first [COMING FRIDAY DECEMBER]
S: the date you gave was ambiguous. Did you mean November twenty-third or November twenty fourth?
U: traveling on Friday December first [TRAVELING ON FRIDAY DECEMBER]
S: a flight on Friday, November twenty-fourth… and departing Pittsburgh at what time?
U: traveling the following week on Friday [I’M TRAVELING FOLLOWING WEEK ON FRIDAY]
