Three approaches to dialogue management Planning, Optimizing, and Characterizing Presented by Lee Becker October 21, 2009
Introduction “The real art of conversation is not only to say the right thing at the right place but to leave unsaid the wrong thing at the tempting moment.” – Dorothy Neville
The Dialogue Management Problem • Giving an appropriate response • Understanding what was said and how it fits into the overall conversational context • Responding with intention • Obeying social norms • Turn-taking • Feedback / Acknowledgment
Dialogue as Planning “Words are also actions, and actions are a kind of words.” – Ralph Waldo Emerson
System View of Dialogue Management [Diagram: the user's utterance flows into the Dialogue Manager, which produces the system's response]
Planning Agents (BDI Architecture) • Maintain state of the world (beliefs) • Predetermined wants (desires) specify how the world should look • Select goals (intentions) • Build/Execute Plan • Belief Monitoring
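A minimal sketch of the BDI cycle just described, assuming a toy representation (beliefs as a set of string facts, desires as trigger tests, and a canned plan library); this is an illustration, not the architecture of any particular system:

# Illustrative BDI deliberation step (representations are hypothetical).
def bdi_step(beliefs, desires, plan_library, percepts):
    beliefs |= percepts                                   # belief monitoring: fold in new observations
    intentions = [g for g, holds in desires.items()       # adopt goals whose trigger currently holds
                  if holds(beliefs) and g not in beliefs]
    actions = []
    for goal in intentions:                               # build a plan for each adopted goal
        actions.extend(plan_library.get(goal, []))
    return beliefs, intentions, actions

beliefs = {"at(home)"}
desires = {"at(work)": lambda b: "at(home)" in b}         # want to be at work when currently at home
plan_library = {"at(work)": ["drive(home, work)"]}
print(bdi_step(beliefs, desires, plan_library, {"weekday"}))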
Blocks World Example • Init(On(A, Table) ^ On(B, Table) ^ On(C, Table) ^ Block(A) ^ Block(B) ^ Block(C) ^ Clear(A) ^ Clear(B) ^ Clear(C)) • Goal(On(A,B) ^ On(B,C)) • Action(Move(b,x,y)) • Precondition: On(b,x) ^ Clear(b) ^ Clear(y) ^ Block(b) ^ (b != x) ^ (b != y) ^ (x != y) • Effect: On(b,y) ^ Clear(x) ^ ~On(b,x) ^ ~Clear(y) • Action(MoveToTable(b,x)) • Precondition: On(b,x) ^ Clear(b) ^ Block(b) ^ (b != x) • Effect: On(b,Table) ^ Clear(x) ^ ~On(b,x) [Figure: block configurations for the initial and goal states]
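To make the action encoding concrete, here is a toy STRIPS-style version of the Move action above, with states as sets of string facts; the representation is illustrative, not any planner's actual API:

# Toy STRIPS-style encoding of Move(b, x, y) from the slide.
def move(state, b, x, y):
    """Apply Move(b, x, y) if its preconditions hold; return the new state, else None."""
    pre = {f"On({b},{x})", f"Clear({b})", f"Clear({y})", f"Block({b})"}
    if not pre <= state or len({b, x, y}) < 3:            # preconditions plus b != x != y
        return None
    return (state - {f"On({b},{x})", f"Clear({y})"}) | {f"On({b},{y})", f"Clear({x})"}

init = {"On(A,Table)", "On(B,Table)", "On(C,Table)",
        "Block(A)", "Block(B)", "Block(C)",
        "Clear(A)", "Clear(B)", "Clear(C)"}
s1 = move(init, "B", "Table", "C")                        # stack B on C
s2 = move(s1, "A", "Table", "B")                          # stack A on B
print("On(A,B)" in s2 and "On(B,C)" in s2)                # True: goal reached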
Speech acts and planning • Planning is intuitive for physical actions • How can utterances fit into a plan? • "Can you give me the directions to The Med?" • "Did you take out the trash?" • "I will try my best to be at home for dinner" • "I name this ship the 'Queen Elizabeth'" • Speech Acts (Austin, Searle) • Illocutionary Force • Performative Action
Speech Acts Meet AI • Allen, Cohen, and Perrault • Speech acts expressed in terms of: • Preconditions • Effects • Related to changes in the agent's mental states • Plans are sequences of speech acts
Example Speech Acts • REQUEST(speaker, hearer, act): • effect: speaker WANT hearer DO act • INFORM(speaker, hearer, proposition): • effect: KNOW(hearer, proposition)
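A hedged sketch of how REQUEST and INFORM could be written as operators over mental-state facts; the KNOW precondition on INFORM and the exact predicate spellings are illustrative additions, not the original formalism verbatim:

# Toy encoding of the two speech acts as precondition/effect operators.
def inform(state, speaker, hearer, prop):
    pre = {f"KNOW({speaker},{prop})"}                     # assumed precondition: speaker knows the proposition
    if pre <= state:
        return state | {f"KNOW({hearer},{prop})"}         # effect: hearer now knows it
    return None

def request(state, speaker, hearer, act):
    return state | {f"WANT({speaker},DO({hearer},{act}))"}  # effect: speaker wants hearer to do the act

s = {"KNOW(system,engine_E3_at_CityI)"}
s = inform(s, "system", "user", "engine_E3_at_CityI")
s = request(s, "user", "system", "move(E3,CityB)")
print(s)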
TRAINS • A descendant of the Allen, Cohen, and Perrault BDI + speech acts tradition • Conversational agent for logistics and planning • Users play the role of a manager, conversing with the system to develop a plan of action in the TRAINS domain.
Sample TRAINS Scenario [Map: City B with an OJ factory, an orange source, City I with an empty car and engine E3, another empty car, and City G with a banana source]
Discourse Obligations • BDI does not account for what compels one speaker to answer another • Two Strangers Example: • A: Do you know the time? • B: Sure. It's half past two. • Answering Person A's question does not further any of Person B's own goals, yet B answers anyway.
Discourse Obligations • Obligations – like speech acts, they arise from observable events in the dialogue and produce observable effects on the speaker's subsequent behavior (e.g., a question obligates the hearer to respond).
Discourse Obligations • Inherent tension between Obligations and Goals • Approaches • Perform all obligated actions • Perform only actions that will lead to desired state • A blend of the other two approaches
TRAINS Discourse Obligations
loop
  if system has obligations then
    address obligations
  else if system has performable intentions then
    perform actions
  else
    deliberate on goals
  end if
end loop
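The same obligation-first priority, as a runnable toy loop (all items are placeholder strings, not TRAINS data structures):

# Toy version of the control loop: obligations, then intentions, then deliberation.
obligations = ["answer(user, 'Where is engine E3?')"]
intentions = ["propose(move(E3, CityB))"]
goals = ["get_oranges_to_OJ_factory"]

for _ in range(3):                                        # a few iterations of the loop
    if obligations:
        print("addressing obligation:", obligations.pop(0))
    elif intentions:
        print("performing intention:", intentions.pop(0))
    else:
        print("deliberating on goals:", goals)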
TRAINS Discourse Obligations • Ensure system cooperation even if the response is in conflict with the user's goals • Aids in developing mixed-initiative dialogue • Goal-driven actions → speaker-led initiative • Obligation-driven actions → other-led initiative
Mutual Belief and Grounding • Conversational agents do not act in isolation • Mental states should account for: • Private Beliefs • Shared Beliefs • In TRAINS • Shared Belief needed for: • Modeling the domain-plan under-construction • Common understanding
Mutual Belief and Grounding • Speech acts extended to Conversation Acts, which add explicit turn-taking and grounding acts alongside the core speech acts
The TRAINS Approach • Attempts to capture the processes underlying dialogue via: • Speech acts • Discourse Obligations • Mutual Belief, Grounding • Potentially rigid: rules and logic are handcrafted
Dialogue as a Markov Decision Process “When one admits that nothing is certain, one must, I think, also admit that some things are much more nearly certain than others.” – Bertrand Russell
Flexible Dialogue • Qualities of robust dialogue • Flexible conversation flow • Adapted to users' preferences / skill levels • Resilient to errors in understanding • The dialogue author's dilemma: robustness vs. authoring effort • Other issues: • Noisy channels: ASR, NLU • Evaluation: what is an optimal decision?
Dialogue with uncertainty • Markov Decision Process (MDP) • Probabilistic framework • Ability to model planning and decision-making over time • Based on the Markov Assumption: • The next state depends only on the current state (and action) • Given the current state, the future is independent of the rest of the past
Markov Decision Processes • Markov chains with choice!
Markov Decision Processes • Agent-based process defined by a 4-tuple: • S: A set of states describing the agent's world • A: A set of actions the agent may take • T: A set of transition probabilities: • P_a(s, s') = P(s' | s, a) = P(s_{t+1} = s' | s_t = s, a_t = a) • R: A set of rewards where r(s, a) is the expected, immediate reward the agent receives for taking action a when in state s.
Markov Decision Processes • Policy Function π(s) • Mapping of states to actions • Optimal policy π*(s) yields the highest possible cumulative reward • An MDP with a fixed policy is a Markov chain • Rewards • Immediate reward r(s, a) at each step • Cumulative (discounted) reward: R = Σ_t γ^t r(s_t, a_t), with discount factor 0 ≤ γ ≤ 1
Solving an MDP • Approaches: • Value Iteration, Policy Iteration, Q-Learning • Ideally: • Encode state space with relevant features and rewards • Compute state transition and reward probabilities directly from a corpus of annotated dialogues • In Practice: • Reduce state space and do random exploration • Simulate a user and produce a corpus
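A compact value-iteration sketch on a made-up three-state dialogue MDP, just to make the policy computation concrete; the states, transition probabilities, and rewards are invented for illustration:

# Value iteration on a tiny toy MDP (all numbers are made up).
states = ["ask", "confirm", "done"]
actions = ["sys_init", "user_init"]
T = {   # T[s][a] = list of (next_state, probability) pairs
    "ask":     {"sys_init": [("confirm", 0.9), ("ask", 0.1)],
                "user_init": [("confirm", 0.6), ("ask", 0.4)]},
    "confirm": {"sys_init": [("done", 0.8), ("ask", 0.2)],
                "user_init": [("done", 0.7), ("ask", 0.3)]},
    "done":    {"sys_init": [("done", 1.0)], "user_init": [("done", 1.0)]},
}
R = {"ask": -1.0, "confirm": -1.0, "done": 0.0}           # per-turn cost until the dialogue ends
gamma = 0.95

V = {s: 0.0 for s in states}
for _ in range(100):                                      # repeatedly apply the Bellman optimality update
    V = {s: max(R[s] + gamma * sum(p * V[s2] for s2, p in T[s][a]) for a in actions)
         for s in states}
policy = {s: max(actions, key=lambda a: R[s] + gamma * sum(p * V[s2] for s2, p in T[s][a]))
          for s in states}
print(V, policy)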
Reinforcement Learning for Dialogue Policy Design • NJFun System [Singh et al., 2002]
NJFun Sample Dialogue • S1: Welcome to NJFun. How may I help you? • U1: I'd like to find um winetasting in Lambertville in the morning. (ASR: I'd like to find out wineries the in the Lambertville in the morning) • S2: Did you say you are interested in Lambertville? • U2: Yes. • S3: Did you say you want to go in the morning? • U3: Yes. • S4: I found a winery near Lambertville that is open in the morning. It is the Poor Richard's Winery in Lambertville. Thank you for using the system. Please give me feedback by saying 'good', 'so-so', or 'bad'. • U4: Good. • S5: Goodbye and have a nice day.
NJFun – State Spaces • Full state space – encode everything • State explosion quickly makes optimization intractable • State-space Estimator • Encode sufficient information to learn good decisions
NJFun – State Space Estimator • Limited policy optimization for two types of decisions • Initiative – Direct vs. Open Ended • System Initiative: "Please say the name of the town or city that you are interested in." • User Initiative: "Please give me more information." • Confirmation – Verify or Assume • "Did you say you are interested in <location>?"
State Space Estimator • Features yield 62 possible dialogue states • 42 choice states, each with 2 possible actions • Confirm/Not confirm • System/User initiative • In total 2^42 possible dialogue policies
Finding an Optimal Policy • Gathering Training Data • New system built with a randomized dialogue policy • Deployed to 54 users, each assigned 6 tasks • 311 dialogues in total • Reward Measure • Binary task completion • +1: dialogues that queried the database for exactly the set of attributes the user specified (activity type, location, time of day, etc.) • -1: otherwise • Reinforcement Learning
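An illustrative sketch of estimating the MDP from logged exploratory dialogues with a ±1 task-completion reward; the log format and the state names are made up, not the NJFun code:

# Maximum-likelihood estimates of transitions and rewards from logged dialogues.
from collections import defaultdict

# Each dialogue: a list of (state, action, next_state) steps plus a +1/-1
# completion reward credited to its final step (toy format).
logs = [
    ([("greet", "user_init", "ask_loc"), ("ask_loc", "confirm", "closed")], +1),
    ([("greet", "user_init", "ask_loc"), ("ask_loc", "no_confirm", "closed")], -1),
]

counts = defaultdict(lambda: defaultdict(int))
rewards = defaultdict(list)
for steps, outcome in logs:
    for i, (s, a, s2) in enumerate(steps):
        counts[(s, a)][s2] += 1
        rewards[(s, a)].append(outcome if i == len(steps) - 1 else 0.0)

T = {sa: {s2: n / sum(nexts.values()) for s2, n in nexts.items()}   # estimated P(s' | s, a)
     for sa, nexts in counts.items()}
R = {sa: sum(rs) / len(rs) for sa, rs in rewards.items()}           # estimated expected reward r(s, a)
print(T, R)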
Finding an Optimal Policy • RL Learned Policy: • Initiative • Begin with user initiative • Back off to mixed or system initiative when re-asking • Confirmation • In general, confirm when ASR confidence values are low • Other state features lead to more complex, context-dependent choices
Evaluating the Optimal Policy • System with the optimal policy tested on an additional 21 users • 124 test dialogues • Did not significantly outperform the baseline on the binary completion measure (p = 0.059) • Statistically significant improvement on the weak completion and ASR measures
Limited Observability • MDPs assume the world is fully observable • However: • Not all errors or states are directly observable • Undetected errors may propagate • The available evidence may not reveal that an error occurred
Partially Observable Markov Decision Processes (POMDPs) • Intuition • Maintain parallel hypotheses of what was said • Backpedal or switch strategies when a hypothesis becomes sufficiently unlikely
POMDP Example [Figure: a pizza-ordering exchange contrasting a traditional single-hypothesis dialogue state with a POMDP belief state b over the order size (S/M/L)] • Initial state – traditional: Order { size: <null> } • S: How can I help you? U: A small pepperoni pizza. [ASR: A small pepperoni pizza.] – traditional: Order { size: small } • S: Ok, what toppings? U: A small pepperoni [ASR: A small pepperoni] – traditional: Order { size: small } • S: And what type of crust? U: Uh just normal [ASR: large normal] – traditional: Order { size: large[?] } • The POMDP instead maintains a belief distribution b over S/M/L at every turn, so a single misrecognition does not simply overwrite the size.
A comparison of Markov Models (table courtesy of http://www.cassandra.org/pomdp/pomdp-faq.shtml) • States fully observable, no control over transitions: Markov chain • States fully observable, control over transitions: MDP • States partially observable, no control over transitions: Hidden Markov Model • States partially observable, control over transitions: POMDP
POMDPs • Extends the MDP model with: • O: Set of observations the agent can receive about the world • Z: Observation probabilities • b(s): Belief state, the probability of being in state s • The agent is never in a single known state; instead it maintains a probability distribution over all possible states
POMDPs • Belief Monitoring • Shifting probability mass to match observations • Optimal action depends only on the agent’s current belief state
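A minimal sketch of belief monitoring using the standard POMDP belief update, b'(s') ∝ Z(o | s', a) · Σ_s T(s' | s, a) · b(s); the two-state example and all probabilities are invented:

# Standard POMDP belief update on a toy two-state user-goal example.
def belief_update(b, a, o, T, Z):
    """Return b'(s') proportional to Z(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    successors = {s2 for probs in T.values() for s2 in probs}
    new_b = {s2: Z[(a, s2)].get(o, 0.0) *
                 sum(T[(s, a)].get(s2, 0.0) * p for s, p in b.items())
             for s2 in successors}
    total = sum(new_b.values())
    return {s: p / total for s, p in new_b.items()} if total > 0 else b

b = {"wants_small": 0.5, "wants_large": 0.5}
T = {("wants_small", "ask_size"): {"wants_small": 1.0},   # the user's goal stays fixed when asked
     ("wants_large", "ask_size"): {"wants_large": 1.0}}
Z = {("ask_size", "wants_small"): {"heard_small": 0.8, "heard_large": 0.2},
     ("ask_size", "wants_large"): {"heard_small": 0.3, "heard_large": 0.7}}
print(belief_update(b, "ask_size", "heard_small", T, Z))  # mass shifts toward wants_small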
POMDPs Influence Diagram [Figure: influence diagram with nodes R (reward), A (action), S (state), and O (observation)]
POMDPs for Spoken Dialogue Systems • SDS-POMDP [Williams and Young 2007] • Claim: POMDPs perform better for SDS because they • Maintain multiple dialogue-state hypotheses in parallel • Can incorporate ASR confidence scores directly in the belief state update
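One way to picture the second claim: treat the confidence attached to each ASR N-best hypothesis as a soft observation likelihood when updating the belief over user goals. This is an illustration of the idea only, not the Williams and Young model:

# Illustrative only: ASR confidences as soft evidence in a belief update.
def confidence_weighted_update(belief, nbest):
    """belief: P(user goal); nbest: list of (hypothesized goal, ASR confidence)."""
    scores = {goal: p * sum(conf for hyp, conf in nbest if hyp == goal)
              for goal, p in belief.items()}
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()} if total > 0 else belief

belief = {"small": 0.4, "medium": 0.3, "large": 0.3}
nbest = [("small", 0.6), ("large", 0.3), ("medium", 0.1)]  # ASR N-best list with confidence scores
print(confidence_weighted_update(belief, nbest))           # belief shifts toward "small"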