Three approaches to dialogue management Planning, Optimizing, and Characterizing Presented by Lee Becker October 21, 2009
Introduction “The real art of conversation is not only to say the right thing at the right place but to leave unsaid the wrong thing at the tempting moment.” – Dorothy Neville
The Dialogue Management Problem • Giving an appropriate response • Understanding what was said and how it fits into the overall conversational context • Responding with intention • Obeying social norms • Turn-taking • Feedback / Acknowledgment
Dialogue as Planning “Words are also actions, and actions are a kind of words.” – Ralph Waldo Emerson
System View of Dialogue Management [Diagram: the user's utterance flows into the Dialogue Manager, which produces the system's response]
Planning Agents (BDI Architecture) • Maintain state of the world (beliefs) • Predetermined wants (desires) specify how the world should look • Select goals (intentions) • Build/Execute Plan • Belief Monitoring
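A minimal sketch of the BDI cycle just described, assuming a toy representation (beliefs as a set of string facts, desires as trigger tests, and a canned plan library); this is an illustration, not the architecture of any particular system:

# Illustrative BDI deliberation step (representations are hypothetical).
def bdi_step(beliefs, desires, plan_library, percepts):
    beliefs |= percepts                                   # belief monitoring: fold in new observations
    intentions = [g for g, holds in desires.items()       # adopt goals whose trigger currently holds
                  if holds(beliefs) and g not in beliefs]
    actions = []
    for goal in intentions:                               # build a plan for each adopted goal
        actions.extend(plan_library.get(goal, []))
    return beliefs, intentions, actions

beliefs = {"at(home)"}
desires = {"at(work)": lambda b: "at(home)" in b}         # want to be at work when currently at home
plan_library = {"at(work)": ["drive(home, work)"]}
print(bdi_step(beliefs, desires, plan_library, {"weekday"}))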
Blocks World Example • Init(On(A, Table) ^ On(B, Table) ^ On(C, Table) ^ Block(A) ^ Block(B) ^ Block(C) ^ Clear(A) ^ Clear(B) ^ Clear(C)) • Goal(On(A,B) ^ On(B,C)) • Action(Move(b,x,y)) • Precondition: On(b,x) ^ Clear(b) ^ Clear(y) ^ Block(b) ^ (b != x) ^ (b != y) ^ (x != y) • Effect: On(b,y) ^ Clear(x) ^ ~On(b,x) ^ ~Clear(y) • Action(MoveToTable(b,x)) • Precondition: On(b,x) ^ Clear(b) ^ Block(b) ^ (b != x) • Effect: On(b,Table) ^ Clear(x) ^ ~On(b,x) [Figure: block configurations for the initial and goal states]
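To make the action encoding concrete, here is a toy STRIPS-style version of the Move action above, with states as sets of string facts; the representation is illustrative, not any planner's actual API:

# Toy STRIPS-style encoding of Move(b, x, y) from the slide.
def move(state, b, x, y):
    """Apply Move(b, x, y) if its preconditions hold; return the new state, else None."""
    pre = {f"On({b},{x})", f"Clear({b})", f"Clear({y})", f"Block({b})"}
    if not pre <= state or len({b, x, y}) < 3:            # preconditions plus b != x != y
        return None
    return (state - {f"On({b},{x})", f"Clear({y})"}) | {f"On({b},{y})", f"Clear({x})"}

init = {"On(A,Table)", "On(B,Table)", "On(C,Table)",
        "Block(A)", "Block(B)", "Block(C)",
        "Clear(A)", "Clear(B)", "Clear(C)"}
s1 = move(init, "B", "Table", "C")                        # stack B on C
s2 = move(s1, "A", "Table", "B")                          # stack A on B
print("On(A,B)" in s2 and "On(B,C)" in s2)                # True: goal reached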
Speech acts and planning • Planning is intuitive for physical actions • How can utterances fit into a plan? • "Can you give me the directions to The Med?" • "Did you take out the trash?" • "I will try my best to be at home for dinner" • "I name this ship the 'Queen Elizabeth'" • Speech Acts (Austin, Searle) • Illocutionary Force • Performative Action
Speech Acts Meet AI • Allen, Cohen, and Perrault • Speech acts expressed in terms of: • Preconditions • Effects • Related to changes in the agent's mental states • Plans are sequences of speech acts
Example Speech Acts • REQUEST(speaker, hearer, act): • effect: speaker WANT hearer DO act • INFORM(speaker, hearer, proposition): • effect: KNOW(hearer, proposition)
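A hedged sketch of how REQUEST and INFORM could be written as operators over mental-state facts; the KNOW precondition on INFORM and the exact predicate spellings are illustrative additions, not the original formalism verbatim:

# Toy encoding of the two speech acts as precondition/effect operators.
def inform(state, speaker, hearer, prop):
    pre = {f"KNOW({speaker},{prop})"}                     # assumed precondition: speaker knows the proposition
    if pre <= state:
        return state | {f"KNOW({hearer},{prop})"}         # effect: hearer now knows it
    return None

def request(state, speaker, hearer, act):
    return state | {f"WANT({speaker},DO({hearer},{act}))"}  # effect: speaker wants hearer to do the act

s = {"KNOW(system,engine_E3_at_CityI)"}
s = inform(s, "system", "user", "engine_E3_at_CityI")
s = request(s, "user", "system", "move(E3,CityB)")
print(s)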
TRAINS • A descendant of the Allen, Cohen, and Perrault BDI + speech acts tradition • Conversational agent for logistics and planning • Users play the role of a manager, conversing with the system to develop a plan of action in the TRAINS domain.
Sample TRAINS Scenario [Map: City B with an OJ factory, an orange source, City I with an empty car and engine E3, another empty car, and City G with a banana source]
Discourse Obligations • BDI does not account for what compels one speaker to answer another • Two Strangers Example: • A: Do you know the time? • B: Sure. It's half past two. • Answering Person A's question does not further any of Person B's own goals, yet B answers anyway.
Discourse Obligations • Obligations – like speech acts, they arise from observable events in the dialogue and produce observable effects on the speaker's subsequent behavior (e.g., a question obligates the hearer to respond).
Discourse Obligations • Inherent tension between Obligations and Goals • Approaches • Perform all obligated actions • Perform only actions that will lead to desired state • A blend of the other two approaches
TRAINS Discourse Obligations
loop
  if system has obligations then
    address obligations
  else if system has performable intentions then
    perform actions
  else
    deliberate on goals
  end if
end loop
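The same obligation-first priority, as a runnable toy loop (all items are placeholder strings, not TRAINS data structures):

# Toy version of the control loop: obligations, then intentions, then deliberation.
obligations = ["answer(user, 'Where is engine E3?')"]
intentions = ["propose(move(E3, CityB))"]
goals = ["get_oranges_to_OJ_factory"]

for _ in range(3):                                        # a few iterations of the loop
    if obligations:
        print("addressing obligation:", obligations.pop(0))
    elif intentions:
        print("performing intention:", intentions.pop(0))
    else:
        print("deliberating on goals:", goals)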
TRAINS Discourse Obligations • Ensure system cooperation even if the response is in conflict with the user's goals • Aids in developing mixed-initiative dialogue • Goal-driven actions → speaker-led initiative • Obligation-driven actions → other-led initiative
Mutual Belief and Grounding • Conversational agents do not act in isolation • Mental states should account for: • Private Beliefs • Shared Beliefs • In TRAINS • Shared Belief needed for: • Modeling the domain-plan under-construction • Common understanding
Mutual Belief and Grounding • Speech acts extended to Conversation Acts, which add explicit turn-taking and grounding acts alongside the core speech acts
The TRAINS Approach • Attempts to capture the processes underlying dialogue via: • Speech acts • Discourse Obligations • Mutual Belief, Grounding • Potentially rigid: rules and logic are handcrafted
Dialogue as a Markov Decision Process “When one admits that nothing is certain, one must, I think, also admit that some things are much more nearly certain than others.” – Bertrand Russell
Flexible Dialogue • Qualities of robust dialogue • Flexible conversation flow • Adapted to users' preferences / skill levels • Resilient to errors in understanding • The dialogue author's dilemma: robustness vs. authoring effort • Other issues: • Noisy channels: ASR, NLU • Evaluation: what is an optimal decision?
Dialogue with uncertainty • Markov Decision Process (MDP) • Probabilistic framework • Ability to model planning and decision-making over time • Based on the Markov Assumption: • The next state depends only on the current state (and action) • Given the current state, the future is independent of the rest of the past
Markov Decision Processes • Markov chains with choice!
Markov Decision Processes • Agent-based process defined by a 4-tuple: • S: A set of states describing the agent's world • A: A set of actions the agent may take • T: A set of transition probabilities: • P_a(s, s') = P(s' | s, a) = P(s_{t+1} = s' | s_t = s, a_t = a) • R: A set of rewards where r(s, a) is the expected, immediate reward the agent receives for taking action a when in state s.
Markov Decision Processes • Policy Function π(s) • Mapping of states to actions • Optimal policy π*(s) yields the highest possible cumulative reward • An MDP with a fixed policy is a Markov chain • Rewards • Immediate reward r(s, a) at each step • Cumulative (discounted) reward: R = Σ_t γ^t r(s_t, a_t), with discount factor 0 ≤ γ ≤ 1
Solving an MDP • Approaches: • Value Iteration, Policy Iteration, Q-Learning • Ideally: • Encode state space with relevant features and rewards • Compute state transition and reward probabilities directly from a corpus of annotated dialogues • In Practice: • Reduce state space and do random exploration • Simulate a user and produce a corpus
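A compact value-iteration sketch on a made-up three-state dialogue MDP, just to make the policy computation concrete; the states, transition probabilities, and rewards are invented for illustration:

# Value iteration on a tiny toy MDP (all numbers are made up).
states = ["ask", "confirm", "done"]
actions = ["sys_init", "user_init"]
T = {   # T[s][a] = list of (next_state, probability) pairs
    "ask":     {"sys_init": [("confirm", 0.9), ("ask", 0.1)],
                "user_init": [("confirm", 0.6), ("ask", 0.4)]},
    "confirm": {"sys_init": [("done", 0.8), ("ask", 0.2)],
                "user_init": [("done", 0.7), ("ask", 0.3)]},
    "done":    {"sys_init": [("done", 1.0)], "user_init": [("done", 1.0)]},
}
R = {"ask": -1.0, "confirm": -1.0, "done": 0.0}           # per-turn cost until the dialogue ends
gamma = 0.95

V = {s: 0.0 for s in states}
for _ in range(100):                                      # repeatedly apply the Bellman optimality update
    V = {s: max(R[s] + gamma * sum(p * V[s2] for s2, p in T[s][a]) for a in actions)
         for s in states}
policy = {s: max(actions, key=lambda a: R[s] + gamma * sum(p * V[s2] for s2, p in T[s][a]))
          for s in states}
print(V, policy)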
Reinforcement Learning for Dialogue Policy Design • NJFun System [Singh et al., 2002]
NJFun Sample Dialogue • S1: Welcome to NJFun. How may I help you? • U1: I'd like to find um winetasting in Lambertville in the morning. (ASR: I'd like to find out wineries the in the Lambertville in the morning) • S2: Did you say you are interested in Lambertville? • U2: Yes. • S3: Did you say you want to go in the morning? • U3: Yes. • S4: I found a winery near Lambertville that is open in the morning. It is the Poor Richard's Winery in Lambertville. Thank you for using the system. Please give me feedback by saying 'good', 'so-so', or 'bad'. • U4: Good. • S5: Goodbye and have a nice day.
NJFun – State Spaces • Full state space – encode everything • State explosion quickly makes optimization intractable • State-space Estimator • Encode sufficient information to learn good decisions
NJFun – State Space Estimator • Limited policy optimization for two types of decisions • Initiative – Direct vs. Open Ended • System Initiative: "Please say the name of the town or city that you are interested in." • User Initiative: "Please give me more information." • Confirmation – Verify or Assume • "Did you say you are interested in <location>?"
State Space Estimator • Features yield 62 possible dialogue states • 42 choice states, each with 2 possible actions • Confirm/Not confirm • System/User initiative • In total 2^42 possible dialogue policies
Finding an Optimal Policy • Gathering Training Data • New system built with a randomized dialogue policy • Deployed to 54 users, each assigned 6 tasks • 311 dialogues in total • Reward Measure • Binary task completion • +1: dialogues that queried the database for exactly the set of attributes the user specified (activity type, location, time of day, etc.) • -1: otherwise • Reinforcement Learning
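An illustrative sketch of estimating the MDP from logged exploratory dialogues with a ±1 task-completion reward; the log format and the state names are made up, not the NJFun code:

# Maximum-likelihood estimates of transitions and rewards from logged dialogues.
from collections import defaultdict

# Each dialogue: a list of (state, action, next_state) steps plus a +1/-1
# completion reward credited to its final step (toy format).
logs = [
    ([("greet", "user_init", "ask_loc"), ("ask_loc", "confirm", "closed")], +1),
    ([("greet", "user_init", "ask_loc"), ("ask_loc", "no_confirm", "closed")], -1),
]

counts = defaultdict(lambda: defaultdict(int))
rewards = defaultdict(list)
for steps, outcome in logs:
    for i, (s, a, s2) in enumerate(steps):
        counts[(s, a)][s2] += 1
        rewards[(s, a)].append(outcome if i == len(steps) - 1 else 0.0)

T = {sa: {s2: n / sum(nexts.values()) for s2, n in nexts.items()}   # estimated P(s' | s, a)
     for sa, nexts in counts.items()}
R = {sa: sum(rs) / len(rs) for sa, rs in rewards.items()}           # estimated expected reward r(s, a)
print(T, R)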
Finding an Optimal Policy • RL Learned Policy: • Initiative • Begin with user initiative • Back off to mixed or system initiative when re-asking • Confirmation • In general, confirm when ASR confidence values are low • Other state features lead to more complex, context-dependent choices
Evaluating the Optimal Policy • System with the optimal policy tested on an additional 21 users • 124 test dialogues • Did not significantly outperform the baseline on the binary completion measure (p = 0.059) • Statistically significant improvement on the weak completion and ASR measures
Limited Observability • MDPs assume the world is fully observable • However: • Not all errors or states are directly observable • Undetected errors may propagate • The available evidence may not reveal that an error occurred
Partially Observable Markov Decision Processes (POMDPs) • Intuition • Maintain parallel hypotheses of what was said • Backpedal or switch strategies when a hypothesis becomes sufficiently unlikely
POMDP Example [Figure: a pizza-ordering exchange contrasting a traditional single-hypothesis dialogue state with a POMDP belief state b over the order size (S/M/L)] • Initial state – traditional: Order { size: <null> } • S: How can I help you? U: A small pepperoni pizza. [ASR: A small pepperoni pizza.] – traditional: Order { size: small } • S: Ok, what toppings? U: A small pepperoni [ASR: A small pepperoni] – traditional: Order { size: small } • S: And what type of crust? U: Uh just normal [ASR: large normal] – traditional: Order { size: large[?] } • The POMDP instead maintains a belief distribution b over S/M/L at every turn, so a single misrecognition does not simply overwrite the size.
A comparison of Markov Models (table courtesy of http://www.cassandra.org/pomdp/pomdp-faq.shtml) • States fully observable, no control over transitions: Markov chain • States fully observable, control over transitions: MDP • States partially observable, no control over transitions: Hidden Markov Model • States partially observable, control over transitions: POMDP
POMDPs • Extends the MDP model with: • O: Set of observations the agent can receive about the world • Z: Observation probabilities • b(s): Belief state, the probability of being in state s • The agent is never in a single known state; instead it maintains a probability distribution over all possible states
POMDPs • Belief Monitoring • Shifting probability mass to match observations • Optimal action depends only on the agent’s current belief state
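A minimal sketch of belief monitoring using the standard POMDP belief update, b'(s') ∝ Z(o | s', a) · Σ_s T(s' | s, a) · b(s); the two-state example and all probabilities are invented:

# Standard POMDP belief update on a toy two-state user-goal example.
def belief_update(b, a, o, T, Z):
    """Return b'(s') proportional to Z(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    successors = {s2 for probs in T.values() for s2 in probs}
    new_b = {s2: Z[(a, s2)].get(o, 0.0) *
                 sum(T[(s, a)].get(s2, 0.0) * p for s, p in b.items())
             for s2 in successors}
    total = sum(new_b.values())
    return {s: p / total for s, p in new_b.items()} if total > 0 else b

b = {"wants_small": 0.5, "wants_large": 0.5}
T = {("wants_small", "ask_size"): {"wants_small": 1.0},   # the user's goal stays fixed when asked
     ("wants_large", "ask_size"): {"wants_large": 1.0}}
Z = {("ask_size", "wants_small"): {"heard_small": 0.8, "heard_large": 0.2},
     ("ask_size", "wants_large"): {"heard_small": 0.3, "heard_large": 0.7}}
print(belief_update(b, "ask_size", "heard_small", T, Z))  # mass shifts toward wants_small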
POMDPs Influence Diagram [Figure: influence diagram with nodes R (reward), A (action), S (state), and O (observation)]
POMDPs for Spoken Dialogue Systems • SDS-POMDP [Williams and Young 2007] • Claim: POMDPs perform better for SDS because they • Maintain multiple dialogue-state hypotheses in parallel • Can incorporate ASR confidence scores directly in the belief state update
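One way to picture the second claim: treat the confidence attached to each ASR N-best hypothesis as a soft observation likelihood when updating the belief over user goals. This is an illustration of the idea only, not the Williams and Young model:

# Illustrative only: ASR confidences as soft evidence in a belief update.
def confidence_weighted_update(belief, nbest):
    """belief: P(user goal); nbest: list of (hypothesized goal, ASR confidence)."""
    scores = {goal: p * sum(conf for hyp, conf in nbest if hyp == goal)
              for goal, p in belief.items()}
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()} if total > 0 else belief

belief = {"small": 0.4, "medium": 0.3, "large": 0.3}
nbest = [("small", 0.6), ("large", 0.3), ("medium", 0.1)]  # ASR N-best list with confidence scores
print(confidence_weighted_update(belief, nbest))           # belief shifts toward "small"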