This lecture discusses advanced planning techniques under uncertainty, focusing on the integration of sensing actions and belief states in decision-making. Key concepts include regression planning, belief state representation, and conditional planning strategies, highlighting the importance of observations in dynamic environments. Practical examples, such as medical scenarios, illustrate the challenges of incomplete information and the necessity for flexible action sequences. Ultimately, this session aims to enhance understanding of how agents optimize their plans in uncertain contexts.
Decision Making Under Uncertainty, Lec #4: Planning and Sensing • UIUC CS 598, Section EA • Professor: Eyal Amir • Spring Semester 2005 • Uses slides by José Luis Ambite, Son Tran, Chitta Baral, and Paolo Traverso's (http://sra.itc.it/people/traverso/) tutorial: http://prometeo.ing.unibs.it/sschool/slides/traverso/traverso-slides.ps.gz • Some slides from http://www-2.cs.cmu.edu/~mmv/planning/handouts/BDDplanning.pdf by Rune Jensen (http://www.itu.dk/people/rmj)
Last Time: Planning by Regression • OneStepPlan(S) in the regression algorithm is the backward image of the set of states S. • It can be computed as the QBF formula: ∃x_{t+1} (States_{t+1}(x_{t+1}) ∧ R(x_t, a, x_{t+1})) • Quantified Boolean Formulas (QBF) expand by substituting both truth values: ∃x (x ∧ y) = (0 ∧ y) ∨ (1 ∧ y), ∀x (x ∧ y) = (0 ∧ y) ∧ (1 ∧ y)
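The backward image above can also be computed explicitly when the state space is small. Below is a minimal explicit-state sketch of OneStepPlan: where the QBF existentially quantifies over successor states, the code enumerates them directly. The two-bit domain, the action name "flip", and the relation encoding are illustrative assumptions, not the lecture's.

```python
def one_step_plan(S, a, R, states):
    """Backward image: {x_t : exists x_{t+1} in S with (x_t, a, x_{t+1}) in R}."""
    return {x for x in states if any((x, a, y) in R for y in S)}

# Hypothetical two-bit domain: action "flip" toggles the first bit.
states = {(0, 0), (0, 1), (1, 0), (1, 1)}
R = {((b0, b1), "flip", (1 - b0, b1)) for (b0, b1) in states}

goal = {(1, 0), (1, 1)}               # states where the first bit is 1
pre = one_step_plan(goal, "flip", R, states)
print(sorted(pre))                    # [(0, 0), (0, 1)]
```

Symbolic planners replace the explicit set comprehension with an OBDD operation, but the set computed is the same.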
Last Time • Planning with no observations: • Can be done using belief states (sets of states) • Belief states can be encoded as OBDDs • Complexity? – later today • Other approaches: • Use model-checking approaches • Approximate belief state, e.g., (Petrick & Bacchus ’02, ‘04)
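The first bullet above (planning with no observations over belief states) can be sketched as breadth-first search in the space of belief states, each belief state a set of world states. The counter domain and action names are hypothetical; real systems would encode the frozensets as OBDDs rather than enumerating them.

```python
from collections import deque

def conformant_plan(init_belief, goal, actions, step):
    """BFS over belief states (frozensets); step(s, a) -> deterministic successor."""
    start = frozenset(init_belief)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        belief, plan = frontier.popleft()
        if belief <= goal:            # goal must hold in EVERY possible state
            return plan
        for a in actions:
            nxt = frozenset(step(s, a) for s in belief)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [a]))
    return None

# Hypothetical domain: state is a counter 0..3; "inc" saturates at 3, "reset" -> 0.
def step(s, a):
    return min(s + 1, 3) if a == "inc" else 0

plan = conformant_plan({0, 1, 2, 3}, {3}, ["inc", "reset"], step)
print(plan)  # ['inc', 'inc', 'inc']
```

Note the blow-up the slide's complexity question alludes to: the search space is sets of states, i.e., exponentially larger than the state space itself.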
The Model Checking Problem Determine whether a formula is true in a model • A domain of interest is described by a semantic model • A desired property of the domain is described by a logical formula • Check whether the domain satisfies the desired property by checking whether the formula is true in the model Motivation: Formal verification of dynamic systems
Now: Sensing Actions • Current solutions for Nondeterministic Planning: • Conditional planning: condition on observations that you make now • Condition on belief state
Medication Example (Deterministic) • Problem • A patient is infected. He can take medicine and get cured if he is hydrated; otherwise, the patient will die. To become hydrated, the patient can drink. The check action allows us to determine whether the patient is hydrated. • Goal: not infected and not dead. • Classical planners cannot solve this kind of problem because • it contains incomplete information: we don't know whether the patient is initially hydrated, and • it has a sensing action: to determine whether he is hydrated, the check action is required.
Planning with sensing actions and incomplete information • How to reason about the knowledge of agents? • What is a plan? • Conditional plans: contain sensing actions and conditionals such as “if-then-else” structure • In contrast - Conformant plans: a sequence of actions which leads to the goal regardless of the value of the unknown fluents in the initial state
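A conditional plan for the medication example can be written and executed as a small interpreter over a nested plan structure. The state encoding (hydrated, infected, dead), the transition function, and the ("if", ...) tuple representation are illustrative assumptions, not the lecture's formal language.

```python
def apply(state, action):
    """Hypothetical transition function for the medication domain."""
    hydrated, infected, dead = state
    if action == "drink":
        return (True, infected, dead)
    if action == "med":
        # Medicine cures a hydrated patient and kills a dehydrated one.
        return (hydrated, False, dead) if hydrated else (hydrated, infected, True)
    return state  # "check" only senses; it does not change the world

def execute(plan, state):
    """Run a conditional plan: a list of actions and ("if", fluent_idx, then, else)."""
    for step in plan:
        if isinstance(step, tuple):
            _, idx, then_branch, else_branch = step
            branch = then_branch if state[idx] else else_branch
            state = execute(branch, state)
        else:
            state = apply(state, step)
    return state

# check; if hydrated then [med] else [drink; med]
plan = ["check", ("if", 0, ["med"], ["drink", "med"])]
for init in [(True, True, False), (False, True, False)]:
    print(execute(plan, init))  # both: (True, False, False) = not infected, not dead
```

Here no conformant plan is safe if medicating a dehydrated patient is fatal, which is exactly why the sensing action and the conditional branch are needed.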
Plan tree examples • [Figure: plan trees of increasing depth, from the empty plan up to nested conditionals branching on sensed fluents f, g, h] • [] • [a] • [a; b] • [a; b; if(f, c, d)] • [a; if(f, [b1; if(g, c1, c2)], [b2; if(h, d1, d2)])]
Plan trees (cont.) • Example: [Figure: a plan tree for the medication problem. The root applies chk and branches on the observation hyd vs. ¬hyd; the hydrated branch applies med, the other applies dr (drink) and then med. Nodes are labeled with (path, time) coordinates: (1,1) at the root, (2,1) and (2,2) after the branch, (3,2) at the leaves.]
Why plan trees? • Think of each node as a state that the agent might be in during plan execution. • The root is the initial state. • Every leaf is a possible final state. • The goal is satisfied if it holds in every final state, i.e., at every leaf of the tree.
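The leaf condition above is easy to state in code: collect the leaves of the tree and require the goal to hold at each one. The tuple representation of trees and the medication-flavored example below are assumptions for illustration.

```python
def leaves(tree):
    """A tree is ("leaf", state) or ("node", action, [children])."""
    if tree[0] == "leaf":
        return [tree[1]]
    _, _, children = tree
    return [s for child in children for s in leaves(child)]

def achieves(tree, goal):
    """The plan succeeds iff the goal holds in every final (leaf) state."""
    return all(goal(s) for s in leaves(tree))

# Hypothetical tree for the medication domain: after check, both branches
# end with the patient cured.
tree = ("node", "check", [
    ("node", "med", [("leaf", "cured")]),
    ("node", "drink", [("node", "med", [("leaf", "cured")])]),
])
print(achieves(tree, lambda s: s == "cured"))  # True
```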
Limitations of Approach • Can condition only on current sensing • No accumulation of knowledge • Forward-search approach – can we do better? • Our regression algorithm from last time: • Regress, and allow merging of sets/actions A,B when there is a sensing action that can distinguish the members of A,B
Sensing Actions • Current solutions for Nondeterministic Planning: • Conditional planning: condition on observations that you make now • Condition on belief state
Conditioning on Belief State • Planning Domain D = ⟨S, A, O, I, T, X⟩ • S: set of states • A: set of actions • O: set of observations • I ⊆ S: initial belief state • T ⊆ S × A × S: transition relation (transition model) • X ⊆ S × O: observation relation (observation model) Due to (Bertoli & Pistore; ICAPS 2004)
Conditioning on Belief State • Plan P = ⟨C, c0, act, evolve⟩ for planning domain D (what we need to find) • C: set of belief states • belief states = contexts in (Bertoli & Pistore '04) • c0 ∈ C: initial belief state • act: C × O → A: action function • evolve: C × O → C: belief-state evolution function • Very similar to belief-state MDPs • Represents an infinite set of executions
Conditioning on Belief State • Configuration (s, o, c, a) for planning domain D: a state of the executor • s ∈ S: world state • o ∈ X(s): observation made in state s • c ∈ C: belief state that the executor holds • a = act(c, o): the action to be taken with this belief state and observation • How do we evolve a configuration?
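One evolution step can be sketched directly from the definitions above: apply the chosen action, pick a nondeterministic successor from T, observe via X, and update the belief state via evolve. The dictionaries below, the two-state domain, and all names in it are hypothetical assumptions, not the (Bertoli & Pistore '04) notation itself.

```python
import random

def evolve_configuration(s, o, c, act, evolve, T, X):
    """One step of executing plan <C, c0, act, evolve> in domain D."""
    a = act[(c, o)]                        # action chosen for (belief, observation)
    s2 = random.choice(sorted(T[(s, a)]))  # nondeterministic successor world state
    o2 = X[s2]                             # observation made in the new state
    c2 = evolve[(c, o2)]                   # updated belief state (context)
    return s2, o2, c2, act[(c2, o2)]

# Hypothetical domain: action "go" from "s" may or may not reach goal state "g".
T = {("s", "go"): {"s", "g"}, ("g", "go"): {"g"}}
X = {"s": "not_there", "g": "there"}
act = {("c0", "not_there"): "go", ("c0", "there"): "go",
       ("c1", "not_there"): "go", ("c1", "there"): "go"}
evolve = {("c0", "not_there"): "c0", ("c0", "there"): "c1",
          ("c1", "there"): "c1", ("c1", "not_there"): "c0"}

s, o, c = "s", X["s"], "c0"
for _ in range(5):
    s, o, c, a = evolve_configuration(s, o, c, act, evolve, T, X)
print(s, o, c)  # belief "c1" exactly when the world state is "g"
```

The retry-until-observed loop here is one concrete instance of the "infinite set of executions" a single finite plan represents.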
Example • A planning problem P for planning domain D = ⟨S, A, O, I, T, X⟩: • I ⊆ S is the set of initial states • G ⊆ S is the set of goal states • [Figure: the state space with the regions I and G marked]
Example: Patient + Wait between Check and Medication • [Figure: the medication plan tree, with chk branching on the observations hyd and ¬hyd; the hydrated branch applies med, the other applies dr (drink) and then med. Nodes are labeled with (path, time) coordinates as on the earlier plan-tree slide.]
Left-Over Issues • Limitation • Languages for specifying nondeterministic effects, sensing (similar to STRIPS?) • Your Presentation • Complexity • Probabilistic domains – next class
Homework • Read readings for next time: [Michael Littman; Brown U Thesis 1996] chapter 2 (Markov Decision Processes)