Refinement Planning

Refinement Planning • CSE 574 • April 15, 2003 • Dan Weld

Planning Applications • RAX/PS (The NASA Deep Space planning agent) • HSTS (Hubble Space Telescope scheduler) • Similar work in planning Earth observation: satellite / plane • Spacecraft Repair / Workflow • Shuttle refurbishment • Optimum AIV system • Elevator control • Koehler's Miconic domain, fielded in skyscrapers • Airport ground-traffic control • Company of Wolfgang Hatzack

Planning Applications 2 • Diagnose, reconfigure power distribution systems • Sylvie Thiebaux / EDF • Data Transformation • VICAR (JPL image enhancing system); • CELWARE (CELCorp) • Online games & training • “Intelligent” characters • Classics (fielded?) • Robotics • NLP to database interfaces

Planning Applications 3 • Control of an Australian Brewery (SIPE) • Desert Storm Logistics Planning • “DART saved more $$ during this campaign than the whole DARPA budget for the past 25 years”

More Administrivia • No class Fri 4/18 • But: • Read p1-30 Boutilier, Dean & Hanks • No review necessary • Experiment with 1 Planner • Write a review of the planner • Plan Project • What • Who: 1-3 person groups

Project 1: Goal Selection • Input • Init state • Action schemata, each with assoc resource cost = f(act) • Goals, each with assoc utility = f(g) • Resource bound • Output • Plan maximizing utility subject to resource bound

Project 2: Embedded Agent • Implement Simulator • Takes init state + action schemata as input • Communicates with agent • Integrate MDP Agent • SPUDD, GPT, or ?? • Extensions • Augment to take user-specified goals • Incremental policy changes • One-shot vs recurring reward • Real time issues

Project 3: Incomplete Info & Time • Extend SAPA or other temporal planner • Sensory effects • Handle information gathering goals @time • Interleaved execution or conditional plans • (Likely adopt ideas from Petrick & Bacchus)

A recent (turbulent) history of planning • 1970s-1995 (UCPOP): The whole world believed in POP and was happy to stack 6 blocks! UCPOP, Zeno [Penberthy & Weld], IxTeT [Ghallab et al] • 1995: Advent of CSP-style compilation approach: Graphplan [Blum & Furst], SATPLAN [Kautz & Selman]. Use of reachability analysis and disjunctive constraints • 1997 (UNPOP): Domination of heuristic state search approach: HSP/R [Bonet & Geffner], UNPOP [McDermott]: POP is dead! Importance of good domain-independent heuristics • 2000 - (RePOP): Hoffmann’s FF, a state search planner, won the AIPS-00 competition! … but NASA’s highly publicized RAX still a POP dinosaur! POP believed to be a good framework to handle temporal and resource planning [Smith et al, 2000]

In the beginning it was all POP. Then it was cruelly UnPOPped. The good times return with Re(vived)POP.

Too many brands of classical planners • Planning as Theorem Proving (Green’s planner) • Planning as Search • Search in the space of States (progression, regression, MEA): STRIPS, PRODIGY, TOPI, HSP, HSP-R, UNPOP, FF • Search in the space of Plans (total order, partial order, protections, MTC): Interplan, SNLP, TOCL, UCPOP, TWEAK • Search in the space of Task networks (reduction of non-primitive tasks): NOAH, NONLIN, O-Plan, SIPE • Planning as Model Checking • Planning as CSP/ILP/SAT/BDD: Graphplan, IPP, STAN, SATPLAN, Blackbox, GP-CSP, BDDPlan

A Unifying View (overview diagram; parts labeled PART I, PART 2, PART 3) • 1.0 Refinement Planning • Candidate set semantics: What are plans? Refinements? • Search: FSS, BSS, PS • 1.1 Conjunctive Refinement Planning • 1.2 Disjunctive Refinement Planning • How are sets of plans represented compactly? How are they refined? How are they searched? • Representations: Graph-based, CSP, SAT, ILP, BDD • Control • Heuristics/Optimizations: Domain Analysis*, Reachability, Relevance, Relax subgoal interactions, Directed partial consistency enforcement • Domain-customization • Hand-coded: HTN Schemas, TL Formulas, Cutting Planes • Learned: Case-based, Abstraction-based, Failure-based

Main Points • Framework • Types of refinements • Presatisfaction, preordering, tractability • Refinement vs. solution extraction • Splitting as a way to decrease soln extraction time … at a cost • Use of disjunctive representations

Tradeoffs among Basic Strategies • State Space • Progression/regression must commit to both position and relevance of actions (regression can judge relevance, sort of, but handles sets of states) • (+) Gives state information (easier plan validation) • (-) Leads to premature commitment, but better heuristic guidance • (-) Too many states when actions have durations • Plan Space • Plan-space refinement (PSR) avoids constraining position • (+) Reduces commitment (large candidate set per branch), but harder to get a heuristic estimate • (-) Increases plan-validation costs • (+) Easily extendible to actions with duration

(Dis)advantages of partial order planning • Criteria: action position and relevance, branching factor, depth of search tree, maintenance goals, durative actions • The Commitment Angle • Progression/regression planners commit to both position and relevance. PS planners commit only to relevance. • Unnecessary commitments increase the chance of backtracking, but also make it easier to validate/evaluate the partial plan • The Heuristic Angle • Estimating the distance of a partial plan from a flawless solution plan is conceptually harder than estimating the distance of a set of states from the init state, which in turn is harder than estimating the cost of a single state from the goal state

Weaknesses • Numerous terms, far from first use • Is this interesting? • While (I still have candidate plans) If I have a solution plan, return it Else, improve the existing plans EndWhile • Interleaving different strategies • Dynamically determine which strategy to use • Exploiting learning, symmetry in planning
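The While loop on this slide can be written out as a short Python sketch. This is only an illustration of the generic loop, not any particular planner: the names `refine_search`, `is_solution`, and `refine`, and the toy problem of building a step sequence that sums to 4, are all assumptions made for the example.

```python
from collections import deque

def refine_search(initial, is_solution, refine):
    """Generic refinement loop over candidate plans (breadth-first order)."""
    candidates = deque([initial])        # plans still under consideration
    while candidates:                    # "While (I still have candidate plans)"
        plan = candidates.popleft()
        if is_solution(plan):            # "If I have a solution plan, return it"
            return plan
        candidates.extend(refine(plan))  # "Else, improve the existing plans"
    return None                          # candidate set exhausted, no solution

# Hypothetical toy problem: a "plan" is a tuple of steps (1 or 2) summing to 4.
def is_solution(plan):
    return sum(plan) == 4

def refine(plan):
    # Extend the plan by one step, pruning candidates that overshoot the target.
    return [plan + (step,) for step in (1, 2) if sum(plan) + step <= 4]

result = refine_search((), is_solution, refine)
print(result)
```

The interesting design questions from the slide (which refinement to apply, how to pick among strategies) all live inside `refine` and the queue discipline; the loop itself stays this simple.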

Future Work • Filling in holes • Can unsupervised learning be used? • For supervised learning, • Can sufficient training samples be obtained? • Can one extend refinement strategies • E.g. planning under uncertainty?

Transition System Perspective • Model agent-env. dynamics as transition systems • A transition system is a 2-tuple <S, A> where • S is a set of states • A is a set of actions, each action a being a subset of S × S • Viewed as graphs: states = nodes, actions = edges • If transitions are not deterministic, then the edges will be “hyper-edges” • Agent may know that its initial state is in a subset S’ of S • If the env. is not fully observable, then |S’| > 1 • Consider some subset Sg of S as desirable • Finding a plan is equivalent to finding a (shortest) path in the graph corresponding to the transition system
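The last bullet can be illustrated with a minimal Python sketch, assuming deterministic actions and a single known initial state (so |S’| = 1). The function name `shortest_plan` and the four-state example system are hypothetical; each action is stored as a set of (s, s') pairs, matching the "subset of S × S" definition.

```python
from collections import deque

def shortest_plan(transitions, initial_state, goal_states):
    """Breadth-first search over the transition graph.

    transitions: dict mapping action name -> set of (s, s') pairs.
    Returns a shortest list of action names reaching a goal state, or None.
    """
    frontier = deque([(initial_state, [])])
    visited = {initial_state}
    while frontier:
        state, plan = frontier.popleft()
        if state in goal_states:
            return plan
        for action, pairs in transitions.items():
            for src, dst in pairs:
                if src == state and dst not in visited:
                    visited.add(dst)
                    frontier.append((dst, plan + [action]))
    return None

# Hypothetical transition system with states s0..s3 and actions a, b.
transitions = {
    "a": {("s0", "s1"), ("s1", "s2")},
    "b": {("s0", "s2"), ("s2", "s3")},
}
plan = shortest_plan(transitions, "s0", {"s3"})
print(plan)
```

With several possible initial states or non-deterministic actions, a single path is no longer enough, which is where the later material on policies and conditional plans comes in.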

Transition System Models • A transition system is a two-tuple <S, A> where • S is a set of “states” • A is a set of “transitions”; each transition a is a subset of S × S • If a is a (partial) function, then it is a deterministic transition • Otherwise, it is a “non-deterministic” transition • It is a stochastic transition if there are probabilities associated with each state a takes s to • Finding plans becomes equivalent to finding “paths” in the transition system • Each action in this model can be represented by incidence matrices (e.g. below) • The set of all possible transitions will then simply be the SUM of the individual incidence matrices • Transition system models are called “explicit state-space” models • In general, we would like to represent the transition systems more compactly, e.g. state-variable representation of states. These latter are called “factored” models
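A small Python sketch of the incidence-matrix view, under stated assumptions: the three states and two actions are made up, the helper names are hypothetical, and the "SUM" is read here as an element-wise Boolean OR so that entries stay 0/1.

```python
states = ["s0", "s1", "s2"]
index = {s: i for i, s in enumerate(states)}

def incidence_matrix(pairs):
    """0/1 matrix with m[i][j] = 1 iff the action takes state i to state j."""
    m = [[0] * len(states) for _ in states]
    for src, dst in pairs:
        m[index[src]][index[dst]] = 1
    return m

def is_deterministic(m):
    """A (partial) function has at most one successor per state (row)."""
    return all(sum(row) <= 1 for row in m)

def matrix_sum(*matrices):
    """Element-wise OR: all transitions available under any of the actions."""
    n = len(states)
    return [[int(any(m[i][j] for m in matrices)) for j in range(n)]
            for i in range(n)]

# Hypothetical actions, each a subset of S x S.
a = incidence_matrix({("s0", "s1")})
b = incidence_matrix({("s0", "s2"), ("s1", "s2")})
all_transitions = matrix_sum(a, b)
print(all_transitions)
```

Each matrix has |S| × |S| entries, which is exactly why explicit state-space models become unwieldy and factored models are preferred.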

Manipulating Transition Systems

MDPs = general transition systems • A Markov Decision Process (MDP) is a general (deterministic or non-deterministic) transition system where the states have “rewards” • In the general case, all states can have varying amounts of reward • Planning is defined as finding a “policy” • A mapping from states to actions • which has the maximal expected reward
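One standard way to compute such a policy is value iteration; the slide does not name an algorithm, so this is a swapped-in illustration. The sketch assumes a discounted infinite-horizon criterion with discount `gamma`, state-based rewards `R[s]`, and a made-up two-state MDP; all names are hypothetical.

```python
def value_iteration(states, actions, P, R, gamma=0.9, iters=100):
    """P[(s, a)] -> list of (probability, next_state); R[s] -> reward in s."""
    def expected(V, s, a):
        # Expected value of the successor state after doing a in s.
        return sum(p * V[t] for p, t in P[(s, a)])

    V = {s: 0.0 for s in states}
    for _ in range(iters):
        # Bellman backup: reward now plus discounted best expected future value.
        V = {s: R[s] + gamma * max(expected(V, s, a) for a in actions)
             for s in states}
    # Greedy policy with respect to the final value estimates.
    policy = {s: max(actions, key=lambda a: expected(V, s, a)) for s in states}
    return V, policy

# Hypothetical MDP: being in "right" is rewarded; "move" from "left" may fail.
states = ["left", "right"]
actions = ["stay", "move"]
R = {"left": 0.0, "right": 1.0}
P = {
    ("left", "stay"): [(1.0, "left")],
    ("left", "move"): [(0.8, "right"), (0.2, "left")],
    ("right", "stay"): [(1.0, "right")],
    ("right", "move"): [(1.0, "left")],
}
V, policy = value_iteration(states, actions, P, R)
print(policy)
```

The result is a mapping from every state to an action rather than a single action sequence, which is what distinguishes a policy from a plan.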

Problems with transition systems • Transition systems are a great conceptual tool • …However direct manipulation of transition systems tends to be too cumbersome • The size of the explicit graph corresponding to a transition system is often very large • The remedy is to provide “compact” representations • Start by explicating the structure of the “states” • e.g. states specified in terms of state variables • Represent actions not as incidence matrices but rather functions specified directly in terms of the state vars • An action will work in any state where some state variables have certain values. • When it works, it will change the values of certain (other) state variables
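The last two bullets describe actions as functions over state variables. A minimal Python sketch of that idea follows, assuming a STRIPS-like encoding with a precondition dictionary and an effect dictionary; the `Action` class, the helper names, and the `make_r` action are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    precond: dict  # values that some state variables must have
    effect: dict   # values assigned to certain (other) state variables

def applicable(state, action):
    """The action works in any state that matches its precondition."""
    return all(state.get(var) == val for var, val in action.precond.items())

def apply_action(state, action):
    """Return the successor state; variables not in the effect are unchanged."""
    if not applicable(state, action):
        raise ValueError(f"{action.name} is not applicable in {state}")
    return {**state, **action.effect}

# Hypothetical action over variables P, Q, R: needs P, makes R true.
make_r = Action("make_r", precond={"P": True}, effect={"R": True})
state = {"P": True, "Q": False, "R": False}
successor = apply_action(state, make_r)
print(successor)
```

This one small description stands in for every (s, s') pair in which P holds, instead of listing those pairs in an incidence matrix.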

Factoring States • 3 prop variables: P, Q, R • 8 world states
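The count on this slide is easy to check by enumeration. The sketch below is only an illustration; the variable names come from the slide and everything else is an assumption.

```python
from itertools import product

variables = ["P", "Q", "R"]

# Every assignment of True/False to the variables is one world state.
world_states = [dict(zip(variables, values))
                for values in product([False, True], repeat=len(variables))]

print(len(world_states))
```

In general n propositional variables give 2^n world states, so the factored description grows linearly while the explicit state space grows exponentially.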

Boolean Functions and BDDs • A Boolean function maps {P, Q, R} -> {T/F} • [Figure: BDD with decision nodes P and Q and terminal nodes T and F]
