
What’s Planning?


Presentation Transcript


  1. What’s Planning? Derek Long, University of Strathclyde, Glasgow

  2. What’s Planning?
  • Johann: “Diagnosis = Planning (almost)”
  • Rearranging the inequality:
    • Planning is more than diagnosis
  • Hadas: planning is finding counter-examples to disprove LTL formulae
    • Actually, this is a way to view one kind of planning
  • Brian: planning is venerable and geriatric (or was it ‘generative’?)
    • Most interesting planning is much less than 10 years old

  3. What do we need?
  • Start with some assumptions…
  • Assume the world can be described as a set of states
  • Assume that things cause transitions between these states – these things include (controllable) actions, but could also include events and processes
  • Assume that an initial state is (partly) known
  • Assume that the causal relationship between transitions and states is sufficiently predictable that there is a point in considering how to use the controllable actions to direct the transitions of the world

  4. Hybrid Timed Automaton
  • Our world model can be seen as a hybrid timed automaton
  • Finite set of discrete states
  • Associated vector of real-valued variables that can be changed by discrete transitions or by passage of time (under influence of processes)
  • Transitions can be triggered (events) or controlled (actions) and can be non-deterministic (with or without probability distributions)
  • States might be fully observable or only partially observable
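
A minimal sketch of this kind of world model in Python; the class and field names are illustrative, not taken from the talk:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Transition:
    name: str
    controlled: bool                                  # True for an action, False for an event
    guard: Callable[[str, Dict[str, float]], bool]    # applicability test on (mode, variables)
    effect: Callable[[str, Dict[str, float]], Tuple[str, Dict[str, float]]]  # new (mode, variables)

@dataclass
class HybridModel:
    modes: List[str]                                  # finite set of discrete states
    variables: Dict[str, float]                       # real-valued state vector
    flows: Dict[str, Callable[[Dict[str, float], float], Dict[str, float]]]  # continuous change per mode
    transitions: List[Transition]

    def let_time_pass(self, mode: str, dt: float) -> Dict[str, float]:
        """Apply the active processes (the flow of the current mode) for dt time units."""
        return self.flows[mode](self.variables, dt)
```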

  5. What is a plan?
  • A plan is something that tells an executive what to do in order to bring about desirable states of the world
    • Perform an action or wait
  • Desirable states are often states we want to get to (and stop)
    • Classical planning goals
  • Could be properties of states in a path (LTL formula)
  • Could also be determined by some reward function that accumulates reward for visiting states (and perhaps penalises bad states, or actions)
  • In general, we assume that we can map the trajectories determined by a plan to a value such that the higher the value, the better the plan
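
The last bullet can be made concrete with a small, hypothetical plan-quality function: accumulate reward over the states a trajectory visits, penalise the actions used, and treat higher values as better plans:

```python
from typing import Dict, List

def plan_value(trajectory: List[str],
               actions: List[str],
               reward: Dict[str, float],
               action_cost: Dict[str, float]) -> float:
    """Map a trajectory (and the actions that produced it) to a single value."""
    gained = sum(reward.get(state, 0.0) for state in trajectory)
    spent = sum(action_cost.get(action, 0.0) for action in actions)
    return gained - spent
```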

  6. What can an executive do?
  • Simple executives dispatch actions based only on time
    • Wall clock time (actions must be timestamped)
    • Sequenced (actions need only be ordered)
  • More complex executives could dispatch actions based on sensed states and an internal state
  • So, plans must map from the state of the executive and the sensed state of the world to actions (including wait)
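
A sketch of the two kinds of executive just described, with hypothetical Python interfaces (the function and parameter names are illustrative):

```python
import time
from typing import Callable, List, Tuple

def dispatch_timed(plan: List[Tuple[float, str]],
                   execute: Callable[[str], None]) -> None:
    """Time-triggered executive: dispatch each timestamped action when the
    wall clock reaches its start time."""
    start = time.time()
    for timestamp, action in sorted(plan):
        while time.time() - start < timestamp:
            time.sleep(0.01)                       # wait
        execute(action)

def dispatch_reactive(policy: Callable[[str, str], Tuple[str, str]],
                      sense: Callable[[], str],
                      execute: Callable[[str], None],
                      internal: str = "start",
                      steps: int = 100) -> None:
    """State-based executive: the plan maps (internal state, sensed world state)
    to an action (possibly 'wait') and a new internal state."""
    for _ in range(steps):
        action, internal = policy(internal, sense())
        if action != "wait":
            execute(action)
```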

  7. States with Structure
  • Typically, discrete state sets are large, so are represented using assignments to a vector of finite-domain variables
  • For example, consider a world in which vehicles perform a search over a square grid
    • A vehicle occupies a square of the grid and faces north, south, east or west
    • A vehicle can move forward, left or right, completing its move facing the direction it moved in
    • A vehicle can search a square it occupies
  • A state characterises the positions of the vehicles, their facings and the status of the squares (searched or unsearched)
  [Slide figure: a grid of searched/unsearched squares with a vehicle, compass directions (N, S, E, W) and the Move and Search actions]
  • Say 12x12 grid, 4 vehicles: 2^144 × 4^4 × 144^4 ≈ 2.5 × 10^54 states
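
The state count on the slide can be checked directly: each of the 144 squares is searched or unsearched, and each of the 4 vehicles has one of 4 facings and one of 144 positions:

```python
squares, vehicles, facings = 12 * 12, 4, 4
states = 2**squares * facings**vehicles * squares**vehicles
print(f"{states:.2e}")   # ≈ 2.5e+54 states
```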

  8. Planning: Classical and more
  • Classical planning:
    • only finite-domain variables and only deterministic transitions
    • Initial state is fully observable and goal specifies a set of alternative destination states
    • Plan quality is measured by number of actions
  • A plan can be specified as a sequence of actions
    • Determinism means we can be sure that only the states on the path from the initial state to the selected goal state are ever visited
  • Typically hard to find a feasible solution, so optimising is a secondary objective
  • Current best solutions based on heuristic guided search, using relaxations as the basis for heuristics
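
A minimal sketch of classical planning as forward state-space search (plain breadth-first search for brevity; the planners referred to above use heuristic guidance instead). States are sets of facts and actions are (name, preconditions, add effects, delete effects) tuples; all names are illustrative:

```python
from collections import deque
from typing import FrozenSet, List, Optional, Tuple

Action = Tuple[str, frozenset, frozenset, frozenset]   # (name, pre, add, delete)

def forward_search(initial: FrozenSet[str],
                   goal: FrozenSet[str],
                   actions: List[Action]) -> Optional[List[str]]:
    """Return a sequence of action names reaching a state in which all goal facts hold."""
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                        # goal facts all hold
            return plan
        for name, pre, add, delete in actions:
            if pre <= state:                     # action applicable
                successor = (state - delete) | add
                if successor not in visited:
                    visited.add(successor)
                    frontier.append((successor, plan + [name]))
    return None
```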

  9. Progress in Planning
  • A great deal of effort has been spent on finding good relaxations
    • Admissible relaxations guarantee optimal plans: current best based on combination of techniques, including identification of landmarks and automatically constructed pattern databases
  • The classical planning problem is PSPACE-hard, but many benchmark domains are actually only NP-hard
  • A separate approach to planning has been compilation into other solving technologies:
    • SAT, CSP, model-checking
    • None of these approaches is currently competitive with the best dedicated planning systems
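
One concrete relaxation, shown here as an illustration (not necessarily the one used in the systems mentioned): the additive heuristic over the delete relaxation, which ignores delete effects and estimates the cost of a set of facts as the sum of the costs of achieving each fact separately:

```python
import math
from typing import FrozenSet, List, Tuple

Action = Tuple[str, frozenset, frozenset, frozenset]   # (name, pre, add, delete)

def h_add(state: FrozenSet[str], goal: FrozenSet[str], actions: List[Action]) -> float:
    """Additive heuristic over the delete relaxation (delete effects ignored)."""
    cost = {fact: 0.0 for fact in state}
    changed = True
    while changed:                                   # fixpoint over fact costs
        changed = False
        for _, pre, add, _ in actions:
            if all(p in cost for p in pre):
                c = sum(cost[p] for p in pre) + 1.0  # unit action costs
                for fact in add:
                    if c < cost.get(fact, math.inf):
                        cost[fact] = c
                        changed = True
    return sum(cost.get(g, math.inf) for g in goal)  # inf if some goal is unreachable
```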

  10. Temporal Planning
  [Slide figure: a durative action with a Duration, Start Preconditions, End Preconditions, an Invariant Condition, Start Effects and End Effects]
  • Actions embedded in time, with a coupled pair of transitions marking the starts and ends of the durative actions
  • Plan quality usually measured by total duration of plan
  • Current best solutions extend classical planners by coupling the heuristic search to temporal constraint managers (STNs) and using relaxed temporal bounds on earliest application times of actions
  • Temporal uncertainty can be approached using controllable and uncontrollable temporal transitions (STNUs)
  • Planning with time is PSPACE-hard if the number of copies of the same action executing concurrently is bounded, otherwise EXPTIME
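
A sketch of the consistency check at the heart of such an STN-based temporal constraint manager: each constraint t_j − t_i ≤ b is an edge of weight b from timepoint i to timepoint j, and the network is consistent exactly when this graph has no negative cycle (checked here with Floyd–Warshall, for brevity):

```python
import math
from typing import List, Tuple

def stn_consistent(num_timepoints: int,
                   constraints: List[Tuple[int, int, float]]) -> bool:
    """Each constraint (i, j, b) means t_j - t_i <= b. Returns False if the
    constraints admit no schedule (i.e. a negative cycle exists)."""
    n = num_timepoints
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, b in constraints:
        dist[i][j] = min(dist[i][j], b)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return all(dist[i][i] >= 0 for i in range(n))
```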

  11. Trajectory Constraints
  • Planning to satisfy LTL formulae has been explored using approaches based on compiling the formulae to automata, linking these into the existing actions and then applying standard planners
  • Surprisingly effective for interesting constraints

  12. Using real variables
  • Resources can be modelled using real-valued variables
  • Actions can change these values (discretely) and processes can change them as a function of the passage of time
  • Plan quality can be measured by combinations of duration and values of metric variables (e.g. fuel costs, monetary costs, benefits from rewards, etc.)
  • Best current approaches also use heuristic guided search, using bounding-interval or LP relaxations of the MILP constraints generated by discrete action effects and LP relaxations of linear effects
  • Another alternative for continuous processes is discretise-and-validate (can handle non-linear effects)
  • Adding numbers makes planning undecidable, in general…
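
A sketch of the bounding-interval idea (one of the relaxations mentioned above; the representation is illustrative): each numeric variable is tracked as an interval of reachable values, relaxed effects only ever widen the interval, and a numeric condition is treated as achievable if some value in the interval satisfies it:

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]   # (lowest reachable value, highest reachable value)

def relax_increase(bounds: Dict[str, Interval], var: str, amount: float) -> None:
    """Relaxed numeric effect (increase var amount): widen the interval only."""
    low, high = bounds[var]
    if amount >= 0:
        bounds[var] = (low, high + amount)
    else:
        bounds[var] = (low + amount, high)

def achievable(bounds: Dict[str, Interval], var: str, op: str, value: float) -> bool:
    """Is a condition such as (>= fuel 5) satisfied by some reachable value?"""
    low, high = bounds[var]
    return high >= value if op == ">=" else low <= value
```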

  13. For instance
  • Temporal version of the search problem, vehicles moving at different speeds, with fuel limiting the number of moves each vehicle can perform
  • Solve a 10x10 grid problem, 4 vehicles, in under 10 seconds for the balanced problem (makespan 80 versus nominal optimal of 72)
  • Up to 1 minute for unbalanced vehicles (a 255-step plan), within 10% of optimal

  14. Planning under Uncertainty
  • Uncertainty can arise in many forms…
    • Partially observable initial state (and subsequent states)
    • Non-deterministic action effects
    • Uncertainty about duration and resource consumption of actions
  • Plans can no longer specify only what to do in states on the planned trajectory, since uncertainty means we might visit states we had not intended
  • Now require policies:
    • Mapping from current “state” to action (and possibly a new internal state)
    • Here a state might be a world state or a sensed partial world state and an internal state (often a belief state)

  15. Policy construction
  • Finding policies is hard!
  • Usually assume we have a reward function and a cost for actions
    • Reward often assumed to be additive and discounted
    • Plan quality is measured by expected total net reward
  • Bellman equations determine the optimal policy for such a problem…
  • In principle, these can be solved by convergent iterated approximation schemes (policy iteration or value iteration)
  • In practice, this is infeasible for interesting problems
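
For the bullets on Bellman equations and iterated approximation, a minimal value-iteration sketch over an explicitly tabulated MDP (the representation is illustrative, and, as the last bullet says, real problems are far too large for this direct approach):

```python
from typing import Dict, List, Tuple

# P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the
# immediate net reward (reward for the outcome minus the action's cost).
def value_iteration(P: Dict[str, Dict[str, List[Tuple[float, str]]]],
                    R: Dict[str, Dict[str, float]],
                    gamma: float = 0.95,
                    eps: float = 1e-6) -> Dict[str, str]:
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman backup: best expected net reward over the available actions
            best = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                       for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # Extract a greedy policy from the converged value function
    return {s: max(P[s], key=lambda a: R[s][a] +
                   gamma * sum(p * V[s2] for p, s2 in P[s][a]))
            for s in P}
```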

  16. Realistic Policy Building?
  • Partial policies (only offer actions for states that are likely to be visited)
  • Abstraction (grouping of similar states) and other “clumping” techniques
  • Hindsight optimisation:
    • Solve Monte Carlo samples and then improve the policy based on better solutions to samples
  • Partial observability introduces a potential exponential lift in complexity due to handling belief states (or state occupancy probability distributions)
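
A sketch of the hindsight-optimisation loop from the third bullet; sample_determinisation and solve are hypothetical placeholders for a sampler that fixes the uncertain outcomes of one possible future and a deterministic planner that returns the value of the best plan starting with a given action:

```python
from statistics import mean

def hindsight_action(state, candidate_actions, sample_determinisation, solve,
                     num_samples: int = 30):
    """Pick the action whose deterministic sample solutions look best on average."""
    samples = [sample_determinisation(state) for _ in range(num_samples)]
    def estimate(action):
        return mean(solve(sample, action) for sample in samples)
    return max(candidate_actions, key=estimate)
```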

  17. What to do with a partial policy
  • If we find ourselves in a state with no policy mapping then we can extend the policy:
    • On-line planning/replanning (policy extension or repair)
    • Default actions (to attempt to return to nominal trajectory)
