
Learning Control Knowledge for Planning



Presentation Transcript


  1. Learning Control Knowledge for Planning Yi-Cheng Huang

  2. Outline I. Brief overview of planning II. Planning with Control knowledge III. Learning control knowledge IV. Conclusion

  3. I. Overview of Planning • Planning - a very general framework for many applications: • Robot control; • Airline scheduling; • Hubble space telescope control. • Planning – find a sequence of actions that leads from an initial state to a goal state.
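The "find a sequence of actions" formulation can be made concrete with a tiny state-space search. The sketch below is illustrative only: states are sets of ground facts, actions are (name, preconditions, add-effects, delete-effects) tuples, and the one-package logistics domain is invented for this example.

```python
from collections import deque

def bfs_plan(init, goal, actions):
    """Breadth-first search for a shortest action sequence leading
    from init to a state satisfying goal.  States are frozensets of
    ground facts; each action is (name, preconds, adds, dels)."""
    frontier = deque([(frozenset(init), [])])
    seen = {frontier[0][0]}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for name, pre, add, dele in actions:
            if pre <= state:                 # action applicable?
                nxt = (state - dele) | add   # apply its effects
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None

# Hypothetical one-package, two-city logistics instance.
acts = [
    ("load",   {"at(a,BOS)", "plane(BOS)"}, {"in(a)"},      {"at(a,BOS)"}),
    ("fly",    {"plane(BOS)"},              {"plane(SFO)"}, {"plane(BOS)"}),
    ("unload", {"in(a)", "plane(SFO)"},     {"at(a,SFO)"},  {"in(a)"}),
]
print(bfs_plan({"at(a,BOS)", "plane(BOS)"}, {"at(a,SFO)"}, acts))
# ['load', 'fly', 'unload']
```

Exhaustive search like this is exactly what becomes intractable as domains grow, which motivates the complexity results on the next slide.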

  4. Planning Is Difficult – Abundance of Negative Complexity Results • Domain-independent planning: PSPACE-complete or worse (Chapman 1987; Bylander 1991; Backstrom 1993). • Domain-dependent planning: NP-complete or worse (Chenoweth 1991; Gupta and Nau 1992). • Approximate planning: NP-complete or worse (Selman 1994).

  5. Recent State-of-the-art Planners • Constraint-based planners – Graphplan, Blackbox. • Heuristic search planners – HSP, FF. • Both kinds of planners solve in seconds or minutes problems that take traditional planners hours or days.

  6. Graphplan (Blum & Furst, 1995) [diagram: planning graph with alternating fact and action layers at time i and time i+1] Search on the planning graph to find a plan.

  7. Blackbox (Kautz & Selman, 1999) [diagram: problem → SAT encoding → satisfiability tester (Chaff, WalkSat, Satz, RelSat, ...) → plan]

  8. Heuristic Search Based Planning(Bonet & Geffner, ‘97) • Use various heuristic functions to approximate the distance from the current state to the goal state based on the planning graph. • Use Best-First Search or A* search to find plans.
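A minimal sketch of the heuristic search approach. HSP and FF derive their distance estimates from the planning graph; here the count of unsatisfied goal facts stands in for that heuristic, and the fact-set/action-tuple encoding is a toy invented for illustration.

```python
import heapq
import itertools

def best_first_plan(init, goal, actions):
    """Greedy best-first search over states (sets of ground facts).
    h(s) = number of goal facts not yet true in s -- a crude stand-in
    for the planning-graph distance estimates used by HSP and FF.
    Each action is a (name, preconds, adds, dels) tuple."""
    h = lambda s: len(goal - s)
    tie = itertools.count()               # tie-breaker for the heap
    start = frozenset(init)
    frontier = [(h(start), next(tie), start, [])]
    seen = {start}
    while frontier:
        _, _, state, plan = heapq.heappop(frontier)
        if goal <= state:
            return plan
        for name, pre, add, dele in actions:
            if pre <= state:
                nxt = (state - dele) | add
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(
                        frontier, (h(nxt), next(tie), nxt, plan + [name]))
    return None

# Same hypothetical one-package logistics instance as a smoke test.
acts = [
    ("load",   {"at(a,BOS)", "plane(BOS)"}, {"in(a)"},      {"at(a,BOS)"}),
    ("fly",    {"plane(BOS)"},              {"plane(SFO)"}, {"plane(BOS)"}),
    ("unload", {"in(a)", "plane(SFO)"},     {"at(a,SFO)"},  {"in(a)"}),
]
print(best_first_plan({"at(a,BOS)", "plane(BOS)"}, {"at(a,SFO)"}, acts))
# ['load', 'fly', 'unload']
```

Replacing the priority h(s) with g(s) + h(s) (plan length so far plus the estimate) turns this greedy search into A*.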

  9. II. Planning With Control • General focus in planning: avoid search as much as possible. • Many real-world applications are tailored and simplified by domain-specific knowledge. • TLPlan is an efficient planner that uses control knowledge to guide a forward-chaining search (Bacchus & Kabanza 2000).

  10. TLPlan Temporal Logic Control Formula

  11. A Simple Control Rule Example always( goal(at(obj, loc)) ∧ at(obj, loc) → next( at(obj, loc) ) ) Temporal logic operators: “always”, “next”. Meaning: do NOT move an object that is already at its goal location.
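Read operationally, the rule is a pruning predicate: during forward search, any candidate action that would move an object away from its goal location is rejected. A minimal sketch, where the tuple encoding of actions and the (object, location) pair encoding of states and goals are invented for illustration:

```python
def violates_stay_at_goal(action, state, goals):
    """Operational reading of the rule
    always(goal(at(obj, loc)) and at(obj, loc) -> next(at(obj, loc))):
    reject any 'move' of an object already at its goal location.
    Actions are (name, obj, from_loc, to_loc) tuples; state and
    goals are sets of (obj, loc) pairs."""
    name, obj, frm, _to = action
    return name == "move" and (obj, frm) in goals and (obj, frm) in state

state = {("pkg1", "NYC")}
goals = {("pkg1", "NYC")}
print(violates_stay_at_goal(("move", "pkg1", "NYC", "SFO"), state, goals))  # True
print(violates_stay_at_goal(("move", "pkg1", "BOS", "NYC"), state, goals))  # False
```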

  12. Question: Can the same level of control be effectively incorporated into a constraint-based planner?

  13. Control Rule Categories • Rules that involve only static information. • Rules that depend on the current state. • Rules that depend on the current state and require dynamic user-defined predicates.

  14. Category I Control Rules (depend only on the goal; toy example) [diagram: package a in an airplane at location L, with its goal elsewhere] Do NOT unload a package from an airplane if the current location is not the package's goal.

  15. Pruning the Planning Graph with Category I Rules [diagram: fact and action layers of the planning graph with pruned nodes removed]

  16. Effect of Graph Pruning

  17. Category II Control Rules [diagram: airplane carrying package a at location L] Do NOT move an airplane if there is an object in the airplane that needs to be unloaded at that location.

  18. Control by Adding Constraints [diagram: temporal logic control rules compiled into constraint clauses that are added to the planning formula]
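The compilation can be sketched concretely for a category I rule such as "never unload package p at an airport other than p's goal": every offending ground action variable is simply forced false at every time step, i.e. a unit clause per package, wrong airport, and time step is added to the planning formula. A toy clause generator (the variable naming and domain are invented for illustration):

```python
def unload_pruning_clauses(packages, airports, goal_loc, horizon):
    """Emit unit clauses  NOT unload(p, a, t)  for every package p,
    airport a != goal_loc[p], and time step t < horizon.  A clause
    is a list of literals; a negated literal is ('-', var)."""
    clauses = []
    for p in packages:
        for a in airports:
            if a != goal_loc[p]:
                for t in range(horizon):
                    clauses.append([("-", ("unload", p, a, t))])
    return clauses

cls = unload_pruning_clauses(
    ["a", "b"], ["BOS", "NYC"], {"a": "NYC", "b": "BOS"}, horizon=2)
print(len(cls))  # 4: each package barred from one wrong airport at 2 steps
```

Category II rules compile similarly but need literals from two adjacent time steps in each clause, which is why they still have a compact encoding while category III rules do not.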

  19. Rules Without Compact Encoding [diagram: packages a and b to be routed among SFO, DC, NYC, ORL] Do NOT move a vehicle unless (a) there is an object that needs to be picked up, or (b) there is an object in the vehicle that needs to be unloaded.

  20. Complex Encoding for Category III Rules • Need to define extra predicates: need_to_move_by_airplane; need_to_unload_by_airplane. • Introduces extra literals and clauses: O(mn) ground literals and O(mn + km^2) clauses at each time step (m: #cities, n: #objects, k: #airports). • No easy encoding for category III rules. • However, it appears category I & II rules do most of the work.

  21. Blackbox with Control Knowledge (Logistics domain with hand-coded rules) Note: logarithmic time scale

  22. Comparison of Blackbox and TLPlan (Run Time)

  23. Comparison of Blackbox and TLPlan (parallel plan length; “plan quality”)

  24. Summary: Adding Control Knowledge • We have shown how to add declarative control knowledge to a constraint-based planner by using temporal logic statements. • Adding such knowledge gives significant speedups (up to two orders of magnitude). • Pure heuristic search with control can still be faster, but with much lower plan quality.

  25. III. Can we learn domain knowledge from example plans?

  26. Motivation • Control rules used in TLPlan and Blackbox are hand-coded. • Idea: learn control rules from a sequence of small problems solved by the planner.

  27. Learning System Framework [pipeline: problem → Blackbox planner → plan → justification / type inference → ILP learning module / verification → control rules]

  28. Target Concepts for Actions • Action Select Rule: indicates conditions under which the action can be performed immediately. • Action Reject Rule: indicates conditions under which it must not be performed.

  29. Basic Assumptions for Learning Control • Plans found by the planner on simple problems are optimal or near-optimal. • Actions that appear in an optimal plan must be selected. • Actions that can be executed but do not appear in the plan must be rejected.

  30. Definitions • Real action: an action that appears in the plan. • Virtual action: an action whose preconditions hold but which does not appear in the plan.
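Extracting training examples can be sketched as replaying the plan: at each step the chosen action is a real (positive) example, and every other applicable action is a virtual (negative) example. The fact-set/action-tuple encoding and the one-package domain below are invented for illustration; the actual system also records the time step and state of each example.

```python
def label_actions(init, plan, actions):
    """Replay a plan step by step.  The executed action is a 'real'
    example; every other action whose preconditions held at that
    step is a 'virtual' example.  Actions are (name, preconds,
    adds, dels) tuples over sets of ground facts."""
    by_name = {a[0]: a for a in actions}
    state, real, virtual = set(init), [], []
    for step in plan:
        applicable = [a[0] for a in actions if a[1] <= state]
        real.append(step)
        virtual += [a for a in applicable if a != step]
        _, _pre, add, dele = by_name[step]
        state = (state - dele) | add
    return real, virtual

acts = [
    ("load",   {"at(a,BOS)", "plane(BOS)"}, {"in(a)"},      {"at(a,BOS)"}),
    ("fly",    {"plane(BOS)"},              {"plane(SFO)"}, {"plane(BOS)"}),
    ("unload", {"in(a)", "plane(SFO)"},     {"at(a,SFO)"},  {"in(a)"}),
]
real, virtual = label_actions(
    {"at(a,BOS)", "plane(BOS)"}, ["load", "fly", "unload"], acts)
print(real, virtual)  # ['load', 'fly', 'unload'] ['fly']
```

Here "fly" was applicable at time 1 but not executed there, so that occurrence becomes a virtual (reject) example.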

  31. A Toy Planning Example [diagram: packages a and b with initial and goal locations among BOS, NYC, SFO]

  32. Real & Virtual Actions for UnloadAirplane
      Time 1: LoadAirplane (P a BOS)
      Time 2: FlyAirplane (P SFO NYC); virtual: UnloadAirplane (P a BOS)
      Time 3: LoadAirplane (P b NYC); virtual: UnloadAirplane (P a NYC)
      Time 4: FlyAirplane (P NYC SFO); virtual: UnloadAirplane (P a NYC), UnloadAirplane (P b NYC)
      Time 5: UnloadAirplane (P a SFO), UnloadAirplane (P b SFO)
      Real actions appear in the plan; virtual actions were applicable but unused.

  33. Heuristics for Extracting Examples

  34. Rule Induction • Based on Quinlan’s FOIL (Quinlan 1990; 1996). Literals: • Xi = Xj, e.g., loc1 = loc2 • P(X1,…, Xn), e.g., at(pkg, loc) • goal(P(X1,…, Xn)), e.g., goal(at(pkg, loc)) • negation of the above
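The induction loop can be sketched as FOIL's greedy covering step: repeatedly add the literal with the highest information gain until no negative (virtual-action) example is covered. The version below is heavily simplified and propositional; the candidate tests, feature names, and examples are invented stand-ins for FOIL's first-order literals.

```python
import math

def foil_gain(test, pos, neg):
    """FOIL's information gain for adding `test` to the rule body:
    p1 * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), where p0/n0 and
    p1/n1 are positives/negatives covered before and after."""
    p0, n0 = len(pos), len(neg)
    p1 = sum(1 for e in pos if test(e))
    n1 = sum(1 for e in neg if test(e))
    if p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def learn_rule(tests, pos, neg):
    """One clause of FOIL's covering loop: greedily conjoin the
    highest-gain tests until no negative example remains covered."""
    body = []
    while neg:
        name = max(tests, key=lambda t: foil_gain(tests[t], pos, neg))
        if foil_gain(tests[name], pos, neg) <= 0:
            break                       # no discriminating test left
        body.append(name)
        pos = [e for e in pos if tests[name](e)]
        neg = [e for e in neg if tests[name](e)]
    return body

# Hypothetical reject-UnloadAirplane examples: boolean features of
# (state, action) pairs; 'goal_elsewhere' means apt != pkg's goal.
pos = [{"goal_elsewhere": True,  "in_airplane": True}]   # virtual: reject
neg = [{"goal_elsewhere": False, "in_airplane": True}]   # real: executed
tests = {
    "goal_elsewhere": lambda e: e["goal_elsewhere"],
    "in_airplane":    lambda e: e["in_airplane"],
}
print(learn_rule(tests, pos, neg))  # ['goal_elsewhere']
```

On this toy data the learner keeps only the discriminating feature, mirroring how the reject rule on the following slides grows one literal at a time.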

  35. Reject Rule: UnloadAirplane UnloadAirplane (pln pkg apt)

  36. Reject Rule: UnloadAirplane UnloadAirplane (pln pkg apt) goal(at (pkg loc))

  37. Reject Rule: UnloadAirplane UnloadAirplane (pln pkg apt) goal(at (pkg loc)) ^(apt != loc)

  38. Learning Time

  39. Logistics Domain

  40. Learned Logistics Control Rules If an object’s goal location is in a different city, do NOT unload the object from an airplane. Unload an object from a truck if the current location is an airport and is not in the same city as the object’s goal location.

  41. Briefcase Domain

  42. Grid Domain

  43. Gripper Domain

  44. Mystery Domain

  45. Tireworld Domain

  46. Summary of Learning for Planning • Introduced an inductive logic programming methodology into the constraint-based planning framework to obtain a “trainable planner”. • Demonstrated clear practical speedups on a range of benchmark problems.

  47. IV. Single-agent vs. Multi-agent Planning • Observation: heuristic planners degrade rapidly in multi-agent settings; they tend to assign all work to a single agent. • We studied this phenomenon by exploring different work-load distributions.

  48. Forcing the Planners • There is no easy way to modify heuristic search planners to find better-quality plans. • Limit the number of actions each agent can perform, to force the planners to find plans with the same level of participation from all agents.

  49. Sokoban Domain

  50. Restricted Sokoban Domain
