
A Hybridized Planner for Stochastic Domains


Presentation Transcript


  1. A Hybridized Planner for Stochastic Domains Mausam and Daniel S. Weld University of Washington, Seattle Piergiorgio Bertoli ITC-IRST, Trento

  2. Planning under Uncertainty (ICAPS’03 Workshop) • Qualitative (disjunctive) uncertainty • Which real problem can you solve? • Quantitative (probabilistic) uncertainty • Which real problem can you model?

  3. The Quantitative View • Markov Decision Process • models uncertainty with probabilistic outcomes • general decision-theoretic framework • algorithms are slow • do we need the full power of decision theory? • is an unconverged partial policy any good?

  4. The Qualitative View • Conditional Planning • Model uncertainty as logical disjunction of outcomes • exploits classical planning techniques → FAST • ignores probabilities → poor solutions • how bad are purely qualitative solutions? • can we improve the qualitative policies?

  5. HybPlan: A Hybridized Planner • combines probabilistic + disjunctive planners • produces good solutions in intermediate running times • anytime: makes effective use of resources • terminates with a bounded quality guarantee • Quantitative View • completes the partial probabilistic policy by using qualitative policies in some states • Qualitative View • improves qualitative policies in the more important regions

  6. Outline • Motivation • Planning with Probabilistic Uncertainty (RTDP) • Planning with Disjunctive Uncertainty (MBP) • Hybridizing RTDP and MBP (HybPlan) • Experiments • Conclusions and Future Work

  7. Markov Decision Process < S, A, Pr, C, s0, G > • S: a set of states • A: a set of actions • Pr: probabilistic transition model • C: cost model • s0: start state • G: a set of goals. Find a policy π: S → A that • minimizes expected cost to reach a goal • for an indefinite horizon • in a fully observable Markov decision process. The optimal cost function J* yields an optimal policy. (A minimal data-structure sketch follows.)
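
To fix notation for the sketches in the rest of this transcript, the tuple might be held in a structure like this (a minimal Python sketch; all names are illustrative, not from the paper):

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

State = str
Action = str

@dataclass
class MDP:
    """Container for the tuple < S, A, Pr, C, s0, G > (illustrative names)."""
    states: Set[State]
    actions: Dict[State, List[Action]]                    # A(s): applicable actions
    prob: Dict[Tuple[State, Action], Dict[State, float]]  # Pr(s' | s, a)
    cost: Dict[Tuple[State, Action], float]               # C(s, a) > 0
    start: State                                          # s0
    goals: Set[State]                                     # G: absorbing, zero cost
```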

  8. Example [grid-world figure: start state s0 and the Goal; annotations mark a longer path, a region in which all states are dead ends, and a wrong direction from which the goal is still reachable]

  9. Optimal State Costs [figure: the grid annotated with each state's optimal expected cost-to-goal]

  10. Optimal Policy [figure: the grid annotated with the optimal action at each state]

  11. Bellman Backup: Create better approximation to cost function @ s

  12. Bellman Backup: Create better approximation to cost function @ s Trial = simulate greedy policy & update visited states

  13. Real Time Dynamic Programming (Barto et al. ’95; Bonet & Geffner ’03) Bellman Backup: create a better approximation to the cost function at s. Trial = simulate the greedy policy & update visited states. Repeat trials until the cost function converges. (A sketch follows.)
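
Under the MDP sketch above, RTDP's two core operations might look like the following (a hedged sketch: with the zero initialization the estimates are admissible, so J stays a lower bound on J*; the visit counts anticipate HybPlan's threshold test):

```python
import random

def bellman_backup(mdp: MDP, J: dict, s: State) -> Action:
    """Backup at s: J(s) <- min_a [ C(s,a) + sum_s' Pr(s'|s,a) * J(s') ].
    Returns the greedy (arg-min) action; unseen states default to cost 0."""
    best_q, best_a = float("inf"), None
    for a in mdp.actions[s]:
        q = mdp.cost[(s, a)] + sum(p * J.get(t, 0.0)
                                   for t, p in mdp.prob[(s, a)].items())
        if q < best_q:
            best_q, best_a = q, a
    J[s] = best_q
    return best_a

def rtdp_trial(mdp: MDP, J: dict, visits: dict, max_depth: int = 1000) -> None:
    """One trial: simulate the greedy policy from s0, backing up every
    visited state, until a goal (or the depth cutoff) is reached."""
    s, depth = mdp.start, 0
    while s not in mdp.goals and depth < max_depth:
        visits[s] = visits.get(s, 0) + 1
        a = bellman_backup(mdp, J, s)
        succ = mdp.prob[(s, a)]
        s = random.choices(list(succ), weights=list(succ.values()))[0]
        depth += 1
```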

  14. Planning with Disjunctive Uncertainty < S, A, T, s0, G > • S: a set of states • A: a set of actions • T: disjunctive transition model • s0: the start state • G: a set of goals • Find a strong-cyclic policy π: S → A • that guarantees reaching a goal • for an indefinite horizon • in a fully observable planning problem

  15. Model Based Planner (Bertoli et al.) • States, transitions, etc. represented logically • Uncertainty → multiple possible successor states • Planning Algorithm • iteratively removes “bad” states • Bad = states that don't reach anywhere or reach other bad states (see the sketch below)
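
A non-symbolic caricature of that pruning loop, only to make the “remove bad states” idea concrete (MBP itself represents states and transitions symbolically with BDDs; here trans[s][a] is simply the set of possible successors):

```python
def strong_cyclic_states(states: set, trans: dict, goals: set) -> set:
    """Simplified fixpoint: keep a state only if some action keeps every
    outcome inside the surviving set while a goal stays reachable from it;
    everything else is 'bad' and gets pruned."""
    good = set(states)
    while True:
        reach = goals & good          # goal reachable inside `good`
        grew = True
        while grew:
            grew = False
            for s in good - reach:
                for succs in trans.get(s, {}).values():
                    if succs <= good and succs & reach:
                        reach.add(s)
                        grew = True
                        break
        if reach == good:
            return good               # fixpoint reached: no bad states left
        good = reach                  # prune the bad states and repeat
```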

  16. MBP Policy [figure: MBP's policy on the grid, a sub-optimal solution that still reaches the Goal]

  17. Outline • Motivation • Planning with Probabilistic Uncertainty (RTDP) • Planning with Disjunctive Uncertainty (MBP) • Hybridizing RTDP and MBP (HybPlan) • Experiments • Conclusions and Future Work

  18. HybPlan Top Level Code • 0. run MBP to find a solution to the goal • 1. run RTDP for some time • 2. compute the partial greedy policy (π_rtdp) • 3. compute the hybridized policy (π_hyb) by • π_hyb(s) = π_rtdp(s) if visited(s) > threshold • π_hyb(s) = π_mbp(s) otherwise • 4. clean π_hyb by removing • dead ends • probability-1 cycles • 5. evaluate π_hyb • 6. save the best policy obtained so far • repeat until (1) resources are exhausted or (2) a satisfactory policy is found (transcribed below)
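
A Python transcription of that loop, built on the sketches above (break_prob1_cycles and evaluate are sketched after the slides that illustrate them below; the threshold, time budget, and target error are illustrative parameters, not the paper's values):

```python
import time

def hybplan(mdp: MDP, mbp_policy: dict, threshold: int = 5,
            time_budget: float = 60.0, target_error: float = 0.1) -> dict:
    """Top-level HybPlan loop; mbp_policy maps each state MBP solved
    to its action (step 0)."""
    J, visits = {}, {}
    best_policy, best_cost = None, float("inf")
    t0 = time.time()
    while time.time() - t0 < time_budget:           # resources not exhausted
        rtdp_trial(mdp, J, visits)                  # 1. run RTDP for some time
        hyb = {}                                    # 2.-3. hybridized policy
        for s in visits:
            if visits[s] > threshold:
                hyb[s] = bellman_backup(mdp, J, s)  # pi_rtdp(s): greedy action
            elif s in mbp_policy:
                hyb[s] = mbp_policy[s]              # pi_mbp(s)
        break_prob1_cycles(hyb, mdp, mbp_policy)    # 4. clean dead ends / cycles
        cost = evaluate(hyb, mdp)                   # 5. J(pi_hyb) from s0
        if cost < best_cost:                        # 6. save best policy so far
            best_policy, best_cost = hyb, cost
        if best_cost - J.get(mdp.start, 0.0) <= target_error:
            break   # upper bound within target_error of RTDP's lower bound
    return best_policy
```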

  19. First RTDP Trial (step 1: run RTDP for some time) [figure: the grid with every state's cost estimate initialized to 0]

  20. Bellman Backup (step 1: run RTDP for some time) Q1(s, N) = 1 + 0.5 × 0 + 0.5 × 0 = 1; likewise Q1(s, S) = Q1(s, W) = Q1(s, E) = 1, so J1(s) = 1. Let the greedy action be North. [figure: backup at the first visited state of the grid]

  21. Simulation of Greedy Action (step 1, continued) [figure: the greedy action is simulated and the trial moves to a sampled successor state]

  22. Continuing First Trial (step 1, continued) [figure: the trial continues, backing up each visited state]

  23. Continuing First Trial (step 1, continued) [figure: the trial continues toward the Goal]

  24. Finishing First Trial (step 1, continued) [figure: the trial reaches the Goal]

  25. Cost Function after First Trial [figure: the grid with cost estimates updated along the visited path]

  26. Partial Greedy Policy (step 2: compute the partial greedy policy π_rtdp) [figure: greedy actions at the states visited so far]

  27. Construct Hybridized Policy w/ MBP (step 3: compute the hybridized policy π_hyb, with threshold = 0) [figure: π_rtdp actions at sufficiently visited states, MBP actions elsewhere]

  28. Evaluate Hybridized Policy (steps 5-6: evaluate π_hyb and store it) After the first trial, J(π_hyb) = 5. [figure: expected costs under the hybridized policy] (An evaluation sketch follows.)
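
Step 5 is ordinary policy evaluation restricted to the states π_hyb covers; a fixed-point sketch (after cleaning, every non-goal successor should itself be covered by the policy, so defaulting uncovered states to 0 only matters for goals):

```python
def evaluate(policy: dict, mdp: MDP, iters: int = 1000, tol: float = 1e-6) -> float:
    """Iteratively solve J(s) = C(s, pi(s)) + sum_s' Pr(s'|s, pi(s)) * J(s')
    over the policy's states; returns the expected cost from s0."""
    Jp = {s: 0.0 for s in policy}
    for _ in range(iters):
        delta = 0.0
        for s, a in policy.items():
            v = mdp.cost[(s, a)] + sum(p * Jp.get(t, 0.0)
                                       for t, p in mdp.prob[(s, a)].items())
            delta = max(delta, abs(v - Jp[s]))
            Jp[s] = v
        if delta < tol:
            break
    return Jp.get(mdp.start, float("inf"))
```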

  29. Second Trial [figure: a second simulated trajectory and its backups]

  30. Partial Greedy Policy [figure: the partial greedy policy after the second trial]

  31. Absence of MBP Policy An MBP policy doesn't exist at one visited state: there is no path to the goal from it. [figure: the dead-end state on the grid, left uncovered by π_hyb]

  32. Third Trial [figure: a third simulated trajectory and its backups]

  33. Partial Greedy Policy [figure: the partial greedy policy after the third trial]

  34. Probability 1 Cycles repeat: find a state s in the cycle, set π_hyb(s) = π_mbp(s), until the cycle is broken [figure: the greedy policy contains a cycle that never reaches the Goal]

  35. Probability 1 Cycles [figure: repair step, one cycle state switched to its MBP action]

  36. Probability 1 Cycles [figure: the check is repeated on the remaining cycle]

  37. Probability 1 Cycles [figure: a further repair step]

  38. Probability 1 Cycles [figure: the cycle is broken and the policy reaches the Goal] (A sketch of this repair loop follows.)
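
A sketch of that repair loop, which also covers the dead-end removal of step 4 (the reachability test is an illustrative detector, not MBP's or HybPlan's actual code):

```python
def break_prob1_cycles(hyb: dict, mdp: MDP, mbp_policy: dict) -> None:
    """While some state covered by hyb cannot possibly reach a goal under
    hyb's own transitions (a dead end or a probability-1 cycle), switch it
    to MBP's proper action, or drop it if MBP found no policy there."""
    while True:
        can_reach = set(mdp.goals)   # states with some outcome path to a goal
        grew = True
        while grew:
            grew = False
            for s, a in hyb.items():
                if s not in can_reach and any(t in can_reach
                                              for t in mdp.prob[(s, a)]):
                    can_reach.add(s)
                    grew = True
        trapped = set(hyb) - can_reach
        if not trapped:
            return                   # no cycles or dead ends remain
        s = trapped.pop()            # find a state s in the cycle
        if s in mbp_policy:
            hyb[s] = mbp_policy[s]   # hyb(s) = mbp(s)
        else:
            del hyb[s]               # dead end: MBP has no action either
```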

  39. Error Bound J*(s0) ≤ 5, since the stored hybridized policy achieves J(π_hyb) = 5 after the first trial, and J*(s0) ≥ 1, RTDP's current admissible estimate, so Error(π_hyb) = 5 - 1 = 4. [figure: the grid with the cost estimates behind both bounds]
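
In code, the reported bound is just the gap between two quantities HybPlan already maintains (variable names from the sketches above):

```python
# Upper bound: expected cost of the current hybridized policy from s0.
upper = evaluate(hyb, mdp)       # 5 after the first trial of the example
# Lower bound: RTDP's admissible cost estimate at the start state.
lower = J.get(mdp.start, 0.0)    # 1 in the example
error_bound = upper - lower      # 4; terminate once this is small enough
```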

  40. Termination • when a policy with the required error bound is found • when planning time is exhausted • when available memory is exhausted Properties • outputs a proper policy • anytime algorithm (once MBP terminates) • HybPlan = RTDP if infinite resources are available • HybPlan = MBP if resources are extremely limited • HybPlan is better than both, otherwise

  41. Outline • Motivation • Planning with Probabilistic Uncertainty (RTDP) • Planning with Disjunctive Uncertainty (MBP) • Hybridizing RTDP and MBP (HybPlan) • Experiments • Anytime Properties • Scalability • Conclusions and Future Work

  42. Domains • NASA Rover Domain • Factory Domain • Elevator Domain

  43. Anytime Properties [figure: solution quality over planning time, HybPlan compared with RTDP]

  44. Anytime Properties [figure: a further anytime comparison with RTDP]

  45. Scalability [figure: scalability results]

  46. Conclusions • First algorithm that integrates disjunctive and probabilistic planners • Experiments show that HybPlan is • anytime • scales better than RTDP • produces better-quality solutions than MBP • can interleave planning and execution

  47. Hybridized Planning: A General Notion • Hybridize other pairs of planners: an optimal or close-to-optimal planner with a sub-optimal but fast planner, to yield a planner that produces a good-quality solution in intermediate running times • Examples • POMDP: RTDP/PBVI with POND/MBP/BBSP • Oversubscription Planning: A* with greedy solutions • Concurrent MDP: Sampled RTDP with single-action RTDP
