Policy Generation for Continuous-time Stochastic Domains with Concurrency


Presentation Transcript


  1. Policy Generation for Continuous-time Stochastic Domains with Concurrency

  2. Introduction
  • Policy generation for asynchronous stochastic systems
  • Rich goal formalism
  • Policy generation and repair
  • Solve relaxed problem using deterministic temporal planner
  • Decision tree learning to generalize plan
  • Sample path analysis to guide repair

  3. Motivating Example
  • Deliver package from CMU to Honeywell
  (Map: CMU and the PIT airport in Pittsburgh; the MSP airport and Honeywell in Minneapolis)

  4. Elements of Uncertainty
  • Uncertain duration of flight and taxi ride
  • Plane can get full without reservation
  • Taxi might not be at airport when arriving in Minneapolis
  • Package can get lost at airports
  Asynchronous events ⇒ not semi-Markov

  5. Asynchronous Events
  • While the taxi is on its way to the airport, the plane may become full
  (Timeline: the fill-plane event triggers at time t0 during the driving-taxi activity, changing the state from "plane not full" to "plane full"; the taxi's arrival time distribution F(t) becomes the conditional distribution F(t | t > t0))

  6. Rich Goal Formalism
  • Goals specified as CSL formulae
  • φ ::= true | a | φ ∧ φ | ¬φ | P≥θ(φ U≤T φ)
  • Goal example: "Probability is at least 0.9 that the package reaches Honeywell within 300 minutes without getting lost on the way"
  • P≥0.9(¬lost(package) U≤300 at(pkg, honeywell))
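
The grammar above is the time-bounded fragment of CSL. As a minimal sketch of its key operator (standard CSL semantics, not taken from the slides), the probabilistic until formula constrains the measure of satisfying paths σ from the current state s, where σ@t denotes the state occupied at time t:

  \[
    s \models P_{\ge\theta}\bigl(\phi_1 \, U^{\le T} \phi_2\bigr)
    \iff
    \Pr\bigl\{\sigma : \exists\, t \le T.\ \sigma@t \models \phi_2
      \ \wedge\ \forall\, t' < t.\ \sigma@t' \models \phi_1 \bigr\} \ge \theta
  \]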

  7. Problem Specification
  • Given:
    • Complex domain model (stochastic discrete event system)
    • Initial state
    • Probabilistic temporally extended goal (CSL formula)
  • Wanted:
    • Policy satisfying goal formula in initial state

  8. Generate, Test and Debug [Simmons, AAAI-88]
  Generate initial policy → test if policy is good → if good, done; if bad, debug and repair the policy and repeat the test.

  9. Generate
  • Ways of generating initial policy:
    • Generate policy for relaxed problem
    • Use existing policy for similar problem
    • Start with null policy
    • Start with random policy

  10. Test [Younes et al., ICAPS-03]
  • Use discrete event simulation to generate sample execution paths
  • Use acceptance sampling to verify probabilistic CSL goal conditions
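
A rough sketch of how such a test step could be coded (an illustration of acceptance sampling in general, not the exact procedure of the ICAPS-03 paper): simulate sample paths, check each against the time-bounded until goal, and run Wald's sequential probability ratio test with an indifference region of half-width δ around the threshold θ. The simulate_path and satisfies_goal callables are hypothetical stand-ins for the discrete event simulator and the path checker.

  import math

  def sprt_verify(simulate_path, satisfies_goal, theta, delta=0.05,
                  alpha=0.05, beta=0.05, max_samples=100000):
      """Acceptance sampling via Wald's SPRT.
      Tests H0: p >= theta + delta against H1: p <= theta - delta,
      with error bounds alpha and beta.
      Returns True if the goal probability is accepted to be >= theta."""
      p0 = min(theta + delta, 1.0 - 1e-9)         # upper end of indifference region
      p1 = max(theta - delta, 1e-9)               # lower end of indifference region
      accept_h1 = math.log((1.0 - beta) / alpha)  # reject the goal at/above this
      accept_h0 = math.log(beta / (1.0 - alpha))  # accept the goal at/below this
      log_ratio = 0.0
      for _ in range(max_samples):
          if satisfies_goal(simulate_path()):     # hypothetical simulator + checker
              log_ratio += math.log(p1 / p0)
          else:
              log_ratio += math.log((1.0 - p1) / (1.0 - p0))
          if log_ratio >= accept_h1:
              return False
          if log_ratio <= accept_h0:
              return True
      return log_ratio <= 0.0                     # undecided: pick the likelier hypothesis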

  11. Debug
  • Analyze sample paths generated in test step to find reasons for failure
  • Change policy to reflect outcome of failure analysis

  12. Closer Look at Generate Step
  (Generate-test-debug loop with the Generate step highlighted: Generate initial policy → Test if policy is good → Debug and repair policy)

  13. Policy Generation
  Probabilistic planning problem
  → eliminate uncertainty →
  Deterministic planning problem
  → solve using temporal planner (e.g. VHPOP [Younes & Simmons, JAIR 20]) →
  Temporal plan
  → generate training data by simulating plan →
  State-action pairs
  → decision tree learning →
  Policy (decision tree)
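
A minimal sketch of the last two boxes in this pipeline, assuming states are encoded as boolean proposition vectors and using scikit-learn's DecisionTreeClassifier as a stand-in for the tree learner (the feature names and training rows below are made up for illustration):

  from sklearn.tree import DecisionTreeClassifier

  # Hypothetical state encoding: one boolean feature per ground proposition.
  FEATURES = ["at(pgh-taxi,cmu)", "at(me,cmu)", "in(me,plane)"]

  # State-action pairs obtained by simulating the temporal plan (illustrative).
  X = [[1, 1, 0],            # taxi and I are both at CMU
       [0, 1, 0],            # taxi has left, I am still at CMU
       [0, 0, 1]]            # I am on the plane
  y = ["enter-taxi", "idle", "idle"]

  # Generalize the plan into a policy: a decision tree from state features to actions.
  policy_tree = DecisionTreeClassifier().fit(X, y)

  def act(state):
      """Return the action the learned policy prescribes for a state vector."""
      return policy_tree.predict([state])[0]

  print(act([1, 1, 0]))      # -> 'enter-taxi'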

  14. Conversion to Deterministic Planning Problem
  • Assume we can control nature:
    • Exogenous events are treated as actions
    • Actions with probabilistic effects are split into multiple deterministic actions
    • Trigger time distributions are turned into interval duration constraints
  • Objective: find some execution trace satisfying the path formula φ1 U≤T φ2 of the probabilistic goal P≥θ(φ1 U≤T φ2)

  15. Generating Training Data
  (Simulated execution of the temporal plan; each visited state s0-s6 is paired with the action chosen there, giving state-action training pairs:)
  s0: enter-taxi(me, pgh-taxi, cmu)
  s1: depart-taxi(me, pgh-taxi, cmu, pgh-airport)
  s2: idle
  s3: leave-taxi(me, pgh-taxi, pgh-airport)
  s4: check-in(me, plane, pgh-airport)
  s5: idle
  …
  (Transitions are labeled by the triggering actions and events, e.g. arrive-taxi(pgh-taxi, cmu, pgh-airport), arrive-plane(plane, pgh-airport, mpls-airport), depart-plane(plane, pgh-airport, mpls-airport).)

  16. Policy Tree
  (Decision tree representing the learned policy: internal nodes test state propositions such as at(pgh-taxi, cmu), at(me, cmu), at(plane, mpls-airport), at(mpls-taxi, mpls-airport), at(me, pgh-airport), at(me, mpls-airport), in(me, plane), moving(pgh-taxi, cmu, pgh-airport), and moving(mpls-taxi, mpls-airport, honeywell); leaves prescribe actions such as enter-taxi, depart-taxi, check-in, leave-taxi, and idle.)

  17. Closer Look at Debug Step
  (Generate-test-debug loop with the Debug step highlighted: Generate initial policy → Test if policy is good → Debug and repair policy)

  18. Policy Debugging
  Sample execution paths
  → sample path analysis →
  Failure scenarios
  → solve deterministic planning problem taking failure scenario into account →
  Temporal plan
  → generate training data by simulating plan →
  State-action pairs
  → incremental decision tree learning [Utgoff et al., MLJ 29] →
  Revised policy
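
A minimal sketch of the last step of this pipeline, under the same state encoding as before; refitting the tree on the combined data is used here as a simple stand-in for the Utgoff-style incremental induction cited on the slide, which restructures the existing tree instead of relearning it from scratch:

  from sklearn.tree import DecisionTreeClassifier

  def revise_policy(old_X, old_y, new_X, new_y):
      """Fold state-action pairs generated from the failure-scenario plan
      into the existing training data and rebuild the policy tree."""
      X = list(old_X) + list(new_X)
      y = list(old_y) + list(new_y)
      return DecisionTreeClassifier().fit(X, y)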

  19. Sample Path Analysis
  • Construct Markov chain from paths
  • Assign values to states
    • Failure: -1; Success: +1
    • All other: V(s) = γ · Σs′ P(s′ | s) · V(s′)
  • Assign values to events
    • V(e) = sum of V(s′) - V(s) over transitions s→s′ caused by e
  • Generate failure scenarios

  20. Sample Path Analysis: Example
  Sample paths:
  s0 —e1→ s1 —e2→ s2
  s0 —e1→ s1 —e4→ s4 —e2→ s2
  s0 —e3→ s3
  Markov chain estimated from the paths (γ = 0.9):
  s0 → s1 with probability 2/3, s0 → s3 with probability 1/3, s1 → s2 with probability 1/2, s1 → s4 with probability 1/2, s4 → s2 with probability 1
  State values:
  V(s0) = -0.213, V(s1) = -0.855, V(s2) = -1, V(s3) = +1, V(s4) = -0.9
  Event values:
  V(e1) = 2·(V(s1) - V(s0)) = -1.284
  V(e2) = (V(s2) - V(s1)) + (V(s2) - V(s4)) = -0.245
  V(e3) = V(s3) - V(s0) = +1.213
  V(e4) = V(s4) - V(s1) = -0.045
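
A minimal sketch that reproduces the numbers above (the paths, γ = 0.9, and the ±1 terminal values are taken from the slide; everything else is an assumed implementation): estimate the Markov chain from the sample paths, iterate V(s) = γ·Σ P(s′|s)·V(s′) for non-terminal states, and score each event by summing V(s′) - V(s) over the transitions it caused.

  from collections import Counter, defaultdict

  # Sample paths from the example: alternating states and triggering events.
  paths = [
      ["s0", "e1", "s1", "e2", "s2"],
      ["s0", "e1", "s1", "e4", "s4", "e2", "s2"],
      ["s0", "e3", "s3"],
  ]
  terminal = {"s2": -1.0, "s3": +1.0}   # failure and success states
  gamma = 0.9

  # Count transitions to estimate the Markov chain; remember which event caused each.
  counts = defaultdict(Counter)
  transitions = []
  for p in paths:
      for i in range(0, len(p) - 2, 2):
          s, e, nxt = p[i], p[i + 1], p[i + 2]
          counts[s][nxt] += 1
          transitions.append((e, s, nxt))

  # State values: terminal states are fixed; others get V(s) = gamma * sum P(s'|s) V(s').
  states = set(counts) | {nxt for c in counts.values() for nxt in c}
  values = {s: terminal.get(s, 0.0) for s in states}
  for _ in range(100):                  # plain value iteration; converges quickly here
      for s, succ in counts.items():
          total = sum(succ.values())
          values[s] = gamma * sum(n / total * values[nxt] for nxt, n in succ.items())

  # Event values: sum of V(s') - V(s) over the transitions each event caused.
  event_values = Counter()
  for e, s, nxt in transitions:
      event_values[e] += values[nxt] - values[s]

  print({s: round(v, 3) for s, v in sorted(values.items())})
  # {'s0': -0.213, 's1': -0.855, 's2': -1.0, 's3': 1.0, 's4': -0.9}
  print({e: round(v, 3) for e, v in sorted(event_values.items())})
  # {'e1': -1.284, 'e2': -0.245, 'e3': 1.213, 'e4': -0.045}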

  21. Failure Scenarios
  Failure paths:
  s0 —e1→ s1 —e2→ s2
  s0 —e1→ s1 —e4→ s4 —e2→ s2

  22. Additional Training Data
  (Simulated execution of the plan for the failure scenario; state-action training pairs:)
  s0: leave-taxi(me, pgh-taxi, cmu)
  s1: make-reservation(me, plane, cmu)
  s2: enter-taxi(me, pgh-taxi, cmu)
  s3: depart-taxi(me, pgh-taxi, cmu, pgh-airport)
  s4: idle
  s5: idle
  …
  (Transitions are labeled by the triggering actions and events, e.g. fill-plane(plane, pgh-airport), arrive-taxi(pgh-taxi, cmu, pgh-airport), depart-plane(plane, pgh-airport, mpls-airport).)

  23. Revised Policy Tree
  (Revised decision tree: in addition to tests from slide 16 such as at(pgh-taxi, cmu) and at(me, cmu), internal nodes now also test has-reservation(me, plane); leaves prescribe actions such as make-reservation, enter-taxi, depart-taxi, and leave-taxi.)

  24. Summary
  • Planning with stochastic asynchronous events using a deterministic planner
  • Decision tree learning to generalize the deterministic plan
  • Sample path analysis for generating failure scenarios to guide plan repair

  25. Coming Attractions
  • Decision-theoretic planning with asynchronous events:
    • "A formalism for stochastic decision processes with asynchronous events", MDP Workshop at AAAI-04
    • "Solving GSMDPs using continuous phase-type distributions", AAAI-04
