
A Framework for Planning in Continuous-time Stochastic Domains

Presentation Transcript


  1. A Framework for Planning in Continuous-time Stochastic Domains

  2. Introduction • Policy generation for complex domains • Uncertainty in outcome and timing of actions and events • Time as a continuous quantity • Concurrency • Rich goal formalism • Achievement, maintenance, prevention • Deadlines

  3. Motivating Example • Deliver package from CMU to Honeywell [Map: CMU and PIT airport in Pittsburgh, MSP airport and Honeywell in Minneapolis]

  4. Elements of Uncertainty • Uncertain duration of flight and taxi ride • Plane can get full without reservation • Taxi might not be at airport when arriving in Minneapolis • Package can get lost at airports

  5. Modeling Uncertainty • Associate a delay distribution F(t) with each action/event a • F(t) is the cumulative distribution function for the delay from when a is enabled until it triggers [Figure: Pittsburgh taxi, with event "arrive" ~ U(20,40) taking it from "driving" to "at airport"]
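
The delay-distribution idea can be made concrete with a small sketch. Everything here (the Event class, the sample_delay callable) is illustrative naming rather than the paper's implementation; only the U(20,40) taxi delay comes from the slide.

```python
import random

# Minimal sketch: each action/event carries a delay distribution, sampled when
# the action/event becomes enabled. Class and attribute names are illustrative.

class Event:
    def __init__(self, name, sample_delay):
        self.name = name
        self.sample_delay = sample_delay  # callable returning a delay in minutes

    def sample_trigger_time(self, enabled_at):
        """Time at which the event triggers, given when it became enabled."""
        return enabled_at + self.sample_delay()

# Example: the Pittsburgh taxi arrives at the airport after a U(20,40) delay.
arrive = Event("arrive", lambda: random.uniform(20, 40))
print(arrive.sample_trigger_time(enabled_at=0.0))
```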

  6. Concurrency • Concurrent semi-Markov processes compose into a generalized semi-Markov process [Figure: Pittsburgh taxi (arrive ~ U(20,40)), Minneapolis taxi (move ~ U(10,20), return), and a move event with delay Exp(1/40) evolving concurrently; joint state shown at t=0 and t=24]
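
A minimal sketch of the simulation step behind a generalized semi-Markov process: every enabled event has a sampled trigger time, and the one with the earliest time fires next. The event names, the string "effects", and the reading of Exp(1/40) as rate 1/40 (mean 40 minutes) are assumptions for illustration only.

```python
import random

# Core step of a generalized semi-Markov process (GSMP) simulation:
# fire whichever enabled event has the earliest sampled trigger time.

def next_event(enabled_events):
    """enabled_events: dict mapping event name -> (trigger time, effect)."""
    name, (trigger_time, effect) = min(enabled_events.items(),
                                       key=lambda item: item[1][0])
    return name, trigger_time, effect

# Example: the Pittsburgh taxi's arrival competes with another delayed event.
enabled = {
    "taxi-arrive": (random.uniform(20, 40), "taxi at airport"),
    "plane-move": (random.expovariate(1 / 40), "plane departs"),  # mean 40 min
}
print(next_event(enabled))
```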

  7. Rich Goal Formalism • Goals specified as CSL formulae • Φ ::= true | a | Φ ∧ Φ | ¬Φ | Pr≥θ(φ) • φ ::= Φ U≤t Φ | ◊≤t Φ | □≤t Φ
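
As a concrete representation, the CSL fragment above can be encoded as a small abstract syntax tree. The class names below are our own, and ◊≤t and □≤t are treated as derivable from U≤t rather than given their own classes.

```python
from dataclasses import dataclass
from typing import Union

# Illustrative encoding of the slide's CSL grammar as dataclasses.
# "true" is represented by the Python literal True.

@dataclass
class Atom:             # atomic proposition a
    name: str

@dataclass
class And:              # Phi ∧ Phi
    left: "StateFormula"
    right: "StateFormula"

@dataclass
class Not:              # ¬Phi
    operand: "StateFormula"

@dataclass
class Until:            # Phi U<=t Phi (path formula)
    left: "StateFormula"
    right: "StateFormula"
    bound: float        # time bound t in minutes

@dataclass
class ProbAtLeast:      # Pr>=theta(phi)
    threshold: float
    path_formula: Until

# ◊<=t Phi is "true U<=t Phi"; □<=t Phi is "¬(true U<=t ¬Phi)".
StateFormula = Union[bool, Atom, And, Not, ProbAtLeast]
```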

  8. Goal for Motivating Example • Probability at least 0.9 that the package reaches Honeywell within 300 minutes without getting lost on the way • Pr≥0.9(¬pkg lost U≤300 pkg@Honeywell)
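
Using the illustrative classes sketched under the previous slide (again, the names are our own), this goal reads as:

```python
# Pr>=0.9( ¬pkg lost  U<=300  pkg@Honeywell )
goal = ProbAtLeast(
    threshold=0.9,
    path_formula=Until(
        left=Not(Atom("pkg lost")),
        right=Atom("pkg@Honeywell"),
        bound=300,
    ),
)
```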

  9. Problem Specification • Given: • Complex domain model • Stochastic discrete event system • Initial state • Probabilistic temporally extended goal • CSL formula • Wanted: • Policy satisfying goal formula in initial state

  10. Generate, Test and Debug [Simmons 88] • Generate initial policy • Test if policy is good; if good, done • If bad, debug and repair the policy, then repeat the test
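
A skeleton of the loop on this slide, with the three steps passed in as callables; generate, test, and debug stand in for the components described on the following slides.

```python
def generate_test_debug(generate, test, debug, max_rounds=100):
    """Generate-Test-Debug loop [Simmons 88]: produce an initial policy, test
    it, and repair it from the failed test's sample paths until it passes."""
    policy = generate()
    for _ in range(max_rounds):
        is_good, sample_paths = test(policy)
        if is_good:
            return policy
        policy = debug(policy, sample_paths)   # repair using negative paths
    return policy                              # best effort if never accepted
```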

  11. Generate • Ways of generating initial policy • Generate policy for relaxed problem • Use existing policy for similar problem • Start with null policy • Start with random policy • Not focus of this talk!

  12. Test • Use discrete event simulation to generate sample execution paths • Use acceptance sampling to verify probabilistic CSL goal conditions
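
One way to picture the test step: each simulated execution path yields a single yes/no observation for acceptance sampling by checking the path formula over that path. The path representation and the holds() check below are simplified assumptions, and only the time-bounded until case is shown.

```python
def check_until(path, phi1, phi2, bound):
    """Check "phi1 U<=bound phi2" over one sample path of (time, state) pairs."""
    for time, state in path:
        if time > bound:
            return False          # deadline passed before phi2 ever held
        if holds(state, phi2):
            return True           # goal state reached within the bound
        if not holds(state, phi1):
            return False          # phi1 violated before reaching phi2
    return False

def holds(state, prop):
    # Toy state-formula check: a state is a set of propositions,
    # and "not p" is satisfied when p is absent.
    if prop.startswith("not "):
        return prop[4:] not in state
    return prop in state

# One simulated path for the package example (times in minutes):
path = [(0, {"pkg@CMU"}), (45, {"pkg@PIT"}), (280, {"pkg@Honeywell"})]
print(check_until(path, "not pkg lost", "pkg@Honeywell", bound=300))  # True
```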

  13. Debug • Analyze sample paths generated in test step to find reasons for failure • Change policy to reflect outcome of failure analysis

  14. More on Test Step [Generate-Test-Debug loop from slide 10, with the Test step highlighted]

  15. Error Due to Sampling • Probability of false negative ≤ α (rejecting a good policy) • Probability of false positive ≤ β (accepting a bad policy) • (1−β)-soundness

  16. Acceptance Sampling • Hypothesis: Pr≥θ(φ)

  17. Performance of Test [Plot: probability of accepting Pr≥θ(φ) as true vs. actual probability of φ holding]

  18. Ideal Performance [Plot: probability of accepting Pr≥θ(φ) as true vs. actual probability of φ holding; ideally a step from 0 to 1 at θ, with the false-negative and false-positive regions marked]

  19. Realistic Performance [Plot: probability of accepting Pr≥θ(φ) as true vs. actual probability of φ holding, with an indifference region from θ−δ to θ+δ between the false-negative and false-positive regions]

  20. Sequential Acceptance Sampling [Wald 45] • Hypothesis: Pr≥θ(φ) • After each sample, decide: accept as true, reject as false, or draw another sample

  21. Graphical Representation of Sequential Test [Plot: number of positive samples vs. number of samples]

  22. Graphical Representation of Sequential Test • We can find an acceptance line Aθ,δ,α,β(n) and a rejection line Rθ,δ,α,β(n) given θ, δ, α, and β [Plot: the two lines divide the plane of (number of samples, number of positive samples) into accept, continue sampling, and reject regions]
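
A sketch of how the two lines can be computed, using the standard sequential probability ratio test for Bernoulli samples with indifference region (θ−δ, θ+δ) and error bounds α and β. This follows our reading of the slide; it is not the authors' code.

```python
from math import log

def acceptance_lines(theta, delta, alpha, beta):
    """Return A(n) and R(n): accept Pr>=theta(phi) once the number of positive
    samples d after n samples satisfies d >= A(n); reject once d <= R(n)."""
    p0, p1 = theta + delta, theta - delta      # edges of the indifference region
    slope_num = log((1 - p1) / (1 - p0))       # > 0
    denom = log(p1 / p0) - slope_num           # < 0

    def A(n):  # acceptance line
        return (log(beta / (1 - alpha)) - n * slope_num) / denom

    def R(n):  # rejection line
        return (log((1 - beta) / alpha) - n * slope_num) / denom

    return A, R

def sequential_test(samples, theta, delta, alpha, beta):
    """samples: iterable of booleans, one per simulated path.
    Returns True (accept), False (reject), or None if samples run out."""
    A, R = acceptance_lines(theta, delta, alpha, beta)
    d = 0
    for n, positive in enumerate(samples, start=1):
        d += int(positive)
        if d >= A(n):
            return True
        if d <= R(n):
            return False
    return None
```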

  23. Graphical Representation of Sequential Test • Reject hypothesis [Plot: sample trajectory crossing the rejection line]

  24. Graphical Representation of Sequential Test • Accept hypothesis [Plot: sample trajectory crossing the acceptance line]

  25. Anytime Policy Verification • Find best acceptance and rejection lines after each sample in terms of α and β [Plot: number of positive samples vs. number of samples]

  26. Verification Example • Initial policy for example problem, δ = 0.01 [Plot: error probability vs. CPU time (seconds)]

  27. More on Debug Step [Generate-Test-Debug loop from slide 10, with the Debug step highlighted]

  28. Role of Negative Sample Paths • Negative sample paths provide evidence on how policy can fail • “Counter examples”

  29. Generic Repair Procedure • Select some state along some negative sample path • Change the action planned for the selected state • Need heuristics to make informed state/action choices

  30. Scoring States • Assign −1 to the last state along each negative sample path and propagate the value backwards • Add scores over all negative sample paths [Figure: two example paths, s1 s2 s5 s5 s9 ending in failure and s1 s5 s5 s3 ending in failure, with propagated scores]
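
A sketch of the scoring heuristic: −1 at the end of each negative path, propagated backwards and summed across paths. The discount factor gamma, which makes states closer to the failure count more, is our assumption about how the value decays during propagation.

```python
from collections import defaultdict

def score_states(negative_paths, gamma=0.9):
    """negative_paths: state sequences from sample paths that ended in failure."""
    scores = defaultdict(float)
    for path in negative_paths:
        value = -1.0
        for state in reversed(path):   # walk backwards from the failure
            scores[state] += value
            value *= gamma             # decay as we move away from the failure
    return scores

# The two negative paths from the slide's figure:
paths = [["s1", "s2", "s5", "s5", "s9"], ["s1", "s5", "s5", "s3"]]
print(min(score_states(paths).items(), key=lambda kv: kv[1]))  # most suspect state
```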

  31. Example • Package gets lost at Minneapolis airport while waiting for the taxi • Repair: store package until taxi arrives

  32. Verification of Repaired Policy • δ = 0.01 [Plot: error probability vs. CPU time (seconds)]

  33. Comparing Policies • Use acceptance sampling: • Pair samples from the verification of two policies • Count pairs where the policies differ • Prefer the first policy if, among the pairs where the policies differ, the probability that the first policy is better is at least 0.5
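
A simplified sketch of the comparison: run both policies on paired samples, keep the pairs where exactly one succeeds, and prefer policy 1 if it wins at least half of them. In the framework this decision would itself be made by acceptance sampling (e.g. the sequential test sketched earlier with θ = 0.5); the fixed-sample version below is only for illustration.

```python
def compare_policies(paired_outcomes):
    """paired_outcomes: list of (policy1_succeeded, policy2_succeeded) pairs,
    one pair per simulated problem instance."""
    differing = [(a, b) for a, b in paired_outcomes if a != b]
    if not differing:
        return "policies indistinguishable on these samples"
    wins = sum(1 for a, _ in differing if a)   # pairs where only policy 1 succeeded
    return "prefer policy 1" if wins / len(differing) >= 0.5 else "prefer policy 2"

print(compare_policies([(True, False), (True, True), (False, True), (True, False)]))
```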

  34. Summary • Framework for dealing with complex stochastic domains • Efficient sampling-based anytime verification of policies • Initial work on debug and repair heuristics
