
Deliberation Scheduling for Planning in Real-Time



Presentation Transcript


  1. Deliberation Scheduling for Planning in Real-Time David J. Musliner Honeywell Laboratories Robert P. Goldman SIFT, LLC Kurt Krebsbach Lawrence University

  2. Outline • Application summary. • Deliberation scheduling problem. • Analytic experiments. • Demonstration tests. • Conclusions.

  3. Planning and Action for Real-Time Control • Adaptive Mission Planner (AMP): decomposes an overall mission into multiple control problems, with limited performance goals designed to make each controller synthesis problem solvable with the available deliberation time and execution resources. • Controller Synthesis Module (CSM): for each control problem, synthesizes a real-time reactive controller according to the constraints sent by the AMP. • Real Time Subsystem (RTS): continuously executes the synthesized control reactions in a hard real-time environment; it does not “pause” while waiting for new controllers.

  4. Controller Synthesis Module (CSM) [Diagram: the CSM takes a problem configuration (initial state description, goal state description, available actions, uncontrollable transitions) and produces a timed-automata controller design and an executable reactive controller.]

  5. AMP Overview • Mission is the main input: threats and goals, specific to different mission phases (e.g., ingress, attack, egress). • Threats are safety-critical: must guarantee to maintain safety (sometimes probabilistically) in worst case, using real-time reactions. • Goals are best-effort: don’t need to guarantee. • Each mission phase requires a plan (or controller), built by the CSM to handle a problem configuration. • Changes in capabilities, mission, environment can lead to need for additional controller synthesis.

  6. AMP Responsibilities • Divide mission into phases, subdividing them as necessary to handle resource restrictions. • Build problem configurations for each phase, to drive CSM. • Modify problem configurations, both internally and via negotiation with other AMPs, to handle resource limitations. • Capabilities (assets). • Bounded rationality: deliberation resources. • Bounded reactivity: execution resources.

  7. AMP Deliberation Scheduling • MDP-based approach for the AMP to adjust CSM problem configurations and algorithm parameters to maximize the expected utility of deliberation. • Issues: • Complex utility function for the overall mission plan. • Survival dependencies between sequenced controllers. • Requires CSM algorithm performance profiles. • Planning that is expected to complete further in the future must be discounted. • Differences from other deliberation scheduling techniques: • CSM planning is not an anytime algorithm: it is more a Las Vegas than a Monte Carlo algorithm. • It is not a problem of trading deliberation against action: deliberation and action proceed in concert. • Survival of the platform is the key concern.

  8. AMP Deliberation Scheduling • Mission phases characterized by: • Probability of survival/failure. • Expected reward. • Expected start time and duration. • Agent keeps reward from all executed phases. • Different CSM problem configuration operators yield different types of plan improvements. • Improve probability of survival. • Improve expected reward (number or likelihood of goals). • Configuration operators can be applied to same phase in different ways (via parameters). • Configuration operators have different expected resource requirements (computation time/space).

  9. Expected Mission Utility • Markov chain behavior in the mission phases: probability of surviving each phase vs. entering an absorbing failure state. • Reward expectations are unevenly distributed across phases. [Diagram: phases 1–5 with per-phase survival probabilities s1–s4, rewards R3 and R5, and failure transitions such as 1-s1 into the absorbing FAILURE state.]
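The Markov-chain utility model above can be made concrete with a short sketch. This is an illustrative reading of the slides, not the authors' code: it assumes the agent keeps the reward of every phase it survives, so each phase's reward is weighted by the product of the survival probabilities of all phases up to and including it.

```python
# Hypothetical sketch of the expected-utility model above (not the authors'
# implementation): each phase's reward counts only if the agent survives
# every phase up to and including that one.

def expected_mission_utility(survival, reward):
    """survival[i] and reward[i] describe mission phase i (0-indexed)."""
    eu = 0.0
    alive = 1.0                  # probability of reaching this point alive
    for s, r in zip(survival, reward):
        alive *= s               # must survive phase i to keep its reward
        eu += alive * r
    return eu

# Made-up five-phase example with reward only in phases 3 and 5,
# echoing the R3 and R5 of the diagram.
eu = expected_mission_utility([0.9, 0.95, 0.9, 0.95, 0.9],
                              [0.0, 0.0, 10.0, 0.0, 20.0])
```

Under this reading, improving any single phase's survival probability raises the expected utility of every later phase as well, which is why survival dependencies between sequenced controllers matter.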

  10. The Actions: CSM Performance Profiles AMP attempts to predict time-to-plan from domain characteristics, so AMP can be smart about configuring CSM problems in time-constrained situations.

  11. Histogram of Same Performance Results [Histogram: CSM planning-time distributions for problems with one to four threats (1T–4T).] • AMP's performance estimate: 80% likely to find a plan within the given number of deliberation quanta (1Q, 2Q, 4Q, 7Q) for the given number of threats (T = threats; Q = 4 seconds). • Note the increasing spread (uncertainty of runtime) as the problem grows.
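One way to read the "80% likely" estimate off empirical runtime data is as an 80th-percentile planning time rounded up to whole deliberation quanta. A minimal sketch, assuming this percentile rule; the function name and sample runtimes are invented for illustration:

```python
import math

Q = 4.0  # seconds per deliberation quantum, as in the slides

def quanta_for_percentile(runtimes, p=0.8):
    """Smallest whole number of quanta by which fraction p of runs finished."""
    xs = sorted(runtimes)
    idx = math.ceil(p * len(xs)) - 1   # index of the p-th percentile sample
    return math.ceil(xs[idx] / Q)

# Made-up runtimes (seconds) for a two-threat problem: 8 of the 10 runs
# finish within two quanta, matching a "2Q for 2T" style estimate.
quanta = quanta_for_percentile(
    [1.2, 2.5, 3.1, 4.8, 5.0, 6.2, 6.9, 7.5, 7.9, 11.0])
```

The increasing spread noted on the slide means these percentile estimates become less tight as the number of threats grows, which is exactly why the AMP needs them: it must budget for the 80% case, not the mean.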

  12. Modeling the Problem as MDP • Actions: commit to 80% success time for CSM plan. • All actions have equal probability of success. • Durations vary. • States: • Sink states: destruction and mission completion. • Other states: vector of survival probabilities. • Utility model: goal achievement + survival.
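The state and action spaces just listed can be sketched as a small data structure. All names and fields here are assumptions for illustration; the slides only specify that states carry per-phase survival probabilities (plus two sink states) and that actions commit to a CSM run at its 80%-success time:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeliberationState:
    """Non-sink MDP state: vector of per-phase survival probabilities."""
    survival: tuple           # e.g. (0.9, 0.95, 0.9, 0.95, 0.9)
    completed: bool = False   # sink: mission completed
    destroyed: bool = False   # sink: platform destroyed

@dataclass(frozen=True)
class DeliberationAction:
    """Commit to a CSM run at its 80%-success time for one phase."""
    phase: int
    duration_s: float         # the 80th-percentile planning time
    new_survival: float       # phase survival probability if the plan succeeds

def apply(state, action):
    """Successor state when the CSM run succeeds (all actions are assumed
    equally likely to succeed, per the slide; only durations vary)."""
    s = list(state.survival)
    s[action.phase] = action.new_survival
    return DeliberationState(tuple(s))
```

This is only a state-transition skeleton; the utility model (goal achievement plus survival) attaches rewards to these states.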

  13. Algorithms • Optimal MDP solution: Bellman backup (finite-horizon problem). • Very computationally expensive. • Greedy one-step lookahead. • Assume only one more computational action will be taken, and choose the best one. • Discounted variant. • Strawmen: shortest-action first, earliest-phase first, etc. • Conducted a number of comparison experiments (results published elsewhere).
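The greedy one-step lookahead can be sketched as rate-of-return maximization: score each candidate CSM action by its expected utility gain per second of deliberation, as if it were the only computational action left. The tuple layout and example numbers below are illustrative assumptions, not from the paper:

```python
def greedy_choice(actions, current_eu):
    """actions: (name, expected_duration_s, eu_if_completed) tuples.
    Returns the action with the best expected-utility gain per second,
    or None if nothing improves on the current plans."""
    best, best_rate = None, 0.0
    for name, duration, eu_done in actions:
        rate = (eu_done - current_eu) / duration
        if rate > best_rate:
            best, best_rate = name, rate
    return best

# Illustrative candidates: harden an ingress plan vs. plan a new goal.
choice = greedy_choice(
    [("harden-phase-2", 8.0, 12.0),      # gain 2.0 over 8 s  -> 0.25/s
     ("add-goal-phase-5", 20.0, 18.0)],  # gain 8.0 over 20 s -> 0.40/s
    current_eu=10.0)
```

As the slides note, this undiscounted version is myopic: a large raw gain in a distant phase can outbid a smaller but urgent near-term improvement, which motivates the discounted variant.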

  14. Discount Factors • Greedy use of the basic expected utility formula requires discounting to account for two important effects: • Window of opportunity for deliberation: there is more future time to deliberate on phases that start later. • Otherwise, large potential improvements in far-out phases can distract from near-term improvements. • Split a phase when a new plan is downloaded during execution: the amount of improvement is limited by the time remaining in the phase.
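These two discounts can be sketched as simple multiplicative factors. The exact formulas below are assumptions for illustration (the slides do not give them): one shrinks gains for phases whose start is far off, since there will be more chances to deliberate on them later; the other caps mid-phase gains by the fraction of the phase still remaining.

```python
def window_discount(raw_gain, phase_start, now, mission_end):
    """Discount gains to phases that start later: they can still be
    improved in the future, so they matter less right now."""
    slack = max(phase_start - now, 0.0)
    horizon = mission_end - now
    return raw_gain * (1.0 - slack / horizon)

def midphase_cap(raw_gain, now, phase_start, phase_end):
    """Once a phase is executing, only the remaining fraction of the
    phase can benefit from a newly downloaded plan."""
    if now <= phase_start:
        return raw_gain
    remaining = max(phase_end - now, 0.0)
    return raw_gain * remaining / (phase_end - phase_start)
```

Under this sketch, a gain in a phase starting halfway through the remaining mission is worth half its raw value now, and a plan downloaded halfway through an executing phase can deliver at most half its improvement.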

  15. Runtime Comparison of Optimal & Greedy

  16. Quality Result for Medium Scenarios

  17. “Medium” Quality Comparison Summary • Discounted greedy agent beats simple greedy agent 79 times, ties 3, loses 2. • Discounted greedy agent averages 86% of optimal expected utility; simple greedy averages 79%. • More difficult domains challenge myopic policies, and crush random policy (73% overall). Discounted greedy beats random 83/84 times. • Even on easy scenarios, optimal is waaaay too slow!

  18. Mission Testing • Modified AMP to incorporate deliberation scheduling algorithms. • Tested three different agents: • S – shortest problem first; • U – simple greedy DS; • DU – greedy with discounting. • Tested in mission with multiple threats and two goals.

  19. Mission Overview [Map: mission waypoints 0–6 spanning the Ingress, Attack, and Egress phases.]

  20. Demo Outcome • Shortest: • Builds all the easy single-threat plans quickly. • Survives the entire mission. • Waits too long before building plans for goal achievement; fails to hit targets. • Utility: • Builds safe plans for most threats • Gets distracted by high-reward goal in egress phase. • Dies in attack phase due to unhandled threat. • Discounted utility: • Completes entire mission successfully.

  21. Expected Payoff vs. Time • The apparent drop in utility is due to a phase update. • Utility chooses badly: it tries to plan for egress but ignores a threat during the attack phase. • Shortest chooses badly: it discards good plans and tries goal plans too late.

  22. Demo 2: Ingress Phase • All three agents are attacked but defend themselves successfully.

  23. Demo 2: Attack Phase • Utility and Discounted utility hit targets. • Utility dies from unhandled threat. • Shortest stays safe but does not strike target.

  24. Demo 2: Second Attack Phase (“Egress”) • Only Discounted utility hits second target. • Shortest stays safe but does not strike target.

  25. Summary

  26. The End

  27. Related Topics • Conventional deliberation scheduling work: • Typically assumes the object-level computation is based on anytime algorithms. • CSM algorithms are not readily converted to anytime: performance improvements are discrete and all-or-nothing. • Because the Real Time System and the AI system run in true parallel, there are no conventional think/act tradeoffs. • Design-to-time is appropriate, but it builds full schedules rather than making single action choices; a comparison may be possible. • MDP solvers: either infinite horizon, or finite horizon with offline policy computation. We have on-line decision making with a dynamic MDP.

  28. Demo Scenario • Three types of threats (IR, radar, radar2) during ingress, attack, and egress phases. • Targets in attack and egress phases. • Overall, there are 41 different valid problem configurations that can be sent to the CSM. Some are unsolvable in the allocated time. • Performance profiles are approximate: • Predicted planning times range from 1 to 60 seconds. • Some configurations take less time than predicted. • Some take more, and time out rather than finishing. • Mission begins as soon as the first plan is available (< 1 second). • Mission lasts approximately 4 minutes. • Building all plans would require 22.3 minutes.
