
Deliberation Scheduling for Planning in Real-Time

David J. Musliner

Honeywell Laboratories

Robert P. Goldman

SIFT, LLC
Kurt Krebsbach

Lawrence University

Outline

  • Application summary.
  • Deliberation scheduling problem.
  • Analytic experiments.
  • Demonstration tests.
  • Conclusions.
Planning and Action for Real-Time Control

Adaptive Mission Planner (AMP): Decomposes an overall mission into multiple control problems, with limited performance goals designed to make each controller synthesis problem solvable with the available time and execution resources.

Controller Synthesis Module (CSM): For each control problem, synthesizes a real-time reactive controller according to the constraints sent from the AMP.

Real-Time Subsystem (RTS): Continuously executes the synthesized control reactions in a hard real-time environment; it does not “pause” while waiting for new controllers. (A schematic sketch follows the diagram below.)

[Diagram: Adaptive Mission Planner → Controller Synthesis Module → Real-Time Subsystem]
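To make the concurrency concrete, here is a minimal schematic sketch of the three-component architecture in Python. All class and function names are illustrative assumptions, not the actual system's code; the point is only that the RTS keeps executing its current controller while the AMP and CSM deliberate in parallel.

```python
import queue
import threading
import time

controller_queue = queue.Queue()        # CSM -> RTS handoff
current_controller = [lambda: None]     # reaction set currently executing

def real_time_subsystem(stop: threading.Event) -> None:
    """Never pauses: each tick, swap in the newest controller if one has
    arrived, then execute the current one."""
    while not stop.is_set():
        try:
            current_controller[0] = controller_queue.get_nowait()
        except queue.Empty:
            pass                        # keep running the old controller
        current_controller[0]()         # execute real-time reactions
        time.sleep(0.01)

def synthesize(config: str):
    """Stand-in for the CSM: real synthesis may take seconds to minutes."""
    time.sleep(0.1)
    return lambda: None                 # a trivial "controller"

def adaptive_mission_planner(phases) -> None:
    """Decompose the mission into per-phase problem configurations and
    ship each synthesized controller to the RTS as it becomes ready."""
    for config in phases:
        controller_queue.put(synthesize(config))

stop = threading.Event()
threading.Thread(target=real_time_subsystem, args=(stop,), daemon=True).start()
adaptive_mission_planner(["ingress", "attack", "egress"])
stop.set()
```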

Controller Synthesis Module (CSM)

[Diagram: a problem configuration, consisting of an initial state and a goal state, is the input to the Controller Synthesis Module, which outputs a real-time reactive controller.]
AMP Overview
  • Mission is the main input: threats and goals, specific to different mission phases (e.g., ingress, attack, egress).
    • Threats are safety-critical: the system must guarantee that safety is maintained (sometimes probabilistically) in the worst case, using real-time reactions.
    • Goals are best-effort: their achievement need not be guaranteed.
  • Each mission phase requires a plan (or controller), built by the CSM to handle a problem configuration.
  • Changes in capabilities, mission, or environment can create the need for additional controller synthesis.
AMP Responsibilities
  • Divide mission into phases, subdividing them as necessary to handle resource restrictions.
  • Build problem configurations for each phase, to drive CSM.
  • Modify problem configurations, both internally and via negotiation with other AMPs, to handle resource limitations.
    • Capabilities (assets).
    • Bounded rationality: deliberation resources.
    • Bounded reactivity: execution resources.
AMP Deliberation Scheduling
  • An MDP-based approach for the AMP to adjust CSM problem configurations and algorithm parameters to maximize the expected utility of deliberation.
  • Issues:
    • Complex utility function for the overall mission plan.
    • Survival dependencies between sequenced controllers.
    • Requires CSM algorithm performance profiles.
    • Planning that is expected to complete further in the future must be discounted.
  • Differences from other deliberation scheduling techniques:
    • CSM planning is not an anytime algorithm; it is more a Las Vegas than a Monte Carlo algorithm.
    • It is not a problem of trading deliberation against action: deliberation and action proceed in concert.
    • Survival of the platform is a key concern.
AMP Deliberation Scheduling (cont.)
  • Mission phases characterized by:
    • Probability of survival/failure.
    • Expected reward.
    • Expected start time and duration.
  • Agent keeps reward from all executed phases.
  • Different CSM problem configuration operators yield different types of plan improvements.
    • Improve probability of survival.
    • Improve expected reward (number or likelihood of goals).
  • Configuration operators can be applied to same phase in different ways (via parameters).
  • Configuration operators have different expected resource requirements (computation time/space). A data-model sketch follows.
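The phase and operator model above maps naturally onto a small data structure. The sketch below is one plausible representation; all field names are assumptions chosen to mirror the bullets, not the AMP's actual internals.

```python
from dataclasses import dataclass

@dataclass
class MissionPhase:
    p_survive: float        # probability of surviving the phase as planned so far
    expected_reward: float  # reward kept if the phase is executed
    start_time: float       # expected start time (seconds into the mission)
    duration: float         # expected duration (seconds)

@dataclass
class ConfigOperator:
    phase: int              # index of the phase whose configuration it modifies
    dp_survive: float       # predicted improvement to survival probability
    d_reward: float         # predicted improvement to expected reward
    expected_time: float    # expected CSM computation time (seconds)
```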
Expected Mission Utility

  • Markov chain behavior in the mission phases: in each phase, the platform either survives or enters an absorbing failure state.
  • Reward expectations are unevenly distributed across phases (a worked sketch follows).

[Figure: Markov chain of mission phases with an absorbing failure state.]
The Actions: CSM Performance Profiles

The AMP attempts to predict time-to-plan from domain characteristics, so that it can configure CSM problems intelligently in time-constrained situations.

Histogram of Sample Performance Results

[Histogram: distributions of CSM planning time for problems with one to four threats (1T–4T), binned by deliberation quanta (1Q, 2Q, 4Q, 7Q), where T = threats and one quantum Q = 4 seconds.]

  • AMP’s performance estimate: 80% likely to find a plan within a given number of deliberation quanta for a given number of threats (one way to compute this is sketched below).
  • Note the increasing spread (runtime uncertainty) as the problem grows.
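One plausible way to reduce such a histogram to the AMP's "80% likely" estimate is an empirical percentile computation over observed CSM runtimes. The sketch below uses the slide's 4-second quantum; the runtime data are invented for illustration.

```python
import math

Q = 4.0  # one deliberation quantum, in seconds (from the slide)

def quanta_for_success(runtimes, target=0.80):
    """Number of quanta within which `target` of observed CSM runs finished
    (an empirical percentile estimate)."""
    ordered = sorted(runtimes)
    idx = max(0, math.ceil(target * len(ordered)) - 1)
    return math.ceil(ordered[idx] / Q)

# Invented runtimes (seconds) for, say, two-threat problems:
print(quanta_for_success([2.1, 3.0, 4.4, 5.2, 6.9, 7.5, 8.1, 9.0, 11.2, 25.0]))
```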

Modeling the Problem as MDP
  • Actions: commit the CSM to a planning problem for its 80%-success time.
    • All actions have equal probability of success.
    • Durations vary.
  • States:
    • Sink states: destruction and mission completion.
    • Other states: vector of survival probabilities.
  • Utility model: goal achievement + survival.
  • Optimal MDP solution: Bellman backup (finite horizon problem).
    • Very computationally expensive.
  • Greedy one-step lookahead (see the sketch after this list).
    • Assume only one more computational action will be taken, and choose the best one.
    • A discounted variant is also used.
  • Strawmen: shortest-action first, earliest-phase first, etc.
  • Conducted a number of comparison experiments (results published elsewhere).
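Here is a self-contained sketch of the greedy one-step lookahead under the model above: commit to an action's 80%-success time, assume every action has the same 80% chance its plan completes, and pick the action with the largest expected utility gain. The improvement model and all names are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Phases = List[Tuple[float, float]]  # (p_survive, expected_reward) per phase

def utility(phases: Phases) -> float:
    u, alive = 0.0, 1.0
    for p, r in phases:
        alive *= p
        u += alive * r
    return u

@dataclass(frozen=True)
class Action:
    phase: int    # which mission phase's controller this planning job improves
    dp: float     # survival-probability improvement if the plan completes
    dr: float     # expected-reward improvement if the plan completes
    quanta: int   # deliberation time committed (the 80%-success time)

    def result(self, phases: Phases) -> Phases:
        p, r = phases[self.phase]
        out = list(phases)
        out[self.phase] = (min(1.0, p + self.dp), r + self.dr)
        return out

def greedy_choice(actions: List[Action], phases: Phases) -> Optional[Action]:
    base = utility(phases)
    best, best_gain = None, 0.0
    for a in actions:
        # Slide model: every action has the same (80%) probability of success.
        gain = 0.8 * (utility(a.result(phases)) - base)
        if gain > best_gain:
            best, best_gain = a, gain
    return best

# Example: improve attack-phase survival vs. add an egress goal.
phases = [(0.95, 0.0), (0.80, 10.0), (0.90, 5.0)]
print(greedy_choice([Action(1, 0.10, 0.0, 2), Action(2, 0.0, 4.0, 3)], phases))
```

Note that on these illustrative numbers, the undiscounted greedy choice picks the high-reward egress improvement over the near-term survival fix, mirroring the "gets distracted" behavior seen in the demo outcomes below.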
Discount Factors
  • Greedy use of the basic expected utility formula requires discounting to account for two important effects (sketched below):
    • Window of opportunity for deliberation: there is more future time in which to deliberate on phases that start later.
      • Otherwise, large potential improvements in far-off phases can distract from near-term improvements.
    • Phase splitting when a new plan is downloaded mid-execution: the amount of improvement is limited by the time remaining in the phase.
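The slides name these two effects but not their formulas, so the following sketch uses assumed functional forms purely for illustration.

```python
def window_discount(now, phase_start, horizon):
    """Downweight improvements to phases that start later: there is more
    future time in which to deliberate on them, so their gains should not
    crowd out near-term work. Linear decay is an assumption."""
    slack = max(0.0, phase_start - now)
    return max(0.0, 1.0 - slack / horizon)

def split_phase_discount(finish_time, phase_start, phase_end):
    """If a new plan arrives mid-phase, only the remaining fraction of the
    phase benefits from the improvement."""
    remaining = max(0.0, phase_end - max(finish_time, phase_start))
    return remaining / (phase_end - phase_start)

# E.g., a plan finishing 150 s into a phase spanning [100 s, 200 s]
# captures only half of its nominal improvement:
print(split_phase_discount(150.0, 100.0, 200.0))  # -> 0.5
```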
“Medium” Quality Comparison Summary
  • Discounted greedy agent beats simple greedy agent 79 times, ties 3, loses 2.
  • Discounted greedy agent averages 86% of optimal expected utility; simple greedy averages 79%.
  • More difficult domains challenge myopic policies and crush the random policy (73% overall). Discounted greedy beats random 83 of 84 times.
  • Even on easy scenarios, computing the optimal policy is far too slow!
Mission Testing
  • Modified AMP to incorporate deliberation scheduling algorithms.
  • Tested three different agents:
    • S – shortest problem first;
    • U – simple greedy DS;
    • DU – greedy with discounting.
  • Tested in mission with multiple threats and two goals.
Mission Overview

[Figure: mission overview map.]
Demo Outcome
  • Shortest:
    • Builds all the easy single-threat plans quickly.
    • Survives the entire mission.
    • Waits too long before building plans for goal achievement; fails to hit targets.
  • Utility:
    • Builds safe plans for most threats.
    • Gets distracted by high-reward goal in egress phase.
    • Dies in attack phase due to unhandled threat.
  • Discounted utility:
    • Completes entire mission successfully.
Expected Payoff vs. Time

[Plot: expected mission payoff over time for the three agents.]

  • The apparent drop in utility is due to a phase update.
  • Utility chooses badly: it tries to plan for egress but ignores a threat during the attack phase.
  • Shortest chooses badly: it discards good plans and attempts goal plans too late.
Demo 2: Ingress Phase
  • All three agents are attacked but defend themselves successfully.
Demo 2: Attack Phase
  • Utility and Discounted utility hit targets.
  • Utility dies from unhandled threat.
  • Shortest stays safe but does not strike target.
Demo 2: Second Attack Phase (“Egress”)
  • Only Discounted utility hits second target.
  • Shortest stays safe but does not strike target.
Related Topics
  • Conventional Deliberation Scheduling Work:
    • This work typically assumes the object-level computation uses anytime algorithms.
    • CSM algorithms are not readily converted to anytime form: performance improvements are discrete and all-or-nothing.
    • Because the Real-Time System and the AI system run in true parallel, the conventional think/act tradeoff does not arise.
  • Design-to-time: an appropriate comparison, but it builds full schedules rather than making single action choices. A comparison may be possible.
  • MDP solvers: typically either infinite-horizon, or finite-horizon with offline policy computation. We have on-line decision making with a dynamic MDP.
Demo Scenario
  • Three types of threats (IR, radar, radar2) during ingress, attack, and egress phases.
  • Targets in attack and egress phases.
  • Overall, there are 41 different valid problem configurations that can be sent to the CSM. Some are unsolvable in the allocated time.
  • Performance profiles are approximate:
    • Predicted planning times range from 1 to 60 seconds.
    • Some configurations take less than predicted.
    • Some take more, and time out rather than finishing.
  • The mission begins as soon as the first plan is available (< 1 second).
  • The mission lasts approximately 4 minutes.
  • Building all plans would require 22.3 minutes, far more than the mission allows, so the AMP must schedule its deliberation selectively (see the arithmetic below).
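A quick back-of-the-envelope check using the slide's own numbers shows why deliberation scheduling is unavoidable here:

```python
# Deliberation-budget arithmetic, numbers taken from this slide.
total_planning = 22.3 * 60   # seconds needed to build every possible plan
mission_length = 4 * 60      # approximate mission duration in seconds
print(f"Only ~{mission_length / total_planning:.0%} of the total planning "
      "effort fits inside the mission.")   # -> ~18%
```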