
Scheduling Policy Design using Stochastic Dynamic Programming



Presentation Transcript


  1. Scheduling Policy Design using Stochastic Dynamic Programming Robert Glaubius Dissertation Defense, November 13, 2009

  2. Sensing on a Mobile Robot • Camera for mission objectives (e.g., finding people). • Laser range-finder for obstacle detection. R. Glaubius

  3. Sensing on a Mobile Robot • Some common obstacles may escape laser detection. R. Glaubius

  4. Sensing on a Mobile Robot • Use camera to supplement obstacle detection. R. Glaubius

  5. Resource Contention • Now we have two tasks that need the camera. • We need a rational policy for allocating the camera to each task. • Assign each task a resource time share (e.g., 67% and 33%). R. Glaubius

  6. Task Scheduling Model • Multiple, repeating tasks use a mutually-exclusive shared resource. • Each task has a utilization target specifying its share, u=(u1,…,un). • Each task instance has stochastic duration. • Tasks may not be preempted. R. Glaubius
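A minimal Python sketch of this task model; the class, task names, and duration distributions below are illustrative assumptions, not details from the dissertation:

    import random

    class Task:
        """One repeating, non-preemptible task sharing the resource."""
        def __init__(self, name, target_share, sample_duration):
            self.name = name
            self.target_share = target_share        # u_i: target fraction of resource time
            self.sample_duration = sample_duration  # stochastic duration of one instance

    # Two tasks with utilization targets u = (2/3, 1/3).
    tasks = [
        Task("task_a", 2/3, lambda: random.choice([1, 2, 3])),
        Task("task_b", 1/3, lambda: random.choice([1, 2])),
    ]
    assert abs(sum(t.target_share for t in tasks) - 1.0) < 1e-9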

  7. The Main Contribution • Scheduling policy design techniques for non-preemptive, non-deterministic systems that are • Share Aware • Scalable • Adaptive R. Glaubius

  8. Share Aware Scheduling • System state: cumulative resource usage of each task. • Dispatching a task moves the system stochastically through the state space according to that task’s duration. [Figure: example state (8,17).] R. Glaubius

  9. Share Aware Scheduling u • Utilization target induces a ray{u:0} through the state space. • Encode “goodness” relative to the share as a cost. • Require that costs grow with distance from utilization ray. u=(1/3,2/3) R. Glaubius

  10. Task Scheduling MDP • States are the cumulative resource utilization of each task. • Actions correspond to dispatching a task. • Transitions dictated by task duration distributions. • Costs grow with deviation from the share target. • Goal: find a policy that minimizes long-term cost. R. Glaubius
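A sketch of the resulting MDP's one-step dynamics, reusing cost() from the sketch above; the sample_duration argument stands in for the dispatched task's duration distribution:

    def step(x, a, u, sample_duration):
        """Dispatch task a in state x: its stochastic duration is added to
        coordinate a of the cumulative-usage vector, and we incur the cost
        of the successor state."""
        d = sample_duration()  # task durations are stochastic
        x_next = x[:a] + (x[a] + d,) + x[a + 1:]
        return x_next, cost(x_next, u)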

  11. Transition Structure • Transitions are state-independent: the relative distribution over successor states is the same in every state. R. Glaubius

  12. Cost Structure • State equivalence under costs: • States along lines parallel to the utilization ray have equal cost R. Glaubius

  13. Equivalence Classes • Transition and cost structure induces state equivalence. • Equivalent states have the same optimal long-term cost and policy! R. Glaubius

  14. Periodicity • Periodic structure allows us to remove all but one exemplar from each equivalence class. R. Glaubius

  15. Wrapped state model • Remove all but one exemplar from each equivalence class. • Actions and costs remain unchanged. • Remap transitions to removed states to the corresponding exemplar. [Figure: wrapped state space near (0,0).] R. Glaubius
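A sketch of the wrapping step. It assumes the utilization target is rational, u = (m1/q, …, mn/q), so that p = q·u is the smallest integer vector along the utilization ray and states differing by a multiple of p fall in the same equivalence class; that period vector is an assumption consistent with the periodicity slide, not a statement from it.

    def wrap(x, p):
        """Map x to the exemplar of its equivalence class by subtracting the
        integer period vector p (parallel to the utilization ray) as many
        times as possible while all coordinates stay non-negative."""
        k = min(xi // pi for xi, pi in zip(x, p) if pi > 0)
        return tuple(xi - k * pi for xi, pi in zip(x, p))

    # With u = (1/3, 2/3), p = (1, 2): the state (8, 17) wraps to (0, 1).
    assert wrap((8, 17), (1, 2)) == (0, 1)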

  16. c(x)= c(x)= Bounded state model • Inexpensive states are near the utilization target. • Good policies should keep costs small. • Can truncate the state space by bounding costs. R. Glaubius

  17. Bounded state model • Mapping “dangling” transitions to a high-cost absorbing state guarantees that we find bounded cost policies when they exist. • Bounded costs guarantee bounded deviation from the resource share. R. Glaubius
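A sketch of how such a bounded model can be assembled; the function and argument names are illustrative, with successors(x, a) returning the possible wrapped successor states of dispatching task a in state x and cost_fn a per-state cost as above:

    from collections import deque

    ABSORB = "ABSORB"  # single high-cost absorbing state

    def bounded_model(x0, actions, successors, cost_fn, bound):
        """Enumerate the states reachable from x0 whose cost stays within
        `bound`. Transitions that would leave this set are remapped to ABSORB,
        so any policy with bounded long-run cost also has bounded deviation
        from the resource share."""
        states, frontier = {x0}, deque([x0])
        transitions = {}  # (state, action) -> list of successor states
        while frontier:
            x = frontier.popleft()
            for a in actions:
                nexts = []
                for y in successors(x, a):
                    if cost_fn(y) <= bound:
                        nexts.append(y)
                        if y not in states:
                            states.add(y)
                            frontier.append(y)
                    else:
                        nexts.append(ABSORB)
                transitions[(x, a)] = nexts
        # ABSORB itself is absorbing: every action there returns to ABSORB.
        return states | {ABSORB}, transitions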

  18. Scheduling Policy Design • Iteratively increase the bound and re-solve the problem. • As the bound increases, the bounded model solution converges to the optimal wrapped model policy. R. Glaubius

  19. Automating Model Discovery • ESPI: Expanding State Policy Iteration. • Start with a policy that only reaches finitely many states from (0,…,0), e.g., always run the most underutilized task. • Enumerate enough states to evaluate and improve that policy. • If the policy cannot be improved, stop. • Otherwise, return to the enumeration step with the improved policy. R. Glaubius

  20. Policy Evaluation Envelope • Enumerate states reachable from the initial state. • Breadth-first state space exploration under the current policy, starting from the initial state (0,0). R. Glaubius
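A sketch of the envelope construction; policy and successors are placeholders for a concrete policy and for the support of each task's duration distribution, and the loop only terminates when the policy reaches finitely many states (the condition discussed on slide 22):

    from collections import deque

    def evaluation_envelope(x0, policy, successors):
        """Breadth-first enumeration of every (wrapped) state reachable from
        x0 when actions are chosen by `policy`."""
        seen, frontier = {x0}, deque([x0])
        while frontier:
            x = frontier.popleft()
            for y in successors(x, policy(x)):
                if y not in seen:
                    seen.add(y)
                    frontier.append(y)
        return seen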

  21. Policy Improvement Envelope • Consider alternative actions. • Close under the current policy using breadth-first expansion. • Evaluate and improve the policy within this envelope. R. Glaubius

  22. ESPI Termination • As long as the initial policy has finite closure, each ESPI iteration terminates. • Satisfied by the policy that always runs the most underutilized task. • Policy strictly improves at each iteration. • Empirically, ESPI terminates on task scheduling MDPs. R. Glaubius

  23. Comparing Design Methods • Policy performance is normalized and centered on the ESPI solution. • Larger bounded state models yield the ESPI solution. R. Glaubius

  24. Share Aware Scheduling • MDP representation allows consistent approximation of the optimal scheduling policy. • Empirically, bounded model and ESPI solutions appear optimal. • Approach scales exponentially in the number of tasks. R. Glaubius

  25. Addressing the Curse of Dimensionality • Focus attention on a restricted class of appropriate scheduling policies. • How do we choose and parameterize these policies? R. Glaubius

  26. Two-task MDP Policy • Scheduling policies induce a partition on the state space with boundary parallel to the share target. • Establish a decision offset to identify the partition boundary. • Sufficient in 2-d, but what about higher dimensions? R. Glaubius

  27. Time Horizons • The time horizon Ht = {x : x1 + x2 + … + xn = t} contains the states with total resource usage t. [Figure: horizons H0, H1, H2, … for the two- and three-task state spaces, with the utilization ray u.] R. Glaubius

  28. Three-task MDP Policy • Action partitions meet along a decision ray that is parallel to the utilization ray. • Action partitions are roughly cone-shaped. [Figure: policy partitions on horizons t = 10, t = 20, t = 30.] R. Glaubius

  29. Parameterizing the Partition • Specify a decision offset at the intersection of partitions. • Anchor action vectors at the decision offset to approximate partitions. • The conic policy selects the action vector best aligned with the displacement between the query state x and the decision offset. [Figure: action vectors a1, a2, a3 anchored at the decision offset.] R. Glaubius

  30. Conic Policy Parameterization • Decision offset d. • Action vectors a1, a2, …, an. • Sufficient to partition each time horizon into n regions. • Tune policies through local search. R. Glaubius
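A sketch of the conic policy's selection rule from slides 29 and 30, in plain Python; d and the action vectors a_i are the parameters that local search would tune:

    import math

    def conic_action(x, d, action_vectors):
        """Pick the action whose vector is best aligned (largest cosine) with
        the displacement of the query state x from the decision offset d."""
        v = [xi - di for xi, di in zip(x, d)]
        nv = math.sqrt(sum(vi * vi for vi in v))
        if nv == 0:
            return 0  # at the offset itself, any action is as good as another
        def alignment(a):
            na = math.sqrt(sum(ai * ai for ai in a))
            return sum(ai * vi for ai, vi in zip(a, v)) / (na * nv)
        return max(range(len(action_vectors)),
                   key=lambda i: alignment(action_vectors[i]))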

  31. Four Tasks R. Glaubius

  32. Ten Tasks R. Glaubius

  33. Varying Numbers of Tasks R. Glaubius

  34. Addressing the Curse of Dimensionality • Conic policy approximates the geometry of those found using ESPI. • Number of parameters grows just quadratically with the number of tasks. • Contains cost-bounded, stable policies. • Performance is competitive with ESPI. • Improves on heuristic policies. R. Glaubius

  35. Conclusions • We have addressed a novel set of scheduling concerns present in many cyber-physical systems: • Non-preemptive resource semantics. • Stochastic task execution times. • Enforcement of a user-selected resource share. • Our model-based solution methods provide strong approximations to optimal policies. • Our conic policies allow us to scale model-based techniques to larger problems. R. Glaubius

  36. Further Contributions • Adaptive scheduling: online learning • Sample complexity of learning is similar to optimal control of single-state MDPs. • The domain enforces rational exploration without explicit exploration mechanisms. • Formal Guarantees: • Existence of optimal scheduling policies. • Periodicity of optimal scheduling policies. • Existence of cost-bounded policies. • Existence of stable conic policies. R. Glaubius

  37. Publications • R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, “Scheduling Policy Design for Autonomic Systems”, International Journal on Autonomous and Adaptive Communications Systems, 2(3):276-296, 2009. • R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, “Scheduling Design and Verification for Open Soft Real-Time Systems”, RTSS 2008. • R. Glaubius, T. Tidwell, B. Sidoti, D. Pilla, J. Meden, C. Gill, and W.D. Smart, “Scalable Scheduling Policy Design for Open Soft Real-Time Systems”, Tech. Report WUCSE-2009-71, 2009 (Under Review for RTAS 2010) • R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, “Scheduling Design with Unknown Execution Time Distributions or Modes”. Tech. Report WUCSE-2009-15, 2009. • T. Tidwell, R. Glaubius, C. Gill, and W.D. Smart, “Scheduling for Reliable Execution in Autonomic Systems”, ATC 2008. • C. Gill, W.D. Smart, T. Tidwell, and R. Glaubius, “Scheduling as a Learned Art”, OSPERT, 2008. R. Glaubius

  38. Acknowledgements • Bill Smart • Chris Gill • Terry Tidwell • David Pilla, Braden Sidoti, and Justin Meden R. Glaubius

  39. Questions? R. Glaubius

  40. Comparison to Real-Time Scheduling • Earliest-Deadline-First (EDF) scheduling: • Enforces timeliness by meeting task deadlines. • Not share aware. • We introduce deadlines as a function of worst-case execution time. • Miss rate is a function of deadline tightness. R. Glaubius

  41. Varying Temporal Resolution R. Glaubius

  42. Stable Conic Policies • Stable conic policies are guaranteed to exist. • For example, set each action vector to point opposite its corresponding vertex. • This induces a vector field that stochastically orbits the decision ray. [Figure: action vectors on the horizon simplex with vertices (t,0,0), (0,t,0), (0,0,t).] R. Glaubius
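One way to realize "point each action vector opposite its corresponding vertex" (an illustrative reading, not a quote from the slide): on a horizon simplex the vertex for task i lies along e_i, so take a_i proportional to u − e_i; combined with the selection rule above, task i is then preferred exactly when its usage lags its share, which gives the orbiting behavior described here.

    def stable_action_vectors(u):
        """Action vector for task i points from the simplex vertex for task i
        back toward the utilization target: a_i = u - e_i."""
        n = len(u)
        return [tuple(u[j] - (1.0 if j == i else 0.0) for j in range(n))
                for i in range(n)]

    # With u = (1/3, 1/3, 1/3), a_0 = (-2/3, 1/3, 1/3): it points away from
    # the (t, 0, 0) vertex, so task 0 is dispatched when x_0 falls behind.
    print(stable_action_vectors((1/3, 1/3, 1/3)))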


  44. More Tasks → Higher Cost • Simple problem: fair-share scheduling of n deterministic tasks with unit duration. • Trajectories under round-robin scheduling: • 2 tasks: E{c(x)} = 1/2. Trajectory: (0,0) → (1,0) → (1,1) → (0,0). Costs: c(0,0) = 0, c(1,0) = 1. • 3 tasks: E{c(x)} = 8/9. Trajectory: (0,0,0) → (1,0,0) → (1,1,0) → (1,1,1) → (0,0,0). Costs: c(0,0,0) = 0, c(1,0,0) = 4/3, c(1,1,0) = 4/3. • n tasks: E{c(x)} = (n+1)(n−1)/(3n). R. Glaubius
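A small check of the general formula, again assuming the per-state cost is the L1 deviation from the fair share at the state's horizon, which reproduces the costs listed above:

    from fractions import Fraction

    def round_robin_average_cost(n):
        """Average cost over one round-robin cycle of n unit-duration
        dispatches against the fair share u = (1/n, ..., 1/n)."""
        u = [Fraction(1, n)] * n
        x = [0] * n
        total = Fraction(0)
        for i in range(n):
            t = sum(x)  # cost of the state visited before dispatching task i
            total += sum(abs(xi - t * ui) for xi, ui in zip(x, u))
            x[i] += 1
        return total / n

    for n in (2, 3, 4, 10):
        assert round_robin_average_cost(n) == Fraction((n + 1) * (n - 1), 3 * n)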

  45. Share Complexity R. Glaubius
