Planning using dynamic optimization

  1. Planning using dynamic optimization  Chris Atkeson 2009

  2. Problem characteristics • Want an optimal plan, not just a feasible plan • We will minimize a cost function C(execution). Some examples: • C = c_T(x_T) + Σ_k c(x_k, u_k): deterministic, with an explicit terminal cost function • C = E[c_T(x_T) + Σ_k c(x_k, u_k)]: stochastic
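
For concreteness, a minimal Python sketch of evaluating the deterministic cost of one execution; the quadratic stage and terminal costs here are hypothetical examples, not from the slides:

import numpy as np

def trajectory_cost(xs, us, c, c_T):
    # xs = [x_0, ..., x_T], us = [u_0, ..., u_{T-1}]
    # C = c_T(x_T) + sum_k c(x_k, u_k)
    return c_T(xs[-1]) + sum(c(x, u) for x, u in zip(xs, us))

c = lambda x, u: x @ x + 0.01 * (u @ u)   # hypothetical error/effort stage cost
c_T = lambda x: 10.0 * (x @ x)            # hypothetical terminal penalty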

  3. Dynamic Optimization • The general methodology is dynamic programming (DP). • We will talk about ways to apply DP. • The requirement to represent all states, and to consider all actions from each state, leads to the “curse of dimensionality”: with resolution R_x per state dimension and R_u per action dimension, the table has R_x^{d_x} · R_u^{d_u} entries (e.g., R_x = 100 over a 4-dimensional state already gives 10^8 states). • We will talk about special-purpose solution methods.

  4. Dynamic Optimization Issues • Discrete vs. continuous states and actions? • Discrete vs. continuous time? • Globally optimal? • Stochastic vs. deterministic? • Clocked vs. autonomous? • What should be optimized, anyway?

  5. Policies vs. Trajectories • u(t): open-loop trajectory control • u = u_ff(t) + K(x - x_d(t)): closed-loop trajectory control • u(x): policy
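
The three forms differ in what they index on. A minimal sketch using the slide's notation (u_ff, x_d, gain matrix K; all three callables are placeholders):

import numpy as np

def open_loop(t, u_traj):
    return u_traj(t)                    # u(t): replay a stored control trajectory

def closed_loop(t, x, u_ff, x_d, K):
    return u_ff(t) + K @ (x - x_d(t))   # feedforward plus feedback around x_d(t)

def from_policy(x, pi):
    return pi(x)                        # u(x): indexed by state, not time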

  6. Types of tasks • Regulator tasks: want to stay at xd • Trajectory tasks: go from A to B in time T, or attain goal set G • Periodic tasks: cyclic behavior such as walking

  7. Typical reward functions • Minimize error • Minimum time • Minimize tradeoff of error and effort

  8. Example: Pendulum Swingup • State: joint angle and angular velocity • Action: joint torque • Cost: a tradeoff of error (distance from the upright goal) and effort
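
A sketch of one common formulation of the swingup problem, assuming a point-mass pendulum and a quadratic error/effort tradeoff (the physical constants, cost weights, and upright-goal convention are assumptions, not taken from the slides):

import numpy as np

GOAL = np.array([np.pi, 0.0])   # assumed: theta = pi is upright, at rest

def pendulum_step(x, tau, dt=0.01, g=9.81, l=1.0, m=1.0):
    # State x = (theta, theta_dot); action tau is the joint torque.
    theta, theta_dot = x
    theta_ddot = (g / l) * np.sin(theta) + tau / (m * l ** 2)
    return np.array([theta + dt * theta_dot, theta_dot + dt * theta_ddot])

def swingup_cost(x, tau, w_effort=0.01):
    err = x - GOAL
    return err @ err + w_effort * tau ** 2   # error/effort tradeoff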

  9. Possible Trajectories

  10. Global Planning Using Dynamic Programming

  11. The Policy

  12. Trajectories

  13. Value Function

  14. Value Function: Bird’s Eye View

  15. Position, Velocity, Torque

  16. Policy

  17. Policy: Bird’s Eye View

  18. Discrete-Time Deterministic Dynamic Programming (DP) • We will fudge on whether states are discrete or continuous.

  19. How to do Dynamic Programming (specified end time T) • Dynamics: x_{k+1} = f(x_k, u_k) • Cost: C = c_T(x_T) + Σ_k c(x_k, u_k) • The value function V_k(x) is represented by a table. • V_T(x) = c_T(x) • For each x, V_k(x) = min_u [ c(x, u) + V_{k+1}(f(x, u)) ] • This is the Bellman equation. • This version of DP is value iteration. • Can also tabulate the policy: u = π_k(x)
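
A table-based Python sketch of this backward recursion for small discrete state and action sets (the dictionary representation of the tables is an illustrative choice):

def value_iteration_finite(T, states, actions, f, c, c_T):
    # Backward pass: V_T(x) = c_T(x), then
    # V_k(x) = min_u [ c(x, u) + V_{k+1}(f(x, u)) ].
    V = {T: {x: c_T(x) for x in states}}
    pi = {}
    for k in range(T - 1, -1, -1):
        V[k], pi[k] = {}, {}
        for x in states:
            q = {u: c(x, u) + V[k + 1][f(x, u)] for u in actions}
            pi[k][x] = min(q, key=q.get)   # tabulated policy u = pi_k(x)
            V[k][x] = q[pi[k][x]]
    return V, pi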

  20. How to do Dynamic Programming (no specified end time) • Cost: C = Σ_k c(x_k, u_k) • V_N(x) = a guess, or all zeros. • Apply the Bellman equation. • V(x) is given by V_k(x) when V stops changing. • The goal needs to have zero cost, or we need to discount so V() does not grow to infinity: • V_k(x) = min_u [ c(x, u) + γ V_{k+1}(f(x, u)) ], γ < 1
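
A sketch of the discounted version, sweeping the Bellman backup until the table stops changing (gamma and the tolerance are arbitrary choices):

def value_iteration(states, actions, f, c, gamma=0.99, tol=1e-6):
    V = {x: 0.0 for x in states}   # V_N: a guess, here all zeros
    while True:
        V_new = {x: min(c(x, u) + gamma * V[f(x, u)] for u in actions)
                 for x in states}
        if max(abs(V_new[x] - V[x]) for x in states) < tol:
            return V_new           # V has (numerically) stopped changing
        V = V_new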

  21. Policy Iteration • u = π(x): a general policy (a table in the discrete-state case). • *) Policy evaluation: compute V(x) by iterating V_k(x) = c(x, π(x)) + V_{k+1}(f(x, π(x))) • Policy improvement: update the policy with π(x) = argmin_u [ c(x, u) + V(f(x, u)) ] • Goto *)
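
A sketch of this loop; a discount factor gamma is included (as on the previous slide) so the evaluation sweep converges, and the exact evaluation of V is approximated by repeated sweeps:

def policy_iteration(states, actions, f, c, gamma=0.99, eval_sweeps=100):
    pi = {x: actions[0] for x in states}   # arbitrary initial policy
    while True:
        # *) Policy evaluation: V(x) under the fixed policy pi.
        V = {x: 0.0 for x in states}
        for _ in range(eval_sweeps):
            V = {x: c(x, pi[x]) + gamma * V[f(x, pi[x])] for x in states}
        # Policy improvement.
        pi_new = {x: min(actions, key=lambda u: c(x, u) + gamma * V[f(x, u)])
                  for x in states}
        if pi_new == pi:           # policy stable: done
            return pi, V
        pi = pi_new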

  22. Stochastic Dynamic Programming • Cost: C = E[Σ_k c(x_k, u_k)] • The Bellman equation now involves expectations: • V_k(x) = min_u E[ c(x, u) + V_{k+1}(f(x, u)) ] = min_u [ c(x, u) + Σ_{x_{k+1}} p(x_{k+1} | x, u) V_{k+1}(x_{k+1}) ] • The modified Bellman equation applies to both value iteration and policy iteration. • May need to add a discount factor.
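
A sketch of a single stochastic Bellman backup, with the transition model stored as a table of probabilities (the dict-of-dicts layout of P is an illustrative choice):

def stochastic_backup(x, actions, P, c, V_next, gamma=1.0):
    # P[(x, u)] maps each possible next state x1 to p(x1 | x, u).
    def q(u):
        return c(x, u) + gamma * sum(p * V_next[x1]
                                     for x1, p in P[(x, u)].items())
    return min(q(u) for u in actions)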

  23. Continuous State DP • Time is still discrete. • How do we discretize the states?

  24. How to handle continuous states • Discretize the states on a grid. • At each grid point x_0, generate a trajectory segment of length N by minimizing C(u) = Σ_{k=0}^{N-1} c(x_k, u_k) + V(x_N) • V(x_N): interpolate using the surrounding V() values • Typically multilinear interpolation is used. • N is typically determined by when V(x_N) becomes independent of V(x_0). • Use your favorite continuous-function optimizer to search for the best u when minimizing C(u). • Update V() at that cell.
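
A sketch of one cell update under these choices, using SciPy's multilinear grid interpolator and a generic continuous optimizer (the dynamics f, stage cost c, and grid layout are placeholders):

import numpy as np
from scipy.interpolate import RegularGridInterpolator
from scipy.optimize import minimize

def segment_cost(u_flat, x0, f, c, V_interp, N, du):
    # Roll out N steps from grid point x0, then close with interpolated V(x_N).
    x, total = x0, 0.0
    for u in u_flat.reshape(N, du):
        total += c(x, u)
        x = f(x, u)
    return total + float(V_interp(x))

# V_interp = RegularGridInterpolator(grid_axes, V_table)  # multilinear by default
# res = minimize(segment_cost, np.zeros(N * du),
#                args=(x0, f, c, V_interp, N, du))
# The value table is then updated at x0's cell with res.fun.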
