
Ch 17. Optimal control theory and the linear Bellman equation HJ Kappen


Presentation Transcript


  1. Ch 17. Optimal control theory and the linear Bellman equation, HJ Kappen. BTSM Seminar 12.07.19 (Thu). Summarized by Joon Shik Kim

  2. Introduction • Optimising a sequence of actions to attain some future goal is the general topic of control theory. • In the example of a human throwing a spear to kill an animal, a sequence of actions can be assigned a cost consisting of two terms. • The first is a path cost that specifies the energy consumption needed to contract the muscles. • The second is an end cost that specifies whether the spear will kill the animal, just hurt it, or miss it. • The optimal control solution is a sequence of motor commands that results in killing the animal by throwing the spear with minimal physical effort.

  3. Discrete Time Control (1/3) • Consider the discrete-time dynamics x_{t+1} = x_t + f(t, x_t, u_t), t = 0, …, T-1, where x_t is an n-dimensional vector describing the state of the system and u_t is an m-dimensional vector that specifies the control or action at time t. • A cost function assigns a cost to each sequence of controls: C(x_0, u_{0:T-1}) = Φ(x_T) + Σ_{t=0}^{T-1} R(t, x_t, u_t), where R(t, x, u) is the cost associated with taking action u at time t in state x, and Φ(x_T) is the cost associated with ending up in state x_T at time T.

  4. Discrete Time Control (2/3) • The problem of optimal control is to find the sequence u_{0:T-1} that minimises C(x_0, u_{0:T-1}). • The optimal cost-to-go is defined as J(t, x_t) = min_{u_{t:T-1}} [Φ(x_T) + Σ_{s=t}^{T-1} R(s, x_s, u_s)], and it satisfies the Bellman recursion J(t, x) = min_u [R(t, x, u) + J(t+1, x + f(t, x, u))].

  5. Discrete Time Control (3/3) • The algorithm to compute the optimal control, trajectory, and cost is: • 1. Initialization: J(T, x) = Φ(x). • 2. Backwards: for t = T-1, …, 0 and for all x, compute u*(t, x) = argmin_u [R(t, x, u) + J(t+1, x + f(t, x, u))] and J(t, x) = R(t, x, u*) + J(t+1, x + f(t, x, u*)). • 3. Forwards: for t = 0, …, T-1, compute x_{t+1} = x_t + f(t, x_t, u*(t, x_t)).
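As a concrete illustration, the three steps above can be sketched in Python on a small hypothetical problem (the integer grid, the dynamics x_{t+1} = x_t + u_t, and the costs below are our own illustrative choices, not from the chapter):

```python
# Hypothetical 1-D example: dynamics x_{t+1} = x_t + u_t on an integer grid,
# path cost R(t, x, u) = u^2, end cost Phi(x) = (x - 3)^2, horizon T = 4.
states = range(-5, 6)
controls = (-1, 0, 1)
T = 4

def phi(x):            # end cost Phi(x_T)
    return (x - 3) ** 2

def R(t, x, u):        # path cost of taking action u in state x at time t
    return u ** 2

# 1. Initialization: J(T, x) = Phi(x)
J = {T: {x: phi(x) for x in states}}
policy = {}

# 2. Backwards: J(t, x) = min_u [ R(t, x, u) + J(t+1, x + u) ]
for t in range(T - 1, -1, -1):
    J[t], policy[t] = {}, {}
    for x in states:
        costs = {u: R(t, x, u) + J[t + 1][x + u]
                 for u in controls if x + u in J[t + 1]}   # stay on the grid
        u_star = min(costs, key=costs.get)
        J[t][x], policy[t][x] = costs[u_star], u_star

# 3. Forwards: follow the optimal policy from x_0 = 0
x, traj = 0, [0]
for t in range(T):
    x += policy[t][x]
    traj.append(x)

print(J[0][0])   # -> 3 (optimal cost-to-go from x_0 = 0)
print(traj)      # -> [0, 0, 0, 1, 2]
```

The optimal cost of 3 is achieved with two unit moves plus the residual end cost (2 - 3)^2 = 1; ties between equally good controls are broken by the iteration order of `controls`.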

  6. The HJB Equation (1/2) • In the continuous-time limit, the optimal cost-to-go satisfies the Hamilton-Jacobi-Bellman equation -∂_t J(t, x) = min_u (R(t, x, u) + f(t, x, u)^T ∂_x J(t, x)). • The optimal control at the current x, t is given by u*(t, x) = argmin_u (R(t, x, u) + f(t, x, u)^T ∂_x J(t, x)). • The boundary condition is J(T, x) = Φ(x).

  7. The HJB Equation (2/2) Optimal control of mass on a spring

  8. Stochastic Differential Equations (1/2) • Consider the random walk on the line x_{t+1} = x_t + ξ_t with x_0 = 0, where the ξ_t are independent zero-mean increments of variance ν. • In closed form, x_t = Σ_{s=1}^{t} ξ_s, so ⟨x_t⟩ = 0 and ⟨x_t²⟩ = νt. • In the continuous-time limit we define dx = dξ, with ⟨dξ⟩ = 0 and ⟨dξ²⟩ = ν dt. • The conditional probability distribution is Gaussian (Wiener process): ρ(y, t | x, 0) = (2πνt)^{-1/2} exp(-(y - x)²/(2νt)).
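The linear growth of the variance can be checked numerically; a minimal sketch (our own illustration, with ν = 2 chosen arbitrarily) simulating Gaussian increments of variance ν dt:

```python
import random

# Check that the second moment of the walk grows as <x_t^2> = nu * t.
random.seed(0)
nu, dt, n_steps, n_paths = 2.0, 0.01, 500, 2000
sigma = (nu * dt) ** 0.5             # per-step std, so <dx^2> = nu * dt

second_moment = 0.0
for _ in range(n_paths):
    x = 0.0                          # x_0 = 0
    for _ in range(n_steps):
        x += random.gauss(0.0, sigma)   # dx = d(xi)
    second_moment += x * x
second_moment /= n_paths

t = n_steps * dt                     # elapsed time t = 5
print(nu * t)                        # 10.0
print(second_moment)                 # empirically close to nu * t
```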

  9. Stochastic Optimal Control Theory (2/2) • Consider the stochastic dynamics dx = f(x, u, t) dt + dξ, where dξ is a Wiener process with ⟨dξ dξ^T⟩ = ν dt. • Since ⟨dx²⟩ is of order dt, we must make a Taylor expansion of J up to order dx². • This yields the stochastic Hamilton-Jacobi-Bellman equation -∂_t J(t, x) = min_u (R(t, x, u) + f(x, u, t)^T ∂_x J(t, x) + ½ Tr(ν ∂²_x J(t, x))), where f is the drift and ν the diffusion.

  10. Path Integral Control (1/2) • Consider dynamics linear in the control, dx = (b(x, t) + u) dt + dξ, with cost R(x, u, t) = V(x, t) + ½ u^T R u and noise covariance ν = λ R^{-1}. • In this problem of linear control and quadratic cost, the nonlinear HJB equation can be transformed into a linear equation by a log transformation of the cost-to-go, J(x, t) = -λ log ψ(x, t). • HJB becomes -∂_t ψ = (-V/λ + b^T ∂_x + ½ Tr(ν ∂²_x)) ψ.
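The log transformation works because of an exact cancellation; a sketch of the standard derivation, with the symbols as above:

```latex
% Stochastic HJB with quadratic control cost; the minimum over u is attained
% at u^* = -R^{-1}\,\partial_x J:
-\partial_t J = \min_u \Big( \tfrac{1}{2} u^\top R u + V
      + (b+u)^\top \partial_x J
      + \tfrac{1}{2}\operatorname{Tr}(\nu\,\partial_x^2 J) \Big)

% Substituting J = -\lambda \log \psi gives
\partial_x J = -\lambda \frac{\partial_x \psi}{\psi}, \qquad
\partial_x^2 J = -\lambda \frac{\partial_x^2 \psi}{\psi}
      + \lambda \frac{(\partial_x \psi)(\partial_x \psi)^\top}{\psi^2}.

% The terms quadratic in \partial_x \psi (from the control and from the
% diffusion) cancel exactly when \nu = \lambda R^{-1}, leaving the linear PDE
-\partial_t \psi = \Big( -\frac{V}{\lambda} + b^\top \partial_x
      + \tfrac{1}{2}\operatorname{Tr}(\nu\,\partial_x^2) \Big)\,\psi .
```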

  11. Path Integral Control (2/2) • Let ρ(y, τ | x, t) describe a diffusion process for τ > t defined by the Fokker-Planck equation ∂_τ ρ = -(V/λ) ρ - ∂_y (b ρ) + ½ ∂²_y (ν ρ). (1) • Then ψ can be expressed in terms of this process as ψ(x, t) = ∫ dy ρ(y, T | x, t) exp(-Φ(y)/λ).

  12. The Diffusion Process as a Path Integral (1/2) • Look at the first term of equation (1) on the previous slide: it describes a process that kills a sample trajectory at a rate V(x, t) dt / λ. • This suggests a sampling process for Monte Carlo estimation of ψ: with probability 1 - V(x, t) dt / λ the trajectory continues as dx = b(x, t) dt + dξ; with probability V(x, t) dt / λ the path is killed.
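A minimal Monte Carlo sketch of this killed-diffusion sampler (the drift b, potential V, end cost Φ, and all parameter values below are illustrative assumptions, not from the chapter):

```python
import math
import random

random.seed(1)
lam, nu, dt, n_steps, n_paths = 1.0, 1.0, 0.01, 100, 5000

def b(x, t):     # drift of the uncontrolled dynamics (illustrative: none)
    return 0.0

def V(x, t):     # potential; kills a path with probability V dt / lambda
    return 0.5 * x * x

def phi(x):      # end cost Phi(x_T)
    return 0.5 * x * x

def estimate_psi(x0):
    """Estimate psi(x0, 0) = E[ exp(-Phi(x_T)/lambda) over surviving paths ]."""
    total = 0.0
    for _ in range(n_paths):
        x, killed = x0, False
        for step in range(n_steps):
            t = step * dt
            if random.random() < V(x, t) * dt / lam:
                killed = True        # path is killed; it contributes 0
                break
            x += b(x, t) * dt + random.gauss(0.0, math.sqrt(nu * dt))
        if not killed:
            total += math.exp(-phi(x) / lam)
    return total / n_paths

psi = estimate_psi(0.0)
print(psi)                   # estimate of psi(0, 0)
print(-lam * math.log(psi))  # corresponding optimal cost-to-go J(0, 0)
```

Killed paths contribute zero weight, so the estimator automatically downweights regions of high potential V, which is exactly the effect of the -(V/λ)ρ annihilation term in equation (1).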

  13. The Diffusion Process as a Path Integral (2/2) • ψ takes the form of a partition function over paths, ψ(x, t) = ∫ [dx] exp(-S(path)/λ), where ψ is a partition function, J = -λ log ψ is a free energy, S is the energy of a path, and λ is the temperature.

  14. Discussion • One can extend the path integral control formalism to multiple agents that jointly solve a task. In this case the agents need to coordinate their actions not only through time but also among each other to maximise a common reward function. • The path integral method has great potential for application in robotics.
