
Dynamic Programming Applications



1. Dynamic Programming Applications
INSEAD Ph.D. Programme, May-June 2003

2. What are we doing here?
Learning to make long-term decisions…

3. Today
• Introduction
• Deterministic problems
• Shortest paths
• Principle of optimality
• Deterministic finite state systems & SP
• DP vs IP: Knapsack & more…

4. In context
• Optimization class (LP/NLP/IP/networks): static decisions, nothing random or hidden
• Contrast: DP = optimization "over time"
• Main features:
• dynamical systems
• stochastic evolution

5. Sequential decision making: model & ingredients
[Figure: at each stage, the system maps the present state and the chosen action to the next state, incurring a cost.]

6. Planning ahead
Present decisions affect future events by:
• making certain opportunities available
• precluding others
• altering the costs of still others
Trade-off: low cost now vs. high costs in the future.
DP: techniques for making interrelated sequential decisions.

7. Objective
Reflects the decision maker's inter-temporal tradeoffs. Minimize/maximize:
• total expected return
• total discounted return
• average reward per stage
• worst-case expected return
• expected utility
• preference ordering
• multi-objective (e.g. mean-variance)

8. Tools
• Decision rule: specifies the action to be taken at a particular time.
• Policy: a sequence of decision rules; a prescription for taking actions in the future.
• Optimal policy: a policy that optimizes the objective.

9. Questions
• When does an optimal policy exist?
• When does it have a particular form?
• How do we determine, or efficiently compute, an optimal policy?
• Can we obtain an almost-optimal policy, and how good is it?

10. Problem Types
• finite vs. infinite state set
• finite vs. infinite horizon (epoch set) (L.6-7)
• discrete vs. continuous time (L.9)
• deterministic (Lec. 1) vs. stochastic system

11. Early history
• 17th-century calculus of variations
• Cayley 1875
• Wald 1947: sequential statistical problems
• Pierre Massé 1946: water resource management
• RAND Corp., CA: 1949-1953
• Books: Bellman 1957, Howard 1960

12. The Stagecoach story
[Figure: road network with nodes A through J.]
• Some 150 years ago, a salesman was travelling west by stagecoach…

13. Insurance Costs
[Figure: the stagecoach network with the insurance cost of each leg on its arc. The costs, consistent with the calculations on the next two slides, are:]
A→B: 2, A→C: 4, A→D: 3
B→E: 7, B→F: 4, B→G: 6
C→E: 3, C→F: 2, C→G: 4
D→E: 4, D→F: 1, D→G: 5
E→H: 1, E→I: 4
F→H: 6, F→I: 3
G→H: 3, G→I: 3
H→J: 3, I→J: 4

14. The Stagecoach, Cont'd
• Greedy: A-B-F-I-J costs $13
• But A-D-F = 4 < A-B-F = 6, even though A-D = 3 > A-B = 2
• It pays not to be greedy!
• Trial and error ~ exhaustive enumeration: takes forever!
• Idea: work backwards!

15. The Stagecoach Solution
• F(X) = min cost from X to J ("cost-to-go")
• F(J) = 0
• F(H) = 3, F(I) = 4
• F(G) = 6, F(F) = 7, F(E) = 4
• F(D) = 8, F(C) = 7, F(B) = 11
• F(A) = 11, attained on A-D-F-I-J (not unique!)
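
A minimal Python sketch of this backward pass, using the arc costs listed under slide 13 (the variable names and node ordering are illustrative):

```python
# Arc costs of the stagecoach network (see slide 13).
arcs = {
    'A': {'B': 2, 'C': 4, 'D': 3},
    'B': {'E': 7, 'F': 4, 'G': 6},
    'C': {'E': 3, 'F': 2, 'G': 4},
    'D': {'E': 4, 'F': 1, 'G': 5},
    'E': {'H': 1, 'I': 4},
    'F': {'H': 6, 'I': 3},
    'G': {'H': 3, 'I': 3},
    'H': {'J': 3},
    'I': {'J': 4},
}

F = {'J': 0}   # cost-to-go, F(J) = 0
best = {}      # an optimal successor of each node
# Visit nodes stage by stage, from the last stage backwards,
# so every successor's cost-to-go is already known.
for node in ['H', 'I', 'E', 'F', 'G', 'B', 'C', 'D', 'A']:
    succ = min(arcs[node], key=lambda j: arcs[node][j] + F[j])
    best[node] = succ
    F[node] = arcs[node][succ] + F[succ]

print(F['A'])                  # 11
route, node = ['A'], 'A'
while node != 'J':             # recover one optimal route
    node = best[node]
    route.append(node)
print('-'.join(route))         # A-C-E-H-J; ties with A-D-F-I-J at cost 11
```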

16. Deterministic Dynamical System
[Figure: at stage t, state xt and control ut produce state xt+1 at stage t+1, at stage cost gt(xt, ut).]
xt+1 = f(xt, ut, t) = ft(xt, ut),  t = 0, 1, .., N-1
(state, control, time horizon)
DDS: the state at the next stage is completely determined by the state and decision at the current stage.

17. DDP Ingredients
• Deterministic dynamic system described by state xt ∈ St (St = state space at time t).
• Control/action to be selected at time t: ut ∈ Ut(xt) (Ut(xt) = action set at time t in state xt).
• Dynamics (plant equation): xt+1 = ft(xt, ut), t = 0, 1, .., N-1
• Total cost function, additive over time: gN(xN) + Σt=0..N-1 gt(xt, ut), where gt(xt, ut) = cost of decision ut
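
To make the ingredients concrete, here is a toy instance in Python; the horizon, dynamics, and costs below are illustrative assumptions, not from the slides:

```python
N = 3  # horizon

def U(t, x):
    """Action set U_t(x_t): here, always move by -1, 0, or +1."""
    return [-1, 0, 1]

def f(t, x, u):
    """Plant equation: x_{t+1} = f_t(x_t, u_t)."""
    return x + u

def g(t, x, u):
    """Stage cost g_t(x_t, u_t): distance from 0 plus control effort."""
    return abs(x) + abs(u)

def g_N(x):
    """Terminal cost g_N(x_N)."""
    return x * x
```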

18. Policies
• Rule for choosing the value of the control variables under all possible circumstances, as a function of the perceived circumstances (= strategy, control law).
• Actions are taken in real time, whereas a policy is formulated in advance.
• Closed-loop (or feedback) control: ut = u(xt, t); sequential decisions depend on the current state.
• Open-loop control: ut = u*(x0, t); all decisions are made at time t = 0 (actions are determined by the clock, as opposed to the current state).

19. Principle of Optimality
• Given the current state, an optimal policy for the remaining stages is independent of the policy adopted in previous stages.
• From any point on an optimal trajectory, the remaining trajectory is optimal for the corresponding subproblem initiated at that point.
• Action: select the decision that minimizes the sum of the cost incurred at the current stage and the least total cost that can be incurred over all subsequent stages, given the present decision.

20. Bellman's principle
• Jt(xt) = optimal cost ("cost-to-go") starting in state xt at stage t.
• Bellman's principle of optimality:
JN(xN) = gN(xN)
Jt(xt) = min over ut ∈ Ut(xt) of { gt(xt, ut) + Jt+1(ft(xt, ut)) }
• Optimal cost for the overall problem: J0(x0)
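
A minimal backward-induction sketch of this recursion, reusing the toy ingredients (N, U, f, g, g_N) defined in the sketch after slide 17:

```python
from functools import lru_cache

@lru_cache(maxsize=None)       # memoize: each (t, x) is solved once
def J(t, x):
    """Cost-to-go J_t(x_t) via Bellman's backward recursion."""
    if t == N:
        return g_N(x)          # J_N(x_N) = g_N(x_N)
    return min(g(t, x, u) + J(t + 1, f(t, x, u)) for u in U(t, x))

print(J(0, 2))                 # J_0(x_0): optimal total cost from x0 = 2
```

The minimizing u at each (t, x) gives the optimal closed-loop policy ut = u(xt, t).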

21. Deterministic finite state systems
• The state space St is finite for each t.
• A DFS system can be represented by a 'levelled' graph of stages, or a decision tree.
• DFS problem ≡ shortest path problem:

22. DFS → SP
[Figure: stages 0, 1, 2, .., N laid out left to right, starting from the initial state s; an arc out of state x1 under control u1 costs g1(x1, u1); terminal arcs into an artificial terminal node t carry the terminal cost gN(xN).]

23. SP → DFS
• Assume no negative-cost cycles!
• So an optimal path takes at most N 'steps' (allow degenerate steps i → i at cost aii = 0).
• Jk(i) = min cost from i to t in N-k moves
• Jk(i) = minj { aij + Jk+1(j) }, k = 0, 1, .., N-2
• JN-1(i) = ait, i = 1, 2, .., N
• J0(i) = cost of an optimal path from i to t
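
A compact sketch of this recursion in Python; the adjacency-dictionary representation, with explicit zero-cost self-loops for the degenerate steps, is an illustrative assumption:

```python
import math

def shortest_path_dp(a, t):
    """J_0(i) = cost of an optimal path from each node i to t.
    a[i] maps each successor j to the arc cost a_ij; include
    a[i][i] = 0 so degenerate steps i -> i are allowed."""
    nodes = list(a)
    N = len(nodes)
    J = {i: a[i].get(t, math.inf) for i in nodes}   # J_{N-1}(i) = a_it
    for _ in range(N - 1):                          # k = N-2, .., 0
        J = {i: min(a[i][j] + J[j] for j in a[i]) for i in nodes}
    return J

a = {1: {1: 0, 2: 4, 3: 1}, 2: {2: 0, 3: 2}, 3: {3: 0}}
print(shortest_path_dp(a, t=3))   # {1: 1, 2: 2, 3: 0}
```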

24. Deterministic Applications
• Knapsack
• Project planning (critical path analysis)

25. Knapsack
• Squeeze the most value into a bag of capacity K, with objects j = 1, .., n of value vj and weight wj.
• Vi(w) = max value using some of the first i items, with total weight allowed ≤ w
• Bellman's equation: Vi+1(w) = max{ Vi(w), Vi(w - wi+1) + vi+1 }
• Boundary conditions? State set? Complexity?
• Interactive applet: http://memento.ieor.berkeley.edu/~jun/knapsack/
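
A sketch of this recursion in Python, using the standard one-row table: with boundary condition V0(w) = 0, each item folds in via the Bellman equation above, and the table has O(nK) entries (pseudo-polynomial, the hook for the DP vs IP discussion):

```python
def knapsack(values, weights, K):
    """Max value of a subset of items with total weight <= K."""
    V = [0] * (K + 1)                   # V_0(w) = 0: no items, no value
    for v, wt in zip(values, weights):  # fold item i+1 into the table
        for w in range(K, wt - 1, -1):  # w descending: each item used once
            V[w] = max(V[w], V[w - wt] + v)   # Bellman's equation
    return V[K]

print(knapsack(values=[60, 100, 120], weights=[1, 2, 3], K=4))   # 180
```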

26. Project Planning and Critical Path Analysis
• Project: K activities of known durations.
• Some must be completed before others can start.
• Find the minimum completion time & the critical activities.
• Nodes = completion of some project phase; node 1 = start; node N = end of project.
• Arc (i, j) = activity that starts once phase i is completed, with duration tij.
• Acyclic network with all nodes reachable from node 1.

27. Critical Path Analysis
• Path 1 → i: p = { (1, j1), (j1, j2), .., (jk, i) }
• Duration: Dp = t(1,j1) + t(j1,j2) + .. + t(jk,i)
• Completion of phase i: Ti = max{ Dp | paths p: 1 → i }
• Longest path problem ≡ SP(G, -tij): a shortest path problem on the graph with negated arc lengths

28. Critical Path Analysis
• Let Sk = { i | all paths 1 → i have ≤ k arcs }, S0 = {1} (nodes reachable in ≤ k steps from node 1).
• Threshold property: ∃ k* s.t. Sk = S for all k ≥ k*, else Sk ⊂ S.
• Forward DP algorithm (a shortest-path-style recursion):
Ti = max over arcs (j, i) with j ∈ Sk-1 of { tji + Tj }, for all i ∈ Sk, i ∉ Sk-1

29. Critical Path Analysis: Example
[Figure: five-phase project network, with activities such as order, construction, hire, training, and transport on the arcs, each with its duration.]
• S0 = {1}, S1 = {1, 2}, S2 = {1, 2, 3}, S3 = {1, 2, 3, 4}, S4 = {1, 2, 3, 4, 5}
• Completion times: T1 = 0, T2 = 3, T3 = 4, T4 = 6, T5 = 10
• Critical path: 1 → 2 → 3 → 4 → 5
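
A sketch of the forward pass in Python. The arc durations below are a hypothetical reconstruction (the network figure did not survive extraction), chosen only to be consistent with the completion times and critical path above:

```python
# preds[i] = list of (predecessor phase j, activity duration t_ji);
# hypothetical data matching T = (0, 3, 4, 6, 10) above.
preds = {
    2: [(1, 3)],
    3: [(1, 4), (2, 1)],
    4: [(2, 2), (3, 2)],
    5: [(3, 5), (4, 4)],
}

T = {1: 0}                 # phase 1 = project start
for i in [2, 3, 4, 5]:     # process nodes in the S_k order above
    T[i] = max(T[j] + t for j, t in preds[i])
print(T)                   # {1: 0, 2: 3, 3: 4, 4: 6, 5: 10}
```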

30. Fun Applets
• GIDEN network animation algorithms: http://www.iems.nwu.edu/~giden/download/
• Shortest paths applet: http://www.princeton.edu/~rvdb/JAVA/CIV201/shortpaths/shortpaths.html
• Knapsack: http://memento.ieor.berkeley.edu/~jun/knapsack/

31. Guidelines for DP algorithms
Which is the most crucial step?
• View the solution as a sequence of decisions occurring in stages and incurring additive costs.
• Define the state as a summary of all relevant past decisions.
• Determine which state transitions are possible and identify their corresponding costs.
• Write a recursion for the optimal cost from the origin state to a destination state.

32. To Remember
• The optimal ut is a function only of the state xt & time t.
• The DP equation expresses the optimal ut in closed-loop form; it is optimal whatever the past control policy may have been.
• The DP equation is backward induction in time; the later policy is always decided first.
• Deterministic finite-state DP ≡ shortest path without negative cycles.
Read Bertsekas, Chapter 2.
