
OR II GSLM 52800

analu

Presentation Transcript


  1. OR II GSLM 52800

  2.

  3. Policy and Action
     • policy: the rule that specifies what to do in every state
     • action: what to do in a particular state, as dictated by the policy
     • examples
       • policy "replacement only at state 3": do nothing at states 0, 1, and 2, and replace at state 3
       • policy "overhaul at state 2 and replacement at state 3": do nothing at states 0 and 1, overhaul at state 2, and replace at state 3
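
A policy can be stored as a plain mapping from each state to the action it prescribes, so that the action at a state is just a lookup. The sketch below encodes the two example policies using the decision numbers that appear on the example slides (1 = do nothing, 2 = overhaul, 3 = replace); the names `replace_at_3`, `overhaul_2_replace_3`, and `action` are illustrative, not from the original.

```python
# Decision numbers as used on the example slides: 1 = do nothing, 2 = overhaul, 3 = replace.
DO_NOTHING, OVERHAUL, REPLACE = 1, 2, 3

# Hypothetical names for the two example policies (state -> decision).
replace_at_3 = {0: DO_NOTHING, 1: DO_NOTHING, 2: DO_NOTHING, 3: REPLACE}
overhaul_2_replace_3 = {0: DO_NOTHING, 1: DO_NOTHING, 2: OVERHAUL, 3: REPLACE}


def action(policy, state):
    """The action at a state is whatever the policy prescribes there."""
    return policy[state]
```

Because a policy fixes one action per state, it turns the decision process into an ordinary Markov chain with one transition row and one cost per state.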

  4. Expected Reward
     • pij(k) = the probability of moving from state i to state j when action k is taken
     • qij(k) = the expected cost incurred at state i when action k is taken and the state changes to j
     • Cik = the expected cost at state i when action k is taken: $C_{ik} = \sum_{j} p_{ij}(k)\, q_{ij}(k)$
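
Since qij(k) is the expected cost given that the move is to j, and pij(k) is the probability of that move, Cik is their probability-weighted sum over j. A minimal sketch for a single state and action; the numbers in `p_ik` and `q_ik` are hypothetical, not taken from the slides, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F


def expected_cost(p_row, q_row):
    """C_ik = sum over j of p_ij(k) * q_ij(k), for one state i and one action k."""
    return sum((p * q for p, q in zip(p_row, q_row)), F(0))


# Hypothetical data for one state i and one action k (not from the slides).
p_ik = [F(1, 2), F(1, 4), F(1, 4)]  # p_ij(k) for j = 0, 1, 2; sums to 1
q_ik = [F(100), F(200), F(400)]     # q_ij(k): expected cost if the move is to j

C_ik = expected_cost(p_ik, q_ik)    # 50 + 50 + 100 = 200
```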

  5. Definition of Variables
     • policy R
     • g(R) = the long-term average cost per unit time of policy R
     • objective: find the policy that minimizes g
     • vi(R) = the effect on the total expected cost of adopting policy R and starting at state i
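
When the Markov chain induced by R has a single recurrent class, g(R) can also be computed from the chain's steady-state probabilities πi as $g(R) = \sum_i \pi_i C_{i,k_i}$, where ki is the action R takes at state i. The sketch below does this exactly with fractions for R1 = "replacement only at state 3". Its transition probabilities are an assumption: the example slide shows its matrix only as an image, so the values used are those of the machine-maintenance example in Hillier and Lieberman's Introduction to Operations Research, whose costs match the ones on these slides, and replacement is assumed to return the machine to state 0. The helpers `solve`, `stationary`, and `average_cost` are illustrative.

```python
from fractions import Fraction as F


def solve(A, b):
    """Solve the square, nonsingular system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    M = [[F(x) for x in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first row with a usable pivot
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]


def stationary(P):
    """Steady-state probabilities: pi P = pi, with one balance equation replaced by sum(pi) = 1."""
    n = len(P)
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n - 1)]
    A.append([1] * n)
    return solve(A, [0] * (n - 1) + [1])


def average_cost(P, C):
    """g(R) = sum_i pi_i * C_i, where C_i is the cost of R's action at state i."""
    return sum((p * c for p, c in zip(stationary(P), C)), F(0))


# Assumed data for R1 = "replacement only at state 3".
P_R1 = [
    [F(0), F(7, 8), F(1, 16), F(1, 16)],
    [F(0), F(3, 4), F(1, 8), F(1, 8)],
    [F(0), F(0), F(1, 2), F(1, 2)],
    [F(1), F(0), F(0), F(0)],  # assumed: replacing at state 3 returns the machine to state 0
]
C_R1 = [0, 1000, 3000, 6000]

pi_R1 = stationary(P_R1)         # [2/13, 7/13, 2/13, 2/13]
g_R1 = average_cost(P_R1, C_R1)  # 25000/13, about 1923
```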

  6. Relationship Between g(R) and vi(R)
     • Claim: the intuitive idea is exact
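
Assuming the intuitive idea is the usual large-n approximation: if $V^n_i(R)$ is the total expected cost of the first n periods when starting in state i, then $V^n_i(R) \approx n\,g(R) + v_i(R)$, so each extra period adds about g(R) and the differences $V^n_i(R) - V^n_j(R)$ settle to $v_i(R) - v_j(R)$. The sketch iterates $V^n_i(R) = C_{ik} + \sum_j p_{ij}(k)\, V^{n-1}_j(R)$ in floating point for "replacement only at state 3". The data are assumed (transition probabilities of the Hillier and Lieberman machine-maintenance example, replacement returning the machine to state 0), and `n_step_costs` is an illustrative name.

```python
# Assumed data: transition matrix and costs of R1 = "replacement only at state 3".
P = [
    [0.0, 7 / 8, 1 / 16, 1 / 16],
    [0.0, 3 / 4, 1 / 8, 1 / 8],
    [0.0, 0.0, 1 / 2, 1 / 2],
    [1.0, 0.0, 0.0, 0.0],
]
C = [0.0, 1000.0, 3000.0, 6000.0]


def n_step_costs(P, C, n):
    """Return (V^n, V^{n-1}), where V^n_i = C_i + sum_j p_ij V^{n-1}_j and V^0 = 0.

    V^n_i is the total expected cost of n periods when starting in state i.
    """
    m = len(C)
    prev, cur = [0.0] * m, [0.0] * m
    for _ in range(n):
        prev, cur = cur, [C[i] + sum(P[i][j] * cur[j] for j in range(m)) for i in range(m)]
    return cur, prev


Vn, Vprev = n_step_costs(P, C, 200)
growth = [a - b for a, b in zip(Vn, Vprev)]  # per-period increase, approaches g(R)
offsets = [x - Vn[3] for x in Vn]            # approaches v_i(R) when v_3(R) is set to 0
```

With these assumed numbers both quantities converge quickly, and their limits agree with the exact value-determination solution for the same policy (g = 25000/13).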

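  7. Key Result in Policy Improvement
     • for a policy R that takes action k at state i,
       $g(R) + v_i(R) = C_{ik} + \sum_{j=0}^{M} p_{ij}(k)\, v_j(R), \quad i = 0, 1, \dots, M$
     • M+1 equations in M+2 unknowns: g(R) and v0(R), ..., vM(R)
     • g(R) = the long-term average cost of policy R
     • vi(R) = the effect on the total expected cost of adopting policy R and starting at state i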
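
The M+1 equations are linear, so once one unknown is pinned down they can be solved directly. The sketch below fixes vM(R) = 0 and solves for [g, v0, ..., vM−1] by exact Gauss-Jordan elimination; the helper names `solve` and `value_determination` are illustrative. It is checked on a hypothetical two-state chain, not taken from the slides, for which $g = (p_{10} C_0 + p_{01} C_1)/(p_{01} + p_{10})$ is known in closed form.

```python
from fractions import Fraction as F


def solve(A, b):
    """Solve the square, nonsingular system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    M = [[F(x) for x in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first row with a usable pivot
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]


def value_determination(P, C):
    """Solve g + v_i = C_i + sum_j p_ij v_j (i = 0..M) after fixing v_M = 0.

    Unknowns are x = [g, v_0, ..., v_{M-1}]; returns g and the full list [v_0, ..., v_M].
    """
    n = len(C)  # n = M + 1 states
    # coefficient 1 for g, then (delta_ij - p_ij) for v_0..v_{M-1}; the v_M term drops out
    A = [[1] + [(1 if j == i else 0) - P[i][j] for j in range(n - 1)] for i in range(n)]
    x = solve(A, C)
    return x[0], x[1:] + [F(0)]


# Hypothetical two-state chain (not from the slides).
p01, p10 = F(1, 2), F(1, 4)
P = [[1 - p01, p01], [p10, 1 - p10]]
C = [F(0), F(10)]
g, v = value_determination(P, C)  # g = 20/3, v = [-40/3, 0]
```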

  8. Idea of Policy Improvement
     • the equations still hold if the same constant c is added to every vi(R), i.e., vi(R) can be replaced by vi(R) + c
     • so the set of equations can be solved by arbitrarily setting vM(R) = 0
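
Adding c to every vi(R) leaves each equation unchanged because every row of a transition matrix sums to 1: $\sum_j p_{ij}(k)(v_j + c) - (v_i + c) = \sum_j p_{ij}(k)\, v_j - v_i$. The sketch checks this numerically. The transition matrix is assumed (the Hillier and Lieberman machine-maintenance example, with replacement returning the machine to state 0), g and vi are the exact solution of that assumed system for "replacement only at state 3" with v3 = 0, and `residuals` is an illustrative helper.

```python
from fractions import Fraction as F

# Assumed transition matrix and costs for R1 = "replacement only at state 3".
P = [
    [F(0), F(7, 8), F(1, 16), F(1, 16)],
    [F(0), F(3, 4), F(1, 8), F(1, 8)],
    [F(0), F(0), F(1, 2), F(1, 2)],
    [F(1), F(0), F(0), F(0)],
]
C = [0, 1000, 3000, 6000]

# Exact solution of the assumed system with v_3 = 0.
g = F(25000, 13)
v = [F(-53000, 13), F(-34000, 13), F(28000, 13), F(0)]


def residuals(P, C, g, v):
    """C_i + sum_j p_ij v_j - v_i - g for each state i; all zero when (g, v) solves the equations."""
    n = len(v)
    return [C[i] + sum(P[i][j] * v[j] for j in range(n)) - v[i] - g for i in range(n)]


shifted = [x + 500 for x in v]  # add an arbitrary constant c = 500 to every v_i
```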

  9. Idea of Policy Improvement
     • given policy R with action k, suppose that there exists a policy Ro with action ko such that
     • then it can be shown that g(Ro) < g(R)
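
As a numerical illustration of the idea (not the slide's proof), the sketch below takes R = "replacement only at state 3", evaluates $C_{ik} + \sum_j p_{ij}(k)\, v_j(R) - v_i(R)$ at the actions of a candidate Ro that overhauls at state 2, and shows that this quantity never exceeds g(R) and is strictly smaller at state 2. It then computes g(Ro) from steady-state probabilities and finds it lower. All data are assumed: costs from the example slides, "do nothing" probabilities from the Hillier and Lieberman machine-maintenance example, overhaul moving the machine to state 1 and replacement to state 0. All names are illustrative.

```python
from fractions import Fraction as F

# Assumed model data. Decisions: 1 = do nothing, 2 = overhaul, 3 = replace.
NOTHING_ROWS = [
    [F(0), F(7, 8), F(1, 16), F(1, 16)],
    [F(0), F(3, 4), F(1, 8), F(1, 8)],
    [F(0), F(0), F(1, 2), F(1, 2)],
    [F(0), F(0), F(0), F(1)],
]
COST = {(0, 1): 0, (1, 1): 1000, (1, 3): 6000,
        (2, 1): 3000, (2, 2): 4000, (2, 3): 6000, (3, 3): 6000}
N = 4


def p_row(i, k):
    """p_ij(k) for all j; assumes overhaul moves the machine to state 1 and replacement to state 0."""
    if k == 1:
        return NOTHING_ROWS[i]
    target = 1 if k == 2 else 0
    return [F(1) if j == target else F(0) for j in range(N)]


def solve(A, b):
    """Solve the square, nonsingular system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    M = [[F(x) for x in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]


def stationary(P):
    """pi P = pi with one balance equation replaced by sum(pi) = 1."""
    n = len(P)
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n - 1)]
    A.append([1] * n)
    return solve(A, [0] * (n - 1) + [1])


# R = "replacement only at state 3": exact g(R) and v_i(R) (v_3 = 0) under the assumed data.
g_R = F(25000, 13)
v_R = [F(-53000, 13), F(-34000, 13), F(28000, 13), F(0)]

R_o = {0: 1, 1: 1, 2: 2, 3: 3}  # candidate: same as R except overhaul at state 2


def improvement_value(i, k, v):
    """C_ik + sum_j p_ij(k) v_j - v_i."""
    return COST[(i, k)] + sum(p * vj for p, vj in zip(p_row(i, k), v)) - v[i]


values = [improvement_value(i, R_o[i], v_R) for i in range(N)]  # never above g_R; below it at state 2

pi_o = stationary([p_row(i, R_o[i]) for i in range(N)])
g_Ro = sum((pi_o[i] * COST[(i, R_o[i])] for i in range(N)), F(0))  # 5000/3, below 25000/13
```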

  10. Policy Improvement
     • 1 Value Determination: fix policy R, set vM(R) = 0, and solve the M+1 equations for g(R) and the remaining vi(R)
     • 2 Policy Improvement: for each state i, find the action k that minimizes $C_{ik} + \sum_{j=0}^{M} p_{ij}(k)\, v_j(R) - v_i(R)$
     • 3 Form a new policy from the actions found in step 2. Stop if this policy is the same as R; otherwise go to step 1
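
The three steps can be sketched end to end in Python. This is a minimal sketch, not the slides' own code: the model data are assumed (costs from the example slides; "do nothing" transition probabilities from the Hillier and Lieberman machine-maintenance example; overhaul assumed to move the machine to state 1 and replacement to state 0), and all function names are illustrative. Step 2 keeps the current action when it ties with the best one, a common convention that stops the loop from switching between equally good actions forever.

```python
from fractions import Fraction as F

# Assumed model data. Decisions: 1 = do nothing, 2 = overhaul, 3 = replace.
NOTHING_ROWS = [
    [F(0), F(7, 8), F(1, 16), F(1, 16)],
    [F(0), F(3, 4), F(1, 8), F(1, 8)],
    [F(0), F(0), F(1, 2), F(1, 2)],
    [F(0), F(0), F(0), F(1)],
]
COST = {(0, 1): 0, (1, 1): 1000, (1, 3): 6000,
        (2, 1): 3000, (2, 2): 4000, (2, 3): 6000, (3, 3): 6000}
N = len(NOTHING_ROWS)  # states 0..M with M = N - 1


def p_row(i, k):
    """Transition probabilities p_ij(k) out of state i under decision k (assumed model)."""
    if k == 1:
        return NOTHING_ROWS[i]
    target = 1 if k == 2 else 0  # overhaul -> state 1, replace -> state 0
    return [F(1) if j == target else F(0) for j in range(N)]


def solve(A, b):
    """Solve the square, nonsingular system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    M = [[F(x) for x in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]


def value_determination(policy):
    """Step 1: solve g + v_i = C_ik + sum_j p_ij(k) v_j for all i, with v_M = 0."""
    A, b = [], []
    for i in range(N):
        row = p_row(i, policy[i])
        # unknowns [g, v_0, ..., v_{M-1}]; the v_M column is dropped because v_M = 0
        A.append([1] + [(1 if j == i else 0) - row[j] for j in range(N - 1)])
        b.append(COST[(i, policy[i])])
    x = solve(A, b)
    return x[0], x[1:] + [F(0)]


def improvement_value(i, k, v):
    """C_ik + sum_j p_ij(k) v_j - v_i, the quantity minimised in step 2."""
    return COST[(i, k)] + sum(p * vj for p, vj in zip(p_row(i, k), v)) - v[i]


def improve(policy, v):
    """Step 2: the minimising decision at each state, keeping the current one on ties."""
    new = {}
    for i in range(N):
        options = sorted(k for (s, k) in COST if s == i)
        best = min(options, key=lambda k: improvement_value(i, k, v))
        if improvement_value(i, policy[i], v) == improvement_value(i, best, v):
            best = policy[i]
        new[i] = best
    return new


def policy_iteration(policy):
    """Step 3: repeat steps 1 and 2 until the policy stops changing."""
    history = []
    while True:
        g, v = value_determination(policy)
        history.append((dict(policy), g))
        new = improve(policy, v)
        if new == policy:
            return policy, g, v, history
        policy = new


start = {0: 1, 1: 1, 2: 1, 3: 3}  # replacement only at state 3
best, g, v, history = policy_iteration(start)
```

Starting from "replacement only at state 3", this sketch changes the policy once (to overhaul at state 2) and then stops; `history` records g = 25000/13 and then 5000/3.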

  11. Idea of Policy Improvement
     • it can be proven that
       • g is non-increasing from one iteration to the next
       • R is optimal (minimizes g) if the policy does not change
       • the algorithm stops after a finite number of iterations
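
On a small example the optimality claim can be spot-checked by brute force. With the assumed data below there are only 2 × 3 = 6 policies (state 0 must do nothing and state 3 must replace), and each one's g can be computed from its steady-state probabilities. Enumeration is only a check, not a substitute for the algorithm, since the number of policies grows exponentially with the number of states. The data are assumed as before (costs from the example slides, "do nothing" probabilities from the Hillier and Lieberman machine-maintenance example, overhaul to state 1, replacement to state 0), and the helper names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

# Assumed model data. Decisions: 1 = do nothing, 2 = overhaul, 3 = replace.
NOTHING_ROWS = [
    [F(0), F(7, 8), F(1, 16), F(1, 16)],
    [F(0), F(3, 4), F(1, 8), F(1, 8)],
    [F(0), F(0), F(1, 2), F(1, 2)],
    [F(0), F(0), F(0), F(1)],
]
COST = {(0, 1): 0, (1, 1): 1000, (1, 3): 6000,
        (2, 1): 3000, (2, 2): 4000, (2, 3): 6000, (3, 3): 6000}
N = 4


def p_row(i, k):
    """p_ij(k); assumes overhaul moves the machine to state 1 and replacement to state 0."""
    if k == 1:
        return NOTHING_ROWS[i]
    target = 1 if k == 2 else 0
    return [F(1) if j == target else F(0) for j in range(N)]


def solve(A, b):
    """Solve the square, nonsingular system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    M = [[F(x) for x in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]


def stationary(P):
    """pi P = pi with one balance equation replaced by sum(pi) = 1."""
    n = len(P)
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n - 1)]
    A.append([1] * n)
    return solve(A, [0] * (n - 1) + [1])


def average_cost(policy):
    """g(R) = sum_i pi_i * C_{i, R(i)} for the chain induced by the policy."""
    pi = stationary([p_row(i, policy[i]) for i in range(N)])
    return sum((pi[i] * COST[(i, policy[i])] for i in range(N)), F(0))


g_by_choice = {}
for k1, k2 in product([1, 3], [1, 2, 3]):  # feasible decisions at states 1 and 2
    g_by_choice[(k1, k2)] = average_cost({0: 1, 1: k1, 2: k2, 3: 3})

best_choice = min(g_by_choice, key=g_by_choice.get)  # (decision at state 1, decision at state 2)
```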

  12. Example
     • Policy: replacement only at state 3
     • transition probability matrix
     • C01 = 0, C11 = 1000, C21 = 3000, C33 = 6000
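
Fixing a policy selects one row of transition probabilities and one cost per state. The sketch assembles that transition matrix and cost vector for any policy of the example. The "do nothing" probabilities are an assumption (the slide's matrix is an image; the values used are those of the Hillier and Lieberman machine-maintenance example), as is the convention that overhaul leads to state 1 and replacement to state 0; the costs are the ones listed on the example slides. `policy_model` is an illustrative name.

```python
from fractions import Fraction as F

# Assumed "do nothing" transition probabilities (rows: current state 0..3).
NOTHING_ROWS = [
    [F(0), F(7, 8), F(1, 16), F(1, 16)],
    [F(0), F(3, 4), F(1, 8), F(1, 8)],
    [F(0), F(0), F(1, 2), F(1, 2)],
    [F(0), F(0), F(0), F(1)],
]
# C_ik for the feasible (state, decision) pairs: 1 = do nothing, 2 = overhaul, 3 = replace.
COST = {(0, 1): 0, (1, 1): 1000, (1, 3): 6000,
        (2, 1): 3000, (2, 2): 4000, (2, 3): 6000, (3, 3): 6000}
N = 4


def p_row(i, k):
    """p_ij(k); assumes overhaul moves the machine to state 1 and replacement to state 0."""
    if k == 1:
        return NOTHING_ROWS[i]
    target = 1 if k == 2 else 0
    return [F(1) if j == target else F(0) for j in range(N)]


def policy_model(policy):
    """Transition matrix and cost vector of the Markov chain a policy induces.

    A KeyError from COST signals a decision that is not allowed at that state.
    """
    states = sorted(policy)
    P = [p_row(i, policy[i]) for i in states]
    C = [COST[(i, policy[i])] for i in states]
    return P, C


R1 = {0: 1, 1: 1, 2: 1, 3: 3}  # replacement only at state 3
P_R1, C_R1 = policy_model(R1)
```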

  13. Example
     • Iteration 1: Value Determination
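
Value determination for the starting policy amounts to one exact linear solve with v3 = 0. The sketch uses assumed transition probabilities (the Hillier and Lieberman machine-maintenance example, with replacement returning the machine to state 0) together with the costs from the previous slide; `solve` and `value_determination` are illustrative helpers.

```python
from fractions import Fraction as F


def solve(A, b):
    """Solve the square, nonsingular system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    M = [[F(x) for x in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]


def value_determination(P, C):
    """Solve g + v_i = C_i + sum_j p_ij v_j with v_M = 0; returns g and [v_0, ..., v_M]."""
    n = len(C)
    A = [[1] + [(1 if j == i else 0) - P[i][j] for j in range(n - 1)] for i in range(n)]
    x = solve(A, C)
    return x[0], x[1:] + [F(0)]


# Assumed transition matrix for R1 = "replacement only at state 3".
P_R1 = [
    [F(0), F(7, 8), F(1, 16), F(1, 16)],
    [F(0), F(3, 4), F(1, 8), F(1, 8)],
    [F(0), F(0), F(1, 2), F(1, 2)],
    [F(1), F(0), F(0), F(0)],
]
C_R1 = [0, 1000, 3000, 6000]

g, v = value_determination(P_R1, C_R1)
# g = 25000/13 (about 1923); v = [-53000/13, -34000/13, 28000/13, 0]
```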

  14. Example
     • Iteration 1: Policy Improvement
     • nothing can be done at state 0, and the machine must be replaced at state 3
     • possible decisions at
       • state 1: decision 1 (do nothing, $1000), decision 3 (replace, $6000)
       • state 2: decision 1 (do nothing, $3000), decision 2 (overhaul, $4000), decision 3 (replace, $6000)

  15. Example
     • Iteration 1: Policy Improvement, the general expressions

  16. Example
     • Iteration 1: Policy Improvement
     • new policy: do nothing at states 0 and 1, overhaul at state 2, and replace at state 3
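
The improvement step of iteration 1 can be reproduced by evaluating $C_{ik} + \sum_j p_{ij}(k)\, v_j(R) - v_i(R)$ for every feasible decision with the iteration-1 values and taking the minimum at each state. Under the assumed data (costs from slide 14, "do nothing" probabilities from the Hillier and Lieberman machine-maintenance example, overhaul to state 1, replacement to state 0), the sketch gives about 1923 and 4538 at state 1 and about 1923, −769, and −231 at state 2, so the new policy overhauls at state 2. The names are illustrative.

```python
from fractions import Fraction as F

# Assumed model data. Decisions: 1 = do nothing, 2 = overhaul, 3 = replace.
NOTHING_ROWS = [
    [F(0), F(7, 8), F(1, 16), F(1, 16)],
    [F(0), F(3, 4), F(1, 8), F(1, 8)],
    [F(0), F(0), F(1, 2), F(1, 2)],
    [F(0), F(0), F(0), F(1)],
]
COST = {(0, 1): 0, (1, 1): 1000, (1, 3): 6000,
        (2, 1): 3000, (2, 2): 4000, (2, 3): 6000, (3, 3): 6000}
N = 4


def p_row(i, k):
    """p_ij(k); assumes overhaul moves the machine to state 1 and replacement to state 0."""
    if k == 1:
        return NOTHING_ROWS[i]
    target = 1 if k == 2 else 0
    return [F(1) if j == target else F(0) for j in range(N)]


# v_i(R1) from the iteration-1 value determination (v_3 = 0), under the assumed data.
v = [F(-53000, 13), F(-34000, 13), F(28000, 13), F(0)]


def improvement_value(i, k):
    """C_ik + sum_j p_ij(k) v_j - v_i for a feasible decision k at state i."""
    return COST[(i, k)] + sum(p * vj for p, vj in zip(p_row(i, k), v)) - v[i]


table = {(i, k): improvement_value(i, k) for (i, k) in COST}
new_policy = {
    i: min((k for (s, k) in COST if s == i), key=lambda k: table[(i, k)])
    for i in range(N)
}
```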

  17. Example
     • Iteration 2: Value Determination
     • it can be shown that the policy no longer changes, so doing nothing at states 0 and 1, overhauling at state 2, and replacing at state 3 is an optimal policy
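
Iteration 2 can be reproduced the same way: value determination for the new policy, then one more improvement step. Under the assumed data (costs from the slides, "do nothing" probabilities from the Hillier and Lieberman machine-maintenance example, overhaul to state 1, replacement to state 0), the sketch finds g = 5000/3 ≈ 1667, below the 25000/13 ≈ 1923 of the starting policy, and the improvement step returns the same policy, which is the stopping condition of step 3. The names are illustrative.

```python
from fractions import Fraction as F

# Assumed model data. Decisions: 1 = do nothing, 2 = overhaul, 3 = replace.
NOTHING_ROWS = [
    [F(0), F(7, 8), F(1, 16), F(1, 16)],
    [F(0), F(3, 4), F(1, 8), F(1, 8)],
    [F(0), F(0), F(1, 2), F(1, 2)],
    [F(0), F(0), F(0), F(1)],
]
COST = {(0, 1): 0, (1, 1): 1000, (1, 3): 6000,
        (2, 1): 3000, (2, 2): 4000, (2, 3): 6000, (3, 3): 6000}
N = 4


def p_row(i, k):
    """p_ij(k); assumes overhaul moves the machine to state 1 and replacement to state 0."""
    if k == 1:
        return NOTHING_ROWS[i]
    target = 1 if k == 2 else 0
    return [F(1) if j == target else F(0) for j in range(N)]


def solve(A, b):
    """Solve the square, nonsingular system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    M = [[F(x) for x in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]


R2 = {0: 1, 1: 1, 2: 2, 3: 3}  # policy found in iteration 1

# Value determination for R2 with v_3 = 0: unknowns [g, v_0, v_1, v_2].
P = [p_row(i, R2[i]) for i in range(N)]
C = [COST[(i, R2[i])] for i in range(N)]
A = [[1] + [(1 if j == i else 0) - P[i][j] for j in range(N - 1)] for i in range(N)]
x = solve(A, C)
g, v = x[0], x[1:] + [F(0)]


def improvement_value(i, k):
    """C_ik + sum_j p_ij(k) v_j - v_i, using R2's values."""
    return COST[(i, k)] + sum(p * vj for p, vj in zip(p_row(i, k), v)) - v[i]


new_policy = {
    i: min((k for (s, k) in COST if s == i), key=lambda k: improvement_value(i, k))
    for i in range(N)
}
```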
