Math 419/519, Prof. Andrew Ross
Markov Decision Processes
Highway Pavement Maintenance
• Thanks to Pablo Durango-Cohen for this example.
• Though I have made up the numbers.
• Classify highway pavement condition as:
  • Good
  • Fair
  • Poor
• Can do 4 kinds of repairs:
  • Expensive
  • Moderate
  • Cheap
  • Nothing
Timeline
• April: check condition of road, decide on action.
• Summer: repair road as decided.
• Fall/Winter: road might deteriorate.
• Next April: check condition again, and so on.
Markov Assumptions
• How we got to the current condition does not matter.
• Future deterioration depends only on the present condition and action.
• When choosing an action, we will look only at the present condition, not the past.
• This is a policy decision, not a statement about road physics. We could change this policy, but it would make the problem bigger.
If we do Nothing
• Road deteriorates according to this transition matrix (an illustrative version is sketched after the next three slides).
• Do the zeros make sense?
• Does it make sense that the probabilities decrease from right to left?

If we do Cheap repairs
• Road improves/deteriorates according to this transition matrix (sketched below).

If we do Moderate repairs
• Road improves/deteriorates according to this transition matrix (sketched below).

If we do Expensive repairs
• Road improves/deteriorates according to this transition matrix (sketched below).
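The four matrices themselves do not appear in the text above, so here is a minimal numpy sketch of plausible versions. The numbers are made up (as the author's were) and chosen only to match the structure the questions point at: every row sums to 1, and with Nothing the road can never improve, which is where the zeros come from.

```python
import numpy as np

# States, in order: Good, Fair, Poor.
# Illustrative transition matrices: made-up numbers, not the slides' values.
P_nothing = np.array([
    [0.6, 0.3, 0.1],    # Good: can only stay put or deteriorate
    [0.0, 0.5, 0.5],    # Fair: cannot improve without a repair (hence the zero)
    [0.0, 0.0, 1.0],    # Poor: stays Poor
])
P_cheap = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.0, 0.3, 0.7],
])
P_moderate = np.array([
    [0.8, 0.15, 0.05],
    [0.4, 0.5, 0.1],
    [0.2, 0.4, 0.4],
])
P_expensive = np.array([
    [0.95, 0.05, 0.0],
    [0.8, 0.15, 0.05],
    [0.7, 0.2, 0.1],
])

# Sanity check: every row of every matrix is a probability distribution.
for P in (P_nothing, P_cheap, P_moderate, P_expensive):
    assert np.allclose(P.sum(axis=1), 1.0)
```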
Repair Policy
• Natural to say: “If it's in Good condition, do Nothing. If it's in Fair condition, do ___. If it's in Poor condition, do ___.”
• Rather than if/then, let's make a Policy Matrix (sketched after the next slide).
Mixed Policies?
• Maybe we can't afford to do Expensive repairs each time the road becomes Poor; maybe we can only afford them 30% of the time? Etc.
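One plausible layout for the policy matrix, assumed here for illustration: rows are road conditions, columns are actions, and entry [s, a] is the probability of choosing action a in state s. A pure if/then policy has a single 1 in each row; the mixed policy below does Expensive repairs on a Poor road only 30% of the time, as the slide suggests (the 70% Moderate alternative is an arbitrary choice).

```python
import numpy as np

# Rows: Good, Fair, Poor. Columns: Nothing, Cheap, Moderate, Expensive.
# Deterministic policy: Good -> Nothing, Fair -> Cheap, Poor -> Expensive.
policy = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Mixed policy: when Poor, do Expensive repairs only 30% of the time,
# and Moderate repairs the other 70%.
mixed_policy = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3],
])

# Each row must be a probability distribution over actions.
for pol in (policy, mixed_policy):
    assert np.allclose(pol.sum(axis=1), 1.0)
```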
“The” transition matrix?
• Changes when you change your policy matrix.
• By the law of total probability, conditioning on which action the policy picks:

Pr(Good next | Fair now)
  = Pr(Good next | Fair now, do Nothing) * Pr(Nothing | Fair)
  + Pr(Good next | Fair now, do Cheap) * Pr(Cheap | Fair)
  + Pr(Good next | Fair now, do Moderate) * Pr(Moderate | Fair)
  + Pr(Good next | Fair now, do Expensive) * Pr(Expensive | Fair)

• And that's just one of 9 entries in the 3x3 matrix!
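In matrix form this mixing is short to write. A minimal sketch, reusing the illustrative P_* matrices and policies from the earlier examples:

```python
import numpy as np

def policy_transition_matrix(policy, P_by_action):
    """Overall state-to-state transition matrix induced by a policy.

    policy[s, a] = Pr(action a | state s);
    P_by_action[a][s, t] = Pr(state t next | state s now, action a).
    """
    n_states = policy.shape[0]
    P = np.zeros((n_states, n_states))
    for a, P_a in enumerate(P_by_action):
        # Weight each action's matrix, row by row, by how often the
        # policy chooses that action in that row's state.
        P += policy[:, [a]] * P_a
    return P

# e.g., with the arrays from the earlier sketches:
# P = policy_transition_matrix(mixed_policy,
#                              [P_nothing, P_cheap, P_moderate, P_expensive])
```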
Overall Cost
• Given a policy matrix, find the transition matrix.
• Then find the steady-state distribution.
• Then find how often we do each action.
• Then account for the cost of each action.
• Then change the policy matrix a little and try to find a cheaper overall cost.
• See the book for the math notation. (The first four steps are sketched in code below.)
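A minimal sketch of those four evaluation steps, again with made-up inputs: the slides give no cost numbers, so cost[s, a], the cost of doing action a when the road is in state s, is something the caller must supply.

```python
import numpy as np

def steady_state(P):
    """Stationary distribution pi with pi @ P = pi and sum(pi) = 1."""
    n = P.shape[0]
    # Stack the balance equations with the normalization row and solve
    # the (overdetermined but consistent) system by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def long_run_average_cost(policy, P_by_action, cost):
    """Long-run average cost per year of following a given policy."""
    # Step 1: the policy-induced transition matrix (as on the previous slide).
    P = sum(pol_col[:, None] * P_a
            for pol_col, P_a in zip(policy.T, P_by_action))
    # Step 2: how often we are in each state in the long run.
    pi = steady_state(P)
    # Step 3: how often we take each (state, action) pair.
    freq = pi[:, None] * policy
    # Step 4: weight each pair by its cost.
    return np.sum(freq * cost)
```

Comparing two candidate policies is then just two calls to long_run_average_cost; the optimization step on this slide is a search over policy matrices for the smallest such number.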
Other Thoughts
• Can find the optimal policy through:
  • “Policy Iteration”
  • “Value Iteration”
• Related to Dynamic Programming. (A value-iteration sketch follows.)
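To make “Value Iteration” concrete, here is a minimal sketch for the discounted-cost version of the problem. The slides specify neither a discount factor nor costs, so the discount value and the cost table are assumptions for illustration only.

```python
import numpy as np

def value_iteration(P_by_action, cost, discount=0.9, tol=1e-8):
    """Optimal values and actions for expected total discounted cost.

    P_by_action[a] is the transition matrix under action a, cost[s, a] is
    the one-step cost of action a in state s, and discount < 1 is an
    assumed value (the slides do not specify one).
    """
    n_states = P_by_action[0].shape[0]
    V = np.zeros(n_states)
    while True:
        # Q[s, a]: pay for action a now, then act optimally ever after.
        Q = np.stack([cost[:, a] + discount * (P_a @ V)
                      for a, P_a in enumerate(P_by_action)], axis=1)
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)  # value and best action per state
        V = V_new
```

Policy iteration instead alternates the evaluation step from the previous slide with a greedy improvement step; both methods are instances of dynamic programming.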
References
• Wayne L. Winston, Operations Research: Applications and Algorithms.
• Ronald A. Howard, “Comments on the Origin and Application of Markov Decision Processes,” Operations Research, Vol. 50, No. 1 (2002).