### Efficient Solution Algorithms for Factored MDPs

by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman

Presented by Arkady Epshteyn

Problem with MDPs

- Exponential number of states
- Example: Sysadmin Problem
  - 4 computers: M1, M2, M3, M4
  - Each machine is working or has failed.
  - State space: $2^4 = 16$ states
  - 8 actions: whether to reboot each machine or not
  - Reward: depends on the number of working machines
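
To make the blow-up concrete, the sketch below enumerates the joint state space for a small network. The function names and the reward of one unit per working machine are illustrative assumptions, not taken from the slides.

```python
from itertools import product

def enumerate_states(n_machines):
    """All joint states: one boolean (True = working) per machine."""
    return list(product([False, True], repeat=n_machines))

def reward(state):
    """Assumed reward: one unit per working machine."""
    return sum(state)

states = enumerate_states(4)
print(len(states))          # number of joint states for 4 machines
print(reward(states[-1]))   # reward when every machine is working
print(2 ** 20)              # joint states for 20 machines, without enumerating them
```

Each added machine doubles the number of joint states, which is why any method that enumerates states stops being practical quickly.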

Factored Representation

- Transition model: DBN
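- Reward model: a sum of local reward functions, $R(\mathbf{x}, a) = \sum_i R_i(\mathbf{x}, a)$, where each $R_i$ depends on only a small subset of the state variables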

Solving MDPs, Method 1: Policy Iteration

- Value determination
- Policy Improvement

- Polynomial in the number of states N
- Exponential in the number of variables K
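
The two steps above can be sketched for a small tabular MDP. This is a minimal illustration, not the authors' implementation: value determination is done here by repeated backups instead of solving the linear system exactly, and the two-state machine with its probabilities and rewards is a made-up example.

```python
def policy_iteration(P, R, gamma=0.9, eval_sweeps=500):
    """Tabular policy iteration.

    P[a][s][t]: probability of moving from state s to state t under action a.
    R[s][a]:    immediate reward for taking action a in state s.
    """
    n_states, n_actions = len(R), len(P)
    policy = [0] * n_states
    while True:
        # Value determination: iterate V <- R_pi + gamma * P_pi V
        # (an iterative stand-in for solving the linear system).
        V = [0.0] * n_states
        for _ in range(eval_sweeps):
            V = [R[s][policy[s]]
                 + gamma * sum(P[policy[s]][s][t] * V[t] for t in range(n_states))
                 for s in range(n_states)]

        def q(s, a):
            return R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))

        # Policy improvement: switch only when an action is strictly better.
        new_policy = list(policy)
        for s in range(n_states):
            for a in range(n_actions):
                if q(s, a) > q(s, new_policy[s]) + 1e-9:
                    new_policy[s] = a
        if new_policy == policy:
            return policy, V
        policy = new_policy

# Hypothetical single machine: state 0 = failed, state 1 = working;
# action 0 = wait, action 1 = reboot (costs 0.5).
P = [
    [[1.0, 0.0], [0.1, 0.9]],   # wait
    [[0.0, 1.0], [0.0, 1.0]],   # reboot
]
R = [[0.0, -0.5], [1.0, 0.5]]
policy, V = policy_iteration(P, R)
print(policy, V)
```

Every sweep touches all N states, and N = 2^K for K binary variables, which is where the exponential dependence on K comes from.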

Solving MDPs, Method 2: Linear Programming

- Intuition: compare with the fixed point of V(x):

- Polynomial in the number of states N
- Exponential in the number of variables K
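
For reference, the standard exact LP for an MDP is written out below. The symbols $\alpha(\mathbf{x}) > 0$ (state relevance weights) and $\gamma$ (discount factor) are assumed notation, since the slide's own formula did not survive extraction.

$$
\begin{aligned}
\text{minimize}\quad & \sum_{\mathbf{x}} \alpha(\mathbf{x})\, V(\mathbf{x}) \\
\text{subject to}\quad & V(\mathbf{x}) \ge R(\mathbf{x}, a) + \gamma \sum_{\mathbf{x}'} P(\mathbf{x}' \mid \mathbf{x}, a)\, V(\mathbf{x}') \qquad \forall\, \mathbf{x}, a
\end{aligned}
$$

The fixed point of the value function satisfies

$$
V^*(\mathbf{x}) = \max_a \Big[ R(\mathbf{x}, a) + \gamma \sum_{\mathbf{x}'} P(\mathbf{x}' \mid \mathbf{x}, a)\, V^*(\mathbf{x}') \Big].
$$

The constraints keep $V$ at or above the backup for every action, and minimizing the objective pushes $V$ down onto the fixed point. The LP has one variable per state and one constraint per state-action pair.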

Objective function

- Objective function polynomial in the number of basis functions
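
A short derivation of why this holds, assuming a linear value approximation $V(\mathbf{x}) = \sum_i w_i h_i(\mathbf{x})$ in which each basis function $h_i$ depends only on a small set of variables $\mathbf{C}_i$ (the symbols $w_i$, $h_i$, $\mathbf{C}_i$, and $\alpha$ are assumed notation):

$$
\sum_{\mathbf{x}} \alpha(\mathbf{x}) \sum_i w_i h_i(\mathbf{x})
= \sum_i w_i \sum_{\mathbf{x}} \alpha(\mathbf{x})\, h_i(\mathbf{x})
= \sum_i w_i\, \alpha_i,
\qquad
\alpha_i = \sum_{\mathbf{c}_i} \alpha(\mathbf{c}_i)\, h_i(\mathbf{c}_i).
$$

Here $\alpha(\mathbf{c}_i)$ is the marginal of $\alpha$ over the assignments $\mathbf{c}_i$ to $\mathbf{C}_i$. When that marginal is easy to compute (for example, for uniform or factored $\alpha$), each coefficient $\alpha_i$ costs only a sum over the domain of $h_i$, and the objective has one term per basis function.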

Variable Elimination

- Similar to variable elimination in Bayesian networks, with maximization in place of summation
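
The idea can be sketched as follows: to maximize a sum of local functions, eliminate one variable at a time, replacing the functions that mention it with a single new function over their remaining variables. This is a minimal sketch for binary variables; the function names and the chain-structured example are assumptions made for illustration.

```python
from itertools import product

def eliminate_max(functions, order):
    """Max over all binary variables of a sum of local functions.

    functions: list of (scope, table) pairs; scope is a tuple of variable
               names, table maps each 0/1 assignment tuple to a number.
    order:     elimination order covering every variable.
    """
    functions = list(functions)
    for var in order:
        touching = [f for f in functions if var in f[0]]
        rest = [f for f in functions if var not in f[0]]
        # The new function ranges over the other variables that co-occur with var.
        new_scope = tuple(sorted({v for scope, _ in touching for v in scope} - {var}))
        new_table = {}
        for assignment in product([0, 1], repeat=len(new_scope)):
            context = dict(zip(new_scope, assignment))
            best = float("-inf")
            for value in (0, 1):
                context[var] = value
                total = sum(table[tuple(context[v] for v in scope)]
                            for scope, table in touching)
                best = max(best, total)
            new_table[assignment] = best
        functions = rest + [(new_scope, new_table)]
    # Every remaining function now has an empty scope.
    return sum(table[()] for _, table in functions)

def make_table(fn, arity):
    """Tabulate fn over all 0/1 assignments of the given arity."""
    return {xs: fn(*xs) for xs in product([0, 1], repeat=arity)}

# Hypothetical chain x1 - x2 - x3 - x4 with three pairwise functions.
functions = [
    (("x1", "x2"), make_table(lambda a, b: 2 * a - b, 2)),
    (("x2", "x3"), make_table(lambda a, b: 3 * a * b, 2)),
    (("x3", "x4"), make_table(lambda a, b: b - a, 2)),
]
result = eliminate_max(functions, ["x1", "x4", "x2", "x3"])
print(result)
```

The cost of each step is exponential only in the number of variables in the intermediate function, so a good elimination order on a sparse structure avoids enumerating all joint assignments.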

Maximization as Linear Constraints

- Exponential in the size of each function's domain, not in the number of states
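
A small worked instance shows how the elimination steps turn into linear constraints. Assume, hypothetically, that the constraint to encode is $\phi \ge \max_{x_1, x_2} \big[ f_1(x_1) + f_2(x_1, x_2) \big]$ with binary $x_1, x_2$; the names $\phi$, $f_1$, $f_2$, and $u$ are illustrative notation. Eliminating $x_2$ introduces one new LP variable $u_{x_1}$ per value of $x_1$:

$$
\begin{aligned}
u_{x_1} &\ge f_2(x_1, x_2) && \forall\, x_1, x_2 \\
\phi &\ge f_1(x_1) + u_{x_1} && \forall\, x_1
\end{aligned}
$$

Each $u_{x_1}$ is forced to be at least $\max_{x_2} f_2(x_1, x_2)$, so the two groups of constraints together imply the original one. In general, eliminating a variable adds one LP variable per assignment to the scope of the new function and one constraint per assignment to that scope extended with the eliminated variable, so the number of constraints follows the cost of variable elimination instead of the size of the full state space.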

Multiplying over Rules

- Analogous construction

Rule-based Linear Program

- Backprojection, objective function – handled in a similar way
- All the operations (summation, multiplication, maximization) – keep rule representation intact
- is a linear function

Conclusions

- Compact representation can be exploited to solve MDPs with exponentially many states efficiently.
- Still NP-complete in the worst case.
- Factored solution may increase the size of LP when the number of states is small (but it scales better).
- Success depends on the choice of the basis functions for value approximation and the factored decomposition of rewards and transition probabilities.

