Efficient Solution Algorithms for Factored MDPs

by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman

Presented by Arkady Epshteyn

Problem with MDPs
  • Exponential number of states
  • Example: the SysAdmin problem
  • 4 computers: M1, M2, M3, M4
  • Each machine is either working or failed
  • State space: 2^4 = 16 joint states (enumerated in the sketch below)
  • Actions: whether or not to reboot each machine
  • Reward: depends on the number of working machines
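A minimal sketch of the blow-up, assuming the binary working/failed encoding above (names and sizes are illustrative):

```python
# Enumerate the SysAdmin joint state space: each machine is working (1)
# or failed (0), so there are 2**n joint states.
from itertools import product

n_machines = 4
states = list(product([0, 1], repeat=n_machines))
print(len(states))  # 2**4 = 16; at n = 40 this would be ~10**12 states
```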
Factored Representation
  • Transition model: a dynamic Bayesian network (DBN); each variable's next value depends on only a few parent variables
  • Reward model: a sum of local rewards, R(x) = Σ_i R_i(x), each depending on only a few variables (both are sketched below)
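A minimal sketch of such a factored model. The ring topology, the CPT numbers, and the function names are illustrative assumptions, not taken from the paper:

```python
# Per-machine CPT: the next status depends only on the machine's own
# status, one neighbor's status, and whether it was rebooted.
def p_working_next(own, neighbor, rebooted):
    if rebooted:
        return 0.95          # reboot almost surely restores the machine
    if own and neighbor:
        return 0.90          # healthy, with a healthy neighbor
    if own:
        return 0.50          # failures spread from a failed neighbor
    return 0.05              # failed machines rarely recover on their own

# Factored reward: R(x) = sum_i R_i(x_i), here +1 per working machine.
def reward(x):
    return sum(x)

print(p_working_next(1, 0, False), reward((1, 0, 1, 1)))  # 0.5 3
```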
Approximate Value Function
  • Linear value function: V(x) = Σ_i w_i h_i(x)
  • Basis functions (single-variable indicators; see the sketch below):

h_i(X_i = true) = 1

h_i(X_i = false) = 0

h_0 = 1 (constant basis)
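A minimal sketch of this linear value function with the indicator bases above (the weights are illustrative):

```python
# V(x) = w0*h0 + sum_i wi*hi(x), with h0 = 1 and hi(x) = xi
# (the indicator that machine i is working).
def basis(x):
    return [1] + list(x)                  # [h0, h1, ..., hn]

def V(x, w):
    return sum(wi * hi for wi, hi in zip(w, basis(x)))

w = [0.5, 1.0, 1.0, 1.0, 1.0]             # illustrative weights
print(V((1, 0, 1, 1), w))                 # 3.5
```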

Markov Decision Processes

For a fixed policy π: V_π(x) = R(x) + γ Σ_{x'} P(x' | x, π(x)) V_π(x')

The optimal value function V*: V*(x) = max_a [ R(x) + γ Σ_{x'} P(x' | x, a) V*(x') ]
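For a fixed policy the first equation is linear in V_π, so on a small explicit MDP it can be solved directly. A flat, illustrative sketch:

```python
# Exact value determination for a fixed policy: solve
# (I - gamma * P_pi) V = R on a tiny explicit MDP (illustrative numbers).
import numpy as np

N, gamma = 4, 0.9
P_pi = np.full((N, N), 1.0 / N)     # P(x' | x) under the fixed policy
R = np.array([1.0, 0.0, 2.0, 0.5])  # illustrative rewards
V_pi = np.linalg.solve(np.eye(N) - gamma * P_pi, R)
print(V_pi)
```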

Solving MDPs, Method 1: Policy Iteration
  • Value determination: solve V_π = R + γ P_π V_π for the current policy π
  • Policy improvement: π'(x) = argmax_a [ R(x) + γ Σ_{x'} P(x' | x, a) V_π(x') ]
  • Polynomial in the number of states N (see the sketch below)
  • Exponential in the number of variables K, since N = 2^K
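A sketch of flat policy iteration (illustrative throughout); it manipulates all N states explicitly, which is exactly what becomes infeasible when N = 2^K:

```python
# Flat policy iteration on an explicit MDP.
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """P: (A, N, N) transition tensor, R: (N,) state rewards."""
    A, N, _ = P.shape
    pi = np.zeros(N, dtype=int)
    while True:
        # Value determination: solve (I - gamma * P_pi) V = R
        P_pi = P[pi, np.arange(N)]        # row x is P[pi[x], x, :]
        V = np.linalg.solve(np.eye(N) - gamma * P_pi, R)
        # Policy improvement: greedy one-step lookahead
        Q = R[None, :] + gamma * (P @ V)  # shape (A, N)
        new_pi = Q.argmax(axis=0)
        if np.array_equal(new_pi, pi):
            return pi, V
        pi = new_pi

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))  # 2 actions, 3 states
R = np.array([1.0, 0.0, 2.0])
print(policy_iteration(P, R))
```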
Solving MDPs, Method 2: Linear Programming
  • Minimize Σ_x α(x) V(x) subject to V(x) ≥ R(x) + γ Σ_{x'} P(x' | x, a) V(x') for every state x and action a
  • Intuition: the constraints mirror the fixed point of V(x); the smallest feasible V is V* (see the sketch below)
  • Polynomial in the number of states N
  • Exponential in the number of variables K
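A sketch of this exact LP with scipy, on illustrative random data; note there is one constraint per state-action pair, so this formulation also blows up with N:

```python
import numpy as np
from scipy.optimize import linprog

N, A, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(N), size=(A, N))  # P[a, x] = distribution over x'
R = rng.random(N)
alpha = np.full(N, 1.0 / N)                 # state-relevance weights

# min alpha^T V  s.t.  (I - gamma * P_a) V >= R for each action a,
# rewritten as -(I - gamma * P_a) V <= -R for linprog.
A_ub = np.vstack([-(np.eye(N) - gamma * P[a]) for a in range(A)])
b_ub = np.concatenate([-R for _ in range(A)])
res = linprog(alpha, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * N)
print(res.x)                                # the optimal value function V*
```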
Objective Function
  • With V(x) = Σ_i w_i h_i(x), the LP objective Σ_x α(x) V(x) becomes Σ_i w_i [Σ_x α(x) h_i(x)]: polynomial in the number of basis functions (sketched below)
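A sketch of that collapse, assuming the uniform α and indicator bases used earlier; each coefficient c_i = Σ_x α(x) h_i(x) is computed once (here by brute-force enumeration; for a product-form α these sums factor and no enumeration is needed):

```python
from itertools import product

n = 4
states = list(product([0, 1], repeat=n))
alpha = 1.0 / len(states)                 # uniform state-relevance weights

def basis(x):
    return [1] + list(x)                  # [h0, h1, ..., hn]

# One objective coefficient per basis function, not per state.
c = [sum(alpha * basis(x)[i] for x in states) for i in range(n + 1)]
print(c)                                  # [1.0, 0.5, 0.5, 0.5, 0.5]
```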
Restricted Domain

Each ingredient of the LP depends on only a few variables:

  1. Basis function h_i(x)
  2. Reward function R_i(x)
  3. Backprojection g_i(x) = Σ_{x'} P(x' | x, a) h_i(x'), which depends only on the DBN parents of the variables in h_i's scope (see the sketch below)
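A sketch of a backprojection for the indicator basis h_i(x') = x'_i, reusing the illustrative CPT idea from the factored-model sketch; in this case g_i is just a CPT entry, a function of two variables rather than of all 2^n states:

```python
# Backprojection g_i(x) = sum_{x'} P(x'|x) h_i(x'). For h_i(x') = x'_i
# this reduces to P(X_i' = 1 | parents of X_i'), a function of the DBN
# parents only (illustrative CPT numbers).
def p_working_next(own, neighbor):
    if own and neighbor:
        return 0.90
    if own:
        return 0.50
    return 0.05

def backprojection(own, neighbor):
    # E[h_i(x')] = 1 * P(X_i' = 1 | own, neighbor) + 0 * P(X_i' = 0 | ...)
    return p_working_next(own, neighbor)

print(backprojection(1, 0))   # 0.5, a function of 2 variables, not 2**n states
```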
Variable Elimination

Compute max_x Σ_j f_j(x) one variable at a time, similar to inference in Bayesian networks (sketched below)
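A sketch with two illustrative factors, replacing sum-product's summation with maximization:

```python
from itertools import product

# Factors as dicts from assignments to values (illustrative definitions).
f1 = {(x1, x2): 2*x1 - x2 for x1, x2 in product([0, 1], repeat=2)}
f2 = {(x2, x3): x2 + 3*x3 for x2, x3 in product([0, 1], repeat=2)}

# Eliminate x3: e1(x2) = max_x3 f2(x2, x3)
e1 = {x2: max(f2[(x2, x3)] for x3 in [0, 1]) for x2 in [0, 1]}
# Eliminate x2: e2(x1) = max_x2 [f1(x1, x2) + e1(x2)]
e2 = {x1: max(f1[(x1, x2)] + e1[x2] for x2 in [0, 1]) for x1 in [0, 1]}
# Eliminate x1: the global maximum, never touching all 2**3 states at once
print(max(e2.values()))   # 5
```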

Maximization as Linear Constraints
  • Each variable elimination step is encoded as a set of linear constraints over new LP variables, one per assignment of the eliminated function's domain (see the sketch below)
  • Exponential in the size of each function's domain, not in the number of states
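A sketch of this encoding with scipy, using the same two illustrative factors as the variable elimination sketch: new LP variables e1 and e2 stand for the intermediate functions, and minimizing phi recovers the true maximum:

```python
# Encode max_x [f1(x1,x2) + f2(x2,x3)] <= phi as linear constraints that
# mirror the elimination order x3, x2, x1.
import numpy as np
from scipy.optimize import linprog

f1 = {(x1, x2): 2*x1 - x2 for x1 in (0, 1) for x2 in (0, 1)}
f2 = {(x2, x3): x2 + 3*x3 for x2 in (0, 1) for x3 in (0, 1)}

# LP variables: [phi, e1(0), e1(1), e2(0), e2(1)], where e1 eliminates x3
# and e2 eliminates x2: one constraint per assignment of a factor's domain.
A_ub, b_ub = [], []
for (x2, x3), v in f2.items():            # e1(x2) >= f2(x2, x3)
    row = np.zeros(5)
    row[1 + x2] = -1.0
    A_ub.append(row); b_ub.append(-v)
for (x1, x2), v in f1.items():            # e2(x1) >= f1(x1, x2) + e1(x2)
    row = np.zeros(5)
    row[3 + x1] = -1.0
    row[1 + x2] = 1.0
    A_ub.append(row); b_ub.append(-v)
for x1 in (0, 1):                          # phi >= e2(x1)
    row = np.zeros(5)
    row[0] = -1.0
    row[3 + x1] = 1.0
    A_ub.append(row); b_ub.append(0.0)

c = np.array([1.0, 0, 0, 0, 0])            # minimize phi
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * 5)
print(res.x[0])                            # 5.0 == max_x [f1 + f2]
```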
Approximate Value Function

[Figure: h1 represented as a rule set over x1 and x3, with values 0, 5, and 0.6]

Notice the compact representation: 2 of 4 variables, 3 of 16 rules

Summing Over Rules

[Figure: rule sets for h1(x) (over x1, x3; values u1, u2, u3) and h2(x) (over x1, x2; values u4, u5, u6); their sum h1(x) + h2(x) is a rule set whose values add where the rules' conditions are consistent, e.g. u1+u4, u2+u4, u3+u4, u5+u1, u2+u6, u3+u6; a code sketch follows]
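A minimal sketch of rule-based addition. The (partial assignment, value) representation is an illustrative stand-in for the paper's rule data structures:

```python
# Two rules combine when their partial assignments are consistent, and
# their values add (illustrative rule sets, not the slide's u-values).
def consistent(c1, c2):
    return all(c1[v] == c2[v] for v in c1.keys() & c2.keys())

def add_rule_functions(r1, r2):
    out = []
    for c1, u1 in r1:
        for c2, u2 in r2:
            if consistent(c1, c2):
                out.append(({**c1, **c2}, u1 + u2))
    return out

h1 = [({"x1": 1}, 4.0), ({"x1": 0, "x3": 1}, 1.0)]
h2 = [({"x2": 1}, 2.0), ({"x2": 0}, 0.5)]
print(add_rule_functions(h1, h2))
```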

Multiplying over Rules
  • Analogous construction: consistent rules combine and their values multiply
Rule-based Maximization

[Figure: eliminating x2 from a rule function over x1, x2, x3 with values u1, u2, u3, u4 produces a rule function over x1, x3 with values such as max(u2, u3) and max(u2, u4); a code sketch follows]
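A sketch of maxing a variable out of a rule function. For simplicity it expands the rules over their joint scope instead of staying purely rule-based as the paper does; rules and values are illustrative:

```python
from itertools import product

# A rule function that partitions the space over x1, x2, x3.
rules = [({"x1": 1}, 3.0),
         ({"x1": 0, "x2": 1}, 2.0),
         ({"x1": 0, "x2": 0, "x3": 1}, 4.0),
         ({"x1": 0, "x2": 0, "x3": 0}, 1.0)]

def value(assign):
    # A rule fires when the assignment matches its condition; the rules
    # partition the space here, so exactly one fires.
    return sum(u for cond, u in rules
               if all(assign[v] == val for v, val in cond.items()))

# Eliminate x2 by maximization: the result depends on x1 and x3 only.
eliminated = {}
for x1, x3 in product([0, 1], repeat=2):
    eliminated[(x1, x3)] = max(value({"x1": x1, "x2": x2, "x3": x3})
                               for x2 in [0, 1])
print(eliminated)
```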

Rule-based Linear Program
  • Backprojection and the objective function are handled in a similar way
  • All the operations (summation, multiplication, maximization) keep the rule representation intact
  • Every resulting constraint is a linear function of the weights w_i
Conclusions
  • A compact factored representation can be exploited to solve MDPs with exponentially many states efficiently.
  • The problem is still NP-complete in the worst case.
  • The factored solution may increase the size of the LP when the number of states is small, but it scales better.
  • Success depends on the choice of basis functions for the value approximation and on the factored decomposition of rewards and transition probabilities.