Stochastic Dynamic Programming with Factored Representations

Presentation Transcript


  1. Stochastic Dynamic Programming with Factored Representations Presentation by Dafna Shahaf (Boutilier, Dearden, Goldszmidt 2000)

  2. The Problem • Standard MDP algorithms require explicit state space enumeration • Curse of dimensionality • Need: Compact Representation (intuition: STRIPS) • Need: versions of standard dynamic programming algorithms for it

  3. A Glimpse of the Future: a Policy Tree and a Value Tree (figures)

  4. A Glimpse of the Future: Some Experimental Results

  5. Roadmap • MDPs- Reminder • Structured Representation for MDPs: Bayesian Nets, Decision Trees • Algorithms for Structured Representation • Experimental Results • Extensions

  6. MDPs- Reminder • (states, actions, transitions, rewards) • Discounted infinite-horizon • Stationary Policies (an action to take at state s) • Value functions: V^k_π is the k-stage-to-go value function for π
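
For reference, the k-stage-to-go value function named above can be written as the standard recurrence (assuming, as in the talk's examples, that the reward depends only on the state):

    % k-stage-to-go value of a stationary policy \pi, discount factor \gamma
    V^{0}_{\pi}(s) = R(s)
    V^{k}_{\pi}(s) = R(s) + \gamma \sum_{t \in S} \Pr(s, \pi(s), t)\, V^{k-1}_{\pi}(t)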

  7. Roadmap • MDPs- Reminder • Structured Representation for MDPs: Bayesian Nets, Decision Trees • Algorithms for Structured Representation • Experimental Results • Extensions

  8. Representing MDPs as Bayesian Networks: Coffee world • Variables: O: Robot is in office, W: Robot is wet, U: Has umbrella, R: It is raining, HCR: Robot has coffee, HCO: Owner has coffee • Actions: Go: Switch location, BuyC: Buy coffee, DelC: Deliver coffee, GetU: Get umbrella • The effect of the actions might be noisy. Need to provide a distribution for each effect.
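
To make the factored view concrete, here is a minimal sketch (Python, not from the slides) of a coffee-world state: six boolean variables in place of 2^6 = 64 enumerated states.

    # Coffee world state as an assignment to six boolean variables (2**6 = 64 flat states).
    state = {
        "O":   True,   # robot is in the office
        "W":   False,  # robot is wet
        "U":   False,  # robot has the umbrella
        "R":   True,   # it is raining
        "HCR": False,  # robot has coffee
        "HCO": False,  # owner has coffee
    }

    ACTIONS = ["Go", "BuyC", "DelC", "GetU"]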

  9. Representing Actions: DelC (DBN with tree-structured CPTs for the DelC action; figure)

  10. Representing Actions: Interesting Points • No need to provide marginal distribution over pre-action variables • Markov Property: we need only the previous state • For now, no synchronic arcs • Frame Problem? • Single Network vs. a network for each action • Why Decision Trees?
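
One answer to "Why Decision Trees?": all pre-action contexts that induce the same distribution over a post-action variable can share a single leaf. Below is a rough sketch of such a tree-structured CPT for HCO under DelC; the structure and probabilities are illustrative guesses, not the paper's actual numbers.

    # Tree-structured CPT for HCO' under DelC (illustrative numbers only).
    # Internal nodes are (variable, true-subtree, false-subtree); leaves give Pr(HCO' = True).
    cpt_delc_hco = ("HCO",
        1.0,                     # owner already has coffee: she keeps it
        ("O",
            ("HCR", 0.8, 0.0),   # robot in office: delivery succeeds w.p. 0.8 if it has coffee
            0.0))                # robot not in office: delivery cannot happen

    def prob_true(tree, state):
        """Walk a (var, true_subtree, false_subtree) tree down to a probability leaf."""
        while isinstance(tree, tuple):
            var, if_true, if_false = tree
            tree = if_true if state[var] else if_false
        return tree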

  11. Representing Reward • Generally determined by a subset of features.

  12. Policies and Value Functions • Policy Tree: internal nodes test features (e.g. HCR=T / HCR=F), leaves hold actions • Value Tree: same structure, leaves hold values • The optimal choice may depend only on certain variables (given some others).
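
The policy tree on this slide branches only on HCR. As a sketch of why that is compact: a tree policy that depends on HCR alone has two leaves, while the flat tabular policy would list all 64 states (the specific action choices below are hypothetical).

    # Illustrative policy tree: leaves are actions, internal nodes test state variables.
    policy_tree = ("HCR",
        "DelC",   # robot has coffee: deliver it (hypothetical choice)
        "BuyC")   # robot has no coffee: go buy some (hypothetical choice)
    # A value tree has the same shape with numeric values at the leaves; the tree walk
    # from the CPT sketch above applies unchanged, since only the leaf type differs.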

  13. Roadmap • MDPs- Reminder • Structured Representation for MDPs: Bayesian Nets, Decision Trees • Algorithms for Structured Representation • Experimental Results • Extensions

  14. Value Iteration- Reminder • Bellman Backup • Q-Function: The value of performing a in s, given value function v
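
The two equations referred to above, in their standard form (reward depending only on the state, as elsewhere in the talk):

    % Q-function: value of performing action a in state s, given value function V
    Q^{V}_{a}(s) = R(s) + \gamma \sum_{t \in S} \Pr(s, a, t)\, V(t)
    % Bellman backup
    V'(s) = \max_{a} Q^{V}_{a}(s)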

  15. Structured Value Iteration- Overview Input: Tree(R). Output: Tree(V*). 1. Set Tree(V0) = Tree(R) 2. Repeat (a) Compute Tree(Q_a^Vk) = Regress(Tree(Vk), a) for each action a (b) Merge (via maximization) the trees Tree(Q_a^Vk) to obtain Tree(Vk+1) Until termination criterion. Return Tree(Vk+1).
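
A minimal sketch of this outer loop in Python, assuming the tree operations Regress (slides 21-23) and the maximization merge (slide 27) are supplied as functions; only the control flow follows the slide, everything else is illustrative.

    def structured_value_iteration(reward_tree, actions, regress, merge_max, max_iters=1000):
        """Skeleton of structured value iteration over tree-represented value functions."""
        v_tree = reward_tree                                  # 1. Tree(V0) = Tree(R)
        for _ in range(max_iters):
            q_trees = [regress(v_tree, a) for a in actions]   # 2a. Tree(Q_a^Vk) per action
            new_v = merge_max(q_trees)                        # 2b. merge via maximization
            if new_v == v_tree:                               # termination criterion (a real
                break                                         # one compares values within a bound)
            v_tree = new_v
        return v_tree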

  16. Example World

  17. Step 2a: Calculating Q-Functions • 1. Expected Future Value • 2. Discounting Future Value • 3. Adding Immediate Reward • How to use the structure of the trees? Tree(Q_a^V) should distinguish only conditions under which a makes a branch of Tree(V) true with different odds.

  18. Calculating Tree(Q_a^V0), step by step (figure) • PTree(Q_a^V0): finding conditions under which a will have distinct expected value, with respect to V0 • FVTree(Q_a^V0): undiscounted expected future value for performing action a with one stage to go (e.g. 1*10 + 0*0 = 10 at one leaf) • Tree(Q_a^V0): discounting FVTree (by 0.9), and adding the immediate reward function.

  19. An Alternative View (figure)

  20. A more complicated example (figure): Tree(V1), partial PTree(Q_a^V1), unsimplified PTree(Q_a^V1), PTree(Q_a^V1), FVTree(Q_a^V1), Tree(Q_a^V1)

  21. The Algorithm: Regress Input: Tree(V), action a. Output: Tree(Q_a^V) 1. PTree(Q_a^V) = PRegress(Tree(V), a) (simplified)

  22. The Algorithm: Regress Input: Tree(V), action a. Output: Tree(Q_a^V) 1. PTree(Q_a^V) = PRegress(Tree(V), a) (simplified) 2. Construct FVTree(Q_a^V): for each branch b of PTree, with leaf node l(b) (a) Pr_b = the product of the individual distributions at l(b) (b) v_b = the expected value of Tree(V) under Pr_b (c) Re-label leaf l(b) with v_b.

  23. The Algorithm: Regress Input: Tree(V), action a. Output: Tree(Q_a^V) 1. PTree(Q_a^V) = PRegress(Tree(V), a) (simplified) 2. Construct FVTree(Q_a^V): for each branch b of PTree, with leaf node l(b) (a) Pr_b = the product of the individual distributions at l(b) (b) v_b = the expected value of Tree(V) under Pr_b (c) Re-label leaf l(b) with v_b. 3. Discount FVTree(Q_a^V) with γ, append Tree(R) 4. Return FVTree(Q_a^V)
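
A rough sketch of Regress in Python over a flattened view of the trees, where a tree is just its list of branches (context dict, value). The PTree is assumed already computed (see the PRegress sketch after slide 26), the reward is passed as a function of the pre-action context for simplicity, and there are no synchronic arcs, so effect probabilities multiply.

    def regress(value_branches, ptree_branches, reward, gamma):
        """Sketch of Regress on 'list of branches' trees.
        value_branches: [(post_context, value)] - the leaves of Tree(V)
        ptree_branches: [(pre_context, dist)] - leaves of PTree; dist maps each variable
                        mentioned in Tree(V) to Pr(variable = True) after the action
        reward: function pre_context -> immediate reward; gamma: discount factor."""
        q_branches = []
        for pre_ctx, dist in ptree_branches:
            # Expected future value: weight each branch of Tree(V) by the probability
            # that the action makes its context true (product over independent effects).
            fv = 0.0
            for post_ctx, v in value_branches:
                p = 1.0
                for var, val in post_ctx.items():
                    p *= dist[var] if val else 1.0 - dist[var]
                fv += p * v
            q_branches.append((pre_ctx, reward(pre_ctx) + gamma * fv))
        return q_branches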

  24. The Algorithm: PRegress Input: Tree(V), action a. Output: PTree(Q_a^V) 1. If Tree(V) is a single node, return emptyTree 2. X = the variable at the root of Tree(V); T_X = the tree for CPT_a(X) (label leaves with X)

  25. The Algorithm: PRegress Input: Tree(V), action a. Output: PTree(Q_a^V) 1. If Tree(V) is a single node, return emptyTree 2. X = the variable at the root of Tree(V); T_X = the tree for CPT_a(X) (label leaves with X) 3. V[X=t], V[X=f] = the subtrees of Tree(V) for X=t, X=f 4. P[X=t], P[X=f] = call PRegress on V[X=t], V[X=f]

  26. The Algorithm: PRegress Input: Tree(V), action a. Output: PTree(Q_a^V) 1. If Tree(V) is a single node, return emptyTree 2. X = the variable at the root of Tree(V); T_X = the tree for CPT_a(X) (label leaves with X) 3. V[X=t], V[X=f] = the subtrees of Tree(V) for X=t, X=f 4. P[X=t], P[X=f] = call PRegress on V[X=t], V[X=f] 5. For each leaf l in T_X, add P[X=t], P[X=f], or both (according to the distribution at l; use union to combine labels) 6. Return the resulting tree
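
In the same flattened view, what PRegress computes can be sketched as follows: intersect, for every variable mentioned in Tree(V), the partition induced by that variable's CPT, and record Pr(variable = True) on each surviving branch. The tree-based PRegress above is smarter, since it only introduces the distinctions Tree(V) actually needs, so treat this purely as an illustration.

    from itertools import product

    def pregress(value_vars, cpts):
        """Sketch: pre-action partition labelled with distributions over value-relevant variables.
        value_vars: the variables mentioned in Tree(V)
        cpts: cpts[x] = [(pre_context, p_true)] - leaves of the CPT tree for x under action a
        Returns [(pre_context, {x: p_true, ...})], dropping contradictory context combinations."""
        branches = [({}, {})]
        for x in value_vars:
            refined = []
            for (ctx, dist), (cpt_ctx, p) in product(branches, cpts[x]):
                if any(ctx.get(k, v) != v for k, v in cpt_ctx.items()):
                    continue                      # incompatible assignments: prune this combination
                refined.append(({**ctx, **cpt_ctx}, {**dist, x: p}))
            branches = refined
        return branches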

  27. Step 2b: Maximization • Merge the Tree(Q_a^Vk) trees via maximization to obtain Tree(Vk+1) (figure). Value Iteration complete.
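
The maximization step, again over the flattened 'list of branches' view (the paper performs it directly on the trees, simplifying as it merges); a sketch:

    from itertools import product

    def merge_max(q_trees):
        """Combine per-action Q 'trees' (branch lists) by keeping, for every
        consistent joint context, the largest value."""
        merged = q_trees[0]
        for other in q_trees[1:]:
            combined = []
            for (ctx_a, va), (ctx_b, vb) in product(merged, other):
                if any(ctx_a.get(k, v) != v for k, v in ctx_b.items()):
                    continue                      # contexts disagree on some variable
                combined.append(({**ctx_a, **ctx_b}, max(va, vb)))
            merged = combined
        return merged

Producing the greedy policy tree would additionally record, at each merged leaf, which action achieved the maximum.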

  28. Roadmap • MDPs- Reminder • Structured Representation for MDPs: Bayesian Nets, Decision Trees • Algorithms for Structured Representation • Experimental Results • Extensions

  29. Experimental Results (figures): worst case and best case

  30. Roadmap • MDPs- Reminder • Structured Representation for MDPs: Bayesian Nets, Decision Trees • Algorithms for Structured Representation • Experimental Results • Extensions

  31. Extensions • Synchronic edges • POMDPs • Rewards • Approximation

  32. Questions?

  33. Backup slides • Here be dragons.

  34. Regression through a Policy

  35. Improving Policies: Example

  36. Maximization Step, Improved Policy
