Transfer Learning in Sequential Decision Problems: A Hierarchical Bayesian Approach


# Transfer Learning in Sequential Decision Problems: A Hierarchical Bayesian Approach

Aaron Wilson, Alan Fern, Prasad Tadepalli. School of EECS, Oregon State University.




Presentation Transcript
1. Transfer Learning in Sequential Decision Problems: A Hierarchical Bayesian Approach
   Aaron Wilson, Alan Fern, Prasad Tadepalli
   School of EECS, Oregon State University

2. Markov Decision Processes
   • MDP
   • M :
   • R :
   • Policy
   • Seek optimal policy:
   [Figure: Agent and Environment]
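The formulas on this slide did not survive extraction. For orientation only, one common textbook formulation of an MDP is given below. It is an assumption, not a reconstruction of the slide: the symbols S, A, T, γ, and π are introduced here, and the discounted criterion is just one standard choice of objective.

```latex
\begin{aligned}
&\text{MDP: } (S, A, T, R, \gamma) \\
&T : S \times A \times S \to [0, 1], \qquad T(s, a, s') = \Pr(s' \mid s, a) \\
&R : S \times A \to \mathbb{R} \\
&\pi : S \to A \\
&\pi^{*} = \arg\max_{\pi} \; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R\big(s_t, \pi(s_t)\big) \right]
\end{aligned}
```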

3. Multi-Task Reinforcement Learning (MTRL)
   [Figure: environments M1, M2, ..., Mn]
   • Given: A sequence of Markov Decision Processes drawn from an unknown distribution D.
   • Goal: Leverage past experience to improve performance on new MDPs drawn from D.

4. MTRL Problem
   • Tasks have hierarchical relationships.
   • Set of classes (unknown to the agent).
   • Natural means of transfer (class discovery).

5. Hierarchical Bayesian Modeling
   • Foundation:
   • Dirichlet Process Models
   • Unknown number of classes.
   • Discover hierarchical structure.
   • Explicit formulation of uncertainty
   • Adapt machinery to the RL setting.
   • Well-justified transfer for RL problems.
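To illustrate why a Dirichlet process suits an unknown number of classes, here is a minimal sketch of its Chinese restaurant process view: each new task joins an existing class with probability proportional to that class's size, or opens a new class with probability proportional to a concentration parameter. The function names and the parameter `alpha` are hypothetical, and this samples from the prior only; the inference described in the talk would also weigh how well each class explains the task's data.

```python
import random


def crp_assign(class_sizes, alpha, rng):
    """Sample a class index for one new task under a Chinese restaurant process.

    Returns an index into class_sizes, or len(class_sizes) to signal a new class.
    """
    total = sum(class_sizes) + alpha
    r = rng.random() * total
    cumulative = 0.0
    for k, size in enumerate(class_sizes):
        cumulative += size
        if r < cumulative:
            return k  # join existing class k
    return len(class_sizes)  # remaining mass (alpha) opens a new class


def sample_partition(num_tasks, alpha, seed=0):
    """Assign tasks to classes one at a time; the class count is not fixed in advance."""
    rng = random.Random(seed)
    class_sizes = []
    assignments = []
    for _ in range(num_tasks):
        k = crp_assign(class_sizes, alpha, rng)
        if k == len(class_sizes):
            class_sizes.append(0)
        class_sizes[k] += 1
        assignments.append(k)
    return assignments, class_sizes
```

Calling `sample_partition(16, alpha=1.0)` returns one class label per task plus the class sizes; larger `alpha` values tend to produce more classes.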

6. Basic Hierarchical Transfer Process
   [Diagram stages: Compute Posterior; Select Best Hierarchy; Select Actions (Bayesian RL). Diagram labels: Process, Inference]

7. Hierarchical Bayesian Transfer for RL
   • Model-Based Multi-Task RL
     • Prior model for domain models.
     • Action selection:
       • Thompson sampling
       • Planning
   • Policy-Based Multi-Task RL
     • Prior for policy parameters.
     • Action selection:
       • Bayesian Policy Search algorithm.

8. Model-Based MTRL
   • Explicitly model the generative process D.
   • Hierarchy represents classes of MDPs.
   [Figure labels: Class, Prior, Estimate D]

9. Action Selection: Exploit estimate of D
   [Diagram stages: Compute Posterior; Plan]
   • Exploit the refined prior (class information).
   • Sample the MDPs using Thompson Sampling.
   • Plan with the sampled model (Value Iteration).
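The sample-then-plan loop on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the transition posterior is a flat symmetric Dirichlet with a hypothetical pseudo-count `prior_count` (in the talk the prior would come from the learned class hierarchy), rewards are assumed known, and all function names are invented for the example.

```python
import random


def sample_transition_model(counts, prior_count, rng):
    """Draw one transition model from a Dirichlet posterior.

    counts[s][a][s2] holds observed transition counts; a Dirichlet draw is
    obtained by normalizing independent Gamma samples.
    """
    model = []
    for state_counts in counts:
        state_rows = []
        for outcome_counts in state_counts:
            draws = [rng.gammavariate(c + prior_count, 1.0) for c in outcome_counts]
            total = sum(draws)
            state_rows.append([d / total for d in draws])
        model.append(state_rows)
    return model


def q_values(model, rewards, values, gamma, s):
    """One-step lookahead values for every action in state s."""
    return [rewards[s][a] + gamma * sum(p * v for p, v in zip(model[s][a], values))
            for a in range(len(model[s]))]


def value_iteration(model, rewards, gamma=0.95, tol=1e-6):
    """Plan in the given model; returns state values and a greedy policy."""
    values = [0.0] * len(model)
    while True:
        new_values = [max(q_values(model, rewards, values, gamma, s))
                      for s in range(len(model))]
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    policy = []
    for s in range(len(model)):
        q = q_values(model, rewards, values, gamma, s)
        policy.append(q.index(max(q)))
    return values, policy


def thompson_step(counts, rewards, prior_count=1.0, seed=0):
    """Thompson sampling: sample one model from the posterior, then plan in it."""
    rng = random.Random(seed)
    model = sample_transition_model(counts, prior_count, rng)
    return value_iteration(model, rewards)
```

The agent would act on the returned policy, update `counts` with what it observes, and repeat; a more informative prior concentrates the sampled models sooner.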

10. Domain 1
   • State is a bit vector:
   • True reward function:
   • Set of 20 test maps.
   [Figure label: State]

11. Domain 1
   [Plot; legend entries: 16 previous tasks, No Transfer]

12. Policy-Based MTRL
   • Policy prior.
   • Infer policy components.
   • Hierarchy represents reusable policy components.
   [Figure labels: Class, Prior, Estimate H]

13. Consider Wargus RTS
   • Multiple unit types.
   • Units fulfill tactical roles.
   • Roles are useful in multiple maps.
   • Simple -> hard instances
   • Hierarchical policy prior.
   • Facilitate reuse of roles.

14. Role-Based Policies
   • Set of roles.
     • Vectors of policy parameters.
     • Who to attack.
   • Set of role assignments.
     • A strategy for assigning agents to roles.
     • Assignment depends on state features.
   • Executing a role-based policy:
     1. Make the assignment.
     2. Each agent selects an action.
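The two execution steps can be sketched as below. Everything concrete here is hypothetical and chosen only for illustration: the role names, the (health, distance) target features, the distance threshold in `assign_role`, and the linear scoring rule are not taken from the talk.

```python
# Hypothetical roles: each is a weight vector over target features (health, distance).
ROLE_PARAMS = {
    "focus_weak": (-1.0, 0.0),  # prefer low-health targets
    "focus_near": (0.0, -1.0),  # prefer nearby targets
}


def assign_role(agent_view):
    """Step 1: pick a role from state features (here, distance to the nearest enemy)."""
    nearest = min(distance for _, distance in agent_view.values())
    return "focus_near" if nearest <= 2.0 else "focus_weak"


def choose_target(weights, agent_view):
    """Step 2: the agent acts by attacking the target with the highest role score."""
    def score(name):
        return sum(w * f for w, f in zip(weights, agent_view[name]))
    return max(agent_view, key=score)


def execute_role_policy(state):
    """Make the assignment, then let each agent select its action."""
    roles = {agent: assign_role(view) for agent, view in state.items()}
    return {agent: choose_target(ROLE_PARAMS[roles[agent]], view)
            for agent, view in state.items()}


# Each agent sees every enemy as (health, distance from that agent).
state = {
    "archer": {"enemy_a": (10.0, 5.0), "enemy_b": (40.0, 6.0)},
    "footman": {"enemy_a": (10.0, 4.0), "enemy_b": (40.0, 1.0)},
}
actions = execute_role_policy(state)
```

Because role parameters are separate from the assignment rule, the same roles could in principle be reused on a new map while only the assignment is relearned.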

15. Transfer of Role-Based Policies
   • Bayesian Policy Search
   • Learns:
     • Individual role parameters.
     • Role assignment function.
     • Assignments of agents to roles.
   • Sample role-based policies:
     • Construct an artificial distribution [Hoffman et al., NIPS 2007; Muller, Bayesian Statistics, 1999]
     • Search using stochastic simulation
   • Model free.

16. Experiments
   • Tactical battles in Wargus
   • Transfer given expert examples.
   • Learning without expert examples.

17. Transfer from expert play.

18. Transfer from self play
   • Use BPS on Training Map 1.
   • Transfer to new map.

19. Conclusion
   • Hierarchical Bayesian Modeling for RL Transfer
   • Model-Based MTRL
     • Learn classes of domain models.
     • Transfer: Improved priors for model-based Bayesian RL.
   • Policy-Based MTRL
     • Learn reusable policies.
     • Transfer: Recombine learned policy components in new tasks.
   • Solved tactical games in Wargus

20. Thank You

21. Outline
   • Multi-Task Reinforcement Learning (RL).
     • Markov Decision Processes.
     • Multi-task RL setting
   • Policy-Based Multi-task RL
     • Discover classes of policy components.
     • Bayesian Policy Search Algorithm.
   • Conclusion

22. Policy-Based MTRL
   • Observed property:
     • Bags of trajectories.
   • Transfer:
     • Classes of policy components
   • Means of exploiting transferred information:
     • Recombine existing components in new tasks.
   • Consequence:
     • Components reused to learn hard tasks.

23. Outline
   • Markov Decision Processes
   • Bayesian Model Based Reinforcement Learning
   • Multi Task Reinforcement Learning (MTRL)
   • Modeling the MTRL Problem
   • MTRL Transfer Algorithm
     • Estimating parameters of the generative process.
     • Action Selection.
   • Results
   • Conclusion

24. Bayesian Model Based RL
   [Figure label: Environment]
   • Given prior:
   • Plan using updated model.
   • Most work uses uninformed priors.
   • Selection of prior not supported by data.
   • Priors do not facilitate transfer.
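A small worked example may show why the choice of prior matters. The sketch below assumes a Dirichlet prior over the next-state distribution for a single state-action pair; the pseudo-counts and the function name are hypothetical, and the "informed" prior stands in for what earlier tasks might supply.

```python
def posterior_mean(prior_counts, observed_counts):
    """Posterior mean of a Dirichlet-multinomial model: normalized prior + observed counts."""
    posterior = [p + c for p, c in zip(prior_counts, observed_counts)]
    total = sum(posterior)
    return [x / total for x in posterior]


# One observed transition to next-state 0, out of three possible next states.
observed = [1, 0, 0]

# Uninformed prior: all next states equally plausible.
uninformed = posterior_mean([1.0, 1.0, 1.0], observed)

# Informed prior: pseudo-counts already favor next-state 0.
informed = posterior_mean([8.0, 1.0, 1.0], observed)
```

After the same single observation, the uninformed estimate for next-state 0 is 0.5 while the informed one is 9/11, so a planner using the informed posterior starts from a sharper model.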