Transfer Learning in Sequential Decision Problems: A Hierarchical Bayesian Approach

Presentation Transcript

  1. Transfer Learning in Sequential Decision Problems: A Hierarchical Bayesian Approach Aaron Wilson, Alan Fern, Prasad Tadepalli School of EECS, Oregon State University

  2. Markov Decision Processes • MDP M = (S, A, T, R) • Transition model T(s′ | s, a) • Reward function R(s, a) • Policy π: S → A • Seek the optimal policy π* (objective below). [Diagram: agent-environment interaction loop]
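The objective on this slide, reconstructed as the standard discounted-return criterion (the discount factor γ is an assumed part of the formulation):

```latex
\pi^{*} \;=\; \arg\max_{\pi} \; \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t) \;\middle|\; \pi,\, M \right]
```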

  3. Multi-Task Reinforcement Learning (MTRL) • Given: a sequence of Markov Decision Processes M1, M2, …, Mn drawn from an unknown distribution D. • Goal: leverage past experience to improve performance on new MDPs drawn from D. [Diagram: environments M1 through Mn]

  4. The MTRL Problem • Tasks have hierarchical relationships: they fall into a set of classes unknown to the agent. • Discovering these classes provides a natural means of transfer.

  5. Hierarchical Bayesian Modeling • Foundation: Dirichlet process models • Unknown number of classes. • Discover hierarchical structure. • Explicit formulation of uncertainty. • Adapt this machinery to the RL setting. • Well-justified transfer for RL problems. (A small illustration of the class-discovery idea follows.)
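To illustrate why a Dirichlet process suits an unknown number of task classes, here is a minimal Chinese-restaurant-process sketch; the concentration parameter `alpha` and the sequential sampling scheme are illustrative assumptions, not the paper's inference procedure:

```python
import random
from collections import Counter

def crp_assignments(n_tasks, alpha=1.0):
    """Assign tasks to classes via the Chinese restaurant process.

    Each new task joins an existing class with probability proportional to
    that class's current size, or starts a brand-new class with probability
    proportional to alpha, so the number of classes is unbounded a priori.
    """
    assignments, counts = [], Counter()
    for _ in range(n_tasks):
        classes = sorted(counts)
        weights = [counts[c] for c in classes] + [alpha]   # sizes + new class
        choice = random.choices(classes + [len(classes)], weights=weights)[0]
        counts[choice] += 1
        assignments.append(choice)
    return assignments

print(crp_assignments(10))   # e.g. [0, 0, 1, 0, 0, 2, 1, 0, 2, 0]
```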

  6. Basic Hierarchical Transfer Process [Diagram: inference loop: compute posterior → select best hierarchy → select actions (Bayesian RL)] (A code sketch of this loop follows.)
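A schematic of the loop pictured on this slide; the three callables are assumed interfaces standing in for the paper's inference and Bayesian RL components:

```python
def hierarchical_transfer(tasks, infer_posterior, best_hierarchy, bayesian_rl_episode):
    """Repeatedly: infer the hierarchy from past tasks, then act under it."""
    experience = []
    for mdp in tasks:
        # 1. Compute the posterior over hierarchies from all experience so far.
        posterior = infer_posterior(experience)
        # 2. Select the best (e.g., MAP) hierarchy from that posterior.
        hierarchy = best_hierarchy(posterior)
        # 3. Select actions in the new task via Bayesian RL, using the hierarchy as prior.
        experience.append(bayesian_rl_episode(mdp, prior=hierarchy))
    return experience
```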

  7. Hierarchical Bayesian Transfer for RL • Model-based multi-task RL • Prior over domain models. • Action selection: Thompson sampling, then planning. • Policy-based multi-task RL • Prior over policy parameters. • Action selection: the Bayesian Policy Search algorithm.

  8. Model-Based MTRL • Explicitly model the generative process D. • The hierarchy represents classes of MDPs. [Diagram: class prior used to estimate D]

  9. Action Selection: Exploit the Estimate of D • Exploit the refined prior (class information). • Sample an MDP from the posterior via Thompson sampling. • Plan with the sampled model (value iteration). [Diagram: compute posterior → plan loop] (A sketch of this step follows.)
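A minimal sketch of this action-selection step for a tabular MDP; the `sample_model` interface (one draw of transition and reward arrays from the posterior) is an assumption for illustration:

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    """Greedy policy for a tabular MDP.  T: (S, A, S) transitions, R: (S, A) rewards."""
    V = np.zeros(T.shape[0])
    while True:
        Q = R + gamma * (T @ V)        # Q[s, a] = R[s, a] + gamma * E[V(s') | s, a]
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return Q.argmax(axis=1)    # greedy policy under this model
        V = V_new

def thompson_action(sample_model, state):
    """Thompson sampling: draw one MDP from the posterior, act greedily in it."""
    T, R = sample_model()              # assumed interface: one posterior draw
    return value_iteration(T, R)[state]
```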

  10. Domain 1 • State is a bit vector: … • True reward function: … • Set of 20 test maps. [Diagram: example map state]

  11. Domain 1 [Plot: learning curves, transfer after 16 previous tasks vs. no transfer]

  12. Policy-Based MTRL • Policy prior. • Infer policy components. • The hierarchy represents reusable policy components. [Diagram: class prior used to estimate H]

  13. Consider the Wargus RTS • Multiple unit types. • Units fulfill tactical roles. • Roles are useful across multiple maps. • Transfer from simple to hard instances. • A hierarchical policy prior facilitates reuse of roles.

  14. Role-Based Policies • Set of roles: vectors of policy parameters (e.g., who to attack). • Set of role assignments: a strategy for assigning agents to roles; the assignment depends on state features. • Executing a role-based policy: 1. Make the assignment. 2. Each agent selects an action. (A sketch of this execution step follows.)
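A sketch of the two execution steps just described; the softmax parameterization, the feature functions, and every name below are illustrative assumptions rather than the paper's exact policy class:

```python
import numpy as np

def softmax_sample(scores, rng):
    probs = np.exp(scores - scores.max())
    return rng.choice(len(scores), p=probs / probs.sum())

def execute_role_based_policy(agent_ids, state, assign_weights, role_params,
                              state_features, action_features, actions, rng):
    """Step 1: assign each agent a role from state features.
       Step 2: each agent picks an action using its role's parameter vector."""
    joint_action = {}
    for agent in agent_ids:
        # Role assignment depends on state features: one score per role.
        role = softmax_sample(assign_weights @ state_features(state, agent), rng)
        # The role's parameters (e.g., whom to attack) score candidate actions.
        scores = np.array([role_params[role] @ action_features(state, agent, a)
                           for a in actions])
        joint_action[agent] = actions[softmax_sample(scores, rng)]
    return joint_action
```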

  15. Transfer of Role-Based Policies • Bayesian Policy Search (BPS) learns: • individual role parameters, • the role-assignment function, • the assignments of agents to roles. • Sample role-based policies from an artificial distribution [Hoffman et al., NIPS 2007; Müller, Bayesian Statistics, 1999]. • Search using stochastic simulation. • Model-free. (A sketch of the sampling idea follows.)
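The cited line of work treats (a transformation of) return as an artificial likelihood, so policy search becomes sampling from a distribution proportional to prior times exp(return). A minimal sampling-importance-resampling sketch under that reading; the temperature and both callables are assumptions:

```python
import math
import random

def bps_resample(sample_policy, rollout_return, n_samples=200, temperature=1.0):
    """Sample policies from the prior, weight each by exp(return / temperature),
    i.e. the artificial likelihood, and resample, so mass concentrates on
    high-return role-based policies using only simulated rollouts (model-free)."""
    policies = [sample_policy() for _ in range(n_samples)]
    weights = [math.exp(rollout_return(p) / temperature) for p in policies]
    return random.choices(policies, weights=weights, k=n_samples)
```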

  16. Experiments • Tactical battles in Wargus • Transfer given expert examples. • Learning without expert examples.

  17. Transfer from expert play.

  18. Transfer from Self-Play • Run BPS on Training Map 1. • Transfer to a new map.

  19. Conclusion • Hierarchical Bayesian modeling for RL transfer • Model-based MTRL • Learn classes of domain models. • Transfer: improved priors for model-based Bayesian RL. • Policy-based MTRL • Learn reusable policies. • Transfer: recombine learned policy components in new tasks. • Solved tactical battles in Wargus.

  20. Thank You

  21. Outline • Multi-Task Reinforcement Learning (RL) • Markov Decision Processes. • The multi-task RL setting. • Policy-Based Multi-Task RL • Discover classes of policy components. • The Bayesian Policy Search algorithm. • Conclusion

  22. Policy-Based MTRL • Observed data: bags of trajectories. • Transferred knowledge: classes of policy components. • Means of exploiting it: recombine existing components in new tasks. • Consequence: components are reused to learn hard tasks.

  23. Outline • Markov Decision Processes • Bayesian Model-Based Reinforcement Learning • Multi-Task Reinforcement Learning (MTRL) • Modeling the MTRL Problem • MTRL Transfer Algorithm • Estimating parameters of the generative process. • Action selection. • Results • Conclusion

  24. Bayesian Model-Based RL • Given a prior P(M) over MDP models, maintain a posterior from experience. • Plan using the updated model. • Most work uses uninformed priors: • the choice of prior is not supported by data, • and such priors do not facilitate transfer. [Diagram: agent-environment loop] (A concrete instantiation is sketched below.)
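For concreteness, one common instantiation of "prior, posterior update, plan" keeps an independent Dirichlet posterior over each state-action's next-state distribution. The flat `prior_counts` below is exactly the uninformed choice this slide criticizes; the talk's proposal is to replace it with class-informed counts transferred from earlier tasks. A sketch under these assumptions:

```python
import numpy as np

class DirichletTransitionModel:
    """Dirichlet posterior over T(s' | s, a); counts start at the prior."""

    def __init__(self, n_states, n_actions, prior_counts=1.0):
        # Uninformed prior: uniform pseudo-counts everywhere.  A transferred,
        # class-based prior would instead place counts where similar tasks
        # actually had transitions.
        self.counts = np.full((n_states, n_actions, n_states), prior_counts)

    def update(self, s, a, s_next):
        self.counts[s, a, s_next] += 1.0     # Bayesian update = add one count

    def mean_model(self):
        """Posterior-mean transition probabilities, usable by a planner."""
        return self.counts / self.counts.sum(axis=2, keepdims=True)

    def sample_model(self, rng):
        """One posterior draw (e.g., for Thompson sampling)."""
        flat = self.counts.reshape(-1, self.counts.shape[-1])
        return np.stack([rng.dirichlet(row) for row in flat]).reshape(self.counts.shape)
```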