    1. Transfer Learning in Sequential Decision Problems: A Hierarchical Bayesian Approach. Aaron Wilson, Alan Fern, Prasad Tadepalli. School of EECS, Oregon State University

    2. Markov Decision Processes • MDP defined by a transition model M and reward function R: • M : S × A → Δ(S) (transition distribution) • R : S × A → ℝ (reward function) • Policy: π : S → A • Seek optimal policy: π* = argmax_π E_π[Σ_t γ^t R(s_t, a_t)] (Figure: agent-environment interaction loop, omitted.) A value-iteration sketch follows below.
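A minimal sketch of computing the optimal policy for a small finite MDP via value iteration (the planner named later in the talk). Array shapes and names (`T`, `R`, `gamma`) are illustrative assumptions, not from the slides.

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    """T[s, a, s2] = P(s2 | s, a); R[s, a] = expected immediate reward."""
    n_states, _, _ = T.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * (T @ V)             # Q[s, a] = R[s, a] + gamma * E_s2[V[s2]]
        V_new = Q.max(axis=1)               # Bellman optimality backup
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)  # optimal values, greedy policy
        V = V_new

# Tiny 2-state, 2-action example
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[0.0, 1.0], [0.5, 0.0]])
print(value_iteration(T, R))
```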

    3. Multi-Task Reinforcement Learning (MTRL) (Figure: a sequence of environments M1, M2, …, Mn; omitted.) • Given: A sequence of Markov Decision Processes drawn from an unknown distribution D. • Goal: Leverage past experience to improve performance on new MDPs drawn from D.

    4. MTRL Problem • Tasks have hierarchical relationships. • Tasks fall into a set of classes (unknown to the agent). • Class discovery provides a natural means of transfer.

    5. Hierarchical Bayesian Modeling • Foundation: Dirichlet process models • Unknown number of classes (see the sketch below). • Discover hierarchical structure. • Explicit formulation of uncertainty. • Adapt this machinery to the RL setting. • Well-justified transfer for RL problems.
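A minimal sketch of the class-discovery behavior via the Chinese restaurant process view of a Dirichlet process: each new task joins an existing class in proportion to its size, or opens a new class in proportion to a concentration parameter `alpha` (an illustrative value, not the paper's).

```python
import random

def crp_assignments(n_tasks, alpha=1.0, seed=0):
    """Sample class labels for n_tasks tasks; the number of classes is unbounded."""
    rng = random.Random(seed)
    counts, classes = [], []                # counts[k] = tasks currently in class k
    for _ in range(n_tasks):
        r = rng.uniform(0, sum(counts) + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:                     # join class k with prob counts[k] / (n + alpha)
                counts[k] += 1
                classes.append(k)
                break
        else:                               # open a new class with prob alpha / (n + alpha)
            counts.append(1)
            classes.append(len(counts) - 1)
    return classes

print(crp_assignments(10))                  # e.g. a handful of discovered classes
```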

    6. Basic Hierarchical Transfer Process (Figure: a process-inference loop: compute posterior → select best hierarchy → select actions with Bayesian RL.)

    7. Hierarchical Bayesian Transfer for RL • Model-Based Multi-Task RL • Prior over domain models. • Action selection: Thompson sampling, then planning. • Policy-Based Multi-Task RL • Prior over policy parameters. • Action selection: Bayesian Policy Search algorithm.

    8. Model-Based MTRL • Explicitly model the generative process D. • Hierarchy represents classes of MDPs. (Figure: class prior used to estimate D; omitted.)

    9. Action Selection: Exploit Estimate of D (Figure: compute posterior → plan.) • Exploit the refined prior (class information). • Sample an MDP using Thompson sampling. • Plan with the sampled model (value iteration). See the sketch below.
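A minimal sketch of this step, assuming independent Dirichlet posteriors over transition rows and a known reward function for brevity. The flat pseudo-counts stand in for the class-refined prior; all names and sizes are illustrative, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
prior_counts = np.ones((n_states, n_actions, n_states))  # a refined class prior would go here
R = rng.uniform(size=(n_states, n_actions))              # known rewards, for simplicity

def sample_model(counts):
    """Thompson sampling: draw one transition model from the Dirichlet posterior."""
    return np.array([[rng.dirichlet(counts[s, a]) for a in range(n_actions)]
                     for s in range(n_states)])

def plan(T, R, gamma=0.95, iters=200):
    """Value iteration on the sampled model; returns the greedy policy."""
    V = np.zeros(n_states)
    for _ in range(iters):
        V = (R + gamma * (T @ V)).max(axis=1)
    return (R + gamma * (T @ V)).argmax(axis=1)

policy = plan(sample_model(prior_counts), R)             # act greedily w.r.t. the sample
print(policy)
```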

    10. Domain 1 • State is a bit vector (equation omitted in transcript). • True reward function (equation omitted in transcript). • Set of 20 test maps. (Figure: state illustration, omitted.)

    11. Domain 1: results (Plot: performance with transfer from 16 previous tasks vs. no transfer; omitted.)

    12. Policy-Based MTRL • Policy prior. • Infer policy components. • Hierarchy represents reusable policy components. (Figure: class prior used to estimate H; omitted.)

    13. Consider the Wargus RTS • Multiple unit types. • Units fulfill tactical roles. • Roles are useful across multiple maps. • Simple-to-hard instances. • Hierarchical policy prior. • Facilitates reuse of roles.

    14. Role-Based Policies • Set of roles: vectors of policy parameters (e.g., whom to attack). • Set of role assignments: a strategy for assigning agents to roles; assignment depends on state features. • Executing a role-based policy: 1. Make the assignment. 2. Each agent selects an action under its role (see the sketch below).
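A minimal sketch of executing a role-based policy as described on the slide: a softmax role assignment from state features, then a per-role softmax action choice. Feature sizes, the softmax parameterization, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_roles, n_actions, n_features = 3, 5, 8
role_params = rng.normal(size=(n_roles, n_actions, n_features))  # one policy per role
assign_params = rng.normal(size=(n_roles, n_features))           # role-assignment weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def act(unit_features):
    """Step 1: assign the unit a role; step 2: the unit acts under that role."""
    role = rng.choice(n_roles, p=softmax(assign_params @ unit_features))
    action = rng.choice(n_actions, p=softmax(role_params[role] @ unit_features))
    return role, action

print(act(rng.normal(size=n_features)))
```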

    15. Transfer of Role-Based Policies • Bayesian Policy Search (BPS) learns: • individual role parameters, • the role-assignment function, • assignments of agents to roles. • Samples role-based policies from an artificial distribution [Hoffman et al., NIPS 2007; Müller, Bayesian Statistics, 1999]. • Searches using stochastic simulation. • Model-free. A reward-weighted sampling sketch follows below.
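One way to read the artificial-distribution idea: sample policy parameters from a distribution proportional to prior(theta) times a utility built from simulated returns, so high-return parameters receive posterior-like mass. The Metropolis sketch below uses a toy quadratic utility and a Gaussian prior as stand-ins; it illustrates only the sampling pattern, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def utility(theta):
    """Placeholder for a positive utility estimated from policy rollouts."""
    return np.exp(-np.sum((theta - 1.0) ** 2))

def log_target(theta):
    log_prior = -0.5 * np.sum(theta ** 2)    # standard normal prior over parameters
    return log_prior + np.log(utility(theta))

def metropolis(n_steps=2000, dim=2, step=0.5):
    theta, samples = np.zeros(dim), []
    for _ in range(n_steps):
        prop = theta + step * rng.normal(size=dim)
        if np.log(rng.uniform()) < log_target(prop) - log_target(theta):
            theta = prop                     # accept moves toward high prior * utility
        samples.append(theta)
    return np.array(samples)

print(metropolis().mean(axis=0))             # sits between prior mean (0) and utility peak (1)
```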

    16. Experiments • Tactical battles in Wargus • Transfer given expert examples. • Learning without expert examples.

    17. Transfer from expert play.

    18. Transfer from self-play • Use BPS on training map 1. • Transfer to the new map.

    19. Conclusion • Hierarchical Bayesian modeling for RL transfer • Model-Based MTRL • Learn classes of domain models. • Transfer: improved priors for model-based Bayesian RL. • Policy-Based MTRL • Learn reusable policies. • Transfer: recombine learned policy components in new tasks. • Solved tactical games in Wargus.

    20. Thank You

    21. Outline • Multi-Task Reinforcement Learning (RL) • Markov Decision Processes • Multi-task RL setting • Policy-Based Multi-Task RL • Discover classes of policy components • Bayesian Policy Search algorithm • Conclusion

    22. Policy-Based MTRL • Observed property: bags of trajectories. • Transfer: classes of policy components. • Means of exploiting transferred information: recombine existing components in new tasks. • Consequence: components are reused to learn hard tasks.

    23. Outline • Markov Decision Processes • Bayesian Model Based Reinforcement Learning • Multi Task Reinforcement Learning (MTRL) • Modeling the MTRL Problem • MTRL Transfer Algorithm • Estimating parameters of the generative process. • Action Selection. • Results • Conclusion

    24. Bayesian Model-Based RL (Figure: agent-environment loop, omitted.) • Given a prior over models (equation omitted in transcript). • Plan using the updated model. • Most work uses uninformed priors. • Selection of the prior is not supported by data. • Such priors do not facilitate transfer. A conjugate-update sketch follows below.
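For contrast with uninformed priors, a minimal conjugate-update sketch for the model: Dirichlet pseudo-counts per (state, action) row, updated from observed transitions. A transferred, class-informed prior would simply replace the flat all-ones counts. All names and sizes are illustrative assumptions.

```python
import numpy as np

n_states, n_actions = 3, 2
counts = np.ones((n_states, n_actions, n_states))        # uninformed prior: flat counts

def update(counts, transitions):
    """transitions: iterable of observed (s, a, s_next) triples."""
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1
    return counts

def posterior_mean(counts):
    """Expected transition model under the Dirichlet posterior."""
    return counts / counts.sum(axis=2, keepdims=True)

update(counts, [(0, 1, 2), (0, 1, 2), (2, 0, 1)])
print(posterior_mean(counts)[0, 1])                      # row shifted toward s_next = 2
```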