
Multi-armed Bandit Problem and Bayesian Optimization in Reinforcement Learning

From Cognitive Science and Machine Learning Summer School 2010. Loris Bazzani.

Presentation Transcript


  1. Multi-armed Bandit Problem and Bayesian Optimization in Reinforcement Learning From Cognitive Science and Machine Learning Summer School 2010 Loris Bazzani

  2. Outline Summer School www.videolectures.net

  3. Outline Summer School www.videolectures.net

  4. Outline Presentation
  • What are Machine Learning and Cognitive Science?
  • How are they related to each other?
  • Reinforcement Learning
  • Background
  • Discrete case
  • Continuous case

  5. Outline Presentation
  • What are Machine Learning and Cognitive Science?
  • How are they related to each other?
  • Reinforcement Learning
  • Background
  • Discrete case
  • Continuous case

  6. What is Machine Learning (ML)?
  • Endow computers with the ability to “learn” from “data”
  • Present data from sensors, the internet, experiments
  • Expect the computer to make decisions
  • Traditionally categorized as:
  • Supervised Learning: classification, regression
  • Unsupervised Learning: dimensionality reduction, clustering
  • Reinforcement Learning: learning from feedback, planning
  From N. Lawrence's slides

  7. What is Cognitive Science (CogSci)?
  • How does the mind get so much out of so little?
  • Rich models of the world
  • Make strong generalizations
  • Process of reverse engineering the brain
  • Create computational models of the brain
  • Much of cognition involves induction: finding patterns in data
  From N. Chater's slides

  8. Outline Presentation
  • What are Machine Learning and Cognitive Science?
  • How are they related to each other?
  • Reinforcement Learning
  • Background
  • Discrete case
  • Continuous case

  9. Link between CogSci and ML
  • ML takes inspiration from psychology, CogSci, and computer science
  • Rosenblatt's Perceptron
  • Neural Networks
  • …
  • CogSci uses ML as an engineering toolkit
  • Bayesian inference in generative models
  • Hierarchical probabilistic models
  • Approximate methods for learning and inference
  • …

  10. Outline Presentation
  • What are Machine Learning and Cognitive Science?
  • How are they related to each other?
  • Reinforcement Learning
  • Background
  • Discrete case
  • Continuous case

  11. Outline Presentation
  • What are Machine Learning and Cognitive Science?
  • How are they related to each other?
  • Reinforcement Learning
  • Background
  • Discrete case
  • Continuous case

  12. Multi-armed Bandit Problem [Auer et al. '95]
  I wanna win a lot of cash!

  13. Multi-armed Bandit Problem [Auer et al. '95]
  • Trade-off between Exploration and Exploitation
  • An adversary controls the payoffs
  • No statistical assumptions on the reward distribution
  • Performance measure: Regret = Best Reward − Player Reward
  • Upper bound on the expected regret

  14. Multi-armed Bandit Problem [Auer et al. '95]
  Setup: a set of K actions (arms) i ∈ {1, …, K} and a sequence of trials t = 1, …, T; at each trial the adversary assigns a reward x_i(t) ∈ [0, 1] to every action, and the player observes only the reward of the action it plays.
  Goal: define a probability distribution p(t) over the actions at each trial.
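
To make this protocol concrete, here is a minimal Python sketch of the game loop; the uniform-random adversary and the fixed uniform p are illustrative assumptions (the adversarial setting allows arbitrary reward assignments, and p(t) would normally be produced by a learning algorithm such as Exp3 below).

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 10, 1000                      # K actions (arms), T trials
p = np.full(K, 1.0 / K)              # p(t): player's distribution over actions

total_reward = 0.0
for t in range(T):
    i = rng.choice(K, p=p)           # play an action sampled from p(t)
    x = rng.uniform(0, 1, size=K)    # adversary's rewards x(t) in [0,1]^K
    total_reward += x[i]             # only x_i(t) is revealed to the player
    # a learning algorithm (e.g., Exp3 below) would update p here
```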

  15. The Full Information Game [Freund & Schapire '95]
  Regret bound: O(√(T ln K)) for the exponential-weights (Hedge) algorithm.
  Problem: it needs to observe the reward of every action at every trial!
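
A minimal sketch of the exponential-weights (Hedge) update for the full-information game, assuming the standard multiplicative form with a learning-rate parameter eta; the reward vector and eta value in the usage lines are made up for illustration.

```python
import numpy as np

def hedge_update(w, x, eta=0.1):
    """One Hedge step: the full reward vector x in [0,1]^K is observed,
    and every action's weight grows exponentially in its reward."""
    w = w * np.exp(eta * x)
    return w / w.sum()               # normalized weights = play distribution

# usage: start uniform, feed one round of rewards
w = np.full(5, 0.2)
w = hedge_update(w, np.array([0.1, 0.9, 0.3, 0.5, 0.7]))
```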

  16. The Partial Information Game
  Exp3 = Exponential-weight algorithm for Exploration and Exploitation
  • The regret bound holds for suitable values of the exploration rate γ, tuned according to the total reward of the best action
  • Mixes in uniform exploration, so all possible actions keep being tried
  • Updates the weight of only the selected action, via an importance-weighted estimate of its reward
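
A compact sketch of Exp3 along these lines; the toy two-valued reward environment in the last line is an assumption for illustration.

```python
import numpy as np

def exp3(K, T, get_reward, gamma=0.1, seed=0):
    """Exp3: mix exponential weights with uniform exploration (rate gamma);
    only the played arm's weight is updated, using the importance-weighted
    reward estimate x_i(t) / p_i(t)."""
    rng = np.random.default_rng(seed)
    w = np.ones(K)
    total = 0.0
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / K
        i = rng.choice(K, p=p)
        x = get_reward(i, t)                 # observed reward in [0, 1]
        total += x
        w[i] *= np.exp(gamma * (x / p[i]) / K)
        w /= w.max()                         # rescale for numerical stability
    return total

# toy usage: arm 3 pays 0.9, all others pay 0.2
print(exp3(K=10, T=5000, get_reward=lambda i, t: 0.9 if i == 3 else 0.2))
```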

  17. The Partial Information Game
  Exp3.1 = Exp3 run in rounds, where a round consists of a sequence of trials
  • Each round guesses a bound on the total reward of the best action and restarts Exp3, with a re-tuned γ, once the guess is exceeded
  • Bound: expected regret O(√(G_max K ln K)), where G_max is the best action's total reward, achieved without knowing G_max in advance
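
A schematic of the round structure, reusing the Exp3 step above. The guessed bounds g_r and the tuning of gamma follow the shape of Auer et al.'s scheme, but this is a simplified sketch, not the paper's exact algorithm or constants.

```python
import numpy as np

def exp3_rounds(K, horizon, get_reward, seed=0):
    """Exp3.1-style wrapper: round r guesses a bound g_r on the best
    action's total reward, runs Exp3 with an exploration rate tuned to
    that guess, and starts a new round (quadrupling the guess) once the
    estimated best total reward crosses the threshold."""
    rng = np.random.default_rng(seed)
    c = K * np.log(K) / (np.e - 1)
    Ghat = np.zeros(K)                       # importance-weighted totals
    t, r = 0, 0
    while t < horizon:
        g = c * 4.0 ** r                     # guessed bound for round r
        gamma = min(1.0, np.sqrt(c / g))
        w = np.ones(K)                       # Exp3 restarts each round
        while t < horizon and Ghat.max() <= g - K / gamma:
            p = (1 - gamma) * w / w.sum() + gamma / K
            i = rng.choice(K, p=p)
            xhat = get_reward(i, t) / p[i]   # importance-weighted reward
            Ghat[i] += xhat
            w[i] *= np.exp(gamma * xhat / K)
            w /= w.max()                     # rescale for numerical stability
            t += 1
        r += 1
```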

  18. Applications [Hedge] [Bazzani et al. '10]

  19. Outline Presentation • What are Machine Learning and Cognitive Science? • How are they related each other? • Reinforcement Learning • Background • Discrete case • Continuous case

  20. Bayesian Optimization [Brochu et al. '10]
  • Optimize a nonlinear function f over a set A: find x* = argmax_{x ∈ A} f(x), where f is the function that assigns rewards and the elements x ∈ A are the actions

  21. Bayesian Optimization [Brochu et al. '10]
  • Uses Bayes' theorem, P(f | D_{1:t}) ∝ P(D_{1:t} | f) P(f), where D_{1:t} are the evaluations observed so far and:
  Prior P(f): our beliefs about the space of possible objective functions
  Posterior P(f | D_{1:t}): our updated beliefs about the unknown objective function
  Likelihood P(D_{1:t} | f): given what we think we know about the prior, how likely is the data we have seen?
  Goal: maximize the posterior at each step, so that each new evaluation decreases the distance between the true global maximum and the expected maximum given the model.
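
The resulting loop can be sketched as follows; gp_posterior and acquisition are placeholder names for the pieces spelled out under slides 25-28 below, and restricting the search to a finite candidate grid is a simplifying assumption (in general the acquisition is optimized over the whole set A).

```python
import numpy as np

def bayesian_optimization(f, candidates, n_iter=20, seed=0):
    """Skeleton of the BO loop: condition the surrogate on the data seen
    so far, evaluate f where the acquisition function is maximal, repeat."""
    rng = np.random.default_rng(seed)
    X = [rng.choice(candidates)]             # initial design point
    y = [f(X[0])]
    for t in range(n_iter):
        mu, sigma = gp_posterior(np.array(X), np.array(y), candidates)
        a = acquisition(mu, sigma, best=max(y))   # e.g., PI, EI, or UCB
        x_next = candidates[int(np.argmax(a))]
        X.append(x_next)
        y.append(f(x_next))
    return X[int(np.argmax(y))]              # incumbent best point
```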

  22. Bayesian Optimization [Brochu et al. ‘10]

  23. Priors over Functions
  • Convergence conditions of BO:
  • The acquisition function is continuous and approximately minimizes the risk
  • The conditional variance converges to zero (if and only if the distance to the nearest observation is zero)
  • The objective is continuous
  • The prior is homogeneous
  • The optimization is independent of the m-th differences
  These conditions are guaranteed by Gaussian Processes (GP)

  24. Priors over Functions
  • GP = extension of the multivariate Gaussian distribution to an infinite-dimensional stochastic process
  • Any finite linear combination of samples is normally distributed
  • Fully defined by its mean function m(x) and covariance function k(x, x′)
  • Focus on defining the covariance function

  25. Why use GPs?
  • Assume a zero-mean GP: function values are drawn according to f_{1:t} ~ N(0, K), where K = [k(x_i, x_j)] is the covariance (kernel) matrix of the observed points
  • When a new observation arrives, the joint distribution is still Gaussian, giving the predictive distribution f_{t+1} | D_{1:t} ~ N(μ_{t+1}(x), σ²_{t+1}(x)) with μ_{t+1}(x) = kᵀK⁻¹f_{1:t} and σ²_{t+1}(x) = k(x, x) − kᵀK⁻¹k, where k = [k(x, x_1), …, k(x, x_t)]
  • Using the Sherman-Morrison-Woodbury formula, K⁻¹ can be updated incrementally instead of recomputed from scratch
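
These predictive equations translate almost line by line into code; a sketch for 1-D inputs, with a plain linear solve standing in for the incremental Sherman-Morrison-Woodbury update.

```python
import numpy as np

def sq_exp(A, B, theta=1.0):
    """Default kernel: squared exponential on 1-D inputs (next slide)."""
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * theta ** 2))

def gp_posterior(X, y, Xstar, kernel=sq_exp, noise=1e-6):
    """Zero-mean GP predictive distribution at test points Xstar:
    mean  k_*^T K^{-1} y,  variance  k(x_*, x_*) - k_*^T K^{-1} k_*."""
    K = kernel(X, X) + noise * np.eye(len(X))    # jitter for stability
    Ks = kernel(X, Xstar)                        # cross-covariances k_*
    alpha = np.linalg.solve(K, y)                # K^{-1} y
    v = np.linalg.solve(K, Ks)                   # K^{-1} k_*
    mu = Ks.T @ alpha
    var = kernel(Xstar, Xstar).diagonal() - np.sum(Ks * v, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

# usage: condition on three points, predict on a small grid
mu, sigma = gp_posterior(np.array([0.0, 1.0, 2.0]),
                         np.array([0.0, 1.0, 0.0]),
                         np.linspace(0.0, 2.0, 5))
```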

  26. Choice of Covariance Functions
  • Isotropic model with hyperparameter θ (the length scale)
  • Squared Exponential Kernel: k(x_i, x_j) = exp(−‖x_i − x_j‖² / (2θ²))
  • Matérn Kernel: built from the Gamma function Γ(ν) and the modified Bessel function K_ν; its smoothness parameter ν interpolates between very rough processes (small ν) and the squared exponential (ν → ∞)
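
Sketches of both kernels for 1-D inputs; the Matérn is written in the common Rasmussen & Williams parameterization, which may differ cosmetically from the slide's.

```python
import numpy as np
from scipy.special import gamma as gamma_fn, kv   # Gamma and Bessel K_nu

def squared_exponential(A, B, theta=1.0):
    """k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 theta^2))."""
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * theta ** 2))

def matern(A, B, nu=2.5, theta=1.0):
    """Matern kernel: (2^{1-nu}/Gamma(nu)) z^nu K_nu(z), z = sqrt(2 nu) d / theta.
    As nu -> infinity it recovers the squared exponential."""
    d = np.abs(A[:, None] - B[None, :]) / theta
    z = np.sqrt(2 * nu) * np.where(d == 0, 1e-12, d)   # avoid K_nu(0)
    return (2 ** (1 - nu) / gamma_fn(nu)) * z ** nu * kv(nu, z)
```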

  27. Acquisition Functions
  • The role of the acquisition function is to guide the search for the optimum toward points where the predicted value is high, the uncertainty is great, or both
  • Assumption: optimizing the acquisition function is simple and cheap
  • Goal: high acquisition corresponds to potentially high values of the objective function
  • Maximizing the probability of improvement: PI(x) = P(f(x) ≥ f(x⁺)) = Φ((μ(x) − f(x⁺)) / σ(x)), where x⁺ is the best point observed so far
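
A sketch of the probability-of-improvement criterion under these definitions; the optional margin xi is the exploration tweak discussed in Brochu et al.'s tutorial, and the clipping of sigma is a numerical safeguard, not part of the formula.

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, best, xi=0.0):
    """PI(x) = Phi((mu(x) - f(x+) - xi) / sigma(x)); xi > 0 trades pure
    exploitation for extra exploration."""
    return norm.cdf((mu - best - xi) / np.maximum(sigma, 1e-12))
```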

  28. Acquisition Functions
  • Expected improvement: EI(x) = (μ(x) − f(x⁺)) Φ(Z) + σ(x) φ(Z) with Z = (μ(x) − f(x⁺)) / σ(x), and EI(x) = 0 where σ(x) = 0
  • Confidence bound criterion: UCB(x) = μ(x) + κ σ(x)
  Φ and φ are the CDF and PDF of the standard normal distribution
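
Sketches of both criteria under the same conventions; mu and sigma are the GP posterior mean and standard deviation at the candidate points, and kappa is the confidence-bound trade-off parameter.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """EI(x) = (mu - f(x+)) Phi(Z) + sigma phi(Z), Z = (mu - f(x+)) / sigma;
    defined as 0 where sigma = 0."""
    s = np.maximum(sigma, 1e-12)                 # avoid division by zero
    Z = (mu - best) / s
    return np.where(sigma > 0,
                    (mu - best) * norm.cdf(Z) + s * norm.pdf(Z), 0.0)

def confidence_bound(mu, sigma, kappa=2.0):
    """Upper confidence bound criterion: mu(x) + kappa * sigma(x)."""
    return mu + kappa * sigma
```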

  29. Applications [BO]
  • Learn a set of robot gait parameters that maximize the velocity of a Sony AIBO ERS-7 robot
  • Find a policy for robot path planning that would minimize uncertainty about its location and heading
  • Select the locations of a set of sensors (e.g., cameras) in a dynamic system

  30. Take-home Message
  • ML and CogSci are connected
  • Reinforcement Learning is useful for optimization when dealing with temporal information
  • Discrete case: Multi-armed bandit problem
  • Continuous case: Bayesian optimization
  • We can employ these techniques for Computer Vision and System Control problems

  31. [Abbeel et al. 2007] http://heli.stanford.edu/

  32. Some References
  • P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. 1995. Gambling in a rigged casino: The adversarial multi-armed bandit problem. FOCS '95.
  • Yoav Freund and Robert E. Schapire. 1995. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT '95.
  • Eric Brochu, Vlad Cora, and Nando de Freitas. 2009. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. Technical Report TR-2009-023, UBC.
  • Loris Bazzani, Nando de Freitas, and Jo-Anne Ting. 2010. Learning attentional mechanisms for simultaneous object tracking and recognition with deep networks. NIPS 2010 Deep Learning and Unsupervised Feature Learning Workshop.
  • Carl Edward Rasmussen and Christopher K. I. Williams. 2005. Gaussian Processes for Machine Learning. The MIT Press.
  • Pieter Abbeel, Adam Coates, Morgan Quigley, and Andrew Y. Ng. 2007. An Application of Reinforcement Learning to Aerobatic Helicopter Flight. NIPS 2007.
