
Introduction to Reinforcement Learning and Q-Learning


Presentation Transcript


1. Introduction to Reinforcement Learning and Q-Learning
Andrew L. Nelson, Visiting Research Faculty, University of South Florida

2. Overview
• References
• Introduction
• Learning an optimal policy in a known environment
• Learning an approximate optimal policy in an unknown environment
• Example
• Generalization and representation
• Knowledge-based vs. general function approximation methods
(On the original slides, this outline appears at the left in green, with the current topic highlighted in yellow.)

3. References
• C. Watkins and P. Dayan, "Q-Learning," Machine Learning, vol. 8, pp. 279-292, 1992.
• T. M. Mitchell, Machine Learning, WCB/McGraw-Hill, 1997.

4. Introduction
Situated learning agents:
• The goal of a learning agent is to learn to choose actions (a) so that the net reward over a sequence of actions is maximized
• Supervised learning methods make use of knowledge of the world and of known reward functions
• Reinforcement learning methods use rewards to learn an optimal policy in a given (unknown) environment

5. Agent and Environment
• An agent produces an action (a) in a given environment, and in return receives a reward and observes the resulting state (s)

6. Nomenclature
• Action: a ∈ A
• State: s ∈ S
• Reward: r = R(s)
• Policy: π: S → A
• Optimal policy: π*
• World model: s' = T(s, a)
• Utility: U(s)
• Value: Q(s, a)
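The nomenclature maps directly onto simple data types. A rough sketch in Python, with illustrative type choices that are not part of the slides:

```python
from typing import Callable, Dict, Tuple

State = Tuple[int, int]        # s in S, e.g. a cell coordinate in the cell world
Action = str                   # a in A, e.g. "up", "down", "left", "right"

Reward = Callable[[State], float]                # r = R(s)
WorldModel = Callable[[State, Action], State]    # s' = T(s, a)
Policy = Dict[State, Action]                     # pi: S -> A
Utility = Dict[State, float]                     # U(s)
QTable = Dict[Tuple[State, Action], float]       # Q(s, a)
```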

7. Cell World
• Agent
• States
• Transitions
• Reward
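The slides illustrate these elements on a grid of cells. As a concrete stand-in, here is a minimal deterministic cell world; the grid size, terminal cell, and reward values are hypothetical placeholders chosen for illustration, not taken from the slides:

```python
class CellWorld:
    """A small deterministic grid: states are (row, col) cells, four move actions,
    a single terminal cell paying a positive reward, and a small step cost."""

    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, rows=3, cols=4, terminal=(0, 3),
                 step_reward=-0.04, goal_reward=1.0):
        self.rows, self.cols = rows, cols
        self.terminal = terminal
        self.step_reward, self.goal_reward = step_reward, goal_reward
        self.states = [(r, c) for r in range(rows) for c in range(cols)]

    def transition(self, s, a):
        """World model s' = T(s, a): move one cell, staying put at walls."""
        dr, dc = self.ACTIONS[a]
        r, c = s[0] + dr, s[1] + dc
        return (r, c) if 0 <= r < self.rows and 0 <= c < self.cols else s

    def reward(self, s):
        """R(s): positive reward at the terminal cell, a small cost elsewhere."""
        return self.goal_reward if s == self.terminal else self.step_reward
```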

8. Learning π* in Known Environments
The supervised method:
• Find the maximum possible utility U* for each state (iterative search)
• Learn the optimal policy π*: S → A by learning, for each state s, the action that leads to the successor state s' of maximum possible utility U*
Requirements:
• Known world model, T(s, a)
• Known reward function, R(s)

9. Known Rewards and Transitions
• R(s) and s' = T(s, a) are known for all s ∈ S and a ∈ A

10. Calculate U* for Each State (using an iterative search algorithm, for example)
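One such iterative search is value iteration. A sketch over the CellWorld model above, assuming a discount factor gamma; the slides do not spell out the algorithm or the discount, so the details here are illustrative:

```python
def value_iteration(env, gamma=0.9, tol=1e-6):
    """Iteratively compute U*(s) = R(s) + gamma * max_a U*(T(s, a))."""
    U = {s: 0.0 for s in env.states}
    while True:
        delta = 0.0
        for s in env.states:
            if s == env.terminal:
                new_u = env.reward(s)          # no further actions from the terminal cell
            else:
                new_u = env.reward(s) + gamma * max(
                    U[env.transition(s, a)] for a in env.ACTIONS)
            delta = max(delta, abs(new_u - U[s]))
            U[s] = new_u
        if delta < tol:                        # stop once the utilities have converged
            return U
```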

11. Calculate π* Using the Known U* Values
• π* follows from U*(s) for all s: in each state, choose the action whose successor state has the highest U*
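Reading π* off the utilities is then a short step: in each state, pick the action whose successor has the highest U*. A sketch continuing the example above:

```python
def extract_policy(env, U):
    """pi*(s) = argmax_a U*(T(s, a)) for every non-terminal state."""
    return {s: max(env.ACTIONS, key=lambda a: U[env.transition(s, a)])
            for s in env.states if s != env.terminal}

# Usage (all names from the sketches above):
# env = CellWorld()
# U = value_iteration(env)
# pi = extract_policy(env, U)
```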

12. Notes
• Supervised learning methods work well when a complete model of the environment and the reward function are known
• Since R(s) and T(s, a) are known, we can reduce learning to a standard iterative learning process

13. Unknown Environments
• What if the environment is unknown?

14. Overview (outline slide)

15. The Q-Function
• Instead of learning utilities, action-state values Q(s, a) will be learned
• U(s) = max_a Q(s, a)
• Local action and exploration can be used to discover and learn Q(s, a) values in an unknown environment
• We will use the following update: Q(s, a) ← r + max_a' Q(s', a')
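Written as code, the two relations on this slide are a dictionary update and a max. The sketch below adds a discount factor gamma, which the standard form of the update includes; the slide's version corresponds to gamma = 1:

```python
def q_update(Q, s, a, r, s_next, actions, gamma=0.9):
    """One Q-learning backup: Q(s, a) <- r + gamma * max_a' Q(s', a')."""
    Q[(s, a)] = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)

def utility_from_q(Q, s, actions):
    """U(s) = max_a Q(s, a)."""
    return max(Q.get((s, a), 0.0) for a in actions)
```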

16. The Q-Learning Algorithm
Build up a table of Q(s, a) values as follows. Do forever, from the current state s:
• Set each uninitialized state-action value Q(s, a) to 0 and add it to the table of Q values
• With probability p, select the action a with the maximum Q value (otherwise select a at random)
• Execute a and receive the immediate reward r
• Update the table entry for Q(s, a) as Q(s, a) ← r + max_a' Q(s', a')
• s ← s'
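Putting the steps together, here is a tabular sketch that reuses the CellWorld model from earlier purely as a source of sampled transitions; the learner never inspects T or R directly. The start cell, episode count, discount gamma, and epsilon-greedy exploration (standing in for the probability p above) are illustrative assumptions:

```python
import random

def q_learning(env, episodes=500, gamma=0.9, epsilon=0.2, start=(2, 0)):
    """Tabular Q-learning: build up Q(s, a) from sampled transitions only."""
    Q = {}
    actions = list(env.ACTIONS)
    for _ in range(episodes):
        s = start
        while s != env.terminal:                      # terminal state: start over
            for act in actions:                       # initialize unseen entries to 0
                Q.setdefault((s, act), 0.0)
            if random.random() < epsilon:             # explore at random ...
                a = random.choice(actions)
            else:                                     # ... or act greedily (probability p on the slide)
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next = env.transition(s, a)             # execute a
            r = env.reward(s_next)                    # receive immediate reward r
            # Q(s, a) <- r + gamma * max_a' Q(s', a')
            Q[(s, a)] = r + gamma * max(Q.get((s_next, act), 0.0) for act in actions)
            s = s_next                                # s <- s'
    return Q

# Usage (names from the CellWorld sketch): Q = q_learning(CellWorld())
# The greedy policy is then pi(s) = argmax_a Q[(s, a)].
```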

17. Q-Learning Example
• Initialize the table and the first position

18. Q-Learning Example
• Move to s'... iterate

19. Q-Learning Example
• Continue

20. Q-Learning Example
• Terminal state; start over

21. Q-Learning Example
• Starting a new iteration

22. Q-Learning Example
• After a few more iterations...
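To make "a few more iterations" concrete with hypothetical numbers (gamma = 0.9, a single reward of 1 on reaching the terminal cell, 0 elsewhere): the first time the agent happens to step into the terminal cell from some state s1, the update sets Q(s1, a) to 1. All other entries stay at 0 until, on a later episode, the agent steps from some s2 into s1; that backup yields 0 + 0.9 · 1 = 0.9. The next pass propagates 0.81 one state further back, and so on, so the Q values spread outward from the reward roughly one state per visit, which is the pattern the example slides walk through.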

23. Representation and Generalization
• Policies learned using state-transition (table) representations do not generalize to unvisited states
• Functional representations allow generalization to states not explored, e.g. f(s) = p1·s + p2·s² + p3·s³ + ...
• A functional representation, however, defines a restricted search space that may not contain the target policy
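One common way to obtain such a functional representation is to approximate Q with a parameterized function and nudge the parameters toward the same one-step target used by tabular Q-learning. A minimal linear sketch, where the feature map phi and the learning rate alpha are illustrative assumptions rather than anything specified on the slide:

```python
import numpy as np

def td_update(w, phi, s, a, r, s_next, actions, gamma=0.9, alpha=0.1):
    """One gradient step: move w so that Q_w(s, a) = w . phi(s, a)
    tracks the one-step target r + gamma * max_a' Q_w(s', a')."""
    def q(state, action):
        return float(w @ phi(state, action))

    target = r + gamma * max(q(s_next, b) for b in actions)
    return w + alpha * (target - q(s, a)) * phi(s, a)
```

Because states that were never visited still share features with states that were, they inherit reasonable Q values, which is the generalization the slide refers to; the flip side, also noted on the slide, is that a poorly chosen phi may be unable to represent the optimal policy at all.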

24. Summary
• Reinforcement learning (RL) is useful for learning policies in uncharacterized environments
• RL uses the rewards from actions taken during exploration
• In its basic tabular form, RL is practical on small state-transition spaces
• Functional representations increase the power of RL in terms of both generalization and representation
