This outline covers the basics of Reinforcement Learning (RL), including Model-Based and Model-Free approaches like Q-Learning and SARSA, along with challenges and examples like Pac-Man and Spider MDPs. Dive into important terms, basic equations, transfer learning, and agent models in RL. Practical applications on physical robots are discussed, along with advanced combinations and simulator results. Find resources and references for further exploration into RL and AI in this comprehensive guide.
Reinforcement Learning • Presented by: Kyle Feuz
Outline • Motivation • MDPs • RL • Model-Based • Model-Free • Q-Learning • SARSA • Challenges
Examples • Pac-Man • Spider
MDPs • 4-tuple (States, Actions, Transitions, Rewards)
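To make the 4-tuple concrete, here is a minimal sketch of an MDP as a Python data structure. The class name `MDP`, the dictionary layout, the helper `is_valid`, and the two-state toy problem are all illustrative assumptions, not something taken from the slides.

```python
from typing import NamedTuple


class MDP(NamedTuple):
    """Hypothetical container for the 4-tuple (States, Actions, Transitions, Rewards)."""
    states: list
    actions: list
    transitions: dict  # (s, a) -> {s_next: P(s_next | s, a)}
    rewards: dict      # (s, a, s_next) -> immediate reward; missing entries mean 0


# Made-up two-state chain: "go" switches state with probability 0.9.
toy = MDP(
    states=["A", "B"],
    actions=["stay", "go"],
    transitions={
        ("A", "stay"): {"A": 1.0},
        ("A", "go"): {"B": 0.9, "A": 0.1},
        ("B", "stay"): {"B": 1.0},
        ("B", "go"): {"A": 0.9, "B": 0.1},
    },
    rewards={("A", "go", "B"): 1.0},
)


def is_valid(mdp, tol=1e-9):
    """Check that every (state, action) pair maps to a probability distribution."""
    return all(
        abs(sum(mdp.transitions[(s, a)].values()) - 1.0) < tol
        for s in mdp.states
        for a in mdp.actions
    )
```

Calling `is_valid(toy)` is a quick sanity check that each transition row sums to 1.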
Important Terms • Policy • Reward Function • Value Function • Model
Model-Based RL • Learn transition function • Learn expected rewards • Compute the optimal policy
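The three steps above can be sketched roughly as follows: estimate the transition function and expected rewards by counting observed samples, then plan on the learned model. Value iteration is used here as one common planner; the slides do not say which planner is intended, and the function names, the `(s, a, r, s_next)` sample format, and the defaults for `gamma` and `sweeps` are assumptions.

```python
from collections import defaultdict


def learn_model(experience):
    """Estimate T(s_next | s, a) and expected R(s, a) from (s, a, r, s_next) samples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s_next in experience:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
    T, R = {}, {}
    for sa, nexts in counts.items():
        n = sum(nexts.values())
        T[sa] = {s_next: c / n for s_next, c in nexts.items()}
        R[sa] = reward_sum[sa] / n
    return T, R


def value_iteration(states, actions, T, R, gamma=0.9, sweeps=100):
    """Plan on the learned model; (s, a) pairs never observed are skipped."""
    V = {s: 0.0 for s in states}

    def q(s, a):
        return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in T[(s, a)].items())

    def known_actions(s):
        return [a for a in actions if (s, a) in T]

    for _ in range(sweeps):
        for s in states:
            known = known_actions(s)
            if known:
                V[s] = max(q(s, a) for a in known)

    policy = {
        s: max(known_actions(s), key=lambda a: q(s, a))
        for s in states
        if known_actions(s)
    }
    return V, policy


# Made-up deterministic experience: only A --go--> B pays a reward.
experience = [
    ("A", "go", 1.0, "B"),
    ("A", "stay", 0.0, "A"),
    ("B", "stay", 0.0, "B"),
    ("B", "go", 0.0, "A"),
]
T, R = learn_model(experience)
V, policy = value_iteration(["A", "B"], ["stay", "go"], T, R)
```

In this toy problem the planner prefers "go" in both states, because returning to A is the only way to collect the reward again.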
Model-Free RL • Learn expected rewards/values • Skip learning the transition function • Trade-offs?
Examples • Pac-Man • Spider • Mario
Q-Learning • Q(s, a) ← (1 − α)Q(s, a) + α[R(s, s′) + max_{a′} Q(s′, a′)]
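A minimal tabular sketch of this update, assuming Q is stored in a dictionary keyed by (state, action). The function name, the observed reward `r` standing in for R(s, s′), and the `gamma` parameter are assumptions; the formula on the slide shows no discount factor, so `gamma` defaults to 1.0 to reproduce it as written.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=1.0):
    """Apply one tabular Q-learning backup in place and return the new Q(s, a)."""
    # Off-policy target: the best action value available in the next state.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]


Q = defaultdict(float)          # unseen pairs start at 0
Q[("s1", "right")] = 2.0        # made-up prior estimate
new_value = q_learning_update(
    Q, "s0", "right", r=1.0, s_next="s1", actions=["left", "right"], alpha=0.5
)
# new_value == 0.5 * 0.0 + 0.5 * (1.0 + 2.0) == 1.5
```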
Q-Learning • Demo Video
SARSA (on-policy counterpart of Q-Learning) • Q(s, a) ← (1 − α)Q(s, a) + α[R(s, s′) + Q(s′, a′)], where a′ is the action the agent actually takes in s′
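The SARSA backup differs from Q-learning only in its target: it uses the value of the action the agent actually takes next rather than the maximum over actions. A minimal sketch under the same assumptions as a dictionary-based Q-table (the function name and the `gamma` parameter are illustrative; `gamma` defaults to 1.0 because the slide's formula has no discount factor):

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0):
    """Apply one tabular SARSA backup in place and return the new Q(s, a)."""
    # On-policy target: the value of the action actually chosen in s_next.
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
    return Q[(s, a)]


Q = defaultdict(float)
Q[("s1", "right")] = 2.0  # made-up estimate; Q[("s1", "left")] stays 0
# The agent explored and took "left" in s1, so the target ignores the better "right".
sarsa_value = sarsa_update(Q, "s0", "right", r=1.0, s_next="s1", a_next="left", alpha=0.5)
# sarsa_value == 0.5 * 0.0 + 0.5 * (1.0 + 0.0) == 0.5
```

With the same numbers, a Q-learning backup would bootstrap from the value 2.0 of "right" instead, which is the practical difference between the two rules.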
Challenges • Explore vs. Exploit • State Space representation • Training Time • Multiagent Learning • Moving Target • Competitive or Cooperative
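One common way to handle the explore vs. exploit trade-off is ε-greedy action selection; the slides do not name a specific strategy, so this is only an illustrative sketch. The function name, the dictionary Q-table, and the `rng` parameter are assumptions.

```python
import random


def epsilon_greedy(Q, state, actions, epsilon=0.1, rng=random):
    """With probability epsilon explore at random; otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Unseen (state, action) pairs are treated as having value 0.
    return max(actions, key=lambda a: Q.get((state, a), 0.0))


Q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}  # made-up values
greedy_choice = epsilon_greedy(Q, "s0", ["left", "right"], epsilon=0.0)
random_choice = epsilon_greedy(Q, "s0", ["left", "right"], epsilon=1.0)
```

Setting `epsilon=0.0` always exploits, while `epsilon=1.0` always explores; in practice ε is often decayed over training.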
Transfer Learning for Reinforcement Learning on a Physical Robot • Applied TL and RL on a Nao robot • TL uses the Q-value reuse approach • RL uses a SARSA variant • State space is represented via a CMAC (Cerebellar Model Articulation Controller) • Neural network inspired by the cerebellum • Acts as an associative memory • Allows agents to generalize over the state space
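To illustrate how a CMAC lets an agent generalize, here is a rough one-dimensional tile-coding sketch: several offset tilings cover the input range, and a value is the sum of the weights of the tiles that contain the input. The class name, the tiling sizes, and the learning rule are assumptions for illustration and are not the configuration used on the robot.

```python
class TileCoder1D:
    """Hypothetical 1-D CMAC: several offset tilings over [low, high)."""

    def __init__(self, low, high, tiles_per_tiling=8, num_tilings=4):
        self.low = low
        self.width = (high - low) / tiles_per_tiling
        self.tiles = tiles_per_tiling + 1  # one extra tile absorbs the offset
        self.num_tilings = num_tilings
        self.weights = [0.0] * (self.tiles * num_tilings)

    def active_tiles(self, x):
        """Return one weight index per tiling: the tile that contains x."""
        indices = []
        for t in range(self.num_tilings):
            offset = t * self.width / self.num_tilings
            tile = int((x - self.low + offset) / self.width)
            tile = min(max(tile, 0), self.tiles - 1)
            indices.append(t * self.tiles + tile)
        return indices

    def value(self, x):
        return sum(self.weights[i] for i in self.active_tiles(x))

    def update(self, x, target, alpha=0.1):
        """Move the estimate at x toward target, spreading the error over active tiles."""
        error = target - self.value(x)
        for i in self.active_tiles(x):
            self.weights[i] += alpha * error / self.num_tilings


coder = TileCoder1D(0.0, 1.0)
coder.update(0.50, target=1.0, alpha=1.0)
nearby = coder.value(0.56)   # shares some tiles with 0.50, so it is partly updated
far_away = coder.value(0.95) # shares no tiles with 0.50, so it is unchanged
```

A single update at 0.50 also raises the estimate at nearby inputs that share tiles, which is the generalization effect the slide refers to.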
SARSA Update Rule • Q(s, a) ← (1 − α)Q(s, a) + α[R(s, s′) + γ e(s, a) Q(s′, a′)]
Q-Value Reuse • Q(s, a) = Q_source(χ_X(s), χ_A(a)) + Q_target(s, a)
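A minimal sketch of Q-value reuse with dictionary Q-tables: the combined value is the frozen source-task estimate, looked up through the inter-task mappings χ_X and χ_A, plus a target-task correction that is learned afterwards. The function name, the dictionary representation, and the example mappings are hypothetical.

```python
def make_reused_q(q_source, q_target, map_state, map_action):
    """Return Q(s, a) = Q_source(chi_X(s), chi_A(a)) + Q_target(s, a)."""
    def q(s, a):
        source_part = q_source.get((map_state(s), map_action(a)), 0.0)
        target_part = q_target.get((s, a), 0.0)  # learned correction, starts at 0
        return source_part + target_part
    return q


# Made-up example: the target task has a finer-grained action than the source task.
q_source = {("near", "kick"): 3.0}
q_target = {("near_left", "kick_soft"): -0.5}
chi_X = lambda s: s.split("_")[0]   # target state -> source state
chi_A = lambda a: a.split("_")[0]   # target action -> source action

q = make_reused_q(q_source, q_target, chi_X, chi_A)
combined = q("near_left", "kick_soft")   # 3.0 + (-0.5) == 2.5
untrained = q("near_right", "kick_hard") # 3.0 + 0.0 == 3.0
```

Before any target-task learning, every target state-action pair simply inherits the value of its mapped source pair.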
Experimental Setup • Seated Nao robot • Hit the ball at a 45° angle • 5 actions in the source task, 9 actions in the target task
Examples • Pac-Man • Spider • Mario • Q-Learning • Penalty Kick • Others
References and Resources • rl repository • rl-community • rl on PBWorks • rl warehouse • Reinforcement Learning: An Introduction • Artificial Intelligence: A Modern Approach • How to Make Software Agents do the Right Thing