
Neural Networks Chapter 7


Presentation Transcript


  1. Neural Networks Chapter 7 Joost N. Kok Universiteit Leiden

  2. Recurrent Networks • Learning Time Sequences: • Sequence Recognition • Sequence Reproduction • Temporal Association

  3. Recurrent Networks • Tapped Delay Lines: • Keep several old values in a buffer
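
A minimal sketch of a tapped delay line, assuming a scalar input series; the buffer length and the sample sequence below are illustrative, not from the slides:

```python
# Tapped delay line: keep the k most recent input values in a buffer and
# feed them as one input vector to an ordinary feedforward network.
from collections import deque

import numpy as np

k = 5                                   # number of delayed values kept (illustrative)
buffer = deque([0.0] * k, maxlen=k)     # holds the k most recent inputs

def network_input(x_t):
    """Push the newest sample and return the k-dimensional input vector."""
    buffer.append(x_t)
    return np.array(buffer)             # oldest ... newest

for x_t in np.sin(0.1 * np.arange(20)): # an illustrative scalar time series
    v = network_input(x_t)              # v would be fed to a feedforward net
```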

  4. Recurrent Networks • Drawbacks: • The buffer length must be chosen in advance, which leads to a large number of input units, a large number of training patterns, etc. • Replace the fixed time delays by filters

  5. Recurrent Networks • Partially recurrent networks • (diagram: input nodes, hidden nodes, output nodes, and context nodes)

  6. Recurrent Networks • Jordan Network
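
A minimal sketch of one Jordan-network step; the layer sizes, weights, and decay factor are illustrative assumptions. The context units hold a decayed copy of the previous output and feed back into the hidden layer:

```python
# Jordan network step: context units copy the (decayed) previous output.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 2                   # illustrative sizes
alpha = 0.5                                    # illustrative context decay

W_xh = rng.normal(size=(n_hid, n_in))          # input  -> hidden
W_ch = rng.normal(size=(n_hid, n_out))         # context -> hidden
W_hy = rng.normal(size=(n_out, n_hid))         # hidden -> output

def step(x, context):
    h = np.tanh(W_xh @ x + W_ch @ context)
    y = np.tanh(W_hy @ h)
    context = alpha * context + y              # context holds a copy of the output
    return y, context

context = np.zeros(n_out)
for x in rng.normal(size=(4, n_in)):           # a short input sequence
    y, context = step(x, context)
```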

  7. Recurrent Networks • Elman Network • (diagram: input nodes, hidden nodes, output nodes, and context nodes)
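
A minimal sketch of one Elman-network step under the same illustrative assumptions; the difference from the Jordan network is that the context units copy the previous hidden state rather than the output:

```python
# Elman network step: context units copy the previous hidden state.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 3, 5, 2                   # illustrative sizes

W_xh = rng.normal(size=(n_hid, n_in))          # input  -> hidden
W_ch = rng.normal(size=(n_hid, n_hid))         # context -> hidden
W_hy = rng.normal(size=(n_out, n_hid))         # hidden -> output

def step(x, context):
    h = np.tanh(W_xh @ x + W_ch @ context)
    y = np.tanh(W_hy @ h)
    return y, h                                # new context = hidden state

context = np.zeros(n_hid)
for x in rng.normal(size=(4, n_in)):           # a short input sequence
    y, context = step(x, context)
```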

  8. Recurrent Networks • Expanded Hierarchical Elman Network • (diagram: input layer, stacked hidden layers each with its own context layer, output units)

  9. Recurrent Networks

  10. Recurrent Networks • Back-Propagation Through Time
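
The standard statement of the idea (not reproduced verbatim from the slide): unroll the recurrent network over the sequence into T feedforward copies that share the same weights, back-propagate through the unrolled network, and sum the gradient contributions of all copies of each shared weight:

```latex
% Back-propagation through time (standard formulation): w^{(t)} denotes the
% copy of the shared weight w in the network unrolled at time step t.
\frac{\partial E}{\partial w} \;=\; \sum_{t=1}^{T} \frac{\partial E}{\partial w^{(t)}}
```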

  11. Reinforcement Learning • Supervised learning with some feedback • Reinforcement Learning Problems: • Class I: reinforcement signal is always the same for given input-output pair • Class II: stochastic environment, fixed probability for each input-output pair • Class III: reinforcement and input patterns depend on past history of network output

  12. Associative Reward-Penalty • Stochastic Output Units • Reinforcement Signal • Target • Error

  13. Associative Reward-Penalty • Learning Rule
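
The slide's own formula is an image and is not in the transcript; a common textbook form of the associative reward-penalty rule, matching the bullets of slide 12 (stochastic ±1 output units S_i with P(S_i = +1) = f(h_i), reinforcement signal r = ±1, target r·S_i, error r·S_i − ⟨S_i⟩, inputs V_j), is:

```latex
% Associative reward-penalty rule, common textbook form (an assumption,
% not copied from the slide): reward and penalty use different learning rates.
\Delta w_{ij} =
\begin{cases}
\eta^{+}\,\bigl(S_i - \langle S_i\rangle\bigr)\,V_j, & r = +1 \ \text{(reward)}\\[4pt]
\eta^{-}\,\bigl(-S_i - \langle S_i\rangle\bigr)\,V_j, & r = -1 \ \text{(penalty)},\qquad \eta^{-}\ll\eta^{+}
\end{cases}
```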

  14. Models and Critics • (diagram: interaction with the environment)

  15. Reinforcement Comparison • (diagram: critic and environment)

  16. Reinforcement Learning • Reinforcement-Learning Model • The agent receives an input I, which is some indication of the current state s of the environment • The agent then chooses an action a • The action changes the state of the environment, and the value of this transition is communicated to the agent through a scalar reinforcement signal r

  17. Reinforcement Learning • Environment: You are in state 65. You have four possible actions. • Agent: I’ll take action 2. • Environment: You received a reinforcement of 7 units. You are now in state 15. You have two possible actions. • Agent: I’ll take action 1. • Environment: You received a reinforcement of -4 units. You are now in state 12. You have two possible actions. • Agent: I’ll take action 2. • …
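
The same interaction written as a loop; the toy environment and the random policy below are illustrative assumptions, not part of the slides:

```python
# Agent-environment loop corresponding to the dialogue above.
# ToyEnv and random_policy are illustrative stand-ins.
import random

class ToyEnv:
    """Toy environment with numbered states, random transitions and rewards."""
    def reset(self):
        self.state = 65
        return self.state, [1, 2, 3, 4]          # initial state, legal actions

    def step(self, action):
        reward = random.randint(-5, 8)           # scalar reinforcement signal
        self.state = random.randint(1, 100)      # next state
        return reward, self.state, [1, 2]        # reward, state, legal actions

def random_policy(state, actions):
    return random.choice(actions)

env = ToyEnv()
state, actions = env.reset()
for _ in range(3):
    action = random_policy(state, actions)
    reward, state, actions = env.step(action)
    print(f"took action {action}, got reinforcement {reward}, now in state {state}")
```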

  18. Reinforcement Learning • Environment is non-deterministic: • same action in same state may result in different states and different reinforcements • The environment is stationary: • Probabilities of making state transitions or receiving specific reinforcement signals do not change over time

  19. Reinforcement Learning • Two types of learning: • Model-free learning • Model-based learning • Typical application areas: • Robots • Mazes • Games • …

  20. Reinforcement Learning • Paper: A short introduction to Reinforcement Learning (Stephan ten Hagen and Ben Krose)

  21. Reinforcement Learning • The environment is a Markov Decision Process

  22. Reinforcement Learning • Optimize interaction with environment • Optimize action selection mechanism • Temporal Credit Assignment Problem • Policy: action selection mechanism • Value function:
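
For reference, the standard definition of the value function of a policy π (the slide's own formula is not in the transcript):

```latex
% Discounted value of state s under policy \pi (standard definition).
V^{\pi}(s) \;=\; E_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s\right],
\qquad 0 \le \gamma < 1
```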

  23. Reinforcement Learning • Optimal Value function based on optimal policy:
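
The standard Bellman optimality form (the slide's formula is an image):

```latex
% Optimal value function: value of s under the optimal policy.
V^{*}(s) \;=\; \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s,a,s') + \gamma\, V^{*}(s')\bigr]
```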

  24. Reinforcement Learning • Policy Evaluation: approximate value function for given policy • Policy Iteration: start with arbitrary policy and improve
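
A standard iterative policy-evaluation update for a fixed policy π (not reproduced from the slide itself):

```latex
% Policy evaluation: repeatedly apply the Bellman expectation update
% until V_k converges to V^{\pi}.
V_{k+1}(s) \;=\; \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[R(s,a,s') + \gamma\, V_{k}(s')\bigr]
```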

  25. Reinforcement Learning • Improve Policy:
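
The standard greedy improvement step with respect to V^π (the slide's formula is an image):

```latex
% Policy improvement: act greedily with respect to the current value estimate.
\pi'(s) \;=\; \arg\max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s,a,s') + \gamma\, V^{\pi}(s')\bigr]
```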

  26. Reinforcement Learning • Value Iteration: combine policy evaluation and policy improvement steps:
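
The standard value-iteration update, which folds the max of the improvement step directly into the evaluation update:

```latex
% Value iteration: one combined evaluation/improvement sweep per iteration.
V_{k+1}(s) \;=\; \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s,a,s') + \gamma\, V_{k}(s')\bigr]
```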

  27. Reinforcement Learning • Monte Carlo: use if the state transition probabilities and the reinforcements are not known • Given a policy, several complete iterations are performed • Exploration/Exploitation Dilemma • Extract Information • Optimize Interaction
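
A minimal sketch of Monte Carlo evaluation by averaging returns; the episode format (lists of (state, reward) pairs generated by following the given policy) is an illustrative assumption:

```python
# Monte Carlo policy evaluation: average the observed discounted returns
# from each visited state over complete episodes.
from collections import defaultdict

def mc_evaluate(episodes, gamma=0.9):
    returns = defaultdict(list)
    for episode in episodes:
        G = 0.0
        # walk the episode backwards, accumulating the discounted return
        for state, reward in reversed(episode):
            G = reward + gamma * G
            returns[state].append(G)
    # V(s) is the average of the returns observed from s
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# one illustrative episode: state 0 gives reward 1, state 1 gives 0, state 2 gives 5
V = mc_evaluate([[(0, 1.0), (1, 0.0), (2, 5.0)]])
```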

  28. Reinforcement Learning • Temporal Difference (TD) Learning • During interaction, part of the update can be calculated • Information from previous interactions is used
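
The standard TD(0) update, where part of the target is available right after one interaction step:

```latex
% TD(0): update V(s_t) towards the bootstrapped target r_{t+1} + \gamma V(s_{t+1}).
V(s_t) \;\leftarrow\; V(s_t) + \alpha\,\bigl[r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t)\bigr]
```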

  29. Reinforcement Learning • TD(λ) learning: decay factor λ: the longer ago a state was visited, the less it will be affected by the present update
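
A standard eligibility-trace formulation of TD(λ) (not reproduced from the slide): every visited state keeps a trace that decays by γλ per step, so states visited longer ago receive a smaller share of the current TD error:

```latex
% TD(\lambda) with eligibility traces e(s); \delta_t is the one-step TD error.
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t), \qquad
e_t(s) = \gamma\lambda\, e_{t-1}(s) + \mathbf{1}[s = s_t], \qquad
V(s) \leftarrow V(s) + \alpha\, \delta_t\, e_t(s)
```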

  30. Reinforcement Learning • Q-learning: combine actor and critic:
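
The standard Q-learning update (the slide's formula is an image); the Q-function serves both as critic (value estimate) and as actor (greedy action selection):

```latex
% Q-learning: one-step temporal-difference update of the action values.
Q(s_t, a_t) \;\leftarrow\; Q(s_t, a_t) + \alpha\,\bigl[r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\bigr]
```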

  31. Reinforcement Learning • Use temporal difference learning

  32. Reinforcement Learning • Q(λ) learning:

  33. Reinforcement Learning • Feedforward neural networks are used to estimate V(s) and Q(s,a) when the state/action spaces are large.
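
A minimal sketch of a feedforward network used as a Q-function approximator, with one semi-gradient Q-learning step; the network size, learning rates, and the sample transition below are illustrative assumptions:

```python
# Semi-gradient Q-learning with a small feedforward network Q(s, .).
import numpy as np

rng = np.random.default_rng(2)
n_state, n_hidden, n_actions = 4, 8, 2         # illustrative sizes
gamma, lr = 0.9, 0.01                          # illustrative discount and step size

W1 = rng.normal(scale=0.1, size=(n_hidden, n_state))
W2 = rng.normal(scale=0.1, size=(n_actions, n_hidden))

def q_values(s):
    h = np.tanh(W1 @ s)
    return W2 @ h, h                           # Q(s, .) for all actions, hidden activations

def q_update(s, a, r, s_next):
    """One semi-gradient Q-learning step on the squared TD error."""
    global W1, W2
    q, h = q_values(s)
    q_next, _ = q_values(s_next)
    target = r + gamma * np.max(q_next)        # bootstrap from the next state
    delta = target - q[a]                      # TD error for the taken action
    grad_out = np.zeros(n_actions)
    grad_out[a] = delta                        # error flows only through output unit a
    W2 += lr * np.outer(grad_out, h)
    grad_h = (W2.T @ grad_out) * (1 - h ** 2)  # back-propagate through tanh
    W1 += lr * np.outer(grad_h, s)

s = rng.normal(size=n_state)                   # illustrative transition
s_next = rng.normal(size=n_state)
q_update(s, a=1, r=0.5, s_next=s_next)
```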
