
Neural Networks Chapter 7


Presentation Transcript


  1. Neural Networks Chapter 7 Joost N. Kok Universiteit Leiden

  2. Recurrent Networks • Learning Time Sequences: • Sequence Recognition • Sequence Reproduction • Temporal Association

  3. Recurrent Networks • Tapped Delay Lines: • Keep several old values in a buffer
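
A minimal sketch of a tapped delay line, assuming a scalar input series; the buffer length and the sample sequence below are illustrative, not from the slides:

```python
# Tapped delay line: keep the k most recent input values in a buffer and
# feed them as one input vector to an ordinary feedforward network.
from collections import deque

import numpy as np

k = 5                                   # number of delayed values kept (illustrative)
buffer = deque([0.0] * k, maxlen=k)     # holds the k most recent inputs

def network_input(x_t):
    """Push the newest sample and return the k-dimensional input vector."""
    buffer.append(x_t)
    return np.array(buffer)             # oldest ... newest

for x_t in np.sin(0.1 * np.arange(20)): # an illustrative scalar time series
    v = network_input(x_t)              # v would be fed to a feedforward net
```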

  4. Recurrent Networks • Drawbacks: • The buffer length must be chosen in advance, which leads to a large number of input units, a large number of training patterns, etc. • Replace the fixed time delays by filters

  5. Recurrent Networks • Partially recurrent networks • (diagram: input nodes, hidden nodes, output nodes, and context nodes)

  6. Recurrent Networks • Jordan Network
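
A minimal sketch of one Jordan-network step; the layer sizes, weights, and decay factor are illustrative assumptions. The context units hold a decayed copy of the previous output and feed back into the hidden layer:

```python
# Jordan network step: context units copy the (decayed) previous output.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 2                   # illustrative sizes
alpha = 0.5                                    # illustrative context decay

W_xh = rng.normal(size=(n_hid, n_in))          # input  -> hidden
W_ch = rng.normal(size=(n_hid, n_out))         # context -> hidden
W_hy = rng.normal(size=(n_out, n_hid))         # hidden -> output

def step(x, context):
    h = np.tanh(W_xh @ x + W_ch @ context)
    y = np.tanh(W_hy @ h)
    context = alpha * context + y              # context holds a copy of the output
    return y, context

context = np.zeros(n_out)
for x in rng.normal(size=(4, n_in)):           # a short input sequence
    y, context = step(x, context)
```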

  7. Recurrent Networks • Elman Network • (diagram: input nodes, hidden nodes, output nodes, and context nodes)
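
A minimal sketch of one Elman-network step under the same illustrative assumptions; the difference from the Jordan network is that the context units copy the previous hidden state rather than the output:

```python
# Elman network step: context units copy the previous hidden state.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 3, 5, 2                   # illustrative sizes

W_xh = rng.normal(size=(n_hid, n_in))          # input  -> hidden
W_ch = rng.normal(size=(n_hid, n_hid))         # context -> hidden
W_hy = rng.normal(size=(n_out, n_hid))         # hidden -> output

def step(x, context):
    h = np.tanh(W_xh @ x + W_ch @ context)
    y = np.tanh(W_hy @ h)
    return y, h                                # new context = hidden state

context = np.zeros(n_hid)
for x in rng.normal(size=(4, n_in)):           # a short input sequence
    y, context = step(x, context)
```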

  8. Recurrent Networks • Expanded Hierarchical Elman Network • (diagram: input layer, stacked hidden layers each with its own context layer, output units)

  9. Recurrent Networks

  10. Recurrent Networks • Back-Propagation Through Time
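
The standard statement of the idea (not reproduced verbatim from the slide): unroll the recurrent network over the sequence into T feedforward copies that share the same weights, back-propagate through the unrolled network, and sum the gradient contributions of all copies of each shared weight:

```latex
% Back-propagation through time (standard formulation): w^{(t)} denotes the
% copy of the shared weight w in the network unrolled at time step t.
\frac{\partial E}{\partial w} \;=\; \sum_{t=1}^{T} \frac{\partial E}{\partial w^{(t)}}
```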

  11. Reinforcement Learning • Supervised learning with some feedback • Reinforcement Learning Problems: • Class I: reinforcement signal is always the same for given input-output pair • Class II: stochastic environment, fixed probability for each input-output pair • Class III: reinforcement and input patterns depend on past history of network output

  12. Associative Reward-Penalty • Stochastic Output Units • Reinforcement Signal • Target • Error

  13. Associative Reward-Penalty • Learning Rule
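
The slide's own formula is an image and is not in the transcript; a common textbook form of the associative reward-penalty rule, matching the bullets of slide 12 (stochastic ±1 output units S_i with P(S_i = +1) = f(h_i), reinforcement signal r = ±1, target r·S_i, error r·S_i − ⟨S_i⟩, inputs V_j), is:

```latex
% Associative reward-penalty rule, common textbook form (an assumption,
% not copied from the slide): reward and penalty use different learning rates.
\Delta w_{ij} =
\begin{cases}
\eta^{+}\,\bigl(S_i - \langle S_i\rangle\bigr)\,V_j, & r = +1 \ \text{(reward)}\\[4pt]
\eta^{-}\,\bigl(-S_i - \langle S_i\rangle\bigr)\,V_j, & r = -1 \ \text{(penalty)},\qquad \eta^{-}\ll\eta^{+}
\end{cases}
```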

  14. Models and Critics • (diagram: interaction with the environment)

  15. Reinforcement Comparison • (diagram: critic and environment)

  16. Reinforcement Learning • Reinforcement-Learning Model • The agent receives an input I, which is some indication of the current state s of the environment • The agent then chooses an action a • The action changes the state of the environment, and the value of this transition is communicated to the agent through a scalar reinforcement signal r

  17. Reinforcement Learning • Environment: You are in state 65. You have four possible actions. • Agent: I’ll take action 2. • Environment: You received a reinforcement of 7 units. You are now in state 15. You have two possible actions. • Agent: I’ll take action 1. • Environment: You received a reinforcement of -4 units. You are now in state 12. You have two possible actions. • Agent: I’ll take action 2. • …
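
The same interaction written as a loop; the toy environment and the random policy below are illustrative assumptions, not part of the slides:

```python
# Agent-environment loop corresponding to the dialogue above.
# ToyEnv and random_policy are illustrative stand-ins.
import random

class ToyEnv:
    """Toy environment with numbered states, random transitions and rewards."""
    def reset(self):
        self.state = 65
        return self.state, [1, 2, 3, 4]          # initial state, legal actions

    def step(self, action):
        reward = random.randint(-5, 8)           # scalar reinforcement signal
        self.state = random.randint(1, 100)      # next state
        return reward, self.state, [1, 2]        # reward, state, legal actions

def random_policy(state, actions):
    return random.choice(actions)

env = ToyEnv()
state, actions = env.reset()
for _ in range(3):
    action = random_policy(state, actions)
    reward, state, actions = env.step(action)
    print(f"took action {action}, got reinforcement {reward}, now in state {state}")
```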

  18. Reinforcement Learning • Environment is non-deterministic: • same action in same state may result in different states and different reinforcements • The environment is stationary: • Probabilities of making state transitions or receiving specific reinforcement signals do not change over time

  19. Reinforcement Learning • Two types of learning: • Model-free learning • Model-based learning • Typical application areas: • Robots • Mazes • Games • …

  20. Reinforcement Learning • Paper: A short introduction to Reinforcement Learning (Stephan ten Hagen and Ben Krose)

  21. Reinforcement Learning • The environment is a Markov Decision Process

  22. Reinforcement Learning • Optimize interaction with environment • Optimize action selection mechanism • Temporal Credit Assignment Problem • Policy: action selection mechanism • Value function:
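
For reference, the standard definition of the value function of a policy π (the slide's own formula is not in the transcript):

```latex
% Discounted value of state s under policy \pi (standard definition).
V^{\pi}(s) \;=\; E_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s\right],
\qquad 0 \le \gamma < 1
```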

  23. Reinforcement Learning • Optimal Value function based on optimal policy:
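
The standard Bellman optimality form (the slide's formula is an image):

```latex
% Optimal value function: value of s under the optimal policy.
V^{*}(s) \;=\; \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s,a,s') + \gamma\, V^{*}(s')\bigr]
```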

  24. Reinforcement Learning • Policy Evaluation: approximate value function for given policy • Policy Iteration: start with arbitrary policy and improve
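
A standard iterative policy-evaluation update for a fixed policy π (not reproduced from the slide itself):

```latex
% Policy evaluation: repeatedly apply the Bellman expectation update
% until V_k converges to V^{\pi}.
V_{k+1}(s) \;=\; \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[R(s,a,s') + \gamma\, V_{k}(s')\bigr]
```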

  25. Reinforcement Learning • Improve Policy:
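
The standard greedy improvement step with respect to V^π (the slide's formula is an image):

```latex
% Policy improvement: act greedily with respect to the current value estimate.
\pi'(s) \;=\; \arg\max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s,a,s') + \gamma\, V^{\pi}(s')\bigr]
```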

  26. Reinforcement Learning • Value Iteration: combine policy evaluation and policy improvement steps:
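
The standard value-iteration update, which folds the max of the improvement step directly into the evaluation update:

```latex
% Value iteration: one combined evaluation/improvement sweep per iteration.
V_{k+1}(s) \;=\; \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s,a,s') + \gamma\, V_{k}(s')\bigr]
```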

  27. Reinforcement Learning • Monte Carlo: use if the state transition probabilities and the reinforcements are not known • Given a policy, several complete iterations are performed • Exploration/Exploitation Dilemma • Extract Information • Optimize Interaction
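
A minimal sketch of Monte Carlo evaluation by averaging returns; the episode format (lists of (state, reward) pairs generated by following the given policy) is an illustrative assumption:

```python
# Monte Carlo policy evaluation: average the observed discounted returns
# from each visited state over complete episodes.
from collections import defaultdict

def mc_evaluate(episodes, gamma=0.9):
    returns = defaultdict(list)
    for episode in episodes:
        G = 0.0
        # walk the episode backwards, accumulating the discounted return
        for state, reward in reversed(episode):
            G = reward + gamma * G
            returns[state].append(G)
    # V(s) is the average of the returns observed from s
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# one illustrative episode: state 0 gives reward 1, state 1 gives 0, state 2 gives 5
V = mc_evaluate([[(0, 1.0), (1, 0.0), (2, 5.0)]])
```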

  28. Reinforcement Learning • Temporal Difference (TD) Learning • During interaction, part of the update can be calculated • Information from previous interactions is used
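
The standard TD(0) update, where part of the target is available right after one interaction step:

```latex
% TD(0): update V(s_t) towards the bootstrapped target r_{t+1} + \gamma V(s_{t+1}).
V(s_t) \;\leftarrow\; V(s_t) + \alpha\,\bigl[r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t)\bigr]
```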

  29. Reinforcement Learning • TD(λ) learning: decay factor λ: the longer ago a state was visited, the less it will be affected by the present update
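
A standard eligibility-trace formulation of TD(λ) (not reproduced from the slide): every visited state keeps a trace that decays by γλ per step, so states visited longer ago receive a smaller share of the current TD error:

```latex
% TD(\lambda) with eligibility traces e(s); \delta_t is the one-step TD error.
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t), \qquad
e_t(s) = \gamma\lambda\, e_{t-1}(s) + \mathbf{1}[s = s_t], \qquad
V(s) \leftarrow V(s) + \alpha\, \delta_t\, e_t(s)
```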

  30. Reinforcement Learning • Q-learning: combine actor and critic:
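
The standard Q-learning update (the slide's formula is an image); the Q-function serves both as critic (value estimate) and as actor (greedy action selection):

```latex
% Q-learning: one-step temporal-difference update of the action values.
Q(s_t, a_t) \;\leftarrow\; Q(s_t, a_t) + \alpha\,\bigl[r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\bigr]
```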

  31. Reinforcement Learning • Use temporal difference learning

  32. Reinforcement Learning • Q(λ) learning:

  33. Reinforcement Learning • Feedforward neural networks are used to estimate V(s) and Q(s,a) when the state/action spaces are large.
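
A minimal sketch of a feedforward network used as a Q-function approximator, with one semi-gradient Q-learning step; the network size, learning rates, and the sample transition below are illustrative assumptions:

```python
# Semi-gradient Q-learning with a small feedforward network Q(s, .).
import numpy as np

rng = np.random.default_rng(2)
n_state, n_hidden, n_actions = 4, 8, 2         # illustrative sizes
gamma, lr = 0.9, 0.01                          # illustrative discount and step size

W1 = rng.normal(scale=0.1, size=(n_hidden, n_state))
W2 = rng.normal(scale=0.1, size=(n_actions, n_hidden))

def q_values(s):
    h = np.tanh(W1 @ s)
    return W2 @ h, h                           # Q(s, .) for all actions, hidden activations

def q_update(s, a, r, s_next):
    """One semi-gradient Q-learning step on the squared TD error."""
    global W1, W2
    q, h = q_values(s)
    q_next, _ = q_values(s_next)
    target = r + gamma * np.max(q_next)        # bootstrap from the next state
    delta = target - q[a]                      # TD error for the taken action
    grad_out = np.zeros(n_actions)
    grad_out[a] = delta                        # error flows only through output unit a
    W2 += lr * np.outer(grad_out, h)
    grad_h = (W2.T @ grad_out) * (1 - h ** 2)  # back-propagate through tanh
    W1 += lr * np.outer(grad_h, s)

s = rng.normal(size=n_state)                   # illustrative transition
s_next = rng.normal(size=n_state)
q_update(s, a=1, r=0.5, s_next=s_next)
```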
