
9. Modular networks, motor control, and reinforcement learning


Presentation Transcript


  1. 9. Modular networks, motor control, and reinforcement learning. Fundamentals of Computational Neuroscience (The 2nd Ed.), T. P. Trappenberg, 2010. Lecture Notes on Brain and Computation. Byoung-Tak Zhang, Biointelligence Laboratory, School of Computer Science and Engineering, Graduate Programs in Cognitive Science, Brain Science and Bioinformatics, Brain-Mind-Behavior Concentration Program, Seoul National University. E-mail: btzhang@bi.snu.ac.kr. This material is available online at http://bi.snu.ac.kr/

  2. Outline

  3. 9.1 Modular mapping networks • Modular networks • Large-scale networks with constraints • Modular specialization in the brain • Mixture of experts • Combining feedforward mapping networks • Experts: working modules

  4. 9.1 Modular mapping networks • Mixture of experts (cont’d) • Property: universal function approximator → can solve any mapping task, e.g., the abstract function of Fig. 9.2 • Divide-and-conquer strategy • Training the networks: • Assign the experts to particular tasks • Train each expert on its designated task • Train the gating network (credit-assignment problem) • This supervised assignment is not biologically plausible (see the sketch below)
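
A minimal sketch of the combination step, assuming linear experts and a softmax gating network; all shapes, weights, and inputs below are random placeholders, not the book's architecture:

    import numpy as np

    # Mixture of experts: a gating network weights the outputs of
    # several expert networks; the gate decides who is responsible.
    rng = np.random.default_rng(0)
    n_in, n_out, n_experts = 4, 3, 2

    W_experts = rng.normal(size=(n_experts, n_out, n_in))  # one linear expert each
    W_gate = rng.normal(size=(n_experts, n_in))            # gating network

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    x = rng.normal(size=n_in)
    expert_out = W_experts @ x     # (n_experts, n_out): every expert's answer
    g = softmax(W_gate @ x)        # gating coefficients, sum to one
    y = g @ expert_out             # output: gate-weighted mixture
    print(g, y)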

  5. 9.1 Modular mapping networks • The ‘what-and-where’ task • Two visual pathways • Ventral visual pathway (what) • Dorsal visual pathway (where) • Modular networks • What → object recognition (what) • Where → location of objects (where)

  6. 9.1 Modular mapping networks • The ‘what-and-where’ task • Jacobs’ idea (1991) • Input channels (26): retinal (25) & task specification (1) • Output channels (18): objects (9) & locations (9) • A single network with 36 hidden nodes trained with back-propagation • Conflicting training information: • Temporal cross-talk • Spatial cross-talk • Task decomposition
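
A minimal sketch of the single 26-36-18 network named above, trained with plain back-propagation; the data are random placeholders, not the stimuli of Jacobs et al. (1991):

    import numpy as np

    rng = np.random.default_rng(0)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))

    W1 = rng.normal(0, 0.1, (36, 26))   # 26 inputs: 25 retinal + 1 task bit
    W2 = rng.normal(0, 0.1, (18, 36))   # 18 outputs: 9 objects + 9 locations

    x = rng.integers(0, 2, 26).astype(float)  # placeholder input pattern
    t = rng.integers(0, 2, 18).astype(float)  # placeholder target pattern

    for _ in range(100):                # a few back-propagation steps
        h = sig(W1 @ x)                 # hidden layer (36 nodes)
        y = sig(W2 @ h)                 # output layer
        dy = (y - t) * y * (1 - y)      # output delta for squared error
        dh = (W2.T @ dy) * h * (1 - h)  # back-propagated hidden delta
        W2 -= 0.5 * np.outer(dy, h)
        W1 -= 0.5 * np.outer(dh, x)
    print(np.abs(y - t).max())          # residual error after training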

  7. 9.1 Modular mapping networks • Modular network for the what-and-where task • Architectural constraints • Where: linearly separable → a single-layer network → a simple expert without a hidden layer • What: linearly inseparable → needs hidden nodes • Jordan’s study: considering the physical location of units • Objective function: • The 1st term: error term • The 2nd term: distance bias (see the reconstruction below)
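
The transcript drops the objective function itself; a plausible reconstruction from the two terms named above, where the exact penalty form and the weighting λ are assumptions:

    E = \sum_k (y_k - t_k)^2 + \lambda \sum_{i,j} d_{ij} \, w_{ij}^2

with targets t_k and d_{ij} the physical distance between units i and j, so that long connections are penalized and the expert closest to the relevant inputs tends to win the task.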

  8. 9.1 Modular mapping networks • Product of experts (G. Hinton) • Summation of experts: requires normalization and amounts to averaging • Experts with wide distributions blur the average and do not provide a precise answer • Product of experts • Opinions of experts outside their domain of expertise have less of an effect
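
A hedged numeric illustration of that last point, assuming each expert reports a Gaussian belief: in a product of Gaussians the combined mean is precision-weighted, so a vague expert barely moves the result, whereas it drags a plain average:

    import numpy as np

    mu = np.array([1.0, 5.0])     # expert means
    var = np.array([0.1, 10.0])   # expert 2 is far less certain

    avg_mean = mu.mean()                       # averaging the experts
    prec = 1.0 / var                           # precisions
    poe_mean = (prec * mu).sum() / prec.sum()  # product of Gaussians

    print(avg_mean)   # 3.0   -- the vague expert drags the average
    print(poe_mean)   # ~1.04 -- it has little effect in the product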

  9. 9.2 Coupled attractor networks • Coupled attractor networks • The combination of basic recurrent networks • Distinguish between network groups • Strongly connected subsystem (intra-module) • Weakly connected subsystem (inter-module)

  10. 9.2 Coupled attractor networks • Imprinted and composite patterns • Comparison of two coupled sub-networks with one large attractor network • Objects described by two independent features • Left-right visual fields (Fig. 9.5B) • Two independent sub-networks (1000 nodes each) • # of weights: 1000² × 2 = 2×10⁶ • Each module stores about 0.138 × 1000 ≈ 138 patterns, so the pair represents 138² ≈ 19,000 feature combinations; a single attractor network storing that many patterns needs at least 19,000 / 0.138 ≈ 138,000 nodes • # of weights: 138,000² ≈ 1.9×10¹⁰ • The reason to use large single networks anyway: binding specific combinations of features • Green square vs. blue triangle
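
A quick check of these numbers, assuming the standard Hopfield capacity of about 0.138 patterns per node (the capacity value is the textbook estimate, not quoted on this slide):

    alpha = 0.138                          # Hopfield capacity (patterns/node)
    n_module = 1000
    patterns = alpha * n_module            # ~138 patterns per module
    composites = patterns ** 2             # ~19,000 feature combinations
    n_single = composites / alpha          # ~138,000 nodes for one network
    print(2 * n_module ** 2)               # 2,000,000 weights (two modules)
    print(n_single, n_single ** 2)         # ~138,000 nodes, ~1.9e10 weights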

  11. 9.2 Coupled attractor networks • Signal-to-noise analysis • Provides insight into the behavior of coupled attractor networks • N: total # of nodes, N′: # of nodes in each module, m: # of modules • Weights imprinted with the Hebbian rule • A new weight matrix whose inter-module components are scaled down (see the reconstruction below)
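
The transcript omits the matrix itself; a reconstruction consistent with the coupling factor g used on the following slides, with Hebbian imprinting of patterns ξ^μ (the normalization is an assumption):

    w_{ij} = \frac{1}{N'} \sum_\mu \xi_i^\mu \xi_j^\mu, \qquad
    \tilde{w}_{ij} =
    \begin{cases}
      w_{ij}    & \text{if nodes } i, j \text{ lie in the same module} \\
      g\,w_{ij} & \text{otherwise,}\quad 0 < g < 1
    \end{cases}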

  12. 9.2 Coupled attractor networks • Evaluation of the stability of the imprinted patterns • A simplified signal-to-noise quantity z2 is used instead of z1

  13. 9.2 Coupled attractor networks • The case of starting the network from a state whose modules hold sub-patterns of different imprinted patterns • The starting state • After one update • Signal and noise • Lower bound on g (with (9.11), Fig. 9.6B)

  14. 9.2 Coupled attractor networks • The reverse case • The starting state • After one update • Signal and noise • Upper bound on the g-factor (Fig. 9.6B) • Together the bounds delimit the possible interaction between sub-networks in modular networks

  15. 9.3 Sequence learning • Sequential aspects of brain processing • Some memories trigger other memories → a dynamic system • A modular architecture gives associative networks some advantages for sequence learning • Hetero-association: • Hetero-associative weights: drive the system towards a noisy version of the next pattern • Auto-associative weights: clean up this noisy version • Hopfield network (see the sketch below)
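
A minimal sketch of this hetero-association scheme, assuming ±1 patterns, synchronous updates, and a hand-chosen weighting of the hetero-associative term:

    import numpy as np

    rng = np.random.default_rng(1)
    N, P = 200, 5
    xi = rng.choice([-1.0, 1.0], (P, N))   # sequence of patterns xi^0..xi^4

    W_auto = xi.T @ xi / N                 # auto-association: xi^mu -> xi^mu
    nxt = list(range(1, P)) + [0]
    W_het = xi[nxt].T @ xi / N             # hetero-association: xi^mu -> xi^mu+1

    s = xi[0].copy()
    for t in range(10):
        # the hetero term pushes towards the next pattern in the sequence,
        # the auto term cleans up the noisy result
        s = np.sign(W_auto @ s + 2.0 * W_het @ s)
        print(t, int(np.argmax(xi @ s / N)))   # which pattern s overlaps most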

  16. 9.3 Sequence learning • Modular networks for sequence learning

  17. 9.4 Complementary memory systems • Distributed model of working memory • R. O’Reilly’s model • PFC (prefrontal cortex) • Many independent recurrent subsystems • Maintains information over short periods • HCMP (hippocampus and related areas) • Rapid learning of associations for episodic memory • PMC (perceptual and motor cortex) • Semantic memory and action repertoires

  18. 9.4 Complementary memory systems • Limited capacity of working memory • Magical number 7 ± 2 • e.g., tasks of remembering digit sequences • Limitations of working memory • Various hypotheses on the reason for limited working memory: • A bottleneck in the information-processing capabilities of the brain (D. Broadbent) • Limits in attentional systems (N. Cowan) • Limits arising in reverberating neural models (next slides)

  19. 9.4 Complementary memory systems • The spurious synchronization hypothesis • Luck and Vogel’s study • Computational neuroscience model

  20. 9.4 Complementary memory systems • The interacting-reverberating-memory hypothesis

  21. 9.5 Motor learning and control • Motor learning • Activities: catching a ball, riding a bicycle • Requires more time than associative learning • Uses reinforcement learning • Feedback controller • Uses a feedforward mapping network (see the sketch below)
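
A minimal sketch of an error-correcting feedback controller on a toy one-dimensional plant; the gain and plant dynamics are illustrative assumptions, not the book's example:

    # Proportional feedback control: the motor command is proportional
    # to the error between the desired and the actual state.
    x, target, dt, gain = 0.0, 1.0, 0.1, 2.0
    for _ in range(50):
        error = target - x      # sensory feedback
        u = gain * error        # motor command derived from the error
        x += u * dt             # the plant responds to the command
    print(round(x, 3))          # converges close to the target 1.0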

  22. 9.5 Motor learning and control • Forward and inverse model controller

  23. 9.5 Motor learning and control • The cerebellum and motor controller

  24. 9.6 Reinforcement learning • Supervised learning vs. reinforcement learning • Answers vs. feedback (rewards) • Classical conditioning and the reinforcement learning problem • Conditioning • Temporal credit assignment problem

  25. 9.6 Reinforcement learning • Formulation • Policies and value functions • Reward function r(s, a) for state s and action a • Goal: maximizing the future reward R • R(t): sum of the rewards r(s, a) collected in some time window following t • Policies: π(s, a) • State value function: Vπ(s) • Action value function: Qπ(s, a) • Temporal difference learning • Off-policy vs. on-policy
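
In standard notation, with a finite reward window of length K (the discounted version appears two slides below):

    R(t) = \sum_{k=0}^{K} r\big(s(t+k), a(t+k)\big), \qquad
    V^\pi(s) = \mathrm{E}\big[R(t) \mid s(t) = s\big], \qquad
    Q^\pi(s, a) = \mathrm{E}\big[R(t) \mid s(t) = s,\, a(t) = a\big]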

  26. 9.6 Reinforcement learning • Temporal delta rule • Learns from reward within neural architectures • Episodes → Vπ(s) • r_i^in(s): a specific pattern of rates in the input channels • w_i(t): the weight at time t
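
A minimal sketch of a temporal delta rule, assuming a linear value estimate v = Σ_i w_i r_i^in(s) trained toward the immediate reward; the episode structure and the reward itself are toy assumptions:

    import numpy as np

    rng = np.random.default_rng(2)
    n_in, eps = 5, 0.1
    w = np.zeros(n_in)                     # one weight per input channel

    for episode in range(200):
        r_in = rng.integers(0, 2, n_in).astype(float)  # input rate pattern
        reward = r_in.sum() / n_in                     # toy reward for the state
        v = w @ r_in                                   # current prediction
        w += eps * (reward - v) * r_in                 # delta rule: learns the
    print(w.round(2))                                  # next-step reward only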

  27. 9.6 Reinforcement learning • Temporal difference learning • Limitation of the temporal delta rule: it looks only one time step ahead • Predictions over different time steps • Introduction of a discount factor γ (0 < γ < 1) • The perfect prediction V* • Minimizing the temporal difference error
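
In standard notation, consistent with the discount factor γ introduced above:

    V^*\big(s(t)\big) = r(t) + \gamma V^*\big(s(t+1)\big), \qquad
    \delta(t) = r(t) + \gamma V\big(s(t+1)\big) - V\big(s(t)\big)

Learning nudges V(s(t)) in the direction of the temporal difference error δ(t), which vanishes once the prediction is perfect.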

  28. 9.6 Reinforcement learning • The learning of a state value function • The learning of an action value function
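
A minimal tabular sketch of both updates on a toy chain of states; the environment, the ε-greedy behavior, and all parameter values are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(3)
    nS, nA, gamma, lr, eps = 5, 2, 0.9, 0.1, 0.5
    V = np.zeros(nS)                      # state value estimates
    Q = np.zeros((nS, nA))                # action value estimates

    def step(s, a):
        s2 = s + 1 if a == 1 else s       # action 1 advances along the chain
        return s2, float(s2 == nS - 1)    # reward on reaching the last state

    def policy(s):
        return rng.integers(nA) if rng.random() < eps else int(Q[s].argmax())

    for episode in range(500):
        s, a = 0, policy(0)
        while s != nS - 1:
            s2, r = step(s, a)
            a2 = policy(s2)
            V[s] += lr * (r + gamma * V[s2] - V[s])           # TD(0) state values
            Q[s, a] += lr * (r + gamma * Q[s2, a2] - Q[s, a]) # SARSA action values
            s, a = s2, a2
    print(V.round(2), Q.round(2))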

  29. 9.6 Reinforcement learning • Simulation: MATLAB code (produces Fig. 9.16)

  30. 9.6 Reinforcement learning • The actor-critic scheme and the basal ganglia • The actor-critic scheme • Turns temporal difference learning into a control method • Proposed by Sutton and Barto • Actor: the motor command generator • The adaptive critic: estimates value functions and guides action selection (see the sketch below)
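
A minimal tabular actor-critic sketch on the same kind of toy chain; the softmax action preferences and the parameter values are illustrative assumptions, not the basal ganglia model:

    import numpy as np

    rng = np.random.default_rng(4)
    nS, nA, gamma, lr = 5, 2, 0.9, 0.1
    V = np.zeros(nS)                      # critic: state value estimates
    pref = np.zeros((nS, nA))             # actor: action preferences

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def step(s, a):
        s2 = s + 1 if a == 1 else s       # action 1 advances along the chain
        return s2, float(s2 == nS - 1)    # reward on reaching the last state

    for episode in range(500):
        s = 0
        while s != nS - 1:
            a = rng.choice(nA, p=softmax(pref[s]))
            s2, r = step(s, a)
            delta = r + gamma * V[s2] - V[s]   # TD error from the critic
            V[s] += lr * delta                 # critic update
            pref[s, a] += lr * delta           # actor reinforces actions that
            s = s2                             # yield a positive TD error
    print(V.round(2))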

  31. 9.6 Reinforcement learning • Information stream • The basal ganglia • Anatomical overview

  32. 9.6 Reinforcement learning • Signals of neural activities in the basal ganglia • MATLAB code for Fig. 9.20

  33. 9.6 Reinforcement learning • Q-learning for the basal ganglia functions
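
A minimal sketch of the off-policy Q-learning update on the same toy chain; the environment and parameters are again illustrative assumptions, not the basal ganglia model itself:

    import numpy as np

    rng = np.random.default_rng(5)
    nS, nA, gamma, lr, eps = 5, 2, 0.9, 0.1, 0.3
    Q = np.zeros((nS, nA))

    def step(s, a):
        s2 = s + 1 if a == 1 else s       # action 1 advances along the chain
        return s2, float(s2 == nS - 1)    # reward on reaching the last state

    for episode in range(500):
        s = 0
        while s != nS - 1:
            a = rng.integers(nA) if rng.random() < eps else int(Q[s].argmax())
            s2, r = step(s, a)
            # off-policy: bootstrap from the greedy next action,
            # regardless of which action the behavior policy takes
            Q[s, a] += lr * (r + gamma * Q[s2].max() - Q[s, a])
            s = s2
    print(Q.round(2))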
