
Presentation Transcript


  1. Outline
     1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) - Task 4.1 Visual processing based on feature abstraction
     2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) - Task 4.3 Learning of attention and vergence control
     3) From Exploration to Planning (Weber & Triesch, ICANN 2008) - Task 6.4 Learning hierarchical world models for planning

  2. Outline
     1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) - Task 4.1 Visual processing based on feature abstraction
     2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) - Task 4.3 Learning of attention and vergence control
     3) From Exploration to Planning (Weber & Triesch, ICANN 2008) - Task 6.4 Learning hierarchical world models for planning

  3. Reinforcement learning (figure: action a, weights, input s)

  4. Reinforcement learning (figure): an actor over the input (state space); with a simple input the decision is clear (‘go right!’), with a complex input it is not (‘go right? go left?’)

  5. Complex input scenario: bars controlled by the actions ‘up’, ‘down’, ‘left’, ‘right’; reward is given if the horizontal bar is at a specific position (figure: sensory input, action, reward)
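
A minimal sketch of such a bars environment, assuming a 12x12 grid (the size mentioned on slide 10), that ‘up’/‘down’ move the horizontal bar while ‘left’/‘right’ move the vertical distractor bar, and a fixed target row for the reward; the class name BarsEnv and every detail not stated on the slide are illustrative assumptions.

```python
import numpy as np

class BarsEnv:
    """Toy version of the 'bars' scenario (slide 5): a horizontal and a
    vertical bar on a grid; 'up'/'down' move the horizontal bar,
    'left'/'right' the vertical one, and reward is given when the
    horizontal bar reaches a target row. Details are assumptions."""

    def __init__(self, size=12, target_row=0, seed=0):
        self.size, self.target_row = size, target_row
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.h_row = int(self.rng.integers(self.size))  # row of horizontal bar
        self.v_col = int(self.rng.integers(self.size))  # column of vertical bar
        return self.observe()

    def observe(self):
        img = np.zeros((self.size, self.size))
        img[self.h_row, :] = 1.0          # horizontal bar (relevant for reward)
        img[:, self.v_col] = 1.0          # vertical bar (irrelevant)
        return img.ravel()                # flattened sensory input I

    def step(self, action):               # 0: up, 1: down, 2: left, 3: right
        if action == 0:
            self.h_row = max(self.h_row - 1, 0)
        elif action == 1:
            self.h_row = min(self.h_row + 1, self.size - 1)
        elif action == 2:
            self.v_col = max(self.v_col - 1, 0)
        else:
            self.v_col = min(self.v_col + 1, self.size - 1)
        reward = 1.0 if self.h_row == self.target_row else 0.0
        return self.observe(), reward
```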

  6. Need another layer(s) to pre-process the complex data.
     Action layer: P(a=1) = softmax(Q s); the weight matrix Q encodes the value v = a Q s.
     State layer: s = softmax(W I) encodes the position of the relevant bar; the weight matrix W acts as a feature detector on the input I.
     Minimize the error E = (0.9 v(s',a') - v(s,a))² = δ².
     Learning rules: ΔQ ≈ dE/dQ = δ a s,  ΔW ≈ dE/dW = δ Q s I + ε
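
The slide's shorthand can be read as a two-layer network in which both layers learn from the same TD error. Below is a minimal sketch in that spirit; the learning rates, the weight-noise term ε, the exact outer-product form of dE/dW, and the placement of the reward in the TD error are assumptions (the slide writes E = (0.9 v(s',a') - v(s,a))², with reward entering at the goal).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TwoLayerRL:
    """Sketch of slide 6: feature layer W maps input I to a state code s,
    action layer Q maps s to action probabilities and values, and both
    layers are updated from the same TD error delta."""

    def __init__(self, n_input, n_state, n_action,
                 gamma=0.9, lr_q=0.1, lr_w=0.01, noise=1e-4, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.1 * self.rng.standard_normal((n_state, n_input))   # feature detectors
        self.Q = 0.1 * self.rng.standard_normal((n_action, n_state))  # action/value weights
        self.gamma, self.lr_q, self.lr_w, self.noise = gamma, lr_q, lr_w, noise

    def forward(self, I):
        s = softmax(self.W @ I)                     # s = softmax(W I)
        p = softmax(self.Q @ s)                     # P(a=1) = softmax(Q s)
        a = np.zeros(len(p))
        a[self.rng.choice(len(p), p=p)] = 1.0       # sample a one-hot action
        v = a @ self.Q @ s                          # v = a Q s
        return I, s, a, v

    def update(self, prev, new, reward):
        I, s, a, v = prev
        _, _, _, v_new = new
        delta = reward + self.gamma * v_new - v     # TD error (gamma = 0.9 on the slide)
        self.Q += self.lr_q * delta * np.outer(a, s)            # dQ ~ delta a s
        grad_w = np.outer((a @ self.Q) * s, I)                   # one reading of "delta Q s I"
        self.W += self.lr_w * delta * grad_w \
                  + self.noise * self.rng.standard_normal(self.W.shape)  # + epsilon noise
```

Usage against an environment like the BarsEnv sketch above: call forward on the current image, step the environment with the sampled action, call forward on the next image, and pass both tuples plus the reward to update.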

  7. SARSA with WTA input layer

  8. Memory extension: the model uses the previous state and action to estimate the current state
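
One way to read this: the state layer receives the previous state code and the previous action as extra input alongside the current image, so it can disambiguate inputs that look alike. A sketch, with the separate memory weight block W_mem and the simple concatenation being assumed wiring:

```python
import numpy as np

def state_with_memory(W_img, W_mem, I, s_prev, a_prev):
    # state estimate from the current image plus the previous state and action
    x = W_img @ I + W_mem @ np.concatenate([s_prev, a_prev])
    e = np.exp(x - x.max())
    return e / e.sum()        # softmax over state units
```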

  9. Learning the ‘short bars’ data (figure: feature weights, RL action weights, data, action, reward)

  10. Short bars in 12x12; average number of steps to goal: 11

  11. Learning the ‘long bars’ data (figure: RL action weights, feature weights, data input, reward; 2 actions not shown)

  12. Figure panels: WTA, non-negative weights; SoftMax, no weight constraints; SoftMax, non-negative weights

  13. Models' background:
     - gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
     - reward-modulated Hebbian learning: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-214 (2005); Franz & Triesch, ICDL (2007)
     - reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)
     - reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19(6), 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
     - RL models learn a partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)

  14. Unsupervised learning in cortex (state space); reinforcement learning in basal ganglia (actor) (Doya, 1999)

  15. Discussion - may help reinforcement learning work with real-world data ... real visual processing!

  16. Outline
     1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) - Task 4.1 Visual processing based on feature abstraction
     2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) - Task 4.3 Learning of attention and vergence control
     3) From Exploration to Planning (Weber & Triesch, ICANN 2008) - Task 6.4 Learning hierarchical world models for planning

  17. Representation of depth • How to learn disparity-tuned neurons in V1?

  18. Reinforcement learning in a neural network • after vergence: input at a new disparity • if disparity is zero → reward
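
A schematic of that reward signal, representing disparity as a horizontal pixel shift between the two eyes' images; the function, its arguments and the pixel-shift model are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def vergence_step(left_img, right_img, disparity, vergence_action):
    """One vergence step: the action changes the disparity, the right
    image is shifted accordingly, and reward is given only when the
    resulting disparity is zero."""
    new_disparity = disparity - vergence_action
    right_shifted = np.roll(right_img, new_disparity, axis=1)
    reward = 1.0 if new_disparity == 0 else 0.0
    network_input = np.concatenate([left_img.ravel(), right_shifted.ravel()])
    return network_input, new_disparity, reward
```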

  19. Attention-Gated Reinforcement Learning (Roelfsema, van Ooyen, 2005) • Hebbian-like weight learning (formula shown in the slide figure)
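
The update formula itself appears only in the figure; what follows is a sketch of an AGREL-style rule in the spirit of Roelfsema & van Ooyen (2005): a Hebbian pre-times-post product, gated by a global reward-prediction error and by feedback from the selected output unit. The function name, the feedback variable and the learning rate are assumptions.

```python
import numpy as np

def agrel_update(W, pre, post, delta, feedback, lr=0.05):
    # dW ~ lr * delta * (feedback-gated postsynaptic activity) x presynaptic activity
    # delta: global reward-prediction error; feedback: attentional signal fed
    # back from the selected action unit
    return W + lr * delta * np.outer(feedback * post, pre)
```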

  20. Measured disparity tuning curves • Six types of tuning curves (Poggio, Gonzalez, Krause, 1988)

  21. Development of disparity tuning All six types of tuning curves emerge in the hidden layer!

  22. Discussion - requires application ... use 2D images from 3D space ... open question as to the implementation of the reward ... learning of attention?

  23. Outline
     1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) - Task 4.1 Visual processing based on feature abstraction
     2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) - Task 4.3 Learning of attention and vergence control
     3) From Exploration to Planning (Weber & Triesch, ICANN 2008) - Task 6.4 Learning hierarchical world models for planning

  24. Reinforcement learning leads to a fixed reactive system that always strives for the same goal (figure: actor units, value). Task: in an exploration phase, learn a general model that allows the agent to plan a route to any goal.

  25. Learning: move randomly around the state space (figure: actor, state space) and learn world models: • associative model • inverse model • forward model

  26. Learning: Associative Model • weights associate neighbouring states • use these to find any possible routes between agent and goal
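
A sketch of what such an associative model could look like: Hebbian weights between state units active in consecutive steps, plus a spreading-activation pass that marks every state from which the goal can be reached. The update rule, the propagation loop and the variable names are assumptions consistent with the slide text, not the paper's exact equations.

```python
import numpy as np

def learn_association(A, s_prev, s_now, lr=0.1):
    # strengthen links between the previous and the current state unit
    return A + lr * np.outer(s_now, s_prev)

def reachable_from(A, goal, steps=20):
    act = goal.copy()                       # start activity at the goal state
    for _ in range(steps):
        act = np.maximum(act, A.T @ act)    # propagate backwards along learned links
        act = np.minimum(act, 1.0)
    return act                              # high where a route to the goal exists
```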

  27. Learning: Inverse Model • weights “postdict” the action given a pair of states • use these to identify the action that leads to a desired state • Sigma-Pi neuron model
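
A sketch of the inverse model with Sigma-Pi units: a three-way weight tensor connects each (next state, state) pair of units multiplicatively to an action unit, and a delta rule moves the “postdicted” action towards the action actually taken. The tensor layout and the learning rule are assumptions.

```python
import numpy as np

def infer_action(M, s, s_next):
    # Sigma-Pi: each action unit sums products of next-state and state activities
    return np.einsum('aij,i,j->a', M, s_next, s)

def learn_inverse(M, s, s_next, a, lr=0.1):
    # move the postdicted action towards the action actually taken
    err = a - infer_action(M, s, s_next)
    return M + lr * np.einsum('a,i,j->aij', err, s_next, s)
```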

  28. Learning: Forward Model • weights predict the state given a state-action pair • use these to predict the next state given the chosen action
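
And a matching sketch of the forward model: a weight tensor maps each (state, action) pair to a predicted next state, trained from the prediction error. Multiplicative (Sigma-Pi-style) units are assumed here by analogy with the inverse model.

```python
import numpy as np

def predict_next(F, s, a):
    return np.einsum('kia,i,a->k', F, s, a)        # predicted next state

def learn_forward(F, s, a, s_next, lr=0.1):
    err = s_next - predict_next(F, s, a)           # prediction error
    return F + lr * np.einsum('k,i,a->kia', err, s, a)
```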

  29.-43. Planning (sequence of figure-only slides stepping through the planning process; figure labels: actor units, goal, agent)

  44. Discussion - requires embedding ... learn the state space from sensor input ... only random exploration implemented ... hand-designed planning phases ... hierarchical models?
