
Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein

Presentation Transcript


  1. Reinforcement learning and human behavior Hanan Shteingart and Yonatan Loewenstein MTAT.03.292 Seminar in Computational Neuroscience Zurab Bzhalava

  2. Introduction • Operant Learning • Dominant computational approach to model operant learning is model-free RL • Human behavior is far more complex • Remaining Challenges

  3. Reinforcement Learning RL: A class of learning problems in which an agent interacts with an unfamiliar, dynamic and stochastic environment Goal: Learn a policy to maximize some measure of long-term reward

  4. Markov Decision Process • A (finite) set of states S • A (finite) set of actions A • Transition model: T(s, a, s’) = P(s’ | s, a) • Reward function: R(s) • γ is a discount factor, γ ∈ [0, 1] • Policy π • Optimal policy π*

  5. Markov Decision Process Bellman equation:
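
The equation itself did not survive in the transcript. With the symbols defined on the previous slide, the standard Bellman optimality equation is V(s) = R(s) + γ · max over a of Σ over s’ of T(s, a, s’) · V(s’). This may not be the exact form shown on the slide. The sketch below applies that equation repeatedly (value iteration) to a hypothetical two-state MDP. The states, actions, transition probabilities and rewards are illustrative assumptions, not taken from the presentation.

```python
# Value iteration on a hypothetical two-state MDP.
# All names and numbers below are illustrative assumptions.
STATES = ["s0", "s1"]
ACTIONS = ["stay", "move"]
GAMMA = 0.9  # discount factor

# R[s]: reward received in state s.
R = {"s0": 0.0, "s1": 1.0}

# T[(s, a)][s2] = P(s2 | s, a)
T = {
    ("s0", "stay"): {"s0": 1.0, "s1": 0.0},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s0": 0.0, "s1": 1.0},
    ("s1", "move"): {"s0": 0.8, "s1": 0.2},
}


def expected_value(s, a, V):
    """Expected value of the successor state after taking action a in state s."""
    return sum(p * V[s2] for s2, p in T[(s, a)].items())


def value_iteration(tol=1e-8, max_iters=10_000):
    """Apply the Bellman backup until the largest change falls below tol."""
    V = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        new_V = {
            s: R[s] + GAMMA * max(expected_value(s, a, V) for a in ACTIONS)
            for s in STATES
        }
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return new_V
        V = new_V
    return V


def greedy_policy(V):
    """Pick, in each state, the action with the highest expected successor value."""
    return {s: max(ACTIONS, key=lambda a: expected_value(s, a, V)) for s in STATES}


V_star = value_iteration()
pi_star = greedy_policy(V_star)
```

In this toy problem the greedy policy moves out of the unrewarded state and stays in the rewarded one, which is what the optimal policy π* should do here.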

  6. Biological Algorithms • Behavioral control • Evaluate the world quickly • Choose appropriate behavior based on those valuations

  7. Midbrain dopamine neurons • Central role in guiding our behavior and thoughts • Valuation of our world • Value of money • Other human beings • Major role in decision-making • Reward-dependent learning • Malfunction in mental illness • Related to Parkinson's disease • Schizophrenia

  8. Reinforcement signals define an agent's goals • Organism is in state X and receives reward information • Organism queries stored value of state X • Organism updates stored value of state X based on current reward information • Organism selects action based on stored policy • Organism transitions to state Y and receives reward information
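
The five steps above can be sketched as a temporal-difference, TD(0), learning loop. This is a minimal illustration under stated assumptions, not the model from the cited papers. The world (X leads to Y, Y leads to a terminal state), the rewards, the fixed policy, the learning rate and the discount factor are all hypothetical. The update for a state is applied after the transition because the TD target uses the stored value of the next state. That ordering is a choice of this sketch.

```python
# TD(0) value learning in a hypothetical deterministic world: X -> Y -> END.
# All names and numbers are illustrative assumptions.
GAMMA = 0.9  # discount factor (assumed)
ALPHA = 0.1  # learning rate (assumed)

NEXT_STATE = {("X", "go"): "Y", ("Y", "go"): "END"}
REWARD = {"X": 0.0, "Y": 1.0, "END": 0.0}  # reward received in each state
POLICY = {"X": "go", "Y": "go"}            # fixed stored policy


def td0_episode(V):
    """Run one pass X -> Y -> END, updating the stored values in place."""
    state = "X"
    while state != "END":
        reward = REWARD[state]                    # 1. in a state, receive reward information
        predicted = V[state]                      # 2. query the stored value of the state
        action = POLICY[state]                    # 4. select an action from the stored policy
        next_state = NEXT_STATE[(state, action)]  # 5. transition to the next state
        # 3. update the stored value using the reward and the next state's stored value
        delta = reward + GAMMA * V[next_state] - predicted
        V[state] = predicted + ALPHA * delta
        state = next_state
    return V


V = {"X": 0.0, "Y": 0.0, "END": 0.0}
for _ in range(2000):
    td0_episode(V)
```

After many episodes the stored value of Y approaches its reward, and the stored value of X approaches the discounted value of Y, even though X itself is never rewarded.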

  9. The reward-prediction error hypothesis Difference between the experienced and predicted “reward” of an event • Neurons of the ventral tegmental area • phasic activity changes encode a 'prediction error about summed future reward'
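
In temporal-difference learning, a prediction error about summed future reward is commonly written as below. The notation is an assumption added for illustration and is not copied from the slide: r_t is the reward at time t, V is the stored value function, s_t is the state at time t, and γ is the discount factor.

```latex
\delta_t = r_t + \gamma \, V(s_{t+1}) - V(s_t)
```

A positive δ_t means the outcome was better than predicted, a negative δ_t means it was worse, and δ_t = 0 means the prediction was accurate.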

  10. Prediction-error signal encoded in dopamine neuron firing

  11. Value binding

  12. Human reward responses • Orbitofrontal Cortex (OFC) • Amygdala (Amyg) • Nucleus Accumbens • Sublenticular extended amygdala • Hypothalamus (Hyp) • Ventral Tegmental Area (VTA)

  13. Human reward responses

  14. Model-based RL vs Model-free RL • goal-directed vs habitual behaviors • Implemented by two anatomically distinct systems (subject of debate) • Some findings suggest: • Medial striatum is more engaged during planning • Lateral striatum is more engaged during choices in extensively trained tasks

  15. Model-based RL vs Model-free RL • Panel (b): Model-free RL • Panel (c): Model-based RL • Human subjects exhibited a mixture of both effects

  16. Challenges in relating human behavior to RL algorithms • Humans tend to alternate rather than repeat an action after receiving a positively surprising payoff • Tremendous heterogeneity in reports on human operant learning • Probability matching or not

  17. Heterogeneity in world model Questions?

  18. Learning the world model Questions?

  19. Reference List: • Reinforcement learning and human behavior. Hanan Shteingart and Yonatan Loewenstein • The ubiquity of model-based reinforcement learning. Bradley B Doll, Dylan A Simon and Nathaniel D Daw • Computational roles for dopamine in behavioral control. P. Read Montague, Steven E. Hyman and Jonathan D. Cohen
