
A Computational Unification of Cognitive Control, Emotion, and Learning





Presentation Transcript


  1. A Computational Unification of Cognitive Control, Emotion, and Learning Bob Marinier Oral Defense University of Michigan, CSE June 17, 2008

  2. Introduction • The link between core cognitive functions and emotion has not been fully explored • Existing computational models are largely pragmatic • We integrate the PEACTIDM theory of cognitive control and appraisal theories of emotion • PEACTIDM supplies process, appraisal theories supply data • We use emotion-driven reinforcement learning to demonstrate improved functionality • Automatically generate rewards, set parameters

  3. Cognitive Control: PEACTIDM

  4. PEACTIDM Cycle [Diagram: the PEACTIDM cycle — Perceive takes raw perceptual information from an environmental change; Encode asks "what is this information?"; Attend chooses a stimulus for processing based on stimulus relevance; Comprehend produces the current situation assessment and a prediction; Intend selects an action; Decode translates it; Motor issues motor commands back to the environment]
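The cycle above can be sketched as a simple processing pipeline. The stage names come from the theory; every stub body here is a hypothetical placeholder, not the actual Soar implementation.

```python
def perceive(raw):
    # Perceive: pick up raw perceptual information from the environment
    return {"signal": raw}

def encode(percept):
    # Encode: answer "what is this information?" by producing stimuli
    return [{"id": "stim1", "signal": percept["signal"]}]

def attend(stimuli):
    # Attend: choose one stimulus for further processing (stub: first)
    return stimuli[0]

def comprehend(stimulus, state):
    # Comprehend: fold the stimulus into the current situation assessment
    new_state = dict(state)
    new_state["last_stimulus"] = stimulus["id"]
    return new_state

def intend(state):
    # Intend: commit to an action (and, in the full model, a prediction)
    return "press-button"

def decode(action):
    # Decode: translate the intended action into motor commands
    return [action]

def motor(commands):
    # Motor: execute the commands in the environment (stub: no-op)
    return commands

def peactidm_cycle(raw_input, state):
    percept = perceive(raw_input)
    stimuli = encode(percept)
    stimulus = attend(stimuli)
    state = comprehend(stimulus, state)
    motor(decode(intend(state)))
    return state
```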

  5. Appraisal Theories of Emotion • A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals • Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc. • Result of appraisals influences emotion • Emotion can then be coped with (via internal or external actions) [Diagram: a Situation is appraised relative to Goals; the Appraisals produce Emotion, which feeds Coping]
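The appraisal dimensions listed above can be collected into a single appraisal frame. This is an illustrative container: the field names and value ranges follow the dimensions named in the talk, but the exact field set is an assumption, not the full set used in the thesis.

```python
from dataclasses import dataclass

@dataclass
class AppraisalFrame:
    """One appraisal of the current situation relative to the goals."""
    suddenness: float = 0.0              # [0, 1]
    goal_relevance: float = 0.0          # [0, 1]
    intrinsic_pleasantness: float = 0.0  # [-1, 1]
    conduciveness: float = 0.0           # [-1, 1]
    discrepancy: float = 0.0             # [0, 1]
    outcome_probability: float = 0.0     # [0, 1]
```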

  6. Appraisals to Emotions (Scherer 2001) • Why these dimensions? • What is the functional purpose of emotion?

  7. Unification of PEACTIDM and Appraisal Theories [Diagram: the PEACTIDM cycle annotated with the appraisals each step supplies — Perceive/Encode: Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness (stimulus relevance); Comprehend: Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power; Intend's prediction: Outcome Probability; Decode/Motor: motor commands back to the environment]

  8. Example: Simple Choice Response Task

  9. PEACTIDM in the Button Task [Appraisal frame: Suddenness 1, Goal Relevance 1, Conduciveness 1, Discrepancy 0, Outcome Probability 1; Outcome Probability and Discrepancy together determine the "Surprise Factor"]

  10. PEACTIDM in the Button Task [Appraisal frame update: Suddenness 1, Goal Relevance 1, Outcome Probability 1; Conduciveness changes 1 → -1, Discrepancy changes 0 → 1]

  11. Summary of Evaluation • Cognitively generated emotions • Emotions arise from appraisals • Fast primary emotions • Some appraisals generated and activated early • Emotional experience • Cognitive access to emotional state, but no physiology • Body-mind interactions • Emotions can influence behavior • Emotional behavior • Model works and produces useful, purposeful behavior • Different environments lead to: • Different time courses • Different feeling profiles • Choices impact emotions and success

  12. Primary Contributions • Appraisals are functionally required by cognition • They specify the data used by certain steps in PEACTIDM • Appraisals provide a task-independent language for control knowledge • They influence choices such as Attend and Intend • PEACTIDM implies a partial ordering of appraisal generation • Data dependencies imply that some appraisals can’t be generated until after others • Circumplex models can be synthesized from appraisal models • Emotion intensity and valence can be derived from appraisals • Emotion intensity is largely determined by expectations • “Surprise Factor” is determined by Outcome Probability and Discrepancy from Expectation • Some appraisals may require an arbitrary amount of inference • Comprehend can theoretically require arbitrary processing • Internal and external stimuli are treated identically • Tasking options can be Attended and Intended just like external stimuli

  13. Additional Exploration • Functionality: What is emotion good for? • Emotion-driven reinforcement learning • Scale: Does it work in non-trivial domains? • Continuous time/space environment • More complex appraisal generation • Understanding: How do appraisals influence performance? • Try subsets of appraisals

  14. Intrinsically Motivated Reinforcement Learning (Sutton &amp; Barto 1998; Singh et al. 2004) • Reward = Intensity * Valence [Diagram: the standard RL loop (environment → critic → rewards/states → agent) alongside the intrinsically motivated version, in which the "organism" contains an internal environment and the appraisal process plays the critic, converting +/- emotion intensity into rewards for the agent's decisions, while actions and sensations pass through the external environment]
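The slide's reward rule is direct to express in code: valence in [-1, 1] supplies the sign, intensity in [0, 1] the magnitude, so no hand-coded, task-specific reward function is needed.

```python
def intrinsic_reward(intensity, valence):
    """Intrinsic reward from the agent's feeling: Reward = Intensity * Valence."""
    return intensity * valence
```

A strongly felt negative feeling thus yields a strongly negative reward, while a neutral (zero-intensity) feeling yields none.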

  15. Clean House Domain [Figure: map of the domain — rooms connected by gateways, blocks scattered through the rooms, a storage room, and the agent]

  16. Stimuli in the Environment [Figure: the agent's current room contains Block 1 and gateways to rooms 73, 78, and 93; candidate Tasking operations include "create subtask: clean current room" and "create subtask: go to room 73/78/93"]

  17. Learning • In this domain, the agent is only learning what to Attend to (including Tasking) • Not learning what action to take • Goal: What is the impact of various appraisals? • Disabled most and developed a few • Conduciveness • Discrepancy from Expectation and Outcome Probability • Goal Relevance • Intrinsic Pleasantness • Method: SARSA, epsilon-greedy, fixed exploration rate (ER) and learning rate (LR) • 50 trials, 15 episodes per trial
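The learning setup above can be sketched as SARSA with an epsilon-greedy policy over Attend choices. The Q-table representation ((state, choice) keys with a 0.0 default) is an assumption for illustration.

```python
import random

def epsilon_greedy(q, state, choices, epsilon, rng=random):
    """Pick an Attend target: explore with probability epsilon,
    otherwise take the choice with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.choice(choices)
    return max(choices, key=lambda c: q.get((state, c), 0.0))

def sarsa_update(q, s, a, r, s2, a2, lr=0.1, gamma=0.9):
    """Standard SARSA update: Q(s,a) += lr * (r + gamma*Q(s',a') - Q(s,a))."""
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + lr * (r + gamma * q.get((s2, a2), 0.0) - old)
```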

  18. Conduciveness • Measures how good or bad a stimulus is • Influences emotion intensity and valence • Sufficient to generate a reward • Value based on “progress” and “path” • Progress: Is agent getting closer to goal over time? • Path: Will acting on stimulus get agent closer to goal?
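A sketch of the slide's two-part scoring, as an illustration only: the equal weighting of the progress and path components is an assumption, not a value from the talk.

```python
def conduciveness(making_progress, on_path):
    """Score conduciveness from its two sources: progress (is the agent
    getting closer to the goal over time?) and path (will acting on this
    stimulus get the agent closer?)."""
    score = (0.5 if making_progress else -0.5) + (0.5 if on_path else -0.5)
    return score  # in [-1, 1]: good stimuli score positive, bad ones negative
```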

  19. Conduciveness [results chart]

  20. Outcome Probability and Discrepancy from Expectation • Measures how likely a prediction is and how accurate the prediction is • Influences emotion intensity via “surprise factor” (unvalenced) • Predictions and Outcome Probability generated via learned task model • Results in non-stationary reward • Discrepancy generated via comparison to prediction • Added these appraisals on top of Conduciveness
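One way to turn these two appraisals into an unvalenced "surprise factor". This particular combination is an illustrative assumption, not the thesis's exact formula: surprise is high when a confident prediction fails (high Outcome Probability, high Discrepancy) or when an unconfident prediction holds (low Outcome Probability, low Discrepancy).

```python
def surprise_factor(outcome_probability, discrepancy):
    """Unvalenced surprise from Outcome Probability and Discrepancy
    from Expectation, both in [0, 1]."""
    op, de = outcome_probability, discrepancy
    return op * de + (1.0 - op) * (1.0 - de)
```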

  21. Outcome Probability and Discrepancy from Expectation [results chart]

  22. Goal Relevance • Measures how important a stimulus is for the goal • Influences emotion intensity (unvalenced) • Value based on “path” knowledge • Agent actually had too much path knowledge, so removed some • The value of Goal Relevance for some stimulus is used to “boost” the Q-value of the Attend operator for that stimulus • Added this appraisal on top of Conduciveness, Outcome Probability, and Discrepancy
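A hypothetical sketch of the Q-value "boost" described above: the Attend operator's learned value for a stimulus is raised in proportion to that stimulus's Goal Relevance appraisal. The boost weight is an assumed parameter.

```python
def attend_value(q, state, stimulus, goal_relevance, boost=0.5):
    """Learned Q-value for Attending to a stimulus, boosted by its
    Goal Relevance appraisal (unvalenced, in [0, 1])."""
    return q.get((state, stimulus), 0.0) + boost * goal_relevance
```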

  23. GR Knowledge Reduction Results

  24. Intrinsic Pleasantness • Measures how attracted the agent is to a stimulus independent of the current goal • Influences emotion intensity and valence • Made blocks intrinsically pleasant • This is good because blocks need to be Attended to get cleaned up • This is bad because agent may be distracted by blocks that have already been cleaned up • Replaced Goal Relevance with this appraisal

  25. Intrinsic Pleasantness Results

  26. Dynamic Exploration Rate • Dynamically adjust exploration rate based on current emotion • If Valence &lt; 0, then things could probably be better: ER = |Intensity * Valence| • If Valence &gt; 0, then things are OK: ER = 0 • Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only
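The slide's rule in code: negative feelings mean things could be better, so explore in proportion to the feeling's magnitude; otherwise exploit.

```python
def dynamic_exploration_rate(intensity, valence):
    """Exploration rate from the current feeling:
    ER = |Intensity * Valence| when valence is negative, else 0."""
    if valence < 0:
        return abs(intensity * valence)
    return 0.0
```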

  27. Dynamic Exploration Rate [results chart]

  28. Dynamic Learning Rate • Dynamically adjust learning rate based on current emotion • If reward magnitude is large, then there may be something to learn: LR = |Intensity * Valence| • Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only, with Dynamic Exploration Rate enabled
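The companion rule for the learning rate: a large reward magnitude suggests there is something worth learning, so the learning rate tracks |Intensity * Valence| regardless of sign.

```python
def dynamic_learning_rate(intensity, valence):
    """Learning rate from the current feeling: LR = |Intensity * Valence|."""
    return abs(intensity * valence)
```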

  29. Dynamic Exploration and Learning Rates • Dynamically adjust exploration and learning rates based on current emotion • If Valence &lt; 0, then things could probably be better: ER = |Intensity * Valence| • If Valence &gt; 0, then things are OK: ER = 0 • If reward magnitude is large, then there may be something to learn: LR = |Intensity * Valence| • Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only • Results: tighter convergence, better prediction accuracy, a small number of failures

  30. Learning Summary

  31. Secondary Contributions • Reinforcement learning can be driven by intrinsically generated rewards based on the agent’s feeling • Reinforcement learning parameters can be influenced by the current emotional state, resulting in improved performance • Each appraisal contributes to the agent’s performance • The system scales to continuous time and space environments • Mood averages reward over time, allowing states with no reward-invoking stimulus to still have a reward associated with them
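The mood mechanism summarized in the last bullet can be sketched as an exponentially decaying average of recent emotion, so that states with no reward-invoking stimulus still carry a mood-derived reward. The decay constant is an assumed parameter.

```python
class Mood:
    """Mood as a decaying running average of emotion."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.value = 0.0

    def update(self, emotion):
        # Blend the new emotion into the running average
        self.value = self.decay * self.value + (1.0 - self.decay) * emotion
        return self.value
```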

  32. Future Work • Integration with other architectural mechanisms • Learning (appraisal values, intend, etc.) • Non-verbal communication • Sociocultural interactions • More appraisals (social, perceptual, etc.) • Basic drives • Human data • Functionality • Decision making • Action tendencies • Behavior • Believability • Physiological measures

  33. Backup Slides

  34. Benefits of Soar • Parallel rule firing allows for: • Parallel Encoding • Parallel appraisal generation • Parallel Decoding (theoretically) • Impasses provide: • Architectural support for PEACTIDM-related subgoals • Intend • Comprehend (theoretically) • Support for fast and extended inference, and transitioning from extended to fast (chunking) • Intend in button task starts out extended and becomes fast • Reinforcement learning allows fast learning from emotion feedback • Future benefits: • New modules may assist in appraisal generation • Episodic/semantic memories, visual imagery, etc.

  35. Architectural Requirements: Soar vs. ACT-R

  36. PEACTIDM and GOMS • In general, these are complementary techniques • GOMS • Focused on HCI • Focused on motor actions (e.g. keypresses) • Less focus on cognitive aspects (more abstract) • PEACTIDM • Focused on required cognitive functions • Allows for a mapping with appraisals • Could implement PEACTIDM with GOMS, but would lack the proper labels that allow for the mapping

  37. Relating Emotion to Intrinsically Motivated RL • Emotion intensity and valence used to: • Generate intrinsic rewards • Various appraisals contribute to the reward signal with varying success • Frequent reward signals allow agent to learn faster, but can also introduce infinite reward cycles • Task modeling helps address cycles • Automatically adjust parameters • Learning and exploration rates • Helps reduce unnecessary exploration, bad learning

  38. Button Task Timing: Before and After Learning

  39. Learning the Task Model [Figure: stimuli pass through Perception/Encoding into a learned task memory; the task model generates a prediction and an Outcome Probability for each stimulus, Discrepancy is computed by comparing the outcome against the prediction, and the resulting Surprise Factor feeds emotion intensity and the reward]

  40. Extending Soar with Emotion (Marinier &amp; Laird 2007) • Soar is a cognitive architecture • A cognitive architecture is a set of task-independent mechanisms that interact to give rise to behavior • Cognitive architectures are general agent frameworks [Diagram: the Soar architecture — symbolic long-term memories (procedural, semantic, episodic) with their learning mechanisms (chunking, reinforcement learning, semantic and episodic learning), a short-term memory holding the situation and goals, the decision procedure, visual imagery, perception and action connected to the body, and the new feeling-generation module]

  41. Extending Soar with Emotion (Marinier &amp; Laird 2007) [Diagram: the same architecture with the emotion pathway filled in — appraisals (knowledge) feed the architectural feeling-generation module, which combines emotion and mood vectors (e.g. .5,.7,0,-.4,.3,… and .7,-.2,.8,.3,.6,…) into a feeling vector (.9,.6,.5,-.1,.8,…); feelings appear in short-term memory, and their +/- intensity drives reinforcement learning]

  42. Appraisal Value Ranges

  43. Computing Feeling from Emotion and Mood • Assumption: Appraisal dimensions are independent • Limited Range: Inputs and outputs are in [0,1] or [-1,1] • Distinguishability: Very different inputs should lead to very different outputs • Non-linear: Linearity would violate limited range and distinguishability
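One combination function satisfying the constraints above can be sketched as follows. tanh of the sum is an illustrative choice, not the thesis's actual function: it is non-linear, keeps the output inside (-1, 1), and maps very different inputs to very different outputs, and the independence assumption lets it be applied per dimension.

```python
import math

def combine(emotion, mood):
    """Combine one emotion dimension with the corresponding mood dimension."""
    return math.tanh(emotion + mood)

def feeling(emotion_vec, mood_vec):
    # Appraisal dimensions are assumed independent, so combine dimension-wise
    return [combine(e, m) for e, m in zip(emotion_vec, mood_vec)]
```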

  44. Example

  45. Maze Tasks [Figure: maze variants — no distractions, distractions, single subgoal, multiple subgoals, impossible]

  46. Time Course and Impact of Feelings

  47. Feeling Dynamics Results [results chart; panel label: "very easy"]

  48. Computing Feeling Intensity • Motivation: Intensity gives a summary of how important (i.e., how good or bad) the situation is • Limited range: Should map onto [0,1] • No dominant appraisal: No single value should drown out all the others • Can’t just multiply values, because if any are 0, then intensity is 0 • Realization principle: Expected events should be less intense than unexpected events
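A sketch of an intensity function meeting the slide's constraints; the exact form here is an assumption. Averaging (rather than multiplying) the appraisal magnitudes keeps a single zero from forcing intensity to zero and keeps any one value from dominating, while scaling by the surprise factor implements the realization principle.

```python
def feeling_intensity(appraisal_magnitudes, surprise_factor):
    """Summarize how important the situation is, mapped onto [0, 1]."""
    if not appraisal_magnitudes:
        return 0.0
    average = sum(appraisal_magnitudes) / len(appraisal_magnitudes)
    return average * surprise_factor  # inputs in [0, 1] => output in [0, 1]
```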

  49. Example

  50. Learning task [Figure: maze showing the start, the goal, and the optimal subtasks along the way]
