
Affective Facial Expressions Facilitate Robot Learning




Presentation Transcript


  1. Affective Facial Expressions Facilitate Robot Learning
  Joost Broekens, Pascal Haazebroek
  LIACS, Leiden University, The Netherlands

  2. Why?
  • Interactive robot learning
  • Facilitate human-robot interaction
  • Study learner-teacher relations
  • Study learning and adaptation
  • Future: enable robots to cooperate interactively and efficiently with humans, in ways that are natural to humans
  • This talk: human affective facial expressions as a training signal to the robot

  3. Outline
  • Emotion influences thought and behavior
  • EARL: studying the relation between emotion and adaptation in reinforcement learning
  • This talk: human affect as reinforcement to the robot
  • Reinforcement-based robot learning
  • Experiments:
    • Affect as additional reinforcement
    • Affect as input to a social reward function
  • Results and conclusions
    • Learning is positively influenced, especially in the learned social reward function case

  4. Emotion, Thought and Behavior
  • Emotion
    • Bodily expression (face, posture)
    • Action tendencies (Frijda)
    • Feelings
    • Cognitive appraisal (Arnold, Lazarus, Scherer)
  • Affect
    • "Everything to do with emotion", as in affective computing, or
    • an abstraction over emotion (e.g., Russell) composed of
      • Arousal (alertness)
      • Valence (pleasure)
  • We use the latter definition of affect in the experiment
    • Short timescale
    • Ignore arousal

  5. Emotion, Thought and Behavior
  • Emotion and affect influence thought and behavior:
  • The kind of thoughts we have
    • Mood congruency
  • The way we process information
    • Narrow vs. broad look (Goschke & Dreisbach)
    • A lot vs. a little processing effort (Scherer, Forgas)
  • What we think about things
    • Emotion/mood as information (Clore & Gasper)
    • Emotion as belief anchor (Frijda & Mesquita)
  • How we learn and adapt
    • Emotion/affect as social reinforcement
    • Emotion/affect as intrinsic reinforcement
    • Emotion as "metaparameter" to control the learning process
    • Empathy

  6. EARL
  • Goal: study the relations between emotion and adaptation in the context of reinforcement learning
  • Simulated robot (but see later comments)
    • Maze navigation tasks
    • Webcam and emotion recognition to interpret human emotions (see the interaction-loop sketch below)
    • Reinforcement learning (RL) approach to robot learning
    • Robot has its own model of emotion
    • Robot head to express emotion
  • Potential influences experimented with
    • Evaluate models of emotion in an RL setting
    • Evaluate models of emotional expression
    • Test the influence of emotion/affect on RL learning parameters
    • Experiment with communicated and robot emotion as reward
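A minimal sketch (not the actual EARL code) of the interaction loop implied by this slide: webcam frame, expression recognition, social reward, RL step, and an expression on the robot head. All component interfaces here (read_frame, recognize_valence, the agent methods, express) are assumptions introduced for illustration.

```python
def run_session(read_frame, recognize_valence, agent, express, n_steps=1000):
    """One training session in the EARL-style setup (interfaces assumed):
    read_frame(): returns a webcam image of the human teacher.
    recognize_valence(frame): returns valence in [-1, 1] (happy > 0, sad < 0).
    agent: RL agent with reset/select_action/step/update and its own emotion model.
    express(emotion): shows the robot's current emotional state on the robot head."""
    state = agent.reset()
    for _ in range(n_steps):
        r_human = recognize_valence(read_frame())    # social reward from the teacher's face
        action = agent.select_action(state)          # RL action selection
        next_state, r_env, done = agent.step(action)
        agent.update(state, action, r_env + r_human, next_state)
        express(agent.current_emotion())             # robot expresses its own emotion
        state = agent.reset() if done else next_state
```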

  7. EARL
  • Short movie

  8. Human Affect as Reinforcement to Robot
  • Interactive robot learning
  • Learning by Example
    • E.g., imitation learning (see Breazeal & Scassellati)
  • Learning by Guidance (Thomaz & Breazeal)
    • Future-directed learning cues
    • Anticipatory reward
  • Learning by Feedback
    • Additional reinforcement signal (Breazeal & Velasquez; Isbell et al.; Mitsunaga et al.; Papudesi & Huber)
  • In our experiment: affective signal as additional reinforcement

  9. Human Affect as Reinforcement to Robot
  • Affective signal as additional reinforcement
    • Webcam
    • Emotional expression analysis
    • Positive emotion (happy) = reward
    • Negative emotion (sad) = punishment
  • So: the emotional expression is used in learning as rhuman, a social reward coming from the human observer (see the sketch below)
  • Note:
    • We interpret happy as positively valenced and sad as negatively valenced.
    • THIS IS A SIMPLIFIED SETUP THAT ENABLES US TO TEST OUR HYPOTHESIS!
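A hedged sketch of the valence mapping described on this slide: the output of an expression recognizer (a label plus a confidence score is an assumed interface, not the one used in EARL) is turned into the scalar social reward rhuman. Only happy and sad are used, matching the simplified setup above.

```python
def expression_to_reward(label, confidence, magnitude=1.0):
    """Map a recognized facial expression to the social reward r_human.
    Happy is treated as positive valence, sad as negative valence;
    other expressions contribute no social reward."""
    if label == "happy":
        return +magnitude * confidence
    if label == "sad":
        return -magnitude * confidence
    return 0.0

# Example: a confidently happy teacher yields a positive social reward.
r_human = expression_to_reward("happy", confidence=0.9)   # -> 0.9
```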

  10. Reinforcement-based robot learning
  • Continuous gridworld
    • World features placed on a grid
    • Agent has real-valued coordinates and speed, so it is not restricted to grid cells
    • Local perception from the agent's perspective = current state s
  • Task: find food (as usual)
  • Training: multilayer perceptron (MLP) networks
    • Input is the agent's perceived state s
    • Each action (forward, left, right) has two networks
      • The first is trained on the action value Qa(s)
      • The second is trained on the inverse action value (the value of NOT taking the action)
    • The value function has a network trained to predict Q(s)
  • Action selection uses the action values predicted by the MLPs (see the sketch below)
  • In terms of representing the world, the perceived state and the actions, this setup is close to real-world robotics.
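A sketch of the function-approximation setup described above, using scikit-learn MLPs rather than the authors' implementation: one small MLP per action predicts the action value Qa(s) from the perceived state s, and actions are selected over the predicted values. The epsilon-greedy rule, network sizes, and the omission of the inverse-action-value and state-value networks are simplifying assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

ACTIONS = ["forward", "left", "right"]

def make_q_networks(state_dim, hidden=20):
    """One MLP per action, predicting Qa(s) from the perceived state s."""
    nets = {}
    for a in ACTIONS:
        net = MLPRegressor(hidden_layer_sizes=(hidden,))
        # Prime the network so predict/partial_fit can be called immediately.
        net.partial_fit(np.zeros((1, state_dim)), np.zeros(1))
        nets[a] = net
    return nets

def select_action(nets, state, epsilon=0.1, rng=np.random.default_rng()):
    """Epsilon-greedy selection over the MLP-predicted action values
    (the exact action-selection rule is an assumption here)."""
    if rng.random() < epsilon:
        return str(rng.choice(ACTIONS))
    q = {a: float(nets[a].predict(state.reshape(1, -1))[0]) for a in ACTIONS}
    return max(q, key=q.get)

def update_q(nets, state, action, target):
    """One incremental training step of the taken action's network toward the TD target."""
    nets[action].partial_fit(state.reshape(1, -1), np.array([target]))
```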

  11. Reinforcement-based robot learning
  [Figure: continuous gridworld maze showing walls, a path, and the food location]

  12. Experiments: Affect as additional reinforcement
  • Test the difference between a standard agent and social agents
  • 200 trials to learn the path to the food
  • The standard agent uses R(s) from the environment to update Qa(s) and Q(s)
  • A social agent uses rhuman in addition to R(s) (see the sketch below)
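A minimal sketch of the comparison on this slide (illustrative names, not the original experiment code): both agent types run 200 trials; the social agent adds the observer's affective reward rhuman to the environment reward R(s) whenever it is available, while the standard agent receives only R(s).

```python
def run_trials(agent, env, get_r_human=None, n_trials=200):
    """get_r_human(trial): social reward for this trial, or None/absent for the
    standard agent and outside the feedback period (interface is assumed)."""
    for trial in range(n_trials):
        state, done = env.reset(), False
        while not done:
            action = agent.select_action(state)
            next_state, r_env, done = env.step(action)
            r_human = get_r_human(trial) if get_r_human else 0.0
            agent.update(state, action, r_env + r_human, next_state)
            state = next_state
```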

  13. Experiments: Affect as additional reinforcement
  • Three social settings (see the sketch below)
    • Moderate social reinforcement (setting a)
      • rhuman is small
      • Long period of training with rhuman (trials 20-30)
    • Strong social reinforcement (setting b)
      • rhuman is large
      • Short period of training with rhuman (trials 20-25)
    • Learned social reinforcement (setting c)
      • rhuman is used as above and also to train Rsocial(s) (an MLP)
      • The period using rhuman is trials 29-45
      • After that, Rsocial(s) is used
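A sketch of the three social settings; only the structure follows the slide, and the magnitude values are placeholders for the "small" and "large" rhuman mentioned above. In settings (a) and (b) rhuman is added directly; in setting (c) it is also used to train an MLP Rsocial(s), whose prediction stands in for the human after the feedback period.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

SETTINGS = {
    "a": {"magnitude": 0.1, "window": range(20, 31), "learned": False},  # moderate, long period
    "b": {"magnitude": 1.0, "window": range(20, 26), "learned": False},  # strong, short period
    "c": {"magnitude": 0.1, "window": range(29, 46), "learned": True},   # learned Rsocial(s)
}

def make_r_social_net(state_dim, hidden=20):
    """MLP approximating the social reward function Rsocial(s)."""
    net = MLPRegressor(hidden_layer_sizes=(hidden,))
    net.partial_fit(np.zeros((1, state_dim)), np.zeros(1))  # prime so predict works
    return net

def social_reward(setting, trial, state, valence, r_social_net):
    """Social reward used in a given trial for the chosen setting."""
    cfg = SETTINGS[setting]
    if trial in cfg["window"]:
        r_human = cfg["magnitude"] * valence
        if cfg["learned"]:
            # Setting (c): also train Rsocial(s) on the live human feedback.
            r_social_net.partial_fit(state.reshape(1, -1), np.array([r_human]))
        return r_human
    if cfg["learned"] and trial > max(cfg["window"]):
        # After the feedback period, the learned Rsocial(s) replaces the human.
        return float(r_social_net.predict(state.reshape(1, -1))[0])
    return 0.0
```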

  14. Results • Moderate social reinforcement

  15. Results • Strong, short, social reinforcement

  16. Results • Learned social reinforcement

  17. Conclusion
  • A critical learning period can be used to influence robot learning with affective signals, in real time, in a non-trivial learning environment.
  • This benefits learning
    • Most notably when the robot learns to predict the social feedback by training a reward function Rsocial(s)

  18. Further work
  • Use affect/emotion as a metaparameter to control (see the sketch below)
    • Learning rate
    • Exploration vs. exploitation
  • Differentiate between the meanings of negative and positive emotions
    • Anger: negative feedback due to an action of the agent
    • Fear: negative anticipatory feedback
    • Surprise: strong positive feedback due to an action of the agent
    • Frustration: connect to the exploration/exploitation rate?
  • Affective robot-robot interaction?
  • Use robot-to-human signals such as hesitation
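A hedged sketch of the "affect as metaparameter" idea listed above: valence and arousal (both assumed in [-1, 1]) modulate the learning rate and the exploration temperature. The slide only proposes the direction of future work; the mapping and constants below are purely illustrative.

```python
def affect_to_metaparameters(valence, arousal,
                             base_alpha=0.1, base_temperature=1.0):
    """Return (learning_rate, exploration_temperature).
    Illustrative choice: higher arousal -> faster learning; negative valence ->
    more exploration (things are going badly), positive valence -> more
    exploitation of the current policy."""
    alpha = base_alpha * (1.0 + 0.5 * arousal)
    temperature = base_temperature * (1.0 - 0.5 * valence)
    return alpha, temperature

# E.g. a calm, pleased state favors exploitation (lower temperature):
# affect_to_metaparameters(valence=0.8, arousal=0.0) -> (0.1, 0.6)
```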
