
Socially Guided Machine Learning

Andrea L. Thomaz. PhD Thesis Defense, April 7, 2006.


Presentation Transcript


  1. Socially Guided Machine Learning • Andrea L. Thomaz • PhD Thesis Defense • April 7, 2006

  2. Socially Guided Machine Learning Andrea L. Thomaz PhD Thesis Defense April 7, 2006

  3. Thesis Committee Cynthia Breazeal Associate Professor of Media Arts & Sciences, MIT Rosalind Picard Professor of Media Arts & Sciences, MIT Andrew Barto Professor of Computer Science, U. Massachusetts, Amherst

  4. If robots are going to be successfully deployed in human environments such as homes, schools, and offices, they will need to learn new skills from everyday people.

  5. Socially Guided Machine Learning • How can algorithms and systems take better advantage of learning from a human partner and of the ways that partner will naturally approach teaching?

  6. Personalization agents, Adaptive user interfaces: {Lashkari, Metral, Maes, Collaborative Interface Agents, AAAI 1994} {E. Horvitz et al., The Lumiere project, UAI 1998}
     Active Learning, Learning with Queries: {Cohn, Ghahramani, Jordan, Active learning with statistical models, 1995} {Cohn et al., Semi-supervised clustering with user feedback, 2003}
     Learning by Demonstration, Programming by Example: {Voyles, Khosla, Programming robotic agents by demonstration, 1998} {Lieberman, Your Wish is my Command, 2001}
     Learning by Imitation: {S. Schaal review in TICS 1999}
     Animal training techniques: {Stern, Frank, Resner, Virtual Petz, Agents 1998} {Blumberg et al., Integrated learning for interactive characters, SIGGRAPH 2002} {Kaplan et al., Robot clicker training, RAS 2002}
     Reinforcement Learning with humans: {Isbell et al., Cobot: a social reinforcement learning agent, UAI 1998} {Evans, Varieties of Learning, AI Game Programming Wisdom, 2002} {Clouse, Utgoff, Teaching a Reinforcement Learner, ICML 1992}

  7. Socially Guided Machine Learning

  8. Overview • Initial Experiment • Guidance • Transparency • Asymmetry

  9. Research Platforms • Leonardo • Sophie’s Kitchen

  10. The Leonardo Platform • Inputs: pointing gesture recognition; eye cameras and environmental cameras for object recognition; head pose tracking (Darrell); Sphinx-4 speech recognition • Cognitive Architecture: builds on the c5m system of the Synthetic Characters Group

  11. Sophie’s Kitchen • A “computer game”: players teach a virtual robot to bake a cake by sending various messages through a mouse interface. • Sophie learns via Q-Learning (~10,000 states, 2-7 actions per state).

  12. Sophie’s Kitchen • A “computer game”: players teach a virtual robot to bake a cake by sending various messages through a mouse interface. • The human player uses the mouse to give feedback to Sophie.

  13. Sophie’s Kitchen • A “computer game”: players teach a virtual robot to bake a cake by sending various messages through a mouse interface. • An object-specific reward is about a particular part of the world.

  14. Overview • Initial Experiment • Guidance • Transparency • Asymmetry

  15. Experiments in Sophie’s Kitchen • “How Do People Want to Teach?”

  16. Findings: Guidance • People tried to use the object-specific rewards as future-directed guidance.

  17. [Figure: each player’s percentage of object rewards about the last object, on a scale from “Never About Most Recent Object” to “Always About Most Recent Object”] • Many object rewards were not about the last object used.

  18. [Figure: number of people who gave at least 1 reward to the empty bowl vs. zero rewards to the empty bowl] • Almost everyone gave rewards to the bowl or tray sitting empty on the shelf: a guidance reward.

  19. Findings: People Infer a Mental Model • People gave more rewards after realizing that their feedback made a difference.

  20. [Figure: ratio of human rewards to agent actions, shown for individual players and as averages]

  21. Findings: Positive Bias • Even in the first quarter of their training sessions, most people had a positive bias in their rewards.

  22. Overview • Initial Experiment • Guidance • Transparency • Asymmetry

  23. Overview • Initial Experiment • Guidance • Transparency • Asymmetry

  24. Guidance • What’s the right level of interaction?

  25. Guidance Exploration

  26. Guidance Exploration • Learning by Demonstration • Programming by Example • Imitation learning • Programming with natural language

  27. Guidance Exploration • RL with human reward • Robot shaping • RL game characters • Learning by Demonstration • Programming by Example • Imitation learning • Programming with natural language

  28. Guidance Exploration • Leo Learning in a Social Dialog • Leo Learning in Guided Exploration • Adding Guidance to Sophie • Original Sophie

  29. Learning within a Social Dialog • Goal-oriented tasks are built from known actions and tasks. • The learner expands hypotheses of goal representations. • Through tightly coupled dialog with a human partner, the hypothesis space is refined to the best representation of the task.

  30. Tasks & Goals • Task structure & goals inferred in interaction with human teacher

  31. Goals • Goal inferred: criteria and expectation features for each object incurring change over the task/action. • Example (Task X, objects A and B): expectation = {color: red}; criteria = {type: toy, shape: cir., loc: 1,2,3, name: A}

  32. Expand Task Hypotheses • The exact action sequence is always a hypothesis. • The learner also expands the hypothesis space to representations consistent with the current task example.

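  33. Expand Task Hypotheses • Common Goal Belief = least common denominator for all the changed objects. • Example (Task X; objects A, B, C): object A has expectation {color: red} and criteria {type: toy, shape: cir., loc: 1,2,3, name: A}; object C has expectation {color: red} and criteria {type: toy, shape: cir., loc: 4,5,6, name: C}; the common belief has expectation {color: red} and criteria {type: toy, shape: cir.}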
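The "least common denominator" on this slide can be read as keeping only the feature values that every changed object shares. The following is a minimal sketch under that reading; the flat dictionary layout and the function name `common_goal_belief` are illustrative assumptions, not the representation used on Leonardo.

```python
def common_goal_belief(changed_objects):
    """Return the feature/value pairs shared by every changed object.

    Assumption: each object is a flat dict of feature -> value, with the
    expected end state (color) stored alongside the descriptive features.
    """
    if not changed_objects:
        return {}
    shared = dict(changed_objects[0])
    for obj in changed_objects[1:]:
        # Keep a feature only if this object has the same value for it.
        shared = {k: v for k, v in shared.items() if k in obj and obj[k] == v}
    return shared


# Feature values loosely follow the slide's example objects A and C.
obj_a = {"color": "red", "type": "toy", "shape": "circle", "loc": (1, 2, 3), "name": "A"}
obj_c = {"color": "red", "type": "toy", "shape": "circle", "loc": (4, 5, 6), "name": "C"}

belief = common_goal_belief([obj_a, obj_c])
print(belief)
```

Location and name differ between the two objects, so they drop out of the shared belief, while color, type, and shape remain.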

  34. Expand Task Hypotheses • Expand various combinations of the common criteria, each with expectation {color: red}: {shape: cir.}, {type: toy}, {type: toy, shape: cir.} • And include the literal versions too, each with expectation {color: red}: {type: toy, shape: cir., loc: 1,2,3, name: A} and {type: toy, shape: cir., loc: 4,5,6, name: C}
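One way to picture this expansion is to enumerate every non-empty subset of the shared criteria and then append the literal, per-object versions. This is a sketch under that assumption; `expand_hypotheses` and the dictionary layout are hypothetical names chosen for illustration, and the slide does not say whether the real system enumerates subsets exhaustively.

```python
from itertools import combinations


def expand_hypotheses(expectation, common_criteria, literal_objects):
    """Build candidate goal hypotheses from shared criteria and literal objects."""
    hypotheses = []
    keys = sorted(common_criteria)
    # Every non-empty combination of the shared criteria is one hypothesis.
    for size in range(1, len(keys) + 1):
        for subset in combinations(keys, size):
            hypotheses.append({
                "criteria": {k: common_criteria[k] for k in subset},
                "expectation": dict(expectation),
            })
    # The literal version keeps every feature of each observed object.
    for obj in literal_objects:
        hypotheses.append({"criteria": dict(obj), "expectation": dict(expectation)})
    return hypotheses


expectation = {"color": "red"}
common = {"type": "toy", "shape": "circle"}
literal = [
    {"type": "toy", "shape": "circle", "loc": (1, 2, 3), "name": "A"},
    {"type": "toy", "shape": "circle", "loc": (4, 5, 6), "name": "C"},
]

hypotheses = expand_hypotheses(expectation, common, literal)
for h in hypotheses:
    print(h)
```

With two shared criteria this yields three combination hypotheses plus two literal ones, matching the five hypotheses sketched on the slide.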

  35. Hypothesis Testing • The current best task representation is chosen through Bayesian likelihood, P(h|D) ∝ P(D|h) P(h) • D = examples of this task seen so far • P(D|h) = percentage of examples consistent with hypothesis h • P(h) = prior that prefers more specific hypotheses (more criteria over fewer)
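The scoring rule on this slide can be sketched in a few lines. The slide only says that the prior prefers more criteria, so the specific prior below (weight proportional to one plus the number of criteria) is an assumption, as are the function names and the flat-dictionary example format.

```python
def consistent(hypothesis, example):
    """An example is consistent if every changed object meets the criteria
    and ends in the expected state."""
    required = {**hypothesis["criteria"], **hypothesis["expectation"]}
    return all(
        all(k in obj and obj[k] == v for k, v in required.items())
        for obj in example
    )


def likelihood(hypothesis, examples):
    """P(D|h): fraction of the examples seen so far that are consistent with h."""
    if not examples:
        return 0.0
    return sum(consistent(hypothesis, ex) for ex in examples) / len(examples)


def prior(hypothesis, hypotheses):
    """P(h): assumed form, proportional to 1 + number of criteria."""
    def weight(h):
        return 1 + len(h["criteria"])
    return weight(hypothesis) / sum(weight(h) for h in hypotheses)


def best_hypothesis(hypotheses, examples):
    """Pick the hypothesis maximizing P(D|h) * P(h)."""
    return max(hypotheses, key=lambda h: likelihood(h, examples) * prior(h, hypotheses))


specific = {"criteria": {"type": "toy", "shape": "circle"}, "expectation": {"color": "red"}}
general = {"criteria": {"type": "toy"}, "expectation": {"color": "red"}}
candidates = [specific, general]

# Each example lists the changed objects in their end state.
example_1 = [{"type": "toy", "shape": "circle", "color": "red"}]
example_2 = [{"type": "toy", "shape": "square", "color": "red"}]

after_one = best_hypothesis(candidates, [example_1])
after_two = best_hypothesis(candidates, [example_1, example_2])
print(after_one["criteria"], after_two["criteria"])
```

After one example both hypotheses fit, so the prior favors the more specific one; once a square toy appears, the specific hypothesis explains only half the data and the general one wins.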

  36. Learning within a Social Dialog

  37. Utilizing Guidance in Sophie’s Kitchen • Interactive Q-Learning: the algorithm used in the original Sophie experiment. • After each action there is a slight delay to animate the act and receive human reward.
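The algorithm itself appears only as a figure on this slide. As a rough illustration, here is standard tabular Q-learning in which the reward comes from a teacher rather than the environment. Everything specific is an assumption made for the sketch: the four-state corridor stands in for the kitchen, `simulated_human_reward` stands in for mouse feedback, and the learning parameters are arbitrary.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")
GOAL = 3  # toy corridor with states 0..3; reaching state 3 ends the episode


def step(state, action):
    """Deterministic toy world: move one cell left or right."""
    return min(state + 1, GOAL) if action == "right" else max(state - 1, 0)


def simulated_human_reward(state, next_state):
    """Stand-in for the teacher's feedback: praise progress, scold the rest."""
    return 1.0 if next_state > state else -1.0


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the current Q-values."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(episodes=30, alpha=0.3, gamma=0.5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = choose_action(q, state, epsilon, rng)
            next_state = step(state, action)
            # An interactive agent would pause here to animate the action
            # and wait briefly for the human's reward.
            reward = simulated_human_reward(state, next_state)
            best_next = 0.0 if next_state == GOAL else max(
                q[(next_state, a)] for a in ACTIONS
            )
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
print(policy)
```

The only change from textbook Q-learning is where the reward comes from, which is why the pause for feedback sits between acting and updating.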

  38. Utilizing Guidance in Sophie’s Kitchen
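The transcript does not preserve how guidance enters the algorithm. One plausible reading, given that players used object-specific messages as future-directed guidance, is that a guidance message narrows action selection to actions involving the indicated object. The sketch below assumes exactly that; the `(verb, object)` action encoding and the `select_action` signature are hypothetical.

```python
import random


def select_action(q_values, available_actions, guided_object=None, epsilon=0.1, rng=random):
    """Choose an action, preferring those that involve the guided object.

    q_values maps each (verb, object) action to its value in the current state.
    Assumption: guidance only filters the candidates and never changes Q-values.
    """
    candidates = list(available_actions)
    if guided_object is not None:
        focused = [a for a in candidates if a[1] == guided_object]
        if focused:  # ignore guidance that matches no available action
            candidates = focused
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda a: q_values.get(a, 0.0))


actions = [("pick-up", "bowl"), ("pick-up", "flour"), ("use", "spoon")]
q_values = {("pick-up", "flour"): 0.9, ("use", "spoon"): 0.2}

unguided = select_action(q_values, actions, epsilon=0.0)
guided = select_action(q_values, actions, guided_object="bowl", epsilon=0.0)
print(unguided, guided)
```

Without guidance the agent takes its highest-valued action; with guidance toward the bowl it tries a bowl action even though it has no learned value for it yet.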

  39. Effects of Guidance • 28 subjects played Sophie’s Kitchen in the lab. • Conditions: feedback only vs. feedback + guidance. • Feedback + guidance >> feedback only (t(26); p<.01 for each).

  40. Guidance Exploration • Leo Learning in a Social Dialog • Leo Learning in Guided Exploration • Sophie with Guidance • Benefits: teacher need not know the task exactly; teacher need not be present for learning; social cues frame the learning interaction; assuming a goal-oriented partner helps build a flexible task

  41. Guided Exploration: Self-Motivated Behavior [Figure: Novelty, Mastery, and Activity drives on a hi-lo scale, with an initial range and drift]

  42. Guided Exploration [Figure: Novelty, Mastery, and Activity drives; Novelty]

  43. Task Learning Action Group • Novelty Action • Explore Action • Relevance Action

  44. Task Representation • Task Option Model: criteria and expectation (goal features ...); action values (State x: Act.1: val, Act.2: val, ...) • Learning mechanism: RL with a self-created goal; learn a hierarchical policy to achieve it (Options & Intra-Option learning)

  45. Guided Exploration • Human partner influences learning: directing attention; suggesting actions; labeling goal states; providing positive / negative feedback

  46. Leo’s Virtual Playroom
