
Skill Acquisition via Transfer Learning and Advice Taking


Presentation Transcript


  1. Skill Acquisition via Transfer Learning and Advice Taking. Lisa Torrey, Jude Shavlik, Trevor Walker (University of Wisconsin-Madison, USA); Richard Maclin (University of Minnesota-Duluth, USA)

  2. Transfer Learning
     • Agent learns Task A (Task A is the source)
     • Agent encounters related Task B (Task B is the target)
     • Agent discovers how the tasks are related (so far, the user provides this info to the agent)
     • Agent uses knowledge from Task A to learn Task B faster

  3. Transfer Learning. The goal for the target task: [Figure: performance vs. training, showing learning curves with transfer and without transfer]

  4. Reinforcement Learning Overview
     • Observe the world state, described by a set of features
     • Take an action
     • Receive a reward
     • Use the rewards to estimate the Q-values of actions in states
     • Policy: choose the action with the highest Q-value in the current state
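
  A minimal sketch of this loop, using tabular Q-learning for illustration (this work actually approximates the Q-function with support-vector regression via KBKR); the env interface (reset/step), the action list, and the parameters ALPHA, GAMMA, EPSILON are all assumptions:

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.05   # assumed learning parameters

    def run_episode(env, q, actions):
        """One episode of the loop: observe state, act, receive reward, update Q."""
        state = env.reset()                      # observe initial world state
        done = False
        while not done:
            if random.random() < EPSILON:        # occasional exploration
                action = random.choice(actions)
            else:                                # policy: highest Q-value wins
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = max(q[(next_state, a)] for a in actions)
            # use the reward to refine the Q-value estimate for this state/action
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state

    q_values = defaultdict(float)                # Q-values of actions in states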

  5. Transfer in Reinforcement Learning
     • What knowledge will we transfer from the source?
       Q-functions (Taylor & Stone 2005), policies (Torrey et al. 2005), or skills (this work)
     • How will we extract that knowledge from the source?
       From Q-functions (Torrey et al. 2005) or from observed behavior (this work)
     • How will we apply that knowledge in the target?
       Model reuse (Taylor & Stone 2005) or advice taking (Torrey et al. 2005, this work)

  6. Advice Taking
     • Advice: instructions for the learner
       IF: condition THEN: prefer action
       (i.e., in these states, Q(action1) > Q(action2))
     • Apply advice as soft constraints (KBKR, 2005)
       For each action, find the Q-function that minimizes:
       error on training data + disagreement with advice + complexity of the Q-function
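
  Schematically, assuming a linear model Q(x) = w·x + b, training states as rows of A with target Q-values y, advice violations collected in slack variables z, and assumed penalty weights λ and μ, the KBKR optimization has roughly the form:

    \min_{w,\,b,\,z \ge 0} \;
        \underbrace{\lVert A w + b\mathbf{1} - y \rVert_1}_{\text{error on training data}}
        \; + \; \mu \underbrace{\lVert z \rVert_1}_{\text{disagreement with advice}}
        \; + \; \lambda \underbrace{\bigl(\lVert w \rVert_1 + \lvert b \rvert\bigr)}_{\text{complexity of the } Q\text{-function}}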

  7. Experimental Domain: RoboCup
     • KeepAway (KA/MKA): keep the ball (Stone & Sutton 2001)
     • MoveDownfield (MD): cross the line (Torrey et al. 2006)
     • BreakAway (BA): score a goal (Maclin et al. 2005)
     Different objectives, but a transferable skill: passing to teammates

  8. A Challenge for Skill Transfer. [Illustration: one teammate calls "I'm open and far from you. Pass to me!", another calls "I'm open and near the goal. Pass to me!"]
     • Shared skills are not exactly the same
     • Skills have general and specific aspects
     • Aspects of the pass skill in RoboCup:
       General: teammate must be open
       Game-specific: where the teammate should be located
       Player-specific: whether the teammate is nearest or furthest

  9. Addressing the Challenge
     • We focus on learning general skill aspects; these should transfer better
     • We learn skills that apply to multiple players; this generalizes over player-specific aspects
     • We allow humans to provide information; they can point out game-specific aspects

  10. Human-Provided Information
     • User provides a mapping to show task similarities
     • May also provide user advice about task differences
     [Diagram: action mapping with advice: Pass → Pass towards goal; Ø → Move towards goal; Ø → Shoot at goal]

  11. Our Transfer Algorithm (a high-level sketch follows)
     1. Observe source-task games to learn skills
     2. Translate learned skills into transfer advice
     3. Create advice for the target task
     4. If there is user advice, add it in
     5. Learn the target task with KBKR
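
  The sketch below shows the data flow in hypothetical Python; observe_games, learn_skills_with_ilp, translate_to_advice, and kbkr_learn are invented names standing in for the components the slides describe:

    def transfer(source_env, target_env, mapping, user_advice=None):
        # 1. Observe source-task games to learn skills
        games = observe_games(source_env)            # sequences of (state, action, outcome)
        skills = learn_skills_with_ilp(games)        # first-order rules, e.g. pass(Teammate)
        # 2-3. Translate learned skills into advice for the target task
        advice = [translate_to_advice(rule, mapping) for rule in skills]
        # 4. If there is user advice, add it in
        if user_advice:
            advice.extend(user_advice)
        # 5. Learn the target task with KBKR, treating advice as soft constraints
        return kbkr_learn(target_env, advice)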

  12. Learning Skills by Observation
     • Source-task games are sequences of (state, action) pairs, e.g.:
       State 1: distBetween(me, teammate2) = 15
                distBetween(me, teammate1) = 10
                distBetween(me, opponent1) = 5
                ...
       action = pass(teammate2)
       outcome = caught(teammate2)
     • Learning skills is like learning to classify states by their correct actions
     • We use Inductive Logic Programming (ILP) to learn the classifiers

  13. Advantages of ILP
     • Can produce first-order rules for skills: a single rule pass(Teammate) vs. separate rules pass(teammate1) ... pass(teammateN)
     • Captures only the essential aspects of the skill; we expect these aspects to transfer better
     • Can incorporate background knowledge

  14. Preparing Datasets for ILP. [Flowchart for labeling each source-task state, approximately:]
     • If action = pass(Teammate), outcome = caught(Teammate), and Q(pass) is highest: positive example for pass(Teammate)
     • If action ≠ pass(Teammate), Q(other) is high, and Q(pass) is lower: negative example for pass(Teammate)
     • Otherwise: reject the example
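
  The same labeling logic as a code sketch; the threshold HIGH and the data structures are assumptions:

    HIGH = 0.7   # assumed threshold for a "high" (normalized) Q-value

    def label_state(action, outcome, q, teammate):
        """Label one source-task state for the pass(Teammate) dataset.
        q maps each available action to its estimated Q-value."""
        if action == ("pass", teammate):
            # pass was taken: keep only clear successes as positives
            if outcome == ("caught", teammate) and q[action] == max(q.values()):
                return "positive"
            return "reject"
        # another action was taken: keep only clearly-worse passes as negatives
        if q[action] >= HIGH and q[("pass", teammate)] < q[action]:
            return "negative"
        return "reject"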

  15. Example of a Skill Learned

     pass(Teammate) :-
         distBetween(me, Teammate) > 14,
         passAngle(Teammate) > 30,
         passAngle(Teammate) < 150,
         distBetween(me, Opponent) < 7.

  16. Technical Challenges
     • KBKR requires propositional advice, so we instantiate each rule head
     • Variables in rule bodies create disjunctions; we use tile features to translate them
     • Variables can appear multiple times; we create new features to translate them

  17. Two Experimental Scenarios
     • 4-on-3 MKA → 3-on-2 BA: Pass → Pass towards goal; Ø → Move towards goal; Ø → Shoot at goal
     • 3-on-2 MD → 3-on-2 BA: Pass → Pass; MoveAhead → MoveAhead; Ø → Shoot at goal

  18. Skill Transfer Results. [Figure: target-task learning curves comparing transfer from MKA, transfer from MD, and no transfer]

  19. Breakdown of MKA Results

  20. What if User Advice is Bad?

  21. Related Work
     • Q-function transfer in RoboCup
       Taylor & Stone (AAMAS 2005, AAAI 2005)
     • Transfer via policy reuse
       Fernandez & Veloso (AAMAS 2006, ICML workshop 2006)
       Madden & Howley (AI Review 2004)
     • Transfer via relational RL
       Driessens et al. (ICML workshop 2006)

  22. Summary of Contributions
     • Transfer of shared skills in high-level logic, despite differences in the shared skills
     • Demonstration of the value of user guidance: easy to give and beneficial
     • Effective transfer in the RoboCup domain, on challenging and dissimilar tasks

  23. Future Work
     • Learn more general skills by combining multiple source tasks
     • Compare several transfer methods on RoboCup scenarios of varying difficulty
     • Reach similar levels of transfer with less user input

  24. Acknowledgements
     • DARPA Grant HR0011-04-1-0007
     • US Naval Research Laboratory Grant N00173-06-1-G002
     Thank You

  25. User Advice

     IF:   distBetween(me, goal) < 10
           AND angle(goal, me, goalie) > 40
     THEN: prefer shoot

     IF:   distBetween(me, goal) > 10
     THEN: prefer move_ahead

     IF:   [transferred conditions]   (this is the part that came from transfer)
           AND distBetween(Teammate, goal) < distBetween(me, goal)
     THEN: prefer pass(Teammate)

  26. Feature Tiling. [Diagram: the original feature's range, from min value to max value, overlaid with multiple shifted tilings: Tiling #1, Tiling #2, ... Tiling #8 (16 tiles), Tiling #9, Tiling #10 (8 tiles), Tiling #11 (8 tiles)]
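
  A sketch of how one continuous feature becomes Boolean tile features, assuming uniform, staggered tilings; the tile counts and offsets here are illustrative:

    def tile_features(value, lo, hi, n_tiles=16, n_tilings=8):
        """Boolean tile-membership features for one continuous value.
        Each tiling shifts its grid by a fraction of the tile width."""
        width = (hi - lo) / n_tiles
        features = []
        for t in range(n_tilings):
            offset = t * width / n_tilings               # stagger this tiling
            idx = int((value - lo + offset) // width)
            idx = min(max(idx, 0), n_tiles - 1)          # clamp to valid tiles
            features.append([i == idx for i in range(n_tiles)])
        return features

    # e.g. tile_features(5.0, 0.0, 40.0) -> 8 one-hot lists of 16 Booleans each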

  27. Propositionalizing Rules
     • Step 1: rule head

       pass(Teammate) :- distBetween(me, Teammate) > 14, ...

       becomes one rule per teammate:

       pass(teammate1) :- distBetween(me, teammate1) > 14, ...
       ...
       pass(teammateN) :- distBetween(me, teammateN) > 14, ...

  28. Propositionalizing Rules
     • Step 2: single-variable disjunctions

       distBetween(me, Opponent) < 7

       becomes the disjunction

       distBetween(me, opponent1) < 7 OR ... OR distBetween(me, opponentN) < 7

       which tile features express as the linear constraint

       distBetween(me, opponent1)[0,7] + ... + distBetween(me, opponentN)[0,7] ≥ 1

       where feature[0,7] is the Boolean tile indicating that the feature's value lies in [0,7].
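
  A sketch of emitting such a sum-of-tiles constraint; the feature-naming scheme is invented for illustration:

    def single_variable_disjunction(feature, objects, lo, hi):
        """Encode 'feature(X) lies in [lo, hi] for some X' as:
        the sum of the matching Boolean tile features is at least 1."""
        tiles = [f"{feature}({obj})[{lo},{hi}]" for obj in objects]
        return tiles, 1   # (tile features to sum, lower bound on the sum)

    tiles, bound = single_variable_disjunction("distBetween_me", ["opponent1", "opponent2"], 0, 7)
    # tiles: ['distBetween_me(opponent1)[0,7]', 'distBetween_me(opponent2)[0,7]'], bound: 1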

  29. Propositionalizing Rules
     • Step 3: linked-variable disjunctions

       distBetween(me, Player) > 14, distBetween(Player, goal) < 10

       Add to the target-task feature space:

       newFeature(Player) :-
           Dist1 is distBetween(me, Player),
           Dist2 is distBetween(Player, goal),
           Dist1 > 14, Dist2 < 10.

       and require: newFeature(player1) + ... + newFeature(playerN) ≥ 1
