
Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another

Presentation Transcript


  1. Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another Lisa Torrey, Trevor Walker, Jude Shavlik University of Wisconsin-Madison, USA Richard Maclin University of Minnesota-Duluth, USA

  2. Our Goal • Transfer knowledge… • …between reinforcement learning tasks • …employing SVM function approximators • …using advice

  3. Transfer • Exploit previously learned models: learn a first task, then carry the knowledge acquired into a related task • Improve learning of new tasks (diagram: performance vs. experience, with transfer vs. without transfer)

  4. Reinforcement Learning • Q-function: value of taking an action from a state • Policy: take the action with max Q_action(state) (diagram: the state…action…reward…new state loop, with example rewards +2, 0, -1)
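
A minimal sketch of the Q-function/policy relationship described above; the Q-models, feature names, and numbers are invented for illustration and are not from the talk.

    # Greedy action selection: the policy picks the action whose
    # Q-function assigns the highest value to the current state.
    def greedy_action(q_functions, state):
        return max(q_functions, key=lambda action: q_functions[action](state))

    # Hypothetical linear Q-estimates for a soccer-like state.
    q_functions = {
        "hold_ball": lambda s: 0.4 * s["dist_to_opponent"],
        "pass_near": lambda s: 0.9 - 0.1 * s["dist_to_teammate"],
    }
    state = {"dist_to_opponent": 2.0, "dist_to_teammate": 5.0}
    print(greedy_action(q_functions, state))  # -> "hold_ball" (0.8 vs. 0.4)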

  5. Advice for Transfer • Advice improves RL performance • Advice can be refined or even discarded (diagram: the Task A Solution offers "Based on what worked in Task A, I suggest…"; the Task B Learner replies "I'll try it, but if it doesn't work I'll do something else.")

  6. Transfer Process (diagram): Task A experience + advice from user (optional) → Task A Q-functions → [mapping from user: Task A → Task B] → transfer advice → Task B experience + advice from user (optional) → Task B Q-functions

  7. RoboCup Soccer Tasks • KeepAway: keep the ball from opponents [Stone & Sutton, ICML 2001] • BreakAway: score a goal [Maclin et al., AAAI 2005]

  8. RL in RoboCup Tasks (table comparing the features, actions, and rewards of KeepAway and BreakAway)

  9. Transfer Process (recap of the slide 6 pipeline: Task A experience → Task A Q-functions → mapping from user (Task A → Task B) → transfer advice → Task B experience → Task B Q-functions)

  10. Approximating Q-Functions
      • Given examples: state features S_i = <f_1, …, f_n> and estimated values y ≈ Q_action(S_i)
      • Learn linear coefficients: y = w_1 f_1 + … + w_n f_n + b
      • Non-linearity from Boolean tile features: tile_(i,lower,upper) = 1 if lower ≤ f_i < upper
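
To make slide 10 concrete, here is a small sketch of the tile-feature construction and the linear Q-estimate; the tile boundaries and weights are invented for illustration, not values from the talk.

    import numpy as np

    def tile_features(f, boundaries):
        # Append Boolean indicators tile_(i,lower,upper) to the raw features f.
        # boundaries: list of (feature_index, lower, upper) tuples.
        tiles = [1.0 if lower <= f[i] < upper else 0.0
                 for (i, lower, upper) in boundaries]
        return np.concatenate([f, tiles])

    def q_estimate(w, b, features):
        # Linear model y = w_1 f_1 + ... + w_n f_n + b over raw + tile features.
        return float(np.dot(w, features)) + b

    f = np.array([7.5, 21.0])                                  # two raw features
    x = tile_features(f, [(0, 0.0, 10.0), (0, 10.0, 20.0)])    # -> [7.5, 21.0, 1.0, 0.0]
    w = np.array([0.2, -0.05, 1.0, 0.5])                       # invented weights
    print(q_estimate(w, b=0.1, features=x))                    # 1.55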

  11. Support Vector Regression (plot: Q-estimate y as a function of state S)
      Linear program: minimize ||w||_1 + |b| + C ||k||_1
      such that y - k ≤ Sw + b ≤ y + k
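
Because the 1-norms can be linearized by splitting each variable into non-negative parts, the slide 11 program is an ordinary LP. The following is a rough sketch of that formulation using scipy.optimize.linprog; it is my own illustration of the optimization, not the solver the authors used, and the toy data at the end is invented.

    import numpy as np
    from scipy.optimize import linprog

    def fit_l1_svr(S, y, C=1.0):
        # Sketch of: minimize ||w||_1 + |b| + C*||k||_1
        #            such that y - k <= S w + b <= y + k,  with k >= 0.
        # Split w = wp - wn and b = bp - bn (all non-negative) so the
        # 1-norm terms become plain sums of variables.
        m, n = S.shape
        # Variable order: [wp (n), wn (n), bp, bn, k (m)], all >= 0.
        c = np.concatenate([np.ones(2 * n), [1.0, 1.0], C * np.ones(m)])
        ones, I = np.ones((m, 1)), np.eye(m)
        upper = np.hstack([S, -S, ones, -ones, -I])    #  S w + b - k <= y
        lower = np.hstack([-S, S, -ones, ones, -I])    # -S w - b - k <= -y
        A_ub = np.vstack([upper, lower])
        b_ub = np.concatenate([y, -y])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        x = res.x
        return x[:n] - x[n:2 * n], x[2 * n] - x[2 * n + 1]    # w, b

    # Toy regression problem: y is roughly 2*f1 + 1.
    S = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([1.0, 3.1, 4.9, 7.0])
    print(fit_l1_svr(S, y, C=10.0))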

  12. Transfer Process (recap of the slide 6 pipeline, same stages as slide 9)

  13. Advice Example • Need only follow advice approximately • Add soft constraints to the linear program
      Example: if distance_to_goal ≤ 10 and shot_angle ≥ 30 then prefer shoot over all other actions

  14. Incorporating Advice [Maclin et al., AAAI 2005]
      • Advice and Q-functions have the same language: linear expressions of features
      if v_11 f_1 + … + v_1n f_n ≤ d_1
      … and v_m1 f_1 + … + v_mn f_n ≤ d_m
      then Q_shoot > Q_other for all other actions
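
The "same language" point on slide 14 is easy to see in code: an advice rule is a set of linear preconditions (V f ≤ d) plus a preferred action, and a Q-model is a weight vector over the same features, so both reduce to matrix arithmetic. The sketch below only illustrates that representation; it is not the knowledge-based kernel-regression constraint machinery of Maclin et al., and all coefficients are invented.

    import numpy as np

    def preconditions_hold(V, d, f):
        # Advice preconditions: every linear condition V_row . f <= d_row holds.
        return bool(np.all(V @ f <= d))

    def advice_satisfied(q_models, preferred, f):
        # Does the learned linear model already rank `preferred` on top at f?
        scores = {a: float(np.dot(w, f)) + b for a, (w, b) in q_models.items()}
        return all(scores[preferred] > s for a, s in scores.items() if a != preferred)

    # "if distance_to_goal <= 10 and shot_angle >= 30 then prefer shoot",
    # with shot_angle >= 30 rewritten as -shot_angle <= -30.
    V = np.array([[1.0, 0.0],
                  [0.0, -1.0]])
    d = np.array([10.0, -30.0])

    q_models = {"shoot": (np.array([-0.05, 0.02]), 1.0),   # invented weights
                "pass":  (np.array([0.01, 0.00]), 0.4)}
    f = np.array([8.0, 40.0])                              # distance_to_goal, shot_angle
    if preconditions_hold(V, d, f):
        print(advice_satisfied(q_models, "shoot", f))      # True for these numbers

When the preference does not already hold, the method of slide 13 adds it to the linear program as a soft constraint, so the learner can trade the advice off against its own experience rather than follow it exactly.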

  15. Transfer Process (recap of the slide 6 pipeline, same stages as slide 9)

  16. Expressing Policy with Advice
      Old Q-functions: Q_hold_ball(s), Q_pass_near(s), Q_pass_far(s)
      Advice expressing the policy: if Q_hold_ball(s) > Q_pass_near(s) and Q_hold_ball(s) > Q_pass_far(s) then prefer hold_ball over all other actions

  17. Mapping Actions
      Old Q-functions: Q_hold_ball(s), Q_pass_near(s), Q_pass_far(s)
      Mapping from user: hold_ball → move, pass_near → pass_near, pass_far → (no counterpart)
      Mapped policy: if Q_hold_ball(s) > Q_pass_near(s) and Q_hold_ball(s) > Q_pass_far(s) then prefer move over all other actions

  18. Mapping Features
      Mapping from user yields a Q-function mapping:
      Q_hold_ball(s) = w_1 (dist_keeper1) + w_2 (dist_taker2) + …
      Q′_hold_ball(s) = w_1 (dist_attacker1) + w_2 (MAX_DIST) + …

  19. Transfer Example
      Old model:
        Q_x = w_x1 f_1 + w_x2 f_2 + b_x
        Q_y = w_y1 f_1 + b_y
        Q_z = w_z2 f_2 + b_z
      Mapped model:
        Q′_x = w_x1 f′_1 + w_x2 f′_2 + b_x
        Q′_y = w_y1 f′_1 + b_y
        Q′_z = w_z2 f′_2 + b_z
      Advice: if Q′_x > Q′_y and Q′_x > Q′_z then prefer x′ over all other actions
      Advice (expanded):
        if   w_x1 f′_1 + w_x2 f′_2 + b_x > w_y1 f′_1 + b_y
        and  w_x1 f′_1 + w_x2 f′_2 + b_x > w_z2 f′_2 + b_z
        then prefer x′ over all other actions
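
Slides 16-19 amount to a small mechanical routine: rewrite each old linear Q-model through the user's feature mapping, then emit one "prefer the mapped action over all others" rule per mapped action. The sketch below is my own paraphrase of that pipeline with invented names and weights; it is not the authors' code.

    import numpy as np

    def map_model(old_model, feature_map, new_features):
        # Rewrite an old linear Q-model over the new task's feature vector.
        # feature_map sends each old feature to a new feature name, or to a
        # constant (e.g. MAX_DIST) when the new task has no counterpart.
        w_new = np.zeros(len(new_features))
        bias = old_model["b"]
        for name, weight in old_model["w"].items():
            target = feature_map[name]
            if isinstance(target, str):
                w_new[new_features.index(target)] += weight
            else:                          # constant folds into the bias term
                bias += weight * target
        return w_new, bias

    def transfer_advice(old_models, action_map, feature_map, new_features):
        # One rule per mapped action: if its mapped Q exceeds every other
        # mapped Q, prefer that action's image in the new task.
        mapped = {a: map_model(m, feature_map, new_features) for a, m in old_models.items()}
        return [(f"prefer {new_action}", mapped[old_action],
                 [mapped[a] for a in mapped if a != old_action])
                for old_action, new_action in action_map.items()]

    MAX_DIST = 30.0
    old_models = {"hold_ball": {"w": {"dist_keeper1": 0.3, "dist_taker2": 0.1}, "b": 0.2},
                  "pass_near": {"w": {"dist_keeper1": -0.2}, "b": 0.5}}
    feature_map = {"dist_keeper1": "dist_attacker1", "dist_taker2": MAX_DIST}
    action_map = {"hold_ball": "move", "pass_near": "pass_near"}
    print(transfer_advice(old_models, action_map, feature_map, ["dist_attacker1"]))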

  20. Transfer Experiment • Between RoboCup subtasks • From 3-on-2 KeepAway • To 2-on-1 BreakAway • Two simultaneous mappings • Transfer passing skills • Map passing skills to shooting

  21. Experiment Mappings • Play a moving KeepAway game: Pass → Pass, Hold → Move • Pretend a teammate is standing in the goal: Pass → Shoot (at the imaginary teammate)

  22. Experimental Methodology • Averaged over 10 BreakAway runs • Transfer: advice from one KeepAway model • Control: runs without advice

  23. Results

  24. Analysis • Transfer advice helps BreakAway learners • 7% more likely to score a goal after learning • Improvement is delayed • Advantage begins after 2500 games • Some advice rules apply rarely • Preconditions for shoot advice not often met

  25. Related Work: Transfer • Remember action subsequences [Singh, ML 1992] • Restrict action choices [Sherstov & Stone, AAAI 2005] • Transfer Q-values directly in KeepAway [Taylor & Stone, AAMAS 2005]

  26. Related Work: Advice • “Take action A now” [Clouse & Utgoff, ICML 1992] • “In situations S, action A has value X” [Maclin & Shavlik, ML 1996] • “In situations S, prefer action A over B” [Maclin et al., AAAI 2005]

  27. Future Work • Increase speed of linear-program solving • Decrease sensitivity to imperfect advice • Extract advice from kernel-based models • Help user map actions and features

  28. Conclusions • Transfer exploits previously learned models to improve learning of new tasks • Advice is an appealing way to transfer • Linear regression approach incorporates advice straightforwardly • Transferring a policy accommodates different reward structures

  29. Acknowledgements • DARPA grant HR0011-04-1-0007 • United States Naval Research Laboratory grant N00173-04-1-G026 • Michael Ferris • Olvi Mangasarian • Ted Wild
