Transfer learning via advice taking

Transfer Learning Via Advice Taking

Jude Shavlik

University of Wisconsin-Madison


Acknowledgements

  • Lisa Torrey, Trevor Walker, & Rich Maclin

  • DARPA IPTO Grant HR0011-04-1-0007

  • NRL Grant N00173-06-1-G002

  • DARPA IPTO Grant FA8650-06-C-7606


What Would You Like to Say to This Penguin?

IF a Bee is (Near and West) &
an Ice is (Near and North)
THEN
BEGIN
Move East
Move North
END


Empirical Results

[Learning-curve chart: with advice vs. without advice]


Our Approach to Transfer Learning

Mapping

Extracted

Knowledge

Transferred

Knowledge

Extraction

Refinement

Target Task

Source Task


Potential Benefits of Transfer

[Chart: performance vs. training, with transfer vs. without transfer. Transfer can yield a higher start, a steeper slope, and a higher asymptote.]


Outline

  • Reinforcement Learning w/ Advice

  • Transfer via Rule Extraction & Advice Taking

  • Transfer via Macros

  • Transfer via Markov Logic Networks (time permitting)

  • Wrap Up


Reinforcement Learning (RL) Overview

The agent repeatedly:

  • Senses the state (described by a set of features)

  • Chooses an action (policy: choose the action with the highest Q-value in the current state)

  • Receives a reward (the rewards are used to estimate the Q-values of actions in states)
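The sense/act/reward loop above can be sketched as follows. The environment interface (`reset`/`step`) and the small exploration rate are illustrative assumptions, not something the slides specify:

```python
import random

def run_episode(env, Q, actions, epsilon=0.1):
    """Minimal sketch of the RL loop: sense state, choose action, receive reward."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        if random.random() < epsilon:
            action = random.choice(actions)   # occasional exploration
        else:
            # policy: choose the action with the highest Q-value in this state
            action = max(actions, key=lambda a: Q.get((state, a), 0.0))
        state, reward, done = env.step(action)  # act, then receive reward
        total_reward += reward
    return total_reward
```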



RoboCup Subtasks

MoveDownfield

Mobile KeepAway

BreakAway

Variant of Stone & Sutton, ICML 2001


Q Learning (Watkins PhD, 1989)

policy(state) = argmax_action Q(state, action)

For large state spaces, need function approximation
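A hedged sketch of the tabular version of Watkins' Q-learning, before function approximation is needed. The table maps (state, action) pairs to values; the step size `alpha` and discount `gamma` are illustrative defaults:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]

def policy(Q, s, actions):
    """The slide's policy(state) = argmax_action Q(state, action)."""
    return max(actions, key=lambda a: Q.get((s, a), 0.0))
```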


Learning the Q Function

Example features and their learned weights:

distance(me, teammate1) → 0.2

distance(me, opponent1) → −0.1

angle(opponent1, me, teammate1) → 0.9

A standard approach: linear support-vector regression

Q-value = weight vectorᵀ ● feature vector

Set weights to minimize

Model size + C × Data misfit
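The linear model amounts to a dot product of weights and features. In this sketch, only the weights (0.2, −0.1, 0.9) come from the slide; the feature values are made-up measurements for illustration:

```python
def q_value(weights, features, bias=0.0):
    """Q-value = weight_vector^T . feature_vector (plus an optional bias)."""
    return sum(w * f for w, f in zip(weights, features)) + bias

# weights from the slide, paired with illustrative state measurements
weights = [0.2, -0.1, 0.9]    # distance(me,teammate1), distance(me,opponent1), angle(...)
features = [15.0, 5.0, 30.0]  # hypothetical values sensed in one state
q = q_value(weights, features)
```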


Advice in RL

  • Advice provides constraints on Q values under specified conditions

    IF an opponent is near me

    AND a teammate is open

    THEN Q(pass(teammate)) > Q(move(ahead))

  • Apply as soft constraints in optimization

    Model size + C × Data misfit + μ× Advice misfit
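The objective above can be sketched as a single scoring function. Representing advice as "w·f should reach at least m" with a hinge-style penalty is a simplification of the slide's rule-based constraints, and the L1 "model size" term is an assumption:

```python
def objective(w, data, advice, C=1.0, mu=1.0):
    """Sketch of: model size + C * data misfit + mu * advice misfit.

    data:   list of (feature_vector, target_q) pairs
    advice: list of (feature_vector, min_value) soft constraints
    """
    dot = lambda vec, f: sum(wi * fi for wi, fi in zip(vec, f))
    model_size = sum(abs(wi) for wi in w)
    data_misfit = sum(abs(dot(w, f) - y) for f, y in data)
    # pay a penalty only where the advised inequality is violated
    advice_misfit = sum(max(0.0, m - dot(w, f)) for f, m in advice)
    return model_size + C * data_misfit + mu * advice_misfit
```

Because the advice term is a soft constraint, data can override bad advice: a violated constraint just adds μ-weighted cost rather than forbidding the solution.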


Aside: Generalizing the Idea of a Training Example for Support Vector Machines (SVMs)

Can extend the SVM linear program to handle “regions as training examples”

Fung, Mangasarian, & Shavlik: NIPS 2003, COLT 2004


Specifying Advice for Support Vector Regression

Bx ≤ d ⇒ y ≥ h’x + β

If input (x) is in the region specified by B and d,
then output (y) should be above some line (h’x + β)


Sample Advice

Advice format: Bx ≤ d ⇒ f(x) ≥ hx + β

If distanceToGoal ≤ 10
and shotAngle ≥ 30
Then Q(shoot) ≥ 0.9


Sample Advice-Taking Results

if distanceToGoal  10

and shotAngle  30

then prefer shoot over all other actions

Q(shoot) > Q(pass)

Q(shoot) > Q(move)

advice

2 vs 1 BreakAway, rewards +1, -1

std RL


Outline

  • Reinforcement Learning w/ Advice

  • Transfer via Rule Extraction & Advice Taking

  • Transfer via Macros

  • Transfer via Markov Logic Networks

  • Wrap Up


Close-Transfer Scenarios

4-on-3 BreakAway

2-on-1 BreakAway

3-on-2 BreakAway


Distant-Transfer Scenarios

3-on-2 KeepAway

3-on-2 BreakAway

3-on-2 MoveDownfield


Our First Transfer-Learning Approach: Exploit the fact that models and advice are in the same language

Source Q functions:

Qx = wx1 f1 + wx2 f2 + bx

Qy = wy1 f1 + by

Qz = wz2 f2 + bz

Mapped Q functions:

Q´x = wx1 f´1 + wx2 f´2 + bx

Q´y = wy1 f´1 + by

Q´z = wz2 f´2 + bz

Advice:

if Q´x > Q´y
and Q´x > Q´z
then prefer x´

Advice (expanded):

if wx1 f´1 + wx2 f´2 + bx > wy1 f´1 + by
and wx1 f´1 + wx2 f´2 + bx > wz2 f´2 + bz
then prefer x´ to y´ and z´
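A small sketch of this expansion step: given per-action weights over the mapped features, emit the weight-level inequalities that "prefer x over y and z" stands for. The data layout and the string representation are illustrative assumptions:

```python
def expand_advice(q_weights, preferred, others):
    """Expand 'prefer `preferred` over `others`' into weight inequalities.

    q_weights maps action -> ({feature_name: weight}, bias); the feature
    names stand in for the mapped target-task features f'1, f'2.
    """
    def expr(action):
        ws, b = q_weights[action]
        terms = [f"{w}*{f}" for f, w in sorted(ws.items())]
        return " + ".join(terms + [str(b)])
    lhs = expr(preferred)
    return [f"{lhs} > {expr(a)}" for a in others]
```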


User Advice in Skill Transfer

  • There may be new skills in the target that cannot be learned from the source

  • We allow (human) users to add their own advice about these skills

User Advice for KeepAway to BreakAway

IF: distance(me, GoalPart) < 10 AND

angle(GoalPart, me, goalie) > 40

THEN: prefer shoot(GoalPart)


Sample Human Interaction

“Use what you learned in KeepAway, and add in this new action SHOOT.”

“Here is some advice about shooting …”

“Now go practice for awhile.”


Policy Transfer to 3-on-2 BreakAway

Torrey, Walker, Shavlik & Maclin: ECML 2005


Our Second Approach: Use Inductive Logic Programming (ILP) on SOURCE to extract advice

good_action(pass(t1), state1)

good_action(pass(t2), state3)

good_action(pass(t1), state2)

good_action(pass(t2), state2)

good_action(pass(t1), state3)

Given

  • Positive and negative examples for each action

Do

  • Learn first-order rules that describe most positive examples but few negative examples

good_action(pass(Teammate), State) :-

distance(me, Teammate, State) > 10,

distance(Teammate, goal, State) < 15.
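The learned clause above, written as an executable check. The state representation (a dict from ordered object pairs to distances) is an assumption for illustration; the original is a Prolog-style clause:

```python
def good_action_pass(teammate, state):
    """good_action(pass(Teammate), State) :-
           distance(me, Teammate, State) > 10,
           distance(Teammate, goal, State) < 15."""
    return (state[("me", teammate)] > 10 and
            state[(teammate, "goal")] < 15)
```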


Searching for an ILP Clause (top-down search using A*)


Skill Transfer to 3-on-2 BreakAway

Torrey, Shavlik, Walker & Maclin: ECML 2006, ICML Workshop 2006


Approach #3: Relational Macros

[Macro FSM diagram: a node with policy pass(Teammate) ← isOpen(Teammate), a node with policy hold ← true, and transitions conditioned on isClose(Opponent) and allOpponentsFar]

  • A relational macro is a finite-state machine

  • Nodes represent internal states of the agent, in which independent policies apply

  • Conditions for transitions and actions are learned via ILP
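A relational macro can be sketched as a small finite-state machine with per-node policies. The class layout, node names, and condition functions here are illustrative assumptions; in the actual system both are learned via ILP:

```python
class Macro:
    """Finite-state machine: each node has its own policy; transition
    conditions decide when control moves to the next node."""
    def __init__(self, start, transitions, policies):
        self.node = start
        self.transitions = transitions  # node -> [(condition_fn, next_node)]
        self.policies = policies        # node -> policy_fn(state) -> action

    def act(self, state):
        # follow the first transition whose learned condition fires
        for cond, nxt in self.transitions.get(self.node, []):
            if cond(state):
                self.node = nxt
                break
        # apply the current node's independent policy
        return self.policies[self.node](state)
```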


Step 1: Learning Macro Structure

[Macro structure: move(ahead) → pass(Teammate) → shoot(GoalPart)]

  • Objective: find (via ILP) an action pattern that separates good and bad games

macroSequence(Game, StateA) ←

actionTaken(Game, StateA, move, ahead, StateB),

actionTaken(Game, StateB, pass, _, StateC),

actionTaken(Game, StateC, shoot, _, gameEnd).


Step 2: Learning Macro Conditions

For the transition from move to pass

transition(State) ←

distance(Teammate, goal, State) < 15.

For the policy in the pass node

action(State, pass(Teammate)) ←

angle(Teammate, me, Opponent, State) > 30.

[Macro structure: move(ahead) → pass(Teammate) → shoot(GoalPart)]

  • Objective: describe when transitions and actions should be taken


Learned 2-on-1 BreakAway Macro

[Learned macro diagram: pass(Teammate) → move(Direction) → shoot(goalRight) → shoot(goalLeft)]

Player with BALL executes the macro

This shot is apparently a leading pass


Transfer via Demonstration

Demonstration

  • Execute the macro strategy to get Q-value estimates

  • Infer low Q values for actions not taken by macro

  • Compute an initial Q function with these examples

  • Continue learning with standard RL

    Advantage: potential for large immediate jump in performance

    Disadvantage: risk that agent will blindly follow an inappropriate strategy
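The labeling step above can be sketched as follows: for each state the macro visits, its chosen action gets a high Q estimate and every other action gets a low one, yielding training examples for the initial Q function. The high/low values are placeholders, not from the slides:

```python
def seed_q_from_demonstration(episodes, macro_action, actions, high=1.0, low=0.0):
    """Build (state, action, q_estimate) examples from demonstrated episodes."""
    examples = []
    for states in episodes:              # each episode is a list of states
        for s in states:
            chosen = macro_action(s)     # action the macro strategy takes here
            for a in actions:
                examples.append((s, a, high if a == chosen else low))
    return examples
```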


Macro Transfer to 3-on-2 BreakAway

Variant of Taylor & Stone

Torrey, Shavlik, Walker & Maclin: ILP 2007


Macro Transfer to 4-on-3 BreakAway

Torrey, Shavlik, Walker & Maclin: ILP 2007


Outline

  • Reinforcement Learning w/ Advice

  • Transfer via Rule Extraction & Advice Taking

  • Transfer via Macros

  • Transfer via Markov Logic Networks

  • Wrap Up


Approach #4: Markov Logic Networks (Richardson & Domingos, Machine Learning 2006)

[MLN diagram: ground network linking the literals dist2 < 10, ang1 > 45, dist1 > 5 to the Q-value bins 0 ≤ Q < 0.5 and 0.5 ≤ Q < 1.0]

IF dist2 < 10 AND ang1 > 45 THEN 0.5 ≤ Q < 1.0 (weight = 1.7)

IF dist1 > 5 AND ang1 > 45 THEN 0 ≤ Q < 0.5 (weight = 2.1)


Using MLNs to Learn a Q Function

  • Perform hierarchical clustering to find a set of good Q-value bins

  • Use ILP to learn rules that classify examples into bins

  • Use MLN weight-learning methods to choose weights for these formulas

Example rule:

IF dist1 > 5
AND ang1 > 45
THEN 0 ≤ Q < 0.1
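Once the MLN assigns a probability to each Q-value bin, one way to recover a single Q estimate is the expectation over bin midpoints. This midpoint scheme is an assumption for illustration; the slides only say the MLN learns a probability distribution over bins:

```python
def expected_q(bin_probs):
    """Expected Q from a distribution over bins.

    bin_probs: {(lo, hi): probability} mapping each Q-value bin to its
    inferred probability; each bin contributes its midpoint.
    """
    return sum(p * (lo + hi) / 2.0 for (lo, hi), p in bin_probs.items())
```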


MLN Transfer to 3-on-2 BreakAway

Torrey, Shavlik, Natarajan, Kuppili & Walker: AAAI TL Workshop 2008


Outline

  • Reinforcement Learning w/ Advice

  • Transfer via Rule Extraction & Advice Taking

  • Transfer via Macros

  • Transfer via Markov Logic Networks

  • Wrap Up


Summary of Our Transfer Methods

  • Directly reuse weighted sums as advice

  • Use ILP to learn generalized advice for each action

  • Use ILP to learn macro-operators

  • Use Markov Logic Networks to learn probability distributions for Q functions


Our Desiderata for Transfer in RL

  • Transfer knowledge in first-order logic

  • Accept advice from humans expressed naturally

  • Refine transferred knowledge

  • Improve performance in related target tasks

  • Major challenge: Avoid negative transfer


Related Work in RL Transfer

  • Value-function transfer (Taylor & Stone 2005)

  • Policy reuse (Fernandez & Veloso 2006)

  • State abstractions (Walsh et al. 2006)

  • Options (Croonenborghs et al. 2007)

    Torrey and Shavlik survey paper available online


Conclusion

  • Transfer learning is an important perspective for machine learning: move beyond isolated learning tasks

  • Appealing ways to do transfer learning are via advice taking and demonstration

  • Long-term goal: instructable computing, teaching computers the same way we teach humans

