Learning from how dogs learn - PowerPoint PPT Presentation


Learning from how dogs learn. Prof. Bruce Blumberg, The Media Lab, MIT. bruce@media.mit.edu, www.media.mit.edu/~bruce


Presentation Transcript



Learning from how dogs learn

Prof. Bruce Blumberg

The Media Lab, MIT

bruce@media.mit.edu

www.media.mit.edu/~bruce



About me…





Practical & compelling real-time learning

  • Easy for interactive characters to learn what they ought to be able to learn

  • Easy for a human trainer to guide learning process

  • A compelling user experience

  • Provide heuristics and practical design principles



My bias & focus

  • Learning occurs within an innate structure that biases…

    • Attention

    • Motivation

    • Innate frequency, form and organization of behavior

    • When certain things are most easily learned

  • What are the catalytic components of the scaffolding that make learning possible?


sheep|dog: trial by Eire

See sheep|dog video on my website



Object persistence

See object persistence video on my website



Temporal representation

See temporal representation (aka Goatzilla) video on my website



Alpha Wolf

See alpha wolf video on my website



Rover@home

See rover@home video on my website or go to Scientific American Frontiers website



Dobie T. Coyote Goes to School

See Dobie video on my website



Why look at Dog Training?

  • Interactive characters pose unique challenges:

    • State, action and state-action spaces are often continuous and far too big to search exhaustively

    • To be compelling, characters must

      • Learn “obvious” contingencies between state, actions and consequences quickly

      • Be easy to train without visibility into the internal state of the character

      • Do all this while learning is only one of the things they have to do

  • Dogs and their trainers seem to solve these problems easily



Invaluable resources

  • Doing it, and talking to people who do it.

  • Wilkes, Pryor, Ramirez

  • Lindsay, Burch & Bailey, Mackintosh

  • Lorenz, Leyhausen, Coppinger & Coppinger



The problem facing dogs (real and synthetic)

Set of all motivational goals

Set of all possible stimuli

Set of all possible actions

What do I do, when, in order to best satisfy my motivational goals?


The space of possible stimuli is wicked big

[Diagram: the set of all possible stimuli shown as a state space, laid out by modality of stimuli (smells, sounds, dog sounds, motion, speech, whistles) and by time of occurrence]


The space of possible actions is also very big

[Diagram: the set of all possible actions shown as an action space, laid out by action (left ear twitch, shake, high-5, low shake, down, beg, figure-8) and by time of performance]


Who gets credit for good things happening?

[Diagram: a reward (“Yumm..”) arrives; the candidates for credit span the modality of stimuli (sounds, dog sounds, motion, speech, whistles) and the action (left ear twitch, shake, high-5, low shake, down, beg, figure-8)]


Who gets credit for good things happening?

[Diagram: the sequence orient, eye, stalk, chase, grab-bite, kill-bite laid out along a time axis, ending in a reward (“Yumm..”)]


Conventional idea: back propagation from goal

[Diagram: the sequence orient, eye, stalk, chase, grab-bite, kill-bite along a time axis, ending in a reward (“Yumm..”)]

Credit flows backward
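The slide names no specific algorithm, so the following is only a minimal sketch of the idea, assuming a one-step temporal-difference update as a stand-in. The learning rate, discount factor, and function names are illustrative assumptions. It shows how many passes through the sequence are needed before credit from the final reward reaches the first element.

```python
# Sequence elements as named on the slides; the learning rule is an assumption.
SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]


def run_episode(values, reward=1.0, alpha=0.5, gamma=0.9):
    """One pass through the sequence with a one-step temporal-difference update.

    Only the last element is followed by the reward ("Yumm..").
    alpha and gamma are illustrative values, not taken from the slides.
    """
    for i, element in enumerate(SEQUENCE):
        is_last = i == len(SEQUENCE) - 1
        r = reward if is_last else 0.0
        next_value = 0.0 if is_last else values[SEQUENCE[i + 1]]
        values[element] += alpha * (r + gamma * next_value - values[element])
    return values


def episodes_until_credited(element):
    """Count the passes needed before `element` receives any credit at all."""
    values = {name: 0.0 for name in SEQUENCE}
    episodes = 0
    while values[element] == 0.0:
        run_episode(values)
        episodes += 1
    return episodes


if __name__ == "__main__":
    for name in SEQUENCE:
        print(name, episodes_until_credited(name))
```

Under these assumptions credit moves back one element per pass, so the earliest element is the last to learn that it mattered.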





The problem

  • If each element in sequence has 3 variants, there are 729 possible combinations of which 1 may work (ignoring stimuli)

  • If there are 12 possible stimuli, there are 1,586,874,322,944 possible combinations of stimuli-action pairs to explore.

  • Don’t know if it is the right sequence until goal is reached

  • What happens if “variant” needs to be learned?
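The arithmetic behind these counts can be checked directly. This is a sketch under stated assumptions: the six-element sequence is taken from the neighboring slides, and the slide does not spell out how its second figure is derived. The check below only shows that the figure equals 108 to the sixth power, and compares it with the count obtained when each element is paired with exactly one of 12 stimuli.

```python
ELEMENTS = 6   # orient, eye, stalk, chase, grab-bite, kill-bite
VARIANTS = 3   # variants per element, from the slide
STIMULI = 12   # possible stimuli, from the slide

# Ignoring stimuli: one variant chosen independently for each element.
action_only = VARIANTS ** ELEMENTS

# Assumption: each element is paired with exactly one stimulus.
one_stimulus_per_element = (VARIANTS * STIMULI) ** ELEMENTS

# The figure quoted on the slide, and the per-element option count it implies.
slide_figure = 1_586_874_322_944
implied_options_per_element = round(slide_figure ** (1 / ELEMENTS))

if __name__ == "__main__":
    print(action_only)                  # 729
    print(one_stimulus_per_element)     # 2176782336
    print(implied_options_per_element)  # 108
```

Either way the point of the slide holds: the space grows exponentially with the length of the sequence, and only one path through it may pay off.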


Leyhausen’s suggestion…

[Diagram: the sequence orient, eye, stalk, chase, grab-bite, kill-bite along a time axis, with each element labeled “motivation & reward”]

Each element is innately self-motivating and has innate reward metric




Coppinger’s suggestion…

[Diagram: the sequence orient, eye, stalk, chase, grab-bite, kill-bite along a time axis]

Varying innate tendency to follow behavior with “next” in sequence
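One way to picture this is to treat each "tendency" as a probability of moving on to the next element. That reading, the numbers, and the function names below are all assumptions for illustration; the slide gives no values. Under that reading, the chance of running the whole sequence is the product of the link tendencies, and the weakest link is where the sequence most often stops.

```python
import math

SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]

# Hypothetical tendencies to follow each element with the next one.
EXAMPLE_TENDENCIES = {
    ("orient", "eye"): 0.9,
    ("eye", "stalk"): 0.9,
    ("stalk", "chase"): 0.8,
    ("chase", "grab-bite"): 0.2,
    ("grab-bite", "kill-bite"): 0.1,
}


def completion_probability(tendencies, sequence=SEQUENCE):
    """Probability of running the whole sequence, assuming independent links."""
    return math.prod(tendencies[pair] for pair in zip(sequence, sequence[1:]))


def weakest_link(tendencies):
    """The transition where the sequence is most likely to break off."""
    return min(tendencies, key=tendencies.get)


if __name__ == "__main__":
    print(completion_probability(EXAMPLE_TENDENCIES))
    print(weakest_link(EXAMPLE_TENDENCIES))
```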


Functional goal plays incidental role

[Diagram: the sequence orient, eye, stalk, chase, grab-bite, kill-bite along a time axis, ending in a reward (“Yumm..”)]

Propagated value from functional goal plays incidental role



Big idea: innate biases make learning possible

  • Biases include…

    • Temporal Proximity implies causality

    • Attend more readily to certain classes of stimuli than to others (motion vs. speech)

    • Lazy discovery (pay attention once you have a reason to pay attention)

    • Elements may be “innately” self-motivating and have local metric of “goodness”



Good trainers actively guide dog’s exploration

  • Behavioral

    • Train behavior, then cue

    • Differential rewards encourage variability

  • Motor

    • Shaping

      • Rewarding successive approximations

    • Luring

      • Pose, e.g. “down”

      • Trajectory, e.g. “figure-8”
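Shaping, as described above, can be sketched as a reward criterion that rises each time the dog meets it. Everything concrete here is an assumption for illustration: the scoring of attempts on a 0 to 100 scale, the step size, and the function name.

```python
def shape(attempts, target=100, start_criterion=20, step=20):
    """Reward successive approximations to a target behavior.

    attempts: hypothetical scores (0-100) for how close each attempt comes
    to the desired behavior. An attempt is rewarded if it meets the current
    criterion, and each reward raises the criterion toward the target.
    Returns the per-attempt reward decisions and the final criterion.
    """
    criterion = start_criterion
    rewarded = []
    for attempt in attempts:
        success = attempt >= criterion
        rewarded.append(success)
        if success:
            criterion = min(target, criterion + step)
    return rewarded, criterion


if __name__ == "__main__":
    decisions, final_criterion = shape([20, 10, 30, 50, 40, 90])
    print(decisions, final_criterion)
```

Early rough attempts earn a reward, while later ones must come closer to the finished behavior to earn the same reward.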


Dogs constrain search for causal agents

[Diagram: Sit, Approach and Eat laid out along a time axis, with two windows marked]

  • Attention window

    • Cue given immediately before or as dog is moving into desired pose

  • Consequences window

    • Trainer “clicks”, signaling reward is coming

    • When reward is actually received

Dogs make the problem tractable by constraining search for causal agents to narrow temporal windows
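A minimal sketch of the two windows, assuming timestamped events. The window lengths, the `Event` type, and the function name are hypothetical, not taken from the slides. A stimulus is credited only if it falls in the attention window just before an action, and only if a click or reward follows that action within the consequences window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float  # seconds, illustrative
    kind: str    # "stimulus", "action", or "reward"
    name: str


def credit_candidates(events, attention_window=1.0, consequence_window=2.0):
    """Return (stimulus, action) pairs that fall inside both temporal windows.

    Both window lengths are assumed values chosen for the example.
    """
    credited = []
    for action in (e for e in events if e.kind == "action"):
        rewarded = any(
            e.kind == "reward" and 0.0 < e.time - action.time <= consequence_window
            for e in events
        )
        if not rewarded:
            continue
        for e in events:
            if e.kind == "stimulus" and 0.0 <= action.time - e.time <= attention_window:
                credited.append((e.name, action.name))
    return credited


if __name__ == "__main__":
    log = [
        Event(0.0, "stimulus", "doorbell"),
        Event(4.6, "stimulus", "sit-utterance"),
        Event(5.0, "action", "sit"),
        Event(5.5, "reward", "click"),
        Event(9.0, "action", "approach"),
    ]
    print(credit_candidates(log))
```

In the example log the doorbell is ignored because it is too old, and "approach" earns no credit because nothing rewarding follows it in time.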


Dogs use implicit feedback to guide perceptual learning

[Diagram: Sit, Approach and Eat laid out along a time axis, annotated with: “sit-utterance” perceived; dog decides to sit; “click” perceived; build & update perceptual model of “sit-utterance”]

Dogs use rewarded action to identify potentially promising state to explore and to guide formation of perceptual models



Dogs give credit where credit is due…

  • Trainer repeatedly lures dog through a trajectory or into a pose

  • Eventually, dog performs behavior spontaneously

  • Implication

    • Dog associates reward with resulting body configuration or trajectory and not just with “follow-your-nose”


Observation: dogs give credit where credit is due

[Diagram: Sit, Approach and Eat laid out along a time axis, annotated with: “sit-utterance” perceived; dog decides to sit; “click” perceived; credit sitting in presence of “sit-utterance”; build & update perceptual model of “sit-utterance”]



D.L.: Take Advantage of Predictable Regularities

  • Constrain search for causal agents by taking advantage of temporal proximity & natural hierarchy of state spaces

    • Use consequences to bias choice of action

    • But vary performance and attend to differences

  • Explore state and action spaces on “as-needed” basis

    • Build models on demand



D.L.: Make Use of All Feedback: Explicit & Implicit

  • Use rewarded action as context for identifying

    • Promising state space and action space to explore

    • Good examples from which to construct perceptual models, e.g.,

      • A good example of a “sit-utterance” is one that occurs within the context of a rewarded Sit.
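A hedged sketch of building a perceptual model only from good examples. The slides do not say how utterances are represented or modeled, so the feature vectors, the running-mean model, and the class name here are all assumptions. The one rule taken from the slide is that an utterance counts as a training example only when it occurs in the context of a rewarded Sit.

```python
class UtteranceModel:
    """Running mean of feature vectors for one utterance class (illustrative)."""

    def __init__(self, action="sit"):
        self.action = action
        self.count = 0
        self.mean = None

    def update(self, features, action, rewarded):
        """Add an example only if it accompanied a rewarded target action.

        Returns True when the example was used, False when it was ignored.
        """
        if action != self.action or not rewarded:
            return False
        self.count += 1
        if self.mean is None:
            self.mean = list(features)
        else:
            # Incremental mean: move each component toward the new example.
            self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, features)]
        return True


if __name__ == "__main__":
    model = UtteranceModel()
    model.update([1.0, 3.0], action="sit", rewarded=True)
    model.update([9.0, 9.0], action="sit", rewarded=False)  # ignored
    model.update([3.0, 5.0], action="sit", rewarded=True)
    print(model.mean, model.count)
```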



D.L.: Make Them Easy to Train

  • Respond quickly to “obvious” contingencies

  • Support Luring and Shaping

    • Techniques to prompt infrequently expressed or novel motor actions

  • “Trainer friendly” credit assignment

    • Assign credit to candidate that matches trainer’s expectation



The System



Dobie T. Coyote…

See Dobie video on my website



Limitations and Future Work

  • Important extensions

    • Other kinds of learning (e.g., social or spatial)

    • Generalization

    • Sequences

    • Expectation-based emotion system

  • How will the system scale?



Useful Insights

  • Use

    • Temporal proximity to limit search.

    • Hierarchical representations of state, action and state-action space

    • Implicit feedback to guide exploration

    • “trainer friendly” credit assignment

  • Luring and shaping are essential



Acknowledgements

  • Members of the Synthetic Characters Group, past, present & future

  • Gary Wilkes

  • Funded by the Digital Life Consortium

