


Learning From Demonstration: Atkeson and Schaal

Dang, RLAB

Feb 28th, 2007



Goal

  • Robot Learning from Demonstration

    • Small number of human demonstrations

    • Task-level learning (learn intent, not just mimicry)

  • Explore

    • Parametric vs. nonparametric learning

    • Role of a priori knowledge


Known Task

  • Pendulum swing-up task

    • Like pole balancing, but more complex

    • Difficult, but easy to evaluate success

  • Simplified

    • Restricted to horizontal motion

    • Important variables picked out

      • Pendulum angle

      • Pendulum angular velocity

      • Hand location

      • Hand velocity

      • Hand acceleration


Implementation details

  • SARCOS 7-DOF arm

  • Stereo vision with colored ball markers

  • 0.12 s delay compensated with a Kalman filter

    • Uses idealized pendulum dynamics

  • Redundant inverse kinematics and real-time inverse dynamics for control
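
The delay compensation above can be illustrated with a minimal sketch of the prediction step only, not a full Kalman filter: the last measured pendulum state is integrated forward by 0.12 s using idealized dynamics. The point-mass, frictionless model, the angle convention (measured from hanging straight down), the constant hand acceleration, and the names `pendulum_accel`, `predict_ahead`, and `length` are all assumptions made for illustration; the slides do not give the model actually used.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def pendulum_accel(theta, hand_accel, length):
    """Angular acceleration of an idealized point-mass pendulum whose pivot
    (the hand) accelerates horizontally. theta is measured from hanging
    straight down; friction is ignored (assumption)."""
    return -(G / length) * math.sin(theta) - (hand_accel / length) * math.cos(theta)


def predict_ahead(theta, omega, hand_accel, length, delay=0.12, dt=0.001):
    """Integrate the idealized dynamics forward by `delay` seconds with
    semi-implicit Euler steps, assuming constant hand acceleration."""
    steps = round(delay / dt)
    for _ in range(steps):
        omega += pendulum_accel(theta, hand_accel, length) * dt
        theta += omega * dt
    return theta, omega
```

For example, `predict_ahead(0.1, 0.0, 0.0, 0.5)` estimates where a 0.5 m pendulum released at 0.1 rad will be once the delayed measurement arrives. A real filter would also blend this prediction with new measurements according to their uncertainty.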


Learning

  • Task composed of two subtasks

    • Subtask learning is believed to accelerate learning of new tasks

  • 1. Pole swing-up

    • Open-loop

  • 2. Upright balance

    • Feedback

  • Focus here on swing-up

    • Balancing already learned


    First approach

    • Directly mimic human hand movement

      • Fails

        • Differences in human and robot capabilities

        • Improper demonstration (not horizontal)

        • Imprecise mimicry


    Second approach

    • Learn reward

    • Learn a model

    • Use the human demonstration as a seed so that a planner can find a good policy


    Learn Task Model

    • Parametric

      • Learn parameters via linear regression

    • Nonparametric

      • Use locally weighted learning

      • Given a desired output variable and a set of possibly relevant input variables

        • Cross-validation tunes the meta-parameters
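
The contrast between the two model types can be sketched in one dimension. The parametric model fits a single line to all data; the locally weighted model refits a line around each query, weighting nearby points more heavily. The Gaussian kernel, the `bandwidth` meta-parameter, the leave-one-out scheme, and all function names are illustrative assumptions rather than the authors' implementation, and the real task uses several input variables.

```python
import math


def weighted_linear_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def parametric_predict(xs, ys, query):
    """Global linear regression: one set of parameters for every query."""
    a, b = weighted_linear_fit(xs, ys, [1.0] * len(xs))
    return a + b * query


def lwr_predict(xs, ys, query, bandwidth):
    """Locally weighted regression: fit a line around the query point,
    with Gaussian weights that decay with distance from the query."""
    ws = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    a, b = weighted_linear_fit(xs, ys, ws)
    return a + b * query


def loo_error(xs, ys, bandwidth):
    """Mean squared leave-one-out prediction error for one bandwidth."""
    err = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        err += (lwr_predict(rest_x, rest_y, xs[i], bandwidth) - ys[i]) ** 2
    return err / len(xs)
```

A bandwidth could then be chosen with something like `min(candidates, key=lambda h: loo_error(xs, ys, h))`, which is one simple way to tune a meta-parameter by cross-validation.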


    Swing up

    • Transition to balancing occurs when the pendulum is within ±0.5 rad of upright with angular velocity below 3 rad/s

    • Reward function is set so that the robot is rewarded for behaving like the demonstrator
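
As a rough sketch of the two ideas on this slide, the hand-over test uses the thresholds stated above, and the reward penalizes deviation from the demonstrated trajectory. Treating the velocity limit as a bound on magnitude, the quadratic reward form, the choice of which variables to compare, and both function names are assumptions for illustration; the slides do not give the actual reward function.

```python
def ready_to_balance(angle_from_upright, angular_velocity,
                     max_angle=0.5, max_speed=3.0):
    """True when swing-up may hand over to the balance controller:
    within ±0.5 rad of upright and slower than 3 rad/s (slide values)."""
    return (abs(angle_from_upright) <= max_angle
            and abs(angular_velocity) < max_speed)


def imitation_reward(robot_traj, demo_traj):
    """Negative summed squared deviation from the demonstration.
    Each trajectory is a list of (pendulum_angle, hand_position) samples
    taken at the same time steps (illustrative assumption)."""
    return -sum((ra - da) ** 2 + (rx - dx) ** 2
                for (ra, rx), (da, dx) in zip(robot_traj, demo_traj))
```

With this form the reward is highest (zero) when the robot reproduces the demonstration exactly, so a planner maximizing it is pulled toward the demonstrated motion.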


    Parametric

    • Parameters learned from failure data

    • Trajectory optimized using human trajectory as seed

    • SUCCESS


    Nonparametric

    • Slower, but still successful


    Harder Task

    • Double-pump swing-up

      • Previous approach fails

        • Believed to be due to improper modeling of the system

        • Solved by direct task-level learning


    Direct task-level learning

    • Learn a correction term to add to the target angle

      • Target becomes ±(0.5 + Δ) rad

      • Find Δ by binary search

    • Worked for the parametric model

    • Did not work for the nonparametric model

      • Left the region of validity of the local models

      • So, adjust the velocity over the whole trajectory instead

        • Binary search for the coefficient c
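
The binary search over the correction term can be sketched as a bisection driven by the outcome of each trial. Here `run_trial` is a hypothetical callback standing in for one planned and executed swing-up; the signed-error convention, the search bounds, the tolerance, and the assumption that the error grows monotonically with Δ are all illustrative and not taken from the slides.

```python
def search_correction(run_trial, lo=-0.5, hi=0.5, tol=1e-3, max_trials=20):
    """Bisect on a correction term `delta` added to the target angle.

    `run_trial(delta)` is a hypothetical callback that executes one swing-up
    planned for the target ±(0.5 + delta) rad and returns a signed error:
    negative if the pendulum fell short of upright, positive if it overshot.
    Assumes the error increases monotonically with delta on [lo, hi].
    """
    for _ in range(max_trials):
        delta = 0.5 * (lo + hi)
        error = run_trial(delta)
        if abs(error) <= tol:
            return delta
        if error < 0:
            lo = delta  # fell short: search larger corrections
        else:
            hi = delta  # overshot: search smaller corrections
    return 0.5 * (lo + hi)
```

The same loop could search for the velocity coefficient c by letting `run_trial` interpret its argument as that coefficient. Bisection keeps the number of physical trials small, since each trial halves the remaining interval.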


    Results


    Summary of Technique

    Math                                   Succeeds for

    Watch demo, mimic hand                 None

    Learn model, optimize demo trajectory  Parametric, single

    Tune model, reoptimize                 Nonparametric, single

    Binary search for delta                Parametric, double

    Binary search for c                    Nonparametric, double


    Discussion points

    • Was the reward function given or learned?

    • Does task-level direct learning make sense?

      • Is it only useful for this task or implementation?

      • Analogous to the I (integral) term in PID?

    • Nonparametric models don't avoid all modeling errors

      • Poor planner?

      • Not enough data?

    • A priori knowledge

      • Human selects inputs, outputs, control system, perception, model selection, reward function, task segmentation, task factors

    • It works!
