Learning Locomotion: Extreme Learning For Extreme Terrain
CMU: Chris Atkeson, Drew Bagnell, James Kuffner, Martin Stolle, Hanns Tappeiner, Nathan Ratliff, Joel Chestnutt, Michael Dille, Andrew Maas
CMU Robotics Institute



Test Recap
  • Test 1: Everything went well.
  • Test 2: Straight approach worked well.
    • Side approach: bad plan, bug.
  • Test 3: Dependence on power supply.
    • Big step up.

  • Test 4: Need to handle dog variations.
Hierarchical Approach
  • Footstep planner
  • Body and leg trajectory planner
  • Execution
Footstep Planner in Action


[Figure: cost map]


Global Footstep Path Planning
  • Use A* to plan a safe sequence of footsteps from the current robot configuration to the goal.
  • Try to stay as close to that plan as possible during the trial, and replan when:
    • The measured position has deviated from the planned path by more than a set amount.
    • The robot tips over while taking a step.
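
The two replanning conditions above can be sketched as a single check. This is a minimal illustration, not the team's code: the function name, the 2-D position representation, and the 3 cm threshold are all assumptions made for the example.

```python
import math


def should_replan(planned_xy, measured_xy, tipped_over, max_deviation=0.03):
    """Return True when the current footstep plan should be discarded.

    planned_xy / measured_xy: (x, y) positions in metres (hypothetical format).
    max_deviation: allowed drift from the plan; 0.03 m is an assumed value.
    """
    deviation = math.hypot(planned_xy[0] - measured_xy[0],
                           planned_xy[1] - measured_xy[1])
    return tipped_over or deviation > max_deviation
```

Keeping the trigger separate from the planner makes it easy to tune the deviation threshold without touching the search itself.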
A* Details
  • Cost for each foot location and orientation is pre-computed at startup (usually while the robot is calibrating).
  • Cost includes the angle of the terrain, the flatness of the terrain, the distance to any drop-offs, and a heuristic measure of whether the knee will hit any terrain at that foot position and orientation.
  • Heuristic is currently Euclidean distance to the goal.
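
As a rough sketch of this setup, the following runs A* over a precomputed cost grid with a Euclidean-distance heuristic. It is a simplification under stated assumptions: the real search is over foot locations and orientations, whereas here the state is a single grid cell, moves are 4-connected, and each move costs one unit plus the precomputed cost of the destination cell (so the Euclidean heuristic never overestimates). Cells marked `math.inf` are treated as unusable.

```python
import heapq
import math


def astar(cost, start, goal):
    """A* over a grid of precomputed, non-negative cell costs."""
    rows, cols = len(cost), len(cost[0])

    def heuristic(cell):
        # Euclidean distance to the goal, as on the slide.
        return math.hypot(cell[0] - goal[0], cell[1] - goal[1])

    frontier = [(heuristic(start), 0.0, start)]
    best_g = {start: 0.0}
    parent = {start: None}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if g > best_g[cell]:
            continue  # stale queue entry, a cheaper route was found later
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if cost[nr][nc] == math.inf:
                continue  # unsafe cell
            new_g = g + 1.0 + cost[nr][nc]
            if new_g < best_g.get(nxt, math.inf):
                best_g[nxt] = new_g
                parent[nxt] = cell
                heapq.heappush(frontier, (new_g + heuristic(nxt), new_g, nxt))
    return None  # goal unreachable
```

Because the cost grid is built once at startup, each search only reads from it, which matches the slide's point about precomputing costs during calibration.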
Action Model
  • A base foot location is based on the body position and orientation.
  • From that base location, a reference set of actions is applied.
  • 8 actions for each front foot, 1 action for each rear foot.
  • The front feet “lead” the robot, with the rear feet just following along.
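
One way to picture this action model is the sketch below. Every number in it is hypothetical (hip offsets, step displacements); only the structure comes from the slide: a base foot location derived from the body pose, then 8 reference actions for a front foot and 1 for a rear foot.

```python
import math

# Hypothetical base foot offsets in the body frame (metres): x forward, y left.
BASE_OFFSETS = {
    "front_left": (0.15, 0.08),
    "front_right": (0.15, -0.08),
    "rear_left": (-0.15, 0.08),
    "rear_right": (-0.15, -0.08),
}
# Hypothetical reference displacements: 8 for front feet, 1 for rear feet.
FRONT_ACTIONS = [(dx, dy) for dx in (0.02, 0.05, 0.08, 0.11) for dy in (-0.02, 0.02)]
REAR_ACTIONS = [(0.05, 0.0)]


def candidate_footholds(body_x, body_y, body_yaw, foot):
    """World-frame reference footholds for one foot, given the body pose."""
    off_x, off_y = BASE_OFFSETS[foot]
    actions = FRONT_ACTIONS if foot.startswith("front") else REAR_ACTIONS
    cos_yaw, sin_yaw = math.cos(body_yaw), math.sin(body_yaw)
    footholds = []
    for dx, dy in actions:
        local_x, local_y = off_x + dx, off_y + dy
        # Rotate the body-frame point by the yaw, then translate by the body position.
        footholds.append((body_x + cos_yaw * local_x - sin_yaw * local_y,
                          body_y + sin_yaw * local_x + cos_yaw * local_y))
    return footholds
```

Giving the rear feet a single action keeps the branching factor of the search low, which is consistent with the rear feet "just following along".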

[Figure: robot body seen from above, with the base foot location and the reference actions marked]

Adapting the Reference Actions

A local search is performed for each action to find a safe location that is near the reference action and still within the reachability of the robot.
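
A minimal sketch of such a local search follows, assuming a grid of foothold costs, a square search window around the reference cell, and a circular reach limit around the hip. The window size, reach, and safety threshold are illustrative values, and the tie-breaking rule (nearest first, then cheapest) is an assumption.

```python
import math


def adapt_action(cost, reference, hip, search_radius=2, max_reach=2.5, max_safe_cost=1.0):
    """Return the safe, reachable cell closest to the reference cell, or None.

    cost: 2-D grid of foothold costs; reference, hip: (row, col) cells.
    """
    rows, cols = len(cost), len(cost[0])
    ref_r, ref_c = reference
    best_cell, best_key = None, None
    for r in range(max(0, ref_r - search_radius), min(rows, ref_r + search_radius + 1)):
        for c in range(max(0, ref_c - search_radius), min(cols, ref_c + search_radius + 1)):
            if cost[r][c] > max_safe_cost:
                continue  # not a safe foothold
            if math.hypot(r - hip[0], c - hip[1]) > max_reach:
                continue  # outside the leg's reach
            # Prefer cells near the reference; break ties by lower cost.
            key = (math.hypot(r - ref_r, c - ref_c), cost[r][c])
            if best_key is None or key < best_key:
                best_cell, best_key = (r, c), key
    return best_cell
```

Returning `None` lets the planner drop an action entirely when no safe foothold exists near its reference location.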

[Figure: reference actions]

Foot and Body Trajectories
  • Foot trajectory based on convex hull of intervening terrain.
  • Body trajectory is newly created at each step, based on the next two steps in the path, and has two stages:
    • Move into triangle of support before lifting the foot.
    • Stay within the polygon of support and move forward while foot is in flight.
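
The convex-hull idea for the foot trajectory can be sketched as follows. This is an assumed reading of the slide: sample the terrain height along the line between lift-off and touchdown, take the upper convex hull of that profile, and raise it by a clearance. The function names, the profile format, and the clearance value are hypothetical.

```python
def upper_hull(profile):
    """Upper convex hull of (distance, height) samples, via the monotone chain method."""
    hull = []
    for point in sorted(profile):
        while len(hull) >= 2:
            (x1, z1), (x2, z2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (point[1] - z1) - (z2 - z1) * (point[0] - x1)
            if cross >= 0:
                hull.pop()  # middle point lies on or below the chord, so drop it
            else:
                break
        hull.append(point)
    return hull


def swing_waypoints(profile, clearance):
    """Foot waypoints: lift off, follow the raised hull, then touch down."""
    hull = upper_hull(profile)
    raised = [(x, z + clearance) for x, z in hull]
    return [hull[0]] + raised + [hull[-1]]
```

Following the hull rather than the raw terrain profile avoids dipping the foot into gaps between obstacles that it would only have to climb out of again.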
Foot Contact Detection
  • Foot sensor (not reliable).
  • Predicted foot below terrain.
  • Z velocity approximately zero and Z acceleration positive.
  • Compliance?
  • IMU signals? Not for us.
  • Motor torque?
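
Two of the cues listed above (predicted foot below terrain; Z velocity near zero with positive Z acceleration) can be combined into a simple test. The slide does not say how the cues are combined, so the logical OR below, the function name, and the velocity tolerance are assumptions for illustration only.

```python
def foot_in_contact(predicted_foot_z, terrain_z, z_velocity, z_acceleration,
                    velocity_tolerance=0.01):
    """Heuristic contact test built from two of the listed cues.

    velocity_tolerance (m/s) is an assumed value for "approximately zero".
    """
    below_terrain = predicted_foot_z <= terrain_z
    stopped_by_ground = abs(z_velocity) < velocity_tolerance and z_acceleration > 0.0
    return below_terrain or stopped_by_ground
```

Cues that the slide marks as unreliable or unused (foot sensor, IMU signals) are deliberately left out of this sketch.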

[Plot: IMU rx, ry. Blue = actual, red = desired.]



Test 3
  • Software was set for external power rather than battery.
  • Initial step up was higher than expected (initial board level).

Varying Speed

[Plot: front left hip ry. Blue = actual, red = desired.]

[Plot: front left hip ry. Blue = actual, red = desired. Second panel: blue = motor, red = Is_Saturated.]

Slow Speed:

[Plot: front left hip ry. Blue = actual, red = desired. Second panel: blue = motor, red = saturated?]

  • Manipulate clock (works for static walking)
  • Bang-bang-servo (allows dynamic locomotion).
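
Both options can be sketched in a few lines. The slide gives no implementation detail, so this is one possible reading: "manipulate clock" is interpreted as advancing the trajectory clock more slowly while a motor is saturated, and the bang-bang servo applies full command in the direction of the error. All names and parameter values are hypothetical.

```python
def advance_clock(t, dt, is_saturated, slowdown=0.25):
    """Advance the trajectory clock, more slowly while a motor is saturated (assumed reading)."""
    return t + (dt * slowdown if is_saturated else dt)


def bang_bang_command(desired, actual, max_command=1.0, deadband=0.0):
    """Full positive or negative command depending on the sign of the error."""
    error = desired - actual
    if abs(error) <= deadband:
        return 0.0
    return max_command if error > 0 else -max_command
```

Slowing the clock only stretches a precomputed motion in time, which fits static walking; the bang-bang command always uses the full motor authority, which is what dynamic motions need.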
Test 4: What we learned
  • Need to be robust to vehicle variation:
    • Fore/aft effective center of mass (tipping back)
    • Side/side effective center of mass (squatting)
    • Leg swing
Plans For Future Development
  • Learn To Make Better Plans
  • Learn To Plan From Observation
  • Memory-Based Policy Learning
  • Dynamic Locomotion
Planning: What can be learned?
  • Better primitives to plan with
  • Better robot/environment models
  • Planning parameters
  • Better models of robot capabilities
  • Better terrain and action cost functions
  • Better failure models and models of risk
  • Learn how to plan: bias to plan better and faster
  • How: Policy search, parameter optimization, …
Learn To Make Better Plans
  • It takes several days to manually tune a planner.
  • We will use policy search techniques to automate this tuning process.
  • The challenge is to do it efficiently.
Learn To Plan From Observation
  • Key issue: Do we learn cost functions or value functions?
Learn Cost Functions: Maximum Margin Planning (MMP) Algorithm
  • Assumption: cost function is formed as a linear combination of feature maps
  • Training examples: Run current planner through a number of terrains and take resulting body trajectories as example paths
Linear combination of features

[Figure: example feature maps: tree detector, open space, smoothed trees]






MMP Algorithm

Until convergence do

  • Compute cost maps as linear combination of features
  • Technical step: slightly increase the cost of cells you want the planner to plan through
    • Makes it more difficult for the planner to be right
    • Train planner on harder problems to ensure good test performance
  • Plan through these cost maps using D*
  • Update based on mistakes:
    • If planned path doesn't match example then
      • Raise cost of features found along planned path
      • Lower cost of features found along example path
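
The loop above can be illustrated on a toy grid. This is a sketch under stated assumptions, not the MMP implementation: the planner here is a small dynamic program over right/down moves standing in for D*, the feature maps (`rock`, `mud`), the example path, the margin, and the step size are all made up, and the update is a plain fixed-step correction.

```python
def cost_map(weights, features):
    """Cell costs as a linear combination of feature maps."""
    rows, cols = len(features[0]), len(features[0][0])
    return [[sum(w * f[r][c] for w, f in zip(weights, features))
             for c in range(cols)] for r in range(rows)]


def plan(cost):
    """Min-cost path from top-left to bottom-right using right/down moves (stand-in for D*)."""
    rows, cols = len(cost), len(cost[0])
    best = [[float("inf")] * cols for _ in range(rows)]
    prev = [[None] * cols for _ in range(rows)]
    best[0][0] = cost[0][0]
    for r in range(rows):
        for c in range(cols):
            for pr, pc in ((r - 1, c), (r, c - 1)):
                if pr >= 0 and pc >= 0 and best[pr][pc] + cost[r][c] < best[r][c]:
                    best[r][c] = best[pr][pc] + cost[r][c]
                    prev[r][c] = (pr, pc)
    path, cell = [], (rows - 1, cols - 1)
    while cell is not None:
        path.append(cell)
        cell = prev[cell[0]][cell[1]]
    return path[::-1]


def feature_counts(path, features):
    """Total of each feature along a path."""
    return [sum(f[r][c] for r, c in path) for f in features]


def train(features, example, margin=1, step=1, max_iters=50):
    """Adjust feature weights until the loss-augmented plan matches the example."""
    weights = [0] * len(features)
    on_example = set(example)
    for _ in range(max_iters):
        cost = cost_map(weights, features)
        # Loss augmentation: make the example's own cells slightly more expensive.
        augmented = [[cost[r][c] + (margin if (r, c) in on_example else 0)
                      for c in range(len(cost[0]))] for r in range(len(cost))]
        planned = plan(augmented)
        if planned == example:
            break
        planned_f = feature_counts(planned, features)
        example_f = feature_counts(example, features)
        # Raise weights of features on the planned path, lower those on the example.
        weights = [w + step * (p - e) for w, p, e in zip(weights, planned_f, example_f)]
    return weights


# Hypothetical feature maps and demonstration path on a 3x3 grid.
rock = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
mud = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
example = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
weights = train([rock, mud], example)
```

After training, planning on the unaugmented cost map reproduces the example path, because the planner had to beat the margin during training.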
MMP Algorithm Properties
  • Algorithm equivalent to a convex optimization problem => no local minima
  • Rapid (linear) convergence
  • Maximum margin interpretation (via the "loss-augmentation" step 2)
  • Machine learning theoretic guarantees in online and offline settings
  • Can use boosting to solve feature selection
Why Values?
  • Captures the entire cost-to-go: follow the value-function with just one-step look ahead for optimality (no planning necessary)
  • Learnable in principle: use regression techniques to approximate cost-to-go
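
The "one-step look-ahead" point can be made concrete with a small sketch: given a table of cost-to-go values, the robot repeatedly moves to the neighbour that minimizes step cost plus value, with no search. The grid, the unit step cost, and the convention that the goal has value 0 are assumptions for the example.

```python
def follow_values(value, start, step_cost=1, max_steps=100):
    """Greedy one-step look-ahead on a cost-to-go table; the goal cell has value 0."""
    rows, cols = len(value), len(value[0])
    path, cell = [start], start
    for _ in range(max_steps):
        if value[cell[0]][cell[1]] == 0:
            break  # reached the goal
        r, c = cell
        neighbours = [(nr, nc) for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                      if 0 <= nr < rows and 0 <= nc < cols]
        # Choose the move with the lowest step cost plus remaining cost-to-go.
        cell = min(neighbours, key=lambda n: step_cost + value[n[0]][n[1]])
        path.append(cell)
    return path
```

The policy is only as good as the table: with an exact cost-to-go it is optimal, but errors in a learned approximation translate directly into bad steps.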
Why Plans?
  • In practice, very hard to learn useful value-functions
    • High dimensional: curse of dimensionality
    • Value features don’t generalize well to new environments
    • Hard to build in useful domain knowledge
  • Instead, can take planning approach
    • Lots of domain knowledge
    • Costs *do* generalize
    • But:
      • Computationally hard: the curse of dimensionality strikes back
Hybrid Algorithm

  • A new extension of Maximum Margin Planning to do structured regression: predict values with a planner in the loop

[Figure labels: space of values (high dimensional); linear combination; “value” features; learned space of costs]

Proto Results
  • Demonstrated an earlier algorithm (MMPBoost) on learning a heuristic
    • Provided orders of magnitude faster planning
  • Adapting now to the higher dimensional space of footsteps instead of heuristic
    • Hope to bridge the gap: reinforcement learning/value-function approximation with the key benefits of planning and cost-functions
Memory-Based Policy Learning
  • 1. Remember plans (all have same goal).
  • 2. Remember refined plans.
  • 3. Remember plans (many goals) – need planner to create policy.
  • 4. Remember experiences – need planner to create policy.
  • We are currently investigating option 1. We will explore options 2, 3, and 4.

Plan Libraries

  • Combine local plans to create global policy.
  • Local planners: decoupled planner, A*, RRT, DDP, DIRCOL.
  • Remember refined plans, experiences
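
A plan library used as a policy can be sketched as a nearest-neighbour lookup: store the (state, action) pairs from remembered plans, and at run time return the action of the closest stored state. The class name, the tuple state format, and the squared Euclidean distance are assumptions; the slides do not specify the lookup.

```python
class TrajectoryLibrary:
    """Stores (state, action) pairs from plans and answers queries by nearest state."""

    def __init__(self):
        self.entries = []

    def add_plan(self, states, actions):
        """Remember one plan as a sequence of states and the action taken in each."""
        self.entries.extend(zip(states, actions))

    def policy(self, state):
        """Return the action stored with the nearest remembered state."""
        def squared_distance(entry):
            return sum((a - b) ** 2 for a, b in zip(entry[0], state))
        _, action = min(self.entries, key=squared_distance)
        return action
```

For example, after adding a plan with states `[(0, 0), (1, 0), (2, 0)]` and actions `["forward", "forward", "stop"]`, a query near `(2, 0)` returns `"stop"`. Combining several local plans this way yields a global policy without re-running the planner at every step.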
Forward Planning To Generate Trajectory Library

[Figure panels: trajectory library; single A* search]

Tasks We Have Trouble On
  • Not slipping / maintaining footing
  • More terrain tilt / rock climbing footholds.
  • Big step ups and step downs.
  • Dynamic maneuvers (jump over ditch).
  • Dynamic locomotion (trot, pace, bound).
Future Tests

1) Longer and/or wider course with more choices.
    • Could be done with a test facility with a more extensive mocap system.
    • Could be done by using onboard vision to detect marked obstacles.

2) More trials per test (10?) so we can demonstrate learning during the test. Score only the 3(?) best. Cut testing off after an hour with whatever trials have been performed.

3) New terrain boards with harder obstacles, and/or obstacles that require dynamic moves. Big step ups and step downs.

4) Put surprises in the terrain (terrain file errors), such as terrain a little higher or lower than expected. Test the quality of the control systems.

5) Revisit the evaluation function: should falls be penalized more? How much does speed matter over robustness? It seems that failing fast is currently a winning strategy.

6) One concern is that our movement strategies do not compete with RHex-like approaches, i.e., clever open-loop robustness. We need to demonstrate that a super "cognitive" dog that is ALWAYS in control is possible. Need to think more about how to do this.

7) Trotting/pacing/bounding on rough terrain would really push our ability to control the dog. It is not clear how a test would encourage that other than just mandating the gait to be used.

8) Simulate perception: provide a point cloud at a fixed radius on each tick, perhaps with distance-weighted random noise.
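
The simulated-perception idea in item 8 could look roughly like this. It is only a sketch of the proposal: the terrain format, the sensing radius, and the noise model (Gaussian height noise whose standard deviation grows linearly with distance) are assumptions.

```python
import math
import random


def simulate_perception(terrain_points, robot_xy, radius=0.5, noise_per_metre=0.01, rng=None):
    """Return the terrain points within `radius` of the robot, with noisy heights.

    terrain_points: iterable of (x, y, z) in metres (hypothetical format).
    noise_per_metre: assumed growth rate of the height noise with distance.
    """
    rng = rng or random.Random()
    cloud = []
    for x, y, z in terrain_points:
        distance = math.hypot(x - robot_xy[0], y - robot_xy[1])
        if distance <= radius:
            sigma = noise_per_metre * distance  # farther points are noisier
            cloud.append((x, y, z + rng.gauss(0.0, sigma)))
    return cloud
```

Calling this once per control tick with the current robot position would give the planner only local, imperfect terrain knowledge instead of the full terrain file.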
What to learn?
  • Plan better.
  • Plan faster.
  • Robustness.
  • Special cases.
  • Utilize vehicle dynamics.