
Test Recap

Learning Locomotion: Extreme Learning For Extreme Terrain. CMU: Chris Atkeson, Drew Bagnell, James Kuffner, Martin Stolle, Hanns Tappeiner, Nathan Ratliff, Joel Chestnutt, Michael Dille, Andrew Maas. CMU Robotics Institute.


Presentation Transcript


  1. Learning Locomotion: Extreme Learning For Extreme Terrain. CMU: Chris Atkeson, Drew Bagnell, James Kuffner, Martin Stolle, Hanns Tappeiner, Nathan Ratliff, Joel Chestnutt, Michael Dille, Andrew Maas. CMU Robotics Institute

  2. Test Recap • Test 1: Everything went well. • Test 2: Straight approach worked well. Side approach: bad plan, bug. • Test 3: Dependence on power supply. Big step up. • Test 4: Need to handle dog variations.

  3. Test 0: Establish Learning Context

  4. Test 1: 5 trials (x10 speedup)

  5. Reinforcement Learning: Reward

  6. Hierarchical Approach • Footstep planner • Body and leg trajectory planner • Execution

  7. Footstep Planner in Action (figure panels: Terrain, Cost map, Footstep plan)

  8. Global Footstep Path Planning • Use A* to plan a safe sequence of footsteps from the current robot configuration to the goal. • Try to stay as close to that plan as possible during the trial; replan when: • we measure that we have deviated from the planned path by a certain amount, or • we tip over while taking a step.

  9. A* Details • Cost for each foot location and orientation is pre-computed at startup (usually while the robot is calibrating). • Cost includes angle of terrain, flatness of terrain, distance to any drop-offs, and a heuristic measure of whether the knee will hit any terrain at that position and orientation. • The A* heuristic is currently Euclidean distance to the goal.
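The search described on slides 8 and 9 can be illustrated with a minimal sketch. It is not the team's planner: the real search runs over foot locations and orientations, while this version assumes a plain 2D grid of precomputed cell costs, 4-connected moves, and a unit step cost. The function name `astar` and the use of `None` for unsafe cells are hypothetical.

```python
import heapq
import math


def astar(cost, start, goal):
    """A* over a 2D grid of precomputed, non-negative cell costs.

    Assumption: `cost[r][c]` is None for unsafe cells; moving into a cell
    costs 1 (step length) plus that cell's precomputed cost.
    """
    rows, cols = len(cost), len(cost[0])

    def h(cell):
        # Euclidean distance to the goal, the heuristic named on the slide.
        return math.hypot(cell[0] - goal[0], cell[1] - goal[1])

    open_heap = [(h(start), 0.0, start)]
    best_g = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > best_g[cell]:
            continue  # stale heap entry, a cheaper route was found already
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if cost[nr][nc] is None:
                continue  # unsafe cell
            ng = g + 1.0 + cost[nr][nc]
            if ng < best_g.get((nr, nc), math.inf):
                best_g[(nr, nc)] = ng
                parent[(nr, nc)] = cell
                heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None  # no safe path


grid = [
    [0, 0, 0],
    [None, None, 0],
    [0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
```

Because every move costs at least 1, the Euclidean distance never overestimates the remaining cost in this sketch, so the returned path is a cheapest one.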

  10. Action Model • A base foot location is based on the body position and orientation. • From that base location, a reference set of actions is applied. • 8 actions for each front foot, 1 action for each rear foot. • The front feet “lead” the robot, with the rear feet just following along. (Figure labels: Robot Body (from above), Base Foot Location, Reference Actions)

  11. Adapting the Reference Actions • A local search is performed for each action to find a safe location that is near the reference action and still within the reachability of the robot. (Figure label: Reference Actions)
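The local search on slide 11 might look roughly like the sketch below. The slides do not say how the search is carried out, so everything here is an assumption: a square grid of candidate points around the reference action, a circular reach limit around the base foot location, and a caller-supplied `is_safe` predicate. All names are hypothetical.

```python
import math


def adapt_action(reference, base, is_safe, max_reach, search_radius, step=0.01):
    """Return the safe, reachable point closest to `reference`, or None.

    Assumptions: 2D points in metres, reachability modelled as a circle of
    radius `max_reach` around `base`, candidates sampled on a `step` grid.
    """
    n = int(round(search_radius / step))
    best, best_d = None, math.inf
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            cand = (reference[0] + i * step, reference[1] + j * step)
            d = math.hypot(cand[0] - reference[0], cand[1] - reference[1])
            if d > search_radius or d >= best_d:
                continue  # outside the search disc, or no better than best
            reach = math.hypot(cand[0] - base[0], cand[1] - base[1])
            if reach <= max_reach and is_safe(cand):
                best, best_d = cand, d
    return best


# Hypothetical terrain: everything with x below 0.105 m is unsafe.
adapted = adapt_action((0.10, 0.0), (0.0, 0.0), lambda p: p[0] >= 0.105,
                       max_reach=0.2, search_radius=0.03)
```

If the reference action is already safe and reachable, the search returns it unchanged; if nothing in the search disc qualifies, it returns `None` and the action would be dropped.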

  12. Decreasing Safety Margins (figure panels: 2 cm, 0 cm)

  13. Effect on Paths (figure panels: 2 cm, 0 cm)

  14. Foot and Body Trajectories • Foot trajectory based on convex hull of intervening terrain. • Body trajectory is newly created at each step, based on the next two steps in the path, and has two stages: • Move into triangle of support before lifting the foot. • Stay within the polygon of support and move forward while foot is in flight.
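One way to read "foot trajectory based on convex hull of intervening terrain" is sketched below: take the terrain height profile between lift-off and touchdown, compute its upper convex hull, and raise the hull by a clearance. The clearance value, the vertical lift and lower segments, and the function names are assumptions for illustration, not details from the slides.

```python
def upper_hull(profile):
    """Upper convex hull of (x, z) terrain samples, ordered left to right."""
    hull = []
    for p in sorted(profile):
        # Drop the last hull point while it lies on or below the chord
        # from the point before it to the new point p.
        while len(hull) >= 2:
            (x1, z1), (x2, z2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - z1) - (z2 - z1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def swing_waypoints(profile, clearance=0.02):
    """Lift straight up, follow the raised hull, then lower onto the target."""
    hull = upper_hull(profile)
    raised = [(x, z + clearance) for x, z in hull]
    return [hull[0]] + raised + [hull[-1]]


terrain = [(0, 0), (1, 0.5), (2, 3), (3, 1), (4, 0)]
hull = upper_hull(terrain)
waypoints = swing_waypoints(terrain, clearance=1.0)
```

Following the hull rather than the raw profile keeps the foot above every intervening terrain sample while avoiding needless dips between obstacles.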

  15. Interface

  16. Foot Contact Detection • Foot sensor (not reliable). • Predicted foot below terrain. • Z velocity approximately zero and Z acceleration positive. • Compliance? • IMU signals? Not for us. • Motor torque?
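Two of the cues on slide 16 are easy to state as code. The sketch below combines "predicted foot below terrain" with "Z velocity approximately zero and Z acceleration positive". How the team actually combined these cues is not stated; the logical OR, the velocity threshold, and the function name are assumptions.

```python
def foot_in_contact(predicted_z, terrain_z, z_vel, z_acc, vel_eps=0.005):
    """Heuristic contact test from kinematic cues only (no foot sensor).

    Assumptions: heights in metres, velocity in m/s, `vel_eps` chosen
    arbitrarily for illustration.
    """
    # Cue 1: the commanded foot position is at or below the terrain model.
    below_terrain = predicted_z <= terrain_z
    # Cue 2: the foot has stopped descending and is being pushed back up.
    stopped = abs(z_vel) < vel_eps and z_acc > 0.0
    return below_terrain or stopped
```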

  17. Test 2: 5 trials (x10 speedup)

  18. Why did we fail?

  19. Plots: IMU rx, ry; fl_rx, fr_rx, hr_rx (blue = actual, red = desired)

  20. Why did we fail?

  21. Reinforcement Learning: Punishment

  22. Test 3 • Software was set for external power rather than battery. • Initial step up was higher than expected (initial board level).

  23. Varying Speed: Front left hip ry (blue = actual, red = desired)

  24. Saturation: Front left hip ry (upper plot: blue = actual, red = desired; lower plot: blue = motor, red = is_saturated)

  25. Slow Speed: Front left hip ry (upper plot: blue = actual, red = desired; lower plot: blue = motor, red = saturated?)

  26. Fixes • Manipulate clock (works for static walking) • Bang-bang-servo (allows dynamic locomotion).

  27. Power Supply Axed To Avoid Further Errors (Secondary Reinforcer For Dog)

  28. Test 4

  29. Test 4

  30. Test 4: What we learned • Need to be robust to vehicle variation: • Fore/aft effective center of mass (tipping back) • Side/side effective center of mass (squatting) • Leg swing

  31. Plans For Future Development • Learn To Make Better Plans • Learn To Plan From Observation • Memory-Based Policy Learning • Dynamic Locomotion

  32. Planning: What can be learned? • Better primitives to plan with • Better robot/environment models • Planning parameters • Better models of robot capabilities • Better terrain and action cost functions • Better failure models and models of risk • Learn how to plan: bias to plan better and faster • How: Policy search, parameter optimization, …

  33. Learn To Make Better Plans • It takes several days to manually tune a planner. • We will use policy search techniques to automate this tuning process. • The challenge is to do it efficiently.

  34. Learn To Plan From Observation • Key issue: Do we learn cost functions or value functions?

  35. Learn Cost Functions: Maximum Margin Planning (MMP) Algorithm • Assumption: cost function is formed as a linear combination of feature maps • Training examples: Run current planner through a number of terrains and take resulting body trajectories as example paths

  36. Linear combination of features tree detector open space smoothed trees slope w1 w4 w2 w3

  37. MMP Algorithm Until convergence do • Compute cost maps as linear combination of features • Technical step: slightly increase the cost of cells you want the planner to plan through • Makes it more difficult for the planner to be right • Train planner on harder problems to ensure good test performance • Plan through these cost maps using D* • Update based on mistakes: • If planned path doesn't match example then • Raise cost of features found along planned path • Lower cost of features found along example path
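The loop on slide 37 can be sketched on a toy grid. This is a simplified illustration, not the authors' implementation: Dijkstra stands in for D*, weights are clamped at a small positive value to keep costs non-negative, and the step size, margin, and all function names are assumptions.

```python
import heapq


def plan(cost, start, goal):
    """Dijkstra on a 4-connected grid (stand-in for the slide's D*)."""
    rows, cols = len(cost), len(cost[0])
    dist, parent = {start: 0.0}, {start: None}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    path, cell = [], goal
    while cell is not None:
        path.append(cell)
        cell = parent[cell]
    return path[::-1]


def cost_map(w, features, example=(), margin=0.0):
    """Linear combination of feature maps, plus a margin on example cells."""
    rows, cols = len(features[0]), len(features[0][0])
    on_example = set(example)
    return [[sum(wk * f[r][c] for wk, f in zip(w, features))
             + (margin if (r, c) in on_example else 0.0)
             for c in range(cols)] for r in range(rows)]


def path_features(features, path):
    """Total amount of each feature collected along a path."""
    return [sum(f[r][c] for r, c in path) for f in features]


def mmp(features, example, start, goal, iters=50, lr=0.1, margin=0.1):
    w = [1.0] * len(features)
    for _ in range(iters):
        # Loss augmentation: make the example path slightly more expensive.
        planned = plan(cost_map(w, features, example, margin), start, goal)
        if planned == example:
            break
        fp = path_features(features, planned)
        fe = path_features(features, example)
        # Raise weights of features on the planned path, lower those on
        # the example path; clamp to keep every weight positive.
        w = [max(0.01, wk + lr * (p - e)) for wk, p, e in zip(w, fp, fe)]
    return w


ones = [[1, 1, 1], [1, 1, 1]]          # constant feature
rough = [[0, 0, 0], [0, 1, 0]]         # hypothetical "rough terrain" feature
demo = [(1, 0), (0, 0), (0, 1), (0, 2), (1, 2)]  # example detours around it
w = mmp([ones, rough], demo, (1, 0), (1, 2))
```

In this toy case the example path avoids the one rough cell, so training drives the rough-terrain weight up until the planner reproduces the detour even with the margin applied.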

  38. MMP Algorithm Properties • Algorithm equivalent to a convex optimization problem => no local minima • Rapid (linear) convergence • Maximum margin interpretation (via the "loss-augmentation" step 2) • Machine learning theoretic guarantees in online and offline settings • Can use boosting to solve feature selection

  39. Learned Cost Maps

  40. Key Issue: Two Approaches to Control • Learn the value function • Build a planner

  41. Why Values? • Captures the entire cost-to-go: follow the value-function with just one-step look ahead for optimality (no planning necessary) • Learnable in principle: use regression techniques to approximate cost-to-go
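The "one-step look ahead" claim on slide 41 can be made concrete with a small sketch. It assumes a known successor function and an exact cost-to-go; the 3x3 obstacle-free grid, where Manhattan distance is the exact cost-to-go, and all names are hypothetical.

```python
def greedy_step(state, successors, value):
    """Choose the successor minimizing step cost plus cost-to-go."""
    next_state, _ = min(successors(state), key=lambda sc: sc[1] + value(sc[0]))
    return next_state


def rollout(start, goal, successors, value, max_steps=100):
    """Follow the value function greedily; no search beyond one step."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        path.append(greedy_step(path[-1], successors, value))
    return path


GOAL = (2, 2)


def grid_successors(state):
    # 4-connected moves with unit cost on a 3x3 obstacle-free grid.
    r, c = state
    moves = ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
    return [((nr, nc), 1.0) for nr, nc in moves if 0 <= nr < 3 and 0 <= nc < 3]


def manhattan_value(state):
    # Exact cost-to-go for this grid, so greedy descent is optimal here.
    return abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])


path = rollout((0, 0), GOAL, grid_successors, manhattan_value)
```

With an exact value function the greedy rollout is optimal; with a learned approximation the same loop still runs, but optimality is no longer guaranteed, which is the difficulty the next slide raises.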

  42. Why Plans? • In practice, very hard to learn useful value-functions • High dimensional: curse of dimensionality • Value features don’t generalize well to new environments • Hard to build in useful domain knowledge • Instead, can take planning approach • Lots of domain knowledge • Costs *do* generalize • But: • computationally hard: the curse of dimensionality strikes back

  43. Hybrid Algorithm • A new extension of Maximum Margin Planning to do structured regression: predict values with a planner in the loop. (Diagram labels: Space of values: high dimensional; “Value” Features; Learned linear combination; Learned space of costs; Planner)

  44. Proto-results • Demonstrated an earlier algorithm (MMPBoost) on learning a heuristic • Provided orders of magnitude faster planning • Adapting now to the higher dimensional space of footsteps instead of heuristic • Hope to bridge the gap: reinforcement learning/value-function approximation with the key benefits of planning and cost-functions

  45. Memory-Based Policy Learning • 1. Remember plans (all have same goal). • 2. Remember refined plans. • 3. Remember plans (many goals) – need planner to create policy. • 4. Remember experiences – need planner to create policy. • We are currently investigating option 1. We will explore options 2, 3, and 4.

  46. Plan Libraries • Combine local plans to create global policy. • Local planners: decoupled planner, A*, RRT, DDP, DIRCOL. • Remember refined plans, experiences
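Option 1 from slide 45 (remember plans that share a goal) can be turned into a policy by a nearest-neighbor lookup, sketched below. The slides do not describe the lookup, so the Euclidean metric over states, the flat list of (state, action) pairs, and the class name are assumptions.

```python
import math


class PlanLibrary:
    """Stores (state, action) pairs from remembered plans with a shared goal."""

    def __init__(self):
        self.entries = []

    def add_plan(self, states, actions):
        # One action per visited state, in plan order.
        self.entries.extend(zip(states, actions))

    def policy(self, state):
        # Act as the nearest remembered state did (Euclidean distance).
        nearest = min(self.entries, key=lambda e: math.dist(e[0], state))
        return nearest[1]


library = PlanLibrary()
library.add_plan([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], ["east", "east", "north"])
action = library.policy((1.9, 0.2))
```

A linear scan is enough for a sketch; a library of realistic size would more likely use a spatial index such as a k-d tree.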

  47. Forward Planning To Generate Trajectory Library (figure panels: Trajectory Library, Single A* search)

  48. A Plan Library For Little Dog

  49. Remembering Refined Plans (figure labels: Commands, Errors; Before, After)

  50. Future Tests
