
Autonomous Helicopter Flight Via Reinforcement Learning



  1. Autonomous Helicopter Flight Via Reinforcement Learning Yanjie Li Harbin Institute of Technology Shenzhen Graduate School

  2. Outline • Introduction • Autonomous Helicopter • Model Identification • Reinforcement Learning (RL) • Learning to Hover • Flying Competition Maneuvers

  3. Introduction • Helicopter: a challenging control problem • High dimensional • Asymmetric • Nonlinear • A successful application of RL

  4. Yamaha R-50

  5. Autonomous Helicopter • Helicopter: Yamaha R-50 (3.6 m long, ~20 kg payload) • Inertial Navigation System (INS): 3 accelerometers and 3 gyroscopes • Differential GPS (2 cm resolution) • An onboard navigation computer • Kalman filter fuses the GPS, INS, and digital compass readings into a state estimate • Control inputs (4-dimensional): longitudinal and latitudinal cyclic pitch, tail rotor (rudder) pitch, main rotor collective pitch

  6. Model Identification • Preparation: • Have a human pilot fly the helicopter for several minutes (339 s of data for model identification, 140 s for testing) • Record the 12-dimensional state and the 4-dimensional control inputs

  7. Model Identification • Symmetries: the dynamics do not depend on spatial position or heading • So identify the model in body coordinates rather than spatial (world) coordinates • This reduces the complexity of the model to be learned

  8. Weighted Linear Regression • The one-step dynamics are fit by locally weighted linear regression • The smoothing (bandwidth) parameter is chosen by cross validation • Gaussian weight function: w_i = exp(−‖x − x_i‖² / (2σ²))
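
As a minimal sketch of the weighted-regression idea on this slide, the Python below fits a locally weighted linear model around a query point using the Gaussian weight function above. The data, bandwidth, and function names are illustrative assumptions, not the paper's actual identification code.

```python
# Locally weighted linear regression for one-step prediction (sketch).
import numpy as np

def lwlr_predict(X, Y, x_query, sigma):
    """Predict Y at x_query with a linear model whose training points
    are weighted by a Gaussian kernel centered at x_query."""
    d = X - x_query                                      # offsets from the query
    w = np.exp(-np.sum(d**2, axis=1) / (2 * sigma**2))   # Gaussian weights
    Xb = np.hstack([X, np.ones((len(X), 1))])            # add intercept column
    W = np.diag(w)
    # Weighted least squares: theta = (X^T W X)^(-1) X^T W y
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ Y)
    return np.append(x_query, 1.0) @ theta

# Toy usage: learn a noisy linear map and predict at a new point.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))             # e.g., a body-coordinate state
Y = X @ np.array([0.5, -1.0, 0.2]) + 0.01 * rng.normal(size=200)
print(lwlr_predict(X, Y, np.zeros(3), sigma=1.0))
```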

  9. Several refinements • Many terms of the linear model are known a priori and can be fixed rather than learned (e.g., 0, the 1/50 s sampling interval, gravity) • Three extra (unobserved) variables are added to the model

  10. Reinforcement Learning (RL) • MDP: a tuple (S, s0, {P_sa}, R, γ) • State space S • Initial state s0 • State transition probabilities P_sa • Reward function R • Discount factor γ • Family of policies Π • Objective: find a policy π ∈ Π that maximizes the utility U(π) = E[Σ_t γ^t R(s_t) | π]
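
To make the objective concrete, here is a minimal sketch of estimating U(π) = E[Σ_t γ^t R(s_t)] by averaging Monte Carlo rollouts. `step`, `reward`, and `policy` are hypothetical stand-ins for the helicopter simulator and controller, not names from the paper.

```python
# Monte Carlo estimate of the discounted utility of a policy (sketch).
import numpy as np

def rollout_return(step, reward, policy, s0, gamma, horizon, rng):
    s, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        total += discount * reward(s)
        s = step(s, policy(s), rng)       # stochastic one-step dynamics
        discount *= gamma
    return total

def utility(step, reward, policy, s0, gamma, horizon, n_rollouts=100):
    # Average independent rollouts to approximate the expectation.
    rng = np.random.default_rng()
    return np.mean([rollout_return(step, reward, policy, s0, gamma,
                                   horizon, rng)
                    for _ in range(n_rollouts)])
```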

  11. Simulation: 1, 2, • Monte Carlo: Failed estimation

  12. Common Random Numbers • PEGASUS (Ng &amp; Jordan, 2000): draw the simulator's random numbers in advance, together with start states from the initial distribution, and reuse them for every policy evaluation • Every policy is scored on the same fixed scenarios, giving low-variance, comparable estimates (good estimation)
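
A minimal sketch of the common-random-numbers idea behind PEGASUS, under the simplifying assumptions that the simulator consumes one scalar noise draw per step and starts from a fixed s0: all randomness is drawn once up front, so every policy is evaluated on identical scenarios and the utility estimate becomes a deterministic function of the policy.

```python
# PEGASUS-style deterministic policy evaluation (sketch).
import numpy as np

def make_pegasus_evaluator(step, reward, s0, gamma, horizon, n_scenarios=30):
    rng = np.random.default_rng(0)
    # Pre-draw every random number the simulator will ever consume.
    noise = rng.normal(size=(n_scenarios, horizon))

    def evaluate(policy):
        total = 0.0
        for scenario in noise:            # identical scenarios for each policy
            s, discount = s0, 1.0
            for eps in scenario:
                total += discount * reward(s)
                s = step(s, policy(s), eps)   # dynamics driven by the fixed eps
                discount *= gamma
        return total / n_scenarios        # mean return over the scenarios

    return evaluate
```

Because the same noise is reused, the difference between two policies' scores reflects the policies rather than sampling luck, which is what makes the gradient-based search on the next slide feasible.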

  13. Optimization • Discretizing the actions leads to a huge policy space, so a parametric policy class is searched instead • Derivative estimation: with PEGASUS the utility is a deterministic function of the policy parameters, enabling gradient-based optimization
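
A minimal sketch of derivative estimation on the deterministic PEGASUS objective, using central finite differences followed by plain gradient ascent; `evaluate` is assumed to map policy parameters directly to estimated utility, and the step sizes are illustrative.

```python
# Finite-difference gradient estimation and gradient ascent (sketch).
import numpy as np

def finite_difference_gradient(evaluate, theta, h=1e-4):
    """Central-difference estimate of dU/dtheta."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (evaluate(theta + e) - evaluate(theta - e)) / (2 * h)
    return grad

def gradient_ascent(evaluate, theta, lr=0.01, steps=100):
    """Climb the (deterministic) utility surface from a starting policy."""
    for _ in range(steps):
        theta = theta + lr * finite_difference_gradient(evaluate, theta)
    return theta
```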

  14. Learning to Hover • Given a desired hover position and orientation • Policy class: a neural network with tunable parameters • Quadratic cost function penalizing deviations from the hover target • Weights: scale each of the terms to be roughly the same order of magnitude
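
A minimal sketch of a quadratic hover cost of the kind described here, assuming a simplified state of position, velocity, and heading; the weight values are invented for illustration, whereas the paper's weights were scaled so each term has roughly the same magnitude.

```python
# Quadratic hover reward: negative weighted squared error (sketch).
import numpy as np

def hover_reward(s, target, alpha):
    # s and target: [x, y, z, vx, vy, vz, heading]; hovering exactly
    # at the target scores 0, and every deviation is penalized.
    err = np.asarray(s) - np.asarray(target)
    return -np.dot(alpha, err**2)

alpha = np.array([1.0, 1.0, 1.0, 0.1, 0.1, 0.1, 0.5])  # assumed weights
print(hover_reward([0.1, 0, 0, 0, 0, 0, 0.02], np.zeros(7), alpha))
```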

  15. Parallel simulation • Monte Carlo policy evaluation is expensive and is repeated again and again during optimization • The rollouts are independent, so the evaluation is given a parallel implementation

  16. Flying Competition Maneuvers • Academy of Model Aeronautics (AMA) RC helicopter competition (Class I to Class III) • The helicopter must fly accurately through a number of prescribed maneuvers

  17. How does one design a controller for flying trajectories? • Example: flight along the x-axis • We need a family of policies that take a trajectory as input

  18. Flying trajectories, not only hovering • Must take into account more of the coupling between the helicopter's axes

  19. Trajectory Following • One simple choice: reuse the hovering controller with a time-varying target that moves along the desired trajectory • Not good! The hovering controller is always trying to “catch up” to the moving target point

  20. Change the reward function • Do not penalize the distance to the time-indexed target point; instead penalize the distance to the projection of the helicopter's position onto the desired path (trajectory projection) • Add a potential-based shaping reward for progress along the path (see the sketch below)
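
A minimal sketch of the two ideas on this slide, under the assumption that the desired trajectory is given as a waypoint polyline: the helicopter is penalized by its distance to the projection onto the path, and a potential-based shaping term (γΦ(s′) − Φ(s), with the potential Φ taken as arc-length progress) rewards movement along it.

```python
# Trajectory projection and potential-based shaping reward (sketch).
import numpy as np

def project_onto_path(p, waypoints):
    """Return (distance to path, arc-length of the projected point)."""
    best = (np.inf, 0.0)
    arc = 0.0
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        seg = b - a
        t = np.clip(np.dot(p - a, seg) / np.dot(seg, seg), 0.0, 1.0)
        d = np.linalg.norm(p - (a + t * seg))
        if d < best[0]:
            best = (d, arc + t * np.linalg.norm(seg))
        arc += np.linalg.norm(seg)
    return best

def shaped_reward(p_prev, p, waypoints, gamma=0.99):
    _, s_prev = project_onto_path(p_prev, waypoints)
    d, s = project_onto_path(p, waypoints)
    # Penalize distance to the path; reward arc-length progress along it.
    return -d**2 + (gamma * s - s_prev)

# Toy usage on an L-shaped path in 3D.
path = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 0.0, 5.0]])
print(shaped_reward(np.array([1.0, 0.5, 0.0]), np.array([2.0, 0.4, 0.0]), path))
```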

  21. One Trick • Allow the target to evolve in a way that is different from the path of the desired trajectory, but in a way that allows the helicopter to follow the actual desired trajectory more accurately.

  22. Bowed-out trajectory • Consider trajectories that have both a vertical and a horizontal component • To climb: increase the collective pitch control, which causes the helicopter to start accelerating upward • The altitude (z) response is faster than the horizontal response, so the flown path bows outward

  23. How to correct this? • Slow down the z-response, i.e., delay the changes to the target altitude by t seconds • t is another policy parameter (see the sketch below)
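
A minimal sketch of the delay trick, assuming the target altitude is stored as a 50 Hz sequence (matching the sampling rate mentioned earlier); `t_delay` plays the role of the extra policy parameter t.

```python
# Delay the commanded altitude so the fast z-response stops leading (sketch).
def delayed_target_z(desired_z, step_index, t_delay, hz=50):
    """Return the altitude target from t_delay seconds in the past."""
    lag = int(t_delay * hz)               # convert seconds to timesteps
    return desired_z[max(0, step_index - lag)]
```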

  24. Thanks Q & A

  25. Cross Validation • To determine the smoothing parameter σ • Define the leave-one-out error ε(σ) = Σ_i (y_i − ŷ_−i(x_i; σ))², where ŷ_−i(x_i; σ) is the prediction at x_i with the i-th sample held out • Choose σ = argmin_σ ε(σ)
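
A minimal sketch of the leave-one-out procedure, where `predict` can be any pointwise regressor such as the `lwlr_predict` sketch from slide 8; the helper names are illustrative.

```python
# Leave-one-out cross validation for the bandwidth sigma (sketch).
import numpy as np

def loocv_error(X, Y, sigma, predict):
    errs = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i     # hold out sample i
        y_hat = predict(X[mask], Y[mask], X[i], sigma)
        errs.append((Y[i] - y_hat) ** 2)
    return np.mean(errs)

def choose_sigma(X, Y, candidates, predict):
    # Keep the bandwidth with the lowest held-out squared error.
    return min(candidates, key=lambda s: loocv_error(X, Y, s, predict))
```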

  26. Variance Estimation
