
Autonomous Helicopter Flight Via Reinforcement Learning



  1. Autonomous Helicopter Flight Via Reinforcement Learning Yanjie Li Harbin Institute of Technology Shenzhen Graduate School

  2. Outline • Introduction • Autonomous Helicopter • Model Identification • Reinforcement Learning (RL) • Learning to Hover • Flying Competition Maneuvers

  3. Introduction • Helicopter: a challenging control problem • High dimensional • Asymmetric • Nonlinear • A successful application of RL

  4. Yamaha R-50

  5. Autonomous Helicopter • Helicopter: Yamaha R-50 (3.6 m long, ~20 kg payload) • Inertial Navigation System (INS): 3 accelerometers and 3 gyroscopes • Differential GPS (2 cm resolution) • An onboard navigation computer • Kalman filter fuses the GPS, INS, and digital compass readings into a state estimate • Control inputs (4-dimensional): longitudinal and latitudinal cyclic pitch, tail rotor (rudder) pitch, main rotor collective pitch

  6. Model Identification • Preparation: • Have a human pilot fly the helicopter for several minutes (339 s of data for model identification, 140 s for testing) • Record the 12-dimensional state and the 4-dimensional control inputs

  7. Model Identification • Symmetries: the dynamics do not depend on spatial position or heading • So identify the model in body coordinates rather than spatial (world) coordinates • This reduces the complexity of the model to be learned

  8. Weighted Linear Regression • The one-step dynamics are fit by locally weighted linear regression • The smoothing (bandwidth) parameter is chosen by cross validation • Gaussian weight function: w_i = exp(−‖x − x_i‖² / (2σ²))
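
As a minimal sketch of the weighted-regression idea on this slide, the Python below fits a locally weighted linear model around a query point using the Gaussian weight function above. The data, bandwidth, and function names are illustrative assumptions, not the paper's actual identification code.

```python
# Locally weighted linear regression for one-step prediction (sketch).
import numpy as np

def lwlr_predict(X, Y, x_query, sigma):
    """Predict Y at x_query with a linear model whose training points
    are weighted by a Gaussian kernel centered at x_query."""
    d = X - x_query                                      # offsets from the query
    w = np.exp(-np.sum(d**2, axis=1) / (2 * sigma**2))   # Gaussian weights
    Xb = np.hstack([X, np.ones((len(X), 1))])            # add intercept column
    W = np.diag(w)
    # Weighted least squares: theta = (X^T W X)^(-1) X^T W y
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ Y)
    return np.append(x_query, 1.0) @ theta

# Toy usage: learn a noisy linear map and predict at a new point.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))             # e.g., a body-coordinate state
Y = X @ np.array([0.5, -1.0, 0.2]) + 0.01 * rng.normal(size=200)
print(lwlr_predict(X, Y, np.zeros(3), sigma=1.0))
```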

  9. Several refinements • Many terms of the linear model are known a priori and can be fixed rather than learned (e.g., 0, the 1/50 s sampling interval, gravity) • Three extra (unobserved) variables are added to the model

  10. Reinforcement Learning (RL) • MDP: a tuple (S, s0, {P_sa}, R, γ) • State space S • Initial state s0 • State transition probabilities P_sa • Reward function R • Discount factor γ • Family of policies Π • Objective: find a policy π ∈ Π that maximizes the utility U(π) = E[Σ_t γ^t R(s_t) | π]
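
To make the objective concrete, here is a minimal sketch of estimating U(π) = E[Σ_t γ^t R(s_t)] by averaging Monte Carlo rollouts. `step`, `reward`, and `policy` are hypothetical stand-ins for the helicopter simulator and controller, not names from the paper.

```python
# Monte Carlo estimate of the discounted utility of a policy (sketch).
import numpy as np

def rollout_return(step, reward, policy, s0, gamma, horizon, rng):
    s, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        total += discount * reward(s)
        s = step(s, policy(s), rng)       # stochastic one-step dynamics
        discount *= gamma
    return total

def utility(step, reward, policy, s0, gamma, horizon, n_rollouts=100):
    # Average independent rollouts to approximate the expectation.
    rng = np.random.default_rng()
    return np.mean([rollout_return(step, reward, policy, s0, gamma,
                                   horizon, rng)
                    for _ in range(n_rollouts)])
```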

  11. Simulation: 1, 2, • Monte Carlo: Failed estimation

  12. Common Random Numbers • PEGASUS (Ng &amp; Jordan, 2000): draw the simulator's random numbers in advance, together with start states from the initial distribution, and reuse them for every policy evaluation • Every policy is scored on the same fixed scenarios, giving low-variance, comparable estimates (good estimation)
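
A minimal sketch of the common-random-numbers idea behind PEGASUS, under the simplifying assumptions that the simulator consumes one scalar noise draw per step and starts from a fixed s0: all randomness is drawn once up front, so every policy is evaluated on identical scenarios and the utility estimate becomes a deterministic function of the policy.

```python
# PEGASUS-style deterministic policy evaluation (sketch).
import numpy as np

def make_pegasus_evaluator(step, reward, s0, gamma, horizon, n_scenarios=30):
    rng = np.random.default_rng(0)
    # Pre-draw every random number the simulator will ever consume.
    noise = rng.normal(size=(n_scenarios, horizon))

    def evaluate(policy):
        total = 0.0
        for scenario in noise:            # identical scenarios for each policy
            s, discount = s0, 1.0
            for eps in scenario:
                total += discount * reward(s)
                s = step(s, policy(s), eps)   # dynamics driven by the fixed eps
                discount *= gamma
        return total / n_scenarios        # mean return over the scenarios

    return evaluate
```

Because the same noise is reused, the difference between two policies' scores reflects the policies rather than sampling luck, which is what makes the gradient-based search on the next slide feasible.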

  13. Optimization • Discretizing the actions leads to a huge policy space, so a parametric policy class is searched instead • Derivative estimation: with PEGASUS the utility is a deterministic function of the policy parameters, enabling gradient-based optimization
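
A minimal sketch of derivative estimation on the deterministic PEGASUS objective, using central finite differences followed by plain gradient ascent; `evaluate` is assumed to map policy parameters directly to estimated utility, and the step sizes are illustrative.

```python
# Finite-difference gradient estimation and gradient ascent (sketch).
import numpy as np

def finite_difference_gradient(evaluate, theta, h=1e-4):
    """Central-difference estimate of dU/dtheta."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (evaluate(theta + e) - evaluate(theta - e)) / (2 * h)
    return grad

def gradient_ascent(evaluate, theta, lr=0.01, steps=100):
    """Climb the (deterministic) utility surface from a starting policy."""
    for _ in range(steps):
        theta = theta + lr * finite_difference_gradient(evaluate, theta)
    return theta
```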

  14. Learning to Hover • Given a desired hover position and orientation • Policy class: a neural network with tunable parameters • Quadratic cost function penalizing deviations from the hover target • Weights: scale each of the terms to be roughly the same order of magnitude
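
A minimal sketch of a quadratic hover cost of the kind described here, assuming a simplified state of position, velocity, and heading; the weight values are invented for illustration, whereas the paper's weights were scaled so each term has roughly the same magnitude.

```python
# Quadratic hover reward: negative weighted squared error (sketch).
import numpy as np

def hover_reward(s, target, alpha):
    # s and target: [x, y, z, vx, vy, vz, heading]; hovering exactly
    # at the target scores 0, and every deviation is penalized.
    err = np.asarray(s) - np.asarray(target)
    return -np.dot(alpha, err**2)

alpha = np.array([1.0, 1.0, 1.0, 0.1, 0.1, 0.1, 0.5])  # assumed weights
print(hover_reward([0.1, 0, 0, 0, 0, 0, 0.02], np.zeros(7), alpha))
```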

  15. Parallel simulation • Monte Carlo policy evaluation is expensive and is repeated again and again during optimization • The rollouts are independent, so the evaluation is given a parallel implementation

  16. Flying Competition Maneuvers • Academy of Model Aeronautics (AMA) RC helicopter competition (Class I to Class III) • The helicopter must fly accurately through a number of prescribed maneuvers

  17. How does one design a controller for flying trajectories? • Example: flight along the x-axis • We need a family of policies that take a trajectory as input

  18. Flying trajectories, not only hovering • Must take into account more of the coupling between the helicopter's axes

  19. Trajectory Following • One simple choice: reuse the hovering controller with a time-varying target that moves along the desired trajectory • Not good! The hovering controller is always trying to “catch up” to the moving target point

  20. Change the reward function • Do not penalize the distance to the time-indexed target point; instead penalize the distance to the projection of the helicopter's position onto the desired path (trajectory projection) • Add a potential-based shaping reward for progress along the path (see the sketch below)
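
A minimal sketch of the two ideas on this slide, under the assumption that the desired trajectory is given as a waypoint polyline: the helicopter is penalized by its distance to the projection onto the path, and a potential-based shaping term (γΦ(s′) − Φ(s), with the potential Φ taken as arc-length progress) rewards movement along it.

```python
# Trajectory projection and potential-based shaping reward (sketch).
import numpy as np

def project_onto_path(p, waypoints):
    """Return (distance to path, arc-length of the projected point)."""
    best = (np.inf, 0.0)
    arc = 0.0
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        seg = b - a
        t = np.clip(np.dot(p - a, seg) / np.dot(seg, seg), 0.0, 1.0)
        d = np.linalg.norm(p - (a + t * seg))
        if d < best[0]:
            best = (d, arc + t * np.linalg.norm(seg))
        arc += np.linalg.norm(seg)
    return best

def shaped_reward(p_prev, p, waypoints, gamma=0.99):
    _, s_prev = project_onto_path(p_prev, waypoints)
    d, s = project_onto_path(p, waypoints)
    # Penalize distance to the path; reward arc-length progress along it.
    return -d**2 + (gamma * s - s_prev)

# Toy usage on an L-shaped path in 3D.
path = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 0.0, 5.0]])
print(shaped_reward(np.array([1.0, 0.5, 0.0]), np.array([2.0, 0.4, 0.0]), path))
```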

  21. One Trick • Allow the target to evolve in a way that is different from the path of the desired trajectory, but in a way that allows the helicopter to follow the actual desired trajectory more accurately.

  22. Bowed-out trajectory • Consider trajectories that have both a vertical and a horizontal component • To climb: increase the collective pitch control, which causes the helicopter to start accelerating upward • The altitude (z) response is faster than the horizontal response, so the flown path bows outward

  23. How to correct this? • Slow down the z-response, i.e., delay the changes to the target altitude by t seconds • t is another policy parameter (see the sketch below)
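
A minimal sketch of the delay trick, assuming the target altitude is stored as a 50 Hz sequence (matching the sampling rate mentioned earlier); `t_delay` plays the role of the extra policy parameter t.

```python
# Delay the commanded altitude so the fast z-response stops leading (sketch).
def delayed_target_z(desired_z, step_index, t_delay, hz=50):
    """Return the altitude target from t_delay seconds in the past."""
    lag = int(t_delay * hz)               # convert seconds to timesteps
    return desired_z[max(0, step_index - lag)]
```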

  24. Thanks Q & A

  25. Cross Validation • To determine the smoothing parameter σ • Define the leave-one-out error ε(σ) = Σ_i (y_i − ŷ_−i(x_i; σ))², where ŷ_−i(x_i; σ) is the prediction at x_i with the i-th sample held out • Choose σ = argmin_σ ε(σ)
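
A minimal sketch of the leave-one-out procedure, where `predict` can be any pointwise regressor such as the `lwlr_predict` sketch from slide 8; the helper names are illustrative.

```python
# Leave-one-out cross validation for the bandwidth sigma (sketch).
import numpy as np

def loocv_error(X, Y, sigma, predict):
    errs = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i     # hold out sample i
        y_hat = predict(X[mask], Y[mask], X[i], sigma)
        errs.append((Y[i] - y_hat) ** 2)
    return np.mean(errs)

def choose_sigma(X, Y, candidates, predict):
    # Keep the bandwidth with the lowest held-out squared error.
    return min(candidates, key=lambda s: loocv_error(X, Y, s, predict))
```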

  26. Variance Estimation
