Autonomous Helicopter Flight Via Reinforcement Learning Yanjie Li Harbin Institute of Technology Shenzhen Graduate School
Outline • Introduction • Autonomous Helicopter • Model Identification • Reinforcement Learning (RL) • Learning to Hover • Flying Competition Maneuvers
Introduction • Helicopter: a challenging control problem • High dimensional • Asymmetric • Nonlinear • A successful application of RL
Autonomous Helicopter • Helicopter: Yamaha R-50 (3.6 m long, carries up to a 20 kg payload) • Inertial Navigation System (INS): 3 accelerometers and 3 gyroscopes • A differential GPS (a resolution of 2 cm) • An onboard navigation computer • Kalman filter fuses the GPS, INS, and digital compass readings • Control inputs (4-dimensional): longitudinal and latitudinal cyclic pitch, tail rotor collective pitch, main rotor collective pitch
Model Identification • Preparation: • Ask a human pilot to fly the helicopter for several minutes (339s for model identification and 140s for testing) • Record 12-dimensional state and 4-dimensional control inputs
Model Identification • Symmetries: the dynamics are invariant to spatial position and heading • So model in body coordinates, not spatial coordinates • Model: fit one-step dynamics predicting the next body-frame state from the current state and the control inputs
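To make the symmetry concrete, here is a minimal sketch of converting a horizontal world-frame velocity into the body frame given the yaw angle; the function name and sign conventions are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def world_to_body_velocity(vx_world, vy_world, yaw):
    """Rotate the horizontal world-frame velocity into the body frame.

    yaw: heading angle (rad) measured from the world x-axis. Because the
    dynamics are invariant to position and heading, the model can be fit
    on body-frame quantities like these instead of spatial coordinates.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    v_forward = c * vx_world + s * vy_world    # along the fuselage
    v_sideways = -s * vx_world + c * vy_world  # across the fuselage
    return v_forward, v_sideways
```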
Weighted linear regression • Smoothing parameter chosen by cross validation (see the final slide) • Gaussian weight function: $w_i = \exp\left(-\|x - x_i\|^2 / (2\sigma^2)\right)$, so training points near the query dominate the fit
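A minimal sketch of locally weighted linear regression with such a Gaussian weight function, assuming training inputs X (state plus controls) and a scalar target Y; sigma is the smoothing parameter, and all identifiers are illustrative rather than the paper's.

```python
import numpy as np

def lwlr_predict(x_query, X, Y, sigma):
    """Predict the target at x_query with locally weighted linear regression.

    X: (n, d) training inputs, Y: (n,) targets, sigma: Gaussian bandwidth.
    """
    # Gaussian weights: points near the query dominate the fit.
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2.0 * sigma ** 2))
    # Weighted least squares with an intercept column appended.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ (w[:, None] * Xb)
    b = Xb.T @ (w * Y)
    theta = np.linalg.solve(A, b)
    return float(np.append(x_query, 1.0) @ theta)
```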
Several refinements • Many model coefficients are known a priori and can be fixed rather than estimated (e.g., 0 entries, the 1/50 s sampling interval, gravity) • Three extra (unobserved) variables are added to the model
Reinforcement Learning (RL) • MDP: $(S, s_0, \{P_{sa}\}, R, \gamma)$ • State space $S$ • Initial state $s_0$ • State transition probabilities $P_{sa}$ • Reward function $R$ • Discount factor $\gamma \in [0, 1)$ • Family of policies $\{\pi_\theta\}$ • Objective: find a policy maximizing the utility $U(\pi) = \mathrm{E}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t) \mid \pi\right]$
Simulation: sample trajectories $1, 2, \dots, m$ from the initial-state distribution and average their returns • Naive Monte Carlo: each evaluation draws fresh random numbers, so the utility estimate is too noisy to compare nearby policies — failed estimation
Common Random Numbers • PEGASUS RL: draw the simulator's random numbers and the $m$ initial states from the initial distribution once, in advance, and reuse them for every policy evaluation • The utility estimate becomes a deterministic function of the policy — good estimation
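A minimal sketch of the PEGASUS idea under stated assumptions: simulate_step(s, a, eps) is a deterministic function of its arguments, and the initial states and noise array are drawn once and reused for every policy; all names are illustrative, not the paper's.

```python
import numpy as np

def pegasus_utility(policy, simulate_step, init_states, noise, reward, gamma):
    """Deterministic utility estimate from pre-drawn common random numbers.

    init_states: m initial states drawn once from the initial distribution.
    noise: array of shape (m, H, k), drawn once and reused, so every policy
    is evaluated on exactly the same randomness.
    """
    m, H = noise.shape[0], noise.shape[1]
    total = 0.0
    for i in range(m):
        s = init_states[i]
        ret, discount = 0.0, 1.0
        for t in range(H):
            a = policy(s)
            s = simulate_step(s, a, noise[i, t])  # fixed randomness
            ret += discount * reward(s)
            discount *= gamma
        total += ret
    return total / m  # same inputs -> identical estimate on every call
```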
Discretized actions would make the policy space huge, so search over a parameterized policy family instead • Derivative estimation: because the PEGASUS utility estimate is deterministic, its derivatives with respect to the policy parameters can be estimated and used for gradient-based optimization
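Because that utility estimate is deterministic in the policy parameters, derivatives can be approximated numerically; a central-difference sketch follows (the step size eps and the usage comment are illustrative assumptions).

```python
import numpy as np

def finite_difference_gradient(utility, theta, eps=1e-4):
    """Approximate dU/dtheta by central differences.

    Meaningful only because utility(theta) is deterministic; with fresh
    Monte Carlo noise on each call the differences would be dominated by
    sampling variance.
    """
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta)
        e[j] = eps
        grad[j] = (utility(theta + e) - utility(theta - e)) / (2.0 * eps)
    return grad

# Illustrative gradient-ascent step on the policy parameters:
# theta = theta + learning_rate * finite_difference_gradient(U, theta)
```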
Learning to hover • Given: the desired hovering position and orientation • Policy class: a neural network with tunable parameters (its weights) • Quadratic cost function: penalize the squared deviations of position, velocity, and orientation from the hover target • Weights: scale each of the terms to be roughly the same order of magnitude
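A minimal sketch of such a quadratic cost, assuming the relevant state components, the hover target, and the per-term weights are packed into NumPy vectors; the names and the exact set of penalized terms are illustrative.

```python
import numpy as np

def hover_reward(state, target, weights):
    """Negative quadratic cost for hovering.

    state/target: vectors holding position, velocity, and orientation terms;
    weights: per-term coefficients chosen so that every term contributes at
    roughly the same order of magnitude.
    """
    err = state - target
    return -float(np.sum(weights * err ** 2))
```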
Parallel simulation • Monte Carlo evaluation is expensive and is repeated again and again during policy search • Remedy: a parallel implementation of the simulations
Flying Competition Maneuvers • Academy of Model Aeronautics • RC helicopter competition (Class I to Class III) • Accurately fly through a number of prescribed maneuvers
How does one design a controller for flying trajectories? • Example: straight-line flight along the $x$-axis • We need a family of policies that take a trajectory as input
Flying trajectories, not only hovering • Must take more of the coupling between the helicopter's axes into account
Trajectory Following • One simple choice: reuse the hover policy with a time-varying target position • Not good! • The hovering controller is then always lagging, trying to “catch up” to the moving target
Change the reward function: do not penalize the distance to the moving target itself, but instead the distance to the projection of the helicopter’s position onto the desired trajectory (trajectory projection) • Add a potential-based shaping reward that rewards progress along the trajectory
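A minimal sketch for a single straight segment: project the position onto the segment, penalize the squared distance to the projection, and add a potential-based shaping term. Taking the potential to be progress along the segment is an illustrative choice; the weights and names are assumptions.

```python
import numpy as np

def project_onto_segment(p, a, b):
    """Project point p onto segment a->b; return (projection, progress in [0, 1])."""
    ab = b - a
    t = float(np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0))
    return a + t * ab, t

def shaped_reward(p_prev, p, a, b, gamma, w_dist=1.0, w_shape=1.0):
    """Distance-to-trajectory cost plus a potential-based shaping term.

    With potential Phi = progress along the segment, the shaping term
    gamma * Phi(s') - Phi(s) rewards forward motion along the path while
    leaving the optimal policy unchanged.
    """
    proj, t_now = project_onto_segment(p, a, b)
    _, t_prev = project_onto_segment(p_prev, a, b)
    dist_cost = -w_dist * float(np.sum((p - proj) ** 2))
    shaping = w_shape * (gamma * t_now - t_prev)
    return dist_cost + shaping
```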
One Trick • Allow the commanded target to evolve in a way that differs from the path of the desired trajectory, but that lets the helicopter follow the actual desired trajectory more accurately
Bowed-out trajectory • Trajectories that have both a vertical and a horizontal component • To climb: increase the collective pitch control, which causes the helicopter to start accelerating upward • Because the vertical response is faster than the horizontal one, the helicopter climbs ahead of schedule and the flown path bows outward from the desired trajectory
How to correct this? • Slow down the $z$-response, i.e., delay the changes to the commanded altitude by $t$ seconds • $t$ is another tunable policy parameter
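One plausible implementation of the delay is a fixed-length buffer that replays the commanded altitude late; this sketch is an assumption about the mechanism, with the delay expressed in control steps (t seconds times the control rate) rather than seconds.

```python
from collections import deque

class DelayedSetpoint:
    """Delay changes to the commanded altitude by a fixed number of steps.

    delay_steps (>= 1) plays the role of the extra policy parameter t,
    converted from seconds to control steps.
    """
    def __init__(self, delay_steps, z_init):
        self.buf = deque([z_init] * delay_steps, maxlen=delay_steps)

    def step(self, z_command):
        self.buf.append(z_command)   # newest command enters the buffer
        return self.buf[0]           # the command issued delay_steps ago
```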
Thanks Q & A
Cross Validation • To determine the smoothing parameter $\sigma$: define the leave-one-out error $\mathrm{err}(\sigma) = \sum_i \left(y_i - \hat{y}_{-i}(x_i; \sigma)\right)^2$, where $\hat{y}_{-i}(x_i; \sigma)$ is the regression prediction for $x_i$ computed with the $i$-th sample held out • Choose the $\sigma$ that minimizes $\mathrm{err}(\sigma)$
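A minimal sketch of this leave-one-out procedure, reusing the lwlr_predict sketch from the model-identification slides; the candidate grid of bandwidths is an illustrative assumption.

```python
import numpy as np

def loo_error(sigma, X, Y):
    """Leave-one-out squared prediction error of LWLR for one bandwidth."""
    err = 0.0
    for i in range(len(Y)):
        mask = np.arange(len(Y)) != i
        pred = lwlr_predict(X[i], X[mask], Y[mask], sigma)  # earlier sketch
        err += (Y[i] - pred) ** 2
    return err

# Choose sigma from a candidate grid by minimizing the held-out error:
# best_sigma = min([0.1, 0.3, 1.0, 3.0], key=lambda s: loo_error(s, X, Y))
```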