Autonomous Helicopter Flight Via Reinforcement Learning

Yanjie Li

Harbin Institute of Technology

Shenzhen Graduate School

Outline

- Introduction
- Autonomous Helicopter
- Model Identification
- Reinforcement Learning (RL)
- Learning to Hover
- Flying Competition Maneuvers

Introduction

- Helicopter: a challenging control problem
- High dimensional
- Asymmetric
- Nonlinear
- A successful application of RL

Autonomous Helicopter

- Helicopter: Yamaha R-50 (about 3.6 m long, up to 20 kg payload)
- Inertial Navigation System (INS)
- 3 accelerometers and 3 gyroscopes
- Differential GPS (2 cm resolution)
- An onboard navigation computer
- Kalman filter: fuses GPS, INS, and digital compass measurements
- Control Inputs:

Model Identification

- Preparation:
- Ask a human pilot to fly the helicopter for several minutes (339s for model identification and 140s for testing)
- Record 12-dimensional state and 4-dimensional control inputs

Model Identification

- Symmetries
- Do not model the state in spatial (world) coordinates
- Model it in helicopter body coordinates instead
- Model
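The body-coordinate idea above can be illustrated with a small sketch. It only covers horizontal velocity and a single yaw angle, which is a simplification; the function name `world_to_body` and the sign conventions are assumptions made for illustration, not the notation of the original slides.

```python
import math

def world_to_body(vx_world, vy_world, yaw):
    """Rotate a horizontal world-frame velocity into the body frame.

    yaw is the heading in radians, measured from the world x-axis
    (an assumed convention for this sketch).
    """
    c, s = math.cos(yaw), math.sin(yaw)
    forward = c * vx_world + s * vy_world
    sideways = -s * vx_world + c * vy_world
    return forward, sideways

# Flying nose-first at 1 m/s looks the same in body coordinates
# whether the helicopter faces along world x or along world y.
facing_x = world_to_body(1.0, 0.0, yaw=0.0)
facing_y = world_to_body(0.0, 1.0, yaw=math.pi / 2)
```

Because both flights map to the same body-frame velocity, data recorded at one heading can inform the model at every other heading.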

Several refinements

- Many terms in the model are determined in advance (0, 1/50, or gravity)

- Three extra variables (unobserved)

Reinforcement Learning (RL)

- MDP :
- State space
- Initial state
- State transition probabilities
- Reward function
- Discount factor
- Family of policies
- Objective: find a policy in the family that maximizes the utility (the expected discounted sum of rewards)
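As a rough illustration of the objective, the utility of a policy can be estimated by simulating the model and averaging discounted returns. The sketch below assumes a toy one-dimensional system; `toy_step`, `toy_reward`, and the two gains are hypothetical stand-ins, not the helicopter model or the policies from the talk.

```python
import random

def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t] over one trajectory."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

def estimate_utility(policy, step, reward, s0, gamma, horizon, n_rollouts, seed=0):
    """Monte Carlo estimate of the expected discounted sum of rewards."""
    rng = random.Random(seed)  # fixed seed so policies see the same noise
    returns = []
    for _ in range(n_rollouts):
        s, rewards = s0, []
        for _ in range(horizon):
            rewards.append(reward(s))
            s = step(s, policy(s), rng)
        returns.append(discounted_return(rewards, gamma))
    return sum(returns) / len(returns)

# Toy system (assumption): scalar state, additive control, small noise.
def toy_step(s, a, rng):
    return s + a + rng.gauss(0.0, 0.01)

def toy_reward(s):
    return -s ** 2  # penalize distance from the origin

def make_policy(gain):
    return lambda s: -gain * s

u_good = estimate_utility(make_policy(0.5), toy_step, toy_reward, 1.0, 0.95, 50, 20)
u_bad = estimate_utility(make_policy(0.0), toy_step, toy_reward, 1.0, 0.95, 50, 20)
```

Comparing `u_good` and `u_bad` is the basic step a policy search repeats: evaluate candidates in the model and keep the one with the higher estimated utility.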

Learning to Hover

- Given hovering position and orientation
- Policy class: Neural network

tunable parameters:

- Quadratic cost function:

Weights: scale each term to be roughly the same order of magnitude
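A minimal sketch of a weighted quadratic cost follows. The state components, the `typical_error` values, and the rule of setting each weight to one over the squared typical error are assumptions chosen to illustrate the scaling idea; the actual terms and weights used in the talk are not shown here.

```python
def quadratic_cost(state, target, weights):
    """Weighted sum of squared deviations from the hover target."""
    return sum(weights[k] * (state[k] - target[k]) ** 2 for k in target)

# Hypothetical typical error magnitudes (metres, radians), used only
# to pick weights that put every term on a comparable scale.
typical_error = {"x": 1.0, "y": 1.0, "z": 1.0, "yaw": 0.1}
weights = {k: 1.0 / e ** 2 for k, e in typical_error.items()}

target = {"x": 0.0, "y": 0.0, "z": 0.0, "yaw": 0.0}
state = {"x": 1.0, "y": 0.0, "z": 0.0, "yaw": 0.1}

# A 1 m position error and a 0.1 rad heading error each contribute
# about 1 to the cost, so neither term dominates.
cost = quadratic_cost(state, target, weights)
```

Without the scaling, a 0.1 rad heading error would contribute one hundredth of what a 1 m position error does, and the optimizer would effectively ignore orientation.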

Flying Competition Maneuvers

- Academy of Model Aeronautics
- RC helicopter competition (Class I-Class III)
- Accurately fly through a number of maneuvers

How does one design a controller for flying trajectories?

-axis flight

We need a family of policies that takes a trajectory as input

Flying trajectories and not only hovering

- Take more of the coupling between the helicopter's subdynamics into account

Trajectory Following

- One simple choice: reuse the hovering controller with a time-varying target

Not good!

- Trajectory following: the hovering controller is always trying to “catch up” to the moving target
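The "catch up" effect can be reproduced with a toy example. The sketch below assumes a simple proportional tracker chasing a target that moves at constant speed; this is a stand-in for intuition only, not the hovering controller from the talk, and `simulate_tracking`, `gain`, and `speed` are hypothetical names.

```python
def simulate_tracking(speed, gain, dt=0.01, duration=10.0):
    """Return the final gap between a moving target and a simple tracker."""
    x = 0.0
    steps = int(round(duration / dt))
    for n in range(steps):
        x_target = speed * n * dt          # target moves at constant speed
        x += dt * gain * (x_target - x)    # tracker moves toward the target
    return speed * steps * dt - x          # how far behind the tracker ends up

lag = simulate_tracking(speed=1.0, gain=2.0)
```

In this toy setting the gap settles near `speed / gain` instead of shrinking to zero, which is the kind of persistent lag that motivates treating trajectory following differently from hovering.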

X-axis

One Trick

Allow the target position to evolve along a path that differs from the desired trajectory, chosen so that the helicopter follows the actual desired trajectory more accurately.

Bowed-out trajectory

- Trajectories that have both a vertical and horizontal component
- To climb: increase the collective pitch control, which causes the helicopter to start accelerating upward

How to correct this?

- Slow down the z-response, i.e., delay the changes to the z target by t seconds

- t is another policy parameter
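Delaying the vertical target can be sketched as a shift applied to the z component of a target sequence. The example assumes targets are sampled at a fixed rate, so a delay of t seconds becomes a whole number of samples (`delay_steps`); the function name and the tuple layout are illustrative assumptions.

```python
def delay_z_target(targets, delay_steps):
    """Delay only the z component of a sequence of (x, y, z) targets.

    The first z value is held for the first delay_steps samples.
    """
    delayed = []
    for i, (x, y, _) in enumerate(targets):
        z = targets[max(0, i - delay_steps)][2]
        delayed.append((x, y, z))
    return delayed

# A diagonal climb: x and z increase together.
path = [(float(i), 0.0, float(i)) for i in range(5)]
shifted = delay_z_target(path, delay_steps=2)
```

In `shifted`, the horizontal targets are unchanged while the climb starts two samples later, which gives the slower horizontal response time to keep pace with the faster vertical one.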

Thanks

Q & A

