
Autonomous Helicopter Flight Via Reinforcement Learning

Yanjie Li

Harbin Institute of Technology

Shenzhen Graduate School

Outline
  • Introduction
  • Autonomous Helicopter
  • Model Identification
  • Reinforcement Learning (RL)
  • Learning to Hover
  • Flying Competition Maneuvers
Introduction
  • Helicopter: a challenging control problem
    • High dimensional
    • Asymmetric
    • Nonlinear
  • A successful application of RL
Autonomous Helicopter
  • Helicopter: Yamaha R-50 (3.6 m long, 20 kg payload)
    • Inertial Navigation System (INS)
      • 3 accelerometers and 3 gyroscopes
    • A differential GPS (a resolution of 2 cm)
    • An onboard navigation computer
      • Kalman filter fusing GPS, INS, and digital compass readings
  • Control inputs: longitudinal and lateral cyclic pitch, main rotor collective pitch, and tail rotor collective pitch
Model Identification
  • Preparation:
    • Ask a human pilot to fly the helicopter for several minutes (339 s of data for model identification and 140 s for testing)
    • Record the 12-dimensional state and the 4-dimensional control inputs
Model Identification
  • Symmetries
    • The dynamics do not depend on spatial coordinates (position and heading)
    • So the model is identified in body coordinates
  • Model: learned from the recorded flight data by weighted linear regression
Weighted linear regression
  • Smoothing parameter chosen by cross validation
  • Gaussian weight function, e.g. w(i) = exp(−‖x − x(i)‖² / (2τ²)) with smoothing parameter τ
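As a concrete illustration of the fit, here is a minimal sketch of locally weighted linear regression with a Gaussian weight function; the function name, the bandwidth argument `tau`, and the data layout are illustrative assumptions, not the paper's code.

```python
import numpy as np

def locally_weighted_regression(X, y, x_query, tau):
    """Predict y at x_query by fitting an affine model whose training
    points are weighted by a Gaussian kernel centred at x_query.

    X: (n, d) training inputs, y: (n,) targets, tau: bandwidth."""
    n, d = X.shape
    # Gaussian weights: points near the query dominate the fit
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))
    # Augment with a bias column so the local model is affine
    Xa = np.hstack([X, np.ones((n, 1))])
    W = np.diag(w)
    # Weighted least squares: theta = (Xa^T W Xa)^-1 Xa^T W y
    theta = np.linalg.solve(Xa.T @ W @ Xa, Xa.T @ W @ y)
    return np.append(x_query, 1.0) @ theta
```

Because the model refit happens at every query point, the predictor can follow locally linear but globally nonlinear dynamics.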

Several refinements
  • Many terms of the model are determined a priori (0, 1/50, gravity) and need not be estimated
  • Three extra (unobserved) variables are added to the model
Reinforcement Learning (RL)
  • MDP (S, s0, P, R, γ, Π):
    • State space S
    • Initial state s0
    • State transition probabilities Psa
    • Reward function R
    • Discount factor γ
    • Family of policies Π
  • Objective: find a policy π ∈ Π maximizing the utility

    U(π) = E[ R(s0) + γ R(s1) + γ² R(s2) + ⋯ | π ]
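The utility above is the expected discounted sum of rewards; for a single simulated trajectory it can be sketched as follows (this helper is illustrative, not from the slides):

```python
def discounted_return(rewards, gamma):
    """Return of one trajectory: R(s0) + g*R(s1) + g^2*R(s2) + ...
    Accumulated backwards (Horner's rule), so gamma powers are implicit."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

Averaging `discounted_return` over many trajectories gives a Monte Carlo estimate of U(π).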

Simulation
  • Using the identified model, a trajectory is simulated by
    1. sampling the random disturbances, and
    2. propagating the state through the model under the current policy
  • Monte Carlo: estimate the utility by averaging the returns of sampled trajectories
    • With fresh random numbers drawn at every evaluation, two evaluations of the same policy give different values, so the estimation fails as an objective for policy search
Common Random Numbers
  • PEGASUS RL: sample the scenarios from the initial distribution and fix all of the simulator's random numbers in advance, reusing them for every policy evaluation
  • The utility estimate then becomes a deterministic function of the policy: a good estimation that can be optimized
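A minimal sketch of the common-random-numbers idea on a toy 1-D system (the dynamics, reward, and linear policy are illustrative assumptions, not the helicopter model): because the disturbance matrix `W` is sampled once and reused, the utility estimate is a deterministic function of the policy.

```python
import numpy as np

def pegasus_utility(policy_gain, W, gamma=0.95, s0=1.0):
    """Estimate the utility of the policy a = -k*s on the toy system
    s' = s + a + w with reward -s^2, using a PRE-SAMPLED noise matrix W
    of shape (m scenarios, T steps). Reusing W removes all sampling noise."""
    m, T = W.shape
    total = 0.0
    for i in range(m):
        s, discount, ret = s0, 1.0, 0.0
        for t in range(T):
            a = -policy_gain * s          # policy
            ret += discount * (-s * s)    # reward
            s = s + a + W[i, t]           # fixed disturbance, not fresh noise
            discount *= gamma
        total += ret
    return total / m

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(30, 40))  # sampled once, reused everywhere
u1 = pegasus_utility(0.8, W)
u2 = pegasus_utility(0.8, W)              # identical: the estimate is deterministic
```

With the noise fixed, a stabilizing gain scores strictly better than doing nothing, and the comparison is repeatable.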

  • Discretizing the actions leads to a huge policy space
  • Instead: derivative estimation on the (now deterministic) utility estimate
  • Gradient-based optimization of the policy parameters
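With a deterministic utility estimate, derivatives can be approximated numerically; a sketch of finite-difference gradient ascent (the step size, tolerance, and loop structure are assumptions, not the paper's exact optimizer):

```python
def gradient_ascent(utility, theta0, step=0.05, eps=1e-4, iters=200):
    """Maximize a deterministic utility estimate by central finite
    differences: perturb each parameter by +/- eps, estimate the
    partial derivative, and take a small step uphill."""
    theta = list(theta0)
    for _ in range(iters):
        grad = []
        for j in range(len(theta)):
            tp = theta.copy(); tp[j] += eps
            tm = theta.copy(); tm[j] -= eps
            grad.append((utility(tp) - utility(tm)) / (2 * eps))
        theta = [t + step * g for t, g in zip(theta, grad)]
    return theta
```

This only works because PEGASUS makes `utility` deterministic; with fresh random numbers the finite differences would be dominated by sampling noise.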

Learning to Hover
  • Given a hovering position and orientation
  • Policy class: a neural network; its weights are the tunable parameters
  • Quadratic cost function on the deviation from the desired hover state
    • Weights scale each of the terms to be roughly the same order of magnitude
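The cost can be sketched as a weighted sum of squared errors; the state layout and weight values below are illustrative assumptions:

```python
def quadratic_cost(state, target, weights):
    """Hover cost: sum_i w_i * (s_i - t_i)^2. The weights w_i scale each
    error term (position, orientation, velocity, ...) so that all terms
    contribute at roughly the same order of magnitude."""
    return sum(w * (s - t) ** 2 for s, t, w in zip(state, target, weights))
```

Without the scaling weights, whichever state dimension happens to have the largest numeric range would dominate the learned behavior.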

Parallel Simulation
  • Monte Carlo evaluation is expensive and is repeated again and again during policy search
  • The evaluations are therefore run as a parallel implementation
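The repeated rollouts are independent, which is what makes a parallel implementation natural; a minimal sketch using threads in place of multiple machines (the toy dynamics are an assumption, not the identified model):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def rollout_return(seed, gamma=0.95, T=50):
    """Simulate one independent trajectory of a toy 1-D system and
    return its discounted return. Each rollout has its own RNG."""
    rng = random.Random(seed)
    s, ret, discount = 1.0, 0.0, 1.0
    for _ in range(T):
        ret += discount * (-s * s)       # reward = -s^2
        s = 0.5 * s + rng.gauss(0, 0.1)  # toy dynamics
        discount *= gamma
    return ret

def parallel_utility(seeds):
    """Average the returns of many rollouts, evaluated in parallel."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        returns = list(pool.map(rollout_return, seeds))
    return sum(returns) / len(returns)
```

Since `pool.map` preserves input order and each rollout is seeded, the parallel estimate is bit-identical to the serial one.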

Flying Competition Maneuvers
  • Academy of Model Aeronautics
    • RC helicopter competition (Class I to Class III)
    • The helicopter must accurately fly through a number of maneuvers
How does one design a controller for flying trajectories, e.g. flight along a single axis?
  • We need a family of policies that take a trajectory as input
  • Flying trajectories, not only hovering
    • Takes into account more of the coupling in the dynamics
Trajectory Following
  • One simple choice: reuse the hovering policy with a time-varying target
    • Not good! In trajectory following, the hovering controller is always trying to "catch up" to the moving target, so it lags behind the desired trajectory (e.g. along the x-axis)
Change the reward function
  • Do not penalize the distance to the moving target point; instead penalize the distance to the projection of the helicopter's position onto the desired trajectory
  • Add a potential-based shaping reward for progress along the trajectory
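The projection onto the desired trajectory can be sketched segment by segment; the 2-D point representation here is an illustrative assumption:

```python
def project_onto_segment(p, a, b):
    """Project point p onto the segment a-b (one piece of the desired
    trajectory). Penalizing the distance to this projection measures
    cross-track error rather than distance to a moving target point.
    Points are (x, y) tuples."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    # Parameter t in [0, 1] of the closest point along the segment
    t = 0.0 if seg_len2 == 0 else ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp to the segment's endpoints
    return (ax + t * dx, ay + t * dy)
```

For a full trajectory one would take the minimum-distance projection over its segments.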

One Trick
  • Allow the target trajectory given to the controller to evolve in a way that is different from the path of the desired trajectory, but in a way that allows the helicopter to follow the actual desired trajectory more accurately.
Bowed-Out Trajectory
  • Observed on trajectories that have both a vertical and a horizontal component
  • To climb, the collective pitch control is increased, which causes the helicopter to start accelerating upward
How to correct this?
  • Slow down the z-response, i.e. delay the changes to the vertical target by t seconds
  • t is another policy parameter
Thanks

Q & A

Cross Validation
  • Used to determine the smoothing parameter τ of the weighted regression
  • Define a prediction error err(τ) on held-out data, where the model is fit with smoothing parameter τ
  • Choose the τ minimizing err(τ)
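The selection can be sketched as a grid search minimizing held-out squared error; the Gaussian-kernel smoother used as the model here is an illustrative stand-in for the weighted linear regression:

```python
import numpy as np

def kernel_predict(X, y, xq, tau):
    """Gaussian-kernel smoother standing in for the identified model."""
    w = np.exp(-np.sum((X - xq) ** 2, axis=1) / (2 * tau ** 2))
    return float(w @ y / w.sum())

def choose_tau(X_tr, y_tr, X_val, y_val, taus):
    """Pick the smoothing parameter minimizing held-out squared error."""
    def err(tau):
        preds = [kernel_predict(X_tr, y_tr, xq, tau) for xq in X_val]
        return float(np.mean((np.array(preds) - y_val) ** 2))
    return min(taus, key=err)
```

An oversmoothed model (huge τ) predicts roughly the global mean everywhere, so the validation error exposes it immediately.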
