Reinforcement Learning for Soaring
CDMRG – 24 May 2010
Nick Lawrance
Reinforcement Learning for Soaring
  • What I want to do
    • Have a good understanding of the dynamics involved in aerodynamic soaring in known conditions, but:
      • Dynamic soaring requires energy-loss actions to achieve net energy-gain cycles, which can be difficult using traditional control or path-generation methods
      • Wind is difficult to predict; guidance and navigation must be done on-line whilst simultaneously maintaining reasonable energy levels and meeting safety requirements
      • This is the classic exploration-exploitation problem, with the added catch that exploration requires energy gained through exploitation
Reinforcement Learning for Soaring
  • Why reinforcement learning?
    • Previous work focused on understanding soaring and examining alternatives for generating energy-gain paths.
    • There is always the issue of balancing exploration and exploitation; my code ended up being long sequences of heuristic rules
    • Reinforcement learning could provide the link from known good paths towards optimal paths
Monte Carlo, TD, Sarsa & Q-learning
  • Monte Carlo – Learn an average reward for actions taken over a series of complete episodes
  • Temporal Difference – Simultaneously estimate the expected reward and the value function, updating after each step
  • Sarsa – using TD for on-policy control
  • Q-learning – off-policy TD control
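
For reference, the corresponding one-step update rules (standard textbook forms, not specific to the soaring problem), with learning rate α and discount γ:

```latex
\begin{align}
  V(s_t) &\leftarrow V(s_t) + \alpha\bigl[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\bigr]
      && \text{TD(0)} \\
  Q(s_t,a_t) &\leftarrow Q(s_t,a_t) + \alpha\bigl[r_{t+1} + \gamma Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\bigr]
      && \text{Sarsa (on-policy)} \\
  Q(s_t,a_t) &\leftarrow Q(s_t,a_t) + \alpha\bigl[r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t)\bigr]
      && \text{Q-learning (off-policy)}
\end{align}
```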

Figure 6.13 (Sutton & Barto): The cliff-walking task. Off-policy Q-learning learns the optimal policy, along the edge of the cliff, but then keeps falling off because of the ε-greedy action selection. On-policy Sarsa learns a safer policy that takes the action-selection method into account. These data are from a single run, but smoothed.
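
A minimal sketch of this comparison (not part of the slides; the 4×12 layout, the -1/-100 rewards and ε = 0.1 follow the textbook cliff-walking setup, everything else here is an assumption):

```python
import numpy as np

# Minimal cliff-walking gridworld (4x12): -1 per step, -100 for stepping
# into the cliff (which resets the agent to the start).
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    r, c = state
    dr, dc = ACTIONS[action]
    r, c = min(max(r + dr, 0), ROWS - 1), min(max(c + dc, 0), COLS - 1)
    if r == 3 and 0 < c < 11:              # fell off the cliff
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL

def eps_greedy(Q, s, eps, rng):
    if rng.random() < eps:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(Q[s]))

def run(method="sarsa", episodes=500, alpha=0.5, gamma=1.0, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = {(r, c): np.zeros(len(ACTIONS)) for r in range(ROWS) for c in range(COLS)}
    returns = []
    for _ in range(episodes):
        s, a, G, done = START, eps_greedy(Q, START, eps, rng), 0.0, False
        while not done:
            s2, r, done = step(s, a)
            G += r
            a2 = eps_greedy(Q, s2, eps, rng)
            if method == "sarsa":          # on-policy: target uses the action actually taken next
                target = r + gamma * Q[s2][a2] * (not done)
            else:                          # q-learning: off-policy, greedy target
                target = r + gamma * np.max(Q[s2]) * (not done)
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s2, a2
        returns.append(G)
    return np.mean(returns[-100:])

print("Sarsa mean return:     ", run("sarsa"))
print("Q-learning mean return:", run("qlearning"))
```

Running this, Sarsa's on-line return is typically higher because it learns the safer path away from the cliff, while Q-learning's greedy policy hugs the edge and pays for exploratory slips.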

Eligibility Traces
  • TD(0) is effectively a one-step backup of Vπ (the reward only counts towards the previous action)
  • Eligibility traces extend this to reward the whole sequence of actions that led to the current reward (sketched below)
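
In equation form (standard accumulating-trace Sarsa(λ), not specific to this work), the TD error is shared among all recently visited state-action pairs in proportion to their trace:

```latex
\begin{align}
  \delta_t &= r_{t+1} + \gamma Q(s_{t+1},a_{t+1}) - Q(s_t,a_t) \\
  e_t(s,a) &= \gamma\lambda\, e_{t-1}(s,a) + \mathbf{1}\{(s,a) = (s_t,a_t)\} \\
  Q(s,a) &\leftarrow Q(s,a) + \alpha\,\delta_t\, e_t(s,a) \qquad \text{for all } s,a
\end{align}
```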
Sarsa(λ)
  • Initialize Q(s,a) arbitrarily and e(s,a) = 0, for all s, a
  • Repeat (for each episode):
    • Initialize s, a
    • Repeat (for each step of episode):
      • Take action a, observe r, s’
      • Choose a’ from s’ using policy derived from Q (ε-greedy)
      • δ ← r + γQ(s’,a’) − Q(s,a)
      • e(s,a) ← e(s,a) + 1
      • For all s,a:
        • Q(s,a) ← Q(s,a) + αδe(s,a)
        • e(s,a) ← γλe(s,a)
      • s ← s’; a ← a’
    • until s is terminal
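
As a concrete sketch of the above (not from the slides: the integer state/action encoding, the hypothetical env_step interface, the start state of 0 and all hyperparameters are assumptions), a tabular Sarsa(λ) with accumulating traces:

```python
import numpy as np

def sarsa_lambda(env_step, n_states, n_actions, episodes=200,
                 alpha=0.1, gamma=0.95, lam=0.9, eps=0.1, seed=0):
    """Tabular Sarsa(lambda) with accumulating traces.

    `env_step(s, a)` is assumed to return (next_state, reward, done);
    states and actions are integer indices, and state 0 is the start.
    A sketch only, not tied to any particular soaring simulation.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    def policy(s):
        if rng.random() < eps:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        e = np.zeros_like(Q)              # reset eligibility traces each episode
        s, a, done = 0, policy(0), False
        while not done:
            s2, r, done = env_step(s, a)
            a2 = policy(s2)
            delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
            e[s, a] += 1.0                # accumulating trace for the visited pair
            Q += alpha * delta * e        # update all (s, a) in proportion to their trace
            e *= gamma * lam              # decay traces
            s, a = s2, a2
    return Q
```

With a suitable env_step wrapper around the grid environments below, the returned Q table can be queried greedily to extract a soaring policy.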
Simplest soaring attempt
  • Square grid, simple motion, energy sinks and sources
  • Movement cost, turn cost, edge cost
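
As a rough illustration only (the slide just names the ingredients; the grid size, cost constants and random energy field below are made-up placeholders), the reward structure for such a grid might look like:

```python
import numpy as np

# Hypothetical square-grid soaring reward: energy sources/sinks plus
# movement, turn and edge costs.  All numbers are illustrative.
GRID = 10
MOVE_COST, TURN_COST, EDGE_COST = 1.0, 0.5, 5.0
rng = np.random.default_rng(0)
energy_field = rng.normal(0.0, 2.0, size=(GRID, GRID))  # sinks (<0) and sources (>0)

def reward(pos, heading, new_heading):
    r, c = pos
    rew = energy_field[r, c] - MOVE_COST
    if new_heading != heading:                       # penalise turning
        rew -= TURN_COST
    if r in (0, GRID - 1) or c in (0, GRID - 1):     # penalise flying along the edge
        rew -= EDGE_COST
    return rew
```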
Hex grid, dynamic soaring
  • Energy based simulation
  • Drag movement cost, turn cost
  • Constant speed
  • No wind motion (due to limited states)
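
A possible encoding of the hex-grid motion (hypothetical: axial coordinates, six discrete headings and illustrative cost constants, not taken from the actual simulation):

```python
# Axial-coordinate hex grid: six headings, one cell per step (constant speed).
# Energy cost = drag per step plus a turn cost that grows with heading change.
HEX_DIRS = [(+1, 0), (+1, -1), (0, -1), (-1, 0), (-1, +1), (0, +1)]
DRAG_COST, TURN_COST = 1.0, 0.5

def hex_step(q, r, heading, new_heading):
    """Move one cell in `new_heading` (0..5); return the new cell and energy cost."""
    dq, dr = HEX_DIRS[new_heading]
    turn = min((new_heading - heading) % 6, (heading - new_heading) % 6)
    cost = DRAG_COST + TURN_COST * turn
    return (q + dq, r + dr), cost
```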
Next
  • Reinforcement learning has advantages to offer our group, but our contribution should probably be focused on well-defined areas
  • For most of our problems, the state spaces are very large and usually continuous; we need estimation methods
  • We usually have a good understanding of at least some aspects of the problem; how can/should we use this information to give better solutions?