TD(0) prediction; Sarsa, on-policy learning; Q-Learning, off-policy learning
slide1

TD(0) prediction

  • Sarsa, On-policy learning
  • Q-Learning, Off-policy learning
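For reference, the standard one-step updates behind these three methods (Sutton & Barto), added here since the slide images did not survive:

    % TD(0) prediction
    V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right]

    % Sarsa (on-policy: backs up the action A_{t+1} actually taken)
    Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]

    % Q-learning (off-policy: backs up the greedy action, whatever the behavior did)
    Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t) \right]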
slide9

An n-step method is a simple version of TD(λ)

  • Example: back up toward the average of the 2-step and 4-step returns (formulas below)
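The n-step return behind these backups, in its standard form, together with the equally weighted target for the example above:

    % n-step return from time t
    G_t^{(n)} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} V(S_{t+n})

    % averaged backup target for the 2-step / 4-step example
    \tfrac{1}{2} G_t^{(2)} + \tfrac{1}{2} G_t^{(4)}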
Forward View, TD(λ)
  • Weight each n-step return backup in proportion to $\lambda^{n-1}$, where $n-1$ is the time since visitation; the factor $(1-\lambda)$ normalizes the weights
  • λ-return: $G_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}$
  • Backup using λ-return: $\Delta V_t(S_t) = \alpha \left[ G_t^{\lambda} - V_t(S_t) \right]$
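A quick worked check of these weights (illustrative numbers): with $\lambda = 0.5$, the n-step returns receive weights $(1-\lambda)\lambda^{n-1} = 0.5, 0.25, 0.125, \ldots$ for $n = 1, 2, 3, \ldots$; the weights form a geometric series summing to 1, so $G_t^{\lambda}$ is a proper weighted average of all n-step returns.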
slide17

Update rule (backward view), using eligibility traces $e_t(s)$:

    $e_t(s) = \gamma \lambda\, e_{t-1}(s) + \mathbf{1}[s = S_t]$
    $\delta_t = R_{t+1} + \gamma V_t(S_{t+1}) - V_t(S_t)$
    $\Delta V_t(s) = \alpha\, \delta_t\, e_t(s) \quad \text{for all } s$

  • As before, λ = 0 means TD(0)
  • Now, when λ = 1, you get MC, but
    • Can apply to continuing tasks
    • Works incrementally and on-line!
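A minimal tabular sketch of this backward-view update in Python. The environment and policy interfaces (env.reset, env.step, policy) are illustrative assumptions, not something the slides define:

    from collections import defaultdict

    def td_lambda(env, policy, alpha=0.1, gamma=0.99, lam=0.8, episodes=100):
        """Tabular backward-view TD(lambda) with accumulating traces.

        Assumes env.reset() -> state, env.step(action) -> (state, reward, done),
        and policy(state) -> action; all three are hypothetical interfaces.
        """
        V = defaultdict(float)             # state-value estimates
        for _ in range(episodes):
            e = defaultdict(float)         # eligibility traces, reset per episode
            s = env.reset()
            done = False
            while not done:
                s2, r, done = env.step(policy(s))
                delta = r + (0.0 if done else gamma * V[s2]) - V[s]
                e[s] += 1.0                # accumulating trace for the visited state
                for x in list(e):          # update every state with an active trace
                    V[x] += alpha * delta * e[x]
                    e[x] *= gamma * lam    # decay all traces
                s = s2
        return V

Setting lam=0 kills every trace except the current state's, recovering TD(0); lam=1 accumulates the updates into an every-visit Monte Carlo estimate computed incrementally and online, which is the point of the bullets above.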
Watkins's Q(λ)
  • Why isn't Q-learning as easy as Sarsa? Because Q-learning is off-policy: it learns about the greedy policy while following an exploratory one, so after the first non-greedy action the rest of the trajectory says nothing about the greedy policy's return. Watkins's Q(λ) therefore cuts all eligibility traces to zero whenever a non-greedy action is taken (see the sketch below).
slide22

Accumulating traces

    • Eligibilities can grow to be greater than 1
    • This can cause convergence problems
  • Replacing traces: reset the visited state's trace to 1 instead of incrementing it (updates below)
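The two trace updates, in standard form (the visited state is $S_t$; every other trace just decays):

    % accumulating trace
    e_t(s) = \gamma \lambda\, e_{t-1}(s) + 1    \quad \text{if } s = S_t
    % replacing trace
    e_t(s) = 1                                  \quad \text{if } s = S_t
    % all other states (both variants)
    e_t(s) = \gamma \lambda\, e_{t-1}(s)        \quad \text{if } s \neq S_t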
Implementation Issues
  • Could require significant amounts of computation
    • But most traces are very close to zero…
    • We can actually throw them out when they get very small
  • You'll want an efficient data structure that stores only the active traces (sketched below)
  • In practice, increases computation only by a small multiple
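A minimal sketch of that pruning idea, assuming the traces live in a dict; the cutoff value is an illustrative choice, not from the slides:

    def decay_traces(e, gamma, lam, cutoff=1e-4):
        """Decay all eligibility traces and drop any that fall below cutoff.

        Storing only active traces in a dict keeps the per-step cost
        proportional to the number of active traces rather than to the
        size of the state space.
        """
        for s in list(e):
            e[s] *= gamma * lam
            if e[s] < cutoff:
                del e[s]    # near-zero trace: its future updates are negligible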
slide27

AnonymousFeedback.net

Send to taylorm@eecs.wsu.edu

  • What’s been most useful to you (and why)?
  • What’s been least useful (and why)?
  • What could students do to improve the class?
  • What could Matt do to improve the class?