
Temporal Difference Learning


Presentation Transcript


  1. Temporal Difference Learning Mark Romero – 11/03/2011

  2. Introduction • Temporal Difference Learning combines ideas from Monte Carlo methods and Dynamic Programming • It still samples the environment based on some policy • It determines the current estimate based on previous estimates • Predictions are adjusted over time to match other, more accurate predictions • Temporal Difference Learning is popular for its simplicity and its suitability for on-line applications

  3. MC vs TD • Constant-α MC: V(st) <- V(st) + α[R(t) – V(st)], where R(t) is the actual return (reward) following time t and α is a constant step-size parameter. Because the actual return is used, we must wait until the end of the episode to determine the update to V.
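The constant-α MC update can be sketched in a few lines of Python. This is an illustrative every-visit MC sketch, not code from the slides; the names V, episode, alpha, and gamma are assumptions, and episode is taken to be the list of (state, reward) pairs recorded while following the policy.

def constant_alpha_mc_update(V, episode, alpha, gamma=1.0):
    # Constant-alpha every-visit MC: V can only be updated after the
    # episode ends, because the actual return R(t) must be known.
    G = 0.0
    for state, reward in reversed(episode):
        G = reward + gamma * G                         # R(t) = r(t+1) + gamma * R(t+1)
        V[state] = V[state] + alpha * (G - V[state])   # move toward the actual return
    return V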

  4. MC vs TD • TD(0): V(st) <- V(st) + α[rt+1 + γV(st+1) – V(st)], where rt+1 is the observed reward and γ is the discount rate. The TD method only waits until the next time step: at time t+1 a target can be formed and an update made using the observed reward, rt+1, and the existing estimate, V(st+1). In effect, TD(0) targets rt+1 + γV(st+1) instead of R(t) in the MC method. This is called bootstrapping because the update is based on a previous estimate.
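For contrast with the MC update above, here is a one-step sketch of the TD(0) backup in Python; the names V, s, r_next, and s_next are illustrative, not from the slides.

def td0_update(V, s, r_next, s_next, alpha, gamma):
    # Bootstrapping: the target uses the current estimate V[s_next]
    # instead of the actual return R(t), so there is no waiting for the episode to end.
    td_target = r_next + gamma * V[s_next]
    V[s] = V[s] + alpha * (td_target - V[s])
    return V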

  5. Pseudo Code
Initialize V(s) arbitrarily, and π to the policy to be evaluated
Repeat (for each episode):
    Initialize s
    Repeat (for each step of episode):
        a <- action given by π for s
        Take action a; observe reward r and next state s'
        V(s) <- V(s) + α[r + γV(s') – V(s)]
        s <- s'
    until s is terminal
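The pseudocode above can be turned into a small runnable Python sketch. The 5-state random-walk environment and the uniform-random policy below are illustrative assumptions added for demonstration; only the TD(0) backup itself comes from the slide.

import random

def td0_evaluate(num_episodes=1000, alpha=0.1, gamma=1.0, n_states=5):
    # Tabular TD(0) prediction on a simple random walk:
    # states 0..n_states-1, terminating off either end,
    # reward +1 only when exiting to the right (assumed setup).
    V = {s: 0.0 for s in range(n_states)}
    for _ in range(num_episodes):
        s = n_states // 2                      # start in the middle
        while True:
            # Policy pi being evaluated: move left or right with equal probability.
            s_next = s + random.choice([-1, +1])
            if s_next < 0:
                r, terminal = 0.0, True        # exited to the left
            elif s_next >= n_states:
                r, terminal = 1.0, True        # exited to the right
            else:
                r, terminal = 0.0, False
            v_next = 0.0 if terminal else V[s_next]
            V[s] = V[s] + alpha * (r + gamma * v_next - V[s])   # TD(0) backup
            if terminal:
                break
            s = s_next
    return V

With gamma = 1.0 the learned values should approximately approach the true values 1/6, 2/6, ..., 5/6 for this walk.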

  6. Advantages over MC • Lends itself naturally to on-line applications • MC must wait until the end of the episode to make its update, while TD needs only one time step • This turns out to be a critical consideration: some applications have very long episodes, or no episodes at all • TD learns from every transition • MC methods generally discount or throw out episodes in which an experimental (exploratory) action was taken • In practice, TD converges faster than constant-α MC, although no formal proof of this has been developed

  7. Soundness • Is TD sound? • Yes: for any fixed policy π, the TD algorithm has been proven to converge to Vπ, provided a sufficiently small constant step-size parameter is used, or the step-size parameter decreases according to the usual stochastic approximation conditions.
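For reference, the "usual stochastic approximation conditions" are the standard Robbins–Monro conditions on the step-size sequence αt, written here in LaTeX:

\sum_{t=1}^{\infty} \alpha_t = \infty, \qquad \sum_{t=1}^{\infty} \alpha_t^2 < \infty

The first condition ensures the steps remain large enough to overcome initial conditions and noise; the second ensures they eventually become small enough for the estimates to converge.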
