
Chapter 8: Generalization and Function Approximation


Presentation Transcript

Slides based on R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction.


1. Chapter 8: Generalization and Function Approximation
Objectives of this chapter:
• Look at how experience with a limited part of the state set can be used to produce good behavior over a much larger part.
• Overview of function approximation (FA) methods and how they can be adapted to RL.

2. Value Prediction with FA
As usual, policy evaluation (the prediction problem): for a given policy $\pi$, compute the state-value function $V^\pi$.
In earlier chapters, value functions were stored in lookup tables. Here, the value estimate $V_t$ at time $t$ depends on a parameter vector $\vec\theta_t$, and only the parameter vector is updated.

3. Adapt Supervised Learning Algorithms
Training info = desired (target) outputs:  Inputs → Supervised Learning System → Outputs
Training example = {input, target output}
Error = (target output − actual output)

4. Backups as Training Examples
E.g., the TD(0) backup
$$V(s_t) \leftarrow V(s_t) + \alpha\,[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)]$$
as a training example:
input = description of $s_t$, target output = $r_{t+1} + \gamma V(s_{t+1})$.

5. Any Function Approximation Method?
• In principle, yes:
  • artificial neural networks
  • decision trees
  • multivariate regression methods
  • etc.
• But RL has some special requirements:
  • usually want to learn while interacting
  • ability to handle nonstationarity
  • other?

6. Gradient Descent Methods
Let $\vec\theta_t = (\theta_t(1), \theta_t(2), \ldots, \theta_t(n))^\top$ be the parameter vector ($\top$ denotes transpose).
Assume $V_t(s)$ is a (sufficiently smooth) differentiable function of $\vec\theta_t$ for all $s \in S$, and that $\vec\theta_t$ is updated at each of a series of discrete time steps $t$.

7. Performance Measures
• Many are applicable but...
• a common and simple one is the mean-squared error (MSE) over a distribution $P$ to weight the various errors:
$$\mathrm{MSE}(\vec\theta_t) = \sum_{s \in S} P(s)\,[V^\pi(s) - V_t(s)]^2$$
• Why $P$? Why minimize MSE?
• Let us assume that $P$ is always the distribution of states at which backups are done.
• The on-policy distribution: the distribution created while following the policy being evaluated. Stronger results are available for this distribution.
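To make the measure concrete, here is a minimal numeric sketch (the linear form $V_t(s) = \vec\theta^{\,\top}\vec\phi_s$ anticipates slide 14; the states, features, and distribution $P$ below are invented purely for illustration):

```python
import numpy as np

# Hypothetical 3-state example: P weights each state's squared error.
P     = np.array([0.5, 0.3, 0.2])    # state distribution (sums to 1)
V_pi  = np.array([1.0, 0.0, -1.0])   # true values V^pi(s)
phi   = np.array([[1.0, 0.0],        # feature vector phi_s for each state
                  [0.0, 1.0],
                  [1.0, 1.0]])
theta = np.array([0.8, -0.1])        # current parameter vector

V_t = phi @ theta                    # approximate values V_t(s) = theta^T phi_s
mse = np.sum(P * (V_pi - V_t) ** 2)  # MSE(theta) = sum_s P(s) [V^pi(s) - V_t(s)]^2
print(mse)
```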

8. Gradient Descent
Iteratively move down the gradient:
$$\vec\theta_{t+1} = \vec\theta_t - \alpha\,\nabla_{\vec\theta_t} f(\vec\theta_t),$$
where $\nabla_{\vec\theta_t} f(\vec\theta_t)$ is the vector of partial derivatives $\bigl(\partial f/\partial\theta_t(1), \ldots, \partial f/\partial\theta_t(n)\bigr)^\top$.

9. Gradient Descent Cont.
For the MSE given above, and using the chain rule:
$$\vec\theta_{t+1} = \vec\theta_t + \alpha \sum_{s \in S} P(s)\,[V^\pi(s) - V_t(s)]\,\nabla_{\vec\theta_t} V_t(s)$$

10. Gradient Descent Cont.
Assume that states appear one at a time with distribution $P$, and use just the sample gradient instead:
$$\vec\theta_{t+1} = \vec\theta_t + \alpha\,[V^\pi(s_t) - V_t(s_t)]\,\nabla_{\vec\theta_t} V_t(s_t)$$
Since each sample gradient is an unbiased estimate of the true gradient, this converges to a local minimum of the MSE if $\alpha$ decreases appropriately with $t$.
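A sketch of this stochastic update for the linear case of slide 14, where $\nabla_{\vec\theta} V_t(s) = \vec\phi_s$ (the target $V^\pi(s_t)$ is assumed given here, an assumption that slide 11 removes):

```python
import numpy as np

def sample_gradient_step(theta, phi_s, v_target, alpha):
    """One stochastic gradient-descent step on the MSE (linear V_t)."""
    error = v_target - theta @ phi_s      # V^pi(s_t) - V_t(s_t)
    return theta + alpha * error * phi_s  # step along the sample gradient
```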

11. But We Don't Have These Targets
Use an estimate $v_t$ in place of the true value $V^\pi(s_t)$:
$$\vec\theta_{t+1} = \vec\theta_t + \alpha\,[v_t - V_t(s_t)]\,\nabla_{\vec\theta_t} V_t(s_t)$$
If $v_t$ is an unbiased estimate, $E\{v_t\} = V^\pi(s_t)$ (e.g., the Monte Carlo return $R_t$), then $\vec\theta_t$ still converges to a local minimum of the MSE (with appropriately decreasing $\alpha$).

12. What about TD(λ) Targets?
$$\vec\theta_{t+1} = \vec\theta_t + \alpha\,[R_t^\lambda - V_t(s_t)]\,\nabla_{\vec\theta_t} V_t(s_t)$$
For $\lambda < 1$, the λ-return $R_t^\lambda$ bootstraps from the current estimate, so it is not an unbiased estimate of $V^\pi(s_t)$ and the convergence argument above does not apply. We use it anyway, via the backward view with eligibility traces.

13. On-Line Gradient-Descent TD(λ)
Maintain an eligibility-trace vector $\vec e$ with one component per parameter. After each transition $(s_t, r_{t+1}, s_{t+1})$:
$$\delta_t = r_{t+1} + \gamma V_t(s_{t+1}) - V_t(s_t), \qquad \vec e_t = \gamma\lambda\,\vec e_{t-1} + \nabla_{\vec\theta_t} V_t(s_t), \qquad \vec\theta_{t+1} = \vec\theta_t + \alpha\,\delta_t\,\vec e_t$$
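A sketch of the full loop for the linear case (the environment interface and the feature function are assumptions; `env.step()` is taken to sample one transition under the policy being evaluated):

```python
import numpy as np

def td_lambda_linear(env, features, n, alpha, gamma, lam, n_episodes):
    """On-line gradient-descent TD(lambda) with linear V, so grad V_t(s) = features(s)."""
    theta = np.zeros(n)
    for _ in range(n_episodes):
        s = env.reset()
        e = np.zeros(n)                          # eligibility-trace vector
        done = False
        while not done:
            r, s_next, done = env.step()         # one transition under the evaluated policy
            phi = features(s)
            v_next = 0.0 if done else theta @ features(s_next)
            delta = r + gamma * v_next - theta @ phi   # TD error delta_t
            e = gamma * lam * e + phi                  # e_t = gamma*lam*e_{t-1} + grad V_t(s_t)
            theta = theta + alpha * delta * e          # theta_{t+1} = theta_t + alpha*delta_t*e_t
            s = s_next
    return theta
```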

14. Linear Methods
Represent each state $s$ by a feature vector $\vec\phi_s = (\phi_s(1), \ldots, \phi_s(n))^\top$ and make the value estimate linear in the parameters:
$$V_t(s) = \vec\theta_t^{\,\top}\vec\phi_s = \sum_{i=1}^{n} \theta_t(i)\,\phi_s(i)$$

15. Nice Properties of Linear FA Methods
• The gradient is very simple: $\nabla_{\vec\theta_t} V_t(s) = \vec\phi_s$
• For MSE, the error surface is simple: a quadratic surface with a single minimum.
• Linear gradient-descent TD(λ) converges if:
  • the step size decreases appropriately, and
  • sampling is on-line (states sampled from the on-policy distribution).
• It converges to a parameter vector $\vec\theta_\infty$ with the property
$$\mathrm{MSE}(\vec\theta_\infty) \le \frac{1 - \gamma\lambda}{1 - \gamma}\,\mathrm{MSE}(\vec\theta^{\,*}),$$
where $\vec\theta^{\,*}$ is the best (MSE-minimizing) parameter vector (Tsitsiklis & Van Roy, 1997).

16. Coarse Coding
Features with large, overlapping receptive fields (e.g., circles in state space): a state activates every feature whose field it falls in, and generalization between two states depends on the number of fields they share.

17. Learning and Coarse Coding
The size and shape of the receptive fields strongly affect the breadth of generalization early in learning, but, given enough features, have relatively little effect on the quality of the asymptotic solution.

18. Tile Coding
• Partition the state space with several overlapping, offset grids (tilings); each tile is a binary feature.
• The number of features present at any one time is constant (one per tiling).
• Binary features mean the weighted sum is easy to compute.
• It is easy to compute the indices of the features present (see the sketch below).
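A minimal sketch of uniform, diagonally offset grid tilings over a 2-D state in $[0,1)^2$ (the offsets and sizes are arbitrary illustrative choices):

```python
def tile_indices(x, y, n_tilings=4, tiles_per_dim=8):
    """Return the single active (binary) feature index in each tiling for state (x, y)."""
    active = []
    for t in range(n_tilings):
        offset = t / (n_tilings * tiles_per_dim)      # shift each tiling slightly
        col = int((x + offset) * tiles_per_dim) % tiles_per_dim
        row = int((y + offset) * tiles_per_dim) % tiles_per_dim
        active.append(t * tiles_per_dim ** 2 + row * tiles_per_dim + col)
    return active  # exactly n_tilings features are active for any state

# The weighted sum is then just a sum of weights at the active indices:
# V(s) = sum(theta[i] for i in tile_indices(x, y))
```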

19. Tile Coding Cont.
• Irregular tilings are also possible (e.g., stripes, diagonal stripes).
• Hashing: pseudo-randomly collapse a large tile space into a much smaller set of tiles, so memory need not grow with dimensionality.
• CMAC, the "Cerebellar Model Arithmetic Computer" (Albus, 1971), is an early, well-known form of tile coding.

20. Radial Basis Functions (RBFs)
Real-valued (rather than binary) features, e.g., Gaussians:
$$\phi_s(i) = \exp\!\left(-\frac{\lVert s - c_i \rVert^2}{2\sigma_i^2}\right)$$
with center $c_i$ and width $\sigma_i$.
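A sketch of these features (the centers and shared width below are illustrative choices):

```python
import numpy as np

def rbf_features(s, centers, sigma):
    """Gaussian RBF features: phi_s(i) = exp(-||s - c_i||^2 / (2 sigma^2))."""
    d2 = np.sum((centers - s) ** 2, axis=1)  # squared distance to each center c_i
    return np.exp(-d2 / (2.0 * sigma ** 2))

centers = np.array([[0.0], [0.5], [1.0]])   # three centers on a 1-D state space
print(rbf_features(np.array([0.4]), centers, sigma=0.25))
```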

21. Can You Beat the "Curse of Dimensionality"?
• Can you keep the number of features from going up exponentially with the dimension?
• Function complexity, not dimensionality, is the problem.
• Kanerva coding (a sketch follows below):
  • select a set of binary prototypes
  • use Hamming distance as the distance measure
  • dimensionality is no longer a problem, only complexity
• "Lazy learning" schemes:
  • remember all the data
  • to get a new value, find the nearest neighbors and interpolate
  • e.g., locally weighted regression
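A sketch of the Kanerva idea (prototype count, bit width, and activation radius are all invented for illustration). Note that the number of features is set by the number of prototypes, not by the input dimensionality:

```python
import numpy as np

rng = np.random.default_rng(0)
prototypes = rng.integers(0, 2, size=(50, 16))   # 50 random binary prototypes of 16 bits

def kanerva_features(s_bits, radius=4):
    """Binary features: feature i is active if Hamming(s, prototype_i) <= radius."""
    dist = np.sum(prototypes != s_bits, axis=1)  # Hamming distance to each prototype
    return (dist <= radius).astype(float)

s = rng.integers(0, 2, size=16)                  # a binary state description
print(int(kanerva_features(s).sum()), "features active")
```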

22. Control with Function Approximation
Learning state-action values, with training examples of the form {description of $(s_t, a_t)$, $v_t$}.
The general gradient-descent rule:
$$\vec\theta_{t+1} = \vec\theta_t + \alpha\,[v_t - Q_t(s_t, a_t)]\,\nabla_{\vec\theta_t} Q_t(s_t, a_t)$$
Gradient-descent Sarsa(λ) (backward view):
$$\vec\theta_{t+1} = \vec\theta_t + \alpha\,\delta_t\,\vec e_t, \qquad \delta_t = r_{t+1} + \gamma Q_t(s_{t+1}, a_{t+1}) - Q_t(s_t, a_t), \qquad \vec e_t = \gamma\lambda\,\vec e_{t-1} + \nabla_{\vec\theta_t} Q_t(s_t, a_t)$$
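A sketch of one backward-view Sarsa(λ) update for the linear case, where $\nabla_{\vec\theta} Q_t(s,a) = \vec\phi_{sa}$ (the state-action feature vectors are assumed supplied by the caller):

```python
import numpy as np

def sarsa_lambda_step(theta, e, phi_sa, phi_sa_next, r, gamma, lam, alpha, terminal):
    """One linear gradient-descent Sarsa(lambda) update (backward view)."""
    q      = theta @ phi_sa
    q_next = 0.0 if terminal else theta @ phi_sa_next   # Q_t(s_{t+1}, a_{t+1})
    delta  = r + gamma * q_next - q                     # delta_t
    e      = gamma * lam * e + phi_sa                   # e_t = gamma*lam*e_{t-1} + phi(s_t, a_t)
    theta  = theta + alpha * delta * e                  # theta_{t+1} = theta_t + alpha*delta_t*e_t
    return theta, e
```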

23. GPI with Linear Gradient-Descent Sarsa(λ)

24. GPI with Linear Gradient-Descent Watkins' Q(λ)
As above, but bootstrapping from the greedy action, $\max_a Q_t(s_{t+1}, a)$, and cutting the eligibility traces (resetting $\vec e$ to zero) whenever an exploratory, non-greedy action is taken.
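The distinguishing details, as a sketch for the linear case (`next_action_greedy` flags whether the action actually selected at $s_{t+1}$ is the greedy one; the feature vectors are assumed supplied by the caller):

```python
import numpy as np

def watkins_q_lambda_step(theta, e, phi_sa, phi_greedy_next, r,
                          gamma, lam, alpha, next_action_greedy, terminal):
    """One linear Watkins' Q(lambda) update: bootstrap on the greedy successor action."""
    q      = theta @ phi_sa
    q_next = 0.0 if terminal else theta @ phi_greedy_next  # max_a Q_t(s_{t+1}, a)
    delta  = r + gamma * q_next - q
    e      = gamma * lam * e + phi_sa
    theta  = theta + alpha * delta * e
    if not next_action_greedy:
        e = np.zeros_like(e)   # cut the traces after an exploratory action
    return theta, e
```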

25. Mountain-Car Task
Drive an underpowered car up a steep mountain road. Gravity is stronger than the car's engine, so the car must first back up the opposite slope to gain momentum. Reward is −1 on every time step until the car reaches the goal at the top.
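A sketch of the standard dynamics as given in the book (the function interface is an assumption; the constants and bounds follow the book's formulation):

```python
import math

def mountain_car_step(pos, vel, action):
    """One step of mountain-car dynamics; action is -1 (reverse), 0, or +1 (forward)."""
    vel += 0.001 * action - 0.0025 * math.cos(3 * pos)
    vel = max(-0.07, min(0.07, vel))   # velocity bounds
    pos += vel
    if pos < -1.2:                     # left bound: the car is stopped
        pos, vel = -1.2, 0.0
    done = pos >= 0.5                  # goal at the top of the right slope
    return pos, vel, -1.0, done        # reward is -1 on every step
```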

26. Mountain-Car Results

27. Baird's Counterexample
The reward is always zero, so the true value function is zero for all $s$, and it is exactly representable. The (linear) form of the approximate state-value function is shown in each state. Yet updating the parameters with DP-style full backups distributed uniformly over the states is unstable from some initial conditions: the parameters diverge.

28. Baird's Counterexample Cont.

29. Should We Bootstrap?
Empirically, methods with $\lambda < 1$ (i.e., with bootstrapping) have usually performed substantially better than pure Monte Carlo methods ($\lambda = 1$) on the tasks tried, even though bootstrapping methods have weaker convergence guarantees.

30. Summary
• Generalization
• Adapting supervised-learning function approximation methods
• Gradient-descent methods
• Linear gradient-descent methods
  • radial basis functions
  • tile coding
  • Kanerva coding
• Nonlinear gradient-descent methods? Backpropagation?
• Subtleties involving function approximation, bootstrapping, and the on-policy/off-policy distinction
