
Structural Return Maximization for Reinforcement Learning

Josh Joseph, Alborz Geramifard, Javier Velez, Jonathan How, Nicholas Roy. How should we act in the presence of complex, unknown dynamics?

Presentation Transcript


  1. Structural Return Maximization for Reinforcement Learning. Josh Joseph, Alborz Geramifard, Javier Velez, Jonathan How, Nicholas Roy

  2. How should we act in the presence of complex, unknown dynamics?

  3. How should we act in the presence of complex, unknown dynamics?

  4. How should we act in the presence of complex, unknown dynamics?

  5. How should we act in the presence of complex, unknown dynamics?

  6. What do I mean by complex dynamics?
  • Can’t derive from first principles / intuition
  • Any dynamics model will be approximate
  • Limited data
    • Otherwise just do nearest neighbors
  • Batch data
    • Trying to keep it as simple as possible for now
    • Fairly straightforward to extend to active learning

  7. What do I mean by complex dynamics?
  • Can’t derive from first principles / intuition
  • Any dynamics model will be approximate
  • Limited data
  • Batch data
    • Fairly straightforward to extend to active learning

  8. How does RL solve these problems?
  • Assume some representation class for:
    • Dynamics model
    • Value function
    • Policy
  • Collect some data
  • Find the “best” representation based on the data

  9. How does RL solve these problems?
  • Assume some representation class for:
    • Dynamics model
    • Value function
    • Policy
  • Collect some data
  • Find the “best” representation based on the data

  10. How does RL solve these problems?
  • The “best” representation based on the data
  • This defines the best policy…not the best representation
  (Slide annotates the return equation with: policy, starting state, value (return), reward, unknown dynamics model.)

  11. How does RL solve these problems?
  • The “best” representation based on the data
  • This defines the best policy…not the best representation
  (Slide annotates the return equation with: policy, starting state, value (return), reward, unknown dynamics model.)

  12. How does RL solve these problems?
  • The “best” representation based on the data
  • This defines the best policy…not the best representation
  (Slide annotates the return equation with: policy, starting state, value (return), reward, unknown dynamics model.)
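
Read as a hedged sketch in standard notation (the symbols below are my assumptions; the slides only show the labels): the quantity being optimized is the value, i.e., the expected return

V(\pi) = \mathbb{E}\left[ \sum_{t} r(s_t, a_t) \;\middle|\; s_0, \pi, T \right],

where \pi is the policy, s_0 the starting state, r the reward, and T the unknown dynamics model. The “best” representation, in this sense, is whichever one leads to the policy that maximizes this value.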

  13. …but does RL actually solve this problem?
  • Policy Search
  • Policy directly parameterized by (parameter symbol on the slide, not transcribed)

  14. …but does RL actually solve this problem?
  • Policy Search
  • Policy directly parameterized by (parameter symbol on the slide, not transcribed)

  15. …but does RL actually solve this problem?
  • Policy Search
  • Policy directly parameterized by (parameter symbol on the slide, not transcribed)
  (Slide annotates the empirical return estimate and the number of episodes.)

  16. …but does RL actually solve this problem?
  • Policy Search
  • Policy directly parameterized by (parameter symbol on the slide, not transcribed)
  (Slide annotates the empirical return estimate and the number of episodes.)
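
A hedged reading of the policy-search objective these slides annotate, in notation I am assuming rather than copying from the deck: with a policy \pi_\theta directly parameterized by \theta, the true objective V(\pi_\theta) is replaced by a Monte Carlo estimate over the N available episodes,

\hat{V}(\pi_\theta) = \frac{1}{N} \sum_{i=1}^{N} \sum_{t} r\big(s_t^{(i)}, a_t^{(i)}\big), \qquad \hat{\theta} = \arg\max_\theta \hat{V}(\pi_\theta),

so policy search does, at least in form, optimize an estimate of the return directly (in a batch setting this estimate typically needs importance weighting, which I am glossing over here).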

  17. …but does RL actually solve this problem?
  • Model-based RL
  • Dynamics model = (fitting formula on the slide, not transcribed)

  18. …but does RL actually solve this problem?
  • Model-based RL
  • Dynamics model = (fitting formula on the slide, not transcribed)

  19. …but does RL actually solve this problem?
  • Model-based RL
  • Dynamics model = (fitting formula on the slide, not transcribed)

  20. …but does RL actually solve this problem?
  Maximizing likelihood != maximizing return
  • Model-based RL
  • Dynamics model = (fitting formula on the slide, not transcribed)

  21. …but does RL actually solve this problem?
  Maximizing likelihood != maximizing return
  …similar story for value-based methods
  • Model-based RL
  • Dynamics model = (fitting formula on the slide, not transcribed)
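
A hedged sketch of the mismatch these slides point at, again in assumed notation: standard model-based RL fits the dynamics model to the data by maximum likelihood,

\hat{T} = \arg\max_{T \in \mathcal{T}} \sum_{(s, a, s') \in \mathcal{D}} \log T(s' \mid s, a),

and then plans a policy \pi_{\hat{T}} against \hat{T}. When the model class \mathcal{T} cannot represent the true dynamics, the most likely model is not necessarily the one whose planned policy achieves the highest return V(\pi_{\hat{T}}); maximizing likelihood != maximizing return.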

  22. ML model selection in RL
  • So why do we do it?
    • It’s easy
    • It sometimes works really well
    • Intuitively it feels like finding the most likely model should result in a high performing policy
  • Why does it fail?
    • Chooses an “average” model based on the data
    • Ignores reward function
  • What do we do then?

  23. ML model selection in RL
  • So why do we do it?
    • It’s easy
    • It sometimes works really well
    • Intuitively it feels like finding the most likely model should result in a high performing policy
  • Why does it fail?
    • Chooses an “average” model based on the data
    • Ignores reward function
  • What do we do then?

  24. ML model selection in RL
  • So why do we do it?
    • It’s easy
    • It sometimes works really well
    • Intuitively it feels like finding the most likely model should result in a high performing policy
  • Why does it fail?
    • Chooses an “average” model based on the data
    • Ignores reward function
  • What do we do then?

  25. Our Approach
  • Model-based RL
  • Dynamics model = (formula on the slide, not transcribed)

  26. Our Approach
  • Model-based RL
  • Dynamics model = (formula on the slide, not transcribed)
  (Slide annotates the empirical return estimate.)

  27. Our Approach
  • Model-based RL
  • Dynamics model = (formula on the slide, not transcribed)
  (Slide annotates the empirical return estimate.)

  28. Planning with Misspecified Model Classes (labeled “Us” on the slide)

  29. Our Approach
  • Model-based RL
  • Dynamics model = (formula on the slide, not transcribed)
  (Slide annotates the empirical return estimate.)

  30. Our Approach
  • Model-based RL
  • Dynamics model = (formula on the slide, not transcribed)
  • We can do the same thing in a value-based setting.
  (Slide annotates the empirical return estimate.)
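
A minimal sketch of the selection rule these slides describe, assuming a small set of candidate dynamics models and some way to estimate a policy’s return from the batch episodes (e.g., off-policy evaluation); the helper names and signatures below are hypothetical, not the authors’ code:

# Hypothetical sketch: select a dynamics model by the empirical return of the
# policy it induces, rather than by its likelihood. plan_policy and
# estimate_return are assumed, caller-supplied callables.
def select_model_by_return(candidate_models, plan_policy, estimate_return, episodes):
    """Return (model, policy) for the candidate whose planned policy scores
    the highest empirical return on the batch episodes."""
    best = None
    best_return = float("-inf")
    for model in candidate_models:
        policy = plan_policy(model)                # plan as if `model` were the true dynamics
        value = estimate_return(policy, episodes)  # score the policy by estimated return
        if value > best_return:
            best = (model, policy)
            best_return = value
    return best

Usage would be something like select_model_by_return(models, planner, return_estimator, batch_episodes); the key design choice is that the model is scored by the estimated return of the policy it induces, not by how well it explains the observed transitions.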

  31. …but
  • We are indirectly choosing a policy representation
  • The win of this indirect representation is that it can be “small”
  • Small = less data?
    • Intuitively you’d think so
    • Empirical evidence from toy problems
    • But all of our guarantees rely on infinite data
  • …maybe there’s a way to be more concrete

  32. …but
  • We are indirectly choosing a policy representation
  • The win of this indirect representation is that it can be “small”
  • Small = less data?
    • Intuitively you’d think so
    • Empirical evidence from toy problems
    • But all of our guarantees rely on infinite data
  • …maybe there’s a way to be more concrete

  33. What we want
  • How does the representation space relate to true return?
  • …they’ve been doing this in classification since the 60s
  • Relationship between the bound and “size” of the representation space / amount of data?

  34. What we want
  • How does the representation space relate to true return?
  • …they’ve been doing this in classification since the 60s
  • Relationship between the bound and “size” of the representation space / amount of data?

  35. What we want
  • How does the representation space relate to true return?
  • …they’ve been doing this in classification since the 60s
  • Relationship between the “size” of the representation space and the amount of data?
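
As an illustration of the kind of classification result being alluded to (a classical VC-style bound, not necessarily the specific bound used in this work): with probability at least 1 - \eta over a draw of n samples, every function f in a class of VC dimension h satisfies

R(f) \le R_{\mathrm{emp}}(f) + \sqrt{ \frac{ h\left( \ln\frac{2n}{h} + 1 \right) - \ln\frac{\eta}{4} }{ n } },

where R is the true risk and R_emp the empirical risk (both defined on the classification slides later in the deck), so the gap between empirical and true performance is controlled jointly by the “size” h of the function class and the amount of data n.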

  36. How to get there
  Model-based, value-based, policy search

  37. How to get there
  Model-based, value-based, policy search
  Empirical Risk Minimization
  Map RL to classification

  38. How to get there
  Model-based, value-based, policy search
  Empirical Risk Minimization
  Map RL to classification
  Measuring function class size
  Bound on true risk

  39. How to get there
  Model-based, value-based, policy search
  Empirical Risk Minimization
  Map RL to classification
  Measuring function class size
  Bound on true risk

  40. How to get there
  Model-based, value-based, policy search
  Empirical Risk Minimization
  Map RL to classification
  Measuring function class size
  Bound on true risk
  Structural risk minimization
  Structure of function classes

  41. How to get there
  Model-based, value-based, policy search
  Empirical Risk Minimization
  Map RL to classification
  Measuring function class size
  Bound on true risk
  Structural risk minimization
  Structure of function classes

  42. Classification

  43. Classification

  44. Classification (slide shows the classifier f)

  45. Classification (slide introduces the risk)

  46. Classification
  (Slide annotates the risk equation with: loss (cost), unknown data distribution, risk.)
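
Written out in standard notation (my choice of symbols, matching the labels on the slide): for a classifier f, a loss (cost) function L, and an unknown data distribution P over inputs x and labels y, the risk is

R(f) = \mathbb{E}_{(x, y) \sim P}\left[ L(f(x), y) \right] = \int L(f(x), y) \, dP(x, y).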

  47. Empirical Risk Minimization
  (Slide annotates the unknown data distribution.)

  48. Empirical Risk Minimization
  (Slide annotates: unknown data distribution, empirical estimate, number of samples.)
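
And the empirical risk minimization step the labels point at (again in assumed standard notation): given n samples (x_1, y_1), \ldots, (x_n, y_n) drawn from the unknown distribution, the risk is replaced by its empirical estimate

R_{\mathrm{emp}}(f) = \frac{1}{n} \sum_{i=1}^{n} L(f(x_i), y_i),

and f is chosen to minimize R_emp over the function class.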

  49. Mapping RL to Classification

  50. Mapping RL to Classification
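
One natural way to read the mapping these slides set up (my paraphrase of where the deck is heading, not a quote from it): episodes play the role of samples from an unknown distribution, the policy plays the role of the classifier f, and negative (suitably normalized) return plays the role of the loss, so that minimizing risk corresponds to maximizing expected return, and the empirical risk over N episodes is exactly the empirical return estimate from the policy-search slides. With that mapping in place, the empirical risk minimization and structural risk minimization machinery from classification can be brought to bear on choosing RL representations.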
