
Sparse Q-learning with Mirror Descent
Sridhar Mahadevan and Bo Liu, Autonomous Learning Laboratory, University of Massachusetts Amherst, {mahadeva, boliu}@cs.umass.edu


Presentation Transcript


ABSTRACT:
• This paper explores a new framework for reinforcement learning (RL) based on online convex optimization, in particular mirror descent and related algorithms.
• A new class of proximal-gradient based temporal-difference (TD) methods is presented, based on different Bregman divergences, which are more powerful than regular TD learning.
• A new family of first-order sparse RL methods is proposed, able to find the sparse fixed point of an l1-regularized Bellman equation at significantly less computational cost than previous second-order methods.

MOTIVATION:
• Finding the sparse fixed point of an l1-regularized Bellman equation is a two-step nested optimization problem:
• Projection step: project the Bellman backup onto the basis subspace under l1 regularization.
• Fixed-point step: require the projected value function to be a fixed point of this regularized projection.

BACKGROUND:
• Mirror descent is an enhanced gradient method; it can be viewed as a proximal algorithm in which the distance-generating function is a Bregman divergence.

ALGORITHMS:
• Decaying p-norm link function.
• Iterative soft-thresholding for sparsity (a minimal sketch combining these two ingredients is given after this transcript).

ERROR BOUND ANALYSIS:
The error bound is controlled by (1) the expressiveness of the basis-function subspace, (2) the sparsity parameter, and (3) the quality of the empirical l1 solver.

EXPERIMENTAL RESULTS:
• Convergence comparison with LARS-TD: smaller differences between successive weights and less running time at each iteration.
• Control learning, variance comparison with Q-learning: less variance than Q-learning.

DISCUSSION AND FUTURE WORK:
• Comparison of the p-norm link function with Exponentiated Gradient (EG): EG is not able to generate sparse solutions; in addition, EG-based methods are prone to overflow of the coefficients.
• The p-norm link function interpolates between additive and multiplicative gradient updates and is thus more flexible and robust to various basis functions.
• The regret bound with respect to different link functions in the RL setting remains to be established.
• Introducing mirror descent into off-policy TD learning and policy-gradient algorithms.
• Scaling to large MDPs, including hierarchical mirror-descent RL, in particular extending to semi-MDP Q-learning.

Proceedings of the Conference on Uncertainty in AI (UAI), August 15-17, 2012, Catalina Island, CA.
For more information, please contact: Prof. Sridhar Mahadevan, Dept. of Computer Science, University of Massachusetts Amherst. Email: mahadeva@cs.umass.edu
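The two algorithmic ingredients named on the poster, a p-norm link function for the mirror map and iterative soft-thresholding for l1 sparsity, can be combined into a single Q-learning weight update. The following Python sketch assumes linear function approximation with features phi(s, a); the function names (pnorm_link, sparse_q_learning_step, and so on), the use of the squared TD error as the loss surrogate, and the step-size and shrinkage handling are illustrative assumptions, not the authors' exact pseudocode.

import numpy as np

def pnorm_link(w, p):
    # Gradient of (1/2)||w||_q^2, where q is the conjugate exponent of p (1/p + 1/q = 1).
    q = p / (p - 1.0)
    norm = np.linalg.norm(w, q)
    if norm == 0.0:
        return np.zeros_like(w)
    return np.sign(w) * np.abs(w) ** (q - 1.0) / norm ** (q - 2.0)

def pnorm_link_inverse(theta, p):
    # Gradient of (1/2)||theta||_p^2; maps dual variables back to primal weights.
    norm = np.linalg.norm(theta, p)
    if norm == 0.0:
        return np.zeros_like(theta)
    return np.sign(theta) * np.abs(theta) ** (p - 1.0) / norm ** (p - 2.0)

def soft_threshold(theta, tau):
    # Iterative soft-thresholding (l1 proximal map); this is what induces sparsity.
    return np.sign(theta) * np.maximum(np.abs(theta) - tau, 0.0)

def sparse_q_learning_step(w, phi_sa, reward, phi_next_best, gamma, alpha, lam, p):
    # One hypothetical mirror-descent Q-learning update:
    # TD error -> gradient step in the dual space -> soft-thresholding -> map back to primal.
    td_error = reward + gamma * (phi_next_best @ w) - phi_sa @ w
    gradient = -td_error * phi_sa                 # gradient of the squared-TD-error surrogate
    theta = pnorm_link(w, p) - alpha * gradient   # mirror-descent step in the dual space
    theta = soft_threshold(theta, alpha * lam)    # l1 regularization via shrinkage
    return pnorm_link_inverse(theta, p)

# Hypothetical usage with random features (illustrative only).
rng = np.random.default_rng(0)
d = 50
w = np.zeros(d)
phi_sa = rng.normal(size=d)
phi_next = rng.normal(size=d)
w = sparse_q_learning_step(w, phi_sa, reward=1.0, phi_next_best=phi_next,
                           gamma=0.95, alpha=0.1, lam=0.01, p=2 * np.log(d))

A common choice in p-norm mirror-descent methods is to tie p to the feature dimension (for example, p on the order of ln d, as in the usage line above) and to reduce it over iterations; the exact schedule behind the poster's "decaying p-norm" is an assumption here, not taken from the transcript.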
