
Practical Reinforcement Learning in Continuous Space


Presentation Transcript


  1. Practical Reinforcement Learning in Continuous Space William D. Smart Brown University Leslie Pack Kaelbling MIT Presented by: David LeRoux

  2. Goals of Paper • Practical RL approach • Handles continuous state and action spaces • Safely approximates value function • On-line learning bootstrapped with human-provided data

  3. Approaches to Continuous State or Action Space • Discretize • If too coarse, problems with hidden state • If too fine, the learner cannot generalize • Curse of dimensionality • Function Approximators • Used to estimate the value function • Errors tend to propagate • Tendency to over-estimate (hidden extrapolation)

  4. Proposed Approach - Hedger • Instance-Based Approach • To predict Q(s,a): • Find the neighborhood of (s,a) in the corpus • Calculate kernel weights for the neighbors • Do locally weighted regression (LWR) to estimate Q(s,a) • If there are not enough points in the neighborhood, or (s,a) is not within the Independent Variable Hull (IVH), return a conservative default value for Q(s,a) (see the sketch below)
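A minimal sketch of this prediction step, assuming a k-nearest-neighbor definition of the neighborhood, a Gaussian kernel, and a leverage-based approximation of the IVH test; the names and constants (predict_q, K_MIN, Q_DEFAULT, BANDWIDTH) are illustrative, not values from the paper.

import numpy as np

K_MIN = 10          # minimum number of corpus points required before predicting
Q_DEFAULT = 0.0     # conservative default Q returned outside the training data
BANDWIDTH = 0.5     # Gaussian kernel bandwidth for the locally weighted regression

def predict_q(corpus_x, corpus_q, query):
    """corpus_x: (n, d) array of stored (s, a) points; corpus_q: (n,) stored
    Q-values; query: (d,) array encoding the (s, a) pair to evaluate."""
    if len(corpus_x) < K_MIN:
        return Q_DEFAULT

    # Neighborhood: the K_MIN points closest to the query.
    dists = np.linalg.norm(corpus_x - query, axis=1)
    nbrs = np.argsort(dists)[:K_MIN]
    X = corpus_x[nbrs]
    q = corpus_q[nbrs]

    # Rough stand-in for the Independent Variable Hull test: reject queries
    # whose regression leverage suggests the prediction would extrapolate.
    Xc = np.hstack([np.ones((len(X), 1)), X])       # add a bias column
    xq = np.hstack([1.0, query])
    A = Xc.T @ Xc + 1e-8 * np.eye(Xc.shape[1])      # small ridge term for stability
    if xq @ np.linalg.solve(A, xq) > 1.0:
        return Q_DEFAULT

    # Kernel weights and locally weighted (linear) regression.
    w = np.exp(-dists[nbrs] ** 2 / (2 * BANDWIDTH ** 2))
    W = np.diag(w)
    beta = np.linalg.solve(Xc.T @ W @ Xc + 1e-8 * np.eye(Xc.shape[1]), Xc.T @ W @ q)
    return float(xq @ beta)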

  5. Hedger Training – given an observation (s,a,r,s’) • qnew ← qold + α(r + γ·qnext − qold), where • qold ← Qpredict(s,a) • qnext ← max_a’ Qpredict(s’,a’) • Use this to update Q(s,a) • Use the updated value of Q(s,a) to update Q(si,ai) in the neighborhood of (s,a) • May be used in batch or on-line mode (see the sketch below)
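A sketch of one training step under the same assumptions, reusing the hypothetical predict_q and BANDWIDTH above; ALPHA, GAMMA, and the finite candidate_actions set used for the max over a’ are illustrative choices, and the neighborhood refresh at the end is only an approximation of the propagation step on the slide.

ALPHA = 0.2    # learning rate (alpha in the update rule)
GAMMA = 0.9    # discount factor (gamma in the update rule)

def train_step(corpus_x, corpus_q, s, a, r, s_next, candidate_actions):
    x = np.hstack([s, a])
    q_old = predict_q(corpus_x, corpus_q, x)
    # Greedy backup: max over a finite set of candidate next actions.
    q_next = max(predict_q(corpus_x, corpus_q, np.hstack([s_next, ap]))
                 for ap in candidate_actions)
    q_new = q_old + ALPHA * (r + GAMMA * q_next - q_old)

    # Add the new training point to the corpus.
    corpus_x = np.vstack([corpus_x, x])
    corpus_q = np.append(corpus_q, q_new)

    # Propagate the update to stored points in the neighborhood of (s, a).
    near = np.linalg.norm(corpus_x - x, axis=1) < BANDWIDTH
    corpus_q[near] += ALPHA * (q_new - corpus_q[near])
    return corpus_x, corpus_q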

  6. Potential Problems using Instance-Based Reinforcement Learning • Determining an appropriate distance metric • Obtaining training paths that achieve rewards • Keeping the size of the corpus manageable • Finding neighbors efficiently (see Representations for Learning Control Policies – Forbes & Andre, and the lookup sketch below)
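On the last point, one common way to keep neighbor lookups fast as the corpus grows is a k-d tree; the sketch below uses scipy.spatial.cKDTree and is an assumption of this write-up, not a structure prescribed by the paper.

import numpy as np
from scipy.spatial import cKDTree

corpus_x = np.random.rand(10000, 4)   # stored (s, a) points; 4-D just for illustration
tree = cKDTree(corpus_x)              # build once; rebuild periodically as points are added

query = np.random.rand(4)
dists, idx = tree.query(query, k=10)  # 10 nearest neighbors in roughly O(log n) time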
