
Practical Reinforcement Learning in Continuous Space


Presentation Transcript


  1. Practical Reinforcement Learning in Continuous Space William D. Smart Brown University Leslie Pack Kaelbling MIT Presented by: David LeRoux

  2. Goals of Paper • Practical RL approach • Handles continuous state and action spaces • Safely approximates value function • On-line learning bootstrapped with human-provided data

  3. Approaches to Continuous State or Action Space • Discretize • If too coarse, problems with hidden state • If too fine, the learner cannot generalize • Curse of dimensionality • Function Approximators • Used to estimate the value function • Errors tend to propagate • Tendency to over-estimate (hidden extrapolation)

  4. Proposed Approach - Hedger • Instance-Based Approach • To predict Q(s,a): • Find the neighborhood of (s,a) in the corpus • Calculate kernel weights for the neighbors • Do locally weighted regression (LWR) to estimate Q(s,a) • If there are not enough points in the neighborhood, or (s,a) is not within the Independent Variable Hull (IVH), return a conservative default value for Q(s,a) (see the sketch below)
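A minimal sketch of this prediction step, assuming a k-nearest-neighbor definition of the neighborhood, a Gaussian kernel, and a leverage-based approximation of the IVH test; the names and constants (predict_q, K_MIN, Q_DEFAULT, BANDWIDTH) are illustrative, not values from the paper.

import numpy as np

K_MIN = 10          # minimum number of corpus points required before predicting
Q_DEFAULT = 0.0     # conservative default Q returned outside the training data
BANDWIDTH = 0.5     # Gaussian kernel bandwidth for the locally weighted regression

def predict_q(corpus_x, corpus_q, query):
    """corpus_x: (n, d) array of stored (s, a) points; corpus_q: (n,) stored
    Q-values; query: (d,) array encoding the (s, a) pair to evaluate."""
    if len(corpus_x) < K_MIN:
        return Q_DEFAULT

    # Neighborhood: the K_MIN points closest to the query.
    dists = np.linalg.norm(corpus_x - query, axis=1)
    nbrs = np.argsort(dists)[:K_MIN]
    X = corpus_x[nbrs]
    q = corpus_q[nbrs]

    # Rough stand-in for the Independent Variable Hull test: reject queries
    # whose regression leverage suggests the prediction would extrapolate.
    Xc = np.hstack([np.ones((len(X), 1)), X])       # add a bias column
    xq = np.hstack([1.0, query])
    A = Xc.T @ Xc + 1e-8 * np.eye(Xc.shape[1])      # small ridge term for stability
    if xq @ np.linalg.solve(A, xq) > 1.0:
        return Q_DEFAULT

    # Kernel weights and locally weighted (linear) regression.
    w = np.exp(-dists[nbrs] ** 2 / (2 * BANDWIDTH ** 2))
    W = np.diag(w)
    beta = np.linalg.solve(Xc.T @ W @ Xc + 1e-8 * np.eye(Xc.shape[1]), Xc.T @ W @ q)
    return float(xq @ beta)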

  5. Hedger Training – given an observation (s,a,r,s’) • qnew ← qold + α(r + γ·qnext − qold), where • qold ← Qpredict(s,a) • qnext ← max_a’ Qpredict(s’,a’) • Use this to update Q(s,a) • Use the updated value of Q(s,a) to update Q(si,ai) in the neighborhood of (s,a) • May be used in batch or on-line mode (see the sketch below)
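A sketch of one training step under the same assumptions, reusing the hypothetical predict_q and BANDWIDTH above; ALPHA, GAMMA, and the finite candidate_actions set used for the max over a’ are illustrative choices, and the neighborhood refresh at the end is only an approximation of the propagation step on the slide.

ALPHA = 0.2    # learning rate (alpha in the update rule)
GAMMA = 0.9    # discount factor (gamma in the update rule)

def train_step(corpus_x, corpus_q, s, a, r, s_next, candidate_actions):
    x = np.hstack([s, a])
    q_old = predict_q(corpus_x, corpus_q, x)
    # Greedy backup: max over a finite set of candidate next actions.
    q_next = max(predict_q(corpus_x, corpus_q, np.hstack([s_next, ap]))
                 for ap in candidate_actions)
    q_new = q_old + ALPHA * (r + GAMMA * q_next - q_old)

    # Add the new training point to the corpus.
    corpus_x = np.vstack([corpus_x, x])
    corpus_q = np.append(corpus_q, q_new)

    # Propagate the update to stored points in the neighborhood of (s, a).
    near = np.linalg.norm(corpus_x - x, axis=1) < BANDWIDTH
    corpus_q[near] += ALPHA * (q_new - corpus_q[near])
    return corpus_x, corpus_q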

  6. Potential Problems using Instance-Based Reinforcement Learning • Determining an appropriate distance metric • Obtaining training paths that achieve rewards • Keeping the size of the corpus manageable • Finding neighbors efficiently (see Representations for Learning Control Policies – Forbes & Andre, and the lookup sketch below)
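On the last point, one common way to keep neighbor lookups fast as the corpus grows is a k-d tree; the sketch below uses scipy.spatial.cKDTree and is an assumption of this write-up, not a structure prescribed by the paper.

import numpy as np
from scipy.spatial import cKDTree

corpus_x = np.random.rand(10000, 4)   # stored (s, a) points; 4-D just for illustration
tree = cKDTree(corpus_x)              # build once; rebuild periodically as points are added

query = np.random.rand(4)
dists, idx = tree.query(query, k=10)  # 10 nearest neighbors in roughly O(log n) time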
