Sample-based Planning for Continuous Action Markov Decision Processes [on robots]
Ari Weinstein

Reinforcement Learning (RL): an agent takes an action in the world and receives information, including a numerical reward. How does it learn to maximize that reward?
Composing pieces in this manner is novel
Generative model: give it <s,a>, get back a sampled <r,s’>
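The <s,a> → <r,s’> interface is all that a sample-based planner needs. The sketch below is a plain Monte Carlo planner, not the method from this talk: it samples candidate actions uniformly from a continuous range and scores each by averaging discounted returns of random rollouts. The names `GenerativeModel`, `rollout_return`, `plan`, and the toy 1-D model are hypothetical, chosen only for illustration.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interface: a generative model maps <s, a> to a sampled <r, s'>.
GenerativeModel = Callable[[float, float], Tuple[float, float]]


def rollout_return(model: GenerativeModel, s: float, a: float, depth: int,
                   low: float, high: float, gamma: float,
                   rng: random.Random) -> float:
    """Discounted return of taking `a` in `s`, then acting uniformly at random."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        r, s = model(s, a)          # the planner only ever samples <r, s'>
        total += discount * r
        discount *= gamma
        a = rng.uniform(low, high)  # random continuation action
    return total


def plan(model: GenerativeModel, s: float, low: float = -1.0, high: float = 1.0,
         n_actions: int = 32, n_rollouts: int = 8, depth: int = 10,
         gamma: float = 0.95, seed: int = 0) -> float:
    """Return the sampled action with the best Monte Carlo value estimate."""
    rng = random.Random(seed)
    # Draw candidate actions from the continuous range up front.
    candidates: List[float] = [rng.uniform(low, high) for _ in range(n_actions)]
    best_a, best_v = candidates[0], float("-inf")
    for a in candidates:
        v = sum(rollout_return(model, s, a, depth, low, high, gamma, rng)
                for _ in range(n_rollouts)) / n_rollouts
        if v > best_v:
            best_a, best_v = a, v
    return best_a


if __name__ == "__main__":
    # Hypothetical toy model: s' = s + a, reward -(s')^2.
    def toy_model(s: float, a: float) -> Tuple[float, float]:
        s_next = s + a
        return -(s_next ** 2), s_next

    print(plan(toy_model, 0.8))
```

Because the planner touches the MDP only through `model(s, a)`, the same code works whether samples come from a simulator or a learned model; uniform action sampling is the simplest choice and ignores any structure in the action space.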
$R((p, v), a) = -(p^2 + a^2)$
Actions are perturbed by noise uniformly distributed on [-0.05, +0.05] units
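A minimal generative model for this domain, assuming a standard double integrator (position p, velocity v, action a as acceleration). The slides give only the reward and the noise, so the following are assumptions: the Euler update with time step `DT = 0.05`, the reward being charged on the commanded action rather than the noisy one, and the names `step`, `State`, `DT`, and `NOISE`.

```python
import random
from typing import Tuple

State = Tuple[float, float]  # (position p, velocity v)

DT = 0.05     # assumed integration time step (not given in the slides)
NOISE = 0.05  # action noise is uniform on [-0.05, +0.05]


def step(state: State, a: float, rng: random.Random) -> Tuple[float, State]:
    """Generative model: given <s, a>, sample <r, s'>."""
    p, v = state
    r = -(p * p + a * a)                      # R((p, v), a) = -(p^2 + a^2)
    a_noisy = a + rng.uniform(-NOISE, NOISE)  # perturb the commanded action
    # Assumed double-integrator dynamics: v drives p, the noisy action drives v.
    p_next = p + v * DT
    v_next = v + a_noisy * DT
    return r, (p_next, v_next)


if __name__ == "__main__":
    rng = random.Random(0)
    s: State = (1.0, 0.0)
    for _ in range(3):
        r, s = step(s, -0.5, rng)
        print(f"r={r:.4f}  p={s[0]:.4f}  v={s[1]:.4f}")
```

If the reward is meant to penalize the executed (noisy) action instead, compute `r` from `a_noisy`; the planner interface is unchanged either way.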
Order is red, green, blue, yellow, magenta, cyan