
Adaptive Intelligent Mobile Robotics. Leslie Pack Kaelbling, Artificial Intelligence Laboratory, MIT




  1. Adaptive Intelligent Mobile Robotics • Leslie Pack Kaelbling • Artificial Intelligence Laboratory • MIT

  2. Progress to Date • Erik the Red • Video game environment • Optical flow implementation • Fast bootstrapped reinforcement learning

  3. Erik the Red • RWI B21 robot • camera, sonars, laser range-finder, infrareds • 3 Linux machines • ported our framework for writing debuggable code

  4. Erik the Red

  5. Crystal Space • Open-source video-game environment • complex graphics • other agents • highly modifiable

  6. Crystal Space

  7. Optical Flow • Get range information visually by computing optical flow field • nearer objects cause flow of higher magnitude • expansion pattern means you’re going to hit • rate of expansion tells you when • elegant control laws based on center and rate of expansion (derived from human and fly behavior)
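The control ideas in this slide can be made concrete with a small sketch. Below is a minimal illustration (not from the talk) of recovering time-to-contact from the divergence of a dense optical-flow field, assuming OpenCV's Farneback flow over two consecutive grayscale frames; the function name and parameter values are illustrative choices, not the original implementation.

```python
# Minimal sketch: estimate time-to-contact from the divergence of a dense
# optical-flow field. Assumes OpenCV (cv2) and two consecutive grayscale
# frames; names and constants here are illustrative.
import cv2
import numpy as np

def time_to_contact(prev_gray, curr_gray, dt=1.0 / 30.0):
    # Dense flow via Farneback's algorithm (one of several possible choices).
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    u, v = flow[..., 0], flow[..., 1]

    # The divergence of the flow field measures the rate of expansion.
    # For pure forward translation toward a roughly frontal surface,
    # divergence ~= 2 / tau, with tau the time-to-contact in frames.
    du_dx = np.gradient(u, axis=1)
    dv_dy = np.gradient(v, axis=0)
    divergence = float(np.mean(du_dx + dv_dy))

    if divergence <= 1e-6:            # no net expansion: not approaching
        return np.inf
    return dt * 2.0 / divergence      # seconds until contact
```

This is exactly the structure the slide's control laws exploit: nearer objects produce larger flow, an expansion pattern signals an impending collision, and the rate of expansion says when it will happen.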

  8. Optical Flow in Crystal Space

  9. Making RL Really Work • Typical RL methods require far too much data to be practical in an online setting. We address the problem with • strong generalization techniques • human input to bootstrap learning

  10. JAQL • Learning a value function in a continuous state and action space • based on locally weighted regression (fancy version of nearest neighbor) • algorithm knows what it knows • use meta-knowledge to be conservative about dynamic-programming updates
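As a rough illustration of the "knows what it knows" idea, here is a sketch (an assumption-laden stand-in, not the actual JAQL code) of a kernel-weighted, nearest-neighbor-style value estimator that declines to answer when too little data lies near a query, so dynamic-programming backups are applied only where the estimate is trustworthy.

```python
# Sketch of a zeroth-order locally weighted (kernel-average) Q estimator
# with a simple "knows what it knows" test: if too little data lies near
# the query, it declines to answer and the DP update is skipped.
# Illustrative only; not the original JAQL implementation.
import numpy as np

class LWRQ:
    def __init__(self, bandwidth=0.5, min_weight=1.0):
        self.X, self.q = [], []        # (state, action) points and Q targets
        self.h = bandwidth             # Gaussian kernel width (assumed)
        self.min_weight = min_weight   # confidence threshold on local mass

    def add(self, x, q):
        self.X.append(np.asarray(x, dtype=float))
        self.q.append(float(q))

    def predict(self, x):
        """Return (estimate, known); known is False when local data is sparse."""
        if not self.X:
            return 0.0, False
        X, q = np.asarray(self.X), np.asarray(self.q)
        d2 = np.sum((X - np.asarray(x, dtype=float)) ** 2, axis=1)
        w = np.exp(-d2 / (2 * self.h ** 2))         # Gaussian kernel weights
        if w.sum() < self.min_weight:               # too far from any data:
            return 0.0, False                       # "I don't know"
        return float(np.dot(w, q) / w.sum()), True  # kernel-weighted average

def backup(qfun, s, a, r, next_sa_candidates, gamma=0.95):
    """Conservative DP update: only back up values the estimator trusts."""
    vals = [qfun.predict(sa) for sa in next_sa_candidates]
    known = [v for v, ok in vals if ok]
    if known:                                       # otherwise skip the update
        qfun.add(np.concatenate([s, a]), r + gamma * max(known))
```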

  11. Incorporating Human Input • Humans can help a lot, even if they can’t perform the task very well. • Provide some initial successful trajectories through the space • Trajectories are not used for supervised learning, but to guide the reinforcement-learning methods through useful parts of the space • Learn models of the dynamics of the world and of the reward structure • Once learned models are good, use them to update the value function and policy as well.
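A minimal sketch of the bootstrapping step, reusing the LWRQ estimator above: demonstrated transitions are replayed through the ordinary value backup rather than being used as supervised targets, so the human only needs to reach useful parts of the space, not act optimally. The helper names and interfaces are assumptions for this sketch.

```python
# Sketch of bootstrapping from human demonstrations: trajectories are
# replayed through the ordinary RL backup, not imitated directly.
# Interfaces below are illustrative assumptions.
import numpy as np

def features(s, a):
    # Hypothetical featurizer: concatenate state and action vectors.
    return np.concatenate([np.asarray(s, dtype=float),
                           np.asarray(a, dtype=float)])

def best_known_value(qfun, s, action_candidates):
    # Max over a coarse set of candidate actions, using only confident estimates.
    vals = [qfun.predict(features(s, a)) for a in action_candidates]
    known = [v for v, ok in vals if ok]
    return max(known) if known else 0.0

def replay_demonstrations(qfun, demos, action_candidates, gamma=0.95):
    """demos: list of trajectories, each a list of (s, a, r, s_next) tuples.
    Sweeping each trajectory backwards propagates reward along the
    demonstrated path in a single pass."""
    for trajectory in demos:
        for (s, a, r, s_next) in reversed(trajectory):
            target = r + gamma * best_known_value(qfun, s_next, action_candidates)
            qfun.add(features(s, a), target)   # ordinary RL backup, not imitation
```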

  12. Simple Experiment • The “hill-car” problem in two continuous dimensions (position and velocity) • Regular RL methods take thousands of trials to learn a reasonable policy • JAQL needs only 11 inefficient but eventually successful trials generated by humans to reach 80% performance • 10 more subsequent trials yield high-quality performance over the whole space
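For reference, here is a sketch of the standard continuous "hill-car" (mountain-car) dynamics commonly used for this benchmark; the slides do not give the exact constants used in these experiments, so the values below follow the familiar Moore/Sutton formulation.

```python
# Sketch of classic "hill-car" dynamics: continuous state (position,
# velocity) and continuous action. Constants follow the common
# Moore/Sutton formulation, not necessarily the original experiments.
import math

class HillCar:
    def __init__(self):
        self.reset()

    def reset(self):
        self.x, self.v = -0.5, 0.0          # start in the valley, at rest
        return (self.x, self.v)

    def step(self, a):
        a = max(-1.0, min(1.0, a))          # bounded continuous thrust
        self.v += 0.001 * a - 0.0025 * math.cos(3 * self.x)  # gravity term
        self.v = max(-0.07, min(0.07, self.v))
        self.x = max(-1.2, min(0.6, self.x + self.v))
        if self.x <= -1.2:
            self.v = 0.0                    # inelastic stop at the left wall
        done = self.x >= 0.5                # reached the hilltop goal
        reward = 0.0 if done else -1.0      # -1 per step favors short trials
        return (self.x, self.v), reward, done
```

With a -1 reward per step and success only at the hilltop, uninformed exploration rarely reaches the goal at all, which is why a handful of human trajectories shortens learning so dramatically.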

  13. Success Percentage

  14. Trial Length (200-step maximum; 54-step optimum)

  15. Next Steps • Implement optical-flow control algorithms on robot • Apply RL techniques to tune parameters in control algorithms on robot in real time • corridor following using sonar and laser • obstacle avoidance using optical flow • Build highly complex simulated environment • Integrate planning and learning in multi-layer system
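To make the tuning idea concrete, here is an illustrative corridor-following control law of the kind whose gains RL could adjust online; the sensor interface, sign conventions, and gain values are assumptions for the sketch, not the robot's actual API.

```python
# Illustrative corridor-following law whose parameters (the gains below)
# are the kind of quantities RL could tune online. Sensor interface and
# gain values are assumptions for this sketch.

def corridor_control(left_range, right_range, heading_error,
                     k_lateral=0.8, k_heading=1.5, v_forward=0.3):
    """Return (linear velocity, angular velocity) commands.

    left_range / right_range: perpendicular distances (m) to the corridor
    walls from sonar or laser; heading_error: angle (rad) between the
    robot heading and the corridor axis (positive = rotated left).
    Convention: positive angular velocity turns the robot left.
    """
    # Positive when the robot is nearer the right wall, so steer left;
    # the heading term damps oscillation by correcting the robot's angle.
    lateral_error = left_range - right_range
    omega = k_lateral * lateral_error - k_heading * heading_error
    return v_forward, omega
```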
