
Effective Reinforcement Learning for Mobile Robots



Presentation Transcript


  1. Effective Reinforcement Learning for Mobile Robots Smart, W.D. and Kaelbling, L.P.

  2. Content • Background • Review Q-learning • Reinforcement learning on mobile robots • Learning framework • Experimental results • Conclusion • Discussion

  3. Background • Hard to code behaviour efficiently and correctly • Reinforcement learning: tell the robot what to do, not how to do it • How well suited is reinforcement learning for mobile robots?

  4. Review Q-learning • Discrete states s and actions a • Learn the value function by observing rewards • Optimal value function: Q*(s,a) = E[R(s,a) + γ max_a' Q*(s',a')] • Learning rule: Q(s_t,a_t) ← (1−α) Q(s_t,a_t) + α (r_{t+1} + γ max_a' Q(s_{t+1},a')) • The sampling distribution has no effect on the learned policy π*(s) = argmax_a Q*(s,a)
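A minimal tabular sketch of the update rule above, assuming discrete states and actions; the helper names and default parameter values are illustrative, not from the paper:

```python
from collections import defaultdict

Q = defaultdict(float)  # Q[(s, a)] -> current estimate, defaults to 0

def q_update(s, a, r, s_next, actions, alpha=0.2, gamma=0.99):
    """One backup: Q(s,a) <- (1-alpha) Q(s,a) + alpha (r + gamma max_a' Q(s',a'))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)

def greedy_policy(s, actions):
    """pi*(s) = argmax_a Q(s,a)."""
    return max(actions, key=lambda a: Q[(s, a)])
```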

  5. Reinforcement learning on mobile robots • Sparse reward function • Almost always zero reward R(s,a) • Non-zero reward only on success or failure • Continuous environment • HEDGER (locally weighted regression) is used as the function approximator • Function approximation is only safe when it never extrapolates beyond the training data
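HEDGER itself is a locally weighted regression scheme with a hull test that refuses to answer queries outside its training data; the sketch below only imitates that "never extrapolate" property with a simple nearest-neighbour distance check, so the class name, radius parameter, and weighting are assumptions rather than the actual algorithm:

```python
import numpy as np

class GuardedApproximator:
    """Value approximator that returns a default instead of extrapolating."""

    def __init__(self, radius=0.5, default=0.0):
        self.X, self.y = [], []
        self.radius = radius    # queries farther than this from all data are refused
        self.default = default  # conservative value returned in that case

    def add(self, x, value):
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(float(value))

    def predict(self, x):
        if not self.X:
            return self.default
        x = np.asarray(x, dtype=float)
        d = np.array([np.linalg.norm(x - xi) for xi in self.X])
        if d.min() > self.radius:            # outside the data: do not extrapolate
            return self.default
        w = np.exp(-(d / self.radius) ** 2)  # Gaussian distance weighting
        return float(np.dot(w, self.y) / w.sum())
```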

  6. Reinforcement learning on mobile robots • Q-learning can only succeed once a state with positive reward has been found • The sparse reward function and the continuous environment make reward states hard to find by trial and error • Solution: show the robot how to find the reward states

  7. Learning framework • Split learning into two phases • Phase one: actions are supplied by an external controller (human or program); the learning algorithm only passively observes • Phase two: the learning algorithm is in control and learns the optimal policy • By ‘showing’ the robot where the interesting states are, learning should be quicker
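A sketch of the two-phase loop, reusing the q_update and greedy_policy helpers from the Q-learning sketch above; env is a hypothetical environment with a reset/step interface, and teacher_action stands for the human or program that supplies the phase-one actions:

```python
def run_episode(env, choose_action, actions):
    s = env.reset()
    done = False
    while not done:
        a = choose_action(s)                 # who picks the action is the only difference
        s_next, r, done = env.step(a)
        q_update(s, a, r, s_next, actions)   # the same backup runs in both phases
        s = s_next

def train(env, actions, teacher_action, n_phase1=10, n_phase2=40):
    for _ in range(n_phase1):                # phase 1: teacher drives, learner only observes
        run_episode(env, teacher_action, actions)
    for _ in range(n_phase2):                # phase 2: learner drives with its own greedy policy
        run_episode(env, lambda s: greedy_policy(s, actions), actions)
```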

  8. Experimental setup • Two experiments on a B21r mobile robot • Translation speed is fixed externally • Rotation speed has to be learned • Settings: α = 0.2, γ = 0.99 or 0.90 • Performance is measured after every 5 runs • The robot does not learn from these test runs • Starting position and orientation are similar across runs, not identical

  9. Experimental Results: Corridor Following Task • State space: • distance to end of corridor • distance to left wall as fraction of corridor width • angle θ to target point
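A hypothetical construction of that three-dimensional state vector; the coordinate conventions and argument names are assumptions for illustration only:

```python
import math

def corridor_state(x, y, heading, corridor_length, corridor_width, target):
    dist_to_end = corridor_length - x     # distance to end of corridor
    frac_left = y / corridor_width        # distance to left wall as fraction of width
    theta = math.atan2(target[1] - y, target[0] - x) - heading
    theta = (theta + math.pi) % (2 * math.pi) - math.pi   # wrap angle to target into [-pi, pi]
    return (dist_to_end, frac_left, theta)
```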

  10. Experimental Results: Corridor Following Task • Computer-controlled teacher • Rotation speed is set to a fixed fraction of the angle θ
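The computer-controlled teacher then amounts to a proportional controller on θ; the gain value below is an assumption:

```python
def teacher_rotation_speed(theta, gain=0.3):
    # Computer-controlled teacher: rotation speed proportional to the angle to the target.
    return gain * theta   # translation speed is fixed separately
```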

  11. Experimental Results: Corridor Following Task • Human-controlled teacher • Different corridor from the one used with the computer-controlled teacher

  12. Experimental Results: Corridor Following Task Results • Performance dips immediately after training: phase 2 supplies more novel experiences • The sloppy human controller leads to faster convergence than the rigid computer controller • Fewer phase 1 and phase 2 runs are needed • The human controller supplies more varied data

  13. Experimental Results: Corridor Following Task Results • Simulated performance without the advantage of teacher examples

  14. Experimental Results: Obstacle Avoidance Task • State space: • direction and distance to obstacles • direction and distance to target

  15. Experimental Results: Obstacle Avoidance Task Results • Human-controlled teacher • Robot starts 3 m from the target with a random orientation

  16. Experimental Results: Obstacle Avoidance Task Results • Simulation without teacher examples • No obstacles present; the robot only has to reach the goal • The simulated robot starts in the correct orientation • Starting 3 meters from the target, 18.7% of runs reached it within one week of simulated time, taking 6.54 hours on average

  17. Conclusion • Passive observation of appropriate state-action behaviour can speed up Q-learning • Knowledge about the robot or the learning algorithm is not necessary • Any example solution will do; providing a good solution is not necessary

  18. Discussion
