
Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning



Presentation Transcript


  1. Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda Presented by: Subarna Sadhukhan

  2. Reinforcement learning • Vision-based reinforcement learning by which a robot learns to shoot a ball into a goal; the aim is a method that automatically acquires the strategy for this. • The robot and its environment are modeled by two synchronized finite state automata interacting in discrete-time cyclical processes. • The robot senses the current state and selects an action; the environment then makes a transition to a new state and returns a reward to the robot. • Through this loop the robot learns purposive behavior that achieves a given goal.
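
This sense–act–reward loop is the standard reinforcement learning interaction protocol. A minimal sketch of the loop in Python, assuming hypothetical env and agent objects with reset/step/select/update methods (not from the paper):

```python
# Minimal agent-environment loop (illustrative; the interfaces are assumed).
def run_episode(env, agent, max_steps=100):
    state = env.reset()                       # robot senses the initial state
    for _ in range(max_steps):
        action = agent.select(state)          # robot selects an action
        next_state, reward, done = env.step(action)   # environment transitions
        agent.update(state, action, reward, next_state)  # learn from reward
        state = next_state
        if done:                              # e.g., ball shot into the goal
            break
```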

  3. Environment – ball and goal • Robot: mobile, with an on-board camera • Nothing about the system is known in advance • Assume the robot can discriminate the set S of states and take the set A of actions on the world

  4. Q-learning Let Q*(s,a) be the expected return for taking action a in situation s: Q*(s,a) = r(s,a) + γ Σ_s' T(s,a,s') max_a' Q*(s',a'), where T(s,a,s') is the probability of a transition from s to s' under action a, r(s,a) is the reward for the state-action pair (s,a), and γ is the discounting factor. Since T and r are not known, Q is estimated incrementally from experience: Q(s,a) ← (1 − α) Q(s,a) + α (r + γ max_a' Q(s',a')), where r is the actual reward received for taking a, s' is the next state, and α is the learning rate.
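
The update rule above is plain tabular Q-learning, so a short generic sketch can make it concrete; the hyperparameter values and the epsilon-greedy exploration here are assumptions, not taken from the paper:

```python
import random
from collections import defaultdict

GAMMA, ALPHA, EPSILON = 0.9, 0.25, 0.1    # assumed hyperparameters
Q = defaultdict(float)                    # Q[(state, action)] -> value estimate

def select_action(state, actions):
    """Epsilon-greedy action selection over the current Q estimates."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """One step of the tabular Q-learning rule from the slide."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] = (1 - ALPHA) * Q[(state, action)] + \
                         ALPHA * (reward + GAMMA * best_next)
```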

  5. State set • 9×27 + 27 + 9 = 279 states: the ball sub-state is position (left/center/right) × size (small/medium/large) = 9, the goal sub-state is position × size × orientation = 27; their combinations give 9×27 = 243 states, plus 27 states with the ball lost and 9 states with the goal lost

  6. Action set • Two motors • Each motor: forward, stop, back • 3 × 3 = 9 actions in all • State-action deviation problem: a small physical change near the observer causes a large change in the image, while a large change far from the observer causes only a small change in the image
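
The state and action sets are small enough to enumerate explicitly; here is a sketch of that enumeration in Python, with illustrative labels rather than the paper's actual encoding:

```python
from itertools import product

# Ball sub-states: 3 positions x 3 sizes = 9
BALL = list(product(["left", "center", "right"],
                    ["small", "medium", "large"]))
# Goal sub-states: 3 positions x 3 sizes x 3 orientations = 27
GOAL = list(product(["left", "center", "right"],
                    ["small", "medium", "large"],
                    ["left-oriented", "front", "right-oriented"]))

# 9*27 combined states, plus ball-lost (27) and goal-lost (9) cases
STATES = [(b, g) for b in BALL for g in GOAL] \
       + [("ball-lost", g) for g in GOAL] \
       + [(b, "goal-lost") for b in BALL]
assert len(STATES) == 9 * 27 + 27 + 9     # 279 states

# Two motors, each forward/stop/back -> 9 actions
ACTIONS = list(product(["forward", "stop", "back"], repeat=2))
assert len(ACTIONS) == 9
```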

  7. Learning from Easy Missions • Delayed reinforcement problem: there is no explicit teacher signal, since the reward is received only after the ball is kicked into the goal; r(s,a) = 1 only in the goal state • Construct the learning schedule so that the robot can learn in easy situations at the early stages and in more difficult situations later on – Learning from Easy Missions (LEM)
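
One way to realize such a schedule is to draw episode start states from a pool that begins with the easiest set and widens over time. A minimal sketch, assuming a hypothetical env.reset_to(state) that can place the robot in a chosen start state:

```python
import random

def lem_training(env, agent, ordered_sets, episodes_per_stage=200):
    """Learning from Easy Missions: draw start states from the easiest
    set first (e.g., ball large and close to the goal), then widen the
    pool toward harder sets as learning proceeds."""
    pool = []
    for stage in ordered_sets:                      # easiest set first
        pool.extend(stage)
        for _ in range(episodes_per_stage):
            s = env.reset_to(random.choice(pool))   # hypothetical API
            done = False
            while not done:
                a = agent.select(s)
                s2, r, done = env.step(a)
                agent.update(s, a, r, s2)           # Q-update from slide 4
                s = s2
```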

  8. Complexity analysis • k states, m possible actions • With the reward delayed until the goal, Q-learning needs on the order of m trials to learn the first state (nearest the goal), m² for the second, and so on; hence O(m + m² + … + m^k) = O(m^k) in total • LEM: O(m·k), since a reward is obtained at each step of the ordered schedule
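
A back-of-the-envelope comparison makes the gap concrete; k = 5 and m = 9 are illustrative numbers (m = 9 matching the robot's action set):

```python
m, k = 9, 5                                    # 9 actions, 5 states to the goal
delayed = sum(m**i for i in range(1, k + 1))   # O(m^k): reward only at the goal
lem     = m * k                                # O(m*k): reward at each step
print(delayed, lem)                            # 66429 vs. 45
```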

  9. Implementing LEM • Rough ordering of situations by image size: small → medium → large (a large ball/goal size in the image roughly means having reached the goal) • The state space is categorized into sub-states such as ball size, position, and so on • n = size of the state space, m = number of ordered sets • Applying LEM with m ordered sets takes on the order of m stages, each exponential only in n/m, as opposed to a single search exponential in n for monolithic Q-learning

  10. When to shift • S1 is the set nearest to the goal, S2 the next, and so on • Shifting occurs when the Q-values over the current set have stopped changing, i.e. when Σ_{s∈S(k−1)} |max_a Q_t(s,a) − max_a Q_{t−Δt}(s,a)| falls below a small threshold, where Δt indicates a time interval (a number of steps) over which the change is measured • We suppose that the current state set S(k−1) can transit only to its neighbors
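
A sketch of this convergence test, assuming Q is stored as a dict keyed by (state, action) and that a snapshot of the table is kept from Δt steps earlier; the threshold value is an assumption:

```python
def should_shift(Q_now, Q_prev, current_set, actions, eps=1e-3):
    """Shift to the next (harder) set when the best Q-value of every
    state in the current set has changed little over the last Dt steps."""
    change = sum(abs(max(Q_now[(s, a)] for a in actions) -
                     max(Q_prev[(s, a)] for a in actions))
                 for s in current_set)
    return change < eps
```

Here Q_prev would be a copy of the Q table taken Δt steps before the check.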

  11. From the previous Q-learning equation, if Q converges, then Q(s,a) = r(s,a) + γ max_a' Q(s',a') • Thus, with reward 1 only at the goal, the best Q-value of a state k steps from the goal converges to γ^(k−1): Q-values decay geometrically with distance to the goal, which is what orders the sets S1, S2, …
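
The geometric decay is easy to check numerically; γ = 0.9 is an assumed value for illustration:

```python
GAMMA = 0.9                      # assumed discount factor
for k in range(1, 6):            # k = number of steps from the goal
    print(k, GAMMA ** (k - 1))   # 1.0, 0.9, 0.81, 0.729, 0.6561
```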

  12. LEM

  13. Experiments
