Takeshi Shibuya University of Tsukuba shibuya@iit.tsukuba.ac.jp

A fundamental study on representation of reward for reinforcement learning in dynamic environments + an introduction of rescue simulation.

Presentation Transcript


  1. A fundamental study on representation of reward for reinforcement learning in dynamic environments + an introduction of rescue simulation Takeshi Shibuya University of Tsukuba shibuya@iit.tsukuba.ac.jp

  2. Outline • Reinforcement learning • An interactive learning framework in soft computing • A method to learn in dynamic environment • RoboCup Rescue: Overview • An application of soft computing Reinforcement learning (theoretical side) Learning in dynamic environment (application side) Rescue simulation

  3. Contents: ・Reinforcement Learning in psychology ・Learning in dynamic environments Reinforcement learning

  4. Reinforcement Learning in psychology Kyoto University If he finishes pushing the numbers in order, he gets a peanut as a reward.

  5. Notable things in Reinforcement Learning • The learner • acquires suitable behavior from the reward alone. • The trainer • does not have to tell the learner how to behave step by step.

  6. What is reinforcement learning (RL)? [Figure: agent-environment loop; labels: State, reward, Value, Environment, Agent, Actions 1 and 2, Action] • The agent enhances values that bring rewards. • The agent selects the action whose value is highest.
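The two bullets on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the presentation: the function names, the step size, and the epsilon-greedy exploration are assumptions added so that the agent has a chance to try every action.

```python
import random

def greedy_action(values):
    """Return the index of the action whose value is highest."""
    return max(range(len(values)), key=lambda a: values[a])

def update_value(values, action, reward, step_size=0.1):
    """Enhance the value of the taken action by moving it toward the reward."""
    values[action] += step_size * (reward - values[action])

def run(rewards, steps=200, epsilon=0.1, seed=0):
    """Toy loop: each action always yields the fixed reward rewards[action]."""
    rng = random.Random(seed)
    values = [0.0] * len(rewards)
    for _ in range(steps):
        # Assumption: explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))
        else:
            action = greedy_action(values)
        update_value(values, action, rewards[action])
    return values

if __name__ == "__main__":
    values = run(rewards=[0.0, 1.0])
    print("values:", values, "chosen action:", greedy_action(values))
```

Here the environment is reduced to a fixed reward per action, so there is no state; a full RL agent would keep one value per state-action pair.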

  7. Research theme: learning in dynamic environment • How to learn behavior when the suitable action changes? [Figure: timeline; labels: Action 1, Action 2, Great reward, time]

  8. Research theme: learning in dynamic environment • Dividing the reward into two parts: • Time-dependent part: to be designed. • Time-independent part: to be learnt.
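One possible reading of this split is sketched below. The slides do not say how the two parts are combined, so the additive form, the switching time of 50, the constants, and every function name here are assumptions made only for illustration. The agent subtracts the designed (known) part from each observed reward and learns only the remaining constant.

```python
def designed_part(action, t):
    """Hypothetical known, time-dependent reward component (designed by hand)."""
    # Assumed schedule: action 0 earns a bonus before t = 50, action 1 afterwards.
    if t < 50:
        return 1.0 if action == 0 else 0.0
    return 1.0 if action == 1 else 0.0

# Time-independent part; unknown to the agent and used only by the toy environment.
TRUE_CONSTANT = [0.2, 0.3]

def observed_reward(action, t):
    """Toy environment: total reward is assumed to be the sum of both parts."""
    return TRUE_CONSTANT[action] + designed_part(action, t)

def learn_constant_part(steps=100, step_size=0.5):
    """Learn the time-independent part from what is left after removing the designed part."""
    learnt = [0.0, 0.0]
    for t in range(steps):
        for action in (0, 1):  # try both actions every step, for simplicity
            residual = observed_reward(action, t) - designed_part(action, t)
            learnt[action] += step_size * (residual - learnt[action])
    return learnt

def best_action(learnt, t):
    """Pick the action with the highest learnt-plus-designed reward at time t."""
    totals = [learnt[a] + designed_part(a, t) for a in (0, 1)]
    return totals.index(max(totals))

if __name__ == "__main__":
    learnt = learn_constant_part()
    print(learnt, best_action(learnt, 10), best_action(learnt, 60))
```

Under these assumptions the learnt part is unaffected by the change at t = 50, so the preferred action can switch as soon as the designed part changes, without relearning.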

  9. Research theme: learning in dynamic environment • Probability of selecting EAST action increases. • The proposed method enables the agent to adapt to the change of the environment. • The probability of selecting the action switches after the change of environment.

  10. Contents: ・Overview of RoboCup Rescue ・Demonstration RoboCup Rescue

  11. Leagues in RoboCup • Ultimate goal of the RoboCup: "By mid-21st century, a team of fully autonomous humanoid robot soccer players shall win the soccer game, comply with the official rule of the FIFA, against the winner of the most recent World Cup." (from official site) • Soccer • Robot leagues • Simulation leagues • Rescue • Robot leagues • Simulation leagues • 2D • 3D

  12. RoboCup Rescue • The purpose: (1) to develop simulators that form the infrastructure of the simulation system and emulate realistic phenomena predominant in disasters; (2) to develop intelligent agents and robots that are given the capabilities of the main actors in a disaster response scenario. (from official site) • Agent simulation • Virtual Robots simulation (Powered by USARSim)

  13. RoboCup Rescue: The agent simulation • Buildings: fire, collapse • Roads: traffic movement, blocked roads due to rubble, etc. • Emergency services: fire brigades, ambulance teams, police forces

  14. Agent’s observation and Action

  15. Demonstration/ movie

  16. RoboCup Rescue + RL (Team MRL) • Reinforcement learning is employed for controlling the agents. • The details are not shown in the paper. • Team MRL is the champion of RoboCup 2007. (total: 8 teams) Omid Aghazadeh+, Implementing Parametric Reinforcement Learning in Robocup Rescue Simulation, RoboCup 2007: Robot Soccer World Cup XI, Lecture Notes in Computer Science, 2008, Volume 5001/2008, 409-416, DOI: 10.1007/978-3-540-68847-1_42

  17. Summary • The following topics were overviewed: • Reinforcement learning • The framework and some research themes • RoboCup Rescue • Aims in some leagues and demonstrations

  18. Target of learning / Unknown constant quantity / Known varying quantity

  19. Reinforcement Learning as an engineering approach [Figure: agent-environment loop; labels: State, reward, Environment, Agent (learner), Action]

  20. Dividing the reward into two parts: • Time-dependent part: to be designed. • Time-independent part: to be learnt.

  21. Research Theme 1: learning in partially observable environment • If the agent can observe four states (angle and angular velocity of each joint), the agent can control it. • If the agent cannot use velocity information, the agent cannot determine the direction in which to apply torque. [Figure labels: Torque, Angular velocity]
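The difficulty described on this slide can be illustrated with a small Python sketch. The state layout, the numeric values, and especially the torque rule are assumptions made up for the illustration; they are not the controller used in the presentation. The point is only that two states with the same angles but opposite velocities look identical once velocity is hidden, although they may call for opposite torques.

```python
from typing import NamedTuple

class JointState(NamedTuple):
    """Four-dimensional state: angle and angular velocity of each of two joints."""
    angle1: float
    velocity1: float
    angle2: float
    velocity2: float

def full_observation(state):
    """The agent sees all four state variables."""
    return tuple(state)

def angle_only_observation(state):
    """The agent sees the angles but not the angular velocities."""
    return (state.angle1, state.angle2)

def hypothetical_torque_direction(state):
    """Toy rule (an assumption): push in the direction the second joint already moves."""
    return 1 if state.velocity2 >= 0 else -1

# Same angles, opposite velocities.
swinging_forward = JointState(0.5, 1.0, -0.2, 2.0)
swinging_back = JointState(0.5, -1.0, -0.2, -2.0)

if __name__ == "__main__":
    same = angle_only_observation(swinging_forward) == angle_only_observation(swinging_back)
    print("indistinguishable without velocity:", same)
    print("torque directions:",
          hypothetical_torque_direction(swinging_forward),
          hypothetical_torque_direction(swinging_back))
```

An agent that maps each observation to a single action cannot choose different torques for these two states when it receives only the angles.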

  22. Research Theme 1: learning in partially observable environment • Complex-valued reinforcement learning enables the agent to overcome the problem by using the context of behavior. [Figure: swing up; axis ticks from -100% to 100%]

  23. reward function
