
Continuous-Action Q-Learning

Continuous-Action Q-Learning. Jose del R. Millan et al., Machine Learning 49, 247-265 (2002). Summarized by Seung-Joon Yi. ITPM (Incremental Topology Preserving Map): consists of units and edges between pairs of units; maps the current sensory situation x onto an action a.


Presentation Transcript


  1. Continuous-Action Q-Learning Jose del R. Millan et al., Machine Learning 49, 247-265 (2002) Summarized by Seung-Joon Yi (C) 2003, SNU Biointelligence Lab, http://bi.snu.ac.kr/

  2. ITPM (Incremental Topology Preserving Map) • Consists of units and edges between pairs of units. • Maps the current sensory situation x onto an action a. • Units are created incrementally and incorporate bias. • After being created, a unit's sensory component is tuned by self-organizing rules, • and its action component is updated through reinforcement learning.
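
As a concrete picture of the structure this slide describes, here is a minimal Python sketch of an ITPM unit and of mapping a situation x onto its nearest unit. The class layout, field names, and use of Euclidean distance are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class ITPMUnit:
    """One ITPM unit: a sensory prototype, a small set of discrete actions
    with per-action Q-values, and edges to neighboring units (illustrative)."""
    def __init__(self, prototype, actions):
        self.w = np.asarray(prototype, dtype=float)      # sensory component
        self.actions = np.asarray(actions, dtype=float)  # discrete candidate actions
        self.q = np.zeros(len(self.actions))             # Q-value per action
        self.neighbors = set()                           # edges to other units

def nearest_unit(units, x):
    """Map the current sensory situation x onto the closest unit."""
    return min(units, key=lambda u: np.linalg.norm(u.w - np.asarray(x)))
```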

  3. ITPM • Units and bias • Initially the ITPM has no units; they are created as the robot uses its built-in reflexes. • Units in the network have overlapping, localized receptive fields. • When the neural controller makes an incorrect generalization, the reflexes take control of the robot and a new unit is added to the ITPM.
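
A sketch of how such a unit might be created when the reflexes take over, reusing the ITPMUnit class above. Seeding the candidate actions around the reflex action and connecting the new unit to the previously nearest one are assumptions for illustration, not the paper's exact scheme.

```python
def add_unit(units, x, reflex_action, n_actions=5, spread=0.1):
    """Create a new unit at situation x, biased toward the built-in
    reflex action (initialization details are hypothetical)."""
    actions = reflex_action + np.linspace(-spread, spread, n_actions)
    unit = ITPMUnit(x, actions)
    if units:                                  # keep the map connected
        unit.neighbors.add(nearest_unit(units, x))
    units.append(unit)
    return unit
```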

  4. ITPM • Self-organizing rules
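
The rules themselves appeared as equations on the original slide. A rough reconstruction in the spirit of standard topology-preserving maps (an assumption about the exact form): the winning unit b and its graph neighbors are pulled toward the observed situation x, the winner more strongly.

```latex
\Delta w_b = \epsilon_b\,(x - w_b), \qquad
\Delta w_n = \epsilon_n\,(x - w_n)\ \ \text{for every neighbor } n \text{ of } b,
\qquad 0 < \epsilon_n < \epsilon_b < 1 .
```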

  5. ITPM • Advantages • Automatically allocates units in the visited parts of the input space. • Dynamically adjusts the resolution in different regions. • Experiments show that, on average, every unit is connected to 5 others at the end of the learning episodes.

  6. ITPM • General learning algorithm
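
The algorithm box was an image on the slide; the Python loop below sketches the overall structure it implies for the discrete-action variant (the continuous-action rule of slide 8 would replace the action choice). The environment interface env.reset/env.step/env.reflex, the distance threshold for unit creation, and all constants are hypothetical.

```python
import random

def epsilon_greedy(unit, epsilon):
    """ε-greedy choice among a unit's discrete actions (returns an index)."""
    if random.random() < epsilon:
        return random.randrange(len(unit.actions))
    return int(np.argmax(unit.q))

def run_episode(units, env, alpha=0.1, gamma=0.95, epsilon=0.1,
                new_unit_dist=0.5, steps=200):
    """One learning episode: map x to a unit (creating one when the map
    generalizes poorly), act, and back up the Q-value of the chosen action."""
    x = env.reset()
    for _ in range(steps):
        if not units or np.linalg.norm(nearest_unit(units, x).w - x) > new_unit_dist:
            unit = add_unit(units, x, env.reflex(x))
        else:
            unit = nearest_unit(units, x)
        i = epsilon_greedy(unit, epsilon)
        x_next, r, done = env.step(unit.actions[i])
        target = r if done else r + gamma * nearest_unit(units, x_next).q.max()
        unit.q[i] += alpha * (target - unit.q[i])    # one-step Q-learning backup
        x = x_next
        if done:
            break
```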

  7. Discrete-action Q-Learning • Action selection rule • ε-greedy policy • Q-value update rule
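
The two rules were formulas on the slide. In standard notation (a reconstruction, with u_x the unit nearest to x and u_x' the unit nearest to the next situation x'), the ε-greedy policy and the one-step Q-learning backup read:

```latex
\pi(x) =
\begin{cases}
\arg\max_{a} Q(u_x, a) & \text{with probability } 1-\varepsilon,\\[2pt]
\text{a random discrete action of } u_x & \text{with probability } \varepsilon,
\end{cases}
\qquad
Q(u_x, a) \leftarrow Q(u_x, a) + \alpha\bigl(r + \gamma \max_{a'} Q(u_{x'}, a') - Q(u_x, a)\bigr).
```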

  8. Continuous-action Q-Learning • Action selection rule • An average of the discrete actions of the nearest unit, weighted by their Q-values • Q-value of the selected continuous action a is:
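
The slide's formulas were images. A reconstruction consistent with the bullet text, taking the weights w_i to be derived from the Q-values of the nearest unit's discrete actions a_i (the exact weighting and the form of Q(x, a) are assumptions):

```latex
a \;=\; \frac{\sum_i w_i\, a_i}{\sum_i w_i},
\qquad
Q(x, a) \;=\; \frac{\sum_i w_i\, Q(u_x, a_i)}{\sum_i w_i},
\qquad
w_i \propto Q(u_x, a_i).
```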

  9. Continuous-action Q-Learning • Q-value update rule
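
Again reconstructing the missing formula under an assumption: the temporal-difference error of the executed continuous action a is shared among the discrete actions of the nearest unit in proportion to their weights in the averaging above.

```latex
\delta \;=\; r + \gamma\, Q(x', a') - Q(x, a),
\qquad
Q(u_x, a_i) \leftarrow Q(u_x, a_i) + \alpha\, \frac{w_i}{\sum_j w_j}\, \delta .
```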

  10. Average-Reward RL • Q-value update rule
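
The update rule was shown as an image. The standard R-learning form for average-reward RL, which the paper's rule resembles (details are an assumption; in Schwartz's original formulation ρ is updated only on greedy steps), replaces the discount with the estimated average reward ρ:

```latex
Q(x, a) \leftarrow Q(x, a) + \alpha\bigl(r - \rho + \max_{a'} Q(x', a') - Q(x, a)\bigr),
\qquad
\rho \leftarrow \rho + \beta\bigl(r + \max_{a'} Q(x', a') - \max_{a'} Q(x, a') - \rho\bigr).
```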

  11. Experiments • Wall-following task • Reward
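
The reward function itself was given on the slide. As a purely illustrative stand-in (all thresholds and magnitudes below are hypothetical, not the paper's values), a wall-following reward typically pays for forward progress while the robot stays within a distance band of the wall and penalizes collisions:

```python
def wall_following_reward(dist_to_wall, forward_speed, collided,
                          d_min=0.2, d_max=0.5):
    """Illustrative shaping reward for wall following (hypothetical constants)."""
    if collided:
        return -1.0
    if d_min <= dist_to_wall <= d_max:
        return forward_speed          # reward forward motion while hugging the wall
    return 0.0
```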

  12. Experiments • Performance comparison between discrete- and continuous-action discounted-reward RL

  13. Experiments • Performance comparison between discrete- and continuous-action average-reward RL

  14. Experiments • Performance comparison between discounted-reward and average-reward RL, discrete-action case

  15. Conclusion • Presented a simple Q-learning method that works in continuous domains. • The ITPM represents the continuous input space. • Compared discounted-reward RL against average-reward RL.
