

  1. Investigations on Automatic Behavior-based System Design + [A Survey on] Hierarchical Reinforcement Learning Amir massoud Farahmand, Majid Nili Ahmadabadi, Babak N. Araabi, Caro Lucas www.SoloGen.net SoloGen@SoloGen.net

  2. [a non-uniform]Outline • Brief History of AI • Challenges and Requirements of Robotic Applications • Behavior-based Approach to AI • The Problem of Behavior-based System Design • MDP and Standard Reinforcement Learning Framework • A Survey on Hierarchical Reinforcement Learning • Behavior-based System Design • Learning in BBS • Structure Learning • Behavior Learning • Behavior Evolution and Hierarchy Learning in Behavior-based Systems

  3. Happy birthday to Artificial Intelligence • 1941 Konrad Zuse, Germany, general-purpose computer • 1943 Britain (Turing and others), Colossus, for decoding • 1945 ENIAC, US; John von Neumann was a consultant • 1956 The Logic Theorist on JOHNNIAC -- Newell, Shaw and Simon • 1956 Dartmouth Conference organized by John McCarthy (inventor of LISP) • The term Artificial Intelligence was coined at Dartmouth -- intended as a two-month, ten-man study!

  4. HP to AI (2) ‘It is not my aim to surprise or shock you -- but the simplest way I can summarize is to say that there are now in the world machines that think, that learn and that create. Moreover, their ability to do these things is going to increase rapidly until …’ (Herb Simon, 1957) Unfortunately, Simon was too optimistic!

  5. What has AI done for us? • Rather good OCR (Optical Character Recognition) and speech recognition software • Robots make cars in all advanced countries • Reasonable machine translation is available for a large range of foreign web pages • Systems land 200-ton jumbo jets unaided every few minutes • Search systems like Google are not perfect but provide very effective information retrieval • Computer games and auto-generated cartoons are advancing at an astonishing rate and have huge markets • Deep Blue beat Kasparov in 1997. The world Go champion is a computer. • Medical expert systems can outperform doctors in many areas of diagnosis (but we aren’t allowed to find out easily!)

  6. AI: What is it? • What is AI? • Different definitions • The use of computer programs and programming techniques to cast light on the principles of intelligence in general and human thought in particular (Boden) • The study of intelligence independent of its embodiment in humans, animals or machines (McCarthy) • AI is the study of how to do things which at the moment people do better (Rich & Knight) • AI is the science of making machines do things that would require intelligence if done by men. (Minsky) (fast arithmetic?) • Is it definable?! • Turing test, Weak and Strong AI and …

  7. AI: Basic assumption • Symbol System Hypothesis: it is possible to construct a universal symbol system that thinks • Strong Symbol System Hypothesis: the only way a system can think is through symbolic processing • Happy birthday Symbolic (Traditional – Good old-fashioned) AI

  8. Symbolic AI: Methods • Knowledge representation (Abstraction) • Search • Logic and deduction • Planning • Learning

  9. Symbolic AI: Was it efficient? • Chess [OK!] • Block-worlds [OK!] • Daily Life Problems • Robots [~OK!] • Commonsense [~OK!] • … [~OK]

  10. Symbolic AI and Robotics [Diagram: sensors → world modelling → motor control → actuators] • Functional decomposition • Sequential flow • Correct perception is assumed to be delivered by vision research on some “good-and-happy-day-to-come”! • Get a logic-based or formal description of percepts • Apply search operators, logical inference, or planning operators

  11. Challenges and Requirements of Robotic Systems • Challenges: Sensor and Effector Uncertainty, Partial Observability, Non-Stationarity • Requirements (among many others): Multi-goal, Robustness, Multiple Sensors, Scalability, Automatic design [Adaptation (Learning/Evolution)]

  12. Behavior-based approach to AI • Behavioral (activity) decomposition [against functional decomposition] • Behavior: Sensor->Action (Direct link between perception and action) • Situatedness • Embodiment • Intelligence as Emergence of …

  13. Behavioral decomposition [Diagram: sensors and actuators wired in parallel to layered behaviors: avoid obstacles, locomote, explore, build maps, manipulate the world]

  14. Situatedness • No world modelling and abstraction • No planning • No sequence of operations on symbols • Direct link between sensors and actions • Motto: The world is its own best model

  15. Embodiment • Only an embodied agent is validated as one that can deal with the real world. • Only through physical grounding can any internal symbolic system be given meaning

  16. Emergence as a Route to Intelligence • Emergence: interaction of some simple systems which results in something more than the sum of those systems • Intelligence as the emergent outcome of dynamical interaction of behaviors with the world

  17. Behavior-based design • Robust • not sensitive to failure of a particular part of the system • no need for precise perception, as there is no modelling • Reactive: fast response, as there is no long route from perception to action • No representation

  18. A Simple Problem • Goal: make a mobile robot controller that collects balls from the field and moves them to home • What we have: • Differentially controlled mobile robot • 8 sonar sensors • Vision system that detects balls and home

  19. Basic design [Diagram: four behaviors: avoid obstacles, move toward ball, move toward home, exploration]
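To make the arbitration concrete, here is a minimal subsumption-style sketch of this design (not from the original slides); the sensor interface, thresholds, gains, and the Action type are illustrative assumptions, and the move-toward-home behavior would look just like move-toward-ball with the home bearing instead.

```python
# A hedged sketch of fixed-priority (subsumption-style) arbitration over
# the behaviors of slide 19. All names, thresholds, and gains are
# illustrative assumptions, not the authors' implementation.
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Action:
    left_speed: float
    right_speed: float

def avoid_obstacles(sonar: Sequence[float]) -> Optional[Action]:
    # Highest priority: spin away if any of the 8 sonars reads too close.
    if min(sonar) < 0.3:
        return Action(-0.5, 0.5)
    return None

def move_toward_ball(ball_bearing: Optional[float]) -> Optional[Action]:
    # Steer toward a detected ball; propose nothing if no ball is seen.
    if ball_bearing is None:
        return None
    return Action(0.5 - 0.3 * ball_bearing, 0.5 + 0.3 * ball_bearing)

def explore() -> Action:
    # Lowest priority: default wandering keeps the robot moving.
    return Action(0.5, 0.4)

def controller(sonar: Sequence[float], ball_bearing: Optional[float]) -> Action:
    # Fixed-priority arbitration: the first behavior that proposes an
    # action suppresses all behaviors below it.
    for proposal in (avoid_obstacles(sonar), move_toward_ball(ball_bearing)):
        if proposal is not None:
            return proposal
    return explore()
```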

  20. A Simple Shot

  21. How should we DESIGN a behavior-based system?!

  22. Behavior-based System Design Methodologies • Hand Design • Common almost everywhere. • Complicated: may even be infeasible in complex problems • Even if a working system can be found, it is probably not optimal. • Evolution • Good solutions can be found • Biologically feasible • Time consuming • Not fast at producing new solutions • Learning • Biologically feasible • Learning is essential for the life-time survival of the agent.

  23. The Importance of Adaptation (Learning/Evolution) • Unknown environment/body • [exact] Model of environment/body is not known • Non-stationary environment/body • Changing environments (offices, houses, streets, and almost everywhere) • Aging • [cannot be remedied by evolution very easily] • The designer may not know how to benefit from every aspect of her agent/environment • Let the agent learn it by itself (learning as optimization) • etc …

  24. Different Learning Methods

  25. Reinforcement Learning • Agent senses state of the environment • Agent chooses an action • Agent receives reward from an internal/external critic • Agent learns to maximize its received rewards through time.
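The sense-act-reward loop on this slide can be written down directly; the following is a generic sketch assuming abstract env and agent interfaces (reset/step and choose_action/learn are assumed names, not a specific library's API).

```python
# Generic agent-environment interaction loop for reinforcement learning.
# `env` and `agent` are assumed interfaces, not a specific library's API.
def run_episode(env, agent, max_steps=1000):
    state = env.reset()                      # agent senses the initial state
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.choose_action(state)  # agent chooses an action
        next_state, reward, done = env.step(action)  # critic returns reward
        agent.learn(state, action, reward, next_state)
        total_reward += reward               # the quantity to maximize over time
        state = next_state
        if done:
            break
    return total_reward
```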

  26. Reinforcement Learning • Inspired by psychology • Thorndike, Skinner, Hull, Pavlov, … • Very successful applications • Games (Backgammon) • Control • Robotics • Elevator Scheduling • … • Well-defined mathematical formulation • Markov Decision Problems

  27. Markov Decision Problems • Markov Process: formulates a wide range of dynamical systems • Finding an optimal solution of an objective function • [Stochastic] Dynamic Programming • Planning: known environment • Learning: unknown environment

  28. MDP
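The transcript lost the body of this slide; the standard formulation it refers to can be reconstructed as follows (notation may differ from the original slide).

```latex
% Standard finite MDP formulation (reconstructed, not the slide's own rendering).
An MDP is a tuple $(S, A, P, R, \gamma)$: states $S$, actions $A$,
transition probabilities $P(s' \mid s, a)$, rewards $R(s, a)$, and a
discount factor $\gamma \in [0, 1)$. The objective is a policy
$\pi : S \to A$ maximizing the expected discounted return
\[
  V^{\pi}(s) = \mathbb{E}\left[\, \sum_{t=0}^{\infty} \gamma^{t} R(s_t, \pi(s_t))
  \;\middle|\; s_0 = s \,\right].
\]
```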

  29. Reinforcement Learning Revisited (1) • Very important Machine Learning method • An approximate online solution of MDPs • Monte Carlo method • Stochastic Approximation • [Function Approximation]

  30. Reinforcement Learning Revisited (2) • Q-Learning and SARSA are among the most important solution methods of RL
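For reference, the tabular updates behind these two algorithms look roughly like the sketch below; the class, its hyperparameters, and the discrete state/action encoding are illustrative assumptions, not the presentation's code.

```python
import random
from collections import defaultdict

class TabularLearner:
    """Minimal tabular Q-learning / SARSA sketch (illustrative only).
    States and actions are assumed hashable."""

    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.Q = defaultdict(float)  # Q[(state, action)] -> estimated value
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_action(self, s):
        # Epsilon-greedy exploration over the current Q estimates.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def q_learning_update(self, s, a, r, s_next):
        # Off-policy: bootstrap from the greedy action in the next state.
        best_next = max(self.Q[(s_next, a2)] for a2 in self.actions)
        target = r + self.gamma * best_next
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])

    def sarsa_update(self, s, a, r, s_next, a_next):
        # On-policy: bootstrap from the action actually taken next.
        target = r + self.gamma * self.Q[(s_next, a_next)]
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])
```

The two updates differ only in the bootstrap target, which is exactly the off-policy/on-policy distinction between Q-learning and SARSA.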

  31. Some Simple Samples: 1D Grid World [Figures: map of the environment, policy, value function]

  32. Some Simple Samples: 2D Grid World [Figures: map, value function, policy, value function (3D view)]

  33. Some Simple Samples: 2D Grid World [Figures: map, value function, policy, value function (3D view)]

  34. Curses of DP It is not easy to use DP (and RL) in robotic tasks. • Curse of Modeling • RL solves this problem • Curse of Dimensionality (e.g., robotic tasks have a very big state space) • Approximating the value function (see the sketch below) • Neural Networks • Fuzzy Approximation • Hierarchical Reinforcement Learning
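As one concrete instance of value-function approximation, a linear approximator with a TD(0)-style update could look like this sketch; the feature map phi is an assumed ingredient, not something specified in the slides.

```python
import numpy as np

def td0_linear_update(w, phi, s, r, s_next, alpha=0.01, gamma=0.99):
    """One TD(0) step for a linear value function V(s) = w . phi(s).

    `phi` is an assumed feature map from states to fixed-length numpy
    vectors; with a compact feature map, memory no longer grows with
    the raw state space, which is the point in large robotic tasks.
    """
    td_error = r + gamma * (w @ phi(s_next)) - (w @ phi(s))
    return w + alpha * td_error * phi(s)

# Hypothetical usage with a 4-dimensional feature map `phi`:
# w = np.zeros(4)
# w = td0_linear_update(w, phi, s, r, s_next)
```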

  35. A Sample of Learning in a Robot Hajime Kimura, Shigenobu Kobayashi, “Reinforcement Learning using Stochastic Gradient Algorithm and its Application to Robots,” The Transactions of the Institute of Electrical Engineers of Japan, Vol. 119, No. 8 (1999) (in Japanese!)

  36. Hierarchical Reinforcement Learning

  37. ATTENTION Hierarchical reinforcement learning methods are not specially designed for behavior-based systems. Covering them at this depth in this presentation should not be taken to imply a strong relation to behavior-based system design.

  38. Hierarchical RL (1) • Use some kind of hierarchy in order to … • Learn faster • Need fewer values to be updated (smaller storage) • Incorporate a priori knowledge from the designer • Increase reusability • Have a more meaningful structure than a mere Q-table

  39. Hierarchical RL (2) • Is there any unified meaning of hierarchy? NO! • Different methods: • Temporal abstraction • State abstraction • Behavioral decomposition • …

  40. Hierarchical RL (3) • Feudal Q-Learning [Dayan, Hinton] • Options [Sutton, Precup, Singh] • MaxQ [Dietterich] • HAM [Russell, Parr, Andre] • ALisp [Andre, Russell] • HexQ [Hengst] • Weakly-Coupled MDP [Bernstein, Dean & Lin, …] • Structure Learning in SSA [Farahmand, Nili] • Behavior Learning in SSA [Farahmand, Nili] • …

  41. Feudal Q-Learning • Divide each task into a few smaller sub-tasks • State abstraction method • Different layers of managers • Each manager takes orders from its super-manager and gives orders to its sub-managers

  42. Feudal Q-Learning • Principles of Feudal Q-Learning • Reward Hiding: Managers must reward sub-managers for doing their bidding whether or not this satisfies the commands of the super-managers. Sub-managers should just learn to obey their managers and leave it up to them to determine what it is best to do at the next level up. • Information Hiding: Managers only need to know the state of the system at the granularity of their own choices of tasks. Indeed, allowing some decision making to take place at a coarser grain is one of the main goals of the hierarchical decomposition. Information is hidden both downwards - sub-managers do not know the task the super-manager has set the manager - and upwards - a super-manager does not know what choices its manager has made to satisfy its command.
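Reward hiding in particular is easy to state in code; the sketch below is one illustrative rendering of the idea (the names and the goal predicate are assumptions, not the authors' implementation).

```python
def sub_manager_reward(command, achieved_state, goal_test):
    """Reward hiding: a sub-manager is rewarded for satisfying its
    manager's command, whether or not the external task reward was
    obtained. `command` names the commanded sub-task and `goal_test`
    is an assumed predicate checking whether it is satisfied."""
    return 1.0 if goal_test(command, achieved_state) else 0.0
```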

  43. Feudal Q-Learning

  44. Feudal Q-Learning

  45. Options: Introduction • People make decisions at different time scales • Traveling example • People perform actions with different time scales • Kicking a ball • Becoming a soccer player • It is desirable to have a method that supports these temporally-extended actions over different time scales

  46. Options: Concept • Macro-actions • Temporal abstraction method of Hierarchical RL • Options are temporally extended actions, each of which consists of a set of primitive actions • Example: • Primitive actions: walking N/S/W/E • Options: go to {door, corner, table, straight} • Options can be open-loop or closed-loop • Semi-Markov Decision Process theory [Puterman]

  47. Options: Formal Definitions
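The body of this slide did not survive the transcript; in the standard formulation of Sutton, Precup, and Singh, which this survey follows, an option is defined as follows.

```latex
% Options in the Sutton-Precup-Singh formulation (reconstructed).
An option is a triple $o = (I, \pi, \beta)$, where
$I \subseteq S$ is the initiation set (states in which $o$ may be invoked),
$\pi : S \times A \to [0, 1]$ is the option's internal policy, and
$\beta : S \to [0, 1]$ gives the probability of terminating in each state.
A primitive action $a$ is the special case that is available wherever $a$ is,
always chooses $a$, and terminates after one step ($\beta \equiv 1$).
```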

  48. Options: Rise of SMDP! • Theorem: MDP + Options = SMDP

  49. Options: Value function
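Again the slide body is missing from the transcript; the standard option-value function it refers to is the expected discounted return of executing an option to termination and following the policy-over-options $\mu$ afterwards.

```latex
% Option-value function under a policy-over-options mu (standard form).
Q^{\mu}(s, o) = \mathbb{E}\Big[\, r_{t+1} + \gamma r_{t+2} + \cdots
  + \gamma^{k-1} r_{t+k} + \gamma^{k} V^{\mu}(s_{t+k})
  \,\Big|\, o \text{ initiated in } s \text{ at time } t,
  \text{ terminating after } k \text{ steps} \,\Big].
```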

  50. Options: Bellman-like optimality condition
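The corresponding Bellman-like optimality equation over a set of options, in its standard form (the slide's own rendering is lost), is:

```latex
% Bellman-like optimality condition over options (standard form).
V^{*}_{O}(s) = \max_{o \in O_s}
  \mathbb{E}\Big[\, r_{t+1} + \cdots + \gamma^{k-1} r_{t+k}
  + \gamma^{k} V^{*}_{O}(s_{t+k})
  \,\Big|\, o \text{ initiated in } s, \text{ lasting } k \text{ steps} \,\Big],
```

where $O_s$ is the set of options available in state $s$. Because an option lasts a random number of steps $k$, this is exactly the SMDP form announced on slide 48.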
