Efficient Implementation of Reinforcement Learning In Co-ordinated Group Activities


  1. Efficient Implementation of Reinforcement Learning In Co-ordinated Group Activities By Ashwinkumar Ganesan CMSC 601

  2. Agenda • Reinforcement Learning • Problem Statement • Proposed Method • Conclusions

  3. What is Reinforcement Learning? • A method for learning from experience • The agent, or bot, learns by interacting with the environment. • A reward is attached to each action taken in a particular state. • GOAL : MAXIMIZE THE REWARD

  4. Bellman Equation (RL in a bit more detail) • Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision [1]. • U(s) is the utility of a state. • R(s) is the reward of the state. • T(s,a,s’) is the probability of transitioning from s to s’ under action a.
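The equation itself appears to have been an image on the original slide; using the quantities defined above, the standard form of the Bellman equation is:

\[
U(s) \;=\; R(s) \;+\; \gamma \,\max_{a} \sum_{s'} T(s, a, s')\, U(s'),
\]

where \(\gamma \in [0, 1]\) is the discount factor.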

  5. Markov Decision Process • An MDP is a discrete-time stochastic control process. • s’ is the next state. • s is the current state. • a is the action.

  6. Q-Learning (A Reinforcement Learning Algorithm) • Q-learning is a method of maintaining reward information without explicitly learning the policy. • Q(a,s) is the value of the current action-state pair. • R(s’) is the reward for the next state.
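Slide 6 names the ingredients of the update without showing the formula; a minimal sketch of a single tabular Q-learning backup, with made-up states, actions, and step-size values (alpha, gamma are assumptions for this example), might look like:

```python
alpha, gamma = 0.5, 0.9   # learning rate and discount factor (illustrative)

# Q[(a, s)]: current estimate for each action-state pair (made-up entries).
Q = {("go", 0): 0.0, ("stay", 0): 0.0, ("go", 1): 0.0, ("stay", 1): 0.0}

def q_update(Q, s, a, r_next, s_next, actions):
    """One backup: Q(a,s) += alpha * (R(s') + gamma * max_a' Q(a',s') - Q(a,s))."""
    best_next = max(Q[(a2, s_next)] for a2 in actions)
    Q[(a, s)] += alpha * (r_next + gamma * best_next - Q[(a, s)])

# Taking action "go" in state 0 leads to state 1 with reward 1.0:
q_update(Q, 0, "go", 1.0, 1, ["go", "stay"])
```

After this single update, Q[("go", 0)] has moved halfway (alpha = 0.5) toward the target R(s') + gamma * max_a' Q(a', s') = 1.0, giving 0.5.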

  7. What is Co-ordinated Group Action? • Co-ordinated Group Action is a situation in which a set of agents performs a single task. • GOAL: To maximize the output, or the reward, globally.

  8. Agenda • Reinforcement Learning • Problem Statement • Proposed Method • Conclusions

  9. Some Problems In Multi-Agent Systems… • Communication, i.e. what should an agent communicate, and how much should it communicate with other agents? • Optimal Policy, i.e. defining an optimal policy for the entire group. Is an optimal policy a set of optimal individual policies for each agent? • How much of an individual agent's policy information is available to the entire group?

  10. What Am I Proposing? • Create a method for efficiently implementing reinforcement learning on co-ordinated group activity. • Modify a Reinforcement Learning algorithm to implement group action. • Implement the proposed method and measure its efficiency in World of Warcraft.

  11. Problem Environment • The proposal is to research co-operative group learning under the following conditions: • The environment is assumed to be partially observable. • Each agent in the system knows the final action taken by the other agents. • Agents do not have access to the state information generated by other agents while selecting an action. • Agents do not have access to the policies of other agents when making a decision. • The rewards are not linearly separable.

  12. World Of Warcraft • World of Warcraft is a large multi-player online game by Blizzard Entertainment. • It is a game where every player has his or her own character and roams the virtual world, fighting demons, observing the landscape, buying and selling items, and interacting with other players. • In short, it is a large game with many options for tools, skills and levels, making it challenging for bots.

  13. World Of Warcraft http://world-of-warcraft.en.softonic.com/

  14. Motivation • Today, games like World of Warcraft have a large number of human players who play in groups. • Single-player as well as multiplayer games have AI engines, but attacks and actions by opponents in these games are still performed one at a time. • Reinforcement Learning can be implemented in real time in these games, to improve the AI over a period of time and customize the games for users. • Other applications: robotics, defense.

  15. Related Work • QUICR Method: This method calculates the counterfactual action, which is the action an agent did not take at a given time [2]. • Least-Squares Policy Iteration (LSPI) can be implemented. The method performs policy iteration using samples instead of an actual policy [3]. • FMQ Algorithm: The algorithm is helpful for environments where agents have partial or no observability [4].

  16. Inverse Reinforcement Learning • Inverse reinforcement learning is the exact opposite of reinforcement learning. • The input is the optimal policy or behavior that is expected from the agent. • The agent learns to find the reward function based on observed values in the environment.
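As a sketch of the usual formulation (standard IRL, not anything specific to this proposal): given the expert's policy \(\pi_E\), IRL seeks a reward function \(R\) under which following \(\pi_E\) is at least as good as following any other policy \(\pi\):

\[
\mathbb{E}\Big[\sum_{t} \gamma^{t} R(s_t) \,\Big|\, \pi_E\Big]
\;\ge\;
\mathbb{E}\Big[\sum_{t} \gamma^{t} R(s_t) \,\Big|\, \pi\Big]
\quad \text{for all policies } \pi .
\]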

  17. Agenda • Reinforcement Learning • Problem Statement • Proposed Method • Conclusions

  18. Proposed Method • Implement Reinforcement Learning in 2 parts: • Implement Inverse Reinforcement Learning. • Implement a modified Reinforcement Learning on the rewards learned in step 1. • Observe the expert • Calculate the reward • Observe other agents • Calculate the new Q(a,s) value • Calculate the policy based on the reward
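The two-phase pipeline above can be sketched end-to-end on a toy chain environment. Everything here is an illustrative assumption rather than the author's implementation: the environment, the crude frequency-based reward recovery standing in for inverse RL, and all names.

```python
import random
from collections import defaultdict

random.seed(0)

N = 4                     # chain states 0..4; the expert's goal is state 4
ACTIONS = [-1, +1]        # move left / right
GAMMA, ALPHA = 0.9, 0.5   # discount factor, learning rate

def step(s, a):
    """Deterministic chain dynamics, clipped to [0, N]."""
    return min(max(s + a, 0), N)

# Phase 1 (inverse RL, crudely approximated): infer a reward from where
# the expert's observed trajectories end.
expert_trajectories = [[0, 1, 2, 3, 4]] * 5   # expert always walks right
ends = defaultdict(int)
for traj in expert_trajectories:
    ends[traj[-1]] += 1
recovered_R = defaultdict(float)
for s, count in ends.items():
    recovered_R[s] = count / len(expert_trajectories)

# Phase 2: tabular Q-learning driven by the recovered reward.
Q = defaultdict(float)
for _ in range(300):
    s = 0
    for _ in range(20):
        a = random.choice(ACTIONS)
        s2 = step(s, a)
        target = recovered_R[s2] + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
        if s == N:        # episode ends at the goal
            break

# Greedy policy implied by the learned Q-values.
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N)}
```

With the recovered reward concentrated on state 4, the greedy policy learned in phase 2 moves right from every non-goal state.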

  19. Challenges • Calculating the iteration at which the agent is known to have an optimal policy. • Finding the point at which to switch from Inverse Reinforcement Learning to Reinforcement Learning. • Reconciling the reward function obtained from observing the expert with the rewards obtained from the environment. • Finding a method to observe other agents and “experts”.

  20. Evaluation Metrics • The metrics will be evaluated against the known methods. • Metrics are: • Number of states generated • Number of iterations required to reach the optimal policy • Rewards in terms of points earned (in the game) • Rate of convergence to the optimal policy

  21. Agenda • Reinforcement Learning • Problem Statement • Proposed Method • Conclusions

  22. Conclusion • We can combine inverse reinforcement learning with reinforcement learning methods to reduce the learning time required for bots. • Bot reward functions can be improved over time.

  23. References • [1] R. Bellman, On the Theory of Dynamic Programming, Proceedings of the National Academy of Sciences, 1952. • [2] Adrian K. Agogino and Kagan Tumer, QUICR-Learning in Multi-Agent Systems. • [3] Lihong Li, Michael L. Littman, Christopher R. Mansley, Online Exploration in Least-Squares Policy Iteration. • [4] Laëtitia Matignon, Guillaume J. Laurent and Nadine Le Fort-Piat, A Study of the FMQ Heuristic in Cooperative Multi-Agent Games. • Acknowledgement to Prof. Tim Oates for helping with the literature survey.

  24. QUESTIONS?
