Presentation Transcript


    1. RETALIATE: Learning Winning Policies in First-Person Shooter Games. Dept. of Computer Science & Engineering, Lehigh University

    2. Outline: Introduction (Adaptive Game AI, Domination games in Unreal Tournament, Reinforcement Learning); Adaptive Game AI with Reinforcement Learning (RETALIATE architecture and algorithm); Empirical Evaluation; Final Remarks (Main Lessons)

    3. Introduction: Adaptive Game AI, Unreal Tournament, Reinforcement Learning

    4. Adaptive AI in Games. Symbolic: compositional syntax whose atoms have meaning in themselves (first-order logic, other interpreted logical theories). Sub-symbolic: lacks atomic elements that are themselves meaningful representations (e.g., pixels). Note that a genetic algorithm is a search technique, not a learning technique.

    5. Adaptive Game AI and Learning. Motivation for learning: a combinatorial explosion of possible situations, across tactics (e.g., competing teams' tactics), game worlds (e.g., the map where the game is played), and game modes (e.g., domination, capture the flag), with little time for development. The cons of learning: the Game AI becomes difficult to control and predict, and difficult to test.

    6. Reinforcement Learning. Agents learn policies through rewards and punishments. Policy: determines what action to take from a given state (or situation). The agent's goal is to maximize some reward. Tabular vs. generalization techniques: we maintain a Q-table mapping State x Action -> value. (Supervised learning, by contrast, has labeled training data and access to the correct output.) Curse of dimensionality: tabular methods are limited to small numbers of states and actions, not just by memory but by the time and data needed to fill the table accurately; generalization uses a limited subset of the state space to produce a good approximation over a much larger subset.
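
To make the tabular idea concrete, here is a minimal sketch (not the authors' code) of a Q-table keyed by (state, action) pairs, with epsilon-greedy action selection and the standard one-step Q-learning backup; the constants and function names are illustrative assumptions.

```python
import random
from collections import defaultdict

# Q-table: maps (state, action) -> estimated value, defaulting to 0.0.
Q = defaultdict(float)

ALPHA = 0.2    # learning rate (illustrative)
GAMMA = 1.0    # non-discounted, as the later slides advocate for this domain
EPSILON = 0.1  # exploration rate (illustrative)

def choose_action(state, actions):
    """Epsilon-greedy: usually exploit the best-known action, occasionally explore."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    """One-step Q-learning backup toward reward plus the best next-state value."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```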

    7. Unreal Tournament (UT). Online FPS developed by Epic Games Inc. in 1999; a direct competitor to Quake III Arena. Six gameplay modes, including team deathmatch and domination games. Gamebots: a client-server architecture for controlling bots, started by the U.S.C. (University of Southern California) Information Sciences Institute (ISI).

    8. UT Domination Games. A number of fixed domination locations. Ownership: the team of the last player to step into the location. Scoring: a team point is awarded for every five seconds a location remains controlled. Winning: first team to reach a pre-determined score (50). Top-down view of the map.
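
As a rough illustration of the scoring rule just described (one team point per five seconds a location is held, first team to 50 wins), here is a tiny tick-based sketch; the function and team names are made up for the example.

```python
TICK_SECONDS = 5    # a point is awarded per 5 seconds of control
TARGET_SCORE = 50   # first team to reach this score wins

def score_tick(ownership, scores):
    """Award one point per controlled location for this 5-second tick.

    ownership: dict mapping location -> owning team (None while neutral)
    scores:    dict mapping team -> current score (updated in place)
    Returns the winning team once someone reaches TARGET_SCORE, else None.
    """
    for team in ownership.values():
        if team is not None:
            scores[team] += 1
    for team, score in scores.items():
        if score >= TARGET_SCORE:
            return team
    return None

# Example: one team holds two of the three locations on this tick.
scores = {"RETALIATE": 48, "Opponent": 40}
winner = score_tick({"L1": "RETALIATE", "L2": "RETALIATE", "L3": "Opponent"}, scores)
```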

    9. Adaptive Game AI with RL. RETALIATE (Reinforced Tactic Learning in Agent-Team Environments). Tactic versus strategy?

    10. The RETALIATE Team. Controls two or more UT bots. Commands bots to execute actions through the GameBots API. The UT server provides sensory (state and event) information about the UT world and controls all gameplay. Gamebots acts as middleware between the UT server and the Game AI. Emphasize that bots are plug-ins: we learn strategies, not individual bot tactics.
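
Gamebots exposes the game over a plain-text socket protocol, so a team controller is essentially a line-oriented network client. The sketch below shows roughly what that middleware boundary looks like; the host, port, and command text are illustrative assumptions, not the published Gamebots message set.

```python
import socket

GAMEBOTS_HOST = "localhost"  # assumed server address
GAMEBOTS_PORT = 3000         # assumed control port

class BotConnection:
    """Minimal line-oriented client for a Gamebots-style text protocol."""

    def __init__(self, host=GAMEBOTS_HOST, port=GAMEBOTS_PORT):
        self.sock = socket.create_connection((host, port))
        self.buffer = b""

    def send_command(self, command):
        # Commands go to the server as single text lines,
        # e.g. a "run to location" order for one bot.
        self.sock.sendall((command + "\r\n").encode("ascii"))

    def read_message(self):
        # Sensory (state and event) messages arrive one per line from the server.
        while b"\n" not in self.buffer:
            self.buffer += self.sock.recv(4096)
        line, _, self.buffer = self.buffer.partition(b"\n")
        return line.decode("ascii", errors="replace").strip()
```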

    11. The RETALIATE Algorithm

    12. Initialization
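
Putting the machinery from the surrounding slides together, one plausible reading of the initialization and main loop is sketched below: the Q-table covers every (location-ownership state, bot-to-location assignment) pair, values start at a uniform constant, and each step is epsilon-greedy selection followed by a Q update driven by the change in score. The initial value, constants, and the `env` wrapper are assumptions for illustration.

```python
import itertools
import random

LOCATIONS = ["L1", "L2", "L3"]
OWNERSHIP = ["E", "F", "N"]  # enemy, friendly, neutral
STATES = list(itertools.product(OWNERSHIP, repeat=len(LOCATIONS)))  # 27 states
ACTIONS = list(itertools.product(LOCATIONS, repeat=3))              # 27 joint actions

Q = {(s, a): 0.5 for s in STATES for a in ACTIONS}  # uniform initial value (assumed)
ALPHA, EPSILON = 0.2, 0.1                           # illustrative constants

def run_game(env):
    """env is a hypothetical wrapper around Gamebots observations and commands."""
    state = env.observe_ownership()            # e.g. ('F', 'N', 'E')
    while not env.game_over():
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)    # explore
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit
        env.send_bots_to(action)               # one destination per bot
        reward = env.score_delta()             # change in (our score - their score)
        next_state = env.observe_ownership()
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        # Non-discounted (gamma = 1) one-step update, as discussed on later slides.
        Q[(state, action)] += ALPHA * (reward + best_next - Q[(state, action)])
        state = next_state
```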

    13. Rewards and Utilities

    14. Rewards and Utilities. But this will not necessarily converge to an optimal policy.
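
One hedged way to write down the update this slide alludes to, assuming the reward is the change in a state utility U (for example, the current score or location-control differential) and no discounting is applied:

$$ r_t = U(s_{t+1}) - U(s_t), \qquad Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_t + \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big] $$

With gamma fixed at 1 the future-value term is never discounted, which is why convergence to an optimal policy is not guaranteed; the later "Final Remarks" slides note that non-discounted rewards nonetheless work well in this domain.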

    15. State Information and Actions. Curse of dimensionality.

    16. Managing (State x Action) Growth. Our table: States: ({E,F,N}, {E,F,N}, {E,F,N}) = 27. Actions: ({L1,L2,L3}, {L1,L2,L3}, {L1,L2,L3}) = 27. 27 x 27 = 729. Generally, 3^#loc x #loc^#bot. Adding health, discretized (high, med, low): States: ({E,F,N}, {E,F,N}, {E,F,N}, {h,m,l}) = 27 x 3 = 81. Actions: ({L1,L2,L3,Health}, {L1,L2,L3,Health}, {L1,L2,L3,Health}) = 4^3 = 64. 81 x 64 = 5184. Generally, 3^(#loc+1) x (#loc+1)^#bot. The number of locations and the size of the team frequently vary.
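
The growth formulas above can be checked with a few lines of arithmetic; this sketch reproduces the 729 and 5184 figures and shows how the table scales when the number of locations or the team size changes.

```python
def table_size(num_locations, num_bots, with_health=False):
    """Number of (state, action) pairs in the tabular formulation.

    States:  each location is Enemy/Friendly/Neutral (3 values), plus an
             optional high/med/low health feature (3 more values).
    Actions: each bot is sent to one location, plus an optional 'get health' choice.
    """
    state_features = num_locations + (1 if with_health else 0)
    choices_per_bot = num_locations + (1 if with_health else 0)
    return (3 ** state_features) * (choices_per_bot ** num_bots)

print(table_size(3, 3))                    # 27 * 27 = 729
print(table_size(3, 3, with_health=True))  # 81 * 64 = 5184
print(table_size(4, 4, with_health=True))  # grows quickly with map and team size
```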

    17. Empirical Evaluation: Opponents, Performance Curves, Videos

    18. The Competitors. HTNBots is previous work that beat the other three. We didn't modify the HTN bots, and its knowledge base was successful. Therefore HTNBots is not a straw man for proving RETALIATE, just a good opponent that is currently freely available (since 2005).

    19. Summary of Results. Against the opportunistic, possessive, and greedy control strategies, RETALIATE won all 3 games in the tournament. Within the first half of the first game, RETALIATE developed a competitive strategy. Possibly get rid of OLD, and replace with the graphs depicting performance against the static opponents. Make a point of saying HTNBots won all games against the non-RL opponents. Maybe include the graph that cycles through opponents with a retained Q-table, because it motivates that the strategy needs to remain dynamic!

    20. Summary of Results: HTNBots vs RETALIATE (Round 1). Epsilon isn't changing, so we can't talk about exploring vs. exploiting; what I really mean is how well the exploitation is working.

    21. Summary of Results: HTNBots vs RETALIATE (Round 2). Same caution about exploit vs. explore; be careful.

    22. Video: Initial Policy. Add a caption on the left. What is red, what is blue? Draw a wedge of a circle; explain that the apex is the center and the rest is the angle of view.

    23. Video: Learned Policy. Keep the same key on this slide.

    24. Final Remarks: Lessons Learned, Future Work

    25. Final Remarks (1). From our work with RETALIATE we learned the following lessons, beneficial to any real-world application of RL for these kinds of games: Separate individual bot behavior from team strategies. Model the problem of learning team tactics through a simple state formulation. The use of non-discounted rewards works well in this domain. Need to say that we originally tried much more state information, but learned through experiments to keep the state information small. Explain how we learned we should separate strategy from the plug-in bots (i.e., state information again).

    26. Final Remarks (2). It is very hard to predict all strategies beforehand. As a result, RETALIATE was able to find a weakness and exploit it to produce a winning strategy that HTNBots could not counter. On the other hand, HTNBots produced winning strategies against the other opponents from the beginning, while it took RETALIATE half a game in some situations. Because tactics emerging from RETALIATE might be difficult to predict, a game developer will have a hard time maintaining the Game AI. Future Work: this suggests that a combination of HTN planning to lay down initial strategies and reinforcement learning to tune those strategies should address the individual weaknesses of both approaches.

    27. Thank you! Questions?

    28. REMEMBER: Emphasis should be given to the main lessons: simple domain representation, gamma = 1, etc. I think we outlined a response to John, and we included some of this in the book chapter, so we should include it in the presentation. There might be questions about other RL approaches and why we didn't try them; we should think of an answer to that. Also a question of whether we tried gamma less than one (my recollection is that we did, and Megan reported that it was converging too slowly towards a competitive policy).
