
UT^2: Human-like Behavior via Neuroevolution of Combat Behavior and Replay of Human Traces

Jacob Schrum (schrum2@cs.utexas.edu), Igor V. Karpov (ikarpov@cs.utexas.edu), Risto Miikkulainen (risto@cs.utexas.edu)


Presentation Transcript


  1. UT^2: Human-like Behavior via Neuroevolution of Combat Behavior and Replay of Human Traces Jacob Schrum (schrum2@cs.utexas.edu) Igor V. Karpov (ikarpov@cs.utexas.edu) Risto Miikkulainen (risto@cs.utexas.edu)

  2. Our Approach: UT^2 • Human traces to get unstuck and navigate • Filter data to get general-purpose traces • Evolve skilled combat behavior • Restrictions/filters maintain humanness • Observe and judge like a human • Necessary to account for the judging game

  3. Bot Architecture

  4. Human Trace Replay

  5. Record and Index Human Games • Indexed by nearest navpoint • Synthetic pose data • Replay nearest trace when needed
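
The indexing idea on this slide can be sketched roughly as follows. This is a minimal illustration, not the UT^2 implementation: it assumes a trace is a list of (x, y, z) positions, that each trace is filed under the navpoint nearest its first pose, and that the helper names (`nearest_navpoint`, `build_trace_index`, `lookup_trace`) are hypothetical.

```python
import math
from collections import defaultdict

def nearest_navpoint(position, navpoints):
    """Return the id of the navpoint closest to an (x, y, z) position."""
    return min(navpoints, key=lambda nav_id: math.dist(position, navpoints[nav_id]))

def build_trace_index(traces, navpoints):
    """File each recorded trace under the navpoint nearest its first pose (assumed rule)."""
    index = defaultdict(list)
    for trace in traces:
        index[nearest_navpoint(trace[0], navpoints)].append(trace)
    return index

def lookup_trace(position, index, navpoints):
    """Return a trace stored at the bot's nearest navpoint, or None if there is none."""
    candidates = index.get(nearest_navpoint(position, navpoints), [])
    return candidates[0] if candidates else None

# Hypothetical data: two navpoints and two short human traces.
navpoints = {"A": (0.0, 0.0, 0.0), "B": (100.0, 0.0, 0.0)}
traces = [
    [(5.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(95.0, 0.0, 0.0), (90.0, 5.0, 0.0)],
]
index = build_trace_index(traces, navpoints)
```

Keying on the nearest navpoint keeps lookup cheap during play: the bot only needs its own nearest navpoint to find candidate traces.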

  6. Unstuck Controller • Mix scripted responses and human traces • Previous UT^2 used only human traces • Human traces also used after repeated failures

  7. Explorative Retrace • Explore the level like a human • Collisions allowed when using RETRACE • Humans often bump walls with no problem • If RETRACE fails • No trace available, or trace gets bot stuck • Fall through to PATH module (Nav graph)
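
The fall-through from RETRACE to PATH described here might look like the sketch below. The function name and its two arguments are assumptions made for illustration; the slides do not show how the real modules are arbitrated.

```python
def choose_navigation(trace, stuck_while_retracing):
    """Pick the navigation module for this step.

    RETRACE is used only when a human trace is available and replaying it
    has not left the bot stuck; otherwise control falls through to PATH,
    which follows the navigation graph.
    """
    if trace is not None and not stuck_while_retracing:
        return "RETRACE"
    return "PATH"
```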

  8. Evolved Battle Controller

  9. Battle Controller Outputs • 6 movement outputs • Advance • Retreat • Strafe left • Strafe right • Move to nearest item • Stand still • Additional output • Jump?

  10. Battle Controller Inputs • Pie slice sensors for enemies • Ray traces for walls/level geometry • Other misc. sensors for current weapon properties, nearby item properties, etc.
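
One way to picture a pie slice sensor is shown below. The slides do not give the slice layout, so this sketch assumes equal-width slices in the horizontal plane, with slice 0 centered on the bot's facing direction and activation rising as the enemy gets closer. All names and parameters are hypothetical.

```python
import math

def pie_slice_sensors(bot_pos, bot_yaw, enemies, num_slices=4, max_range=1000.0):
    """Return one activation per slice: 0.0 if empty, closer enemies give higher values."""
    sensors = [0.0] * num_slices
    width = 2 * math.pi / num_slices
    for ex, ey in enemies:
        dx, dy = ex - bot_pos[0], ey - bot_pos[1]
        dist = math.hypot(dx, dy)
        if dist > max_range:
            continue  # Enemy is too far away to register.
        # Angle of the enemy relative to the bot's facing, wrapped into [0, 2*pi).
        rel = (math.atan2(dy, dx) - bot_yaw) % (2 * math.pi)
        # Shift by half a slice so slice 0 is centered on the facing direction.
        idx = int(((rel + width / 2) % (2 * math.pi)) // width) % num_slices
        sensors[idx] = max(sensors[idx], 1.0 - dist / max_range)
    return sensors

# Bot at the origin facing along +x; one enemy ahead, one to the left, one out of range.
readings = pie_slice_sensors((0.0, 0.0), 0.0, [(500.0, 0.0), (0.0, 250.0), (-2000.0, 0.0)])
```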

  11. Battle Controller Inputs • Opponent movement sensors • Opponent performing movement action X? • Opponents modeled as moving like bot • Approximation used

  12. Constructive Neuroevolution • Genetic Algorithms + Neural Networks • Build structure incrementally (complexification) • Good at generating control policies • Three basic mutations (no crossover used): • Perturb Weight • Add Connection • Add Node
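
The three mutations can be sketched on a toy genome as below. The genome layout (a dict with a `nodes` list and a `links` list of `(source, target, weight)` tuples) is an assumption, and `add_node` splits an existing link in the style of NEAT, which may differ from what UT^2 actually does.

```python
import random

def perturb_weight(genome, rng, scale=0.5):
    """Add Gaussian noise to the weight of one randomly chosen link."""
    i = rng.randrange(len(genome["links"]))
    src, dst, weight = genome["links"][i]
    genome["links"][i] = (src, dst, weight + rng.gauss(0.0, scale))

def add_connection(genome, rng):
    """Add a link with a random weight between two randomly chosen nodes."""
    src = rng.choice(genome["nodes"])
    dst = rng.choice(genome["nodes"])
    genome["links"].append((src, dst, rng.gauss(0.0, 1.0)))

def add_node(genome, rng):
    """Splice a new node into a randomly chosen link (assumed NEAT-style split)."""
    i = rng.randrange(len(genome["links"]))
    src, dst, weight = genome["links"].pop(i)
    new_node = max(genome["nodes"]) + 1
    genome["nodes"].append(new_node)
    genome["links"] += [(src, new_node, 1.0), (new_node, dst, weight)]

def mutate(genome, rng):
    """Apply one of the three mutations; no crossover is involved."""
    rng.choice([perturb_weight, add_connection, add_node])(genome, rng)
```

Each call grows or adjusts the network by one small step, which is how structure is built up incrementally from a minimal starting network.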

  13. Evolving Battle Controller • Used NSGA-II* with 3 objectives • Damage dealt • Damage received (negative) • Geometry collisions (negative) • Evolved in DM-1on1-Albatross • Small level to encourage combat • One native bot opponent • High score favored in selection of final network • Final combat behavior highly constrained *K. Deb et al. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. Evol. Comp. 2002

  14. Action Filtering • Network choice not always used • Forced to stand still sometimes • Sniping, not threatened, high ground • Prevented from jumping while still • Prevented from jumping near walls/opponents • Prevented from going to unwanted items • Prevented from strafing/retreating into walls • Etc… • Forced lower accuracy • Forced delays to simulate human response time • Evolution constrained to look human
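
A few of these filters could be layered on top of the network's output roughly as follows. The context keys, the action names, and the choice to replace a blocked action with standing still are all assumptions for illustration; the slides list the restrictions but not how they are implemented.

```python
def filter_action(action, jump, ctx):
    """Override the network's chosen action and jump flag with hand-written rules.

    `ctx` is a dict of boolean flags describing the current situation
    (hypothetical keys; missing keys are treated as False).
    """
    # Forced to stand still in some situations, e.g. while sniping.
    if ctx.get("sniping") or ctx.get("unthreatened_on_high_ground"):
        action = "stand_still"
    # Do not strafe or retreat into a wall.
    if action in ("strafe_left", "strafe_right", "retreat") and ctx.get("wall_in_direction"):
        action = "stand_still"
    # Do not walk to an item the bot does not want.
    if action == "move_to_item" and ctx.get("item_unwanted"):
        action = "stand_still"
    # No jumping while still, or near walls or opponents.
    if action == "stand_still" or ctx.get("near_wall") or ctx.get("near_opponent"):
        jump = False
    return action, jump
```

For example, `filter_action("retreat", True, {"wall_in_direction": True})` returns `("stand_still", False)`, while an unobstructed `"advance"` passes through unchanged.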

  15. Importance of Observing • Humans don’t just want max score • Human goal is to judge correctly • Requires observation w/o fighting • Observe module • Bot hasn’t judged opponent • Avoids crowds • Judging module • Lengthy observation leads to judging

  16. Observation Behavior [Diagram labels: Still, Approach, Use Battle Controller, Retreat]

  17. Human Subject Evaluation • BotPrize tests humanness without saying what is human-like vs. bot-like • Idea: BotPrize style experiment in which players are extensively interviewed • IRB Human Subject Study w/cash prizes • Performed at UT: • 6 human volunteers • 3 human interviewers • 4 versions of UT^2 • Native bots

  18. Justify Judgments • Record each match and replay to human • Human explains rationale for judgments • Downsides: • Humans forget • Humans make things up • Humans change their minds • Still, many common themes emerged:

  19. Humans Aren’t Killing Machines • Accuracy affected by movement/distraction • Pause before responding to surprises • Humans don’t fire non-stop • Waiting for opportune shot • Saving ammo • Few weapon switches • Pause to observe

  20. Humans Aren’t Stupid • Humans rapidly correct mistakes • Get unstuck quickly • Move/dodge when fired upon • Don’t stare at walls • Humans know their limitations • Prefer weapons requiring less accuracy • Don’t fight with a weak weapon

  21. Complex Human Movements • Do • Chase opponents tenaciously • Retreat while firing on opponent • Move in and out from cover • Don’t • Perform many rapid movements too quickly • Turn around too quickly

  22. Cognitive Issues • Theory of Mind • Behavior transitions • A chasing human expects to fight • Humans expect to be chased (traps) • Communication via judging • Human knows that its action will be recognized as human-like by humans • Emotion • Revenge on humans more satisfying • Fear of dangerous opponents

  23. Conclusion • Human trace replay provides human style exploration and gets bot unstuck • Multiobjective neuroevolution provides combat behavior • Simulated observation makes bot seem more human-like • Future work: Incorporate Theory of Mind

  24. Questions? Jacob Schrum (schrum2@cs.utexas.edu) Igor V. Karpov (ikarpov@cs.utexas.edu) Risto Miikkulainen (risto@cs.utexas.edu)

  25. Auxiliary Slides

  26. Multiobjective Optimization • Game with two objectives: • Damage Dealt • Remaining Health • A dominates B iff A is strictly better in at least one objective and at least as good in the others • Population of points not dominated are best: Pareto Front • Weighted-sum provably incapable of capturing non-convex front [Figure: Pareto front showing the tradeoff between objectives, from "High health but did not deal much damage" to "Dealt a lot of damage, but lost lots of health"]
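
The dominance relation and the Pareto front can be written out directly. This is a small sketch that assumes every objective is maximized and that each point is a tuple of scores; the sample scores are invented for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (damage dealt, remaining health) scores.
scores = [(90, 10), (10, 90), (50, 50), (40, 40), (10, 10)]
front = pareto_front(scores)
```

Here `front` holds `(90, 10)`, `(10, 90)`, and `(50, 50)`: each represents a different tradeoff, and none beats another in both objectives.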

  27. NSGA-II • Evolution: natural approach for finding optimal population • Non-Dominated Sorting Genetic Algorithm II* • Population P with size N; Evaluate P • Use mutation to get P′ size N; Evaluate P′ • Calculate non-dominated fronts of {P ∪ P′} size 2N • New population size N from highest fronts of {P ∪ P′} *K. Deb et al. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. Evol. Comp. 2002
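
The generation step listed on this slide can be sketched as below, assuming all objectives are maximized. This is a simplification: the full NSGA-II algorithm also uses crowding distance to choose among members of the last front that fits, which is omitted here, and the `evaluate` and `mutate` callables are placeholders supplied by the caller.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_fronts(scores):
    """Return lists of indices into scores, best front first."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts

def next_generation(parents, evaluate, mutate, n):
    """Mutate P into P', rank the union of size 2N, and keep the best N."""
    children = [mutate(p) for p in parents]
    combined = parents + children
    scores = [evaluate(ind) for ind in combined]
    survivors = []
    for front in non_dominated_fronts(scores):
        for i in front:
            if len(survivors) < n:
                survivors.append(combined[i])
    return survivors
```

Because parents compete against their own children, a good individual is never lost between generations, which is the elitism referred to in the title of the cited paper.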
