
Playing Machines: Machine Learning Applications in Computer Games



  1. Ralf Herbrich, Thore Graepel, Applied Games Group, Microsoft Research Cambridge. Playing Machines: Machine Learning Applications in Computer Games

  2. Overview

  3. Overview

  4. Why Machine Learning and Games?

  5. Games can be very hard!
  • Partially observable stochastic games
    • States only partially observed
    • Multiple agents choose actions
    • Stochastic pay-offs and state transitions depend on the state and all the other agents’ actions
    • Goal: optimise long-term pay-off (reward)
  • Just like life: complex, adversarial, uncertain, and we are in it for the long run!
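
To make the setting concrete, here is a toy partially observable stochastic game in Python; the state space, noise model, and pay-offs are invented for illustration and are not from the talk.

```python
import random
from typing import Dict, Tuple

# Toy POSG: a hidden integer state the agents never see directly. Each agent
# receives a noisy observation, both act simultaneously, and pay-offs and the
# next state depend stochastically on the state and the joint action.
class TinyPOSG:
    def __init__(self) -> None:
        self.state = 0

    def observe(self) -> Dict[str, int]:
        # Partial observability: every agent sees the state corrupted by noise.
        return {agent: self.state + random.choice([-1, 0, 1]) for agent in ("A", "B")}

    def step(self, actions: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, float]]:
        # Stochastic transition driven by the state and both agents' actions.
        self.state = (self.state + actions["A"] - actions["B"] + random.choice([0, 1])) % 5
        # Zero-sum pay-offs: A prefers high states, B prefers low ones.
        return self.observe(), {"A": float(self.state), "B": -float(self.state)}

game = TinyPOSG()
observations = game.observe()
for _ in range(10):   # the long run: each agent should optimise cumulative reward
    observations, rewards = game.step({"A": 1, "B": random.choice([0, 1])})
```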

  6. Approximations

  7. Overview

  8. [Slide figure: non-player characters, from Space Invaders (1977) to the agents of 2001, and the human player.]

  9. Game Industry Surpasses Hollywood

  10. Games Industry Drives IT Progress

  11. Creatures (1996, Steve Grand)
  • Objective is to nurture creatures called Norns
  • Model incorporates artificial life features
  • Norns have neural network brains
  • Their development can be influenced by player feedback

  12. Black & White (2001, Richard Evans) • Peter Molineux’s famous “God Game” • Player determines fate of villagers as their “God” (seen as a hand) • Creature can be taught complex behaviour • Good and Evil - actions have consequence

  13. Colin McRae Rally 2.0 (2001, Jeff Hannan)
  • First car racing game to use neural networks
  • Variety of tracks, drivers and road conditions
  • Racing line provided by the author; a neural network keeps the car on the racing line
  • Multilayer perceptrons trained with RPROP
  • Simple rules for recovery and overtaking
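
For reference, the core of RPROP (resilient backpropagation) fits in a few lines. This is a generic sketch of the Rprop- variant, not Hannan's implementation; the hyperparameter values are the usual textbook defaults.

```python
import numpy as np

# RPROP adapts a per-weight step size from the *sign* of the gradient only,
# which makes MLP training insensitive to gradient magnitudes.
def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    sign_change = grad * prev_grad
    # Grow the step where the gradient sign is stable, shrink it where it flips.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)  # Rprop-: skip the update right after a flip
    w = w - np.sign(grad) * step                 # move by the adapted step, not the gradient
    return w, grad, step
```

It is called once per epoch with the full-batch gradient, carrying `prev_grad` and `step` (initialised to about 0.1) between calls.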

  14. Other Games Using Machine Learning (source: http://www.gameai.com/games.html)

  15. Drivatar™ (2004, Michael Tipping et al.)

  16. TrueSkill™ (2005, Graepel & Herbrich)

  17. TrueSkill™: Applications

  18. Overview

  19. Reinforcement Learning [Slide diagram: the agent receives the game state and a reward or punishment from the game; the learning algorithm computes a parameter update; the agent sends an action back to the game.]

  20. Q and SARSA Learning
  • Q-learning (off-policy): Q(s,a) ← Q(s,a) + α [ r + γ max_a′ Q(s′,a′) − Q(s,a) ]
  • SARSA (on-policy): Q(s,a) ← Q(s,a) + α [ r + γ Q(s′,a′) − Q(s,a) ]
  • Q(s,a) is the expected reward for action a in state s
  • α is the rate of learning
  • a is the action chosen (for SARSA, a′ is the next action chosen)
  • r is the reward resulting from a
  • s is the current state, s′ the state after executing a
  • γ is the discount factor for future rewards

  21. Tabular Q-Learning [Slide figure: a table of Q-values with game states as rows (e.g. separation 3 ft, 5 ft) and actions as columns, holding entries such as 13.2, 10.2, −1.3, 3.2, 6.0, 4.0; a +10.0 reward is backed up into the table.]
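
A minimal tabular Q-learning loop looks like this; the state and action names are placeholders suggested by the fighting-game setup on the next slide, not the actual game interface.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # learning rate, discount, exploration
ACTIONS = ["kick", "punch", "block", "run"]    # placeholder action set
Q = defaultdict(float)                         # Q[(state, action)] -> estimated value

def choose_action(state):
    if random.random() < EPSILON:              # explore occasionally...
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])   # ...otherwise act greedily

def q_update(s, a, r, s_next):
    # Off-policy Q-learning target: value of the best action in the next state.
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

# One backup, as in the slide's table: a rewarding punch at close range.
q_update("3ft_ground", "punch", 10.0, "5ft_air")
```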

  22. Results (visual): reinforcement learner vs. in-game AI code
  • Game state features
    • Separation (5 binned ranges)
    • Last action (6 categories)
    • Mode (ground, air, knocked)
    • Proximity to obstacle
  • Available actions
    • 19 aggressive (kick, punch)
    • 10 defensive (block, lunge)
    • 8 neutral (run)
  • Q-function representation
    • One-layer neural net (tanh)
    • Linear
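
As a sketch of the Q-function representation above (the layer width and feature encoding are invented): each discrete feature is one-hot encoded, and a single tanh layer with a linear readout produces one value per action.

```python
import numpy as np

N_FEATURES = 5 + 6 + 3 + 2      # separation bins + last actions + modes + obstacle flag
N_ACTIONS = 19 + 10 + 8         # aggressive + defensive + neutral

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (16, N_FEATURES))   # hidden tanh layer (width invented)
W2 = rng.normal(0.0, 0.1, (N_ACTIONS, 16))    # linear readout: one Q-value per action

def q_values(features: np.ndarray) -> np.ndarray:
    """Q(s, .) for a one-hot encoded state; dropping tanh gives the linear variant."""
    return W2 @ np.tanh(W1 @ features)
```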

  23. Learning Aggressive Fighting
  Reward for a decrease in Wulong Goth’s health.
  [Videos: early in the learning process, and after 15 minutes of learning.]

  24. Learning “Aikido”-Style Fighting
  Punishment for a decrease in either player’s health.
  [Videos: early in the learning process, and after 15 minutes of learning.]

  25. Reinforcement Learning for Car Racing: AMPS (Kochenderfer, 2005)
  1. Collect experience
  2. Learn transition probabilities and rewards
  3. Revise value function and policy
  4. Revise state-action abstraction
  5. Return to 1 and collect more experience
  [Slide figure: abstract states over features such as distance to the left edge and speed.]
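
The loop can be sketched on a toy problem. Everything below (the 1-D track, the random policy, and the split-the-most-valuable-state heuristic) is invented to show the shape of the algorithm, not Kochenderfer's actual model or split/merge criteria.

```python
import random
from collections import defaultdict

def run_amps(rounds=4, batch=500):
    edges = [0.5]                                  # coarse abstraction of position in [0, 1)
    for _ in range(rounds):
        # 1. Collect experience under a random policy; reward = track progress.
        experience = []
        for _ in range(batch):
            x = random.random()
            a = random.choice((0, 1))              # 0 = coast, 1 = accelerate
            x2 = min(x + (0.08 if a else 0.02) * random.random(), 0.999)
            s = sum(x >= e for e in edges)         # abstract state index
            experience.append((s, a, x2 - x))
        # 2./3. Learn mean rewards per abstract state-action; value = greedy reward.
        totals, counts = defaultdict(float), defaultdict(int)
        for s, a, r in experience:
            totals[(s, a)] += r
            counts[(s, a)] += 1
        value = {}
        for s in range(len(edges) + 1):
            q = [totals[(s, a)] / counts[(s, a)] for a in (0, 1) if counts[(s, a)]]
            value[s] = max(q) if q else 0.0
        # 4. Revise the abstraction by splitting one state, then loop (step 5).
        s_star = max(value, key=value.get)
        lo = edges[s_star - 1] if s_star > 0 else 0.0
        hi = edges[s_star] if s_star < len(edges) else 1.0
        edges = sorted(edges + [(lo + hi) / 2])
    return edges

print(run_amps())   # the refined bin edges of the learned abstraction
```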

  26. Balancing Abstraction Complexity [Slide figure: a spectrum of representational complexity, from too coarse through just right to too fine.]

  27. Adapting the Representation [Slide figure: abstract states are adapted by split and merge operations.]

  28. Project Gotham Racing 3
  • Real-time racing simulation
  • Goal: lap times as fast as possible

  29. Input Features and Reward
  • Laser range finder measurements as features
  • Progress along the track as reward

  30. Actions
  • Coast
  • Accelerate
  • Brake
  • Hard-Left
  • Hard-Right
  • Soft-Left
  • Soft-Right
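
Putting the last two slides together, here is a self-contained sketch of the agent's interface; the circular track geometry, radii, and ray fan are invented for illustration and are not the PGR3 code.

```python
import math
from enum import Enum

class Action(Enum):
    COAST = 0
    ACCELERATE = 1
    BRAKE = 2
    HARD_LEFT = 3
    HARD_RIGHT = 4
    SOFT_LEFT = 5
    SOFT_RIGHT = 6

def ray_circle(px, py, dx, dy, radius):
    """Smallest positive distance along the unit ray (dx, dy) to a centred circle."""
    b = px * dx + py * dy
    disc = b * b - (px * px + py * py - radius * radius)
    if disc < 0:
        return math.inf
    hits = [t for t in (-b - math.sqrt(disc), -b + math.sqrt(disc)) if t > 0]
    return min(hits, default=math.inf)

def range_finder_features(x, y, heading, n_rays=5):
    """Distances to the track walls (circles of radius 40 and 50) along a ray fan."""
    features = []
    for i in range(n_rays):
        angle = heading + math.pi * (i / (n_rays - 1) - 0.5)   # -90 to +90 degrees
        dx, dy = math.cos(angle), math.sin(angle)
        features.append(min(ray_circle(x, y, dx, dy, 40.0),
                            ray_circle(x, y, dx, dy, 50.0)))
    return features

# A car at (45, 0) heading tangentially along the ring-shaped track:
print(range_finder_features(45.0, 0.0, math.pi / 2), Action.ACCELERATE)
```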

  31. Learning to Walk: Why?
  • Current games have unrealistic physical movement
    • Moonwalk
    • Hovering
  • Only death scenes are realistic
    • Rag-doll physics
    • Releases joint constraints

  32. Reinforcement Learning to Walk (Russell Smith, 1998)
  • Compromise between a hard-wired and a learned controller
  • Motion sequencer with corrections
  • FOX controller: based on a cerebellar model articulation controller (CMAC) neural network trained by reinforcement learning
  • Can follow paths and climb up and down slopes
  • Trained monopeds (“hopper”) and bipeds
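
A CMAC is essentially tile coding: several overlapping coarse grids whose active cells are summed. This is a generic sketch (the tiling counts, grid sizes, and LMS update are standard textbook choices, not Smith's settings).

```python
import numpy as np

class CMAC:
    """Tile-coding function approximator over inputs scaled to [0, 1]^n_dims."""

    def __init__(self, n_tilings=8, tiles_per_dim=10, n_dims=2, lr=0.1):
        self.n_tilings, self.tiles, self.lr = n_tilings, tiles_per_dim, lr
        self.w = np.zeros((n_tilings, tiles_per_dim ** n_dims))

    def _active_tiles(self, x):
        # Each tiling is the same coarse grid shifted by a different offset;
        # an input activates exactly one tile per tiling.
        active = []
        for t in range(self.n_tilings):
            offset = t / self.n_tilings
            coords = [int(xi * (self.tiles - 1) + offset) % self.tiles for xi in x]
            flat = 0
            for c in coords:
                flat = flat * self.tiles + c
            active.append(flat)
        return active

    def predict(self, x):
        return sum(self.w[t, i] for t, i in enumerate(self._active_tiles(x)))

    def update(self, x, target):
        # LMS rule: share the error equally across the active tiles.
        error = target - self.predict(x)
        for t, i in enumerate(self._active_tiles(x)):
            self.w[t, i] += self.lr * error / self.n_tilings

cmac = CMAC()
for _ in range(200):                  # learn a smooth 2-D target function
    x = np.random.rand(2)
    cmac.update(x, np.sin(3 * x[0]) + x[1])
```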

  33. Learning to Walk (Russell Smith, 1998) [Videos: hopper during training, and after training.]

  34. Learning to Walk (Russell Smith, 1998) [Videos: biped during training, and after training.]

  35. Overview

  36. Overview

  37. Motion Capture Data
  • Fix markers at key body positions
  • Record their positions in 3D during motion
  • Fundamental technology in animation today
  • Free download of mo-cap files: www.bvhfiles.com

  38. Gaussian Process Latent Variable Models (Lawrence, 2004)
  • Generative model for dimensionality reduction
  • Probabilistic counterpart to PCA that defines a probability distribution over the data
  • Non-linear manifolds based on kernels
  • Visualisation of high-dimensional data
  • Back-projection from latent to data space
  • Can deal with missing data

  39. Generative Model (SPCA vs. GPLVM)
  [Slide diagram: latent variables x are mapped through a weight matrix W to the data y.]
  • SPCA: marginalise over x and optimise W
  • GPLVM: marginalise over W and optimise x
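
The GPLVM objective can be sketched in a few lines: marginalising W under a Gaussian prior leaves a GP marginal likelihood of the data in which the latent positions X are the free parameters. The toy below (RBF kernel, fixed hyperparameters, numerical gradients, synthetic data) is for illustration only, not Lawrence's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def gplvm_nll(x_flat, Y, q=2, lengthscale=1.0, noise=0.1):
    """GP negative log marginal likelihood of Y, parameterised by the latents X."""
    n, d = Y.shape
    X = x_flat.reshape(n, q)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq / lengthscale ** 2) + noise * np.eye(n)   # RBF kernel
    _, logdet = np.linalg.slogdet(K)
    # Independent GP over each of the d output dimensions (constants dropped).
    return 0.5 * (d * logdet + np.trace(Y.T @ np.linalg.solve(K, Y)))

def fit_gplvm(Y, q=2, seed=0):
    n = Y.shape[0]
    x0 = np.random.default_rng(seed).normal(0.0, 0.1, n * q)
    return minimize(gplvm_nll, x0, args=(Y, q)).x.reshape(n, q)  # numeric grads: toy only

# Example: embed 30 noisy points from a 1-D curve living in 5-D.
t = np.linspace(0.0, 1.0, 30)
Y = np.stack([np.sin(3 * t), np.cos(3 * t), t, t ** 2, t ** 3], axis=1)
X_latent = fit_gplvm(Y - Y.mean(0), q=2)   # 2-D latent embedding of the 5-D data
```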

  40. GPLVM on Motion Capture Data

  41. Overview

  42. Bayes Nets for Bots (R. Le Hy et al., 2004)
  • Goal: learn from skilled players how to act in a first-person shooter (FPS) game
  • Test environment:
    • Unreal Tournament FPS game engine
    • Gamebots control framework
  • Idea: a naive Bayes classifier learns under which circumstances to switch behaviour

  43. Naive Bayes for State Classification
  • St: bot’s state at time t
  • St+1: bot’s state at time t+1
  • H: health level
  • W: weapon
  • OW: opponent’s weapon
  • HN: hear noise
  • NE: number of close enemies
  • PW: weapon close by?
  • PH: health pack close by?
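
The classifier can be sketched as follows: given the current state and the observed variables, score each candidate next state by a smoothed transition prior times naive-Bayes likelihoods of the observations. The state names, smoothing constants, and training call are invented for illustration.

```python
from collections import defaultdict

STATES = ["attack", "flee", "search_weapon", "search_health"]   # invented labels
trans = defaultdict(int)   # (s_t, s_next)        -> transitions seen
obs = defaultdict(int)     # (s_next, var, value) -> observations seen
seen = defaultdict(int)    # s_next               -> arrivals seen

def learn(s_t, s_next, observations):
    """Count one recorded human decision (state switch plus observed variables)."""
    trans[(s_t, s_next)] += 1
    seen[s_next] += 1
    for var, value in observations.items():
        obs[(s_next, var, value)] += 1

def next_state(s_t, observations, eps=1.0):
    out = sum(trans[(s_t, s)] for s in STATES)
    def score(s_next):
        # Smoothed transition prior P(St+1 | St)...
        p = (trans[(s_t, s_next)] + eps) / (out + eps * len(STATES))
        # ...times naive-Bayes observation likelihoods P(X | St+1).
        for var, value in observations.items():
            p *= (obs[(s_next, var, value)] + eps) / (seen[s_next] + 2 * eps)
        return p
    return max(STATES, key=score)

# One training example using the slide's variables, then a query:
learn("search_weapon", "attack", {"H": "high", "W": "rocket", "NE": 1})
print(next_state("search_weapon", {"H": "high", "W": "rocket", "NE": 1}))
```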

  44. Supervised Learning from Humans

  45. Illustration of Learned Bots

  46. Drivatar™

  47. Drivatars Unplugged
  [Slide diagram: a “built-in” AI behaviour development tool feeds the Drivatar learning system; recorded player driving is distilled into a Drivatar racing line behaviour model, which combines with vehicle interaction and racing strategy and the car behaviour controller to produce the Drivatar AI driving.]

  48. The Racing Line Model

  49. Drivatars: Main Idea
  • Two-phase process:
    • Pre-generate possible racing lines prior to the race from a (compressed) racing table.
    • Switch between the lines during the race to add variability.
  • Compression reduces the memory needed per racing line segment.
  • Switching makes the racing lines smoother.
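
A sketch of the two phases (the table layout, offsets, and blending rule are invented; the real system learns its tables from recorded player driving):

```python
import random

RACING_TABLE = {                 # segment -> candidate lateral offsets (compressed table)
    "a1": [0.2, 0.4, 0.5],
    "a2": [0.1, 0.3],
    "a3": [0.5, 0.6, 0.8],
    "a4": [0.3, 0.4],
}

def pregenerate_lines(table):
    """Phase 1: expand the compressed table into per-segment candidate lines."""
    return {segment: list(offsets) for segment, offsets in table.items()}

def race_line(lines, segments, blend=0.5):
    """Phase 2: pick a candidate per segment, smoothing across each switch."""
    chosen, prev = [], None
    for segment in segments:
        offset = random.choice(lines[segment])   # switching adds variability
        if prev is not None:                     # blending keeps the line smooth
            offset = blend * prev + (1 - blend) * offset
        chosen.append((segment, round(offset, 3)))
        prev = offset
    return chosen

lines = pregenerate_lines(RACING_TABLE)
print(race_line(lines, ["a1", "a2", "a3", "a4"]))
```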

  50. Racing Tables [Slide figure: the track is divided into segments a1, a2, a3, a4, each with its own entries in the racing table.]
