
Planning and Learning in Games


Presentation Transcript


  1. Planning and Learning in Games Michael van Lent Institute for Creative Technologies University of Southern California

  2. Business of Games • 60% of Americans play video games • $25 billion industry worldwide (2004) • $11 billion in the US (2004) • $6.1 billion in 1999, $5.5 billion in 1998, $4.4 billion in 1997 • One-day sales records • Halo 2: $125 million in a single day • Harry Potter (Half-Blood Prince): $140 million in a single day • Consoles dominate the industry • 90% of sales (Microsoft, Sony, Nintendo) • Average age of game players is 29 • Average age of game buyers is 36 • 59% of game players are men

  3. Game AI: A little context • History of game AI in 5 bullet points • Lots of work on path planning • Hand-coded AI • Finite state machines • Scripted AI • Embed hints in the environment • Things are starting to change • Game environments are getting more complex • Players are getting more sophisticated • Development costs are skyrocketing • Incremental improvements are required to get a publisher • Game developers are adopting new techniques • Game AI is becoming more procedural and more adaptive

  4. Scripted AI: Example 1 (Age of Kings, Microsoft)

    ; The AI will attack once at 1100 seconds and then again
    ; every 1400 sec, provided it has enough defense soldiers.
    (defrule
      (game-time > 1100)
      =>
      (attack-now)
      (enable-timer 7 1100))
    (defrule
      (timer-triggered 7)
      (defend-soldier-count >= 12)
      =>
      (attack-now)
      (disable-timer 7)
      (enable-timer 7 1400))

  5. Scripted AI: Example 2 (Age of Kings, Microsoft)

    ; After 3600 seconds, grant the computer player 700 food, 700 wood,
    ; and 700 gold, then repeat the grant every 2700 seconds.
    (defrule
      (true)
      =>
      (enable-timer 4 3600)
      (disable-self))
    (defrule
      (timer-triggered 4)
      =>
      (cc-add-resource food 700)
      (cc-add-resource wood 700)
      (cc-add-resource gold 700)
      (disable-timer 4)
      (enable-timer 4 2700))

  6. Procedural AI: The Sims (Maxis)

  7. Two Adaptive AI Technologies • Criteria • First-hand experience • Support procedural and adaptive AI • Early stages of adoption by commercial developers

  8. Two Adaptive AI Technologies • Criteria • Deliberative Planning • F.E.A.R. (Monolith/Vivendi Universal for PC) • Condemned (Monolith/Sega for Xbox 2)

  9. Two Adaptive AI Technologies • Criteria • Deliberative Planning • Machine Learning • Long considered “scary voodoo” • Decision tree induction & neural nets in Black & White • Drivatar in Forza Motorsport

  10. Why Planning and Learning? • Improving current games • More variable & replayable • More immersive & engaging • More customized experience • More robust • More challenging • Improved profits • More sales • Marketing • Cheaper development • New elements of game play and whole new genres • Necessary as games advance

  11. Why not Planning and Learning? • Costlier development • Is the expense worth the result? • Greater processor/memory load • AI typically gets 10-20% of the CPU • That time comes in frequent small slices • Harder to control the player's experience • Harder to do quality assurance • Double the cost of testing • Adds technical risk • Programmers need to spin up on new technologies • Designers need to understand what's possible • Designers create the AI; programmers implement it • Marketing backlash • Once the game is stable, it's too late to add a major feature

  12. Why Planning and Learning? • Improving current games • More variable & replayable • More immersive & engaging • More customized experience • More robust • More challenging • Improved profits • More sales • Marketing • Cheaper development • New elements of game play and whole new genres • Necessary as games advance

  13. Blah Blah blah Blah? • Blah blah blah • Blah blah & blah • Blah blah & blah • Blah blah blah • Blah blah • Blah blah • Improved profits • Blah blah • Blah • Blah blah • Blah blah blah blah blah blah blah blah blah • Blah blah blah blah

  14. Deliberative Planning • What is deliberative planning? • If you know the current state of the world • and the goal state(s) of the world • and the actions available • When each can be done • How each changes the world • then search for a sequence of actions that changes the current state into a goal state. • Deliberative planning is just a search problem • When to plan? • Off-line: Before/after each game session • Real-time: During the game session • During development: Not part of shipped product
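
  To make the search framing concrete, here is a minimal sketch of planning as state-space search, assuming a STRIPS-like encoding in which states are sets of facts and each operator lists its preconditions, added facts, and deleted facts. The operator and fact names are illustrative, not taken from any particular game.

    # Minimal sketch: deliberative planning as breadth-first state-space search.
    from collections import deque, namedtuple

    Operator = namedtuple("Operator", ["name", "pre", "add", "delete"])

    def plan(initial, goal, operators):
        """Search for an operator sequence that turns the initial state into a goal state."""
        start = frozenset(initial)
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            state, steps = frontier.popleft()
            if goal <= state:                      # every goal fact holds
                return steps
            for op in operators:
                if op.pre <= state:                # preconditions satisfied
                    nxt = frozenset((state - op.delete) | op.add)
                    if nxt not in visited:
                        visited.add(nxt)
                        frontier.append((nxt, steps + [op.name]))
        return None                                # no plan reaches the goal

    # Illustrative toy problem (names are hypothetical):
    ops = [
        Operator("move(u1, building1)", {"mobile(u1)"}, {"at(u1, building1)"}, set()),
        Operator("secure(building1)", {"at(u1, building1)"}, {"secure(building1)"}, set()),
    ]
    print(plan({"mobile(u1)"}, {"secure(building1)"}, ops))
    # prints ['move(u1, building1)', 'secure(building1)']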

  15. Deliberative Planning • Domain independent planning engine • Abstract problem description • Goal world state (Mission objective) • secure(building1) • clear(building1) & clear(building2) & clear(building3) • captured(OpforLeader) or killed(OpforLeader)

  16. Deliberative Planning • Domain independent planning engine • Abstract problem description • Goal world state (Mission objective) • Operators [Diagram: Team-Move (opfor, L?), with precondition (mobile opfor) and effect (opfor at L?), decomposes into Checkpoint (u1), Checkpoint (u2), and Checkpoint (u3), each with precondition (mobile u?) and effect (u? at L?)]

  17. Deliberative Planning • Domain independent planning engine • Abstract problem description • Goal world state (Mission objective) • Operators [Diagram: Secure-Base-Against-SW-Attack, with precondition (at-base u?, u?, u?) and effect (base-secure), decomposes into Defend-Building (u?, b14), requiring (u? at b14), and Secure-Perimeter-Against-SW-Attack, which achieves (perimeter-secure) and decomposes into Patrol (u?, s-path), requiring (u? at s-path), and Ambush (u?, sw-region), requiring (u? at sw-region)]
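
  One way such abstract operators might be represented in code is sketched below: a hierarchical (HTN-style) structure in which an operator carries preconditions, effects, and an optional list of sub-operators. The field names, the effect facts for Patrol and Ambush, and the expand() helper are illustrative assumptions, not the talk's actual planner.

    # Sketch of a hierarchical operator encoding for the slide's example.
    from dataclasses import dataclass, field

    @dataclass
    class HTNOperator:
        name: str
        preconditions: set
        effects: set
        subtasks: list = field(default_factory=list)   # empty list => primitive operator

    # The effect facts for Patrol and Ambush below are illustrative, not from the talk.
    ambush = HTNOperator("Ambush(u3, sw-region)", {"u3 at sw-region"}, {"sw-approach-covered"})
    patrol = HTNOperator("Patrol(u2, s-path)", {"u2 at s-path"}, {"s-path-watched"})
    secure_perimeter = HTNOperator(
        "Secure-Perimeter-Against-SW-Attack",
        {"u2 at s-path", "u3 at sw-region"},
        {"perimeter-secure"},
        subtasks=[patrol, ambush],
    )

    def expand(task):
        """Recursively flatten an abstract operator into its primitive operators."""
        if not task.subtasks:
            return [task]
        return [prim for sub in task.subtasks for prim in expand(sub)]

    print([op.name for op in expand(secure_perimeter)])
    # prints ['Patrol(u2, s-path)', 'Ambush(u3, sw-region)']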

  18. Deliberative Planning • Domain independent planning engine • Abstract problem description • Goal world state • Operators • Initial world state • Deliberative Planning: Find a sequence of operators that change the initial world state into a goal world state.

  19. Strategic Planning Example [Plan diagram: from the initial state (mobile opfor), Team-Move (opfor) brings the team to the base via Checkpoint (u1), Checkpoint (u2), and Checkpoint (u3), achieving (opfor at base), (u1 at b14), (u2 at s-path), and (u3 at sw-region). These support Defend-Building (u1, b14), Patrol (u2, s-path), and Ambush (u3, sw-region); Patrol and Ambush accomplish Secure-Perimeter-Against-SW-Attack, which together with Defend-Building accomplishes Secure-Base-Against-SW-Attack and the goal (base-secure)]

  20. Plan Execution • Execute atomic actions from plan • Move from abstract planning world to "real world" • Real-time interaction with environment • 10+ sense/think/act cycles per second [Diagram: the Ambush (u3, sw-region) step expands into atomic actions Select-ambush-loc, Move-to-ambush-loc, Wait-to-ambush, Ambush-attack, Report-success; on failure: Defend, Abandon-ambush, Report-failure]
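
  A minimal sketch of the execution side is shown below, assuming a simple fixed-rate sense/think/act loop. The 10 Hz tick and the Ambush action names follow the slide; the StubWorld class and its methods are illustrative placeholders for a real game environment.

    # Sketch: run one plan step's atomic actions inside a sense/think/act loop.
    import time

    AMBUSH_ACTIONS = ["Select-ambush-loc", "Move-to-ambush-loc",
                      "Wait-to-ambush", "Ambush-attack", "Report-success"]

    class StubWorld:
        """Stand-in environment so the sketch runs; a real game supplies these hooks."""
        def sense(self):
            pass                                   # refresh cached world state
        def ambush_compromised(self):
            return False                           # the stub never fails
        def act(self, action):
            print("tick:", action)                 # issue the atomic action

    def execute(actions, world, hz=10):
        step = 0
        while step < len(actions):
            start = time.time()
            world.sense()                          # sense
            if world.ambush_compromised():         # think: detect failure, switch behavior
                return ["Defend", "Abandon-ambush", "Report-failure"]
            world.act(actions[step])               # act
            step += 1
            time.sleep(max(0.0, 1.0 / hz - (time.time() - start)))
        return actions

    execute(AMBUSH_ACTIONS, StubWorld())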

  21. Machine Learning: Behavior Capture • Also called: • Behavioral Cloning • Learning by Observation • Learning by Imitation • A form of Knowledge Capture • Learn by watching an expert • Experts are good at performing the task • Experts aren’t always good at teaching/explaining the task • Learn believable, human-like behavior • Mimic the styles of different players • When to learn? • During development • Off-line

  22. Drivatar • “Check out the revolutionary A.I. Drivatar™ technology: Train your own A.I. "Drivatars" to use the same racing techniques you do, so they can race for you in competitions or train new drivers on your team. Drivatar technology is the foundation of the human-like A.I. in Forza Motorsport.” • Collaboration between Microsoft Games and Microsoft Research

  23. Learning to Fly • Learn a flight sim autopilot from observing human pilots • 30 “observations” each from 3 experts • 20 features (elevation, airspeed, twist, fuel, thrust…) • 4 controls (elevators, rollers, thrust, flaps) • Take off, level out, fly towards a mountain, return and land • Key idea: Experts react to the same situation in different ways depending on their current goals • Divide the flight sim task into 7 phases • Learn four decision trees for each phase (one per control) • Second key idea: Don’t combine data from multiple experts • Sammut, C., Hurst, S., Kedzier, D., and Michie, D. Learning to Fly. In Proceedings of the Ninth International Conference on Machine Learning, pp. 385-393, 1992.
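
  A rough sketch of that setup, assuming scikit-learn and synthetic data in place of the real flight logs: one decision tree per (phase, control) pair, trained on a single expert's observations. The feature, control, and phase counts follow the slide; everything else is an assumption.

    # Sketch of the Learning to Fly setup: one decision tree per (phase, control) pair.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    N_PHASES, N_CONTROLS, N_FEATURES = 7, 4, 20
    rng = np.random.default_rng(0)

    # observations[phase] = (state features, discretized control settings);
    # random data stands in for the recorded flights of a single expert.
    observations = {
        phase: (rng.normal(size=(200, N_FEATURES)),
                rng.integers(0, 3, size=(200, N_CONTROLS)))
        for phase in range(N_PHASES)
    }

    # One tree per phase per control (elevators, rollers, thrust, flaps).
    autopilot = {
        (phase, control): DecisionTreeClassifier(max_depth=5).fit(X, y[:, control])
        for phase, (X, y) in observations.items()
        for control in range(N_CONTROLS)
    }

    # At run time, the current phase and state vector select and query the trees.
    state = rng.normal(size=(1, N_FEATURES))
    settings = [autopilot[(2, control)].predict(state)[0] for control in range(N_CONTROLS)]
    print(settings)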

  24. KnoMic (Knowledge Mimic) • Learn air combat in a flight sim and a deathmatch bot in Quake II • Dynamic behavior against opponents • Can’t divide the task into fixed phases • Key idea: Experts dynamically select which operator they’re working on based on the opponent and environment • Also learn when to select operators (pre-conditions) • and what those operators do (effects) • Second key idea: Experts annotate observations with their operator selections • van Lent, M. & Laird, J. E., Learning Procedural Knowledge by Observation. Proceedings of the First International Conference on Knowledge Capture (K-CAP 2001), October 21-23, 2001, Victoria, BC, Canada, ACM, pp. 179-186.
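
  A minimal sketch of the precondition-learning idea: collect the states in which the expert selected each operator and keep only the facts that held every time. The trace format and fact names are illustrative assumptions, not KnoMic's actual representation.

    # Sketch: learn candidate operator pre-conditions from annotated observation traces.
    from functools import reduce

    # Each entry: (facts observed in that cycle, operator the expert annotated).
    # The facts and operator names are hypothetical.
    trace = [
        ({"enemy-visible", "has-ammo", "healthy"}, "attack"),
        ({"enemy-visible", "has-ammo"}, "attack"),
        ({"enemy-visible", "low-health"}, "retreat"),
    ]

    def learn_preconditions(trace):
        by_operator = {}
        for facts, operator in trace:
            by_operator.setdefault(operator, []).append(facts)
        # Candidate pre-conditions: facts common to every state where the operator was chosen.
        return {op: reduce(set.intersection, states) for op, states in by_operator.items()}

    print(learn_preconditions(trace))
    # e.g. {'attack': {'enemy-visible', 'has-ammo'}, 'retreat': {'enemy-visible', 'low-health'}}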

  25. The Future

  26. Where to learn more • AI and Interactive Digital Entertainment Conference • Marina del Rey, June 2006 • Journal of Game Development • Charles River Media • Game Developer Magazine • August special issue on AI • Game Developers Conference • AI Game Programming Wisdom book series • Historical: • 2005 IJCAI workshop on Reasoning, Representation and Learning in Computer Games • AAAI Spring Symposia 1999–2003 • 2004 AAAI Workshop

  27. Interesting observations • A few of my own: • The most challenging opponent isn’t the most fun. • “Never stupid” is better than “sometimes brilliant.” • Never underestimate the player’s ability to see intelligence where there is none. • Game companies aren’t a source of research funds • A few of Will Wright’s: • Maximize the ratio of internal complexity to perceived intelligence. • The player will build an internal model of your system. If you don’t help them build it, they’ll probably build the wrong one. • The flow of information about a system has a huge impact on the player’s perception of its intelligence. • From the player’s point of view there is a fine line between complex behavior and random behavior.
