AI in game (IV) Oct. 11, 2006
So far • Artificial Intelligence: A Modern Approach • Stuart Russell and Peter Norvig • Prentice Hall, 2nd ed. • Chapter 1: AI taxonomy • Chapter 2: agents • Chapter 3: uninformed search • Chapter 4: informed search
From now on • Artificial Intelligence: A Modern Approach • Chapter 4. • Chapter 6: adversarial search • Network part • Learning (maybe from the same textbook) • Game AI techniques
Outline • Ch 4. informed search • Online search • Ch 6. adversarial search • Optimal decisions • α-β pruning • Imperfect, real-time decisions
Offline search vs. online search • Offline search agents • Compute a complete solution before setting foot in the real world • Online search agents • Interleave computation and action • E.g., take an action, observe the environment, then compute the next action • Necessary for an exploration problem • States and actions are unknown in advance • E.g., a robot in a new building, or a labyrinth
Online search problems • The agent is assumed to know only • ACTIONS(s): returns a list of actions allowed in state s • c(s,a,s'): this step cost cannot be used until the agent knows that s' is the outcome • GOAL-TEST(s) • The agent cannot access the successors of a state except by actually trying all the actions in that state • Assumptions • The agent can recognize a state that it has visited before • Actions are deterministic • Optionally, an admissible heuristic function h(s)
Online search problems • If some actions are irreversible, the agent may reach a dead end • If some goal state is reachable from every reachable state, the state space is "safely explorable"
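To make this interface concrete, here is a minimal Python sketch; the class and method names (OnlineSearchProblem, actions, step_cost, goal_test, h) are illustrative assumptions, not code from the textbook or the slides.

```python
# A minimal sketch of the knowledge available to an online search agent.
# All names here are illustrative assumptions.

class OnlineSearchProblem:
    def actions(self, s):
        """ACTIONS(s): the list of actions allowed in state s."""
        raise NotImplementedError

    def step_cost(self, s, a, s_next):
        """c(s, a, s'): usable only after the agent has observed that s' is the outcome."""
        raise NotImplementedError

    def goal_test(self, s):
        """GOAL-TEST(s)."""
        raise NotImplementedError

    def h(self, s):
        """Optional admissible heuristic estimate of the cost to reach a goal."""
        return 0
```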
Online search agents • An online algorithm can expand only a node that it physically occupies • Offline algorithms can expand any node in the fringe • Online search therefore favors expanding nodes in a local order — the same principle as DFS
Online DFS

function ONLINE-DFS-AGENT(s') returns an action
  inputs: s', a percept identifying the current state
  static: result, a table of the next state, indexed by action and state, initially empty
          unexplored, a table that lists, for each visited state, the actions not yet tried
          unbacktracked, a table that lists, for each visited state, the predecessor states to which the agent has not yet backtracked
          s, a, the previous state and action, initially null

  if GOAL-TEST(s') then return stop
  if s' is a new state then unexplored[s'] ← ACTIONS(s')
  if s is not null then do
      result[a, s] ← s'
      add s to the front of unbacktracked[s']
  if unexplored[s'] is empty then
      if unbacktracked[s'] is empty then return stop
      else a ← an action b such that result[b, s'] = POP(unbacktracked[s'])
  else a ← POP(unexplored[s'])
  s ← s'
  return a
Online DFS, example • Assume a maze problem on a 3x3 grid • s' = (1,1) is the initial state • result, unexplored (UX), unbacktracked (UB) are initially empty • s, a are initially null
Online DFS, example • Current percept: s' = (1,1) • GOAL-TEST((1,1))? • s' != G, thus false • (1,1) a new state? • True • ACTIONS((1,1)) → UX[(1,1)] = {RIGHT, UP} • s is null? • True (initially) • UX[(1,1)] empty? • False • POP(UX[(1,1)]) → a • a = UP • s = (1,1) • Return a
Online DFS, example • Current percept: s' = (1,2) • GOAL-TEST((1,2))? • s' != G, thus false • (1,2) a new state? • True • ACTIONS((1,2)) → UX[(1,2)] = {DOWN} • s is null? • False (s = (1,1)) • result[UP,(1,1)] ← (1,2) • UB[(1,2)] = {(1,1)} • UX[(1,2)] empty? • False • a = DOWN, s = (1,2) • Return a
Online DFS, example • Current percept: s' = (1,1) • GOAL-TEST((1,1))? • s' != G, thus false • (1,1) a new state? • False • s is null? • False (s = (1,2)) • result[DOWN,(1,2)] ← (1,1) • UB[(1,1)] = {(1,2)} • UX[(1,1)] empty? • False • a = RIGHT, s = (1,1) • Return a
Online DFS, example • Current percept: s' = (2,1) • GOAL-TEST((2,1))? • s' != G, thus false • (2,1) a new state? • True, UX[(2,1)] = {RIGHT, UP, LEFT} • s is null? • False (s = (1,1)) • result[RIGHT,(1,1)] ← (2,1) • UB[(2,1)] = {(1,1)} • UX[(2,1)] empty? • False • a = LEFT, s = (2,1) • Return a
Online DFS, example • Current percept: s' = (1,1) • GOAL-TEST((1,1))? • s' != G, thus false • (1,1) a new state? • False • s is null? • False (s = (2,1)) • result[LEFT,(2,1)] ← (1,1) • UB[(1,1)] = {(2,1),(1,2)} • UX[(1,1)] empty? • True • UB[(1,1)] empty? • False • a = an action b such that result[b,(1,1)] = (2,1), i.e. b = RIGHT • a = RIGHT, s = (1,1) • Return a • And so on…
Online DFS • In the worst case, every link in the state space is traversed exactly twice • The agent can go on a long walk even when it is close to the solution; an online iterative deepening approach addresses this • Online DFS works only when actions are reversible
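Below is a minimal Python sketch of ONLINE-DFS-AGENT matching the pseudocode and the worked example above; the class name, the STOP sentinel, and the dict-based bookkeeping are assumptions made for this illustration.

```python
# Sketch of ONLINE-DFS-AGENT. The agent is called once per percept (current state)
# and returns the next action, or STOP.

STOP = "stop"

class OnlineDFSAgent:
    def __init__(self, actions_fn, goal_test):
        self.actions_fn = actions_fn   # ACTIONS(s)
        self.goal_test = goal_test     # GOAL-TEST(s)
        self.result = {}               # result[(s, a)] -> observed next state
        self.unexplored = {}           # state -> actions not yet tried (UX)
        self.unbacktracked = {}        # state -> predecessors not yet backtracked to (UB)
        self.s = self.a = None         # previous state and action

    def __call__(self, s_prime):
        if self.goal_test(s_prime):
            return STOP
        if s_prime not in self.unexplored:                  # new state
            self.unexplored[s_prime] = list(self.actions_fn(s_prime))
        if self.s is not None:
            self.result[(self.s, self.a)] = s_prime
            self.unbacktracked.setdefault(s_prime, []).insert(0, self.s)
        if not self.unexplored[s_prime]:
            if not self.unbacktracked.get(s_prime):
                return STOP
            target = self.unbacktracked[s_prime].pop(0)
            # Assumes actions are reversible, so some tried action leads back to target.
            self.a = next(b for b in self.actions_fn(s_prime)
                          if self.result.get((s_prime, b)) == target)
        else:
            self.a = self.unexplored[s_prime].pop()
        self.s = s_prime
        return self.a

# Hypothetical usage with a dict of deterministic transitions:
#   transitions = {((1, 1), "UP"): (1, 2), ((1, 2), "DOWN"): (1, 1), ...}
#   agent = OnlineDFSAgent(
#       actions_fn=lambda s: [a for (st, a) in transitions if st == s],
#       goal_test=lambda s: s == (3, 3))
#   state = (1, 1)
#   while (action := agent(state)) != STOP:
#       state = transitions[(state, action)]
```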
Online local search • Hill climbing is already an online algorithm • Only the current state is stored • Bad performance due to local maxima • Random restarts are impossible: the agent cannot teleport to a new start state • Solution 1: a random walk introduces exploration • Select one of the available actions at random, preferring actions not yet tried • Can take exponentially many steps to reach a goal
Online local search • Solution 2: add memory to the hill climber • Store the current best estimate H(s) of the cost to reach the goal from s • H(s) is initially the heuristic estimate h(s) • Afterward it is updated with experience (see below) • Learning real-time A* (LRTA*)
Learning real-time A* (LRTA*)

function LRTA*-COST(s, a, s', H) returns a cost estimate
  if s' is undefined then return h(s)
  else return c(s, a, s') + H[s']

function LRTA*-AGENT(s') returns an action
  inputs: s', a percept identifying the current state
  static: result, a table of the next state, indexed by action and state, initially empty
          H, a table of cost estimates indexed by state, initially empty
          s, a, the previous state and action, initially null

  if GOAL-TEST(s') then return stop
  if s' is a new state (not in H) then H[s'] ← h(s')
  unless s is null
      result[a, s] ← s'
      H[s] ← min over b in ACTIONS(s) of LRTA*-COST(s, b, result[b, s], H)
  a ← an action b in ACTIONS(s') that minimizes LRTA*-COST(s', b, result[b, s'], H)
  s ← s'
  return a
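A corresponding Python sketch of LRTA*-AGENT follows; actions_fn, goal_test, h, and c stand for the problem's ACTIONS, GOAL-TEST, heuristic, and step-cost functions, and the class name and dictionary bookkeeping are assumptions for this illustration.

```python
# Sketch of LRTA*-AGENT: hill climbing with memory of cost-to-goal estimates H.

STOP = "stop"

class LRTAStarAgent:
    def __init__(self, actions_fn, goal_test, h, c):
        self.actions_fn, self.goal_test = actions_fn, goal_test
        self.h, self.c = h, c
        self.result = {}          # result[(s, a)] -> observed next state
        self.H = {}               # learned cost-to-goal estimates
        self.s = self.a = None    # previous state and action

    def _cost(self, s, a, s_prime):
        """LRTA*-COST: optimistic h(s) if the outcome of a is still unknown."""
        if s_prime is None:
            return self.h(s)
        return self.c(s, a, s_prime) + self.H[s_prime]

    def __call__(self, s_prime):
        if self.goal_test(s_prime):
            return STOP
        if s_prime not in self.H:                      # new state
            self.H[s_prime] = self.h(s_prime)
        if self.s is not None:
            self.result[(self.s, self.a)] = s_prime
            # Update H[s] from experience: best one-step lookahead from s.
            self.H[self.s] = min(
                self._cost(self.s, b, self.result.get((self.s, b)))
                for b in self.actions_fn(self.s))
        # Choose the apparently best action from the current state.
        self.a = min(self.actions_fn(s_prime),
                     key=lambda b: self._cost(s_prime, b,
                                              self.result.get((s_prime, b))))
        self.s = s_prime
        return self.a
```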
Outline • Ch 4. informed search • Ch 6. adversarial search • Optimal decisions • α-β pruning • Imperfect, real-time decisions
Games vs. search problems • The problem-solving agent is no longer alone • Multiagent, with conflict • Default setting: deterministic, turn-taking, two-player, zero-sum games of perfect information • Perfect information vs. imperfect information, or games of chance • "Unpredictable" opponent → a solution is a strategy specifying a move for every possible opponent reply • Time limits → unlikely to find the goal, must approximate * Environments with very many agents are best viewed as economies rather than games
Game formalization • Initial state • A successor function • Returns a list of (move, state) pairs • Terminal test • Terminal states • Utility function (or objective function) • A numeric value for the terminal states • Game tree • The state space defined by the initial state and the successor function
Minimax • Perfect play for deterministic games: optimal strategy • Idea: choose move to position with highest minimax value = best achievable payoff against best play • E.g., 2-ply game: only two half-moves
Problem of minimax search • The number of game states is exponential in the number of moves • Solution: do not examine every node → alpha-beta pruning • Remove branches that do not influence the final decision • Revisit the example…
Alpha-Beta Example • Do a depth-first search until the first leaf; the range of possible values at every node starts at [-∞, +∞] • The first MIN node's range narrows to [-∞, 3] and then to [3, 3], so the root's range becomes [3, +∞] • The second MIN node's range drops to [-∞, 2]; this node is worse for MAX than 3, so its remaining successors are pruned • The third MIN node narrows from [-∞, 14] to [-∞, 5] and finally to [2, 2], so the root's range narrows from [3, 14] to [3, 5] and settles at [3, 3]
Properties of α-β • Pruning does not affect the final result • Good move ordering improves the effectiveness of pruning • With "perfect ordering," time complexity = O(b^(m/2)) → doubles the depth of search
Why is it called α-β? • α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX • If v is worse than α, MAX will avoid it → prune that branch • β is defined similarly for MIN
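Here is a minimal Python sketch of α-β search over the same toy tree representation used in the minimax sketch above (the representation itself is an assumption for illustration).

```python
# Alpha-beta search over a nested-list toy tree.
# alpha = best value found so far for MAX along the path,
# beta  = best value found so far for MIN along the path.

import math

def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):                 # terminal: utility value
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, v)
            if v >= beta:                          # MIN will avoid this node
                break                              # prune remaining children
        return v
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, True, alpha, beta))
            beta = min(beta, v)
            if v <= alpha:                         # MAX will avoid this node
                break                              # prune remaining children
        return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
assert alphabeta(tree) == 3                        # same value as full minimax
```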
Resource limits • In reality, imperfect, real-time decisions are required • Suppose we have 100 secs and explore 10^4 nodes/sec → 10^6 nodes per move • Standard approach: • cutoff test: e.g., depth limit • evaluation function = estimated desirability of position
Evaluation functions • For chess, typically linear weighted sum of features Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s) • e.g., w1 = 9 for queen, w2 = 5 for rook, … wn = 1 for pawn f1(s) = (number of white queens) – (number of black queens), etc.
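A toy Python sketch of such a material-counting evaluation function follows; the board encoding (piece codes like 'wQ'), the helper names, and the bishop/knight weights of 3 are assumptions for illustration.

```python
# A linear weighted evaluation function: Eval(s) = w1*f1(s) + ... + wn*fn(s).
# Features here are material counts only; a real engine would use many more.

PIECE_WEIGHTS = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_features(board):
    """f_i(s) = (# of white pieces of type i) - (# of black pieces of type i).
    `board` is assumed to be an iterable of piece codes such as 'wQ', 'bP'."""
    feats = {}
    for piece in board:
        color, kind = piece[0], piece[1]
        feats[kind] = feats.get(kind, 0) + (1 if color == "w" else -1)
    return feats

def eval_fn(board):
    feats = material_features(board)
    return sum(PIECE_WEIGHTS[k] * feats.get(k, 0) for k in PIECE_WEIGHTS)

# White is up a rook against a knight: 5 - 3 = +2
assert eval_fn(["wQ", "wR", "bQ", "bN"]) == 2
```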
Cutting off search • MinimaxCutoff is identical to MinimaxValue except that • Terminal-Test is replaced by Cutoff-Test • Utility is replaced by Eval • Does it work in practice? • b^m = 10^6 with b = 35 gives m ≈ 4 • 4-ply lookahead is a hopeless chess player! • 4-ply ≈ human novice • 8-ply ≈ typical PC, human master • 12-ply ≈ Deep Blue, Kasparov
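In code, the change from plain minimax to MinimaxCutoff is small; the sketch below assumes a hypothetical game object providing successors, terminal_test, utility, and eval methods (these names are not from the slides).

```python
# MinimaxCutoff sketch: TERMINAL-TEST becomes a depth-based CUTOFF-TEST,
# and UTILITY becomes the EVAL estimate at the cutoff.

def minimax_cutoff(game, state, depth, maximizing=True):
    if game.terminal_test(state):
        return game.utility(state)
    if depth == 0:                       # CUTOFF-TEST: stop searching here
        return game.eval(state)          # EVAL: estimated desirability
    values = (minimax_cutoff(game, s, depth - 1, not maximizing)
              for s in game.successors(state))
    return max(values) if maximizing else min(values)
```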
Games that include chance • Chance nodes • Backgammon: move all of one's pieces off the board • Branches leading from each chance node denote the possible dice rolls • Each branch is labeled with the roll and its probability
Games that include chance • The doubles [1,1], …, [6,6] each occur with probability 1/36; each of the other 15 distinct rolls occurs with probability 1/18 (two of the 36 ordered outcomes) • Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16) • Cannot calculate a definite minimax value, only an expected value
Expected minimax value

EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                                        if n is a terminal state
  max over s in successors(n) of EXPECTED-MINIMAX-VALUE(s)          if n is a MAX node
  min over s in successors(n) of EXPECTED-MINIMAX-VALUE(s)          if n is a MIN node
  sum over s in successors(n) of P(s) · EXPECTED-MINIMAX-VALUE(s)   if n is a chance node

These equations can be backed up recursively all the way to the root of the game tree.
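A minimal Python sketch of this computation; the tagged-tuple tree representation and the example tree are assumptions for illustration.

```python
# Expectiminimax over a toy tree with chance nodes, following the equations above.
# Nodes are ("max", children), ("min", children),
# ("chance", [(probability, child), ...]), or a bare number (terminal utility).

def expectiminimax(node):
    if isinstance(node, (int, float)):                 # terminal: UTILITY(n)
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError("unknown node type: %r" % kind)

# A MAX choice between a sure 3 and a fair coin flip between 0 and 10.
tree = ("max", [3, ("chance", [(0.5, 0), (0.5, 10)])])
assert expectiminimax(tree) == 5.0
```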
Position evaluation with chance nodes • In the left-hand tree, move A1 is best; in the right-hand tree, with the leaf values rescaled, A2 becomes best • The outcome of the evaluation function (and hence the agent's behavior) may change when the values are scaled differently • Behavior is preserved only under a positive linear transformation of EVAL