AI in game (IV) Oct. 11, 2006
So far • Artificial Intelligence: A Modern Approach • Stuart Russell and Peter Norvig • Prentice Hall, 2nd ed. • Chapter 1: AI taxonomy • Chapter 2: agents • Chapter 3: uninformed search • Chapter 4: informed search
From now on • Artificial Intelligence: A Modern Approach • Chapter 4. • Chapter 6: adversarial search • Network part • Learning (maybe from the same textbook) • Game AI techniques
Outline • Ch 4. informed search • Online search • Ch 6. adversarial search • Optimal decisions • α-β pruning • Imperfect, real-time decisions
Offline search vs. online search • Offline search agents • Compute a complete solution before setting foot in the real world • Online search agents • Interleave computation and action • E.g., take an action, observe the environment, then compute the next action • Necessary for an exploration problem • States and actions are unknown in advance • E.g., a robot in a new building, or a labyrinth
Online search problems • The agent is assumed to know only • ACTIONS(s): returns a list of actions allowed in state s • c(s,a,s'): this step cost cannot be used until the agent knows that s' is the outcome • GOAL-TEST(s) • The agent cannot access the successors of a state except by actually trying all the actions in that state • Assumptions • The agent can recognize a state that it has visited before • Actions are deterministic • Optionally, an admissible heuristic function h(s)
Online search problems • If some actions are irreversible, the agent may reach a dead end • If some goal state is reachable from every reachable state, the state space is "safely explorable"
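To make this interface concrete, here is a minimal Python sketch; the class and method names (OnlineSearchProblem, actions, step_cost, goal_test, h) are illustrative assumptions, not code from the textbook or the slides.

```python
# A minimal sketch of the knowledge available to an online search agent.
# All names here are illustrative assumptions.

class OnlineSearchProblem:
    def actions(self, s):
        """ACTIONS(s): the list of actions allowed in state s."""
        raise NotImplementedError

    def step_cost(self, s, a, s_next):
        """c(s, a, s'): usable only after the agent has observed that s' is the outcome."""
        raise NotImplementedError

    def goal_test(self, s):
        """GOAL-TEST(s)."""
        raise NotImplementedError

    def h(self, s):
        """Optional admissible heuristic estimate of the cost to reach a goal."""
        return 0
```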
Online search agents • An online algorithm can expand only a node that it physically occupies • Offline algorithms can expand any node in the fringe • Online search therefore favors expanding nodes in a local order — the same principle as DFS
Online DFS

function ONLINE-DFS-AGENT(s') returns an action
  inputs: s', a percept identifying the current state
  static: result, a table of the next state, indexed by action and state, initially empty
          unexplored, a table that lists, for each visited state, the actions not yet tried
          unbacktracked, a table that lists, for each visited state, the predecessor states to which the agent has not yet backtracked
          s, a, the previous state and action, initially null

  if GOAL-TEST(s') then return stop
  if s' is a new state then unexplored[s'] ← ACTIONS(s')
  if s is not null then do
      result[a, s] ← s'
      add s to the front of unbacktracked[s']
  if unexplored[s'] is empty then
      if unbacktracked[s'] is empty then return stop
      else a ← an action b such that result[b, s'] = POP(unbacktracked[s'])
  else a ← POP(unexplored[s'])
  s ← s'
  return a
Online DFS, example • Assume a maze problem on a 3x3 grid • s' = (1,1) is the initial state • result, unexplored (UX), unbacktracked (UB) are initially empty • s, a are initially null
Online DFS, example • Current percept: s' = (1,1) • GOAL-TEST((1,1))? • s' != G, thus false • (1,1) a new state? • True • ACTIONS((1,1)) → UX[(1,1)] = {RIGHT, UP} • s is null? • True (initially) • UX[(1,1)] empty? • False • POP(UX[(1,1)]) → a • a = UP • s = (1,1) • Return a
Online DFS, example • Current percept: s' = (1,2) • GOAL-TEST((1,2))? • s' != G, thus false • (1,2) a new state? • True • ACTIONS((1,2)) → UX[(1,2)] = {DOWN} • s is null? • False (s = (1,1)) • result[UP,(1,1)] ← (1,2) • UB[(1,2)] = {(1,1)} • UX[(1,2)] empty? • False • a = DOWN, s = (1,2) • Return a
Online DFS, example • Current percept: s' = (1,1) • GOAL-TEST((1,1))? • s' != G, thus false • (1,1) a new state? • False • s is null? • False (s = (1,2)) • result[DOWN,(1,2)] ← (1,1) • UB[(1,1)] = {(1,2)} • UX[(1,1)] empty? • False • a = RIGHT, s = (1,1) • Return a
Online DFS, example • Current percept: s' = (2,1) • GOAL-TEST((2,1))? • s' != G, thus false • (2,1) a new state? • True, UX[(2,1)] = {RIGHT, UP, LEFT} • s is null? • False (s = (1,1)) • result[RIGHT,(1,1)] ← (2,1) • UB[(2,1)] = {(1,1)} • UX[(2,1)] empty? • False • a = LEFT, s = (2,1) • Return a
Online DFS, example • Current percept: s' = (1,1) • GOAL-TEST((1,1))? • s' != G, thus false • (1,1) a new state? • False • s is null? • False (s = (2,1)) • result[LEFT,(2,1)] ← (1,1) • UB[(1,1)] = {(2,1),(1,2)} • UX[(1,1)] empty? • True • UB[(1,1)] empty? • False • a = an action b such that result[b,(1,1)] = (2,1), i.e. b = RIGHT • a = RIGHT, s = (1,1) • Return a • And so on…
Online DFS • In the worst case, every link in the state space is traversed exactly twice • The agent can go on a long walk even when it is close to the solution; an online iterative deepening approach addresses this • Online DFS works only when actions are reversible
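Below is a minimal Python sketch of ONLINE-DFS-AGENT matching the pseudocode and the worked example above; the class name, the STOP sentinel, and the dict-based bookkeeping are assumptions made for this illustration.

```python
# Sketch of ONLINE-DFS-AGENT. The agent is called once per percept (current state)
# and returns the next action, or STOP.

STOP = "stop"

class OnlineDFSAgent:
    def __init__(self, actions_fn, goal_test):
        self.actions_fn = actions_fn   # ACTIONS(s)
        self.goal_test = goal_test     # GOAL-TEST(s)
        self.result = {}               # result[(s, a)] -> observed next state
        self.unexplored = {}           # state -> actions not yet tried (UX)
        self.unbacktracked = {}        # state -> predecessors not yet backtracked to (UB)
        self.s = self.a = None         # previous state and action

    def __call__(self, s_prime):
        if self.goal_test(s_prime):
            return STOP
        if s_prime not in self.unexplored:                  # new state
            self.unexplored[s_prime] = list(self.actions_fn(s_prime))
        if self.s is not None:
            self.result[(self.s, self.a)] = s_prime
            self.unbacktracked.setdefault(s_prime, []).insert(0, self.s)
        if not self.unexplored[s_prime]:
            if not self.unbacktracked.get(s_prime):
                return STOP
            target = self.unbacktracked[s_prime].pop(0)
            # Assumes actions are reversible, so some tried action leads back to target.
            self.a = next(b for b in self.actions_fn(s_prime)
                          if self.result.get((s_prime, b)) == target)
        else:
            self.a = self.unexplored[s_prime].pop()
        self.s = s_prime
        return self.a

# Hypothetical usage with a dict of deterministic transitions:
#   transitions = {((1, 1), "UP"): (1, 2), ((1, 2), "DOWN"): (1, 1), ...}
#   agent = OnlineDFSAgent(
#       actions_fn=lambda s: [a for (st, a) in transitions if st == s],
#       goal_test=lambda s: s == (3, 3))
#   state = (1, 1)
#   while (action := agent(state)) != STOP:
#       state = transitions[(state, action)]
```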
Online local search • Hill climbing is already an online algorithm • Only the current state is stored • Bad performance due to local maxima • Random restarts are impossible: the agent cannot teleport to a new start state • Solution 1: a random walk introduces exploration • Select one of the available actions at random, preferring actions not yet tried • Can take exponentially many steps to reach a goal
Online local search • Solution 2: add memory to the hill climber • Store the current best estimate H(s) of the cost to reach the goal from s • H(s) is initially the heuristic estimate h(s) • Afterward it is updated with experience (see below) • Learning real-time A* (LRTA*)
Learning real-time A* (LRTA*)

function LRTA*-COST(s, a, s', H) returns a cost estimate
  if s' is undefined then return h(s)
  else return c(s, a, s') + H[s']

function LRTA*-AGENT(s') returns an action
  inputs: s', a percept identifying the current state
  static: result, a table of the next state, indexed by action and state, initially empty
          H, a table of cost estimates indexed by state, initially empty
          s, a, the previous state and action, initially null

  if GOAL-TEST(s') then return stop
  if s' is a new state (not in H) then H[s'] ← h(s')
  unless s is null
      result[a, s] ← s'
      H[s] ← min over b in ACTIONS(s) of LRTA*-COST(s, b, result[b, s], H)
  a ← an action b in ACTIONS(s') that minimizes LRTA*-COST(s', b, result[b, s'], H)
  s ← s'
  return a
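A corresponding Python sketch of LRTA*-AGENT follows; actions_fn, goal_test, h, and c stand for the problem's ACTIONS, GOAL-TEST, heuristic, and step-cost functions, and the class name and dictionary bookkeeping are assumptions for this illustration.

```python
# Sketch of LRTA*-AGENT: hill climbing with memory of cost-to-goal estimates H.

STOP = "stop"

class LRTAStarAgent:
    def __init__(self, actions_fn, goal_test, h, c):
        self.actions_fn, self.goal_test = actions_fn, goal_test
        self.h, self.c = h, c
        self.result = {}          # result[(s, a)] -> observed next state
        self.H = {}               # learned cost-to-goal estimates
        self.s = self.a = None    # previous state and action

    def _cost(self, s, a, s_prime):
        """LRTA*-COST: optimistic h(s) if the outcome of a is still unknown."""
        if s_prime is None:
            return self.h(s)
        return self.c(s, a, s_prime) + self.H[s_prime]

    def __call__(self, s_prime):
        if self.goal_test(s_prime):
            return STOP
        if s_prime not in self.H:                      # new state
            self.H[s_prime] = self.h(s_prime)
        if self.s is not None:
            self.result[(self.s, self.a)] = s_prime
            # Update H[s] from experience: best one-step lookahead from s.
            self.H[self.s] = min(
                self._cost(self.s, b, self.result.get((self.s, b)))
                for b in self.actions_fn(self.s))
        # Choose the apparently best action from the current state.
        self.a = min(self.actions_fn(s_prime),
                     key=lambda b: self._cost(s_prime, b,
                                              self.result.get((s_prime, b))))
        self.s = s_prime
        return self.a
```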
Outline • Ch 4. informed search • Ch 6. adversarial search • Optimal decisions • α-β pruning • Imperfect, real-time decisions
Games vs. search problems • The problem-solving agent is no longer alone • Multiagent, with conflict • Default setting: deterministic, turn-taking, two-player, zero-sum games of perfect information • Perfect information vs. imperfect information, or games of chance • "Unpredictable" opponent → a solution is a strategy specifying a move for every possible opponent reply • Time limits → unlikely to find the goal, must approximate * Environments with very many agents are best viewed as economies rather than games
Game formalization • Initial state • A successor function • Returns a list of (move, state) pairs • Terminal test • Terminal states • Utility function (or objective function) • A numeric value for the terminal states • Game tree • The state space defined by the initial state and the successor function
Minimax • Perfect play for deterministic games: optimal strategy • Idea: choose move to position with highest minimax value = best achievable payoff against best play • E.g., 2-ply game: only two half-moves
Problem of minimax search • The number of game states is exponential in the number of moves • Solution: do not examine every node → alpha-beta pruning • Remove branches that do not influence the final decision • Revisit the example…
Alpha-Beta Example • Do a depth-first search until the first leaf; the range of possible values at every node starts at [-∞, +∞] • The first MIN node's range narrows to [-∞, 3] and then to [3, 3], so the root's range becomes [3, +∞] • The second MIN node's range drops to [-∞, 2]; this node is worse for MAX than 3, so its remaining successors are pruned • The third MIN node narrows from [-∞, 14] to [-∞, 5] and finally to [2, 2], so the root's range narrows from [3, 14] to [3, 5] and settles at [3, 3]
Properties of α-β • Pruning does not affect the final result • Good move ordering improves the effectiveness of pruning • With "perfect ordering," time complexity = O(b^(m/2)) → doubles the depth of search
Why is it called α-β? • α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX • If v is worse than α, MAX will avoid it → prune that branch • β is defined similarly for MIN
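Here is a minimal Python sketch of α-β search over the same toy tree representation used in the minimax sketch above (the representation itself is an assumption for illustration).

```python
# Alpha-beta search over a nested-list toy tree.
# alpha = best value found so far for MAX along the path,
# beta  = best value found so far for MIN along the path.

import math

def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):                 # terminal: utility value
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, v)
            if v >= beta:                          # MIN will avoid this node
                break                              # prune remaining children
        return v
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, True, alpha, beta))
            beta = min(beta, v)
            if v <= alpha:                         # MAX will avoid this node
                break                              # prune remaining children
        return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
assert alphabeta(tree) == 3                        # same value as full minimax
```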
Resource limits • In reality, imperfect, real-time decisions are required • Suppose we have 100 secs and explore 10^4 nodes/sec → 10^6 nodes per move • Standard approach: • cutoff test: e.g., depth limit • evaluation function = estimated desirability of position
Evaluation functions • For chess, typically linear weighted sum of features Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s) • e.g., w1 = 9 for queen, w2 = 5 for rook, … wn = 1 for pawn f1(s) = (number of white queens) – (number of black queens), etc.
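A toy Python sketch of such a material-counting evaluation function follows; the board encoding (piece codes like 'wQ'), the helper names, and the bishop/knight weights of 3 are assumptions for illustration.

```python
# A linear weighted evaluation function: Eval(s) = w1*f1(s) + ... + wn*fn(s).
# Features here are material counts only; a real engine would use many more.

PIECE_WEIGHTS = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_features(board):
    """f_i(s) = (# of white pieces of type i) - (# of black pieces of type i).
    `board` is assumed to be an iterable of piece codes such as 'wQ', 'bP'."""
    feats = {}
    for piece in board:
        color, kind = piece[0], piece[1]
        feats[kind] = feats.get(kind, 0) + (1 if color == "w" else -1)
    return feats

def eval_fn(board):
    feats = material_features(board)
    return sum(PIECE_WEIGHTS[k] * feats.get(k, 0) for k in PIECE_WEIGHTS)

# White is up a rook against a knight: 5 - 3 = +2
assert eval_fn(["wQ", "wR", "bQ", "bN"]) == 2
```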
Cutting off search • MinimaxCutoff is identical to MinimaxValue except that • Terminal-Test is replaced by Cutoff-Test • Utility is replaced by Eval • Does it work in practice? • b^m = 10^6 with b = 35 gives m ≈ 4 • 4-ply lookahead is a hopeless chess player! • 4-ply ≈ human novice • 8-ply ≈ typical PC, human master • 12-ply ≈ Deep Blue, Kasparov
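In code, the change from plain minimax to MinimaxCutoff is small; the sketch below assumes a hypothetical game object providing successors, terminal_test, utility, and eval methods (these names are not from the slides).

```python
# MinimaxCutoff sketch: TERMINAL-TEST becomes a depth-based CUTOFF-TEST,
# and UTILITY becomes the EVAL estimate at the cutoff.

def minimax_cutoff(game, state, depth, maximizing=True):
    if game.terminal_test(state):
        return game.utility(state)
    if depth == 0:                       # CUTOFF-TEST: stop searching here
        return game.eval(state)          # EVAL: estimated desirability
    values = (minimax_cutoff(game, s, depth - 1, not maximizing)
              for s in game.successors(state))
    return max(values) if maximizing else min(values)
```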
Games that include chance • Chance nodes • Backgammon: move all of one's pieces off the board • Branches leading from each chance node denote the possible dice rolls • Each branch is labeled with the roll and its probability
Games that include chance • The doubles [1,1], …, [6,6] each occur with probability 1/36; each of the other 15 distinct rolls occurs with probability 1/18 (two of the 36 ordered outcomes) • Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16) • Cannot calculate a definite minimax value, only an expected value
Expected minimax value

EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                                        if n is a terminal state
  max over s in successors(n) of EXPECTED-MINIMAX-VALUE(s)          if n is a MAX node
  min over s in successors(n) of EXPECTED-MINIMAX-VALUE(s)          if n is a MIN node
  sum over s in successors(n) of P(s) · EXPECTED-MINIMAX-VALUE(s)   if n is a chance node

These equations can be backed up recursively all the way to the root of the game tree.
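A minimal Python sketch of this computation; the tagged-tuple tree representation and the example tree are assumptions for illustration.

```python
# Expectiminimax over a toy tree with chance nodes, following the equations above.
# Nodes are ("max", children), ("min", children),
# ("chance", [(probability, child), ...]), or a bare number (terminal utility).

def expectiminimax(node):
    if isinstance(node, (int, float)):                 # terminal: UTILITY(n)
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError("unknown node type: %r" % kind)

# A MAX choice between a sure 3 and a fair coin flip between 0 and 10.
tree = ("max", [3, ("chance", [(0.5, 0), (0.5, 10)])])
assert expectiminimax(tree) == 5.0
```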
Position evaluation with chance nodes • In the left-hand tree, move A1 is best; in the right-hand tree, with the leaf values rescaled, A2 becomes best • The outcome of the evaluation function (and hence the agent's behavior) may change when the values are scaled differently • Behavior is preserved only under a positive linear transformation of EVAL