Outline • Games of perfect information - perfect play • The minimax strategy • Multiplayer games • Alpha-Beta pruning • Games of imperfect information
Games • Competitive environments • the goals of two agents are in conflict, which gives rise to adversarial search • Perfect play • deterministic and fully observable • turn-taking: actions of two players (agents) alternate • zero-sum: the utility values at the end of the game are equal and opposite (adversarial) • e.g., chess, winner (+1) and loser (-1) • Types of games
Define game as a search problem • initial state • the board position, the player to move, etc. • successor function • generates a list of (move, state) pairs • terminal test • decides when the game is over • terminal states: states when the game has ended. • utility function • gives a numeric value for the terminal states. • zero-sum games • game tree • defined by the initial state and the legal moves for each side
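The four components above can be sketched in code. This is a minimal illustration, not a definitive design: the toy game `TakeAwayGame` (players alternately remove 1 or 2 stones, and whoever removes the last stone wins) and all names in it are hypothetical and chosen only because the game is small enough to read at a glance.

```python
from typing import List, Tuple

# Hypothetical state encoding: (stones left, player to move).
State = Tuple[int, str]


class TakeAwayGame:
    """Toy zero-sum game used only for illustration (an assumption, not from the slides)."""

    def __init__(self, stones: int = 4) -> None:
        # Initial state: the "board" (stone count) and the player to move.
        self.initial_state: State = (stones, "MAX")

    def successors(self, state: State) -> List[Tuple[int, State]]:
        """Return the list of (move, state) pairs for the legal moves."""
        stones, player = state
        other = "MIN" if player == "MAX" else "MAX"
        return [(take, (stones - take, other)) for take in (1, 2) if take <= stones]

    def terminal_test(self, state: State) -> bool:
        """The game is over when no stones remain."""
        return state[0] == 0

    def utility(self, state: State) -> int:
        """Utility from MAX's point of view; only meaningful for terminal states.

        The player to move in a terminal state did not take the last stone,
        so that player has lost.
        """
        _, player = state
        return -1 if player == "MAX" else +1


game = TakeAwayGame(4)
print(game.successors(game.initial_state))  # [(1, (3, 'MIN')), (2, (2, 'MIN'))]
```

The game tree is never stored explicitly here: it is implied by the initial state together with repeated calls to `successors`.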
Game tree for the game of tic-tac-toe • High values are good for MAX and bad for MIN
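Optimal contingent strategy • Optimal strategy • leads to outcomes at least as good as any other strategy when one is playing an infallible opponent • computing it exactly is infeasible in practice for large games • 2-ply game • the tree is one move deep, consisting of two half-moves, each of which is called a ply • a contingent strategy specifies MAX’s move in the initial state, then MAX’s moves in the states resulting from every possible response by MIN • minimax value of a node: the utility (for MAX) of being in the corresponding state, assuming both players play optimally from there to the end of the game • MAX (MIN) prefers to move to a state of maximum (minimum) value • minimax decision at the root: the move leading to the successor with the highest minimax value.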
The minimax algorithm • computes the minimax decision from the current state • recursion proceeds down to the leaves • minimax values are backed up
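Properties of the minimax algorithm • Complete? yes, if the game tree is finite • Optimal? yes, against an optimal opponent • Time? O(b^m), for branching factor b and maximum depth m • Space? O(bm), because the exploration is depth-first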
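Optimal decisions in multiplayer games • the utility of each node is a vector with one value per player, e.g. <vA = 1, vB = 2, vC = 6> • at each node, the player to move picks the move (successor) whose vector has the highest value for that player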
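The vector-valued backup can be sketched as follows. This is an illustration under stated assumptions: players move in a fixed round-robin order, leaves are tuples of utilities (one entry per player), internal nodes are lists, and the name `maxn_value` and the example tree are hypothetical.

```python
from typing import List, Tuple, Union

Utility = Tuple[int, ...]              # one utility value per player
Node = Union[Utility, List["Node"]]    # leaf vector or list of children


def maxn_value(node: Node, player: int, n_players: int) -> Utility:
    """Back up utility vectors; the player to move maximizes its own component."""
    if isinstance(node, tuple):  # terminal state: return the utility vector
        return node
    next_player = (player + 1) % n_players
    child_values = [maxn_value(child, next_player, n_players) for child in node]
    return max(child_values, key=lambda v: v[player])


# Players A, B, C have indices 0, 1, 2. A moves at the root, B one level below.
tree = [
    [(1, 2, 6), (4, 3, 3)],  # B prefers (4, 3, 3): its own value 3 beats 2
    [(6, 1, 2), (5, 4, 5)],  # B prefers (5, 4, 5): its own value 4 beats 1
]
print(maxn_value(tree, 0, 3))  # (5, 4, 5): A compares 4 and 5 in its own component
```

With two players and vectors of the form (v, -v), this reduces to ordinary minimax.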
Alpha-Beta Pruning • compute the minimax decision without looking at every node • pruning away branches that cannot possibly influence the final decision • Alpha: value of the best choice (highest value) found so far for MAX along the current path • Beta: value of the best choice (lowest value) found so far for MIN along the current path
Alpha-Beta Pruning (cont’d) • MINIMAX-VALUE(root) = max(min(3,12,8), min(2,x,y), min(14,5,2)) = max(3, min(2,x,y), 2) = max(3, z, 2) = 3, where z = min(2,x,y) ≤ 2 • the value of the root (minimax decision) is independent of the values of the pruned leaves x and y • the effectiveness of pruning depends on the order in which the successors are examined
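The pruning in this example can be traced with a short sketch. It assumes the tree is given as nested lists (integers are terminal utilities), and it fills in the unknown leaves x and y with the arbitrary values 4 and 6 purely so the code can run; the `visited` list is an illustrative device for showing which leaves are actually examined.

```python
import math
from typing import List, Union

Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, is_max: bool, alpha: float, beta: float,
              visited: List[int]) -> float:
    """Minimax value of node, skipping branches that cannot change the result."""
    if isinstance(node, int):
        visited.append(node)  # record every leaf that is actually evaluated
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            if value >= beta:  # MIN higher up would never allow this branch
                return value
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        if value <= alpha:  # MAX higher up already has a better option
            return value
        beta = min(beta, value)
    return value


# x = 4 and y = 6 are placeholder values for the pruned leaves (assumption).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen: List[int] = []
print(alphabeta(tree, True, -math.inf, math.inf, seen))  # 3
print(seen)  # [3, 12, 8, 2, 14, 5, 2] -- the leaves 4 and 6 are never examined
```

In the third subtree all three leaves are examined because the worst leaf for MAX (2) comes last; had it come first, the leaves 14 and 5 would have been pruned as well.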
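How good is the Alpha-Beta pruning? • effectiveness depends on move ordering: examine the likely best successors first • e.g., try captures first, then threats, then forward moves, and then backward moves • with perfect ordering, the time complexity is O(b^(m/2)) instead of O(b^m), so the effective branching factor becomes sqrt(b) instead of b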
Imperfect decisions • Moves must be made in a reasonable amount of time (minutes) • Even with Alpha-Beta pruning, searching all the way to the terminal states is not practical because the required depth is too large • should cut off the search earlier by applying a heuristic evaluation function to states • the evaluation function estimates the utility of the position • use a cutoff test instead of the terminal test • turning nonterminal nodes into terminal leaves
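A minimal sketch of minimax with a cutoff test, assuming a nested-list tree and a fixed depth limit as the cutoff. The heuristic `evaluate` is hypothetical: it averages the terminal utilities below a node, which a real evaluation function could not afford to do; it peeks at the leaves only to keep this toy example self-contained.

```python
from typing import List, Union

Tree = Union[int, List["Tree"]]


def leaves(node: Tree) -> List[int]:
    """Collect all terminal utilities below a node (used by the toy heuristic)."""
    if isinstance(node, int):
        return [node]
    return [value for child in node for value in leaves(child)]


def evaluate(node: Tree) -> float:
    """Hypothetical heuristic: the average terminal utility below the node."""
    values = leaves(node)
    return sum(values) / len(values)


def h_minimax(node: Tree, depth: int, limit: int, is_max: bool) -> float:
    """Minimax that treats nodes at the depth limit as if they were leaves."""
    if isinstance(node, int):  # terminal test: true utility is available
        return node
    if depth >= limit:         # cutoff test: fall back to the estimate
        return evaluate(node)
    values = [h_minimax(child, depth + 1, limit, not is_max) for child in node]
    return max(values) if is_max else min(values)


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(h_minimax(tree, 0, 2, True))  # 3: the limit reaches the terminal states
print(h_minimax(tree, 0, 1, True))  # about 7.67: an estimate from the cutoff
```

The two results differ, which illustrates the uncertainty that a cutoff introduces: the quality of the decision now depends on how well the evaluation function tracks the true utility.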
How to design good evaluation functions? • Requirements • order the terminal states in the same way as the true utility function • must not take too long to compute • for nonterminal states, should be strongly correlated with the actual chances of winning • uncertain about the final outcomes because of the cutoff • categories or equivalence classes of states: • the states in a category have the same feature values, but lead to wins, losses, or draws • the value of the evaluation function should reflect the proportion of states with each outcome, e.g. wins (72%), losses (20%), or draws (8%) • weighted average (expected) value • in practice this requires too many categories and too much experience to estimate
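As a worked example of the weighted average, take the proportions quoted above and assume the utilities +1 for a win and -1 for a loss (as in the chess example earlier) and 0 for a draw; the draw value is an assumption, since the slides do not state it.

```latex
\[
\text{Eval}(s) = 0.72 \cdot (+1) + 0.20 \cdot (-1) + 0.08 \cdot 0 = 0.52
\]
```

Every state in this category would receive the value 0.52, even though each individual state actually leads to a win, a loss, or a draw.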
How to design good evaluation functions? • In practice • compute a separate numerical contribution from each feature and then combine them to find the total value • material value for each piece, e.g., pawn 1, knight/bishop 3, rook 5, queen 9 • weighted linear function • nonlinear combinations of features are needed if the contribution of each feature depends on the values of the other features.
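A weighted linear function over material can be sketched as below. The weights are the piece values listed above; the choice of features (the difference in piece counts between the two sides) and the names `PIECE_VALUES` and `material_eval` are illustrative assumptions.

```python
from typing import Dict

# Weights w_i: the material values from the slide.
PIECE_VALUES: Dict[str, int] = {
    "pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9,
}


def material_eval(own: Dict[str, int], opponent: Dict[str, int]) -> int:
    """Weighted linear evaluation: sum of w_i * f_i over all features.

    Feature f_i is assumed to be (own count - opponent count) of piece type i.
    """
    return sum(
        weight * (own.get(piece, 0) - opponent.get(piece, 0))
        for piece, weight in PIECE_VALUES.items()
    )


full_set = {"pawn": 8, "knight": 2, "bishop": 2, "rook": 2, "queen": 1}
no_queen = {"pawn": 8, "knight": 2, "bishop": 2, "rook": 2}
print(material_eval(full_set, no_queen))  # 9: ahead by exactly one queen
```

Because each feature contributes independently of the others, a function of this form cannot express interactions such as a bishop pair being worth more than two single bishops; that is the case where a nonlinear combination is needed.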