
CSC 412: AI Adversarial Search


Presentation Transcript


  1. CSC 412: AI Adversarial Search Bikramjit Banerjee Partly based on material available from the internet

  2. Game search problems • Search problems • Only the problem solver's actions can change the state of the environment • Game search problems • Multiple problem solvers (players) acting on the same environment • Players' actions can be • Cooperative: common goal state • Adversarial: a win for one player is a loss for the other • Example: zero-sum games like chess, tic-tac-toe • A whole spectrum lies between purely adversarial and purely cooperative games • We first look at adversarial two-player games with turn-taking

  3. Game Playing: State of the art • Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions • Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second, used a very sophisticated evaluation function and undisclosed methods for extending some lines of search up to 40 ply • Othello: human champions refuse to compete against computers, which are too good • Go: human champions refused to compete against computers, which were too bad. In Go, b > 300, so most programs used pattern knowledge bases to suggest plausible moves. Google's DeepMind created AlphaGo, which beat Lee Sedol (Go world champion) 4-1 in March 2016

  4. AlphaGo vs. Lee Sedol

  5. Two-Player Games • Max always moves first • Min is the opponent • States? Boards faced by Max/Min • Actions? Players' moves • Goal test? Terminal board test • Path cost? Utility function for each player [Figure: Max vs. Min]

  6. Search tree: alternate-move games [Figure: a game tree in which Max and Min levels alternate; the terminal states at the bottom are labeled with their utility (1, 0, -1)]

  7. A simple abstract game [Figure: a two-level game tree; Max chooses among A1, A2, A3, Min replies with A11...A33, and the leaves carry utilities 2, 14, 3, 12, 4, 5, 6, 8, 2] • An action by one player is called a ply; two plies (an action and a counter-action) make up a move

  8. The Minimax Algorithm • Generate the game tree down to the terminal nodes • Apply the utility function to the terminal nodes • For a set S of sibling nodes, pass up to the parent • the lowest value in S if the parent is a Min node • the largest value in S if the parent is a Max node • Recursively do the above until the backed-up values reach the initial state • The value of the initial state is the score Max is guaranteed against an optimal opponent
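
A minimal Python sketch of this backup, assuming a small game interface (is_terminal, utility, successors) that is not part of the original slides:

    def minimax(state, game, maximizing=True):
        # game is an assumed interface: is_terminal(state), utility(state)
        # (utility from Max's point of view), and successors(state)
        if game.is_terminal(state):
            return game.utility(state)
        values = [minimax(child, game, not maximizing)
                  for child in game.successors(state)]
        # Max backs up the largest sibling value, Min the smallest
        return max(values) if maximizing else min(values)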

  9. Minimax Decision [Figure: the abstract game tree with backed-up values; the Min nodes back up 3, 2 and 2 from the leaves 3, 14, 2, 12, 4, 5, 6, 8, 2, and the MAX root backs up 3] • In this game Max's best move is A1, because he is guaranteed a score of at least 3

  10. Properties of Minimax • Complete? Yes (if tree is finite) • Optimal? Yes (against an optimal opponent) • Time complexity? O(b^m) • Space complexity? O(bm) (depth-first exploration) • For chess, b ≈ 35 and m ≈ 100 for "reasonable" games, so finding the optimal solution using Minimax is infeasible • Potential improvements to Minimax running time • Depth-limited search • α-β pruning

  11. Depth-limited Minimax • We would like to run Minimax on the full game tree... but we don't have time, so we explore it only to some manageable depth (the cutoff) • Search the game tree as deep as you can in the given time • Evaluate the fringe nodes with the utility function • Back up the values to the root • Choose the best move, repeat
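
A sketch of the depth-limited variant, under the same assumed game interface plus an evaluate(state) heuristic for scoring fringe nodes:

    def depth_limited_minimax(state, depth, game, evaluate, maximizing=True):
        if game.is_terminal(state):
            return game.utility(state)
        if depth == 0:
            return evaluate(state)   # fringe node: fall back to the heuristic
        values = [depth_limited_minimax(child, depth - 1, game, evaluate,
                                        not maximizing)
                  for child in game.successors(state)]
        return max(values) if maximizing else min(values)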

  12. Example Utility Function: Tic-Tac-Toe • Assume Max is using "X" • e(n) = +∞ if n is a win for Max; -∞ if n is a win for Min; otherwise (number of rows, columns and diagonals still available to Max) - (number still available to Min) • [Figure: two example boards, one with e(n) = 6 - 4 = 2 and one with e(n) = 4 - 3 = 1]
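
One way this evaluation might be coded; the flat 9-cell board encoding is an assumption made for illustration:

    import math

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

    def e(board):
        # board: list of 9 cells holding 'X', 'O', or None; Max plays 'X'
        for line in LINES:
            cells = [board[i] for i in line]
            if cells == ['X', 'X', 'X']:
                return math.inf                 # win for Max
            if cells == ['O', 'O', 'O']:
                return -math.inf                # win for Min
        def open_lines(p):
            # a line is still available to p if the opponent has no mark in it
            return sum(all(board[i] in (p, None) for i in line)
                       for line in LINES)
        return open_lines('X') - open_lines('O')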

  13. Example Utility Function: Chess I • Assume Max is "White" • Assume each piece has the following values: pawn = 1; knight = 3; bishop = 3; rook = 5; queen = 9 • Let w = sum of the values of White's pieces and b = sum of the values of Black's pieces • e(n) = (w - b) / (w + b) • Note that this value ranges between -1 and 1
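
A minimal sketch of this material-balance evaluation; the piece-list encoding is assumed, not from the slides:

    VALUES = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

    def material_eval(white_pieces, black_pieces):
        # each argument: list of piece names, e.g. ['queen', 'rook', 'pawn']
        w = sum(VALUES[p] for p in white_pieces if p in VALUES)
        b = sum(VALUES[p] for p in black_pieces if p in VALUES)
        if w + b == 0:
            return 0                 # bare kings: treat as balanced
        return (w - b) / (w + b)     # always in [-1, 1]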

  14. Example Utility Function: Chess II • The previous evaluation function naively gave the same weight to a piece regardless of its position on the board • Let Xi be the number of squares the i-th piece attacks • e(n) = (w - b) / (w + b) as before, but now w = piece1value * X1 + piece2value * X2 + ... (and b likewise for Black)
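
A sketch of this mobility-weighted version; attacks(name, square, board) is a hypothetical helper, not defined here, that would return how many squares a piece attacks from its square:

    VALUES = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

    def mobility_eval(white_pieces, black_pieces, board, attacks):
        # each piece list holds (name, square) pairs; Xi = attacks(name, sq, board)
        def total(pieces):
            return sum(VALUES[p] * attacks(p, sq, board)
                       for p, sq in pieces if p in VALUES)
        w, b = total(white_pieces), total(black_pieces)
        return (w - b) / (w + b) if w + b else 0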

  15. Utility Functions • The ability to play a good game is highly dependent on the evaluation function • How do we come up with good evaluation functions? • Interview an expert • Machine learning

  16. α-β Pruning • We have seen how to use Minimax search to play an optimal game • We have seen that because of time limitations we may have to use a cutoff depth to make the search tractable. Using a cutoff causes problems because of the "horizon" effect: the move that looks best before the cutoff may have only losing moves as children, while the game-winning move lies just beyond the horizon • Is there some way we can search deeper in the same amount of time? Yes! Use Alpha-Beta pruning

  17. α-β Pruning • "If you have an idea that is surely bad, don't take the time to see how truly awful it is" -- Pat Winston • [Figure: α-β pruning on the abstract game tree; Min backs up 3 under A1, and once the first leaf under A2 evaluates to 2 we know A2's value is ≤ 2 < 3, so A2's remaining children A22 and A23 are pruned without being examined]

  18.–27. α-β pruning: Another example (courtesy of Dr. Milos Hauskrecht) [Figures: a step-by-step walkthrough of α-β pruning on a three-level MAX/MIN/MAX tree with leaf values 3, 6, 2, 2, 1, 9, 3, 1, 5, 4, 7, 5, 4, 5; at each step another cutoff fires, and the pruned nodes are never explored] • Pruning works best with good move ordering: higher values first below a MAX level, lower values first below a MIN level

  28. α-β Pruning • Guaranteed to compute the same value for the root as Minimax • In the worst case α-β does NO pruning, examining b^d leaf nodes, where each node has b children and a d-ply search is performed • In the best case, α-β will examine only ~2b^(d/2) leaf nodes. Hence if you hold the number of leaf nodes fixed, you can search twice as deep as Minimax • The best case occurs when each player's best move is the leftmost alternative (i.e., the first child generated). So, at MAX nodes the child with the largest value is generated first, and at MIN nodes the child with the smallest value is generated first -> order the operators carefully • In the chess program Deep Blue, they found empirically that α-β pruning brought the average branching factor at each node down to ~6 instead of ~35-40
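
Combining depth-limited search with the pruning rule, a sketch under the same assumed game interface; α is the best value Max can already guarantee on the current path and β the best for Min:

    import math

    def alphabeta(state, depth, alpha, beta, game, evaluate, maximizing=True):
        if game.is_terminal(state):
            return game.utility(state)
        if depth == 0:
            return evaluate(state)
        if maximizing:
            value = -math.inf
            for child in game.successors(state):
                value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                             game, evaluate, False))
                alpha = max(alpha, value)
                if alpha >= beta:
                    break   # beta cutoff: Min has a better option elsewhere
            return value
        else:
            value = math.inf
            for child in game.successors(state):
                value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                             game, evaluate, True))
                beta = min(beta, value)
                if beta <= alpha:
                    break   # alpha cutoff: Max has a better option elsewhere
            return value

At the root it would be called as alphabeta(root, d, -math.inf, math.inf, game, evaluate).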

  29. Non zero-sum games • Similar to minimax: utilities are now tuples, one entry per player • Each player maximizes their own entry at each node • Propagate (or back up) whole tuples from children
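
A sketch of this tuple-valued backup (often called max^n); utility_tuple and to_move are assumed interface names, with to_move(state) giving the index of the player who moves at that node:

    def maxn(state, game):
        # returns a tuple of utilities, one entry per player
        if game.is_terminal(state):
            return game.utility_tuple(state)
        player = game.to_move(state)
        best = None
        for child in game.successors(state):
            value = maxn(child, game)
            # the mover keeps whichever child maximizes their own entry
            if best is None or value[player] > best[player]:
                best = value
        return best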

  30. Stochastic 2-player games • E.g. backgammon • Expectiminimax • Environment is an extra player that moves after each agent • At chance nodes take expectations, otherwise like minimax
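
A minimal expectiminimax sketch; node_kind and chance_outcomes (probability-successor pairs for the environment's move) are assumed interface names:

    def expectiminimax(state, game):
        if game.is_terminal(state):
            return game.utility(state)
        kind = game.node_kind(state)        # 'max', 'min', or 'chance'
        if kind == 'chance':
            # environment's move (e.g. a dice roll): take the expectation
            return sum(p * expectiminimax(s, game)
                       for p, s in game.chance_outcomes(state))
        values = [expectiminimax(s, game) for s in game.successors(state)]
        return max(values) if kind == 'max' else min(values)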

  31. Stochastic 2-player games • Dice rolls increase b: 21 possible rolls with 2 dice • Backgammon has ≈ 20 legal moves per position • Depth 4: 20 × (21 × 20)^3 ≈ 1.2 × 10^9 nodes • As depth increases, the probability of reaching a given node shrinks • So the value of lookahead is diminished • So limiting depth is less damaging • But pruning is less possible • TD-Gammon uses depth-2 search + a very good evaluation function + reinforcement learning: world-champion level play
