Heuristic Search and Game Playing

CMSC 100, Tuesday, November 10, 2009, Prof. Marie desJardins

Summary of Topics • What is heuristic search? • Examples of search problems • Search methods • Uninformed search • Informed search • Local search • Game trees

Building Goal-Based Intelligent Agents To build a goal-based agent we need to answer the following questions: • What is the goal to be achieved? • What are the actions? • What relevant information is necessary to encode in order to describe the state of the world, describe the available transitions, and solve the problem? Initial state Goal state Actions

Representing States • What information is necessary to encode about the world to sufficiently describe all relevant aspects to solving the goal? • That is, what knowledge needs to be represented in a state description to adequately describe the current state or situation of the world? • The size of a problem is usually described in terms of the number of states that are possible. • Tic-Tac-Toe has about 3^9 states. • Checkers has about 10^40 states. • Rubik’s Cube has about 10^19 states. • Chess has about 10^120 states in a typical game.

Real-world Search Problems • Route finding (GPS algorithms, Google Maps) • Touring (traveling salesman) • Logistics • VLSI layout • Robot navigation • Learning

8-Puzzle Given an initial configuration of 8 numbered tiles on a 3 x 3 board, move the tiles into a desired goal configuration of the tiles.

8-Puzzle Encoding • State: 3 x 3 array configuration of the tiles on the board. • 4 Operators: Move Blank Square Left, Right, Up or Down. • This is a more efficient encoding of the operators than one in which each of four possible moves for each of the 8 distinct tiles is used. • Initial State: A particular configuration of the board. • Goal: A particular configuration of the board. • What does the state space look like?

Missionaries and Cannibals There are 3 missionaries, 3 cannibals, and 1 boat that can carry up to two people on one side of a river. • Goal: Move all the missionaries and cannibals across the river. • Constraint: Missionaries can never be outnumbered by cannibals on either side of river; otherwise, the missionaries are killed. • State: Configuration of missionaries and cannibals and boat on each side of river. • Operators: Move boat containing some set of occupants across the river (in either direction) to the other side. • What’s the solution??

M&C States • State: Configuration of missionaries and cannibals and boat on each side of river. • 3 missionaries & 3 cannibals & boat on left == 0 missionaries & 0 cannibals on right • In general, X missionaries & Y cannibals & boat on left == (3-X) missionaries & (3-Y) cannibals on right • Similarly, X missionaries & Y cannibals & no boat on left == (3-X) missionaries & (3-Y) cannibals & boat on right

M & C States (# M, C on left)
3M, 3C    3M, 2C    3M, 1C    3M, 0C
2M, 3C    2M, 2C    2M, 1C    2M, 0C
1M, 3C    1M, 2C    1M, 1C    1M, 0C
0M, 3C    0M, 2C    0M, 1C    0M, 0C

Illegal States
No state with more cannibals than missionaries on either side is legal (a side with no missionaries at all is safe)!
Of the 16 states (# M, C on left), six are illegal:
2M, 3C    2M, 1C    2M, 0C
1M, 3C    1M, 2C    1M, 0C
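The legality rule can be checked mechanically. The sketch below enumerates all 16 (missionaries, cannibals) counts for the left bank and filters them; the function name `is_legal` and the tuple encoding are assumptions made for illustration.

```python
TOTAL = 3  # three missionaries and three cannibals in the puzzle


def is_legal(m_left, c_left):
    """True if missionaries are not outnumbered on either bank.

    A bank with no missionaries is always safe.
    """
    m_right, c_right = TOTAL - m_left, TOTAL - c_left
    left_safe = m_left == 0 or m_left >= c_left
    right_safe = m_right == 0 or m_right >= c_right
    return left_safe and right_safe


all_states = [(m, c) for m in range(TOTAL, -1, -1) for c in range(TOTAL, -1, -1)]
legal = [s for s in all_states if is_legal(*s)]
illegal = [s for s in all_states if not is_legal(*s)]

print("legal:", legal)
print("illegal:", illegal)
```

Ten of the sixteen combinations survive the filter. Each of them can occur with the boat on either bank, although not every such combination is actually reachable from the initial state.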

State Transitions (Legal Moves)
[Figure: the ten legal states drawn twice, once with the boat on the left and once with the boat on the right, with arcs for the legal moves between them.]
Legal states (# M, C on left): 3M, 3C; 3M, 2C; 3M, 1C; 3M, 0C; 2M, 2C; 1M, 1C; 0M, 3C; 0M, 2C; 0M, 1C; 0M, 0C

Missionaries and Cannibals Solution
Step  Move                               Near side   Boat   Far side
0     Initial setup:                     MMMCCC      near   -
1     Two cannibals cross over:          MMMC        far    CC
2     One comes back:                    MMMCC       near   C
3     Two cannibals go over again:       MMM         far    CCC
4     One comes back:                    MMMC        near   CC
5     Two missionaries cross:            MC          far    MMCC
6     A missionary & cannibal return:    MMCC        near   MC
7     Two missionaries cross again:      CC          far    MMMC
8     A cannibal returns:                CCC         near   MMM
9     Two cannibals cross:               C           far    MMMCC
10    One returns:                       CC          near   MMMC
11    And brings over the third:         -           far    MMMCCC
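A solution like the one above can be found automatically by searching the state space. The following is a minimal sketch using breadth-first search; the state encoding `(missionaries on near side, cannibals on near side, boat on near side?)` and all function names are assumptions, and the path it returns may differ from the one in the table while having the same length.

```python
from collections import deque

# Possible boat loads as (missionaries, cannibals); the boat holds 1 or 2 people.
BOAT_LOADS = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]


def is_legal(m, c):
    """Counts are for the near side; both banks must be safe."""
    if not (0 <= m <= 3 and 0 <= c <= 3):
        return False
    near_safe = m == 0 or m >= c
    far_safe = (3 - m) == 0 or (3 - m) >= (3 - c)
    return near_safe and far_safe


def solve():
    start, goal = (3, 3, True), (0, 0, False)
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        m, c, boat_near = state
        sign = -1 if boat_near else 1   # people leave the side the boat is on
        for dm, dc in BOAT_LOADS:
            nxt = (m + sign * dm, c + sign * dc, not boat_near)
            if is_legal(nxt[0], nxt[1]) and nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


path = solve()
print(len(path) - 1, "crossings")
for m, c, boat_near in path:
    print("M" * m + "C" * c or "-", "| boat", "near" if boat_near else "far")
```

Because breadth-first search explores states in order of the number of crossings, the first solution it reaches uses the fewest crossings possible.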

Solution Cost • A solution is a sequence of operators that is associated with a path in a state space from a start node to a goal node. • The cost of a solution is the sum of the arc costs on the solution path. • If all arcs have the same (unit) cost, then the solution cost is just the length of the solution (number of steps / state transitions)

Evaluating search strategies • Completeness • Guarantees finding a solution whenever one exists • Time complexity • How long (worst or average case) does it take to find a solution? Usually measured in terms of the number of nodes expanded • Space complexity • How much space is used by the algorithm? Usually measured in terms of the maximum size of the “nodes” list during the search • Optimality/Admissibility • If a solution is found, is it guaranteed to be an optimal one? That is, is it the one with minimum cost?

Types of Search Methods • Uninformed search strategies • Also known as “blind search,” uninformed search strategies use no information about the likely “direction” of the goal node(s) • Variations on “generate and test” or “trial and error” approach • Uninformed search methods: breadth-first, depth-first, uniform-cost • Informed search strategies • Also known as “heuristic search,” informed search strategies use information about the domain to (try to) (usually) head in the general direction of the goal node(s) • Informed search methods: greedy search, (A*) • Local search strategies • Pick a starting solution (that might not be very good) and incrementally try to improve it • Local search methods: hill-climbing, genetic algorithms • Game trees • Search strategies for situations where you have an opponent who gets to make some of the moves • Try to pick moves that will let you win most of the time by “looking ahead” to see what your opponent might do

Uninformed Search

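A Simple Search Space
[Figure: directed graph with start node S, goal node G, and intermediate nodes A, B, C, D, E.]
Arc costs: S→A = 3, S→B = 1, S→C = 8, A→D = 3, A→E = 7, A→G = 15, B→G = 20, C→G = 5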

Depth-First (DFS) • Enqueue nodes on the nodes list in LIFO (last-in, first-out) order. That is, the nodes list is used as a stack. • Like walking through a maze, always following the rightmost branch • When the search hits a dead end, back up one step at a time • Can find long solutions quickly if lucky (and short solutions slowly if unlucky!)

Depth-First Search Solution
Expanded node    Nodes list
                 { S0 }
S0               { A3 B1 C8 }
A3               { D6 E10 G18 B1 C8 }
D6               { E10 G18 B1 C8 }
E10              { G18 B1 C8 }
G18              { B1 C8 }
Solution path found is S A G, cost 18
Number of nodes expanded (including goal node) = 5
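The trace above can be reproduced with a short program. This is a sketch under stated assumptions: the `GRAPH` dictionary is read off the path costs in the trace, and the function name and return values are illustrative choices.

```python
# Arc costs inferred from the path costs shown in the trace.
GRAPH = {
    "S": [("A", 3), ("B", 1), ("C", 8)],
    "A": [("D", 3), ("E", 7), ("G", 15)],
    "B": [("G", 20)],
    "C": [("G", 5)],
    "D": [], "E": [], "G": [],
}


def depth_first(start, goal):
    """Return (path, cost, expanded nodes in order)."""
    nodes = [(start, 0, [start])]       # front of the list is the top of the stack
    expanded = []
    while nodes:
        name, cost, path = nodes.pop(0)
        expanded.append(name)
        if name == goal:                # goal test when the node is expanded
            return path, cost, expanded
        children = [(child, cost + step, path + [child])
                    for child, step in GRAPH[name]]
        nodes = children + nodes        # LIFO: newest nodes go on the front
    return None, None, expanded


path, cost, expanded = depth_first("S", "G")
print(path, cost, expanded)
```

The search commits to A's subtree and returns the first goal it reaches there, even though a cheaper route through C exists.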

Breadth-First • Enqueue nodes on the nodes list in FIFO (first-in, first-out) order. That is, the nodes list is used as a queue. • Like having a team of “gremlins” who can keep track of every branching point in a maze • Only one gremlin can move at a time, and breadth-first search moves them each in turn, duplicating the gremlins if they reach a “fork in the road” (one for each fork!) • Finds the shortest path (fewest steps), but sometimes very slowly (and at the cost of generating lots of gremlins!)

Breadth-First Search Solution
Expanded node    Nodes list
                 { S0 }
S0               { A3 B1 C8 }
A3               { B1 C8 D6 E10 G18 }
B1               { C8 D6 E10 G18 G21 }
C8               { D6 E10 G18 G21 G13 }
D6               { E10 G18 G21 G13 }
E10              { G18 G21 G13 }
G18              { G21 G13 }
Solution path found is S A G, cost 18
Number of nodes expanded (including goal node) = 7
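A minimal sketch of the same procedure in Python follows. The `GRAPH` dictionary is inferred from the path costs in the trace, and the names used are assumptions for illustration only.

```python
from collections import deque

# Arc costs inferred from the path costs shown in the trace.
GRAPH = {
    "S": [("A", 3), ("B", 1), ("C", 8)],
    "A": [("D", 3), ("E", 7), ("G", 15)],
    "B": [("G", 20)],
    "C": [("G", 5)],
    "D": [], "E": [], "G": [],
}


def breadth_first(start, goal):
    """Return (path, cost, expanded nodes in order)."""
    nodes = deque([(start, 0, [start])])
    expanded = []
    while nodes:
        name, cost, path = nodes.popleft()      # FIFO: oldest node first
        expanded.append(name)
        if name == goal:                        # goal test when the node is expanded
            return path, cost, expanded
        for child, step in GRAPH[name]:
            nodes.append((child, cost + step, path + [child]))
    return None, None, expanded


path, cost, expanded = breadth_first("S", "G")
print(path, cost, expanded)
```

Breadth-first search returns a path with the fewest arcs (two here), which is not necessarily the cheapest one when arc costs differ.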

Uniform Cost Search • Enqueue nodes by path cost. That is, let priority = cost of the path from the start node to the current node. Sort nodes by increasing value of cost (try low-cost nodes first) • Called “Dijkstra’s Algorithm” in the algorithms literature; similar to “Branch and Bound Algorithm” from operations research • Finds the very best path, but can take a very long time and use a very large amount of memory!

Uniform-Cost Search Solution
Expanded node    Nodes list
                 { S0 }
S0               { B1 A3 C8 }
B1               { A3 C8 G21 }
A3               { D6 C8 E10 G18 G21 }
D6               { C8 E10 G18 G21 }
C8               { E10 G13 G18 G21 }
E10              { G13 G18 G21 }
G13              { G18 G21 }
Solution path found is S C G, cost 13
Number of nodes expanded (including goal node) = 7
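Uniform-cost search differs from the previous methods only in how the nodes list is ordered. The sketch below uses Python's `heapq` module as the priority queue; the `GRAPH` dictionary is inferred from the trace and the function name is an assumption.

```python
import heapq

# Arc costs inferred from the path costs shown in the trace.
GRAPH = {
    "S": [("A", 3), ("B", 1), ("C", 8)],
    "A": [("D", 3), ("E", 7), ("G", 15)],
    "B": [("G", 20)],
    "C": [("G", 5)],
    "D": [], "E": [], "G": [],
}


def uniform_cost(start, goal):
    """Return (path, cost, expanded nodes in order)."""
    nodes = [(0, [start])]                  # priority = path cost from the start
    expanded = []
    while nodes:
        cost, path = heapq.heappop(nodes)   # cheapest path so far comes out first
        name = path[-1]
        expanded.append(name)
        if name == goal:
            return path, cost, expanded
        for child, step in GRAPH[name]:
            heapq.heappush(nodes, (cost + step, path + [child]))
    return None, None, expanded


path, cost, expanded = uniform_cost("S", "G")
print(path, cost, expanded)
```

The goal is accepted only when it is removed from the queue, not when it is first generated; that is why the cost-21 and cost-18 copies of G are passed over in favor of the cost-13 one.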

Holy Grail Search
Expanded node    Nodes list
                 { S0 }
S0               { C8 A3 B1 }
C8               { G13 A3 B1 }
G13              { A3 B1 }
Solution path found is S C G, cost 13 (optimal)
Number of nodes expanded (including goal node) = 3 (as few as possible!)
If only we knew where we were headed…

Informed Search

What’s a Heuristic?
From WordNet (r) 1.6:
heuristic
adj 1: (computer science) relating to or using a heuristic rule
    2: of or relating to a general formulation that serves to guide investigation [ant: algorithmic]
n: a commonsense rule (or set of rules) intended to increase the probability of solving some problem [syn: heuristic rule, heuristic program]

Informed Search: Use What You Know! • Add domain-specific information to select the best path along which to continue searching • Define a heuristic function, h(n), that estimates the “goodness” of a node n. • Most often, h(n) = estimated cost (or distance) of minimal cost path from n to a goal state. • The heuristic function is an estimate, based on domain-specific information that is computable from the current state description, of how close we are to a goal

Heuristic Functions • All domain knowledge used in the search is encoded in the heuristic function h. • Heuristic search is an example of a “weak method” because of the limited way that domain-specific information is used to solve the problem. • Examples: • Missionaries and Cannibals: Number of people on starting river bank • 8-puzzle: Number of tiles out of place • 8-puzzle: Sum of distances each tile is from its goal position • In general: • h(n) ≥ 0 for all nodes n • h(n) = 0 implies that n is a goal node • h(n) = infinity implies that n is a dead end from which a goal cannot be reached
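The two 8-puzzle heuristics listed above can be written down directly. This is a sketch only: the tuple encoding (row-major, 0 for the blank), the particular goal layout, and the function names are assumptions chosen for illustration, and "distance" is taken to mean city-block (Manhattan) distance.

```python
def misplaced_tiles(state, goal):
    """Number of tiles (not counting the blank) that are out of place."""
    return sum(1 for tile, target in zip(state, goal)
               if tile != 0 and tile != target)


def manhattan_distance(state, goal):
    """Sum over tiles of row distance plus column distance to the goal square."""
    total = 0
    for index, tile in enumerate(state):
        if tile == 0:
            continue                        # the blank is not a tile
        goal_index = goal.index(tile)
        total += abs(index // 3 - goal_index // 3)
        total += abs(index % 3 - goal_index % 3)
    return total


# Hypothetical example boards, written row by row with 0 as the blank.
start = (2, 8, 3,
         1, 6, 4,
         7, 0, 5)
goal = (1, 2, 3,
        8, 0, 4,
        7, 6, 5)

print(misplaced_tiles(start, goal))      # tiles 2, 8, 1 and 6 are out of place
print(manhattan_distance(start, goal))   # 1 + 2 + 1 + 1
```

Both functions return 0 exactly when the state equals the goal, matching the general property that h(n) = 0 marks a goal node. The distance-based estimate is never smaller than the misplaced-tile count, since every misplaced tile is at least one step from home.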

Example
n    g(n)   h(n)   f(n)   h*(n)
S    0      8      8      13
A    3      8      11     15
B    1      4      5      20
C    8      3      11     5
D    6      ∞      ∞      ∞
E    10     ∞      ∞      ∞
G    13     0      13     0
• g(n) is the (lowest observed) cost from the start node to n
• h(n) is the estimated cost from n to the goal node
• f(n) is the evaluation value: f(n) = g(n) + h(n), the estimated total cost from start to goal through n
• h*(n) is the (hypothetical) perfect heuristic
• Since h(n) ≤ h*(n) for all n, h is admissible
• Optimal path = S C G with cost 13
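The table can be checked with a small program that tests admissibility and then orders the search by f(n) = g(n) + h(n), which is how A* works. This is a hedged sketch: the `GRAPH` arc costs are inferred from the g values in the earlier traces, the infinite values for the dead ends D and E are an assumption, and the function names are illustrative.

```python
import heapq

INF = float("inf")

GRAPH = {
    "S": [("A", 3), ("B", 1), ("C", 8)],
    "A": [("D", 3), ("E", 7), ("G", 15)],
    "B": [("G", 20)],
    "C": [("G", 5)],
    "D": [], "E": [], "G": [],
}
# Heuristic estimates; D and E are dead ends, assumed to have h = infinity.
H = {"S": 8, "A": 8, "B": 4, "C": 3, "D": INF, "E": INF, "G": 0}
# True remaining costs, computed by hand from GRAPH.
H_STAR = {"S": 13, "A": 15, "B": 20, "C": 5, "D": INF, "E": INF, "G": 0}


def is_admissible(h, h_star):
    """h is admissible if it never overestimates the true remaining cost."""
    return all(h[n] <= h_star[n] for n in h)


def a_star(start, goal):
    """Return (path, cost, expanded nodes in order)."""
    nodes = [(H[start], 0, [start])]            # entries are (f, g, path)
    expanded = []
    while nodes:
        f, g, path = heapq.heappop(nodes)
        name = path[-1]
        expanded.append(name)
        if name == goal:
            return path, g, expanded
        for child, step in GRAPH[name]:
            child_g = g + step
            heapq.heappush(nodes, (child_g + H[child], child_g, path + [child]))
    return None, None, expanded


print(is_admissible(H, H_STAR))
path, cost, expanded = a_star("S", "G")
print(path, cost, expanded)
```

Ties on f are broken here by the smaller g, which is one arbitrary choice among several. On this graph the search reaches the optimal path S C G without ever expanding the dead ends D and E.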

Greedy Search
[Figure: search tree rooted at a, with nodes b, c, d, e, h, i and goal nodes g and g2, each labeled with its h value (between 0 and 4).]
• Use f(n) = h(n) as the evaluation function, sorting nodes by increasing values of f • Selects for expansion the node believed to be closest to a goal node (hence “greedy”), i.e., the node with the smallest f value • Not complete • Not admissible, as in the example: assuming all arc costs are 1, greedy search will find goal g, which has a solution cost of 5, while the optimal solution is the path to goal g2 with cost 3

Greedy Search
f(n) = h(n)
Expanded node    Nodes list
                 { S8 }
S                { C3 B4 A8 }
C                { G0 B4 A8 }
G                { B4 A8 }
• Solution path found is S C G, 3 nodes expanded.
• See how fast the search is!! But it is not always optimal.
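The greedy trace can be reproduced by ordering the nodes list on h(n) alone. In this sketch the `GRAPH` arc costs are inferred from the earlier traces, the infinite h values for D and E are an assumption (they are never generated in this run), and the function name is illustrative.

```python
import heapq

INF = float("inf")

GRAPH = {
    "S": [("A", 3), ("B", 1), ("C", 8)],
    "A": [("D", 3), ("E", 7), ("G", 15)],
    "B": [("G", 20)],
    "C": [("G", 5)],
    "D": [], "E": [], "G": [],
}
# h values from the trace; the values for D and E are assumed.
H = {"S": 8, "A": 8, "B": 4, "C": 3, "D": INF, "E": INF, "G": 0}


def greedy(start, goal):
    """Return (path, cost, expanded nodes in order)."""
    nodes = [(H[start], 0, [start])]            # entries are (h, path cost, path)
    expanded = []
    while nodes:
        _, cost, path = heapq.heappop(nodes)    # smallest h first; path cost is ignored
        name = path[-1]
        expanded.append(name)
        if name == goal:
            return path, cost, expanded
        for child, step in GRAPH[name]:
            heapq.heappush(nodes, (H[child], cost + step, path + [child]))
    return None, None, expanded


path, cost, expanded = greedy("S", "G")
print(path, cost, expanded)
```

Here the heuristic happens to point toward the cheapest route, so greedy search finds the optimal path after three expansions. Nothing in the ordering guarantees that in general, because accumulated path cost plays no part in it.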

Local Search

Local Search • Another approach to search involves starting with an initial guess at a solution and gradually improving it until it is a legal solution or the best that can be found. • Also known as “incremental improvement” search • Some examples: • Hill climbing • Genetic algorithms

Hill Climbing on a Surface of States Height Defined by Evaluation Function

Hill-Climbing Search • If there exists a successor s for the current state n such that • h(s) < h(n) • h(s) ≤ h(t) for all the successors t of n, • then move from n to s. Otherwise, halt at n. • Looks one step ahead to determine if any successor is better than the current state; if there is, move to the best successor. • Similar to Greedy search in that it uses h, but does not allow backtracking or jumping to an alternative path since it doesn’t “remember” where it has been. • Not complete since the search will terminate at "local minima," "plateaus," and "ridges."
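The rule above translates almost line for line into code. The sketch below follows the slide's convention that a smaller h is better; the function name, the callback signatures, and the one-dimensional toy landscapes are assumptions made for illustration.

```python
def hill_climb(state, successors, h):
    """Move to the best successor while it is strictly better; otherwise halt."""
    while True:
        candidates = list(successors(state))
        if not candidates:
            return state
        best = min(candidates, key=h)       # h(best) <= h(t) for all successors t
        if h(best) >= h(state):             # no successor with h(s) < h(n): halt at n
            return state
        state = best


# Toy example 1: a single valley, so hill climbing finds the true minimum.
valley = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: (x - 3) ** 2)
print(valley)

# Toy example 2: h values indexed by position; index 1 is a local minimum.
landscape = [5, 3, 4, 2, 0]


def neighbours(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]


stuck = hill_climb(0, neighbours, lambda i: landscape[i])
print(stuck, landscape[stuck])
```

In the second example the search halts at index 1 (h = 3) even though index 4 has h = 0, because it keeps no memory of alternatives and cannot step through the worse state at index 2.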

Hill Climbing Example
[Figure: sequence of 8-puzzle boards from the start state (h = -4) to the goal state (h = 0), with each candidate successor labeled by its value; the values shown range from -5 to 0.]
f(n) = -(number of tiles out of place)

Drawbacks of Hill Climbing • Problems: • Local Maxima: peaks that aren’t the highest point in the space • Plateaus: the space has a broad flat region that gives the search algorithm no direction (random walk) • Ridges: flat like a plateau, but with dropoffs to the sides; steps to the North, East, South and West may go down, but a step to the NW may go up. • Remedies: • Random restart • Problem reformulation • Some problem spaces are great for hill climbing and others are terrible.

Example of a Local Optimum
[Figure: 8-puzzle boards including a start state and the goal state (value 0); the other boards are labeled with the values -4, -4, -3, and -4.]

Genetic Algorithms • Start with k random states (the initial population) • New states are generated by “mutating” a single state or “reproducing” (combining) two parent states (selected according to their fitness) • Encoding used for the “genome” of an individual strongly affects the behavior of the search • Genetic algorithms / genetic programming are a large and active area of research
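A bare-bones version of this loop is sketched below. Everything specific in it is an assumption made for illustration: genomes are bit lists, fitness is a caller-supplied non-negative function, parents are chosen with fitness-proportional sampling, reproduction is single-point crossover, and the best individual is copied unchanged into each new generation.

```python
import random


def evolve(fitness, genome_length, k=20, generations=60,
           mutation_rate=0.05, rng=None):
    """Return the fittest genome found; parameters are illustrative defaults."""
    rng = rng or random.Random(0)           # fixed seed so runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(k)]
    for _ in range(generations):
        # +1 keeps every weight positive even if a fitness value is zero.
        weights = [fitness(genome) + 1 for genome in population]
        next_population = [max(population, key=fitness)]    # keep the current best
        while len(next_population) < k:
            mom, dad = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, genome_length)            # single-point crossover
            child = mom[:cut] + dad[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]                       # per-bit mutation
            next_population.append(child)
        population = next_population
    return max(population, key=fitness)


# Toy problem: maximize the number of 1 bits in a 12-bit genome.
best = evolve(sum, 12)
print(best, sum(best))
```

Changing the genome encoding or the crossover operator changes which states are easy to reach, which is one concrete sense in which the encoding strongly affects the search.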

Summary: Informed Search • Best-first search is general search where the minimum-cost nodes (according to some measure) are expanded first. • Greedy search uses minimal estimated cost h(n) to the goal state as measure. This reduces the search time, but the algorithm is neither complete nor optimal. • A* search combines uniform-cost search and greedy search: f(n) = g(n) + h(n). A* handles repeated states and assumes that h(n) never overestimates the true cost to the goal. • Under that assumption, A* is complete and optimal, but space complexity is high. • The time complexity depends on the quality of the heuristic function. • Hill-climbing algorithms keep only a single state in memory, but can get stuck on local optima. • Genetic algorithms can search a large space by modeling biological evolution.

Game Playing

Why Study Games? • Clear criteria for success • Offer an opportunity to study problems involving {hostile, adversarial, competing} agents. • Historical reasons • Fun • Interesting, hard problems that require minimal “initial structure” • Games often define very large search spaces • Chess: about 35^100 nodes in the search tree, 10^40 legal states

State of the Art • How good are computer game players? • Chess: • Deep Blue beat Garry Kasparov in 1997 • Garry Kasparov vs. Deep Junior (Feb 2003): tie! • Kasparov vs. X3D Fritz (November 2003): tie! http://www.cnn.com/2003/TECH/fun.games/11/19/kasparov.chess.ap/ • Checkers: Chinook (an AI program with a very large endgame database) is the world champion (checkers is “solved”!) • Go: Computer players are competitive at a professional level • Bridge: “Expert-level” computer players exist (but no world champions yet!) • Poker: Computer team beat a human team, using statistical modeling and adaptation: http://www.cs.ualberta.ca/~games/poker/man-machine/ • Good places to learn more: • http://www.cs.ualberta.ca/~games/ • http://www.cs.unimass.nl/icga

Chinook • Chinook is the World Man-Machine Checkers Champion, developed by researchers at the University of Alberta. • It earned this title by competing in human tournaments, winning the right to play for the (human) world championship, and eventually defeating the best players in the world. • Visit http://www.cs.ualberta.ca/~chinook/ to play a version of Chinook over the Internet. • The developers report having fully analyzed (“solved”) the game of checkers: with perfect play by both sides the game is a draw, so the program can never lose. • “One Jump Ahead: Challenging Human Supremacy in Checkers” Jonathan Schaeffer, University of Alberta (496 pages, Springer. $34.95, 1998).

Ratings of Human and Computer Chess Champions

Typical Game Setting • 2-person game • Players alternate moves • Zero-sum: one player’s loss is the other’s gain • Perfect information: both players have access to complete information about the state of the game. No information is hidden from either player. • No chance (e.g., using dice) involved • Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim, Othello • Not: Bridge, Solitaire, Backgammon, ...

Let’s Play Nim! • Seven tokens (coins, sticks, whatever) • Each player must take either 1 or 2 tokens • Whoever takes the last token wins • You can go first…
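This version of Nim is small enough to search its game tree completely. The sketch below does so with a recursive look-ahead; the function names are assumptions, and memoization via `functools.lru_cache` is just a convenience to avoid re-solving the same position.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def player_to_move_wins(tokens):
    """True if the player about to move can force a win with `tokens` left."""
    if tokens == 0:
        return False        # the opponent just took the last token and won
    # A position is winning if some move leaves the opponent in a losing one.
    return any(not player_to_move_wins(tokens - take)
               for take in (1, 2) if take <= tokens)


def best_move(tokens):
    """Return a winning number of tokens to take, or None if every move loses."""
    for take in (1, 2):
        if take <= tokens and not player_to_move_wins(tokens - take):
            return take
    return None


for n in range(1, 8):
    print(n, player_to_move_wins(n), best_move(n))
```

With seven tokens the first player can force a win by taking one token, leaving six. The losing positions for the player to move turn out to be the multiples of three, so the winning strategy is always to leave the opponent a multiple of three.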
