Lecture 6: Adversarial Search & Games

Reading: Ch. 6, AIMA

Rutgers CS440, Fall 2003


Adversarial search

  • So far, single agent search – no opponents or collaborators

  • Multi-agent search:

    • Playing a game with an opponent: adversarial search

    • Economies: even more complex, societies of cooperative and non-cooperative agents

  • Game playing and AI:

    • Games can be complex and arguably require human intelligence

    • Play has to unfold in “real time”

    • Well-defined problems

    • Limited scope

Games and AI

[figure-only slide; the graphic is not in the transcript]


Games and search

  • Traditional search: a single agent searches for its own benefit, unobstructed

  • Games: search against an opponent

  • Consider a two player board game:

    • e.g., chess, checkers, tic-tac-toe

    • board configuration: unique arrangement of "pieces"

  • Representing a board game as a search problem (a sketch follows this list):

    • states: board configurations

    • operators: legal moves

    • initial state: current board configuration

    • goal state: winning/terminal board configuration
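
As a concrete sketch, here is tic-tac-toe encoded this way in Python (the string encoding and helper names are illustrative choices, not from the lecture):

    # State: a 9-character string read row by row; "." marks an empty square.
    INITIAL_STATE = "." * 9

    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
                 (0, 4, 8), (2, 4, 6)]              # diagonals

    def successors(state, player):
        # Operators: place `player`'s mark ("X" or "O") on any empty square.
        return [state[:i] + player + state[i + 1:]
                for i, c in enumerate(state) if c == "."]

    def is_goal(state, player):
        # Goal state: a winning (three-in-a-row) configuration for `player`.
        return any(all(state[i] == player for i in line)
                   for line in WIN_LINES)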

Wrong representation

[Figure: tic-tac-toe search tree expanding only the agent’s (X) moves]

  • We want to optimize our agent’s goal, so we build a search tree based on its possible moves/actions

  • Problem: this ignores the opponent’s moves

Better representation: game search tree

[Figure: tic-tac-toe game tree that alternates the agent’s (X) and the opponent’s (O) moves]

  • Include the opponent’s actions as well

[Figure annotations: levels alternate agent move, opponent move, agent move; one agent + opponent pair is a full move; utilities (e.g., 5, 10, 1) are assigned to goal nodes]

Game search trees

  • What is the size of the game search tree?

    • O(b^d)

    • Tic-tac-toe: 9! leaves (max depth = 9)

    • Chess: ~35 legal moves per position, average game “depth” ~100

      • b^d ≈ 35^100 ≈ 10^154 states, “only” ~10^40 legal states

  • Too deep for exhaustive search!


Utilities in search trees

  • Assign a utility to (terminal) states, describing how valuable they are to the agent

    • High utility – good for the agent

    • Low utility – good for the opponent

[Figure: at the root, A = 9 is the computer’s move level; B = -5, C = 9, D = 2, E = 3 are the opponent’s possible moves; below them, terminal states F = -7, G = -5, H = 3, I = 9, J = -6, K = 0, L = 2, M = 1, N = 3, O = 2 carry board evaluations from the agent’s perspective]

Search strategy

[Figure: the same tree with minimax values B = -7, C = -6, D = 0, E = 1 computed from the terminal states]

  • Worst-case scenario: assume the opponent will always make the best move (i.e., the worst move for us)

  • Minimax search: maximize the utility for our agent while expecting the opponent to play their best moves:

    • High utility favors the agent => choose the move with maximal utility

    • Low utility favors the opponent => assume the opponent makes the move with the lowest utility


Minimax algorithm

[Figure: step-by-step minimax propagation on the tree; the min level yields B = -5, C = -6, D = 0, E = 1, and the max level then picks A = 1 via move A1]

  • Start with the utilities of the terminal nodes

  • Propagate them up to the root by applying the minimax rule at each level (a sketch follows)
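
A minimal Python sketch of this recursion (the `game` object with successors/is_terminal/utility is an assumed interface, not code from the lecture):

    def minimax(state, game, maximizing=True):
        # Terminal node: return its utility from the agent's perspective.
        if game.is_terminal(state):
            return game.utility(state)
        values = [minimax(s, game, not maximizing)
                  for s in game.successors(state)]
        # Max levels pick the highest value, min levels the lowest.
        return max(values) if maximizing else min(values)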

Complexity of minimax algorithm

  • Utilities propagate up in a recursive fashion:

    • DFS

  • Space complexity:

    • O(bd) (DFS stores only one path plus siblings)

  • Time complexity:

    • O(b^d)

  • Problem: time complexity – in a game, there is only finite time to make a move

Reducing complexity of minimax (1)

  • Don’t search to full depth d, terminate early

  • Prune bad paths

  • Problem:

    • Don’t have utility of non-terminal nodes

  • Estimate utility for non-terminal nodes:

    • static board evaluation function (SBE) is a heuristic that assigns utility to non-terminal nodes

    • it reflects the computer’s chances of winning from that node

    • it must be easy to calculate from the board configuration

  • For example, Chess:

    SBE = α*materialBalance + β*centerControl + γ*…

    material balance = value of white pieces − value of black pieces (pawn = 1, rook = 5, queen = 9, etc.)
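
A toy Python sketch of such an SBE (the board encoding, feature set, and weights are illustrative assumptions, not the lecture's):

    PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}
    CENTER = {"d4", "d5", "e4", "e5"}

    def material_balance(board):
        # `board` maps squares ("e4") to piece codes; uppercase = white,
        # lowercase = black.  pawn = 1, rook = 5, queen = 9, etc.
        score = 0
        for piece in board.values():
            value = PIECE_VALUES.get(piece.upper(), 0)
            score += value if piece.isupper() else -value
        return score

    def center_control(board):
        # Crude feature: +1 per white piece on a center square, -1 per black.
        return sum(1 if board[sq].isupper() else -1
                   for sq in CENTER if sq in board)

    def sbe(board, alpha=1.0, beta=0.1):
        # Weighted sum of cheap-to-compute board features, as in the formula above.
        return alpha * material_balance(board) + beta * center_control(board)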

Minimax with Evaluation Functions

  • Same as general Minimax, except

    • only goes to depth m

    • estimates using SBE function

  • How would this algorithm perform at chess?

    • if could look ahead ~4 pairs of moves (i.e., 8 ply) would be consistently beaten by average players

    • if could look ahead ~8 pairs as done in a typical PC, is as good as human master

Reducing complexity of minimax (2)

  • Some branches of the tree will not be taken if the opponent plays cleverly. Can we detect them ahead of time?

  • Prune off paths that do not need to be explored

  • Alpha-beta pruning (sketched after this list)

  • While doing DFS of the game tree, keep track of:

    • maximizing level: alpha

      • highest value seen so far

      • lower bound on node's evaluation/score

    • minimizing level: beta

      • lowest value seen so far

      • upper bound on node's evaluation/score
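
A depth-limited Python sketch (same assumed `game` interface as the minimax sketch; an SBE like the one above estimates utility when the depth limit is reached):

    def alphabeta(state, game, depth,
                  alpha=float("-inf"), beta=float("inf"), maximizing=True):
        if game.is_terminal(state):
            return game.utility(state)
        if depth == 0:
            return sbe(state)              # estimate non-terminal utility
        if maximizing:
            value = float("-inf")
            for s in game.successors(state):
                value = max(value, alphabeta(s, game, depth - 1,
                                             alpha, beta, False))
                alpha = max(alpha, value)  # raise the lower bound
                if beta <= alpha:
                    break                  # beta cut-off
            return value
        value = float("inf")
        for s in game.successors(state):
            value = min(value, alphabeta(s, game, depth - 1,
                                         alpha, beta, True))
            beta = min(beta, value)        # lower the upper bound
            if beta <= alpha:
                break                      # alpha cut-off
        return value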

Alpha-Beta Example

minimax(A,0,4) — expand root A at the max level (α = −∞); call stack: A

[Figure: example tree, depth limit 4, with terminal values D = 0, G = -5, H = 3, I = 8, L = 2, N = 4, P = 9, Q = -6, R = 0, S = 3, T = 5, U = -7, V = -9, W = -3, X = -5]


Alpha-Beta Example

minimax(B,1,4) — expand B at the min level (β = +∞); call stack: B, A


Alpha-Beta Example

minimax(F,2,4) — expand F at the max level (α = −∞); call stack: F, B, A


Alpha-Beta Example

minimax(N,3,4) — N = 4 is a terminal state; call stack: N, F, B, A


Alpha-Beta Example

minimax(F,2,4) is returned to: α = 4, the maximum seen so far at F


Alpha-Beta Example

minimax(O,3,4) — expand O at the min level (β = +∞); call stack: O, F, B, A


Alpha-Beta Example

minimax(W,4,4) — W = -3 is a terminal state (depth limit reached); call stack: W, O, F, B, A


Alpha-Beta Example

minimax(O,3,4) is returned to: β = -3, the minimum seen so far at O


Alpha-Beta Example

minimax(O,3,4) is returned to: O’s β ≤ F’s α, so stop expanding O (alpha cut-off); child X = -5 is never examined


Alpha-Beta Example

Why? A smart opponent will choose W or worse, so O’s upper bound is -3; the computer shouldn’t choose O (≤ -3) since N (4) is better


Alpha-Beta Example

minimax(F,2,4) is returned to: α not changed (maximizing level, and -3 < 4); F’s value stays 4


Alpha-Beta Example

minimax(B,1,4) is returned to: β = 4, the minimum seen so far at B


Effectiveness of Alpha-Beta Search

  • Effectiveness depends on the order in which successors are examined; pruning is more effective if the best successors are examined first

  • Worst Case:

    • ordered so that no pruning takes place

    • no improvement over exhaustive search

  • Best Case:

    • each player’s best move is evaluated first (left-most)

  • In practice, performance is closer to the best case than to the worst case

Effectiveness of Alpha-Beta Search

  • In practice we often get O(b^(d/2)) rather than O(b^d)

    • same as having a branching factor of √b,

      since (√b)^d = b^(d/2)

  • For Example: Chess

    • effectively goes from b ≈ 35 to b ≈ 6 (since √35 ≈ 6)

    • permits much deeper search for the same time

    • makes computer chess competitive with humans

Dealing with Limited Time

  • In real games, there is usually a time limit T on making a move

  • How do we take this into account?

    • we cannot stop alpha-beta midway and expect to use its results with any confidence

    • so, we could set a conservative depth-limit that guarantees we will find a move in time < T

    • but then the search may finish early, and the opportunity to search deeper is wasted

Dealing with Limited Time

  • In practice, iterative deepening search (IDS) is used

    • run alpha-beta search with an increasing depth limit (a sketch follows this list)

    • when the clock runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed)
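
A simplified Python sketch (it reuses the assumed `alphabeta`/`game` interface from earlier and checks the clock only between depths, whereas a real engine would also abort mid-search):

    import time

    def ids_move(state, game, time_limit):
        deadline = time.monotonic() + time_limit
        best_move, depth = None, 1
        while time.monotonic() < deadline:
            # Complete a full alpha-beta search to this depth, then deepen.
            best_move = max(game.successors(state),
                            key=lambda s: alphabeta(s, game, depth - 1,
                                                    maximizing=False))
            depth += 1
        return best_move   # the move from the last completed depth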

The Horizon Effect

  • Sometimes disaster lurks just beyond search depth

    • e.g., the computer captures the queen, but a few moves later the opponent checkmates (i.e., wins)

  • The computer has a limited horizon; it cannot see that this significant event could happen

  • How do you avoid catastrophic losses due to “short-sightedness”?

    • quiescence search

    • secondary search

The Horizon Effect

  • Quiescence Search (sketched after this list)

    • when the evaluation is changing rapidly, look deeper than the limit

    • look for a point when the game “quiets down”

  • Secondary Search

    • find best move looking to depth d

    • look k steps beyond to verify that it still looks good

    • if it doesn’t, repeat the k-step check for the next-best move
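
A minimal quiescence sketch in Python, written in the negamax convention for brevity (`noisy_moves`, a generator of volatile successors such as captures, and a side-to-move SBE are assumptions):

    def quiescence(state, game, alpha, beta):
        stand_pat = sbe(state)          # score if we stop searching here
        if stand_pat >= beta:
            return beta
        alpha = max(alpha, stand_pat)
        # Keep expanding only "noisy" moves until the game quiets down.
        for s in game.noisy_moves(state):
            score = -quiescence(s, game, -beta, -alpha)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha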

Book Moves

  • Build a database of opening moves, end games, and studied configurations

  • If the current state is in the database, use the database (see the sketch below):

    • to determine the next move

    • to evaluate the board

  • Otherwise, do alpha-beta search
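
A minimal lookup-then-search sketch (the `BOOK` table and `hash_state` key function are illustrative assumptions; real engines key positions with e.g. Zobrist hashing):

    BOOK = {}   # precomputed positions -> best move (openings, endgames)

    def hash_state(state):
        return state                    # placeholder key for illustration

    def choose_move(state, game, depth):
        key = hash_state(state)
        if key in BOOK:
            return BOOK[key]            # database determines the next move
        # Otherwise, fall back to alpha-beta search.
        return max(game.successors(state),
                   key=lambda s: alphabeta(s, game, depth - 1,
                                           maximizing=False))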

Examples of Algorithms which Learn to Play Well

Checkers:

A. L. Samuel, “Some Studies in Machine Learning Using the Game of Checkers,” IBM Journal of Research and Development, 3(3):210-229, 1959

  • Learned by playing a copy of itself thousands of times

  • Used only an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz

  • Successful enough to compete well at human tournaments

Examples of Algorithms which Learn to Play Well

Backgammon:

G. Tesauro and T. J. Sejnowski, “A Parallel Network that Learns to Play Backgammon,” Artificial Intelligence, 39(3):357-390, 1989

  • Also learns by playing copies of itself

  • Uses a non-linear evaluation function - a neural network

  • Rated one of the top three players in the world

Non-deterministic Games

  • Some games involve chance, for example:

    • roll of dice

    • spin of game wheel

    • deal of cards from shuffled deck

  • How can we handle games with random elements?

  • The game tree representation is extended to include chance nodes:

    • agent moves

    • chance nodes

    • opponent moves

Non-deterministic Games

The game tree representation is extended:

[Figure: a max node A over two 50/50 chance nodes (branch probabilities .5 each), over min nodes B(β=2), C(β=6), D(β=0), E(β=-4), over terminals 7, 2, 9, 6, 5, 0, 8, -4]

Non-deterministic Games

  • Weight each score by the probability that the move occurs

  • Use the expected value for the move: the probability-weighted sum of the possible random outcomes (a sketch follows the figure)

[Figure: the chance nodes now take expected values 50/50 → 4 (= .5·2 + .5·6) and 50/50 → -2 (= .5·0 + .5·(-4))]
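
A sketch of the resulting expectiminimax recursion (the node-type tagging and `game.outcomes`, returning (probability, state) pairs, are assumed interfaces):

    def expectiminimax(state, game):
        if game.is_terminal(state):
            return game.utility(state)
        kind = game.node_type(state)    # "max", "min", or "chance"
        if kind == "chance":
            # Expected value: weight each random outcome by its probability.
            return sum(p * expectiminimax(s, game)
                       for p, s in game.outcomes(state))
        values = [expectiminimax(s, game) for s in game.successors(state)]
        return max(values) if kind == "max" else min(values)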

Non-deterministic Games

  • Choose the move with the highest expected value

[Figure: the max node takes the larger expected value, A = 4]

Non-deterministic Games

  • Non-determinism increases branching factor

    • 21 possible rolls with 2 dice

  • The value of lookahead diminishes: as depth increases, the probability of reaching a given node decreases

  • alpha-beta pruning is less effective

  • TD-Gammon:

    • depth-2 search

    • very good heuristic

    • plays at world champion level

Computers can play Grandmaster Chess

“Deep Blue” (IBM)

  • Parallel processor, 32 nodes

  • Each node has 8 dedicated VLSI “chess chips”

  • Can search 200 million configurations/second

  • Uses minimax, alpha-beta, sophisticated heuristics

  • It currently can search to 14 ply (i.e., 7 pairs of moves)

  • Can avoid the horizon effect by searching as deep as 40 ply

  • Uses book moves

Computers can play Grandmaster Chess

Kasparov vs. Deep Blue, May 1997

  • 6 game full-regulation chess match sponsored by ACM

  • Kasparov lost the match 2.5 to 3.5: 1 win and 3 draws against Deep Blue’s 2 wins and 3 draws

  • This was a historic achievement for computer chess: the first time a computer became the best chess player on the planet

  • Note that Deep Blue plays by “brute force” (i.e., raw power from computer speed and memory); it uses relatively little that is similar to human intuition and cleverness

Status of Computersin Other Deterministic Games

  • Checkers/Draughts

    • current world champion is Chinook

    • can beat any human (it beat Tinsley in 1994)

    • uses alpha-beta search and book moves, plus an endgame database of > 443 billion positions

  • Othello

    • computers can easily beat the world experts

  • Go

    • branching factor b ~ 360 (very large!)

    • $2 million prize for any system that can beat a world expert
