
Lecture 6: Adversarial Search & Games

Reading: Ch. 6, AIMA

Rutgers CS440, Fall 2003

Adversarial search
  • So far, single agent search – no opponents or collaborators
  • Multi-agent search:
    • Playing a game with an opponent: adversarial search
    • Economies: even more complex, societies of cooperative and non-cooperative agents
  • Game playing and AI:
    • Games can be complex, and (arguably) require human intelligence
    • Have to be played in “real time”
    • Well-defined problems
    • Limited scope


Games and AI


Games and search
  • Traditional search: single agent, searches for its well-being, unobstructed
  • Games: search against an opponent
  • Consider a two player board game:
    • e.g., chess, checkers, tic-tac-toe
    • board configuration: unique arrangement of "pieces"
  • Representing board games as a search problem:
    • states: board configurations
    • operators: legal moves
    • initial state: current board configuration
    • goal state: winning/terminal board configuration


Wrong representation
  • We want to optimize our (agent’s) goal, hence build a search tree based on possible moves/actions
  • Problem: discounts the opponent

[Figure: tic-tac-toe search tree that expands only the agent’s (X’s) own moves]


Better representation: game search tree
  • Include opponent’s actions as well

[Figure: tic-tac-toe game tree alternating agent moves and opponent moves; one agent move plus one opponent move makes a full move; utilities (e.g., 5, 10, 1) are assigned to the goal nodes]

Game search trees
  • What is the size of the game search tree?
    • O(b^d)
    • Tic-tac-toe: 9! = 362,880 leaves (max depth = 9)
    • Chess: ~35 legal moves per position, average game “depth” ~100
      • b^d ≈ 35^100 ≈ 10^154 states, though “only” ~10^40 legal states
  • Too deep for exhaustive search!


Utilities in search trees
  • Assign a utility to each (terminal) state, describing how much it is valued by the agent
    • High utility – good for the agent
    • Low utility – good for the opponent

[Figure: game tree with the computer’s possible moves at root A (value 9), the opponent’s possible moves at B = -5, C = 9, D = 2, E = 3, and terminal states F = -7, G = -5, H = 3, I = 9, J = -6, K = 0, L = 2, M = 1, N = 3, O = 2; board evaluations are from the agent’s perspective]

Search strategy
  • Worst-case scenario: assume the opponent will always make his best move (i.e., the worst move for us)
  • Minimax search: maximize the utility for our agent while expecting the opponent to play his best moves:
    • High utility favors the agent => choose the move with maximal utility
    • Low utility favors the opponent => assume the opponent makes the move with the lowest utility

[Figure: the computer’s possible moves from root A lead to B..E; the opponent’s possible moves lead to terminal states F = -7, G = -5, H = 3, I = 9, J = -6, K = 0, L = 2, M = 1, N = 3, O = 2; backed-up values are B = -7, C = -6, D = 0, E = 1]

Minimax algorithm
  • Start with the utilities of terminal nodes
  • Propagate them back to the root node by choosing the minimax strategy, as in the sketch below

[Figure: minimax propagation on the example tree; min nodes take the minimum of their children (e.g., C = -6, D = 0, E = 1), and the max node A takes the maximum of its children, A = 1]
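To make the recursion concrete, here is a minimal Python sketch. The `game` object, with `is_terminal`, `utility`, and `successors` methods, is an assumed interface for illustration, not something defined in the lecture.

```python
# Minimax sketch. Assumed interface: game.is_terminal(state),
# game.utility(state) (from the agent's perspective), game.successors(state).

def minimax(state, game, maximizing):
    """Return the minimax value of `state` for the maximizing agent."""
    if game.is_terminal(state):
        return game.utility(state)        # utilities start at terminal nodes
    values = [minimax(s, game, not maximizing)
              for s in game.successors(state)]
    # Max levels pick the agent's best move; min levels assume the
    # opponent picks the move with the lowest utility for the agent.
    return max(values) if maximizing else min(values)
```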


Complexity of minimax algorithm
  • Utilities propagate up in a recursive fashion:
    • DFS
  • Space complexity:
    • O(b·d)
  • Time complexity:
    • O(b^d)
  • Problem: time complexity – in a game there is only finite time to make a move


Reducing complexity of minimax (1)
  • Don’t search to full depth d, terminate early
  • Prune bad paths
  • Problem:
    • Don’t have utility of non-terminal nodes
  • Estimate utility for non-terminal nodes:
    • static board evaluation function (SBE) is a heuristic that assigns utility to non-terminal nodes
    • it reflects the computer’s chances of winning from that node
    • it must be easy to calculate from board configuration
  • For example, chess:

SBE = α * materialBalance + β * centerControl + γ * …

material balance = value of white pieces − value of black pieces
(pawn = 1, rook = 5, queen = 9, etc.)
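As an illustration, a toy SBE along these lines might look as follows. The board encoding (a dict from squares like 'e4' to piece letters, uppercase for white), the piece values, the feature weights, and `center_control` as a second feature are all assumptions of the sketch:

```python
# Toy static board evaluation (SBE) for chess. Assumed board encoding:
# a dict mapping squares ('e4') to piece letters ('P' = white pawn,
# 'q' = black queen, ...).

PIECE_VALUE = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9, 'K': 0}

def material_balance(board):
    """Value of white pieces minus value of black pieces."""
    return sum(PIECE_VALUE.get(p.upper(), 0) * (1 if p.isupper() else -1)
               for p in board.values())

def center_control(board):
    """+1 for each white piece on a center square, -1 for each black one."""
    return sum(1 if board[sq].isupper() else -1
               for sq in ('d4', 'd5', 'e4', 'e5') if sq in board)

def sbe(board, w_material=1.0, w_center=0.5):
    # Cheap weighted sum of features (the slide's alpha, beta weights),
    # easy to compute from the board configuration alone.
    return w_material * material_balance(board) + w_center * center_control(board)
```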


Minimax with Evaluation Functions
  • Same as general minimax, except:
    • only goes to depth m
    • estimates non-terminal values using the SBE function (see the sketch after this list)
  • How would this algorithm perform at chess?
    • if it could look ahead ~4 pairs of moves (i.e., 8 ply), it would be consistently beaten by average players
    • if it could look ahead ~8 pairs, as is feasible on a typical PC, it plays as well as a human master
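A sketch of this depth-limited variant, reusing the assumed `game` interface and the toy `sbe` from above:

```python
def minimax_sbe(state, game, depth, maximizing):
    """Depth-limited minimax: estimate non-terminal leaves with the SBE."""
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return sbe(state)       # heuristic estimate instead of deeper search
    values = [minimax_sbe(s, game, depth - 1, not maximizing)
              for s in game.successors(state)]
    return max(values) if maximizing else min(values)
```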


Reducing complexity of minimax (2)
  • Some branches of the tree will not be taken if the opponent plays cleverly. Can we detect them ahead of time?
  • Prune off paths that do not need to be explored
  • Alpha-beta pruning (sketched in code below)
  • While doing DFS of the game tree, keep track of:
    • at maximizing levels: alpha
      • highest value seen so far
      • lower bound on the node's evaluation/score
    • at minimizing levels: beta
      • lowest value seen so far
      • upper bound on the node's evaluation/score
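Here is an alpha-beta sketch using the same assumed interface as the earlier minimax code; alpha and beta carry the bounds described above down the tree:

```python
# Alpha-beta pruning sketch.
# alpha: best value the maximizer can guarantee so far (lower bound).
# beta:  best value the minimizer can guarantee so far (upper bound).

def alphabeta(state, game, depth, alpha, beta, maximizing):
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return sbe(state)
    if maximizing:
        value = float('-inf')
        for s in game.successors(state):
            value = max(value, alphabeta(s, game, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:    # beta cut-off: the min ancestor won't allow this
                break
        return value
    else:
        value = float('inf')
        for s in game.successors(state):
            value = min(value, alphabeta(s, game, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:    # alpha cut-off: the max ancestor won't allow this
                break
        return value
```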


Alpha-Beta Example

minimax(A, 0, 4): start at the max root A (alpha not yet set); call stack: A

[Figure: the example tree used in the trace; max levels at even depths, min levels at odd depths, with terminal values D = 0, G = -5, H = 3, I = 8, L = 2, N = 4, P = 9, Q = -6, R = 0, S = 3, T = 5, U = -7, V = -9, W = -3, X = -5]

Alpha-Beta Example

minimax(B, 1, 4): expand min node B (beta not yet set); call stack: B, A

Alpha-Beta Example

minimax(F, 2, 4): expand max node F (alpha not yet set); call stack: F, B, A

Alpha-Beta Example

minimax(N, 3, 4): N is a terminal state with value 4; call stack: N, F, B, A

Alpha-Beta Example

minimax(F, 2, 4) is returned to

alpha = 4, the maximum seen so far at F

Alpha-Beta Example

minimax(O, 3, 4): expand min node O (beta not yet set); call stack: O, F, B, A

Alpha-Beta Example

minimax(W, 4, 4): W is a terminal state (depth limit) with value -3; call stack: W, O, F, B, A

Alpha-Beta Example

minimax(O, 3, 4) is returned to

beta = -3, the minimum seen so far at O

Alpha-Beta Example

minimax(O, 3, 4) is returned to

O's beta ≤ F's alpha (-3 ≤ 4): stop expanding O (alpha cut-off), so O's remaining child X is never examined

Alpha-Beta Example

Why? A smart opponent at O will choose W or worse, so O's upper bound is -3.

So the computer shouldn't choose O (worth at most -3), since N (worth 4) is better.

Alpha-Beta Example

minimax(F, 2, 4) is returned to

alpha not changed (maximizing level, and -3 < 4), so F's value stays 4

Alpha-Beta Example

minimax(B, 1, 4) is returned to

beta = 4, the minimum seen so far at B

Effectiveness of Alpha-Beta Search
  • Effectiveness depends on the order in which successors are examined; it is more effective if the best successors are examined first
  • Worst case:
    • ordered so that no pruning takes place
    • no improvement over exhaustive search
  • Best case:
    • each player's best move is evaluated first (left-most)
  • In practice, performance is closer to the best case than to the worst case


Effectiveness of Alpha-Beta Search
  • In practice we often get O(b^(d/2)) rather than O(b^d)
    • same as having a branching factor of √b, since (√b)^d = b^(d/2)
  • For example, chess:
    • effective branching factor goes from b ~ 35 to √b ~ 6
    • permits much deeper search in the same time
    • makes computer chess competitive with humans


Dealing with Limited Time
  • In real games, there is usually a time limit T on making a move
  • How do we take this into account?
    • we cannot stop alpha-beta midway and expect to use its results with any confidence
    • so, we could set a conservative depth limit that guarantees we will find a move in time < T
    • but then the search may finish early, and the opportunity to do more search is wasted


Dealing with Limited Time
  • In practice, iterative deepening search (IDS) is used
    • run alpha-beta search with an increasing depth limit
    • when the clock runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed), as in the sketch below
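A sketch of such a driver, reusing the `alphabeta` sketch from earlier. The `best_move_at_depth` helper and the timing policy are illustrative assumptions; a real implementation would also poll the clock inside the search so it can abort mid-depth rather than overrun.

```python
import time

def best_move_at_depth(state, game, depth):
    """Pick the successor with the highest alpha-beta value at this depth."""
    return max(game.successors(state),
               key=lambda s: alphabeta(s, game, depth - 1,
                                       float('-inf'), float('inf'), False))

def ids_move(state, game, time_limit):
    """Iterative deepening: keep the move from the deepest completed search."""
    deadline = time.monotonic() + time_limit
    best, depth = None, 1
    while True:
        move = best_move_at_depth(state, game, depth)
        if time.monotonic() >= deadline:
            break                 # out of time: discard the unfinished depth
        best, depth = move, depth + 1   # deepest fully completed search so far
    return best
```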


The Horizon Effect
  • Sometimes disaster lurks just beyond the search depth
    • e.g., the computer captures the queen, but a few moves later the opponent checkmates (i.e., wins)
  • The computer has a limited horizon; it cannot see that this significant event could happen
  • How do you avoid catastrophic losses due to such "short-sightedness"?
    • quiescence search
    • secondary search


The Horizon Effect
  • Quiescence search (sketched below)
    • when the evaluation is changing rapidly, look deeper than the depth limit
    • look for a point where the game "quiets down"
  • Secondary search
    1. find the best move looking to depth d
    2. look k steps beyond to verify that it still looks good
    3. if it doesn't, repeat step 2 for the next best move
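One way to sketch quiescence search, keeping the minimax style of the earlier code. Here `is_quiet` (no pending captures or similar disruptive moves) and `noisy_successors` are hypothetical helpers, and `sbe` is the toy evaluation from before:

```python
def quiescence(state, game, maximizing):
    """Past the depth limit, search 'noisy' moves until the game quiets down."""
    if game.is_terminal(state):
        return game.utility(state)
    if is_quiet(state):                 # evaluation is stable: trust the SBE
        return sbe(state)
    values = [quiescence(s, game, not maximizing)
              for s in noisy_successors(state, game)]   # captures, checks, ...
    # Fall back to the SBE if no noisy moves exist from this position.
    return (max(values) if maximizing else min(values)) if values else sbe(state)
```

In the depth-limited sketches above, the `return sbe(state)` line at depth 0 would be replaced by a call to `quiescence`.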


Book Moves
  • Build a database of opening moves, end games, and studied configurations
  • If the current state is in the database, use the database:
    • to determine the next move
    • to evaluate the board
  • Otherwise, do alpha-beta search (see the sketch below)
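In code this is just a table lookup before falling back to search; the `OPENING_BOOK` dict, the `state_key` encoding, and the reuse of `best_move_at_depth` from the IDS sketch are all assumptions for illustration:

```python
OPENING_BOOK = {}   # hypothetical: maps a canonical board key to a stored move

def state_key(state):
    """Canonical, hashable encoding of the board configuration (assumed)."""
    return str(state)

def choose_move(state, game, depth):
    key = state_key(state)
    if key in OPENING_BOOK:
        return OPENING_BOOK[key]          # database determines the next move
    return best_move_at_depth(state, game, depth)    # otherwise, alpha-beta
```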


Examples of Algorithms which Learn to Play Well

Checkers:

A. L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," IBM Journal of Research and Development, 3(3):210-229, 1959

  • Learned by playing a copy of itself thousands of times
  • Used only an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz
  • Successful enough to compete well in human tournaments


Examples of Algorithms which Learn to Play Well

Backgammon:

G. Tesauro and T. J. Sejnowski, "A Parallel Network that Learns to Play Backgammon," Artificial Intelligence, 39(3):357-390, 1989

  • Also learns by playing copies of itself
  • Uses a non-linear evaluation function – a neural network
  • Rated one of the top three players in the world


Non-deterministic Games
  • Some games involve chance, for example:
    • roll of dice
    • spin of game wheel
    • deal of cards from shuffled deck
  • How can we handle games with random elements?
  • The game tree representation is extended to include chance nodes:
    • agent moves
    • chance nodes
    • opponent moves


Non-deterministic Games

The game tree representation is extended with chance nodes between the max and min levels:

[Figure: max node A over two 50/50 chance nodes; with probability .5 each, the left chance node leads to min nodes B (β = 2, from leaves 7 and 2) and C (β = 6, from leaves 9 and 6), and the right one leads to D (β = 0, from leaves 5 and 0) and E (β = -4, from leaves 8 and -4)]

Non-deterministic Games
  • Weight each score by the probability that the chance outcome occurs
  • Use the expected value of a move: the probability-weighted sum over the possible random outcomes (see the sketch below)
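A sketch of the resulting recursion (often called expectiminimax); the chance-node interface here, `game.node_type` and `game.outcomes` returning (probability, state) pairs, is an assumption layered onto the earlier sketches:

```python
def expectiminimax(state, game):
    """Minimax extended with chance nodes that average over random outcomes."""
    if game.is_terminal(state):
        return game.utility(state)
    kind = game.node_type(state)          # 'max', 'min', or 'chance' (assumed)
    if kind == 'chance':
        # Expected value: weight each random outcome by its probability.
        return sum(p * expectiminimax(s, game)
                   for p, s in game.outcomes(state))
    values = [expectiminimax(s, game) for s in game.successors(state)]
    return max(values) if kind == 'max' else min(values)
```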

[Figure: the same tree with expected values filled in at the chance nodes: left 50/50 node = .5·2 + .5·6 = 4, right 50/50 node = .5·0 + .5·(-4) = -2]

Non-deterministic Games
  • Choose the move with the highest expected value

[Figure: the root max node takes the maximum over the chance nodes: A = max(4, -2) = 4]

Non-deterministic Games
  • Non-determinism increases the branching factor
    • e.g., 21 distinct rolls with 2 dice
  • The value of lookahead diminishes: as depth increases, the probability of reaching a given node decreases
  • Alpha-beta pruning is less effective
  • TD-Gammon:
    • depth-2 search
    • very good heuristic
    • plays at world-champion level


Computers Can Play Grandmaster Chess

"Deep Blue" (IBM)

  • Parallel processor, 32 nodes
  • Each node has 8 dedicated VLSI "chess chips"
  • Can search 200 million configurations/second
  • Uses minimax, alpha-beta, and sophisticated heuristics
  • Can currently search to 14 ply (i.e., 7 pairs of moves)
  • Can avoid the horizon effect by searching some lines as deep as 40 ply
  • Uses book moves


Computers Can Play Grandmaster Chess

Kasparov vs. Deep Blue, May 1997

  • 6-game full-regulation chess match sponsored by ACM
  • Kasparov lost the match 2.5 to 3.5 (1 win and 3 draws against Deep Blue's 2 wins and 3 draws)
  • This was a historic achievement for computer chess: the first time a computer became the best chess player on the planet
  • Note that Deep Blue plays by "brute force" (i.e., raw power from computer speed and memory); it uses relatively little that resembles human intuition and cleverness


Status of Computers in Other Deterministic Games
  • Checkers/Draughts
    • current world champion is Chinook
    • can beat any human (beat Tinsley in 1994)
    • uses alpha-beta search and book moves (a database of > 443 billion positions)
  • Othello
    • computers can easily beat the world experts
  • Go
    • branching factor b ~ 360 (very large!)
    • $2 million prize for any system that can beat a world expert

