- 296 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'Lecture 6: Adversarial Search & Games' - Patman

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Adversarial search

- So far, single agent search – no opponents or collaborators
- Multi-agent search:
- Playing a game with an opponent: adversarial search
- Economies: even more complex, societies of cooperative and non-cooperative agents
- Game playing and AI:
- Games can be complex, require (?) human intelligence
- Have to evolve in “real-time”
- Well-defined problems
- Limited scope

Rutgers CS440, Fall 2003

Games and AI

Rutgers CS440, Fall 2003

Games and search

- Traditional search: single agent, searches for its well-being, unobstructed
- Games: search against an opponent
- Consider a two player board game:
- e.g., chess, checkers, tic-tac-toe
- board configuration: unique arrangement of "pieces"
- Representing board games as search problem:
- states: board configurations
- operators: legal moves
- initial state: current board configuration
- goal state: winning/terminal board configuration

Rutgers CS440, Fall 2003

X

X

X

X

X

X

X

X

X

O

O

O

X

X

O

X

Wrong representation- We want to optimize our (agent’s) goal, hence build a search tree based on possible moves/actions
- Problem: discounts the opponent

Rutgers CS440, Fall 2003

X

X

X

X

X

O

X

X

X

X

X

O

O

O

O

O

X

X

O

X

X

X

O

X

X

O

Better representation: game search tree- Include opponent’s actions as well

Agent move

Full

move

Opponent move

Agent move

5

10

1

Utilities

(assigned to goal nodes)

Rutgers CS440, Fall 2003

Game search trees

- What is the size of the game search trees?
- O(bd)
- Tic-tac-toe: 9! leaves (max depth= 9)
- Chess: 35 legal moves, average “depth” 100
- bd ~ 35100 ~10154 states, “only” ~1040 legal states
- Too deep for exhaustive search!

Rutgers CS440, Fall 2003

-7

G

-5

H

3

I

9

J

-6

K

0

L

2

M

1

N

3

O

2

Utilities in search trees- Assign utility to (terminal) states, describing how much they are valued for the agent
- High utility – good for the agent
- Low utility – good for the opponent

computer's possible moves

A

9

opponent'spossible moves

B

-5

C

9

D

2

E

3

terminal states

board evaluation from agent's perspective

Rutgers CS440, Fall 2003

-7

C

-6

D

0

E

1

B

C

D

E

F

-7

G

-5

H

3

I

9

J

-6

K

0

L

2

M

1

N

3

O

2

Search strategy- Worst-case scenario: assume the opponent will always make a best move (i.e., worst move for us)
- Minimax search: maximize the utility for our agent while expecting that the opponent plays his best moves:
- High utility favors agent => chose move with maximal utility
- Low move favors opponent => assume opponent makes the move with lowest utility

A1

A

computer's possible moves

opponent'spossible moves

terminal states

Rutgers CS440, Fall 2003

-5

B

-5

B

C

-6

C

-6

C

D

0

D

0

D

E

E

1

E

1

A

A1

A

A

A

A

max

min

B

B

B

C

C

C

D

D

D

E

E

E

F

-7

F

-7

F

-7

G

-5

G

-5

G

-5

H

3

H

3

H

3

I

9

I

9

I

9

J

-6

J

-6

J

-6

K

0

K

0

K

0

L

2

L

2

L

2

M

1

M

1

M

1

N

3

N

3

N

3

O

2

O

2

O

2

Minimax algorithm- Start with utilities of terminal nodes
- Propagate them back to root node by choosing the minimax strategy

Rutgers CS440, Fall 2003

Complexity of minimax algorithm

- Utilities propagate up in a recursive fashion:
- DFS
- Space complexity:
- O(bd)
- Time complexity:
- O(bd)
- Problem: time complexity – it’s a game, finite time to make a move

Rutgers CS440, Fall 2003

Reducing complexity of minimax (1)

- Don’t search to full depth d, terminate early
- Prune bad paths
- Problem:
- Don’t have utility of non-terminal nodes
- Estimate utility for non-terminal nodes:
- static board evaluation function (SBE) is a heuristic that assigns utility to non-terminal nodes
- it reflects the computer’s chances of winning from that node
- it must be easy to calculate from board configuration
- For example, Chess:

SBE = α* materialBalance + β* centerControl + γ* …

material balance = Value of white pieces - Value of black piecespawn = 1, rook = 5, queen = 9, etc.

Rutgers CS440, Fall 2003

Minimax with Evaluation Functions

- Same as general Minimax, except
- only goes to depth m
- estimates using SBE function
- How would this algorithm perform at chess?
- if could look ahead ~4 pairs of moves (i.e., 8 ply) would be consistently beaten by average players
- if could look ahead ~8 pairs as done in a typical PC, is as good as human master

Rutgers CS440, Fall 2003

Reducing complexity of minimax (2)

- Some branches of the tree will not be taken if the opponent plays cleverly. Can we detect them ahead of time?
- Prune off paths that do not need to be explored
- Alpha-beta pruning
- Keep track of while doing DFS of game tree:
- maximizing level: alpha
- highest value seen so far
- lower bound on node's evaluation/score
- minimizing level: beta
- lowest value seen so far
- higher bound on node's evaluation/score

Rutgers CS440, Fall 2003

Alpha-Beta Example

minimax(A,0,4)

CallStack

max

A

α=

A

A

B

C

D

0

E

F

G

-5

H

3

I

8

J

K

L

2

M

N

4

O

P

9

Q

-6

R

0

S

3

T

5

U

-7

V

-9

A

W

-3

X

-5

Rutgers CS440, Fall 2003

Alpha-Beta Example

minimax(B,1,4)

CallStack

max

A

α=

min

B

B

Bβ=

C

D

0

E

F

G

-5

H

3

I

8

J

K

L

2

M

N

4

O

P

9

Q

-6

R

0

S

3

T

5

U

-7

V

-9

B

A

W

-3

X

-5

Rutgers CS440, Fall 2003

Alpha-Beta Example

minimax(F,2,4)

CallStack

max

A

α=

Bβ=

C

D

0

E

min

max

F

Fα=

F

G

-5

H

3

I

8

J

K

L

2

M

N

4

O

P

9

Q

-6

R

0

S

3

T

5

U

-7

V

-9

F

B

A

W

-3

X

-5

Rutgers CS440, Fall 2003

Alpha-Beta Example

minimax(N,3,4)

max

CallStack

A

α=

min

Bβ=

C

D

0

E

max

Fα=

G

-5

H

3

I

8

J

K

L

2

M

N

N

4

N

4

O

P

9

Q

-6

R

0

S

3

T

5

U

-7

V

-9

F

B

A

W

-3

X

-5

blue: terminal state

Rutgers CS440, Fall 2003

Alpha-Beta Example

minimax(F,2,4) is returned to

alpha = 4, maximum seen so far

CallStack

A

α=

max

Bβ=

C

D

0

E

min

Fα=4

Fα=

G

-5

H

3

I

8

J

K

L

2

M

max

N

4

O

P

9

Q

-6

R

0

S

3

T

5

U

-7

V

-9

F

B

A

W

-3

X

-5

blue: terminal state

Rutgers CS440, Fall 2003

Alpha-Beta Example

minimax(O,3,4)

CallStack

A

α=

max

Bβ=

C

D

0

E

min

Fα=4

G

-5

H

3

I

8

J

K

L

2

M

max

O

N

4

Oβ=

O

O

P

9

Q

-6

R

0

S

3

T

5

U

-7

V

-9

min

F

B

A

W

-3

X

-5

blue: terminal state

Rutgers CS440, Fall 2003

Alpha-Beta Example

minimax(W,4,4)

CallStack

max

A

α=

min

Bβ=

C

D

0

E

Fα=4

G

-5

H

3

I

8

J

K

L

2

M

max

W

O

N

4

Oβ=

P

9

Q

-6

R

0

S

3

T

5

U

-7

V

-9

min

F

B

A

W

-3

W

-3

X

-5

blue: terminal state

blue: terminal state (depth limit)

Rutgers CS440, Fall 2003

Alpha-Beta Example

minimax(O,3,4) is returned to

beta = -3, minimum seen so far

CallStack

max

A

α=

Bβ=

C

D

0

E

min

Fα=4

G

-5

H

3

I

8

J

K

L

2

M

max

O

N

4

Oβ=-3

Oβ=

P

9

Q

-6

R

0

S

3

T

5

U

-7

V

-9

min

F

B

A

W

-3

X

-5

blue: terminal state

Rutgers CS440, Fall 2003

Alpha-Beta Example

minimax(O,3,4) is returned to

O's beta F's alpha: stop expanding O (alpha cut-off)

CallStack

A

α=

max

Bβ=

C

D

0

E

min

Fα=4

G

-5

H

3

I

8

J

K

L

2

M

max

O

N

4

Oβ=-3

P

9

Q

-6

R

0

S

3

T

5

U

-7

V

-9

min

F

B

A

W

-3

X

-5

X

-5

blue: terminal state

Rutgers CS440, Fall 2003

Alpha-Beta Example

Why? Smart opponent will choose W or worse, thus O's upper bound is –3

So computer shouldn't choose O:-3 since N:4 is better

CallStack

A

α=

max

Bβ=

C

D

0

E

min

Fα=4

G

-5

H

3

I

8

J

K

L

2

M

max

O

N

4

Oβ=-3

P

9

Q

-6

R

0

S

3

T

5

U

-7

V

-9

min

F

B

A

W

-3

X

-5

blue: terminal state

Rutgers CS440, Fall 2003

Alpha-Beta Example

minimax(F,2,4) is returned to

alpha not changed (maximizing)

CallStack

A

α=

max

Bβ=

C

D

0

E

min

Fα=4

G

-5

H

3

I

8

J

K

L

2

M

max

N

4

Oβ=-3

P

9

Q

-6

R

0

S

3

T

5

U

-7

V

-9

min

F

B

A

W

-3

X

-5

X

-5

blue: terminal state

Rutgers CS440, Fall 2003

Alpha-Beta Example

minimax(B,1,4) is returned to

beta = 4, minimum seen so far

CallStack

A

α=

max

Bβ=4

Bβ=

C

D

0

E

min

Fα=4

G

-5

H

3

I

8

J

K

L

2

M

max

N

4

Oβ=-3

P

9

Q

-6

R

0

S

3

T

5

U

-7

V

-9

min

B

A

W

-3

X

-5

X

-5

blue: terminal state

Rutgers CS440, Fall 2003

Effectiveness of Alpha-Beta Search

- Effectiveness depends on the order in which successors are examined. More effective if bestare examined first
- Worst Case:
- ordered so that no pruning takes place
- no improvement over exhaustive search
- Best Case:
- each player’s best move is evaluated first (left-most)
- In practice, performance is closer to bestrather than worst case

Rutgers CS440, Fall 2003

Effectiveness of Alpha-Beta Search

- In practice often get O(b(d/2)) rather than O(bd)
- same as having a branching factor of b

since (b)d = b(d/2)

- For Example: Chess
- goes from b ~ 35 to b ~ 6
- permits much deeper search for the same time
- makes computer chess competitive with humans

Rutgers CS440, Fall 2003

Dealing with Limited Time

- In real games, there is usually a time limit T on making a move
- How do we take this into account?
- cannot stop alpha-beta midway and expect to useresults with any confidence
- so, we could set a conservative depth-limit that guarantees we will find a move in time < T
- but then, the search may finish early andthe opportunity is wasted to do more search

Rutgers CS440, Fall 2003

Dealing with Limited Time

- In practice, iterative deepening search (IDS) is used
- run alpha-beta search with an increasing depth limit
- when the clock runs out, use the solution foundfor the last completed alpha-beta search(i.e., the deepest search that was completed)

Rutgers CS440, Fall 2003

The Horizon Effect

- Sometimes disaster lurks just beyond search depth
- computer captures queen, but a few moves later the opponent checkmates (i.e., wins)
- The computer has a limited horizon; it cannotsee that this significant event could happen
- How do you avoid catastrophic losses due to “short-sightedness”?
- quiescence search
- secondary search

Rutgers CS440, Fall 2003

The Horizon Effect

- Quiescence Search
- when evaluation frequently changing, look deeper than limit
- look for a point when game “quiets down”
- Secondary Search
- find best move looking to depth d
- look k steps beyond to verify that it still looks good
- if it doesn't, repeat Step 2 for next best move

Rutgers CS440, Fall 2003

Book Moves

- Build a database of opening moves, end games, and studied configurations
- If the current state is in the database, use database:
- to determine the next move
- to evaluate the board
- Otherwise, do alpha-beta search

Rutgers CS440, Fall 2003

Examples of Algorithmswhich Learn to Play Well

Checkers:

A. L. Samuel, “Some Studies in Machine Learning using the Game of Checkers,” IBM Journal of Research and Development, 11(6):601-617, 1959

- Learned by playing a copy of itself thousands of times
- Used only an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz
- Successful enough to compete well at human tournaments

Rutgers CS440, Fall 2003

Examples of Algorithmswhich Learn to Play Well

Backgammon:

G. Tesauro and T. J. Sejnowski, “A Parallel Network that Learns to Play Backgammon,” Artificial Intelligence39(3), 357-390, 1989

- Also learns by playing copies of itself
- Uses a non-linear evaluation function - a neural network
- Rated one of the top three players in the world

Rutgers CS440, Fall 2003

Non-deterministic Games

- Some games involve chance, for example:
- roll of dice
- spin of game wheel
- deal of cards from shuffled deck
- How can we handle games with random elements?
- The game tree representation is extendedto include chance nodes:
- agent moves
- chance nodes
- opponent moves

Rutgers CS440, Fall 2003

50/50

50/50

.5

.5

.5

.5

Bβ=2

Cβ=6

Dβ=0

Eβ=-4

7

2

9

6

5

0

8

-4

Non-deterministic GamesThe game tree representation is extended:

max

chance

min

Rutgers CS440, Fall 2003

50/50

50/50

.5

.5

.5

.5

Bβ=2

Cβ=6

Dβ=0

Eβ=-4

7

2

9

6

5

0

8

-4

Non-deterministic Games- Weight score by the probabilities that move occurs
- Use expected value for move: sum of possible random outcomes

max

50/50

4

50/50-2

chance

min

Rutgers CS440, Fall 2003

50/50

4

50/50

-2

.5

.5

.5

.5

Bβ=2

Cβ=6

Dβ=0

Eβ=-4

7

2

9

6

5

0

8

-4

Non-deterministic Games- Choose move with highest expected value

Aα=4

max

chance

min

Rutgers CS440, Fall 2003

Non-deterministic Games

- Non-determinism increases branching factor
- 21 possible rolls with 2 dice
- Value of lookahead diminishes: as depth increases probability of reaching a given node decreases
- alpha-beta pruning less effective
- TDGammon:
- depth-2 search
- very good heuristic
- plays at world champion level

Rutgers CS440, Fall 2003

Computers can playGrandMaster Chess

“Deep Blue” (IBM)

- Parallel processor, 32 nodes
- Each node has 8 dedicated VLSI “chess chips”
- Can search 200 million configurations/second
- Uses minimax, alpha-beta, sophisticated heuristics
- It currently can search to 14 ply (i.e., 7 pairs of moves)
- Can avoid horizon by searching as deep as 40 ply
- Uses book moves

Rutgers CS440, Fall 2003

Computers can playGrandMaster Chess

Kasparov vs. Deep Blue, May 1997

- 6 game full-regulation chess match sponsored by ACM
- Kasparov lost the match 2 wins & 1 tie to 3 wins & 1 tie
- This was an historic achievement for computer chess being the first time a computer became the best chess player on the planet
- Note that Deep Blue plays by “brute force” (i.e., raw power from computer speed and memory); it uses relatively little that is similar to human intuition and cleverness

Rutgers CS440, Fall 2003

Status of Computersin Other Deterministic Games

- Checkers/Draughts
- current world champion is Chinook
- can beat any human, (beat Tinsley in 1994)
- uses alpha-beta search, book moves (> 443 billion)
- Othello
- computers can easily beat the world experts
- Go
- branching factor b ~ 360 (very large!)
- $2 million prize for any system that can beat a world expert

Rutgers CS440, Fall 2003

Download Presentation

Connecting to Server..