### Monte Carlo Go Has a Way to Go

Adapted from the slides presented at AAAI 2006

Haruhiro Yoshimoto (*1)

Kazuki Yoshizoe (*1)

Tomoyuki Kaneko (*1)

Akihiro Kishimoto (*2)

Kenjiro Taura (*1)

(*1) University of Tokyo

(*2) Future University Hakodate

Games in AI

- Ideal test bed for AI research
  - Clear results
  - Clear motivation
  - Good challenge

- Success in search-based approach
  - chess (1997, Deep Blue)
  - and others

- Not successful in the game of Go
  - Go is to Chess as Poetry is to Double-entry accounting
  - It goes to the core of artificial intelligence, which involves the study of learning and decision-making, strategic thinking, knowledge representation, pattern recognition and, perhaps most intriguingly, intuition

The game of Go

- A 4,000-year-old board game from China
- Standard size 19×19
- Two players, Black and White, place stones in turns
- Stones cannot be moved, but can be captured and taken off
- Larger territory wins

Playing Strength

A $1.2M prize was set for beating a professional with no handicap (expired!!!)

Handtalk in 1997 claimed $7,700 for winning an 11-stone handicap match against an 8- to 9-year-old master

Difficulties in Computer Go

- Large search space
  - the game becomes progressively more complex, at least for the first 100 ply

- Lack of good evaluation function
  - a material advantage does not mean a simple way to victory, and may just mean that short-term gain has been given priority
  - around 150–250 legal moves, of which usually fewer than 50 (even fewer than 10) are acceptable, but computers have a hard time telling them apart

- Very high degree of pattern recognition involved in human capacity to play well.

Why Monte Carlo Go?

Replace evaluation function by random sampling

Brugmann:1993, Bouzy:2003

- Success in other domains
  - Bridge [Ginsberg:1999], Poker [Billings et al.:2002]

- Reasonable position evaluation based on sampling
  - search space from $O(b^d)$ to $O(Nbd)$ (b: branching factor, d: depth, N: # of random games)

- Easy to parallelize
- Can win against search-based approach
  - Crazy Stone won the 11th Computer Olympiad in 9x9 Go
  - MoGo: winner of the 19th and 20th KGS 9x9 tournaments, rated highest on CGOS

Basic idea of Monte Carlo Go

- Generate next moves by 1-ply search
- Play a number of random games and compute the expected score
- Choose the move with the maximal score
- The only domain-dependent knowledge is the eye (random games do not fill eyes)
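
The steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' program: `monte_carlo_move`, `toy_playout` and the `HIDDEN_MEAN` table are hypothetical names, and the "random game" is replaced by a toy stand-in that returns a noisy score per move instead of playing Go.

```python
import random

def monte_carlo_move(candidates, playout, n_samples, rng):
    """1-ply Monte Carlo: return (move, average score) with the highest average."""
    best_move, best_avg = None, float("-inf")
    for move in candidates:
        # Play n_samples random games after this move and average the scores.
        total = sum(playout(move, rng) for _ in range(n_samples))
        avg = total / n_samples
        if avg > best_avg:
            best_move, best_avg = move, avg
    return best_move, best_avg

# Toy stand-in for a random game (assumption): each move has a hidden
# mean score, and one playout returns that mean plus Gaussian noise.
HIDDEN_MEAN = {"A": 7.0, "B": 2.0, "C": -4.0}

def toy_playout(move, rng):
    return HIDDEN_MEAN[move] + rng.gauss(0.0, 10.0)

rng = random.Random(0)
move, avg = monte_carlo_move(["A", "B", "C"], toy_playout, 1000, rng)
print(move, round(avg, 2))
```

In a real program `playout` would play the rest of the game with random legal moves (skipping eye-filling moves) and return the final score.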

Terminal Position of Go

Larger territory wins

Territory = surrounded area + stones

▲ Black’s territory is 36 points

× White’s territory is 45 points

White wins by 9 points
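
As a rough sketch of the scoring rule above (territory = surrounded area + stones), the following Python function counts stones and flood-fills empty regions, crediting a region to a color only when it borders that color alone. The function name `area_score` and the string board encoding are assumptions made for this example, and the 5x5 position is invented.

```python
def area_score(board):
    """Score a terminal position given as strings of 'B', 'W' and '.'.

    Returns (black_points, white_points): stones plus empty regions
    that touch stones of one color only.
    """
    rows, cols = len(board), len(board[0])
    score = {"B": 0, "W": 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in score:
                score[cell] += 1          # a stone counts as one point
            elif (r, c) not in seen:
                # Flood-fill this empty region and record bordering colors.
                size, borders, stack = 0, set(), [(r, c)]
                seen.add((r, c))
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols:
                            neighbor = board[ny][nx]
                            if neighbor == ".":
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    stack.append((ny, nx))
                            else:
                                borders.add(neighbor)
                if len(borders) == 1:     # surrounded by a single color
                    score[borders.pop()] += size
    return score["B"], score["W"]

board = [
    ".BW..",
    ".BW..",
    ".BW..",
    ".BW..",
    ".BW..",
]
black, white = area_score(board)
print(black, white, white - black)  # 10 15 5
```

On this toy board Black has 5 stones plus 5 surrounded points, White has 5 stones plus 10 surrounded points, so White wins by 5.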

Each player plays randomly

Compute average points for each move

Select the move that has the highest average

Example: play the rest of the game randomly

5 points win for black

9 points win for black

move A: (5 + 9) / 2 = 7 points

Monte Carlo Go and Sample Size

Monte Carlo with 1000 sample games: stronger than Monte Carlo with 100 sample games

- Can reduce statistical errors with additional samples
- Relationships between sample size and strength are not yet investigated
- Sampling error ∼ $1/\sqrt{N}$
  - N: # of random games
  - Diminishing returns must appear
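
To see why diminishing returns must appear, assume the standard result that the error of an average over N independent random games shrinks as 1/√N. The sketch below compares that formula with a small simulation; the per-game standard deviation of 10 points and the helper names are assumptions chosen for illustration.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Theoretical error of the mean of n independent samples."""
    return sigma / math.sqrt(n)

def empirical_error(n, trials, rng, sigma=10.0):
    """Spread of the sample mean of n noisy scores, estimated over many trials."""
    means = []
    for _ in range(trials):
        scores = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(scores))
    return statistics.pstdev(means)

rng = random.Random(0)
for n in (100, 400, 1600):
    # Quadrupling the number of sample games only halves the error.
    print(n, standard_error(10.0, n), round(empirical_error(n, 500, rng), 2))
```

Going from 100 to 400 samples removes half a point of error under these assumptions; going from 400 to 1600 removes only a quarter point at four times the cost.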

Our Monte Carlo Go Implementation

- Basic Monte Carlo Go
- Atari-50 enhancement: use of simple Go knowledge in move selection
- Progressive pruning [Bouzy 2003]: statistical move pruning in simulations

Atari-50 Enhancement

- Basic Monte Carlo: assign uniform probability to each move in a sample game (no eye filling)
- Atari-50: higher probability for capture moves
  - Capture is “mostly” a good move
  - 50% probability for capture moves

Move A captures black stones
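
One possible reading of the Atari-50 rule, sketched in Python: when at least one capture move exists, a capture is played with probability 50%, otherwise a non-capturing move is drawn uniformly. The exact distribution used by the authors is not given on the slide, so the function `pick_playout_move`, its parameters and the tie-breaking are assumptions.

```python
import random

def pick_playout_move(legal_moves, capture_moves, rng, capture_prob=0.5):
    """Pick one move for a random game, favoring captures (assumed rule).

    legal_moves:   list of candidate moves (eye-filling moves already removed)
    capture_moves: set of moves that capture opponent stones
    """
    captures = [m for m in legal_moves if m in capture_moves]
    others = [m for m in legal_moves if m not in capture_moves]
    # With probability capture_prob (or if nothing else is legal), capture.
    if captures and (not others or rng.random() < capture_prob):
        return rng.choice(captures)
    # Otherwise fall back to a uniform choice among the remaining moves.
    return rng.choice(others)

rng = random.Random(0)
moves = list(range(10))          # ten hypothetical legal moves
print(pick_playout_move(moves, {3}, rng))
```

With a uniform policy the single capture move here would be played 10% of the time; under this sketch it is played about half the time.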

Progressive Pruning [Bouzy2003]

- Try sampling with smaller sample size
- Prune statistically inferior moves

(Figure: score of each move)

- Can assign more sample games to promising moves
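
The pruning idea can be sketched as follows. This is a simplified interpretation, not necessarily the exact rule of [Bouzy2003]: each surviving move gets a batch of samples per round, and a move is pruned once its upper confidence bound falls below the best lower bound. The names, the batch size, the `ratio` parameter and the toy playout are all assumptions.

```python
import math
import random
import statistics

def progressive_pruning(moves, playout, rng, batch=50, rounds=20, ratio=2.0):
    """Sample in small batches and drop statistically inferior moves."""
    samples = {m: [] for m in moves}
    alive = list(moves)
    for _ in range(rounds):
        for m in alive:
            samples[m].extend(playout(m, rng) for _ in range(batch))
        # Interval mean +/- ratio * standard error for every surviving move.
        bounds = {}
        for m in alive:
            mean = statistics.fmean(samples[m])
            err = statistics.stdev(samples[m]) / math.sqrt(len(samples[m]))
            bounds[m] = (mean - ratio * err, mean + ratio * err)
        best_lower = max(low for low, _ in bounds.values())
        # Keep only moves whose upper bound still reaches the best lower bound.
        alive = [m for m in alive if bounds[m][1] >= best_lower]
        if len(alive) == 1:
            break
    best = max(alive, key=lambda m: statistics.fmean(samples[m]))
    return best, {m: len(s) for m, s in samples.items()}

# Toy stand-in for a random game (assumption): hidden mean plus noise.
HIDDEN = {"A": 7.0, "B": 2.0, "C": -4.0}

def noisy_playout(move, rng):
    return HIDDEN[move] + rng.gauss(0.0, 10.0)

best, counts = progressive_pruning(list(HIDDEN), noisy_playout, random.Random(0))
print(best, counts)
```

The returned counts show how many sample games each move received; pruned moves stop consuming samples, which leaves the budget for the promising ones.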

Experimental Design

- Machine
  - Intel Xeon Dual CPU at 2.40 GHz with 2 GB memory
  - Use 64 PCs (128 processors) connected by 1GB/s network

- Three versions of programs
  - BASIC: Basic Monte Carlo Go
  - ATARI: BASIC + Atari-50 enhancement
  - ATARIPP: ATARI + Progressive Pruning

- Experiments
  - 200 self-play games
  - Analysis of decision quality from 58 professional games

Diminishing Returns: 4*N samples vs N samples for each move

Decision Quality of Each Move

Evaluation score of “Oracle” (64 million sample games):

|   | a  | b  | c  |
|---|----|----|----|
| 1 | 20 | 17 | 10 |
| 2 | 25 | 30 | 15 |
| 3 | 12 | 21 | 7  |

Selected move for 100 sample game Monte Carlo Go:

- 2b -> 9 times
- 2c -> 1 time

Average error of one move is

((30 - 30) * 9 + (30 - 15) * 1) / 10 = 1.5 points
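
The same arithmetic as a small Python sketch. The function name `average_error` and the dictionary layout are assumptions; only the two oracle scores that appear in the formula (30 for move 2b, 15 for move 2c) are included, with 30 taken as the oracle's best score.

```python
def average_error(oracle_scores, selections):
    """Average score lost per decision relative to the oracle's best move.

    oracle_scores: move -> oracle evaluation (points)
    selections:    move -> number of times the sampled player chose it
    """
    best = max(oracle_scores.values())
    total_choices = sum(selections.values())
    lost = sum((best - oracle_scores[m]) * n for m, n in selections.items())
    return lost / total_choices

oracle = {"2b": 30, "2c": 15}     # scores taken from the formula above
selected = {"2b": 9, "2c": 1}     # 10 runs of the 100-sample player
print(average_error(oracle, selected))  # 1.5
```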

Decision Quality of Each Move (with Atari-50 Enhancement)

Summary of Experimental Results

- Additional enhancements improve the strength of Monte Carlo Go
- Returns eventually diminish
- With additional enhancements, diminishing returns set in sooner
- More samples are needed in the early stage of a 9x9 game

Conclusions and Future Work

- Conclusions
  - Additional samples achieve only small improvements
    - Unlike search algorithms, e.g. in chess
  - Good at strategy, not tactics
    - Blunders due to lack of domain knowledge
  - Easy to evaluate
  - Easy to parallelize
  - The way for Monte Carlo Go to go
    - A small number of sample games combined with many enhancements looks promising

- Future Work
  - Adjust probability with pattern matching
  - Learning
  - Search + Monte Carlo Go
    - MoGo (exploration-exploitation in the search tree using UCT)
  - Scale to 19×19

References:

- Go (Wikipedia) http://en.wikipedia.org/wiki/Go_(board_game)
- GNU Go http://www.gnu.org/software/gnugo/
- KGS Go Server http://www.gokgs.com
- CGOS 9x9 Computer Go Server http://cgos.boardspace.net

