
Monte Carlo Go Has a Way to Go

Adapted from the slides presented at AAAI 2006

Haruhiro Yoshimoto (*1)

Kazuki Yoshizoe (*1)

Tomoyuki Kaneko (*1)

Akihiro Kishimoto (*2)

Kenjiro Taura (*1)

(*1) University of Tokyo

(*2) Future University Hakodate


Games in AI

  • Ideal test bed for AI research

    • Clear results

    • Clear motivation

    • Good challenge

  • Successes for the search-based approach

    • Chess (Deep Blue, 1997)

    • and others

  • Not yet successful in the game of Go

    • "Go is to chess as poetry is to double-entry accounting"

    • Go goes to the core of artificial intelligence: the study of learning and decision-making, strategic thinking, knowledge representation, pattern recognition and, perhaps most intriguingly, intuition


The game of Go

  • A roughly 4,000-year-old board game from China

  • Standard board size is 19×19

  • Two players, Black and White, place stones in turn

  • Stones cannot be moved once placed, but can be captured and taken off the board

  • The player with the larger territory wins



Playing Strength

A $1.2M prize was offered for beating a professional with no handicap (the offer has expired!)

In 1997, Handtalk claimed $7,700 for winning an 11-stone handicap match against an 8-9 year old master


Difficulties in Computer Go

  • Large search space

    • the game becomes progressively more complex, at least for the first 100 ply


Difficulties in Computer Go

  • Lack of a good evaluation function

    • A material advantage does not imply a simple path to victory; it may just mean that short-term gain has been given priority

    • There are around 150–250 legal moves per position, of which usually fewer than 50 (often fewer than 10) are acceptable, but computers have a hard time distinguishing them

  • Human playing strength rests on a very high degree of pattern recognition


Why Monte Carlo Go?

Replace the evaluation function with random sampling [Brügmann 1993; Bouzy 2003]

  • Success in other domains

Bridge [Ginsberg 1999], Poker [Billings et al. 2002]

  • Reasonable position evaluation based on sampling

    • evaluation cost drops from O(b^d) (full-width search to depth d with branching factor b) to O(N·b·d) (N random games of length ~d for each of b moves)

  • Easy to parallelize

  • Can win against search-based programs

    • Crazy Stone won the 11th Computer Olympiad in 9×9 Go

    • MoGo won the 19th and 20th KGS computer Go tournaments (9×9) and is rated highest on CGOS


Basic idea of Monte Carlo Go

  • Generate the next moves by a 1-ply search

  • For each move, play a number of random games and compute the expected (average) score

  • Choose the move with the maximal expected score

  • The only domain-dependent knowledge is the eye: random players never fill their own eyes (see the sketch below)
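To make the loop concrete, here is a minimal Python sketch of this move-selection procedure. It is a sketch under assumptions: the Position class and its methods (legal_moves, play, fills_own_eye, is_terminal, pass_move, score) are hypothetical stand-ins, not the authors' implementation.

```python
import random

def random_playout_score(position, color):
    """Play the rest of the game with uniformly random moves,
    never filling one's own eyes, and return the final score
    for `color`. (Hypothetical Position API.)"""
    pos = position
    while not pos.is_terminal():
        moves = [m for m in pos.legal_moves() if not pos.fills_own_eye(m)]
        if not moves:
            pos = pos.pass_move()   # no acceptable move: pass
            continue
        pos = pos.play(random.choice(moves))
    return pos.score(color)

def monte_carlo_move(position, color, n_samples=1000):
    """1-ply search: average N random playouts per candidate move
    and return the move with the highest average score."""
    best_move, best_avg = None, float("-inf")
    for move in position.legal_moves():
        child = position.play(move)
        avg = sum(random_playout_score(child, color)
                  for _ in range(n_samples)) / n_samples
        if avg > best_avg:
            best_move, best_avg = move, avg
    return best_move
```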


Terminal Position of Go

The player with the larger territory wins, where territory = surrounded area + stones.

In the example position, Black's territory (▲) is 36 points and White's territory (×) is 45 points, so White wins by 9 points.
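As a sketch of how such a terminal position can be scored, the following assumes a board represented as a dict mapping (row, col) to 'B', 'W', or None for empty; the representation and names are illustrative, not from the paper.

```python
def area_score(board, color):
    """Territory of `color` at a terminal position: own stones plus
    empty regions bordered only by `color` (flood fill)."""
    score = sum(1 for c in board.values() if c == color)  # stones
    seen = set()
    for p, c in board.items():
        if c is not None or p in seen:
            continue
        region, borders, stack = set(), set(), [p]
        while stack:
            q = stack.pop()
            if q in region:
                continue
            region.add(q)
            r, k = q
            for nb in ((r - 1, k), (r + 1, k), (r, k - 1), (r, k + 1)):
                if nb not in board:
                    continue              # off the board
                if board[nb] is None:
                    stack.append(nb)      # same empty region
                else:
                    borders.add(board[nb])
        seen |= region
        if borders == {color}:            # surrounded by `color` only
            score += len(region)
    return score

# On the slide's position this would give 45 for 'W' and 36 for 'B':
# a 9-point win for White.
```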



Example

Play many sample games in which each player plays the rest of the game randomly, compute the average points for each move, and select the move with the highest average.

For example, two random continuations from move A end in a 5-point win for Black and a 9-point win for Black:

move A: (5 + 9) / 2 = 7 points


Monte Carlo Go and Sample Size

  • Statistical errors can be reduced with additional samples: Monte Carlo with 1000 sample games is stronger than Monte Carlo with 100 sample games

  • Sampling error ~ 1/√N, where N is the number of random games, so diminishing returns must eventually appear (a toy illustration follows)

  • The relationship between sample size and playing strength has not yet been investigated
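The 1/√N behaviour is easy to check numerically. This toy illustration (not from the paper; the mean and spread of the playout scores are made-up values) shows that quadrupling N roughly halves the estimation error:

```python
import random
import statistics

def estimate(true_mean, spread, n):
    """Mean of n simulated playout scores with the given spread."""
    return statistics.mean(random.gauss(true_mean, spread)
                           for _ in range(n))

true_mean, spread = 7.0, 30.0          # illustrative values only
for n in (100, 400, 1600, 6400):
    errors = [abs(estimate(true_mean, spread, n) - true_mean)
              for _ in range(200)]
    print(f"N = {n:5d}   mean |error| = {statistics.mean(errors):.2f}")
# Each 4x increase in N roughly halves the error: diminishing returns.
```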


Our Monte Carlo Go Implementation

  • Basic Monte Carlo Go

  • Atari-50 enhancement: utilization of simple Go knowledge in move selection

  • Progressive pruning [Bouzy 2003]: statistical pruning of moves during simulation


Atari-50 Enhancement

  • Basic Monte Carlo: assign uniform probability to each move in a sample game (except that neither player fills its own eyes)

  • Atari-50: assign a higher probability, 50%, to capture moves, since a capture is "mostly" a good move (a sketch of one interpretation follows)

In the example diagram, move A captures the black stones.
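In code, one plausible reading of the slide's "50%" is: whenever at least one capture is available, choose among capture moves half of the time. This is an interpretation with hypothetical move lists, not the paper's exact sampling scheme.

```python
import random

def atari50_choice(capture_moves, other_moves):
    """Pick a capture move with probability 0.5 when one exists;
    otherwise choose uniformly among whatever moves remain."""
    if capture_moves and other_moves:
        pool = capture_moves if random.random() < 0.5 else other_moves
        return random.choice(pool)
    return random.choice(capture_moves or other_moves)
```

A random playout like the one sketched earlier could call atari50_choice instead of a uniform random.choice to bias its games toward captures.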


Progressive Pruning [Bouzy2003]

  • Try sampling with a smaller sample size first

  • Prune statistically inferior moves

  • More sample games can then be assigned to the promising moves (a confidence-interval sketch follows)

(figure: score distribution of each candidate move)
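A sketch in the spirit of progressive pruning: give every candidate a small initial batch of playouts, then repeatedly drop moves whose upper confidence bound falls below the best move's lower bound. The interval width z and the bookkeeping are illustrative, not Bouzy's exact parameters.

```python
import math
import statistics

def prune_moves(stats, z=2.0):
    """stats: {move: list of playout scores, at least 2 per move}.
    Returns the moves whose mean might still be the best."""
    bounds = {}
    for move, scores in stats.items():
        mean = statistics.mean(scores)
        stderr = statistics.stdev(scores) / math.sqrt(len(scores))
        bounds[move] = (mean - z * stderr, mean + z * stderr)
    best_lower = max(lo for lo, _ in bounds.values())
    # A move survives only if its upper bound reaches the best lower bound.
    return [mv for mv, (_, hi) in bounds.items() if hi >= best_lower]
```

Surviving moves receive the next batch of sample games, so the sampling budget concentrates on the promising moves.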


Experimental Design

  • Machine

    • Dual Intel Xeon CPUs at 2.40 GHz with 2 GB of memory

    • 64 PCs (128 processors) connected by a 1 Gb/s network

  • Three versions of programs

    • BASIC: Basic Monte Carlo Go

    • ATARI: BASIC + Atari-50 enhancement

    • ATARIPP: ATARI + Progressive Pruning

  • Experiments

    • 200 self-play games

    • Analysis of decision quality from 58 professional games


Diminishing Returns: 4N Samples vs. N Samples for Each Move



Decision Quality of Each Move

Evaluation scores of an "oracle" (64 million sample games per move) for each candidate point:

        a    b    c
  1    20   17   10
  2    25   30   15
  3    12   21    7

Over 10 trials, Monte Carlo Go with 100 sample games selected move 2b nine times and move 2c once. Since the oracle's best score is 30 (move 2b), the average error of one move is

((30 - 30) * 9 + (30 - 15) * 1) / 10 = 1.5 points



Decision Quality of Each Move (with Atari-50 Enhancement)


Summary of Experimental Results

  • Additional enhancements improve the strength of Monte Carlo Go

  • Returns eventually diminish

  • With additional enhancements, diminishing returns set in sooner

  • More samples need to be collected in the early stages of a 9×9 game


Conclusions and Future Work

  • Conclusions

    • Additional samples achieve only small improvements

      • Unlike search algorithms in games such as chess

    • Good at strategy, not tactics

      • Blunders occur due to the lack of domain knowledge

    • Easy to evaluate

    • Easy to parallelize

    • The way for Monte Carlo Go to go: a small number of sample games combined with many enhancements looks promising

  • Future Work

    • Adjust move probabilities with pattern matching

    • Learning

    • Search + Monte Carlo Go

      • MoGo (exploration-exploitation in the search tree using UCT; a sketch of the UCB1 rule follows)

    • Scaling to 19×19
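For reference, a minimal sketch of the UCB1 rule that UCT-style programs such as MoGo apply at each tree node to balance exploration and exploitation; the exploration constant c is illustrative, not MoGo's tuned value.

```python
import math

def ucb1_select(children, c=1.4):
    """children: list of (move, wins, visits) for one tree node.
    Returns the move with the highest UCB1 value; unvisited
    moves are tried first."""
    total = sum(visits for _, _, visits in children)

    def ucb(wins, visits):
        if visits == 0:
            return float("inf")       # always explore unvisited moves
        return wins / visits + c * math.sqrt(math.log(total) / visits)

    return max(children, key=lambda ch: ucb(ch[1], ch[2]))[0]
```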



Questions?

References:

  • Go (board game), Wikipedia: http://en.wikipedia.org/wiki/Go_(board_game)

  • GNU Go: http://www.gnu.org/software/gnugo/

  • KGS Go Server: http://www.gokgs.com

  • CGOS 9x9 Computer Go Server: http://cgos.boardspace.net

