Evolution and Coevolution of ANNs playing Go

Evolution and Coevolution of ANNs playing Go Peter Mayer, 2004

Outline • Computers and Games • The game of Go • Experimental Setup • Training of Go playing ANNs • Evolution of Go Playing ANNs • Summary and Outlook

Games • Algorithms designed since AI's onset • Clearly defined rules • Still complex • Chess received the most attention • More researched than Go • Two main approaches • Rely on expertise – directly programmed weighted features; extensive knowledge • Use evolution – less knowledge; more versatility

The game of Go • Oldest (unaltered) strategic board game in the world • 10,000,000 players in Japan “alone” • Fairly simple rules • BUT difficult to master • Immense tree (~200 opts) • Complex structures • Many concurrent goals

Go Rules • 19x19 board • Empty in the beginning • Black & White “stones” • Black starts • Each turn • Place 1 stone • At an intersection • Never move stones • OR pass

Go Rules [2] • Objective - Get the most points ! • Points are acquired by: • Securing Territories • Capturing opp’s pieces

Go Rules [3] • Stones of the same color at vertically or horizontally adjacent intersections form a group • An empty intersection adjacent to a stone or group is called a "liberty" of that group • 1 Liberty = group in “atari” • No liberties -> CAPTURE ! Group is removed • Example – Black places stone in X resulting in right figure
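
The group and liberty definitions above can be sketched as a flood fill. This is a minimal illustration, not code from the presentation; the board encoding (strings of "B", "W", ".") and the name `group_and_liberties` are assumptions made for the example.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def group_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Return (stones, liberties) of the group containing `start`.

    Assumed encoding: board[r][c] is "B", "W" or "." (empty).
    """
    size = len(board)
    color = board[start[0]][start[1]]
    if color == ".":
        raise ValueError("start must point at a stone")
    stones: Set[Point] = {start}
    liberties: Set[Point] = set()
    stack = [start]
    while stack:
        r, c = stack.pop()
        # Only vertical and horizontal neighbours count, never diagonals.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < size and 0 <= nc < size):
                continue
            if board[nr][nc] == ".":
                liberties.add((nr, nc))
            elif board[nr][nc] == color and (nr, nc) not in stones:
                stones.add((nr, nc))
                stack.append((nr, nc))
    return stones, liberties


# A three-stone White group with a single liberty, i.e. in atari.
example = [".....", ".BWB.", "BWWB.", ".BB..", "....."]
white_stones, white_liberties = group_and_liberties(example, (2, 2))
in_atari = len(white_liberties) == 1
```

If Black then plays on the last liberty, the liberty set becomes empty and the whole group is removed.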

Go Rules [4] • Stones can be placed anywhere, but cannot commit suicide (except Chinese) • Legal if stone simultaneously captures opponent’s group (2 right figures) • Figure captions: Suicide – White cannot place at X; White CAN place at X; Result: capture

Go Rules [5] • Same position cannot occur more than once • Endless repetitions: • Black can capture at upper figure by placing at X • White - same by placing at Y • Black – repeat… • Ko rule • White may not place at Y before playing somewhere else first • Avoid any repetitions

Go Rules – Live and Dead groups • A group is “dead” if its capture cannot be prevented • The opponent does not need to actually capture it • Group remains on board • At end of game, removed and added to captured stones • “Living” groups are impossible to capture • Group with 2 “eyes” – even if White surrounds it, playing at X or Y is suicide • Opponent must play elsewhere

Go Basics – End game • Play continues until both players pass • Players then alternately play stones at “neutral” points – adjacent to both White and Black • Also known as “dame” (DAH-MAY) • Dead stones are removed from the board and counted with other prisoners (1 point per prisoner) • Also - 1 point for each intersection surrounded by player’s stones (“territory”)

Go Basics – End game example • Prisoners were removed already • All 4 points marked X are dame – worthless • Black has • 7 points in UR (territory); 2 points in LL • 1 removed prisoner • TOTAL = 10 points • White has • 5 in UL; 2 in LR • 2 prisoners • TOTAL = 9 points • Black wins unless komi (5.5 pts compensation) is due
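
The arithmetic of the example above can be written out as a small sketch. The function name `final_score` is hypothetical, and adding komi to White's total is an assumption about how the compensation is applied.

```python
from typing import Iterable


def final_score(territories: Iterable[int], prisoners: int, komi: float = 0.0) -> float:
    """Territory points plus one point per prisoner, plus komi if due."""
    return sum(territories) + prisoners + komi


black = final_score([7, 2], prisoners=1)                       # 7 + 2 + 1
white = final_score([5, 2], prisoners=2)                       # 5 + 2 + 2
white_with_komi = final_score([5, 2], prisoners=2, komi=5.5)   # komi goes to White

black_wins_without_komi = black > white
black_wins_with_komi = black > white_with_komi
```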

Ranking and Handicaps • Determine Go players’ strength • Resemblance to martial arts • Both amateur and professional ranking system • Amateur • 35 kyu to 1 kyu • THEN 1 dan to 7 dan • Pro • 1 dan to 9 dan • Awarded only by Go institutions • Pro dans are much stronger than amateur dans

Ranking and Handicaps (2) • Handicaps • Weaker player starts with several stones on the board • Placed at specific places • Helps make games more even • Difference in ranks ~ number of handicap stones needed to win • 2 stones to even 2 dan against 4 dan • 4 to even 3 kyu and 2 dan • The most powerful Go programs reach only … • … 10 kyu!

Experimental Setup • Opponent Go players • ANN player • Go board (input) representations • Move (output) representations • Coevolution • Hall of Fame coevolution • Cultural coevolution • General evolution setup

Go Players - Random • No strategy • Pass move also • “Knows” only the rules of go • legality of moves • Usually weakest opponent

Go Players – Naïve Player • Roughly human-beginner level • Able to save and capture stones • Knows about • Lost stones • Saving - connecting stones to living groups • Weak stones (not savable)

Go Players – Naïve Strategy • A subset of JaGo’s (main opponent) strategy • Outline (arranged by priority): • Attempt to save • Try to put opponent into atari • Connect weak stones • Capture opponent groups in atari • Check intersections for placing stones • In random order • Make sure no (own) liberties decrease below 2 as a result • Perform Random move

Go Players – JaGo Player • Java based program • Best computer player used • Not a strong player ~16 kyu • Knows standard techniques • Mainly save & capture • Uses pattern matching • Looks at entire board • 32 patterns, with rotations and mirrors

Go Players – JaGo Strategy (1) • Save stones in atari • Try to decrease liberties of large groups • Find own savable larger groups • Attack opponent’s groups (decreasing order:) • With 2 or more liberties and attackable • With 2 or more stones & less than 3 liberties • With 2 or less liberties

Go Players – JaGo Strategy (2) • Save own groups with few liberties if savable • Start pattern matching – Response; Center • Random move order • Seek opponent’s groups to capture in 2 moves • Perform random move which isn’t of a bad pattern • Capture opponent’s single liberties • Connect own weak stones • PASS

Go Players – JaGo Patterns (1)

Go Players – JaGo Patterns (2)

Go Players – GNU Go • Advantages • 5x5 to 19x19 boards • Handles handicaps well • Rated 10 kyu • Problems • 5x5 solved – open at C3 for 18.5 points (komi=5.5) – Black always wins • GNU Go passes on B3, C2-4, D3 (only correct at C3) • Premature convergence of evolution

ANN Player • Inform ANN about actual position • Evaluate ANN output to receive next move • Representation is important! • Intention maps • For each Go move (including PASS) – value in [0,1] • High value – high intention to make move (and vice versa) • Select legal move with highest value • To avoid predictability – consider suboptimal moves also (“creativity factor”)
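
Move selection from an intention map can be sketched as follows. The slide does not say how the "creativity factor" works, so the rule used here (with probability `creativity`, pick a random legal move other than the best one) is purely an assumption, as are the names `select_move` and `creativity`.

```python
import random
from typing import Dict, Hashable, Optional, Sequence


def select_move(
    intentions: Dict[Hashable, float],
    legal_moves: Sequence[Hashable],
    creativity: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Hashable:
    """Pick the legal move with the highest intention value.

    Assumed creativity rule: with probability `creativity`, return a
    random suboptimal legal move instead of the best one.
    """
    if not legal_moves:
        raise ValueError("PASS should always be among the legal moves")
    rng = rng or random.Random()
    ranked = sorted(legal_moves, key=lambda move: intentions[move], reverse=True)
    if len(ranked) > 1 and rng.random() < creativity:
        return rng.choice(ranked[1:])
    return ranked[0]


# "A1" has the highest intention but is illegal here, so "B2" is chosen.
intention_map = {"A1": 0.9, "B2": 0.7, "PASS": 0.1}
chosen = select_move(intention_map, ["B2", "PASS"])
```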

Player Strength • Unrated Go players commonly obtain a rating by playing against rated players (same in Chess) • The strength s of a player is determined by • The score of 1000 double games • Against each of 3 opponents: R, N, JaGo • Divided by the number of games (6,000) • 1 is perfect strength • 3 opponents help resist over-fitting
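
A minimal sketch of the strength computation, assuming "score" means the number of won games (the evolution setup later uses win rates rather than point scores). The function name and the dictionary layout are illustrative only.

```python
from typing import Dict


def strength(wins_per_opponent: Dict[str, int], double_games: int = 1000) -> float:
    """Fraction of games won over all opponents.

    Each double game is two games (one per color), so three opponents
    and 1000 double games give 6,000 games in total.
    """
    total_games = 2 * double_games * len(wins_per_opponent)
    return sum(wins_per_opponent.values()) / total_games


s_example = strength({"Random": 2000, "Naive": 1500, "JaGo": 500})
s_perfect = strength({"Random": 2000, "Naive": 2000, "JaGo": 2000})
```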

Player Competence • Strength does not measure understanding of the rules (legality) • E.g. 2 players receive the same score but only one always tried legal moves first • The competence C of a player is defined as follows: • bi = games; i = moves; tij = #tried illegal moves; kij = #possible illegal moves • C is averaged over all games

Board Representations • 19x19 boards • far too large • Even for evolved agents • Use only 5x5 boards

Board Representations • Should preprocess position to make ANN's life easier • Tested in training experiments • Standard Input Representation (SIR) • 2 neurons at each intersection: • 1 for the player’s stone; 1 for the opponent’s • No distinction between B and W stones • Optional – 1 neuron to tell if B or W • (2*b^2) neurons (where b is board size) = 50

Representations - NIR • Naïve Input Representation • More compact • 1 neuron per intersection • Set to -1 (player’s stone) or 1 (opponent’s) • 0 if empty • Uses half of SIR's neurons = 25
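
The SIR and NIR encodings can be sketched side by side. The board encoding ("B", "W", "."), the function names, and the ordering of the SIR neurons (own/opponent pair per intersection, row by row) are assumptions; the slides only fix the neuron counts and values.

```python
from typing import List


def encode_sir(board: List[str], player: str) -> List[float]:
    """Standard Input Representation: two neurons per intersection.

    First neuron: own stone present. Second neuron: opponent stone present.
    """
    opponent = "W" if player == "B" else "B"
    inputs: List[float] = []
    for row in board:
        for point in row:
            inputs.append(1.0 if point == player else 0.0)
            inputs.append(1.0 if point == opponent else 0.0)
    return inputs


def encode_nir(board: List[str], player: str) -> List[float]:
    """Naive Input Representation: -1 own stone, 1 opponent stone, 0 empty."""
    opponent = "W" if player == "B" else "B"
    values = {player: -1.0, opponent: 1.0, ".": 0.0}
    return [values[point] for row in board for point in row]


position = ["B....", ".W...", ".....", ".....", "....."]
sir = encode_sir(position, "B")   # 2 * 5^2 = 50 inputs
nir = encode_nir(position, "B")   # 5^2 = 25 inputs
```

Both encodings are relative to the player to move, which is why they need no distinction between Black and White stones.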

Representations - LVIR • Limited View Input Representation • Splits the Go board into several square areas of size 3x3 • Idea – simplest way of capturing stones works within this area • E.g. capture of 1 stone by surrounding it • Areas overlap at middle row and middle column • Coding – similar to SIR • w is number of areas (=4) • 2 neurons x 9 intersections x w = 72 neurons • Could also be Naïve
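
A sketch of the limited-view split for a 5x5 board, assuming the four 3x3 areas start at rows and columns 0 and 2 so that they share the middle row and column. The names `lvir_windows` and `encode_lvir` and the SIR-style neuron ordering are assumptions.

```python
from typing import List, Tuple


def lvir_windows(size: int = 5, window: int = 3) -> List[Tuple[int, int]]:
    """Top-left corners of the overlapping areas (4 of them on a 5x5 board)."""
    starts = range(0, size - window + 1, window - 1)
    return [(r, c) for r in starts for c in starts]


def encode_lvir(board: List[str], player: str, window: int = 3) -> List[float]:
    """SIR-style coding (own, opponent) applied to each area in turn."""
    opponent = "W" if player == "B" else "B"
    inputs: List[float] = []
    for r0, c0 in lvir_windows(len(board), window):
        for r in range(r0, r0 + window):
            for c in range(c0, c0 + window):
                point = board[r][c]
                inputs.append(1.0 if point == player else 0.0)
                inputs.append(1.0 if point == opponent else 0.0)
    return inputs


# A single stone on the centre point is seen by all four areas.
centre_only = [".....", ".....", "..B..", ".....", "....."]
lvir = encode_lvir(centre_only, "B")
```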

Clever Representations • Based on image processing and circuits • We want fewer raw inputs to allow ANN to concentrate more on features • Manhattan distance • Used in integrated circuits where wires run parallel to X or Y axis • Got its name from Manhattan NY, where streets are aligned in grid • P1 = (x1, x2) • P2 = (y1, y2) • d(P1, P2) = |x1 - y1| + |x2 - y2|

Clever Representations • Manhattan distance is related to distance of Go stones (no diagonals) • distance = [#(separating stones) + 1] • 1 if next to each other • 2 if separated by one stone • 3 for knight’s move or two separating stones
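
The three distance cases listed above can be checked with a short sketch. Points are (row, column) tuples, which is an assumption about the coordinate convention.

```python
from typing import Tuple


def manhattan(p1: Tuple[int, int], p2: Tuple[int, int]) -> int:
    """Sum of the absolute coordinate differences; diagonals are not shortcuts."""
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])


adjacent = manhattan((2, 2), (2, 3))        # stones next to each other
one_between = manhattan((2, 1), (2, 3))     # one intersection in between
knights_move = manhattan((0, 0), (1, 2))    # one step one way, two the other
```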

Representations: c-o-Matrix • Co-occurrence-matrices • Used in image processing • Many parameters are derived from it • Mean, SD, energy, contrast, homogeneity, … • Square • Based on a relation p between image positions (symmetric if p is)

Representations: c-o-Matrix • Element C[i][j] = number of times a pixel of value (color) i occurs in the relation specified by p to a pixel of value j • Size is number of different colors

Representations: c-o-Matrix • An actual go board is an “image” with 3 different colors (including empty) • Example • p1: Manhattan distance of 1 between 2 points • First matrix row: • B near B 16 times • B near W 3 times • B near empty 11 times
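
A sketch of the co-occurrence matrix for the relation p1 (Manhattan distance of 1). It assumes ordered pairs are counted, so each same-color neighbor pair contributes twice to the diagonal; this matches a symmetric matrix but is not stated on the slide. The color order "B", "W", "." is also an assumption.

```python
from typing import List


def cooccurrence(board: List[str], colors: str = "BW.") -> List[List[int]]:
    """C[i][j] = number of ordered neighbour pairs (color i, color j)."""
    index = {color: i for i, color in enumerate(colors)}
    size = len(board)
    matrix = [[0] * len(colors) for _ in colors]
    for r in range(size):
        for c in range(size):
            # Relation p1: points at Manhattan distance 1.
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < size and 0 <= nc < size:
                    matrix[index[board[r][c]]][index[board[nr][nc]]] += 1
    return matrix


small = ["BB.", "BW.", "..."]
c_matrix = cooccurrence(small)
# Row 0 reads: B near B, B near W, B near empty.
```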

Representations: c-o-Matrix • Does not say much about absolute positions – must combine • SIR and C for whole board • NIR and C for whole board • NIR and Cs for 3x3 areas • sLVIR and Cs for 3x3 areas • NLVIR and Cs for 3x3 areas

Output Representations • Only 2 • Standard Output Representation (SOR) • Each intersection is represented by 1 neuron • 1 for PASS • (b^2 + 1) neurons

Output Representations • Row Column Output Representation (RCOR) • Used to decrease ANN size • 5 neurons for columns; 5 for rows • 1 for PASS • (2b + 1) neurons • Intention more complicated: • PASS intention is square of relevant neuron • RCOR limits intention map: • v1>v2 -> y1>y2 -> v4>v3 • All values positive, non-zero
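
The slide's formula for RCOR move intentions did not survive. The sketch below assumes the intention for a move is the product of its row and column neuron outputs, which would be consistent with PASS using the square of a single neuron; treat this as a guess, not the author's definition. The function name is hypothetical.

```python
from typing import Dict, Hashable, Sequence


def rcor_intentions(
    rows: Sequence[float], cols: Sequence[float], pass_value: float
) -> Dict[Hashable, float]:
    """Expand 2b + 1 outputs into b^2 + 1 move intentions.

    Assumption: intention(r, c) = rows[r] * cols[c]; PASS = pass_value ** 2.
    """
    intentions: Dict[Hashable, float] = {
        (r, c): rows[r] * cols[c]
        for r in range(len(rows))
        for c in range(len(cols))
    }
    intentions["PASS"] = pass_value ** 2
    return intentions


row_out = [0.5, 0.25, 0.5, 0.25, 0.5]
col_out = [0.25, 0.5, 0.25, 0.5, 0.25]
moves = rcor_intentions(row_out, col_out, pass_value=0.5)
```

Under this assumption the limitation is easy to see: with positive outputs, if column 1 is preferred over column 0 in one row, it is preferred in every row.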

Coevolution • Derives non-static fitness, as in nature • 1 or more populations; interacting • Competitive [battle] vs. Cooperative [subtasks] • Advantages • “Who needs enemies when you got friends like these?” – saves finding opponents; Especially in Go where no strong program exists • Variety in fitness – adaptive opponents • No upper bound for improvement

Coevolution Methods Applied • Based on work by Lubberts & Miikkulainen [2001] • Hall of Fame • Host population and Master population • Maintaining the ability of host population to beat opponents of previous generations • Each generation, the best individual is added to HoF • The whole population competes against a sample of the HoF

Coevolution - HoF • Applied in this research • HoF initially filled without competition • Individuals get their fitness by competing against the masters • When full - host with highest win rates (against masters) joins HoF • Replace first Master to lose all games • Coevolutionary progress cannot be directly seen • Both populations constantly changing

Cultural Coevolution • A new approach! • Maintains “culture” of masters resembling HoF • To enter culture, host must defeat all masters • Masters never replaced – unlimited culture size • Every individual receives a fitness score by competing against all masters • Culture growth rate decreases rapidly • Every new master is the strongest found (yet)

Cultural Coevolution [2] • Numerous advantages • Maintains ability to defeat weak players • Keeps good solutions found • Same player cannot enter twice • Needs to defeat itself • Culture’s performance never decreases • Avoid focusing on a specific player’s weakness • As soon as any master is immune, the hosts have to find another way • More masters -> less likely to remember all weaknesses
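
The culture update can be sketched in the same toy setting. Integers stand in for individuals and `beats` is a hypothetical game oracle; processing hosts one at a time, so that a newly admitted master is faced by the remaining hosts, is an assumption about ordering.

```python
from typing import Callable, List, Tuple


def update_culture(
    hosts: List[int], culture: List[int], beats: Callable[[int, int], bool]
) -> Tuple[List[float], List[int]]:
    """Return per-host fitness and the grown culture.

    A host joins only if it defeats every current master; masters are
    never removed, so the culture can only grow.
    """
    grown = list(culture)
    fitness: List[float] = []
    for host in hosts:
        wins = sum(beats(host, master) for master in grown)
        fitness.append(wins / len(grown) if grown else 0.0)
        if wins == len(grown):
            grown.append(host)
    return fitness, grown


# Toy oracle: the larger number wins, so a player cannot beat a copy of itself.
culture_fitness, culture = update_culture(
    hosts=[3, 5, 5, 7], culture=[2, 4], beats=lambda a, b: a > b
)
```

The second copy of 5 fails to enter because it would have to defeat the first copy, which illustrates the "needs to defeat itself" point.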

General Evolution Setup • Opponents – Random; Naïve; JaGo • Fitness = strength • Rate of wins against all 3 opponents • 6,000 games of both colors • Not using scores, only win rates • Defeating more opponents is better • Generalized Multi-Layer Perceptrons (GMLPs) • All non-loop connections are permitted • Evolving • Hidden neurons; connections; weights; bias (for non-input)

General Evolution Setup [2] • 2 binary Chromosomes used • 1 for connections : 0-no 1-yes • 1 for hidden neurons (if 0, no connections also) • Number of possible connections: • ni, nh, no – number of input, hidden and output neurons • Determines size of chromosome • Real-Chromosome • Weights & Bias values (seen as weights) • Size is number of connections + number of bias vals (for non-input)

General Evolution Setup [3] • Tournament selection (size 2) • 2 point crossover • Binary mutation • Flip bits with 1/L probability • Real-Chromosome Mutation • multiple-σSA • Each object maintains altering “strategy” params which alter distribution of “object” params • Normal distributions used for both
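
The binary-chromosome operators named above (tournament selection of size 2, 2-point crossover, bit flips with probability 1/L) can be sketched as follows. Function names and the list-of-bits representation are assumptions, and the self-adaptive mutation of the real-valued chromosome is not covered here.

```python
import random
from typing import List, Sequence, Tuple

Bits = List[int]


def tournament_select(
    population: Sequence[Bits], fitness: Sequence[float], rng: random.Random, size: int = 2
) -> Bits:
    """Draw `size` distinct individuals at random and return the fittest."""
    contestants = rng.sample(range(len(population)), size)
    return population[max(contestants, key=lambda i: fitness[i])]


def two_point_crossover(a: Bits, b: Bits, rng: random.Random) -> Tuple[Bits, Bits]:
    """Swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def bit_flip(chromosome: Bits, rng: random.Random) -> Bits:
    """Flip each bit independently with probability 1/L."""
    rate = 1.0 / len(chromosome)
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


rng = random.Random(0)
parent_a, parent_b = [0] * 8, [1] * 8
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
mutant = bit_flip(child_a, rng)
winner = tournament_select([parent_a, parent_b], [0.1, 0.9], rng)
```

With a rate of 1/L, one bit per chromosome is flipped on average regardless of chromosome length.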

Setup – Recurrent Nets • Difficult to learn Go without structured input • Experiments with recurrent nets included • Allow loops for input neurons • Naturally represent adjacent board intersections • No hidden neurons • Played against JaGo • Typically output changes without input change due to feedback loops • Computed output only once! • Only 2 directly connected neurons influence each other • Evolution should connect only close neurons
