Honte, a Go-Playing Program Using Neural Nets

Frederik Dahl

Combined approach • Supervised learning: shape evaluation • Reinforcement learning: group safety, territory • Heuristic evaluation: influence • Search: capture, connectivity, life and death

Architecture

Shape evaluation: Multilayer perceptron • 190 inputs • Receptive field of radius 3 • Distance to edge • Liberties • Captured stones • 50 hidden nodes • Single output • Will an expert play here?
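
The network shape on this slide can be sketched as a plain forward pass. This is only an illustration: the layer sizes (190 inputs, 50 hidden nodes, one output) come from the slide, while the tanh and sigmoid activations, the weight initialisation, and the names `init_mlp` and `shape_eval` are assumptions. How the 190 features are encoded is not shown here.

```python
import math
import random

N_INPUTS, N_HIDDEN = 190, 50  # sizes taken from the slide


def init_mlp(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (initialisation is an assumption)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shape_eval(params, features):
    """Forward pass: feature vector for one candidate move -> score in (0, 1).

    The score plays the role of "will an expert play here?".
    Activation functions are assumed, not taken from the slides.
    """
    w1, b1, w2, b2 = params
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w1, b1)
    ]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
```

Scoring every legal move with `shape_eval` and sorting by score gives a move ordering that other components can reuse.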

Shape evaluation: Training and performance • Trained on 400 expert games • Expert move used as positive example (+1) • Random legal move as negative example (0) • Error backpropagation • error = target - eval • Performance measured by treating prediction as evaluation function • What percentage of legal moves are ranked below the expert move?
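
A minimal sketch of the training targets and the ranking measure described on this slide. The function names are hypothetical. Two details are assumptions: the random negative example is drawn from legal moves other than the expert move, and ties in the ranking do not count as "below".

```python
import random


def training_pairs(expert_move, legal_moves, rng):
    """Return (move, target) pairs for one position: expert +1, random move 0."""
    pairs = [(expert_move, 1.0)]
    negatives = [m for m in legal_moves if m != expert_move]  # assumption
    if negatives:
        pairs.append((rng.choice(negatives), 0.0))
    return pairs


def output_error(target, evaluation):
    """The error fed into backpropagation, as on the slide."""
    return target - evaluation


def percent_ranked_below(scores, expert_move):
    """Percentage of the other legal moves scored strictly below the expert move.

    scores: dict mapping each legal move to the network output for it.
    """
    expert = scores[expert_move]
    others = [s for m, s in scores.items() if m != expert_move]
    if not others:
        return 100.0
    below = sum(1 for s in others if s < expert)
    return 100.0 * below / len(others)


# Example with made-up scores: two of the three other moves rank below (3, 3).
example_scores = {(3, 3): 0.9, (4, 4): 0.2, (16, 16): 0.95, (10, 10): 0.1}
example_percent = percent_ranked_below(example_scores, (3, 3))
```

Averaging this percentage over held-out expert positions gives a single figure for how well the network orders moves.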

Shape evaluation: Results

Local search • Selective search for local goals • Capture • Connectivity • Life and death • Only considers moves suggested by shape evaluating network • Deep and narrow search • Captures common-sense knowledge
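
The idea of a deep and narrow goal search can be sketched as follows. This is not Honte's search: `can_reach_goal`, its callbacks, the `width` cut-off, and the rule that an undecided position at the depth limit counts as failure are all assumptions. In the real program the `suggest` role would be played by the shape-evaluating network; the toy liberty-counting game below is a stand-in so that the sketch runs.

```python
def can_reach_goal(state, depth, attacker_to_move, suggest, play, status, width=2):
    """Return True if the attacker can force the local goal within `depth` plies.

    suggest(state, attacker_to_move) -> candidate moves, best first
    play(state, move)                -> successor state
    status(state)                    -> True (goal reached), False (goal failed),
                                        or None (undecided)
    Only the first `width` suggestions are searched, keeping the tree narrow.
    """
    result = status(state)
    if result is not None:
        return result
    if depth == 0:
        return False  # assumption: undecided at the horizon counts as failure
    moves = suggest(state, attacker_to_move)[:width]
    if not moves:
        return False
    outcomes = (
        can_reach_goal(play(state, m), depth - 1, not attacker_to_move,
                       suggest, play, status, width)
        for m in moves
    )
    # The attacker needs one working move; the defender must fail with all.
    return any(outcomes) if attacker_to_move else all(outcomes)


# Toy stand-in for a capture problem: the state is a liberty count.
def toy_suggest(libs, attacker_to_move):
    return ["fill"] if attacker_to_move else ["extend", "pass"]


def toy_play(libs, move):
    return libs + {"fill": -1, "extend": 1, "pass": 0}[move]


def toy_status(libs):
    if libs == 0:
        return True   # block captured
    if libs >= 4:
        return False  # block treated as safe
    return None
```

In the toy game a block with one liberty is captured immediately, while a block with two liberties escapes because the defender can always extend.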

Group safety evaluation: Multilayer perceptron • Groups defined by connectable blocks • 13 inputs • Number of stones in group • Number of liberties in group • Number of proven eyes • Average opponent influence over liberties • 20 hidden nodes • 1 output • Probability of group survival

Group safety evaluation: Temporal difference learning • Trained by self-play • Reward signal for the group is the average final safety of stones • 0 = captured • 1 = survived • TD(0) is used, replaying games backwards • Very simple idea: • error = eval(next) - eval(now)
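
The backward replay can be illustrated with a table of per-position estimates standing in for the network. This is a simplification under stated assumptions: Honte updates network weights rather than table entries, and the learning rate `lr` and the function names are hypothetical. The reward follows the slide: the average final safety of the group's stones.

```python
def group_reward(stone_survived):
    """Average final safety of a group's stones (1 = survived, 0 = captured)."""
    return sum(stone_survived) / len(stone_survived)


def td0_replay_backwards(values, final_reward, lr=0.5):
    """Update safety estimates for one game, last position first.

    values[t] is the current estimate for the group at position t.
    At each step, error = eval(next) - eval(now); for the last position
    the "next" evaluation is the final reward.
    """
    target = final_reward
    for t in reversed(range(len(values))):
        error = target - values[t]
        values[t] += lr * error
        target = values[t]  # the updated estimate is "next" for position t - 1
    return values


# Example: three positions, all initially 0.5, and the group survives intact.
example_values = td0_replay_backwards([0.5, 0.5, 0.5], group_reward([1, 1, 1]))
```

Replaying backwards lets the final outcome influence every earlier position in a single pass, because each position is updated towards an estimate that has already been corrected.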

Influence evaluation • Consider random walks from an intersection • How likely to end up at a black or white stone? • Can also take account of group safety estimates
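
A rough Monte Carlo sketch of the random-walk idea. The slide does not specify the walk, so the details here are assumptions: uniform steps to on-board neighbours, a cap of `max_steps` per walk, and the name `random_walk_influence`. Group safety estimates are not used in this sketch.

```python
import random


def random_walk_influence(stones, size, start, walks=2000, max_steps=50, seed=0):
    """Estimate how often a random walk from `start` ends on each colour.

    stones: dict mapping (x, y) to "B" or "W"
    Returns (black_fraction, white_fraction); walks that hit no stone within
    max_steps count for neither side.
    """
    rng = random.Random(seed)
    hits = {"B": 0, "W": 0}
    for _ in range(walks):
        x, y = start
        for _ in range(max_steps):
            if (x, y) in stones:
                break
            neighbours = [
                (x + dx, y + dy)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < size and 0 <= y + dy < size
            ]
            x, y = rng.choice(neighbours)
        colour = stones.get((x, y))
        if colour is not None:
            hits[colour] += 1
    return hits["B"] / walks, hits["W"] / walks
```

For example, on a 5x5 board with a black stone at (0, 0) and a white stone at (4, 4), a walk started next to the black stone should end on black far more often than on white.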

Territory evaluation • Another multilayer perceptron • 4 inputs • Revised influence (for both sides) • Distance from edge • 10 hidden nodes • 1 output • Predicted territory value • Trained by TD(0) using eventual territory value as reward signal

Playing strength • Playing 19x19 Go • Approximately even against Handtalk 97-06e • Wins more than 50% against Ego 1.0 • Weaknesses • Confuses group safety with group strength • Has no concept of the aji of a group

Recent work • New version of WinHonte 1.03 • Neural net to evaluate sente/gote • Trial version available online!

Conclusions • Go knowledge can be learned • Combining different forms of knowledge can be a good idea • Multilayer perceptrons provide a flexible representation • Local search can be used effectively as input features for learning
