This presentation describes a neural-network-based program for playing Go that integrates supervised and reinforcement learning. It covers the networks used for shape evaluation, group safety, influence, and territory assessment. Multilayer perceptrons with many input features are trained to predict expert moves and to evaluate positions. The shape network is trained on expert game records and the group safety and territory networks by reinforcement learning; the shape network then guides a selective local search that captures common-sense knowledge. The conclusion is that a flexible representation of Go expertise can be learned by combining different forms of knowledge.
Honte, a Go-Playing Program Using Neural Nets Frederik Dahl
Combined approach • Supervised learning • Shape evaluation • Reinforcement learning • Group safety • Territory • Heuristic evaluation • Influence • Search • Capture • Connectivity • Life and death
Shape evaluation: Multilayer perceptron • 190 inputs • Receptive field of radius 3 • Distance to edge • Liberties • Captured stones • 50 hidden nodes • Single output • Will an expert play here?
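The network on this slide can be sketched as a plain forward pass. This is a minimal illustration, not Honte's actual code: the layer sizes (190 inputs, 50 hidden nodes, one output) come from the slide, while the tanh hidden units, the sigmoid output, the bias handling, and the weight initialisation are all assumptions. The names `init_network` and `evaluate` are hypothetical.

```python
import math
import random

N_INPUTS, N_HIDDEN = 190, 50  # layer sizes taken from the slide


def init_network(seed=0):
    """Small random weights; the initialisation scheme is an assumption."""
    rng = random.Random(seed)
    # Each row holds N_INPUTS weights followed by one bias term.
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS + 1)]
                for _ in range(N_HIDDEN)]
    # N_HIDDEN weights followed by one bias term.
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN + 1)]
    return w_hidden, w_out


def evaluate(network, features):
    """Forward pass: 190 features -> 50 tanh units -> one sigmoid output."""
    w_hidden, w_out = network
    assert len(features) == N_INPUTS
    # zip stops after N_INPUTS items, so the bias (row[-1]) is added separately.
    hidden = [math.tanh(row[-1] + sum(w * x for w, x in zip(row, features)))
              for row in w_hidden]
    z = w_out[-1] + sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-z))  # read as "will an expert play here?"
```

A sigmoid output is convenient here because it stays between 0 and 1, matching the 0 and +1 training targets used for this network.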
Shape evaluation: Training and performance • Trained on 400 expert games • Expert move used as positive example (+1) • Random legal move as negative example (0) • Error backpropagation • error = target - eval • Performance measured by treating prediction as evaluation function • What percentage of legal moves are ranked below the expert move?
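The performance measure in the last bullet can be written down directly. The sketch below is one reading of it, under two assumptions the slide does not settle: the expert move itself is left out of the denominator, and a move that ties with the expert move does not count as ranked below it. The function name `percent_ranked_below` is hypothetical.

```python
def percent_ranked_below(scores, expert_move):
    """Percentage of other legal moves that score below the expert move.

    scores: dict mapping each legal move to the network's evaluation.
    """
    expert_score = scores[expert_move]
    others = [s for move, s in scores.items() if move != expert_move]
    if not others:
        return 100.0  # the expert move was the only legal move
    below = sum(1 for s in others if s < expert_score)
    return 100.0 * below / len(others)


# Hypothetical evaluations for four legal moves; the expert played "a".
example = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.95}
result = percent_ranked_below(example, "a")  # two of the three other moves score lower
```

Averaging this figure over all positions in a held-out set of games gives a single number for how well the network ranks expert moves.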
Local search • Selective search for local goals • Capture • Connectivity • Life and death • Only considers moves suggested by shape evaluating network • Deep and narrow search • Captures common-sense knowledge
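A deep and narrow search of this kind could look roughly like the sketch below. It is only an illustration under stated assumptions: the AND/OR framing (the attacker needs one working move, the defender must fail with every reply), the `width` cut-off, and the callback names `suggest`, `play`, and `goal_reached` are all hypothetical and not taken from Honte. `suggest` stands in for the shape network and is assumed to return moves ordered best first.

```python
def selective_search(position, depth, attacker_to_move,
                     suggest, play, goal_reached, width=3):
    """Return True if the attacker can force the local goal within `depth` plies."""
    if goal_reached(position):
        return True
    if depth == 0:
        return False
    # Narrow: only the top-ranked suggestions are ever examined.
    candidates = suggest(position)[:width]
    if not candidates:
        return False
    results = (selective_search(play(position, move), depth - 1,
                                not attacker_to_move,
                                suggest, play, goal_reached, width)
               for move in candidates)
    # Attacker needs one successful move; defender must fail with all replies.
    return any(results) if attacker_to_move else all(results)


# Toy stand-in for a board: a counter that either side may raise by 1 or 2.
toy_suggest = lambda pos: [2, 1]
toy_play = lambda pos, move: pos + move
toy_goal = lambda pos: pos >= 4

reachable_in_3 = selective_search(0, 3, True, toy_suggest, toy_play, toy_goal)
reachable_in_2 = selective_search(0, 2, True, toy_suggest, toy_play, toy_goal)
```

Keeping `width` small is what makes a large `depth` affordable: the number of positions visited grows as `width ** depth` rather than with the full number of legal moves.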
Group safety evaluation: Multilayer perceptron • Groups defined by connectable blocks • 13 inputs • Number of stones in group • Number of liberties in group • Number of proven eyes • Average opponent influence over liberties • 20 hidden nodes • 1 output • Probability of group survival
Group safety evaluation: Temporal difference learning • Trained by self-play • Reward signal for the group is the average final safety of stones • 0 = captured • 1 = survived • TD(0) is used, replaying games backwards • Very simple idea: • error = eval(next) - eval(now)
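The update rule on this slide can be illustrated with a small sketch. To stay short it swaps the multilayer perceptron for a lookup table of evaluations, which is an assumption made purely for illustration. The learning rate `alpha`, the default evaluation of 0.5, and the choice to use each freshly updated evaluation as the target for the position before it are also assumptions. The function name `td0_backward_update` is hypothetical.

```python
def td0_backward_update(values, states, final_reward, alpha=0.1, default=0.5):
    """Replay one game backwards, nudging each evaluation toward its successor.

    values: dict from position to current evaluation (stand-in for the network).
    states: positions of one group in the order they occurred in the game.
    final_reward: 0.0 if the group was captured, 1.0 if it survived.
    """
    target = final_reward  # the last position is trained toward the outcome
    for state in reversed(states):
        now = values.get(state, default)
        error = target - now  # error = eval(next) - eval(now)
        values[state] = now + alpha * error
        target = values[state]  # becomes eval(next) for the earlier position
    return values


table = td0_backward_update({}, ["s0", "s1", "s2"], final_reward=1.0, alpha=0.5)
```

Replaying backwards means the final outcome reaches the early positions within a single pass over the game, instead of needing one pass per move as it would when replaying forwards.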
Influence evaluation • Consider random walks from an intersection • How likely to end up at a black or white stone? • Can also take account of group safety estimates
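The random-walk idea can be approximated by Monte Carlo sampling, as in the sketch below. The slide does not say how the walks are defined, so the details here are assumptions: each step moves to a uniformly chosen on-board neighbour, a walk stops at the first stone it reaches, walks are cut off after `max_steps`, and the group safety weighting from the last bullet is left out. The name `random_walk_influence` and the board encoding (a dict from `(row, col)` to `"B"` or `"W"`) are hypothetical.

```python
import random


def random_walk_influence(stones, size, start, walks=200, max_steps=50, seed=0):
    """Estimate how often a random walk from `start` ends on a black or white stone.

    Returns (black_fraction, white_fraction); walks that hit no stone count for neither.
    """
    rng = random.Random(seed)
    hits = {"B": 0, "W": 0}
    for _walk in range(walks):
        r, c = start
        for _step in range(max_steps):
            if (r, c) in stones:
                hits[stones[(r, c)]] += 1  # walk ends at the first stone reached
                break
            neighbours = [(r + dr, c + dc)
                          for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= r + dr < size and 0 <= c + dc < size]
            r, c = rng.choice(neighbours)
    return hits["B"] / walks, hits["W"] / walks


# A centre point on a 3x3 board, fully surrounded by black stones.
surrounded = {(0, 1): "B", (1, 0): "B", (1, 2): "B", (2, 1): "B"}
black, white = random_walk_influence(surrounded, size=3, start=(1, 1))
```

The difference between the two fractions gives a signed influence value for the intersection, positive where Black dominates and negative where White does.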
Territory evaluation • Another multilayer perceptron • 4 Inputs • Revised influence (for both sides) • Distance from edge • 10 hidden nodes • 1 output • Predicted territory value • Trained by TD(0) using eventual territory value as reward signal
Playing strength • Playing 19x19 Go • Approximately even against Handtalk 97-06e • Wins more than 50% against Ego 1.0 • Weaknesses • Confuses group safety with group strength • Has no concept of the aji of a group
Recent work • New version of WinHonte 1.03 • Neural net to evaluate sente/gote • Trial version available online!
Conclusions • Go knowledge can be learned • Combining different forms of knowledge can be a good idea • Multilayer perceptrons provide a flexible representation • Local search can be used effectively as input features for learning