
AI techniques for the game of Go




Presentation Transcript


  1. AI techniques for the game of Go Erik van der Werf Universiteit Maastricht / ReSound Algorithm R&D

  2. Contents • Introduction • Searching techniques • The Capture Game • Solving Go on Small Boards • Learning techniques • Move Prediction • Learning to Score • Predicting Life & Death • Estimating Potential Territory • Summary of results • Conclusions

  3. The game of Go • Deceptively simple rules • Black and White move in turns • A move places a stone on the board • Surrounded stones are captured • Direct repetition is forbidden (Ko rule) • The game is over when both players pass • The player controlling the most intersections wins
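The capture rule above is easy to operationalize: a chain (connected block of one colour) is removed when it has no empty adjacent point (liberty). A minimal sketch, assuming a board stored as a dict from coordinates to 'B'/'W'/'.' (my representation for illustration, not the author's):

```python
def chain_and_liberties(board, start):
    """Return (stones, liberties) of the chain containing `start`.
    `board` maps (row, col) -> 'B', 'W', or '.'; off-board points are absent."""
    colour = board[start]
    stones, libs, frontier = {start}, set(), [start]
    while frontier:
        r, c = frontier.pop()
        for n in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
            if n not in board:
                continue                    # off the board
            if board[n] == '.':
                libs.add(n)                 # empty neighbour = liberty
            elif board[n] == colour and n not in stones:
                stones.add(n)
                frontier.append(n)
    return stones, libs

def captured(board, start):
    """A chain with no liberties is captured (removed from the board)."""
    stones, libs = chain_and_liberties(board, start)
    return stones if not libs else set()
```

After each move, a program applies this check to the opponent chains adjacent to the new stone and removes any that return non-empty.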

  4. Some basic terminology

  5. Computer Go • Even the best Go programs have no chance against strong amateurs • Human players are superior in areas such as • pattern recognition • spatial reasoning • learning

  6. Playing strength 29 stones handicap

  7. Problem statement How can Artificial Intelligence techniques be used to improve the strength of Go programs? We focused on Searching techniques & Learning techniques

  8. Searching techniques • Very successful for other board games • Evaluate positions by ‘thinking ahead’ • Research • Recognizing positions ‘that are irrelevant’ • Fast heuristic evaluations • Provably correct knowledge • Move ordering (the best moves first) • Re-use of partial results from the search process

  9. The Capture Game • Simplified version of Go • First to capture a stone wins the game • Passing not allowed • Detecting final positions trivial (unlike normal Go) • Search method • Iterative Deepening Principal Variation Search • Enhanced transposition table • Move ordering using shared tables for both colours for killer and history heuristic
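The search components listed above can be combined into a small skeleton. This is a hedged sketch of iterative-deepening principal variation search with a transposition table; the game interface (`key`, `terminal`, `evaluate`, `moves`, `play`) is an assumed illustration, not the thesis code, and the killer/history move ordering is omitted for brevity:

```python
EXACT, LOWER, UPPER = 0, 1, 2
INF = 10**9

def pvs(state, depth, alpha, beta, table):
    alpha0 = alpha
    entry = table.get(state.key())
    if entry and entry[0] >= depth:          # entry = (depth, flag, value)
        _, flag, value = entry
        if flag == EXACT:
            return value
        if flag == LOWER:
            alpha = max(alpha, value)
        else:
            beta = min(beta, value)
        if alpha >= beta:
            return value
    if depth == 0 or state.terminal():
        return state.evaluate()              # heuristic (or exact) value
    best, first = -INF, True
    for move in state.moves():
        child = state.play(move)
        if first:                            # full window for the first move
            score = -pvs(child, depth - 1, -beta, -alpha, table)
            first = False
        else:
            # Null-window probe; re-search with a wider window on fail-high.
            score = -pvs(child, depth - 1, -alpha - 1, -alpha, table)
            if alpha < score < beta:
                score = -pvs(child, depth - 1, -beta, -score, table)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break                            # beta cutoff
    flag = LOWER if best >= beta else UPPER if best <= alpha0 else EXACT
    table[state.key()] = (depth, flag, best)
    return best

def iterative_deepening(root, max_depth):
    table = {}                               # re-used across iterations
    value = 0
    for d in range(1, max_depth + 1):
        value = pvs(root, d, -INF, INF, table)
    return value
```

Re-using the table across deepening iterations is what makes iterative deepening cheap: shallow results seed the move ordering and bounds of the deeper searches.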

  10. Heuristic evaluation for the capture game • Based on four principles: • Maximize liberties • Maximize territory • Connect stones • Make eyes • Liberties: low-order liberties (max. distance 3) • Eyes: Euler number (objects − holes) • Fast computation using a bit-board representation
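The Euler number (objects minus holes) can be computed in a single pass by counting 2×2 "quad" patterns, which suits a bit-board representation well. A sketch assuming 4-connected objects and rows stored as integer bitmasks (Gray's formula E = (Q1 − Q3 + 2·Qd) / 4):

```python
def bit(mask, c, width):
    """Bit c of a row mask; columns outside the board count as empty."""
    return (mask >> c) & 1 if 0 <= c < width else 0

def euler_number(rows, width):
    """Euler number (#objects - #holes, 4-connected) of a bitboard given
    as a list of row bitmasks, via 2x2 quad counting."""
    q1 = q3 = qd = 0
    padded = [0] + list(rows) + [0]          # empty rows above and below
    for a, b in zip(padded, padded[1:]):
        for c in range(-1, width):           # slide a 2x2 window across
            tl, tr = bit(a, c, width), bit(a, c + 1, width)
            bl, br = bit(b, c, width), bit(b, c + 1, width)
            s = tl + tr + bl + br
            if s == 1:
                q1 += 1                      # exactly one stone in the quad
            elif s == 3:
                q3 += 1                      # exactly three stones
            elif s == 2 and tl == br:
                qd += 1                      # two stones on a diagonal
    return (q1 - q3 + 2 * qd) // 4
```

A low Euler number for a player's stones indicates few groups and/or many enclosed holes (eyes), so it captures the "connect stones" and "make eyes" principles in one cheap feature.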

  11. Solutions for the Capture Game • All boards up to 5x5 were solved • Winner decided by board-size parity • Will initiative take over at 6x6? Solution for 4x4 (White wins) Solution for 5x5 (Black wins)

  12. Solutions for the Capture Game on 6x6 Initiative takes over at 6x6

  13. Solving Go on Small Boards • Iterative Deepening Principal Variation Search • Enhanced transposition table • Exploit board symmetry • Internal unconditional bounds • Effective move ordering • Evaluation function • Heuristic component • Similar to the capture game • Provably correct component • Benson’s algorithm for recognizing unconditional life extended with detection of unconditional territory

  14. Recognizing Unconditional Territory • Find regions surrounded by unconditionally alive stones of one colour • Find the interior of the regions (eyespace) • Remove false eyes • Contract the eyespace around defender stones • Count maximum sure liberties (MSL) • MSL < 2 → unconditional territory; otherwise → play it out

  15. Solutions for Small Boards • Value of opening moves on 5x5: (2,2), (3,3), (3,2)

  16. Learning techniques • Successful in several related domains • Heuristic knowledge can be ‘learned’ from analysis of human games • Research • Representation & Generalization • Learn maximally from limited number of examples • Pros and cons of different architectures • Clever use of available domain knowledge

  17. Move prediction • Many moves in Go conform to local patterns which can be played almost reflexively • Train an MLP network to rank moves • Use move pairs {expert, random} extracted from human game records • Training attempts to rank the expert moves first
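The move-pair training signal can be illustrated with a much simpler learner than the thesis MLP: a linear scorer nudged whenever a random move is not ranked below the expert move played in the same position. A perceptron-style sketch; all names and the feature vectors are illustrative, not the author's setup:

```python
def train_ranker(pairs, dim, epochs=20, lr=0.1):
    """pairs: list of (expert_features, random_features) vectors.
    Learns weights w such that w . expert > w . random on the training pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for expert, rand in pairs:
            score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
            if score(expert) <= score(rand):       # expert not ranked first
                # Move the weights toward the expert move's features.
                for i in range(dim):
                    w[i] += lr * (expert[i] - rand[i])
    return w
```

The real system replaces the linear scorer with an MLP and richer features, but the objective is the same: push the expert move above the sampled non-expert move in the ranking.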

  18. Move Prediction - Representation • Selection of raw features: • Stones • Ko • Liberties after • Nearest stones • Edge • Liberties • Captures • Last move • Remove symmetry by canonical ordering & colour reversal • High-dimensional representation suffers from curse of dimensionality => Apply linear feature extraction to reduce dimensionality
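Removing symmetry by canonical ordering can be sketched as follows: generate the eight rotations/reflections of the board plus their colour-reversed versions, and keep the lexicographically smallest. This is an assumed implementation, not necessarily the author's:

```python
def transforms(board):
    """Yield the 8 dihedral transforms of a square board (tuple of row tuples)."""
    b = board
    for _ in range(4):
        yield b
        yield tuple(row[::-1] for row in b)       # horizontal mirror
        b = tuple(zip(*b[::-1]))                  # rotate 90 degrees

def canonical(board):
    """Smallest board over all symmetries and colour reversal; two positions
    that differ only by symmetry/colour map to the same canonical form."""
    swap = {'B': 'W', 'W': 'B', '.': '.'}
    reversed_board = tuple(tuple(swap[x] for x in row) for row in board)
    return min(min(transforms(board)), min(transforms(reversed_board)))
```

Mapping every position to its canonical form before feature extraction means the learner sees each local shape once instead of up to sixteen times, which stretches a limited example set considerably.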

  19. Move Prediction - Feature Extraction • Principal Component Analysis (PCA) • Linear Discriminant Analysis (LDA) • Move-Pair Analysis (MPA) • Linear projection maximizing the expected quadratic distance between pairs • Weakness: ignores global features • Modified Eigenspace Separation Transform (MEST) • Linear projection on eigenvectors with largest absolute eigenvalues of the correlation difference matrix • Good results using combination of MEST & MPA Standard techniques, sub-optimal for ranking

  20. Human & Computer Performance Compared Black must choose between two red intersections

  21. Performance on professional 19×19 games [Figure: cumulative prediction performance (%) vs. move number]

  22. Learning to Score • Using archives of (online) Go servers, such as NNGS, for ML is non-trivial because of: • (1) Missing information: only a single numeric result is given; the status of individual board points is not available • (2) Unfinished games: humans resign early or do not even finish the game at all • (3) Bad moves • To overcome (1) and (2), we need reliable final scores • Large dataset created: 18k labelled final 9x9 positions • Several tricks were used to identify dubious scores • A few thousand positions scored/verified manually

  23. The scoring method • 1. Classify life & death for all blocks • 2. Remove dead blocks • 3. Mark empty intersections using flood fills or distance to the nearest remaining colour • 4. (Optional) Recursively update the representation to take adjacent block status into account; return to 1
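Step 3 can be sketched with a flood fill over empty regions: once dead blocks are removed, an empty region bordered by stones of only one colour counts for that colour. A simplified area-scoring sketch (the distance-based variant mentioned above is omitted; the board representation is my assumption):

```python
def area_score(board):
    """board: dict (r, c) -> 'B' | 'W' | '.' with dead blocks already removed.
    Returns (black_area, white_area) under area scoring."""
    score = {'B': 0, 'W': 0}
    for v in board.values():                  # stones count for their colour
        if v in score:
            score[v] += 1
    seen = set()
    for p, v in board.items():
        if v != '.' or p in seen:
            continue
        region, borders, frontier = {p}, set(), [p]
        while frontier:                       # flood-fill one empty region
            r, c = frontier.pop()
            for n in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
                if n not in board:
                    continue
                if board[n] == '.' and n not in region:
                    region.add(n)
                    frontier.append(n)
                elif board[n] in score:
                    borders.add(board[n])     # colour touching the region
        seen |= region
        if len(borders) == 1:                 # territory of a single colour
            score[borders.pop()] += len(region)
    return score['B'], score['W']
```

Regions touching both colours score for neither, which is why reliable life-and-death classification in step 1 matters: a dead block left on the board turns territory into a neutral region.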

  24. Blocks to Classify • For final positions there are 3 types of blocks: • (1) Alive (O): at the border of own territory • (2) Dead (X): inside the opponent's territory • (3) Irrelevant (?): removal does not change the area score • We only train on blocks of types 1 and 2!

  25. Representation of the blocks • Direct features of the block • Size • Perimeter • Adjacent opponent stones • 1st, 2nd, 3rd - order liberties • Protected liberties • Auto-atari liberties • Adjacent opponent blocks • Local majority (MD < 3) • Centre of mass • Bounding box size • Adjacent fully accessible CERs • Number of regions • Size • Perimeter • Split points • Adjacent partially accessible CERs • Number of partially accessible regions • Accessible size • Accessible perimeter • Inaccessible size • Inaccessible perimeter • Inaccessible split points • Disputed territory • Direct liberties of the block in disputed territory • Liberties of all friendly blocks in disputed territory • Liberties of all enemy blocks in disputed territory • Directly adjacent eyespace • Size • Perimeter • Optimistic chain • Number of blocks • Size • Perimeter • Split points • Adjacent CERs • Adjacent CERs with eyespace • Adjacent CERs, fully accessible from at least 1 block • Size of adjacent eyespace • Perimeter of adjacent eyespace • External opponent liberties • Opponent blocks (3x) • (1) Weakest directly adjacent opponent block (weakest = block with the fewest liberties) • (2) 2nd weakest directly adjacent opponent block • (3) Weakest opponent block adjacent or sharing liberties with the block’s optimistic chain • Perimeter • Liberties • Shared liberties • Split points • Perimeter of adjacent eyespace • Recursive features • Predicted value of strongest adjacent friendly block • Predicted value of weakest adjacent opponent block • Predicted value of second weakest adjacent opponent block • Average predicted value of weakest opponent block’s optimistic chain • Adjacent eyespace size of the weakest opponent block’s optimistic chain • Adjacent eyespace perimeter of the weakest opponent block’s optimistic chain

  26. Scoring Performance • Blocks (direct/recursive classification) • Full board (4-step recursive classification) • Incorrect score: 1.1% = better than the average rated NNGS player (~7 kyu) • Incorrect winner: 0.5% = comparable to the average NNGS player • Average absolute score difference: 0.15 points

  27. Life & Death during the game • Predict whether blocks of stones can be captured • Perfect predictions are not possible in non-final positions! • Approximate the a posteriori probability that a block will be alive at the end of the game • 4 block types • First 3 types identified from the final position (as before) • 4th type: blocks captured during the game → dead • Irrelevant blocks not used during training! • Representation extended with 5 additional features: player to move, ko, distance to ko, number of black/white stones on the board • [Figure: black blocks predicted 50% alive]

  28. Performance over the game MLP, 25 hidden units, 175,000 training examples Average prediction error: 11.7%

  29. Estimating Potential Territory • Why estimate territory? • (1) For predicting the score (potential territory) • Main purpose: to build an evaluation function • May also be used to adjust strategy (e.g., play safe when ahead) • (2) To detect safe regions (secure territory) • Main purpose: forward pruning (risky unless provably correct) • Our main focus is on (1) potential territory • We investigate: • Direct methods, known or derived from literature • ML methods, trained on game records • Enhancements with (heuristic) knowledge of L&D

  30. Direct methods • Explicit control • Direct control • Distance-based control • Influence-based control (~ numerical dilations) • Bouzy's method (numerical dilations + erosions) • Combinations 5+3 or 5+4 • Enhancements use knowledge of Life & Death to remove dead stones (or reverse their colour)
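Distance-based control can be sketched as a multi-source breadth-first search: every empty intersection is assigned to the colour of its nearest stone (Manhattan distance) and left neutral on a tie. The function name and board representation are assumptions for illustration:

```python
from collections import deque

def distance_control(board):
    """board: dict (r, c) -> 'B' | 'W' | '.'.
    Returns a dict mapping every point to 'B', 'W', or '.' (neutral)."""
    dist, owner = {}, {}
    frontier = deque()
    for p, v in board.items():                # all stones are BFS sources
        if v != '.':
            dist[p], owner[p] = 0, v
            frontier.append(p)
    while frontier:
        r, c = frontier.popleft()
        for n in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
            if n not in board:
                continue
            if n not in dist:                 # first colour to arrive claims it
                dist[n] = dist[(r, c)] + 1
                owner[n] = owner[(r, c)]
                frontier.append(n)
            elif dist[n] == dist[(r, c)] + 1 and owner[n] != owner[(r, c)]:
                owner[n] = '.'                # equidistant from both colours
    return owner
```

Influence-based control and Bouzy's method refine this idea by letting each stone radiate numeric influence (dilations) and then eroding it back, rather than using raw distance alone.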

  31. ML methods • Simple representation • Intersections in ROI: colour {+1 black, -1 white, 0 empty} • Enhanced representation • Intersections in ROI: colour × Prob.(alive) • Edge • Colour of nearest stone • Colour of nearest living stone • Prob.(alive) obtained from a pre-trained MLP • Predicted colour: +1 sure black, 0 neutral, -1 sure white

  32. Performance at various levels of confidence

  33. Predicting the winner (percentage correct)

  34. Predicting the score (absolute error)

  35. Summary: Searching Techniques • The capture game • Simplified Go rules (who captures the first stone wins) • Boards up to 6x6 solved • Go on small boards • Normal Go rules • First program in the world to have solved 5x5 Go • Perfect solutions up to ~30 intersections • Heuristic knowledge required for larger boards

  36. Summary: Learning Techniques 1 • Move prediction • Very good results (strong kyu level) • Strong play is possible with limited selection of moves • Scoring final positions • Excellent classification • Reliable training data

  37. Summary: Learning Techniques 2 • Predicting life and death • Good results • Most important ingredient for accurate evaluation of positions during the game • Estimating potential territory • Comparison of non-learning and learning methods • Best results with learning methods

  38. Conclusions • Knowledge is the most important ingredient to improve Go programs • Searching techniques • Provably correct knowledge sufficient for solving small problems up to ~30 intersections • Heuristic knowledge essential for larger problems • Learning techniques • Heuristic knowledge learned quite well from games • Learned heuristic knowledge at least at the level of reasonably strong kyu players

  39. Questions? More information at: http://erikvanderwerf.tengen.nl/ Email:
