
Learning Shape in Computer Go

Learning Shape in Computer Go, by David Silver. A brief introduction to Go: Black and White take turns to place stones. Once played, a stone cannot move. The aim is to surround the most territory. The game is usually played on a 19x19 board.

Presentation Transcript


  1. Learning Shape in Computer Go • David Silver

  2. A brief introduction to Go • Black and White take turns to place stones • Once played, a stone cannot move • The aim is to surround the most territory • Usually played on a 19x19 board

  3. Capturing • The empty points adjacent to a stone, along the lines radiating from it, are called liberties • If a connected group of stones has all of its liberties removed then it is captured • Captured stones are removed from the board
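The capture rule on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than code from the talk: the board representation (a list of lists of one-character strings) and the names `group_and_liberties` and `remove_if_captured` are assumptions made for the example.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"  # hypothetical cell encoding


def neighbours(point, size):
    """Yield the on-board points orthogonally adjacent to `point`."""
    r, c = point
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield (nr, nc)


def group_and_liberties(board, start):
    """Flood-fill the connected group containing the stone at `start`.

    Returns (stones, liberties) as two sets of points.
    Assumes `start` holds a stone, not an empty point.
    """
    colour = board[start[0]][start[1]]
    size = len(board)
    stones, liberties, frontier = {start}, set(), [start]
    while frontier:
        p = frontier.pop()
        for q in neighbours(p, size):
            v = board[q[0]][q[1]]
            if v == EMPTY:
                liberties.add(q)          # adjacent empty point
            elif v == colour and q not in stones:
                stones.add(q)             # same-coloured stone joins the group
                frontier.append(q)
    return stones, liberties


def remove_if_captured(board, start):
    """Remove the group at `start` if it has no liberties; return stones removed."""
    stones, liberties = group_and_liberties(board, start)
    if liberties:
        return 0
    for r, c in stones:
        board[r][c] = EMPTY
    return len(stones)
```

For example, a single white stone surrounded on all four sides by black stones has no liberties, so `remove_if_captured` clears it and returns 1.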


  5. Atari Go (Capture Go) • Atari Go is a simplified version of Go • The winner is the first player to capture a stone • Often used to teach Go to beginners • Circumvents several tricky issues: the game only finishing by agreement; ko (local repetitions of position); seki (local stalemates)

  6. Computer Go • Computer Go programs are very weak • Search space is too large for brute-force techniques • No good evaluation functions • Human intuition (shape knowledge) has proven difficult to capture • Why not learn shape knowledge? • And use it to learn an evaluation function?

  7. Local shape • Local shape describes a pattern of stones • It is used extensively by current Computer Go programs (pattern databases) • Inputting local shape by hand takes many years of hard labour • We would like to: learn local shapes by trial and error; assign a value for the goodness of a shape (just how good is a particular shape?)

  8. Enumerating local shapes • In these experiments all possible local shapes, up to a small maximum size (e.g. 2x2), are used as features • A local shape is defined to be a particular configuration of stones at a canonical position on the board • Local shapes are used as binary features by the learning algorithm
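As a rough sketch of what this enumeration might look like, the Python below lists every k x k configuration and finds which (position, configuration) features are active on a board. The integer cell encoding, the function names, and the 9x9 board in the usage note are all assumptions for illustration; the slides do not say which board sizes or encodings were used, or whether the all-empty configuration counts as a feature.

```python
from itertools import product

EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical cell encoding


def enumerate_local_shapes(k):
    """Yield every k x k configuration of stones as a tuple of row tuples."""
    for cells in product((EMPTY, BLACK, WHITE), repeat=k * k):
        yield tuple(tuple(cells[r * k:(r + 1) * k]) for r in range(k))


def count_features(k, board_size):
    """Count (configuration, position) features, ignoring any symmetry sharing."""
    positions = (board_size - k + 1) ** 2
    return 3 ** (k * k) * positions


def active_features(board, k):
    """Return the (row, col, configuration) features that are 'on' in a position.

    Exactly one k x k configuration is active at each board location, so each
    feature is binary: it either matches the stones at that location or not.
    """
    n = len(board)
    return [
        (r, c, tuple(tuple(board[r + i][c:c + k]) for i in range(k)))
        for r in range(n - k + 1)
        for c in range(n - k + 1)
    ]
```

Under these assumptions there are 3^(k*k) configurations per location: 3 for 1x1, 81 for 2x2, 19,683 for 3x3 and 43,046,721 for 4x4. On a hypothetical 9x9 board, `count_features(2, 9)` gives 5,184 features, which illustrates how quickly full enumeration grows with shape size.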

  9. Invariances • Each canonical local shape can be rotated, reflected, or inverted • So each position may cause updates to multiple instances of each feature
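One way to generate these variants is sketched below. It assumes that "inverted" means swapping black and white stones, which the slide does not spell out, and it simply flags inverted variants rather than deciding how their values should be shared. The shape encoding (a tuple of row tuples) and all function names are hypothetical.

```python
EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical cell encoding


def rotate(shape):
    """Rotate a square shape 90 degrees clockwise."""
    return tuple(zip(*shape[::-1]))


def reflect(shape):
    """Mirror a shape left to right."""
    return tuple(tuple(row[::-1]) for row in shape)


def invert(shape):
    """Swap black and white stones (assumed meaning of 'inverted')."""
    swap = {EMPTY: EMPTY, BLACK: WHITE, WHITE: BLACK}
    return tuple(tuple(swap[v] for v in row) for row in shape)


def symmetric_variants(shape):
    """Return the set of (variant, colours_inverted) pairs for a shape.

    Covers the four rotations, their reflections, and the colour-inverted
    version of each; duplicates collapse because the result is a set.
    """
    variants = set()
    current = shape
    for _ in range(4):
        for s in (current, reflect(current)):
            variants.add((s, False))
            variants.add((invert(s), True))
        current = rotate(current)
    return variants
```

A single black stone in the corner of a 2x2 shape, for instance, yields eight pairs: the stone in each of the four corners, once as black and once as white.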

  10. Algorithm • Value function is learnt for afterstates • Move selection is done by 1-ply greedy search (ε = 0) over the value function • To evaluate a position: active local shapes are identified, a linear combination of their weights is taken, and a sigmoid squashing function is applied • Backups are performed using TD(0) • Reward of +1 for winning, 0 for losing
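The steps on this slide can be sketched as follows. This is an illustrative reading, not the talk's implementation: the weight dictionary, the feature identifiers, the step size `alpha`, and the choice to include the sigmoid's gradient in the update are all assumptions.

```python
import math


def sigmoid(x):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def value(weights, active):
    """Estimate the value of an afterstate from its active binary features.

    `weights` maps a feature id to its learned weight; unseen features count as 0.
    """
    return sigmoid(sum(weights.get(f, 0.0) for f in active))


def greedy_move(weights, afterstates):
    """1-ply greedy selection: pick the move whose afterstate has highest value.

    `afterstates` maps each legal move to the active features of the
    position that results from playing it.
    """
    return max(afterstates, key=lambda m: value(weights, afterstates[m]))


def td0_update(weights, active, target, alpha=0.1):
    """Apply one TD(0) backup towards `target` and return the TD error.

    `target` is the value of the next afterstate, or the final reward
    (1 for a win, 0 for a loss) at the end of the game.
    """
    v = value(weights, active)
    delta = target - v
    grad = v * (1.0 - v)  # derivative of the sigmoid at the current output
    for f in active:
        weights[f] = weights.get(f, 0.0) + alpha * delta * grad
    return delta
```

With all weights at zero every position is valued at 0.5, so the first updates come entirely from game outcomes propagating backwards through the afterstates.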

  11. Value function approximation

  12. Training procedure • The challenge: learn to beat the average liberty player • So the learning algorithm was trained specifically against the average liberty player • The problem: learning is very slow, since the agent almost never wins any games by chance • The solution: mix in a proportion of random moves until the agent wins 50% of all games • Reduce the proportion of randomness as the agent learns to win more games

  13. Training procedure • The two-pint challenge: learn to beat the average liberty player • So the learning algorithm was trained specifically against the average liberty player • The problem: learning is very slow, since the agent almost never wins any games by chance • The solution: mix in a proportion of random moves until the agent wins 50% of all games • Reduce the proportion of randomness as the agent learns to win more games
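A possible shape for this schedule is sketched below. The slides do not say whose moves are randomised or how the proportion is adjusted, so this sketch assumes the random moves are mixed into the opponent's play and that the proportion moves in fixed steps towards a 50% win rate. The function names, the step size, and the window of recent results are all hypothetical.

```python
import random


def choose_opponent_move(legal_moves, policy_move, p_random, rng=random):
    """With probability `p_random` play a random legal move, else the policy's move."""
    if rng.random() < p_random:
        return rng.choice(legal_moves)
    return policy_move


def adapt_random_proportion(p_random, recent_results, target=0.5, step=0.05):
    """Nudge the random-move proportion so the agent's win rate tracks `target`.

    `recent_results` is a non-empty list of 1 (agent won) and 0 (agent lost).
    Losing too often raises the randomness; winning too often lowers it,
    so the opposition gets steadily harder as the agent improves.
    """
    win_rate = sum(recent_results) / len(recent_results)
    if win_rate < target:
        return min(1.0, p_random + step)
    if win_rate > target:
        return max(0.0, p_random - step)
    return p_random
```

Calling `adapt_random_proportion` after each batch of games keeps roughly half of the games winnable, which gives the learner a steady supply of both rewards.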

  14. Results for different shape sizes

  15. Results for different board sizes

  16. Shapes learned (1x1)

  17. Shapes learned (2x2)

  18. Shapes learned (3x3)

  19. Conclusions • Local shape information is sufficient to beat a naïve rule-based player • Significant shapes can be learned • The ‘goodness’ of shapes can be learned • A linear threshold unit can provide a reasonable evaluation function • Enumerating all local shapes reaches a natural limit at 3x3 • Training methodology is crucial

  20. Future work • Learn shapes selectively rather than enumerating all possible shapes • Learn shapes to answer specific questions (Can black B4 be captured? Can white connect A2 to D5?) • Learn non-local shape: use connectivity relationships; build hierarchies of shapes
