
Machine Learning in Computer Game Players


Presentation Transcript


  1. Machine Learning in Computer Game Players Chikayama & Taura Lab. M1 Ayato Miki

  2. Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion

  3. 1. Introduction • Improvements in Computer Game Players • DEEP BLUE defeated Kasparov in 1997 • GEKISASHI and TANASE SHOGI at WCSC 2008 • Strong computer game players are usually developed by strong human players • Input heuristics manually • Devote a lot of time and energy to tuning

  4. Machine Learning for Games • Machine learning enables automatic tuning using a large amount of data • It is not necessary for a developer to be an expert in the game

  5. Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion

  6. 2. Computer Game Players • Games • Game Trees • Game Tree Search • Evaluation Function

  7. Games • Turn-based games • ex. tic-tac-toe, chess, shogi, poker, mah-jong… • Additional classification • two player or otherwise • zero-sum or otherwise • deterministic or non-deterministic • perfect or imperfect information • Game Tree Model

  8. Game Trees [Figure: a game tree alternating between the player's turn and the opponent's turn, with edges representing moves such as move 1 and move 2]

  9. Game Tree Search • ex. Minimax search algorithm [Figure: a minimax tree; leaf values are backed up, Max nodes taking the maximum of their children and Min nodes the minimum, giving a root value of 5]

  10. Game Tree Search • Difficult to search all the way to the leaf nodes • about 10^220 possible positions in shogi • Stop the search at a practicable depth • And "evaluate" those nodes • Using an Evaluation Function
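Putting slides 9 and 10 together, a minimal Python sketch of depth-limited minimax follows; the position methods (is_terminal, outcome, moves, play) and the evaluate() function are hypothetical names, not taken from the slides.

```python
# Minimal sketch of depth-limited minimax (slides 9-10). The position
# methods is_terminal/outcome/moves/play and the evaluate() function
# are hypothetical names, not from the slides.

def minimax(position, depth, maximizing):
    """Return the backed-up value of `position`, searching `depth` plies."""
    if position.is_terminal():
        return position.outcome()      # exact value at a leaf node
    if depth == 0:
        return evaluate(position)      # stop at a practicable depth and estimate
    values = [minimax(position.play(m), depth - 1, not maximizing)
              for m in position.moves()]
    return max(values) if maximizing else min(values)
```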

  11. Evaluation Function • Estimate the superiority of the position • Elements • feature vector of the position • parameter vector • V(s) = w · φ(s), where φ(s) is the feature vector of position s and w is the parameter vector
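As a sketch, the linear form above might be implemented as follows; extract_features() is a hypothetical, game-specific routine, not from the slides.

```python
import numpy as np

# Minimal sketch of a linear evaluation function V(s) = w . phi(s).
# extract_features() is a hypothetical, game-specific feature extractor.

def evaluate(position, w):
    """Estimate the superiority of `position` for the player to move."""
    phi = extract_features(position)   # feature vector phi(s)
    return float(np.dot(w, phi))       # inner product with parameter vector w
```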

  12. Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion

  13. 3. Machine Learning in Computer Game Players • Initial work • Samuel's research [1959] • Learning objective • What do computer game players learn?

  14. Samuel’s Checker Player [1959] • Many useful techniques • Rote learning • Quiescence search • 3-layer neural network evaluation function • And some machine learning techniques • Learning through self-play • Temporal-difference learning • Comparison training

  15. Learning Objective • Opening Book • Search Control • Evaluation Function

  16. Learning Evaluation Functions • Automatic construction of evaluation function • Construct and select a feature vector automatically • ex. GLEM [Buro, 1998] • Difficult • Tuning evaluation function parameters • Make a feature vector manually and tune its parameters automatically • Easy and effective

  17. Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion

  18. 4. Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithm

  19. Supervised Learning • Provide the program with example positions and their exact evaluation values • Adjust the parameters in a way that minimizes the error between the evaluation function outputs and the exact values [Figure: example positions labeled with target values such as 20, 50, and 40]
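A minimal sketch of this idea as a least-squares fit, assuming the feature vectors and labeled values have already been collected; the numbers below are made up for illustration.

```python
import numpy as np

# Minimal least-squares sketch: choose w so that Phi @ w is as close as
# possible to the exact values y. Phi and y below are toy data.

Phi = np.array([[1.0, 0.0, 2.0],   # one feature vector per example position
                [0.0, 1.0, 1.0],
                [2.0, 1.0, 0.0]])
y = np.array([20.0, 50.0, 40.0])   # labeled exact evaluation values

w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # minimize ||Phi w - y||^2
```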

  20. Difficulty of Hard Supervised Training • Labeling positions manually is costly • Exact quantitative evaluation is difficult → Consider a softer approach

  21. Comparison Training • Soft supervised training • Requires only the relative order of the possible moves • Easier and more intuitive

  22. Bonanza [Hoki, 2006] • Comparison training using records of expert games • Simple relative order: the expert move > other moves

  23. Bonanza Method • Based on optimal control theory • Minimize the cost function J(w) = Σ_{n=1}^{N} E(s_n, w), where s_n ranges over the example positions in the records, N is the total number of example positions, and E is the error function

  24. Bonanza Method • Error function: E(s, w) = Σ_{m=1, m≠m₀}^{M} T( v(s·m, w) − v(s·m₀, w) ), where s·m is the child position reached by move m, M is the total number of possible moves, m₀ is the move played in the record, v is the minimax search value, and T is the order discriminant function

  25. Order Discriminant Function • Sigmoid function: T(x) = 1 / (1 + e^(−kx)) • k is the parameter that controls the gradient • When k → ∞, T(x) becomes the step function • In this case, the error function counts "the number of moves that were considered to be better than the move in the record"
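A minimal sketch of the error function with the sigmoid order discriminant; search_value() stands in for a minimax search after the given move and is a hypothetical helper, not from the slides.

```python
import math

# Minimal sketch of the comparison-training error for one example
# position. search_value(position, move, w) -- a hypothetical helper --
# returns the minimax search value of the child position after `move`.

def order_discriminant(x, k=1.0):
    """Sigmoid T(x) = 1 / (1 + exp(-k x)); a step function as k -> infinity."""
    return 1.0 / (1.0 + math.exp(-k * x))

def position_error(position, record_move, moves, w, k=1.0):
    """Sum T(v(move) - v(record_move)) over all moves but the record's."""
    v0 = search_value(position, record_move, w)
    return sum(order_discriminant(search_value(position, m, w) - v0, k)
               for m in moves if m != record_move)
```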

  26. Bonanza • 30,000 professional game records and 30,000 high-rating game records from SHOGI CLUB 24 were used • The weight parameters of about 10,000 feature elements were tuned • Bonanza won the World Computer Shogi Championship 2006

  27. Problem of Supervised Learning • It is costly to accumulate a training data set • Labeling manually takes a lot of time • Using expert records has been successful • But what if there are not enough expert records? • New games • Minor games • Other approaches work without a training set • ex. Reinforcement Learning (next)

  28. 4. Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithm

  29. Reinforcement Learning • The learner gets "a reward" from the environment • In the domain of games, the reward is the final outcome (win/lose) • Reinforcement learning requires only objective information about the game

  30. Reinforcement Learning [Figure: the reward from the final outcome is propagated back to every position in the game] • Inefficient in games…

  31. Temporal-Difference Learning [Figure: each position's value is updated toward the value of the following position, rather than waiting for the final outcome]
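A minimal sketch of a TD(0)-style update over the positions of one game; the value table, the learning rate, and the terminal handling are illustrative assumptions, not from the slides.

```python
# Minimal TD(0)-style sketch: update each position's value toward the
# next position's value instead of waiting for the final outcome.
# `values` is a dict of estimates; alpha is an assumed learning rate.

def td_update(values, positions, final_outcome, alpha=0.1):
    for s, s_next in zip(positions, positions[1:]):
        values[s] += alpha * (values[s_next] - values[s])
    last = positions[-1]
    values[last] += alpha * (final_outcome - values[last])  # terminal target
    return values
```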

  32. TD-Gammon [Tesauro, 1992] • Trained through self-play

  33. Problems of Reinforcement Learning • Falling into a local optimum • Lack of playing variation • Solutions • Add intentional randomness • Play against various players (computer/human) • Credit Assignment Problem (CAP) • Not clear which action was effective

  34. 4. Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithm

  35. Evolutionary Algorithm • Initialize Population Randomly → Vary Individuals → Evaluate "Fitness" → Apply Selection → (repeat)
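The loop on this slide might look like the following sketch; mutate() and fitness() are hypothetical placeholders for the variation and evaluation steps described on the next slides.

```python
import random

# Minimal sketch of the evolutionary loop: initialize randomly, vary,
# evaluate fitness, select. mutate() and fitness() are placeholders.

def evolve(pop_size=10, generations=50, n_params=5):
    parents = [[random.uniform(-1.0, 1.0) for _ in range(n_params)]
               for _ in range(pop_size)]                     # random init
    for _ in range(generations):
        offspring = [mutate(p) for p in parents]             # vary individuals
        ranked = sorted(parents + offspring, key=fitness, reverse=True)
        parents = ranked[:pop_size]                          # apply selection
    return parents[0]
```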

  36. Research of Fogel et al. [2004] • Evolutionary algorithm for a chess player • Uses an open-source chess program • Attempts to tune its parameters

  37. Initialization • Make 10 initial parents • Initialize their parameters with random values

  38. Variation • Create 10 offspring from each surviving parent by mutating the parental parameters: x′ = x + σ·N(0,1), where N(0,1) is a Gaussian random variable and σ is the strategy parameter
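A sketch of Gaussian mutation with a self-adaptive strategy parameter, in the spirit of evolutionary programming; the exact update rule for sigma below is a common heuristic and an assumption, not taken from the slides.

```python
import math
import random

# Sketch of Gaussian mutation: x' = x + sigma * N(0, 1), with the
# strategy parameter sigma itself mutated first. The tau heuristic is
# an assumption, not from the slides.

def mutate(x, sigma):
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(len(x)))
    new_sigma = [s * math.exp(tau * random.gauss(0.0, 1.0)) for s in sigma]
    new_x = [xi + si * random.gauss(0.0, 1.0) for xi, si in zip(x, new_sigma)]
    return new_x, new_sigma
```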

  39. Evaluate Fitness and Selection • Each player plays ten games against ten randomly selected opponents • The ten best players become the parents of the next generation

  40. Tuned Parameters • Material value • Positional value • Weights and biases of three neural networks

  41. Three Neural Networks • Each network has 3 layers (16 inputs, 10 hidden units, 1 output) • Input = arrangement of a specific area (front 2 rows, back 2 rows, and center 4x4 square) • Hidden = 10 units • Output = worth of the area arrangement
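A minimal forward pass for one such 16-10-1 network; the tanh activation is an assumption, since the slide gives only the layer sizes.

```python
import numpy as np

# Minimal sketch of one 16-10-1 network: 16 inputs (one board area),
# 10 hidden units, 1 output (the worth of the area arrangement).
# tanh is an assumed activation; the slide only states layer sizes.

def forward(area, W1, b1, W2, b2):
    x = np.asarray(area, dtype=float)    # shape (16,)
    h = np.tanh(W1 @ x + b1)             # W1: (10, 16), b1: (10,)
    return float(W2 @ h + b2)            # W2: (10,), b2: scalar
```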

  42. Result • Initial rating = 2066 (Expert), the rating of the open-source player • Best rating = 2437 (Senior Master) • But the program cannot yet compete with the strongest chess programs (around R2800) • 10 independent trials, each of 50 generations

  43. Outline • Introduction • Computer Game Players • Machine Learning in Computer Game Players • Tuning Evaluation Functions • Supervised Learning • Reinforcement Learning • Evolutionary Algorithms • Conclusion

  44. Characteristics

  45. Future Work • Automatic position labeling • Using records or computer play • Sophisticated reward • Consider opponent’s strength • Move analysis for credit assignment • Experiment in other games
