
Learning BlackJack with ANN (Artificial Neural Network)

Learning BlackJack with ANN (Artificial Neural Network), by Ip Kei Sam (sam@cae.wisc.edu, ID: 9012828100). Goal: use a reinforcement learning algorithm to learn strategies in Blackjack, and train an MLP to play Blackjack without explicitly teaching it the rules of the game.


Presentation Transcript


  1. Learning BlackJack with ANN (Artificial Neural Network) Ip Kei Sam sam@cae.wisc.edu ID: 9012828100

  2. Goal • Use a reinforcement learning algorithm to learn strategies in Blackjack. • Train an MLP to play Blackjack without explicitly teaching it the rules of the game. • Develop a better strategy with the ANN that beats the dealer’s 17-point rule.

  3. Blackjack • Draw cards from a 52-card deck to reach a total value as close to 21 as possible without exceeding it. • The game is simplified so that each turn allows only hit or stand.
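The simplified game described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the slides do not say how face cards and aces are scored, so the sketch uses the common convention (face cards count 10, an ace counts 11 or 1). The names `new_deck`, `hand_total`, and `play_game` are hypothetical, and the dealer follows the 17-point rule mentioned in the goals.

```python
import random

def new_deck(rng):
    """Return a shuffled 52-card deck as a list of card values."""
    deck = [v for v in range(2, 11) for _ in range(4)]  # 2..10 in four suits
    deck += [10] * 12   # J, Q, K counted as 10 (assumption)
    deck += [11] * 4    # aces counted as 11, reducible to 1 (assumption)
    rng.shuffle(deck)
    return deck

def hand_total(cards):
    """Total of a hand; each ace drops from 11 to 1 while the hand is over 21."""
    total = sum(cards)
    aces = cards.count(11)
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total

def play_game(player_policy, rng):
    """Play one hit/stand-only game. Returns +1 (win), 0 (draw), -1 (loss)."""
    deck = new_deck(rng)
    player = [deck.pop(), deck.pop()]
    dealer = [deck.pop(), deck.pop()]
    # The player keeps drawing while the policy says "hit" and the hand is not bust.
    while hand_total(player) <= 21 and player_policy(player, dealer) == "hit":
        player.append(deck.pop())
    if hand_total(player) > 21:
        return -1
    # Dealer's 17-point rule: draw until the total reaches at least 17.
    while hand_total(dealer) < 17:
        dealer.append(deck.pop())
    p, d = hand_total(player), hand_total(dealer)
    if d > 21 or p > d:
        return 1
    return 0 if p == d else -1
```

A policy is any function that takes the player's and dealer's cards and returns "hit" or "stand", for example `play_game(lambda player, dealer: "stand", random.Random(0))`.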

  4. Reinforcement Learning • Map situations to actions such that the reward value is maximized. • Decide which action (hit/stand) to take by finding, through trial and error, the action that yields the highest reward. • Update the winning probability of the intermediate states after each game. • The winning probability of each state converges as the learning parameter decreases after each game.
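One way to picture the per-game update is the sketch below. The slides do not give the exact update rule, so this assumes a simple form in which each visited state's estimate moves toward the game outcome by a fraction `alpha`, and `alpha` shrinks after every game. The state encoding, the starting value of 0.5, and the decay factor 0.9 are all hypothetical.

```python
from collections import defaultdict

def update_win_probabilities(values, visited_states, won, alpha):
    """Move each visited state's estimate toward the outcome (1 = win, 0 = loss)."""
    target = 1.0 if won else 0.0
    for state in visited_states:
        values[state] += alpha * (target - values[state])

values = defaultdict(lambda: 0.5)   # assumed neutral starting estimate
alpha = 0.5                         # assumed initial learning parameter

# Hypothetical states: (dealer cards, player cards), sorted in ascending order.
s1 = ((2, 5), (6, 6))
s2 = ((2, 5), (4, 6, 6))
episodes = [
    ([s1, s2], False),   # player hit twice and lost
    ([s1], True),        # player stopped earlier and won
]

for visited, won in episodes:
    update_win_probabilities(values, visited, won, alpha)
    alpha *= 0.9   # learning parameter decreases after each game
```

Because `alpha` shrinks, later games change the estimates less and less, which is the sense in which the winning probabilities settle down.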

  5. Result table from learning • The first 5 columns = the dealer’s cards • Next 5 columns = the player’s cards • Cards sorted in ascending order • Column 11 = the winning probability of each state • Columns 12 & 13 = action taken by the player • Action [1 0] -> “hit”, [0 1] -> “stand”, and [1 1] -> end state

     2.0000  5.0000       0       0       0  6.0000  6.0000       0       0       0  0.3700  1.0000       0
     2.0000  5.0000       0       0       0  4.0000  6.0000  6.0000       0       0  0.2500  1.0000       0
     2.0000  5.0000 10.0000       0       0  4.0000  6.0000  6.0000  7.0000       0       0  1.0000  1.0000
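The column layout above can be decoded as in this small sketch. The function name `decode_row` is hypothetical, and treating a zero as an empty card slot is an assumption based on the sample rows.

```python
def decode_row(row):
    """Split a 13-value row into dealer cards, player cards, win probability, action."""
    if len(row) != 13:
        raise ValueError("expected 13 columns")
    dealer = [int(c) for c in row[0:5] if c != 0]    # columns 1-5, zeros = empty slots
    player = [int(c) for c in row[5:10] if c != 0]   # columns 6-10
    win_prob = row[10]                               # column 11
    actions = {(1, 0): "hit", (0, 1): "stand", (1, 1): "end"}
    action = actions[(int(row[11]), int(row[12]))]   # columns 12 and 13
    return dealer, player, win_prob, action

row = [2, 5, 0, 0, 0, 6, 6, 0, 0, 0, 0.37, 1, 0]
print(decode_row(row))   # ([2, 5], [6, 6], 0.37, 'hit')
```

Read this way, the third sample row is an end state in which the player holds 4, 6, 6, 7 (a total of 23) with a winning probability of 0.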

  6. MLP and game flow

  7. MLP Configurations • Feature vectors are normalized and scaled to the range -5 to 5. • Max. training epochs: 1000, epoch size = 64 • Activation function (hidden layers) = hyperbolic tangent • Activation function (output layer) = sigmoidal • MLP1: α = 0.1, µ = 0, MLP config 4-10-10-10-2, 89.5%. • MLP2: α = 0.1, µ = 0.8, MLP config 5-10-10-10-2, 91.1%. • MLP3: α = 0.8, µ = 0, MLP config 5-10-10-10-2, 92.5%. • MLP4: α = 0.1, µ = 0, MLP config 6-12-12-12-2, 90.2%.
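To make the configuration concrete, here is a minimal forward-pass sketch of a 5-10-10-10-2 network with tanh hidden layers and a sigmoid output layer, plus a linear rescaling to the range -5 to 5. It is not the author's code: training is omitted, the weights are random, and the choice of input features (five card slots with an assumed raw range of 0 to 11) and the reading of the two outputs as hit/stand scores are assumptions for illustration only.

```python
import math
import random

def scale(x, lo, hi, new_lo=-5.0, new_hi=5.0):
    """Linearly map x from [lo, hi] to [new_lo, new_hi]."""
    return new_lo + (x - lo) * (new_hi - new_lo) / (hi - lo)

def init_layers(sizes, rng):
    """Create one (weights, biases) pair per layer with small random weights."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers

def forward(layers, x):
    """Apply tanh on hidden layers and the logistic sigmoid on the output layer."""
    for i, (w, b) in enumerate(layers):
        z = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        is_output = i == len(layers) - 1
        x = [1 / (1 + math.exp(-v)) if is_output else math.tanh(v) for v in z]
    return x

rng = random.Random(0)
layers = init_layers([5, 10, 10, 10, 2], rng)   # the 5-10-10-10-2 configuration
cards = [4, 6, 6, 0, 0]                         # hypothetical five-slot player hand
features = [scale(c, 0, 11) for c in cards]     # assumed raw range 0..11
hit_score, stand_score = forward(layers, features)
```

With untrained weights the two scores carry no meaning; the sketch only shows the layer sizes and activation functions listed on the slide.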

  8. Experiment Results • When the dealer uses the 17-point rule • When the player uses random moves • When both dealer and player use an MLP [result figures not included in the transcript]

  9. Conclusion • An MLP network works best for highly random and dynamic games, where the game rules and strategies are hard to define and the game outcomes are hard to predict exactly. • Strategy interpreted from reinforcement learning: hit if the hand total is less than 15, otherwise stand. • As the number of games increases, the game strategies will change over time.
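The strategy stated in the conclusion can be written as a threshold rule and compared with the dealer's 17-point rule. The function names below are hypothetical; the thresholds 15 and 17 come from the slides.

```python
def learned_policy(total, threshold=15):
    """Strategy interpreted from learning: hit below 15, otherwise stand."""
    return "hit" if total < threshold else "stand"

def dealer_policy(total):
    """Dealer's 17-point rule: hit below 17, otherwise stand."""
    return "hit" if total < 17 else "stand"

# Hand totals on which the two rules choose different actions.
differences = [t for t in range(4, 22) if learned_policy(t) != dealer_policy(t)]
print(differences)   # [15, 16]
```

The two rules disagree only on totals of 15 and 16, where the learned strategy stands and the dealer's rule hits.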

  10. Future work • The current hand depends on the previous hands, so use card memory in Blackjack. • Train the ANN with a teacher to eliminate duplicate patterns (for example, 4 + 7 = 7 + 4 = 5 + 6 = …) and identify misclassified patterns. • Train the ANN to play against different experts so that it can pick up various game strategies. • Include game tricks and strategies in a table for the ANN to look up. • Explore other learning methods.
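The duplicate-pattern idea can be illustrated with two possible canonical forms. This is a sketch of one option, not a method given in the slides: sorting the cards merges hands that differ only in order (as the result table already does), while reducing a hand to its total also merges 4 + 7 with 5 + 6. Both helper names are hypothetical.

```python
def canonical_by_order(cards):
    """Treat hands with the same cards in a different order as one pattern."""
    return tuple(sorted(cards))

def canonical_by_total(cards):
    """Treat all hands with the same total as one pattern."""
    return sum(cards)

hands = [(4, 7), (7, 4), (5, 6)]
by_order = {canonical_by_order(h) for h in hands}   # (4, 7) and (5, 6) remain
by_total = {canonical_by_total(h) for h in hands}   # only the total 11 remains
print(len(by_order), len(by_total))                 # 2 1
```

Collapsing by total discards which cards have been seen, so it would work against the card-memory idea listed on the same slide; which form is appropriate depends on the features the network is given.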
