Learning BlackJack with ANN (Artificial Neural Network)


### Learning BlackJack with ANN (Artificial Neural Network)

Ip Kei Sam

sam@cae.wisc.edu

ID: 9012828100

Goal
• Use a reinforcement learning algorithm to learn strategies in Blackjack.
• Train an MLP to play Blackjack without explicitly teaching it the rules of the game.
• Develop a better strategy with an ANN that beats the dealer's 17-point rule.
Blackjack
• Draw cards from a deck of 52 cards to reach a total value as close to 21 as possible without exceeding it.
• Simplify Blackjack to allow only hit or stand in each turn.
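
The simplified game can be sketched in a few lines of Python. The slides do not say how aces are scored, so the encoding below (cards as integers 1 to 10, an ace counted as 11 when that does not bust the hand) is an assumption based on the standard Blackjack rules, and `hand_value`, `is_bust`, and `dealer_action` are illustrative names.

```python
def hand_value(cards):
    """Total of a hand; cards are integers 1-10, where 1 is an ace (assumed encoding)."""
    total = sum(cards)
    # Count one ace as 11 instead of 1 when that does not bust the hand.
    if 1 in cards and total + 10 <= 21:
        total += 10
    return total


def is_bust(cards):
    """A hand is bust when its total exceeds 21."""
    return hand_value(cards) > 21


def dealer_action(cards):
    """Dealer's 17-point rule: hit below 17, otherwise stand."""
    return "hit" if hand_value(cards) < 17 else "stand"


print(dealer_action([2, 5]))   # total 7, so the dealer hits
print(dealer_action([10, 7]))  # total 17, so the dealer stands
```

Only two actions exist in this version of the game, which is what lets the later network use a two-unit output.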
Reinforcement Learning
• Map situations to actions so that the reward is maximized.
• Decide which action (hit or stand) to take by finding, through trial and error, the action that yields the highest reward.
• Update winning probability of the intermediate states after each game.
• The winning probability of each state converges as the learning parameter decreases after each game.
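
The slides do not give the update formula, so the following is only a sketch of one common choice: after each game, every visited state's estimate is moved toward the outcome (1 for a win, 0 for a loss) by a learning parameter that shrinks with the number of games played. The decay schedule, the initial value of 0.5, and the `(dealer_total, player_total)` state representation are all assumptions made for the example.

```python
from collections import defaultdict


def update_win_probabilities(values, visited_states, won, alpha):
    """Move each visited state's win estimate toward the game outcome."""
    outcome = 1.0 if won else 0.0
    for state in visited_states:
        values[state] += alpha * (outcome - values[state])
    return values


def learning_rate(game_index, alpha0=0.5):
    """Assumed decay schedule: alpha shrinks as more games are played."""
    return alpha0 / (1.0 + game_index)


# States are (dealer_total, player_total) pairs, a simplification for this sketch.
values = defaultdict(lambda: 0.5)  # every state starts at an even chance
episodes = [
    ([(7, 12), (7, 18)], True),   # player hit from 12 to 18 and won
    ([(7, 12), (7, 22)], False),  # player hit from 12 to 22 and bust
]
for k, (states, won) in enumerate(episodes):
    update_win_probabilities(values, states, won, learning_rate(k))

print(dict(values))
```

Because the learning parameter decreases, later games change the estimates less and less, which is the convergence behaviour described in the last bullet.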
Result table from learning
• Columns 1 to 5 = the dealer's cards
• Columns 6 to 10 = the player's cards
• Cards are sorted in ascending order; unused slots are 0
• Column 11 = the winning probability of each state
• Columns 12 and 13 = the action taken by the player
• Action [1 0] -> "hit", [0 1] -> "stand", and [1 1] -> end state

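| D1 | D2 | D3 | D4 | D5 | P1 | P2 | P3 | P4 | P5 | P(win) | Action 1 | Action 2 |
|----|----|----|----|----|----|----|----|----|----|--------|----------|----------|
| 2 | 5 | 0 | 0 | 0 | 6 | 6 | 0 | 0 | 0 | 0.37 | 1 | 0 |
| 2 | 5 | 0 | 0 | 0 | 4 | 6 | 6 | 0 | 0 | 0.25 | 1 | 0 |
| 2 | 5 | 10 | 0 | 0 | 4 | 6 | 6 | 7 | 0 | 0 | 1 | 1 |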
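
The 13-column row layout described above can be built with a small helper. This is a sketch, not the author's code: `encode_row` and the action names `"hit"`, `"stand"`, and `"end"` are hypothetical, chosen to mirror the [1 0], [0 1], and [1 1] codes.

```python
def encode_row(dealer_cards, player_cards, win_probability, action):
    """Build one row: 5 dealer slots, 5 player slots, probability, 2 action flags."""
    actions = {"hit": (1, 0), "stand": (0, 1), "end": (1, 1)}

    def pad(cards):
        if len(cards) > 5:
            raise ValueError("at most 5 cards per hand fit in the row")
        # Sort ascending, then fill the unused slots with zeros.
        return sorted(cards) + [0] * (5 - len(cards))

    return pad(dealer_cards) + pad(player_cards) + [win_probability, *actions[action]]


row = encode_row([5, 2], [6, 6], 0.37, "hit")
print(row)  # [2, 5, 0, 0, 0, 6, 6, 0, 0, 0, 0.37, 1, 0]
```

The printed row matches the first sample row of the result table: dealer holds 2 and 5, player holds 6 and 6, and the recorded action is "hit".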

MLP Configurations
• Feature vectors are normalized and scaled to the range -5 to 5.
• Maximum training epochs: 1000; epoch size = 64
• Activation function (hidden layers) = hyperbolic tangent
• Activation function (output layer) = sigmoid
• MLP1: α = 0.1, µ = 0, MLP Config 4-10-10-10-2. 89.5%.
• MLP2: α = 0.1, µ = 0.8, MLP Config 5-10-10-10-2. 91.1%.
• MLP3: α = 0.8, µ = 0, MLP Config 5-10-10-10-2. 92.5%.
• MLP4: α = 0.1, µ = 0, MLP Config 6-12-12-12-2. 90.2%.
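
To make the configuration concrete, here is a minimal pure-Python sketch of the scaling step and a forward pass through a 5-10-10-10-2 network with tanh hidden layers and a sigmoid output layer. The weights are random and untrained, the meaning of the five inputs is not stated in the slides (card values in 0..10 are assumed here), and the names `scale_feature`, `init_mlp`, and `forward` are illustrative.

```python
import math
import random


def scale_feature(x, lo, hi, new_lo=-5.0, new_hi=5.0):
    """Linearly map x from [lo, hi] to [new_lo, new_hi]."""
    return new_lo + (x - lo) * (new_hi - new_lo) / (hi - lo)


def init_mlp(layer_sizes, seed=0):
    """Random weights per layer; each row holds one neuron's weights plus a bias."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, inputs):
    """Forward pass: tanh on hidden layers, sigmoid on the output layer."""
    activations = list(inputs)
    for depth, layer in enumerate(weights):
        is_output = depth == len(weights) - 1
        next_activations = []
        for neuron in layer:
            # The last entry of each neuron is its bias.
            z = neuron[-1] + sum(w * a for w, a in zip(neuron[:-1], activations))
            next_activations.append(sigmoid(z) if is_output else math.tanh(z))
        activations = next_activations
    return activations


mlp = init_mlp([5, 10, 10, 10, 2])
features = [scale_feature(c, 0, 10) for c in [4, 6, 6, 0, 0]]  # assumed input encoding
hit_score, stand_score = forward(mlp, features)
action = "hit" if hit_score > stand_score else "stand"
print(action)
```

The two sigmoid outputs line up with the two action columns of the result table, so the larger output can be read as the chosen action. With untrained weights the choice is arbitrary; training with the listed α and µ values is not shown.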
Experiment Results

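Three settings were tested (the result tables are not reproduced in this transcript):
• The dealer uses the 17-point rule.
• The player uses random moves.
• Both the dealer and the player use the MLP.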

Conclusion
• An MLP network works best for highly random and dynamic games, where the rules and strategies are hard to define and the outcomes are hard to predict exactly.
• Strategy interpreted from reinforcement learning: hit if the hand total is less than 15, otherwise stand.
• As the number of games increases, the game strategies change over time.
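
The learned rule "hit if less than 15, otherwise stand" is easy to state as code and to compare against a dealer following the 17-point rule. The simulation below is a rough sketch under simplifying assumptions (cards drawn with replacement, aces counted as 1, ties counted as non-wins); it does not reproduce the author's experiments, and all function names are illustrative.

```python
import random


def hand_total(cards):
    """Sum of card values; aces count as 1 in this simplified sketch."""
    return sum(cards)


def threshold_policy(cards, threshold=15):
    """Hit below the threshold, otherwise stand."""
    return "hit" if hand_total(cards) < threshold else "stand"


def dealer_policy(cards):
    """Dealer's 17-point rule expressed as a threshold policy."""
    return threshold_policy(cards, threshold=17)


def play_hand(policy, rng):
    """Draw until the policy stands or the hand busts; return the final total."""
    def draw():
        return min(rng.randint(1, 13), 10)  # face cards count as 10

    cards = [draw(), draw()]
    while hand_total(cards) <= 21 and policy(cards) == "hit":
        cards.append(draw())
    return hand_total(cards)


def win_rate(player_policy, opponent_policy, games=10000, seed=0):
    """Fraction of games the player wins outright."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        player = play_hand(player_policy, rng)
        dealer = play_hand(opponent_policy, rng)
        # The player wins if not bust and the dealer busts or has a lower total.
        if player <= 21 and (dealer > 21 or player > dealer):
            wins += 1
    return wins / games


rate = win_rate(threshold_policy, dealer_policy)
print(f"player win rate: {rate:.3f}")
```

Changing the `threshold` argument gives a quick way to see how sensitive the win rate is to the cut-off under these assumptions.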
Future work
• The current hand depends on the previous hands, so add card memory (tracking the cards already dealt) to the Blackjack player.
• Train the ANN with a teacher to eliminate duplicate patterns (for example, 4 + 7 = 7 + 4 = 5 + 6 = …) and identify misclassified patterns
• Train ANN to play against different experts so that it can pick up various game strategies
• Include game tricks and strategies in a table for the ANN to look up
• Explore other learning methods
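
One simple way to approach the duplicate-pattern item is to collapse hands that reach the same total into a single training pattern. The sketch below averages the winning probabilities of such hands; the sample probabilities are made-up placeholders for illustration, not values from the experiments, and `merge_duplicate_patterns` is a hypothetical helper.

```python
from collections import defaultdict


def merge_duplicate_patterns(samples):
    """Group (player_cards, win_probability) samples by hand total and average them."""
    grouped = defaultdict(list)
    for cards, win_probability in samples:
        grouped[sum(cards)].append(win_probability)
    return {total: sum(ps) / len(ps) for total, ps in grouped.items()}


# Placeholder data: 4 + 7, 7 + 4, and 5 + 6 all describe a total of 11.
samples = [([4, 7], 0.40), ([7, 4], 0.44), ([5, 6], 0.42), ([10, 6], 0.20)]
merged = merge_duplicate_patterns(samples)
print(merged)
```

Grouping by total discards information about which cards have been played, so it would work against the card-memory idea in the first bullet; the two would need to be balanced.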