Learning Blackjack with ANN (Artificial Neural Network)

Ip Kei Sam

sam@cae.wisc.edu

ID: 9012828100

Goal
  • Use a reinforcement learning algorithm to learn strategies in Blackjack.
  • Train a multilayer perceptron (MLP) to play Blackjack without explicitly teaching it the rules of the game.
  • Develop a strategy with the ANN that beats the dealer's 17-point rule (the dealer hits below 17 and stands on 17 or more).
Blackjack
  • Draw cards from a deck of 52 cards to reach a total value as close to 21 as possible without exceeding it.
  • The game is simplified to allow only hit or stand on each turn.
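The simplified game can be sketched in Python as follows. The function names (`new_deck`, `hand_value`, `dealer_should_hit`) are hypothetical, and the ace handling (counted as 11 unless that would bust the hand) is an assumption, since the slides do not say how aces were scored.

```python
import random

def new_deck():
    """Return a shuffled 52-card deck as point values (J, Q, K = 10; ace = 11)."""
    values = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]
    deck = values * 4  # four suits
    random.shuffle(deck)
    return deck

def hand_value(cards):
    """Total of a hand; aces drop from 11 to 1 while the hand would bust (assumed rule)."""
    total = sum(cards)
    aces = cards.count(11)
    while total > 21 and aces > 0:
        total -= 10
        aces -= 1
    return total

def dealer_should_hit(cards):
    """Dealer's 17-point rule: hit below 17, stand on 17 or more."""
    return hand_value(cards) < 17
```

Only two moves exist in this version of the game, so a player policy reduces to a yes/no decision of the same shape as `dealer_should_hit`.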
Reinforcement Learning
  • Map situations to actions such that the reward value is maximized.
  • Decide which action (hit or stand) to take by finding, through trial and error, the action that yields the highest reward.
  • Update the winning probability of the intermediate states after each game.
  • The winning probability of each state converges as the learning parameter is decreased after each game.
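One way to realize the update described above is to move each visited state's estimate a step toward the game outcome. This is a sketch under assumptions: the rule `V(s) += lr * (outcome - V(s))`, the initial value 0.5, the `1 / (1 + games_played)` decay schedule, and the epsilon-greedy action choice are illustrative and are not taken from the slides.

```python
import random
from collections import defaultdict

def make_value_table(initial=0.5):
    """Winning-probability estimate per state; unseen states start at `initial`."""
    return defaultdict(lambda: initial)

def update_after_game(values, visited_states, won, games_played):
    """Move every state visited in the game toward the outcome (1 = win, 0 = loss)."""
    lr = 1.0 / (1 + games_played)  # learning parameter shrinks after each game
    outcome = 1.0 if won else 0.0
    for state in visited_states:
        values[state] += lr * (outcome - values[state])
    return lr

def choose_action(action_values, epsilon, rng=random):
    """Trial and error: explore with probability epsilon, otherwise pick the best action."""
    if rng.random() < epsilon:
        return rng.choice(list(action_values))
    return max(action_values, key=action_values.get)
```

A state here can be any hashable key, for example a tuple of the sorted dealer cards and the sorted player cards.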
Result table from learning
  • Columns 1-5: the dealer's cards.
  • Columns 6-10: the player's cards.
  • Cards are sorted in ascending order; unused card slots are 0.
  • Column 11: the winning probability of the state.
  • Columns 12 and 13: the action taken by the player.
  • Action [1 0] means "hit", [0 1] means "stand", and [1 1] marks an end state.

  Dealer's cards     Player's cards     P(win)   Action
  2  5  0  0  0      6  6  0  0  0      0.37     1  0
  2  5  0  0  0      4  6  6  0  0      0.25     1  0
  2  5 10  0  0      4  6  6  7  0      0.00     1  1
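The 13-column layout described above can be decoded with a few lines of Python. The function name `decode_row` and the returned tuple shape are hypothetical; the column meanings and action codes follow the slide.

```python
# Action codes from the slide: [1 0] = hit, [0 1] = stand, [1 1] = end state.
ACTIONS = {(1, 0): "hit", (0, 1): "stand", (1, 1): "end"}

def decode_row(row):
    """Split one 13-value row into (dealer cards, player cards, P(win), action)."""
    if len(row) != 13:
        raise ValueError("expected 13 columns, got %d" % len(row))
    dealer = [int(c) for c in row[0:5] if c != 0]   # columns 1-5, zeros are empty slots
    player = [int(c) for c in row[5:10] if c != 0]  # columns 6-10
    win_probability = row[10]                       # column 11
    action = ACTIONS[(int(row[11]), int(row[12]))]  # columns 12 and 13
    return dealer, player, win_probability, action

rows = [
    [2, 5, 0, 0, 0, 6, 6, 0, 0, 0, 0.37, 1, 0],
    [2, 5, 0, 0, 0, 4, 6, 6, 0, 0, 0.25, 1, 0],
    [2, 5, 10, 0, 0, 4, 6, 6, 7, 0, 0.0, 1, 1],
]
decoded = [decode_row(r) for r in rows]
```

In the third row the player's cards total 23, which is consistent with a winning probability of 0 and the end-state action code.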

MLP Configurations
  • Feature vectors are normalized and scaled to the range -5 to 5.
  • Maximum training epochs: 1000; epoch size: 64.
  • Activation function (hidden layers): hyperbolic tangent.
  • Activation function (output layer): sigmoid.
  • MLP1: α = 0.1, µ = 0, MLP configuration 4-10-10-10-2. 89.5%.
  • MLP2: α = 0.1, µ = 0.8, MLP configuration 5-10-10-10-2. 91.1%.
  • MLP3: α = 0.8, µ = 0, MLP configuration 5-10-10-10-2. 92.5%.
  • MLP4: α = 0.1, µ = 0, MLP configuration 6-12-12-12-2. 90.2%.
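To make the configuration concrete, here is a minimal pure-Python sketch of a forward pass through a 5-10-10-10-2 network with tanh hidden layers and a sigmoid output layer. It is not the author's implementation: the weights are random and untrained, the linear min-max scaling and the 0-10 input range are assumptions, and the slides do not say what the individual input features are.

```python
import math
import random

def scale_feature(x, lo, hi, new_lo=-5.0, new_hi=5.0):
    """Linearly map x from [lo, hi] to [new_lo, new_hi] (assumed scaling method)."""
    return new_lo + (x - lo) * (new_hi - new_lo) / (hi - lo)

def init_mlp(layer_sizes, seed=0):
    """Create one (weights, biases) pair per layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(layers, inputs):
    """tanh on the hidden layers, sigmoid on the output layer."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        if index == len(layers) - 1:
            activations = [1.0 / (1.0 + math.exp(-s)) for s in sums]
        else:
            activations = [math.tanh(s) for s in sums]
    return activations

net = init_mlp([5, 10, 10, 10, 2])
features = [scale_feature(v, 0, 10) for v in [2, 5, 0, 6, 6]]  # hypothetical inputs
outputs = forward(net, features)  # two values in (0, 1), one per action unit
```

The two sigmoid outputs line up with the two action columns of the result table, so a trained network could be read as choosing "hit" or "stand" by whichever output unit is larger.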
Experiment Results

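Results were reported for three scenarios (the result tables are not included in this transcript):
  • The dealer uses the 17-point rule.
  • The player uses random moves.
  • Both the dealer and the player use the MLP.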

Conclusion
  • MLP networks work best for highly random and dynamic games, where the rules and strategies are hard to define and the outcomes are hard to predict exactly.
  • Strategy interpreted from reinforcement learning: hit if the hand total is less than 15, otherwise stand.
  • As the number of games increases, the learned strategy changes over time.
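The "hit below 15" strategy can be tried against the dealer's 17-point rule with a small simulation. This is a sketch under assumptions, not a reproduction of the reported experiments: it uses a single deck reshuffled before every game, scores aces as 11 or 1, and counts ties as neither a win nor a loss. The names `play_game` and `win_rate` are hypothetical.

```python
import random

def hand_value(cards):
    """Hand total; aces drop from 11 to 1 while the hand would bust (assumed rule)."""
    total, aces = sum(cards), cards.count(11)
    while total > 21 and aces > 0:
        total -= 10
        aces -= 1
    return total

def play_game(player_threshold, rng):
    """Play one game; return 1 for a player win, 0 for a tie, -1 for a loss."""
    deck = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11] * 4
    rng.shuffle(deck)
    player = [deck.pop(), deck.pop()]
    dealer = [deck.pop(), deck.pop()]
    while hand_value(player) < player_threshold:  # e.g. hit if less than 15
        player.append(deck.pop())
    if hand_value(player) > 21:
        return -1                                 # player busts
    while hand_value(dealer) < 17:                # dealer's 17-point rule
        dealer.append(deck.pop())
    p, d = hand_value(player), hand_value(dealer)
    if d > 21 or p > d:
        return 1
    return -1 if p < d else 0

def win_rate(player_threshold, games=10000, seed=0):
    """Fraction of games the threshold player wins outright."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(games) if play_game(player_threshold, rng) == 1)
    return wins / games
```

Calling `win_rate(15)` and `win_rate(17)` gives a rough comparison between the learned threshold and a player who simply copies the dealer's rule.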
Future work
  • The current hand depends on the previous hands, so use card memory in Blackjack.
  • Train the ANN with a teacher to eliminate duplicate patterns (for example, 4 + 7 = 7 + 4 = 5 + 6 = …) and to identify misclassified patterns.
  • Train the ANN to play against different experts so that it can pick up various game strategies.
  • Include game tricks and strategies in a table for the ANN to look up.
  • Explore other learning methods.