Randomized strategies and temporal difference learning in poker

Randomized Strategies and Temporal Difference Learning in Poker

Michael Oder

April 4, 2002

Advisor: Dr. David Mutchler


Overview

  • Perfect vs. Imperfect Information Games

  • Poker as Imperfect Information Game

  • Randomization

  • Neural Nets and Temporal Difference

  • Experiments

  • Conclusions

  • Ideas for Further Study


Perfect vs. Imperfect Information

  • World-class AI agents exist for many popular games

    • Checkers

    • Chess

    • Othello

  • These are games of perfect information

  • All relevant information is available to each player

  • A good understanding of imperfect-information games would be a breakthrough


Poker as an Imperfect Information Game

  • Other players’ hands affect how much will be won or lost. However, no player is aware of this vital information.

  • Non-deterministic aspects as well


Enter Loki

  • One of the most successful computer poker players yet created

  • Produced at the University of Alberta by Jonathan Schaeffer et al.

  • Employs randomized strategy

    • Makes player less predictable

    • Allows for bluffing


Probability Triples

  • At any point in a poker game, a player has three choices

    • Bet/Raise

    • Check/Call

    • Fold

  • Assign a probability to each possible move

  • Single move is now a probability triple

  • Problem: Associate payoff with hand, betting history, and triple (move selected)
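Turning a probability triple into an actual move is just weighted sampling; a minimal sketch in Python (the function name and action labels are illustrative, not from the original program):

```python
import random

def sample_move(triple):
    """Pick an action from a (bet/raise, check/call, fold) probability triple.

    The triple representation follows the slides; names here are illustrative.
    """
    p_bet, p_call, p_fold = triple
    assert abs(p_bet + p_call + p_fold - 1.0) < 1e-9, "probabilities must sum to 1"
    r = random.random()
    if r < p_bet:
        return "bet/raise"
    if r < p_bet + p_call:
        return "check/call"
    return "fold"

# A bluffing-capable strategy for a weak hand: usually fold, sometimes raise.
move = sample_move((0.2, 0.0, 0.8))
```

Because the choice is randomized, an opponent observing many hands cannot map a hand directly to an action, which is exactly what makes bluffing possible.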


Neural Nets

  • One promising way to learn such functions is with a neural network

  • Neural Networks consist of connected neurons

  • Each connection has a weight

  • Input game state, output a prediction of payoff

  • Train by modifying weights

  • Weights are modified by an amount proportional to learning rate
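A minimal sketch of those weight-update mechanics, assuming a single-layer net with one sigmoid output (the project's actual net was multi-layer with four outputs; all names and sizes here are illustrative):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """Minimal single-layer net: game-state features in, payoff estimate out."""

    def __init__(self, n_inputs, lr=0.01):
        self.w = [random.uniform(-0.1, 0.1) for _ in range(n_inputs)]
        self.b = 0.0
        self.lr = lr  # weight changes are proportional to this learning rate

    def predict(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def train(self, x, target):
        y = self.predict(x)
        grad = (target - y) * y * (1.0 - y)  # error times sigmoid derivative
        for i, xi in enumerate(x):
            self.w[i] += self.lr * grad * xi
        self.b += self.lr * grad
```

Each `train` call moves the prediction for `x` a small step toward `target`, with the step size proportional to the learning rate.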


Neural Net Example

[Diagram: inputs hand, history, and probability triple feed the network; outputs P(2), P(1), P(-1), and P(-2) give the predicted probability of each possible payoff]

Temporal Difference

  • The most common way to train a multiple-layer neural net is with backpropagation

  • Relies on simple input-output pairs.

  • Problem: backpropagation needs the correct answer right away, but a game’s payoff is not known until it ends

  • Solution: Temporal Difference (TD) learning.

  • The TD(λ) algorithm was developed by Richard Sutton


Temporal Difference (cont’d)

  • Trains predictions over the course of a game, across many time steps

  • Tries to make each prediction closer to the prediction at the next time step

[Diagram: successive predictions P1 → P2 → P3 → P4 → P5, each trained toward the next]
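That "nudge each prediction toward its successor" idea is the heart of Sutton's TD(λ); a sketch of the eligibility-weighted corrections, with illustrative α and λ values:

```python
def td_updates(predictions, outcome, alpha=0.1, lam=0.7):
    """TD(lambda)-style corrections for the predictions made in one game.

    predictions: P1..Pn produced during play; outcome: the actual final payoff.
    Each correction nudges a prediction toward its successor, and through the
    lambda weighting toward later predictions as well.  alpha and lam are
    illustrative choices, not the values used in the original experiments.
    """
    targets = predictions[1:] + [outcome]          # successor of Pn is the outcome
    deltas = [nxt - cur for cur, nxt in zip(predictions, targets)]
    corrections = []
    for t in range(len(predictions)):
        # lambda-discounted sum of the current and all future TD errors
        c = sum((lam ** (k - t)) * deltas[k] for k in range(t, len(deltas)))
        corrections.append(alpha * c)
    return corrections
```

With λ = 1 the errors telescope, so each correction is just α · (outcome − Pt), i.e. plain supervised training toward the final result; smaller λ weights nearby predictions more heavily.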


University of Mauritius Group

  • A TD poker program produced by a group supervised by Dr. Mutchler

  • Provides environment for playing poker variants and testing agents


Simple Poker Game

  • Experiments were conducted on an extremely simple variant of poker

  • The deck consists of the 2, 3, and 4 of Hearts

  • Each player gets one card

  • One round of betting

  • Player with highest card wins the pot

  • Goal: Get the net to produce accurate payoff values as outputs
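The variant is small enough to simulate directly. A sketch assuming a 1-unit ante and a 1-unit bet (the slides do not state the stakes), with the first player always betting and the second deciding by a fixed call probability:

```python
import random

def play_hand(p2_call_prob, ante=1, bet=1):
    """One hand of the three-card variant: deck {2, 3, 4} of Hearts, one card
    each, one betting round.  Ante and bet sizes are assumptions.

    Returns the second player's payoff.
    """
    deck = [2, 3, 4]
    random.shuffle(deck)
    card1, card2 = deck[0], deck[1]       # one card to each player
    if random.random() < p2_call_prob:    # second player calls the bet
        stake = ante + bet
        return stake if card2 > card1 else -stake
    return -ante                          # fold: lose only the ante
```

Averaging `play_hand` over many deals gives the payoff values the net is being trained to predict.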


Early Results

  • Started by pitting a neural net player against a random one

  • Results were inconsistent

  • Problem: an inappropriate value for the learning rate

    • Too low: outputs never approach the true payoffs

    • Too high: outputs fluctuate between too high and too low
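Both failure modes show up even in a one-weight caricature of the update rule, where each step moves a weight toward a fixed target by a fraction equal to the learning rate (all values here are illustrative):

```python
def track(lr, target=1.0, steps=20):
    """Repeatedly nudge a single weight toward a target, as one neural-net
    weight is nudged per training example."""
    w = 0.0
    for _ in range(steps):
        w += lr * (target - w)
    return w

low = track(0.001)   # barely approaches the target in the time given
good = track(0.3)    # settles close to the target
high = track(2.5)    # overshoots and oscillates ever further away
```

In this caricature any rate above 2 diverges; the real multi-weight sigmoid net is messier, but the same trade-off drives the inconsistent early results.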


Experiment Set I

  • Conjecture: learning should occur with a very small learning rate over many games

  • Learning Rate = 0.01

  • Train for 50,000 games

  • The net is set to train only when its card is a 4

  • The first player always bets; the second player is the one being tested

  • Two Choices

    • call 80%, fold 20% -> avg. payoff = 1.4

    • call 20%, fold 80% -> avg. payoff = -0.4

  • Want payoffs to settle in on average values
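Those averages are consistent with a 1-unit ante and a 1-unit bet (an assumption; the slides do not state the stakes): holding the 4, a call always wins +2, while a fold forfeits the ante for -1:

```python
def avg_payoff(call_prob):
    # Assumption: ante 1, bet 1, so a call with the guaranteed-winning 4
    # pays +2 (opponent's ante + bet) and a fold loses only the 1-unit ante.
    return call_prob * 2 + (1 - call_prob) * (-1)

high = avg_payoff(0.8)  # 0.8 * 2 + 0.2 * (-1) = 1.4, as on the slide
low = avg_payoff(0.2)   # 0.2 * 2 + 0.8 * (-1) = -0.4
```

These two values, 1.4 and -0.4, are the targets the net's outputs should settle toward.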


Results

  • 3 out of 10 trials came within 0.1 of the correct result for the highest payoff

  • 2 out of 10 trials came within 0.1 of the correct result for the lowest payoff

  • None of the trials came within 0.1 of the correct result for both

  • The results were in the correct order in only half of the trials


More Distributions

  • Repeated experiment with six choices instead of two

    • call 100% -> avg. payoff = 2.0

    • call 80%, fold 20% -> avg. payoff = 1.4

    • call 60%, fold 40% -> avg. payoff = 0.8

    • call 40%, fold 60% -> avg. payoff = 0.2

    • call 20% fold 80% -> avg. payoff = -0.4

    • fold 100% -> avg. payoff = -1.0

  • Using more distributions did help the program learn to order the distributions’ values correctly

  • All six distributions were ranked correctly 7 out of 10 times (0.14% chance for any one trial)
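Assuming a 1-unit ante and a 1-unit bet (not stated on the slides), simple expected-value arithmetic reproduces all six averages, and the quoted 0.14% is one correct ordering out of 6! = 720:

```python
from math import factorial

def avg_payoff(call_prob):
    # Assumption: ante 1, bet 1, so a call with the winning 4 pays +2
    # and a fold loses the 1-unit ante.
    return call_prob * 2 + (1 - call_prob) * (-1)

payoffs = [avg_payoff(p) for p in (1.0, 0.8, 0.6, 0.4, 0.2, 0.0)]
# matches the slide: 2.0, 1.4, 0.8, 0.2, -0.4, -1.0

chance = 1 / factorial(6)  # one correct ordering out of 720, about 0.14%
```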


Output Encoding

  • Distributions are ranked correctly, but many output values are still inaccurate.

  • Seems to be largely caused by the encoding of outputs

  • Network has four outputs, each representing probability of a specific payoff

  • This encoding is not expandable, and all four outputs must be correct for a good payoff prediction.


Relative Payoff Encoding

  • Replace four outputs with single number

  • The number represents the payoff relative to the highest payoff possible: P = 0.5 + (winnings / total possible)

  • Total possible winnings determined at beginning of game (sum of other players’ holdings)

  • Repeated previous experiments using this encoding
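The encoding can be written down directly. How negative payoffs are scaled into the output range is not spelled out on the slides, so this follows the formula exactly as given:

```python
def encode_payoff(winnings, total_possible):
    """Relative payoff encoding: one network output instead of four
    per-payoff probabilities.

    total_possible is the sum of the other players' holdings, fixed at
    the start of the game, per the slides.
    """
    return 0.5 + winnings / total_possible
```

Breaking even encodes as 0.5, and larger winnings push the value above it, so a single output carries what previously needed four.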


Results (Experiment Set 2)

  • Payoff predictions were generally more accurate using this encoding

  • 5 out of 10 trials got exact payoff (0.502) for best distribution choice with six choices available

  • Most trials had very close value for payoff associated with one of the distributions

  • However, no trial was significantly close on multiple probability distributions


Observations/Conclusions

  • Neural Net player can learn strategies based on probability

  • Payoff is successfully learned as a function of betting action

  • Consistency is still a problem

  • Trouble learning correct payoffs for more than one distribution


Further Study

  • Issues of expandability

    • Coding for multiple-round history

    • Can previous learning be extended?

  • Variable learning rate

  • Study distribution choices

  • Sample some bad distribution choices

  • Test against a variety of other players


Questions?

