Poker: Opponent Modelling

• Early AI work on poker used simplified variants of the game.
• More recently, attention has focused mainly on “Limit Texas Hold’em”, in both its “heads-up” form (only two players) and its many-player form (often 10).
• Texas Hold’em is a popular form of poker in the USA.
• As in all forms of poker, betting is an essential element.
• Texas Hold’em offers four opportunities per hand for a round of betting.
• In Limit Texas Hold’em there are two sizes of bet increment:
  • the small bet - say $2 (first two betting rounds)
  • the large bet - say $4 (last two betting rounds)
• In No-Limit Texas Hold’em, players may bet any amount up to all of the chips they have on the table. (A cap at the current size of the pot is the rule in Pot-Limit games.)
http://csiweb.ucd.ie/Staff/acater/comp30260.html Artificial Intelligence for Games and Puzzles

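The structure of a hand of Texas Hold’em
• A hand (if played to the bitter end) proceeds through nine stages:
  • each player is dealt two private hole cards
  • first round of betting
  • the flop: three community cards are dealt face up
  • second round of betting
  • the turn: a fourth community card is dealt face up
  • third round of betting
  • the river: a fifth and final community card is dealt face up
  • fourth round of betting
  • the showdown: the remaining players reveal their hole cards
• The winner is the player who makes the best 5-card poker hand using a combination of his hole cards and the community cards (the board).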

Decisions to be made
• In a round of betting, players repeatedly choose one of five actions (fold, check, call, bet or raise), starting with the player to the dealer’s left and proceeding clockwise.
• Limit games require no decision about the size of bets and raises.
• No-limit games require more complex reasoning because bets may vary in size.

Betting based on probabilities
• One rational way to play poker is to use probabilities:
  • Given your own known hole cards,
  • and the community cards that are on show,
  • for each possible combination of community cards yet to appear,
  • how likely is your hand to be better than every other player’s hand?
• Compare this to the pot odds - the ratio of the size of the pot to the cost of staying in the hand.
• If the comparison is very favourable, bet or raise;
• if merely favourable, check or call;
• if not, check if possible, otherwise fold.
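The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the `raise_margin` cushion that separates "very favourable" from "merely favourable", and the `bet_threshold` used when checking is free are all assumptions, not values from the course material. Pot odds are expressed here as the break-even winning chance, cost / (pot + cost).

```python
def choose_action(win_prob, pot, cost_to_call,
                  raise_margin=0.15, bet_threshold=0.5):
    """Pick an action by comparing the chance of winning with the pot odds.

    win_prob      -- estimated probability of having the best hand at showdown
    pot           -- chips already in the pot
    cost_to_call  -- chips needed to stay in (0 if nobody has bet yet)
    raise_margin  -- hypothetical cushion marking "very favourable"
    bet_threshold -- hypothetical baseline for betting when checking is free
    """
    if cost_to_call == 0:
        # Staying in costs nothing, so folding is never needed here.
        return "bet" if win_prob >= bet_threshold + raise_margin else "check"

    # Winning chance at which calling exactly breaks even.
    break_even = cost_to_call / (pot + cost_to_call)

    if win_prob >= break_even + raise_margin:
        return "raise"   # very favourable
    if win_prob >= break_even:
        return "call"    # merely favourable
    return "fold"        # unfavourable, and checking is not available
```

For example, with $10 in the pot and $2 to call, the break-even chance is 2/12, so a 30% hand calls, a 60% hand raises, and a 10% hand folds.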

Predictability is bad
• Basing your behaviour on the probabilities like this is a poor strategy.
• Other players will
  • observe the cards you reveal at the showdowns,
  • learn about your conservative style of play,
  • learn about your assessment of winning chances,
  • interpret your betting behaviour as a clue to the strength of your hand,
  • and use this to beat you over the course of many hands.
• Good poker players
  • observe the decisions of their opponents and gather what evidence they can
  • base their decisions on models of their opponents, exploiting any weaknesses they detect
  • strive to frustrate the formation of accurate models of themselves, by bluffing, and by consciously and deliberately changing their own style

Poker as AI Testbed domain
In many other games, there is little to be gained by modelling opponents. Rudimentary models, like “contempt factor”, or no model at all, are common.
In poker, modelling opponents - and awareness of their trying to model you - is essential to good play.

Bayesian Network approach to modelling
By training over many self-played hands, a CPT (conditional probability table) can be built up. Then in real play, knowing all influences upon “opponent action” except “opponent current hand”, one can draw conclusions about “opponent current hand”. But a CPT of ~200k entries cannot reasonably be modified over a game of ~100 hands. (Boulton)
[Figure: Bayesian network diagram, labelled with the numbers 4, 25, 25, 10 and 8; their product, 4 × 25 × 25 × 10 × 8 = 200,000, is consistent with the ~200k table entries.]
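The step of reasoning backwards from an observed action to the hidden hand is an application of Bayes' rule. The sketch below assumes a drastically reduced table with three hand classes and three actions; the class names and every number are made up for illustration, and all the other influences in the real network are taken as already observed and folded into the table.

```python
def posterior_over_hands(prior, likelihood, observed_action):
    """Bayes' rule: P(hand | action) is proportional to P(action | hand) * P(hand)."""
    unnormalised = {
        hand: prior[hand] * likelihood[hand][observed_action]
        for hand in prior
    }
    total = sum(unnormalised.values())
    return {hand: value / total for hand, value in unnormalised.items()}


# Hypothetical numbers, for illustration only.
prior = {"strong": 0.2, "medium": 0.3, "weak": 0.5}
likelihood = {  # P(action | hand), one row per hand class
    "strong": {"fold": 0.05, "call": 0.35, "raise": 0.60},
    "medium": {"fold": 0.25, "call": 0.55, "raise": 0.20},
    "weak":   {"fold": 0.70, "call": 0.25, "raise": 0.05},
}

posterior = posterior_over_hands(prior, likelihood, "raise")
```

With these numbers a raise moves the belief in a strong hand from 0.2 up to 0.12 / 0.205, a little under 0.59, while the belief in a weak hand drops sharply.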

Classification of hands
• At the outset, the two hole cards of an opponent player may be any two of the 50 cards you don’t have: 50 × 49 / 2 = 1225 combinations if you enumerate them.
• It is sufficient to distinguish 169 qualitatively different hands:
  • 13 possible pairs - AA KK QQ … 22
  • 78 pairings of cards of the same suit - AKs, AQs, AJs, A10s, A9s, … 42s, 32s
  • 78 pairings of cards of different suits - AK, AQ, AJ, … 32
• Collapsing still further, to 25 or so classes, loses some information but facilitates learning of statistics.
• Classification of boards and of pot sizes can proceed similarly.
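The counts above can be checked by brute-force enumeration. In this sketch the card encoding (rank character followed by suit character, with "T" for the ten) and the "s"/"o" suffixes for suited and offsuit classes are conventions chosen here for illustration, as are the two example hole cards.

```python
from collections import Counter
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def hand_class(card_a, card_b):
    """Map two hole cards to one of the 169 classes, e.g. 'AA', 'AKs', 'AKo'."""
    # Put the higher rank first so that K-A and A-K fall in the same class.
    high, low = sorted((card_a, card_b),
                       key=lambda card: RANKS.index(card[0]), reverse=True)
    if high[0] == low[0]:
        return high[0] + low[0]                      # a pair
    suffix = "s" if high[1] == low[1] else "o"       # suited or offsuit
    return high[0] + low[0] + suffix


# Opponent combinations once our own two cards are excluded.
my_cards = {"As", "Kd"}                              # hypothetical hole cards
unseen = [card for card in DECK if card not in my_cards]
combos = list(combinations(unseen, 2))

# Distinct classes over the full deck, grouped by kind.
all_classes = {hand_class(a, b) for a, b in combinations(DECK, 2)}
kinds = Counter(
    "pair" if len(name) == 2 else ("suited" if name.endswith("s") else "offsuit")
    for name in all_classes
)
```

Here `len(combos)` is 1225, `len(all_classes)` is 169, and `kinds` holds 13 pairs, 78 suited classes and 78 offsuit classes.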

The Loki program
• Loki, from the University of Alberta, used a probabilistic approach, with one initial model (set of weights) for all players, then updating the weights for individual players on the basis of their observed actions.
  • Assess the probability of an opponent holding each class of hand, given own cards and the board;
  • Modify the probability estimates in light of each action
    • e.g. “raise” → increase strong-hand probabilities and decrease weak-hand probabilities
  • Adjust weights from estimated hands to better predict the observed action
• This showed improved performance compared to (i) programs with no modelling and (ii) programs with only static modelling.
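The re-weighting step can be illustrated with a toy update rule. This is not Loki's actual formula: the linear scaling factor, the `sharpness` parameter, the per-class strength scores and the four example hand classes are all assumptions made for this sketch.

```python
def reweight(weights, strength, action, sharpness=1.0):
    """Scale each hand class's weight by how well it fits the observed action.

    weights   -- current weight per hand class for one opponent
    strength  -- hypothetical strength score in [0, 1] per hand class
    action    -- observed action: "raise", "call" or "fold"
    sharpness -- hypothetical knob: 0 ignores the action, 1 trusts it fully
    """
    updated = {}
    for hand, weight in weights.items():
        s = strength[hand]
        if action == "raise":
            factor = 1.0 + sharpness * (s - 0.5)   # favours strong hands
        elif action == "fold":
            factor = 1.0 - sharpness * (s - 0.5)   # favours weak hands
        else:
            factor = 1.0                           # "call" is treated as neutral
        updated[hand] = weight * factor
    total = sum(updated.values())
    return {hand: value / total for hand, value in updated.items()}


strength = {"AA": 1.0, "AKs": 0.8, "T9s": 0.5, "72o": 0.0}
weights = {hand: 0.25 for hand in strength}        # same initial model for everyone

for action in ["raise", "raise"]:                  # two observed raises
    weights = reweight(weights, strength, action)
```

After two raises the weights are ordered AA > AKs > T9s > 72o, so strong hands have gained probability at the expense of weak ones.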

Bluffing behaviour
• Being able to model others is only part of the solution. Good players find it easy to model opponents who never bluff.
• Bluffing purely at random (say on 5% of hands) has a problem: in some cases opponents can know for certain that you cannot win, so avoid bluffing at such times.
• Continuing to raise when bluffing is not typical of how you behave when you truly do have a good hand - good opponents will detect the difference.
• Follow a plan: proceed as if your chance of losing were, say, 50% of your true estimate of that chance - this leads to consistent and realistic behaviour that cannot easily be diagnosed as bluffing.
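The bluffing plan amounts to a small piece of arithmetic, sketched below. The function name and the `loss_scale` parameter are illustrative; 0.5 is the figure suggested in the text.

```python
def bluffing_win_estimate(true_win_prob, loss_scale=0.5):
    """Return the winning chance to act on while carrying out a bluff.

    The true chance of losing is scaled down by loss_scale, and the
    result is used in place of the honest estimate for every decision
    in the hand, so the bluff stays consistent from round to round.
    """
    true_loss_prob = 1.0 - true_win_prob
    return 1.0 - loss_scale * true_loss_prob
```

For instance, a hand with a true 20% chance of winning (80% chance of losing) is played as if it would lose only 40% of the time, that is, as a 60% hand.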

The Poki program
• Poki is a rewrite and enhancement of Loki.
• It features a neural-network opponent modelling mechanism. Inputs include:
  • estimated hand strength
  • estimated hand potential
  • previous action of opponent
  • position of player clockwise from dealer (first, last, neither)
  • predictions from “expert predictors”
• Opponent modelling is viewed as machine learning: predict the opponent’s action
  • Backpropagation within the neural network
  • Plug-in “Expert Predictor(s)” (ensemble) may be machine-learning systems too
• Poki also features game-tree search, to 5 ply, using “miximax” to handle the problem of imperfect knowledge.
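A rough sketch of a "miximax"-style search follows. It assumes the name means what it suggests: take the maximum at the program's own decision nodes, and a probability-weighted mix at opponent nodes, with the probabilities supplied by the opponent model. The slide does not spell out the details, and the node layout and the numbers in the example tree are made up for illustration.

```python
def miximax(node):
    """Expected value of a node in a small betting tree.

    node is a dict with a "kind" key:
      "leaf"     -- {"kind": "leaf", "value": expected winnings}
      "own"      -- {"kind": "own", "children": [node, ...]}
      "opponent" -- {"kind": "opponent", "children": [(prob, node), ...]}
    """
    if node["kind"] == "leaf":
        return node["value"]
    if node["kind"] == "own":
        # Our own decision: pick the action with the best expected value.
        return max(miximax(child) for child in node["children"])
    # Opponent decision: mix child values by the modelled action probabilities.
    return sum(prob * miximax(child) for prob, child in node["children"])


def leaf(value):
    return {"kind": "leaf", "value": value}


# Hypothetical tree: we either check (worth 0.5) or bet; after a bet the
# opponent model predicts fold 50%, call 40%, raise 10%.
after_raise = {"kind": "own", "children": [leaf(-2.0), leaf(-6.0)]}
after_bet = {"kind": "opponent", "children": [
    (0.5, leaf(4.0)),      # opponent folds, we take the pot
    (0.4, leaf(-2.0)),     # opponent calls, we lose on average
    (0.1, after_raise),    # opponent raises, we choose again
]}
root = {"kind": "own", "children": [leaf(0.5), after_bet]}

value = miximax(root)
```

In this tree betting is worth 0.5 × 4 + 0.4 × (−2) + 0.1 × (−2) = 1.0, which beats checking at 0.5, so the root evaluates to 1.0.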

References
• www.csse.monash.edu.au/hons/projects/2003/Darren.Boulton/website/
• www.cs.ualberta.ca/~games/poker/
• Quoted in Aaron Davidson’s 2002 MSc thesis at the U.Alberta site:
