Game Theory-Based Opponent Modeling in Large Imperfect-Information Games

Tuomas Sandholm, Carnegie Mellon University, Computer Science Department. Joint work with Sam Ganzfried.

Presentation Transcript


  1. Game Theory-Based Opponent Modeling in Large Imperfect-Information Games. Tuomas Sandholm, Carnegie Mellon University, Computer Science Department. Joint work with Sam Ganzfried.

  2. Traditionally two approaches
     • Game theory approach (abstraction + equilibrium finding)
       • Safe in 2-person 0-sum games
       • Doesn’t maximally exploit weaknesses in opponent(s)
     • Opponent modeling
       • Get-taught-and-exploited problem [Sandholm AIJ-07]
       • Needs prohibitively many repetitions to learn in large games (loses too much during learning)
       • Crushed by the game theory approach in Texas Hold’em, even with just 2 players and limit betting
       • Same tends to be true of no-regret learning algorithms

  3. Let’s hybridize the two approaches
     • Start playing based on the game theory approach
     • As we learn that the opponent(s) deviate from equilibrium, start adjusting our strategy to exploit their weaknesses

  4. The dream of safe exploitation
     • Wish: Let’s avoid the get-taught-and-exploited problem by exploiting only to an extent that risks what we have won so far
     • Proposition. It is impossible to exploit to any extent (beyond what the best equilibrium strategy would exploit) while preserving the safety guarantee of equilibrium play
     • So we give up some on worst-case safety …

  5. Deviation-Based Best Response (DBBR) algorithm (can be generalized to multi-player non-zero-sum)
     • Many ways to determine opponent’s “best” strategy that is consistent with observations
       • L1 or L2 distance to equilibrium strategy
       • Custom weight-shifting algorithm
       • ... Dirichlet prior

  6. Experiments
     • Performs significantly better than the game-theory-based base strategy in 2-player Limit Texas Hold’em against trivial opponents and against weak opponents from the AAAI computer poker competitions
     • Can be turned on only against weak opponents
     • Examples of winrate evolution: (figure not included in this transcript)
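
The hybrid idea from slides 3 and 5 (start from the equilibrium, then shift toward exploitation as observations show the opponent deviating) can be illustrated on a toy game. The sketch below is not the DBBR algorithm itself. It assumes rock-paper-scissors as a stand-in for a large game, and it models the opponent with the posterior mean of a Dirichlet prior centered on the equilibrium strategy. The function names and the `prior_strength` parameter are hypothetical and do not come from the slides.

```python
# Illustrative sketch only, not the DBBR algorithm from the slides.
# Assumptions: rock-paper-scissors stands in for a large game, and all
# names below (posterior_mean, choose_strategy, prior_strength) are hypothetical.

ACTIONS = ["rock", "paper", "scissors"]

# PAYOFF[i][j] is our payoff when we play ACTIONS[i] and the opponent plays ACTIONS[j].
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

# In rock-paper-scissors the equilibrium strategy is uniform and the game value is 0.
EQUILIBRIUM = [1 / 3, 1 / 3, 1 / 3]
GAME_VALUE = 0.0


def posterior_mean(prior_mean, prior_strength, counts):
    """Mean of a Dirichlet posterior whose prior has mean prior_mean and
    total pseudo-count prior_strength, after observing action counts."""
    total = prior_strength + sum(counts)
    return [(prior_strength * p + c) / total for p, c in zip(prior_mean, counts)]


def expected_payoffs(opponent_model):
    """Expected payoff of each of our pure actions against the opponent model."""
    return [sum(u * q for u, q in zip(row, opponent_model)) for row in PAYOFF]


def choose_strategy(opponent_model, tolerance=1e-9):
    """Play a pure best response if the model is exploitable, else the equilibrium."""
    payoffs = expected_payoffs(opponent_model)
    best = max(range(len(payoffs)), key=payoffs.__getitem__)
    if payoffs[best] <= GAME_VALUE + tolerance:
        return list(EQUILIBRIUM)
    return [1.0 if i == best else 0.0 for i in range(len(payoffs))]


if __name__ == "__main__":
    # First with no observations, then after the opponent has overplayed rock.
    for counts in ([0, 0, 0], [30, 5, 5]):
        model = posterior_mean(EQUILIBRIUM, prior_strength=10, counts=counts)
        print(counts, [round(q, 3) for q in model], choose_strategy(model))
```

With no observations the model equals the equilibrium and the sketch keeps playing it. Once the counts show the opponent favoring rock, the model drifts away from uniform and the sketch switches to paper. A larger `prior_strength` makes that switch slower. A pure best response is itself exploitable, which matches the point on slide 4 that exploitation gives up some worst-case safety.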
