
Building Agents for the Lemonade Game Using a Cognitive Hierarchy Population Model

Michael Wunder, Michael Kaisers, Michael Littman, John Yaros


Presentation Transcript


  1. Building Agents for the Lemonade Game Using a Cognitive Hierarchy Population Model
     Michael Wunder, Michael Kaisers, Michael Littman, John Yaros

  2. Overview of Method
     • In the Lemonade-Stand Game (LG), players are rewarded for finding a partner quickly, to avoid becoming the odd man out
     • As a result, complicated prediction-optimization learners are at a disadvantage
     • Using heuristics, an agent can identify (and attract) potential partners
     • Population-based models are useful for determining the best heuristics in the LG

  3. Example: p-beauty contest
     • Keynes proposed that the stock market is like a beauty contest in which the judges try to guess the contestant (or stock, or strategy) that the other judges like
     • Each of n players submits a number x_i between 0 and 100, and the winner is the player whose number is closest to a fraction p of the average guess, $p \cdot \frac{1}{n}\sum_{i=1}^{n} x_i$
     • p is a fraction between 0 and 1, e.g. 2/3
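
The winner rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `p_beauty_winner` and the choice to return every tied player are assumptions, not something the slides specify.

```python
def p_beauty_winner(guesses, p=2/3):
    """Return (target, winners) for one round of the p-beauty contest.

    guesses: list of numbers in [0, 100], one per player.
    target:  p times the average guess.
    winners: indices of the players closest to the target (ties kept).
    """
    if not guesses:
        raise ValueError("need at least one guess")
    target = p * sum(guesses) / len(guesses)
    # Distance of the closest guess to the target.
    best = min(abs(x - target) for x in guesses)
    winners = [i for i, x in enumerate(guesses) if abs(x - target) == best]
    return target, winners


# Four players; the average is 45, so with p = 2/3 the target is about 30.
target, winners = p_beauty_winner([0, 30, 60, 90], p=2/3)
```

Note that the player who guessed the average itself (had anyone guessed 45) would not win; the target is always pulled below the average when p < 1.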

  4. P-beauty game explained
     • The Nash equilibrium strategy is to play 0: when everyone plays 0, no player can do better by deviating
     • However, first-time players do not reach this outcome. Why?
     • Figure from Behavioral Game Theory by Colin Camerer

  5. How a Cognitive Hierarchy Works
     • Level k: reacts to Level k-1 ("Poor predictable Bart, always picks Rock.")
     • …
     • Level 1: reacts to the base strategy at Level 0 only ("Good ol' Rock. Homer can't beat that.")
     • Level 0: no reasoning, random action or simple rule ("Good ol' Rock. Nothing beats that.")
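
As a rough illustration of how the levels stack up, the sketch below applies the same idea to the p-beauty contest from the earlier slides. It rests on two simplifying assumptions that are not in the slides: Level 0 guesses uniformly at random (mean 50), and a Level k player believes everyone else is Level k-1 and ignores its own effect on the average. The name `level_k_guess` is hypothetical.

```python
def level_k_guess(k, p=2/3, level0_mean=50.0):
    """Guess of a Level k player in the p-beauty contest.

    Level 0 is summarized by its mean guess (assumed 50 for uniform
    random play on [0, 100]). Each further level best-responds to the
    level below by guessing p times that level's guess.
    """
    if k < 0:
        raise ValueError("level must be non-negative")
    return level0_mean * p ** k


# Guesses shrink geometrically toward the Nash strategy of 0.
guesses = [level_k_guess(k) for k in range(5)]
```

Under these assumptions only a player with a very high level would pick a number near 0, which is one way to read the earlier observation that first-time players do not reach the Nash outcome.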

  6. Population-based Reasoning
     • Steps of the CH technique:
       • Identify base strategies (random, static)
       • Derive processes for steps of reasoning
         • A step of reasoning, in this case, is the strategy that can exploit the one before
       • Recursively apply steps to each level k
         • These levels form the “hierarchy” according to some distribution f(k)
       • Select a strategy that does well against the desired population
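
The steps above can be sketched on Rock-Paper-Scissors, the game used in the hierarchy illustration. Everything specific here is an assumption made for the example: the base strategy is "always rock", f(k) is a truncated Poisson distribution with a hypothetical parameter `tau` (a common choice in the cognitive hierarchy literature, not necessarily the one used by the authors), and payoffs are +1/0/-1 for win/tie/loss.

```python
import math

ACTIONS = ["rock", "paper", "scissors"]
# BEATS[a] is the action that a defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def counter(action):
    """One step of reasoning: the action that exploits `action`."""
    return next(a for a in ACTIONS if BEATS[a] == action)


def build_levels(base_action, max_level):
    """Recursively apply the reasoning step to get levels 0..max_level."""
    levels = [base_action]
    for _ in range(max_level):
        levels.append(counter(levels[-1]))
    return levels


def poisson_weights(max_level, tau):
    """Truncated, renormalized Poisson weights f(0), ..., f(max_level)."""
    raw = [math.exp(-tau) * tau ** k / math.factorial(k)
           for k in range(max_level + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def payoff(mine, theirs):
    """+1 for a win, 0 for a tie, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def best_response(levels, weights):
    """Action with the highest expected payoff against the population."""
    def expected(action):
        return sum(w * payoff(action, lvl) for lvl, w in zip(levels, weights))
    return max(ACTIONS, key=expected)


levels = build_levels("rock", max_level=3)
weights = poisson_weights(max_level=3, tau=1.0)
choice = best_response(levels, weights)
```

The selected action depends on f(k): shifting weight toward higher levels changes which action does best, which is why the choice of population matters in the last step.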

  7. Lemonade-Stand Game Levels
     • LG yields elegant level heuristics
     • L0-U: Uniformly random action
     • L0-C: Constant action
     • L0-X: Constant with probability X, otherwise choose randomly
     • L1: Move Across from the most stable player (the one with the highest X). This move is also optimal against L1, and it is a cooperative equilibrium.
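
A minimal sketch of the L1 heuristic follows. The slides do not give the board layout or how X is measured, so the details are assumptions: the board is taken to be a circle of 12 spots numbered 0 to 11, "Across" means the diametrically opposite spot, and an opponent's stability X is estimated as the fraction of rounds in which it repeated its previous spot. All function names are hypothetical.

```python
NUM_SPOTS = 12  # assumption: spots 0..11 arranged on a circle


def stability(history):
    """Estimate X: fraction of rounds where the player repeated its spot."""
    if len(history) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(history, history[1:]) if prev == cur)
    return repeats / (len(history) - 1)


def across(spot):
    """Spot diametrically opposite `spot` on the circle."""
    return (spot + NUM_SPOTS // 2) % NUM_SPOTS


def level1_action(opponent_histories):
    """L1 heuristic: sit Across from the most stable opponent.

    opponent_histories: dict mapping an opponent name to the non-empty
    list of spots that opponent has chosen so far.
    """
    most_stable = max(opponent_histories,
                      key=lambda name: stability(opponent_histories[name]))
    return across(opponent_histories[most_stable][-1])


# Opponent "a" never moves, opponent "b" jumps around, so L1 targets "a".
action = level1_action({"a": [3, 3, 3, 3], "b": [1, 7, 2, 9]})
```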

  8. Lemonade Game Levels, Cont’d.
     • L2: Stay Constant for at least one turn, in case opponents are two L1s. If the current location is disadvantageous, move somewhere else, perhaps Across from a good partner.
     • L3: With another L3, “Sandwich” a constant or L2 player, and become Across from each other if it moves.
     • Can we classify contestants by level?

  9. Actual Competition Results
     • Using idealized agents from each of these levels, find the score of each contestant against populations of adjacent levels

  10. Actual Competition Results
     • The x-axis is a ratio of the two nearest levels; for example, Level 1.2 is a population of 80% L1 and 20% L2
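
The fractional-level convention can be written out as a small helper. This is a sketch under the reading that the fractional part is the share of the higher level; the name `population_for_level` is hypothetical.

```python
import math


def population_for_level(level):
    """Map a fractional level to a mix of the two adjacent integer levels.

    Example: 1.2 -> roughly {1: 0.8, 2: 0.2}.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    lower = math.floor(level)
    frac = level - lower
    if frac == 0:
        return {lower: 1.0}
    return {lower: 1.0 - frac, lower + 1: frac}


mix = population_for_level(1.2)
```

Opponents for a simulated match could then be drawn from `mix`, for instance with `random.choices(list(mix), weights=list(mix.values()), k=2)`.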

  11. Actual Competition Results
     • This population construction method allows for clear distinctions between levels, but other possibilities exist

  12. Mock Competition of Levels

  13. Conclusion
     • Our agent (RL3) contains elements of all three levels, which is not optimal against this population of competitors
     • The model that emerges from LG does predict the outcome fairly well
     • The model predicts that subsequent repetitions would generally move the population “up” the hierarchy
     • CH has implications for larger games (e.g. TAC)
