Evaluation Through Conflict

Martin Zinkevich, Yahoo! Inc. http://martin.zinkevich.org/lemonade. Who was I: worked with the U Alberta Computer Poker Research Group; designed the Counterfactual Regret Algorithm; theory behind DIVAT; worked on the AAAI Computer Poker Competition.



  1. Evaluation Through Conflict Martin Zinkevich Yahoo! Inc. http://martin.zinkevich.org/lemonade

  2. Who was I • Worked with U Alberta Computer Poker Research Group • Designed Counterfactual Regret Algorithm • Theory behind DIVAT • Worked on AAAI Computer Poker Competition • 2006 as lead programmer, 2007 as chair • Work used in Man Vs Machine

  3. Who am I • Run the Lemonade Stand Game Competition • Work with Yahoo Anti-Abuse Team

  4. AAAI Computer Poker Competition • 5 years running • Now the ANNUAL Computer Poker Competition • Latest: 11 universities et al.

  5. Competitions: Science vs Entertainment

  6. AAAI Computer Poker Competition May The Best Program Win! And Win Again IF WE PLAYED AGAIN!

  7. Head to Head VS for 1000 hands

  8. Head to Head VS for 1000 hands

  9. All Combinations

  10. OK, But Who Won? • Online: Maximize total winnings • Equilibrium: Maximize number of people I can win money from (or don’t lose against)
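The two winner rules on this slide can disagree on the same table of head-to-head results. Below is a minimal sketch of that, assuming a zero-sum pairwise table; the program names, the numbers, and the helper names `online_winner` and `equilibrium_winner` are all made up for illustration and are not competition data.

```python
# Hypothetical head-to-head results: winnings[a][b] is what program a won
# from program b. The table is zero-sum: winnings[a][b] == -winnings[b][a].
# All names and numbers are invented for illustration.
winnings = {
    "A": {"B": 50, "C": -5, "D": -5},
    "B": {"A": -50, "C": -2, "D": 3},
    "C": {"A": 5, "B": 2, "D": 1},
    "D": {"A": 5, "B": -3, "C": -1},
}


def online_winner(table):
    """Winner under the 'online' rule: largest total winnings."""
    return max(table, key=lambda name: sum(table[name].values()))


def equilibrium_winner(table):
    """Winner under the 'equilibrium' rule: most opponents not lost to."""
    def non_losing_count(name):
        return sum(1 for amount in table[name].values() if amount >= 0)
    return max(table, key=non_losing_count)


print(online_winner(winnings))       # A: wins a lot from B, loses a little elsewhere
print(equilibrium_winner(winnings))  # C: wins a little from everyone
```

In this invented table, program A exploits one weak opponent heavily, while program C never loses a match, so the two rules crown different winners.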

  11. Why a New Competition? • Computing Equilibria ✓ • Choosing Equilibria ?

  12. Bach or Stravinsky

  13. Big Question: How Do (or Would) People Get to Nash Equilibria?

  14. Solvable Games $

  15. Unsolvable Games ∞ ? $

  16. An Old Idea • Think about learning in the presence of other intelligent agents. • Prove cool stuff about your learning algorithm given: • constraints about the adversary • constraints about the game

  17. Solving the Unsolvable • In current competitions, people are often applying techniques that are effective in solvable games, even when the game is not solvable. • In what competitions is it useless to approximate the game as solvable?

  18. Axelrod’s Iterated Prisoner’s Dilemma • A competition between many competitors. • One entry: tit-for-tat (Anatol Rapoport) • Nice (initially) • Retaliating • Forgiving • Non-envious • Learned that cooperation has value, but: • Cooperate with whom? • How do we cooperate?
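Tit-for-tat is short enough to sketch directly: cooperate on the first move, then copy the opponent's previous move. The code below is an illustrative sketch, not the original tournament entry; the function names and the `"C"`/`"D"` move encoding are assumptions.

```python
def tit_for_tat(opponent_history):
    """Cooperate first ('nice'), then mirror the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """A hypothetical opponent that never cooperates."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Play two strategies against each other; each sees the other's past moves."""
    moves_a, moves_b = [], []
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
    return moves_a, moves_b


# Retaliating: after one defection by the opponent, tit-for-tat defects back.
print(play(tit_for_tat, always_defect, 4))
# Nice: two tit-for-tat players cooperate throughout.
print(play(tit_for_tat, tit_for_tat, 3))
```

Because the strategy only looks one move back, it is also forgiving: it returns to cooperation as soon as the opponent does.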

  19. The Lemonade Stand Game

  20. The Lemonade Stand Game

  21. The Lemonade Stand Game

  22. What Is The Lemonade Stand Game? • Every round for 100 rounds: • each person selects an action privately • then, the actions are revealed • The score of a player is the distance clockwise to the next player plus the distance counterclockwise.
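The scoring rule on this slide can be sketched in a few lines. The sketch assumes 12 spots around the circle and a particular way of splitting the score when players share a spot; neither detail is stated on the slide, so both are labeled as assumptions in the code, and the name `round_scores` is hypothetical.

```python
N_SPOTS = 12  # assumption: 12 spots around the circle (not stated on the slide)


def round_scores(positions, n_spots=N_SPOTS):
    """Scores for one round of the three-player game.

    positions: the three spots chosen by the players, each in range(n_spots).
    """
    distinct = len(set(positions))
    if distinct == 1:
        # Assumed tie rule: all three share a spot and split the total evenly.
        return [2 * n_spots / 3] * 3
    if distinct == 2:
        # Assumed tie rule: the two players sharing a spot get half a circle
        # each, and the lone player gets a full circle.
        return [n_spots if positions.count(p) == 1 else n_spots / 2
                for p in positions]
    scores = []
    for i, p in enumerate(positions):
        others = [q for j, q in enumerate(positions) if j != i]
        clockwise = min((q - p) % n_spots for q in others)
        counterclockwise = min((p - q) % n_spots for q in others)
        scores.append(clockwise + counterclockwise)
    return scores


print(round_scores([0, 4, 8]))  # evenly spaced players
print(round_scores([0, 6, 3]))  # the player at 3 sits between the other two
```

With distinct spots, each gap between neighbours is counted once for each of its two endpoints, so the three scores always sum to twice the number of spots.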

  23. Key Observations (Strategy #1: Play Constant; Strategy #2: Play Opposite; Strategy #3: Sandwich) • A constant-sum game between 3 players. • For every gain, someone has to lose. • Possibilities For Cooperation • Opposite sides of the circle, “sandwiching” • Not a “Solvable Game” (Nash, 1951) • Playing equilibrium strategies is not advisable • Easy To Set “Table Image” • The constant strategy often evokes cooperative behavior • Existing Techniques Fail • Experts algorithms lose to constant strategy
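The "sandwiching" observation can be checked by brute force: put two players on opposite sides of the circle and try every other spot for the third. This is a sketch under the assumption of 12 spots; collisions are skipped because the slides give no rule for shared spots, and the function names are hypothetical.

```python
N_SPOTS = 12  # assumption: 12 spots around the circle


def distinct_scores(positions, n_spots=N_SPOTS):
    """Clockwise plus counterclockwise distance to the nearest other player.

    Only valid when all three positions are different.
    """
    scores = []
    for i, p in enumerate(positions):
        others = [q for j, q in enumerate(positions) if j != i]
        scores.append(min((q - p) % n_spots for q in others)
                      + min((p - q) % n_spots for q in others))
    return scores


def sandwich_outcomes(a=0, n_spots=N_SPOTS):
    """Scores for every spot the third player can take without colliding."""
    b = (a + n_spots // 2) % n_spots  # partner plays the opposite side
    outcomes = {}
    for c in range(n_spots):
        if c in (a, b):
            continue  # shared spots need a tie rule the slides do not give
        outcomes[c] = distinct_scores([a, b, c], n_spots)
    return outcomes


for spot, (score_a, score_b, score_c) in sandwich_outcomes().items():
    print(spot, score_a, score_b, score_c)
```

Under these assumptions the third player is held to 6 wherever it stands, while the opposite pair shares 18, which is one way to read "for every gain, someone has to lose".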

  24. Competition Structure • Every set of three players played 100 rounds 180 times (1.5 million rounds total) • Highest Total Score Wins • Mean, Standard Error can be calculated
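A short sketch of the arithmetic behind this structure: the round count for an all-triples tournament, and the mean and standard error of a list of scores. The choice of nine entries (one per team in the competitor list) is an assumption, as are the function names and the sample scores.

```python
import math
import statistics


def total_rounds(n_entries, repeats=180, rounds_per_match=100):
    """Rounds played when every set of three entries meets `repeats` times."""
    return math.comb(n_entries, 3) * repeats * rounds_per_match


def mean_and_standard_error(scores):
    """Sample mean and standard error of the mean (needs at least 2 scores)."""
    mean = statistics.fmean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, standard_error


# Assumption: nine entries, one per team.
print(total_rounds(9))  # 84 triples * 180 repeats * 100 rounds

# Invented per-round scores, for illustration only.
print(mean_and_standard_error([8, 9, 7, 8]))
```

With nine entries the count comes to 1,512,000 rounds, which is consistent with the "1.5 million" figure on the slide.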

  25. Competitors • 28 players, 9 teams • University of Southampton/Imperial College London (Soton) • Yahoo! Inc. (Pujara) • Rutgers University (RL3) • Brown University (Brown) • Carnegie Mellon (2 teams: Waugh, ACTR) • University of Michigan (FrozenPontiac) • Princeton University (Schapire) • (Greg Kuhlmann)

  26. Competition Results (chart: Score Per Round, by Competitor)

  27. Results (chart: Score Per Round - 8, by Competitor; labels: Modified Constant, Uniformly Random)

  28. Restricting to Top 6 (chart: Score Per Round - 8, by Competitor)

  29. Restricting to Top 4

  30. Teach Simply! EQUILIBRIUM FREE =

  31. Learn = = = ?

  32. Learn = = 10 7

  33. The High Level • Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.

  34. Lofty Goals • Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task. • behavior: a fully specified strategy. • used: actually leveraged

  35. Practical Concessions • Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task. • Not any intelligent agent • Not any time (people change) • Not any task (context matters)

  36. Thank You http://martin.zinkevich.org/lemonade
