
Rational Games
‘Rational’ stable and unique solutions for multiplayer sequential games.
Richard Shiffrin, Michael Lee, Shunan Zhang
Draft: research in progress

The topic is enormous, overlapping psychology, philosophy, economics, computer science, business, mathematics, logic, political science, and more. • We have not yet been able to discover whether the present research is a rediscovery; let us know if it is. • (If it is old, at least we have rediscovered something interesting.)

The first author has given talks for the last decade arguing that rationality is not a normative concept but rather a social consensus of a sufficiently large proportion of humans judged to be sufficiently expert. • One of many examples illustrating this idea is found in multi-agent games, in which normative game theory stipulates decisions that harm all players, when players could all gain by other decisions.

We argue that multiplayer games have rational ‘solutions’, based on players who are selfish and rational but take into account the fact that the other players are also selfish and rational. • These solutions are not Nash equilibria. Our solutions will find weak and strong Pareto equilibria when they exist (although in most cases these will not exist).

One example is the well known ‘prisoner’s dilemma’: Players A and B independently decide to cooperate (C) or defect (D). The table gives the payoffs for A and B in that order:

          B: C        B: D
A: C      (5,5)       (-10,20)
A: D      (20,-10)    (-5,-5)

Whatever B chooses, A gains by defecting (similarly for B). Hence (-5,-5) is the Nash equilibrium. But clearly (5,5), the Pareto equilibrium, is better for both. We argue rational players should cooperate (even many presumably not-so-rational human players do).
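The two claims in this example can be checked mechanically. The following is a minimal sketch, not part of the original slides; the names `PAYOFFS`, `is_nash` and `dominates` are illustrative assumptions.

```python
from itertools import product

# Payoffs (A, B) from the example above, indexed by (A's move, B's move).
PAYOFFS = {
    ("C", "C"): (5, 5),
    ("C", "D"): (-10, 20),
    ("D", "C"): (20, -10),
    ("D", "D"): (-5, -5),
}
MOVES = ("C", "D")

def is_nash(a, b):
    """True if neither player can gain by unilaterally switching moves."""
    a_ok = all(PAYOFFS[(a, b)][0] >= PAYOFFS[(alt, b)][0] for alt in MOVES)
    b_ok = all(PAYOFFS[(a, b)][1] >= PAYOFFS[(a, alt)][1] for alt in MOVES)
    return a_ok and b_ok

def dominates(p, q):
    """True if outcome p is strictly better than q for both players."""
    return all(x > y for x, y in zip(p, q))

nash = [cell for cell in product(MOVES, MOVES) if is_nash(*cell)]
print(nash)                                                 # [('D', 'D')]
print(dominates(PAYOFFS[("C", "C")], PAYOFFS[("D", "D")]))  # True
```

Mutual defection is the only cell that survives the unilateral-deviation test, yet mutual cooperation is strictly better for both players.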

Some might argue it is rational for A to defect, even if B chooses to be rational and cooperate. But if it is rational to defect, both players know this and both will do so. Doug Hofstadter has called this reasoning ‘super-rationality’. (And there are many other similar ideas; see David Gauthier in Philosophy, and much more). [Of course real players can’t usually assume the other player is rational, making it difficult or impossible to reach a rational solution.]

The difficulties faced by real players aside, it is still useful to develop an algorithm for reaching a jointly rational solution. • Such a solution can serve as a normative baseline against which performance might be measured, and could conceivably serve as a starting point for bargaining.

Similar to PD but less obvious is a two-trial two-player centipede game:

A
+-- (0,0)
+-- B
    +-- (-1,10)
    +-- (9,9)

The first number is the payoff for A and the second is the payoff for B. One might argue B should choose (-1,10) if given a choice, because B acts selfishly and prefers 10 to 9. But if this is indeed rational, A will know this and will selfishly choose (0,0). This is rational for neither because it is dominated by (9,9). Thus (9,9) is the rational game solution. What prevents B from ‘defecting’ if B is given the choice? If defection is rational, then A knows this and will choose (0,0).
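The purely selfish line of reasoning in this example is ordinary backward induction. The sketch below is an illustration under assumed conventions: a leaf is a payoff pair, an internal node is a pair of mover index and list of children, and the function name `selfish` is hypothetical.

```python
# A leaf is a payoff tuple (A, B); an internal node is (mover, [children]),
# where mover 0 is A and mover 1 is B. This encoding is an assumption.
CENTIPEDE = (0, [(0, 0), (1, [(-1, 10), (9, 9)])])

def selfish(tree):
    """Backward induction: each mover picks the outcome best for itself."""
    mover, rest = tree
    if not isinstance(rest, list):   # a leaf: (payoff_A, payoff_B)
        return tree
    outcomes = [selfish(child) for child in rest]
    return max(outcomes, key=lambda p: p[mover])

result = selfish(CENTIPEDE)
print(result)                                         # (0, 0)
print(all(x > y for x, y in zip((9, 9), result)))     # True
```

B would take (-1,10), so A stops at (0,0), even though (9,9) gives both players more.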

Even before presenting an algorithm to find a rational solution, we must do our best to establish the rationality of our premises, a highly controversial matter. Covering every aspect of this matter could easily merit a long book. • Instead we present one argument that is sometimes effective.

Suppose there are N players, each playing one trial of the centipede game against him or herself. They do this under the drug Midazolam, which leaves reasoning intact but prevents memory of what decision was made at the first decision point. Each player is told to adopt a Player 1 strategy that will maximize the number of times they get a Player 1 payoff better than the other people when they are Player 1. Similarly each player adopts a Player 2 strategy to maximize the number of times they get a Player 2 payoff that is better than the other people when they are Player 2. • IMPORTANT: When you are Player 2, you must decide what you will do before you know whether you will get the opportunity to play. You must make a conditional decision: If I get to play, what will I do? It could well be that you would decide not to play when Player 1, so you might never get a chance to give a Player 2 decision.

When you are Player 1, your goal is to maximize your Player 1 result (although of course you should consider that when Player 2 you will be trying to maximize your Player 2 result). • And similarly when you are Player 2. • (Your goal is not to maximize the sum of the two outcomes, though such an outcome is not precluded if it happens).

So what strategies do you choose? When Player 1, you can get -1, 0, or 9. If you play when Player 1, and also cooperate when Player 2, then you will get 9 and at least tie for best. • However, if you think you will decide to defect when Player 2, you would be best off not playing at all (-1 would tie for worst).

When Player 2 you can get 0, 9, or 10. If you get a chance to play you can cooperate and get 9 or defect and get 10. If you cooperate you will lose to those who play at step 1 and defect at step 2. • But which other players would do that? Players who think defection is rational will not play at step 1. Thus how can you lose by cooperating? • You can reason this before playing as Player 1, and hence can confidently play, and be reasonably certain you will later cooperate.
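The self-play outcomes discussed above can be tabulated directly. This is a small sketch assuming the centipede payoffs (0,0), (-1,10) and (9,9) given earlier; the function `self_play` and the boolean strategy encoding are illustrative assumptions.

```python
from itertools import product

def self_play(plays_as_1, cooperates_as_2):
    """(Player 1 payoff, Player 2 payoff) when one person fills both roles."""
    if not plays_as_1:
        return (0, 0)            # Player 1 stops, so Player 2 never moves
    return (9, 9) if cooperates_as_2 else (-1, 10)

strategies = list(product([True, False], repeat=2))
for plays, coop in strategies:
    print(f"play={plays!s:5} cooperate={coop!s:5} -> {self_play(plays, coop)}")

# The strategy pair giving the best Player 1 result under self-play.
best_for_1 = max(strategies, key=lambda s: self_play(*s)[0])
print(best_for_1)                # (True, True)
```

The Player 2 payoff of 10 appears only together with a Player 1 payoff of -1, which is the correlation the argument relies on.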

What is it about playing one’s self that makes cooperation at step 2 seem rational? The key is the correlation between the decisions made at both steps. You are able to assume that whatever you decide before play 1 about what you will later decide as Player 2 will in fact come to pass. • The assumption of rational players makes this correlation even stronger—if a set of decisions is rational all rational players will adopt them.

It is of course just this correlation between decisions of multiple players that is ignored when arriving at Nash equilibria, and other seemingly irrational decisions. • Thus when we say all players are assumed rational (not very satisfactory if we have not defined rationality), we are more importantly saying that the decisions of the players are positively correlated.

There are many reasons why decisions could and should be correlated—social norms, (playing one’s self), group consensus, reliance on experts, and so on. • Interestingly, if the expert community defines rationality in a way that makes defection rational, then assuming the opponents to be rational would lead to defection, not cooperation. Perhaps this occurred in the early days of game theory.

Many other arguments can be given for (and possibly against) the rationality of ‘selfish’ cooperation, but for the rest of this presentation, we assume such rationality and see where it leads us.

We would like to find an algorithm for finding a general rational solution for multiplayer decision games. • One caveat: let us start by letting all decisions be sequential, not simultaneous. Why? Simultaneous decisions can lead to ambiguous decision making: there are examples for which there is no unique rational solution. (Examples omitted.) Thus we begin with sequential decisions, for which there is always a solution for the case of two players.

So consider sequential decision games. These can be written as tree structures, as will be illustrated by examples. [Figure: a generic decision tree with A moving at the root, B at the second level, A at the third level, etc.]

The players take turns making choices, sequentially, in a specified and known order. The game can be written as a tree, with each terminal node j giving numerical outcomes (v_j1, v_j2, …, v_jn) for players 1, 2, …, n. • We will assume that all outcomes at terminal nodes for a given player differ from each other (no ties among any outcomes for a given player). • Thus every decision path is uniquely identified by the joint payoff outcome at the termination of the decision path, sometimes termed a ‘solution’.

There are two basic games: • Quantitative games: • All players are assumed to know the utility of all outcomes for all players (perhaps equal), so outcomes can be compared quantitatively (this assumption is of course unlikely). Many unsolved problems exist in this case, including the role of ‘threats’. Thus we consider only: • Ordinal (Qualitative) games: • Each player is assumed to prefer more rather than less, but quantitative differences are otherwise irrelevant. Thus the outcomes for any player only matter ordinally, and quantitative comparisons of outcomes across players are irrelevant. E.g. one player’s outcomes can be exponentiated and nothing would change.
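The remark about exponentiation can be illustrated in a few lines. This is a sketch only; the outcome list is borrowed from the two-player examples in this talk, and `ranking` is a hypothetical helper.

```python
import math

# Joint outcomes (A, B) taken from one of the example trees.
outcomes = [(7, 7), (3, 15), (10, 10), (5, 12)]

def ranking(values):
    """Indices of the outcomes, ordered from least to most preferred."""
    return sorted(range(len(values)), key=lambda i: values[i])

b_raw = [b for _, b in outcomes]
b_exp = [math.exp(b) for b in b_raw]   # a strictly increasing transform

print(ranking(b_raw))                    # [0, 2, 3, 1]
print(ranking(b_raw) == ranking(b_exp))  # True
```

Any strictly increasing transform of one player's payoffs leaves that player's preference order unchanged, and in an ordinal game only that order matters.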

Some player principles: • Each player is ‘selfish’, always maximizing individual gain, in the certain knowledge that other players have the same goal, and that all players are fully ‘rational’ in their own self interest. • The players do not have prior agreements. They know nothing about the other players other than that all players are selfishly rational. (But all know that others’ utilities are unknown, so only ordinality matters).

Under these assumptions it is clear that one type of rational solution is the Pareto maximum: All players prefer outcome (v_j1, v_j2, …, v_jn) to (v_k1, v_k2, …, v_kn) if v_j1 > v_k1, v_j2 > v_k2, …, v_jn > v_kn. • [This Pareto criterion sounds trivial but is not, as we saw in the examples of the ‘prisoner’s dilemma’ and the ‘centipede game’.] • We will see that there are many other rational solutions that are not Pareto optimal (not all players gain).

Some other examples help set the scene. • Let there be only two players, A and B. In the figures and examples to follow, let the first listed outcome at a terminal node be A’s payoff, and the second listed be B’s payoff.

A
+-- (15,5)
+-- (10,10)

Here A will choose (15,5). A has control and is selfish.

A
+-- (10,10)
+-- B
    +-- (7,7)
    +-- (15,5)

Here A will choose the left branch, getting 10, because A knows B will maximize her gain by choosing (7,7). A prefers 10 to 7.

A
+-- B
|   +-- (10,10)
|   +-- (5,12)
+-- B
    +-- (7,7)
    +-- (3,15)

A controls the first choice and B controls the second choice. (5,12) is the rational joint choice: A knows B will choose selfishly and therefore chooses the left branch, giving A 5 rather than 3. A would only choose the right branch if B would make a choice giving A more than 5. But (7,7) gives B less than 15, so B would not do so.

But there are of course cases where cooperation is rational. We have discussed the prisoner’s dilemma and the centipede game.

A
+-- B
|   +-- (10,10)
|   +-- (5,12)
+-- B
|   +-- (7,7)
|   +-- (3,15)
+-- (6,1)

If B chooses selfishly, then A would choose between (5,12), (3,15) and (6,1). Of these choices A would choose (6,1). However, both (10,10) and (7,7) are better for both players. Since (10,10) dominates, this is the rational solution.

What we see here is the player with a choice at a node in the decision tree choosing selfishly among the ‘rational’ solutions for the subtrees at that node. • This provides a ‘provisional’ solution and better alternatives are those that jointly benefit both players compared with the provisional solution.

A potential problem arises when there is a choice of better alternatives.

A
+-- B
|   +-- (10,10)
|   +-- (5,12)
+-- B
|   +-- (11,9)
|   +-- (3,15)
+-- (6,1)

Now the choices dominating (6,1) are (10,10), preferred by B, and (11,9), preferred by A. One might argue that B should win, because B could ‘threaten’ to choose (3,15) if sent down the middle path by A. The problem with ‘threats’ of this sort is that they generally lead to cycles of threats, no solution, or a poor solution.

A
+-- B
|   +-- A
|   |   +-- (10,10)
|   |   +-- (12,-1)
|   +-- (5,12)
+-- B
|   +-- (11,9)
|   +-- (3,15)
+-- (6,1)

Now B could threaten (3,15) if sent down the middle path, but if A goes left at the first choice, and B goes left, then A could threaten to choose (12,-1). Such threats would cycle endlessly. In large decision trees, there will almost always be such competing ‘threats’ and hence no rational solution.

One could argue that two rational players should know that solutions should be restricted to those alternatives that jointly benefit both relative to the provisional selfish solution. If so then one can prune the decision tree and re-run the algorithm to find the rational solution.

Meta-reasoning provides another basis for reaching this answer: Rational players would want a decision strategy that would give them the best chance of reaching a solution benefitting them, and would want an approach that would do so in as many decision trees as possible.

In all two player games, the ‘successive pruning’ strategy will converge on a unique rational solution. • Note: Because it limits consideration to jointly better alternatives, it does not allow the types of ‘threats’ described in earlier slides.

The method is simple: • We start at the terminal nodes and work upwards, establishing rational solutions at each higher node in turn. • At a given node in this process, we establish a provisional solution based on a selfish choice among that player’s options, where each option is the rational solution already established for the corresponding child node.

1. Identify all alternatives in the subtree that give both players more than the provisional solution.
2. Prune the decision tree to contain just these alternatives.
3. Iterate the general algorithm on the pruned tree, from the terminal nodes up to the root of the subtree.
4. Continue until convergence.

This method establishes a rational solution for a given node in the decision tree. • We then move up a level in the tree. We first establish rational solutions for all subtrees below the new node. The new node has a provisional solution determined by a selfish choice among these subtree solutions, and the whole process iterates.
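The procedure can be sketched in code for two players. This is one possible reading of the steps, not the authors' formalization (which the talk describes as work in progress). The `Node` class and the function names are assumptions, leaves are identified by their payoff pairs (relying on the no-ties assumption), and the sketch drops the provisional leaf itself from the pruned tree, whereas the slide figures appear to keep it.

```python
from dataclasses import dataclass

A, B = 0, 1                      # player indices into a payoff tuple

@dataclass
class Node:
    player: int                  # who moves at this node (A or B)
    children: list               # each child is a Node or a payoff tuple

def leaves(tree):
    """All terminal payoff tuples in the subtree."""
    if isinstance(tree, tuple):
        return [tree]
    return [p for child in tree.children for p in leaves(child)]

def prune(tree, keep):
    """Keep only the leaves in `keep`; drop branches left empty."""
    if isinstance(tree, tuple):
        return tree if tree in keep else None
    kids = [k for k in (prune(c, keep) for c in tree.children) if k is not None]
    return Node(tree.player, kids) if kids else None

def rational(tree):
    """Successive pruning: selfish provisional choice, then jointly better leaves."""
    if isinstance(tree, tuple):
        return tree
    options = [rational(child) for child in tree.children]
    provisional = max(options, key=lambda p: p[tree.player])
    better = {p for p in leaves(tree)
              if all(x > y for x, y in zip(p, provisional))}
    if not better:
        return provisional       # nothing jointly better: converged
    return rational(prune(tree, better))

def threat_tree(x):
    """The three-branch example tree, with x as the alternative to (10,10)."""
    return Node(A, [
        Node(B, [Node(A, [(10, 10), x]), (5, 12)]),
        Node(B, [(11, 9), (3, 15)]),
        (6, 1),
    ])

centipede = Node(A, [(0, 0), Node(B, [(-1, 10), (9, 9)])])

print(rational(centipede))               # (9, 9)
print(rational(threat_tree((12, -1))))   # (11, 9)
print(rational(threat_tree((12, 2))))    # (12, 2)
```

Each recursive call on a pruned tree works on strictly fewer leaves, so the recursion terminates. The three printed results agree with the solutions stated for the corresponding examples in this talk.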

In the pruned tree (below) A chooses (11,9) over (10,10) and that is the answer.

Original tree:

A
+-- B
|   +-- (10,10)
|   +-- (5,12)
+-- B
|   +-- (11,9)
|   +-- (3,15)
+-- (6,1)

Pruned tree:

A
+-- B
|   +-- (10,10)
+-- B
|   +-- (11,9)
+-- (6,1)

The choice (11,9) becomes the new provisional solution, but no alternative is left that dominates it, so that is the final answer.

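Full tree:

A
+-- B
|   +-- A
|   |   +-- (10,10)
|   |   +-- (12,-1)
|   +-- (5,12)
+-- B
|   +-- (11,9)
|   +-- (3,15)
+-- (6,1)

Pruned tree:

A
+-- B
|   +-- A
|       +-- (10,10)
+-- B
|   +-- (11,9)
+-- (6,1)

Here (11,9) wins. But if we changed (12,-1) to (12,2) then (12,2) would win:

A
+-- B
|   +-- A
|       +-- (10,10)
|       +-- (12,2)
+-- B
|   +-- (11,9)
+-- (6,1)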

These examples and the proposed algorithm make it clear how to find a rational solution in games with two players. • We are working to formalize these arguments and the resultant algorithm, but it should be clear how it works.

We have assumed ordinality. • Suppose all utilities of all players are known to all players (unreasonable, but let us accept this conditionally for now). • In one sense, nothing changes—we still have an ordering of preferences for each player (according to each player’s utilities), so one could argue the same rational solutions would appear.

Is it rational for all players to make a common assumption about utilities, an assumption that would allow a rational solution to be found? • For example, the players could assume that the utilities for all players are equal. • But more is needed, because the exact common utility function would be needed. Although it is known that utilities are non-linear, perhaps an assumption of linearity could nonetheless be made?

This chain of assumptions is difficult to defend, but even if such assumptions were rational, quantitative games raise the question of ‘threats’.

The possibility of ‘threats’ can appear: All players can imagine that a given player will accept a small loss to punish a sufficiently greedy other player.

A
+-- (100,100)
+-- B
    +-- (0,5)
    +-- (500,6)

The rational solution is (500,6): A goes right knowing B will then choose the larger payoff. But if utilities are known, B might think to accept a dollar loss and choose (0,5). If so, A would lose the 100 available from the left branch. This loss is high enough to ‘force’ A to choose (100,100). B (and A) reason that B’s dollar loss of one unit at the second node is outweighed by A’s loss of 100. A prefers 100 to 0 more than B prefers 6 to 5. If threats are ‘agreed’ to be rational then both players would know this and it might become rational for A to choose the left branch.
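The arithmetic behind this threat can be written out explicitly. The sketch below only restates the numbers in the example; treating payoff differences as directly comparable across players is exactly the quantitative assumption under discussion, and the variable names are illustrative.

```python
left = (100, 100)                  # outcome if A takes the left branch
b_options = [(0, 5), (500, 6)]     # B's choices if A goes right

selfish_b = max(b_options, key=lambda p: p[1])   # B's selfish choice
threat_b = min(b_options, key=lambda p: p[1])    # B's punishing choice

b_cost = selfish_b[1] - threat_b[1]   # what B gives up by carrying out the threat
a_loss = left[0] - threat_b[0]        # what A loses relative to the left branch

print(selfish_b, b_cost, a_loss)      # (500, 6) 1 100

# If A believes the threat, A compares the left branch with the threat outcome.
a_choice = max([left, threat_b], key=lambda p: p[0])
print(a_choice)                       # (100, 100)
```

B's cost of one unit is small next to A's loss of 100, which is why the threat might be taken seriously once utilities are assumed known and comparable.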

This kind of reasoning may indeed be used by humans, but whether it is rational remains an open question. Even if rational, the assumption that all utilities are communally known is very unlikely, if not impossible. If utilities are not known exactly, then we are thrown into some sort of Bayesian-like situation in which a prior over utilities must be guessed, and regardless of what sort of decision metric is assumed, guessing ensures that a fixed normative rational solution will not exist.

Empirical studies of human decision making: • Humans attempting to decide rationally will likely be affected by quantitative differences, especially when these become large. • Thus, the normative ordinal solution we have introduced will act best as a baseline for human performance if a human study uses a design that makes payoffs as ordinal as possible. • Perhaps payoffs for both players could be made linear, placed on a common scale, and shrunk to a small range: • E.g. Each player could see payoffs such as 10.1, 10.2, 10.3 etc.
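One way such a display could be produced is a rank transform, sketched below. This is an assumption about how the suggestion might be implemented, not a design taken from the talk; `ordinal_display` is a hypothetical helper that maps a player's payoffs to 10.1, 10.2, ... in preference order.

```python
def ordinal_display(payoffs):
    """Replace each payoff by 10 + rank/10, where rank 1 is the least preferred."""
    order = sorted(set(payoffs))
    return [(100 + order.index(p) + 1) / 10 for p in payoffs]

print(ordinal_display([7, 15, 10, 12]))   # [10.1, 10.4, 10.2, 10.3]
print(ordinal_display([0, 500, 100]))     # [10.1, 10.3, 10.2]
```

The displayed values keep each player's preference order while hiding large quantitative gaps such as 500 versus 100.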

Extensions to Multiple Players (>2) • The two player algorithm provides ideas that are useful when considering the much more complex case of three or more players. • Unfortunately, when there are three or more players there are cases without rational solutions.

Even when solutions exist, extending the approach to three or more players in a general decision tree setting is very difficult because there is a super-exponential explosion of possibilities. • Other complexities are due to the fact that different subgroups of players can have control and influence at different points in the tree.
