Multiagent Interactions

A game theoretical approach

Matteo Venanzi
[email protected]


Outline

  • Utilities and preferences

  • Multiagent encounters

    • Solution concepts

  • The Prisoner’s Dilemma

    • IPD in a social network



Utilities and preferences

  • Self-interested agents: each agent has its own preferences and desires over the states of the world (non-cooperative game theory).

  • Modelling preferences:

    • Outcomes (states of the world): Ω = {w1, w2, ...}

    • Utility function: ui : Ω → R


Preference ordering

Utility functions lead to preference orderings over outcomes

  • Weak preference: w ⪰i w' iff ui(w) ≥ ui(w')

  • Strict preference: w ≻i w' iff ui(w) > ui(w')

  • Properties

    • Reflexivity

    • Transitivity

    • Comparability
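
A small illustrative sketch (not part of the original slides): the Python snippet below encodes a utility function ui over a toy outcome set and derives the weak and strict preference relations from it; the outcomes and utility values are invented for the example.

```python
# Illustrative outcomes and utilities (invented for this example).
outcomes = ["w1", "w2", "w3"]
u_i = {"w1": 3, "w2": 3, "w3": 1}  # utility function u_i : Omega -> R

def weakly_prefers(u, w, w_prime):
    """w is weakly preferred to w' iff u(w) >= u(w')."""
    return u[w] >= u[w_prime]

def strictly_prefers(u, w, w_prime):
    """w is strictly preferred to w' iff u(w) > u(w')."""
    return u[w] > u[w_prime]

# The induced ordering is reflexive, transitive and total (comparable).
assert all(weakly_prefers(u_i, w, w) for w in outcomes)                     # reflexivity
assert weakly_prefers(u_i, "w1", "w2") and weakly_prefers(u_i, "w2", "w1")  # indifference
assert strictly_prefers(u_i, "w1", "w3")                                    # strict preference
```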


Multiagent Encounter

  • Interaction as a game: the states of the world can be seen as the outcomes of a game.

    • Assume we have just two agents (players), Ag = {i, j}.

    • The final outcome in Ω depends on the combination of actions selected by each agent.

  • State transformer function: τ : Ai × Aj → Ω, mapping each pair of actions to an outcome.


Multiagent Encounter

  • Normal-form game (or strategic-form game):

    A strategic interaction is commonly represented in game theory as a tuple (N, A, u), where:

    • N is the (finite) set of players.

    • A = A1 × A2 × ... × An, where Ai is the set of actions available to player i.

    • u = (u1, u2, ..., un), where ui is the utility function of player i.

  • Payoff matrix:

    An n-dimensional matrix whose cells contain the values of each player's utility function.
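
As a rough sketch (not from the slides), a two-player normal-form game can be stored as a dictionary from action profiles to payoff pairs; the action names and payoff values below are placeholders.

```python
# A two-player normal-form game stored as {(a_i, a_j): (u_i, u_j)}.
# Action names and payoffs are placeholders, not taken from the slides.
game = {
    ("A", "A"): (2, 2), ("A", "B"): (0, 3),
    ("B", "A"): (3, 0), ("B", "B"): (1, 1),
}

actions_i = sorted({a for a, _ in game})  # player i's actions (rows)
actions_j = sorted({b for _, b in game})  # player j's actions (columns)

# Print the payoff matrix: each cell holds the pair (u_i, u_j).
print("      " + "       ".join(actions_j))
for a in actions_i:
    print(a, " ", "  ".join(str(game[(a, b)]) for b in actions_j))
```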


Multiagent Encounter

  • Rational agent’s strategy:

  • Given a game scenario, how will a rational agent act?

  • Apparently: “just” maximize the expected payoff (single-agent point of view).

    • In most cases this is infeasible, because the individual best strategy depends on the choices of the others (multi-agent point of view).

  • In practice, the answer lies in solution concepts:

    • Dominant strategies

    • Pareto optimality

    • Nash equilibrium


Solution Concepts

  • Best response: given player j’s strategy sj, player i’s best response to sj is the strategy si that gives the highest payoff to player i.

Example (for the payoff matrix shown on the slide):

j plays C → i’s best response = D

j plays D → i’s best response = C
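
A minimal sketch of the best-response computation, assuming a two-player game stored as a payoff dictionary; the payoffs below are hypothetical and are not the matrix from the slide.

```python
# Sketch: best response of player i to a fixed strategy of player j,
# for a two-player game stored as {(a_i, a_j): (u_i, u_j)}.
# The game below is hypothetical; it is not the matrix from the slide.
game = {
    ("C", "C"): (2, 2), ("C", "D"): (0, 3),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1),
}
actions_i = {a for a, _ in game}

def best_response_i(game, a_j):
    """Return player i's action(s) maximising u_i when j plays a_j."""
    best = max(game[(a_i, a_j)][0] for a_i in actions_i)
    return {a_i for a_i in actions_i if game[(a_i, a_j)][0] == best}

print(best_response_i(game, "C"))  # -> {'D'} for these payoffs
print(best_response_i(game, "D"))  # -> {'D'} for these payoffs
```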


Solution Concepts

  • Dominant strategy: a strategy si* is dominant for player i if, no matter what strategy sj agent j chooses, i will do at least as well playing si* as it would playing anything else. (si* is dominant if it is a best response to all of agent j’s strategies.)

  • Example:

    • D is the dominant strategy for player j.

    • There is no dominant strategy for player i.

  • Dominated strategies can be removed from the table.
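
A sketch of the dominance check on a hypothetical game (the slide's payoff matrix is not in the transcript); the payoffs are chosen so that, as on the slide, j has a dominant strategy while i does not.

```python
# Sketch: an action of player i is (weakly) dominant if it does at least as
# well as every alternative against every action of j. Payoffs are hypothetical.
game = {
    ("A", "C"): (1, 3), ("A", "D"): (4, 4),
    ("B", "C"): (2, 1), ("B", "D"): (3, 5),
}
actions_i = sorted({a for a, _ in game})
actions_j = sorted({b for _, b in game})

def dominant_actions_i(game):
    """Actions of i that do at least as well as every alternative, against every a_j."""
    return [
        a_star for a_star in actions_i
        if all(game[(a_star, b)][0] >= game[(a, b)][0]
               for a in actions_i for b in actions_j)
    ]

def dominant_actions_j(game):
    """Same check from player j's point of view."""
    return [
        b_star for b_star in actions_j
        if all(game[(a, b_star)][1] >= game[(a, b)][1]
               for b in actions_j for a in actions_i)
    ]

print(dominant_actions_i(game))  # [] - no dominant strategy for i in this example
print(dominant_actions_j(game))  # ['D'] - D dominates C for j here
```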


Solution Concepts

  • Pareto optimality (or Pareto efficiency): an outcome of the game is said to be Pareto optimal (or Pareto efficient) if there is no other outcome that makes one agent better off without making another agent worse off.

If an outcome w is not Pareto optimal, then there is another outcome w’ that makes everyone at least as happy, if not happier, than w. “Reasonable” agents would agree to move to w’ in this case. (Even if I don’t directly benefit from w’, you can benefit without me suffering.)



Pareto optimality is a nice property, but it is not very useful for selecting strategies.
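
A sketch of the Pareto-optimality check on a hypothetical two-player game: an outcome is kept if no other outcome Pareto-dominates it.

```python
# Sketch: find the Pareto-optimal outcomes of a two-player game stored as
# {(a_i, a_j): (u_i, u_j)}. Payoffs here are hypothetical.
game = {
    ("A", "A"): (2, 2), ("A", "B"): (0, 3),
    ("B", "A"): (3, 0), ("B", "B"): (1, 1),
}

def pareto_dominates(u, v):
    """u Pareto-dominates v if u is at least as good for everyone and better for someone."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))

pareto_optimal = [
    outcome for outcome, payoff in game.items()
    if not any(pareto_dominates(other, payoff) for other in game.values())
]
print(pareto_optimal)  # [('A', 'A'), ('A', 'B'), ('B', 'A')] for these payoffs
```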


Nash Equilibrium

Nash equilibrium for pure strategies: two strategies s1 and s2 are in Nash equilibrium if:

  • under the assumption that agent i plays s1, agent j can do no better than play s2;

    AND

  • under the assumption that agent j plays s2, agent i can do no better than play s1.

  • Neither agent has any incentive to deviate from a Nash equilibrium.

  • A Nash equilibrium represents the “rational” outcome of a game played by self-interested agents.

    Unfortunately:

    • Not every interaction scenario has a Nash equilibrium.

    • Some interaction scenarios have more than one Nash equilibrium.

      But:

      Nash’s Theorem (1951): Every game with a finite number of players and a finite set of possible strategies has at least one Nash equilibrium in mixed strategies.

      However:

      “The complexity of finding a Nash equilibrium is the most important concrete open question on the boundary of P today”

      [Papadimitriou, 2001]

  • John Forbes Nash, Jr. (Bluefield, 1928). Nobel Prize in Economics, 1994. Subject of “A Beautiful Mind”.


    Finding Nash Equilibria

    • Battle of the Sexes game: a husband and wife wish to go to the movies and are undecided between “Lethal Weapon” (LW) and “Wondrous Love” (WL). They much prefer to go to the movie together rather than separately, although the wife prefers LW and the husband prefers WL.

    Easy way to find pure-strategy Nash equilibria in a payoff matrix:

    Take each cell where the first number is the maximum of its column and check whether the second number is the maximum of its row.
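
The row/column-maximum check above translates directly into code. The sketch below uses the conventional Battle of the Sexes payoffs (2 for the preferred movie together, 1 for the other movie together, 0 if apart), since the slide's exact numbers are not in the transcript.

```python
# Sketch: pure-strategy Nash equilibria via the row/column-maximum check.
game = {  # (wife's action, husband's action): (u_wife, u_husband)
    ("LW", "LW"): (2, 1), ("LW", "WL"): (0, 0),
    ("WL", "LW"): (0, 0), ("WL", "WL"): (1, 2),
}
rows = sorted({a for a, _ in game})  # wife's actions
cols = sorted({b for _, b in game})  # husband's actions

def pure_nash_equilibria(game):
    eqs = []
    for (a, b), (u1, u2) in game.items():
        best_for_row = u1 == max(game[(r, b)][0] for r in rows)  # max of the column
        best_for_col = u2 == max(game[(a, c)][1] for c in cols)  # max of the row
        if best_for_row and best_for_col:
            eqs.append((a, b))
    return eqs

print(pure_nash_equilibria(game))  # [('LW', 'LW'), ('WL', 'WL')]
```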




    Finding Nash Equilibria

    • Maxmin strategy: the maxmin strategy for player i is the strategy si that maximizes i’s payoff in the worst case. (Minmax is the dual strategy.)

      Example (zero-sum game: the sum of the utilities is always zero):

    • From player 1’s point of view:

      • If he plays A, the minimum payoff is 2.

      • If he plays B, the minimum payoff is 0.

      • Player 1 chooses A.

    • By the same reasoning:

      • Player 2 chooses A too.

    • The “maxmin solution” is (A, A).

    • It is also the (only) Nash equilibrium of the game. In fact:

      • Minmax Theorem (von Neumann, 1928): in any finite, two-player, zero-sum game, maxmin strategies, minmax strategies and Nash equilibria coincide.
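
A sketch of the maxmin computation for player 1. The slide's payoff matrix is not in the transcript, so the zero-sum payoffs below are hypothetical but consistent with the description above (worst case 2 when playing A, worst case 0 when playing B).

```python
# Sketch: maxmin strategy for player 1 in a two-player zero-sum game.
# Player 1's payoffs are hypothetical; player 2's payoffs are the negatives.
u1 = {  # (player 1's action, player 2's action) -> player 1's payoff
    ("A", "A"): 2, ("A", "B"): 3,
    ("B", "A"): 0, ("B", "B"): 4,
}
actions_1 = sorted({a for a, _ in u1})
actions_2 = sorted({b for _, b in u1})

def maxmin_action_1(u1):
    """Action of player 1 that maximises the worst-case payoff."""
    worst = {a: min(u1[(a, b)] for b in actions_2) for a in actions_1}
    return max(worst, key=worst.get), worst

action, worst_cases = maxmin_action_1(u1)
print(worst_cases)  # {'A': 2, 'B': 0}
print(action)       # 'A'
```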


    The Prisoner’s Dilemma

    Two men are collectively charged with a crime and held in separate cells, with no way of meeting or communicating. They are told that:

    • if one confesses and the other does not, the confessor will be freed, and the other will be jailed for five years;

    • if both confess, then each will be jailed for two years;

    • both prisoners know that if neither confesses, then they will each be jailed for one year.

  • Well-known (and ubiquitous) puzzle in game theory.

  • The fundamental paradox of multi-agent interactions.

    Payoff matrix for the prisoner’s dilemma:


    The Prisoner’s Dilemma

    (Confession = defection, no confession = cooperation)

    • Solution concepts:

      • D is a dominant strategy.

      • (D, D) is the only Nash equilibrium.

      • All outcomes except (D, D) are Pareto optimal.

    • Game theorist’s conclusions:

      • In the worst case, defection guarantees the highest payoff, so defection is the rational move for both players.

        • Both agents defect and get payoff = 2.

    • Why a dilemma?

      • They both know they could be better off by both cooperating, each getting payoff = 3.

        • But cooperation is too dangerous!

        • In the one-shot prisoner’s dilemma the individually rational choice is defection.
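
A sketch that verifies these claims in code. The payoffs of 2 (mutual defection) and 3 (mutual cooperation) come from the slides; the temptation and sucker payoffs (5 and 0) are the conventional choices and are assumed here.

```python
# Sketch: verify the prisoner's dilemma claims above for the assumed payoffs.
game = {  # (i's action, j's action) -> (u_i, u_j)
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (2, 2),
}
acts = ["C", "D"]

# D is a dominant strategy for i: it does at least as well as C against every move of j.
assert all(game[("D", b)][0] >= game[("C", b)][0] for b in acts)

# (D, D) is the only pure Nash equilibrium.
def is_nash(a, b):
    return (game[(a, b)][0] == max(game[(x, b)][0] for x in acts) and
            game[(a, b)][1] == max(game[(a, y)][1] for y in acts))
assert [(a, b) for a in acts for b in acts if is_nash(a, b)] == [("D", "D")]

# Every outcome except (D, D) is Pareto optimal.
def dominated(p):
    return any(all(q[k] >= p[k] for k in (0, 1)) and any(q[k] > p[k] for k in (0, 1))
               for q in game.values())
assert [o for o, p in game.items() if not dominated(p)] == [("C", "C"), ("C", "D"), ("D", "C")]

print("All prisoner's dilemma claims hold for these payoffs.")
```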


    Iterated Prisoner’s Dilemma

    • Observation: if the PD is repeated a number of times (so I will meet my opponent again), then the incentive to defect appears to decrease and cooperation can finally emerge (Iterated Prisoner’s Dilemma).

    • In the early 1980s, Axelrod organized two computer tournaments, inviting participants to submit their strategies to play the IPD.

      • The length of the game was unknown to the players.

      • 15 strategies were submitted.

      • Best strategies:

        • SPITEFUL: cooperate until receiving a defection, then always defect.

        • MISTRUST: defect first, then play the opponent’s last move.

        • TIT FOR TAT: cooperate on the first round, then play what the opponent played on the previous round.

      • TIT FOR TAT won both tournaments! (Still today, nobody has been able to “clearly” defeat TFT.)
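
A sketch of the three strategies above playing the iterated PD, using the same assumed payoffs as before (T=5, R=3, P=2, S=0).

```python
# Sketch: TIT FOR TAT, SPITEFUL and MISTRUST in the iterated prisoner's dilemma.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (2, 2)}

def tit_for_tat(my_hist, opp_hist):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opp_hist else opp_hist[-1]

def spiteful(my_hist, opp_hist):
    """Cooperate until the opponent defects once, then always defect."""
    return "D" if "D" in opp_hist else "C"

def mistrust(my_hist, opp_hist):
    """Defect first, then copy the opponent's previous move."""
    return "D" if not opp_hist else opp_hist[-1]

def play_ipd(strat_a, strat_b, rounds=100):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        hist_a.append(move_a); hist_b.append(move_b)
        score_a += pay_a; score_b += pay_b
    return score_a, score_b

print(play_ipd(tit_for_tat, spiteful))  # mutual cooperation: (300, 300)
print(play_ipd(tit_for_tat, mistrust))  # locked into alternating C/D: (250, 250)
```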


    Axelrod’s Results

    • Even though TFT was the best strategy in Axelrod’s tournaments, this does not mean it is the optimal strategy for the IPD. In fact:

      • Axelrod’s Theorem: if the game is long enough and the players care about their future encounters, a universal optimal strategy for the Iterated Prisoner’s Dilemma does not exist.

    • Axelrod also suggests the following rules for a successful IPD strategy:

      • Don’t be envious: remember the PD is not a zero-sum game. Don’t play as if you were trying to strike down your opponent!

      • Be nice: never be the first to defect.

      • Retaliate appropriately: there must be a proper measure in rewarding cooperation and punishing defection.

      • Don’t be too clever: the performance of a strategy is not necessarily related to its complexity.


    IPD in a Social Network

    • Imagine agents playing the IPD, arranged in a graph whose edges represent rating relationships.

    • An agent can use the knowledge and experience of its neighbours to retrieve ratings about its opponent and infer the opponent’s reputation.

    • Rating propagation: a mechanism to propagate ratings from one node to another within the network.

    • Possible solution: the propagated rating is the product of the weights on the edges along the path connecting the two players. If there are multiple paths, take the one with the highest weight product (see the sketch below).

  • Strategy: cooperate with probability Pmax.
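
A sketch of the rating-propagation rule, assuming a hypothetical acquaintance network with edge ratings in [0, 1]: the propagated rating between two agents is the highest product of edge weights over the simple paths connecting them.

```python
# Sketch: propagate a rating through the network by taking the path with the
# highest product of edge weights. The network and its ratings are hypothetical.
def best_rating(graph, source, target):
    """Max product of edge weights over all simple paths from source to target."""
    best = 0.0

    def dfs(node, product, visited):
        nonlocal best
        if node == target:
            best = max(best, product)
            return
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                dfs(neighbour, product * weight, visited | {neighbour})

    dfs(source, 1.0, {source})
    return best

# Hypothetical acquaintance network: edge weights are ratings in [0, 1].
graph = {
    "agent_1": {"agent_2": 0.9, "agent_3": 0.6},
    "agent_2": {"agent_1": 0.9, "agent_4": 0.8},
    "agent_3": {"agent_1": 0.6, "agent_4": 0.9},
    "agent_4": {"agent_2": 0.8, "agent_3": 0.9},
}

p_max = best_rating(graph, "agent_1", "agent_4")
print(p_max)  # ~0.72 (via agent_2); the agent then cooperates with probability Pmax
```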


    IPD in a Social Network

    Random Network

    • TrustRate agents all get roughly the same score.

    • The ranking preserves the features of Axelrod’s tournament.


    Ipd in a social network2
    IPD in a Social Network

    Dense Graph

    • Increases the number of ratings.

    • There is always a nice path between any pair of players.


    IPD in a Social Network

    Getting new agents into a game in progress

    Agent 11 inserted at round 55/100

    • The network helps the new player to immediately recognize which opponents to cooperate with and which to avoid.

    • Agent 11 scores the same points as the starting players.


    IPD in a Social Network

    • Conclusions:

      • The acquaintance network makes it possible to promote cooperation in the IPD with a distributed approach.

        • No data overload on a single agent.

        • Information spreads through the whole network.

      • Incoming agents are immediately able to enter the dynamics of cooperation.

        • If surrounded by a nice neighbourhood, an agent can immediately distinguish cooperative players from defecting players, even without any prior experience.



    References

    • An Introduction to MultiAgent Systems (2nd edition). M. Wooldridge, John Wiley & Sons, 2009. [Ch. 11]

    • Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Y. Shoham and K. Leyton-Brown, Cambridge University Press, 2009. [Ch. 3] (Free download at http://www.masfoundations.org)

    • Trust, Kinship and Locality in the Iterated Prisoner’s Dilemma. M. Venanzi, MSc Thesis. [Ch. 5]

