Joint Strategy Fictitious Play

Sherwin Doroudi

“Adapted” from J. R. Marden, G. Arslan, J. S. Shamma, “Joint strategy fictitious play with inertia for potential games,” in Proceedings of the 44th IEEE Conference on Decision and Control, December 2005, pp. 6692-6697.


Review: Game
  • Players:
  • Actions:
  • Payoffs:
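
In symbols, a finite game of this kind is commonly written as follows. The names $n$, $\mathcal{A}_i$, and $U_i$ are conventional choices assumed here, not taken from the slide.

```latex
\begin{align*}
\text{Players: } & \mathcal{P} = \{1, \dots, n\} \\
\text{Actions: } & a_i \in \mathcal{A}_i \ \text{(finite)}, \qquad
                   a = (a_1, \dots, a_n) \in \mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_n \\
\text{Payoffs: } & U_i : \mathcal{A} \to \mathbb{R}, \qquad i = 1, \dots, n
\end{align*}
```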

We then play the game repeatedly in “stages,” starting at stage 0. Players can use learning algorithms as discussed in lecture. Note that players know the structural form of their own payoff function, but do not know the form of the other players’ payoff functions.

Notation: Actions

As in the lecture, we use the notation
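
The usual convention, assumed here, separates player $i$'s own action from the actions of everyone else:

```latex
a = (a_i, a_{-i}), \qquad
a_{-i} = (a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_n), \qquad
U_i(a) = U_i(a_i, a_{-i})
```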

Review: Regret Matching
  • Guaranteed to converge to a Coarse Correlated Equilibrium (CCE) in all games (Hart & Mas-Colell, 2000).
  • But a CCE can be quite bad in some cases, since the set of CCE is a superset of the set of Nash Equilibria (NE).
Review: Fictitious Play (FP)
  • Observe empirical frequencies of every player’s action
  • Consider best response(s) under the (incorrect) assumption that other players play according to their empirical frequencies
  • Randomly choose a best response and act accordingly
Empirical Frequency in FP

The empirical frequency for a player and an action is the fraction of stages, up to and including the previous stage, in which the player chose that action.

Each player also has an empirical frequency vector.
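
One standard way to write both objects is sketched below. It assumes stages are indexed from 0, so that stage $t \ge 1$ sees the $t$ stages $0, \dots, t-1$; the symbols $q_i^{a_i}(t)$ and $q_i(t)$ are assumed notation.

```latex
q_i^{a_i}(t) = \frac{1}{t} \sum_{\tau = 0}^{t-1} \mathbf{1}\{ a_i(\tau) = a_i \},
\qquad
q_i(t) = \bigl( q_i^{a_i}(t) \bigr)_{a_i \in \mathcal{A}_i} \in \Delta(\mathcal{A}_i)
```

Here $\Delta(\mathcal{A}_i)$ is the set of probability distributions over player $i$'s actions, so the entries of $q_i(t)$ are nonnegative and sum to 1.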

Best Response in FP

Each player assumes an expected payoff

And each player chooses a best response from the set
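
Under the FP assumption that opponents randomize independently according to their empirical frequencies, the expected payoff and the best response set can be written as below. This is a sketch in assumed notation, with $q_j^{a_j}(t)$ denoting the empirical frequency of action $a_j$ for player $j$ and $\mathcal{A}_{-i}$ the set of joint actions of the other players.

```latex
U_i\bigl(a_i, q_{-i}(t)\bigr)
  = \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i}) \prod_{j \ne i} q_j^{a_j}(t),
\qquad
BR_i\bigl(q_{-i}(t)\bigr)
  = \operatorname*{arg\,max}_{a_i \in \mathcal{A}_i} U_i\bigl(a_i, q_{-i}(t)\bigr)
```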

The Good News!

“The empirical frequencies generated by FP converge to a Nash equilibrium in potential games” (Monderer & Shapley, 1996).

The Bad News (if any)?

What are some weaknesses of FP?

A Routing Example
  • Consider a routing game with 100 players all with the same source and sink
  • There are 4 roads from the source to the sink
  • Players want to minimize their cost.
  • The cost of traveling on each road is given by a quadratic cost function with positive coefficients (could be randomly generated) depending on the number of players choosing that road
  • Can we use FP as a learning algorithm in this example?

Formalizing the game, we have
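
A formalization consistent with the description above might look as follows. The road index $r$, the load $\sigma_r(a)$, and the coefficient names $c_{r,2}, c_{r,1}, c_{r,0}$ are assumptions made for illustration.

```latex
\begin{align*}
\text{Players: } & i \in \{1, \dots, 100\} \\
\text{Actions: } & a_i \in \mathcal{A}_i = \{1, 2, 3, 4\} \quad \text{(the road chosen)} \\
\text{Load: }    & \sigma_r(a) = \bigl|\{ j : a_j = r \}\bigr| \\
\text{Cost: }    & c_r(k) = c_{r,2}\, k^2 + c_{r,1}\, k + c_{r,0}, \qquad c_{r,2}, c_{r,1}, c_{r,0} > 0 \\
\text{Payoffs: } & U_i(a) = -\, c_{a_i}\bigl( \sigma_{a_i}(a) \bigr)
\end{align*}
```

The minus sign turns cost minimization into payoff maximization, so the best response machinery of FP applies unchanged.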

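Remember the FP expected payoff?

For a single player in this game, the sum in that expected payoff runs over every joint action of the other 99 players: 4^99 = 2^198 terms!

This is not computationally feasible!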
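
The count can be checked directly. The helper name `fp_terms` below is hypothetical and exists only for this illustration.

```python
def fp_terms(n_players, n_actions):
    """Number of opponent action profiles summed over in one FP expected payoff."""
    return n_actions ** (n_players - 1)


terms = fp_terms(100, 4)
print(terms == 2 ** 198)  # True
print(f"{terms:.3e}")     # order of magnitude of the sum
```

The count grows exponentially in the number of players, so adding players makes the problem worse very quickly.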

What do we do?

The routing example (which is fairly realistic) motivates two options: either we find a more efficient way to compute this utility, or we develop an algorithm that is computationally suitable for “large” games.

Joint Strategy Fictitious Play (JSFP)
  • Observe empirical frequencies of joint actions
  • Consider best response(s) under the (still incorrect) assumption that all other players act collectively as a group according to their joint empirical frequency
  • Randomly choose a best response and act accordingly
Does FP=JSFP?
  • In the case of two players it is easy to see that FP and JSFP are the same.
  • But in the case of three or more players this is not necessarily the case!
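
A small sketch of why the two can differ with three players: from player 1's point of view, FP multiplies the marginal frequencies of players 2 and 3, while JSFP uses their joint frequency. The history below is made up for illustration, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Joint actions of players 2 and 3 over four stages (illustrative data):
# they always match each other, alternating between action 0 and action 1.
history = [(0, 0), (1, 1), (0, 0), (1, 1)]


def joint_frequency(history):
    """JSFP view: empirical frequency of each observed joint action."""
    counts = Counter(history)
    return {profile: counts[profile] / len(history) for profile in counts}


def product_of_marginals(history):
    """FP view: product of each opponent's individual empirical frequencies."""
    stages = len(history)
    n_opponents = len(history[0])
    marginals = [Counter(h[j] for h in history) for j in range(n_opponents)]
    result = {}
    for profile in product(*(sorted(m) for m in marginals)):
        prob = 1.0
        for j, action in enumerate(profile):
            prob *= marginals[j][action] / stages
        result[profile] = prob
    return result


print(joint_frequency(history))       # mass only on (0, 0) and (1, 1)
print(product_of_marginals(history))  # mass spread over all four profiles
```

With a single opponent the product has only one factor, so the two views coincide, which is the two-player case.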
Empirical Frequency in JSFP

The empirical frequency for an action profile may be calculated as follows:
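
A sketch of that calculation in assumed notation, with stages indexed from 0 and $a(\tau)$ the joint action played at stage $\tau$:

```latex
z^{a}(t) = \frac{1}{t} \sum_{\tau = 0}^{t-1} \mathbf{1}\{ a(\tau) = a \},
\qquad
z_{-i}^{a_{-i}}(t) = \frac{1}{t} \sum_{\tau = 0}^{t-1} \mathbf{1}\{ a_{-i}(\tau) = a_{-i} \}
```

The second quantity is the joint empirical frequency of everyone except player $i$, which is what player $i$ best responds to in JSFP.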

Expected Payoff in JSFP

Each player assumes an expected payoff
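
In assumed notation, with $z_{-i}^{a_{-i}}(t)$ the joint empirical frequency of the opponents' action profile $a_{-i}$, this expected payoff can be written as:

```latex
U_i\bigl(a_i, z_{-i}(t)\bigr)
  = \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i})\, z_{-i}^{a_{-i}}(t)
```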

But this looks about as bad as (maybe worse than) FP!

So what can we do?

We rewrite it in a more useful form!
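
The rewrite can be sketched as follows, assuming the joint empirical frequency is the average of indicator functions over stages $0, \dots, t-1$. Substituting that definition and swapping the order of summation collapses the sum over opponent profiles:

```latex
\begin{align*}
U_i\bigl(a_i, z_{-i}(t)\bigr)
  &= \sum_{a_{-i} \in \mathcal{A}_{-i}} U_i(a_i, a_{-i})\,
     \frac{1}{t} \sum_{\tau = 0}^{t-1} \mathbf{1}\{ a_{-i}(\tau) = a_{-i} \} \\
  &= \frac{1}{t} \sum_{\tau = 0}^{t-1} U_i\bigl(a_i, a_{-i}(\tau)\bigr)
\end{align*}
```

The expected payoff is therefore just the average payoff the player would have received by playing $a_i$ at every past stage, which needs one payoff evaluation per stage instead of a sum over all opponent profiles.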

The JSFP Payoff Recursion

So now, we can rewrite the expected payoff as a simple recursion, and at every stage choose an action that maximizes it (our best response)

We are maximizing regret!
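
A sketch of the recursion and of the regret connection, in assumed notation where $V_i^{a_i}(t)$ is the average payoff player $i$ would have obtained by playing $a_i$ at every stage $0, \dots, t-1$:

```latex
\begin{align*}
V_i^{a_i}(t) &= \frac{1}{t} \sum_{\tau = 0}^{t-1} U_i\bigl(a_i, a_{-i}(\tau)\bigr) \\
V_i^{a_i}(t+1) &= \frac{t}{t+1}\, V_i^{a_i}(t) + \frac{1}{t+1}\, U_i\bigl(a_i, a_{-i}(t)\bigr) \\
R_i^{a_i}(t) &= V_i^{a_i}(t) - \frac{1}{t} \sum_{\tau = 0}^{t-1} U_i\bigl(a(\tau)\bigr)
\end{align*}
```

The subtracted term in the average regret $R_i^{a_i}(t)$ does not depend on $a_i$, so the action that maximizes $V_i^{a_i}(t)$ is also the action with the largest regret.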

Convergence Properties of JSFP

The convergence properties of JSFP (for games of three or more players) remain unknown; this is an open problem. However, if the joint action generated by JSFP ever reaches a strict NE, it will stay there forever. To get convergence properties, we add “inertia” to our learning algorithm.

JSFP with Inertia
  • Assume that all NE are strict
  • JSFP-1: If the action chosen by a player in the previous stage is still a best response at the current stage, choose that action again
  • JSFP-2: Otherwise, choose an action according to the JSFP-2 distribution
The JSFP-2 Distribution

Here the alpha parameter represents the player’s willingness to optimize at a given stage, the beta term is a distribution whose support is contained in the set of best responses at this stage, and the v term is the distribution that puts all of its mass on the action taken in the previous stage.
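
Written out in assumed notation, the distribution described above is a mixture of the two terms:

```latex
a_i(t) \sim \alpha_i(t)\, \beta_i(t) + \bigl(1 - \alpha_i(t)\bigr)\, v^{a_i(t-1)},
\qquad \alpha_i(t) \in (0, 1)
```

With probability $\alpha_i(t)$ the player draws a best response from $\beta_i(t)$, and with probability $1 - \alpha_i(t)$ the player repeats the previous action $a_i(t-1)$. That second case is the inertia.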

JSFP w/ Inertia Converges!
  • In particular, it converges to some Nash equilibrium in generalized ordinal potential games
  • Of course there is no equilibrium selection mechanism
  • And not much is known regarding the convergence rate
  • But we have shown that JSFP w/ Inertia is a good substitute for FP in “large” games

If you want the proof, read the paper; the proof is not trivial!

The Fading Memory Variant

We used the recursion

But we could also use the recursion

Here, rho is a constant or function taking values in (0, 1], and it has also been proven that this algorithm gives rise to a process converging to some NE.
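
The two recursions can be sketched side by side. The notation is assumed: $V_i^{a_i}$ is the running average payoff for action $a_i$ and $\tilde{V}_i^{a_i}$ is its fading memory counterpart.

```latex
\begin{align*}
\text{Original: } &
  V_i^{a_i}(t+1) = \frac{t}{t+1}\, V_i^{a_i}(t) + \frac{1}{t+1}\, U_i\bigl(a_i, a_{-i}(t)\bigr) \\
\text{Fading memory: } &
  \tilde{V}_i^{a_i}(t+1) = (1 - \rho)\, \tilde{V}_i^{a_i}(t) + \rho\, U_i\bigl(a_i, a_{-i}(t)\bigr)
\end{align*}
```

Choosing $\rho = 1/(t+1)$ recovers the original recursion, while a constant $\rho$ weights recent stages more heavily than old ones.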

A Routing Example, Revisited
  • We can now apply JSFP w/ Inertia and fading memory to the routing problem, and we should converge to some NE (the result holds for generalized ordinal potential games, which include routing games)
  • Simulations suggest that JSFP without inertia also works in this case
  • Try it!
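
One way to try it is the minimal sketch below. It is not the paper's implementation: it uses the running-average recursion rather than fading memory, a constant inertia of 0.5, uniform tie-breaking among best responses, and randomly drawn cost coefficients, all of which are assumptions. The names `road_cost`, `simulate_jsfp`, and `is_pure_nash` are hypothetical.

```python
import random


def road_cost(coeffs, load):
    """Quadratic congestion cost c2*load^2 + c1*load + c0 (positive coefficients assumed)."""
    c2, c1, c0 = coeffs
    return c2 * load ** 2 + c1 * load + c0


def simulate_jsfp(coeffs, n_players=100, stages=200, inertia=0.5, seed=0):
    """Run JSFP with inertia on a parallel-road routing game; return the final actions."""
    rng = random.Random(seed)
    n_roads = len(coeffs)
    actions = [rng.randrange(n_roads) for _ in range(n_players)]
    # V[i][r]: average payoff player i would have earned by using road r at every past stage
    V = [[0.0] * n_roads for _ in range(n_players)]
    for t in range(stages):
        loads = [actions.count(r) for r in range(n_roads)]
        for i in range(n_players):
            for r in range(n_roads):
                # Load on road r if player i used it while everyone else stayed put
                load = loads[r] if actions[i] == r else loads[r] + 1
                payoff = -road_cost(coeffs[r], load)
                # Same as V(t+1) = t/(t+1) * V(t) + 1/(t+1) * payoff
                V[i][r] += (payoff - V[i][r]) / (t + 1)
        next_actions = []
        for i in range(n_players):
            best = max(V[i])
            best_roads = [r for r in range(n_roads) if V[i][r] == best]
            if actions[i] in best_roads or rng.random() < inertia:
                next_actions.append(actions[i])  # JSFP-1, or inertia within JSFP-2
            else:
                next_actions.append(rng.choice(best_roads))  # optimize within JSFP-2
        actions = next_actions
    return actions


def is_pure_nash(coeffs, actions):
    """True if no player can lower their own cost by unilaterally switching roads."""
    n_roads = len(coeffs)
    loads = [actions.count(r) for r in range(n_roads)]
    for road in actions:
        current = road_cost(coeffs[road], loads[road])
        for r in range(n_roads):
            if r != road and road_cost(coeffs[r], loads[r] + 1) < current:
                return False
    return True


if __name__ == "__main__":
    coeff_rng = random.Random(1)
    coeffs = [tuple(coeff_rng.uniform(0.1, 1.0) for _ in range(3)) for _ in range(4)]
    final = simulate_jsfp(coeffs)
    print("loads:", [final.count(r) for r in range(4)])
    print("pure NE reached:", is_pure_nash(coeffs, final))
```

Each player stores only four running averages and updates them with one cost evaluation per road per stage, in contrast to the 4^99-term sum FP would need. Whether a pure NE is reached within the chosen number of stages depends on the coefficients, the inertia, and the seed, so the final check reports the outcome rather than assuming it.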
Conclusion
  • We have demonstrated some weaknesses of FP (computational demands, observational demands, etc.)
  • We have developed JSFP, which seems to accommodate computational limitations