Oblivious equilibrium for stochastic games with concave utility
Sachin Adlakha, Ramesh Johari, Gabriel Weintraub and Andrea Goldsmith. DARPA ITMANET Meeting, March 5-6, 2009.


Oblivious Equilibrium for Stochastic Games with Concave Utility

Sachin Adlakha, Ramesh Johari, Gabriel Weintraub and Andrea Goldsmith

DARPA ITMANET Meeting

March 5-6, 2009


Oblivious equilibrium for stochastic games with concave utility
S. Adlakha, R. Johari, G. Weintraub, A. Goldsmith

STATUS QUO

Many cognitive radio models do not account for the reaction of other devices to a single device's action.

In prior work, we developed a general stochastic game model to tractably capture interactions of many devices.

[Figure: state of device i, state of other devices, action of device i]

NEW INSIGHTS

In principle, tracking the state of other devices is complex.

We approximate the state of other devices via a mean field limit.

[Figure: # of other devices with given state, plotted against state]

ACHIEVEMENT DESCRIPTION

MAIN RESULT:

Consider stochastic games with per-period utility and state dynamics that are increasing, concave, and submodular.

Then in a large system, each node can find approximately optimal policies by treating the state of other nodes as constant.

HOW IT WORKS:

Under our assumptions, no single node is overly influential, so we can replace other nodes' states by their mean. The optimal policies therefore decouple between nodes.

ASSUMPTIONS AND LIMITATIONS:

This result holds under much more general technical assumptions than our early results on the problem.

A key modeling limitation, however, is that the limit requires all nodes to interact with each other. Thus the results apply only to dense networks.

[Figure: next state and utility, each plotted against current state or current action]

IMPACT

  • Our results provide a general framework to study the interaction of multiple devices.

  • Further, our results:

    • unify existing models for which such limits were known

    • and provide simple exogenous conditions that can be checked to ensure the main result holds

NEXT-PHASE GOALS

We will apply our results to a model of interfering transmissions among energy-constrained devices.

Our main goal is to develop a related model that applies when a single node interacts with a small number of other nodes each period.

Real environments are reactive and non-stationary; this requires new game-theoretic models of interaction.


Wireless environments are reactive

  • Scenario: Wireless devices sharing same spectrum.

  • Typical Approach: Assume that the environment is non-reactive.

  • Flawed assumption at best:

    • In cognitive radio networks, the environment consists of other cognitive radios – hence is highly reactive

Questions:

  • How do we design policies for such networks?

  • What is the performance loss if we assume non-reactive environments?


Foundational theory – Markov Perfect Equilibrium

[Figure: state of player i, state of other players, action of player i]

  • Model such reactive environments as stochastic dynamic games.

  • Key solution concept is that of Markov perfect equilibrium (MPE).

  • The action of each player depends on the state of everyone.

  • Problems:

  • Tracking state of everyone else is hard.

  • MPE is hard to compute.
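To give a sense of why tracking everyone's state is hard, the sketch below counts how many distinct empirical distributions the other players can form. It assumes a finite state set (10 states here, a hypothetical choice not taken from the slides) and uses the standard stars-and-bars count; the function name is illustrative only.

```python
from math import comb


def num_empirical_distributions(num_others: int, num_states: int) -> int:
    """Count the ways to spread num_others interchangeable players over
    num_states states (stars-and-bars formula)."""
    return comb(num_others + num_states - 1, num_states - 1)


# Hypothetical setting: 10 possible states per player.
for m in (10, 100, 1000):
    count = num_empirical_distributions(m - 1, 10)
    print(f"m = {m}: {count} possible distributions of the other players")
```

A Markov policy must specify an action for every such distribution (and every own state), whereas an oblivious policy needs only one action per own state.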


Foundational Theory – Oblivious Equilibrium

[Figure: state of player i, average state of other players, action of player i]

  • Oblivious policies – Each player reacts only to the average state of other players

  • Easy to compute and implement.

  • Requires little information exchange.

Question:

When is oblivious equilibrium close to MPE?


Our model

[Figure: # of players plotted against state]

  • $m$ players

  • State of player $i$ is $x_i$; action of player $i$ is $a_i$

  • State evolution:

  • Payoff:

    where $f_{-i}$ = empirical distribution of other players' states
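As a small illustration of the empirical distribution of other players' states, the sketch below computes it for one player from a list of all players' states. It assumes discrete states and normalization by the number of other players (m - 1); the function name and the sample data are hypothetical.

```python
from collections import Counter


def empirical_distribution_of_others(states, i):
    """Return the fraction of players other than player i at each state.

    Assumes discrete states and normalization by m - 1.
    """
    others = states[:i] + states[i + 1:]
    counts = Counter(others)
    return {state: count / len(others) for state, count in counts.items()}


# Hypothetical snapshot of m = 5 players' states.
states = [0, 2, 2, 1, 2]
f_minus_0 = empirical_distribution_of_others(states, 0)
print(f_minus_0)
```

For player 0 in this snapshot, the other four players sit at states 2, 2, 1 and 2, so three quarters of the mass is at state 2 and one quarter at state 1.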


MPE and OE

  • A Markov policy is a decision rule based on the current state and the empirical distribution:

    $a_{i,t} = \mu(x_{i,t}, f_{-i,t}^{(m)})$

  • A Markov perfect equilibrium is a vector of Markov policies, where each player has maximized present discounted payoff, given policies of other players.

  • In an oblivious policy, a player responds instead to its own state $x_{i,t}$ and only the long run average $f_{-i}^{(m)}$.

  • In an oblivious equilibrium each player has maximized present discounted payoff using an oblivious policy, given long run average state induced by other players’ policies.
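The definitions above suggest a fixed point: a policy that is optimal given a long run average, and a long run average that is induced by that policy. The sketch below iterates toward such a point in a toy model. Everything specific in it is an assumption for illustration, not the authors' model: five "energy" states, two actions, a payoff that depends on other players only through their mean state, and a damped update that carries no convergence guarantee.

```python
S = 5             # hypothetical states 0..4 (e.g. energy levels)
ACTIONS = (0, 1)  # hypothetical actions: 0 = idle, 1 = transmit
BETA = 0.9        # discount factor


def payoff(x, a, mean_others):
    # Toy payoff: transmitting pays more at a high own state and
    # less when the other players' average state is high.
    return a * x / (1.0 + mean_others)


def transition(x, a):
    """Return (next_state, probability) pairs for the toy dynamics."""
    if a == 1:
        return [(max(x - 1, 0), 1.0)]             # transmitting drains energy
    return [(min(x + 1, S - 1), 0.5), (x, 0.5)]   # idling may recharge


def q_value(x, a, v, mean_others):
    future = sum(p * v[y] for y, p in transition(x, a))
    return payoff(x, a, mean_others) + BETA * future


def best_response(mean_others, sweeps=200):
    """Value iteration with the other players' mean state held fixed."""
    v = [0.0] * S
    for _ in range(sweeps):
        v = [max(q_value(x, a, v, mean_others) for a in ACTIONS)
             for x in range(S)]
    return [max(ACTIONS, key=lambda a: q_value(x, a, v, mean_others))
            for x in range(S)]


def long_run_distribution(policy, steps=500):
    """Push a uniform start through the chain induced by the policy."""
    dist = [1.0 / S] * S
    for _ in range(steps):
        nxt = [0.0] * S
        for x in range(S):
            for y, p in transition(x, policy[x]):
                nxt[y] += dist[x] * p
        dist = nxt
    return dist


def oblivious_fixed_point(iterations=50, step=0.1):
    mean = (S - 1) / 2
    for _ in range(iterations):
        policy = best_response(mean)
        dist = long_run_distribution(policy)
        induced = sum(x * p for x, p in enumerate(dist))
        mean += step * (induced - mean)  # damped update of the guess
    return policy, dist, mean


policy, dist, mean = oblivious_fixed_point()
print("policy by own state:", policy)
print("long run mean state used by every player:", round(mean, 3))
```

Note that `best_response` returns one action per own state only. A Markov policy in the same toy model would need an action for every pair of own state and empirical distribution.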


Prior Work

  • Generalized the idea of OE to general stochastic games [Allerton 07].

  • Unified existing models, such as LQG games, via our framework [CDC 08].

  • Exogenous conditions for approximating MPE using OE for linear dynamics and separable payoffs [Allerton 08].

Current Results:

We have a general set of exogenous conditions (including nonlinear dynamics and nonseparable payoffs) under which OE is a good approximation to MPE.

These conditions also unify our previous results and existing models.


Assumptions

[A1] The state transition function is concave in state and action and has decreasing differences in state and action.

[A2] For any action, is a non-increasing function of state and eventually becomes negative.

[A3] The payoff function is jointly concave in state and action and has decreasing differences in state and action.

[A4] The logarithm of the payoff is Gâteaux differentiable w.r.t. $f_{-i}$.

[A5] MPE and OE exist.

[A6] We restrict attention to policies that make the individual state Markov chain recurrent and keep the discounted sum of the square of the payoff finite.
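Conditions such as [A1] and [A3] can be spot-checked numerically before attempting a proof. The sketch below tests decreasing differences and midpoint concavity on a grid. The two test functions and the grid are hypothetical, and passing a grid check is only a necessary condition, not a proof that the property holds everywhere.

```python
import math
from itertools import product


def has_decreasing_differences(f, xs, acts, tol=1e-12):
    """Check on a sorted grid that the gain from raising the state
    does not grow when the action is raised."""
    for x_lo, x_hi in zip(xs, xs[1:]):
        for a_lo, a_hi in zip(acts, acts[1:]):
            gain_high_action = f(x_hi, a_hi) - f(x_lo, a_hi)
            gain_low_action = f(x_hi, a_lo) - f(x_lo, a_lo)
            if gain_high_action > gain_low_action + tol:
                return False
    return True


def is_midpoint_concave(f, xs, acts, tol=1e-12):
    """Check f(midpoint) >= average of endpoint values for all grid pairs."""
    points = list(product(xs, acts))
    for (x1, a1), (x2, a2) in product(points, repeat=2):
        mid_value = f((x1 + x2) / 2, (a1 + a2) / 2)
        if mid_value + tol < (f(x1, a1) + f(x2, a2)) / 2:
            return False
    return True


grid = [0.0, 0.5, 1.0, 1.5, 2.0]


def candidate(x, a):
    # Hypothetical function: increasing, jointly concave, submodular.
    return math.sqrt(x + a)


def counterexample(x, a):
    # Hypothetical function with increasing differences; not concave.
    return x * a


print(has_decreasing_differences(candidate, grid, grid),
      is_midpoint_concave(candidate, grid, grid))
print(has_decreasing_differences(counterexample, grid, grid),
      is_midpoint_concave(counterexample, grid, grid))
```

Checking adjacent grid cells is enough for decreasing differences on the grid, because differences over larger steps are sums of the adjacent ones.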


Assumptions

Define

$g(y)$ can be interpreted as the maximum rate of change of the logarithm of the payoff function w.r.t. a small change in the fraction of players at state $y$.

[A7] We assume that the payoff function is such that $g(y) \sim O(y^K)$ for some $K$.

[A8] We assume that there exists a constant C such that the payoff function satisfies the following condition


Main Result

Under [A1]-[A8], oblivious equilibrium payoff is approximately optimal over Markov policies, as $m \to \infty$.

In other words, OE is approximately an MPE.

The key point here is that no single player is overly influential and the true state distribution is close to the time average, so knowledge of other players' policies does not significantly improve payoff.

Advantage:

Each player can use oblivious policy without loss in performance.
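The claim that the true state distribution is close to the time average can be illustrated with a small simulation. The sketch below assumes that, under a fixed oblivious policy, each player's state follows the same Markov chain independently of the others. The three-state transition matrix is hypothetical, and the printed gaps depend on the random seed.

```python
import random

# Hypothetical transition matrix of one player's state under a fixed
# oblivious policy (each row sums to 1).
P = [[0.7, 0.3, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.2, 0.8]]


def long_run_distribution(P, steps=500):
    """Approximate the stationary distribution by repeated multiplication."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(steps):
        dist = [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]
    return dist


def simulate_empirical(P, m, horizon, rng):
    """Run m independent players for `horizon` steps and return the
    fraction of players in each state at the end."""
    states = list(range(len(P)))
    counts = [0] * len(P)
    for _ in range(m):
        x = 0
        for _ in range(horizon):
            x = rng.choices(states, weights=P[x])[0]
        counts[x] += 1
    return [c / m for c in counts]


def max_gap(f, g):
    return max(abs(a - b) for a, b in zip(f, g))


rng = random.Random(0)
stationary = long_run_distribution(P)
gaps = {}
for m in (10, 100, 5000):
    empirical = simulate_empirical(P, m, horizon=100, rng=rng)
    gaps[m] = max_gap(empirical, stationary)
    print(f"m = {m}: largest gap to long run average = {gaps[m]:.4f}")
```

With many players the snapshot of states should sit close to the long run average, which is why conditioning on the exact snapshot adds little in a large system.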


Main Contributions and Future Work

  • Provides a general framework to study the interaction of multiple devices.

  • Provides exogenous conditions which can be easily checked to ensure the main result holds.

  • Unifies existing models for which such limits are known.

Future Work:

  • Apply this model to interfering transmissions between energy constrained nodes.

  • Develop similar models where a single node interacts with a small set of nodes at each time period.

