Regret Minimization in Stochastic Games

Shie Mannor and Nahum Shimkin

Technion, Israel Institute of Technology

Dept. of Electrical Engineering

UAI 2000


Introduction

  • Modeling of a dynamic decision process as a stochastic game:

    • Non-stationarity of the environment

    • Environments are not (necessarily) hostile

  • Looking for the best possible strategy in light of the environment’s actions.



Repeated Matrix Games

  • The sets of single-stage mixed strategies P and Q are simplices.

  • Rewards are defined by a reward matrix G: r(p,q)=pGq

  • Reward criterion - the average reward $\hat{r}_t = \frac{1}{t}\sum_{\tau=1}^{t} r_\tau$, which need not converge, since stationarity of the opponent is not assumed (a numerical sketch follows).
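For concreteness, a minimal sketch of the stage reward r(p,q) = pGq for mixed strategies; the 2x2 matrix here is hypothetical:

```python
import numpy as np

G = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # hypothetical reward matrix
p = np.array([0.5, 0.5])     # P1's mixed strategy (a point in the simplex)
q = np.array([0.3, 0.7])     # P2's mixed strategy
print(p @ G @ q)             # expected stage reward r(p, q) = pGq
```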



Regret for Repeated Matrix Games

  • Suppose that by time t the average reward is $\hat{r}_t$ and the opponent's empirical strategy is $q_t$.

  • The regret is defined as $L_t = r^*(q_t) - \hat{r}_t$, where $r^*(q) = \max_{p \in P} r(p,q)$ is the best-response (Bayes) reward against q.

  • A policy is called regret minimizing if $\limsup_{t\to\infty} L_t \le 0$ a.s., for every strategy of the opponent (see the sketch below).
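A minimal numerical sketch of these quantities: track the opponent's empirical strategy and the realized average reward, then evaluate the regret against the Bayes reward. The game matrix and both action rules are hypothetical:

```python
import numpy as np

G = np.array([[1.0, 0.0],
              [0.0, 1.0]])             # hypothetical reward matrix
rng = np.random.default_rng(0)
T, rewards, opp_counts = 10_000, [], np.zeros(2)

for t in range(T):
    a = rng.integers(2)                # P1 plays uniformly (illustration only)
    b = rng.integers(2)                # opponent may follow any non-stationary rule
    rewards.append(G[a, b])
    opp_counts[b] += 1

r_hat = np.mean(rewards)               # average reward by time T
q_t = opp_counts / opp_counts.sum()    # opponent's empirical strategy
r_star = np.max(G @ q_t)               # Bayes reward: best response to q_t
print("regret L_t =", r_star - r_hat)
```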



Regret Minimization for Repeated Matrix Games

  • Such policies do exist (Hannan, 1956)

  • A proof using approachability theory (Blackwell, 1956)

  • Also for games with partial observation (Auer et al., 1995; Rustichini, 1999)



Stochastic Games

  • Formal model:

    S = {1,…,s} - state space

    A = A(s) - actions of the regret minimizing player, P1

    B = B(s) - actions of the “environment”, P2

    r - reward function, r(s,a,b)

    P - transition kernel, P(s'|s,a,b)

  • Expected average reward for p ∈ P, q ∈ Q is r(p,q)

  • Single-state recurrence assumption (a minimal model sketch follows)
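A minimal sketch of this model as a data structure; all names are illustrative, not from the talk, and per-state action counts are taken equal for brevity:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class StochasticGame:
    n_states: int
    n_actions_p1: int   # |A(s)|, assumed equal across states for brevity
    n_actions_p2: int   # |B(s)|
    r: np.ndarray       # r[s, a, b] - stage reward
    P: np.ndarray       # P[s, a, b, s'] - transition kernel

    def step(self, rng, s, a, b):
        """Sample the next state; return (stage reward, next state)."""
        s_next = rng.choice(self.n_states, p=self.P[s, a, b])
        return self.r[s, a, b], s_next
```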



Bayes Reward in Strategy Space

  • For every stationary strategy q ∈ Q, the Bayes reward is defined as $r^*(q) = \max_{p \in P} r(p,q)$.

  • Problems:

    • P2’s strategy is not completely observed

    • P1’s observations may depend on the strategies of both players



Bayes Reward in State-Action Space

  • Let $p_{sb}$ be the observed joint frequency of state s and P2’s action b.

  • A natural estimate of q is the per-state conditional frequency $\hat{q}_s(b) = p_{sb} / \sum_{b'} p_{sb'}$.

  • The associated Bayes envelope BE is the set of frequency-reward pairs $(\pi, u)$ with $u \ge r^*(\hat{q}(\pi))$ (a sketch of the estimate follows).
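A sketch of the estimate, assuming counts of P2's actions per state have been collected; the data and array names are hypothetical:

```python
import numpy as np

# counts[s, b]: number of visits to state s in which P2 played action b.
counts = np.array([[40., 10.],
                   [ 5., 45.]])                     # hypothetical data

pi = counts / counts.sum()                          # joint frequencies p_sb
q_hat = counts / counts.sum(axis=1, keepdims=True)  # estimate q_hat_s(b)
print(pi, q_hat, sep="\n")
```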



Approachability Theory

  • A standard tool in the theory of repeated matrix games (Blackwell, 1956)

  • For a game with vector-valued reward $m_t$, consider the average reward $\bar{m}_t = \frac{1}{t}\sum_{\tau=1}^{t} m_\tau$.

  • A set C is approachable by P1 with a policy σ if $d(\bar{m}_t, C) \to 0$ a.s., for any strategy of P2.

  • Was extended to recurrent stochastic games (Shimkin and Shwartz, 1993)



The Convex Bayes Envelope

  • In general BE is not approachable.

  • Define CBE = co(BE), that is, the set of pairs $(\pi, u)$ with $u \ge r_c^*(\hat{q}(\pi))$, where $r_c^*$ is the lower convex hull of $r^*$.

  • Theorem: CBE is approachable.

  • Moreover, $r_c^*(q) \ge \mathrm{val}$ for every q (val is the value of the game), so approaching CBE guarantees at least the value (see the hull sketch below).
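A numerical sketch of the lower convex hull $r_c^*$: sample $r^*$ on a one-dimensional slice and run the lower half of Andrew's monotone chain. The sampled profile is hypothetical, standing in for an envelope that need not be convex:

```python
import numpy as np

# Hypothetical non-convex profile standing in for r* on a 1-D slice.
xs = np.linspace(0.0, 1.0, 201)
r_star = 0.5 + 0.4 * np.sin(2 * np.pi * xs) ** 2 * (1 - xs)

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

# Andrew's monotone chain, lower hull only: the piecewise-linear
# interpolation through these vertices is the lower convex hull r*_c.
hull = []
for p in zip(xs, r_star):
    while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
        hull.pop()
    hull.append(p)

r_c = lambda x: np.interp(x, [h[0] for h in hull], [h[1] for h in hull])
print(r_c(0.5) <= r_star[100])   # r*_c lies on or below r* everywhere
```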



Single Controller Games

Theorem: Assume that P2 alone controls the transitions, i.e. $P(s'|s,a,b) = P(s'|s,b)$ for all a; then BE itself is approachable.


An Application to Prediction with Expert Advice

  • Given a channel and a set of experts.

  • At each time epoch, each expert states his prediction of the next symbol, and P1 has to choose his own prediction ŷ.

  • Then a letter y appears in the channel and P1 receives the prediction reward r(y, ŷ).

  • The problem can be formulated as a stochastic game in which P2 stands for all the experts together with the channel (a protocol sketch follows).
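A minimal simulation of this protocol; the alphabet, experts, channel, and P1's placeholder rule are all hypothetical, and the talk's zero-regret strategy itself is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)
alphabet, T, n_experts = 2, 1000, 3

def r(y, y_hat):
    return 1.0 if y == y_hat else 0.0                # 0/1 prediction reward

total, expert_totals = 0.0, np.zeros(n_experts)
for t in range(T):
    advice = rng.integers(alphabet, size=n_experts)  # experts' predictions
    y_hat = advice[rng.integers(n_experts)]          # P1 follows a random expert (placeholder)
    y = rng.integers(alphabet)                       # channel emits the next letter
    total += r(y, y_hat)
    expert_totals += [r(y, a) for a in advice]

# Regret here is measured against the best single expert in hindsight.
print((expert_totals.max() - total) / T)
```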



Prediction Example (cont’)

[Figure: state diagram of the prediction game, with states labeled (0,0,0) through (k-1,k,k) and (k,k,k), stage rewards r(a,b) and r=0, and the expert recommendations indicated.]

Theorem: P1 has a zero-regret strategy.


An Example in Which BE Is Not Approachable

[Figure: a two-state game with states S0 and S1, reward r=b in both states, P2’s action sets B(0)=B(1)={-1,1}, and transitions driven by P1’s action a ∈ {0,1}, each transition succeeding with probability P=0.99.]

It can be proved that BE for the above game is not approachable.


Example (cont’)

  • In r*(q) space the envelopes are as follows. [Figure: BE and its convexification CBE in r*(q) space.]



Open Questions

  • Characterization of minimal approachable sets in the reward-state-action space

  • On-line learning schemes for stochastic games with unknown parameters

  • Other ways of formulating optimality with respect to observed state action frequencies



Conclusions

  • The problem of regret minimization for stochastic games was considered

  • The proposed solution concept, CBE, is based on convexification of the Bayes envelope in the natural state-action space.

  • The concept of CBE ensures an average reward that is higher than the value of the game when the opponent is suboptimal.




Approachability Theory

  • Let m(p,q) be the average vector-valued reward in the game when P1 and P2 play p and q.

  • Define the average reward by time t as $\bar{m}_t = \frac{1}{t}\sum_{\tau=1}^{t} m_\tau$.

  • Theorem [Blackwell, 1956]: A convex set C is approachable if and only if for every q ∈ Q there exists p ∈ P such that m(p,q) ∈ C (see the sketch below).

  • Extended to stochastic games (Shimkin and Shwartz, 1993)
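As an illustration, a sketch checking Blackwell's condition for the special case of a halfspace target {x : u·x ≤ level}: such a halfspace is approachable iff P1 can force the scalarized payoff u·m below the level, i.e. the minimax value of the scalarized game is at most the level. The vector payoffs are hypothetical; the LP is a standard minimax solver:

```python
import numpy as np
from scipy.optimize import linprog

def game_value(M):
    """Minimax value of the zero-sum matrix game M (row player maximizes)."""
    n, m = M.shape
    c = np.r_[np.zeros(n), -1.0]              # variables (p, v); minimize -v
    A_ub = np.c_[-M.T, np.ones(m)]            # v <= (p' M)_j for every column j
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m),
                  A_eq=np.r_[np.ones(n), 0.0][None, :], b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return -res.fun

# Hypothetical 2x2 game with 2-dimensional vector rewards m(a, b).
M = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
u, level = np.array([1.0, 1.0]), 1.0

S = M @ u                                     # scalarized payoffs u . m(a, b)
guaranteed = -game_value(-S)                  # best level P1 can force (P1 minimizes)
print("approachable" if guaranteed <= level else "not approachable")
```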



A Related Vector-Valued Game

  • Define the following vector-valued game:

    • If in state s action b is played by P2 and a reward r is gained, then the vector-valued stage reward is $m_t = (r, e_{(s,b)})$: the reward in the first coordinate and a unit vector marking the pair (s,b) in the remaining coordinates, so that $\bar{m}_t$ is the pair (average reward, empirical state-action frequencies); see the sketch below.
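A sketch of this construction, under the assumption stated above that m_t stacks the stage reward with a one-hot indicator of the current (state, P2-action) pair:

```python
import numpy as np

n_states, n_b = 2, 2

def m(s, b, reward):
    """Stage vector reward: reward first, then a one-hot (s, b) indicator."""
    v = np.zeros(1 + n_states * n_b)
    v[0] = reward
    v[1 + s * n_b + b] = 1.0
    return v

# Averaging m_t over a (hypothetical) trajectory recovers the pair
# (average reward, joint frequencies p_sb).
rng = np.random.default_rng(0)
traj = [m(rng.integers(n_states), rng.integers(n_b), rng.random())
        for _ in range(1000)]
m_bar = np.mean(traj, axis=0)
print(m_bar[0])                          # average reward
print(m_bar[1:].reshape(n_states, n_b))  # empirical frequencies p_sb
```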


