
Uri Zwick Tel Aviv University


Simple Stochastic Games, Mean Payoff Games, Parity Games


Simple Stochastic Games

Mean Payoff Games

Parity Games

Randomized subexponential algorithm for SSG

Deterministic subexponential algorithm for PG


A simple Simple Stochastic Game

[Figure: a game graph with min, MAX and RAND (R) vertices, a min-sink and a MAX-sink]

Two Players: MAX and min

Objective: MAX maximizes, and min minimizes, the probability of getting to the MAX-sink

A general strategy may be randomized and history dependent

A positional strategy is deterministic and history independent

Positional strategy for MAX: choice of an outgoing edge from each MAX vertex

Every vertex i in the game has a value v_i

[Equation: the value obtained with general strategies equals the value obtained with positional strategies]

Both players have positional optimal strategies

There are strategies that are optimal for every starting position

Terminating binary games

The outdegrees of all non-sinks are 2

All probabilities are ½.

The game terminates with prob. 1

Easy reduction from general games to terminating binary games

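The values v_i of the vertices of a game are the unique solution of the following equations:

v_i = max(v_j, v_k) if i is a MAX vertex with successors j and k

v_i = min(v_j, v_k) if i is a min vertex with successors j and k

v_i = ½(v_j + v_k) if i is a RAND vertex with successors j and k

v_i = 0 if i is the min-sink, and v_i = 1 if i is the MAX-sink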

The values are rational numbers requiring only a linear number of bits

Corollary: Decision version in NP ∩ co-NP

Iterate the operator:

Converges to the unique solution

But, may require an exponential number of iterations to get close
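The iteration can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual update for a terminating binary game (max at MAX vertices, min at min vertices, the average at RAND vertices, 0 and 1 at the sinks). The dictionary encoding and the names `value_iteration`, `game` and `sweeps` are illustrative assumptions, not part of the slides.

```python
# Hypothetical encoding (not from the slides): vertex -> (kind, data).
# kind is "MAX", "MIN", "RAND" or "SINK"; data is a pair of successors,
# or the payoff of a sink (0 for the min-sink, 1 for the MAX-sink).

def value_iteration(game, sweeps=200):
    """Approximate the values of a terminating binary SSG by iterating the operator."""
    v = {u: (float(data) if kind == "SINK" else 0.0)
         for u, (kind, data) in game.items()}
    for _ in range(sweeps):
        new_v = {}
        for u, (kind, data) in game.items():
            if kind == "SINK":
                new_v[u] = float(data)
                continue
            a, b = v[data[0]], v[data[1]]
            if kind == "MAX":
                new_v[u] = max(a, b)
            elif kind == "MIN":
                new_v[u] = min(a, b)
            else:  # RAND: both successors have probability 1/2
                new_v[u] = 0.5 * (a + b)
        v = new_v
    return v


game = {
    "max_sink": ("SINK", 1),
    "min_sink": ("SINK", 0),
    "r": ("RAND", ("max_sink", "min_sink")),
    "q": ("RAND", ("r", "max_sink")),
    "m": ("MAX", ("r", "q")),
    "n": ("MIN", ("m", "max_sink")),
}
values = value_iteration(game)
print(values["m"], values["n"])
```

On this tiny acyclic game the iteration stabilizes after a handful of sweeps. The exponential behaviour mentioned above needs games with long chains of RAND vertices and cycles.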


Limiting average version

Discounted version


Theorem: [d'Epenoux (1964)]

Values and optimal strategies of an MDP can be found by solving an LP

Deciding whether the value of a game is at least (at most) v is in NP ∩ co-NP

To show that the value is at least v, guess an optimal strategy for MAX

Find an optimal counter-strategy for min by solving the resulting MDP.

Is the problem in P ?


Non-terminating version

Discounted version

Reachability SSGs

Payoff SSGs

MPGs

Pseudo-polynomial algorithm

(PZ’96)

Again, both players have optimal positional strategies.

Value(σ,τ): average weight of the cycle formed
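Once both players fix positional strategies, the play from any start vertex runs into a single cycle, and the value is the average weight on that cycle. A minimal sketch, assuming the two strategies are merged into one successor map; the names `cycle_value`, `choice` and `weight` are illustrative, not from the slides.

```python
from fractions import Fraction


def cycle_value(start, choice, weight):
    """Follow the unique play from `start` and return the mean weight of its cycle."""
    position = {}          # vertex -> index in `path` at which it was first visited
    path = []
    u = start
    while u not in position:
        position[u] = len(path)
        path.append(u)
        u = choice[u]
    cycle = path[position[u]:]                     # vertices visited forever
    total = sum(weight[(x, choice[x])] for x in cycle)
    return Fraction(total, len(cycle))


# Hypothetical example: MAX owns a and c, min owns b.
sigma = {"a": "b", "c": "b"}        # positional strategy of MAX
tau = {"b": "c"}                    # positional strategy of min
weight = {("a", "b"): 5, ("b", "c"): 3, ("c", "b"): -1}
value = cycle_value("a", {**sigma, **tau}, weight)
print(value)
```

The edge (a, b) is traversed only once, so its weight does not affect the value; only the cycle b, c, b counts.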

[Figure: a parity game graph whose vertices carry the priorities 2, 3, 2, 1, 4, 1]

EVEN wins if the largest priority seen infinitely often is even
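For a pair of positional strategies the winning condition is easy to check, because the priorities seen infinitely often are exactly those on the cycle the play ends in. A minimal sketch under that assumption; `parity_winner`, `choice` and `priority` are illustrative names, not from the slides.

```python
def parity_winner(start, choice, priority):
    """Return "EVEN" or "ODD" for the play fixed by the successor map `choice`."""
    position = {}
    path = []
    u = start
    while u not in position:
        position[u] = len(path)
        path.append(u)
        u = choice[u]
    cycle = path[position[u]:]                 # vertices seen infinitely often
    top = max(priority[x] for x in cycle)
    return "EVEN" if top % 2 == 0 else "ODD"


# Hypothetical example: priority 5 at a is seen only once and does not count.
priority = {"a": 5, "b": 2, "c": 1}
choice = {"a": "b", "b": "c", "c": "b"}
winner = parity_winner("a", choice, priority)
print(winner)
```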

[Figure labels: players ODD and EVEN, priorities 8 and 3]

Equivalent to many interesting problems in automata and verification:

Non-emptiness of ω-tree automata

modal μ-calculus model checking


Mean Payoff Games (MPGs)

[Stirling (1993)] [Puri (1995)]

Replace priority k by payoff (−n)^k

Move payoffs to outgoing edges
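The two steps of the reduction can be sketched as follows, assuming n is the number of vertices and EVEN takes the role of the maximizing player. The function name `parity_to_mean_payoff` and the edge-list encoding are illustrative assumptions.

```python
def parity_to_mean_payoff(edges, priority):
    """Give every edge (u, v) the payoff (-n)**priority[u], where n = number of vertices."""
    n = len(priority)
    return {(u, v): (-n) ** priority[u] for (u, v) in edges}


# Hypothetical three-vertex game.
priority = {"a": 3, "b": 2, "c": 1}
edges = [("a", "b"), ("b", "c"), ("c", "b"), ("c", "a")]
payoff = parity_to_mean_payoff(edges, priority)

# Cycle b -> c -> b: largest priority 2 (even), total payoff 9 - 3 > 0.
even_cycle = payoff[("b", "c")] + payoff[("c", "b")]
# Cycle a -> b -> c -> a: largest priority 3 (odd), total payoff -27 + 9 - 3 < 0.
odd_cycle = payoff[("a", "b")] + payoff[("b", "c")] + payoff[("c", "a")]
print(even_cycle, odd_cycle)
```

In this example the sign of the cycle sum matches the parity of the largest priority on the cycle, since the payoff of the largest priority dominates the others.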

…

Start with some strategy σ (of MAX)

While there are improving switches, perform some of them

As each step is strictly improving and as there is a finite number of strategies, the algorithm must end with an optimal strategy
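The loop above can be sketched for a terminating binary SSG. Two choices here are assumptions rather than something the slides fix: every improving switch is performed in each round, and a strategy of MAX is evaluated approximately by value iteration against the best response of min (an exact evaluation would solve the resulting MDP). The encoding and all names are hypothetical.

```python
def evaluate(game, sigma, sweeps=500):
    """Approximate values when MAX follows sigma and min responds optimally."""
    v = {u: (float(d) if k == "SINK" else 0.0) for u, (k, d) in game.items()}
    for _ in range(sweeps):
        new_v = {}
        for u, (k, d) in game.items():
            if k == "SINK":
                new_v[u] = float(d)
            elif k == "MAX":
                new_v[u] = v[sigma[u]]
            elif k == "MIN":
                new_v[u] = min(v[d[0]], v[d[1]])
            else:  # RAND
                new_v[u] = 0.5 * (v[d[0]] + v[d[1]])
        v = new_v
    return v


def strategy_improvement(game, sigma, eps=1e-9):
    """Repeat: evaluate sigma, perform all improving switches, stop when none exist."""
    sigma = dict(sigma)
    while True:
        v = evaluate(game, sigma)
        switches = {}
        for u, (k, d) in game.items():
            if k != "MAX":
                continue
            best = max(d, key=lambda w: v[w])
            if v[best] > v[sigma[u]] + eps:     # strictly improving switch
                switches[u] = best
        if not switches:
            return sigma, v
        sigma.update(switches)


game = {
    "S1": ("SINK", 1), "S0": ("SINK", 0),
    "r1": ("RAND", ("S1", "S0")),
    "r2": ("RAND", ("S1", "r1")),
    "m": ("MAX", ("r1", "r2")),
    "n": ("MIN", ("m", "S1")),
}
final_sigma, final_v = strategy_improvement(game, {"m": "r1"})
print(final_sigma, final_v["n"])
```

The tolerance `eps` only guards against floating-point noise; with exact arithmetic the test would be a strict inequality.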

SSG ∈ PLS (Polynomial Local Search)

Performing only one switch at a time may lead to exponentially many improvements, even for MDPs [Condon (1992)]

What happens if we perform all profitable switches? [Hoffman-Karp (1966)]

???

Not known to be polynomial: O(2^n/n) [Mansour-Singh (1999)]

No non-linear examples: 2n−O(1) [Madani (2002)]

A randomized subexponential algorithm for simple stochastic games

Start with an arbitrary strategy σ for MAX

Choose a random vertex i ∈ V_MAX

Find the optimal strategy σ’ for MAX in the game in which the only outgoing edge of i is (i,σ(i))

If switching σ’ at i is not profitable, then σ’ is optimal

Otherwise, let σ ← (σ’)^i, the strategy σ’ switched at i, and repeat
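The recursion can be sketched as follows for a binary terminating game. This is a rough sketch under stated assumptions, not the authors' implementation: strategies are evaluated approximately by value iteration, the dictionary encoding and the names `evaluate`, `solve` and `rng` are hypothetical, and each MAX vertex is assumed to have two distinct successors.

```python
import random


def evaluate(game, sigma, sweeps=500):
    """Approximate values when MAX follows sigma and min responds optimally."""
    v = {u: (float(d) if k == "SINK" else 0.0) for u, (k, d) in game.items()}
    for _ in range(sweeps):
        new_v = {}
        for u, (k, d) in game.items():
            if k == "SINK":
                new_v[u] = float(d)
            elif k == "MAX":
                new_v[u] = v[sigma[u]]
            elif k == "MIN":
                new_v[u] = min(v[w] for w in d)
            else:  # RAND
                new_v[u] = sum(v[w] for w in d) / len(d)
        v = new_v
    return v


def solve(game, sigma, rng, eps=1e-9):
    """Return a strategy of MAX with no profitable switch in `game`."""
    sigma = dict(sigma)
    while True:
        free = [u for u, (k, d) in game.items() if k == "MAX" and len(d) > 1]
        if not free:
            return sigma                       # MAX has no choice left
        i = rng.choice(free)                   # random MAX vertex
        restricted = dict(game)
        restricted[i] = ("MAX", (sigma[i],))   # only outgoing edge of i is (i, sigma(i))
        best = solve(restricted, sigma, rng, eps)
        v = evaluate(game, best)
        other = next(w for w in game[i][1] if w != best[i])
        if v[other] <= v[best[i]] + eps:
            return best                        # switching at i is not profitable
        best[i] = other                        # sigma <- (sigma')^i
        sigma = best


game = {
    "S1": ("SINK", 1), "S0": ("SINK", 0),
    "r1": ("RAND", ("S1", "S0")),
    "r2": ("RAND", ("S1", "r1")),
    "r3": ("RAND", ("S0", "r1")),
    "m1": ("MAX", ("r1", "r2")),
    "m2": ("MAX", ("r3", "m1")),
    "n": ("MIN", ("m2", "S1")),
}
optimal = solve(game, {"m1": "r1", "m2": "r3"}, random.Random(0))
print(optimal)
```

Each round strictly improves the current strategy, so the loop ends; the random choice of i affects only the running time, not the strategy returned on this example.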

[Figure: the MAX vertices in the hidden order, annotated "All correct!" and "Would never be switched!"]

There is a hidden order of MAX vertices under which the optimal strategy returned by the first recursive call correctly fixes the strategy of MAX at vertices 1,2,…,i

u_i(σ): the maximum sum of values of a strategy of MAX that agrees with σ on i

Order the vertices such that

Positions 1,…,i were switched and would never be switched again

General (non-binary) SSGs can be solved in time

Independently observed by [Björklund-Sandberg-Vorobyov (2005)]

AUSO – Acyclic Unique Sink Orientations

GPLCP – Generalized Linear Complementarity Problem with a P-matrix

A deterministic subexponential algorithm for parity games

Mike Paterson, Marcin Jurdziński, Uri Zwick


Vertices of highest priority (even)

First recursive call

Vertices from which EVEN can force the game to enter A

Lemma: (i)

(ii)

Second recursive call

In the worst case, both recursive calls are on games of size n−1
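The set of vertices from which a player can force the game to enter a target set A is usually computed by a backward search from A. A minimal sketch of that computation, assuming a simple directed graph in which every vertex has an outgoing edge; the name `attractor` and the encoding are illustrative, not from the slides.

```python
def attractor(vertices, edges, owner, player, target):
    """Vertices from which `player` can force the play to enter `target`."""
    succ_count = {u: 0 for u in vertices}
    pred = {u: [] for u in vertices}
    for u, v in edges:
        succ_count[u] += 1
        pred[v].append(u)

    attr = set(target)
    escapes = dict(succ_count)     # successors of u not yet known to lie in attr
    stack = list(target)
    while stack:
        v = stack.pop()
        for u in pred[v]:
            if u in attr:
                continue
            escapes[u] -= 1
            # The player needs one edge into attr; the opponent is caught
            # only when every outgoing edge leads into attr.
            if owner[u] == player or escapes[u] == 0:
                attr.add(u)
                stack.append(u)
    return attr


# Hypothetical example: e (owned by ODD) can loop on itself and avoid A = {d}.
vertices = ["a", "b", "c", "d", "e"]
owner = {"a": "EVEN", "b": "ODD", "c": "ODD", "d": "EVEN", "e": "ODD"}
edges = [("a", "c"), ("b", "d"), ("b", "a"), ("c", "d"),
         ("d", "d"), ("e", "e"), ("e", "d")]
forced = attractor(vertices, edges, owner, "EVEN", {"d"})
print(sorted(forced))
```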

Idea: Look for small dominions!

Second recursive call

Dominions of size s can be found in O(n^s) time

Dominion

Dominion: A (small) set from which one of the players can win without the play ever leaving this set

- Polynomial algorithms?
- Is the Policy Improvement algorithm polynomial?
- Faster subexponential algorithms for parity games?
- Deterministic subexponential algorithms for MPGs and SSGs?
- Faster pseudo-polynomial algorithms for MPGs?