
# Issues on the border of economics and computation







### Issues on the border of economics and computation

Speaker: Dr. Michael Schapira

Topic: Dynamics in Games (Part III)

(Some slides from Prof. Yishay Mansour’s course at TAU)

• Ex1 to be published by Thu

• submission deadline: 6.12.12, midnight

• can submit in pairs

• submit through Dr. Blumrosen’s mailbox

• Debt from last class.

|       | Left    | Right   |
|-------|---------|---------|
| Left  | (1, -1) | (-1, 1) |
| Right | (-1, 1) | (1, -1) |

Reminder: Zero-Sum Games

• A zero-sum game is a 2-player strategic game such that for each s ∈ S, we have u1(s) + u2(s) = 0.

• What is good for me is bad for my opponent, and vice versa

Reminder: Minimax-Optimal Strategies

• A (mixed) strategy s1* is minimax optimal for player 1 if

mins2∈S2 u1(s1*, s2) ≥ mins2∈S2 u1(s1, s2) for all s1 ∈ S1

• Similar for player 2

• Can be found via linear programming.
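To make this concrete, here is a minimal Python sketch: instead of solving the linear program itself, it brute-forces a grid of mixed strategies for the matching-pennies payoffs (the payoff values and the grid resolution are assumptions of the sketch) and recovers player 1's minimax-optimal strategy.

```python
# Brute-force illustration of minimax optimality in a 2x2 zero-sum game.
# Assumed payoffs (matching pennies): rows are player 1's strategies,
# columns are player 2's strategies, entries are u1 (so u2 = -u1).
U1 = [[1, -1],
      [-1, 1]]

def worst_case_gain(p):
    """Player 1's guaranteed expected payoff when playing the first
    strategy with probability p: the min over player 2's pure replies."""
    return min(p * U1[0][j] + (1 - p) * U1[1][j] for j in range(2))

# Search a grid of mixed strategies for player 1.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=worst_case_gain)
value = worst_case_gain(best_p)
print(best_p, value)  # prints 0.5 0.0: the minimax-optimal strategy and value
```

A real solver would encode "maximize v subject to p·U1 column constraints ≥ v" as an LP; the grid search above is only meant to show what the LP is optimizing.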

Reminder: Minimax Theorem

• Every 2-player zero-sum game has a unique value V.

• The minimax optimal strategy for R guarantees that R’s expected gain is at least V.

• The minimax optimal strategy for C guarantees that R’s expected gain is at most V.

• The minimax theorem is a useful tool in the analysis of randomized algorithms

• Let’s see why.

The Find-Bill Problem

• There are n boxes; exactly one box contains a dollar bill, and the rest of the boxes are empty.

• A probe is defined as opening a box to see if it contains the dollar bill.

• The objective is to locate the box containing the dollar bill while minimizing the number of probes performed.

• How well can a deterministic algorithm do?

• Can we do better via a randomized algorithm?

• i.e., an algorithm that is a probability distribution over deterministic algorithms

• Randomized Find: select x in {H, T} uniformly at random

• if x = H then probe boxes in order from 1 through n and stop if bill is found

• Otherwise, probe boxes in order from n through 1 and stop if bill is found

• The expected number of probes made by the algorithm is (n+1)/2.

• if the dollar bill is in the ith box, then i probes are made with probability ½ and (n - i + 1) probes are made with probability ½.
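This per-box calculation is easy to verify mechanically; a short sketch of the computation above (box indices are 1-indexed, and the two scan orders are as described):

```python
def expected_probes(n):
    """Expected probes of Randomized Find when the bill is in box i:
    i probes when scanning 1..n, and n - i + 1 probes when scanning
    n..1, each direction chosen with probability 1/2."""
    return [(i + (n - i + 1)) / 2 for i in range(1, n + 1)]

for n in (2, 7, 100):
    probes = expected_probes(n)
    # The expectation is (n + 1)/2 no matter where the bill is hidden.
    assert all(e == (n + 1) / 2 for e in probes)
```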

• Lemma: A lower bound on the expected number of probes required by any randomized algorithm to solve the Find-Bill problem is (n + 1)/2.

• Proof via the minimax theorem!

• Row player aims to choose malicious inputs

• Column player aims to choose efficient algorithms

• Payoff for (I, ALG) is the running time of ALG on I

(Payoff matrix: rows Input1, …, Inputm; columns ALG1, …, ALGn; entry T(ALG, I), the running time of ALG on input I.)


• Pure strategies:

• specific input for row player

• deterministic algorithm for column player

• Mixed strategies:

• distribution over inputs for row player

• randomized algorithm for column player


• If I’m the column player what strategy (i.e., randomized algorithm) do I want to choose?


• What does the minimax theorem mean here?


Yao’s Principle

• Let T(I, Alg) denote the time required for deterministic algorithm Alg to run on input I. Then

maxp on inputs minAlg E[T(Ip, Alg)] = minq on algs maxI E[T(I, Algq)]

• So, for any two probability distributions p and q:

minAlg E[T(Ip, Alg)] ≤ maxI E[T(I, Algq)]

• Useful technique for proving lower bounds on running times of randomized algorithms

• Step I: Design a probability distribution Ip over inputs for which every deterministic algorithm’s expected running time is at least a

• Step II: Deduce that every randomized algorithm’s (expected) running time is at least a

• Lemma: A lower bound on the expected number of probes required by any randomized algorithm to solve the Find-Bill problem is (n + 1)/2.

• Proof:

• Consider the scenario in which the bill is placed in one of the n boxes uniformly at random.

• Consider only deterministic algorithms that do not probe the same box twice.

• By symmetry we can assume that the probe order for a deterministic algorithm ALG is 1 through n.

• The expected number of probes for ALG is Σi=1..n i/n = (n+1)/2

• Yao’s principle implies the lower bound.
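The symmetry step can also be checked directly: under the uniform distribution, every deterministic non-repeating probe order has the same expected cost. A small sketch (exhaustive over all permutations for a toy n, an assumption made purely for illustration):

```python
import itertools

def expected_probes_uniform(order):
    """Expected probes of the deterministic algorithm probing boxes in
    the given order, when the bill is in a uniformly random box: a bill
    in box b costs the position of b in the order."""
    n = len(order)
    return sum(order.index(b) + 1 for b in range(n)) / n

n = 5
for order in itertools.permutations(range(n)):
    # Every probe order has expected cost (n + 1)/2 under the uniform
    # distribution, so Yao's principle gives the (n + 1)/2 lower bound.
    assert expected_probes_uniform(order) == (n + 1) / 2
```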

No Regret Algs: So far…

• In some games (e.g., potential games), best-/better-response dynamics are guaranteed to converge to a PNE.

• In 2-player zero-sum games no-regret dynamics converge to a NE.

• What about general games?
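Before turning to general games, the zero-sum claim can be illustrated with a short simulation: two Hedge (multiplicative-weights) learners repeatedly play matching pennies, and their time-averaged strategies approach the unique Nash equilibrium (½, ½). The payoff matrix, the step size, and the asymmetric starting weights are assumptions of this sketch.

```python
import math

# Player 1's payoffs in matching pennies; the game is zero-sum, u2 = -u1.
U1 = [[1, -1], [-1, 1]]

T = 20000
eta = math.sqrt(math.log(2) / T)  # standard Hedge step size

w1, w2 = [2.0, 1.0], [1.0, 1.0]   # asymmetric start, so convergence is non-trivial
sum_p1 = sum_p2 = 0.0             # running sums of Pr[first action]

for _ in range(T):
    p1 = w1[0] / (w1[0] + w1[1])
    p2 = w2[0] / (w2[0] + w2[1])
    sum_p1 += p1
    sum_p2 += p2
    # Expected gain of each pure action against the opponent's mixture.
    g1 = [p2 * U1[a][0] + (1 - p2) * U1[a][1] for a in range(2)]
    g2 = [-(p1 * U1[0][b] + (1 - p1) * U1[1][b]) for b in range(2)]
    for a in range(2):
        w1[a] *= math.exp(eta * g1[a])
        w2[a] *= math.exp(eta * g2[a])

avg_p1, avg_p2 = sum_p1 / T, sum_p2 / T
# The time-averaged strategies approach the minimax strategies (1/2, 1/2).
assert abs(avg_p1 - 0.5) < 0.05 and abs(avg_p2 - 0.5) < 0.05
```

The last-iterate strategies keep cycling around the equilibrium; it is the time averages that converge, which is exactly what the no-regret guarantee promises in zero-sum games.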

Chicken Game

|      | Stop    | Go       |
|------|---------|----------|
| Stop | (0, 0)  | (-3, 1)  |
| Go   | (1, -3) | (-4, -4) |

What are the pure NEs?

What are the (mixed) NEs?

(In the mixed NE each player plays Stop and Go with probability ½ each, placing probability ¼ on every cell.)

Correlated Equilibrium: Illustration

A distribution P over the strategy profiles of the Chicken game:

|      | Stop | Go  |
|------|------|-----|
| Stop | 0    | ½   |
| Go   | ½    | 0   |

• Suppose that there is a trusted random device that samples a pure strategy profile from a distribution P

• … and tells each player his component of the strategy profile.

• If all players other than i are following the strategy suggested by the random device, then i does not have any incentive to deviate.

Correlated Equilibrium: Illustration (cont.)

Another distribution P over the strategy profiles:

|      | Stop | Go  |
|------|------|-----|
| Stop | 1/3  | 1/3 |
| Go   | 1/3  | 0   |


• Consider a game:

• Si is the set of (pure) strategies for player i

• S = S1 × S2 × … × Sn

• s = (s1, s2, …, sn) ∈ S is a vector of strategies

• ui : S → R is the payoff function for player i.

• Notation: given a strategy vector s, let s-i = (s1, …, si-1, si+1, …, sn)

• the vector s with the i’th element omitted

A correlated equilibrium is a probability distribution p over (pure) strategy profiles in S such that for any i, si, si’:

Σs-i p(si, s-i) ui(si, s-i) ≥ Σs-i p(si, s-i) ui(si’, s-i)

• CE always exists

• why?

• The set of CE is convex

• what about NE?

• CEs are the solutions to a set of linear inequalities

• CE can be computed in an efficient manner (e.g., via linear programming)
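As a sanity check, the CE inequalities can be verified mechanically for the Chicken illustrations; the payoffs (0,0), (-3,1), (1,-3), (-4,-4) below are the standard chicken values assumed for the illustration.

```python
from fractions import Fraction as F

# Payoffs u[(s1, s2)] = (u1, u2); strategies 0 = Stop, 1 = Go.
# Assumed chicken payoffs for the illustration.
u = {(0, 0): (0, 0), (0, 1): (-3, 1), (1, 0): (1, -3), (1, 1): (-4, -4)}

def is_ce(p):
    """Check the CE inequalities: for each player i, recommended strategy
    si, and deviation si', obeying the recommendation must pay at least
    as much as deviating, weighting s_{-i} by p(si, s_{-i})."""
    for i in (0, 1):
        for si in (0, 1):
            for dev in (0, 1):
                obey = dev_gain = F(0)
                for s_other in (0, 1):
                    prof = (si, s_other) if i == 0 else (s_other, si)
                    alt = (dev, s_other) if i == 0 else (s_other, dev)
                    obey += p[prof] * u[prof][i]
                    dev_gain += p[prof] * u[alt][i]
                if obey < dev_gain:
                    return False
    return True

third = F(1, 3)
assert is_ce({(0, 0): third, (0, 1): third, (1, 0): third, (1, 1): F(0)})
assert is_ce({(0, 0): F(0), (0, 1): F(1, 2), (1, 0): F(1, 2), (1, 1): F(0)})
assert not is_ce({(0, 0): F(1), (0, 1): F(0), (1, 0): F(0), (1, 1): F(0)})
```

Since every constraint checked here is linear in the probabilities p(s), the same constraints can be handed to an LP solver, which is why a CE is computable in polynomial time.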

• When every player uses a no-regret algorithm to select strategies the dynamics converges to a CE

• in any game!

• But this requires a stronger definition of no-regret…

Types of No-Regret Algs

• No external regret: Do (nearly) as well as best strategy in hindsight

• what we’ve been talking about so far

• I should have always taken the same route to work…

• No internal regret: the Alg could not gain (in hindsight) by substituting a single strategy with another (consistently)

• each time strategy si was chosen substitute with si’

• each time I bought a Microsoft stock I should have bought the Google stock

• No internal regret implies no external regret

• why?

Reminder: Minimizing Regret

• At each round t=1,2, …,T

• There are n actions (experts) 1,2, …, n

• Algorithm selects an action in {1,…,n}

• and then observes the gain gi,t ∈ [0,1] of each action i ∈ {1,…,n}

• Let gi = Σt gi,t. Let gmax = maxi gi

• No external regret: Do (at least) “nearly as well” as gmax in hindsight.

• Assume that the alg outputs the action sequence A = a1 … aT

• The action sequence A(b → d):

• change every at = b to at = d in A

• g(b→d) is the gain of A(b → d) (for the same gains gi,t)

• Internal regret:

max{b,d} ( g(b→d) − galg ) = max{b,d} Σt (gd,t − gb,t) pb,t

• An algorithm has no internal regret if its internal regret is sublinear in T, i.e., the per-round internal regret goes to 0 as T goes to infinity
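A tiny worked example may help: for a deterministic action sequence (so pb,t ∈ {0,1}), external and internal regret can be computed directly. The gains below are invented for the demo.

```python
# Hypothetical play over 6 rounds with 3 actions; after each round the
# algorithm sees the gain of every action (all numbers are made up).
actions = [0, 0, 1, 2, 1, 0]                    # a_1 ... a_T
gains   = [[1.0, 0.0, 0.5],                     # g_{i,t} for t = 1
           [0.2, 0.9, 0.1],
           [0.0, 0.3, 0.8],
           [0.4, 0.4, 0.2],
           [0.6, 0.1, 0.9],
           [0.3, 0.7, 0.5]]
n = 3

g_alg = sum(g[a] for a, g in zip(actions, gains))

# External regret: best single fixed action in hindsight, minus alg's gain.
external = max(sum(g[i] for g in gains) for i in range(n)) - g_alg

# Internal regret: best single pairwise swap b -> d, applied to every
# round in which b was played.
internal = max(
    sum(g[d] - g[b] for a, g in zip(actions, gains) if a == b)
    for b in range(n) for d in range(n))

print(external, internal)
```

Here the best fixed action is 2 (total gain 3.0 vs. the alg's 2.1, external regret 0.9), while the best swap replaces action 1 by action 2 for a larger internal regret of 1.3: low external regret alone would not rule out this kind of consistent pairwise mistake.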

Internal Regret and Dominated Strategies

• Suppose that a player uses a no-internal-regret algorithm to select strategies

• in a repeated game against others

• What guarantees does the player have?

• beyond the no-regret guarantee

Dominated Strategies

• Strategy si is dominated by a (mixed) strategy si’ if for every s-i we have that ui(si, s-i) < ui(si’, s-i)

• Clearly, we like to avoid choosing dominated strategies


Internal Regret and Dominated Strategies

• si is dominated by si’

• every time we played si we would have done better with si’

• this is precisely the internal regret of the swap si → si’

• No internal regret ⇒ no dominated strategies (in the limit, a dominated strategy is played a vanishing fraction of the time)

Does a No-Internal-Regret Alg Exist?

• Yes!

• In fact, there exist algorithms with a stronger guarantee: no swap regret.

• no swap regret: the alg cannot benefit in hindsight by changing action i to F(i), for any F : {1,…,n} → {1,…,n}

• We show a generic reduction that turns no-external-regret algorithms into a no-swap-regret algorithm


External to Swap Regret

• Our algorithm utilizes no-external-regret algorithms to achieve no-internal-regret:

• n no-external-regret algorithms

• intuitively, each algorithm represents a strategyin {1,…,n}

• for algorithm Algi, and for any sequence of gain vectors: gAlgi > gmax − Ri

(Diagram: each no-external-regret algorithm Algi outputs a distribution qi; the master algorithm combines q1, …, qn into a single distribution p.)

External to Swap Regret

• At time t:

• each Algi outputs a distribution qi

• together these induce a matrix Q (row i is qi)

• our algorithm uses Q to decide on a distribution p over the strategies {1,…,n}

• the adversary decides on a gains vector g = (g1, …, gn)

• our algorithm returns to each Algi some gains vector


Combining the No-External-Regret Algs

• Approach I:

• Select an expert Ai with probability ri

• Let the “selected” expert decide the outcome p

• strategy distribution p=Qr

• Approach II:

• Directly decide on p.

• Our approach: make p=r

• Find a p such that p=Qp


Distributing Gain

• Adversary selects gains g=(g1…gn)

• Return to Algi the gain vector pi·g

• Note: Σi pi·g = g


• At time t:

• each Algi outputs a distribution qi

• together these induce a matrix Q

• output a distribution p such that p = Qp

• pj = Σi pi qi,j

• observe gains g = (g1, …, gn)

• return to Algi the gain vector pi·g
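The steps above (the reduction from external to swap regret) can be sketched in Python. Hedge is used here as the underlying no-external-regret algorithm, and the fixed point p = Qp is found by power iteration; both choices are assumptions of this sketch, not forced by the construction.

```python
import math

class Hedge:
    """Standard no-external-regret (multiplicative weights) algorithm."""
    def __init__(self, n, eta):
        self.w = [1.0] * n
        self.eta = eta

    def distribution(self):
        total = sum(self.w)
        return [x / total for x in self.w]

    def update(self, gains):
        for i, g in enumerate(gains):
            self.w[i] *= math.exp(self.eta * g)

def stationary(Q, iters=200):
    """Solve p = Qp, i.e. p_j = sum_i p_i Q[i][j], by power iteration:
    p is the stationary distribution of the Markov chain whose i-th row
    of transition probabilities is q_i."""
    n = len(Q)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return p

def swap_regret_play(gain_seq, eta):
    """The reduction: n copies of Hedge combined into one algorithm.
    Returns the sequence of distributions p the master plays."""
    n = len(gain_seq[0])
    algs = [Hedge(n, eta) for _ in range(n)]
    history = []
    for g in gain_seq:
        Q = [alg.distribution() for alg in algs]   # row i = q_i
        p = stationary(Q)                          # master plays p = Qp
        history.append(p)
        for i, alg in enumerate(algs):
            alg.update([p[i] * gj for gj in g])    # Alg_i sees p_i * g
    return history
```

Feeding Algi the scaled gains pi·g is exactly what makes Algi's external-regret bound control the benefit of swapping action i, which is the heart of the derivation.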


• Gain of Algi (from its view) at round t:

⟨qi,t, pi,t·gt⟩ = pi,t ⟨qi,t, gt⟩

• No-external-regret guarantee, for every fixed action j:

gAlgi = Σt pi,t ⟨qi,t, gt⟩ > Σt pi,t gj,t − Ri

• For any swap function F:

gAlg = Σt ⟨pt, gt⟩ = Σt ⟨pt Qt, gt⟩ = Σt Σi pi,t ⟨qi,t, gt⟩ = Σi gAlgi > Σi (Σt pi,t gF(i),t − Ri) = gAlg,F − Σi Ri

Swap Regret

Corollary: the swap regret of the combined algorithm is at most Σi Ri; with Hedge as the underlying no-external-regret algorithms, this is O(n·√(T log n)).

Can be improved to O(√(n·T·log n)).

• The Minimax Theorem is a useful tool for analyzing randomized algorithms

• Yao’s Principle

• There exist no-swap-regret algorithms

• Next time: When all players use no-swap-regret algorithms to select strategies the dynamics converge to a CE