When is it Best to Best-Reply?

Michael Schapira (Yale University and UC Berkeley). Joint work with Noam Nisan (Hebrew U), Gregory Valiant (UC Berkeley), and Aviv Zohar (Hebrew U).

Motivation: Internet Routing • Establish routes between Autonomous Systems (ASes). • Currently handled by the Border Gateway Protocol (BGP). • [Figure: example ASes Sprint, AT&T, Comcast, and Qwest]

Internet Routing as a Game [Levin-S-Zohar] • Internet routing is a game! • players = ASes • players’ types = preferences over routes • strategies = routes • BGP = Best-Response Dynamics • each AS constantly selects its best available route to each destination • … until a “stable state” (= PNE) is reached.

But… • Challenge I: No synchronization of players’ actions • players can best-reply simultaneously. • players can best-reply based on outdated information. • When is BGP guaranteed to converge to a stable state? • Challenge II: Are players incentivized to follow best-response dynamics? • Can an AS gain from not executing BGP?

Agenda • Mechanism design approach to best-response dynamics (main focus of this talk). • Convergence of best-response dynamics in asynchronous environments [Jaggard-S-Wright] (if time permits).

Agenda • Part I: mechanism design approach to best-response dynamics. • Part II: on the convergence of best-response dynamics in asynchronous environments.

Incentive-Compatible Best-Response Dynamics

Main Questions • When is myopic best-replying also good in the long run? • When can stable outcomes be implemented in partial-information settings? • Can we reason about partial-information settings via complete-information games?

Our Results Have Implications For • Internet protocols • Internet routing (BGP), congestion control (TCP) • Auctions • 1st-price auctions, unit-demand auctions, GSP • Matching • correlated markets, interns and hospitals • Cost-sharing mechanisms • Moulin mechanisms, …

1st-Price Auction • Alice (v_a = 4) • Bob (v_b = 2) • [Figure: bidding example showing the winner and the winner's utility]

Ascending-Price English Auction • Alice (v_a = 4) • Bob (v_b = 2)

Best-Reply (with some tie-breaking) • Alice (v_a = 4) • Bob (v_b = 2)
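
A minimal Python sketch of best-reply bidding in the two-bidder first-price example. The integer bid grid, the rule that Alice wins equal bids, and the tie-breaking rule (among equally good bids, pick the highest) are assumptions made for illustration; they are not taken from the talk.

```python
def utility(value, my_bid, other_bid, wins_ties):
    """First-price payoff: the winner pays his own bid, the loser gets 0."""
    wins = my_bid > other_bid or (my_bid == other_bid and wins_ties)
    return value - my_bid if wins else 0


def best_reply(value, other_bid, wins_ties, max_bid):
    """Utility-maximizing integer bid; ties go to the highest such bid (assumed rule)."""
    bids = range(max_bid + 1)
    best = max(utility(value, b, other_bid, wins_ties) for b in bids)
    return max(b for b in bids if utility(value, b, other_bid, wins_ties) == best)


def run_auction(values=(4, 2), max_bid=5, max_steps=100):
    """Alternate best replies (Alice first) until neither bidder changes his bid."""
    bids = [0, 0]
    wins_ties = (True, False)  # assumption: Alice wins equal bids
    unchanged = 0
    for step in range(max_steps):
        i = step % 2
        new_bid = best_reply(values[i], bids[1 - i], wins_ties[i], max_bid)
        unchanged = unchanged + 1 if new_bid == bids[i] else 0
        bids[i] = new_bid
        if unchanged == 2:
            break
    return tuple(bids)


print(run_auction())  # (2, 2): Alice wins and pays 2, which is Bob's value
```

Under these assumptions the bids climb one unit at a time and stop once Bob can no longer win profitably, so the price ends at the lower of the two values.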

The Model • n players • Player i has • action set A_i • (private) type t_i ∈ T_i • utility function u_i

The Model: Dynamic Interaction • Discrete time steps. Initial action profile a0. • One player is activated in each time step • round-robin (cyclic) order • our results are independent of the order (and also hold for asynchronous environments) • Players’ strategies specify which actions are selected in each time step. • can be history-dependent • Best-response dynamics = the strategy profile in which each player constantly best-replies to others’ actions
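
The round-robin dynamics described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the talk's formal model: the function name `best_response_dynamics`, the rule that a player keeps his current action when it is already a best reply, and the prisoner's-dilemma payoffs used as the example game are all hypothetical.

```python
def best_response_dynamics(action_sets, utility, start, max_steps=1000):
    """Activate players in cyclic order; each one switches to a best reply.

    Returns (stable profile, number of steps), or (None, max_steps) if no
    full round without changes is observed within max_steps.
    """
    profile = list(start)
    n = len(action_sets)
    unchanged = 0
    for step in range(max_steps):
        i = step % n  # round-robin activation

        def value(action):
            trial = profile.copy()
            trial[i] = action
            return utility(i, tuple(trial))

        best = max(value(a) for a in action_sets[i])
        if value(profile[i]) < best:
            # Assumed tie-breaking: take the first best reply in the action list.
            profile[i] = next(a for a in action_sets[i] if value(a) == best)
            unchanged = 0
        else:
            unchanged += 1
        if unchanged == n:  # nobody moved for a full round: a PNE
            return tuple(profile), step + 1
    return None, max_steps


# Hypothetical example game: a prisoner's dilemma.
PAYOFFS = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
ACTIONS = [("C", "D"), ("C", "D")]


def pd_utility(player, profile):
    return PAYOFFS[profile][player]


print(best_response_dynamics(ACTIONS, pd_utility, ("C", "C")))
# (('D', 'D'), 4)
```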

Two Possible Payoff Models Cumulative model (more natural, but sometimes too restrictive) • Payoffs are accumulated. • Alternative formulation with discount factors. Payoff at the limit (weaker, since it actively discourages oscillations, but with interesting applications) • If the dynamics converges to a stable outcome a*, the payoff is the utility from a*. • If there is no convergence, the resulting payoff is low.

Solution Concept • A strategy profile is an ex-post Nash equilibrium if no player wishes to deviate from it, regardless of the types. (This is essentially the best possible in a distributed environment [Shneidman-Parkes].)

Best-Replying is Not Always Best • [Figure: two payoff matrices, one for each type of the row player (Type 1 and Type 2)] • dominance-solvable • potential game • unique and Pareto optimal PNE

When is it Good to Best-Reply? • Goal: identify a class of games in which best-response dynamics is an ex-post Nash equilibrium. • i.e., best-replying is incentive-compatible • close in spirit to “learning equilibria” [Brafman-Tennenholtz] • This class is going to be VERY restricted. Still… it covers a variety of mechanisms/protocols. • Remark: The best replies are not always unique. Thus, we must handle tie-breaking.

One Class of Games • Lemma: If each realization of types yields a game in which each player has a single dominant strategy, then best-response dynamics is an ex-post Nash equilibrium.

On the Other Hand… • [Figure: two payoff matrices, one for each type of the row player (Type 1 and Type 2)] • no player has a dominant strategy (in both realizations). • best-response dynamics is an ex-post Nash equilibrium. • This game is blindly solvable.

Blindly-Dominated Strategy Sets • [Figure: a strategy set labeled T]

Blindly-Solvable Games • Defn: A game is blindly-solvable if iterated elimination of blindly-dominated strategy sets results in a single strategy profile. • Observation: the “surviving” strategy profile is the unique PNE of the game. • Defn: A partial-information game is blindly-solvable if every realization of types yields a blindly-solvable game.

1st-Price Auctions Revisited • Alice (v_a = 4) • Bob (v_b = 2)

Merits of Blindly-Solvable Games • Thm: Let G be a blindly-solvable partial-information game. Let a* be the surviving strategy profile. Then, (1) best-response dynamics converges to a* within n · Σ_j |A_j| time steps; (2) in the “payoff at the limit” model, best-response dynamics is incentive-compatible, and even collusion-proof, in ex-post Nash.

Intuition for Proof of (2) • The first action that was not “eliminated” in the elimination sequence of G must belong to a manipulator. • The manipulator’s utility from that action is lower than his utility from a*.

Best-Response 1st-Price Auction Mechanism • Alice (v_a = 4) • Bob (v_b = 2)

Implications for Internet Environments • Under realistic conditions, routing with the Border Gateway Protocol is incentive-compatible. [Levin-S-Zohar] • Convergence and incentive-compatibility results for congestion control. [Godfrey-S-Zohar-Shenker] • Mechanism design without money!

Beyond Blindly-Solvable Games

Generalized 2nd-Price Auction (GSP) • Used for selling ads on search engines. • k slots. Each slot j has its own click-through rate. • Users submit bids (per click) b_i. • They are ranked in order of bids. • If an ad is clicked, its owner pays the next highest bid.
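
The ranking and payment rule above can be sketched in Python as follows. The function name `gsp_outcome`, the bidder names, the numbers, and the choice that a bidder with no lower bid below him pays 0 per click are assumptions made for this illustration.

```python
def gsp_outcome(bids, click_rates):
    """Assign slots by descending bid; each winner pays the next highest bid per click.

    bids: dict mapping bidder name to per-click bid.
    click_rates: list of click-through rates, best slot first.
    """
    ranked = sorted(bids, key=bids.get, reverse=True)
    outcome = {}
    for slot, bidder in enumerate(ranked[:len(click_rates)]):
        # Assumption: with no lower bid, the price per click is 0.
        next_bid = bids[ranked[slot + 1]] if slot + 1 < len(ranked) else 0
        outcome[bidder] = {
            "slot": slot,
            "price_per_click": next_bid,
            "expected_payment": click_rates[slot] * next_bid,
        }
    return outcome


# Hypothetical instance: three bidders, two slots (rates in clicks per hour).
example = gsp_outcome({"a": 5, "b": 3, "c": 1}, [10, 4])
print(example["a"])  # {'slot': 0, 'price_per_click': 3, 'expected_payment': 30}
print(example["b"])  # {'slot': 1, 'price_per_click': 1, 'expected_payment': 4}
```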

Generalized 2nd-Price Auction (GSP) • No dominant strategy equilibrium. • There exists an equilibrium with VCG payments. [Edelman-Ostrovsky-Schwarz, Varian] • Best-response dynamics (with tie-breaking) converge with probability 1 to that equilibrium. [Cary et al.] • Thm (informal): Best-replying in GSP is incentive-compatible. • Generalizes the English auction of [Edelman-Ostrovsky-Schwarz]

Auctions With Unit-Demand Bidders • n bidders. m items. • Each bidder i has value v_{i,j} for each item j, and is interested in at most one item. • Thm: There exists a best-response mechanism for auctions with unit-demand bidders that is incentive-compatible in ex-post Nash and converges to the VCG outcome. • Generalizes the English auction of [Demange-Gale-Sotomayor] • The proof of incentive-compatibility is simple. The proof of convergence is more complex and is based on Kuhn’s Hungarian method.

A New Perspective on Some Centralized Mechanisms

Centralized vs. Distributed • Distributed: players reach a stable outcome in a distributed manner. • Centralized: players declare types; the mechanism simulates the interaction and outputs the outcome. • An ex-post equilibrium in the decentralized setting yields a dominant-strategy implementation in the centralized setting.

The Centralized Setting • Each player i has an action set A_i, a private type t_i, and a utility function u_i (as before). • Wanted: a direct-revelation mechanism that outputs a pure Nash equilibrium of the game and incentivizes truthfulness.

Clearly, This is Not Always Possible • [Figure: two payoff matrices, one for each type of the row player (Type 1 and Type 2)]

Corollary I • If every player has a single dominant strategy in every realization, then the direct-revelation mechanism is truthful. • Give each player his dominant strategy in the reported realization.
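
A hedged Python sketch of the mechanism in Corollary I: collect reported types, then hand each player the action that is a best reply to every profile of the others under his reported type. The names `dominant_strategy`, `direct_mechanism`, and `typed_utility`, as well as the toy utility function, are hypothetical and chosen only for this example.

```python
from itertools import product


def dominant_strategy(player, action_sets, utility):
    """Return an action that is a best reply to every profile of the others, else None."""
    others = [acts for j, acts in enumerate(action_sets) if j != player]
    for candidate in action_sets[player]:
        dominant = True
        for rest in product(*others):

            def profile_with(action):
                full = list(rest)
                full.insert(player, action)  # put the player's action back in place
                return tuple(full)

            best = max(utility(player, profile_with(a)) for a in action_sets[player])
            if utility(player, profile_with(candidate)) < best:
                dominant = False
                break
        if dominant:
            return candidate
    return None


def direct_mechanism(reported_types, action_sets, typed_utility):
    """Output each player's dominant strategy under his reported type."""
    outcome = []
    for i, t in enumerate(reported_types):
        def utility(player, profile, t=t):
            return typed_utility(player, t, profile)
        outcome.append(dominant_strategy(i, action_sets, utility))
    return tuple(outcome)


# Toy game (assumption): a player gains 2 for playing his type's action,
# plus 1 for each other player who plays "x".
def typed_utility(player, t, profile):
    bonus = sum(a == "x" for j, a in enumerate(profile) if j != player)
    return 2 * (profile[player] == t) + bonus


ACTIONS = [("x", "y"), ("x", "y")]
print(direct_mechanism(("x", "y"), ACTIONS, typed_utility))  # ('x', 'y')
```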

Corollary II • If the game is blindly solvable, then the direct-revelation mechanism is truthful. • [Figure: two payoff matrices, one for each type of the row player (Type 1 and Type 2)]

More Blindly-Solvable Games • Cost-Sharing mechanisms • Moulin mechanisms [Moulin, Moulin-Shenker] • Acyclic mechanisms [Mehta-Roughgarden-Sundararajan] • Matching games • Interns and Hospitals • Correlated two sided markets

Directions for Future Research • Implementability of other kinds of equilibria (mixed Nash, correlated, …)? • Incentive-compatibility of other kinds of dynamics (fictitious play, regret minimization)?

Agenda • Part I: mechanism design approach to best-response dynamics. • Part II: on the convergence of best-response dynamics in asynchronous environments.

Best-Response Dynamics Out of Sync

Synchronous Environments • In traditional best-response dynamics players are activated one at a time. • More generally, the study of game dynamics normally supposes synchrony. • What if the interaction between players is asynchronous? (Internet, markets)

Illustration • [Figure: 2×2 game between the Row Player and the Column Player, with payoff entries 0,0; 2,1; 0,0; 1,2]

But… • [Figure: the same 2×2 game between the Row Player and the Column Player, with payoff entries 0,0; 2,1; 0,0; 1,2]

Model for Analyzing Asynchronous Best-Response Dynamics • Infinite sequence of discrete time-steps • In each time-step a subset of the players best-replies. • The “schedule” is chosen by an adversarial entity (“the Scheduler”). • The schedule must be fair (no player is indefinitely “starved” from best-replying).

Result [Jaggard-S-Wright] • Thm: If two (or more) pure Nash equilibria exist in a game, then asynchronous best-reply dynamics can potentially oscillate. • Implications for Internet protocols, diffusion of innovations in social networks, and more.
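
A small Python sketch of how such an oscillation can arise. It assumes a 2×2 game with two pure Nash equilibria, with payoffs 2,1 and 1,2 placed on the diagonal and 0,0 elsewhere; this arrangement, the starting profile, and the two schedules are assumptions made for illustration. Both schedules are fair, since every player is activated infinitely often.

```python
# Assumed payoff matrix: (row action, column action) -> (row payoff, column payoff).
PAYOFFS = {
    (0, 0): (2, 1),
    (0, 1): (0, 0),
    (1, 0): (0, 0),
    (1, 1): (1, 2),
}


def best_reply(player, profile):
    """Best reply of `player` to the other player's action in `profile`."""
    def payoff(action):
        candidate = list(profile)
        candidate[player] = action
        return PAYOFFS[tuple(candidate)][player]
    return max((0, 1), key=payoff)


def run(schedule, start, steps):
    """At each step, every player chosen by the schedule best-replies to the old profile."""
    profile = start
    history = [profile]
    for t in range(steps):
        updated = list(profile)
        for player in schedule(t):
            updated[player] = best_reply(player, profile)
        profile = tuple(updated)
        history.append(profile)
    return history


def simultaneous(t):
    return (0, 1)       # both players are activated at every step


def round_robin(t):
    return (t % 2,)     # one player at a time


print(run(simultaneous, (0, 1), 4))
# [(0, 1), (1, 0), (0, 1), (1, 0), (0, 1)]  -> oscillates forever
print(run(round_robin, (0, 1), 4))
# [(0, 1), (1, 1), (1, 1), (1, 1), (1, 1)]  -> converges to a PNE
```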

Directions for Future Research • Characterization of games for which asynchronous best-response dynamics converge. • More generally, exploring game dynamics in the realm that lies beyond synchronization (fictitious play, regret minimization).

Thank You!
