
Regret Minimization and Job Scheduling


Presentation Transcript


  1. Regret Minimization and Job Scheduling • Yishay Mansour, Tel Aviv University

  2. Regret Minimization: External

  3. Decision Making under uncertainty • Online algorithms • Stochastic models • Competitive analysis • Absolute performance criteria • A different approach: • Define “reasonable“ strategies • Compete with the best (in retrospect) • Relative performance criteria

  4. Routing • Model: Each day 1. select a path from source to destination 2. observe the latencies • The latencies may differ from day to day • Strategies: all source-destination paths • Loss: the average latency on the selected path • Performance Goal: match the latency of the best single path

  5. Financial Markets: options • Model: stock or cash. Each day, set the portfolio, then observe the outcome • Strategies: invest either all in stock or all in cash • Gain: based on the daily changes in the stock • Performance Goal: implements an option!

  6. Machine learning – Expert Advice • Model: each time 1. observe the expert predictions 2. predict a label • Strategies: experts (online learning algorithms) • Loss: errors • Performance Goal: match the error rate of the best expert, in retrospect (figure: experts 1-4 predicting 1, 0, 1, 1)

  7. Parameter Tuning • Model: Multiple parameters. • Strategies: settings of parameters • Optimization: any • Performance Goal: match the best setting of parameters

  8. Parameter Tuning • Development Cycle • develop product (software) • test performance • tune parameters • deliver “tuned” product • Challenge: can we combine • testing • tuning • runtime

  9. Regret Minimization: Model • Actions A = {1, …, N} • Time steps: t ∈ {1, …, T} • At time step t: • Agent selects a distribution pt(i) over A • Environment returns costs ct(i) ∈ [0,1] • Adversarial setting • Online loss: lt(on) = Σi ct(i) pt(i) • Cumulative loss: LT(on) = Σt lt(on)

  10. External Regret • Relative performance measure: compares to the best strategy in A, the basic class of strategies • Online cumulative loss: LT(on) = Σt lt(on) • Action i cumulative loss: LT(i) = Σt ct(i) • Best action: LT(best) = MINi {LT(i)} = MINi {Σt ct(i)} • External Regret = LT(on) − LT(best)
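A minimal sketch (not from the slides) of how these quantities can be computed from a play history, assuming the distributions pt and the cost vectors ct are given as Python lists:

# Sketch: external regret of a play history (notation of slides 9-10).
# pts: list of length-N distributions chosen by the agent
# cts: list of length-N cost vectors in [0, 1] returned by the environment
def external_regret(pts, cts):
    n = len(cts[0])
    online_loss = sum(sum(p[i] * c[i] for i in range(n))      # L_T(on)
                      for p, c in zip(pts, cts))
    action_loss = [sum(c[i] for c in cts) for i in range(n)]  # L_T(i)
    return online_loss - min(action_loss)                     # L_T(on) - L_T(best)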

  11. External Regret Algorithm • Goal: minimize the regret • Algorithm: track the regrets; weights proportional to the regret • Formally, at time t: • Compute the regret to each action: Yt(i) = Lt(on) − Lt(i), and rt(i) = MAX{Yt(i), 0} • pt+1(i) = rt(i) / Σi rt(i) • If all rt(i) = 0, select pt+1 arbitrarily • Notation: Rt = ⟨rt(1), …, rt(N)⟩ and ΔRt = Yt − Yt−1
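A small Python sketch (my own rendering of this rule, with a uniform distribution assumed as the arbitrary choice when all regrets are zero):

# Sketch of the regret-matching rule of slide 11: the next distribution puts
# weight on each action proportional to its positive regret so far.
def regret_matching(cost_stream, n):
    """cost_stream yields cost vectors c_t of length n, entries in [0, 1]."""
    L_on = 0.0               # cumulative online loss L_t(on)
    L = [0.0] * n            # cumulative per-action losses L_t(i)
    p = [1.0 / n] * n        # initial distribution (arbitrary)
    for c in cost_stream:
        L_on += sum(p[i] * c[i] for i in range(n))            # add l_t(on)
        for i in range(n):
            L[i] += c[i]
        r = [max(L_on - L[i], 0.0) for i in range(n)]         # r_t(i) = max{Y_t(i), 0}
        total = sum(r)
        p = [x / total for x in r] if total > 0 else [1.0 / n] * n
        yield p                                               # p_{t+1}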

  12. External Regret Algorithm: Analysis • Recall Rt = ⟨rt(1), …, rt(N)⟩ and ΔRt = Yt − Yt−1, so ΔRt(i) = lt(on) − ct(i) • LEMMA: ΔRt ∙ Rt−1 = 0 • Proof: ΔRt ∙ Rt−1 = Σi (lt(on) − ct(i)) rt−1(i), and since pt(i) = rt−1(i) / Σj rt−1(j), we have Σi lt(on) rt−1(i) = [Σi ct(i) pt(i)] Σi rt−1(i) = Σi ct(i) rt−1(i), so the two sums cancel • LEMMA: ‖Rt‖² ≤ ‖Rt−1‖² + ‖ΔRt‖² (using ΔRt ⊥ Rt−1)

  13. External regret: Bounds • Average regret goes to zero: “no regret” [Hannan 1957] • Explicit bounds: Littlestone & Warmuth ’94, CFHHSW ’97 • External regret = O(log N + √(T log N))
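The explicit bounds above are achieved by multiplicative-weights style algorithms; a minimal Hedge-style sketch (the learning rate below is the standard tuning and is my assumption, not taken from the slides):

import math

# Sketch of a Hedge / multiplicative-weights learner in the spirit of
# Littlestone-Warmuth and CFHHSW; with eta = sqrt(ln N / T) its external
# regret is O(sqrt(T log N)).
def hedge(cost_stream, n, T):
    eta = math.sqrt(math.log(n) / T)
    w = [1.0] * n
    for c in cost_stream:
        total = sum(w)
        yield [wi / total for wi in w]                        # distribution p_t
        w = [wi * math.exp(-eta * ci) for wi, ci in zip(w, c)]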

  14. Regret Minimization: Internal/Swap

  15. Dominated Actions • Model: action y dominates x if y is always better than x, e.g. costs of x: .3 .8 .9 .6 .3 versus costs of y: .2 .4 .7 .3 .1 • Goal: do not play dominated actions • Goal (unknown model): the fraction of times we play dominated actions is vanishing

  16. Internal/Swap Regret • Internal Regret: Regret(x,y) = Σ{t: a(t)=x} (ct(x) − ct(y)); Internal Regret = maxx,y Regret(x,y) • Swap Regret = Σx maxy Regret(x,y) • Swap regret ≥ External regret, since Σx maxy Regret(x,y) ≥ maxy Σx Regret(x,y) • Mixed actions: Regret(x,y) = Σt (ct(x) − ct(y)) pt(x)
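A short sketch (mine, assuming a history of pure actions a(t)) that computes both regret notions:

# Sketch: internal and swap regret of a history of pure actions (slide 16).
# actions: list of played actions a(t) in {0, ..., n-1}
# cts:     list of cost vectors c_t of length n
def internal_and_swap_regret(actions, cts, n):
    regret = [[0.0] * n for _ in range(n)]        # regret[x][y] = Regret(x, y)
    for a, c in zip(actions, cts):
        for y in range(n):
            regret[a][y] += c[a] - c[y]
    internal = max(regret[x][y] for x in range(n) for y in range(n))
    swap = sum(max(regret[x]) for x in range(n))  # each max includes y = x, i.e. 0
    return internal, swap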

  17. Dominated Actions and Regret • Assume action y dominates action x • For any t: ct(x) > ct(y) + δ • Assume we used action x for n times • Then Regret(x,y) > δn • If SwapRegret < R, then • the fraction of time a dominated action is used • is at most R/δ

  18. Calibration • Model: each step predict the probability of rain, then observe the outcome • Goal: predictions calibrated with the outcomes: during the time steps where the prediction is p, the average outcome is (approximately) p • Example: predictions .3, .5, .3, .5, .3; calibration: on the “.3” steps the average outcome is 1/3, on the “.5” steps it is 1/2
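A tiny sketch (mine) of this calibration check: group the time steps by the predicted value and compare each group's prediction with its empirical average outcome.

from collections import defaultdict

# Sketch: per-prediction calibration table (slide 18). A forecaster is
# calibrated when table[p] is close to p for every predicted value p.
def calibration_table(predictions, outcomes):
    buckets = defaultdict(list)
    for p, o in zip(predictions, outcomes):
        buckets[p].append(o)
    return {p: sum(obs) / len(obs) for p, obs in buckets.items()}

# Slide example: predictions .3, .5, .3, .5, .3 are calibrated if the average
# outcome is 1/3 on the ".3" steps and 1/2 on the ".5" steps.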

  19. Calibration to Regret • Reduction to swap/internal regret • Discrete probabilities, say {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} • Loss of action x at time t: (x − ct)² • y*(x) = argmaxy Regret(x,y); y*(x) = avg(ct | x) • Consider Regret(x, y*(x))

  20. Internal regret • No internal regret: [Foster & Vohra], [Hart & Mas-Colell], based on the approachability theorem [Blackwell ’56] • Explicit bounds: [Cesa-Bianchi & Lugosi ’03] Internal regret = O(log N + √(T log N)); [Blum & Mansour] Swap regret = O(log N + √(TN))

  21. Regret: External vs Internal • External regret: “you should have bought S&P 500”; match boy i to girl i • Internal regret: “each time you bought IBM you should have bought SUN”; stable matching • Limitations: no state, additive over time

  22. Regret Minimization: Dynamics [Even-Dar, Mansour, Nadav, 2009]

  23. Routing Games • Atomic: a finite number of players • Player i transfers flow from si to ti • Splittable flows • Latency on edge e: Le(flow on e), e.g. Le(f1,L + f2,T) • Costi = Σp∈(si,ti) Latency(p) · flowi(p) • (Figure: two-commodity network with flows f1,L, f1,R, f2,T, f2,B)

  24. Cournot Oligopoly [Cournot 1838] • Firms select production levels X and Y • The market price P depends on the TOTAL supply • Firms maximize their profit = revenue − cost, with costs Cost1(X), Cost2(Y) • Best-response dynamics converges for 2 players [Cournot 1838] • A two-player oligopoly is a super-modular game [Milgrom, Roberts 1990] • Diverges for n ≥ 5 [Theocharis 1960]

  25. Resource Allocation Games • Advertisers set budgets, e.g. $25M, $5M, $10M, $17M • Each advertiser wins a proportional market share: the $25M advertiser's allocated rate = 25 / (5 + 10 + 17 + 25) • Utility: concave utility from the allocated rate, quasi-linear with money, e.g. U = f(rate) − $25M • The best-response dynamics generally diverges for linear resource allocation games

  26. Socially Concave Games • Properties of selfish routing, Cournot oligopoly and resource allocation games: • Closed convex strategy set • A (weighted) social welfare is concave: there exist λ1, …, λn > 0 such that λ1 u1(x) + λ2 u2(x) + … + λn un(x) is concave • The utility of a player is convex in the vector of actions of the other players

  27. The relation between socially concave games and concave games • Concave Games [Rosen 65]: the utility of a player is strictly concave in her own strategy; a sufficient condition for equilibrium uniqueness • (Diagram: Socially Concave Games vs. Concave Games, with atomic splittable routing, normal form games with mixed strategies, Cournot, zero-sum games, resource allocation, and a unique-Nash-equilibrium region)

  28. The average action and average utility converge to NE • If each player uses a no-regret procedure in a socially concave game, then their joint play converges to Nash equilibrium: • Theorem 1: The average action profile (over days 1…T) is an ε(T)-Nash equilibrium, converging to NE • Theorem 2: The average daily payoff of each player converges to her payoff in NE

  29. Convergence of the “average action” and the “average payoff” are two different things! • (Figures: an s-t network played one way on even days and the other way on odd days) • Here the average action converges to (½, ½) for every player • But the average cost is 2, while the average cost in NE is 1

  30. The Action Profile Itself Need Not Converge • (Figures: the s-t network configuration on even days versus on odd days)

  31. Correlated Equilibrium • CE: a joint distribution Q • Each time t, a joint action is drawn from Q • Each player's action is a best response given Q • Theorem [HM, FV]: multiple players playing with low internal (swap) regret converge to CE

  32. Regret Minimization and Job Scheduling [Even-Dar, Kleinberg, Mannor, Mansour, 2009]

  33. Job Scheduling: Motivating Example • GOAL: minimize the load on the servers • (Figure: users send jobs to a load balancer, which assigns them to servers)

  34. Job scheduling vs. regret minimization • Online algorithms (job scheduling): N unrelated machines; each time step a job arrives with different loads on the different machines; the algorithm schedules the job on some machine, given its loads; goal: minimize the loads (makespan or L2) • Regret minimization: N actions (machines); each time step the algorithm first selects an action (machine), then observes the losses (job loads); goal: minimize the sum of losses

  35. Modeling Differences: Information • Information model: • what does the algorithm know when it selects action/machine • Known cost: • First observe costs then select action • job scheduling • Unknown cost: • First select action then observe costs • Regret Minimization

  36. Modeling Differences: Performance • Theoretical Performance measure: • comparison class • job scheduling: best (offline) assignment • regret minimization: best static algorithm • Guarantees: • job scheduling: multiplicative • regret minimization: additive and vanishing. • Objective function: • job scheduling: global (makespan) • regret minimization: additive.

  37. Formal Model • N actions • Each time step t the algorithm ON • selects a (fractional) action: pt(i) • observes losses ct(i) ∈ [0,1] • Average losses of ON for action i at time T: ONT(i) = (1/T) Σt pt(i) ct(i) • Global cost function: • C∞(ONT(1), …, ONT(N)) = maxi ONT(i) • Cd(ONT(1), …, ONT(N)) = [Σi (ONT(i))d]1/d
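A minimal sketch (mine) of these definitions in Python, assuming the play history (distributions pt and losses ct) is available:

# Sketch: average per-action losses ON_T(i) and the global cost functions
# C_infinity (makespan) and C_d of slide 37.
def average_losses(pts, cts):
    T, n = len(cts), len(cts[0])
    return [sum(p[i] * c[i] for p, c in zip(pts, cts)) / T for i in range(n)]

def makespan(on):                 # C_infinity
    return max(on)

def l_d_norm(on, d):              # C_d, d > 1
    return sum(x ** d for x in on) ** (1.0 / d)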

  38. Formal Model • Static Optimum: consider any fixed distribution α and play α every time; the static optimum α* minimizes the cost C • Formally: • Let α ◊ L = (α(1)L(1), …, α(N)L(N)) denote the Hadamard (or Schur) product • Best fixed α*(L) = arg minα C(α ◊ L), where LT(i) = (1/T) Σt ct(i) • Static optimality: C*(L) = C(α*(L) ◊ L)

  39. Example • Two machines, makespan: observed loads L1 = 4, L2 = 2; best fixed α*(L) = (1/3, 2/3); final loads 4/3 and 4/3
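For the makespan objective the static optimum has a closed form: balancing α(i)·L(i) across machines gives α*(i) ∝ 1/L(i). A small sketch (mine, assuming all L(i) > 0) that reproduces the example above:

# Sketch: static optimum for the makespan cost, alpha*(i) proportional to
# 1/L(i), so that alpha*(i) * L(i) is equal on all machines.
def static_opt_makespan(L):
    inv = [1.0 / x for x in L]
    s = sum(inv)
    alpha = [x / s for x in inv]
    return alpha, 1.0 / s            # (alpha*, C*(L))

alpha, cost = static_opt_makespan([4, 2])
# alpha ≈ [1/3, 2/3] and cost ≈ 4/3, matching the example above.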

  40. Our Results: Adversarial General • General Feasibility Result: • Assume C convex and C* concave • includes makespan and Ld norm for d>1. • There exists an online algorithm ON, which for any loss sequence L: C(ON) < C*(L) + o(1) • Rate of convergence about √N/T

  41. Our Results: Adversarial Makespan • Makespan Algorithm: there exists an algorithm ON such that for any loss sequence L: C(ON) < C*(L) + O(log² N / √T) • Benefits: very simple and intuitive; improved regret bound

  42. Our Results: Adversarial Lower Bound • We show that for many non-convex C there is a non-vanishing regret • Includes the Ld norm for d < 1 • Non-vanishing regret ⇒ ratio > 1: there is a sequence of losses L such that C(ON) > (1+γ) C*(L), where γ > 0

  43. Preliminary: Local vs. Global • Partition time into blocks B1, B2, …, Bk • Low regret in each block ⇒ overall low regret

  44. Preliminary: Local vs. Global • LEMMA: Assume C is convex and C* is concave, and assume a partition of time into blocks Bi with regret at most Ri in each block Bi. Then: C(ON) − C*(L) ≤ Σi Ri

  45. Preliminary: Local vs. Global • Proof: • C(ON) ≤ Σi C(ON(Bi)) (C is convex) • Σi C*(L(Bi)) ≤ C*(L) (C* is concave) • C(ON(Bi)) − C*(L(Bi)) ≤ Ri (low regret in each Bi) • Σi [C(ON(Bi)) − C*(L(Bi))] ≤ Σi Ri • Hence C(ON) − C*(L) ≤ Σi Ri. QED • Enough to bound the regret on subsets.

  46. Example • Two machines, two time steps; arrival losses (4, 2) at t=1 and (2, 4) at t=2 • Local (per-step) opt α*: (1/3, 2/3) at t=1 and (2/3, 1/3) at t=2, cost = 4/3 • Global offline opt: (0,1) at t=1 and (1,0) at t=2, cost = 1 • Static opt α* = (1/2, 1/2), cost = 3/2

  47. Stochastic case • Each time t the costs are drawn from a joint distribution, i.i.d. over time steps (not between actions) • INTUITION: assume two actions (machines) • Load distribution: with probability ½: (1,0); with probability ½: (0,1) • Which policy minimizes the makespan regret?! • Regret components: MAX(L(1), L(2)) = sum/2 + |Δ|/2, where sum = L(1) + L(2) and Δ = L(1) − L(2)

  48. Stochastic case: Static OPT • Natural choice (model based): always select action (½, ½) • Observations: • Assume T/2+Δ times (1,0) and T/2−Δ times (0,1) • Loads: (T/4 + Δ/2, T/4 − Δ/2) • Makespan = T/4 + Δ/2 > T/4 • Static OPT: T/4 − Δ²/T < T/4 • W.h.p. OPT is T/4 − O(1) • sum = T/2 and E[|Δ|] = O(√T) • Regret = O(√T)

  49. Can we do better ?!

  50. Stochastic case: Least Loaded • Least loaded machine: select the machine with the lower current load • Observations: • The machines have (almost) the same load: |Δ| ≤ 1 • Sum of loads: E[sum] = T/2 • Expected makespan = T/4 • Regret: • Least Loaded Makespan LLM = T/4 ± √T • Regret = MAX{LLM − T/4, 0} = O(√T) • The regret considers only the “bad” regret
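A small simulation sketch (mine, with assumed parameters such as the horizon T and the random seed) comparing the fixed (½, ½) policy of slide 48 with the least-loaded rule above:

import random

# Sketch: the stochastic two-machine example of slides 47-50. Each step the
# loss vector is (1,0) or (0,1) with probability 1/2. The fixed (1/2,1/2)
# policy ends with makespan about T/4 + O(sqrt(T)); the least-loaded rule
# keeps the two loads within 1 of each other.
def simulate(T, seed=0):
    rng = random.Random(seed)
    fixed = [0.0, 0.0]       # loads under the fixed (1/2, 1/2) policy
    least = [0.0, 0.0]       # loads under the least-loaded policy
    for _ in range(T):
        c = (1.0, 0.0) if rng.random() < 0.5 else (0.0, 1.0)
        fixed[0] += 0.5 * c[0]
        fixed[1] += 0.5 * c[1]
        i = 0 if least[0] <= least[1] else 1     # pick the least loaded machine
        least[i] += c[i]
    return max(fixed), max(least)                # the two final makespans

# e.g. simulate(10_000) compares both against the static OPT of about T/4.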
