
Regret Minimization and Job Scheduling


Presentation Transcript


  1. Regret Minimization and Job Scheduling • Yishay Mansour, Tel Aviv University

  2. Regret Minimization: External

  3. Decision Making under uncertainty • Online algorithms • Stochastic models • Competitive analysis • Absolute performance criteria • A different approach: • Define “reasonable“ strategies • Compete with the best (in retrospect) • Relative performance criteria

  4. Routing • Model: Each day 1. select a path from source to destination 2. observe the latencies • The latencies may differ from day to day • Strategies: all source-destination paths • Loss: the average latency on the selected path • Performance Goal: match the latency of the best single path

  5. Financial Markets: options • Model: stock or cash. Each day, set the portfolio, then observe the outcome • Strategies: invest either all in stock or all in cash • Gain: based on the daily changes in the stock • Performance Goal: implements an option!

  6. Machine learning – Expert Advice • Model: each time 1. observe the expert predictions 2. predict a label • Strategies: experts (online learning algorithms) • Loss: errors • Performance Goal: match the error rate of the best expert, in retrospect (figure: experts 1-4 predicting 1, 0, 1, 1)

  7. Parameter Tuning • Model: Multiple parameters. • Strategies: settings of parameters • Optimization: any • Performance Goal: match the best setting of parameters

  8. Parameter Tuning • Development Cycle • develop product (software) • test performance • tune parameters • deliver “tuned” product • Challenge: can we combine • testing • tuning • runtime

  9. Regret Minimization: Model • Actions A = {1, …, N} • Time steps: t ∈ {1, …, T} • At time step t: • Agent selects a distribution pt(i) over A • Environment returns costs ct(i) ∈ [0,1] • Adversarial setting • Online loss: lt(on) = Σi ct(i) pt(i) • Cumulative loss: LT(on) = Σt lt(on)

  10. External Regret • Relative performance measure: compares to the best strategy in A, the basic class of strategies • Online cumulative loss: LT(on) = Σt lt(on) • Action i cumulative loss: LT(i) = Σt ct(i) • Best action: LT(best) = MINi {LT(i)} = MINi {Σt ct(i)} • External Regret = LT(on) − LT(best)
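A minimal sketch (not from the slides) of how these quantities can be computed from a play history, assuming the distributions pt and the cost vectors ct are given as Python lists:

# Sketch: external regret of a play history (notation of slides 9-10).
# pts: list of length-N distributions chosen by the agent
# cts: list of length-N cost vectors in [0, 1] returned by the environment
def external_regret(pts, cts):
    n = len(cts[0])
    online_loss = sum(sum(p[i] * c[i] for i in range(n))      # L_T(on)
                      for p, c in zip(pts, cts))
    action_loss = [sum(c[i] for c in cts) for i in range(n)]  # L_T(i)
    return online_loss - min(action_loss)                     # L_T(on) - L_T(best)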

  11. External Regret Algorithm • Goal: minimize the regret • Algorithm: track the regrets; weights proportional to the regret • Formally, at time t: • Compute the regret to each action: Yt(i) = Lt(on) − Lt(i), and rt(i) = MAX{Yt(i), 0} • pt+1(i) = rt(i) / Σi rt(i) • If all rt(i) = 0, select pt+1 arbitrarily • Notation: Rt = ⟨rt(1), …, rt(N)⟩ and ΔRt = Yt − Yt−1
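A small Python sketch (my own rendering of this rule, with a uniform distribution assumed as the arbitrary choice when all regrets are zero):

# Sketch of the regret-matching rule of slide 11: the next distribution puts
# weight on each action proportional to its positive regret so far.
def regret_matching(cost_stream, n):
    """cost_stream yields cost vectors c_t of length n, entries in [0, 1]."""
    L_on = 0.0               # cumulative online loss L_t(on)
    L = [0.0] * n            # cumulative per-action losses L_t(i)
    p = [1.0 / n] * n        # initial distribution (arbitrary)
    for c in cost_stream:
        L_on += sum(p[i] * c[i] for i in range(n))            # add l_t(on)
        for i in range(n):
            L[i] += c[i]
        r = [max(L_on - L[i], 0.0) for i in range(n)]         # r_t(i) = max{Y_t(i), 0}
        total = sum(r)
        p = [x / total for x in r] if total > 0 else [1.0 / n] * n
        yield p                                               # p_{t+1}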

  12. External Regret Algorithm: Analysis • Recall Rt = ⟨rt(1), …, rt(N)⟩ and ΔRt = Yt − Yt−1, so ΔRt(i) = lt(on) − ct(i) • LEMMA: ΔRt ∙ Rt−1 = 0 • Proof: ΔRt ∙ Rt−1 = Σi (lt(on) − ct(i)) rt−1(i), and since pt(i) = rt−1(i) / Σj rt−1(j), we have Σi lt(on) rt−1(i) = [Σi ct(i) pt(i)] Σi rt−1(i) = Σi ct(i) rt−1(i), so the two sums cancel • LEMMA: ‖Rt‖² ≤ ‖Rt−1‖² + ‖ΔRt‖² (using ΔRt ⊥ Rt−1)

  13. External regret: Bounds • Average regret goes to zero: “no regret” [Hannan 1957] • Explicit bounds: Littlestone & Warmuth ’94, CFHHSW ’97 • External regret = O(log N + √(T log N))
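The explicit bounds above are achieved by multiplicative-weights style algorithms; a minimal Hedge-style sketch (the learning rate below is the standard tuning and is my assumption, not taken from the slides):

import math

# Sketch of a Hedge / multiplicative-weights learner in the spirit of
# Littlestone-Warmuth and CFHHSW; with eta = sqrt(ln N / T) its external
# regret is O(sqrt(T log N)).
def hedge(cost_stream, n, T):
    eta = math.sqrt(math.log(n) / T)
    w = [1.0] * n
    for c in cost_stream:
        total = sum(w)
        yield [wi / total for wi in w]                        # distribution p_t
        w = [wi * math.exp(-eta * ci) for wi, ci in zip(w, c)]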

  14. Regret Minimization: Internal/Swap

  15. Dominated Actions • Model: action y dominates x if y is always better than x, e.g. costs of x: .3 .8 .9 .6 .3 versus costs of y: .2 .4 .7 .3 .1 • Goal: do not play dominated actions • Goal (unknown model): the fraction of times we play dominated actions is vanishing

  16. Internal/Swap Regret • Internal Regret: Regret(x,y) = Σ{t: a(t)=x} (ct(x) − ct(y)); Internal Regret = maxx,y Regret(x,y) • Swap Regret = Σx maxy Regret(x,y) • Swap regret ≥ External regret, since Σx maxy Regret(x,y) ≥ maxy Σx Regret(x,y) • Mixed actions: Regret(x,y) = Σt (ct(x) − ct(y)) pt(x)
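A short sketch (mine, assuming a history of pure actions a(t)) that computes both regret notions:

# Sketch: internal and swap regret of a history of pure actions (slide 16).
# actions: list of played actions a(t) in {0, ..., n-1}
# cts:     list of cost vectors c_t of length n
def internal_and_swap_regret(actions, cts, n):
    regret = [[0.0] * n for _ in range(n)]        # regret[x][y] = Regret(x, y)
    for a, c in zip(actions, cts):
        for y in range(n):
            regret[a][y] += c[a] - c[y]
    internal = max(regret[x][y] for x in range(n) for y in range(n))
    swap = sum(max(regret[x]) for x in range(n))  # each max includes y = x, i.e. 0
    return internal, swap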

  17. Dominated Actions and Regret • Assume action y dominates action x • For any t: ct(x) > ct(y) + δ • Assume we used action x for n times • Then Regret(x,y) > δn • If SwapRegret < R, then • the fraction of time a dominated action is used • is at most R/δ

  18. Calibration • Model: each step predict the probability of rain, then observe the outcome • Goal: predictions calibrated with the outcomes: during the time steps where the prediction is p, the average outcome is (approximately) p • Example: predictions .3, .5, .3, .5, .3; calibration: on the “.3” steps the average outcome is 1/3, on the “.5” steps it is 1/2
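A tiny sketch (mine) of this calibration check: group the time steps by the predicted value and compare each group's prediction with its empirical average outcome.

from collections import defaultdict

# Sketch: per-prediction calibration table (slide 18). A forecaster is
# calibrated when table[p] is close to p for every predicted value p.
def calibration_table(predictions, outcomes):
    buckets = defaultdict(list)
    for p, o in zip(predictions, outcomes):
        buckets[p].append(o)
    return {p: sum(obs) / len(obs) for p, obs in buckets.items()}

# Slide example: predictions .3, .5, .3, .5, .3 are calibrated if the average
# outcome is 1/3 on the ".3" steps and 1/2 on the ".5" steps.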

  19. Calibration to Regret • Reduction to swap/internal regret • Discrete probabilities, say {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} • Loss of action x at time t: (x − ct)² • y*(x) = argmaxy Regret(x,y); y*(x) = avg(ct | x) • Consider Regret(x, y*(x))

  20. Internal regret • No internal regret: [Foster & Vohra], [Hart & Mas-Colell], based on the approachability theorem [Blackwell ’56] • Explicit bounds: [Cesa-Bianchi & Lugosi ’03] Internal regret = O(log N + √(T log N)); [Blum & Mansour] Swap regret = O(log N + √(TN))

  21. Regret: External vs Internal • External regret: “you should have bought S&P 500”; match boy i to girl i • Internal regret: “each time you bought IBM you should have bought SUN”; stable matching • Limitations: no state, additive over time

  22. Regret Minimization: Dynamics [Even-Dar, Mansour, Nadav, 2009]

  23. Routing Games • Atomic: a finite number of players • Player i transfers flow from si to ti • Splittable flows • Latency on edge e: Le(flow on e), e.g. Le(f1,L + f2,T) • Costi = Σp∈(si,ti) Latency(p) · flowi(p) • (Figure: two-commodity network with flows f1,L, f1,R, f2,T, f2,B)

  24. Cournot Oligopoly [Cournot 1838] • Firms select production levels X and Y • The market price P depends on the TOTAL supply • Firms maximize their profit = revenue − cost, with costs Cost1(X), Cost2(Y) • Best-response dynamics converges for 2 players [Cournot 1838] • A two-player oligopoly is a super-modular game [Milgrom, Roberts 1990] • Diverges for n ≥ 5 [Theocharis 1960]

  25. Resource Allocation Games • Advertisers set budgets, e.g. $25M, $5M, $10M, $17M • Each advertiser wins a proportional market share: the $25M advertiser's allocated rate = 25 / (5 + 10 + 17 + 25) • Utility: concave utility from the allocated rate, quasi-linear with money, e.g. U = f(rate) − $25M • The best-response dynamics generally diverges for linear resource allocation games

  26. Socially Concave Games • Properties of selfish routing, Cournot oligopoly and resource allocation games: • Closed convex strategy set • A (weighted) social welfare is concave: there exist λ1, …, λn > 0 such that λ1 u1(x) + λ2 u2(x) + … + λn un(x) is concave • The utility of a player is convex in the vector of actions of the other players

  27. The relation between socially concave games and concave games • Concave Games [Rosen 65]: the utility of a player is strictly concave in her own strategy; a sufficient condition for equilibrium uniqueness • (Diagram: Socially Concave Games vs. Concave Games, with atomic splittable routing, normal form games with mixed strategies, Cournot, zero-sum games, resource allocation, and a unique-Nash-equilibrium region)

  28. The average action and average utility converge to NE • If each player uses a no-regret procedure in a socially concave game, then their joint play converges to Nash equilibrium: • Theorem 1: The average action profile (over days 1…T) is an ε(T)-Nash equilibrium, converging to NE • Theorem 2: The average daily payoff of each player converges to her payoff in NE

  29. Convergence of the “average action” and the “average payoff” are two different things! • (Figures: an s-t network played one way on even days and the other way on odd days) • Here the average action converges to (½, ½) for every player • But the average cost is 2, while the average cost in NE is 1

  30. The Action Profile Itself Need Not Converge • (Figures: the s-t network configuration on even days versus on odd days)

  31. Correlated Equilibrium • CE: a joint distribution Q • Each time t, a joint action is drawn from Q • Each player's action is a best response given Q • Theorem [HM, FV]: multiple players playing with low internal (swap) regret converge to CE

  32. Regret Minimization and Job Scheduling [Even-Dar, Kleinberg, Mannor, Mansour, 2009]

  33. Job Scheduling: Motivating Example • GOAL: minimize the load on the servers • (Figure: users send jobs to a load balancer, which assigns them to servers)

  34. Job scheduling vs. regret minimization • Online algorithms (job scheduling): N unrelated machines; each time step a job arrives with different loads on the different machines; the algorithm schedules the job on some machine, given its loads; goal: minimize the loads (makespan or L2) • Regret minimization: N actions (machines); each time step the algorithm first selects an action (machine), then observes the losses (job loads); goal: minimize the sum of losses

  35. Modeling Differences: Information • Information model: • what does the algorithm know when it selects action/machine • Known cost: • First observe costs then select action • job scheduling • Unknown cost: • First select action then observe costs • Regret Minimization

  36. Modeling Differences: Performance • Theoretical Performance measure: • comparison class • job scheduling: best (offline) assignment • regret minimization: best static algorithm • Guarantees: • job scheduling: multiplicative • regret minimization: additive and vanishing. • Objective function: • job scheduling: global (makespan) • regret minimization: additive.

  37. Formal Model • N actions • Each time step t the algorithm ON • selects a (fractional) action: pt(i) • observes losses ct(i) ∈ [0,1] • Average losses of ON for action i at time T: ONT(i) = (1/T) Σt pt(i) ct(i) • Global cost function: • C∞(ONT(1), …, ONT(N)) = maxi ONT(i) • Cd(ONT(1), …, ONT(N)) = [Σi (ONT(i))d]1/d
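A minimal sketch (mine) of these definitions in Python, assuming the play history (distributions pt and losses ct) is available:

# Sketch: average per-action losses ON_T(i) and the global cost functions
# C_infinity (makespan) and C_d of slide 37.
def average_losses(pts, cts):
    T, n = len(cts), len(cts[0])
    return [sum(p[i] * c[i] for p, c in zip(pts, cts)) / T for i in range(n)]

def makespan(on):                 # C_infinity
    return max(on)

def l_d_norm(on, d):              # C_d, d > 1
    return sum(x ** d for x in on) ** (1.0 / d)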

  38. Formal Model • Static Optimum: consider any fixed distribution α and play α every time; the static optimum α* minimizes the cost C • Formally: • Let α ◊ L = (α(1)L(1), …, α(N)L(N)) denote the Hadamard (or Schur) product • Best fixed α*(L) = arg minα C(α ◊ L), where LT(i) = (1/T) Σt ct(i) • Static optimality: C*(L) = C(α*(L) ◊ L)

  39. Example • Two machines, makespan: observed loads L1 = 4, L2 = 2; best fixed α*(L) = (1/3, 2/3); final loads 4/3 and 4/3
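For the makespan objective the static optimum has a closed form: balancing α(i)·L(i) across machines gives α*(i) ∝ 1/L(i). A small sketch (mine, assuming all L(i) > 0) that reproduces the example above:

# Sketch: static optimum for the makespan cost, alpha*(i) proportional to
# 1/L(i), so that alpha*(i) * L(i) is equal on all machines.
def static_opt_makespan(L):
    inv = [1.0 / x for x in L]
    s = sum(inv)
    alpha = [x / s for x in inv]
    return alpha, 1.0 / s            # (alpha*, C*(L))

alpha, cost = static_opt_makespan([4, 2])
# alpha ≈ [1/3, 2/3] and cost ≈ 4/3, matching the example above.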

  40. Our Results: Adversarial General • General Feasibility Result: • Assume C convex and C* concave • includes makespan and Ld norm for d>1. • There exists an online algorithm ON, which for any loss sequence L: C(ON) < C*(L) + o(1) • Rate of convergence about √N/T

  41. Our Results: Adversarial Makespan • Makespan Algorithm: there exists an algorithm ON such that for any loss sequence L: C(ON) < C*(L) + O(log² N / √T) • Benefits: very simple and intuitive; improved regret bound

  42. Our Results: Adversarial Lower Bound • We show that for many non-convex C there is a non-vanishing regret • Includes the Ld norm for d < 1 • Non-vanishing regret ⇒ ratio > 1: there is a sequence of losses L such that C(ON) > (1+γ) C*(L), where γ > 0

  43. Preliminary: Local vs. Global • Partition time into blocks B1, B2, …, Bk • Low regret in each block ⇒ overall low regret

  44. Preliminary: Local vs. Global • LEMMA: Assume C is convex and C* is concave, and assume a partition of time into blocks Bi with regret at most Ri in each block Bi. Then: C(ON) − C*(L) ≤ Σi Ri

  45. Preliminary: Local vs. Global • Proof: • C(ON) ≤ Σi C(ON(Bi)) (C is convex) • Σi C*(L(Bi)) ≤ C*(L) (C* is concave) • C(ON(Bi)) − C*(L(Bi)) ≤ Ri (low regret in each Bi) • Σi [C(ON(Bi)) − C*(L(Bi))] ≤ Σi Ri • Hence C(ON) − C*(L) ≤ Σi Ri. QED • Enough to bound the regret on subsets.

  46. Example • Two machines, two time steps; arrival losses (4, 2) at t=1 and (2, 4) at t=2 • Local (per-step) opt α*: (1/3, 2/3) at t=1 and (2/3, 1/3) at t=2, cost = 4/3 • Global offline opt: (0,1) at t=1 and (1,0) at t=2, cost = 1 • Static opt α* = (1/2, 1/2), cost = 3/2

  47. Stochastic case • Each time t the costs are drawn from a joint distribution, i.i.d. over time steps (not between actions) • INTUITION: assume two actions (machines) • Load distribution: with probability ½: (1,0); with probability ½: (0,1) • Which policy minimizes the makespan regret?! • Regret components: MAX(L(1), L(2)) = sum/2 + |Δ|/2, where sum = L(1) + L(2) and Δ = L(1) − L(2)

  48. Stochastic case: Static OPT • Natural choice (model based): always select action (½, ½) • Observations: • Assume T/2+Δ times (1,0) and T/2−Δ times (0,1) • Loads: (T/4 + Δ/2, T/4 − Δ/2) • Makespan = T/4 + Δ/2 > T/4 • Static OPT: T/4 − Δ²/T < T/4 • W.h.p. OPT is T/4 − O(1) • sum = T/2 and E[|Δ|] = O(√T) • Regret = O(√T)

  49. Can we do better ?!

  50. Stochastic case: Least Loaded • Least loaded machine: select the machine with the lower current load • Observations: • The machines have (almost) the same load: |Δ| ≤ 1 • Sum of loads: E[sum] = T/2 • Expected makespan = T/4 • Regret: • Least Loaded Makespan LLM = T/4 ± √T • Regret = MAX{LLM − T/4, 0} = O(√T) • The regret considers only the “bad” regret
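A small simulation sketch (mine, with assumed parameters such as the horizon T and the random seed) comparing the fixed (½, ½) policy of slide 48 with the least-loaded rule above:

import random

# Sketch: the stochastic two-machine example of slides 47-50. Each step the
# loss vector is (1,0) or (0,1) with probability 1/2. The fixed (1/2,1/2)
# policy ends with makespan about T/4 + O(sqrt(T)); the least-loaded rule
# keeps the two loads within 1 of each other.
def simulate(T, seed=0):
    rng = random.Random(seed)
    fixed = [0.0, 0.0]       # loads under the fixed (1/2, 1/2) policy
    least = [0.0, 0.0]       # loads under the least-loaded policy
    for _ in range(T):
        c = (1.0, 0.0) if rng.random() < 0.5 else (0.0, 1.0)
        fixed[0] += 0.5 * c[0]
        fixed[1] += 0.5 * c[1]
        i = 0 if least[0] <= least[1] else 1     # pick the least loaded machine
        least[i] += c[i]
    return max(fixed), max(least)                # the two final makespans

# e.g. simulate(10_000) compares both against the static OPT of about T/4.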
