Regret to the Best vs. Regret to the Average

Presentation Transcript


  1. Regret to the Best vs. Regret to the Average. Eyal Even-Dar, Computer and Information Science, University of Pennsylvania. Collaborators: Michael Kearns (Penn), Yishay Mansour (Tel Aviv), Jenn Wortman (Penn)

  2. The No-Regret Setting
  • Learner maintains a weighting over N “experts”
  • On each of T trials, the learner observes the payoffs of all N experts
  • Payoff to the learner = weighted payoff
  • Learner then dynamically adjusts weights
  • Let R_{i,T} be the cumulative payoff of expert i on some sequence of T trials
  • Let R_{A,T} be the cumulative payoff of learning algorithm A
  • Classical no-regret results: we can produce a learning algorithm A such that on any sequence of trials, R_{A,T} ≥ max_i R_{i,T} − sqrt(log(N)·T)
  • “No regret”: per-trial regret sqrt(log(N)/T) approaches 0 as T grows
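
As an illustration of this setting (my addition, not part of the original slides), here is a minimal Python sketch that measures regret to the best and to the average for an arbitrary weight-update rule; the default static uniform weighting earns exactly the average payoff, which is the point made on the next slide.

```python
# Minimal sketch (not from the talk): measuring regret to the best and to the
# average in the experts setting.  The learner's update rule is a plug-in; with
# no update, the static uniform weighting earns exactly the average payoff.
import numpy as np

def run(gains, update=None):
    """gains: T x N array of per-trial expert payoffs in [0, 1]."""
    T, N = gains.shape
    w = np.ones(N) / N                      # start with uniform weights
    learner_payoff = 0.0
    for t in range(T):
        learner_payoff += w @ gains[t]      # weighted payoff this trial
        if update is not None:
            w = update(w, gains[t])         # dynamically adjust weights
    R = gains.sum(axis=0)                   # cumulative payoff of each expert
    regret_best = R.max() - learner_payoff
    regret_avg = R.mean() - learner_payoff
    return regret_best, regret_avg

rng = np.random.default_rng(0)
g = rng.integers(0, 2, size=(1000, 5)).astype(float)
print(run(g))   # with no update, regret to the average is exactly 0
```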

  3. This Work
  • We simultaneously examine:
    • Regret to the best expert in hindsight
    • Regret to the average return of all experts
      • Note that no learning is required to achieve just this!
  • Why look at the average?
    • A “safety net” or “sanity check”
    • A baseline that even a simple algorithm achieves
    • Future direction: the S&P 500
  • We assume a fixed horizon T
    • But this can easily be relaxed…

  4. Our Results
  • Every difference-based algorithm with O(T^α) regret to the best expert has Ω(T^(1−α)) regret to the average
  • There is a simple difference-based algorithm achieving this tradeoff
  • Every algorithm with O(T^(1/2)) regret to the best expert must have Ω(T^(1/2)) regret to the average
  • We can produce an algorithm with O(log T · T^(1/2)) regret to the best and O(1) regret to the average

  5. Oscillations: The Cost of an Update
  • Consider 2 experts with instantaneous gains in {0,1}
  • Let w be the weight on the first expert and initialize w = ½
  • Suppose expert 1 gets a gain of 1 on the first time step, and expert 2 gets a gain of 1 on the second…
  • [Figure: after gains (1,0) the weight moves from w to w + Δ; after gains (0,1) it returns to w]
  • Best, worst, and average all earn 1
  • Algorithm earns w + (1 − w − Δ) = 1 − Δ
  • Regret to Best = Regret to Worst = Regret to Average = Δ
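
A quick numeric check of the slide's arithmetic (my addition, not from the deck): whatever update size Δ the algorithm uses after seeing (1,0), it ends the pair of steps exactly Δ behind best, worst, and average.

```python
# Numeric check of slide 5: raising the weight on expert 1 by delta after (1,0)
# costs exactly delta on the pair (1,0),(0,1), against every benchmark.
w, delta = 0.5, 0.1
earned = w * 1 + (1 - (w + delta)) * 1   # step 1 at weight w, step 2 at w + delta
print(earned)                            # 1 - delta = 0.9; best, worst, average all earn 1
```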

  6. A Bad Sequence
  • Consider the following sequence
    • Expert 1: 1,0,1,0,1,0,1,0,…,1,0
    • Expert 2: 0,1,0,1,0,1,0,1,…,0,1
  • We can examine w over time for existing algorithms…
    • Follow the Perturbed Leader: ½, ½ + 1/sqrt(T(1+ln 2)) − 1/(2T), ½, ½ + 1/sqrt(T(1+ln 2)) − 1/(2T), ½, …
    • Weighted Majority: ½, ½ + sqrt(ln(2)/(2T)) / (1 + sqrt(ln(2)/(2T))), ½, ½ + sqrt(ln(2)/(2T)) / (1 + sqrt(ln(2)/(2T))), ½, …
  • Both will lose to best, worst, and average
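
To see the effect concretely, here is a small simulation (my addition, not from the deck) of exponential-weights / Weighted Majority on this alternating sequence; the learning rate sqrt(ln 2 / T) is the standard tuning, assumed here for illustration.

```python
# Sketch (not from the talk): exponential weights on the alternating sequence of
# slide 6 falls Theta(sqrt(T)) behind best, worst, and average.
import math

T = 10_000
eta = math.sqrt(math.log(2) / T)   # standard tuning, assumed for illustration
R = [0.0, 0.0]                     # cumulative gains of the two experts
learner = 0.0
for t in range(T):
    gains = (1, 0) if t % 2 == 0 else (0, 1)
    z = math.exp(eta * R[0]) + math.exp(eta * R[1])
    w0 = math.exp(eta * R[0]) / z  # weight on expert 1
    learner += w0 * gains[0] + (1 - w0) * gains[1]
    R[0] += gains[0]
    R[1] += gains[1]

avg = sum(R) / 2
print(f"best {max(R):.0f}  avg {avg:.0f}  learner {learner:.1f}")
# the learner trails the average (and hence best and worst) by roughly sqrt(T*ln 2)/8
```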

  7. A Simple Trade-off: The Ω(T) Barrier
  • Again, consider 2 experts with instantaneous gains in {0,1}
  • Let w be the weight on the first expert and initialize w = ½
  • We first examine algorithms that depend only on the cumulative difference in payoffs
  • The insight holds more generally for aggressive updating
  • [Figure: on a run of L steps of gains (1,0), either regret to the best exceeds L/3, or the weight climbs from w = ½ to w = 2/3, in which case some single update satisfies Δt > 1/(6L); the adversary then oscillates (1,0),(0,1) around that point for T steps, giving regret to the average ~ (T/2)·(1/(6L)) = Ω(T/L)]
  • Regret to Best × Regret to Average ~ Ω(T)!
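
Spelling out the arithmetic (my reconstruction from the slide's figure and constants): if regret to the best is to stay below L/3 on a run of L steps of (1,0), the weight must climb from ½ to 2/3; since the algorithm depends only on the cumulative difference, some unit increase in the difference moves w by at least 1/(6L), and the adversary then oscillates around that point for T steps.

$$
\underbrace{\max_{d}\bigl(w(d+1)-w(d)\bigr)}_{\text{some single update }\Delta_t}
\;\ge\; \frac{2/3 - 1/2}{L} \;=\; \frac{1}{6L},
\qquad
\text{Regret}_{\mathrm{avg}} \;\approx\; \frac{T}{2}\cdot\frac{1}{6L} \;=\; \Omega\!\Bigl(\frac{T}{L}\Bigr),
$$
$$
\text{Regret}_{\mathrm{best}} \cdot \text{Regret}_{\mathrm{avg}}
\;\gtrsim\; \frac{L}{3}\cdot\Omega\!\Bigl(\frac{T}{L}\Bigr) \;=\; \Omega(T).
$$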

  8. Exponential Weights [F94]
  • Unnormalized weight on expert i at time t: w_{i,t} = e^(η·R_{i,t})
  • Define W_t = ∑_i w_{i,t}, so we have p_{i,t} = w_{i,t} / W_t
  • Let N be the number of experts
  • Setting η = O(1/T^(1/2)) achieves O(T^(1/2)) regret to the best
  • Setting η = O(1/T^(1/2+α)) achieves O(T^(1/2+α)) regret to the best
  • It can be shown that with η = O(1/T^(1/2+α)), regret to the average is O(T^(1/2−α))
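
A minimal Python sketch of this update (my addition, not from the deck); the tuning η = T^(−1/2) corresponds to the first of the settings above.

```python
# Sketch of the Exponential Weights update on slide 8:
# unnormalized weight w_{i,t} = exp(eta * R_{i,t}), probabilities p_{i,t} = w_{i,t} / W_t.
import numpy as np

def exponential_weights(gains, eta):
    """gains: T x N array of expert payoffs in [0, 1]; returns the learner's total payoff."""
    T, N = gains.shape
    R = np.zeros(N)                        # cumulative payoff R_{i,t} of each expert
    total = 0.0
    for t in range(T):
        w = np.exp(eta * (R - R.max()))    # subtract max for numerical stability
        p = w / w.sum()                    # p_{i,t} = w_{i,t} / W_t
        total += p @ gains[t]
        R += gains[t]
    return total

rng = np.random.default_rng(1)
g = rng.integers(0, 2, size=(1000, 4)).astype(float)
T = g.shape[0]
print(g.sum(axis=0).max() - exponential_weights(g, eta=T ** -0.5))   # regret to the best
```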

  9. So far…
  [Figure: trade-off plot with axes Regret to best ~ T^x and Regret to average ~ T^y; cumulative-difference algorithms trace the frontier x + y = 1 for x between ½ and 1]

  10. An Unrestricted Lower Bound
  [Figure: the same trade-off plot, now showing a lower-bound region for all algorithms alongside the cumulative-difference frontier]
  • Any algorithm achieving O(T^(1/2)) regret to the best must suffer Ω(T^(1/2)) regret to the average
  • Any algorithm achieving O((log(T)·T)^(1/2)) regret to the best must suffer Ω(T^ε) regret to the average (for a constant ε > 0)
  • Not restricted to cumulative difference algorithms!

  11. A Simple Additive Algorithm
  • Once again, 2 experts with instantaneous gains in {0,1}, w initialized to ½
  • Let D_t be the difference in cumulative payoffs of the two experts at time t
  • The algorithm makes the following updates:
    • If expert gains are (0,0) or (1,1): no change to w
    • If expert gains are (1,0): w ← w + Δ
    • If expert gains are (0,1): w ← w − Δ
  • Assume we never reach w = 1
  • For any difference D_t = d we have w = ½ + d·Δ
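
A short Python sketch of this additive rule for two experts (my addition; the clipping at 0 and 1 is a safeguard, since the slide simply assumes w never reaches 1).

```python
# Sketch of the simple additive algorithm on slide 11 for two experts:
# shift the weight on expert 1 by a fixed step delta toward whichever expert
# just did better, so that w = 1/2 + D_t * delta.
def additive_two_experts(gains, delta):
    """gains: list of (g1, g2) pairs with entries in {0, 1}; returns the learner's payoff."""
    w, total = 0.5, 0.0
    for g1, g2 in gains:
        total += w * g1 + (1 - w) * g2
        if g1 > g2:
            w = min(1.0, w + delta)   # gains (1,0): w <- w + delta
        elif g2 > g1:
            w = max(0.0, w - delta)   # gains (0,1): w <- w - delta
        # gains (0,0) or (1,1): no change
    return total

gains = [(1, 0), (0, 1)] * 500
print(additive_two_experts(gains, delta=1e-3))   # trails the average by only delta*T/2
```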

  12. Breaking the Ω(T) Barrier
  • While |D_t| < H:
    • (0,0) or (1,1): no change to w
    • (1,0): w ← w + Δ
    • (0,1): w ← w − Δ
  • Then play EW with η = T^(−1/3)
  • Will analyze what happens:
    1. If we stay in the loop
    2. If we exit the loop

  13. Staying in the Loop
  • [Figure: within the loop, each oscillation (1,0),(0,1) around weight w loses Δ to the best and to the average, while the distance D_t moves between d and d+1 over time]
  • Observe that R_best,t − R_avg,t < H, so it is enough to bound the regret to the average
  • Regret to the Average: at most T·Δ
  • Regret to the Best: at most T·Δ + H

  14. Exiting the Loop
  • [Figure: on a (1,0) step at weight w we lose 1 − w to the best but gain w − ½ over the average]
  • Upon exit from the loop:
    • Regret to the best: still at most H + T·Δ
    • Gain over the average: (Δ + 2Δ + 3Δ + … + HΔ) − T·Δ ~ H²·Δ − T·Δ
  • So e.g. H = T^(2/3) and Δ = 1/T gives:
    • Regret to best: < T^(2/3) in loop or upon exit
    • Regret to average: constant in loop; but gain T^(1/3) upon exit
  • Now EW regret to the best is T^(2/3) and to the average T^(1/3)
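
Putting slides 12–14 together, here is a hedged Python sketch of the two-phase algorithm (my addition; H = T^(2/3), Δ = 1/T, and η = T^(−1/3) are the parameters quoted on the slides, while the details of the EW phase are filled in as on slide 8 rather than taken from the paper).

```python
# Sketch of the two-phase algorithm of slides 12-14 for two experts:
# additive updates while |D_t| < H, then switch to Exponential Weights.
import math

def two_phase(gains):
    T = len(gains)
    H, delta, eta = T ** (2 / 3), 1.0 / T, T ** (-1 / 3)
    w, D, total = 0.5, 0, 0.0              # weight on expert 1, payoff difference
    R = [0.0, 0.0]
    in_loop = True
    for g1, g2 in gains:
        if not in_loop:                    # EW phase: weight from cumulative payoffs
            w = 1.0 / (1.0 + math.exp(eta * (R[1] - R[0])))
        total += w * g1 + (1 - w) * g2
        R[0] += g1; R[1] += g2
        D += g1 - g2
        if in_loop:
            w = min(1.0, max(0.0, 0.5 + D * delta))   # additive update: w = 1/2 + D*delta
            if abs(D) >= H:
                in_loop = False            # leave the loop once the difference reaches H
    return total

gains = [(1, 0), (0, 1)] * 5000            # the "bad sequence": we never leave the loop
print(two_phase(gains))                    # stays within a constant of the average (5000)
```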

  15. [Figure: updated trade-off plot with axes Regret to best ~ T^x and Regret to avg ~ T^y; alongside the all-algorithms lower bound and the cumulative-difference frontier, the two-phase algorithm adds a point at regret to best ~ T^(2/3) below the frontier]

  16. Obliterating the Ω(T) Barrier
  • Instead of playing the additive algorithm inside the loop, we can play EW with η = Δ = 1/T
  • Instead of having one phase, we can have many:
    • Set η = 1/T, k = log T
    • For i = 1 to k:
      • Reset and run EW with the current value of η until R_best,t − R_avg,t > H = O(T^(1/2))
      • Set η = η · 2
    • Reset and run EW with the final value of η
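
A hedged Python sketch of this multi-phase scheme for two experts (my addition; how payoffs are tracked across resets and the exact phase cap are assumptions for illustration, not taken from the paper).

```python
# Sketch of the multi-phase algorithm on slide 16 for two experts: run EW with a
# tiny learning rate, and each time the best expert pulls H ahead of the average,
# restart EW with eta doubled.  H = sqrt(T) matches the O(T^(1/2)) threshold on
# the slide; after ~log T doublings eta reaches 1 and is no longer increased.
import math

def phased_ew(gains):
    T = len(gains)
    eta, H = 1.0 / T, math.sqrt(T)
    R = [0.0, 0.0]                          # cumulative payoffs since the last reset
    total = 0.0
    for g1, g2 in gains:
        w = 1.0 / (1.0 + math.exp(eta * (R[1] - R[0])))   # EW weight on expert 1
        total += w * g1 + (1 - w) * g2
        R[0] += g1; R[1] += g2
        if max(R) - sum(R) / 2 > H and eta < 1.0:
            eta *= 2                        # best pulled H ahead: double eta, new phase
            R = [0.0, 0.0]
    return total

gains = [(1, 0), (0, 1)] * 5000             # average earns 5000 on this sequence
print(phased_ew(gains))                     # stays within a constant of 5000
```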

  17. Extensions and Open Problems
  • Known extensions to our algorithm:
    • Instead of the average, we can use any static weighting inside the simplex
  • Future goals:
    • Nicer dependence on the number of experts: ours is O(log N), while it is typically O(sqrt(log N))
    • Generalization to the returns setting and to other loss functions

  18. Thanks! Questions?
