
No-Regret Algorithms for Online Convex Programs



  1. No-Regret Algorithms for Online Convex Programs Geoffrey J. Gordon Carnegie Mellon University Presented by Nicolas Chapados 21 February 2007

  2. Outline • Online learning setting • Definition of Regret • Safe Set • Lagrangian Hedging (gradient form) • Lagrangian Hedging (optimization form) • Mention of Theoretical Results • Application: One-Card Poker

  3. Online Learning • Sequence of trials t = 1, 2, … • At each trial we must pick a hypothesis $y_t$ • The correct answer is revealed in the form of a convex loss function $\ell_t(y_t)$ • Just before seeing the t-th example, the total loss is given by $L_t = \sum_{i=1}^{t-1} \ell_i(y_i)$
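To make the protocol concrete, here is a minimal Python sketch of the online learning loop. The `Learner` interface (`predict`/`observe`) and the restriction to linear losses are illustrative assumptions, not part of the paper.

```python
import numpy as np

def online_learning_loop(learner, loss_vectors):
    """Run the online protocol: pick y_t first, then observe the loss for trial t.

    Assumes linear losses l_t(y) = c_t . y; `learner` exposes hypothetical
    predict() / observe() methods.
    """
    total_loss = 0.0
    for c_t in loss_vectors:               # one convex (here: linear) loss per trial
        y_t = learner.predict()            # commit to a hypothesis before seeing c_t
        loss_t = float(np.dot(c_t, y_t))   # l_t(y_t)
        learner.observe(c_t)               # the loss function is then revealed
        total_loss += loss_t               # accumulates the sum of l_i(y_i)
    return total_loss
```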

  4. Goal of Paper • Introduce Lagrangian Hedging algorithm • Generalization of other algorithms • Hedge (Freund and Schapire) • Weighted Majority (Littlestone and Warmuth) • External-regret Matching (Hart and Mas-Colell) • (CMU Technical Report is much clearer than NIPS paper)

  5. Regret • If we had used a fixed hypothesis y, the loss would have been $\hat{L}_t = \sum_{i=1}^{t-1} \ell_i(y)$ • The regret is the difference between the total loss of the adaptive and fixed hypotheses: $\rho_t = L_t - \hat{L}_t$ • Positive regret means that we should have preferred the fixed hypothesis
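As a small worked example (all numbers invented), the regret against a fixed hypothesis can be computed directly from the two loss sequences:

```python
import numpy as np

# Hypothetical cost vectors c_t for 3 trials over 2 actions.
losses = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])]
played = [np.array([0.5, 0.5])] * 3          # the adaptive hypotheses y_t
fixed = np.array([0.0, 1.0])                 # a fixed comparator hypothesis y

adaptive_loss = sum(np.dot(c, y) for c, y in zip(losses, played))   # L_t = 1.5
fixed_loss = sum(np.dot(c, fixed) for c in losses)                  # L_hat = 1.0
regret = adaptive_loss - fixed_loss
print(regret)   # 0.5 > 0: we should have preferred the fixed hypothesis
```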

  6. Hypothesis Set • Assume that the hypothesis set Y is a convex subset of $\mathbb{R}^d$ • For example, the simplex of probability distributions • The corners of Y represent pure actions, and interior points represent probability distributions over actions

  7. Loss Function • Minimize a linear loss of the form $\ell_t(y) = c_t \cdot y$, where the cost vector $c_t$ is revealed after we commit to $y_t$

  8. Regret Vector • Keeps the state of the learning algorithm • Vector that accumulates information about the actual losses and the gradients of the loss functions • Define the regret vector $s_t$ by the recursion $s_t = s_{t-1} + \ell_t(y_t)\,u - g_t$, where $s_0 = 0$ and $g_t \in \partial \ell_t(y_t)$ • Here u is an arbitrary vector which satisfies $u \cdot y = 1$ for all $y \in Y$ • Example: if y is a probability distribution, then u can be the vector of all ones.
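A minimal sketch of one step of this recursion, assuming linear losses $\ell_t(y) = c_t \cdot y$ so that the gradient $g_t$ is just $c_t$:

```python
import numpy as np

def update_regret_vector(s, c_t, y_t, u):
    """One step of s_t = s_{t-1} + l_t(y_t) u - g_t.

    For a linear loss l_t(y) = c_t . y, the gradient g_t is just c_t.
    """
    loss_t = float(np.dot(c_t, y_t))
    return s + loss_t * u - c_t

# On the probability simplex, u = all-ones satisfies u . y = 1 for every y.
s = np.zeros(2)
u = np.ones(2)
s = update_regret_vector(s, np.array([1.0, 0.0]), np.array([0.5, 0.5]), u)
print(s)   # [-0.5, 0.5]; s . y is the regret against any fixed hypothesis y
```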

  9. Use of Regret Vector • Given any hypothesis y, we can use the regret vector to compute its regret: $s_t \cdot y = \sum_{i=1}^{t} \left( \ell_i(y_i) - \ell_i(y) \right)$ for linear losses, since $u \cdot y = 1$

  10. Safe Set • Region of the regret space in which the regret is guaranteed to be nonpositive for all hypotheses • The goal of the Lagrangian Hedging algorithm is to keep its regret vector "near" the safe set

  11. Safe Set (continued) [figure: the hypothesis set Y and the corresponding safe set S]

  12. Unnormalized Hypotheses • Consider the cone of unnormalized hypotheses: $\bar{Y} = \{ \lambda y : y \in Y,\ \lambda \ge 0 \}$ • The safe set is the cone that is polar to this cone of unnormalized hypotheses: $S = \bar{Y}^{\circ} = \{ s : s \cdot y \le 0 \text{ for all } y \in \bar{Y} \}$
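Computationally, polarity means a regret vector lies in the safe set exactly when its inner product with every hypothesis is nonpositive; since the inner product is linear in y, for a polytope it suffices to check the corners. A sketch (the helper name is hypothetical):

```python
import numpy as np

def in_safe_set(s, corners, tol=1e-9):
    """Check s . y <= 0 for all y in Y (hypothetical helper).

    The inner product is linear in y, so for a polytope Y testing
    the corners is sufficient.
    """
    return all(np.dot(s, y) <= tol for y in corners)

simplex_corners = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(in_safe_set(np.array([-0.5, 0.5]), simplex_corners))   # False: regret vs. action 2
```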

  13. Lagrangian Hedging (Setting) • At each step, the algorithm chooses its play according to the current regret vector and a closed convex potential function F(s) • Define the (sub)gradient of F(s) as f(s) • The potential function is what defines the problem to be solved • E.g. Hedge / Weighted Majority correspond to an exponential potential of the form $F_1(s) = \frac{1}{\eta} \ln \sum_i e^{\eta s_i}$ (up to additive constants)
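A minimal sketch of that exponential potential and its gradient, assuming the form above (the additive constant does not affect the gradient):

```python
import numpy as np

def exp_potential(s, eta=1.0):
    """F(s) = (1/eta) log sum_i exp(eta s_i), computed with the log-sum-exp trick."""
    m = np.max(eta * s)
    return (m + np.log(np.sum(np.exp(eta * s - m)))) / eta

def exp_potential_grad(s, eta=1.0):
    """f(s) = softmax(eta s): the gradient of F is a probability distribution."""
    z = np.exp(eta * s - np.max(eta * s))
    return z / z.sum()

print(exp_potential_grad(np.array([1.0, 0.0, 0.0])))   # weight piles on the regretted action
```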

  14. Lagrangian Hedging (Gradient) [slide shows the gradient-form pseudo-code]
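Since the pseudo-code itself did not survive the transcript, here is a hedged sketch of the gradient form: play the potential's (sub)gradient at the current regret vector, normalized so that $u \cdot y_t = 1$, then apply the regret-vector recursion. The uniform fallback play when the normalizer is nonpositive is an assumption of this sketch, not taken from the paper.

```python
import numpy as np

def lagrangian_hedging(loss_vectors, f, u, d):
    """Gradient-form sketch: play the normalized (sub)gradient of the potential.

    Assumes linear losses l_t(y) = c_t . y. The uniform fallback when
    u . f(s) <= 0 is an assumption for this sketch.
    """
    s = np.zeros(d)
    for c_t in loss_vectors:
        y_hat = f(s)                        # (sub)gradient of F at s_{t-1}
        norm = float(np.dot(u, y_hat))
        y_t = y_hat / norm if norm > 0 else np.ones(d) / d
        loss_t = float(np.dot(c_t, y_t))
        s = s + loss_t * u - c_t            # regret-vector recursion
    return s

# With f = exp_potential_grad (above) and u = np.ones(d), this plays
# Hedge-style multiplicative weights.
```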

  15. Optimization Form • In practice, it may be difficult to define, evaluate, and differentiate an appropriate potential function • Optimization form: same pseudo-code as previously, but define F in terms of a simpler hedging function W • Example corresponding to the previous $F_1$: the scaled negative entropy $W(y) = \frac{1}{\eta} \sum_i y_i \ln y_i$

  16. Optimization Form (cont'd) • Then we may obtain F as the convex conjugate $F(s) = \sup_y \left( s \cdot y - W(y) \right)$, the supremum taken over the unnormalized hypotheses • And the (sub)gradient as the maximizer: $f(s) = \arg\max_y \left( s \cdot y - W(y) \right)$ • Which we may plug into the previous pseudo-code
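As a hedged numeric check of this conjugacy for the entropy hedging function above: the maximizer of $s \cdot y - W(y)$ over the simplex has the closed-form softmax solution, which is exactly the gradient of $F_1$. The scipy optimizer here is purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

eta = 1.0
s = np.array([1.0, 0.0, -0.5])

def neg_objective(y):
    """-(s . y - W(y)) with W(y) = (1/eta) sum_i y_i log y_i."""
    y = np.clip(y, 1e-12, None)
    return -(np.dot(s, y) - np.sum(y * np.log(y)) / eta)

res = minimize(neg_objective, np.ones(3) / 3, method="SLSQP",
               bounds=[(0.0, 1.0)] * 3,
               constraints=[{"type": "eq", "fun": lambda y: y.sum() - 1.0}])

softmax = np.exp(eta * s) / np.exp(eta * s).sum()
print(res.x)     # numeric maximizer f(s)
print(softmax)   # closed form: the two should agree to optimizer tolerance
```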

  17. Theoretical Results (in a nutshell: it all works) • Under suitable conditions on the potential F, the regret of Lagrangian Hedging grows sublinearly in the number of trials, so the average regret per trial goes to zero

  18. One-Card Poker • Hypothesis space is the set of sequence weight vectors: information about when it is player i's turn to move and the actions available at that time • Two players: gambler and dealer • Each player antes $1 and is dealt 1 card from a 13-card deck • Betting proceeds: gambler bets, then dealer bets, then gambler bets • A player may fold instead of matching a bet • If neither player folds: the player with the highest card wins the pot
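A minimal simulator sketch of one hand; the exact betting and calling rules below are assumptions reconstructed from the bullets above, not taken verbatim from the paper:

```python
import random

def one_card_poker(gambler_policy, dealer_policy, rng=random.Random(0)):
    """One hand (sketch). Policies map (card, facing_bet) -> True to bet/call.

    Betting and calling rules are assumptions reconstructed from the slide.
    Returns the gambler's winnings in dollars.
    """
    deck = list(range(13))
    rng.shuffle(deck)
    g_card, d_card = deck[0], deck[1]
    stake = 1                                          # each player's $1 ante

    if gambler_policy(g_card, facing_bet=False):       # gambler bets $1
        if not dealer_policy(d_card, facing_bet=True):
            return stake                               # dealer folds
        stake += 1                                     # dealer calls
    elif dealer_policy(d_card, facing_bet=False):      # gambler passed, dealer bets
        if not gambler_policy(g_card, facing_bet=True):
            return -stake                              # gambler folds
        stake += 1                                     # gambler calls
    return stake if g_card > d_card else -stake        # showdown: highest card wins

# Illustrative threshold policies; optimal play actually requires randomized bluffing.
bet_high = lambda card, facing_bet: card >= (8 if facing_bet else 9)
print(one_card_poker(bet_high, bet_high))
```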

  19. Why is it interesting? • Elements of more complicated games: • Incomplete information • Chance events • Multiple stages • Optimal play requires randomization and bluffing

  20. Results in Self-Play

  21. Results Against Fixed Opponent
