Connections between Learning Theory, Game Theory, and Optimization
Lecture 1, August 24th, 2010
Maria Florina (Nina) Balcan
Big Picture
Over the past decades, many important and deep connections have emerged between:
machine learning theory
algorithmic game theory
combinatorial optimization
We will explore such connections, discussing:
Online learning. Combining expert advice.
Regret minimization (no external regret and no internal regret). Bandit algorithms.
Zero sum games. Nash equilibria.
Experts learning & Minimax theorem.
Nash equilibria and approximate Nash equilibria in general sum bimatrix games.
Mechanism design (MD).
Combinatorial auctions. [Social welfare; revenue maximization]
Auctions for digital goods.
Algorithmic pricing problems.
Submodularity with connections to game theory and machine learning.
http://www.cc.gatech.edu/~ninamf/LGO10/
[50%]
[50%]
Online learning, minimizing regret, and combining expert advice.
N. Littlestone & M. Warmuth
A. Blum
Assume we want to predict the stock market.
Can we do nearly as well as the best expert in hindsight?
Note: “expert” ≡ someone with an opinion.
[Not necessarily someone who knows anything.]
Deterministic Majority Algorithm
Randomized versions of this algorithm can provide surprisingly strong guarantees
Note: Of course we might not know OPT, so if running T time steps, since $\mathrm{OPT} \le T$, set $\varepsilon$ to get additive loss $(2T \log n)^{1/2}$.
regret
[no regret algorithm]
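A minimal sketch of a randomized weighted majority learner may help make the regret statement concrete. It assumes the common multiplicative update (an expert's weight is multiplied by 1 - eps each time it errs) and the tuning eps = sqrt(2 ln n / T), capped at 1/2; neither detail is spelled out in the text here, and the function name and toy data are purely illustrative.

```python
import math
import random

def randomized_weighted_majority(expert_losses, eps):
    """expert_losses[t][i] is the 0/1 loss of expert i at step t.

    Returns (expected loss of the learner, loss of the best expert in hindsight).
    The expectation over the learner's random choice is computed exactly.
    """
    n = len(expert_losses[0])
    weights = [1.0] * n
    expected_loss = 0.0
    for losses in expert_losses:
        total = sum(weights)
        probs = [w / total for w in weights]
        # The learner follows expert i with probability probs[i].
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Assumed update rule: shrink the weight of every expert that erred.
        weights = [w * (1.0 - eps) if l else w for w, l in zip(weights, losses)]
    best = min(sum(step[i] for step in expert_losses) for i in range(n))
    return expected_loss, best

# Hypothetical data: expert 0 errs about 20% of the time, the others about 50%.
rng = random.Random(0)
T, n = 2000, 10
data = [[int(rng.random() < (0.2 if i == 0 else 0.5)) for i in range(n)]
        for _ in range(T)]

eps = min(0.5, math.sqrt(2 * math.log(n) / T))  # assumed tuning, needs only T and n
alg_loss, opt = randomized_weighted_majority(data, eps)
regret = alg_loss - opt
print(f"learner: {alg_loss:.1f}  best expert: {opt}  regret: {regret:.1f}")
```

For eps at most 1/2, the standard analysis of this update gives an expected loss of at most (1 + eps) * OPT + ln(n) / eps on any loss sequence, so the average regret per step shrinks as T grows.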
E.g., what if we have n options, not n predictors?
Other generalizations as well.
Other notions of no regret (e.g., no internal regret).
Game defined by a matrix M.
Assume wlog entries in [0,1].
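Example: Rock-Paper-Scissors (rows are Mindy's choices, columns are Max's, entries are Mindy's loss):

            Rock   Paper   Scissors
Rock        1/2    1       0
Paper       0      1/2     1
Scissors    1      0       1/2

Row player (Mindy) chooses row i.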
Column player (Max) chooses column j (simultaneously).
Mindy’s goal: minimize her loss M(i,j).
Max’s goal: maximize this loss (zero sum).
Mindy chooses a distribution P over rows.
Max chooses a distribution Q over columns [simultaneously]
Mindy’s expected loss: $M(P,Q) = \sum_{i,j} P(i)\, M(i,j)\, Q(j) = P^{\top} M Q$
Notation: i, j = pure strategies, and P, Q = mixed strategies.
M(P,j): Mindy’s expected loss when she plays P and Max plays j
M(i,Q): Mindy’s expected loss when she plays i and Max plays Q
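The three quantities M(P,Q), M(P,j) and M(i,Q) can be computed directly. The sketch below assumes the Rock-Paper-Scissors loss matrix with rows and columns ordered Rock, Paper, Scissors; the helper names and the sample strategies are illustrative only.

```python
from fractions import Fraction as F

# Mindy's loss matrix for Rock-Paper-Scissors (assumed order: Rock, Paper, Scissors).
M = [[F(1, 2), F(1), F(0)],
     [F(0), F(1, 2), F(1)],
     [F(1), F(0), F(1, 2)]]

def loss_mixed(P, Q):
    """M(P, Q) = sum over i, j of P[i] * M[i][j] * Q[j]."""
    return sum(P[i] * M[i][j] * Q[j] for i in range(3) for j in range(3))

def loss_vs_column(P, j):
    """M(P, j): Mindy mixes according to P, Max plays pure column j."""
    return sum(P[i] * M[i][j] for i in range(3))

def loss_of_row(i, Q):
    """M(i, Q): Mindy plays pure row i, Max mixes according to Q."""
    return sum(M[i][j] * Q[j] for j in range(3))

uniform = [F(1, 3)] * 3
mostly_rock = [F(1, 2), F(1, 4), F(1, 4)]

print(loss_mixed(uniform, mostly_rock))   # Mindy uniform vs. a Rock-heavy Max
print(loss_vs_column(mostly_rock, 1))     # Rock-heavy Mindy vs. pure Paper
print(loss_of_row(1, mostly_rock))        # pure Paper vs. a Rock-heavy Max
```

Exact fractions are used so the values can be compared with hand calculations without rounding.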
Say Mindy plays before Max. If Mindy chooses P, then Max will pick Q to maximize M(P,Q), so the loss will be $L(P) = \max_Q M(P,Q)$.
So, Mindy should pick P to minimize L(P). Loss will be: $\min_P \max_Q M(P,Q)$.
Similarly, if Max plays first, loss will be: $\max_Q \min_P M(P,Q)$.
Playing second cannot be worse than playing first:
$\min_P \max_Q M(P,Q) \;\ge\; \max_Q \min_P M(P,Q)$
(left side: Mindy plays first; right side: Mindy plays second)
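A short derivation of why playing second cannot hurt, using only the definition of M(P,Q). The primed symbols are just bound variables introduced for this sketch.

```latex
\begin{align*}
\min_{P'} M(P', Q) \;\le\; M(P, Q) \;\le\; \max_{Q'} M(P, Q')
  &\qquad \text{for all } P, Q, \\
\text{hence}\qquad
\max_{Q} \min_{P'} M(P', Q) \;\le\; \min_{P} \max_{Q'} M(P, Q').
\end{align*}
```

The second line follows because the left-hand side of the first line does not depend on P and the right-hand side does not depend on Q, so one can maximize over Q on the left and minimize over P on the right.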
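Von Neumann’s minimax theorem: $\min_P \max_Q M(P,Q) = \max_Q \min_P M(P,Q)$
No advantage to playing second!
Value of the game: $v = \min_P \max_Q M(P,Q) = \max_Q \min_P M(P,Q)$
Optimal strategies:
Minmax strategy: $P^* = \arg\min_P \max_Q M(P,Q)$
Maxmin strategy: $Q^* = \arg\max_Q \min_P M(P,Q)$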
We will show how to use WM to prove this!
And to also find approximate minmax strategies quickly.
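One way this can look in practice is sketched below: Mindy runs a multiplicative-weights learner over rows while Max best-responds each round, and the time-averaged strategies are approximately optimal. The sketch uses the exponential-weights (Hedge) form of the update with step size sqrt(8 ln n / T), which is a close relative of WM and an assumption here rather than the lecture's exact algorithm; the function name and the Rock-Paper-Scissors test matrix are also illustrative.

```python
import math

def approx_minmax(M, T):
    """Approximate minmax/maxmin strategies for a loss matrix M with entries in [0, 1]."""
    n_rows, n_cols = len(M), len(M[0])
    eta = math.sqrt(8.0 * math.log(n_rows) / T)  # assumed Hedge step size
    weights = [1.0] * n_rows
    avg_P = [0.0] * n_rows
    col_counts = [0] * n_cols
    for _ in range(T):
        total = sum(weights)
        P = [w / total for w in weights]
        # Max best-responds to Mindy's current mixed strategy P.
        col_losses = [sum(P[i] * M[i][j] for i in range(n_rows))
                      for j in range(n_cols)]
        j = max(range(n_cols), key=lambda c: col_losses[c])
        col_counts[j] += 1
        for i in range(n_rows):
            avg_P[i] += P[i] / T
            # Exponential-weights update on the loss row i suffered against column j.
            weights[i] = P[i] * math.exp(-eta * M[i][j])
    avg_Q = [c / T for c in col_counts]
    # Worst case for Mindy if she commits to avg_P, and for Max if he commits to avg_Q.
    upper = max(sum(avg_P[i] * M[i][j] for i in range(n_rows)) for j in range(n_cols))
    lower = min(sum(M[i][j] * avg_Q[j] for j in range(n_cols)) for i in range(n_rows))
    return avg_P, avg_Q, lower, upper

RPS = [[0.5, 1.0, 0.0],
       [0.0, 0.5, 1.0],
       [1.0, 0.0, 0.5]]
avg_P, avg_Q, lower, upper = approx_minmax(RPS, 2000)
print(avg_P, avg_Q, lower, upper)
```

The value of the game always lies between `lower` and `upper`. With this step size the learner's regret is at most sqrt(T ln(n) / 2), so the gap `upper - lower` is at most sqrt(ln(n) / (2T)) and shrinks to zero as T grows, which is the essence of the learning-based proof of the minimax theorem.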
(P*, Q*) is a Nash equilibrium (no player has an incentive to unilaterally deviate).
Central solution concept we will study.
Games with many players with interesting structure
"Potential Games", D. Monderer and L. S. Shapley, Games and Economic Behavior
Fair cost-sharing: n players in a weighted directed graph G. Player i wants to get from s_i to t_i, and players share the cost of each edge they use equally with the others using it.
[Figure: graph G in which s and t are joined by two edges, one of cost n and one of cost 1.]
Good equilibrium: all use edge of cost 1.
(paying 1/n each)
Bad equilibrium: all use edge of cost n.
(paying 1 each)
Price of Anarchy (PoA): ratio of the cost of the worst Nash equilibrium to OPT.
[Koutsoupias-Papadimitriou '99]
Price of Stability (PoS): ratio of the cost of the best Nash equilibrium to OPT.
[Anshelevich et al., 2004]
E.g., for fair cost-sharing, PoS is log(n), whereas PoA is n.
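The two-edge example can be checked by brute force. The sketch below assumes all n players share the same source and sink and choose between an edge of cost 1 and an edge of cost n; the names `cheap` and `expensive` and all helper functions are illustrative. On this particular instance the best equilibrium is also optimal, so the ratio for the best equilibrium is 1 here; the log(n) figure is a worst case over all fair cost-sharing instances.

```python
from fractions import Fraction
from itertools import product

def player_cost(profile, i, edge_cost):
    """Player i pays an equal share of the cost of the edge it uses."""
    users = sum(1 for e in profile if e == profile[i])
    return Fraction(edge_cost[profile[i]], users)

def is_nash(profile, edge_cost):
    """True if no single player can strictly lower its own cost by switching edges."""
    for i in range(len(profile)):
        current = player_cost(profile, i, edge_cost)
        for e in edge_cost:
            deviated = profile[:i] + (e,) + profile[i + 1:]
            if player_cost(deviated, i, edge_cost) < current:
                return False
    return True

def social_cost(profile, edge_cost):
    """Total cost paid: each used edge is bought once."""
    return sum(edge_cost[e] for e in set(profile))

n = 4
edge_cost = {"cheap": 1, "expensive": n}
profiles = list(product(edge_cost, repeat=n))
equilibria = [p for p in profiles if is_nash(p, edge_cost)]
opt = min(social_cost(p, edge_cost) for p in profiles)

worst_ratio = Fraction(max(social_cost(p, edge_cost) for p in equilibria), opt)
best_ratio = Fraction(min(social_cost(p, edge_cost) for p in equilibria), opt)
print(equilibria, worst_ratio, best_ratio)
```

Enumerating all profiles is exponential in n, so this is only meant for tiny instances.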
Significant effort spent on understanding these in CS.
“Algorithmic Game Theory”, Nisan, Roughgarden, Tardos, Vazirani
Learning in a distributional setting.
[With feature information]
Image Classification
Document Categorization
Speech Recognition
Protein Classification
Spam Detection
Branch Prediction
Fraud Detection
Decide which emails are spam and which are important.
Supervised classification
Not spam
spam
Goal: use emails seen so far to produce good prediction rule for future data.
Example: Supervised Classification
Represent each message by features (e.g., keywords, spelling, etc.).
example
label
Reasonable RULES:
Predict SPAM if unknown AND (money OR pills)
Predict SPAM if 2·money + 3·pills − 5·known > 0
Linearly separable
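The two rules can be written out as code. This sketch assumes `money`, `pills` and `known` are 0/1 indicator features and that "unknown" simply means `known == 0`; the dictionary encoding and function names are illustrative.

```python
from itertools import product

def rule_boolean(msg):
    """Predict SPAM if unknown AND (money OR pills)."""
    return bool((not msg["known"]) and (msg["money"] or msg["pills"]))

def rule_linear(msg):
    """Predict SPAM if 2*money + 3*pills - 5*known > 0 (a linear threshold rule)."""
    return 2 * msg["money"] + 3 * msg["pills"] - 5 * msg["known"] > 0

# Enumerate every combination of the three binary features.
messages = [{"money": m, "pills": p, "known": k}
            for m, p, k in product([0, 1], repeat=3)]

for msg in messages:
    print(msg, rule_boolean(msg), rule_linear(msg))
```

Under these assumptions the two rules agree on all eight feature combinations, which illustrates how a Boolean rule of this kind can be expressed as a linear separator.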
Algorithm Design. How to optimize?
Automatically generate rules that do well on observed data.
Optimization has played a significant role in recent years.
Confidence Bounds, Generalization Guarantees, Sample Complexity
Confidence for rule effectiveness on future data.
Well understood for passive supervised learning.
[Figure: target concept c* and hypothesis h.]
Classic models: PAC (Valiant), SLT (Vapnik)
Seller’s Goal: set prices to maximize revenue.
Algorithmic
Incentive Compatible Auction
Online Pricing
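$V = \{1, 2, \ldots, n\}$, $f : 2^V \to \mathbb{R}$
Submodularity:
$f(S) + f(T) \ge f(S \cap T) + f(S \cup T) \quad \forall\, S, T \subseteq V$
Equivalent
Decreasing marginal values:
$f(S \cup \{x\}) - f(S) \ge f(T \cup \{x\}) - f(T) \quad \forall\, S \subseteq T \subseteq V,\ x \notin T$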
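Both definitions can be checked by brute force on a small ground set. The sketch below tests them on a coverage function (a standard submodular function, chosen here for illustration and not necessarily one of the lecture's examples) and on |S|^2, which is not submodular; the `areas` data and all function names are hypothetical.

```python
from itertools import combinations

def subsets(ground):
    """Yield every subset of ground as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_submodular(f, ground):
    """Check f(S) + f(T) >= f(S & T) + f(S | T) for all S, T."""
    all_sets = list(subsets(ground))
    return all(f(S) + f(T) >= f(S & T) + f(S | T)
               for S in all_sets for T in all_sets)

def has_decreasing_marginals(f, ground):
    """Check f(S + x) - f(S) >= f(T + x) - f(T) for all S <= T and x not in T."""
    all_sets = list(subsets(ground))
    return all(f(S | {x}) - f(S) >= f(T | {x}) - f(T)
               for T in all_sets for S in all_sets if S <= T
               for x in ground - T)

# Hypothetical coverage instance: element i of V covers the items in areas[i].
areas = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"d", "a"}}
V = frozenset(areas)

def coverage(S):
    return len(set().union(*(areas[i] for i in S)))

def squared_size(S):
    return len(S) ** 2

print(is_submodular(coverage, V), has_decreasing_marginals(coverage, V))
print(is_submodular(squared_size, V), has_decreasing_marginals(squared_size, V))
```

The two checks agree on both functions, as the equivalence of the definitions requires; the enumeration is exponential in n and is meant only for tiny ground sets.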
Examples: