Tight Bounds for Strategyproof Classification



**Tight Bounds for Strategyproof Classification**

Jeff Rosenschein, School of Computer Science and Engineering, Hebrew University

Joint work with: Reshef Meir, Shaull Almagor, Assaf Michaely, Ariel Procaccia

**Strategy-Proof Classification**

- An Example
- Motivation
- Our Model and some previous results
- Filling the gap: proving a lower bound
- The weighted case
- Some generalization

**The Motivating Questions**

- Do "strategyproof" considerations apply to machine learning?
- If agents have an incentive to lie, what can we do about it?
  - Approximation
  - Randomization

**Strategic labeling: an example**

- ERM: 5 errors
- There is a better classifier! (for me…)
- If I just change the labels… 2+5 = 7 errors

**Classification**

The Supervised Classification problem:

- Input: a set of labeled data points $\{(x_i, y_i)\}_{i=1..m}$
- Output: a classifier $c$ from some predefined concept class $C$ (e.g., functions of the form $f : X \to \{-,+\}$)
- We usually want $c$ to classify correctly not just the sample, but to generalize well, i.e., to minimize $R(c) \equiv E_{(x,y)\sim D}[\,c(x) \neq y\,]$, the expected number of errors w.r.t. the distribution $D$ (the 0/1 loss function)

**Classification (cont.)**

- A common approach is to return the ERM (Empirical Risk Minimizer), i.e., the concept in $C$ that is the best w.r.t. the given samples (has the lowest number of errors)
- Generalizes well under some assumptions on the concept class $C$ (e.g., linear classifiers tend to generalize well)
- With multiple experts, we can't trust our ERM!

**Where do we find "experts" with incentives?**

Example 1: A firm learning purchase patterns

- Information gathered from local retailers
- The resulting policy affects them
- "The best policy is the policy that fits my pattern"

Example 2: Internet polls / polls of experts

[Figure: Users → Reported Dataset → Classification Algorithm → Classifier]

**Motivation from other domains**

- Aggregating partitions
- Judgment aggregation
- Facility location (on the n-dimensional binary cube)

**Input: Example**

[Figure: a shared set of points $X$ with the labels $Y_1, Y_2, Y_3$ given by three agents]

$X \in \mathcal{X}^m$, $Y_1 \in \{-,+\}^m$, $Y_2 \in \{-,+\}^m$, $Y_3 \in \{-,+\}^m$

$S = \langle S_1, S_2, \ldots, S_n \rangle = \langle (X,Y_1), \ldots, (X,Y_n) \rangle$

**Mechanisms**

- A mechanism $M$ receives a labeled dataset $S$ and outputs $c = M(S) \in C$
- Private risk of $i$: $R_i(c,S) = |\{k : c(x_{ik}) \neq y_{ik}\}| \,/\, m_i$ (% of errors on $S_i$)
- Global risk: $R(c,S) = |\{i,k : c(x_{ik}) \neq y_{ik}\}| \,/\, m$ (% of errors on $S$)
- (No payments)
- We allow non-deterministic mechanisms
  - Measure the expected risk

**ERM**

We compare the outcome of $M$ to the ERM:

$c^* = \mathrm{ERM}(S) = \arg\min_{c \in C} R(c,S)$, $\quad r^* = R(c^*,S)$

Can our mechanism simply compute and return the ERM?

**Requirements** (MOST IMPORTANT SLIDE)

- Good approximation: $\forall S: \; R(M(S),S) \le \alpha \cdot r^*$
- Strategy-Proofness (SP): $\forall i, S, S_i': \; R_i(M(S_{-i}, S_i'), S) \ge R_i(M(S), S)$, i.e., the risk of agent $i$ when lying (left) is at least its risk when telling the truth (right)
- ERM(S) is 1-approximating but not SP
- ERM(S1) is SP but gives bad approximation
- No monetary transfer

Are there any mechanisms that guarantee both SP and good approximation?

**Related work**

- A study of SP mechanisms in Regression learning: O. Dekel, F. Fischer and A. D. Procaccia, SODA (2008), JCSS (2009). [supervised learning]
- No SP mechanisms for Clustering: J. Perote-Peña and J. Perote, Economics Bulletin (2003). [unsupervised learning]
- Characterization of SP aggregation rules

**Previous work: A simple case**

- Tiny concept class: $|C| = 2$
- Either "all positive" or "all negative"

Theorem:

- There is an SP 2-approximation mechanism
- There are no SP α-approximation mechanisms, for any α < 2

R. Meir, A. D. Procaccia and J. S. Rosenschein, Strategyproof Classification under Constant Hypotheses: A Tale of Two Functions, AAAI 2008

**Previous work: Proof Sketch of Lower Bound**

C = {"all positive", "all negative"}

R. Meir, A. D. Procaccia and J. S. Rosenschein, Strategyproof Classification under Constant Hypotheses: A Tale of Two Functions, AAAI 2008

**Previous work: General concept classes**

Theorem: Selecting a dictator at random is SP and guarantees a $(3 - 2/n)$-approximation

- True for any concept class $C$

Question #1: are there better mechanisms?

Question #2: what if agents are weighted?

Question #3: does this generalize for every distribution?
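
The definitions of private risk, global risk, ERM and the random-dictator mechanism can be made concrete with a small sketch. It assumes the shared-input setting (every agent labels the same $m$ points, so a classifier is indexed by the point number $k$) and the two constant classifiers as the concept class. The names `private_risk`, `global_risk`, `erm`, `random_dictator` and the toy labels are illustrative choices, not taken from the papers.

```python
import random
from typing import Callable, List, Sequence

Label = int                          # +1 or -1
Classifier = Callable[[int], Label]  # maps the index k of a shared point to a label

def all_pos(_k: int) -> Label:
    return +1

def all_neg(_k: int) -> Label:
    return -1

# Assumed concept class for this sketch: the two constant classifiers.
CONCEPTS: List[Classifier] = [all_pos, all_neg]

def private_risk(c: Classifier, labels: Sequence[Label]) -> float:
    """Fraction of one agent's labels that c gets wrong (R_i)."""
    return sum(c(k) != y for k, y in enumerate(labels)) / len(labels)

def global_risk(c: Classifier, dataset: Sequence[Sequence[Label]]) -> float:
    """Fraction of all reported labels that c gets wrong (R)."""
    errors = sum(c(k) != y for labels in dataset for k, y in enumerate(labels))
    return errors / sum(len(labels) for labels in dataset)

def erm(dataset, concepts=CONCEPTS) -> Classifier:
    """Empirical risk minimizer over the reported dataset."""
    return min(concepts, key=lambda c: global_risk(c, dataset))

def random_dictator(dataset, concepts=CONCEPTS, rng=random) -> Classifier:
    """Pick one agent uniformly at random and return the ERM of its data alone."""
    dictator = rng.choice(dataset)
    return erm([dictator], concepts)

def expected_risk_random_dictator(dataset, concepts=CONCEPTS) -> float:
    """Exact expected global risk of random_dictator (average over dictators)."""
    return sum(global_risk(erm([d], concepts), dataset) for d in dataset) / len(dataset)

# Toy instance: two agents label the same five points.
a_true = [+1, +1, +1, -1, -1]   # agent A's true labels
b_true = [-1, -1, -1, -1, +1]   # agent B's true labels
a_lie = [+1] * 5                # A exaggerates and reports all positive

truthful = [a_true, b_true]
manipulated = [a_lie, b_true]

# Truthful ERM is all_neg (4 errors out of 10), which costs A a risk of 0.6.
# After A's lie the ERM flips to all_pos, and A's true risk drops to 0.4.
print(private_risk(erm(truthful), a_true), private_risk(erm(manipulated), a_true))

# Random dictator: expected global risk 0.5 against r* = 0.4 on this instance.
print(expected_risk_random_dictator(truthful), global_risk(erm(truthful), truthful))
```

On this instance the ERM is manipulable (A gains by lying), while under the random dictator an agent's report only matters when that agent is selected, in which case reporting truthfully already yields its preferred classifier.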
Source for the random-dictator theorem: Meir, Procaccia and Rosenschein, IJCAI 2009

**A lower bound**

- Main result from most recent work:
  - Matching the upper bound from IJCAI-09
  - Proof by careful reduction to a voting scenario
  - Proof sketch below

Theorem: There is a concept class $C$ (where $|C| = 3$), for which any SP mechanism has an approximation ratio of at least $3 - 2/n$

**Proof sketch**

- Gibbard ['77] proved that every (randomized) SP voting rule for 3 candidates must be a lottery over dictators.* (*not exactly…)
- We define $X = \{x, y, z\}$, and a concept class $C$ of three classifiers $c_x, c_y, c_z$ (the defining table is not preserved in this transcript)
- We also restrict the agents, so that each agent can have mixed labels on just one point

**Proof sketch (cont.)**

[Figure: example agent preferences over classifiers, $c_z > c_y > c_x$ and $c_x > c_z > c_y$]

- Suppose that $M$ is SP
- $M$ must be monotone on the mixed point
- $M$ must ignore the mixed point
- $M$ is a (randomized) voting rule
- By Gibbard ['77], $M$ is a random dictator
- We construct an instance where random dictators perform poorly

**Weighted agents**

- We must select a dictator randomly
- However, probability may be based on weight
- Naïve approach:
  - Only gives a 3-approximation
- An optimal SP algorithm:
  - Matches the lower bound

**Generalization and learning**

- So far, we only compared our results to the ERM, i.e., to the data at hand
- We want learning algorithms that can generalize well from sampled data
  - with minimal strategic bias
  - Can we ask for SP algorithms?

**Generalization (cont.)**

- There is a fixed distribution $D_X$ on $X$
- Each agent holds a private function labeling the entire input space, $Y_i : X \to \{+,-\}$
  - Possibly non-deterministic
- The algorithm is allowed to sample from $D_X$ and ask agents for their labels
- We evaluate the result vs. the optimal risk, averaging over all agents

**Generalization Mechanisms**

Our mechanism is used as follows:

1. Sample $m$ data points i.i.d.
2. Ask agents for their labels
3. Use the SP mechanism on the labeled data, and return the result

- Does it work?
  - Depends on our game-theoretic and learning-theoretic assumptions

**Generalization (cont.)**

[Figure: the distribution $D_X$ with the labeling functions $Y_1$, $Y_2$, $Y_3$ of three agents]

**The "truthful approach"**

- Assumption A: Agents do not lie unless they gain at least ε
- Theorem: W.h.p. the following occurs
  - There is no ε-beneficial lie
  - Approximation ratio (if no one lies) is close to $3 - 2/n$
- Corollary: with enough samples, the expected approximation ratio is close to $3 - 2/n$
- The number of required samples is polynomial in $n$ and $1/ε$

R. Meir, A. D. Procaccia and J. S. Rosenschein, Strategyproof Classification with Shared Inputs, IJCAI 2009

**The "Rational approach"**

- Assumption B: Agents always pick a dominant strategy, if one exists.
- Theorem: with enough samples, the expected approximation ratio is close to $3 - 2/n$
- The number of required samples is polynomial in $1/ε$ (and does not depend on $n$)

R. Meir, A. D. Procaccia and J. S. Rosenschein, Strategyproof Classification with Shared Inputs, IJCAI 2009

**Future work**

- Alternative assumptions on structure of data
- Other models of strategic behavior
- Better understanding of the relation between our model and other domains, such as judgment aggregation
- Better characterization results on special cases
- Other concept classes
- Other loss functions (linear loss, quadratic loss, …)
- …

**Talk Based on the Following Papers:**

- Strategyproof Classification Under Constant Hypotheses: A Tale of Two Functions, Reshef Meir, Ariel D. Procaccia, and Jeffrey S. Rosenschein. The Twenty-Third National Conference on Artificial Intelligence (AAAI 2008), Chicago, Illinois, July 2008, pages 126-131.
- Strategyproof Classification with Shared Inputs, Reshef Meir, Ariel D. Procaccia, and Jeffrey S. Rosenschein. The Twenty-First International Joint Conference on Artificial Intelligence (IJCAI 2009), Pasadena, California, July 2009, pages 220-225.
- On the Limits of Dictatorial Classification, Reshef Meir, Ariel D. Procaccia, and Jeffrey S. Rosenschein. The Ninth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), Toronto, May 2010.
- Tight Bounds for Strategyproof Classification, Reshef Meir, Shaull Almagor, Assaf Michaely, and Jeffrey S. Rosenschein. The Tenth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011), Taipei, Taiwan, May 2011, pages 319-326.
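
The three-step use of the mechanism in the generalization setting (sample $m$ points i.i.d., ask agents for their labels, run the SP mechanism on the labeled data) can be sketched roughly as follows. This is a minimal illustration that assumes the SP mechanism is the unweighted random dictator and that agents answer truthfully; `generalization_mechanism`, `sample_point`, and `make_threshold` are hypothetical names introduced only for this example.

```python
import random
from typing import Callable, List, Sequence

Point = float
Label = int  # +1 or -1
Labeler = Callable[[Point], Label]

def make_threshold(t: float) -> Labeler:
    """Hypothetical concept: label +1 at or above threshold t, else -1."""
    def classify(x: Point) -> Label:
        return +1 if x >= t else -1
    return classify

def generalization_mechanism(
    sample_point: Callable[[random.Random], Point],  # draws one point from D_X
    agents: Sequence[Labeler],                       # each agent's labeling function Y_i
    concepts: Sequence[Labeler],                     # the concept class C
    m: int,
    rng: random.Random,
) -> Labeler:
    # Step 1: sample m data points i.i.d. from D_X.
    points: List[Point] = [sample_point(rng) for _ in range(m)]
    # Step 2: ask every agent to label the same sampled points.
    reports = [[agent(x) for x in points] for agent in agents]
    # Step 3: run the SP mechanism (here: random dictator) on the labeled data.
    dictator = rng.choice(reports)
    return min(
        concepts,
        key=lambda c: sum(c(x) != y for x, y in zip(points, dictator)),
    )

# Usage: D_X is assumed uniform on [0, 1]; agents and concepts are thresholds.
rng = random.Random(0)
agents = [make_threshold(0.3), make_threshold(0.5), make_threshold(0.8)]
concepts = [make_threshold(t / 10) for t in range(11)]
chosen = generalization_mechanism(lambda r: r.random(), agents, concepts, m=200, rng=rng)
```

Whether the returned classifier is close to optimal in expectation depends on the behavioral assumption (A or B in the talk) and on the number of samples; the sketch does not model strategic reporting.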