
Margin-Based Active Learning
Maria-Florina Balcan, Carnegie Mellon University
Joint work with Andrei Broder & Tong Zhang, Yahoo! Research


Presentation Transcript


  1. Margin-Based Active Learning. Maria-Florina Balcan, Carnegie Mellon University. Joint work with Andrei Broder & Tong Zhang, Yahoo! Research.

  2. Incorporating Unlabeled Data in the Learning Process
  • OCR, image classification
  • Web page, document classification
  • All the classification problems at Yahoo! Research
  Unlabeled data is cheap and easy to obtain; labeled data is much more expensive.

  3. Semi-Supervised Passive Learning
  Several SSL methods have been developed to use unlabeled data to improve performance, e.g.:
  • Transductive SVM [Joachims '98]
  • Co-training [Blum & Mitchell '98]
  • Graph-based methods [Blum & Chawla '01]
  Unlabeled data allows us to focus on a priori reasonable classifiers. See Avrim's talk at the "Open Problems" session.

  4. Active Learning
  Setting: P is a distribution over X × Y; hypothesis class C. This talk: linear separators.
  • Get a set of unlabeled examples drawn from P_X.
  • Interactively request labels for any of these examples.
  • Goal: find h with small error over P, while minimizing the number of label requests.
  The learner can choose specific examples to be labeled: it works harder in order to use fewer labeled examples.

  5. Can Adaptive Querying Help? [CAL '92, Dasgupta '04]
  C = {linear separators in R^1}, realizable case.
  Active setting: O(log 1/ε) labels suffice to find an ε-accurate threshold (binary search over the unlabeled sample): an exponential improvement in sample complexity.
  In general, the number of queries needed depends on C and P.
  C = {linear separators in R^2}: for some target hypotheses no improvement can be achieved; learning to accuracy ε requires 1/ε labels.
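The one-dimensional case can be made concrete: when labels flip exactly once along the line, binary search over a sorted unlabeled sample recovers the threshold with logarithmically many label queries. A minimal sketch, where the uniform sample and the 0.37 target threshold are illustrative assumptions, not from the talk:

```python
import random

def active_threshold(points, label):
    """Binary search for the leftmost +1 point among sorted unlabeled
    points, querying the label oracle only O(log n) times."""
    lo, hi = 0, len(points) - 1
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if label(points[mid]) < 0:   # mid is left of the threshold
            lo = mid + 1
        else:                        # mid is at or right of it
            hi = mid
    return points[lo], queries

# Illustrative setup: 1000 unlabeled draws, hidden threshold 0.37.
random.seed(0)
xs = sorted(random.random() for _ in range(1000))
theta_hat, q = active_threshold(xs, lambda x: 1 if x >= 0.37 else -1)
```

Labeling all 1000 points passively would cost 1000 queries; the binary search uses at most ⌈log₂ 1000⌉ = 10.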

  6. When Active Learning Helps
  In general, the number of queries needed depends on C and P.
  C - homogeneous linear separators in R^d, P_X - uniform distribution over the unit sphere.
  Realizable case: O(d log 1/ε) labels suffice to find a hypothesis with error ε. [Freund et al. '97; Dasgupta, Kalai, Monteleoni '05]
  Agnostic case: under low noise, O(d^2 log 1/ε) labels suffice to find a hypothesis with error ε. A2 algorithm [Balcan, Beygelzimer, Langford '06]; [Hanneke '07]

  7. An Overview of Our Results
  We analyze a class of margin-based active learning algorithms for learning linear separators.
  • C - homogeneous linear separators in R^d, P_X - uniform distribution over the unit sphere: an exponential improvement in the realizable case.
  • The analysis extends naturally to the bounded-noise setting.
  • Dimension-independent bounds when we have a good margin distribution.

  8. Margin-Based Active Learning, Realizable Case
  Algorithm:
  Draw m_1 unlabeled examples, label them, and add them to W(1).
  Iterate k = 2, …, s:
  • find a hypothesis w_{k-1} consistent with W(k-1)
  • set W(k) = W(k-1)
  • sample m_k unlabeled examples x satisfying |w_{k-1} · x| ≤ γ_{k-1}
  • label them and add them to W(k)
  end iterate
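The loop above can be sketched in code. This is a toy illustration under stated assumptions: `fit` is a least-squares stand-in for the exact consistent-hypothesis step, and the halving margin schedule and sample sizes are arbitrary choices, not the constants from the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_sphere(n, d):
    """n points drawn uniformly from the unit sphere in R^d."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def fit(X, y):
    """Stand-in for 'find a hypothesis consistent with W(k-1)':
    a least-squares fit, normalized to a homogeneous unit separator."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w / np.linalg.norm(w)

d = 5
w_star = np.eye(d)[0]                   # unknown target separator
oracle = lambda Z: np.sign(Z @ w_star)  # label oracle (realizable case)

# Round 1: label m_1 random examples.
X = unit_sphere(200, d)
y = oracle(X)
w = fit(X, y)

# Rounds k = 2..s: request labels only inside the band
# |w_{k-1} . x| <= gamma_{k-1}, shrinking gamma geometrically.
gamma = 0.5
for k in range(2, 6):
    pool = unit_sphere(20000, d)             # unlabeled data is cheap
    band = pool[np.abs(pool @ w) <= gamma][:100]
    X, y = np.vstack([X, band]), np.concatenate([y, oracle(band)])
    w = fit(X, y)
    gamma /= 2

# On the uniform sphere, disagreement with w* equals angle(w, w*)/pi.
err = float(np.arccos(np.clip(w @ w_star, -1.0, 1.0)) / np.pi)
```

Only 600 labels are requested in total, and all but the first 200 are spent inside the shrinking margin band, which is where the remaining uncertainty about the separator lives.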

  9. Margin-Based Active Learning, Realizable Case
  (Same algorithm as slide 8, now illustrated: successive hypotheses w_1, w_2, w_3, each querying only within a band of width γ_1, γ_2, … around its decision boundary.)

  10. u (u,v) v v u v  Margin Based Active-Learning, Realizable Case Theorem PX is uniform over Sd. If and then after iterations ws has error ·. Fact 1 Fact 2 If and Maria-Florina Balcan

  11. Margin-Based Active Learning, Realizable Case
  [Figure: w, w_{k-1}, w*, and the band of width γ_{k-1} around w_{k-1}'s boundary.]
  Iterate k = 2, …, s:
  • find a hypothesis w_{k-1} consistent with W(k-1)
  • set W(k) = W(k-1)
  • sample m_k unlabeled examples x satisfying |w_{k-1} · x| ≤ γ_{k-1}
  • label them and add them to W(k)
  Proof idea. Induction: all w consistent with W(k) have error ≤ 1/2^k; so w_k has error ≤ 1/2^k. For … ≤ 1/2^{k+1} …

  12. Proof Idea
  [Figure: w, w_{k-1}, w*, and the band of width γ_{k-1}.]
  Under the uniform distribution, for … ≤ 1/2^{k+1} …

  13. Proof Idea
  [Figure: w, w_{k-1}, w*, and the band of width γ_{k-1}.]
  Under the uniform distribution, for … ≤ 1/2^{k+1} … Enough to ensure … . Can do with only … labels.
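The elided inequalities on slides 11-13 follow a standard decomposition of the error into the regions outside and inside the sampling band; a hedged reconstruction (the exact constants live in the paper, not this transcript):

```latex
% For any w consistent with W(k), split the disagreement with w^*:
\operatorname{err}(w)
  = \Pr\big[\operatorname{sign}(w \cdot x) \neq \operatorname{sign}(w^{*} \cdot x),\ |w_{k-1} \cdot x| > \gamma_{k-1}\big]
  + \Pr\big[\operatorname{sign}(w \cdot x) \neq \operatorname{sign}(w^{*} \cdot x),\ |w_{k-1} \cdot x| \le \gamma_{k-1}\big]
```

Outside the band, both w and w* make small angles with w_{k-1} by the inductive hypothesis, so a suitable choice of γ_{k-1} makes the first term small; inside the band, the uniform distribution on the sphere places only O(γ_{k-1} √d) probability mass, so a constant disagreement rate there, achievable with a number of labels independent of ε per round, keeps the second term below the target 1/2^{k+1}.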

  14. Realizable Case, Suboptimal Alternative
  [Figure: w, w_{k-1}, w*.]
  Could imagine setting the band so that the error outside it is zero. This is suboptimal: it would need … , so … and … labels to find a hypothesis with error ε. Similar to [CAL '92, BBL '06, H '07].

  15. Margin-Based Active Learning, Non-realizable Case
  Guarantee: Assume P_X is uniform over S^d, that |P(Y=1|x) − P(Y=−1|x)| ≥ … for all x, and that w* is the Bayes classifier. Then the previous algorithm and proof extend naturally, and we again get an exponential improvement.


  17. Summary
  We analyzed a class of margin-based active learning algorithms for learning linear separators.
  Open problems:
  • Analyze a wider class of distributions, e.g. log-concave.
  • Characterize the right sample-complexity terms for the active learning setting.

  18. Thank you!

  19. Thank you! Special thanks to Alina Beygelzimer, Sanjoy Dasgupta, and John Langford for useful discussions.
