
The Informational Complexity of Interactive Machine Learning


Presentation Transcript


  1. The Informational Complexity of Interactive Machine Learning Steve Hanneke

  2. Passive Learning [flow diagram] Data Source → Raw Unlabeled Data → Expert / Oracle → Labeled examples → Learning Algorithm → Algorithm outputs a classifier

  3. Learning by Interaction: The Big Picture [flow diagram] Data Source → Raw Unlabeled Data → Learning Algorithm ↔ Expert / Oracle: the Learner asks a question about the data, the Expert answers the question, the Learner asks another question, the Expert answers, and so on, until the Algorithm outputs a classifier

  4. Interactive Learning: A Manifesto • Machine learning is a collaborative effort between human and machine. • In passive learning, there is often a bottleneck on the human side (data annotation). • Conclusion: passive algorithms are lazy collaborators. • Interactive algorithms may require the human to expend effort only on providing relevant details, minimizing unnecessary redundancy.

  5. The Value of Interaction • But how much improvement can we expect for any particular learning problem? • How much interaction is necessary and sufficient for learning?

  6. Outline • Active learning with label requests • Disagreement Coefficient (Hanneke, ICML 2007) • Teaching Dimension (Hanneke, COLT 2007) • Class-conditional queries • Arbitrary sample-based queries

  7. Active Learning with Label Requests

  8. Active Learning with Label Requests • This is clearly an upper bound on the label complexity of active learning. • Other than noise rate, VC dimension summarizes sample complexity. • The algorithm achieving this is ERM, and often must be approximated.
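The expression these bullets refer to was an equation on the original slide and is not legible in this transcript. As background (a standard fact, not necessarily the exact expression shown on the slide), the agnostic passive-learning sample complexity for a class of VC dimension d is on the order of

    m(\epsilon, \delta) \;=\; O\!\left(\frac{d + \log(1/\delta)}{\epsilon^{2}}\right)

labeled examples to reach excess error \epsilon with probability at least 1 - \delta. Any active learner can match this by simply requesting every label, which is why such a bound is an upper bound on the label complexity of active learning.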

  9. Outline • Active learning with label requests • Disagreement Coefficient (Hanneke, ICML 2007) • Teaching Dimension (Hanneke, COLT 2007) • Class-conditional queries • Arbitrary sample-based queries

  10. Reducing Uncertainty “Real knowledge is to know the extent of one’s ignorance.” -- Confucius “As we know, There are known knowns. There are things we know we know. We also know There are known unknowns. That is to say We know there are some things We do not know. But there are also unknown unknowns, The ones we don't know We don't know.” —Donald Rumsfeld, Feb. 12, 2002, Department of Defense news briefing

  11. Reducing Uncertainty [figure: a concept h, the ball B(h,r) of concepts around it, and its region of disagreement DIS(B(h,r)); concepts in B(h,r) look like this]
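For reference, since the slide's own formulas are not reproduced in this transcript: the region of disagreement of a set of concepts V, and the disagreement coefficient of a concept h (the quantity from Hanneke, ICML 2007 that this part of the talk is about), are standardly written as

    DIS(V) \;=\; \{\, x \;:\; \exists\, h_1, h_2 \in V \text{ with } h_1(x) \neq h_2(x) \,\},
    \qquad
    \theta_h \;=\; \sup_{r > r_0} \frac{P\big(DIS(B(h, r))\big)}{r},

where B(h,r) is the set of concepts whose probability of disagreeing with h is at most r, and r_0 is a small threshold whose exact role depends on the setting. A small \theta_h means that as the version space shrinks, the region where label requests remain informative shrinks proportionally fast.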

  12. Reducing Uncertainty: A2 Algorithm [flow diagram] Version Space-based Passive Learning: maintain a Version Space of concepts and a set of Labeled Data. Repeat: sample an example x from the distribution D; request its label y from the Expert; add the labeled example (x,y) to the data set; discard concepts we are statistically confident are suboptimal.

  13. Reducing Uncertainty: A2 Algorithm • A2 (Balcan, Beygelzimer & Langford, 2006)

  14. Reducing Uncertainty: A2 Algorithm • A2 [BBL06] (slightly oversimplified explanation) [flow diagram] Version Space-based Agnostic Active Learning: maintain a Version Space of concepts and a set of Labeled Data. Repeat: sample an example x from the distribution D; if it is not in the region of disagreement, ignore it (move on to the next sample); if it is in the region of disagreement, request its label y from the Expert and add the labeled example (x,y) to the data set; discard concepts we are statistically confident are suboptimal (with respect to the filtered distribution).
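To make the flow of the slide concrete, here is a minimal Python sketch of this (still oversimplified) loop. The names version_space (a finite pool of candidate classifiers), sample (draws an unlabeled example from D), expert (the labeling oracle), and confidence_bound (stands in for the statistical confidence test) are illustrative assumptions, not part of the original presentation:

    def a2_sketch(version_space, sample, expert, confidence_bound, budget):
        """Oversimplified A2 loop: request labels only inside the region of disagreement."""
        labeled = []                      # labeled examples drawn from the filtered distribution
        while budget > 0 and len(version_space) > 1:
            x = sample()                  # draw an unlabeled example from D
            predictions = {h(x) for h in version_space}
            if len(predictions) == 1:     # outside the region of disagreement:
                continue                  # every surviving concept agrees, so skip it
            y = expert(x)                 # in the region of disagreement: request the label
            labeled.append((x, y))
            budget -= 1
            # Discard concepts we are statistically confident are suboptimal,
            # with errors measured on the filtered (disagreement-region) examples.
            errors = {h: sum(h(xi) != yi for xi, yi in labeled) for h in version_space}
            best = min(errors.values())
            slack = confidence_bound(len(labeled))
            version_space = [h for h in version_space if errors[h] <= best + slack]
        return version_space

The point of the filter is that labels are spent only where the surviving concepts still disagree, which is exactly the region whose size the disagreement coefficient controls.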

  15. Reducing Uncertainty

  16. Outline • Active learning with label requests • Disagreement Coefficient (Hanneke, ICML 2007) • Teaching Dimension (Hanneke, COLT 2007) • Class-conditional queries • Arbitrary sample-based queries

  17. Exact Learning: Halving Algorithm • Suppose we can hand the teacher a concept and ask for an example that contradicts it, if one exists. (Equivalence queries) • The Halving algorithm (Littlestone, 88): • Let hmaj be the majority-vote concept of C • Ask for an example (X,Y) where hmaj is wrong • If no such example exists, return hmaj • Else remove from C any h with h(X) ≠ Y • The Halving algorithm needs at most log|C| queries to identify any target function in C.
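A short Python sketch of the slide's Halving algorithm, for a finite concept class over a finite instance pool. The helper names (concepts, instances, make_oracle) are illustrative, and the equivalence query is simulated against a hidden target only to keep the sketch self-contained:

    def majority_vote(concepts, instances):
        """The majority-vote concept hmaj of C, represented as a dict from instance to label."""
        return {x: sum(h(x) for h in concepts) * 2 >= len(concepts) for x in instances}

    def halving(concepts, instances, equivalence_query):
        """Each counterexample eliminates at least half of C, so at most log2|C| queries are needed."""
        C = list(concepts)
        while True:
            hmaj = majority_vote(C, instances)
            counterexample = equivalence_query(hmaj)   # an (x, y) where hmaj is wrong, or None
            if counterexample is None:
                return hmaj
            x, y = counterexample
            C = [h for h in C if h(x) == y]            # remove from C any h with h(x) != y

    def make_oracle(target, instances):
        """Simulated teacher: scan the pool for a point where the proposed concept is wrong."""
        def equivalence_query(h):
            for x in instances:
                if h[x] != target(x):
                    return x, target(x)
            return None
        return equivalence_query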

  18. Exact Learning: Membership Queries • Suppose, instead of equivalence queries, we can request the label of any example in X. • We still want to run the Halving algorithm. • How many label requests does it take to build an equivalence query?

  19. Teaching Dimension (Hegedüs, 95)
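The slide's formal statement is not reproduced in this transcript. The standard formulation (following Hegedüs, 1995; the talk's own notation may differ slightly) is: for a labeling f, a specifying set is a set of examples S such that at most one concept in C agrees with f on all of S, and the extended teaching dimension is the worst case over f of the smallest specifying set,

    XTD(C) \;=\; \max_{f} \; \min \{\, |S| \;:\; |\{\, h \in C : h \text{ agrees with } f \text{ on } S \,\}| \le 1 \,\}.

This answers the previous slide's question: to simulate one equivalence query for hmaj, request the labels of a specifying set for hmaj; if every answer agrees with hmaj, the target is the unique consistent concept, and otherwise some answer is the desired counterexample.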

  20. Teaching Dimension for PAC [figure: a sample of points with one labeling highlighted] Say V is the class of linear separators. Sample U from D. A specifying set uniquely identifies (at most) one labeling in V[U]. As an example, take f to be the colored region shown in the figure.

  21. XTD and Label Complexity

  22. XTD and Label Complexity Conjecture: a bound of this form is valid, even with no knowledge of the noise rate (i.e., for agnostic learning).

  23. Outline • Active learning with label requests • Disagreement Coefficient (Hanneke, ICML 2007) • Teaching Dimension (Hanneke, COLT 2007) • Class-conditional queries • Arbitrary sample-based queries

  24. What about other types of queries? • Ask the question you want answered. For example, consider multiclass image classification. Perhaps learning would be easier if only the algorithm had an image of a car. [figure: an image with the prompt “What’s this a picture of?” and the options Horse / Planet / Person / Car]

  25. Class-Conditional Queries • Ask the question you want answered. For example, consider multiclass image classification. Perhaps learning would be easier if only the algorithm had an image of a car. • “Click on a picture of a car, if there is one.” We can do this for each class individually (except perhaps the “other” class).

  26. Class-Conditional Queries • A concrete example: Conjunctions (without noise).
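The slide's worked example is not spelled out in this transcript, so the following Python sketch shows one standard way class-conditional queries can be used to learn a monotone conjunction without noise; it illustrates the query type rather than reproducing the talk's exact construction. The oracle find_positive(points) plays the role of "click on a positive example, if there is one", returning such a point or None:

    def learn_conjunction(pool, find_positive, n):
        """Learn a monotone conjunction over n boolean features from a noise-free pool,
        using only 'show me a positive example among these points' queries."""
        relevant = set(range(n))                    # variables that may still be in the target
        def hypothesis(x):                          # current guess: AND of the surviving variables
            return all(x[i] for i in relevant)
        while True:
            # Only points we currently label negative can reveal anything new.
            candidates = [x for x in pool if not hypothesis(x)]
            x_pos = find_positive(candidates)       # class-conditional query
            if x_pos is None:
                return hypothesis                   # no disagreement left on the pool
            # A positive example rules out every variable it sets to 0.
            relevant = {i for i in relevant if x_pos[i]}

Each answered query removes at least one candidate variable, so at most n + 1 queries suffice on the pool.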

  27. Outline • Active learning with label requests • Disagreement Coefficient (Hanneke, ICML 2007) • Teaching Dimension (Hanneke, COLT 2007) • Class-conditional queries • Arbitrary sample-based queries

  28. Arbitrary Example-based Queries • Suppose we let the algorithm ask any question it wants about the data labels.

  29. Cost Complexity

  30. Questions? (cost = free)

  31. Open Problems for Label Queries • The value of having more unlabeled data? (especially for agnostic learning) • An “optimal” agnostic active learning algorithm?

  32. Open Problems • Unknown cost functions: e.g., maybe examples near the separator are more expensive to label. • Other types of queries: e.g., “give me a rule/explanation you used to decide the label of this example.”

  33. Definition of GIC • Say the teacher gets drunk, and doesn’t necessarily answer accurately. But she manages to scribble her answers to every question on a piece of paper. • We have a spy who steals the paper and photocopies it. • The spy tells us exactly which questions to ask so that, at minimum cost, there is at most one concept in C consistent with the answers. • Define GIC(C,c) as the worst-case cost of this game.
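Writing the verbal game above as a formula (a direct transcription of the slide's description; the paper's own notation may differ): letting A range over all possible answer sheets and c(Q) denote the total cost of a set of queries Q,

    GIC(C, c) \;=\; \sup_{A} \; \min \{\, c(Q) \;:\; |\{\, h \in C : h \text{ is consistent with } A \text{ on } Q \,\}| \le 1 \,\}.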
