1 / 31

The True Sample Complexity of Active Learning

Joint work with. Maria-Florina Balcan Computer Science Department Carnegie Mellon University ninamf@cs.cmu.edu. Jennifer Wortman Computer and Information Science University of Pennsylvania wortmanj@seas.upenn.edu. and. The True Sample Complexity of Active Learning. Steve Hanneke

kelton
Download Presentation

The True Sample Complexity of Active Learning

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Joint work with Maria-Florina Balcan Computer Science Department Carnegie Mellon University ninamf@cs.cmu.edu Jennifer Wortman Computer and Information Science University of Pennsylvania wortmanj@seas.upenn.edu and The True Sample Complexity of Active Learning Steve Hanneke Machine Learning Department Carnegie Mellon University shanneke@cs.cmu.edu TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAA

  2. Passive Learning Data Source Expert / Oracle Learning Algorithm Raw Unlabeled Data Labeled examples Algorithm outputs a classifier Steve Hanneke 2

  3. Active Learning Data Source Learning Algorithm Expert / Oracle Raw Unlabeled Data Request for the label of an example The label of that example Request for the label of another example The label of that example . . . Algorithm outputs a classifier Steve Hanneke 3

  4. Active Learning Data Source Learning Algorithm Expert / Oracle How many label requests are required to learn? Sample Complexity Raw Unlabeled Data Request for the label of an example The label of that example Request for the label of another example The label of that example e.g., Das04, Das05, DKM05, BBL06, Kaa06, Han07a&b, BBZ07, DHM07 . . . Algorithm outputs a classifier Steve Hanneke 4

  5. Active Learning Sometimes Helps An Example: 1-dimensional threshold functions. - + Steve Hanneke 5

  6. Active Learning Sometimes Helps An Example: 1-dimensional threshold functions. Take m unlabeled examples Repeatedly request the label of the median point between -/+ boundaries. Take any threshold consistent with the observed labels. - - + + + + - - - - - - - - - + + + Used only log(m) label requests, but get a classifier consistent with all m examples! Exponential improvement over passive! Steve Hanneke 6

  7. Outline • Formal Model • An Illustrative Example • Exciting New Results  • Conclusions & Open Problems Steve Hanneke 7

  8. Formal Model Steve Hanneke 8

  9. Sample Complexity target-dependent budget Steve Hanneke 9

  10. A More Subtle Example: Intervals Suppose D is uniform on [0,1] - + - 0 1 Steve Hanneke 10

  11. A More Subtle Example: Intervals 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 } } } } } } } } } } } } } } } } } } } } } } } } } } } } } } } } Suppose D is uniform on [0,1] 0 1 - Suppose the target labels everything “-1” How many labels are needed to verify that a given h has er(h) < ? Need (1/) label requests to guarantee the target isn’t one of these  (1/) sample complexity? Steve Hanneke 11

  12. A More Subtle Example: Intervals Algorithm: Take a large number of unlabeled examples. Phase 1: Query random examples until we find a +1 example. (if use all n label requests before finding a +1 example, return the empty interval) Phase 2: Do binary searches to the left and right of the +1 point. After n total label requests, return any consistent hC. - + + + - - - - - - - - - Steve Hanneke 12

  13. Can Active Learning Always Help? Steve Hanneke 13

  14. Active Learning Always Helps! [HLW94] passive algorithm has O(1/) sample complexity. Steve Hanneke 14

  15. Proof Outline Claim 1: The result is certainly true for “threshold-esc” problems – where the problem gets easier the longer we work at it (based on [Hanneke07], “disagreement coefficient” analysis) Claim 2: Any C can be partitioned into C1,C2,C3,… with this property. Claim 3: There is an aggregation algorithm that uses all of C1,C2,C3,… but is never much worse than using just the Ci that contains the target f.

  16. How Much Better? In a sense, the o(1/) results is the strongest possible result at this level of generality. But can we prove stronger results for specific pairs (C,D)? Steve Hanneke 16

  17. Exponential Improvements It is often possible to achieve polylogarithmic sample complexity for all targets. For example: linear separators, under uniform distributions on an r-sphere Axis-aligned rectangles, under uniform distributions on [0,1]r Decision trees with axis-aligned splits, under uniform distributions on [0,1]r Finite unions of intervals on the real line (arbitrary distributions) Can also preserve polylog sample complexities under some transformations: Unions, “close” distributions, mixtures of distributions Steve Hanneke 17

  18. Conclusions Active learning can always achieve a strictly superior asymptotic sample complexity compared to passive learning. For most natural learning problems, the improvements are exponential. Often the amount of improvement is not detectable from observable quantities (e.g., for intervals). Steve Hanneke 18

  19. Open Problems Generalizing to agnostic learning General conditions for learnability at an exponential rate Reversing the order of quantifiers regarding D in the model Steve Hanneke 19

  20. Thank You Special thanks to Yishay Mansour for suggesting this perspective, and to Eyal Even-Dar, Michael Kearns, Larry Wasserman & Eric Xing for discussions and feedback. Steve Hanneke 20

  21. Models of Learning Number of labels before we can find a good classifier vs number of labels before we can both find a good classifier and know we have found one. Steve Hanneke 21

  22. How Much Better? In a sense, the o(1/) results is the strongest possible result at this level of generality. But can we prove stronger results for specific pairs (C,D)? 1 (2) 2 (3) (3) (3) (3) 3 … 0 Steve Hanneke 22

  23. Exponential Improvements We can extend these results by altering the hypothesis class or distribution in ways that preserve learnability at an exponential rate. Steve Hanneke 23

  24. Proof Sketch Steve Hanneke 24

  25. Proof Sketch Steve Hanneke 25

  26. Proof Sketch Steve Hanneke 26

  27. Proof Sketch Steve Hanneke 27

  28. Proof Sketch … … … … … … … Maximum recursion depth ≤ d Steve Hanneke 28

  29. Proof Sketch Steve Hanneke 29

  30. Proof Sketch Steve Hanneke 30

  31. Proof Sketch Steve Hanneke 31

More Related