
Teaching Dimension and the Complexity of Active Learning



Presentation Transcript


  1. Teaching Dimension and the Complexity of Active Learning Steve Hanneke Machine Learning Department Carnegie Mellon University shanneke@cs.cmu.edu

  2. Passive Learning
     [Diagram: the Data Source supplies Raw Unlabeled Data; the Expert / Oracle supplies Labeled examples; the Learning Algorithm consumes them and outputs a classifier.]

  3. Active Learning
     [Diagram: the Data Source supplies Raw Unlabeled Data to the Learning Algorithm, which repeatedly sends the Expert / Oracle a request for the label of an example and receives the label of that example . . . The algorithm then outputs a classifier.]
     • Label Complexity: how many label requests are needed to learn?
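
To make the protocol concrete, here is a minimal Python sketch of this query loop. All names here (Oracle, active_learn, choose_query, fit) are hypothetical illustrations for this transcript, not code from the talk.

```python
# Minimal sketch of the active-learning protocol on this slide.
# All names are illustrative, not from the talk or any library.

class Oracle:
    """Expert that returns the true label of a requested example."""
    def __init__(self, target):
        self.target = target      # hidden target concept f
        self.num_queries = 0      # counts label requests (label complexity)

    def label(self, x):
        self.num_queries += 1
        return self.target(x)

def active_learn(unlabeled, oracle, choose_query, fit, budget):
    """Repeatedly pick an example, request its label, and refit."""
    labeled = []
    for _ in range(budget):
        x = choose_query(unlabeled, labeled)   # learner's query strategy
        labeled.append((x, oracle.label(x)))   # oracle answers the request
    return fit(labeled)                        # algorithm outputs a classifier
```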

  4. Outline
     • Formal Model
     • Extended Teaching Dimension
     • Generalization to statistical learning
     • Main Result: Upper Bound on the Label Complexity

  5. Formal Model
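
The model on this slide can be stated as the standard realizable PAC-style setup; the rendering below is a reconstruction, and the talk's own notation may differ.

```latex
% Standard realizable PAC-style active learning model (a reconstruction;
% the slide's own notation may differ).
Let $\mathcal{X}$ be the instance space and $C$ a concept class of
classifiers $h : \mathcal{X} \to \{-1,+1\}$ with VC dimension $d$.
An unknown distribution $D$ over $\mathcal{X}$ and an unknown target
$f \in C$ define the error rate
\[
  \mathrm{er}(h) \;=\; \Pr_{x \sim D}\bigl[h(x) \neq f(x)\bigr].
\]
The \emph{label complexity} is the number of label requests an active
learner needs in order to output $h$ with $\mathrm{er}(h) \le \epsilon$,
with probability at least $1-\delta$.
```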

  6. History
     • Exact Learning: the Extended Teaching Dimension XTD(C) [Hegedüs, 95] measures the number of membership queries needed to simulate an equivalence query.
     • Halving Algorithm ⇒ XTD(C) log |C| membership queries suffice for exact learning.
     • A set R of examples on which at most one concept in C agrees with f is called a "specifying set for f w.r.t. C."
     • Unavoidable (due to a lower bound) [Kääriäinen, 06].
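
In symbols, Hegedüs's definitions take the following standard form (the slide's notation may differ):

```latex
% Specifying sets and the Extended Teaching Dimension (standard form).
A set $R$ of examples is a \emph{specifying set for $f$ w.r.t.\ $C$} if
\[
  \bigl|\{\, h \in C : \forall x \in R,\ h(x) = f(x) \,\}\bigr| \;\le\; 1,
\]
and
\[
  \mathrm{XTD}(f, C) = \min\{\, |R| : R \text{ specifies } f \text{ w.r.t.\ } C \,\},
  \qquad
  \mathrm{XTD}(C) = \max_{f} \mathrm{XTD}(f, C),
\]
where the max ranges over all classifiers $f$, not only those in $C$.
```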

  7. An Example: Discrete Thresholds
     • Suppose C is thresholds on these points.
     [Figure: rows of points labeled - - ... - + + ... +, one row per threshold concept.]

  8. An Example: Discrete Thresholds
     • Suppose C is thresholds on these points.
     • For each f, find a smallest set of examples such that at most one concept in C agrees with f on them (i.e., a minimal specifying set).
     • For a threshold, the two adjacent points straddling the boundary (the rightmost - and the leftmost +) already pin down the concept.
     • So, for discrete thresholds, XTD(C) = 2.
     [Figure: each threshold's label row with its two specifying examples highlighted.]
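
The XTD(C) = 2 claim is easy to verify by brute force. The sketch below does so for a small instance; the representation of concepts as ±1 tuples over n points, and all helper names, are choices made for this illustration.

```python
from itertools import combinations, product

# Brute-force check that XTD(C) = 2 for discrete thresholds.
# The representation (concepts as +/-1 tuples over points 0..n-1)
# is an illustrative choice, not the talk's code.

def thresholds(n):
    """All threshold concepts on points 0..n-1: -1 below t, +1 from t on."""
    return [tuple(-1 if i < t else +1 for i in range(n)) for t in range(n + 1)]

def min_specifying_set(f, concepts, points):
    """Smallest R such that at most one concept agrees with f on all of R."""
    for k in range(len(points) + 1):
        for R in combinations(points, k):
            agree = [h for h in concepts
                     if all(h[i] == f[i] for i in R)]
            if len(agree) <= 1:
                return R
    return tuple(points)

n = 6
C = thresholds(n)
points = range(n)
# XTD maxes over ALL labelings f, not just the concepts in C.
xtd = max(len(min_specifying_set(f, C, points))
          for f in product((-1, +1), repeat=n))
print(xtd)  # 2: the rightmost -1 and leftmost +1 pin down the threshold
```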

  9. Extended Teaching Dimension
     • What about PAC-style active learning?
     • Let's look at thresholds on [0,1].
     • XTD(C) = ∞: any finite set of labeled points leaves a whole interval of consistent thresholds.
     • This doesn't make sense anymore, but can we generalize it?
     [Figure: the interval from 0 to 1, with - labels left of the threshold and + labels right of it.]

  10. Extended Teaching Dimension
     • What about PAC-style active learning?
     • Let's look at thresholds on [0,1].
     • Let's use XTD for a finite sample U.
     • Now XTD(C,U) = 2 again: two consecutive points of U straddling the threshold specify it.
     [Figure: the interval from 0 to 1 with the sample points of U marked.]

  11. Extended Teaching Dimension
     • Formally: for a finite sample U from the instance space X (e.g., points classified by linear separators) and a classifier f, XTD(f, C, U) is the size of a minimal specifying set for f on U w.r.t. C, and XTD(C, U) maximizes this over f.
     [Figure: linear separators over a finite sample, with a specifying set for f highlighted.]
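
Following the conventions above, the sample-based definition plausibly reads as follows (a reconstruction; details may differ from the slide):

```latex
% Sample-based Extended Teaching Dimension (a reconstruction).
For a finite $U \subseteq \mathcal{X}$,
\[
  \mathrm{XTD}(f, C, U)
  = \min\bigl\{\, |R| : R \subseteq U,\
    |\{\, h \in C[U] : \forall x \in R,\ h(x) = f(x) \,\}| \le 1 \,\bigr\},
\]
\[
  \mathrm{XTD}(C, U) = \max_{f} \mathrm{XTD}(f, C, U),
\]
where $C[U]$ denotes the set of distinct labelings of $U$ realized by $C$.
```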

  12. Obvious Bound (Realizable)
     • Formally: draw an unlabeled sample U large enough that any concept agreeing with the target on all of U has error at most ε.
     • "Obvious" bound: run the Halving algorithm over C[U], answering each membership query via a minimal specifying set, using roughly XTD(C,U) · log |C[U]| label requests.
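
Combining the Halving argument from slide 6 with the sample-based XTD gives a bound of roughly this shape (a reconstruction under standard realizable-case assumptions, not the slide's exact statement):

```latex
% "Obvious" realizable-case bound (reconstruction; constants and the
% exact form on the slide may differ).
Draw $U \sim D^m$ with
$m = O\!\bigl(\tfrac{d}{\epsilon}\log\tfrac{1}{\epsilon}
      + \tfrac{1}{\epsilon}\log\tfrac{1}{\delta}\bigr)$,
so that any concept agreeing with the target on all of $U$ has error at
most $\epsilon$. Running the Halving algorithm over $C[U]$, answering
each membership query via a minimal specifying set, uses at most
\[
  \mathrm{XTD}(C, U) \cdot \log_2 |C[U]|
  \;=\; O\!\bigl(\mathrm{XTD}(C, U) \cdot d \log m\bigr)
\]
label requests, since $|C[U]| \le O(m^d)$ by Sauer's lemma.
```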

  13. Distribution-Dependent (Realizable)
     • Suppose the target is f. Rather than worst-case specifying sets for every classifier, it suffices to find them for the majority-vote classifier hmaj of the current version space: if hmaj survives its specifying set it is essentially determined, and otherwise at least half the version space is eliminated.
     [Figure: the version space with hmaj and its specifying set marked.]

  14. What About Noisy Labels?
     • Two-stage algorithm: "Reduction" and "Error Correction."
     • Reduction: Halving-like. Focus on hmaj. Reduce the size of the version space by a constant factor on each iteration -- produces a concept with a "decent" error rate.
     • Error Correction: given a "decent" concept, use it to label a large unlabeled data set, but with some tricks to correct mistakes.
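
A schematic Python sketch of the Reduction stage appears below. It assumes the ±1-tuple concept representation from the thresholds sketch, an oracle(i) callable returning the label of point i, and the earlier min_specifying_set helper; it is shown for the realizable case (the talk's Stage 2 handles the noise) and illustrates the Halving-with-specifying-sets idea rather than the talk's exact algorithm.

```python
# Schematic sketch of the "Reduction" stage: Halving driven by
# specifying sets for the majority vote h_maj. Realizable case only;
# illustration, not the talk's algorithm.

def majority_vote(V, points):
    """Pointwise majority classifier h_maj of the version space V."""
    return tuple(+1 if sum(h[i] for h in V) >= 0 else -1 for i in points)

def reduction_stage(V, points, oracle, min_specifying_set):
    """Shrink V until a single concept remains, querying few labels."""
    V = list(V)
    while len(V) > 1:
        h_maj = majority_vote(V, points)
        # Query only a minimal specifying set for h_maj w.r.t. V.
        R = min_specifying_set(h_maj, V, points)
        answers = {i: oracle(i) for i in R}
        # Keep concepts consistent with the queried labels. If every
        # answer matches h_maj, at most one concept survives; if some
        # answer contradicts h_maj, the majority side at that point
        # (at least half of V) is discarded. Either way, fast progress.
        V = [h for h in V if all(h[i] == answers[i] for i in R)]
    return V[0]
```

For instance, with V = thresholds(n), points = range(n), and oracle = lambda i: target[i] for some target in V, reduction_stage recovers the target with only a handful of label requests.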

  15. What About Noisy Labels?
     [Figure: pseudocode for Stage 1, the Reduction step.]

  16. Open Problem
     • Conjecture: the bound holds for Agnostic Learning (up to constant factors).

  17. Thank You

  18. Shameless Plug
     • Hanneke, S. (2007). A Bound on the Label Complexity of Agnostic Active Learning. ICML 2007.
     • Disagreement Coefficient: sometimes not as tight as XTD, but much simpler to calculate and comprehend, and it applies directly to agnostic learning.

  19. How Do We Use This?
     • For example, for "p-balanced" axis-aligned rectangles in n dimensions, under any continuous product distribution, the bound can be instantiated concretely.
     • To bound XTD(f, C, D, m, δ), there are several cases to consider:
       Case 1: f is very unbalanced -- easy.
       Case 2: f is very different from a rectangle -- also easy.
       Case 3: f is pretty close to being a rectangle.
