
Transductive Rademacher Complexity and its Applications


Presentation Transcript


  1. Transductive Rademacher Complexity and its Applications Ran El-Yaniv and Dmitry Pechyony, Technion – Israel Institute of Technology, Haifa, Israel

  2. Induction vs. Transduction • Inductive learning: the learning algorithm receives a training set drawn from an unknown distribution of examples and outputs a hypothesis, which is then used to label new unlabeled examples. Goal: minimize the expected error over the distribution. • Transductive learning (Vapnik ’74,’98): the learning algorithm receives the training set together with the unlabeled test set and outputs labels of the test set. Goal: minimize the error on the given test set.

  3. Distribution-free Model [Vapnik ’74,’98] • Given: “Full sample” of unlabeled examples, each with its true (unknown) label.

  4. Distribution-free Model [Vapnik ’74,’98] • Given: “Full sample” of unlabeled examples, each with its true (unknown) label. • Full sample is partitioned: • training set (m points) • test set (u points)

  5. Distribution-free Model [Vapnik ’74,’98] • Given: “Full sample” of unlabeled examples, each with its true (unknown) label. • Full sample is partitioned: • training set (m points) • test set (u points) • Labels of the training examples are revealed.

  6. Distribution-free Model [Vapnik ’74,’98] • Given: “Full sample” of unlabeled examples, each with its true (unknown) label. • Full sample is partitioned: • training set (m points) • test set (u points) • Labels of the training points are revealed. • Goal: label the test examples.
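As a concrete illustration of this protocol, here is a minimal NumPy sketch (the toy data, the split sizes, and the trivial majority-label rule are illustrative assumptions, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Full sample: m + u points with their (hidden) true labels.
X_full = rng.normal(size=(300, 5))        # feature vectors
y_full = np.sign(X_full[:, 0])            # true labels (unknown to the learner)

m, u = 50, 250                            # training / test set sizes
perm = rng.permutation(m + u)             # random partition of the full sample
train_idx, test_idx = perm[:m], perm[m:]

# The learner sees all m + u points, but only the training labels are revealed.
y_train = y_full[train_idx]

# A transductive learner must output labels for exactly the u test points.
# (Any labeling rule can go here; a trivial baseline is the majority label.)
y_test_pred = np.full(u, np.sign(y_train.sum()) or 1.0)

test_error = np.mean(y_test_pred != y_full[test_idx])
print(f"0/1 error on the {u} test points: {test_error:.3f}")
```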

  7. Rademacher complexity • Induction: the hypothesis space $\mathcal{F}$ is a set of functions; $x_1,\dots,x_m$ are the training points; $\sigma_1,\dots,\sigma_m$ are i.i.d. random variables with $\Pr(\sigma_i = 1) = \Pr(\sigma_i = -1) = \tfrac{1}{2}$. Rademacher complexity: $R_m(\mathcal{F}) = \mathbb{E}_{\boldsymbol\sigma}\big[\sup_{f \in \mathcal{F}} \tfrac{2}{m}\sum_{i=1}^{m}\sigma_i f(x_i)\big]$. • Transduction (version 1): the hypothesis space $\mathcal{H}$ is a set of vectors $\mathbf{h} \in \mathbb{R}^{m+u}$; $X_{m+u}$ is the full sample with the $m$ training and $u$ test points; $\sigma_1,\dots,\sigma_{m+u}$ are distributed as in induction. Rademacher complexity: $R_{m+u}(\mathcal{H}, \tfrac{1}{2}) = \big(\tfrac{1}{m}+\tfrac{1}{u}\big)\,\mathbb{E}_{\boldsymbol\sigma}\big[\sup_{\mathbf{h} \in \mathcal{H}} \sum_{i=1}^{m+u}\sigma_i h_i\big]$.

  8. Transductive Rademacher complexity • Version 1: $X_{m+u}$ is the full sample with the $m$ training and $u$ test points; $\mathcal{H} \subseteq \mathbb{R}^{m+u}$ is the transductive hypothesis space; the $\sigma_i$ are i.i.d. random variables with $\Pr(\sigma_i = 1) = \Pr(\sigma_i = -1) = \tfrac{1}{2}$. Rademacher complexity: $R_{m+u}(\mathcal{H}, \tfrac{1}{2}) = \big(\tfrac{1}{m}+\tfrac{1}{u}\big)\,\mathbb{E}_{\boldsymbol\sigma}\big[\sup_{\mathbf{h} \in \mathcal{H}} \sum_{i=1}^{m+u}\sigma_i h_i\big]$. • Version 2: sparse distribution of the Rademacher variables with $p_0 = \tfrac{mu}{(m+u)^2}$, i.e. $\Pr(\sigma_i = 1) = \Pr(\sigma_i = -1) = p_0$ and $\Pr(\sigma_i = 0) = 1 - 2p_0$. We develop risk bounds with $R_{m+u}(\mathcal{H}, p_0)$. • Lemma 1: $R_{m+u}(\mathcal{H}, p_0) \le R_{m+u}(\mathcal{H}, \tfrac{1}{2})$.
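A minimal Monte Carlo sketch of these two definitions (the finite hypothesis set, its size, and the sample counts are illustrative assumptions; the supremum reduces to a plain max because the set is finite):

```python
import numpy as np

rng = np.random.default_rng(1)
m, u = 20, 80
n = m + u
p0 = m * u / (m + u) ** 2          # sparse distribution of Version 2

# Toy finite hypothesis space: rows are soft-label vectors over the full sample.
H = rng.uniform(-1.0, 1.0, size=(50, n))

def transductive_rademacher(H, p, n_samples=2000, rng=rng):
    """Monte Carlo estimate of (1/m + 1/u) * E[sup_h <sigma, h>]."""
    # sigma_i = +1 w.p. p, -1 w.p. p, 0 w.p. 1 - 2p
    sigma = rng.choice([1.0, -1.0, 0.0], size=(n_samples, H.shape[1]),
                       p=[p, p, 1 - 2 * p])
    sup_vals = (sigma @ H.T).max(axis=1)    # sup over the finite set = max
    return (1 / m + 1 / u) * sup_vals.mean()

# Lemma 1 says the sparse complexity is at most the symmetric one (up to MC noise).
r_sparse = transductive_rademacher(H, p0)       # Version 2
r_symmetric = transductive_rademacher(H, 0.5)   # Version 1
print(f"Version 2 (sparse): {r_sparse:.3f}   Version 1 (symmetric): {r_symmetric:.3f}")
```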

  9. Risk bound • Notation: $L_u(h)$ is the 0/1 error of $h$ on the $u$ test examples; $\hat{L}^{\gamma}_m(h)$ is the empirical $\gamma$-margin error of $h$ on the $m$ training examples. • Theorem: For any $\delta > 0$, with probability at least $1-\delta$ over the random partition of the full sample into training and test sets, for all hypotheses $h \in \mathcal{H}$ it holds that $L_u(h) \le \hat{L}^{\gamma}_m(h) + \frac{R_{m+u}(\mathcal{H})}{\gamma} + O\big(\sqrt{(\tfrac{1}{m}+\tfrac{1}{u})\ln\tfrac{1}{\delta}}\big)$. • Proof: based on and inspired by the results of [McDiarmid, ’89], [Bartlett and Mendelson, ’02] and [Meir and Zhang, ’03]. • Previous results: [Lanckriet et al., ’04] handle a special case.

  10. Inductive vs. Transductive hypothesis spaces • Induction: to use the risk bounds, the hypothesis space must be defined before observing the training set. • Transduction: the hypothesis space can be defined after observing the full sample $X_{m+u}$, but before observing the actual partition into training and test sets. • Conclusion: transduction allows choosing a data-dependent hypothesis space. For example, it can be optimized to have low Rademacher complexity. This cannot be done in induction!

  11. Another view on transductive algorithms: Unlabeled-Labeled Decomposition (ULD) • The learner’s output can be written as $\mathbf{h} = U\boldsymbol\alpha$, where the matrix $U$ is computed from the unlabeled full sample and the vector $\boldsymbol\alpha$ is computed from the training labels. • Example: $U$ is the inverse of the graph Laplacian; $\alpha_i = y_i$ if example $i$ is a training point, and $\alpha_i = 0$ otherwise. A code sketch of this decomposition appears below.
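A minimal sketch of the ULD view under these assumptions (a toy similarity graph, with the Moore–Penrose pseudo-inverse standing in for the Laplacian inverse since the Laplacian is singular; all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, u = 10, 40
n = m + u

# Toy symmetric similarity (weight) matrix over the full sample.
W = rng.uniform(0.0, 1.0, size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

# Graph Laplacian L = D - W; it is singular, so use a pseudo-inverse.
L = np.diag(W.sum(axis=1)) - W
U = np.linalg.pinv(L)                 # "unlabeled" part: depends only on the full sample

# "Labeled" part: training labels padded with zeros on the test points.
y_train = rng.choice([-1.0, 1.0], size=m)
alpha = np.zeros(n)
alpha[:m] = y_train                   # assume the first m points are the training set

# The transductive output is the soft-label vector h = U @ alpha.
h = U @ alpha
y_test_pred = np.sign(h[m:])
print(y_test_pred[:10])
```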

  12. Bounding the Rademacher complexity • Hypothesis space $\mathcal{H}_{\mathrm{out}}$: the set of all vectors $\mathbf{h} = U\boldsymbol\alpha$ obtained by running the transductive algorithm on all possible partitions of the full sample. • Notation: $A$ is the set of $\boldsymbol\alpha$’s generated by the algorithm, $\alpha_{\max} = \max_{\boldsymbol\alpha \in A}\|\boldsymbol\alpha\|_2$; $\lambda_1,\dots,\lambda_{m+u}$ are the singular values of $U$. • Lemma 2 bounds the complexity through the spectrum of $U$: roughly, $R_{m+u}(\mathcal{H}_{\mathrm{out}}) = O\big(\alpha_{\max}\sqrt{\tfrac{1}{mu}\sum_i \lambda_i^2}\big)$. • Lemma 2 justifies the spectral transformations performed to improve the performance of transductive algorithms ([Chapelle et al., ’02], [Joachims, ’03], [Zhang and Ando, ’05]).

  13. Bounds for graph-based algorithms • Consistency method [Zhou, Bousquet, Lal, Weston, Scholkopf, ’03]: the algorithm has a ULD form, so Lemma 2 bounds its Rademacher complexity in terms of the singular values $\lambda_1,\dots,\lambda_{m+u}$ of its unlabeled-data matrix $U$. • Similar bounds hold for the algorithms of [Joachims, ’03], [Belkin et al., ’04], etc. A sketch of this computation on a toy graph follows.
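A minimal sketch of that computation (the Consistency Method matrix $(I - \beta S)^{-1}$ with $S = D^{-1/2} W D^{-1/2}$ follows Zhou et al.; the toy graph, the parameter value, and the use of the hedged O-form of Lemma 2 with its constant dropped are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
m, u = 10, 40
n = m + u
beta = 0.9                                 # CM smoothing parameter (illustrative value)

# Toy symmetric weight matrix and its symmetrically normalized version S.
W = rng.uniform(0.0, 1.0, size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# Consistency method in ULD form: h = U @ alpha with U = (I - beta*S)^{-1}
# and alpha the training labels padded with zeros.
U = np.linalg.inv(np.eye(n) - beta * S)

# Singular values of U and a Lemma-2-style complexity bound (constant dropped).
lam = np.linalg.svd(U, compute_uv=False)
alpha_max = np.sqrt(m)                     # ||alpha||_2 <= sqrt(m) for +/-1 labels
bound = alpha_max * np.sqrt((lam ** 2).sum() / (m * u))
print(f"Frobenius norm of U: {np.sqrt((lam ** 2).sum()):.2f}")
print(f"Rademacher complexity bound (up to a constant): {bound:.3f}")
```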

  14. Topics not covered • Bounding the Rademacher complexity when $U$ is a kernel matrix. • For some algorithms: a data-dependent method of computing probabilistic upper and lower bounds on the Rademacher complexity. • Risk bound for transductive mixtures.

  15. Directions for future research Tighten the risk bound to allow effective model selection: • Bound depending on the 0/1 empirical error. • Use of variance information to obtain a better convergence rate. • Local transductive Rademacher complexity. • Clever data-dependent choice of hypothesis spaces with low Rademacher complexity.

  16. Questions ?

  17. Monte Carlo estimation of transductive Rademacher complexity • The Rademacher complexity is an expectation over $\boldsymbol\sigma$: $R_{m+u}(\mathcal{H}) = \big(\tfrac{1}{m}+\tfrac{1}{u}\big)\,\mathbb{E}_{\boldsymbol\sigma}\big[\sup_{\mathbf{h} \in \mathcal{H}} \boldsymbol\sigma^{\top}\mathbf{h}\big]$. • Draw $n$ vectors of Rademacher variables $\boldsymbol\sigma^{(1)},\dots,\boldsymbol\sigma^{(n)}$ and average the suprema. • By Hoeffding’s inequality: for any $\delta > 0$, with probability at least $1-\delta$, the empirical average deviates from the expectation by at most $O\big(\sqrt{\ln(1/\delta)/n}\big)$, giving a probabilistic upper bound on the complexity. • How to compute the supremum? For the Consistency Method of [Zhou et al., ’03] it can be computed efficiently. • The symmetric Hoeffding inequality likewise gives a probabilistic lower bound on the transductive Rademacher complexity. A sketch of the estimator follows.
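A minimal sketch of this estimator with a Hoeffding-style two-sided confidence interval (the finite hypothesis set, the number of draws, and the range bound B are illustrative assumptions; for a non-finite hypothesis space the max would be replaced by an algorithm-specific supremum computation):

```python
import numpy as np

rng = np.random.default_rng(4)
m, u = 20, 80
n_pts = m + u
p0 = m * u / (m + u) ** 2

# Toy finite hypothesis set with entries in [-1, 1].
H = rng.uniform(-1.0, 1.0, size=(50, n_pts))

n = 5000                                    # number of Monte Carlo draws
sigma = rng.choice([1.0, -1.0, 0.0], size=(n, n_pts), p=[p0, p0, 1 - 2 * p0])
sups = (sigma @ H.T).max(axis=1)            # sup over the finite set for each draw
scale = 1 / m + 1 / u
estimate = scale * sups.mean()

# Each summand lies in [-B, B] with B = scale * max_h ||h||_1, so Hoeffding gives
# a two-sided deviation bound of B * sqrt(2 * ln(2/delta) / n).
delta = 0.05
B = scale * np.abs(H).sum(axis=1).max()
dev = B * np.sqrt(2 * np.log(2 / delta) / n)
print(f"estimate: {estimate:.3f}; with prob. >= {1 - delta}: "
      f"complexity in [{estimate - dev:.3f}, {estimate + dev:.3f}]")
```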

  18. Induction vs. Transduction: differences • Induction: • Unknown underlying distribution. • Independent training examples. • Test examples not known; they will be sampled from the same distribution. • Generate a general hypothesis. Want generalization! • Transduction: • No unknown distribution. Each example has a unique label. • Dependent training and test examples. • Test examples are known. • Only classify the given examples. No generalization!

  19. Justification of spectral transformations • Notation: $A$ is the set of $\boldsymbol\alpha$’s generated by the algorithm, $\alpha_{\max} = \max_{\boldsymbol\alpha \in A}\|\boldsymbol\alpha\|_2$; $\lambda_1,\dots,\lambda_{m+u}$ are the singular values of $U$. • Lemma 2: roughly, $R_{m+u}(\mathcal{H}_{\mathrm{out}}) = O\big(\alpha_{\max}\sqrt{\tfrac{1}{mu}\sum_i \lambda_i^2}\big)$. • Lemma 2 justifies the spectral transformations performed to improve the performance of transductive algorithms ([Chapelle et al., ’02], [Joachims, ’03], [Zhang and Ando, ’05]).
