
Universal Learning over Related Distributions and Adaptive Graph Transduction



  1. Universal Learning over Related Distributions and Adaptive Graph Transduction. Erheng Zhong†, Wei Fan‡, Jing Peng*, Olivier Verscheure‡, and Jiangtao Ren†. †Sun Yat-Sen University, ‡IBM T. J. Watson Research Center, *Montclair State University • Go beyond transfer learning to sample selection bias and uncertainty mining • Unified framework • One single solution: supervised case

  2. Standard Supervised Learning: the training (labeled) and test (unlabeled) data are drawn from the same distribution (e.g., both are New York Times articles); the classifier reaches 85.5% accuracy.

  3. Sample Selection Bias: the training (labeled) and test (unlabeled) data come from the same source (New York Times) but have different word-vector distributions (e.g., August articles are mostly about the typhoon in Taiwan, September articles mostly about the US Open); accuracy drops from 85.5% to 78.5%.

  4. Uncertainty Data Mining • Training data: both feature vectors and class labels contain noise (usually Gaussian); common for data collected from sensor networks • Test data: feature vectors contain noise

  5. Summary • Traditional supervised learning: training and test data follow the identical distribution • Transfer learning: data come from different domains • Sample selection bias: data come from the same domain but the distributions differ, e.g., data missing not at random • Uncertain data mining: data contain noise • In other words: in all three cases, training and test data come from different distributions • Traditionally, each problem is handled separately

  6. Main Challenge Can one solve these different but similar problems under a unified framework, with the same solution? → Universal Learning

  7. Universal Learning • 𝓑 is the set of subsets of X that are the support of some hypothesis in a fixed hypothesis space 𝓗 ([Blitzer et al., 2008]) • The distance between two distributions is measured over 𝓑 ([Blitzer et al., 2008]); a hedged reconstruction of the formulas follows below
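
A hedged reconstruction of the definition and distance, following Blitzer et al. (2008); the symbols 𝓑, 𝓗, D, D′ are taken from that paper and may differ from the notation on the original slide:

    % B_H: subsets of X that are the support of some hypothesis h in H
    \mathcal{B}_{\mathcal{H}} = \bigl\{\, \{x \in X : h(x) = 1\} \;:\; h \in \mathcal{H} \,\bigr\}
    % distance between two distributions D and D' with respect to H
    d_{\mathcal{H}}(\mathcal{D}, \mathcal{D}') = 2 \sup_{B \in \mathcal{B}_{\mathcal{H}}} \bigl|\, \Pr_{\mathcal{D}}[B] - \Pr_{\mathcal{D}'}[B] \,\bigr|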

  8. How to Handle Universal Learning? • Most traditional classifiers cannot guarantee performance when the training and test distributions are different. • Could we find one classifier under a weaker assumption? Graph Transduction?

  9. Advantage of Graph Transduction Weaker assumption: the decision boundary lies in the low-density regions of the unlabeled data. Two-Gaussians vs. two-arcs example.
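
To make the low-density assumption concrete, here is a minimal, self-contained sketch of graph transduction via Gaussian-kernel label propagation in the spirit of Zhu (2005); the function name, kernel width gamma, and damping factor alpha are illustrative, not the paper's exact settings:

    import numpy as np

    def propagate_labels(X, y, gamma=1.0, alpha=0.9, n_iter=100):
        """Iterative label propagation on an RBF graph.
        y holds a class index for labeled points and -1 for unlabeled ones."""
        n = X.shape[0]
        # Pairwise squared distances -> RBF affinities, no self-loops
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-gamma * d2)
        np.fill_diagonal(W, 0.0)
        # Symmetric normalization S = D^{-1/2} W D^{-1/2}
        deg = W.sum(axis=1)
        S = W / np.sqrt(np.outer(deg, deg))
        # One-hot seed matrix: zeros for unlabeled points
        classes = np.unique(y[y >= 0])
        Y0 = np.zeros((n, classes.size))
        for j, c in enumerate(classes):
            Y0[y == c, j] = 1.0
        # Diffuse labels along the graph of labeled + unlabeled points
        F = Y0.copy()
        for _ in range(n_iter):
            F = alpha * (S @ F) + (1.0 - alpha) * Y0
        return classes[F.argmax(axis=1)], F  # hard labels and soft scores

On data like the slide's two-arcs example, propagation of this kind tends to follow the arc structure of the unlabeled data rather than cutting through high-density regions.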

  10. Just Graph Transduction? • "Un-smooth label" (more examples in the low-density region) and "class imbalance" problems ([Wang et al., 2008]) may mislead the decision boundary into going through high-density regions • Bottom part is closest to a red square • More red squares than blue squares → Sample selection: which samples?

  11. Maximum Margin Graph Transduction In margin terms, unlabeled data with a low margin are likely to be misclassified! Good sample vs. bad sample.

  12. Main Flow • Step 1: select labeled samples so as to maximize the unlabeled-data margin • Step 2: predict the labels of the unlabeled data • Step 3: lift the unlabeled-data margin with an ensemble (a hedged sketch follows below)
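
A minimal sketch of this flow, assuming scikit-learn's LabelSpreading as a stand-in for the graph-transduction step; the subset size, the number of candidate subsets per iteration, and the select-by-maximum-margin loop are illustrative rather than the paper's exact algorithm:

    import numpy as np
    from sklearn.semi_supervised import LabelSpreading

    def unlabeled_margin(prob):
        """Mean margin over unlabeled points: top class score minus runner-up."""
        p = np.sort(prob, axis=1)
        return (p[:, -1] - p[:, -2]).mean()

    def margin_graph(X_lab, y_lab, X_unlab, n_iter=10, per_class=10, n_candidates=5, seed=0):
        """Hedged sketch of the main flow: pick a labeled subset that maximizes
        the unlabeled-data margin, propagate labels, and average the predictions."""
        rng = np.random.default_rng(seed)
        classes = np.unique(y_lab)
        ensemble = np.zeros((len(X_unlab), classes.size))
        for _ in range(n_iter):
            best_prob, best_margin = None, -np.inf
            for _ in range(n_candidates):
                # Step 1: candidate labeled subset (stratified so every class is present)
                idx = np.concatenate([
                    rng.choice(np.flatnonzero(y_lab == c),
                               size=min(per_class, (y_lab == c).sum()), replace=False)
                    for c in classes])
                X = np.vstack([X_lab[idx], X_unlab])
                y = np.concatenate([y_lab[idx], -np.ones(len(X_unlab), dtype=int)])
                # Step 2: label propagation over the joint labeled + unlabeled graph
                model = LabelSpreading(kernel='rbf', gamma=20).fit(X, y)
                prob = model.label_distributions_[len(idx):]
                m = unlabeled_margin(prob)
                if m > best_margin:          # keep the subset with the largest margin
                    best_margin, best_prob = m, prob
            # Step 3: ensemble -- averaging per-iteration predictions lifts the margin
            ensemble += best_prob
        ensemble /= n_iter
        return classes[ensemble.argmax(axis=1)], ensemble

With X_lab, y_lab drawn from a biased or shifted source sample and X_unlab from the target, margin_graph returns hard labels and averaged soft scores for the unlabeled points.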

  13. Properties The error of Adaptive Graph Transduction can be bounded by the sum of three terms: the training error in terms of approximating the ideal hypothesis, the empirical distance between the training and test distributions, and the error of the ideal hypothesis (a hedged formulation follows below).
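
In symbols, one standard form of such a bound (cf. Blitzer et al., 2008) is sketched below; the exact constants and distance measure used in the paper may differ:

    % Hedged sketch: D_S / D_T are the training and test distributions,
    % epsilon_S / epsilon_T the corresponding errors, lambda the error of the ideal hypothesis.
    \epsilon_{T}(h) \;\le\; \epsilon_{S}(h) \;+\; d_{\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) \;+\; \lambda,
    \qquad \lambda = \min_{h^{*} \in \mathcal{H}} \bigl( \epsilon_{S}(h^{*}) + \epsilon_{T}(h^{*}) \bigr)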

  14. Properties • If one classifier has a larger unlabeled-data margin, its training error will be smaller (recall the previous theorem) • An average ensemble is likely to achieve a larger margin

  15. Experiment – Data Set: Transfer Learning • Reuters-21578: Reuters news articles; the top categories org and place each have subcategories, and one set of subcategories (org.subA, place.subA) serves as one domain while the other set (org.subB, place.subB) serves as the other • SyskillWebert: HTML source of web pages plus a user's ratings of those pages; source domains: Bands-recording, Biomedical, Goats; target domain: Sheep • Procedure: first fill up the "GAP" between domains, then use a kNN classifier to do classification

  16. Experiment – Data Sets: Sample Selection Bias Correction and Uncertainty Mining • Sample selection bias correction – UCI data sets: Ionosphere, Diabetes, Haberman, WDBC (slide figure shows the bias on Feature 1 vs. Feature 2) • Uncertainty mining – Kent Ridge Biomedical Repository: high-dimensional, low-sample-size (HDLSS) data • Biased sampling: randomly select 50% of the features, sort the data set according to each selected feature, and take the top instances of every sorted list as the training set • Noise: generate two different Gaussian noises and add them to the training and test sets, respectively (a hedged sketch of this setup follows below)
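
A minimal sketch of this data preparation, assuming illustrative values for the fraction of top instances kept and for the two noise levels, since the slide does not specify them:

    import numpy as np

    def biased_split(X, feat_frac=0.5, top_frac=0.2, rng=None):
        """Hedged sketch of the biased-sampling setup described on the slide:
        pick a random half of the features, sort the data along each picked
        feature, and pool the top instances of every sorted list as the
        (biased) training set; everything else becomes the test set."""
        rng = np.random.default_rng(rng)
        n, d = X.shape
        feats = rng.choice(d, size=max(1, int(feat_frac * d)), replace=False)
        top_k = max(1, int(top_frac * n))        # top_frac is illustrative
        train_idx = set()
        for f in feats:
            order = np.argsort(X[:, f])[::-1]    # descending sort on feature f
            train_idx.update(order[:top_k])      # take the top instances
        train_idx = np.array(sorted(train_idx))
        test_idx = np.setdiff1d(np.arange(n), train_idx)
        return train_idx, test_idx

    def add_gaussian_noise(X_train, X_test, sigma_train=0.1, sigma_test=0.3, rng=None):
        """Add two different Gaussian noises to training and test features;
        the standard deviations are illustrative, not the paper's values."""
        rng = np.random.default_rng(rng)
        return (X_train + rng.normal(0.0, sigma_train, X_train.shape),
                X_test + rng.normal(0.0, sigma_test, X_test.shape))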

  17. Experiment -- Baseline methods • Original graph transduction algorithm ([Zhu, 2005]): uses the entire training data set; variation: a randomly selected sample whose size equals the one chosen by MarginGraph • CDSC: transfer learning approach ([Ling et al., 2008]); finds a mapping space that optimizes a consistency measure between the out-of-domain supervision and the in-domain intrinsic structure • BRSD-BK / BRSD-DB: bias-correction approaches ([Ren et al., 2008]); discover structure and re-balance using unlabeled data

  18. Performance--Transfer Learning

  19. Performs best on 5 of 6 data sets!

  20. Performs best on 5 of 6 data sets!

  21. Performance--Sample Selection Bias Accuracy: Best on all 4 data sets! AUC: Best on 2 of 4 data sets.

  22. Performance--Uncertainty Mining Accuracy: Best on all 4 data sets! AUC: Best on all 4 data sets!

  23. Margin Analysis MarginBase is the base classifier of MarginGraph in each iteration. LowBase is a “minimal margin classifier” which selects samples for building a classifier with minimal unlabeled data margin. LowGraph is the averaging ensemble of LowBase.

  24. Maximal margin is better than minimal margin. Ensemble is better than any single classifier.

  25. Conclusion • Covers different formulations where the training and test sets are drawn from related but different distributions • Flow: Step 1 – Sample selection: select labeled data from a different distribution that maximizes the unlabeled-data margin; Step 2 – Label propagation: label the unlabeled data; Step 3 – Ensemble: further lift the unlabeled-data margin • Code and data available from http://www.cs.columbia.edu/~wfan

  26. Thank you. Hvala lepa (Slovenian: "thank you very much").
