An introduction to self-taught learning Presented by: Zenglin Xu 10-09-2007 [Raina et al., 2007] Self-taught Learning: Transfer Learning from Unlabeled Data
Outline • Related learning paradigms • A self-taught learning algorithm
Related learning paradigms • Semi-supervised learning • Transfer learning • Multi-task learning • Domain adaptation • Biased sample selection • Self-taught learning
Semi-supervised learning • In addition to the labeled training data, a large set of unlabeled (test) data is available • The training data and test data are drawn from the same distribution • The unlabeled data are assumed to share the class labels of the supervised learning task • Reference • [Chapelle et al., 2006] Semi-supervised learning • [Zhu, 2005] Semi-supervised learning literature survey
Transfer learning • The theory of transfer of learning was introduced by Thorndike and Woodworth (1901), who explored how individuals transfer learning from one context to another context sharing similar characteristics • Transfer of knowledge from one supervised task to another; requires labeled data from a different but related task • E.g., transferring knowledge learned on Newsgroup data to Reuters data • Related work in computer science • [Thrun & Mitchell, 1995] Learning one more thing • [Ando & Zhang, 2005] A framework for learning predictive structures from multiple tasks and unlabeled data
Multi-task learning • Learns a problem together with other related problems at the same time, using a shared representation • This often leads to a better model for the main task, because the learner can exploit the commonality among the tasks • Multi-task learning is a form of inductive transfer: tasks are learned in parallel with a shared representation, so what is learned for each task can help the other tasks be learned better • Reference • [Caruana, 1997] Multitask Learning • [Ben-David & Schuller, 2003] Exploiting task relatedness for multiple task learning
Domain adaptation • A term popular in natural language processing • It can be viewed as a form of transfer learning • The supervised setting usually involves: • A large pool of out-of-domain labeled data • A small pool of in-domain labeled data • Reference • [Daume III, 2007] Frustratingly Easy Domain Adaptation • [Daume III & Marcu, 2006] Domain Adaptation for Statistical Classifiers • [Ben-David et al., 2006] Analysis of Representations for Domain Adaptation
Biased sample selection • Also called covariate shift • Deals with the case where the training data and test data are drawn from different distributions over the same domain • The objective is to correct this bias • Reference • [Shimodaira, 2000] Improving predictive inference under covariate shift… • [Zadrozny, 2004] Learning and evaluating classifiers under sample selection bias • [Bickel et al., 2007] Discriminative learning for differing training and test distributions
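A minimal sketch of one common correction scheme, in the spirit of the discriminative approach of [Bickel et al., 2007]: estimate the density ratio p_test(x)/p_train(x) with a probabilistic domain classifier and use it to re-weight the training loss. The synthetic data, the logistic-regression models, and all sizes are illustrative assumptions, not the setup of any cited paper.

```python
# Sketch: correct sample selection bias (covariate shift) by importance
# weighting. A domain classifier estimates how test-like each training
# example is; its probability ratio re-weights the task classifier's loss.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 5))      # training inputs
y_train = (X_train[:, 0] > 0).astype(int)           # training labels
X_test = rng.normal(0.5, 1.0, size=(500, 5))       # test inputs, shifted distribution

# Domain classifier: does an example come from the test set (label 1)?
X_dom = np.vstack([X_train, X_test])
d_dom = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
dom = LogisticRegression().fit(X_dom, d_dom)
p_test = dom.predict_proba(X_train)[:, 1]
weights = p_test / (1.0 - p_test)                    # ~ p_test(x) / p_train(x)

# Train the task classifier with importance weights to correct the bias
clf = LogisticRegression().fit(X_train, y_train, sample_weight=weights)
```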
Self-taught learning • Uses unlabeled data • Does not require the unlabeled data to have the same generative distribution as the labeled data • The unlabeled data may have different labels from those of the supervised learning task's data • Reference • [Raina et al., 2007] Self-taught learning: transfer learning from unlabeled data
Outline • Related learning paradigms • A self-taught learning algorithm • Algorithm • Experiment
Sparse coding – a self-taught learning algorithm • Learn a high-level feature representation from unlabeled data; e.g., random unlabeled images usually contain basic visual patterns (such as edges) that also appear in the images to be classified (such as images of elephants) • Apply the representation to the labeled data and use it for classification
Step 1 – learning higher-level representations
Given unlabeled data $x_u^{(1)}, \dots, x_u^{(k)}$, optimize
$$\min_{b,\,a}\;\sum_i \Big\| x_u^{(i)} - \sum_j a_j^{(i)} b_j \Big\|_2^2 + \beta \big\| a^{(i)} \big\|_1 \quad \text{s.t. } \|b_j\|_2 \le 1,\ \forall j$$
where the $b_j$ are the basis vectors and the $a^{(i)}$ are the activations
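As a rough illustration of this step (not the authors' implementation), scikit-learn's DictionaryLearning can stand in for the alternating optimization over bases and activations; the random patches, the 512 bases, and the alpha value playing the role of β are assumptions.

```python
# Sketch of Step 1: learn sparse-coding bases from unlabeled patches.
# The patch source, 512 bases, and alpha (standing in for beta) are illustrative.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X_unlabeled = rng.standard_normal((1000, 196))   # e.g. 1000 patches of 14x14 pixels

dict_learner = DictionaryLearning(
    n_components=512,              # number of bases b_j (the paper uses 512 image bases)
    alpha=1.0,                     # sparsity penalty, playing the role of beta
    transform_algorithm="lasso_lars",
    max_iter=20,
)
dict_learner.fit(X_unlabeled)
bases = dict_learner.components_   # rows are the learned basis vectors b_j (unit norm)
```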
Step 2: apply the representation to the labeled data and use it for classification: with the bases $b_j$ fixed, compute features for each labeled example by solving the same $\ell_1$-penalized reconstruction for its activations, then train a standard classifier on those activations
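A minimal self-contained sketch of Step 2: SparseCoder computes the L1-penalized activations with the bases held fixed, and a linear SVM is trained on them. The random stand-in bases and labeled data, and the choice of LinearSVC, are assumptions for illustration; in practice the bases would come from Step 1.

```python
# Sketch of Step 2: with learned bases fixed, compute sparse activations for
# labeled examples and train a standard classifier on them.
import numpy as np
from sklearn.decomposition import SparseCoder
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
bases = rng.standard_normal((512, 196))                 # stand-in for learned bases b_j
bases /= np.linalg.norm(bases, axis=1, keepdims=True)   # enforce ||b_j||_2 <= 1

X_labeled = rng.standard_normal((200, 196))             # placeholder labeled examples
y_labeled = rng.integers(0, 2, size=200)                 # placeholder binary labels

# SparseCoder solves min_a ||x - a B||^2 + alpha * ||a||_1 with the bases fixed
coder = SparseCoder(dictionary=bases, transform_algorithm="lasso_lars",
                    transform_alpha=1.0)
A_labeled = coder.transform(X_labeled)                   # rows are activation vectors a^(i)

clf = LinearSVC().fit(A_labeled, y_labeled)              # classify in the new feature space
print("training accuracy:", clf.score(A_labeled, y_labeled))
```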
High-level features computed: using a set of 512 learned image bases (Fig. 2, left), Figure 3 illustrates a solution to the previous optimization problem
Connection to PCA • PCA results in linear feature extraction: the features $a_j^{(i)}$ are simply a linear function of the input • The PCA bases $b_j$ must be orthogonal, so the number of PCA features cannot exceed the dimension $n$ of the input • Sparse coding has neither of these limitations
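To make the contrast concrete, a tiny sketch with illustrative sizes: PCA features are a linear projection capped at the n input dimensions, whereas a sparse-coding dictionary may be overcomplete and its activations are a non-linear (L1-regularized) function of the input.

```python
# Contrast of the two limitations: PCA cannot produce more than n = 64
# features and computes them linearly, while a sparse-coding dictionary can be
# overcomplete (here 256 bases for 64-dimensional inputs). Sizes are illustrative.
import numpy as np
from sklearn.decomposition import PCA, SparseCoder

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 64))                  # n = 64 input dimensions

pca = PCA(n_components=64)                          # asking for more than 64 would fail
Z_pca = pca.fit_transform(X)                        # linear features, shape (300, 64)

dictionary = rng.standard_normal((256, 64))         # overcomplete: 256 bases
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
coder = SparseCoder(dictionary=dictionary, transform_algorithm="lasso_lars",
                    transform_alpha=0.5)
Z_sc = coder.transform(X)                           # sparse features, shape (300, 256)
print(Z_pca.shape, Z_sc.shape)
```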
Outline • Related learning paradigms • A self-taught learning algorithm • Algorithm • Experiment
Compare with results using features trained on labeled data Table 7. Accuracy on the self-taught learning tasks when sparse coding bases are learned on unlabeled data (third column), or when principal components/sparse coding bases are learned on the labeled training set (fourth/fifth column).
Discussion • Is it useful to learn a high-level feature representation in a unified process using both the labeled and the unlabeled data? • How does the similarity between the labeled data and the unlabeled data affect performance? • And more?