An introduction to self-taught learning Presented by: Zenglin Xu 10-09-2007 [Raina et al., 2007] Self-taught Learning: Transfer Learning from Unlabeled Data
Outline • Related learning paradigms • A self-taught learning algorithm
Related learning paradigms • Semi-supervised learning • Transfer learning • Multi-task learning • Domain adaptation • Biased sample selection • Self-taught learning
Semi-supervised learning • In addition to the labeled training data, a large set of unlabeled (test) data is available • The training data and test data are drawn from the same distribution • The unlabeled data are assumed to share the class labels of the supervised learning task • Reference • [Chapelle et al., 2006] Semi-supervised learning • [Zhu, 2005] Semi-supervised learning literature survey
Transfer learning • The theory of transfer of learning was introduced by Thorndike and Woodworth (1901), who explored how individuals transfer learning from one context to another context sharing similar characteristics • Transfer of knowledge from one supervised task to another; requires labeled data from a different but related task • E.g., transferring knowledge learned on Newsgroup data to Reuters data • Related work in computer science • [Thrun & Mitchell, 1995] Learning one more thing • [Ando & Zhang, 2005] A framework for learning predictive structures from multiple tasks and unlabeled data
Multi-task learning • Learns a problem together with other related problems at the same time, using a shared representation • This often leads to a better model for the main task, because the learner can exploit the commonality among the tasks • Multi-task learning is a form of inductive transfer: tasks are learned in parallel with a shared representation, so what is learned for each task can help the other tasks be learned better • Reference • [Caruana, 1997] Multitask Learning • [Ben-David & Schuller, 2003] Exploiting task relatedness for multiple task learning
Domain adaptation • A term popular in natural language processing • It can be viewed as a form of transfer learning • The supervised setting usually involves: • A large pool of out-of-domain labeled data • A small pool of in-domain labeled data • Reference • [Daume III, 2007] Frustratingly Easy Domain Adaptation • [Daume III & Marcu, 2006] Domain Adaptation for Statistical Classifiers • [Ben-David et al., 2006] Analysis of Representations for Domain Adaptation
Biased sample selection • Also called covariate shift • Deals with the case where the training data and test data are drawn from different distributions over the same domain • The objective is to correct this bias • Reference • [Shimodaira, 2000] Improving predictive inference under covariate shift… • [Zadrozny, 2004] Learning and evaluating classifiers under sample selection bias • [Bickel et al., 2007] Discriminative learning for differing training and test distributions
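A minimal sketch of one common correction scheme, in the spirit of the discriminative approach of [Bickel et al., 2007]: estimate the density ratio p_test(x)/p_train(x) with a probabilistic domain classifier and use it to re-weight the training loss. The synthetic data, the logistic-regression models, and all sizes are illustrative assumptions, not the setup of any cited paper.

```python
# Sketch: correct sample selection bias (covariate shift) by importance
# weighting. A domain classifier estimates how test-like each training
# example is; its probability ratio re-weights the task classifier's loss.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 5))      # training inputs
y_train = (X_train[:, 0] > 0).astype(int)           # training labels
X_test = rng.normal(0.5, 1.0, size=(500, 5))       # test inputs, shifted distribution

# Domain classifier: does an example come from the test set (label 1)?
X_dom = np.vstack([X_train, X_test])
d_dom = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
dom = LogisticRegression().fit(X_dom, d_dom)
p_test = dom.predict_proba(X_train)[:, 1]
weights = p_test / (1.0 - p_test)                    # ~ p_test(x) / p_train(x)

# Train the task classifier with importance weights to correct the bias
clf = LogisticRegression().fit(X_train, y_train, sample_weight=weights)
```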
Self-taught learning • Uses unlabeled data • Does not require the unlabeled data to have the same generative distribution as the labeled data • The unlabeled data may have different labels from those of the supervised learning task's data • Reference • [Raina et al., 2007] Self-taught learning: transfer learning from unlabeled data
Outline • Related learning paradigms • A self-taught learning algorithm • Algorithm • Experiment
Sparse coding – a self-taught learning algorithm • Learn a high-level feature representation from unlabeled data; e.g., random unlabeled images usually contain basic visual patterns (such as edges) that also appear in the images to be classified (such as images of elephants) • Apply the representation to the labeled data and use it for classification
Step 1 – learning higher-level representations
Given unlabeled data $x_u^{(1)}, \dots, x_u^{(k)}$, optimize
$$\min_{b,\,a}\;\sum_i \Big\| x_u^{(i)} - \sum_j a_j^{(i)} b_j \Big\|_2^2 + \beta \big\| a^{(i)} \big\|_1 \quad \text{s.t. } \|b_j\|_2 \le 1,\ \forall j$$
where the $b_j$ are the basis vectors and the $a^{(i)}$ are the activations
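As a rough illustration of this step (not the authors' implementation), scikit-learn's DictionaryLearning can stand in for the alternating optimization over bases and activations; the random patches, the 512 bases, and the alpha value playing the role of β are assumptions.

```python
# Sketch of Step 1: learn sparse-coding bases from unlabeled patches.
# The patch source, 512 bases, and alpha (standing in for beta) are illustrative.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X_unlabeled = rng.standard_normal((1000, 196))   # e.g. 1000 patches of 14x14 pixels

dict_learner = DictionaryLearning(
    n_components=512,              # number of bases b_j (the paper uses 512 image bases)
    alpha=1.0,                     # sparsity penalty, playing the role of beta
    transform_algorithm="lasso_lars",
    max_iter=20,
)
dict_learner.fit(X_unlabeled)
bases = dict_learner.components_   # rows are the learned basis vectors b_j (unit norm)
```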
Step 2: apply the representation to the labeled data and use it for classification: with the bases $b_j$ fixed, compute features for each labeled example by solving the same $\ell_1$-penalized reconstruction for its activations, then train a standard classifier on those activations
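A minimal self-contained sketch of Step 2: SparseCoder computes the L1-penalized activations with the bases held fixed, and a linear SVM is trained on them. The random stand-in bases and labeled data, and the choice of LinearSVC, are assumptions for illustration; in practice the bases would come from Step 1.

```python
# Sketch of Step 2: with learned bases fixed, compute sparse activations for
# labeled examples and train a standard classifier on them.
import numpy as np
from sklearn.decomposition import SparseCoder
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
bases = rng.standard_normal((512, 196))                 # stand-in for learned bases b_j
bases /= np.linalg.norm(bases, axis=1, keepdims=True)   # enforce ||b_j||_2 <= 1

X_labeled = rng.standard_normal((200, 196))             # placeholder labeled examples
y_labeled = rng.integers(0, 2, size=200)                 # placeholder binary labels

# SparseCoder solves min_a ||x - a B||^2 + alpha * ||a||_1 with the bases fixed
coder = SparseCoder(dictionary=bases, transform_algorithm="lasso_lars",
                    transform_alpha=1.0)
A_labeled = coder.transform(X_labeled)                   # rows are activation vectors a^(i)

clf = LinearSVC().fit(A_labeled, y_labeled)              # classify in the new feature space
print("training accuracy:", clf.score(A_labeled, y_labeled))
```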
High-level features computed: using a set of 512 learned image bases (Fig. 2, left), Figure 3 illustrates a solution to the previous optimization problem
Connection to PCA • PCA results in linear feature extraction: the features $a_j^{(i)}$ are simply a linear function of the input • The PCA bases $b_j$ must be orthogonal, so the number of PCA features cannot exceed the dimension $n$ of the input • Sparse coding has neither of these limitations
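To make the contrast concrete, a tiny sketch with illustrative sizes: PCA features are a linear projection capped at the n input dimensions, whereas a sparse-coding dictionary may be overcomplete and its activations are a non-linear (L1-regularized) function of the input.

```python
# Contrast of the two limitations: PCA cannot produce more than n = 64
# features and computes them linearly, while a sparse-coding dictionary can be
# overcomplete (here 256 bases for 64-dimensional inputs). Sizes are illustrative.
import numpy as np
from sklearn.decomposition import PCA, SparseCoder

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 64))                  # n = 64 input dimensions

pca = PCA(n_components=64)                          # asking for more than 64 would fail
Z_pca = pca.fit_transform(X)                        # linear features, shape (300, 64)

dictionary = rng.standard_normal((256, 64))         # overcomplete: 256 bases
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
coder = SparseCoder(dictionary=dictionary, transform_algorithm="lasso_lars",
                    transform_alpha=0.5)
Z_sc = coder.transform(X)                           # sparse features, shape (300, 256)
print(Z_pca.shape, Z_sc.shape)
```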
Outline • Related learning paradigms • A self-taught learning algorithm • Algorithm • Experiment
Compare with results using features trained on labeled data Table 7. Accuracy on the self-taught learning tasks when sparse coding bases are learned on unlabeled data (third column), or when principal components/sparse coding bases are learned on the labeled training set (fourth/fifth column).
Discussion • Is it useful to learn a high-level feature representation in a unified process using both the labeled and the unlabeled data? • How does the similarity between the labeled data and the unlabeled data affect performance? • And more?