
Presentation Transcript



Self-taught Learning: Transfer Learning from Unlabeled Data

Rajat Raina

Honglak Lee, Roger Grosse

Alexis Battle, Chaitanya Ekanadham, Helen Kwong, Benjamin Packer,

Narut Sereewattanawoot

Andrew Y. Ng

Stanford University



The “one learning algorithm” hypothesis

  • There is some evidence that the human brain uses essentially the same algorithm to understand many different input modalities.

    • Example: Ferret experiments, in which the “input” for vision was plugged into the auditory part of the brain, and the auditory cortex learns to “see.” [Roe et al., 1992]

      If we could find this one learning algorithm,

      we would be done. (Finally!)

(Roe et al., 1992; Hawkins & Blakeslee, 2004)




Finding a deep learning algorithm

  • If the brain really does use one learning algorithm, it would suffice to:

    • Find a learning algorithm for a single layer, and

    • Show that it can be used to build a small number of layers.

  • We evaluate our algorithms:

    • Against biology: e.g., sparse RBMs for V2 (poster yesterday, Lee et al.).

    • On applications: this talk.


Supervised learning

[Figure: labeled “Cars” and “Motorcycles” images, split into train and test sets]

Supervised learning algorithms may not work well with limited labeled data.



Learning in humans

  • Your brain has 10^14 synapses (connections).

  • You will live for 10^9 seconds.

  • If each synapse requires 1 bit to parameterize, you need to “learn” 10^14 bits in 10^9 seconds.

  • Or, 10^5 bits per second.

    Human learning is largely unsupervised,

    and uses readily available unlabeled data.

(Geoffrey Hinton, personal communication)





“Brain-like” Learning

[Figure: the same car/motorcycle train and test sets, now supplemented with unlabeled images randomly downloaded from the Internet]



“Self-taught Learning”

  • Labeled digits + unlabeled English characters = ?

  • Labeled webpages + unlabeled newspaper articles = ?

  • Labeled Russian speech + unlabeled English speech = ?


Recent history of machine learning

[Figure: example data for each setting, from labeled cars and motorcycles, to additional unlabeled cars and motorcycles, to labeled images of other classes (bus, tractor, aircraft, helicopter), to generic natural scenes]

  • 20 years ago: Supervised learning.

  • 10 years ago: Semi-supervised learning.

  • 10 years ago: Transfer learning.

  • Next: Self-taught learning?


Self-taught Learning

  • Labeled examples: {(x_l^(1), y^(1)), …, (x_l^(m), y^(m))}.

  • Unlabeled examples: {x_u^(1), …, x_u^(k)}.

  • The unlabeled and labeled data:

    • Need not share labels y.

    • Need not share a generative distribution.

  • Advantage: Such unlabeled data is often easy to obtain.



A self-taught learning algorithm

Overview: Represent each labeled or unlabeled input x as a sparse linear combination of “basis vectors” b_j.

x = 0.8 * b_87 + 0.3 * b_376 + 0.5 * b_411

[Figure: the input shown as a weighted sum of three basis patches]



A self-taught learning algorithm

Key steps:

  • Learn good bases b using the unlabeled data {x_u^(i)}.

  • Use these learnt bases to construct “higher-level” features for the labeled data.

  • Apply a standard supervised learning algorithm on these features.

x = 0.8 * b_87 + 0.3 * b_376 + 0.5 * b_411
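As a rough end-to-end illustration of these three steps, the sketch below uses scikit-learn's DictionaryLearning as a stand-in for the sparse-coding solver of Lee et al. (NIPS 2006); the array shapes, variable names, and hyperparameters are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the pipeline: learn bases from unlabeled data, featurize the
# labeled data as sparse activations over those bases, then train a standard SVM.
# The data here is random and only stands in for real patches/documents.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_unlabeled = rng.random((500, 196))        # e.g., flattened 14x14 image patches
X_labeled = rng.random((100, 196))          # labeled examples for the target task
y_labeled = rng.integers(0, 2, size=100)

# Step 1: learn bases b from the unlabeled data (L1-penalized sparse coding).
dico = DictionaryLearning(n_components=64, alpha=1.0, max_iter=50,
                          transform_algorithm="lasso_lars", transform_alpha=1.0,
                          random_state=0)
dico.fit(X_unlabeled)

# Step 2: "higher-level" features = sparse activations of each labeled example.
features = dico.transform(X_labeled)

# Step 3: apply a standard supervised learner (here a linear SVM) to the features.
clf = LinearSVC().fit(features, y_labeled)
```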




Learning the bases: Sparse coding

Given only unlabeled data, we find good bases b using sparse coding:

    minimize over b, a:   Σ_i || x_u^(i) - Σ_j a_j^(i) b_j ||^2  +  β Σ_i || a^(i) ||_1

    (first term: reconstruction error; second term: sparsity penalty)

(Efficient algorithms: Lee et al., NIPS 2006)

[Details: An extra normalization constraint on the bases b_j is required.]
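This objective is commonly optimized by alternating between the activations a (bases fixed) and the bases b (activations fixed). Below is a simplified NumPy/scikit-learn sketch of that alternation, assuming an L1 coding step via sparse_encode and a plain least-squares basis update with renormalization; the talk instead uses the faster solvers of Lee et al. (NIPS 2006), and the function name and hyperparameters here are illustrative.

```python
import numpy as np
from sklearn.decomposition import sparse_encode

def learn_bases(X_unlabeled, n_bases=64, beta=0.1, n_iter=20, seed=0):
    """Alternating-minimization sketch for
         min_{b,a}  sum_i ||x_u^(i) - sum_j a_j^(i) b_j||^2 + beta * sum_i ||a^(i)||_1
         s.t.       ||b_j|| <= 1.
    """
    rng = np.random.default_rng(seed)
    n, d = X_unlabeled.shape
    B = rng.standard_normal((n_bases, d))
    B /= np.linalg.norm(B, axis=1, keepdims=True)        # start from unit-norm bases

    for _ in range(n_iter):
        # (1) Coding step: L1-regularized activations with the bases held fixed.
        A = sparse_encode(X_unlabeled, B, algorithm="lasso_lars", alpha=beta)
        # (2) Basis step: least-squares fit with the activations held fixed ...
        B = np.linalg.lstsq(A, X_unlabeled, rcond=None)[0]
        # ... then enforce the norm constraint; reinitialize any unused (all-zero) basis.
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        dead = (norms < 1e-10).ravel()
        B[dead] = rng.standard_normal((int(dead.sum()), d))
        norms[dead] = np.linalg.norm(B[dead], axis=1, keepdims=True)
        B = B / norms
    return B
```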



Example bases

  • Bases learnt on natural images: “edges.”

  • Bases learnt on handwritten characters: “strokes.”

[Figure: visualizations of the learnt basis vectors for each domain]



Constructing features

  • Using the learnt bases b, compute features for each example x_l from the classification task by solving:

      features(x_l) = arg min over a:   || x_l - Σ_j a_j b_j ||^2  +  β || a ||_1

      (first term: reconstruction error; second term: sparsity penalty)

  • Finally, learn a classifier using a standard supervised learning algorithm (e.g., an SVM) over these features.

x_l = 0.8 * b_87 + 0.3 * b_376 + 0.5 * b_411
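A hedged sketch of these two steps, in the same spirit as the earlier ones: sparse activations over a basis matrix B (here randomly initialized in place of learnt bases) serve as the features, and a linear SVM is trained on them. The data, penalty value, and shapes are placeholders.

```python
import numpy as np
from sklearn.decomposition import sparse_encode
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
B = rng.standard_normal((64, 196))            # stand-in for bases learnt on unlabeled data
B /= np.linalg.norm(B, axis=1, keepdims=True)
X_train, X_test = rng.random((80, 196)), rng.random((20, 196))
y_train, y_test = rng.integers(0, 2, 80), rng.integers(0, 2, 20)

# Features = the arg-min activations above: sparse codes of each labeled example over B.
train_feats = sparse_encode(X_train, B, algorithm="lasso_lars", alpha=0.1)
test_feats = sparse_encode(X_test, B, algorithm="lasso_lars", alpha=0.1)

# A standard supervised learner (linear SVM) on the higher-level features.
clf = LinearSVC().fit(train_feats, y_train)
print("test accuracy:", clf.score(test_feats, y_test))
```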



Image classification

[Figure: a large platypus image (Caltech101 dataset) and a visualization of the sparse-coding features computed on it]





Image classification

Other reported results (15 labeled images per class):

  • Fei-Fei et al., 2004: 16%

  • Berg et al., 2005: 17%

  • Holub et al., 2005: 40%

  • Serre et al., 2005: 35%

  • Berg et al., 2005: 48%

  • Zhang et al., 2006: 59%

  • Lazebnik et al., 2006: 56%

This talk: 36.0% error reduction


Character recognition

[Figure: example images of digits, handwritten English characters, and English font characters]

  • Handwritten English classification (20 labeled images per handwritten character), bases learnt on digits: 8.2% error reduction.

  • English font classification (20 labeled images per font character), bases learnt on handwritten English: 2.8% error reduction.


Text classification

[Figure: example documents from Reuters newswire, UseNet articles, and webpages]

  • Webpage classification (2 labeled documents per class), bases learnt on Reuters newswire: 4.0% error reduction.

  • UseNet classification (2 labeled documents per class), bases learnt on Reuters newswire: 6.5% error reduction.


Shift-invariant sparse coding

[Figure: a signal reconstructed as the sum, over bases, of each basis function convolved with its sparse feature signal]

(Algorithms: Grosse et al., UAI 2007)
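To make the “reconstruction = bases convolved with sparse features” idea concrete, here is a tiny 1-D NumPy illustration; the signal length, number of bases, and spike counts are arbitrary choices for the sketch, not values from Grosse et al.

```python
import numpy as np

rng = np.random.default_rng(0)
signal_len, basis_len, n_bases = 200, 16, 4
bases = rng.standard_normal((n_bases, basis_len))

# Sparse feature signals: mostly zero, with a few nonzero "spikes" per basis.
activations = np.zeros((n_bases, signal_len - basis_len + 1))
for j in range(n_bases):
    spikes = rng.choice(activations.shape[1], size=3, replace=False)
    activations[j, spikes] = rng.standard_normal(3)

# Reconstruction: sum over bases of (sparse features) convolved with (basis function),
# so each basis can appear at any shift within the signal.
reconstruction = sum(np.convolve(activations[j], bases[j]) for j in range(n_bases))
print(reconstruction.shape)   # (200,) -- same length as the modeled signal
```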



Audio classification

  • Speaker identification (5 labels, TIMIT corpus, 1 sentence per speaker), bases learnt on different dialects: 8.7% error reduction.

  • Musical genre classification (5 labels, 18 seconds per genre), bases learnt on different genres and songs: 5.7% error reduction.

(Details: Grosse et al., UAI 2007)


Sparse deep belief networks

[Diagram: a sparse RBM with visible layer v, hidden layer h, and parameters W, b, c]

(Details: Lee et al., NIPS 2007. Poster yesterday.)
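For intuition, below is a simplified NumPy sketch of one training epoch of such a sparse RBM: standard contrastive-divergence (CD-1) updates for W, b, c, plus a term that nudges the mean hidden activation toward a small target. The exact sparsity penalty and update schedule in Lee et al. (NIPS 2007) differ; all names and hyperparameters here are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparse_rbm_epoch(V, W, b, c, lr=0.01, target=0.02, sparsity_cost=1.0, rng=None):
    """One CD-1 epoch for a binary RBM with a sparsity nudge on the hidden units.
    V: (n_samples, n_visible) data; W: (n_visible, n_hidden); b: visible bias; c: hidden bias."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Positive phase: hidden probabilities given the data.
    h_prob = sigmoid(V @ W + c)
    h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
    # Negative phase: one Gibbs step (mean-field reconstruction of v, then h).
    v_recon = sigmoid(h_sample @ W.T + b)
    h_recon = sigmoid(v_recon @ W + c)
    n = V.shape[0]
    # Contrastive-divergence gradients.
    dW = (V.T @ h_prob - v_recon.T @ h_recon) / n
    db = (V - v_recon).mean(axis=0)
    dc = (h_prob - h_recon).mean(axis=0)
    # Sparsity term: push each hidden unit's mean activation toward the target.
    dc_sparse = sparsity_cost * (target - h_prob.mean(axis=0))
    W += lr * dW
    b += lr * db
    c += lr * (dc + dc_sparse)
    return W, b, c
```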



Sparse deep belief networks

  • Image classification (Caltech101 dataset): 3.2% error reduction.

(Details: Lee et al., NIPS 2007. Poster yesterday.)


Summary

  • Self-taught learning: unlabeled data does not share the labels of the classification task.

  • Use unlabeled data to discover features.

  • Use sparse coding to construct an easy-to-classify, “higher-level” representation.

[Figure: unlabeled images, and an input expressed as a weighted sum of basis patches]



THE END



Related Work

  • Weston et al., ICML 2006

    • Make stronger assumptions on the unlabeled data.

  • Ando & Zhang, JMLR 2005

    • For natural language tasks and character recognition, use heuristics to construct a transfer learning task using unlabeled data.


