Self-taught Learning: Transfer Learning from Unlabeled Data

Rajat Raina

Honglak Lee, Roger Grosse

Alexis Battle, Chaitanya Ekanadham, Helen Kwong, Benjamin Packer, Narut Sereewattanawoot

Andrew Y. Ng

Stanford University

The “one learning algorithm” hypothesis
  • There is some evidence that the human brain uses essentially the same algorithm to understand many different input modalities.
    • Example: Ferret experiments, in which the “input” for vision was plugged into the auditory part of the brain, and the auditory cortex learns to “see.” [Roe et al., 1992]

If we could find this one learning algorithm, we would be done. (Finally!)

(Roe et al., 1992. Hawkins & Blakeslee, 2004)

Finding a deep learning algorithm
  • If the brain really is one learning algorithm, it would suffice to just:
    • Find a learning algorithm for a single layer, and,
    • Show that it can build a small number of layers.
  • We evaluate our algorithms:
    • Against biology: e.g., sparse RBMs for V2 (poster yesterday, Lee et al.).
    • On applications: this talk.

Supervised learning

[Figure: train and test on labeled images of Cars vs. Motorcycles.]

Supervised learning algorithms may not work well with limited labeled data.

Learning in humans
  • Your brain has 10^14 synapses (connections).
  • You will live for 10^9 seconds.
  • If each synapse requires 1 bit to parameterize, you need to “learn” 10^14 bits in 10^9 seconds.
  • Or, 10^5 bits per second.

Human learning is largely unsupervised, and uses readily available unlabeled data.

(Geoffrey Hinton, personal communication)


“Brain-like” Learning

[Figure: train on labeled Cars vs. Motorcycles images plus unlabeled images randomly downloaded from the Internet; test on Cars vs. Motorcycles.]

“Brain-like” Learning
  • Labeled Digits + Unlabeled English characters → ?
  • Labeled Webpages + Unlabeled newspaper articles → ?
  • Labeled Russian Speech + Unlabeled English speech → ?

“Self-taught Learning”
  • Labeled Digits + Unlabeled English characters → ?
  • Labeled Webpages + Unlabeled newspaper articles → ?
  • Labeled Russian Speech + Unlabeled English speech → ?

Recent history of machine learning

[Figure: example data for each setting: labeled Cars and Motorcycles; labeled Bus, Tractor, Aircraft, Helicopter; unlabeled natural scenes.]

  • 20 years ago: Supervised learning.
  • 10 years ago: Semi-supervised learning.
  • 10 years ago: Transfer learning.
  • Next: Self-taught learning?
Self-taught Learning
  • Labeled examples: {(x_l^(1), y^(1)), ..., (x_l^(m), y^(m))}.
  • Unlabeled examples: {x_u^(1), ..., x_u^(k)}.
  • The unlabeled and labeled data:
    • Need not share labels y.
    • Need not share a generative distribution.
  • Advantage: Such unlabeled data is often easy to obtain.

A self-taught learning algorithm

Overview: Represent each labeled or unlabeled input as a sparse linear combination of “basis vectors”.

x = 0.8 * b_87 + 0.3 * b_376 + 0.5 * b_411
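For concreteness, here is a toy numpy sketch of such a sparse representation; the dimensions, basis matrix, and coefficients are made-up illustrations, not the learnt bases from the talk.

```python
import numpy as np

# Made-up basis set: 512 basis vectors for inputs with 196 dimensions (14x14 patches).
rng = np.random.default_rng(0)
B = rng.standard_normal((512, 196))

# A sparse code: only three of the 512 activations are non-zero.
a = np.zeros(512)
a[87], a[376], a[411] = 0.8, 0.3, 0.5

# The input is (approximately) a sparse linear combination of the bases.
x = a @ B
assert np.allclose(x, 0.8 * B[87] + 0.3 * B[376] + 0.5 * B[411])
```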

A self-taught learning algorithm

Key steps:

  • Learn good bases b using the unlabeled data x_u.
  • Use these learnt bases to construct “higher-level” features for the labeled data.
  • Apply a standard supervised learning algorithm on these features.

x = 0.8 * b_87 + 0.3 * b_376 + 0.5 * b_411

Learning the bases: Sparse coding

Given only unlabeled data, we find good bases b using sparse coding:

    minimize over b, a:   Σ_i || x_u^(i) − Σ_j a_j^(i) b_j ||^2   +   β Σ_i || a^(i) ||_1
                          (reconstruction error)                     (sparsity penalty)

(Efficient algorithms: Lee et al., NIPS 2006)

[Details: An extra normalization constraint on the bases b_j is required.]
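A minimal sketch of this basis-learning step, using scikit-learn's DictionaryLearning as a stand-in for the efficient algorithm of Lee et al. (NIPS 2006); the placeholder data, number of bases, and sparsity weight below are illustrative assumptions, not the talk's settings.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# X_unlabeled: one row per unlabeled input, e.g., 14x14 image patches flattened to 196 dims.
# Random placeholders keep the sketch runnable; in practice these come from unlabeled data.
rng = np.random.default_rng(0)
X_unlabeled = rng.standard_normal((2000, 196))

# Approximately solves  min_{b,a}  sum_i ||x_i - sum_j a_ij b_j||^2 + beta * ||a_i||_1,
# with the basis vectors kept normalized (the constraint mentioned above).
dico = DictionaryLearning(
    n_components=512,                 # number of bases b_j (overcomplete: 512 > 196)
    alpha=1.0,                        # sparsity penalty weight (beta)
    transform_algorithm="lasso_lars",
    max_iter=20,
    random_state=0,
)
dico.fit(X_unlabeled)
bases = dico.components_              # shape (512, 196): the learnt bases
```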

Example bases
  • Natural images: learnt bases look like “edges”.
  • Handwritten characters: learnt bases look like “strokes”.

Constructing features
  • Using the learnt bases b, compute features for the examples x_l from the classification task by solving:

        features(x_l) = argmin_a  || x_l − Σ_j a_j b_j ||^2  +  β || a ||_1
                                  (reconstruction error)        (sparsity penalty)

  • Finally, learn a classifier using a standard supervised learning algorithm (e.g., SVM) over these features.

x_l = 0.8 * b_87 + 0.3 * b_376 + 0.5 * b_411
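A matching sketch of the feature-construction and classification steps; sparse_encode and LinearSVC are stand-ins chosen for illustration, and the placeholder data and parameters are assumptions rather than the talk's settings.

```python
import numpy as np
from sklearn.decomposition import sparse_encode
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)

# In the full pipeline these are the bases learnt from unlabeled data (dico.components_
# in the previous sketch); random normalized placeholders keep this block self-contained.
bases = rng.standard_normal((512, 196))
bases /= np.linalg.norm(bases, axis=1, keepdims=True)

# Small labeled training set (placeholders for the real x_l, y).
X_train = rng.standard_normal((100, 196))
y_train = rng.integers(0, 2, size=100)

# Features a(x_l) = argmin_a ||x_l - sum_j a_j b_j||^2 + beta * ||a||_1.
features = sparse_encode(X_train, bases, algorithm="lasso_lars", alpha=1.0)

# Any standard supervised learner can be trained on the sparse codes (a linear SVM here).
clf = LinearSVC().fit(features, y_train)
```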

Image classification

[Figure: feature visualization on a large image of a platypus from the Caltech101 dataset.]

Image classification

Caltech101 dataset, 15 labeled images per class: 36.0% error reduction.

Other reported results:
  • Fei-Fei et al., 2004: 16%
  • Berg et al., 2005: 17%
  • Holub et al., 2005: 40%
  • Serre et al., 2005: 35%
  • Berg et al., 2005: 48%
  • Zhang et al., 2006: 59%
  • Lazebnik et al., 2006: 56%

Character recognition

Domains: Digits, Handwritten English characters, English font characters.

  • Handwritten English classification (20 labeled images per handwritten character), bases learnt on digits: 8.2% error reduction.
  • English font classification (20 labeled images per font character), bases learnt on handwritten English: 2.8% error reduction.

Text classification

Domains: Reuters newswire, UseNet articles, Webpages.

  • Webpage classification (2 labeled documents per class), bases learnt on Reuters newswire: 4.0% error reduction.
  • UseNet classification (2 labeled documents per class), bases learnt on Reuters newswire: 6.5% error reduction.

Shift-invariant sparse coding

[Figure: a signal reconstructed by convolving basis functions with sparse feature activations.]

(Algorithms: Grosse et al., UAI 2007)
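A rough numpy sketch of the shift-invariant reconstruction idea: the signal is modeled as a sum of basis functions convolved with sparse activation signals, so a basis can explain the data at any temporal offset. All sizes and signals below are invented for illustration; see Grosse et al. (UAI 2007) for the actual model and solver.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, L = 1000, 8, 64                 # signal length, number of bases, basis length

bases = rng.standard_normal((K, L))   # basis functions b_j (e.g., short audio snippets)
acts = np.zeros((K, T))               # sparse activations s_j(t): mostly zero
acts[rng.integers(0, K, 20), rng.integers(0, T, 20)] = rng.standard_normal(20)

# Reconstruction: x_hat(t) = sum_j (s_j * b_j)(t), a sum of convolutions,
# so each basis contributes wherever its activation signal is non-zero.
x_hat = sum(np.convolve(acts[j], bases[j], mode="full")[:T] for j in range(K))
```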

Audio classification

  • Speaker identification (5 labels, TIMIT corpus, 1 sentence per speaker), bases learnt on different dialects: 8.7% error reduction.
  • Musical genre classification (5 labels, 18 seconds per genre), bases learnt on different genres and songs: 5.7% error reduction.

(Details: Grosse et al., UAI 2007)

Sparse deep belief networks

[Figure: a sparse RBM with visible layer v, hidden layer h, and parameters W, b, c.]

(Details: Lee et al., NIPS 2007. Poster yesterday.)
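As a rough illustration of the ingredients named above, here is a numpy sketch of one contrastive-divergence update for a binary-unit RBM with a sparsity penalty on the hidden activations, in the spirit of (but not identical to) the sparse RBM of Lee et al. (NIPS 2007); the learning rate, sparsity target, and penalty weight are placeholder values, not the paper's.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparse_rbm_cd1_step(v0, W, b, c, lr=0.01, target_sparsity=0.02, sparsity_cost=0.1):
    """One CD-1 update on a batch v0 (n_samples x n_visible), with a penalty
    pushing the hidden units' mean activation toward a small target."""
    rng = np.random.default_rng()
    # Up pass: hidden probabilities and samples given the data.
    h0_prob = sigmoid(v0 @ W + c)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Down-up pass: one step of Gibbs sampling (the "reconstruction").
    v1_prob = sigmoid(h0 @ W.T + b)
    h1_prob = sigmoid(v1_prob @ W + c)
    # Contrastive-divergence gradients.
    dW = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(v0)
    db = (v0 - v1_prob).mean(axis=0)
    dc = (h0_prob - h1_prob).mean(axis=0)
    # Sparsity penalty: nudge hidden biases so mean activation approaches the target.
    dc -= sparsity_cost * (h0_prob.mean(axis=0) - target_sparsity)
    return W + lr * dW, b + lr * db, c + lr * dc

# Example usage with made-up sizes: 196 visible units, 200 hidden units.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((196, 200))
b, c = np.zeros(196), np.zeros(200)
v0 = (rng.random((64, 196)) < 0.5).astype(float)   # a batch of binary inputs
W, b, c = sparse_rbm_cd1_step(v0, W, b, c)
```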

Sparse deep belief networks

Image classification (Caltech101 dataset): 3.2% error reduction.

(Details: Lee et al., NIPS 2007. Poster yesterday.)

Summary
  • Self-taught learning: Unlabeled data does not share the labels of the classification task.
  • Use unlabeled data to discover features.
  • Use sparse coding to construct an easy-to-classify, “higher-level” representation.

Related Work
  • Weston et al., ICML 2006
    • Make stronger assumptions on the unlabeled data.
  • Ando & Zhang, JMLR 2005
    • For natural language tasks and character recognition, use heuristics to construct a transfer learning task using unlabeled data.
