A Bayesian Hierarchical Model for Learning Natural Scene Categories. L. Fei-Fei and P. Perona. CVPR 2005.
Discovering objects and their location in images. J. Sivic, B. Russell, A. Efros, A. Zisserman and W. Freeman. ICCV 2005.

Tomasz Malisiewicz

tomasz@cmu.edu

Advanced Machine Perception

February 2006

Graphical Models: Recent Trend in Machine Learning

Describing Visual Scenes using Transformed Dirichlet Processes. E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. NIPS, Dec. 2005.

Outline
  • Goals of both vision papers
  • Techniques from statistical text modeling
    - pLSA vs. LDA
  • Scene Classification via LDA
  • Object Discovery via pLSA
Goal: Learn and Recognize Natural Scene Categories

Classify a scene without first extracting objects

Other techniques we know of:
- Global frequency (Oliva and Torralba)
- Texton histogram (Renninger, Malik et al.)

Goal: Discover Object Categories
  • Discover what objects are present in a collection of images in an unsupervised way
  • Find those same objects in novel images
  • Determine what local image features correspond to what objects; segmenting the image
Enter the world of Statistical Text Modeling
  • D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
  • Bag-of-words approaches: the order of words in a document can be neglected
  • Graphical Model Fun
Bag-of-words
  • A document is a collection of M words
  • A corpus (collection of documents) is summarized in a term-document matrix
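To make the term-document matrix concrete, here is a minimal sketch (not from the slides) that builds the N×M count matrix for a toy corpus; the corpus and variable names are illustrative.

```python
# Minimal bag-of-words sketch: build the NxM term-document count matrix
# N_ij = #(d_i, w_j) for a tiny illustrative corpus.
import numpy as np

docs = [
    "the beach was sunny and the water was warm",
    "we climbed the mountain and watched the sunset",
    "the hard drive in my computer failed",
]

# Vocabulary = all distinct words in the corpus, with a fixed index per word.
vocab = sorted({w for d in docs for w in d.split()})
word_index = {w: j for j, w in enumerate(vocab)}

# Rows = documents, columns = words.
counts = np.zeros((len(docs), len(vocab)), dtype=int)
for i, d in enumerate(docs):
    for w in d.split():
        counts[i, word_index[w]] += 1

print(counts.shape)  # (3, number of distinct words)
```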
[Figure: an object image represented as a bag of 'words' (local patches)]

1990: Latent Semantic Analysis (LSA)
  • Goal: map high-dimensional count vectors to a lower dimensional representation to reveal semantic relations between words
  • The lower dimensional space is called the latent semantic space
  • Dim(latent space) = K
1990: Latent Semantic Analysis (LSA)

[Figure: SVD of the term-document matrix: the N×M documents-by-words matrix factors into an N×K documents-by-topics matrix, a K×K topics-by-topics diagonal matrix, and a K×M topics-by-words matrix]
  • D = {d1, …, dN}: N documents
  • W = {w1, …, wM}: M words
  • Nij = #(di, wj): the N×M term-document co-occurrence matrix
What did we just do?

Singular Value Decomposition

LSA summary
  • SVD on term-document matrix
  • Approximate N by setting all but the largest K singular values (in the diagonal matrix of the SVD) to zero
  • Produces rank-K optimal approximation to N in the L2-matrix or Frobenius norm sense
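As an illustration of the summary above, here is a hedged sketch of LSA via truncated SVD, assuming `counts` is the term-document matrix built in the bag-of-words sketch earlier.

```python
# LSA sketch: keep the K largest singular values and zero out the rest,
# giving the rank-K approximation that is optimal in the Frobenius norm.
import numpy as np

def lsa(counts, K):
    U, s, Vt = np.linalg.svd(counts.astype(float), full_matrices=False)
    s_trunc = np.zeros_like(s)
    s_trunc[:K] = s[:K]                    # threshold all but the top K
    approx = U @ np.diag(s_trunc) @ Vt     # rank-K approximation of the counts
    doc_coords = U[:, :K] * s[:K]          # documents in the latent semantic space
    word_coords = Vt[:K, :].T * s[:K]      # words in the latent semantic space
    return approx, doc_coords, word_coords
```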
LSA and Polysemy
  • Polysemy: the ambiguity of an individual word or phrase that can be used (in different contexts) to express two or more different meanings
  • Under the LSA model, the coordinates of a word in latent space can be written as a linear superposition of the coordinates of the documents that contain the word
  • According to this superposition principle, LSA is unable to capture multiple senses of a word
Problems with LSA
  • LSA does not define a properly normalized probability distribution
  • No obvious interpretation of the directions in the latent space
  • The L2 norm used in LSA corresponds to an implicit Gaussian noise assumption, which is hard to justify for count data
  • Polysemy problem
pLSA to the rescue
  • Probabilistic Latent Semantic Analysis
  • pLSA relies on the likelihood function of multinomial sampling and aims at an explicit maximization of the predictive power of the model
pLSA to the rescue: Decomposition into Probabilities!

[Figure: the observed word distributions P(w|d) decompose into word distributions per topic, P(w|z), mixed by topic distributions per document, P(z|d): P(wj|di) = Σk P(wj|zk) P(zk|di)]

Slide credit: Josef Sivic

Learning the pLSA parameters
  • Observed: counts of word i in document j
  • Unlike LSA, pLSA does not minimize any type of 'squared deviation'; the parameters are estimated in a probabilistically sound way
  • Maximize the likelihood of the data using EM; equivalently, minimize the KL divergence between the empirical distribution and the model

Slide credit: Josef Sivic

EM for pLSA (training on a corpus)
  • E-step: compute posterior probabilities for the latent variables
  • M-step: maximize the expected complete data log-likelihood
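Here is a hedged NumPy sketch of those EM updates (an illustrative implementation, not the authors' code), again assuming `counts` is the N×M term-document matrix.

```python
# pLSA via EM: parameters are P(z|d) (topic mix per document) and
# P(w|z) (word distribution per topic).
import numpy as np

def plsa_em(counts, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    N, M = counts.shape
    p_z_d = rng.dirichlet(np.ones(K), size=N)   # N x K
    p_w_z = rng.dirichlet(np.ones(M), size=K)   # K x M
    for _ in range(iters):
        # E-step: posterior P(z | d, w) for every (d, w) pair -> N x M x K
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post = joint / joint.sum(axis=2, keepdims=True)
        # M-step: reweight the posteriors by the observed counts n(d, w)
        weighted = counts[:, :, None] * post
        p_w_z = weighted.sum(axis=0).T              # K x M
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=1)                # N x K
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z
```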
Graphical View of pLSA
[Graphical model: d → z → w, with z and w inside a plate over the N words of each document and an outer plate over the D documents]
  • pLSA is a generative model
  • Select a document di with prob P(di)
  • Pick latent class zk with prob P(zk|di)
  • Generate word wj with prob P(wj|zk)

d and w are observed variables, z is a latent variable, and the boxes are plates

How does pLSA deal with previously unseen documents?
  • “Folding-in” Heuristic
  • First train on the corpus to obtain the topic-specific word distributions P(w|z)
  • Now re-run the same training EM algorithm, but keep P(w|z) fixed (don't re-estimate it) and let D = {d_unseen} (see the sketch below)
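A minimal folding-in sketch under the same assumptions as the EM code above: the learned P(w|z) stays fixed and only P(z|d) is re-estimated for the unseen documents.

```python
# Folding-in: run E/M steps for new documents while keeping P(w|z) frozen.
import numpy as np

def fold_in(new_counts, p_w_z, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    K = p_w_z.shape[0]
    p_z_d = rng.dirichlet(np.ones(K), size=new_counts.shape[0])
    for _ in range(iters):
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post = joint / joint.sum(axis=2, keepdims=True)      # P(z | d_unseen, w)
        p_z_d = (new_counts[:, :, None] * post).sum(axis=1)  # only P(z|d) is updated
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d
```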
Problems with pLSA
  • Not a well-defined generative model of documents; d is a dummy index into the list of documents in the training set (as many values as documents)
  • No natural way to assign probability to a previously unseen document
  • Number of parameters to be estimated grows with size of training set
LDA to the rescue
[Figure: the pLSA and LDA graphical models side by side]
  • Latent Dirichlet Allocation treats the topic mixture weights as a k-parameter hidden random variable and places a Dirichlet prior on the multinomial mixing weights
  • Dirichlet distribution is conjugate to the multinomial distribution (most natural prior to choose: the posterior distribution is also a Dirichlet!)
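To make the generative story concrete, here is a small sketch of ancestral sampling from LDA (illustrative parameter values, not inference and not the paper's code).

```python
# LDA's generative process: theta ~ Dir(alpha) per document,
# then z ~ Mult(theta) and w ~ Mult(beta[z]) for each word.
import numpy as np

def generate_corpus(alpha, beta, n_docs, doc_len, seed=0):
    rng = np.random.default_rng(seed)
    K, V = beta.shape                      # beta[k] = word distribution of topic k
    corpus = []
    for _ in range(n_docs):
        theta = rng.dirichlet(alpha)       # topic mixture for this document
        words = []
        for _ in range(doc_len):
            z = rng.choice(K, p=theta)              # pick a topic
            words.append(rng.choice(V, p=beta[z]))  # pick a word from that topic
        corpus.append(words)
    return corpus

# e.g. 3 topics over a 10-word vocabulary
beta = np.random.default_rng(1).dirichlet(np.ones(10), size=3)
docs = generate_corpus(alpha=0.5 * np.ones(3), beta=beta, n_docs=5, doc_len=20)
```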
Corpus-Level parameters in LDA
  • Alpha and beta are corpus-level parameters, sampled once in the process of generating the corpus (outside of the plates!)
  • Alpha and beta must be estimated before we can find the topic mixing proportions belonging to a previously unseen document

[Figure: the LDA graphical model, with alpha and beta outside the plates]

Getting rid of plates

[Figure: the LDA model drawn with the plates unrolled: the K topics at the top, and for each document its per-word topic assignments z1 … zN and observed words w1 … wN, all tied to the corpus-level parameter β]

Thanks to Jonathan Huang for the un-plated LDA graphic

Inference in LDA
  • Inference = estimation of document-level parameters
  • Intractable to compute exactly, so we must employ approximate inference
Approximate Inference in LDA
  • Variational Methods: Use Jensen’s inequality to obtain a lower bound on the log likelihood that is indexed by a set of variational parameters
  • Optimal Variational Parameters (document-specific) are obtained by minimizing the KL divergence between the variational distribution and the true posterior

Variational methods are one way of doing this; Gibbs sampling (MCMC) is another.

[Figure: the factorized variational distribution used to approximate the true posterior]
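Since Gibbs sampling is mentioned as the alternative, here is a hedged sketch of collapsed Gibbs sampling for LDA (a different approximation than the variational scheme above); `docs` is assumed to be a list of lists of word indices into a vocabulary of size V.

```python
# Collapsed Gibbs sampling for LDA: resample each word's topic from its
# conditional given all other assignments, keeping count tables up to date.
import numpy as np

def lda_gibbs(docs, V, K, alpha=0.5, eta=0.01, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))     # topic counts per document
    n_kw = np.zeros((K, V))             # word counts per topic
    n_k = np.zeros(K)                   # total words per topic
    z = [rng.integers(K, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):      # initialize the count tables
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]             # remove the current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                p = (n_dk[d] + alpha) * (n_kw[:, w] + eta) / (n_k + V * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k             # resample and restore the counts
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw
```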

Look at some P(w|z) produced by LDA
  • Show some pLSI and LDA results applied to text
  • An LDA project by Tomasz Malisiewicz and Jonathan Huang
  • Search for the word ‘drive’
pLSA and LDA applied to Images
  • How can one apply these techniques to images?
Hierarchical Bayesian text models

[Graphical models: pLSA with nodes d, z, w and LDA with nodes c, z, w, each with plates over the N words and D documents]

Probabilistic Latent Semantic Analysis (pLSA): Hofmann, 2001
Latent Dirichlet Allocation (LDA): Blei et al., 2001

Hierarchical Bayesian text models

[Figure: the pLSA graphical model (d, z, w) with a discovered "face" topic]

Probabilistic Latent Semantic Analysis (pLSA): Sivic et al., ICCV 2005

Hierarchical Bayesian text models

[Figure: the LDA graphical model (c, z, w) with a "beach" scene category]

Latent Dirichlet Allocation (LDA): Fei-Fei et al., CVPR 2005

How to Generate an Image?
  • Choose a scene (mountain, beach, …)
  • Given the scene, generate an intermediate probability vector over 'themes'
  • For each word: determine the current theme from the mixture of themes, then draw a codeword from that theme
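A small sketch of this generative story with hypothetical parameter names (scene-specific Dirichlet priors over themes, and per-theme codeword distributions); it mirrors the slide, not the paper's exact parameterization.

```python
# Scene -> theme mixture -> codewords, sampled ancestrally.
import numpy as np

def generate_image(scene, theme_priors, theme_codeword_dists, n_patches, seed=0):
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(theme_priors[scene])       # intermediate theme vector
    n_codewords = theme_codeword_dists.shape[1]
    codewords = []
    for _ in range(n_patches):
        t = rng.choice(len(pi), p=pi)             # current theme
        codewords.append(rng.choice(n_codewords, p=theme_codeword_dists[t]))
    return codewords
```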

Inference
  • How do we make a decision on a novel image?
  • Integrate over the latent variables to get the likelihood of the image under each category
  • Approximate variational inference is used (not easy, but Gibbs sampling is supposed to be easier)
Codebook
  • 174 local image patch codewords
  • Detection: evenly sampled grid, random sampling, saliency detector, or Lowe's DoG detector
  • Representation: normalized 11x11 gray values or 128-dim SIFT

Results: Average performance 64%
  • Confusion matrix (100 training examples and 50 test examples)
  • Rank statistic test: the probability of a test scene correctly belonging to one of the top N most probable categories

Results: The Distributions

[Figure: theme distribution and codeword distribution for example images]
Summary of detection and representation choices
  • SIFT outperforms pixel gray values
  • Sliding grid, which creates the largest number of patches, does best
Visual Words
  • Vector Quantized SIFT descriptors computed in regions
  • Regions come from elliptical shape adaptation around interest points, and from the maximally stable regions of Matas et al.
  • Both are elliptical regions at twice their detected scale
Building a Vocabulary
  • K-means clustering of 300K regions to get about 1K clusters for each of the Shape Adapted and Maximally Stable region types
  • Vector quantization: each descriptor is assigned to its nearest cluster center ("visual word")

Slide credit: Josef Sivic
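A hedged sketch of the vocabulary-building and quantization steps, using scikit-learn's KMeans for illustration (an assumption; the authors' exact clustering setup may differ).

```python
# Cluster pooled SIFT descriptors into visual words, then quantize new ones.
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, n_words=1000, seed=0):
    # descriptors: array of shape (n_regions, 128), pooled from training images
    km = KMeans(n_clusters=n_words, random_state=seed, n_init=10)
    km.fit(descriptors)
    return km

def quantize(km, descriptors):
    # map each descriptor to the index of its nearest cluster center (visual word)
    return km.predict(descriptors)
```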

pLSA Training
  • Sanity Check: Remember what quantities must be estimated?
Results #1: Topic Discovery
  • This is just the training stage
  • Obtain P(zk|dj) for each image, then classify image as containing object k according to the max of P(zk|dj) over k

4 object categories plus background

Results #2: Classifying New Images
  • Object categories are learned on a corpus, then those categories are found in new images

Anybody remember how this is done? Remember the index d in the graphical model.

How does pLSA deal with previously unseen documents?
  • “Folding-in” Heuristic
  • First train on the corpus to obtain the topic-specific word distributions P(w|z)
  • Now re-run the same training EM algorithm, but keep P(w|z) fixed (don't re-estimate it) and let D = {d_unseen}
Results #2: Classifying New Images
  • Train on one set and test on another
Results #3: Segmentation
  • Localization and Segmentation of Object
  • For a word occurrence in a particular document we can examine the probability of different topics
  • Find words with P(zk|dj, wi) > 0.8 (see the sketch below)
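A minimal sketch of that thresholding step, assuming the pLSA quantities P(z|d) and P(w|z) from the EM sketch earlier.

```python
# Words assigned to a topic in one image: P(z|d,w) is proportional to P(w|z) P(z|d).
import numpy as np

def segment_words(p_z_d, p_w_z, doc_index, topic, thresh=0.8):
    joint = p_z_d[doc_index][:, None] * p_w_z        # K x M
    post = joint / joint.sum(axis=0, keepdims=True)  # P(z | d_j, w_i) for each word
    return np.where(post[topic] > thresh)[0]         # indices of words above threshold
```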
Results #3: Segmentation

Note: the words shown are not the most probable words for a topic; instead they are words that have both a high probability of occurring in the topic AND a high probability of occurring in the image.

Results #3: Segmentation and Doublets
  • Two class image dataset consisting of half the faces (218 images) and backgrounds (217 images)
  • A 4 topic pLSA model is learned for all training faces and training backgrounds with 3 fixed background topics, i.e. one (face) topic is learned in addition to the three fixed background topics
  • A doublet vocabulary is then formed from the top 100 visual words of the face topic. A second 4 topic pLSA model is then learned for the combined vocabulary of singlets and doublets with the background topics fixed.
Doublets

[Figure: face segmentations obtained with singlets vs. doublets]

Face segmentation scores: singletons 0.49, doublets 0.61

Efros: didn't work as much as you'd think

Conclusions
  • Showed how both papers use bag-of-words approaches
  • We’re now ready to become experts on generative models like pLSA and LDA
  • Graphical Model Fun! (Carlos Guestrin teaches Graphical Models)
Are you really into Graphical Models?
  • Describing Visual Scenes using Transformed Dirichlet Processes. E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. NIPS, Dec. 2005.
References
  • A Bayesian Hierarchical Model for Learning Natural Scene Categories. L. Fei-Fei and P. Perona. CVPR 2005.
  • Describing Visual Scenes using Transformed Dirichlet Processes. E. Sudderth, A. Torralba, W. Freeman, and A. Willsky. NIPS 2005.
  • Discovering objects and their location in images. J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman. ICCV 2005.
  • Latent Dirichlet Allocation. D. Blei, A. Ng, and M. Jordan. JMLR 3:993–1022, 2003.
  • Unsupervised Learning by Probabilistic Latent Semantic Analysis. T. Hofmann. Machine Learning, 2001.