Class 3: Feature Coding and Pooling
This presentation is the property of its rightful owner.
Sponsored Links
1 / 58

Class 3: Feature Coding and Pooling PowerPoint PPT Presentation


  • 64 Views
  • Uploaded on
  • Presentation posted in: General

Class 3: Feature Coding and Pooling. Liangliang Cao, Feb 7, 2013. EECS 6890 - Topics in Information Processing. Spring 2013, Columbia University. http://rogerioferis.com/VisualRecognitionAndSearch. Problem. The Myth of Human Vision. Can you understand the following?. 婴儿 बचचा. 아가 bebé.

Download Presentation

Class 3: Feature Coding and Pooling

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Class 3 feature coding and pooling

Class 3: Feature Coding and Pooling

Liangliang Cao, Feb 7, 2013

EECS 6890 - Topics in Information Processing

Spring 2013, Columbia University

http://rogerioferis.com/VisualRecognitionAndSearch


Class 3 feature coding and pooling

Problem

The Myth of Human Vision

Can you understand the following?

婴儿बचचा

아가bebé

People may have difficulties

to understand different

texts, but NOT images.

Photo courtesy to luster

Visual Recognition And Search

Columbia University, Spring 2013

2


Class 3 feature coding and pooling

Problem

Can the computer vision system

recognize objects or scenes like

human?

Visual Recognition And Search

Columbia University, Spring 2013

3


Class 3 feature coding and pooling

Why This Problem Important?

Mobile Apps

Traditional companies

Searching engines

Visual Recognition And Search

Columbia University, Spring 2013

4


Class 3 feature coding and pooling

Problem

Examples of Object Recognition Dataset

http://www.vision.caltech.edu

Visual Recognition And Search

Columbia University, Spring 2013

5


Class 3 feature coding and pooling

Problem

http://groups.csail.mit.edu/vision/SUN/

Visual Recognition And Search

Columbia University, Spring 2013

6


Class 3 feature coding and pooling

Overview of Classification Model

Uncertainty-Based

Fisher vector/Supervector

Histogram

Sparse

Coding

of SIFT

Quantization

Coding

GMMprobability

Vector

Soft

Soft

quantization quantizationestimation

quantization

Pooling

HistogramGMM

Histogramaggregation

Max poolingadptation

aggregation

Coding: to map local features into a compact representation

Pooling: to aggregate these compact representation together

Visual Recognition And Search

Columbia University, Spring 2013

7


Class 3 feature coding and pooling

Outline

Outlines

• Histogram of local features

• Bag of words model

• Soft quantization and sparse coding

• Fisher vector and supervector

Visual Recognition And Search

Columbia University, Spring 2013

8


Class 3 feature coding and pooling

Histogram of Local Features

Visual Recognition And Search

Columbia University, Spring 2013

9


Class 3 feature coding and pooling

Bag of Words Models

Recall of Last Class

• Powerful local features

- DoG

- Hessian, Harris

- Dense-sampling

Non-fixed number of

local regions per image!

Visual Recognition And Search

Columbia University, Spring 2013

10


Class 3 feature coding and pooling

Bag of Words Models

Recall of Last Class (2)

• Histograms can provide a fixed size representation ofimages

• Spatial pyramid/gridding can enhance histogrampresentation with spatial information

Visual Recognition And Search

Columbia University, Spring 2013

11


Class 3 feature coding and pooling

Bag of Words Models

Histogram of Local Features

…..

dim = # codewords

codewords

Visual Recognition And Search

Columbia University, Spring 2013

12


Class 3 feature coding and pooling

Bag of Words Models

Histogram of Local Features (2)

……

dim = #codewords x #grids

Visual Recognition And Search

Columbia University, Spring 2013

13


Class 3 feature coding and pooling

Bag of Words Models

Local Feature Quantization

Slide courtesy to Fei-Fei Li

Visual Recognition And Search

Columbia University, Spring 2013

14


Class 3 feature coding and pooling

Bag of Words Models

Local Feature Quantization

Visual Recognition And Search

Columbia University, Spring 2013

15


Class 3 feature coding and pooling

Bag of Words Models

Local Feature Quantization

- Vector quantization

- Dictionary learning

Visual Recognition And Search

Columbia University, Spring 2013

16


Class 3 feature coding and pooling

Histogram of Local Features

Dictionary for Codewords

Visual Recognition And Search

Columbia University, Spring 2013

17


Class 3 feature coding and pooling

Bag of Words Models

Most slides in this section are courtesy to Fei-Fei Li

Visual Recognition And Search

Columbia University, Spring 2013

18


Class 3 feature coding and pooling

Object

Bag of ‘words’

Visual Recognition And Search

Columbia University, Spring 2013

19


Class 3 feature coding and pooling

Bag of Words Models

Underlining Assumptions - Text

Of all the sensory impressions proceeding tothe brain, the visual experiences are the

China is forecasting a trade surplus of $90bn(£51bn) to $100bn this year, a threefold

dominant ones. Our perception of the worldaround us is based essentially on the

increase on 2004's $32bn. The Commerce

Ministry said the surplus would be created bya pred

messa

For a l

compa

imagesensory, brain,

$660bChina, trade,

center

annoy

visual, perception,

surplus, commerce,

movie

China'

retinal, cerebral cortex,

exports, imports, US,

image

delibe

eye, cell, opticalnerve, image

yuan, bank, domestic,foreign, increase,

discov

agree

know t

yuan i

percep

gover

Hubel, Wiesel

trade, value

more c

also n

followi

dema

to the

countr

Hubel

yuan a

demon

permit

image

the US

wise a

freely.

stored

it will t

has its

allowi

a spec

image.

Visual Recognition And Search

Columbia University, Spring 2013

20


Class 3 feature coding and pooling

Bag of Words Models

Underlining Assumptions - Image

Visual Recognition And Search

Columbia University, Spring 2013

21


Class 3 feature coding and pooling

learning

recognition

K-means

feature detection& representation

image representation

category models

category

(and/or) classifiers

decision


Class 3 feature coding and pooling

Bag of Words Models

Borrowing Techniques from Text Classification

• PLSA

• Naïve Bayesian Model

• wn: each patch in an image

- wn = [0,0,…1,…,0,0΁T

• w: a collection of all N patches in an image

- w = [w1,w2,…,wN]

• dj: the jth image in an image collection

• c: category of the image

• z: theme or topic of the patch

Visual Recognition And Search

Columbia University, Spring 2013

23


Class 3 feature coding and pooling

Bag of Words Models

Probabilistic Latent Semantic Analysis (pLSA)

z

d

w

N

D

Hoffman, 2001

Latent Dirichlet Allocation (LDA)

z

c

w

N

D

Blei et al., 2001

Visual Recognition And Search

Columbia University, Spring 2013

24


Class 3 feature coding and pooling

Bag of Words Models

Probabilistic Latent Semantic Analysis (pLSA)

z

d

w

N

D

“face”

Visual Recognition And Search

5

ColSivic et al. ICCV 200

25


Class 3 feature coding and pooling

Bag of Words Models

K

dzw

p(w| d

)

p(w

|z

)p(z

| d

)

j

N

i

i

k

k

j

D

k1

Observed codeword

Theme distributions

Codeword distributions

distributions

per image

per theme (topic)

Parameter estimated by EM or Gibbs sampling

Visual Recognition And Search

Columbia University, Spring 2013

26

Slide credit: Josef Sivic


Class 3 feature coding and pooling

Bag of Words Models

Recognition using pLSA

z

arg max p(z|d)

z

Visual Recognition And Search

Columbia UnSlide credit: Josef Siv

27

ic


Class 3 feature coding and pooling

Bag of Words Models

Scene Recognition using LDA

“beach”

Latent Dirichlet Allocation (LDA)

z

c

w

N

D

Fei-Fei et al. ICCV 2005

Visual Recognition And Search

Columbia University, Spring 2013

28


Class 3 feature coding and pooling

Bag of Words Models

Spatial-Coherent Latent Topic Model

Cao and Fei-Fei, ICCV 2007

Visual Recognition And Search

Columbia University, Spring 2013

29


Class 3 feature coding and pooling

Bag of Words Models

Simultaneous Segmentation and Recognition

Visual Recognition And Search

Columbia University, Spring 2013

30


Class 3 feature coding and pooling

Bag of Words Models

Pros and Cons

Images differ from texts!

Bag of Words Models are good in

- Modeling prior knowledge

- Providing intuitive interpretation

But these models suffer from

- Loss of information in quantization of “visual words”

- Loss of spatial information

Better coding

Better pooling

Visual Recognition And Search

Columbia University, Spring 2013

31


Class 3 feature coding and pooling

Soft Quantization and Sparse Coding

Visual Recognition And Search

Columbia University, Spring 2013

32


Class 3 feature coding and pooling

Soft Quantization

Hard Quantization

Visual Recognition And Search

Columbia University, Spring 2013

33


Class 3 feature coding and pooling

Soft Quantization

Uncertainty-Based Quantization

Model the uncertainty across multiple codewords

Gemert et al, Visual Word Ambiguity, PAMI 2009

Visual Recognition And Search

Columbia University, Spring 2013

34


Class 3 feature coding and pooling

Soft Quantization

Intuition of UNC

Hard quantization

Soft quantization based on uncertainty

Gemert et al, Visual Word Ambiguity, PAMI 2009

Visual Recognition And Search

Columbia University, Spring 2013

35


Class 3 feature coding and pooling

Soft Quantization

Improvement of UNC

Visual Recognition And Search

Columbia University, Spring 2013

36


Class 3 feature coding and pooling

Soft Quantization

Sparse Coding-Based Quantization

Hard quantization can be viewed as an “extremely

sparse representation”

s.t.

A more general but hard to solve representation

In practice we consider

Sparse coding

Visual Recognition And Search

Columbia University, Spring 2013

37


Class 3 feature coding and pooling

Soft Quantization

Sparse Coding-Based Quantization

Hard quantization can be viewed as an “extremely

sparse representation”

s.t.

A more general but hard to solve representation

In practice we consider

Sparse coding

Visual Recognition And Search

Columbia University, Spring 2013

38


Class 3 feature coding and pooling

Soft Quantization

Yang et al obtain goodrecognition accuracy bycombining sparse

coding with spatial

pyramid and dictionarytraining.

More details will be in

group presentation.

Yang et al, Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009

Visual Recognition And Search

Columbia University, Spring 2013

39


Class 3 feature coding and pooling

Fisher Vector and Supervector

One of the most powerful image/video classification techniques

Thanks to Zhen Li and Qiang Chenconstructive suggestions to this section

Visual Recognition And Search

Columbia University, Spring 2013

40


Class 3 feature coding and pooling

Fisher Vector and Supervector

Winning Systems

2009

2010

2011

2012

Classification task

2010

2011

Large Scale Visual

Recognition Challenge

Visual Recognition And Search

Columbia University, Spring 2013

41


Class 3 feature coding and pooling

Fisher Vector and Supervector

Literature

[1] Yan et al, Regression from patch-kernel. CVPR 2008[2] Zhou et al, SIFT-Bag kernel for video event analysis.

ACM Multimedia 2008

Supervector

[3] Zhou et al, Hierarchical Gaussianization for imageclassification. ICCV 2009: 1971-1977

[4] Zhou et al, Image classification using super-vector

coding of local image descriptors. ECCV 2010

[5] Perronnin et al, Improving the Fisher kernel for

large-scale image classification, ECCV 2010

Fisher Vector

[6] Jégou et al, Aggregating local image descriptorsinto compact codes PAMI 2011.

These papers are not very easy to read.

Let me take a simplified perspective via coding&pooling framework

Visual Recognition And Search

Columbia University, Spring 2013

42


Class 3 feature coding and pooling

Fisher Vector and Supervector

Coding without Information Loss

• Coding with hard assignment

• Coding with soft assignment

• How to keep all the information?

Visual Recognition And Search

Columbia University, Spring 2013

43


Class 3 feature coding and pooling

Fisher Vector and Supervector

Coding without Information Loss

• Coding with hard assignment

• Coding with soft assignment

• How to keep all the information?

Visual Recognition And Search

Columbia University, Spring 2013

44


Class 3 feature coding and pooling

Fisher Vector and Supervector

An Intuitive Illustration

Coding

Visual Recognition And Search

Columbia University, Spring 2013

45


Class 3 feature coding and pooling

Fisher Vector and Supervector

An Intuitive Illustration

Coding

Component 2

Component 3

Component 1

Visual Recognition And Search

Columbia University, Spring 2013

46


Class 3 feature coding and pooling

Fisher Vector and Supervector

An Intuitive Illustration

Component 1

Component 2

Component 3

+

+

+

+

+

+

Pooling

Visual Recognition And Search

Columbia University, Spring 2013

47


Class 3 feature coding and pooling

Fisher Vector and Supervector

Implementation of Supervector

In speech (speaker identification), supervector refer to

stacked means of adaptive GMMs.

Supervector =

Visual Recognition And Search

Columbia University, Spring 2013

48


Class 3 feature coding and pooling

Fisher Vector and Supervector

Interpretation with Supervector

Origin distribution

New distribution

Picture from Reynolds, Quatieri, and Dunn, DSP, 2001

Visual Recognition And Search

Columbia University, Spring 2013

49


Class 3 feature coding and pooling

Fisher Vector and Supervector

Normalization of Supervector

In practice, a normalization process using the

covariance matrix often improves the performance

Moreover, we can subtract the original mean vector forthe ease of normalization

The representation is also called Hierarchical Gaussianization

(HG).

Visual Recognition And Search

Columbia University, Spring 2013

50


Class 3 feature coding and pooling

Fisher Vector and Supervector

Fisher Vector

Let

be the probability density function with para

[Jaakkola and Haussler , NIPS 98] suggested X can be

described by the derivative subject to

Now we can define the Fisher Kernel

where

is called Fisher information matrix

Visual Recognition And Search

Columbia University, Spring 2013

51


Class 3 feature coding and pooling

Fisher Vector and Supervector

Fisher Vector with GMM

Consider the Gaussian Mixture Model

We consider

With GMM, Fisher vectors can be obtained:

Let

The Fisher vector

Visual Recognition And Search

Columbia University, Spring 2013

52


Class 3 feature coding and pooling

Fisher Vector and Supervector

Comparison

• Supervector

• Fisher vector

Visual Recognition And Search

Columbia University, Spring 2013

53


Class 3 feature coding and pooling

Fisher Vector and Supervector

Comparison

Diagonal covariance matrix

Posterior estimation of

Diagonal covariance with same derivation

The two representations are almost the same even with

different motivations.

Visual Recognition And Search

Columbia University, Spring 2013

54


Class 3 feature coding and pooling

Fisher Vector and Supervector

How to Code Your Own

• Learn from existing code

http://lear.inrialpes.fr/src/inria_fisher/ (Linux or Mac)

• Learn from public GMM code

• Be careful with pitfalls

- Probability is comparable to machine’s rounding error:compute logP instead of P

- Try different normalization strategy

- Try to make the code efficient

Visual Recognition And Search

Columbia University, Spring 2013

55


Class 3 feature coding and pooling

Summary

Uncertainty-

Fisher vector/

Histogram

Sparse

Based

Supervector

Coding

of SIFT

Quantization

Coding

GMM

Vector

Soft

Soft

probability

quantization quantizationestimation

quantization

Pooling

HistogramGMM

Histogramaggregation

Max poolingadptation

aggregation

Visual Recognition And Search

Columbia University, Spring 2013

56


Class 3 feature coding and pooling

Visual Recognition And Search

Columbia University, Spring 2013

57


Class 3 feature coding and pooling

Summary

• Simplest model: histogram of visual words

• Models with good illustration: PLSA, LDA, S-LTM, …

• Soft quantization: soft quantization and sparse coding

• Very good performance: fisher vector or super vector

Visual Recognition And Search

Columbia University, Spring 2013

58


  • Login