
A Theoretical Analysis of Feature Pooling in Visual Recognition

Y-Lan Boureau, Jean Ponce and Yann LeCun

ICML 2010

Presented by Bo Chen

Outline
  • 1. Max-pooling and average pooling
  • 2. Successful stories about max-pooling
  • 3. Pooling binary features
  • 4. Pooling continuous sparse codes
  • 5. Discussions
Max-pooling and Average-pooling

Max-pooling: take the maximum value in each block

Average-pooling: average all values in each block

[Figure: example feature maps after pooling]
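A minimal NumPy sketch of the two operators on a toy 4×4 feature map with non-overlapping 2×2 blocks (the sizes are chosen only for illustration):

```python
import numpy as np

def pool_blocks(feature_map, block, mode="max"):
    """Pool a 2-D feature map over non-overlapping block x block regions."""
    h, w = feature_map.shape
    out = feature_map[:h - h % block, :w - w % block]
    out = out.reshape(h // block, block, w // block, block)
    if mode == "max":
        return out.max(axis=(1, 3))   # max-pooling: maximum value in each block
    return out.mean(axis=(1, 3))      # average-pooling: mean value in each block

x = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 feature map
print(pool_blocks(x, 2, "max"))    # [[ 5.  7.] [13. 15.]]
print(pool_blocks(x, 2, "mean"))   # [[ 2.5  4.5] [10.5 12.5]]
```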

Successful Stories about Max-Pooling
  • 1. Vector quantization + spatial pyramid model (Lazebnik et al., 2006)
  • 2. Spatial pyramid pooling after sparse coding or hard quantization (Yang et al., 2009; Boureau et al., 2010)
  • 3. Convolutional (deep) networks (LeCun et al., 1998; Ranzato et al., 2007; Lee et al., 2009)
Why Pooling?
  • 1. In general terms, the objective of pooling is to transform the joint feature representation into a new, more usable one that preserves important information while discarding irrelevant detail, the crux of the matter being to determine what falls in which category.
  • 2. Achieving invariance to changes in position or lighting conditions, robustness to clutter, and compactness of representation, are all common goals of pooling.
Pooling Binary Features

Model: Consider the two class-conditional distributions of a pooled feature. The larger the distance between their means (or the smaller their variances), the better their separation.

Notations:

1. If the unpooled data is a P × k matrix of 1-of-k codes taken at P locations, we extract a single P-dimensional column v of 0s and 1s, indicating the absence or presence of the feature at each location.

2. The vector v is reduced by a pooling operation to a single scalar f(v).

3. Max pooling: f_m(v) = max_i v_i.  Average pooling: f_a(v) = (1/P) Σ_i v_i.

4. Given two classes C1 and C2, we examine the separation of the conditional distributions p(f(v) | C1) and p(f(v) | C2).
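As a quick numerical check, a small NumPy simulation (the values of α, P, and the number of trials are arbitrary illustrative choices) of both pooled statistics on synthetic binary codes:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, P, trials = 0.05, 50, 100_000

# v: each row is one pooling region of P binary (Bernoulli(alpha)) activations
v = rng.random((trials, P)) < alpha

f_a = v.mean(axis=1)   # average pooling
f_m = v.max(axis=1)    # max pooling (1 if the feature fires anywhere in the region)

print(f_a.mean(), alpha)                       # E[f_a] = alpha
print(f_a.var(),  alpha * (1 - alpha) / P)     # Var[f_a] = alpha(1-alpha)/P
print(f_m.mean(), 1 - (1 - alpha) ** P)        # E[f_m] = 1-(1-alpha)^P
```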

Distribution Separability

Average pooling: The sum over P i.i.d. Bernoulli variables of mean α follows a binomial distribution B(P, α), so f_a has mean α and variance α(1-α)/P.

Consequently: the expected value of f_a is independent of sample size P, and the variance decreases like 1/P; therefore the separation ratio (means' difference over standard deviation) increases monotonically, like sqrt(P).

Max pooling: f_m is a Bernoulli variable with mean 1-(1-α)^P and variance (1-(1-α)^P)(1-α)^P.

The separation of class-conditional expectations of max-pooled features: φ(P) = |E[f_m | C1] - E[f_m | C2]| = |(1-α2)^P - (1-α1)^P|.

Distribution Separability

There exists a range of pooling cardinalities for which the distance between the class-conditional expectations is greater with max pooling than with average pooling if and only if P_M > 1, where P_M = log(log(1-α2)/log(1-α1)) / log((1-α1)/(1-α2)) is the cardinality at which φ(P) reaches its maximum.

The standard deviation of the max-pooled feature first increases then decreases with P, reaching its maximum of 0.5 at P = -log(2)/log(1-α) (≈ log(2)/α for small α).
Empirical Experiments and Predictions
  • 1. Max pooling is particularly well suited to the separation of features that are very sparse (i.e., have a very low probability of being active).
  • 2. Using all available samples to perform the pooling may not be optimal.
  • 3. The optimal pooling cardinality should increase with dictionary size.
Experiments about Optimal Pooling Cardinality

Empirical: an empirical average of the max over different subsamples.

Expectation: plug the empirical activation rate α into E[f_m] = 1 - (1 - α)^P; here the cardinality P is treated as a free parameter.
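A sketch of the two estimators (NumPy; the region size, the cardinality P, and the number of subsamples are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.random(200) < 0.03          # one region: 200 binary activations, alpha = 0.03
P = 30                              # pooling cardinality, treated as a free parameter

# (a) Empirical: average of the max over random subsamples of cardinality P
subsample_maxes = [v[rng.choice(v.size, P, replace=False)].max() for _ in range(500)]
empirical = np.mean(subsample_maxes)

# (b) Expectation: plug the empirical activation rate into E[f_m] = 1 - (1 - alpha)^P
alpha_hat = v.mean()
expectation = 1 - (1 - alpha_hat) ** P

print(empirical, expectation)       # both estimate the same quantity; (b) is smoother
```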

Pooling Continuous Sparse Codes

Two Conclusions:

1. When no smoothing is performed, larger pooling cardinalities provide a better signal-to-noise ratio.

2. However, this ratio grows more slowly with cardinality than it does when the additional samples are simply used to smooth the estimate (e.g., by averaging the max over several smaller pools).
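An illustrative NumPy simulation of these two points; the generative model for the sparse codes (a Bernoulli mask times an Exponential(1) magnitude) is an assumption made here for illustration, not the paper's exact setting:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, trials = 0.05, 10_000      # activation probability; number of simulated regions

def snr(x):
    """Signal-to-noise ratio of a pooled feature: mean over standard deviation."""
    return x.mean() / x.std()

for P in (50, 200):
    # Sparse codes: active with probability alpha, Exponential(1) magnitude when active
    codes = (rng.random((trials, 4 * P)) < alpha) * rng.exponential(1.0, (trials, 4 * P))
    small = codes[:, :P].max(axis=1)                                # max over P samples
    big = codes.max(axis=1)                                         # max over 4P samples
    smooth = codes.reshape(trials, 4, P).max(axis=2).mean(axis=1)   # avg of 4 maxes over P
    print(P, round(snr(small), 2), round(snr(big), 2), round(snr(smooth), 2))
# Larger cardinality raises the SNR (small -> big), but spending the same 4P samples
# on smoothing (smooth) raises it faster, matching the two conclusions above.
```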

Transition from Average to Max Pooling

1. P-norm: f_p(v) = ((1/P) Σ_i v_i^p)^(1/p); p = 1 gives average pooling, p → ∞ gives max pooling.

2. Softmax function: f_β(v) = Σ_i v_i exp(β v_i) / Σ_j exp(β v_j); β = 0 gives average pooling, β → ∞ gives max pooling.

3.
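A compact NumPy sketch (the toy vector v is arbitrary) of the first two families, checking that one end of the parameter range recovers average pooling and the other approaches max pooling:

```python
import numpy as np

def p_norm_pool(v, p):
    """P-norm pooling: ((1/P) * sum v_i^p)^(1/p); p=1 -> average, p -> inf -> max."""
    v = np.asarray(v, dtype=float)
    return np.mean(v ** p) ** (1.0 / p)

def softmax_pool(v, beta):
    """Softmax-weighted average: sum v_i*exp(beta*v_i) / sum exp(beta*v_j);
    beta=0 -> average, beta -> inf -> max."""
    v = np.asarray(v, dtype=float)
    w = np.exp(beta * (v - v.max()))       # shift for numerical stability
    return float(np.sum(v * w) / np.sum(w))

v = [0.0, 0.1, 0.0, 0.9, 0.2]
print(p_norm_pool(v, 1), p_norm_pool(v, 10), p_norm_pool(v, 100))      # 0.24 -> ~0.9
print(softmax_pool(v, 0.0), softmax_pool(v, 10), softmax_pool(v, 100)) # 0.24 -> ~0.9
```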

Discussions
  • 1. By carefully adjusting the pooling step of feature extraction, relatively simple systems of local features and classifiers can become competitive with more complex ones.
  • 2. In the binary case, max pooling may account for this good performance; this pooling strategy is well adapted to features with a low probability of activation. Two ways to obtain a smoother estimate:
  (1) use the formula for the expectation of the maximum directly, in the case of binary codes;
  (2) pool over smaller samples and take the average.
  • 3. When using sparse coding, some limited improvement may be obtained by pooling over subsamples of smaller cardinality and averaging, and by searching for the optimal pooling cardinality, but this is not always the case.