
Learning Mid-Level Features For Recognition


Presentation Transcript


  1. Learning Mid-Level Features For Recognition. Y-Lan Boureau, Francis Bach, Yann LeCun and Jean Ponce. Published in CVPR 2010. Presented by Bo Chen, 8/20/2010

  2. Outline • 1. Classification System • 2. Brief introduction of each step • 3. Systematic evaluation of unsupervised mid-level features • 4. Learning discriminative dictionaries • 5. Average and max pooling • 6. Conclusions

  3. System Flow Chart: Patches → SIFT → Coding → Pooling → Classifier, with a spatial pyramid (SPM). Subsampling: slide patches densely to cover all details. SIFT: robust low-level features, invariant to several imaging conditions. Coding: generate mid-level features, e.g., sparse coding, vector quantization, or a deep network. Pooling: max pooling or average pooling over the spatial pyramid. Classifier: linear or nonlinear SVM. (A toy end-to-end sketch follows.)
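To make the flow chart concrete, here is a minimal end-to-end sketch in numpy. Everything in it is a stand-in: the descriptors, positions, and dictionary are random, hard vector quantization plays the coding step, and a two-level pyramid plays SPM; all sizes are illustrative, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 200 "SIFT" descriptors (128-D) with normalized image
# coordinates, and a dictionary of K = 64 codewords.
descriptors = rng.standard_normal((200, 128))
positions = rng.uniform(0.0, 1.0, size=(200, 2))
dictionary = rng.standard_normal((64, 128))

# Coding: hard vector quantization (one-hot code for the nearest codeword).
dists = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
codes = np.eye(64)[dists.argmin(axis=1)]           # (200, 64) binary codes

# Pooling: max-pool the codes inside each cell of a two-level spatial pyramid.
pooled = []
for level in (1, 2):                               # 1x1 and 2x2 grids
    cell = (positions * level).astype(int).clip(0, level - 1)
    for i in range(level):
        for j in range(level):
            mask = (cell[:, 0] == i) & (cell[:, 1] == j)
            pooled.append(codes[mask].max(axis=0) if mask.any() else np.zeros(64))
feature = np.concatenate(pooled)                   # (1 + 4) * 64 = 320-D
print(feature.shape)                               # this vector goes to the SVM
```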

  4. Scale-Invariant Feature Transform (D. Lowe, IJCV, 2004). Motivations: image matching with invariance to scale, rotation, illumination, and viewpoint. Figures from David Lee's ppt

  5. Calculate SIFT Descriptors. Divide a 16x16 patch into a 4x4 grid of subregions, with an 8-bin orientation histogram in each subregion, which leads to a 4x4x8 = 128-dimensional vector (a low-level feature). Figures from Jason Clemons's ppt
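The computation on this slide can be sketched in a few lines of numpy. This reproduces only the histogram layout (4x4 spatial cells times 8 orientation bins); Lowe's full descriptor additionally applies Gaussian weighting, trilinear interpolation, rotation to the dominant orientation, and clipping before renormalization.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Minimal 4x4x8 = 128-D histogram sketch for a 16x16 patch."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))       # image gradients
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)     # orientation in [0, 2pi)
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            b = int(ori[i, j] / (2 * np.pi) * 8) % 8   # 8 orientation bins
            desc[i // 4, j // 4, b] += mag[i, j]       # 4x4 spatial grid
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-8)        # L2-normalize

print(sift_like_descriptor(np.random.rand(16, 16)).shape)  # (128,)
```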

  6. Notations. Question: how can we represent each region? Figure from S. Lazebnik et al., CVPR 2006

  7. Coding and Pooling • Vector quantization (bag-of-features), or • Sparse coding (a sketch of both follows)
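In code, the two coding schemes can be sketched as follows. The sparse code is computed here with plain ISTA, a generic proximal-gradient solver, not necessarily the solver used in the paper; lambda and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((128, 64))              # dictionary, columns = codewords
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
x = rng.standard_normal(128)                    # one low-level descriptor

# Hard vector quantization: a one-hot code selecting the nearest codeword.
k = np.argmin(((x[:, None] - D) ** 2).sum(axis=0))
alpha_vq = np.zeros(64)
alpha_vq[k] = 1.0

# Sparse coding: alpha = argmin_a 0.5 * ||x - D a||^2 + lam * ||a||_1,
# solved with ISTA (gradient step + soft-thresholding).
lam = 0.2
step = 1.0 / np.linalg.norm(D, 2) ** 2          # 1 / Lipschitz constant
alpha = np.zeros(64)
for _ in range(200):
    z = alpha - step * (D.T @ (D @ alpha - x))  # gradient step on the quadratic
    alpha = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

print(int(alpha_vq.sum()), np.count_nonzero(alpha))   # 1 active atom vs. a few
```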

  8. Systematic evaluation of unsupervised mid-level features

  9. Macrofeatures and Denser SIFT Sampling • Parameters: SIFT sampling density, macrofeature side length, subsampling parameter • Results: Caltech-101: 75.7% with (4, 2, 4); Scenes: 84.3% with (8, 2, 1). (A sketch of the construction follows.)
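A sketch of the macrofeature construction as described: concatenate small neighborhoods of adjacent dense-SIFT descriptors into single vectors before coding. The grid size, side length, and subsampling below are illustrative, not the paper's tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = rng.standard_normal((20, 20, 128))     # dense SIFT on a 20x20 grid
side, sub = 2, 1                              # macrofeature side, subsampling

# Each macrofeature stacks a side x side block of neighboring descriptors.
macro = []
for i in range(0, 20 - side + 1, sub):
    for j in range(0, 20 - side + 1, sub):
        macro.append(grid[i:i + side, j:j + side].ravel())  # 2x2x128 = 512-D
macro = np.stack(macro)
print(macro.shape)       # (361, 512): these vectors are then coded jointly
```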

  10. Results

  11. Discriminative Dictionaries. Algorithm: stochastic gradient descent. Drawback: high computational complexity. Speed-ups: 1. approximate z(n) by pooling over a random sample of ten locations in the image; 2. update only a random subset of coordinates at each iteration. (Results on the Scenes dataset; a simplified sketch follows.)
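A heavily simplified sketch of the idea: learn the dictionary and a linear classifier jointly by SGD on a classification loss. To keep the gradient simple, it substitutes differentiable soft-assignment coding for the paper's sparse coding, and it mimics speed-up 1 by pooling over a small random sample of locations; the data, sizes, and learning rate are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, lr = 16, 32, 0.1
D = 0.1 * rng.standard_normal((K, d))   # dictionary (K codewords)
w = np.zeros(K)                         # linear classifier weights

def soft_codes(X, D):
    # Differentiable stand-in for sparse coding: softmax over similarities.
    s = X @ D.T
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for _ in range(2000):
    y = 2 * int(rng.integers(0, 2)) - 1            # label in {-1, +1}
    X = rng.standard_normal((10, d)) + 0.3 * y     # ten sampled locations
    A = soft_codes(X, D)                           # (10, K) codes
    z = A.mean(axis=0)                             # average pooling -> z(n)
    g = -y / (1.0 + np.exp(y * (w @ z)))           # d logistic loss / d (w . z)
    G = A * (w[None, :] - (A @ w)[:, None])        # softmax Jacobian, chain rule
    D -= lr * g * (G.T @ X) / len(X)               # supervised dictionary update
    w -= lr * g * z                                # classifier update

# Scores on fresh samples should tend to match the sign of the label.
for y in (-1, 1):
    X = rng.standard_normal((10, d)) + 0.3 * y
    print(y, float(w @ soft_codes(X, D).mean(axis=0)))
```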

  12. Average and Max Pooling
  • Why pooling? Pooling is used to achieve invariance to image transformations, more compact representations, and better robustness to noise and clutter. The aim is to preserve important information while discarding irrelevant detail; the crux of the matter is determining what falls into which category.
  • Max pooling vs. average pooling: the authors show that using max pooling on hard vector-quantized features in a spatial pyramid brings the performance of linear classification to the level obtained by Lazebnik et al. (2006) with an intersection kernel, even though the resulting feature is binary.
  • Our feeling: pooling helps keep the learned codes sparse, which mirrors human visual function. For convolutional deep networks in particular, pooling appears necessary because neighboring contents are correlated. (Part of this conclusion is from Y-Lan Boureau et al., ICML 2010.)

  13. Theoretical Comparison of Average and Max Pooling Experimental methodology: Binary classification (positive and negative)
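The slide's setting can be imitated with a small simulation: pool N binary codeword activations per image, with a class-dependent activation probability, and see how separable the two classes are under each pooling operator. The probabilities and pool size below are arbitrary; the point of the analysis is precisely that the better operator depends on this regime.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                                  # pooled locations per image
p_pos, p_neg = 0.05, 0.02                # activation probability per class

def pooled_features(p, n_images=2000):
    acts = rng.random((n_images, N)) < p          # binary codes at N locations
    return acts.mean(axis=1), acts.max(axis=1)    # average- and max-pooled

avg_pos, max_pos = pooled_features(p_pos)
avg_neg, max_neg = pooled_features(p_neg)

# Separate the classes with a simple midpoint threshold on the pooled value.
for name, pos, neg in (("avg", avg_pos, avg_neg), ("max", max_pos, max_neg)):
    thr = (pos.mean() + neg.mean()) / 2
    acc = ((pos > thr).mean() + (neg <= thr).mean()) / 2
    print(name, round(float(acc), 3))
```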

  14. Conclusions • 1. A comprehensive and systematic comparison across each step of mid-level feature extraction, spanning several coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (average or maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. • 2. A supervised dictionary-learning method for sparse coding. • 3. Theoretical and empirical insight into the remarkable performance of max pooling.
