
  1. Beyond Simple Features: A Large-Scale Feature Search Approach to Unconstrained Face Recognition • International Conference on Automatic Face and Gesture Recognition (FG), 2011 • Nicolas Pinto (Massachusetts Institute of Technology) • David Cox (The Rowland Institute at Harvard, Harvard University)

  2. Outline • Introduction • Method • V1-like visual representation • High-throughput-derived multilayer visual representations • Kernel Combination • Experimental Results • Discussion

  3. Introduction • “Biologically-inspired” representations • capture aspects of the computational architecture of the brain and mimic its computational abilities

  4. Introduction • Large-Scale Feature Search Framework • Generate models with different parameters, then screen them for the best performers

  5. Method - V1-like visual representation • “Null model” - represents only a first-order description of the primary visual cortex • Details • Preprocessing: resize image to 150 pixels with aspect ratio preserved, using bicubic interpolation • Input normalization: divide each pixel’s intensity by the norm of the pixels in its 3x3 neighborhood • Gabor wavelets: 16 orientations, 6 spatial frequencies • Output normalization: divide by the norm of the pixels in the 3x3 neighborhood • Thresholding and clipping: output values outside [0, 1] are set to 0 or 1
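To make the normalization steps concrete, here is a minimal NumPy/SciPy sketch of the divisive local normalization applied before and after the Gabor stage (the 3x3 neighborhood follows the slide; the eps guard against division by zero is an added assumption):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_norm(img, size=3, eps=1e-6):
    # Divide each pixel by the L2 norm of its size x size neighborhood.
    # uniform_filter returns the neighborhood mean, so multiplying by
    # size**2 recovers the neighborhood sum of squares.
    sq_sum = uniform_filter(img.astype(float) ** 2, size=size) * size ** 2
    return img / (np.sqrt(sq_sum) + eps)
```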

  6. V1-like visual representation • Gabor Filter
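The slide showed example Gabor filters. Below is one common way to build such a filter bank; the kernel size, the wavelength set, and the sigma = wavelength/2 heuristic are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def gabor_kernel(size, wavelength, orientation, sigma, phase=0.0):
    # Real-valued Gabor: a sinusoidal carrier windowed by a Gaussian.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_theta = x * np.cos(orientation) + y * np.sin(orientation)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_theta / wavelength + phase)
    return envelope * carrier

# 16 orientations x 6 spatial frequencies = 96 filters, as on slide 5.
bank = [gabor_kernel(43, wl, th, sigma=wl / 2.0)
        for wl in (4, 8, 12, 16, 20, 24)
        for th in np.linspace(0, np.pi, 16, endpoint=False)]
```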

  7. Method - High-throughput-derived multilayer visual representations • Model architecture: • Candidate models were composed of a hierarchy of two (HT-L2) or three (HT-L3) layers

  8. High-throughput-derived multilayer visual representations • Input size • HT-L2: 100 x 100 pixels • HT-L3: 200 x 200 pixels • Input was converted into grayscale and locally normalized

  9. High-throughput-derived multilayer visual representations • Linear Filter • linearly filtered using a bank of filters to produce a stack of feature maps • this operation is analogous to the weighted integration of synaptic inputs, where each filter in the filterbank represents a different cell

  10. High-throughput-derived multilayer visual representations • Linear Filter (cont.) • Parameters: • The filter shapes were chosen randomly from {3, 5, 7, 9} (square f x f kernels) • Depending on the layer l considered, the number of filters k_l was chosen randomly from the following sets: • In layer 1, k_1 ∈ {16, 32, 64} • In layer 2, k_2 ∈ {16, 32, 64, 128} • In layer 3, k_3 ∈ {16, 32, 64, 128, 256} • All filter kernels were fixed to random values drawn from a uniform distribution
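A sketch of the filtering stage under the parameters above; the uniform range (-1, 1) and the 'same' border handling are assumptions:

```python
import numpy as np
from scipy.signal import convolve2d

def random_filterbank(k, f, d_in, rng):
    # k square filters of side f spanning the full input depth d_in,
    # with values drawn from a uniform distribution (as on the slide).
    return rng.uniform(-1.0, 1.0, size=(k, f, f, d_in))

def filter_layer(stack, bank):
    # stack: (H, W, d_in) feature maps -> (H, W, k) output maps,
    # one map per filter ("cell") in the bank.
    H, W, d_in = stack.shape
    out = np.zeros((H, W, bank.shape[0]))
    for i in range(bank.shape[0]):
        for d in range(d_in):
            out[:, :, i] += convolve2d(stack[:, :, d], bank[i, :, :, d],
                                       mode='same')
    return out

rng = np.random.default_rng(0)
maps = filter_layer(rng.normal(size=(100, 100, 1)),
                    random_filterbank(32, 5, 1, rng))
```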

  11. High-throughput-derived multilayer visual representations • Activation Function • Output values were clipped to be within a parametrically defined range [γ_min, γ_max]

  12. High-throughput-derived multilayer visual representations • Activation Function (cont.) • Parameters: • γ_min was randomly chosen to be −∞ or 0 • γ_max was randomly chosen to be 1 or +∞
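The activation stage then reduces to a clip; a one-line sketch, assuming the −∞/0 and 1/+∞ reconstruction above:

```python
import numpy as np

def activation(x, gamma_min, gamma_max):
    # Threshold below gamma_min and saturate above gamma_max.
    return np.clip(x, gamma_min, gamma_max)

# The four (gamma_min, gamma_max) combinations a model can draw:
settings = [(-np.inf, np.inf), (-np.inf, 1.0), (0.0, np.inf), (0.0, 1.0)]
```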

  13. High-throughput-derived multilayer visual representations • Pooling • Neighboring values were then pooled together and the resulting outputs were spatially downsampled

  14. High-throughput-derived multilayer visual representations • Pooling (cont.) • Parameters: • The stride parameter was fixed to 2, resulting in a downsampling factor of 4 • The size of the pooling neighborhood was randomly chosen from {3, 5, 7, 9} • The exponent p was randomly chosen from {1, 2, 10} • p = 1 is equivalent to blurring (mean pooling) • p = 2 or 10 gives the p-norm over the neighborhood
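A sketch of this p-norm pooling; the boundary handling of uniform_filter (reflect padding) is an implementation assumption:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lp_pool(stack, size, p, stride=2):
    # Raise to the p-th power, sum over the size x size spatial
    # neighborhood, take the p-th root, then keep every stride-th
    # location (stride 2 -> 4x fewer outputs, as on the slide).
    powered = np.abs(stack) ** p
    summed = uniform_filter(powered, size=(size, size, 1)) * size ** 2
    return (summed ** (1.0 / p))[::stride, ::stride, :]
```

With p = 1 this is a box blur followed by subsampling; large p approaches max pooling.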

  15. High-throughput-derived multilayer visual representations • Normalization • Draws biological inspiration from the competitive interactions observed in natural neuronal systems (e.g. contrast gain control mechanisms in cortical area V1, and elsewhere)

  16. High-throughput-derived multilayer visual representations • Normalization (cont.) • Parameters: • The size of the neighborhood region was randomly chosen from {3, 5, 7, 9} • The mean-subtraction parameter δ was chosen from {0, 1} • The vector of neighboring values could also be stretched by one of three gain values • The threshold below which no normalization is applied was likewise chosen from a set of three values
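A sketch of this divisive normalization; since the slide's exact gain and threshold sets did not survive extraction, the defaults below are placeholders:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def divisive_norm(stack, size=3, subtract_mean=True, gain=1.0,
                  threshold=1.0, eps=1e-6):
    # Optionally center each value on its local neighborhood mean,
    # then divide by the gain-scaled L2 norm of the neighborhood --
    # but only where that norm exceeds the threshold, so weak
    # responses pass through unchanged.
    if subtract_mean:
        stack = stack - uniform_filter(stack, size=(size, size, 1))
    norm = gain * np.sqrt(
        uniform_filter(stack ** 2, size=(size, size, 1)) * size ** 2)
    return np.where(norm > threshold, stack / np.maximum(norm, eps), stack)
```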

  17. Method - Evaluation • Binary hard-margin linear SVM • 4 feature vectors
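A hard margin is commonly approximated with a very large C in a soft-margin solver; a minimal scikit-learn sketch with hypothetical random features (the precomputed-kernel form also matches the kernel combination slides that follow):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 512))    # hypothetical feature vectors
y_train = rng.integers(0, 2, size=200)   # same / different pair labels

K_train = X_train @ X_train.T            # linear kernel (Gram matrix)
clf = SVC(kernel='precomputed', C=1e8)   # very large C ~ hard margin
clf.fit(K_train, y_train)

X_test = rng.normal(size=(50, 512))
pred = clf.predict(X_test @ X_train.T)   # test-vs-train kernel
```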

  18. Method • Model overview

  19. Method – Screening • Screening (model selection) • Select the best five models on the LFW View 1 aligned set • Output dimensions ranged from 256 to 73,984 • Number of models screened: • HT-L2: 5915 • HT-L3: 6917

  20. Feature Augmentation • Multiple rescaled crops • Three different centered crops • 250x250 • 150x150 • 125x75 • Resized to the standard input size • Train SVMs separately
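A sketch of the cropping step with Pillow; the 200x200 output size assumes the HT-L3 input (HT-L2 would use 100x100), and reading 125x75 as width x height is a guess:

```python
from PIL import Image

def centered_crops(img, sizes=((250, 250), (150, 150), (125, 75)),
                   out_size=(200, 200)):
    # Cut three centered crops and resize each back to the model
    # input size; each resulting feature vector gets its own SVM.
    w, h = img.size
    crops = []
    for cw, ch in sizes:
        left, top = (w - cw) // 2, (h - ch) // 2
        box = (left, top, left + cw, top + ch)
        crops.append(img.crop(box).resize(out_size, Image.BICUBIC))
    return crops
```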

  21. Kernel Combination • Three strategies • Blend kernels resulting from different crops • Simple kernel addition, with each kernel trace-normalized • Blend the 5 models within the same class • Hierarchical blends across model classes • Assign exponentially larger weights to higher-level representations (V1-like < HT-L2 < HT-L3)

  22. Kernel Combination • Kernel Method Example:
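A minimal sketch of the blending described above: trace-normalize each Gram matrix, then take a weighted sum (weights of 1 give plain kernel addition; the exponential base below is an illustrative assumption):

```python
import numpy as np

def trace_normalize(K):
    # Scale the Gram matrix so its trace equals its size, putting
    # kernels from different feature spaces on a common footing.
    return K * (K.shape[0] / np.trace(K))

def blend_kernels(kernels, weights=None):
    # Weighted sum of trace-normalized kernels.
    kernels = [trace_normalize(K) for K in kernels]
    weights = np.ones(len(kernels)) if weights is None else np.asarray(weights)
    return sum(w * K for w, K in zip(weights, kernels))

# Hierarchical blend with exponentially larger weights for higher-level
# representations, e.g. V1-like < HT-L2 < HT-L3:
# blended = blend_kernels([K_v1, K_l2, K_l3], weights=[1, 10, 100])
```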

  23. Kernel Combination • The original SVM formulation with a summed kernel • is equivalent to training a single SVM on the concatenated feature vectors
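This is the standard identity behind kernel addition, presumably what the slide illustrated:

```latex
k_1(x, y) + k_2(x, y)
  = \langle \phi_1(x), \phi_1(y) \rangle + \langle \phi_2(x), \phi_2(y) \rangle
  = \left\langle \begin{bmatrix} \phi_1(x) \\ \phi_2(x) \end{bmatrix},
                 \begin{bmatrix} \phi_1(y) \\ \phi_2(y) \end{bmatrix} \right\rangle
```

so an SVM trained on the summed kernel sees the concatenation of the two feature maps.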

  24. Kernel Combination • Multiple Kernel Learning (MKL) • learn the kernel combination weights directly from the data

  25. Kernel Combination • Multiple Kernel Learning (MKL)
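The usual convex-combination form of MKL, as a sketch (the paper's exact variant is not shown in the transcript):

```latex
K(x, y) = \sum_{m=1}^{M} \beta_m K_m(x, y),
\qquad \beta_m \ge 0, \quad \sum_{m=1}^{M} \beta_m = 1
```

where the weights \beta_m are optimized jointly with the SVM objective instead of being fixed by hand.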

  26. Experiment • Screen models on LFW View 1 • Train SVMs and evaluate using 10-fold cross-validation on LFW View 2

  27. Results

  28. Results • Some error cases

  29. Discussion • Uses whole-image pixel values; pose variation is not explicitly handled • Takes advantage of background information? • Disturbed by background clutter • Performance increases when adding different crops

  30. 16-GPU Monster-Class Supercomputer • Environment • GNU/Linux • Python, C, C++, Cython • CUDA, PyCUDA
