
Decoding Human Face Processing


Presentation Transcript


  1. Decoding Human Face Processing Ankit Awasthi, Prof. Harish Karnick

  2. Motivation • One of the most important goals of computer vision researchers is to come up with an algorithm which can process face images and classify them into different categories (based on gender, emotion, identity, etc.) • Humans are extremely good at these tasks • In order to match human performance, and eventually beat it, it is imperative that we understand how humans do it

  3. Motivation • Moreover, similar cognitive processes might be involved in processing of other kinds of visual data or even data from other modalities • Discovery of computational basis of face processing might be a good indication of generic cognitive structures

  4. Where does our work fit in?? • A large number of neurological and psychological experimental findings • Implications for computer vision algorithms • Closing the loop

  5. Neural Networks (~1985) • Compare outputs with the correct answer to get an error signal • Back-propagate the error signal to get derivatives for learning • [Figure: a feed-forward network with an input vector, hidden layers, and outputs]
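The slide's training loop can be sketched in NumPy: a forward pass, an output error signal from comparing outputs with the correct answer, and back-propagation of that signal to get derivatives for learning. The layer sizes, squared-error loss, and learning rate here are illustrative assumptions, not the network described later in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# one hidden layer: input(4) -> hidden(3) -> output(2)
W1 = rng.normal(0, 0.1, (4, 3))
W2 = rng.normal(0, 0.1, (3, 2))

x = rng.normal(size=(1, 4))        # input vector
t = np.array([[1.0, 0.0]])         # correct answer

# forward pass
h = sigmoid(x @ W1)
y = sigmoid(h @ W2)
loss_before = 0.5 * np.sum((y - t) ** 2)

# error signal at the output (derivative of squared error through the sigmoid)
delta_out = (y - t) * y * (1 - y)
# back-propagate the error signal through the hidden layer
delta_hid = (delta_out @ W2.T) * h * (1 - h)

# derivatives for learning
dW2 = h.T @ delta_out
dW1 = x.T @ delta_hid

# one gradient-descent step
lr = 0.5
W2 -= lr * dW2
W1 -= lr * dW1

loss_after = 0.5 * np.sum((sigmoid(sigmoid(x @ W1) @ W2) - t) ** 2)
```

A single step along the exact gradient should reduce the loss on this example.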

  6. Why Deep Learning?? • Brains have a deep architecture • Humans organize their ideas hierarchically, through composition of simpler ideas • Insufficiently deep architectures can be exponentially inefficient • Deep architectures facilitate feature and sub-feature sharing

  7. Restricted Boltzmann Machines (RBM) • We restrict the connectivity to make learning easier • Only one layer of hidden units • No connections between hidden units • Energy of a joint configuration is defined as $E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j$ (for binary visible units) and $E(v,h) = \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_j b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} w_{ij} h_j$ (for real-valued visible units) • [Figure: bipartite graph between hidden units $h_j$ and visible units $v_i$]
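Learning in an RBM with this energy is typically done with one-step contrastive divergence (CD-1), as in the Hinton practical guide cited in the references. A minimal NumPy sketch for binary visible and hidden units; the layer sizes, batch, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid = 6, 4                      # illustrative sizes (assumption)
W = rng.normal(0, 0.1, (n_vis, n_hid))   # only visible-hidden connections
a = np.zeros(n_vis)                      # visible biases
b = np.zeros(n_hid)                      # hidden biases

v0 = rng.integers(0, 2, size=(8, n_vis)).astype(float)  # batch of binary data

# positive phase: hidden probabilities given the data
ph0 = sigmoid(v0 @ W + b)
h0 = (rng.random(ph0.shape) < ph0).astype(float)

# negative phase: one step of Gibbs sampling (CD-1)
pv1 = sigmoid(h0 @ W.T + a)
v1 = (rng.random(pv1.shape) < pv1).astype(float)
ph1 = sigmoid(v1 @ W + b)

# contrastive-divergence parameter updates
lr = 0.1
W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
a += lr * (v0 - v1).mean(axis=0)
b += lr * (ph0 - ph1).mean(axis=0)
```

Because there are no hidden-hidden connections, all hidden units are conditionally independent given the visible layer, which is what makes the single matrix multiply per phase possible.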

  8. Training a deep network
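The standard DBN training procedure behind this slide is greedy and layer-wise: train an RBM on the data, then treat its hidden activations as the "data" for the next RBM, and so on. A sketch under assumed sizes and hyperparameters; the negative phase uses mean-field probabilities rather than samples for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hid, epochs=5, lr=0.1):
    """Train one RBM with CD-1 and return (weights, hidden biases)."""
    n_vis = data.shape[1]
    W = rng.normal(0, 0.1, (n_vis, n_hid))
    a = np.zeros(n_vis)
    b = np.zeros(n_hid)
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + b)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + a)          # mean-field reconstruction
        ph1 = sigmoid(pv1 @ W + b)
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
        a += lr * (data - pv1).mean(axis=0)
        b += lr * (ph0 - ph1).mean(axis=0)
    return W, b

# greedy layer-wise stacking: each RBM's hidden activations
# become the training data for the next layer
data = rng.integers(0, 2, size=(32, 20)).astype(float)
layer_sizes = [12, 8]                        # illustrative (assumption)
weights = []
x = data
for n_hid in layer_sizes:
    W, b = train_rbm(x, n_hid)
    weights.append((W, b))
    x = sigmoid(x @ W + b)                   # propagate activations upward
```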

  9. Sparse DBNs (Lee et al., 2007) • In order to have a sparse hidden layer, the average activation of each hidden unit over the training set is constrained to a small target value $p$ • The optimization problem in the learning algorithm becomes $\min_{W,a,b}\; -\sum_{k} \log P(v^{(k)}) + \lambda \sum_{j} \big(p - \tfrac{1}{m}\sum_{k} \mathbb{E}[h_j \mid v^{(k)}]\big)^2$
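The sparsity constraint can be implemented as an extra gradient term on the hidden biases that pushes each unit's average activation toward the small target. A minimal sketch; the target value, penalty weight, and sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

p_target = 0.05     # desired average activation per hidden unit (assumption)
lam = 0.5           # sparsity penalty weight (assumption)

W = rng.normal(0, 0.1, (10, 6))
b = np.zeros(6)
v = rng.integers(0, 2, size=(16, 10)).astype(float)

# average activation of each hidden unit over the batch
ph = sigmoid(v @ W + b)
avg_act = ph.mean(axis=0)

# gradient of the penalty term w.r.t. the hidden biases:
# nudges average activations toward the small target p
b += lam * (p_target - avg_act)
```

After the update, every hidden unit's average activation moves closer to `p_target`; in practice this term is added to the ordinary contrastive-divergence bias update.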

  10. Oriented edge detectors using DBNs

  11. Important observations about DBNs • We found in our experiments that • Fine-tuning was important only for construction of the autoencoder • The final softmax layer can be learned on top of the learned features with only marginal loss in accuracy • Fine-tuning the autoencoder is important

  12. Neural Underpinnings (Sinha et al., 2006) • The human visual system appears to devote specialized resources to face perception • The latency of responses to faces in infero-temporal cortex is about 120 ms, suggesting a largely feed-forward computation • Facial identity and emotion might be processed separately • This is one of the reasons we restricted ourselves to emotion and gender classification

  13. Experiments and Dataset • Gender and emotion recognition (happy vs. neutral) • Training images: 300 images of size 50×50 • Test images: 98 images of size 50×50

  14. Results on Normal images • Same network architecture used for all experiments (3000->1000->500->200->100) • Gender Recognition • 94% • Emotion Recognition • 93%

  15. Low vs. High Spatial Frequency • A number of contradictory results • General consensus: low spatial frequencies are more important than higher spatial frequencies • Hints at the importance of configural information • High-frequency information by itself does not lead to good performance • How do we reconcile this with the everyday recognizability of line drawings? • The spatial frequency band employed for emotions is higher than that employed for gender classification (Deruelle and Fagot, 2004)

  16. Experiments • We cut off all spatial frequencies above 8 cycles per face • Two cases each in gender and emotion recognition • A model trained on ‘normal’ images is tested on low spatial frequency images • A model trained on low spatial frequency images is tested on low spatial frequency images
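The low spatial frequency manipulation can be sketched as a low-pass filter in the Fourier domain. The slides do not state the exact filter used, so the hard circular cutoff below is an assumed implementation:

```python
import numpy as np

def low_pass(img, cutoff_cycles):
    """Keep only spatial frequencies at or below `cutoff_cycles` per image.

    Uses a hard circular mask around the DC component; the authors may
    have used a smoother (e.g. Gaussian) cutoff instead.
    """
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # radial distance from DC, in cycles per image
    r = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    f[r > cutoff_cycles] = 0.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

rng = np.random.default_rng(0)
face = rng.random((50, 50))    # stands in for a 50x50 face image
lsf = low_pass(face, 8)        # cut everything above 8 cycles per face
```

The DC component is kept, so the mean intensity of the image is preserved while high-frequency detail is removed.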

  17. Results • Gender Recognition • Model trained on ‘normal’ images ~ 89% • Model trained on LSF images ~ 91% • Emotion Recognition • Model trained on ‘normal’ images ~ 87% • Model trained on LSF images ~ 90.5%

  18. Discussion • The decrease in accuracy is small considering the significant reduction in the amount of information • This implies that low spatial frequency information can be used to classify a majority of images • Tests with different spatial frequency cutoffs are needed to reach a conclusive answer • The importance of HSF is not apparent here because of the simplicity of the task • In other experiments where we looked at only HSF images, the results were poor

  19. Component and Configural Information • Facial features are processed holistically in face recognition (Sinha et al., 2006) and in emotion recognition (Durand et al., 2007) • The configural information affects how individual features are processed • On the other hand, there is evidence that we process face images by matching parts • Thatcher illusion: configural information affects how individual features are processed

  20. Thatcher Illusion

  21. Experiments • Two kinds of experiments • Models trained on ‘normal’ images tested on new images • Same set of training and test images
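A minimal sketch of how part-removed test images (eyes or mouth occluded, as discussed on the following slides) might be generated. The region coordinates and the mean-intensity fill are assumptions for illustration, not the authors' exact procedure:

```python
import numpy as np

def remove_part(img, top, left, height, width, fill=None):
    """Occlude a rectangular facial region (e.g. the eyes or the mouth).

    By default the region is filled with the image's mean intensity so
    the occluder carries no high-contrast edges of its own.
    """
    out = img.copy()
    patch_fill = img.mean() if fill is None else fill
    out[top:top + height, left:left + width] = patch_fill
    return out

rng = np.random.default_rng(0)
face = rng.random((50, 50))    # stands in for a 50x50 face image

# hypothetical eye and mouth regions for a roughly centered 50x50 face
no_eyes = remove_part(face, top=12, left=8, height=8, width=34)
no_mouth = remove_part(face, top=34, left=15, height=8, width=20)
```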

  22. Results (Gender Classification) • Models trained on ‘normal’ images: ~ 91%, ~ 80%, ~ 70%, ~ random (for the image manipulations shown on the slide)

  23. Results (Gender Classification) • Same training and test images: ~ 93%, ~ 85%, ~ 79% (for the image manipulations shown on the slide)

  24. Results (Emotion Classification) • Models trained on ‘normal’ images: ~ 87%, ~ 81%, ~ 87%, ~ random (for the image manipulations shown on the slide)

  25. Results (Emotion Classification) • Same training and test images: ~ 92%, ~ 84%, ~ 82% (for the image manipulations shown on the slide)

  26. Agreement with Human Performance • Preliminary results show that humans are • Perfect in the case of the normal images we are using • Error-prone when parts are removed (wrong on 3 out of 20 images on average) • Accuracy depends a lot on exposure time • Properly timed experiments are expected to yield results similar to the algorithm's

  27. Discussion • The importance of key features (eyes, mouth) is evident • Eyes/eyebrows are important for gender recognition • When trained on ‘normal’ images the algorithm learns features corresponding to the important parts • In the absence of these features the algorithm learns to extract other features to increase accuracy

  28. Inversion Effect • One of the first findings that hinted at a dedicated face-processing pathway • Another indicator of configural processing of face images • Inverted images take significantly longer to process

  29. Experiments and Results • Models trained on ‘normal’ images: the results are “random”!! • Training and testing on inverted images gives the same results as doing so for ‘normal’ images • These results suggest that the model's face processing is not part-based

  30. Thatcher Illusion

  31. Experiment and Results • Models trained on ‘normal’ images • Random for both tasks!! • Same training and test images • Gender: 92% • Emotion: 91%

  32. High Level Features • Only a few connections to the previous layer have weights that are very high or very low • Some of the largest-weight connections are used to form a linear combination of the previous layer's features • This overlooks the non-linearity in the network from one layer to the next
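The visualization strategy described here, linearly combining lower-layer filters through a unit's largest-weight connections while ignoring the non-linearity, can be sketched as follows. Layer sizes and the choice of the top 10 connections are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# weight matrices of a small illustrative stack (sizes are assumptions)
W1 = rng.normal(0, 0.1, (2500, 100))   # pixels (50x50) -> layer 1
W2 = rng.normal(0, 0.1, (100, 40))     # layer 1 -> layer 2

def linear_filter(unit, top_k=10):
    """Approximate the input-space 'filter' of a second-layer unit as a
    linear combination of first-layer filters, weighted by its strongest
    incoming connections. This deliberately ignores the layer-wise
    non-linearity, which is exactly the caveat noted on the slide."""
    w = W2[:, unit]
    top = np.argsort(np.abs(w))[-top_k:]       # largest-|weight| connections
    return (W1[:, top] * w[top]).sum(axis=1)   # combine their filters

img = linear_filter(0).reshape(50, 50)         # view as a 50x50 image
```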

  33. Natural Extensions • A more exhaustive set of experiments needs to be done to verify our preliminary observations • It would be interesting to compare other models with deep networks • Some of the problems or inconsistencies are due to a lack of translation-invariant features • The best solution is to use a convolutional model • Natural regularizer • Translational invariance • Biologically plausible

  34. Conclusion • We have done preliminary investigations with respect to various phenomena • The observed results certainly hint at the cognitive relevance of the model

  35. References • Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18:1527-1554, 2006. • Geoffrey E. Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines. Technical Report. • Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent (2010). Visualizing Higher-Layer Features of a Deep Network. Technical Report 1341. • Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. ICML 2009. • Geoffrey E. Hinton. Learning Multiple Layers of Representation. Trends in Cognitive Sciences, 11(10), 2007. • Honglak Lee, Chaitanya Ekanadham, and Andrew Y. Ng. Sparse Deep Belief Net Model for Visual Area V2. NIPS 2007.

  36. References • Bruno A. Olshausen and David J. Field (1997). Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1? Vision Research, 37:3311-3325. • Karine Durand, Mathieu Gallay, Alix Seigneuric, Fabrice Robichon, and Jean-Yves Baudouin. The Development of Facial Emotion Recognition: The Role of Configural Information. Journal of Experimental Child Psychology, 2007. • Pawan Sinha, Benjamin Balas, Yuri Ostrovsky, and Richard Russell. Face Recognition by Humans: Nineteen Results All Computer Vision Researchers Should Know About. • Christian Wallraven, Adrian Schwaninger, and Heinrich H. Bülthoff. Learning from Humans: Computational Modeling of Face Recognition. Network: Computation in Neural Systems. • Christine Deruelle and Joël Fagot. Categorizing Facial Identities, Emotions, and Genders: Attention to High- and Low-Spatial Frequencies by Children and Adults.
