
Bilinear Deep Learning for Image Classification




  1. Bilinear Deep Learning for Image Classification Sheng-hua Zhong, Yan Liu, Yang Liu Department of Computing The Hong Kong Polytechnic University

  2. Outline
  • Introduction
  • Research progress on deep learning
  • Proposed algorithm
    • Architecture of BDBN
    • Learning stages of BDBN: bilinear discriminant initialization, greedy layer-wise reconstruction, global fine-tuning
  • Experiments and results
    • Experiment on Handwriting Dataset MNIST
    • Experiment on Complicated Object Dataset Caltech 101
    • Experiments on Face Dataset CMU PIE
  • Conclusion and future work

  3. Outline

  4. Introduction
  • Definition of image classification
    • A classical problem in multimedia content analysis: understand the semantic meaning of visual information and determine the category of an image according to predefined criteria
  • Related work on image classification
    • Parametric classifiers: require an intensive training phase for the classifier parameters
      • SVM [Kumar et al, ICCV, 2007], boosting [Opelt et al, ECCV, 2004], decision trees [Bosch et al, ICCV, 2007], web graphs [Mahajan et al, ACMMM, 2010]
    • Nonparametric classifiers: make classification decisions directly on the data and require no parameter training [Boiman et al, CVPR, 2008]

  5. Outline

  6. Research Progress on Deep Learning
  • Definition of deep learning
    • Models a learning task using deep architectures composed of multiple layers of nonlinear modules
  • Deep belief network (DBN)
    • A densely connected, directed belief net
    • Two stages: abstract the input information layer by layer, then fine-tune the whole deep network toward the ultimate learning target [Hinton et al, NC, 2006]
  • Research progress
    • Deep architectures are considered best exemplified by neural networks [Cottrell, Science, 2006]
    • DBN exhibits notable performance on different tasks, such as dimensionality reduction [Hinton et al, Science, 2006] and classification [Salakhutdinov et al, AISTATS, 2007]

  7. Architecture of Deep Belief Network
  1. The initial weighted connections are constructed randomly.
  2. The size of every layer is determined by intuition.
  3. The parameter space is refined by greedy layer-wise information reconstruction.
  4. Steps 1-3 are repeated until the parameter space of every layer is constructed.
  5. The whole model is fine-tuned to minimize the classification error by backpropagation.
  Fig. Structure of the deep belief network (DBN).
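The construction steps above can be sketched in code. A minimal numpy illustration of greedy layer-wise pretraining, assuming binary stochastic units and CD-1 reconstruction updates; the function names and hyperparameters are illustrative, not taken from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1, seed=0):
    """Refine one layer's weights by reconstruction (CD-1)."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
    b = np.zeros(data.shape[1])   # visible biases
    c = np.zeros(n_hidden)        # hidden biases
    for _ in range(epochs):
        p_h0 = sigmoid(data @ W + c)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ W.T + b)          # one-step reconstruction
        p_h1 = sigmoid(p_v1 @ W + c)
        W += lr * (data.T @ p_h0 - p_v1.T @ p_h1) / len(data)
        b += lr * (data - p_v1).mean(axis=0)
        c += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b, c

def pretrain_stack(data, layer_sizes):
    """Steps 1-4: build the network one layer at a time; each
    layer's hidden activations train the next layer."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x, n_hidden)
        layers.append((W, c))
        x = sigmoid(x @ W + c)   # propagate activations upward
    return layers
```

Step 5 (global fine-tuning by backpropagation) would then adjust all layers jointly against the classification error.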

  8. Outline

  9. Architecture of Bilinear Deep Belief Network
  1. The initial weighted connections between adjacent layers are constructed from the discriminant information.
  2. The size of the next layer is determined by the optimum dimension for retaining discriminant information.
  3. The parameter space is refined by greedy layer-wise information reconstruction.
  4. Steps 1-3 are repeated until the parameter space of every layer is constructed.
  5. The whole model is fine-tuned to minimize the classification error by backpropagation.

  10. Bilinear Discriminant Initialization
  • Latent representation with projection matrices U and V
  • Preserve discriminant information in the projected feature space by optimizing an objective that maximizes the ratio of between-class weights to within-class weights
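A bilinear projection maps an image matrix X through two sides at once, Y = UᵀXV, instead of vectorizing it first. The sketch below illustrates this and a Fisher-style discriminant ratio; the scalar Frobenius-norm form of the scatter terms and the function names are assumptions for illustration (the slides optimize U and V against a weighted between-class / within-class objective):

```python
import numpy as np

def bilinear_project(X, U, V):
    """Latent representation of an image matrix X under
    left/right projection matrices U and V: Y = U^T X V."""
    return U.T @ X @ V

def discriminant_ratio(Xs, labels, U, V):
    """Fisher-style objective in the projected space:
    between-class scatter divided by within-class scatter."""
    Ys = np.array([bilinear_project(X, U, V) for X in Xs])
    mean_all = Ys.mean(axis=0)
    between = within = 0.0
    for c in np.unique(labels):
        Yc = Ys[labels == c]
        mc = Yc.mean(axis=0)                     # class mean
        between += len(Yc) * np.sum((mc - mean_all) ** 2)
        within += np.sum((Yc - mc) ** 2)
    return between / within
```

A large ratio means classes are well separated after projection, which is what the initialization seeks before any reconstruction training.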

  11. Greedy Layer-Wise Reconstruction
  • Obtain the structure and initial weighted connections of the next layer from the bilinear discriminant information of the previous layer
  • A joint configuration (v, h) of the input layer and the first hidden layer has energy E(v, h) = -∑_i b_i v_i - ∑_j c_j h_j - ∑_{i,j} v_i w_ij h_j
  • Utilize the Contrastive Divergence algorithm to update the parameters
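The energy of a joint configuration is a direct computation. A minimal sketch, assuming the standard RBM energy function of Hinton et al. (2006) with visible biases b, hidden biases c, and weights W:

```python
import numpy as np

def rbm_energy(v, h, W, b, c):
    """Energy of a joint configuration (v, h):
    E(v, h) = -sum_i b_i v_i - sum_j c_j h_j - sum_ij v_i W_ij h_j
    Lower energy corresponds to higher probability under the model."""
    return -(b @ v) - (c @ h) - (v @ W @ h)
```

For example, with zero biases, a single active visible unit, and a single active hidden unit, the energy reduces to minus the connecting weight.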

  12. Greedy Layer-Wise Reconstruction
  • The joint and conditional distributions between v and h: p(h_j = 1 | v) = σ(c_j + ∑_i v_i w_ij), p(v_i = 1 | h) = σ(b_i + ∑_j w_ij h_j)
  • The derivative of the log probability the model assigns to a training vector v: ∂log p(v)/∂w_ij = ⟨v_i h_j⟩_data - ⟨v_i h_j⟩_model
  • Utilize the Contrastive Divergence algorithm to approximate this derivative and update the parameter space
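Contrastive Divergence sidesteps the intractable ⟨v_i h_j⟩_model term by replacing it with statistics from a one-step reconstruction. A sketch for a single training vector, assuming binary units (learning rate and function name are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 parameter update for a single training vector v0.
    Approximates dlogp(v)/dW_ij by <v_i h_j>_data - <v_i h_j>_recon."""
    rng = rng or np.random.default_rng(0)
    p_h0 = sigmoid(v0 @ W + c)                    # p(h = 1 | v0)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    p_v1 = sigmoid(h0 @ W.T + b)                  # p(v = 1 | h0)
    p_h1 = sigmoid(p_v1 @ W + c)
    W = W + lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b = b + lr * (v0 - p_v1)
    c = c + lr * (p_h0 - p_h1)
    return W, b, c
```

Iterating this update over the training set drives the reconstruction toward the data, refining the layer's parameter space.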

  13. Global Fine-Tuning by Backpropagation
  • Backward propagation of errors (backpropagation): a better fine-tuning algorithm than global search
  • Limitations
    • Convergence of backpropagation learning is very slow and is not guaranteed
    • The result may converge to an arbitrary local minimum on the error surface
    • Backpropagation learning scales poorly to large networks
  • In the proposed model
    • The bilinear discriminant initialization already places the parameters in a sensible, good region of the whole parameter space
    • Backpropagation then adjusts the entire deep network to find good local-optimum parameters
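The fine-tuning stage can be sketched as ordinary backpropagation over the pretrained stack. The softmax output layer, cross-entropy loss, and function names below are assumptions for illustration; the pretrained layers are (W, c) pairs as produced by layer-wise training:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def finetune_step(x, y_onehot, layers, W_out, lr=0.1):
    """One backpropagation step over a pretrained stack.
    layers: list of (W, c) pairs from pretraining.
    W_out: weights mapping top-layer features to class scores."""
    # Forward pass through the stack.
    acts = [x]
    for W, c in layers:
        acts.append(sigmoid(acts[-1] @ W + c))
    scores = acts[-1] @ W_out
    p = np.exp(scores - scores.max())
    p /= p.sum()                          # softmax class probabilities
    # Backward pass: cross-entropy gradient at the output.
    delta = p - y_onehot
    new_W_out = W_out - lr * np.outer(acts[-1], delta)
    delta = (W_out @ delta) * acts[-1] * (1 - acts[-1])
    new_layers = []
    for (W, c), a in zip(reversed(layers), reversed(acts[:-1])):
        grad_W = np.outer(a, delta)
        new_layers.append((W - lr * grad_W, c - lr * delta))
        delta = (W @ delta) * a * (1 - a)   # propagate error downward
    return list(reversed(new_layers)), new_W_out
```

Because the initialization has already placed the parameters in a good region, these gradient steps only need to reach a nearby local optimum rather than search the whole space.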

  14. Algorithm

  15. Outline

  16. Experiment Setting
  • Datasets
    • Standard handwritten digits dataset MNIST: 60,000 training images and 10,000 test images with a resolution of 28×28
    • Subset of Caltech 101: 2,935 images from the first 5 categories
    • CMU PIE dataset: 11,560 face images of 68 subjects, varying in pose, illumination, and expression
  • Compared algorithms
    • K-nearest neighbor (KNN), support vector machine (SVM), transductive SVM (TSVM) [Collobert et al, JMLR, 2006], neural network (NN)
    • EmbedNN [Weston et al, ICML, 2008], Semi-DBN [Bengio et al, NIPS, 2006], DBN-rNCA [Salakhutdinov et al, AISTATS, 2007], DDBN [Liu et al, PR, 2009], DCNN [Jarrett et al, ICCV, 2009]

  17. Experiment on MNIST
  • Sample images
  • Classification experiment
  Table. Classification accuracy rate (%) on the test data with different numbers of labeled data on MNIST

  18. Simulate Primary Visual Cortex
  • Responses of V1 neurons
    • Selective spatial information filters, similar to spatially local, complex Fourier transforms and Gabor transforms
  • Weights of the proposed BDBN
    • Roughly represent different "strokes" of digits
    • Oriented, Gabor-like, and resemble the receptive fields of V1 simple cells
  • Samples of first-layer weights: examples represent "strokes" of digits

  19. Experiment on Caltech 101
  • Sample images
  • Classification experiment
  Table. Classification accuracy rate (%) on the test data with different numbers of labeled data on Caltech 101

  20. Efficiency Comparison
  • Convergence of the proposed BDBN compared with two other deep learning models that also have a fine-tuning stage
  Fig. Convergence curves of Semi-DBN, DDBN, and the proposed BDBN on Caltech 101

  21. Experiments on CMU PIE
  • Sample images
  • Classification experiment
  Fig. Classification accuracy rate (%) with different numbers of labeled data and different extents of noise.

  22. Layer-wise Reconstruction of BDBN
  Fig. The reconstruction in every layer. The first row shows the noisy images; the reconstruction results of every layer are shown in the second to fourth rows; the original images are shown in the fifth row.

  23. Automatically Reinforce Important Features
  • (a) Facial feature points
  • (b) Reinforced regions coincide with facial feature regions
  Fig. Samples of first-layer weights learned by BDBN, and the consistency of these weights with facial feature points.

  24. Outline

  25. Conclusion and Future Work
  • Conclusion
    • Propose a novel deep learning model, BDBN, for the classical multimedia task of image classification
    • The bilinear discriminant initialization of BDBN not only prevents the propagated information from falling into a bad local optimum but also provides a more meaningful setting for the deep architecture
    • The semi-supervised learning ability of BDBN allows the proposed deep techniques to work well with an insufficient number of labeled data
  • Future work
    • Utilize deep learning for multimedia content analysis on large-scale datasets with noisy tags

  26. References
  [1] A. Opelt, M. Fussenegger, A. Pinz, and P. Auer, "Weak hypotheses and boosting for generic object detection and recognition," In ECCV, 2004.
  [2] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, pp. 1527-1554, 2006.
  [3] G. W. Cottrell, "New life for neural networks," Science, vol. 313, pp. 454-455, July 2006.
  [4] A. Kumar and C. Sminchisescu, "Support kernel machines for object recognition," In ICCV, 2007.
  [5] A. Bosch, A. Zisserman, and X. Munoz, "Image classification using random forests and ferns," In ICCV, 2007.
  [6] R. R. Salakhutdinov and G. E. Hinton, "Learning a nonlinear embedding by preserving class neighbourhood structure," In AISTATS, 2007.
  [7] Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," Large-Scale Kernel Machines, 2007.
  [8] E. K. Chen, X. K. Yang, H. Y. Zha, R. Zhang, and W. J. Zhang, "Learning object classes from image thumbnails through deep neural networks," In ICASSP, 2008.
  [9] S. Varadarajan and L. J. Karam, "An improved perception-based no-reference objective image sharpness metric using iterative edge refinement," In ICIP, pp. 401-404, Oct. 2008.
  [10] L. Ballan, A. Bazzica, M. Bertini, A. D. Bimbo, and G. Serra, "Deep networks for audio event classification in soccer videos," In ICME, 2009.
  [11] J. Weston, F. Ratle, and R. Collobert, "Deep learning via semi-supervised embedding," In ICML, 2008.
  [12] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," In NIPS, 2006.
  [13] R. R. Salakhutdinov and G. E. Hinton, "Learning a nonlinear embedding by preserving class neighbourhood structure," In AISTATS, 2007.
  [14] Y. Liu, S. Zhou, and Q. Cheng, "Discriminative deep belief networks for classification with few labeled data," In PR, 2010.
  [15] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. L. Cun, "What is the best multi-stage architecture for object recognition?" In ICCV, 2009.
  [16] D. Mahajan and M. Slaney, "Image classification using the web graph," In ACMMM, 2010.

  27. Q & A Thank You!
