
Bilinear Deep Learning for Image Classification




  1. Bilinear Deep Learning for Image Classification Sheng-hua Zhong, Yan Liu, Yang Liu Department of Computing The Hong Kong Polytechnic University

  2. Outline
  • Introduction
  • Research progress on deep learning
  • Proposed algorithm
    • Architecture of BDBN
    • Learning stages of BDBN: bilinear discriminant initialization, greedy layer-wise reconstruction, global fine-tuning
  • Experiments and results
    • Experiment on Handwriting Dataset MNIST
    • Experiment on Complicated Object Dataset Caltech 101
    • Experiments on Face Dataset CMU PIE
  • Conclusion and future work

  3. Outline

  4. Introduction
  • Definition of image classification
    • A classical problem in multimedia content analysis: understand the semantic meaning of visual information and determine the category of an image according to predefined criteria
  • Related work on image classification
    • Parametric classifiers: require an intensive training phase for the classifier parameters
      • SVM [Kumar et al, ICCV, 2007], boosting [Opelt et al, ECCV, 2004], decision trees [Bosch et al, ICCV, 2007], web graphs [Mahajan et al, ACMMM, 2010]
    • Nonparametric classifiers: make classification decisions directly on the data and require no parameter training [Boiman et al, CVPR, 2008]

  5. Outline

  6. Research Progress on Deep Learning
  • Definition of deep learning
    • Models a learning task using deep architectures composed of multiple layers of nonlinear modules
  • Deep belief network (DBN)
    • A densely connected, directed belief net
    • Two stages: abstract the input information layer by layer, then fine-tune the whole deep network toward the ultimate learning target [Hinton et al, NC, 2006]
  • Research progress
    • Deep architectures are considered best exemplified by neural networks [Cottrell, Science, 2006]
    • DBN exhibits notable performance on different tasks, such as dimensionality reduction [Hinton et al, Science, 2006] and classification [Salakhutdinov et al, AISTATS, 2007]

  7. Architecture of Deep Belief Network
  1. The initial weighted connections are constructed randomly.
  2. The size of every layer is determined by intuition.
  3. The parameter space is refined by greedy layer-wise information reconstruction.
  4. Steps 1-3 are repeated until the parameter space of every layer is constructed.
  5. The whole model is fine-tuned to minimize the classification error by backpropagation.
  Fig. Structure of the deep belief network (DBN).
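The construction steps above can be sketched in code. A minimal numpy illustration of greedy layer-wise pretraining, assuming binary stochastic units and CD-1 reconstruction updates; the function names and hyperparameters are illustrative, not taken from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1, seed=0):
    """Refine one layer's weights by reconstruction (CD-1)."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
    b = np.zeros(data.shape[1])   # visible biases
    c = np.zeros(n_hidden)        # hidden biases
    for _ in range(epochs):
        p_h0 = sigmoid(data @ W + c)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ W.T + b)          # one-step reconstruction
        p_h1 = sigmoid(p_v1 @ W + c)
        W += lr * (data.T @ p_h0 - p_v1.T @ p_h1) / len(data)
        b += lr * (data - p_v1).mean(axis=0)
        c += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b, c

def pretrain_stack(data, layer_sizes):
    """Steps 1-4: build the network one layer at a time; each
    layer's hidden activations train the next layer."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x, n_hidden)
        layers.append((W, c))
        x = sigmoid(x @ W + c)   # propagate activations upward
    return layers
```

Step 5 (global fine-tuning by backpropagation) would then adjust all layers jointly against the classification error.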

  8. Outline

  9. Architecture of Bilinear Deep Belief Network
  1. The initial weighted connections between adjacent layers are constructed from the discriminant information.
  2. The size of the next layer is determined by the optimum dimension for retaining discriminant information.
  3. The parameter space is refined by greedy layer-wise information reconstruction.
  4. Steps 1-3 are repeated until the parameter space of every layer is constructed.
  5. The whole model is fine-tuned to minimize the classification error by backpropagation.

  10. Bilinear Discriminant Initialization
  • Latent representation with projection matrices U and V
  • Preserve discriminant information in the projected feature space by optimizing an objective that maximizes the ratio of between-class weights to within-class weights
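A bilinear projection maps an image matrix X through two sides at once, Y = UᵀXV, instead of vectorizing it first. The sketch below illustrates this and a Fisher-style discriminant ratio; the scalar Frobenius-norm form of the scatter terms and the function names are assumptions for illustration (the slides optimize U and V against a weighted between-class / within-class objective):

```python
import numpy as np

def bilinear_project(X, U, V):
    """Latent representation of an image matrix X under
    left/right projection matrices U and V: Y = U^T X V."""
    return U.T @ X @ V

def discriminant_ratio(Xs, labels, U, V):
    """Fisher-style objective in the projected space:
    between-class scatter divided by within-class scatter."""
    Ys = np.array([bilinear_project(X, U, V) for X in Xs])
    mean_all = Ys.mean(axis=0)
    between = within = 0.0
    for c in np.unique(labels):
        Yc = Ys[labels == c]
        mc = Yc.mean(axis=0)                     # class mean
        between += len(Yc) * np.sum((mc - mean_all) ** 2)
        within += np.sum((Yc - mc) ** 2)
    return between / within
```

A large ratio means classes are well separated after projection, which is what the initialization seeks before any reconstruction training.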

  11. Greedy Layer-Wise Reconstruction
  • Obtain the structure and initial weighted connections of the next layer from the bilinear discriminant information of the previous layer
  • A joint configuration (v, h) of the input layer and the first hidden layer has energy E(v, h) = -∑_i b_i v_i - ∑_j c_j h_j - ∑_{i,j} v_i w_ij h_j
  • Utilize the Contrastive Divergence algorithm to update the parameters
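The energy of a joint configuration is a direct computation. A minimal sketch, assuming the standard RBM energy function of Hinton et al. (2006) with visible biases b, hidden biases c, and weights W:

```python
import numpy as np

def rbm_energy(v, h, W, b, c):
    """Energy of a joint configuration (v, h):
    E(v, h) = -sum_i b_i v_i - sum_j c_j h_j - sum_ij v_i W_ij h_j
    Lower energy corresponds to higher probability under the model."""
    return -(b @ v) - (c @ h) - (v @ W @ h)
```

For example, with zero biases, a single active visible unit, and a single active hidden unit, the energy reduces to minus the connecting weight.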

  12. Greedy Layer-Wise Reconstruction
  • The joint and conditional distributions between v and h: p(h_j = 1 | v) = σ(c_j + ∑_i v_i w_ij), p(v_i = 1 | h) = σ(b_i + ∑_j w_ij h_j)
  • The derivative of the log probability the model assigns to a training vector v: ∂log p(v)/∂w_ij = ⟨v_i h_j⟩_data - ⟨v_i h_j⟩_model
  • Utilize the Contrastive Divergence algorithm to approximate this derivative and update the parameter space
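Contrastive Divergence sidesteps the intractable ⟨v_i h_j⟩_model term by replacing it with statistics from a one-step reconstruction. A sketch for a single training vector, assuming binary units (learning rate and function name are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 parameter update for a single training vector v0.
    Approximates dlogp(v)/dW_ij by <v_i h_j>_data - <v_i h_j>_recon."""
    rng = rng or np.random.default_rng(0)
    p_h0 = sigmoid(v0 @ W + c)                    # p(h = 1 | v0)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    p_v1 = sigmoid(h0 @ W.T + b)                  # p(v = 1 | h0)
    p_h1 = sigmoid(p_v1 @ W + c)
    W = W + lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b = b + lr * (v0 - p_v1)
    c = c + lr * (p_h0 - p_h1)
    return W, b, c
```

Iterating this update over the training set drives the reconstruction toward the data, refining the layer's parameter space.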

  13. Global Fine-Tuning by Backpropagation
  • Backward propagation of errors (backpropagation): a better fine-tuning algorithm than global search
  • Limitations
    • Convergence of backpropagation learning is very slow and is not guaranteed
    • The result may converge to an arbitrary local minimum on the error surface
    • Backpropagation learning scales poorly to large networks
  • In the proposed model
    • The bilinear discriminant initialization already places the parameters in a sensible, good region of the whole parameter space
    • Backpropagation then adjusts the entire deep network to find good local-optimum parameters
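The fine-tuning stage can be sketched as ordinary backpropagation over the pretrained stack. The softmax output layer, cross-entropy loss, and function names below are assumptions for illustration; the pretrained layers are (W, c) pairs as produced by layer-wise training:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def finetune_step(x, y_onehot, layers, W_out, lr=0.1):
    """One backpropagation step over a pretrained stack.
    layers: list of (W, c) pairs from pretraining.
    W_out: weights mapping top-layer features to class scores."""
    # Forward pass through the stack.
    acts = [x]
    for W, c in layers:
        acts.append(sigmoid(acts[-1] @ W + c))
    scores = acts[-1] @ W_out
    p = np.exp(scores - scores.max())
    p /= p.sum()                          # softmax class probabilities
    # Backward pass: cross-entropy gradient at the output.
    delta = p - y_onehot
    new_W_out = W_out - lr * np.outer(acts[-1], delta)
    delta = (W_out @ delta) * acts[-1] * (1 - acts[-1])
    new_layers = []
    for (W, c), a in zip(reversed(layers), reversed(acts[:-1])):
        grad_W = np.outer(a, delta)
        new_layers.append((W - lr * grad_W, c - lr * delta))
        delta = (W @ delta) * a * (1 - a)   # propagate error downward
    return list(reversed(new_layers)), new_W_out
```

Because the initialization has already placed the parameters in a good region, these gradient steps only need to reach a nearby local optimum rather than search the whole space.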

  14. Algorithm

  15. Outline

  16. Experiment Setting
  • Datasets
    • Standard handwritten digits dataset MNIST: 60,000 training images and 10,000 test images with a resolution of 28×28
    • Subset of Caltech 101: 2,935 images from the first 5 categories
    • CMU PIE dataset: 11,560 face images of 68 subjects, varying in pose, illumination, and expression
  • Compared algorithms
    • K-nearest neighbor (KNN), support vector machine (SVM), transductive SVM (TSVM) [Collobert et al, JMLR, 2006], neural network (NN)
    • EmbedNN [Weston et al, ICML, 2008], Semi-DBN [Bengio et al, NIPS, 2006], DBN-rNCA [Salakhutdinov et al, AISTATS, 2007], DDBN [Liu et al, PR, 2009], DCNN [Jarrett et al, ICCV, 2009]

  17. Experiment on MNIST
  • Sample images
  • Classification experiment
  Table. Classification accuracy rate (%) on the test data with different numbers of labeled data on MNIST

  18. Simulate Primary Visual Cortex
  • Responses of V1 neurons
    • Selective spatial information filters, similar to spatially local, complex Fourier transforms and Gabor transforms
  • Weights of the proposed BDBN
    • Roughly represent different "strokes" of digits
    • Oriented, Gabor-like, and resemble the receptive fields of V1 simple cells
  • Samples of first-layer weights: examples represent "strokes" of digits

  19. Experiment on Caltech 101
  • Sample images
  • Classification experiment
  Table. Classification accuracy rate (%) on the test data with different numbers of labeled data on Caltech 101

  20. Efficiency Comparison
  • Convergence of the proposed BDBN compared with two other deep learning models that also have a fine-tuning stage
  Fig. Convergence curves of Semi-DBN, DDBN, and the proposed BDBN on Caltech 101

  21. Experiments on CMU PIE
  • Sample images
  • Classification experiment
  Fig. Classification accuracy rate (%) with different numbers of labeled data and different extents of noise.

  22. Layer-wise Reconstruction of BDBN
  Fig. The reconstruction in every layer. The first row shows the noisy images; the reconstruction results of every layer are shown in the second to fourth rows; the original images are shown in the fifth row.

  23. Automatically Reinforce Important Features
  • (a) Facial feature points
  • (b) Reinforced regions coincide with facial feature regions
  Fig. Samples of first-layer weights learned by BDBN, and the consistency of these weights with facial feature points.

  24. Outline

  25. Conclusion and Future Work
  • Conclusion
    • Propose a novel deep learning model, BDBN, for the classical multimedia task of image classification
    • The bilinear discriminant initialization of BDBN not only prevents the propagated information from falling into a bad local optimum but also provides a more meaningful setting for the deep architecture
    • The semi-supervised learning ability of BDBN allows the proposed deep techniques to work well with an insufficient number of labeled data
  • Future work
    • Utilize deep learning for multimedia content analysis on large-scale datasets with noisy tags

  26. References
  [1] A. Opelt, M. Fussenegger, A. Pinz, and P. Auer, "Weak hypotheses and boosting for generic object detection and recognition," In ECCV, 2004.
  [2] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, pp. 1527-1554, 2006.
  [3] G. W. Cottrell, "New life for neural networks," Science, vol. 313, pp. 454-455, July 2006.
  [4] A. Kumar and C. Sminchisescu, "Support kernel machines for object recognition," In ICCV, 2007.
  [5] A. Bosch, A. Zisserman, and X. Munoz, "Image classification using random forests and ferns," In ICCV, 2007.
  [6] R. R. Salakhutdinov and G. E. Hinton, "Learning a nonlinear embedding by preserving class neighbourhood structure," In AISTATS, 2007.
  [7] Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," Large-Scale Kernel Machines, 2007.
  [8] E. K. Chen, X. K. Yang, H. Y. Zha, R. Zhang, and W. J. Zhang, "Learning object classes from image thumbnails through deep neural networks," In ICASSP, 2008.
  [9] S. Varadarajan and L. J. Karam, "An improved perception-based no-reference objective image sharpness metric using iterative edge refinement," In ICIP, pp. 401-404, Oct. 2008.
  [10] L. Ballan, A. Bazzica, M. Bertini, A. D. Bimbo, and G. Serra, "Deep networks for audio event classification in soccer videos," In ICME, 2009.
  [11] J. Weston, F. Ratle, and R. Collobert, "Deep learning via semi-supervised embedding," In ICML, 2008.
  [12] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," In NIPS, 2006.
  [13] R. R. Salakhutdinov and G. E. Hinton, "Learning a nonlinear embedding by preserving class neighbourhood structure," In AISTATS, 2007.
  [14] Y. Liu, S. Zhou, and Q. Cheng, "Discriminative deep belief networks for classification with few labeled data," In PR, 2010.
  [15] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. L. Cun, "What is the best multi-stage architecture for object recognition?" In ICCV, 2009.
  [16] D. Mahajan and M. Slaney, "Image classification using the web graph," In ACMMM, 2010.

  27. Q & A Thank You!
