
Deep Learning



Presentation Transcript


  1. Deep Learning

  2. Why?

  3. Source: Huang et al., Communications ACM 01/2014

  4. • the 2013 International Conference on Learning Representations
     • the 2013 ICASSP special session on New Types of Deep Neural Network Learning for Speech Recognition and Related Applications
     • the 2013 ICML Workshop for Audio, Speech, and Language Processing
     • the 2012, 2011, and 2010 NIPS Workshops on Deep Learning and Unsupervised Feature Learning
     • the 2013 ICML Workshop on Representation Learning Challenges
     • the 2012 ICML Workshop on Representation Learning
     • the 2011 ICML Workshop on Learning Architectures, Representations, and Optimization for Speech and Visual Information Processing
     • the 2009 ICML Workshop on Learning Feature Hierarchies
     • the 2009 NIPS Workshop on Deep Learning for Speech Recognition and Related Applications
     • the 2012 ICASSP deep learning tutorial
     • the special section on Deep Learning for Speech and Language Processing in IEEE Trans. Audio, Speech, and Language Processing (January 2012)
     • the special issue on Learning Deep Architectures in IEEE Trans. Pattern Analysis and Machine Intelligence (2013)

  5. Geoffrey Hinton, University of Toronto: "A fast learning algorithm for deep belief nets" -- Hinton et al., 2006; "Reducing the dimensionality of data with neural networks" -- Hinton & Salakhutdinov, 2006

  6. How?

  7. Shallow learning • SVM • Linear & Kernel Regression • Hidden Markov Models (HMM) • Gaussian Mixture Models (GMM) • Single hidden layer MLP • ... • Limited modeling capability of concepts • Cannot make use of unlabeled data

  8. Neural Networks • Machine Learning • Knowledge from high-dimensional data • Classification • Input: features of data • Supervised vs. unsupervised • Labeled data • Neurons

  9. Multi-Layer Perceptron • Multiple Layers • Feed Forward • Connected Weights • 1-of-N Output • (Figure: input layer [X1, X2, X3], weights v_ij to the hidden layer, weights w_jk to the output layer [Y1, Y2])
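To make the feed-forward structure concrete, here is a minimal sketch of one forward pass through the pictured 3-input, 2-hidden, 2-output network. Python/NumPy, the sigmoid activation, the random weight values, and the omission of bias units are illustrative assumptions, not part of the presentation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
v = rng.normal(scale=0.1, size=(3, 2))   # input -> hidden weights v_ij
w = rng.normal(scale=0.1, size=(2, 2))   # hidden -> output weights w_jk

x = np.array([0.2, 0.7, 0.1])            # input vector [X1, X2, X3]
h = sigmoid(x @ v)                       # hidden activations (feed forward)
y = sigmoid(h @ w)                       # 1-of-N output [Y1, Y2]
```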

  10. Backpropagation • Minimize error of calculated output • Adjust weights • Gradient Descent • Procedure: Forward Phase, then Backpropagation of errors • For each sample, multiple epochs
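As a rough illustration of the procedure above, the sketch below runs one forward phase and one backpropagation-of-errors update with gradient descent for the same tiny network shape. Squared error, the sigmoid activation, and the learning rate are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, v, w, lr=0.5):
    """One sample: forward phase, backpropagation of errors, weight adjustment."""
    h = sigmoid(x @ v)                        # forward phase
    y = sigmoid(h @ w)
    delta_out = (y - t) * y * (1 - y)         # output error signal (squared error)
    delta_hid = (delta_out @ w.T) * h * (1 - h)
    w -= lr * np.outer(h, delta_out)          # gradient-descent weight updates
    v -= lr * np.outer(x, delta_hid)
    return v, w

# For each sample, repeated over multiple epochs:
rng = np.random.default_rng(0)
v = rng.normal(scale=0.1, size=(3, 2))
w = rng.normal(scale=0.1, size=(2, 2))
x, t = np.array([0.2, 0.7, 0.1]), np.array([1.0, 0.0])
for epoch in range(100):
    v, w = backprop_step(x, t, v, w)
```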

  11. Best Practice • Normalization: prevent very high weights, oscillation • Overfitting/Generalisation: validation set, early stopping • Mini-Batch Learning: update weights with multiple input vectors combined
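The sketch below combines two of the practices named above, mini-batch learning (one weight update computed from several input vectors) and early stopping on a validation set, using a deliberately simple toy linear model; the synthetic data, sizes, and thresholds are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=1000)   # synthetic data
X_tr, y_tr, X_val, y_val = X[:800], y[:800], X[800:], y[800:]

w = np.zeros(10)
lr, batch_size, patience = 0.01, 32, 5
best_val, bad_epochs = np.inf, 0

for epoch in range(200):
    for s in range(0, len(X_tr), batch_size):
        xb, yb = X_tr[s:s + batch_size], y_tr[s:s + batch_size]
        grad = xb.T @ (xb @ w - yb) / len(xb)     # gradient from many inputs combined
        w -= lr * grad                            # one mini-batch update
    val_err = np.mean((X_val @ w - y_val) ** 2)   # monitor the validation set
    if val_err < best_val - 1e-6:
        best_val, bad_epochs = val_err, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                # early stopping
            break
```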

  12. Problems with Backpropagation • Multiple hidden layers • Gets stuck in local optima: start weights from random positions • Slow convergence to optimum: large training set needed • Only uses labeled data: most data is unlabeled → Generative Approach

  13. Restricted Boltzmann Machines • Unsupervised • Find complex regularities in training data • Bipartite graph: visible and hidden layer • Binary stochastic units: on/off with probability • 1 iteration: update hidden units, reconstruct visible units • Maximum likelihood of training data • (Figure: visible units i connected by weights w_ij to hidden units j)

  14. Restricted Boltzmann Machines • Training goal: best probable reproduction • Unsupervised data • Find latent factors of the data set • Adjust weights to get maximum probability of input data

  15. Training: Contrastive Divergence • Start with a training vector on the visible units. • Update all the hidden units in parallel. • Update all the visible units in parallel to get a "reconstruction". • Update the hidden units again. • (Figure: data at t = 0, reconstruction at t = 1)
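A minimal NumPy sketch of one CD-1 step following the recipe above: sample the hidden units from the data, reconstruct the visible units, update the hidden units again, then move the weights toward the data statistics and away from the reconstruction statistics. The unit counts mirror the handwritten-2s example on the following slides (256 visible pixels, 50 feature units); the learning rate and the omission of bias terms are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden, lr = 256, 50, 0.1
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))

def cd1_step(v0, W):
    # t = 0: update all hidden units in parallel from the data
    p_h0 = sigmoid(v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # binary stochastic units
    # update all visible units in parallel to get a "reconstruction"
    p_v1 = sigmoid(h0 @ W.T)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    # t = 1: update the hidden units again
    p_h1 = sigmoid(v1 @ W)
    # increment <data> statistics, decrement <reconstruction> statistics
    return W + lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))

v0 = (rng.random(n_visible) < 0.5).astype(float)          # one training vector
W = cd1_step(v0, W)
```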

  16. Example: Handwritten 2s • (Figure: a 16 x 16 pixel image connected to 50 binary neurons that learn features; for the data (reality), increment weights between an active pixel and an active feature; for the reconstruction, decrement weights between an active pixel and an active feature)

  17. The final 50 x 256 weights: Each unit grabs a different feature

  18. Example: Reconstruction • (Figure: data and the reconstruction from activated binary features, for a new test image from the digit class that the model was trained on and for an image from an unfamiliar digit class; the network tries to see every image as a 2.)

  19. Deep Architecture • Backpropagation, RBM as building blocks • Multiple hidden layers • Motivation (why go deep?) • Approximate complex decision boundary • Fewer computational units for the same functional mapping • Hierarchical Learning • Increasingly complex features • Works well in different domains: Vision, Audio, …

  20. Hierarchical Learning • Natural progression from low level to high level structure as seen in natural complexity • Easier to monitor what is being learnt and to guide the machine to better subspaces

  21. Stacked RBMs • First learn one layer at a time by stacking RBMs. • Treat this as "pre-training" that finds a good initial set of weights, which can then be fine-tuned by a local search procedure. • Backpropagation can be used to fine-tune the model to be better at discrimination. • (Figure: train the first RBM, copy the binary state for each v, train the next RBM on it, then compose the two RBM models to make a single DBN model)
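The sketch below illustrates this greedy, layer-wise scheme: a tiny CD-1 trainer (biases and many practical refinements omitted) is applied first to the data and then to the hidden activities of the previous layer, producing a stack of weight matrices that backpropagation could fine-tune. All sizes and the toy data are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, epochs=5, lr=0.1):
    """Tiny CD-1 trainer (no biases); returns the learned weight matrix."""
    W = rng.normal(scale=0.01, size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        p_h0 = sigmoid(data @ W)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ W.T)                  # reconstruction
        p_h1 = sigmoid(p_v1 @ W)
        W += lr * (data.T @ p_h0 - p_v1.T @ p_h1) / len(data)
    return W

# Greedy layer-wise pre-training: train this RBM first, copy the hidden
# states (probabilities here), then train the next RBM on them, and so on.
data = (rng.random((500, 256)) < 0.3).astype(float)      # toy binary data
layer_sizes, weights, layer_input = [128, 64, 30], [], data
for n_hidden in layer_sizes:
    W = train_rbm(layer_input, n_hidden)
    weights.append(W)                                     # pre-trained weights
    layer_input = sigmoid(layer_input @ W)                # input for the next RBM
# `weights` now provides initial weights that backpropagation can fine-tune.
```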

  22. Uses: Dimensionality reduction

  23. Dimensionality reduction • Use a stacked RBM as a deep auto-encoder • Train the RBM with images as input & output • Limit one layer to few dimensions → information has to pass through the middle layer
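As a rough sketch of the auto-encoder idea, the code below pushes a flattened 25x25 image through a narrow 30-dimensional middle layer and decodes it with the mirrored (transposed) weights, matching the 625 → 30 reduction on the next slide. The random placeholder weights (which would normally come from stacked-RBM pre-training), the intermediate layer size, and the missing backpropagation fine-tuning are all assumptions.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(0)

# Placeholder weights standing in for pre-trained stacked-RBM weights.
W1 = rng.normal(scale=0.01, size=(625, 200))
W2 = rng.normal(scale=0.01, size=(200, 30))      # 30-dimensional bottleneck

def encode(x):
    return sigmoid(sigmoid(x @ W1) @ W2)          # 625 -> 200 -> 30

def decode(code):
    return sigmoid(sigmoid(code @ W2.T) @ W1.T)   # 30 -> 200 -> 625 (mirrored)

x = rng.random((1, 625))                          # one 25x25 image, flattened
code = encode(x)                                  # low-dimensional representation
x_rec = decode(code)                              # reconstruction from 30 numbers
```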

  24. Dimensionality reduction • Olivetti face data: 25x25 pixel images reconstructed from 30 dimensions (625 → 30)

  25. Dimensionality reduction • 804'414 Reuters news stories, reduced to 2 dimensions • (Figure: PCA vs. deep RBM)

  26. Uses: Classification

  27. Unlabeled data • Unlabeled data is readily available. Example: images from the web • Download 10'000'000 images • Train a 9-layer DNN • Concepts are formed by the DNN → 70% better than previous state of the art • "Building High-level Features Using Large Scale Unsupervised Learning" -- Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng

  28. Uses: AI

  29. Artificial intelligence • Enduro, Atari 2600 • Expert player: 368 points • Deep Learning: 661 points • "Playing Atari with Deep Reinforcement Learning" -- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller

  30. Uses: Generative (Demo)

  31. How to use it

  32. How to use it • Home page of Geoffrey Hinton: https://www.cs.toronto.edu/~hinton/ • Portal: http://deeplearning.net/ • Accord.NET: http://accord-framework.net/
