
Introduction to Deep Learning - From Perceptron to Neural Networks

Explore the basics of deep learning, from the linear and binary perceptron to the multi-layer perceptron, and image classification using artificial neural networks.


Presentation Transcript


  1. CV192: Introduction to Deep Learning. Oren Freifeld, Ron Shapira Weber. Computer Science, Ben-Gurion University

  2. Contents • Introduction – What is Deep Learning? • Linear / Binary Perceptron • Multi-Layer Perceptron [Figure from previous slide taken from https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html]

  3. What is Deep Learning? From perceptron to deep neural networks

  4. Example – Object recognition and localization [Andrej Karpathy, Li Fei-Fei (2015): Deep Visual-Semantic Alignments for Generating Image Descriptions]

  5. Some history – ImageNet challenge • 1.2 million images in the training set, each labeled with one of 1000 categories • Image classification problem https://cs.stanford.edu/people/karpathy/cnnembed/

  6. Some history – ImageNet challenge • One of the Top-5 guesses needs to be the correct one. https://blog.acolyer.org/2016/04/20/imagenet-classification-with-deep-convolutional-neural-networks/

  7. Increasing Depth on ImageNet challenge Trend of increasing depth (Img Credit: Kaiming He)

  8. ImageNet architecture comparison • Amount of operations for a single forward pass vs. top-1 accuracy [Canziani et al., (2016). An analysis of deep neural network models for practical applications.]

  9. Supervised Learning • Data: • X – dataset: images, videos, text, etc. • y – labels (cat, dog, platypus) • Image classification example: a classifier (SVM, LDA, deep neural network, etc.) maps an image to a probability distribution over classes. *We'll also see variants of deep learning algorithms where the data isn't labeled.

  10. Supervised Learning • An example of a supervised learning algorithm we saw in this course? • Least-Squares Estimation in a Linear Model: • A known function, $h(\cdot)$ • Data: pairs of $(x_i, y_i)$, $i = 1, \dots, N$, where $y_i \in \mathbb{R}$ • Define $H = [h(x_1), \dots, h(x_N)]^T$ ($H$ is an $N \times k$ matrix). • Goal: find the optimal (in the least-squares sense) parameter $\theta$ assuming the model $y \approx H\theta$. In other words: $\hat{\theta} = \arg\min_\theta \|y - H\theta\|_2^2$ • Note that in this framework we try to predict the label $y$ of the input $x$.
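For concreteness, here is a minimal NumPy sketch of this closed-form solution; the synthetic data and shapes are illustrative, not from the course:

```python
# A minimal sketch of least-squares estimation in a linear model.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = H @ theta_true + noise
N, k = 100, 3
H = rng.normal(size=(N, k))            # design matrix, one row per sample
theta_true = np.array([2.0, -1.0, 0.5])
y = H @ theta_true + 0.1 * rng.normal(size=N)

# Closed-form solution of the normal equations: theta = (H^T H)^{-1} H^T y
theta_hat = np.linalg.solve(H.T @ H, H.T @ y)

# Equivalent, more numerically stable route
theta_lstsq, *_ = np.linalg.lstsq(H, y, rcond=None)

print(theta_hat, theta_lstsq)
```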

  11. Unsupervised Learning • Solve some task given "unlabeled" data. • An example of an unsupervised learning algorithm we saw in this course?

  12. Supervised Learning Framework: • Provide data, labels – $(X, y)$ • Split data into: • Training data: majority of the data (for instance, 60%), used to train the model. • Validation set: a partition of the data (20%) used for tuning the hyper-parameters. • Test data: a partition of the data (20%) used to test the accuracy of the model. • Define an algorithm $f(x; \theta)$ • Define a loss function; in the case of linear regression, the squared L2 norm: $L(\hat{y}, y) = \|\hat{y} - y\|_2^2$ • Define an optimization method to find $\theta^*$ such that: $\theta^* = \arg\min_\theta L(f(X; \theta), y)$
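A minimal sketch of such a 60/20/20 split, assuming X (samples) and y (labels) are NumPy arrays of the same length; the function name and seed are illustrative:

```python
# Shuffle the indices once, then carve out train / validation / test partitions.
import numpy as np

def split_dataset(X, y, train=0.6, val=0.2, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train * len(X))
    n_val = int(val * len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])
```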

  13. Example: Deep Learning for Image Label Classification • Provide data, labels – $(X, y)$ • Split data into: • Training data • Validation set • Test data • Define an algorithm: Artificial Neural Network, Convolutional NN, etc. • Define a loss function: • L2 norm • Cross-Entropy • Define an optimization method to find $\theta^*$ such that: $\theta^* = \arg\min_\theta L(f(X; \theta), y)$ • Usually there is no closed-form solution, so we can use iterative gradient-based methods.
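As a sketch of what "iterative gradient-based methods" means in the simplest case, here is plain gradient descent on a squared-error loss; gradient_descent and grad_loss are illustrative names, not course code:

```python
# Repeatedly step against the gradient of the loss until (approximate) convergence.
import numpy as np

def gradient_descent(grad_loss, theta0, lr=0.1, n_steps=100):
    theta = theta0.copy()
    for _ in range(n_steps):
        theta -= lr * grad_loss(theta)   # move opposite to the gradient
    return theta

# Example: minimize ||H @ theta - y||^2, whose gradient is 2 H^T (H theta - y)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
theta = gradient_descent(lambda t: 2 * H.T @ (H @ t - y), np.zeros(2), lr=0.1)
```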

  14. When working with images • Represent images as vectors: an image $I \in \mathbb{R}^{H \times W \times C}$ is flattened so that $x = \mathrm{vec}(I) \in \mathbb{R}^{HWC}$.
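A one-line NumPy illustration of the flattening; the 32x32x3 shape is just an example:

```python
import numpy as np

image = np.zeros((32, 32, 3))      # height x width x channels
x = image.reshape(-1)              # flatten to a vector in R^(32*32*3)
print(x.shape)                     # (3072,)
```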

  15. Perceptron [diagram: inputs $x_1, \dots, x_d$ are weighted by $w_1, \dots, w_d$, summed with a bias, and passed through an activation function]

  16. Some History • The perceptron algorithm was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. • The multilayer perceptron is an extension of the perceptron, which was first introduced in the 1950s. • In 1969 a famous book entitled "Perceptrons" by Marvin Minsky and Seymour Papert showed that it was impossible for perceptrons to learn an XOR function without adding a hidden layer. • Hence the term multilayer perceptron. https://en.wikipedia.org/wiki/Perceptron

  17. Linear Perceptron – Single Output [diagram: a single output $\hat{y} = w^T x$, a weighted sum of the inputs]

  18. Linear Perceptron • Try to predict $y$ by $\hat{y} = w^T x$ • This is a linear least squares problem: • Find: $w^* = \arg\min_w \|Xw - y\|_2^2$ • Therefore there is a closed-form solution: $w^* = (X^T X)^{-1} X^T y$ • Where $X$ is the entire dataset (each row is a sample).

  19. (Vanilla) Binary Perceptron – Single Output [diagram: a thresholded weighted sum, $\hat{y} = 1$ if $w^T x + b > 0$ and $0$ otherwise]

  20. (Sigmoid) Binary Perceptron – Single Output [diagram: $\hat{y} = \sigma(w^T x + b)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$]

  21. Binary Perceptron • The binary perceptron acts as a binary classifier: predict class $1$ if $\sigma(w^T x + b) \ge 0.5$ and class $0$ otherwise.
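A minimal sketch of the sigmoid perceptron's forward pass and the resulting binary decision; the weights and input here are arbitrary illustrative values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w, b):
    return sigmoid(w @ x + b)        # probability of the positive class

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.3, 0.1, -0.2])
b = 0.05
p = forward(x, w, b)
label = int(p >= 0.5)                # threshold at 0.5 for the binary decision
```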

  22. (Softmax) Binary Perceptron - Multiple Outputs • A generalization of the sigmoid function called softmax: $\mathrm{softmax}(z)_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$ [diagram: $K$ linear outputs $z = Wx + b$, one per class, each mapped to a probability]
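A minimal sketch of a softmax layer over K classes; the weight matrix and input are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

x = np.array([1.0, -0.5, 2.0])                       # flattened input
W = np.random.default_rng(0).normal(size=(4, 3))     # 4 classes, 3 input features
b = np.zeros(4)
probs = softmax(W @ x + b)           # probability distribution over the 4 classes
print(probs, probs.sum())            # the probabilities sum to 1
```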

  23. Multiclass Binary Perceptron • The softmax outputs form a probability distribution over the classes [diagram: input $x$, class scores $z = Wx + b$, softmax probabilities]

  24. Multiclass Binary Perceptron • The correct class distribution is one-hot: probability $1$ on the true class and $0$ on all others [diagram: predicted distribution compared against the one-hot target]

  25. Need to calculate loss: how different is 'our' probability distribution over the possible classes from the correct one? • Cross-entropy (not to be confused with the joint entropy of two random variables): $H(p, q) = -\sum_k p_k \log q_k$ • Since our target distribution $p$ is "one-hot encoded", minimizing the cross-entropy is equivalent to minimizing the KL divergence between the two distributions. • In other words, the cross-entropy objective 'wants' the predicted distribution to have all of its mass on the correct answer. • When using the SoftMax activation function with the cross-entropy loss function we get: $L = -\log\left(\frac{e^{z_y}}{\sum_j e^{z_j}}\right)$, where $y$ is the correct class. • Note: when implementing, use the log-sum-exp trick. http://cs231n.github.io/linear-classify/
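A minimal sketch of the softmax cross-entropy loss computed stably with the log-sum-exp trick; the class scores and label are illustrative:

```python
import numpy as np

def cross_entropy(z, y):
    """-log softmax(z)[y], computed stably via the log-sum-exp trick."""
    z = z - z.max()                          # shifting the scores leaves softmax unchanged
    log_probs = z - np.log(np.exp(z).sum())  # log softmax
    return -log_probs[y]

z = np.array([2.0, 5.0, -1.0])               # scores for 3 classes
loss = cross_entropy(z, y=1)                 # true class is index 1
```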

  26. Multilayer perceptron (MLP)

  27. The XOR ("exclusive OR") problem • Given 4 points in $\{0, 1\}^2$, return: $y = x_1 \oplus x_2$ (1 if exactly one input is 1, otherwise 0). • Can we solve the problem with a linear/binary perceptron (with a single output)? • Is it linearly separable?

  28. The XOR problem [Figure from: Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning]

  29. The XOR problem • A single-layer perceptron is a linear combination of its inputs. • The classification of the input is given by a line which separates between the classes of the input. • If we look at the equations: $b \le 0$ for $(0,0) \mapsto 0$; $w_1 + b > 0$ and $w_2 + b > 0$ for $(1,0), (0,1) \mapsto 1$; $w_1 + w_2 + b \le 0$ for $(1,1) \mapsto 0$. • Summing the two middle inequalities gives $w_1 + w_2 + 2b > 0$, which contradicts the first and last: there is no solution to this linear system.

  30. The XOR problem • We can also try to treat this problem as a least squares problem: • Loss function: $J(\theta) = \frac{1}{4}\sum_{x}\left(f^*(x) - f(x; \theta)\right)^2$ • Model: $f(x; w, b) = x^T w + b$ • (Exercise) solving the normal equations we get $w = 0$ and $b = \frac{1}{2}$, i.e. the model outputs $0.5$ everywhere. [Example from: Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning]

  31. The XOR problem • Adding a hidden layer can help solve the XOR problem. • We will add a vector of hidden units $h = f^{(1)}(x; W, c)$. • The values of these hidden units are then used as input for the second/output layer. • Our model is now: $y = f^{(2)}(h; w, b) = f^{(2)}\left(f^{(1)}(x)\right)$ [Example from: Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning]

  32. The XOR problem [Example from: Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning]

  33. The XOR problem • What should be our choice of $f^{(1)}$? • $f^{(1)}$ can't be linear, otherwise: if $f^{(1)}(x) = W^T x$ and $f^{(2)}(h) = h^T w$, then $f(x) = x^T W w = x^T w'$ where $w' = W w$, so the two layers collapse into a single linear model. • We must use a non-linear function for $f^{(1)}$ (see the check below). [Example from: Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning]
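A small numerical check of this collapse, with arbitrary random matrices:

```python
# Stacking two linear layers with no activation equals one linear map: (x^T W) w = x^T (W w).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))          # "hidden" layer weights
w = rng.normal(size=2)               # output layer weights
x = rng.normal(size=2)

two_layers = (x @ W) @ w             # linear layer followed by a linear layer
one_layer = x @ (W @ w)              # equivalent single linear layer
print(np.allclose(two_layers, one_layer))   # True
```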

  34. The XOR problem • We use the activation function $g(z) = \max\{0, z\}$, which is known as the Rectified Linear Unit (ReLU). [Figure from: Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning]

  35. The XOR problem • Our new model: $f(x; W, c, w, b) = w^T \max\{0, W^T x + c\} + b$ • You can find a complete walkthrough of the problem at: http://www.deeplearningbook.org/ chapter 6.1 [Figure from: Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning]
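A minimal sketch of this model solving XOR, using the weights given in the chapter 6.1 walkthrough referenced above:

```python
# f(x) = w^T max(0, W^T x + c) + b, with the known exact solution for XOR.
import numpy as np

W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
c = np.array([0.0, -1.0])
w = np.array([1.0, -2.0])
b = 0.0

def f(x):
    h = np.maximum(0.0, W.T @ x + c)     # hidden layer with ReLU activation
    return w @ h + b                     # linear output layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, f(np.array(x, dtype=float)))   # prints 0, 1, 1, 0: the XOR function
```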

  36. No hidden layers http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

  37. MLP with one hidden layer http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

  38. MLP with one hidden layer http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

  39. MLP with one hidden layer [Lecun, Y., Bengio, Y., & Hinton, G. (2015)]

  40. How big should our hidden layer be? https://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html

  41. Summary • Deep learning is a class of (mostly) supervised learning algorithms. • The linear / binary perceptron acts as a linear classifier. • Hidden layers (followed by a non-linear activation function) allow for a non-linear transformation of the input so that it becomes linearly separable. • The number of neurons and connections in each layer determines our model's capacity.
