
Object Recognizing


Presentation Transcript


  1. Object Recognizing

  2. Deep Learning • Success in 2012: deep nets for image classification and speech processing

  3. David Corne, and Nick Taylor, Heriot-Watt University - dwcorne@gmail.com These slides and related resources: http://www.macs.hw.ac.uk/~dwcorne/Teaching/dmml.html

  4. ImageNet

  5. DL is providing breakthrough results in speech recognition and image classification … (from this Hinton et al. 2012 paper: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/38131.pdf). See also: http://yann.lecun.com/exdb/mnist/ and http://people.idsia.ch/~juergen/cvpr2012.pdf

  6. Continuous improvement: Microsoft, Dec 2015, 150 layers; an error rate of 3.5% and a localization error of 9%.

  7. What are Deep Nets

  8. Neural networks in the brain: repeating layers; linear, non-linear and pooling operations; learning by modifying synapses

  9. Biology: Linear and non-linear operations

  10. Biology: feed-forward, recurrent, feed-back. The DNN adopts the feed-forward path.

  11. DNN Architecture

  12. General structure: local connections, convolution, reduced sampling

  13. Multiple filters

  14. Repeating operations: Linear, Non-linear, Pooling
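
Below is a minimal NumPy sketch of this repeating block: a local, convolutional (linear) step, a pointwise non-linearity, and pooling for reduced sampling. The image size, filter values and pooling window are illustrative choices, not values from the slides.

```python
# Sketch of one "linear, non-linear, pooling" block (NumPy only).
# The 8x8 input, the 3x3 filter and the 2x2 pooling window are
# illustrative choices, not values from the slides.
import numpy as np

def convolve2d(image, kernel):
    """Valid 2-D convolution: the linear, locally connected step."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Pointwise non-linearity."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Reduced sampling: keep the maximum of each size x size window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.rand(8, 8)          # toy input "image"
kernel = np.random.randn(3, 3)        # one filter (random here, learned in a DNN)
features = max_pool(relu(convolve2d(image, kernel)))
print(features.shape)                 # (3, 3): a smaller map after pooling
```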

  15. Depth – multiple filters

  16. Repeating 3-layer arrangement

  17. History of Deep Learning

  18. LeNet, 1998: essentially the same as the current generation

  19. MNIST data set

  20. Hinton, Trends in Cognitive Sciences 2007. The goal: unsupervised learning with Restricted Boltzmann Machines, combining a generative model and inference. CNNs, in contrast, are feed-forward and massively supervised.

  21. Back-propagation 1986

  22. The entire network is a large parametric function. The parameters are the network weights (60M in AlexNet) and are learned from examples. The learning algorithm: back-propagation, i.e. gradient descent in the space of parameters.
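
As a schematic illustration of that view, the sketch below treats the whole network as a single parametric function f(x; w) and applies a gradient-descent update to the weight vector. The toy linear model, data and learning rate are assumptions made for the example, not anything from the slides.

```python
# The network as one parametric function f(x; w), trained by gradient
# descent on its weights.  A one-layer toy model stands in for the full
# network; the data, target and learning rate are illustrative.
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(5)                 # the parameter vector (60M in AlexNet)
learning_rate = 0.1

def f(x, w):                               # the "network": here just linear
    return w @ x

for _ in range(100):                       # loop over training examples
    x = rng.standard_normal(5)
    target = x.sum()                       # toy supervised target
    error = target - f(x, w)
    grad = -error * x                      # d( 0.5 * error^2 ) / dw
    w -= learning_rate * grad              # gradient descent step in parameter space
```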

  23. Back Propagation

  24. [Network diagram: input units 1, 2; hidden units 3, 4; output units 5, 6]

  25. Notation for the network above: L = the linear signal, Lk = Σi wik Ni; N = the nonlinear output, N = σ(L), with σ: y = 1 / (1 + e^(-αx)) and dy/dx = y(1 - y). Hence dN/dL = N(1 - N) and dLk/dwik = Ni.
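
The same quantities in code, under the slide's notation (α is taken to be 1, so dN/dL = N(1 - N) holds exactly):

```python
# Linear signal and sigmoid non-linearity from the slide (alpha = 1 assumed).
import numpy as np

def sigmoid(L, alpha=1.0):
    return 1.0 / (1.0 + np.exp(-alpha * L))

def unit_output(N_in, w_in):
    """L_k = sum_i w_ik * N_i,  N_k = sigma(L_k)."""
    L = np.dot(w_in, N_in)      # the linear signal L_k
    N = sigmoid(L)              # the nonlinear output N_k
    dN_dL = N * (1.0 - N)       # dN/dL = N(1 - N), used by back-propagation
    return N, dN_dL
```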

  26. Error: E = 1/2 [(T5 - N5)² + (T6 - N6)²]. By the chain rule along the path, dE/dw35 = dE/dN5 · dN5/dL5 · dL5/dw35 = (T5 - N5) · N5(1 - N5) · N3 = δ5 · N3. Call dE/dLk = δk the back-propagating error; the weight is adjusted by δwik = δk Ni.

  27. General rule: dE/dLk = δk, the back-propagating error. Adjusting weights: δwik = δk Ni. For w35, the error δ5 at unit 5 is combined with the input N3.

  28. The same rule holds for w13: δw13 = δ3 N1, so we need to compute δ3.

  29. Adjusting δw13: dE/dw13 = dE/dL3 · dL3/dw13 = δ3 N1, so compute δ3. The error reaches L3 through both output units: dE/dL3 = dE1/dL3 + dE2/dL3 = δ3(1) + δ3(2). For the first term, δ3(1) = dE1/dL3 = dE1/dN3 · dN3/dL3, where dE1/dN3 = dE1/dL5 · dL5/dN3 = δ5 · w35, giving δ3(1) = δ5 w35 N3(1 - N3); similarly δ3(2) = δ6 w36 N3(1 - N3). Therefore δw13 = (δ5 w35 + δ6 w36) N3(1 - N3) N1.

  30. Adjusting δw13 in practice: δw13 = (δ5 w35 + δ6 w36) N3(1 - N3) N1. Propagate δ5 and δ6 backward (as δ5 w35 and δ6 w36), multiply by N3(1 - N3) to get δ3, and adjust w13 by δ3 N1. This is iterated for all weights over many examples; supervision is required.
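
Putting slides 24-30 together, the sketch below runs back-propagation on the toy 2-input / 2-hidden / 2-output network: compute δ5, δ6 at the output, propagate them back through w35, w36 (and w45, w46) to get δ3, δ4, and adjust every weight by δk Ni. The initial weights, inputs, targets and learning rate are invented values for illustration.

```python
# Back-propagation for the toy 2-input / 2-hidden / 2-output network
# (units 1, 2 input; 3, 4 hidden; 5, 6 output).  Weights, inputs, targets
# and learning rate are made-up values; biases are omitted, as on the slides.
import numpy as np

def sigmoid(L):
    return 1.0 / (1.0 + np.exp(-L))

rng = np.random.default_rng(0)
W_hid = rng.standard_normal((2, 2)) * 0.5   # w13, w14, w23, w24
W_out = rng.standard_normal((2, 2)) * 0.5   # w35, w36, w45, w46
lr = 0.5

x = np.array([0.1, 0.9])                    # inputs N1, N2
T = np.array([0.0, 1.0])                    # targets T5, T6

for _ in range(1000):
    # Feed-forward: L_k = sum_i w_ik N_i,  N_k = sigma(L_k)
    N_hid = sigmoid(x @ W_hid)              # N3, N4
    N_out = sigmoid(N_hid @ W_out)          # N5, N6

    # Output deltas: delta_k = (T_k - N_k) N_k (1 - N_k)
    delta_out = (T - N_out) * N_out * (1.0 - N_out)

    # Hidden deltas: propagate delta_5, delta_6 through w35, w36 (and w45, w46),
    # then multiply by N3 (1 - N3):  delta_3 = (d5 w35 + d6 w36) N3 (1 - N3)
    delta_hid = (W_out @ delta_out) * N_hid * (1.0 - N_hid)

    # Adjust every weight by delta_k * N_i (this decreases the error E)
    W_out += lr * np.outer(N_hid, delta_out)
    W_hid += lr * np.outer(x, delta_hid)

print(N_out)   # approaches the targets [0, 1] after training
```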

  31. Dropout

  32. Dropout: An efficient way to average many large neural nets (http://arxiv.org/abs/1207.0580) • Consider a neural net with one hidden layer. • Each time we present a training example, we randomly omit each hidden unit with probability 0.5. • So we are randomly sampling from 2^H different architectures. • All architectures share weights.

  33. Dropout – Multi-layer. For each example, set units at all levels to 0 with some probability, usually p = 0.5; each example has a different 'mask'. During the feed-forward flow these units are multiplied by 0, so they do not participate in the computation, and similarly for the back-propagation. The intuition is to avoid over-fitting. At test time all the units are used. Most implementations no longer use dropout; the issue of over-fitting is actively studied, since for reasons not fully understood adding weights does not cause a drop in test performance.
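
A small sketch of the dropout mask described above, for one layer's forward pass. The value p = 0.5 follows the slide; the scaling by 1/(1 - p) at training time ('inverted dropout', so that nothing special is needed at test time) is one common implementation choice, not something stated on the slide.

```python
# Dropout for one layer: zero each unit with probability p at training
# time, use all units at test time.  The 1/(1 - p) scaling ("inverted
# dropout") is one common convention, not specified on the slide.
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    if not training:
        return activations                      # test time: all units are used
    mask = rng.random(activations.shape) >= p   # a different mask per example
    return activations * mask / (1.0 - p)       # dropped units contribute 0

hidden = np.random.rand(4, 100)                 # a batch of 4 examples
train_out = dropout(hidden, p=0.5, training=True)
test_out = dropout(hidden, training=False)
```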

  34. Visualizing the features at different layers • Rob Fergus, NIPS 2013 • Best 9 patches: showing, at each layer, the responses of 48 units. Each unit is in fact a layer of units – copies of the same unit at different locations, covering the image (a 'convolution' filter). • A 'deconvolution' algorithm identifies the patches that caused the largest activation of the unit, over a large set of test images. • A small 3×3 array shows the 9 top patches for each unit.
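
As a much-simplified stand-in for that procedure (it replaces the deconvolution step with a direct receptive-field crop, which is only valid for a single convolutional layer), the sketch below scans a set of images, records where one filter responds most strongly, and cuts out the corresponding input patches. The images, filter and sizes are random placeholders.

```python
# Simplified top-patch search for one convolution filter: over a set of
# images, find the locations of the strongest responses and extract the
# corresponding input patches.  This replaces the deconvnet step of
# Zeiler & Fergus with a direct receptive-field crop (single conv layer
# only); images and filter are random placeholders.
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
images = rng.random((50, 32, 32))        # a toy "test set" of 50 images
kernel = rng.standard_normal((5, 5))     # one filter ("unit") to visualize

responses = []                           # (activation, image idx, row, col)
for idx, img in enumerate(images):
    fmap = convolve2d(img, kernel)
    r, c = np.unravel_index(np.argmax(fmap), fmap.shape)
    responses.append((fmap[r, c], idx, r, c))

top9 = sorted(responses, reverse=True)[:9]
patches = [images[i][r:r + 5, c:c + 5] for (_, i, r, c) in top9]
# 'patches' would be shown as the 3x3 array of top-9 patches for this unit.
```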

  35. Layer 3 top-9 patches for each unit

  36. [Top-9 patches shown for layers 1, 2, 3 and 5]

  37. Different visual tasks
