
Advanced topics


Presentation Transcript


  1. Advanced topics

  2. Outline • Self-taught learning • Learning feature hierarchies (Deep learning) • Scaling up

  3. Self-taught learning

  4. Supervised learning. Training: labeled images of Cars and Motorcycles. Testing: What is this?

  5. Semi-supervised learning. Labeled examples (Car, Motorcycle) plus unlabeled images (all cars/motorcycles). Testing: What is this?

  6. Self-taught learning. Labeled examples (Car, Motorcycle) plus unlabeled images (random internet images). Testing: What is this?

  7. Self-taught learning. Learn features f1, f2, …, fk from the unlabeled data using sparse coding, LCC, etc. Use the learned f1, f2, …, fk to represent the training/test sets as activations a1, a2, …, ak. If the labeled training set is small, this can give a huge performance boost.
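
A minimal sketch of this pipeline in Python (not from the slides; the library, data, and parameter choices are illustrative stand-ins for "sparse coding, LCC, etc."):

    import numpy as np
    from sklearn.decomposition import DictionaryLearning
    from sklearn.linear_model import LogisticRegression

    # Placeholder data: many unlabeled examples, a small labeled set.
    X_unlabeled = np.random.rand(1000, 64)
    X_train = np.random.rand(50, 64)
    y_train = np.random.randint(2, size=50)   # 0 = car, 1 = motorcycle

    # Learn features f1..fk from the unlabeled data (sparse coding).
    coder = DictionaryLearning(n_components=32, transform_algorithm="lasso_lars")
    coder.fit(X_unlabeled)

    # Represent the labeled set by its activations a1..ak, then train a classifier.
    A_train = coder.transform(X_train)
    clf = LogisticRegression(max_iter=1000).fit(A_train, y_train)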

  8. Learning feature hierarchies/Deep learning

  9. Why feature hierarchies? Pixels → edges → object parts (combinations of edges) → object models.

  10. Deep learning algorithms • Stacked sparse coding • Deep Belief Network (DBN) (Hinton) • Deep sparse autoencoders (Bengio) [Other related work: LeCun, Lee, Yuille, Ng …]

  11. Deep learning with autoencoders • Logistic regression • Neural network • Sparse autoencoder • Deep autoencoder

  12. Logistic regression. Logistic regression has a learned parameter vector θ. On input x, it outputs hθ(x) = 1 / (1 + exp(−θᵀx)). [Diagram: a logistic regression unit drawn with inputs x1, x2, x3 and a bias unit +1.]
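
A minimal sketch of one such unit in NumPy (variable names are illustrative):

    import numpy as np

    def logistic_unit(x, theta):
        """Logistic regression unit: outputs 1 / (1 + exp(-theta^T x)).
        The last entry of x is the constant +1 bias input."""
        return 1.0 / (1.0 + np.exp(-theta @ x))

    x = np.array([0.5, -1.2, 0.3, 1.0])   # x1, x2, x3, +1
    theta = np.zeros(4)
    print(logistic_unit(x, theta))        # 0.5: all-zero weights are uninformative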

  13. Neural network. String a lot of logistic units together. Example 3-layer network: [Diagram: inputs x1, x2, x3 with bias +1 (Layer 1); hidden units a1, a2, a3 with bias +1 (Layer 2); one output unit (Layer 3).]

  14. Neural network. Example 4-layer network with 2 output units: [Diagram: inputs x1, x2, x3 with bias +1 (Layer 1); two hidden layers, each with bias +1 (Layers 2 and 3); two output units (Layer 4).]

  15. Neural Network example [Courtesy of Yann LeCun]

  16. Training a neural network. Given a training set (x1, y1), (x2, y2), (x3, y3), …, adjust the parameters θ (for every node) to minimize ∑i ‖hθ(xi) − yi‖². (Use gradient descent: the "backpropagation" algorithm. Susceptible to local optima.)
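
A minimal sketch of this procedure for the 3-layer network above (NumPy, toy data; the factor of 2 from the squared error is absorbed into the learning rate):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    W1, b1 = 0.1 * rng.normal(size=(3, 3)), np.zeros(3)   # Layer 1 -> Layer 2
    W2, b2 = 0.1 * rng.normal(size=(1, 3)), np.zeros(1)   # Layer 2 -> Layer 3

    X = rng.normal(size=(20, 3))                   # toy inputs
    y = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # toy targets

    lr = 0.5
    for _ in range(2000):
        a1 = sigmoid(X @ W1.T + b1)      # forward: hidden activations
        h = sigmoid(a1 @ W2.T + b2)      # forward: output h_theta(x)
        d2 = (h - y) * h * (1 - h)       # backprop: delta at the output
        d1 = (d2 @ W2) * a1 * (1 - a1)   # backprop: delta at the hidden layer
        W2 -= lr * (d2.T @ a1) / len(X); b2 -= lr * d2.mean(axis=0)
        W1 -= lr * (d1.T @ X) / len(X);  b1 -= lr * d1.mean(axis=0)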

  17. Unsupervised feature learning with a neural network • Autoencoder: the network is trained to output its input (learn the identity function). • The solution is trivial unless we either: constrain the number of units in Layer 2 (learn a compressed representation), or constrain Layer 2 to be sparse. [Diagram: inputs x1…x6 with bias +1 (Layer 1); hidden units a1, a2, a3 with bias +1 (Layer 2); reconstructed outputs x1…x6 (Layer 3).]

  18. Unsupervised feature learning with a neural network. Training a sparse autoencoder: given an unlabeled training set x1, x2, …, minimize the sum of a reconstruction error term and an L1 sparsity term, ∑i ‖xi − x̂i‖² + β ∑j |aj|, where x̂i is the network's reconstruction of xi and a1, a2, a3 are the hidden activations.
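
A minimal sketch of this objective (NumPy; the linear reconstruction layer and the weight β = 0.1 are assumptions, since the slide's formula survives only as its two named terms):

    import numpy as np

    def sparse_autoencoder_loss(X, W1, b1, W2, b2, beta=0.1):
        """Reconstruction error term + beta * L1 sparsity term."""
        A = 1.0 / (1.0 + np.exp(-(X @ W1.T + b1)))   # hidden activations a_j
        X_hat = A @ W2.T + b2                        # reconstruction of the input
        return np.sum((X - X_hat) ** 2) + beta * np.sum(np.abs(A))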


  20. Unsupervised feature learning with a neural network. After training, the Layer 2 activations a1, a2, a3 are a new representation for the input. [Diagram: inputs x1…x6 with bias +1 (Layer 1) feeding hidden units a1, a2, a3 (Layer 2).]


  22. Unsupervised feature learning with a neural network. Now train a second sparse autoencoder on top: [Diagram: x1…x6 → a1, a2, a3 → b1, b2, b3, each layer with a bias +1.] Train the parameters so that the b layer reconstructs the a layer, subject to the bi's being sparse.


  25. Unsupervised feature learning with a neural network. The activations b1, b2, b3 are a new representation for the input. [Diagram: x1…x6 → a1, a2, a3 → b1, b2, b3.]


  27. Unsupervised feature learning with a neural network. Repeat with a third sparse autoencoder: [Diagram: x1…x6 → a1, a2, a3 → b1, b2, b3 → c1, c2, c3, each layer with a bias +1.]

  28. Unsupervised feature learning with a neural network. The activations c1, c2, c3 are a new representation for the input. Use [c1, c2, c3] as the representation to feed to a learning algorithm, as in the sketch below.
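
A minimal sketch of this greedy layer-wise procedure (train_autoencoder is a hypothetical per-layer trainer, e.g. a minimizer of the sparse autoencoder loss above; only each layer's encoder weights are kept):

    import numpy as np

    def greedy_layerwise_features(X, layer_sizes):
        """Train one sparse autoencoder per layer; each layer's hidden
        activations become the next layer's input (x -> a -> b -> c)."""
        H = X
        for k in layer_sizes:                         # e.g. [3, 3, 3]
            W, b = train_autoencoder(H, n_hidden=k)   # hypothetical trainer
            H = 1.0 / (1.0 + np.exp(-(H @ W.T + b)))  # new representation
        return H                                      # e.g. the c-layer features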

  29. Deep Belief Net Deep Belief Net (DBN) is another algorithm for learning a feature hierarchy. Building block: 2-layer graphical model (Restricted Boltzmann Machine). Can then learn additional layers one at a time.

  30. Restricted Boltzmann machine (RBM). Layer 2: [a1, a2, a3] (binary-valued). Input: [x1, x2, x3, x4]. An MRF with joint distribution P(x, a) = (1/Z) exp(∑i,j wij xi aj). Use Gibbs sampling for inference. Given observed inputs x, we want the maximum likelihood estimate of the weights, i.e. to maximize ∑ log P(x).

  31. Restricted Boltzmann machine (RBM). Gradient ascent on log P(x): Δwij ∝ [xi aj]obs − [xi aj]prior, where [xi aj]obs comes from fixing x to the observed value and sampling a from P(a|x), and [xi aj]prior comes from running Gibbs sampling to convergence. Adding a sparsity constraint on the ai's usually improves results.
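
A sketch of this update using contrastive divergence (CD-1), the common approximation that truncates the Gibbs chain after one step rather than running it to convergence as the slide describes; biases are omitted for brevity:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def rbm_cd1_update(X, W, lr=0.01, rng=np.random.default_rng(0)):
        """One CD-1 step. X: batch of binary inputs; W: weights w_ij."""
        pa = sigmoid(X @ W)                       # P(a_j = 1 | x)
        a = (rng.random(pa.shape) < pa) * 1.0     # sample hidden units
        pos = X.T @ pa                            # [x_i a_j]_obs
        px = sigmoid(a @ W.T)                     # P(x_i = 1 | a)
        x_neg = (rng.random(px.shape) < px) * 1.0
        neg = x_neg.T @ sigmoid(x_neg @ W)        # approximates [x_i a_j]_prior
        return W + lr * (pos - neg) / len(X)      # gradient ascent on log P(x)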

  32. Deep Belief Network. Similar to a sparse autoencoder in many ways. Stack RBMs on top of each other to get a DBN: input [x1, x2, x3, x4] → Layer 2 [a1, a2, a3] → Layer 3 [b1, b2, b3]. Train with approximate maximum likelihood (often with a sparsity constraint on the ai's).

  33. Deep Belief Network. [Diagram: input [x1, x2, x3, x4] → Layer 2 [a1, a2, a3] → Layer 3 [b1, b2, b3] → Layer 4 [c1, c2, c3].]

  34. Deep learning examples

  35. Convolutional DBN for audio. [Diagram: a spectrogram input feeding detection units, with a max-pooling unit on top.]


  37. Probabilistic max pooling. Convolutional neural net: the pooling unit computes max{x1, x2, x3, x4}, where the xi are real numbers. Convolutional DBN: the xi are {0, 1} and mutually exclusive, so there are only 5 possible cases: all four off (pooling unit 0), or exactly one xi on (pooling unit 1). This collapses 2^n configurations into n+1 configurations and permits both bottom-up and top-down inference. The enumeration below sketches the idea.
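
A small enumeration of those states (illustrative Python, matching the 5-case count for n = 4):

    def pooling_configurations(n=4):
        """The n + 1 legal (pooling unit, detection units) states:
        everything off, or exactly one detection unit on."""
        off = (0, tuple([0] * n))
        on = [(1, tuple(int(j == i) for j in range(n))) for i in range(n)]
        return [off] + on

    print(pooling_configurations())   # 5 states for n = 4, not 2**4 = 16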


  39. Convolutional DBN for audio. [Diagram: spectrogram → first CDBN layer (detection units, then max pooling) → second CDBN layer (detection units, then max pooling).]

  40. CDBNs for speech. [Figure: learned first-layer bases.]

  41. Convolutional DBN for images. [Diagram: input data V (visible nodes, binary or real) → detection layer H (hidden nodes, binary) with shared "filter" weights Wk → max-pooling layer P ("max-pooling" nodes, binary). Within each pooled block, at most one hidden node is active.]

  42. Convolutional DBN on face images. [Figure: the learned hierarchy, from pixels to edges to object parts (combinations of edges) to object models.] Note: sparsity is important for these results.

  43. Learning of object parts. [Figure: examples of learned object parts from four object categories: faces, cars, elephants, chairs.]

  44. Training on multiple objects. Trained on 4 classes (cars, faces, motorbikes, airplanes). The second layer learns both shared features and object-specific features; the third layer learns more specific features. [Figures: second- and third-layer bases learned from the 4 object categories; plot of H(class | neuron active).]

  45. Hierarchical probabilistic inference. Generating posterior samples from faces by "filling in" experiments (cf. Lee and Mumford, 2003): combine bottom-up and top-down inference. [Figure: input images; samples from feedforward inference (control); samples from full posterior inference.]

  46. Key issue in feature learning: Scaling up

  47. Scaling up with graphics processors. [Chart: peak GFlops of a US$250 NVIDIA GPU vs. an Intel CPU, 2003–2008. Source: NVIDIA CUDA Programming Guide.]

  48. Scaling up with GPUs. [Chart: approximate number of model parameters (in millions) reachable using GPUs (Raina et al., 2009).]

  49. Unsupervised feature learning: Does it work?

  50. Unsupervised feature learning gives state-of-the-art task performance on audio, images, video, and multimodal (audio/video) tasks.
