
Neural Networks II

Explore the basics of neural networks with slides credited to experts such as Hugo Larochelle and Andrew Ng. Dive into the core concepts of artificial neurons, activation functions such as sigmoid and ReLU, backpropagation, initialization techniques, and practical tips such as early stopping and dropout for effective training. Discover the power of the universal approximation theorem and learn to apply neural networks to complex problems. Stay informed about key practices such as normalizing data, decaying the learning rate, and using momentum for improved training efficiency.


Presentation Transcript


  1. Neural Networks II Chen Gao Virginia Tech ECE-5424G / CS-5824 Spring 2019

  2. Neural Networks • Origins: Algorithms that try to mimic the brain.

  3. A single neuron in the brain, with its input and output connections. Slide credit: Andrew Ng

  4. An artificial neuron: Logistic unit • Inputs and a bias unit are combined through weights (the parameters), and a sigmoid (logistic) activation function produces the output. Slide credit: Andrew Ng
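
A minimal sketch of such a logistic unit, assuming NumPy; the input, weight, and bias values below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, w, b):
    # Output = g(w^T x + b): weighted inputs plus bias, passed through the sigmoid
    return sigmoid(np.dot(w, x) + b)

# Hypothetical input, weights, and bias for a 3-input unit
x = np.array([1.0, 0.5, -2.0])
w = np.array([0.2, -0.4, 0.1])
b = 0.3
print(logistic_unit(x, w, b))
```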

  5. Visualization of weights, bias, activation function • The bias b only changes the position of the hyperplane • The output range is determined by g(·) Slide credit: Hugo Larochelle

  6. Activation - sigmoid • Squashes the neuron’s pre-activation between 0 and 1 • Always positive • Bounded • Strictly increasing Slide credit: Hugo Larochelle

  7. Activation - hyperbolic tangent (tanh) • Squashes the neuron’s pre-activation between -1 and 1 • Can be positive or negative • Bounded • Strictly increasing Slide credit: Hugo Larochelle

  8. Activation - rectified linear (ReLU) • Bounded below by 0 • Always non-negative • Not upper bounded • Tends to give neurons with sparse activities Slide credit: Hugo Larochelle

  9. Activation - softmax • For multi-class classification: • we need multiple outputs (1 output per class) • we would like to estimate the conditional probability p(y = c | x) • We use the softmax activation function at the output: softmax(a)_c = exp(a_c) / Σ_c' exp(a_c') Slide credit: Hugo Larochelle
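
The four activation functions from slides 6 to 9 can be sketched in a few lines, assuming NumPy; subtracting the maximum inside softmax is a standard numerical-stability trick, not something stated on the slides:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))   # bounded in (0, 1), always positive

def tanh(a):
    return np.tanh(a)                  # bounded in (-1, 1)

def relu(a):
    return np.maximum(0.0, a)          # non-negative, not upper bounded

def softmax(a):
    e = np.exp(a - np.max(a))          # subtract max for numerical stability
    return e / e.sum()                 # entries sum to 1 (class probabilities)

a = np.array([-1.0, 0.0, 2.0])
print(sigmoid(a), tanh(a), relu(a), softmax(a))
```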

  10. Universal approximation theorem ‘‘a single hidden layer neural network with a linear output unit can approximate any continuous function arbitrarily well, given enough hidden units’’ Hornik, 1991 Slide credit: Hugo Larochelle

  11. Neural network – Multilayer • Layer 1 (input), Layer 2 (hidden), Layer 3 (output) Slide credit: Andrew Ng

  12. Neural network • a_i^(j) = "activation" of unit i in layer j • Θ^(j) = matrix of weights controlling the function mapping from layer j to layer j+1 • If layer j has s_j units and layer j+1 has s_{j+1} units, the size of Θ^(j) is s_{j+1} × (s_j + 1) Slide credit: Andrew Ng

  13. Neural network • "Pre-activation": the weighted sum z computed before applying g(·) • Why do we need g(·)? Without a nonlinearity, stacked layers collapse into a single linear model Slide credit: Andrew Ng

  14. Neural network • "Pre-activation" is computed layer by layer • Add the bias unit a_0 = 1 to each hidden layer before computing the next pre-activation Slide credit: Andrew Ng

  15. Flow graph - Forward propagation • How do we evaluate our prediction?
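
A possible sketch of forward propagation through a fully connected network with sigmoid activations, assuming NumPy; the layer sizes and weight matrices below are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Thetas):
    # Each Theta maps one layer to the next and includes a bias column,
    # so a bias unit a_0 = 1 is prepended before every multiplication.
    a = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))   # add bias unit
        z = Theta @ a                    # pre-activation
        a = sigmoid(z)                   # activation
    return a                             # h_Theta(x), the network's output

# Hypothetical 3-layer network: 3 inputs -> 4 hidden units -> 2 outputs
rng = np.random.default_rng(0)
Thetas = [rng.normal(size=(4, 4)), rng.normal(size=(2, 5))]
print(forward(np.array([0.5, -1.0, 2.0]), Thetas))
```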

  16. Cost function • Logistic regression: J(θ) = -(1/m) Σ_i [ y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i))) ] + (λ/2m) Σ_j θ_j^2 • Neural network (K output units): J(Θ) = -(1/m) Σ_i Σ_k [ y_k^(i) log (h_Θ(x^(i)))_k + (1 - y_k^(i)) log(1 - (h_Θ(x^(i)))_k) ] + (λ/2m) Σ over all non-bias weights (Θ_ji^(l))^2 Slide credit: Andrew Ng
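
One way to write the neural-network cross-entropy cost with L2 regularization, assuming NumPy; the helper name nn_cost and the epsilon guard against log(0) are illustrative additions:

```python
import numpy as np

def nn_cost(H, Y, Thetas, lam):
    # H: (m, K) predicted probabilities, Y: (m, K) one-hot labels.
    # Cross-entropy is summed over the K output units and averaged over
    # the m examples; L2 regularization skips the bias column of each Theta.
    m = Y.shape[0]
    eps = 1e-12                          # guard against log(0)
    data_term = -np.sum(Y * np.log(H + eps)
                        + (1 - Y) * np.log(1 - H + eps)) / m
    reg_term = (lam / (2.0 * m)) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return data_term + reg_term
```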

  17. Gradient computation • Need to compute: J(Θ) and the partial derivatives ∂J(Θ)/∂Θ_ij^(l) Slide credit: Andrew Ng

  18. Gradient computation • Given one training example (x, y), forward propagation: a^(1) = x; z^(2) = Θ^(1) a^(1), a^(2) = g(z^(2)) (add a_0^(2)); z^(3) = Θ^(2) a^(2), a^(3) = g(z^(3)) (add a_0^(3)); z^(4) = Θ^(3) a^(3), a^(4) = h_Θ(x) = g(z^(4)) Slide credit: Andrew Ng

  19. Gradient computation: Backpropagation • Intuition: δ_j^(l) = "error" of node j in layer l • For each output unit (layer L = 4): δ^(4) = a^(4) - y • Earlier layers: δ^(3) = (Θ^(3))^T δ^(4) .* g'(z^(3)), δ^(2) = (Θ^(2))^T δ^(3) .* g'(z^(2)) Slide credit: Andrew Ng

  20. Backpropagation algorithm • Training set {(x^(1), y^(1)), ..., (x^(m), y^(m))} • Set Δ_ij^(l) = 0 for all l, i, j • For i = 1 to m: • Set a^(1) = x^(i) • Perform forward propagation to compute a^(l) for l = 2, ..., L • Use y^(i) to compute δ^(L) = a^(L) - y^(i), then δ^(L-1), ..., δ^(2) • Compute Δ^(l) := Δ^(l) + δ^(l+1) (a^(l))^T Slide credit: Andrew Ng
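
A rough NumPy sketch of the accumulation loop above, assuming sigmoid activations in every layer; checking the result against numerical gradients is advisable:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(X, Y, Thetas):
    # Accumulate Delta over the whole training set, as in slide 20.
    m = X.shape[0]
    Deltas = [np.zeros_like(T) for T in Thetas]
    for x, y in zip(X, Y):
        # Forward pass, storing each layer's activation with its bias unit.
        acts, a = [], x
        for Theta in Thetas:
            a = np.concatenate(([1.0], a))
            acts.append(a)
            a = sigmoid(Theta @ a)
        # "Error" of the output layer, then propagate backwards.
        delta = a - y                                   # delta^(L) = a^(L) - y
        for l in reversed(range(len(Thetas))):
            Deltas[l] += np.outer(delta, acts[l])       # Delta += delta^(l+1) (a^(l))^T
            if l > 0:
                back = Thetas[l].T @ delta              # still includes the bias row
                delta = (back * acts[l] * (1.0 - acts[l]))[1:]   # times g'(z^(l)), drop bias
    return [D / m for D in Deltas]                      # unregularized gradient estimate
```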

  21. Activation - sigmoid • Partial derivative: g'(a) = g(a)(1 - g(a)) Slide credit: Hugo Larochelle

  22. Activation - hyperbolic tangent (tanh) • Partial derivative: g'(a) = 1 - g(a)^2 Slide credit: Hugo Larochelle

  23. Activation - rectified linear (ReLU) • Partial derivative: g'(a) = 1 if a > 0, 0 otherwise Slide credit: Hugo Larochelle
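
The three partial derivatives from slides 21 to 23, sketched with NumPy:

```python
import numpy as np

def sigmoid_grad(a):
    g = 1.0 / (1.0 + np.exp(-a))
    return g * (1.0 - g)            # g'(a) = g(a)(1 - g(a))

def tanh_grad(a):
    return 1.0 - np.tanh(a) ** 2    # g'(a) = 1 - g(a)^2

def relu_grad(a):
    return (a > 0).astype(float)    # 1 where a > 0, 0 elsewhere
```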

  24. Initialization • For biases • Initialize all to 0 • For weights • Can't initialize all weights to the same value • we can show that all hidden units in a layer would then always behave the same • need to break symmetry • Recipe: sample from U[-b, b] • the idea is to sample around 0 but break symmetry Slide credit: Hugo Larochelle
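
A sketch of this initialization recipe, assuming NumPy; the particular bound sqrt(6 / (n_in + n_out)) is a common Glorot-style heuristic, not something the slide specifies:

```python
import numpy as np

def init_layer(n_in, n_out, rng=np.random.default_rng(0)):
    # Biases start at 0; weights are drawn from U[-b, b] around 0 so that
    # hidden units in the same layer do not all behave identically.
    # This particular bound is a Glorot-style heuristic (an assumption).
    b_bound = np.sqrt(6.0 / (n_in + n_out))
    W = rng.uniform(-b_bound, b_bound, size=(n_out, n_in))
    b = np.zeros(n_out)
    return W, b
```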

  25. Putting it together Pick a network architecture • No. of input units: dimension of features • No. of output units: number of classes • Reasonable default: 1 hidden layer, or if >1 hidden layer, have the same no. of hidden units in every layer (usually the more the better) • Grid search Slide credit: Hugo Larochelle

  26. Putting it together Early stopping • Use validation set performance to select the best configuration • To select the number of epochs, stop training when the validation set error increases Slide credit: Hugo Larochelle
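
A sketch of early stopping as a training loop, where train_epoch and validate are hypothetical callbacks supplied by the user; the patience parameter is an illustrative addition:

```python
def train_with_early_stopping(train_epoch, validate, max_epochs=200, patience=10):
    # train_epoch() runs one pass over the training data; validate() returns
    # the current validation error. Both are hypothetical callbacks.
    best_error, best_epoch, since_best = float("inf"), 0, 0
    for epoch in range(max_epochs):
        train_epoch()
        error = validate()
        if error < best_error:
            best_error, best_epoch, since_best = error, epoch, 0
        else:
            since_best += 1
            if since_best >= patience:   # validation error has stopped improving
                break
    return best_epoch, best_error
```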

  27. Other tricks of the trade • Normalizing your (real-valued) data • Decaying the learning rate • as we get closer to the optimum, it makes sense to take smaller update steps • Mini-batch updates • can give a more accurate estimate of the risk gradient • Momentum • can use an exponential average of previous gradients Slide credit: Hugo Larochelle
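
A sketch of a parameter update combining a decayed learning rate with momentum (an exponential average of previous gradients), assuming NumPy; the function name and hyperparameter values are hypothetical:

```python
import numpy as np

def sgd_update(theta, grad, velocity, epoch, lr0=0.1, decay=0.01, beta=0.9):
    # Decaying learning rate: smaller steps as we approach the optimum.
    lr = lr0 / (1.0 + decay * epoch)
    # Momentum: exponential average of previous gradients.
    velocity = beta * velocity + (1.0 - beta) * grad
    return theta - lr * velocity, velocity
```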

  28. Dropout • Idea: "cripple" the neural network by removing hidden units • each hidden unit is set to 0 with probability 0.5 • hidden units cannot co-adapt to other units • hidden units must be more generally useful Slide credit: Hugo Larochelle
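
A sketch of a dropout mask applied to a vector of hidden activations, assuming NumPy; the inverted-dropout rescaling by 1/(1 - p_drop) is a common variant, not something stated on the slide:

```python
import numpy as np

def dropout(h, p_drop=0.5, train=True, rng=np.random.default_rng(0)):
    # During training each hidden unit is zeroed with probability p_drop,
    # so units cannot co-adapt. Rescaling by 1/(1 - p_drop) keeps the
    # expected activation unchanged ("inverted dropout", an assumption).
    if not train:
        return h
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)
```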
