Presentation Transcript
Neural Nets Using Backpropagation
  • Chris Marriott
  • Ryan Shirley
  • CJ Baker
  • Thomas Tannahill
Agenda
  • Review of Neural Nets and Backpropagation
  • Backpropagation: The Math
  • Advantages and Disadvantages of Gradient Descent and other algorithms
  • Enhancements of Gradient Descent
  • Other ways of minimizing error
Review
  • Approach that developed from an analysis of the human brain
  • Nodes created as an analog to neurons
  • Mainly used for classification problems (e.g. character recognition, voice recognition, medical applications)
Review
  • Neurons have weighted inputs, threshold values, activation function, and an output
  • [Figure: a single neuron whose weighted inputs are summed and passed through the activation function; output = f(Σ(inputs × weights))]
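The bullet above can be made concrete in a few lines of Python. This is only an illustrative sketch; the function name, weights, and threshold below are not from the slides.

```python
def threshold_unit(inputs, weights, threshold):
    """A neuron with weighted inputs and a step activation: it outputs 1
    if the weighted sum of its inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# A 2-input AND gate: both weights 1, threshold 1.5 (as on the next slide)
print(threshold_unit([1, 1], [1, 1], 1.5))  # 1
print(threshold_unit([1, 0], [1, 1], 1.5))  # 0
```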
Review

  • [Figure: a 4-input AND network; the four inputs are split into two pairs, each pair feeds a threshold unit (threshold = 1.5), those two outputs feed a final threshold unit (threshold = 1.5), all weights = 1, and each unit outputs 1 if active, 0 otherwise]
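A runnable sketch of the 4-input AND network in the figure, assuming the layout described there (two pairs of inputs feed two threshold-1.5 units whose outputs feed a third threshold-1.5 unit); the helper names are invented for the example.

```python
from itertools import product

def unit(inputs, weights, threshold):
    # A threshold unit: outputs 1 if the weighted input sum exceeds the threshold
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0

def and4(x1, x2, x3, x4):
    h1 = unit([x1, x2], [1, 1], 1.5)    # ANDs the first pair of inputs
    h2 = unit([x3, x4], [1, 1], 1.5)    # ANDs the second pair of inputs
    return unit([h1, h2], [1, 1], 1.5)  # ANDs the two intermediate outputs

# Fires only when all four inputs are 1
assert all(and4(*bits) == (1 if all(bits) else 0) for bits in product([0, 1], repeat=4))
```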

Review
  • Output space for AND gate

  • [Figure: the four input points (0,0), (0,1), (1,0), (1,1) plotted on Input 1 / Input 2 axes, with the decision boundary 1.5 = w1*I1 + w2*I2 separating (1,1) from the other three points]

Review
  • Output space for XOR gate
  • Demonstrates need for hidden layer

  • [Figure: the four input points (0,0), (0,1), (1,0), (1,1) plotted on Input 1 / Input 2 axes; no single straight line separates the XOR outputs]
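To make the "need for a hidden layer" point concrete: no single threshold unit can separate the XOR outputs, but a small network with two hidden units can. The particular hidden units below (an OR unit and an AND unit) are one hand-picked solution, not weights given in the presentation.

```python
def unit(inputs, weights, threshold):
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > threshold else 0

def xor(x1, x2):
    h_or  = unit([x1, x2], [1, 1], 0.5)   # hidden unit 1: OR of the inputs
    h_and = unit([x1, x2], [1, 1], 1.5)   # hidden unit 2: AND of the inputs
    # Output fires when OR is active but AND is not (exactly one input is 1)
    return unit([h_or, h_and], [1, -1], 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))   # prints 0, 1, 1, 0
```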

Backpropagation: The Math
  • General multi-layered neural network

  • [Figure: a fully connected network with an input layer (nodes 0 and 1), a hidden layer (nodes 0, 1, …, i) reached through weights W0,0, W1,0, …, Wi,0, and an output layer (nodes 0 through 9) reached through weights X0,0, X1,0, …, X9,0]

Backpropagation: The Math
  • Backpropagation
    • Calculation of hidden layer activation values
Backpropagation: The Math
  • Backpropagation
    • Calculation of output layer activation values
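The activation equations on this slide and the previous one were images and did not survive in the transcript. A common formulation applies a sigmoid (or similar differentiable) activation to the weighted sum of each layer's inputs; the sketch below assumes a sigmoid, and the array names W (input-to-hidden weights) and X (hidden-to-output weights) follow the labels in the earlier network figure.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(inputs, W, X):
    """Forward pass through one hidden layer.
    W[i][j] is the weight from input node i to hidden node j;
    X[j][k] is the weight from hidden node j to output node k."""
    hidden = [sigmoid(sum(inputs[i] * W[i][j] for i in range(len(inputs))))
              for j in range(len(W[0]))]
    output = [sigmoid(sum(hidden[j] * X[j][k] for j in range(len(hidden))))
              for k in range(len(X[0]))]
    return hidden, output

# Tiny example: 2 inputs, 3 hidden nodes, 10 output nodes (as in the figure)
random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
X = [[random.uniform(-1, 1) for _ in range(10)] for _ in range(3)]
hidden, output = forward([0.0, 1.0], W, X)
```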
Backpropagation: The Math
  • Backpropagation
    • Calculation of error

δk = f(Dk) - f(Ok)
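A direct translation of the slide's error definition, assuming f(Dk) and f(Ok) denote the desired and actual output activations at node k. Textbook sigmoid backpropagation usually also multiplies this difference by the activation derivative Ok(1 - Ok); the slide's simpler form is kept here.

```python
def output_errors(desired, actual):
    """Error at each output node k, as written on the slide:
    delta_k = (desired activation at k) - (actual activation at k)."""
    return [d - o for d, o in zip(desired, actual)]

# Example: a one-hot target compared with the network's actual outputs
deltas = output_errors([0, 1, 0], [0.2, 0.7, 0.1])   # approx. [-0.2, 0.3, -0.1]
```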

Backpropagation: The Math
  • Backpropagation
    • Gradient Descent objective function
    • Gradient Descent termination condition
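The objective-function and termination-condition formulas were images and are missing from the transcript. The usual gradient-descent objective for this setup is the summed squared error over the training set, with training stopped once it falls below a chosen tolerance; the sketch below assumes exactly that, and the tolerance value is arbitrary.

```python
def total_error(targets, outputs):
    """Sum-of-squared-errors objective, E = 1/2 * sum over all examples and
    output nodes of (D_k - O_k)^2."""
    return 0.5 * sum((d - o) ** 2
                     for target, output in zip(targets, outputs)
                     for d, o in zip(target, output))

EPSILON = 1e-3   # termination tolerance (an arbitrary choice for the sketch)

# Skeleton of the training loop implied by the termination condition:
# while total_error(targets, network_outputs) > EPSILON:
#     ...run one backpropagation pass and recompute network_outputs...
```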
Backpropagation: The Math
  • Backpropagation
    • Output layer weight recalculation

    • [Equation: each hidden-to-output weight is adjusted using the learning rate η (e.g. 0.25) and the error δk at output node k]
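The update formula itself is missing; the surviving callouts (a learning rate such as 0.25 and the error at k) match the standard delta-rule form, in which each hidden-to-output weight grows by the learning rate times the error at the output node times the activation of the hidden node feeding it. The sketch below assumes that form and reuses the X/hidden naming from the earlier figure.

```python
LEARNING_RATE = 0.25   # the slide's example value

def update_output_weights(X, hidden, deltas, lr=LEARNING_RATE):
    """Delta-rule update for the hidden->output weights:
    X[j][k] is increased by lr * (error at output k) * (activation of hidden j)."""
    for j in range(len(X)):
        for k in range(len(X[j])):
            X[j][k] += lr * deltas[k] * hidden[j]
    return X
```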

Backpropagation: The Math
  • Backpropagation
    • Hidden Layer weight recalculation
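This slide's equation is also missing from the transcript. In standard backpropagation the hidden-layer error is the weighted sum of the output errors, scaled by the derivative of the hidden activation (hj(1 - hj) for a sigmoid), and the input-to-hidden weights are then adjusted as before. A sketch under that assumption:

```python
def update_hidden_weights(W, X, inputs, hidden, output_deltas, lr=0.25):
    """Backpropagate the output errors to the hidden layer and update the
    input->hidden weights W[i][j]."""
    hidden_deltas = [
        hidden[j] * (1 - hidden[j]) *                         # sigmoid derivative
        sum(X[j][k] * output_deltas[k] for k in range(len(output_deltas)))
        for j in range(len(hidden))
    ]
    for i in range(len(W)):
        for j in range(len(W[i])):
            W[i][j] += lr * hidden_deltas[j] * inputs[i]
    return W
```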
Backpropagation Using Gradient Descent
  • Advantages
    • Relatively simple implementation
    • Standard method and generally works well
  • Disadvantages
    • Slow and inefficient
    • Can get stuck in local minima resulting in sub-optimal solutions
Local Minima

  • [Figure: an error curve with a local minimum, where gradient descent can get stuck, and a deeper global minimum]

Alternatives To Gradient Descent
  • Simulated Annealing
    • Advantages
      • Can guarantee the optimal solution (global minimum), given a sufficiently slow cooling schedule
    • Disadvantages
      • May be slower than gradient descent
      • Much more complicated implementation
Alternatives To Gradient Descent
  • Genetic Algorithms/Evolutionary Strategies
    • Advantages
      • Faster than simulated annealing
      • Less likely to get stuck in local minima
    • Disadvantages
      • Slower than gradient descent
      • Memory intensive for large nets
Alternatives To Gradient Descent
  • Simplex Algorithm
    • Advantages
      • Similar to gradient descent but faster
      • Easy to implement
    • Disadvantages
      • Does not guarantee a global minimum
Enhancements To Gradient Descent
  • Momentum
    • Adds a percentage of the last movement to the current movement
Enhancements To Gradient Descent
  • Momentum
    • Useful for getting over small bumps in the error function
    • Often finds a minimum in fewer steps
    • Δw(t) = -η*δ*y + α*Δw(t-1)   (a per-weight sketch follows below)
      • Δw is the change in weight
      • η is the learning rate
      • δ is the error
      • y is different depending on which layer we are calculating
      • α is the momentum parameter
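A per-weight sketch of the momentum rule above, with grad standing for the slide's product δ*y. The momentum value 0.9 is a common default, not a number from the slides.

```python
def momentum_step(grad, prev_delta_w, lr=0.25, alpha=0.9):
    """One momentum update for a single weight: the plain gradient-descent step
    plus a fraction alpha of the previous weight change."""
    return -lr * grad + alpha * prev_delta_w

# Usage: remember the previous change per weight and apply w += delta_w each step
w, prev = 0.1, 0.0
for grad in [0.4, 0.3, -0.1]:        # made-up gradient sequence
    prev = momentum_step(grad, prev)
    w += prev
```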
Enhancements To Gradient Descent
  • Adaptive Backpropagation Algorithm
    • Assigns each weight its own learning rate
    • That learning rate is determined by the sign of the gradient of the error function from the last iteration
      • If the signs are equal, the slope is more likely to be shallow, so the learning rate is increased
      • If the signs differ, the slope is more likely to be steep, so the learning rate is decreased
    • This speeds up the advance on gradual slopes (a per-weight sketch follows below)
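A per-weight sketch of the adaptive idea: the learning rate grows while the gradient keeps its sign and shrinks when the sign flips. The growth and shrink factors (1.2 and 0.5) are illustrative defaults, not values given in the presentation.

```python
def adapt_learning_rate(lr, grad, prev_grad, up=1.2, down=0.5):
    """Grow this weight's learning rate while the gradient keeps its sign
    (likely a shallow slope); shrink it when the sign flips (a steep slope
    or an overshoot)."""
    if grad * prev_grad > 0:
        return lr * up
    if grad * prev_grad < 0:
        return lr * down
    return lr

# Each weight carries its own (learning rate, previous gradient) pair:
lr, prev_grad = 0.25, 0.0
for grad in [0.4, 0.35, -0.2]:       # made-up gradient sequence
    lr = adapt_learning_rate(lr, grad, prev_grad)
    prev_grad = grad
```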
Enhancements To Gradient Descent
  • Adaptive Backpropagation
    • Possible problem:
      • Because the error is minimized for each weight separately, the overall error may increase
    • Solution:
      • Calculate the total output error after each adaptation; if it is greater than the previous error, reject that adaptation and calculate new learning rates (a rollback sketch follows below)
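One way to realize the rejection rule described above, written as a sketch with hypothetical helpers (apply_step and total_error stand in for whatever the surrounding training code provides):

```python
import copy

def guarded_adaptation(weights, apply_step, total_error):
    """Apply one adaptive update, but reject it if the overall error grows.
    apply_step(weights) mutates the weight list in place; total_error(weights)
    returns the summed output error."""
    previous_error = total_error(weights)
    snapshot = copy.deepcopy(weights)
    apply_step(weights)
    if total_error(weights) > previous_error:
        weights[:] = snapshot      # reject the adaptation
        return False               # caller should calculate new learning rates
    return True
```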
Enhancements To Gradient Descent
  • SuperSAB (Super Self-Adapting Backpropagation)
    • Combines the momentum and adaptive methods
    • Uses the adaptive method plus momentum as long as the sign of the gradient does not change
      • The two effects add, resulting in a faster traversal of gradual slopes
    • When the sign of the gradient does change, the momentum term cancels the drastic drop in learning rate
      • This allows the search to roll up the other side of the minimum, possibly escaping local minima (a simplified sketch follows below)
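A simplified per-weight sketch of the SuperSAB combination described above: the learning rate adapts to the gradient sign while a momentum term carries the update across sign changes. This is a reading of the bullet points, not the published algorithm verbatim, and all constants are illustrative.

```python
def supersab_step(w, grad, state, lr_up=1.05, lr_down=0.5, alpha=0.9):
    """One SuperSAB-style update for a single weight.
    state = (learning rate, previous gradient, previous weight change)."""
    lr, prev_grad, prev_delta = state
    if grad * prev_grad > 0:
        lr *= lr_up                # same sign: gradual slope, speed up
    elif grad * prev_grad < 0:
        lr *= lr_down              # sign change: overshot, cut the rate sharply
    delta = -lr * grad + alpha * prev_delta   # momentum softens the slowdown
    return w + delta, (lr, grad, delta)

# Usage: keep one state tuple per weight
w, state = 0.1, (0.25, 0.0, 0.0)
for grad in [0.4, 0.3, -0.2, -0.1]:           # made-up gradient sequence
    w, state = supersab_step(w, grad, state)
```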
Enhancements To Gradient Descent
  • SuperSAB
    • Experiments show that SuperSAB converges faster than plain gradient descent
    • Overall the algorithm is less sensitive to its parameter settings (and so is less likely to get caught in local minima)
Other Ways To Minimize Error
  • Varying training data
    • Cycle through input classes
    • Randomly select from input classes
  • Add noise to training data
    • Randomly change value of input node (with low probability)
  • Retrain with expected inputs after initial training
    • E.g. Speech recognition
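Sketches of the data-presentation ideas above (cycling through the input classes, selecting classes at random, and perturbing input nodes with low probability); retraining with expected inputs is omitted. The class layout, noise probability, and binary inputs are assumptions made for the example.

```python
import random
from itertools import cycle, islice

# Toy mapping from input class to its training examples (layout is an assumption)
classes = {"A": [[0, 1], [0, 0]], "B": [[1, 0], [1, 1]]}

# 1) Cycle through the input classes in a fixed order
ordered_classes = list(islice(cycle(classes), 8))            # A, B, A, B, ...

# 2) Or randomly select the class for each presentation
random_classes = [random.choice(list(classes)) for _ in range(8)]

# 3) Add noise: change each (binary) input node's value with low probability
def add_noise(example, p=0.05):
    return [1 - x if random.random() < p else x for x in example]

noisy_example = add_noise([0, 1, 1, 0])
```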
Other Ways To Minimize Error
  • Adding and removing neurons from layers
    • Adding neurons speeds up learning but may cause loss in generalization
    • Removing neurons has the opposite effect
Resources
  • Artificial Neural Networks, Backpropagation, J. Henseler
  • Artificial Intelligence: A Modern Approach, S. Russell & P. Norvig
  • 501 notes, J.R. Parker
  • www.dontveter.com/bpr/bpr.html
  • www.dse.doc.ic.ac.uk/~nd/surprise_96/journal/vl4/cs11/report.html