
Two Approaches to Backpropagation Learning: Per-Pattern vs Per-Epoch

Learn about the two different approaches to backpropagation learning - per-pattern and per-epoch. Understand the advantages and disadvantages of each approach, and discover the Quickprop and Rprop algorithms that exploit gradient information for more efficient weight updates.



  1. About Assignment #3 • Two approaches to backpropagation learning: • 1. “Per-pattern” learning: update weights after every exemplar presentation. • 2. “Per-epoch” (batch-mode) learning: update weights after every epoch. During the epoch, accumulate the required change for each weight across all exemplars; after the epoch, update each weight using the respective sum. Neural Networks Lecture 17: Self-Organizing Maps
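As an illustration, the two update schemes can be sketched on a toy linear least-squares problem. The data, model, and learning rate below are made up for illustration and are not part of the assignment:

```python
import numpy as np

# Toy data: one-dimensional linear regression with squared error per exemplar.
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = np.array([1.0, 3.0, 5.0, 7.0])   # generated by y = 2x + 1

def grad(w, b, x, y):
    """Gradient of 0.5 * (w*x + b - y)^2 with respect to (w, b)."""
    err = w * x + b - y
    return err * x, err

def per_pattern_epoch(w, b, rng, lr=0.05):
    """Per-pattern learning: update weights after every exemplar, in random order."""
    for i in rng.permutation(len(X)):
        gw, gb = grad(w, b, X[i], Y[i])
        w -= lr * gw
        b -= lr * gb
    return w, b

def per_epoch(w, b, lr=0.05):
    """Per-epoch learning: accumulate gradients over all exemplars, then update once."""
    gw_sum = gb_sum = 0.0
    for i in range(len(X)):
        gw, gb = grad(w, b, X[i], Y[i])
        gw_sum += gw
        gb_sum += gb
    return w - lr * gw_sum, b - lr * gb_sum

rng = np.random.default_rng(0)
w1 = b1 = w2 = b2 = 0.0
for _ in range(200):
    w1, b1 = per_pattern_epoch(w1, b1, rng)
    w2, b2 = per_epoch(w2, b2)
# both runs approach the generating parameters w = 2, b = 1
```

Note how per-pattern learning presents the exemplars in random order, as the next slide recommends, while per-epoch learning sums the per-exemplar gradients before touching any weight.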

  2. About Assignment #3 • Per-pattern learning often approaches near-optimal network error quickly, but may then take longer to reach the error minimum. • During per-pattern learning, it is important to present the exemplars in random order. • Reducing the learning rate between epochs usually leads to better results.

  3. About Assignment #3 • Per-epoch learning involves less frequent weight updates, which makes the initial approach to the error minimum rather slow. • However, per-epoch learning computes the actual network error and its gradient for each weight, so the network can make more informed decisions about weight updates. • Two of the most effective algorithms that exploit this information are Quickprop and Rprop.

  4. The Quickprop Learning Algorithm • The assumption underlying Quickprop is that the network error, as a function of each individual weight, can be approximated by a parabola. • Based on this assumption, whenever we find that the gradient for a given weight has switched its sign between successive epochs, we fit a parabola through these data points and use its minimum as the next weight value.

  5. The Quickprop Learning Algorithm • Illustration: [Figure: the assumed error function, a parabola fitted through the points (w(t−1), E(t−1)) and (w(t), E(t)) using the slopes E′(t−1) and E′(t); the parabola's minimum determines the next weight value w(t+1).]

  6. The Quickprop Learning Algorithm • Newton's method for finding a zero of the error gradient:

w(t+1) = w(t) − E′(t) / E″(t),

where the second derivative is approximated by the difference quotient

E″(t) ≈ (E′(t) − E′(t−1)) / (w(t) − w(t−1)).

  7. The Quickprop Learning Algorithm • For the minimum of E we must have E′(w(t+1)) = 0; with the linear approximation of E′, this yields the Quickprop weight update:

Δw(t) = E′(t) / (E′(t−1) − E′(t)) · Δw(t−1).

  8. The Quickprop Learning Algorithm • Notice that this method cannot be applied if the error gradient has not decreased in magnitude and has not changed its sign at the preceding time step. • In that case, we would ascend the error function or make an infinitely large weight modification. • In most cases, Quickprop converges several times faster than standard backpropagation learning.
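A minimal per-weight sketch of the Quickprop step follows. The growth-factor safeguard mu and the fallback learning rate lr are assumptions in the spirit of Fahlman's original algorithm, not values given on the slides:

```python
def quickprop_step(grad_now, grad_prev, dw_prev, lr=0.1, mu=1.75):
    """One Quickprop update for a single weight.

    grad_now / grad_prev: dE/dw at the current and previous epoch,
    dw_prev: the previous weight change, mu: maximum growth factor
    (a common safeguard; the exact value here is an assumption).
    """
    if dw_prev != 0.0 and grad_prev != grad_now:
        # Parabola minimum from the secant through the two gradients.
        dw = grad_now / (grad_prev - grad_now) * dw_prev
        # Guard against the huge steps the slide warns about:
        # limit the step to mu times the previous step.
        if abs(dw) > mu * abs(dw_prev):
            dw = mu * abs(dw_prev) * (1 if dw > 0 else -1)
    else:
        # First step or degenerate case: plain gradient descent.
        dw = -lr * grad_now
    return dw

# Usage: minimize E(w) = (w - 3)^2, whose gradient is 2*(w - 3).
w, dw_prev, g_prev = 0.0, 0.0, 0.0
for _ in range(20):
    g = 2 * (w - 3)
    dw = quickprop_step(g, g_prev, dw_prev)
    w += dw
    dw_prev, g_prev = dw, g
# w approaches the minimum at 3
```

Because the toy objective really is a parabola, the secant-based fit is exact and the iteration homes in on the minimum after only a few steps.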

  9. Resilient Backpropagation (Rprop) • The Rprop algorithm takes a very different approach to improving backpropagation than Quickprop does. • Instead of making more use of gradient information for better weight updates, Rprop uses only the sign of the gradient, because its magnitude can be a poor and noisy estimator of the required weight updates. • Furthermore, Rprop assumes that different weights need different step sizes for updates, which vary throughout the learning process.

  10. Resilient Backpropagation (Rprop) • The basic idea is that if the error gradient for a given weight wij had the same sign in two consecutive epochs, we increase its step size Δij, because the weight's optimal value may be far away. • If, on the other hand, the sign switched, we decrease the step size. • Weights are always changed by adding or subtracting the current step size, regardless of the absolute value of the gradient. • This way we do not “get stuck” with extreme weights that are hard to change because of the shallow slope of the sigmoid function.

  11. Resilient Backpropagation (Rprop) • Formally, the step-size update rules are:

Δij(t) = η⁺ · Δij(t−1)   if (∂E/∂wij)(t−1) · (∂E/∂wij)(t) > 0
Δij(t) = η⁻ · Δij(t−1)   if (∂E/∂wij)(t−1) · (∂E/∂wij)(t) < 0
Δij(t) = Δij(t−1)        otherwise,

with Δij bounded by Δmin and Δmax. Empirically, the best results were obtained with initial step sizes Δij(0) = 0.1, η⁺ = 1.2, η⁻ = 0.5, Δmax = 50, and Δmin = 10⁻⁶.

  12. Resilient Backpropagation (Rprop) • Weight updates are then performed as follows:

Δwij(t) = −sign((∂E/∂wij)(t)) · Δij(t)

It is important to remember that, as in Quickprop, the gradient in Rprop needs to be computed across all samples (per-epoch learning).
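A minimal sketch of the basic Rprop update (without the weight-backtracking refinement), assuming the standard parameter values η⁺ = 1.2, η⁻ = 0.5, Δmax = 50, Δmin = 10⁻⁶; the toy quadratic objective is an illustrative assumption:

```python
import numpy as np

def rprop_update(grad, state, eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """One Rprop per-epoch update for a weight vector.

    grad: full-batch gradient dE/dw; state holds the per-weight
    step sizes and the previous epoch's gradient.
    """
    step, grad_prev = state["step"], state["grad_prev"]
    sign_change = grad * grad_prev
    # Same sign: grow the step; sign flip: shrink it; zero: keep it.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    # Only the sign of the gradient is used, never its magnitude.
    dw = -np.sign(grad) * step
    state["step"], state["grad_prev"] = step, grad
    return dw

# Usage: minimize E(w) = sum((w - target)^2) with per-epoch gradients.
target = np.array([2.0, -1.0, 0.5])
w = np.zeros(3)
state = {"step": np.full(3, 0.1), "grad_prev": np.zeros(3)}
for _ in range(100):
    grad = 2 * (w - target)     # full-batch gradient of the toy objective
    w += rprop_update(grad, state)
# w approaches [2, -1, 0.5]
```

Each weight here maintains its own step size, which grows while the gradient sign is stable and shrinks after every overshoot, exactly as described on slide 10.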

  13. Resilient Backpropagation (Rprop) • The performance of Rprop is comparable to Quickprop; it also considerably accelerates backpropagation learning. • Compared to both the standard backpropagation algorithm and Quickprop, Rprop has one advantage: • Rprop does not require the user to estimate or empirically determine a step size parameter and its change over time. • Rprop will determine appropriate step size values by itself and can thus be applied “as is” to a variety of problems without significant loss of efficiency.

  14. The Counterpropagation Network • [Figure: CPN with input layer (X1, X2), hidden layer (H1, H2, H3), and output layer (Y1, Y2).] • Let us look at the CPN structure again. • How can this network determine its hidden-layer winner unit? • Additional connections!

  15. The Solution: Maxnet • A maxnet is a recurrent, one-layer network that uses competition to determine which of its nodes has the greatest initial input value. • All pairs of nodes have inhibitory connections with the same weight −ε, where typically ε ≤ 1/(number of nodes). • In addition, each node has a self-excitatory connection to itself, whose weight θ is typically 1. • The nodes update their net input and their output by the following equations:

net_i(t+1) = θ·x_i(t) − ε · Σ_{j≠i} x_j(t)
x_i(t+1) = f(net_i(t+1)), with f(net) = max(0, net).

  16. Maxnet • All nodes update their output simultaneously. • With each iteration, the neurons’ activations will decrease until only one neuron remains active. • This is the “winner” neuron that had the greatest initial input. • Maxnet is a biologically plausible implementation of a maximum-finding function. • In parallel hardware, it can be more efficient than a corresponding serial function. • We can add maxnet connections to the hidden layer of a CPN to find the winner neuron.
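The competition can be sketched directly; this is a minimal implementation assuming the common formulation with output function f(net) = max(0, net), and the initial activations are made-up values:

```python
def maxnet(activations, epsilon=0.2, theta=1.0, max_iters=100):
    """Run Maxnet competition until at most one node stays active.

    Each node excites itself with weight theta and inhibits every
    other node with weight -epsilon; negative net inputs are
    clipped to zero by the output function.
    """
    x = list(activations)
    for _ in range(max_iters):
        total = sum(x)
        # net_i = theta * x_i - epsilon * (sum of the other outputs)
        x = [max(0.0, theta * xi - epsilon * (total - xi)) for xi in x]
        if sum(1 for xi in x if xi > 0) <= 1:
            break               # a single winner remains
    return x

# Usage: five nodes, epsilon = 0.2 (= 1/5, satisfying the bound above).
out = maxnet([0.5, 0.9, 0.3, 0.4, 0.7])
# node 1, which had the greatest initial value (0.9), ends as the only
# node with a positive output
```

All nodes are updated simultaneously from the previous iteration's outputs, matching the synchronous update described on the slide.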

  17. Maxnet Example • Example of a Maxnet with five neurons, θ = 1, and ε = 0.2: [Figure: table of activations over successive iterations; mutual inhibition drives all activations to zero except that of the node with the greatest initial activation (0.9), which remains active and is declared the winner.]

  18. Self-Organizing Maps (Kohonen Maps) As you may remember, the counterpropagation network employs a combination of supervised and unsupervised learning. We will now study Self-Organizing Maps (SOMs) as an example of completely unsupervised learning (Kohonen, 1980). This type of artificial neural network is particularly similar to biological systems (as far as we understand them).

  19. Self-Organizing Maps (Kohonen Maps) In the human cortex, multi-dimensional sensory input spaces (e.g., visual input, tactile input) are represented by two-dimensional maps. The projection from the sensory inputs onto such maps is topology-conserving: neighboring areas in these maps represent neighboring areas in the sensory input space. For example, neighboring areas in the sensory cortex are responsible for the neighboring arm and hand regions.

  20. Self-Organizing Maps (Kohonen Maps) • Such topology-conserving mapping can be achieved by SOMs: • Two layers: input layer and output (map) layer • Input and output layers are completely connected. • Output neurons are interconnected within a defined neighborhood. • A topology (neighborhood relation) is defined on the output layer.

  21. Self-Organizing Maps (Kohonen Maps) • Network structure: [Figure: input vector x = (x1, x2, …, xn) fully connected to the output (map) layer O1, O2, O3, …, Om, which produces the output vector o.]

  22. Self-Organizing Maps (Kohonen Maps) • Common output-layer structures: [Figure: a one-dimensional layout (completely interconnected) and a two-dimensional layout (connections omitted, only neighborhood relations shown in green), each highlighting the neighborhood of a neuron i.]

  23. Self-Organizing Maps (Kohonen Maps) A neighborhood function φ(i, k) indicates how closely neurons i and k in the output layer are connected to each other. Usually, a Gaussian function of the distance between the two neurons in the layer is used:

φ(i, k) = exp(−||p_i − p_k||² / (2σ²)),

where p_i and p_k are the positions of neurons i and k in the map layer and σ determines the width of the neighborhood.
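Such a Gaussian neighborhood function is a one-liner in code; the 2-D map positions and the width σ below are illustrative assumptions:

```python
import numpy as np

def neighborhood(pos_i, pos_k, sigma=1.0):
    """Gaussian neighborhood phi(i, k) based on map-layer positions."""
    d2 = np.sum((np.asarray(pos_i, float) - np.asarray(pos_k, float)) ** 2)
    return float(np.exp(-d2 / (2 * sigma ** 2)))

# The function is maximal (1.0) for the winner itself and decays
# smoothly with map distance.
phi_self = neighborhood([0, 0], [0, 0])
phi_near = neighborhood([0, 0], [1, 0])
phi_far = neighborhood([0, 0], [2, 0])
```

Note that the distance is measured between neuron positions on the map, not between weight vectors in the input space.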

  24. Unsupervised Learning in SOMs For n-dimensional input space and m output neurons:
(1) Choose a random weight vector w_i for each neuron i, i = 1, …, m.
(2) Choose a random input x.
(3) Determine the winner neuron k: ||w_k − x|| = min_i ||w_i − x|| (Euclidean distance).
(4) Update the weight vectors of all neurons i in the neighborhood of neuron k: w_i := w_i + η·φ(i, k)·(x − w_i) (w_i is shifted towards x).
(5) If the convergence criterion is met, STOP. Otherwise, narrow the neighborhood function φ and reduce the learning rate η, then go to (2).
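The five steps above can be sketched as a training loop. The one-dimensional map, the linear decay schedules for η and σ, and the toy data set are assumptions made for illustration:

```python
import numpy as np

def train_som(data, m=10, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a one-dimensional SOM following steps (1)-(5) above."""
    rng = np.random.default_rng(seed)
    w = rng.random((m, data.shape[1]))          # (1) random weight vectors
    pos = np.arange(m, dtype=float)             # neuron positions on the map
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)               # reduce eta ...
        sigma = sigma0 * (1.0 - epoch / epochs) + 0.5   # ... and narrow sigma
        for x in rng.permutation(data):         # (2) random input order
            k = int(np.argmin(np.linalg.norm(w - x, axis=1)))      # (3) winner
            phi = np.exp(-(pos - pos[k]) ** 2 / (2 * sigma ** 2))  # Gaussian
            w += lr * phi[:, None] * (x - w)    # (4) shift weights toward x
    return w                                    # (5) fixed epoch budget here

# Usage: 2-D inputs lying on the line y = x; after training, the weight
# vectors of the 1-D map should lie (approximately) on that line.
data = np.linspace([0.0, 0.0], [1.0, 1.0], 40)
w = train_som(data)
```

A fixed epoch budget stands in for the slide's convergence criterion, which the slides leave unspecified.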
