Topic 3. Learning Rules of the Artificial Neural Networks.

Multilayer Perceptron. • The first layer is the input layer, • and the last layer is the output layer. • All other layers with no direct connections from or to the outside are called hidden layers.

Multilayer Perceptron. • The input is processed and relayed from one layer to the next, until the final result has been computed. • This process represents the feedforward scheme.
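
The feedforward scheme can be illustrated with a minimal sketch. It assumes a logistic activation function and a nested-list layout for the weights; the names `sigmoid`, `feedforward` and `example_layers` are hypothetical and do not come from the slides.

```python
import math


def sigmoid(s):
    """Logistic activation, assumed here purely for illustration."""
    return 1.0 / (1.0 + math.exp(-s))


def feedforward(layers, inputs):
    """Relay a signal layer by layer and return the outputs of every layer.

    layers[k][h][t] is the weight of the connection from unit t of the
    previous layer to unit h of the next layer (assumed layout).
    """
    outputs = [list(inputs)]
    for weights in layers:
        previous = outputs[-1]
        current = [sigmoid(sum(w * x for w, x in zip(row, previous)))
                   for row in weights]
        outputs.append(current)
    return outputs


# Hypothetical 2-2-1 network with hand-picked weights.
example_layers = [
    [[0.5, -0.5], [0.25, 0.75]],  # input layer -> hidden layer
    [[1.0, -1.0]],                # hidden layer -> output layer
]
example_outputs = feedforward(example_layers, [1.0, 0.0])
```

The last element of `example_outputs` is the final result of the network; the earlier elements are the intermediate layer outputs.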

Multilayer Perceptron. • Structural credit assignment problem: when an error is made at the output of a network, how is credit (or blame) to be assigned to neurons deep within the network? • One of the most popular techniques for training the hidden neurons is error backpropagation, • whereby the error of the output units is propagated back to yield estimates of how much a given hidden unit contributed to the output error.

Multilayer Perceptron. • The error function of the multilayer perceptron: the best performance of the network corresponds to the minimum of the total squared error E, and during network training we adjust the weights of connections in order to reach that minimum.
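
As a rough illustration of a total squared error, the sketch below sums the squared differences between desired and actual outputs. The factor 1/2, the definition of the error as desired minus actual output, and the names `total_squared_error`, `desired` and `actual` are assumptions made for this example.

```python
def total_squared_error(targets, outputs):
    """Sum of squared output errors over all training patterns and output units.

    The factor 1/2 is a common convention, assumed here.
    """
    total = 0.0
    for target_row, output_row in zip(targets, outputs):
        for d, x in zip(target_row, output_row):
            e = d - x  # output error: desired minus actual (assumed sign)
            total += 0.5 * e ** 2
    return total


# Hypothetical desired and actual outputs for two training patterns.
desired = [[1.0, 0.0], [0.0, 1.0]]
actual = [[0.5, 0.5], [0.0, 1.0]]
error = total_squared_error(desired, actual)
```

The error is zero only when every actual output equals its desired value, which is why training aims at the minimum of this function.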

Multilayer Perceptron. • The combination of weights, including those of the hidden neurons, that minimises the error function E is considered to be a solution of the multilayer perceptron learning problem.

Multilayer Perceptron. • The backpropagation algorithm looks for the minimum of the multi-variable error function E in the space of weights of connections w using the method of gradient descent.

Multilayer Perceptron. • Following calculus, at a local minimum of a function of two or more variables the gradient is equal to zero: $\partial E / \partial w^{k}_{ht} = 0$ for every weight, where $\partial E / \partial w^{k}_{ht}$ is the partial derivative of the error function E with respect to the weight of connection between the h-th unit in the layer k and the t-th unit in the previous layer number k-1.

Multilayer Perceptron. We would like to go in the direction opposite to the gradient of E to minimise E most rapidly. Therefore, during the iterative process of gradient descent each weight of connection, including the hidden ones, is updated, $w^{k}_{ht} \to w^{k}_{ht} + \Delta w^{k}_{ht}$, using the increment $\Delta w^{k}_{ht} = -C \, \partial E / \partial w^{k}_{ht}$; here C represents the learning rate.
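
The update rule can be tried out on a toy error function before applying it to a network. The sketch below is only an illustration: the quadratic function, its minimum at (1, -2) and the names `gradient_descent` and `toy_gradient` are assumptions, not part of the slides.

```python
def gradient_descent(grad, weights, learning_rate, steps):
    """Repeatedly move each weight against its partial derivative."""
    w = list(weights)
    for _ in range(steps):
        g = grad(w)
        # Increment: minus learning rate times the partial derivative.
        w = [w_i - learning_rate * g_i for w_i, g_i in zip(w, g)]
    return w


def toy_gradient(w):
    """Gradient of the toy function E(w) = (w0 - 1)^2 + (w1 + 2)^2."""
    return [2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)]


w_final = gradient_descent(toy_gradient, [0.0, 0.0], learning_rate=0.1, steps=200)
```

With this learning rate the distance to the minimum shrinks by a constant factor at every step, so `w_final` ends up very close to (1, -2). A learning rate that is too large would make the iteration overshoot instead.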

Multilayer Perceptron. Since calculus-based methods of minimisation rest on the taking of derivatives, their application to network training requires the error function E to be a differentiable function, which requires the network output $X_{jp}$ to be differentiable, which in turn requires the activation functions f(S) to be differentiable. This provides a powerful motivation for using continuous and differentiable activation functions f(w,a).

Multilayer Perceptron. • To make a multilayer perceptron “able to learn”, a useful choice is the generic sigmoid activation function associated with a hidden or output neuron. • The important thing about the generic sigmoid function is that it is differentiable, with a very simple and easy-to-compute derivative.
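
The simplicity of the derivative can be checked numerically. The sketch below assumes the standard logistic form f(S) = 1 / (1 + exp(-S)), for which the derivative is f(S)(1 - f(S)); the slides may use a more general sigmoid, and the function names are hypothetical.

```python
import math


def sigmoid(s):
    """Standard logistic sigmoid (assumed form)."""
    return 1.0 / (1.0 + math.exp(-s))


def sigmoid_derivative(s):
    """Derivative expressed through the function value itself."""
    f = sigmoid(s)
    return f * (1.0 - f)


def numerical_derivative(func, s, h=1e-6):
    """Central finite difference, used only as a cross-check."""
    return (func(s + h) - func(s - h)) / (2.0 * h)
```

Because the derivative reuses the value f(S) already computed in the forward pass, it costs only one extra multiplication per unit.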

Multilayer Perceptron. If all activation functions f(S) in the network are differentiable then, according to the chain rule of calculus, we can differentiate the error function E with respect to the weight of connection in consideration and so express the corresponding partial derivative of the error function.

Multilayer Perceptron. Thus, the correction to the hidden weight of connection between the h-th unit in the k-th layer and the t-th unit in the previous (k-1)-th layer can be found from the learning rule.

Multilayer Perceptron. Learning rule! The correction is defined by: • the output layer errors $e_{jp}$, • the derivatives of the activation functions of all neurons in the upper layers with numbers p > k, • the derivative of the activation function of the neuron h itself in the layer k, • the output (activation function value) of the connected neuron t in the previous layer (k-1).

Multilayer Perceptron. Learning rule! We can easily measure the output errors of the network, and it is up to us to define all the activation functions. If we also know the derivatives of the activation functions, then we can easily find all the corrections to the weights of connections of all neurons in the network, including the hidden ones, during the second run back through the network.
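
The second run back through the network can be sketched for a network of any depth. This is an illustration under stated assumptions, not the slides' own formula: logistic activations, an error function equal to half the sum of squared output errors, output error defined as desired minus actual, and hypothetical names `forward` and `corrections`.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(layers, inputs):
    """Outputs of every layer; layers[k][h][t] links unit t to unit h."""
    outputs = [list(inputs)]
    for weights in layers:
        prev = outputs[-1]
        outputs.append([sigmoid(sum(w * x for w, x in zip(row, prev)))
                        for row in weights])
    return outputs


def corrections(layers, inputs, targets, learning_rate):
    """Weight corrections for one training pattern, computed back to front."""
    outputs = forward(layers, inputs)
    # Output layer: output error times derivative of the activation function.
    deltas = [(d - x) * x * (1.0 - x) for d, x in zip(targets, outputs[-1])]
    result = [None] * len(layers)
    for k in range(len(layers) - 1, -1, -1):
        prev = outputs[k]
        # Correction: learning rate * back-propagated error * output of unit t.
        result[k] = [[learning_rate * delta_h * x_t for x_t in prev]
                     for delta_h in deltas]
        if k > 0:
            # Propagate the error signal one layer further back.
            deltas = [
                sum(layers[k][h][t] * deltas[h] for h in range(len(deltas)))
                * prev[t] * (1.0 - prev[t])
                for t in range(len(prev))
            ]
    return result
```

Each correction combines exactly the ingredients listed above: the output errors, the derivatives of the activation functions in the layers above, the derivative for the unit itself, and the output of the connected unit in the previous layer.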

Multilayer Perceptron Training. The training process of the multilayer perceptron consists of two phases. Initial values of the weights of connections are set up randomly. Then, during the first, feedforward phase, starting from the input layer and proceeding layer by layer, the outputs of every unit in the network are computed together with the corresponding derivatives. In the second, feedback phase, corrections to all weights of connections of all units, including the hidden ones, are computed using the outputs and derivatives computed during the feedforward phase. Figure: Directions of two basic signal flows in multilayer perceptron: forward propagation of function signals and back-propagation of error signals.

Multilayer Perceptron Training. To understand the second, error back-propagation phase of computing corrections to the weights, let us follow an example of a small three-layer perceptron. Suppose that we have found all outputs and corresponding derivatives of activation functions of all computing units in the network, including the hidden ones. Figure: a three-layer perceptron with an input layer (layer number 0), a hidden layer (layer number 1) and an output layer (layer number 2); units are numbered within each layer.

Multilayer Perceptron Training. We shall mark separately the values of the layer in consideration and the values of the layer previous to the one in consideration.

Multilayer Perceptron Training. The weight of connection $w^{2}_{10}$ between unit number 1 (first lower index) in the output layer (layer number 2, shown as the upper index) and unit number 0 (second lower index) in the previous layer (number 1 = 2-1) would receive a correction $\Delta w^{2}_{10}$ after presentation of a training pattern.

Multilayer Perceptron Training. Analogously, the corrections to all six weights of connections between the output layer and the hidden layer are obtained.
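
The six output-layer corrections can be computed in one pass once the errors, derivatives and hidden outputs are known. The sketch below assumes two output units (numbers 1 and 2), three hidden units (numbers 0, 1 and 2), an output error defined as desired minus actual, and made-up numeric values; the function name is hypothetical.

```python
def output_layer_corrections(errors, derivatives, hidden_outputs, learning_rate):
    """Correction for the weight between output unit j and hidden unit t.

    Each correction is the learning rate times the output error of unit j,
    times the derivative of its activation function, times the output of
    hidden unit t.
    """
    return {
        (j, t): learning_rate * errors[j] * derivatives[j] * hidden_outputs[t]
        for j in errors
        for t in hidden_outputs
    }


# Made-up values, keyed by unit number.
hidden_outputs = {0: 1.0, 1: 0.6, 2: 0.2}
errors = {1: 0.5, 2: -0.25}
derivatives = {1: 0.2, 2: 0.1}
output_corrections = output_layer_corrections(
    errors, derivatives, hidden_outputs, learning_rate=1.0)
```

The key `(1, 0)` corresponds to the weight between output unit 1 and hidden unit 0, matching the order of the lower indices used above.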

Multilayer Perceptron Training. Corrections to hidden units connections. We shall mark separately the values of the layer in consideration, the values of the layer previous to the one in consideration, and the values of the layers above the one in consideration.

Multilayer Perceptron Training. Corrections to hidden units connections. The weight of connection $w^{1}_{10}$ between unit number 1 (first lower index) in the hidden layer (layer number 1, shown as the upper index) and unit number 0 (second lower index) in the previous, input layer would receive a correction $\Delta w^{1}_{10}$.

Multilayer Perceptron Training. Corrections to hidden units connections. Analogously, the corrections are obtained for all six weights of connections between the hidden layer and the input layer.
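
For the hidden layer, the output errors first have to be carried back through the output-layer weights. The sketch below assumes hidden units 1 and 2 receive connections from input units 0, 1 and 2, an output error defined as desired minus actual, and made-up numeric values; all names are hypothetical.

```python
def hidden_layer_corrections(errors, out_derivs, w_output, hid_derivs,
                             inputs, learning_rate):
    """Correction for the weight between hidden unit h and input unit t."""
    result = {}
    for h in hid_derivs:
        # Error signal reaching hidden unit h from all output units j.
        back = sum(errors[j] * out_derivs[j] * w_output[(j, h)] for j in errors)
        for t in inputs:
            result[(h, t)] = learning_rate * back * hid_derivs[h] * inputs[t]
    return result


# Made-up values, keyed by unit number.
errors = {1: 0.5, 2: -0.25}
out_derivs = {1: 0.2, 2: 0.1}
w_output = {(1, 1): 1.0, (1, 2): -1.0, (2, 1): 2.0, (2, 2): 0.0}
hid_derivs = {1: 0.25, 2: 0.2}
inputs = {0: 1.0, 1: 0.5, 2: -1.0}
hidden_corrections = hidden_layer_corrections(
    errors, out_derivs, w_output, hid_derivs, inputs, learning_rate=1.0)
```

The only difference from the output layer is the extra sum over the units above, which is where the output error is propagated back.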

Multilayer Perceptron Training. • In this way, going backwards through the network, one obtains the corrections to all weights, then updates the weights. • After that, with the new weights, go forward to get new outputs. • Find the new error, go backwards, and so on. • Hopefully, sooner or later the iterative procedure will arrive at the output with the minimum error, i.e. the absolute minimum of the error function E.
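
The whole iterative procedure can be put together in a compact sketch. It is an illustration under several assumptions that the slides do not state: logistic activations, unit 0 of the input and hidden layers acting as a bias with constant output 1, weights updated after each training pattern, and the logical OR function as a toy training set. All names are hypothetical.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(w_hidden, w_output, pattern):
    x0 = [1.0] + list(pattern)  # unit 0 is a constant bias (assumption)
    x1 = [1.0] + [sigmoid(sum(w * x for w, x in zip(row, x0))) for row in w_hidden]
    x2 = [sigmoid(sum(w * x for w, x in zip(row, x1))) for row in w_output]
    return x0, x1, x2


def total_error(w_hidden, w_output, data):
    return sum(0.5 * (d - x) ** 2
               for pattern, target in data
               for d, x in zip(target, forward(w_hidden, w_output, pattern)[2]))


def train(data, n_hidden=2, learning_rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    # Initial weights are set up randomly.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_output = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                for _ in range(n_out)]
    history = [total_error(w_hidden, w_output, data)]
    for _ in range(epochs):
        for pattern, target in data:
            x0, x1, x2 = forward(w_hidden, w_output, pattern)  # go forward
            # Go backwards: error signals of the output and hidden units.
            delta2 = [(d - x) * x * (1.0 - x) for d, x in zip(target, x2)]
            delta1 = [sum(w_output[j][h + 1] * delta2[j] for j in range(n_out))
                      * x1[h + 1] * (1.0 - x1[h + 1]) for h in range(n_hidden)]
            # Update the weights with the corrections.
            for j in range(n_out):
                for t in range(n_hidden + 1):
                    w_output[j][t] += learning_rate * delta2[j] * x1[t]
            for h in range(n_hidden):
                for t in range(n_in + 1):
                    w_hidden[h][t] += learning_rate * delta1[h] * x0[t]
        history.append(total_error(w_hidden, w_output, data))
    return w_hidden, w_output, history


or_data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
w_hidden, w_output, history = train(or_data)
```

The list `history` records the total error after each pass over the training set, so plotting it shows whether the procedure is still making progress.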

Multilayer Perceptron Training. • Unfortunately, as a function of many variables, the error function might have more than one minimum, and one may arrive not at the absolute minimum but at a relative one. • If this happens, the error function stops decreasing regardless of the number of iterations. • Some measures must be taken to get out of the relative minimum, for example adding small random values, i.e. “noise”, to one or more of the weights. • Then the iterative procedure starts from that new point to get to the absolute minimum eventually.
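
Adding noise to the weights takes only a few lines. The sketch below is one possible way of doing it, assuming uniformly distributed noise and a nested-list weight layout; the name `add_noise` and the `scale` parameter are hypothetical.

```python
import random


def add_noise(weights, scale=0.01, rng=None):
    """Return a copy of the weights with small random values added."""
    rng = rng or random.Random()
    return [[w + rng.uniform(-scale, scale) for w in row] for row in weights]


# Hypothetical weights at which training has stalled.
stuck_weights = [[0.5, -0.5], [1.0, 0.0]]
jolted_weights = add_noise(stuck_weights, scale=0.01, rng=random.Random(1))
```

Training would then be restarted from `jolted_weights`. The scale of the noise is a judgment call: too small and the procedure slides back into the same relative minimum, too large and the progress made so far is lost.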

Multilayer Perceptron Training. • Finally, after successful training, the perceptron is able to produce the desired responses to all input patterns of the training set. • Then all the network weights of connections are fixed, • and the network is presented with inputs it must “recognise”, i.e. inputs that are not in the training set. • If an input in consideration produces an output similar to one of the training set, such an input is said to belong to the same type or cluster of inputs as the corresponding one of the training set.

Multilayer Perceptron Training. • If the network produces an output not similar to any of the training set, then such an input is said not to have been recognised.
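
The slides do not say how similarity between outputs is measured. The sketch below assumes Euclidean distance with a fixed tolerance, which is only one possible choice; the names `recognise`, `reference_outputs` and `tolerance` are hypothetical.

```python
import math


def recognise(output, training_outputs, tolerance=0.2):
    """Return the label of the closest training output, or None if too far."""
    best_label, best_distance = None, float("inf")
    for label, reference in training_outputs.items():
        distance = math.sqrt(sum((o - r) ** 2 for o, r in zip(output, reference)))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label if best_distance <= tolerance else None


# Hypothetical outputs produced for two types of training inputs.
reference_outputs = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
```

For example, `recognise([0.9, 0.1], reference_outputs)` returns `"A"`, while `recognise([0.5, 0.5], reference_outputs)` returns `None`, meaning the input has not been recognised.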

Multilayer Perceptron Training. Conclusion. • In 1969 Minsky and Papert not only found a solution to the XOR problem in the form of a multilayer perceptron; they also gave a very thorough mathematical analysis of the time it takes to train such networks. • Minsky and Papert emphasized that training times increase very rapidly for certain problems as the number of input lines and weights of connections increases.

Multilayer Perceptron Training. Conclusion. • The difficulties were seized upon by opponents of the subject. In particular, this was true of those working in the field of artificial intelligence (AI), who at that time did not want to concern themselves with the underlying “wetware” of the brain, but only with the functional aspects, which they regarded solely as logical processing. • Due to the limitations of funding, competition between the AI and neural network communities could have only one victor.

Multilayer Perceptron Training. Conclusion. • Neural networks then went into relative quietude for more than fifteen years, with only a few devotees still working on them. • Then new vigour came from various sources. One was the increasing power of computers, allowing simulations of otherwise intractable problems.

Multilayer Perceptron Training. Conclusion. • Finally, the backpropagation algorithm, established by the mid-1980s, solved the difficulty of training hidden neurons. • Nowadays, the perceptron is an effective tool for recognising protein and amino-acid sequences and processing other complex biological data.
