
Artificial Intelligence Techniques


Presentation Transcript


  1. Artificial Intelligence Techniques Multilayer Perceptrons

  2. Overview • The multi-layered perceptron • Back-propagation • Introduction to training • Uses

  3. Pattern space - linearly separable [figure: two classes in the (X1, X2) pattern space separated by a single straight line]

  4. Non-linearly separable problems • If a problem is not linearly separable, then it is impossible to divide the pattern space into two regions with a single straight line (a single neuron) • A network of neurons is needed • Until fairly recently, it was not known how to train a multi-layered network

  5. Pattern space - non-linearly separable [figure: the (X1, X2) pattern space with a curved decision surface separating the two classes]

  6. The multi-layered perceptron (MLP) [figure: network diagram showing the input layer, hidden layer and output layer]

  7. Complex decision surface • The MLP can approximate any function using one hidden layer with a sigmoid output function and a linear output layer • A 3-layered network can therefore produce any complex decision surface • However, the number of neurons needed in the hidden layer cannot be calculated in advance - it has to be found by experiment

  8. The multi-layered perceptron (MLP) [figure: the same network diagram - input layer, hidden layer and output layer]

  9. Network architecture • All neurons in one layer are connected to all neurons in the next layer • The network is a feedforward network, so all data flows from the input to the output • The architecture of the network shown is described as 3:4:2 • All neurons in the hidden and output layers have a bias connection
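As an illustrative sketch only (the slides contain no code), a 3:4:2 feedforward pass with bias connections might look like this in Python/NumPy; the weight values are arbitrary placeholders, not taken from the presentation:

```python
import numpy as np

def sigmoid(net):
    """Smooth output function used by the hidden layer (and optionally the output layer)."""
    return 1.0 / (1.0 + np.exp(-net))

# Hypothetical 3:4:2 network: 3 inputs, 4 hidden neurons, 2 outputs.
rng = np.random.default_rng(0)
W_hidden = rng.uniform(-1, 1, size=(4, 3))    # every input connects to every hidden neuron
b_hidden = rng.uniform(-1, 1, size=4)         # bias connection for each hidden neuron
W_output = rng.uniform(-1, 1, size=(2, 4))    # every hidden neuron connects to every output neuron
b_output = rng.uniform(-1, 1, size=2)         # bias connection for each output neuron

def feedforward(x):
    """Data flows from input to output; the input layer does no processing."""
    hidden = sigmoid(W_hidden @ x + b_hidden)         # hidden layer: sigmoid outputs
    return sigmoid(W_output @ hidden + b_output)      # output layer: sigmoid here (could be linear)

print(feedforward(np.array([0.0, 1.0, 0.5])))         # two output values between 0 and 1
```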

  10. Input layer • Receives all of the inputs • Number of neurons equals the number of inputs • Does no processing • Connects to all the neurons in the hidden layer

  11. Hidden layer • Could be more than one layer, but theory says that only one layer is necessary • The number of neurons is found by experiment • Processes the inputs • Connects to all neurons in the output layer • The output is a sigmoid function

  12. Output layer • Produces the final outputs • Processes the outputs from the hidden layer • The number of neurons equals the number of outputs • The output could be linear or sigmoid

  13. Problems with networks • Originally the neurons had a hard-limiter on the output • Although an error could be found between the desired output and the actual output, which could be used to adjust the weights in the output layer, there was no way of knowing how to adjust the weights in the hidden layer

  14. The invention of back-propagation • By introducing a smoothly changing output function, it was possible to calculate an error that could be used to adjust the weights in the hidden layer(s)

  15. Output function [figure: plot of the sigmoid function - y rises smoothly from 0 to 1 as net goes from -5 to 5]

  16. Sigmoid function • The sigmoid function goes smoothly from 0 to 1 as net increases • The value of y when net=0 is 0.5 • When net is negative, y is between 0 and 0.5 • When net is positive, y is between 0.5 and 1.0
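A minimal sketch of the sigmoid output function just described, confirming the values quoted above:

```python
import math

def sigmoid(net):
    """Rises smoothly from 0 to 1 as net increases."""
    return 1.0 / (1.0 + math.exp(-net))

print(sigmoid(0.0))    # 0.5 when net = 0
print(sigmoid(-3.0))   # between 0 and 0.5 for negative net
print(sigmoid(3.0))    # between 0.5 and 1.0 for positive net
```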

  17. Back-propagation • The method of training is called the back-propagation of errors • The algorithm is an extension of the delta rule, called the generalised delta rule

  18. Generalised delta rule • The equation for the generalised delta rule is ΔWi = ηXiδ • δ is defined according to which layer is being considered • For the output layer, δ is y(1-y)(d-y) • For the hidden layer, δ takes a more complex form (covered later)
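A small sketch of the generalised delta rule for an output-layer weight; the numbers in the last line are made up purely to show the calculation:

```python
def output_delta(y, d):
    """Output-layer delta: y(1 - y)(d - y)."""
    return y * (1.0 - y) * (d - y)

def weight_change(eta, x_i, delta):
    """Generalised delta rule: delta-W_i = eta * X_i * delta."""
    return eta * x_i * delta

# Made-up numbers for illustration only.
print(weight_change(eta=0.5, x_i=0.7, delta=output_delta(y=0.6, d=1.0)))   # 0.0336
```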

  19. Pattern recognition • Many problems can be described as pattern recognition • For example, voice recognition, face recognition, optical character recognition

  20. Pattern classification • A more precise definition is pattern classification • In pattern classification a system is shown examples of a number of objects • Each object is given a label or class • The task of the system is to correctly classify objects that it hasn’t seen before

  21. Example of 2-input data

  22. Pattern space

  23. Training a network • The problem could not be implemented on a single layer - it is non-linearly separable • A 3-layer MLP was tried with 4 neurons in the hidden layer, which trained successfully • The number of neurons in the hidden layer was reduced to 2 and it still trained • With 1 neuron in the hidden layer it failed to train

  24. The weights • The weights for the 2 neurons in the hidden layer are (-9, 3.6, 0.1) and (6.1, 2.2, -7.8) • These weights can be shown in the pattern space as two lines • The lines divide the space into 4 regions

  25. The hidden neurons

  26. Training and Testing • Starting with a data set, the first step is to divide the data into a training set and a test set • Use the training set to adjust the weights until the error is acceptably low • Test the network using the test set, and see how many it gets right
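A rough sketch of this train/test procedure, assuming the data is held in NumPy arrays; `train_network` and `predict` are hypothetical placeholders for whatever training routine is actually used:

```python
import numpy as np

def split_data(X, y, train_fraction=0.8, seed=0):
    """Shuffle the data set and divide it into a training set and a test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(train_fraction * len(X))
    return X[order[:n_train]], y[order[:n_train]], X[order[n_train:]], y[order[n_train:]]

# Hypothetical usage (train_network and predict stand in for the actual routines):
# X_train, y_train, X_test, y_test = split_data(X, y)
# weights = train_network(X_train, y_train)                 # adjust weights until error is low
# accuracy = np.mean(predict(weights, X_test) == y_test)    # how many test patterns it gets right
```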

  27. A better approach • Critics of this standard approach have pointed out that training to a low error can sometimes cause “overfitting”, where the network performs well on the training data but poorly on the test data • The alternative is to divide the data into three sets, the extra one being the validation set

  28. Validation set • During training, the training data is used to adjust the weights • At each iteration, the validation data is also passed through the network and its error recorded, but the weights are not adjusted • The training stops when the error for the validation set starts to increase
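A sketch of this early-stopping idea; `train_one_epoch` and `error_on` are hypothetical helpers standing in for one pass of back-propagation and an error measurement:

```python
def train_with_early_stopping(weights, training_set, validation_set, max_epochs=1000):
    """Adjust weights on the training data; stop when the validation error starts to rise."""
    # train_one_epoch and error_on are hypothetical helpers, not defined on the slides.
    best_weights, best_error = weights, float("inf")
    for epoch in range(max_epochs):
        weights = train_one_epoch(weights, training_set)    # weights are adjusted here only
        val_error = error_on(weights, validation_set)       # validation data: error recorded, no adjustment
        if val_error > best_error:
            break                                           # validation error started to increase - stop
        best_weights, best_error = weights, val_error
    return best_weights
```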

  29. Stopping criteria [figure: error against training time for the training set and the validation set - stop where the validation error starts to rise]

  30. Architecture [figure: the network diagram again - input layer, hidden layer and output layer]

  31. Back-propagation • The method of training is called the back-propagation of errors • The algorithm is an extension of the delta rule, called the generalised delta rule

  32. Generalised delta rule • The equation for the generalised delta rule is ΔWi = ηXiδ • δ is defined according to which layer is being considered • For the output layer, δ is y(1-y)(d-y) • For the hidden layer, δ takes a more complex form, as shown next

  33. Hidden Layer • We have to deal with the error from the output layer being fed back to the hidden layer • Let's look at an example: the weight w2(1,2) • This is the weight connecting neuron 1 in the input layer with neuron 2 in the hidden layer

  34. Δw2(1,2)=ηX1(1)δ2(2) • Where • X1(1) is the output of neuron 1 in the input layer • δ2(2) is the error on the output of neuron 2 in the hidden layer • δ2(2)=X2(2)[1-X2(2)]w3(2,1) δ3(1)

  35. δ3(1) = y(1-y)(d-y) = x3(1)[1-x3(1)][d-x3(1)] • So we start with the error at the output and use this result to ripple backwards altering the weights.
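The two δ formulas above translate directly into code; this sketch keeps the slide's notation for a single output neuron:

```python
def delta_output(x3_1, d):
    """delta3(1) = x3(1)[1 - x3(1)][d - x3(1)] - the error at the output neuron."""
    return x3_1 * (1.0 - x3_1) * (d - x3_1)

def delta_hidden(x2_j, w3_j1, delta3_1):
    """delta2(j) = x2(j)[1 - x2(j)] w3(j,1) delta3(1) - the output error rippled backwards."""
    return x2_j * (1.0 - x2_j) * w3_j1 * delta3_1
```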

  36. Example • Exclusive OR using the network shown earlier: 2:2:1 network • Initial weights • W2(0,1)=0.862518 W2(1,1)=-0.155797 W2(2,1)=0.282885 • W2(0,2)=0.834986 w2(1,2)=-0.505997 w2(2,2)=-0.864449 • W3(0,1)=0.036498 w3(1,1)=-0.430437 w3(2,1)=0.48121

  37. Feedforward – hidden layer (neuron 1) • So if • X1(0)=1 (the bias) • X1(1)=0 • X1(2)=0 • The weighted sum inside neuron 1 in the hidden layer = 0.862518 • Then, using the sigmoid function • X2(1)=0.7031864

  38. Feedforward – hidden layer (neuron 2) • So if • X1(0)=1 (the bias) • X1(1)=0 • X1(2)=0 • The weighted sum inside neuron 2 in the hidden layer = 0.834986 • Then, using the sigmoid function • X2(2)=0.6974081

  39. Feedforward – output layer • So if • X2(0)=1 (the bias) • X2(1)=0.7031864 • X2(2)=0.6974081 • The weighted sum inside neuron 1 in the output layer = 0.0694203 • Then, using the sigmoid function • X3(1)=0.5173481 • Desired output=0

  40. δ3(1)=x3(1)[1-x3(1)][d-x3(1)] =-0.1291812 • δ2(1)=X2(1)[1-X2(1)]w3(1,1) δ3(1)=0.0116054 • δ2(2)=X2(2)[1-X2(2)]w3(2,1) δ3(1)=-0.0131183 • Now we can use the delta rule to calculate the change in the weights • ΔWi = ηXiδ

  41. Examples • If we set η=0.5 • ΔW2(0,1) = ηX1(0)δ2(1) = 0.5 x 1 x 0.0116054 = 0.0058027 • ΔW3(2,1) = ηX2(2)δ3(1) = 0.5 x 0.6974081 x -0.1291812 = -0.0450457

  42. What would be the results of the following? • ΔW2(2,1) = ηX1(2)δ2(1) • ΔW2(2,2) = ηX1(2)δ2(2)

  43. ΔW2(2,1) = ηX1(2)δ2(1) = 0.5 x 0 x 0.0116054 = 0 • ΔW2(2,2) = ηX1(2)δ2(2) = 0.5 x 0 x -0.0131183 = 0

  44. New weights • W2(0,1)=0.868321 W2(1,1)=-0.155797 W2(2,1)=0.282885 • W2(0,2)=0.828427 w2(1,2)=-0.505997 w2(2,2)=-0.864449 • W3(0,1)=-0.028093 w3(1,1)=-0.475856 w3(2,1)=0.436164
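As a check, here is a minimal sketch (the code layout is mine, not from the slides) that reproduces this first training step for the 2:2:1 exclusive-OR example, using the initial weights and η = 0.5 given earlier:

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

eta = 0.5
# Weights indexed as on the slides: w<layer>[(from neuron, to neuron)], neuron 0 = bias.
w2 = {(0, 1): 0.862518, (1, 1): -0.155797, (2, 1): 0.282885,
      (0, 2): 0.834986, (1, 2): -0.505997, (2, 2): -0.864449}
w3 = {(0, 1): 0.036498, (1, 1): -0.430437, (2, 1): 0.48121}

# Feedforward for the input pattern (0, 0); desired output d = 0.
x1 = {0: 1.0, 1: 0.0, 2: 0.0}                          # input layer (0 is the bias input)
x2 = {0: 1.0}                                          # hidden layer outputs (0 is the bias)
for j in (1, 2):
    x2[j] = sigmoid(sum(w2[(i, j)] * x1[i] for i in (0, 1, 2)))   # 0.7031864, 0.6974081
x3_1 = sigmoid(sum(w3[(i, 1)] * x2[i] for i in (0, 1, 2)))        # 0.5173481
d = 0.0

# Back-propagate the errors.
delta3_1 = x3_1 * (1 - x3_1) * (d - x3_1)                                   # -0.1291812
delta2 = {j: x2[j] * (1 - x2[j]) * w3[(j, 1)] * delta3_1 for j in (1, 2)}   # 0.0116054, -0.0131183

# Apply the generalised delta rule to every weight.
for (i, j) in w2:
    w2[(i, j)] += eta * x1[i] * delta2[j]
for (i, j) in w3:
    w3[(i, j)] += eta * x2[i] * delta3_1

print(w2[(0, 1)], w2[(0, 2)])               # approx 0.868321 and 0.828427; other w2 weights unchanged
print(w3[(0, 1)], w3[(1, 1)], w3[(2, 1)])   # approx -0.028093, -0.475856, 0.436164
```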

  45. Conclusions • Train using training, test and validation sets • An MLP can be used to recognise (classify) complex data • It uses supervised learning with back-propagation to adjust the weights • The hidden layer divides the pattern space into regions

  46. Conclusions • Extending the delta rule to do back propagation • Need to calculate the error at the outputs of neurones in the hidden and output layers • δ3(1)=x3(1)[1-x3(1)][d-x3(1)] • δ2(1)=X2(1)[1-X2(1)]w3(1,1) δ3(1) • δ2(2)=X2(2)[1-X2(2)]w3(2,1) δ3(1)

  47. Once you have the error values (δ’s) for the neurones you then use the delta rule to calculate the actual change in the weights. • ΔWi = ηXiδ
