Chapter 6: Artificial Neural Networks, Part 2 of 3 (Sections 6.4–6.6)
SCCS451 Artificial Intelligence, Week 12
Asst. Prof. Dr. Sukanya Pongsuparb, Dr. Srisupa Palakvangsa Na Ayudhya, Dr. Benjarath Pupacdi
Agenda: Multilayer Neural Networks; Hopfield Networks
Single-layer vs. Multilayer Neural Networks

Activation functions:
- Sigmoid: output is a real number in the [0, 1] range; popular in back-propagation networks.
- Hard-limit (step and sign): often used for decision-making neurons in classification and pattern recognition.
- Linear: often used for linear approximation.
1. Compute the net weighted input
2. Pass the result to the activation function
Example: neuron j receives input signals (2, 5, 1, 8) through weights (0.1, 0.2, 0.5, 0.3). Let θ = 0.2. (Figure: the four input signals feeding neuron j, which outputs 0.98.)

X = 0.1(2) + 0.2(5) + 0.5(1) + 0.3(8) − 0.2 = 3.9
Y = 1 / (1 + e^(−3.9)) ≈ 0.98
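The two-step computation (net weighted input, then sigmoid activation) can be sketched in Python; the input, weight, and threshold values are taken from the example above:

```python
import math

def neuron_output(inputs, weights, theta):
    """Step 1: compute the net weighted input; step 2: apply the sigmoid."""
    x = sum(w * i for w, i in zip(weights, inputs)) - theta  # net input X
    return 1.0 / (1.0 + math.exp(-x))                        # sigmoid activation Y

inputs = [2, 5, 1, 8]
weights = [0.1, 0.2, 0.5, 0.3]
y = neuron_output(inputs, weights, theta=0.2)
print(round(y, 2))  # 0.98
```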
Error at an output neuron k at iteration p (figure: the same network, with output yk(p) = 0.98 and error signals propagating backward):

ek(p) = yd,k(p) − yk(p)

Suppose the expected output is 1. Then ek(p) = 1 − 0.98 = 0.02.
Step 1: Initialization
Randomly initialize the weights and thresholds θ to numbers within a small range, e.g. (−2.4/Fi, +2.4/Fi), where Fi is the total number of inputs of neuron i. The weight initialization is done on a neuron-by-neuron basis.
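The initialization step can be sketched as follows; this is a minimal sketch that assumes the common (−2.4/Fi, +2.4/Fi) range heuristic:

```python
import random

def init_neuron(num_inputs):
    """Initialize one neuron's weights and threshold uniformly in (-2.4/Fi, +2.4/Fi)."""
    r = 2.4 / num_inputs  # Fi = total number of inputs of this neuron
    weights = [random.uniform(-r, r) for _ in range(num_inputs)]
    theta = random.uniform(-r, r)
    return weights, theta

# Initialization is done on a neuron-by-neuron basis:
weights, theta = init_neuron(4)
print(weights, theta)
```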
Step 2: Activation
Propagate the input signals forward from the input layer to the output layer.
Example (the same neuron as before, θ = 0.2): X = 0.1(2) + 0.2(5) + 0.5(1) + 0.3(8) − 0.2 = 3.9, so Y = 1 / (1 + e^(−3.9)) ≈ 0.98.
Step 3: Weight Training
There are two types of weight training: one for the output-layer weights and one for the hidden-layer weights.
It is important to understand that first the input signals propagate forward, and then the errors propagate backward to help train the weights.
In each iteration (p + 1), the weights are updated based on the weights from the previous iteration p. The signals keep flowing forward and backward until the errors are below some preset threshold value.
Output-layer weight training (figure at iteration p: output neuron k receives y1, ..., ym through weights w1,k, ..., wm,k and produces yk(p); its error is ek(p) = yd,k(p) − yk(p)):

We want to compute the weight correction for each wj,k. The desired output yd,k(p) is predefined and the actual output yk(p) is known from the forward pass, so we know how to compute the error ek(p) and, from it, the error gradient δk. We do the above for each of the weights of the output-layer neurons.
Hidden-layer weight training (figure at iteration p: hidden neuron j receives x1, ..., xn through weights w1,j, ..., wn,j and feeds output neurons k, ..., l):

We want to compute the weight correction for each wi,j. The inputs xi(p) are known, and the error gradients δk propagate back from the output layer, so we know how to compute the hidden-layer gradient δj. We do the above for each of the weights of the hidden-layer neurons.
After the weights are trained in iteration p = 1, we go back to Step 2 (Activation) and compute the outputs for the new weights. If the errors obtained via the use of the updated weights are still above the error threshold, we start weight training for p = 2. Otherwise, we stop.
(Figure: the four XOR input points (0, 0), (0, 1), (1, 0), (1, 1) plotted on the x1–x2 plane.)

XOR is not a linearly separable function. A single-layer ANN such as the perceptron cannot deal with problems that are not linearly separable. We cope with these problems using multilayer neural networks.
Example: training a back-propagation network to perform XOR. Inputs x1 = 1, x2 = 1 (neurons 1 and 2 are non-computing input nodes); hidden neurons 3 and 4; output neuron 5. Initial weights and thresholds: w13 = 0.5, w23 = 0.4, θ3 = 0.8; w14 = 0.9, w24 = 1.0, θ4 = −0.1; w35 = −1.2, w45 = 1.1, θ5 = 0.3. Let α = 0.1. The desired output for input (1, 1) is 0.

y3 = sigmoid(0.5 + 0.4 − 0.8) = sigmoid(0.1) = 0.5250
y4 = sigmoid(0.9 + 1.0 + 0.1) = sigmoid(2.0) = 0.8808
y5 = sigmoid(0.5250(−1.2) + 0.8808(1.1) − 0.3) = sigmoid(−0.63 + 0.9689 − 0.3) = 0.5097
e = 0 − 0.5097 = −0.5097
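The forward pass of this example can be reproduced in Python (all values taken from the example; the variable names are my own):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x1, x2 = 1, 1  # XOR input pattern (1, 1), desired output 0

# Hidden layer (neurons 3 and 4): y = sigmoid(sum of weighted inputs - theta)
y3 = sigmoid(x1 * 0.5 + x2 * 0.4 - 0.8)     # sigmoid(0.1)
y4 = sigmoid(x1 * 0.9 + x2 * 1.0 - (-0.1))  # sigmoid(2.0)

# Output layer (neuron 5)
y5 = sigmoid(y3 * -1.2 + y4 * 1.1 - 0.3)
e = 0 - y5

print(round(y3, 4), round(y4, 4), round(y5, 4))  # 0.525 0.8808 0.5097
```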
Error gradient at the output neuron and update of w3,5:

δ5 = y5 × (1 − y5) × e = 0.5097 × (1 − 0.5097) × (−0.5097) = −0.1274

Δwj,k(p) = α × yj(p) × δk(p)
Δw3,5(1) = 0.1 × 0.5250 × (−0.1274) = −0.0067

wj,k(p + 1) = wj,k(p) + Δwj,k(p)
w3,5(2) = −1.2 + (−0.0067) = −1.2067
Update of w4,5 (δ5 = −0.1274):

Δw4,5(1) = 0.1 × 0.8808 × (−0.1274) = −0.0112
w4,5(2) = 1.1 + (−0.0112) = 1.0888
Update of θ5 (the threshold is trained like a weight on a constant input of −1):

Δθk(p) = α × (−1) × δk(p)
Δθ5(1) = 0.1 × (−1) × (−0.1274) = 0.0127

θ5(p + 1) = θ5(p) + Δθ5(p)
θ5(2) = 0.3 + 0.0127 = 0.3127
Error gradients of the hidden layer and update of w1,3:

δj(p) = yj(p) × (1 − yj(p)) × Σk [δk(p) × wj,k(p)], summed over all output neurons k

δ3(p) = 0.5250 × (1 − 0.5250) × (−0.1274) × (−1.2) = 0.0381

Δwi,j(p) = α × xi(p) × δj(p)
Δw1,3(1) = 0.1 × 1 × 0.0381 = 0.0038

wi,j(p + 1) = wi,j(p) + Δwi,j(p)
w1,3(2) = 0.5 + 0.0038 = 0.5038
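The gradient chain (output-layer gradient first, then the hidden-layer gradient propagated back through the old weight w3,5) can be sketched as follows; values are from the example, names are my own:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

alpha = 0.1

# Forward-pass outputs from the example
y3, y4 = sigmoid(0.1), sigmoid(2.0)
y5 = sigmoid(y3 * -1.2 + y4 * 1.1 - 0.3)
e = 0 - y5

# Output-layer error gradient, then the update of w3,5
delta5 = y5 * (1 - y5) * e            # about -0.1274
w35 = -1.2 + alpha * y3 * delta5      # about -1.2067

# Hidden-layer gradient: delta5 propagates back through the OLD w3,5 = -1.2
delta3 = y3 * (1 - y3) * delta5 * -1.2  # about 0.0381
w13 = 0.5 + alpha * 1 * delta3          # about 0.5038
```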
Similarly for hidden neuron 4 and w1,4:

δ4(p) = 0.8808 × (1 − 0.8808) × (−0.1274) × 1.1 = −0.0147
Δw1,4(1) = 0.1 × 1 × (−0.0147) = −0.0015
w1,4(2) = 0.9 + (−0.0015) = 0.8985
Update of w2,3 (δ3 = 0.0381):

Δw2,3(1) = 0.1 × 1 × 0.0381 = 0.0038
w2,3(2) = 0.4 + 0.0038 = 0.4038
Update of w2,4 (δ4 = −0.0147):

Δw2,4(1) = 0.1 × 1 × (−0.0147) = −0.0015
w2,4(2) = 1.0 + (−0.0015) = 0.9985
Update of θ3 (again trained like a weight on a constant input of −1):

Δθ3(1) = 0.1 × (−1) × 0.0381 = −0.0038
θ3(2) = 0.8 − 0.0038 = 0.7962
Update of θ4:

Δθ4(1) = 0.1 × (−1) × (−0.0147) = 0.0015
θ4(2) = −0.1 + 0.0015 = −0.0985
Now the first iteration (p = 1) is finished. The updated weights and thresholds are: w13 = 0.5038, w23 = 0.4038, θ3 = 0.7962; w14 = 0.8985, w24 = 0.9985, θ4 = −0.0985; w35 = −1.2067, w45 = 1.0888, θ5 = 0.3127. The weight-training process is repeated until the sum of squared errors is less than 0.001 (the preset threshold).
(Figure: the learning curve, showing the ANN's learning speed; 224 epochs, i.e. 896 iterations, were required to bring the sum of squared errors below the threshold.)

(Figure: the final weights and thresholds after training.)

Training again with different initial values may give a different final result. A different result is possible for different initial weights, but any result is acceptable, and the result always satisfies the criterion, so long as the sum of squared errors is below the preset error threshold.
With a sign activation function, the trained network's decision boundaries can be visualized: (a) the decision boundary constructed by hidden neuron 3; (b) the decision boundary constructed by hidden neuron 4; (c) the decision boundaries constructed by the complete three-layer network.
An alternative is the hyperbolic tangent activation function, Y = 2a / (1 + e^(−bX)) − a, where a and b are constants. Suitable values: a = 1.716 and b = 0.667.
Learning can be accelerated by including a momentum term:

Δwj,k(p) = β × Δwj,k(p − 1) + α × yj(p) × δk(p)

where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95. This equation is called the generalized delta rule. With momentum, training for the XOR example is reduced from 224 to 126 epochs.
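The generalized delta rule can be sketched as a one-line update; the numeric values below are illustrative, not from the slides:

```python
alpha, beta = 0.1, 0.95  # learning rate and momentum constant

def delta_w(prev_delta_w, y_j, delta_k):
    """Generalized delta rule: momentum term plus the plain delta-rule term."""
    return beta * prev_delta_w + alpha * y_j * delta_k

# Illustrative values: previous correction 0.01, y_j = 0.5250, delta_k = -0.1274
dw = delta_w(0.01, 0.5250, -0.1274)
print(round(dw, 5))  # 0.00281
```

The momentum term β × Δw(p − 1) keeps the correction moving in the previous direction, which smooths oscillations and speeds up convergence.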
Example: store two states, Y1 and Y2, in a three-neuron Hopfield network:

Y1 = [1, 1, 1]^T,  Y2 = [−1, −1, −1]^T
Y1^T = [1 1 1],  Y2^T = [−1 −1 −1]

The 3 × 3 identity matrix:

I = [1 0 0; 0 1 0; 0 0 1]

The weight matrix is determined as W = Y1 Y1^T + Y2 Y2^T − 2I:

W = [1 1 1; 1 1 1; 1 1 1] + [1 1 1; 1 1 1; 1 1 1] − 2 [1 0 0; 0 1 0; 0 0 1] = [0 2 2; 2 0 2; 2 2 0]
The network is tested with the input vectors: Ym = sign(W Xm − θ), m = 1, 2, …, M, where θ is the threshold matrix. In this case all thresholds are set to zero.
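The weight-matrix construction and the recall test can be sketched in pure Python, with all thresholds zero as in the example (function names are my own):

```python
def hopfield_weights(patterns):
    """W = sum of outer products Ym Ym^T; the diagonal is zeroed, which
    equals subtracting M*I since each stored entry is +1 or -1."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) if i != j else 0
             for j in range(n)] for i in range(n)]

def recall(W, x):
    """One synchronous update: Y = sign(W X), all thresholds zero."""
    return [1 if sum(w_ij * x_j for w_ij, x_j in zip(row, x)) >= 0 else -1
            for row in W]

Y1, Y2 = [1, 1, 1], [-1, -1, -1]
W = hopfield_weights([Y1, Y2])
print(W)              # [[0, 2, 2], [2, 0, 2], [2, 2, 0]]
print(recall(W, Y1))  # [1, 1, 1]    -> Y1 is a stable state
print(recall(W, Y2))  # [-1, -1, -1] -> Y2 is a stable state
```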
Now consider the probe vector X = (+1, +1, −1, +1, +1). It is very similar to X1, but the network recalls it as X3.