
Chapter 6: Artificial Neural Networks Part 2 of 3 (Sections 6.4 – 6.6)

SCCS451 Artificial Intelligence

Week 12

Asst. Prof. Dr. Sukanya Pongsuparb

Dr. Srisupa Palakvangsa Na Ayudhya

Dr. Benjarath Pupacdi


Agenda

  • Multi-layer Neural Network

  • Hopfield Network


Multilayer Neural Networks

  • A multilayer perceptron is a feedforward neural network with ≥ 1 hidden layers.

Single-layer vs. Multi-layer Neural Networks


Roles of Layers

  • Input Layer

    • Accepts input signals from outside world

    • Distributes the signals to neurons in hidden layer

    • Usually does not do any computation

  • Output Layer (computational neurons)

    • Accepts output signals from the previous hidden layer

    • Outputs to the world

    • Knows the desired outputs

  • Hidden Layer (computational neurons)

    • Determines its own desired outputs


Hidden (Middle) Layers

  • Neurons in hidden layers cannot be observed through the inputs and outputs of the network.

  • Their desired outputs are unknown (hidden) from the outside and are determined by the layer itself.

  • One hidden layer is enough to represent continuous functions.

  • Two hidden layers are enough to represent discontinuous functions.

  • Practical applications mostly use three layers (one hidden layer).

  • More layers are possible, but each additional layer increases the computing load exponentially.


How do multilayer neural networks learn?

  • More than a hundred different learning algorithms are available for multilayer ANNs

  • The most popular method is back-propagation.


Back-propagation Algorithm

  • In a back-propagation neural network, the learning algorithm has 2 phases.

    • Forward propagation of inputs

    • Backward propagation of errors

  • The algorithm loops over the 2 phases until the errors obtained are lower than a certain threshold.

  • Learning proceeds in a similar manner to that of a perceptron:

    • A set of training inputs is presented to the network.

    • The network computes the outputs.

    • The weights are adjusted to reduce errors.

  • The activation function used is a sigmoid function.


Common Activation Functions

  • Hard limit functions (step and sign): often used for decision-making neurons in classification and pattern recognition.

  • Sigmoid function: output is a real number in the [0, 1] range; popular in back-propagation networks.

  • Linear function: often used for linear approximation.
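As a small illustrative sketch (not from the slides; the function names are mine), these activation functions can be written in Python as follows:

    import math

    def step(x):        # hard limit: 1 if x >= 0, else 0
        return 1 if x >= 0 else 0

    def sign(x):        # hard limit: +1 if x >= 0, else -1
        return 1 if x >= 0 else -1

    def sigmoid(x):     # output is a real number in the (0, 1) range
        return 1.0 / (1.0 + math.exp(-x))

    def linear(x):      # identity; used for linear approximation
        return x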



How a neuron determines its output

  • Very similar to the perceptron

1. Compute the net weighted input

2. Pass the result to the activation function

[Figure: neuron j receiving inputs 2, 5, 1 and 8 through weights 0.1, 0.2, 0.5 and 0.3; every output signal equals 0.98]

Let θ = 0.2

X = (0.1(2) + 0.2(5) + 0.5(1) + 0.3(8)) – 0.2 = 3.9

Y = 1 / (1 + e^(-3.9)) = 0.98
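A minimal sketch of this computation, assuming the inputs, weights, and threshold shown in the figure:

    import math

    inputs  = [2, 5, 1, 8]
    weights = [0.1, 0.2, 0.5, 0.3]
    theta   = 0.2

    # 1. net weighted input: X = sum(w_i * x_i) - theta
    x = sum(w * xi for w, xi in zip(weights, inputs)) - theta    # 3.9

    # 2. sigmoid activation: Y = 1 / (1 + e^(-X))
    y = 1.0 / (1.0 + math.exp(-x))                               # ~0.98
    print(x, round(y, 2))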


How the errors propagate backward

  • The errors are computed in a similar manner to the errors in the perceptron.

  • Error = The output we want – The output we get

Error at an output neuron k at iteration p:

ek(p) = yd,k(p) – yk(p)

[Figure: output neuron k at iteration p, with inputs 2, 5, 1, 8, weights 0.1, 0.2, 0.5, 0.3, and actual output yk(p) = 0.98; the error signals flow backward]

Suppose the expected output is 1. Then

ek(p) = 1 – 0.98 = 0.02


Back-Propagation Training Algorithm

Step 1: Initialization

Randomly initialize the weights and thresholds θ so that they fall within a small range, e.g. (-2.4 / Fi, +2.4 / Fi),

where Fi is the total number of inputs of neuron i. The weight initialization is done on a neuron-by-neuron basis.
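A minimal sketch of such neuron-by-neuron initialization, assuming the (-2.4/Fi, +2.4/Fi) range mentioned above:

    import random

    def init_neuron(fan_in):
        """Return fan_in random weights and a random threshold for one neuron,
        drawn uniformly from (-2.4/Fi, +2.4/Fi) where Fi = fan_in."""
        r = 2.4 / fan_in
        weights = [random.uniform(-r, r) for _ in range(fan_in)]
        theta = random.uniform(-r, r)
        return weights, theta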


Back-Propagation Training Algorithm

Step 2: Activation

Propagate the input signals forward from the input layer to the output layer.

[Figure: forward propagation through neuron j with inputs 2, 5, 1, 8, weights 0.1, 0.2, 0.5, 0.3, and θ = 0.2; all output signals equal 0.98]

X = (0.1(2) + 0.2(5) + 0.5(1) + 0.3(8)) – 0.2 = 3.9

Y = 1 / (1 + e^(-3.9)) = 0.98


Back-Propagation Training Algorithm

Step 3: Weight Training

There are 2 types of weight training

  • For the output layer neurons

  • For the hidden layer neurons

    ***It is important to understand that first the input signals propagate forward, and then the errors propagate backward to help train the weights. ***

    In each iteration (p + 1), the weights are updated based on the weights from the previous iteration p.

    The signals keep flowing forward and backward until the errors are below some preset threshold value.


3.1 Weight Training (Output layer neurons)

[Figure: output neuron k at iteration p, fed by hidden-layer outputs y1, y2, …, yj, …, ym through weights w1,k, w2,k, …, wj,k, …, wm,k, producing output yk(p)]

  • These formulas are used to perform the weight corrections:

ek(p) = yd,k(p) – yk(p)   (the error; the desired output yd,k is known to the output layer)

δk(p) = yk(p) x (1 – yk(p)) x ek(p)   (δ = error gradient; we know how to compute this)

Δwj,k(p) = α x yj(p) x δk(p)   (weight correction; the learning rate α is predefined)

wj,k(p+1) = wj,k(p) + Δwj,k(p)   (the updated weight, which is what we want to compute)

We do the above for each of the weights of the output layer neurons.
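A small Python sketch of these corrections for one output neuron (a sketch only; the function and argument names are mine):

    def train_output_neuron(w_to_k, theta_k, y_hidden, y_k, y_desired, alpha):
        """One delta-rule update for output neuron k.
        w_to_k[j] is the weight from hidden neuron j to k; y_hidden[j] is yj(p)."""
        e_k = y_desired - y_k                            # ek(p)
        delta_k = y_k * (1.0 - y_k) * e_k                # error gradient (sigmoid neuron)
        new_w = [w + alpha * y_j * delta_k
                 for w, y_j in zip(w_to_k, y_hidden)]    # wj,k(p+1)
        new_theta = theta_k + alpha * (-1.0) * delta_k   # threshold treated as a weight on a fixed -1 input
        return new_w, new_theta, delta_k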


3.2 Weight Training (Hidden layer neurons)

[Figure: hidden neuron j at iteration p, fed by inputs x1, x2, …, xi, …, xn through weights w1,j, w2,j, …, wi,j, …, wn,j; its output feeds the output-layer neurons k, …, l, whose error gradients propagate back from the output layer]

  • These formulas are used to perform the weight corrections:

δj(p) = yj(p) x (1 – yj(p)) x Σk [δk(p) x wj,k(p)]   (error gradient; the δk(p) terms propagate back from the output layer)

Δwi,j(p) = α x xi(p) x δj(p)   (weight correction; the input xi and the learning rate α are known)

wi,j(p+1) = wi,j(p) + Δwi,j(p)   (the updated weight, which is what we want to compute)

We do the above for each of the weights of the hidden layer neurons.
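A matching Python sketch for one hidden neuron (again, the names are mine):

    def train_hidden_neuron(w_to_j, theta_j, x_inputs, y_j, deltas_k, w_j_to_k, alpha):
        """One update for hidden neuron j.
        deltas_k[k] and w_j_to_k[k] are the error gradient of output neuron k
        and the (pre-update) weight from j to k."""
        delta_j = y_j * (1.0 - y_j) * sum(
            d * w for d, w in zip(deltas_k, w_j_to_k))           # error gradient of j
        new_w = [w + alpha * x_i * delta_j
                 for w, x_i in zip(w_to_j, x_inputs)]            # wi,j(p+1)
        new_theta = theta_j + alpha * (-1.0) * delta_j
        return new_w, new_theta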



p = 1

[Figure: the network with all weights trained for iteration p = 1]

After the weights are trained in p = 1, we go back to Step 2 (Activation) and compute the outputs with the new weights.

If the errors obtained with the updated weights are still above the error threshold, we start weight training for p = 2. Otherwise, we stop.

p = 2

[Figure: the network with all weights trained for iteration p = 2]


Example: 3-layer ANN for XOR

[Figure: the four XOR input points (0, 0), (0, 1), (1, 0) and (1, 1) plotted on the x1/x2 (input1/input2) plane]

XOR is not a linearly separable function.

A single-layer ANN, i.e. the perceptron, cannot deal with problems that are not linearly separable. We cope with such problems using multi-layer neural networks.


Example: 3-layer ANN for XOR

Let α = 0.1

[Figure: the 3-layer XOR network with its initial weights; the input layer is non-computing]

Example: 3-layer ANN for XOR

  • Training set: x1 = x2 = 1 and yd,5 = 0

Let α = 0.1

[Figure: the XOR network with initial weights w1,3 = 0.5, w2,3 = 0.4, w1,4 = 0.9, w2,4 = 1.0, w3,5 = -1.2, w4,5 = 1.1 and thresholds θ3 = 0.8, θ4 = -0.1, θ5 = 0.3]

Calculate y3 = sigmoid(0.5 + 0.4 – 0.8) = 0.5250

Calculate y4 = sigmoid(0.9 + 1.0 + 0.1) = 0.8808

Calculate y5 = sigmoid(-0.63 + 0.9689 – 0.3) = 0.5097

e = 0 – 0.5097 = –0.5097

Example: 3-layer ANN for XOR (2)

  • Back-propagation of error (p = 1, output layer)

Let α = 0.1

δ5 = y5 x (1 – y5) x e = 0.5097 x (1 – 0.5097) x (-0.5097) = -0.1274

Δwj,k(p) = α x yj(p) x δk(p)

Δw3,5 (1) = 0.1 x 0.5250 x (-0.1274) = -0.0067

wj,k(p+1) = wj,k(p) + Δwj,k(p)

w3,5 (2) = -1.2 – 0.0067 = -1.2067

[Figure: the network with w3,5 updated from -1.2 to -1.2067]
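Again, a quick numerical check in Python:

    y5, e, alpha = 0.5097, -0.5097, 0.1
    delta5 = y5 * (1 - y5) * e            # -0.1274  (error gradient at neuron 5)
    dw35   = alpha * 0.5250 * delta5      # -0.0067  (correction for w3,5)
    w35    = -1.2 + dw35                  # -1.2067  (updated weight)
    print(round(delta5, 4), round(dw35, 4), round(w35, 4))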


Example: 3-layer ANN for XOR (3)

  • Back-propagation of error (p = 1, output layer)

Let α = 0.1

δ5 = -0.1274

Δwj,k(p) = α x yj(p) x δk(p)

Δw4,5 (1) = 0.1 x 0.8808 x (-0.1274) = -0.0112

wj,k(p+1) = wj,k(p) + Δwj,k(p)

w4,5 (2) = 1.1 – 0.0112 = 1.0888

[Figure: the network with w4,5 updated from 1.1 to 1.0888]


Example: 3-layer ANN for XOR (4)

  • Back-propagation of error (p = 1, output layer)

Let α = 0.1

δ5 = -0.1274

Δθk(p) = α x y(p) x δk(p), where y(p) = -1 is the fixed threshold input

Δθ5 (1) = 0.1 x (-1) x (-0.1274) = 0.0127

θ5 (p+1) = θ5 (p) + Δθ5 (p)

θ5 (2) = 0.3 + 0.0127 = 0.3127

[Figure: the network with θ5 updated from 0.3 to 0.3127]


Example: 3-layer ANN for XOR (5)

  • Back-propagation of error (p = 1, hidden layer)

Let α = 0.1

δj(p) = yj(p) x (1 – yj(p)) x Σk [δk(p) x wj,k(p)]

δ3(p) = 0.525 x (1 – 0.525) x (-0.1274 x (-1.2)) = 0.0381

Δwi,j(p) = α x xi(p) x δj(p)

Δw1,3 (1) = 0.1 x 1 x 0.0381 = 0.0038

wi,j(p+1) = wi,j(p) + Δwi,j(p)

w1,3 (2) = 0.5 + 0.0038 = 0.5038

[Figure: the network with w1,3 updated from 0.5 to 0.5038]


Example: 3-layer ANN for XOR (6)

  • Back-propagation of error (p = 1, hidden layer)

Let α = 0.1

δ4(p) = 0.8808 x (1 – 0.8808) x (-0.1274 x 1.1) = -0.0147

Δwi,j(p) = α x xi(p) x δj(p)

Δw1,4 (1) = 0.1 x 1 x (-0.0147) = -0.0015

wi,j(p+1) = wi,j(p) + Δwi,j(p)

w1,4 (2) = 0.9 – 0.0015 = 0.8985

[Figure: the network with w1,4 updated from 0.9 to 0.8985]


Example: 3-layer ANN for XOR (7)

  • Back-propagation of error (p = 1, hidden layer)

Let α = 0.1

δ3(p) = 0.0381, δ4(p) = -0.0147

Δwi,j(p) = α x xi(p) x δj(p)

Δw2,3 (1) = 0.1 x 1 x 0.0381 = 0.0038

wi,j(p+1) = wi,j(p) + Δwi,j(p)

w2,3 (2) = 0.4 + 0.0038 = 0.4038

[Figure: the network with w2,3 updated from 0.4 to 0.4038]


Example: 3-layer ANN for XOR (8)

  • Back-propagation of error (p = 1, hidden layer)

Let α = 0.1

δ3(p) = 0.0381, δ4(p) = -0.0147

Δwi,j(p) = α x xi(p) x δj(p)

Δw2,4 (1) = 0.1 x 1 x (-0.0147) = -0.0015

wi,j(p+1) = wi,j(p) + Δwi,j(p)

w2,4 (2) = 1.0 – 0.0015 = 0.9985

[Figure: the network with w2,4 updated from 1.0 to 0.9985]


Example: 3-layer ANN for XOR (9)

  • Back-propagation of error (p = 1, hidden layer)

Let α = 0.1

δ3(p) = 0.0381, δ4(p) = -0.0147

Δθk(p) = α x y(p) x δk(p), where y(p) = -1 is the fixed threshold input

Δθ3 (1) = 0.1 x (-1) x 0.0381 = -0.0038

θ3 (p+1) = θ3 (p) + Δθ3 (p)

θ3 (2) = 0.8 – 0.0038 = 0.7962

[Figure: the network with θ3 updated from 0.8 to 0.7962]


Example: 3-layer ANN for XOR (10)

  • Back-propagation of error (p = 1, hidden layer)

Let α = 0.1

δ3(p) = 0.0381, δ4(p) = -0.0147

Δθk(p) = α x y(p) x δk(p), where y(p) = -1 is the fixed threshold input

Δθ4 (1) = 0.1 x (-1) x (-0.0147) = 0.0015

θ4 (p+1) = θ4 (p) + Δθ4 (p)

θ4 (2) = -0.1 + 0.0015 = -0.0985

[Figure: the network with θ4 updated from -0.1 to -0.0985]


Example: 3-layer ANN for XOR

α = 0.1

[Figure: the network after the first iteration, with weights updated from their initial values: w1,3 = 0.5 → 0.5038, w2,3 = 0.4 → 0.4038, w1,4 = 0.9 → 0.8985, w2,4 = 1.0 → 0.9985, w3,5 = -1.2 → -1.2067, w4,5 = 1.1 → 1.0888, θ3 = 0.8 → 0.7962, θ4 = -0.1 → -0.0985, θ5 = 0.3 → 0.3127]

Now the 1st iteration (p = 1) is finished. The weight training process is repeated until the sum of squared errors is less than 0.001 (the preset error threshold).
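The whole procedure above generalizes to the short training script below. This is a sketch only: it assumes the same 2-2-1 topology, sigmoid neurons, α = 0.1, and the example's initial weights, with online updates over the four XOR patterns; it should converge in a few hundred epochs, in line with the 224 epochs reported on the next slide.

    import math

    sig = lambda x: 1.0 / (1.0 + math.exp(-x))

    # initial weights and thresholds from the example
    w13, w23, th3 = 0.5, 0.4, 0.8
    w14, w24, th4 = 0.9, 1.0, -0.1
    w35, w45, th5 = -1.2, 1.1, 0.3
    alpha = 0.1

    xor_set = [((1, 1), 0), ((0, 1), 1), ((1, 0), 1), ((0, 0), 0)]

    epoch, sse = 0, 1.0
    while sse >= 0.001:                       # stop when the sum of squared errors < 0.001
        epoch += 1
        sse = 0.0
        for (x1, x2), yd in xor_set:
            # forward propagation
            y3 = sig(w13 * x1 + w23 * x2 - th3)
            y4 = sig(w14 * x1 + w24 * x2 - th4)
            y5 = sig(w35 * y3 + w45 * y4 - th5)
            e = yd - y5
            sse += e * e
            # backward propagation of errors (hidden deltas use the pre-update output weights)
            d5 = y5 * (1 - y5) * e
            d3 = y3 * (1 - y3) * d5 * w35
            d4 = y4 * (1 - y4) * d5 * w45
            # weight and threshold corrections
            w35 += alpha * y3 * d5; w45 += alpha * y4 * d5; th5 += alpha * -1 * d5
            w13 += alpha * x1 * d3; w23 += alpha * x2 * d3; th3 += alpha * -1 * d3
            w14 += alpha * x1 * d4; w24 += alpha * x2 * d4; th4 += alpha * -1 * d4

    print(epoch, round(sse, 4))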


Learning Curve for XOR

[Figure: sum of squared errors vs. epoch for the XOR training run]

The curve shows the learning speed of the ANN: 224 epochs, or 896 iterations, were required.


Final Results

[Figure: the final network; reading the updated values against the initial ones gives approximately w1,3 = 4.7, w2,3 = 4.8, w1,4 = 6.4, w2,4 = 6.4, w3,5 = -10.4, w4,5 = 9.8, θ3 = 7.3, θ4 = 2.8, θ5 = 4.6]

Training again with different initial values may produce a different set of weights. Any result is acceptable so long as the sum of squared errors ends up below the preset error threshold.


Final Results

A different result is possible for different initial values, but the result always satisfies the error criterion.


McCulloch-Pitts Model: XOR Operation

Activation function: sign function


Decision Boundary

[Figure: (a) decision boundary constructed by hidden neuron 3; (b) decision boundary constructed by hidden neuron 4; (c) decision boundaries constructed by the complete three-layer network]


Problems of Back-Propagation

  • Not similar to the process of a biological neuron

  • Heavy computing load


Accelerated Learning in Multi-layer NN (1)

  • Replace the sigmoid function with a hyperbolic tangent:

Ytanh = 2a / (1 + e^(-bX)) – a

where a and b are constants. Suitable values: a = 1.716 and b = 0.667.
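A sketch of this activation in Python (the formula above is the standard hyperbolic-tangent form used with these constants; treat the exact shape as an assumption, since the slide's equation image is missing):

    import math

    A, B = 1.716, 0.667

    def tanh_act(x, a=A, b=B):
        """Hyperbolic-tangent style activation: Y = 2a / (1 + e^(-b*x)) - a.
        Output ranges over (-a, +a) instead of (0, 1)."""
        return 2.0 * a / (1.0 + math.exp(-b * x)) - a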


Accelerated Learning in Multi-layer NN (2)

  • Include a momentum term in the delta rule:

Δwj,k(p) = β x Δwj,k(p – 1) + α x yj(p) x δk(p)

where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95.

This equation is called the generalized delta rule.
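A sketch of the generalized delta rule for a single weight (the names are mine):

    def delta_w_with_momentum(prev_delta_w, alpha, y_j, delta_k, beta=0.95):
        """Generalized delta rule for one weight:
        delta_w(p) = beta * delta_w(p-1) + alpha * yj(p) * delta_k(p)."""
        return beta * prev_delta_w + alpha * y_j * delta_k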


Learning with Momentum

Reduced from 224 to 126 epochs


Accelerated Learning in Multi-layer NN (3)

  • Adaptive learning rate: the idea

    • small α → smooth learning curve

    • large α → fast learning, but possibly unstable

  • Heuristic rule (see the sketch below):

    • Increase the learning rate when the change of the sum of squared errors has the same algebraic sign for several consecutive epochs.

    • Decrease the learning rate when the sign alternates for several consecutive epochs.
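One possible sketch of this heuristic in Python (the adjustment factors 1.05 and 0.7 are illustrative assumptions, not values from the slides):

    def adapt_learning_rate(alpha, sse_history):
        """Increase alpha when the change of the sum of squared errors kept the same
        algebraic sign over the last epochs; decrease it when the sign alternated."""
        if len(sse_history) < 3:
            return alpha
        c1 = sse_history[-2] - sse_history[-3]
        c2 = sse_history[-1] - sse_history[-2]
        if c1 * c2 > 0:          # same sign for consecutive epochs
            return alpha * 1.05
        if c1 * c2 < 0:          # alternating sign
            return alpha * 0.7
        return alpha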


Effect of Adaptive Learning Rate


Momentum + Adaptive Learning Rate


The Hopfield Network

  • Neural networks were designed on an analogy with the brain, which has associative memory.

  • We can recognize a familiar face in an unfamiliar environment. Our brain can recognize certain patterns even though some information about the patterns differs from what we remember.

  • Multilayer ANNs are not intrinsically intelligent.

  • Recurrent Neural Networks (RNNs) are used to emulate human associative memory.

  • The Hopfield network is an RNN.


The Hopfield Network: Goal

  • To recognize a pattern even if some parts are not the same as what it was trained to remember.

  • The Hopfield network is a single-layer network.

  • It is recurrent. The network outputs are calculated and then fed back to adjust the inputs. The process continues until the outputs become constant.

  • Let’s see how it works.


Single-layer n-neuron Hopfield Network

[Figure: a single-layer n-neuron Hopfield network; the output signals are fed back as the input signals]


Activation Function

  • If the neuron’s weighted input is greater than zero, the output is +1.

  • If the neuron’s weighted input is less than zero, the output is -1.

  • If the neuron’s weighted input is exactly zero, the output remains in its previous state (see the sketch below).
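A minimal sketch of this rule:

    def hopfield_sign(weighted_input, previous_output):
        """Sign activation that keeps the previous state when the input is exactly zero."""
        if weighted_input > 0:
            return 1
        if weighted_input < 0:
            return -1
        return previous_output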


Hopfield Network Current State

  • The current state of the network is determined by the current outputs, i.e. the state vector.


What can it recognize?

  • n = the number of neurons = the number of inputs

  • Each input can be +1 or -1

  • There are 2^n possible input/output sets, i.e. patterns.

  • M = the total number of patterns the network was trained with, i.e. the total number of patterns we want the network to be able to recognize.


Example: n = 3, 2^3 = 8 possible states


Weights

  • Weights between neurons are usually represented in matrix form.

  • For example, let's train the 3-neuron network to recognize the following 2 patterns (M = 2, n = 3), written as column vectors:

Y1 = [1, 1, 1]T   and   Y2 = [-1, -1, -1]T

  • Once the weights are calculated, they remain fixed.


Weights (2)

  • M = 2

  • Thus we can determine the weight matrix as follows:

W = Y1 Y1T + Y2 Y2T – 2I

where I is the 3 x 3 identity matrix. This gives

W = | 0  2  2 |
    | 2  0  2 |
    | 2  2  0 |
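The weight matrix can be reproduced with a short Python sketch (the function name is mine):

    def hopfield_weights(patterns):
        """W = sum over the M stored patterns of Y Y^T, minus M times the identity."""
        n, m = len(patterns[0]), len(patterns)
        W = [[sum(y[i] * y[j] for y in patterns) for j in range(n)] for i in range(n)]
        for i in range(n):
            W[i][i] -= m          # subtract M on the diagonal, i.e. W = sum(Y Y^T) - M*I
        return W

    print(hopfield_weights([(1, 1, 1), (-1, -1, -1)]))
    # [[0, 2, 2], [2, 0, 2], [2, 2, 0]]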


How is the Hopfield network tested?

  • Given an input vector X, we calculate the output in a similar manner to what we have seen before:

Ym = sign(W Xm – θ),  m = 1, 2, …, M

where θ is the threshold vector. In this case all thresholds are set to zero.
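A sketch of the test using the activation rule above; as a simplification this sketch updates all neurons at once (synchronously) and uses zero thresholds:

    def recall(W, x, max_steps=10):
        """Repeatedly apply y = sign(W y) until the state stops changing."""
        y = list(x)
        for _ in range(max_steps):
            new_y = []
            for i, row in enumerate(W):
                s = sum(w * yj for w, yj in zip(row, y))   # weighted input of neuron i
                new_y.append(1 if s > 0 else -1 if s < 0 else y[i])
            if new_y == y:
                break
            y = new_y
        return y

    W = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]
    print(recall(W, (1, 1, 1)))       # [1, 1, 1]    -> stable (fundamental memory)
    print(recall(W, (-1, -1, -1)))    # [-1, -1, -1] -> stable (fundamental memory)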


Stable States

  • As we see, Y1 = X1 and Y2 = X2. Thus both states are said to be stable (also called fundamental states).


Unstable States

  • With 3 neurons in the network, there are 8 possible states. The remaining 6 states are unstable.


Error Correction Network

  • Each of the unstable states represents a single error, compared to the fundamental memory.

  • The Hopfield network can act as an error correction network.


The Hopfield Network

  • The Hopfield network can store a set of fundamental memories.

  • The Hopfield network can recall those fundamental memories when presented with inputs that may be exactly those memories or slightly different.

  • However, it may not always recall correctly.

  • Let’s see an example.


Ex: When the Hopfield Network Cannot Recall

  • X1 = (+1, +1, +1, +1, +1)

  • X2 = (+1, -1, +1, -1, +1)

  • X3 = (-1, +1, -1, +1, -1)

  • Let the probe vector be

    X = (+1, +1, -1, +1, +1)

    It is very similar to X1, but the network recalls it as X3.

  • This is a problem with the Hopfield Network


Storage Capacity of the Hopfield Network

  • Storage capacity is the largest number of fundamental memories that can be stored and retrieved correctly.

  • The maximum number of fundamental memories Mmax that can be stored in the n-neuron recurrent network is limited to roughly Mmax = 0.15 n.

