One-layer neural networks: Approximation problems


One-layer neural networks. Approximation problems
• Approximation problems
• Architecture and functioning (ADALINE, MADALINE)
• Learning based on error minimization
• The gradient algorithm
• Widrow-Hoff and “delta” algorithms

Neural Networks - lecture 4

Approximation problems
• Approximation (regression):
• Problem: estimate a functional dependence between two variables
• The training set contains pairs of corresponding values

Linear approximation

Nonlinear approximation
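The difference between the two regimes can be sketched by fitting a line versus a low-degree polynomial to the same training pairs. The data below is a hypothetical training set, invented for illustration; the point is only that a nonlinear fit captures curvature a linear fit cannot.

```python
import numpy as np

# Hypothetical training set: pairs (x_l, d_l) sampled from a nonlinear dependence.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
d = np.sin(3.0 * x) + 0.05 * rng.standard_normal(50)

# Linear approximation: fit d ~ w1*x + w0 by least squares.
linear_fit = np.polyfit(x, d, deg=1)

# Nonlinear approximation: fit a cubic polynomial instead.
cubic_fit = np.polyfit(x, d, deg=3)

# The cubic captures the curvature, so its residual error is smaller.
err_lin = np.mean((np.polyval(linear_fit, x) - d) ** 2)
err_cub = np.mean((np.polyval(cubic_fit, x) - d) ** 2)
print(err_lin, err_cub)
```

Since the cubic model contains the linear one as a special case, its least-squares residual can never be larger.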


Architecture
• One layer NN = one layer of input units and one layer of functional units

[Architecture diagram: the input vector X feeds N input units; a fictive unit supplies the constant input −1 (the bias); total connectivity through the weight matrix W links the inputs to M functional (output) units, which produce the output vector Y.]

Functioning
• Computing the output signal:
• Usually the activation function is linear
• Examples:
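The formulas for the output computation were lost in transcription, but the standard forward pass for this architecture (a weighted sum over the inputs plus the fictive −1 unit, passed through the activation function) can be sketched as follows; the shapes and the helper name `forward` are assumptions for illustration.

```python
import numpy as np

# One-layer network forward pass: W is M x (N+1), the last column being the
# weight of the fictive unit whose input is always -1.
def forward(W, x, f=lambda a: a):      # f is linear (identity) by default
    x_ext = np.append(x, -1.0)         # append the fictive unit's value -1
    return f(W @ x_ext)                # y_i = f(sum_j w_ij * x_j - w_i,N+1)

W = np.array([[1.0, 2.0, 0.5]])        # M=1 output unit, N=2 inputs + bias weight
y = forward(W, np.array([3.0, 4.0]))
print(y)                               # 1*3 + 2*4 - 0.5 = 10.5
```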


Learning based on error minimization

Training set: {(X1,d1),…,(XL,dL)},

Xl - vector from RN, dl – vector from RM

Error function: measure of the “distance” between the output produced by the network and the desired output

Notations:
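The notation on this slide did not survive transcription; a standard choice consistent with the rest of the lecture (sum of squared errors over the L examples and M output units) is:

```latex
E(W) = \frac{1}{2} \sum_{l=1}^{L} \sum_{i=1}^{M} \left( d_i^l - y_i^l \right)^2,
\qquad
y_i^l = f\!\left( \sum_{j=1}^{N} w_{ij} x_j^l - w_{i,N+1} \right)
```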


Learning based on error minimization

= find W which minimizes E(W)

Variants:

• In the case of linear activation functions W can be computed by using tools from linear algebra
• In the case of nonlinear functions the minimum can be estimated by using a numerical method
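For the linear case, minimizing E(W) is an ordinary least-squares problem, so W can be obtained directly with linear-algebra routines. A minimal sketch (the training set below is an invented example where d = x1 + x2):

```python
import numpy as np

# L=3 examples with N=2 inputs; desired outputs follow d = x1 + x2 exactly.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
D = np.array([[1.0], [1.0], [2.0]])                # M=1 output unit

X_ext = np.hstack([X, -np.ones((3, 1))])           # fictive input -1 per example
W, *_ = np.linalg.lstsq(X_ext, D, rcond=None)      # solves min ||X_ext W - D||^2
print(W.ravel())                                   # recovers weights [1, 1, 0]
```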


Learning based on error minimization

First variant. Particular case:

M=1 (one output unit with linear activation function)

L=1 (one example)


Learning based on error minimization

First variant:


Learning based on error minimization

Second variant: use of a numerical minimization method

• It is an iterative method based on the fact that the gradient of a function points in the direction in which the function increases
• To estimate the minimum of a function, the current position is moved in the direction opposite to the gradient


Learning based on error minimization

[Figure: gradient descent on a one-dimensional function. Where f’(x) > 0 the step goes left and where f’(x) < 0 it goes right — always in the direction opposite to f’(x); the successive iterates x0, x1, …, xk-1 approach the minimum.]


Learning based on error minimization

Algorithm to minimize E(W) based on the gradient method:

• Initialization:

W(0):=initial values,

k:=0 (iteration counter)

• Iterative process

REPEAT

k:=k+1

W(k) := W(k-1) - η·∇E(W(k-1)) (adjust W against the gradient)

UNTIL a stopping condition is satisfied


Learning based on error minimization

Remark: the gradient method is a local optimization method, so it can easily become trapped in local minima


Widrow-Hoff algorithm

= learning algorithm for a linear network

= it minimizes E(W) by applying a gradient-like adjustment for each example from the training set


Widrow-Hoff algorithm

Algorithm’s structure:

• Initialization:

wij(0):=rand(-1,1) (the weights are randomly initialized in [-1,1]),

k:=0 (iteration counter)

• Iterative process

REPEAT

FOR l:=1,L DO

Compute yi(l) and deltai(l)=di(l)-yi(l), i=1,M

Adjust the weights: wij:=wij+eta·deltai(l)·xj(l), i=1,M, j=1,N+1

Compute E(W) for the new values of the weights

k:=k+1

UNTIL E(W)<E* OR k>kmax
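The algorithm above translates almost line for line into code. A minimal sketch for a single linear output unit, on an invented toy task (learning d = 2·x1 − x2 + 1); the data, learning rate, and iteration limits are illustrative assumptions:

```python
import numpy as np

# Toy training set for the Widrow-Hoff (LMS) rule.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(20, 2))
d = 2.0 * X[:, 0] - X[:, 1] + 1.0

X_ext = np.hstack([X, -np.ones((20, 1))])    # fictive input -1 carries the bias
w = rng.uniform(-1.0, 1.0, size=3)           # wij(0) := rand(-1, 1)
eta = 0.1                                    # learning rate

for k in range(200):                         # REPEAT ... UNTIL k > kmax
    for x, target in zip(X_ext, d):          # FOR l := 1, L DO
        delta = target - w @ x               # delta(l) = d(l) - y(l)
        w = w + eta * delta * x              # per-example gradient-like step
    if np.mean((X_ext @ w - d) ** 2) < 1e-8: # E(W) < E*
        break

print(w)   # approaches [2, -1, -1] (the bias weight times -1 gives +1)
```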


Widrow-Hoff algorithm

Remarks:

• If the error function has only one optimum, the algorithm converges (though not in a finite number of steps) to the optimal values of W
• The convergence speed is influenced by the value of the learning rate (eta)
• The value E* is a measure of the accuracy we expect to obtain
• It is one of the simplest learning algorithms, but it can be applied only to one-layer networks with linear activation functions


Delta algorithm

= an algorithm similar to Widrow-Hoff, but for networks with nonlinear activation functions

= the only difference is in the gradient computation


Delta algorithm

Particularities:

1. The error function can have many minima, so the algorithm can become trapped in one of them (meaning that learning is incomplete)

2. For sigmoidal functions the derivatives can be computed efficiently from the function values themselves: for the logistic function f(u) = 1/(1+e^(-u)), f’(u) = f(u)(1 - f(u)); for f(u) = tanh(u), f’(u) = 1 - f(u)^2
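These derivative shortcuts mean the backward step can reuse the value already computed in the forward step, with no extra exponential evaluation. A quick numerical check of both relations:

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.7
f = logistic(x)
d_logistic = f * (1.0 - f)            # f'(x) from the output alone
d_tanh = 1.0 - np.tanh(x) ** 2        # likewise for tanh

# Compare against central-difference numerical derivatives.
h = 1e-6
num_logistic = (logistic(x + h) - logistic(x - h)) / (2 * h)
num_tanh = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)
print(d_logistic, num_logistic)
print(d_tanh, num_tanh)
```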


Limits of one-layer networks

One-layer networks have limited capabilities, being able only to:

• Solve simple (e.g. linearly separable) classification problems
• Approximate simple (e.g. linear) dependences

Solution: include hidden layers

Remark: the hidden units should have nonlinear activation functions
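The classic illustration of this limit is XOR, which is not linearly separable: even the best possible linear one-layer network cannot drive the error to zero on it.

```python
import numpy as np

# XOR truth table as a training set.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0.0, 1.0, 1.0, 0.0])                 # XOR targets

X_ext = np.hstack([X, -np.ones((4, 1))])           # fictive input for the bias
W, *_ = np.linalg.lstsq(X_ext, d, rcond=None)      # best achievable linear fit
residual = np.mean((X_ext @ W - d) ** 2)
print(residual)   # stays well above zero: no linear unit can represent XOR
```

A network with a hidden layer of nonlinear units, by contrast, solves XOR exactly.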
