- By **jaden**

## PowerPoint Slideshow about 'one-layer neural networks approximation problems' - jaden

Presentation Transcript

One-layer neural networks: Approximation problems

- Approximation problems
- Architecture and functioning (ADALINE, MADALINE)
- Learning based on error minimization
- The gradient algorithm
- Widrow-Hoff and “delta” algorithms

Neural Networks - lecture 4

Approximation problems

- Approximation (regression):
- Problem: estimate a functional dependence between two variables
- The training set contains pairs of corresponding values

[Figures: linear approximation; nonlinear approximation]

Architecture

- One-layer NN = one layer of input units and one layer of functional units

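[Figure: one-layer network. The input vector X (N input units, plus a fictive unit with constant value -1) is connected through the weights W, with total connectivity, to M functional units (output units) that produce the output vector Y.]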

Functioning

- Computing the output signal:

- Usually the activation function is linear
- Examples:
- ADALINE (ADAptive LINear Element)
- MADALINE (Multiple ADAptive LINear Element)
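
A sketch of the output computation, assuming the fictive unit supplies a constant input $x_0 = -1$ whose weight $w_{i0}$ acts as a threshold. The symbols $f$ (activation function), $w_{ij}$, $x_j$ and $y_i$ are assumed notation, chosen to match the weight names used later in the lecture:

$$y_i = f\Big(\sum_{j=0}^{N} w_{ij}\,x_j\Big) = f\Big(\sum_{j=1}^{N} w_{ij}\,x_j - w_{i0}\Big), \qquad i = 1,\dots,M$$

In matrix form this reads $Y = f(WX)$, with $X$ extended by $x_0$ and $f$ applied component-wise. For a linear activation, $f(u) = u$.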

Learning based on error minimization

Training set: {(X^1, d^1), …, (X^L, d^L)},

where X^l is a vector in R^N and d^l is a vector in R^M

Error function: a measure of the “distance” between the output produced by the network and the desired output

Notations:
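
One error function consistent with the weight adjustment used by the Widrow-Hoff algorithm later in the lecture is the squared error. This form is an assumption: the factors $1/2$ and $1/L$ are common conventions, and $y_i^l$ denotes the $i$-th output of the network for the input $X^l$.

$$E_l(W) = \frac{1}{2}\sum_{i=1}^{M}\big(d_i^l - y_i^l\big)^2, \qquad E(W) = \frac{1}{L}\sum_{l=1}^{L} E_l(W)$$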

Learning based on error minimization

Learning = optimization task

= find W which minimizes E(W)

Variants:

- With linear activation functions, W can be computed directly using tools from linear algebra (for a squared error, this is a least-squares problem)
- With nonlinear activation functions, the minimum can be estimated using a numerical method
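
The linear case can be sketched with NumPy's least-squares solver. This is a minimal illustration, not the lecture's own derivation: the data set is hypothetical (generated from d = 2x + 1), a squared error is assumed, and the fictive input -1 is added as an extra column.

```python
import numpy as np

# Hypothetical training set: L = 5 examples, N = 1 input, M = 1 output,
# generated from d = 2*x + 1 so that the exact answer is known.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = 2.0 * x + 1.0

# Add the fictive input -1 to every example (threshold convention).
X = np.column_stack([-np.ones_like(x), x])   # shape (L, N + 1)

# Least-squares solution of X @ w ~= d, i.e. the minimizer of the squared error.
w, *_ = np.linalg.lstsq(X, d, rcond=None)

print(w)  # w[0] is the weight of the fictive unit, w[1] the slope
```

Because the fictive input is -1, the recovered threshold weight has the opposite sign of the intercept.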

Learning based on error minimization

First variant. Particular case:

M=1 (one output unit with linear activation function)

L=1 (one example)

Learning based on error minimization

Second variant: use of a numerical minimization method

Gradient method:

- It is an iterative method based on the idea that the gradient of a function points in the direction in which the function increases fastest
- To estimate the minimum of a function, the current position is moved in the direction opposite to the gradient

Learning based on error minimization

Gradient method:

[Figure: iterates x0, x1, …, xk-1 on the graph of f. On both sides of the minimum (where f’(x)<0 and where f’(x)>0) the step is taken in the direction opposite to the gradient.]

Learning based on error minimization

Algorithm to minimize E(W) based on the gradient method:

- Initialization:

W(0):=initial values,

k:=0 (iteration counter)

- Iterative process

REPEAT

W(k+1):=W(k)-eta*grad(E(W(k)))

k:=k+1

UNTIL a stopping condition is satisfied
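
The iteration above can be sketched in Python as follows. The function name `gradient_descent`, the stopping rule (small gradient norm or an iteration limit) and the quadratic test function are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, tol=1e-8, k_max=10_000):
    """Iterate W(k+1) = W(k) - eta * grad(W(k)).

    Stops when the gradient norm drops below tol or after k_max iterations
    (both stopping conditions are illustrative choices).
    """
    w = np.asarray(w0, dtype=float)
    k = 0
    while k < k_max:
        g = grad(w)
        if np.linalg.norm(g) < tol:
            break
        w = w - eta * g
        k += 1
    return w, k

# Hypothetical error function E(W) = (w1 - 1)^2 + 2*(w2 + 3)^2,
# whose unique minimum is at (1, -3).
def grad_E(w):
    return np.array([2.0 * (w[0] - 1.0), 4.0 * (w[1] + 3.0)])

w_min, steps = gradient_descent(grad_E, w0=[0.0, 0.0])
print(w_min, steps)
```

With a larger `eta` the iteration needs fewer steps, but beyond a problem-dependent limit it diverges, which is why the learning rate is usually tuned.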

Learning based on error minimization

Remark: the gradient method is a local optimization method, so it can easily become trapped in a local minimum
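
A small sketch of this behaviour, using a hypothetical one-dimensional function with two minima (the function, step size and starting points are assumptions chosen for illustration). The same iteration ends in different minima depending on where it starts.

```python
def f(x):
    # Hypothetical function with two local minima of different depth.
    return x**4 - 3 * x**2 + x

def df(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1

def descend(x, eta=0.01, steps=5000):
    # Plain gradient iteration with a fixed number of steps.
    for _ in range(steps):
        x = x - eta * df(x)
    return x

x_left = descend(-2.0)    # start to the left of both minima
x_right = descend(2.0)    # start to the right of both minima

print(x_left, f(x_left))
print(x_right, f(x_right))
```

The run started at 2.0 stops in the shallower minimum on the right, even though a lower value of f exists on the left.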

Widrow-Hoff algorithm

= learning algorithm for a linear network

= it minimizes E(W) by applying a gradient-like adjustment for each example from the training set

Gradient computation:
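
A sketch of the gradient, assuming the per-example squared error $E_l(W) = \frac{1}{2}\sum_{i}(d_i^l - y_i^l)^2$ and linear outputs $y_i^l = \sum_{j} w_{ij}\,x_j^l$ (both are assumptions, chosen to be consistent with the weight adjustment used in the algorithm):

$$\frac{\partial E_l(W)}{\partial w_{ij}} = -\big(d_i^l - y_i^l\big)\,x_j^l = -\delta_i^l\,x_j^l$$

A step of size $\eta$ against this gradient gives $w_{ij} := w_{ij} + \eta\,\delta_i^l\,x_j^l$.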

Widrow-Hoff algorithm

Algorithm’s structure:

- Initialization:

wij(0):=rand(-1,1) (the weights are randomly initialized in [-1,1]),

k:=0 (iteration counter)

- Iterative process

REPEAT

FOR l:=1,L DO

Compute yi(l) and deltai(l):=di(l)-yi(l), i=1,M

Adjust the weights: wij:=wij+eta*deltai(l)*xj(l)

ENDFOR

Compute E(W) for the new values of the weights

k:=k+1

UNTIL E(W)<E* OR k>kmax
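
A minimal Python sketch of the algorithm, under stated assumptions: the function name `widrow_hoff`, the default values of `eta`, `E_star` and `k_max`, the squared-error definition of E(W) and the toy data set are all illustrative and not taken from the lecture.

```python
import numpy as np

def widrow_hoff(X, D, eta=0.05, E_star=1e-6, k_max=5000, seed=0):
    """Widrow-Hoff training of a one-layer linear network.

    X: (L, N) inputs, D: (L, M) desired outputs.
    Returns W of shape (M, N + 1); column 0 holds the weights of the
    fictive input -1.
    """
    rng = np.random.default_rng(seed)
    L = X.shape[0]
    Xe = np.hstack([-np.ones((L, 1)), X])            # add the fictive unit
    W = rng.uniform(-1.0, 1.0, size=(D.shape[1], Xe.shape[1]))
    k = 0
    while True:
        for l in range(L):                           # one adjustment per example
            delta = D[l] - W @ Xe[l]                 # delta_i = d_i - y_i
            W += eta * np.outer(delta, Xe[l])        # w_ij += eta * delta_i * x_j
        # Assumed error: mean over examples of half the squared error.
        E = 0.5 * np.mean(np.sum((D - Xe @ W.T) ** 2, axis=1))
        k += 1
        if E < E_star or k > k_max:
            return W, E, k

# Hypothetical data: d = 2*x + 1 sampled at five points.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
D = 2.0 * X + 1.0
W, E, k = widrow_hoff(X, D)
print(W, E, k)
```

The weights are adjusted after every example rather than once per pass over the training set, which is what distinguishes this scheme from the plain gradient method applied to E(W).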

Widrow-Hoff algorithm

Remarks:

- If the error function has only one optimum, the algorithm converges to the optimal values of W (asymptotically, not in a finite number of steps), provided the learning rate is small enough
- The convergence speed is influenced by the value of the learning rate (eta)
- The value E* is a measure of the accuracy we expect to obtain
- It is one of the simplest learning algorithms, but it can be applied only to one-layer networks with linear activation functions

Delta algorithm

= algorithm similar to Widrow-Hoff, but for networks with nonlinear activation functions

= the only difference is in the gradient computation

Gradient computation:
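
A sketch of the gradient for a nonlinear activation function $f$, assuming the per-example squared error $E_l(W) = \frac{1}{2}\sum_{i}(d_i^l - y_i^l)^2$ and outputs $y_i^l = f(u_i^l)$ with $u_i^l = \sum_{k=0}^{N} w_{ik}\,x_k^l$ (the symbol $u_i^l$ is assumed notation). By the chain rule:

$$\frac{\partial E_l(W)}{\partial w_{ij}} = -\big(d_i^l - y_i^l\big)\,f'\big(u_i^l\big)\,x_j^l$$

Compared with the linear case, the only new factor is $f'(u_i^l)$; for $f(u) = u$ it equals 1 and the Widrow-Hoff adjustment is recovered.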

Delta algorithm

Particularities:

1. The error function can have many minima, thus the algorithm can be trapped in one of these (meaning that the learning is not complete)

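2. For sigmoidal functions the derivatives can be computed efficiently from the function values themselves, by using the following relations: f'(u) = f(u)(1 - f(u)) for the logistic function f(u) = 1/(1 + exp(-u)), and f'(u) = 1 - f(u)^2 for f(u) = tanh(u)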
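
A minimal sketch of the delta algorithm for a logistic activation, where the derivative is obtained from the output itself as y(1 - y). The function names, the learning rate, the fixed number of epochs and the toy task (logical AND) are assumptions made for illustration.

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def delta_rule(X, D, eta=0.5, epochs=5000, seed=0):
    """Delta algorithm for a one-layer network with logistic units."""
    rng = np.random.default_rng(seed)
    Xe = np.hstack([-np.ones((X.shape[0], 1)), X])   # add the fictive unit
    W = rng.uniform(-1.0, 1.0, size=(D.shape[1], Xe.shape[1]))
    for _ in range(epochs):
        for x, d in zip(Xe, D):
            y = logistic(W @ x)
            # (d_i - y_i) * f'(u_i), using f'(u) = y * (1 - y) for the logistic
            factor = (d - y) * y * (1.0 - y)
            W += eta * np.outer(factor, x)
    return W

# Hypothetical task: logical AND (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [0], [0], [1]], dtype=float)

W = delta_rule(X, D)
Y = logistic(np.hstack([-np.ones((4, 1)), X]) @ W.T)
print(np.round(Y, 2))
```

No extra call to the exponential is needed for the derivative, since it reuses the output `y` already computed in the forward step.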

Limits of one-layer networks

One-layer networks have limited capability; they can only:

- Solve simple (e.g. linearly separable) classification problems
- Approximate simple (e.g. linear) dependences

Solution: include hidden layers

Remark: the hidden units should have nonlinear activation functions
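
The limitation can be illustrated on XOR, a standard example of a problem that is not linearly separable (the example is not taken from the slides). The sketch below fits a single linear unit by least squares, with the fictive input -1 added as an extra column.

```python
import numpy as np

# XOR: the desired output is 1 exactly when the two inputs differ.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0.0, 1.0, 1.0, 0.0])

# Add the fictive input -1 and compute the best linear fit.
Xe = np.column_stack([-np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(Xe, d, rcond=None)

y = Xe @ w
print(y)  # approximately 0.5 for all four inputs
```

The best linear unit produces the same output for every input, so it carries no information about the XOR target.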
