One-layer neural networks: Approximation problems

  • Approximation problems

  • Architecture and functioning (ADALINE, MADALINE)

  • Learning based on error minimization

  • The gradient algorithm

  • Widrow-Hoff and “delta” algorithms

Approximation problems

  • Approximation (regression):

    • Problem: estimate a functional dependence between two variables

    • The training set contains pairs of corresponding values

Linear approximation

Nonlinear approximation
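
The formulas and plots that illustrated these two cases are not in the transcript; as a minimal illustration (notation mine, not the slide's), the task is to choose the parameters of a model f so that f(x_l) is close to d_l for every training pair (x_l, d_l):

    Linear approximation:     f(x) = w_1*x + w_0
    Nonlinear approximation:  f(x) = g(w_1*x + w_0), with g a nonlinear function (e.g. a sigmoid)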

Architecture

  • One-layer NN = one layer of input units and one layer of functional (output) units

[Figure: architecture of a one-layer network. The input vector X (N input units plus a fictive unit with constant value -1) is connected with total connectivity, through the weight matrix W, to the M functional units (output units), which produce the output vector Y.]

Functioning

  • Computing the output signal: each functional unit i computes y_i = f(sum_{j=0..N} w_ij*x_j), i = 1..M, where x_0 = -1 is the fictive input carrying the threshold (a Python sketch follows this list)

  • Usually the activation function f is linear (f(u) = u)

  • Examples:

    • ADALINE (ADAptive LINear Element)

    • MADALINE (Multiple ADAptive LINear Element)
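
A minimal sketch of this forward computation in Python/NumPy (the function name, the array shapes and the placement of the threshold in column 0 are my own choices, not the slides'):

    import numpy as np

    def forward(W, x, f=lambda u: u):
        # One-layer network output: y_i = f(sum_j W[i, j] * x_ext[j])
        # W : (M, N+1) weight matrix, column 0 holding the thresholds
        # x : (N,) input vector
        # f : activation function (identity for ADALINE / MADALINE)
        x_ext = np.concatenate(([-1.0], x))   # fictive unit with constant value -1
        return f(W @ x_ext)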

Learning based on error minimization

Training set: {(X(1), d(1)), …, (X(L), d(L))},

X(l) is a vector from R^N, d(l) is a vector from R^M

Error function: a measure of the "distance" between the output produced by the network and the desired output

Notations: y_i(l) denotes the output of unit i for input X(l), and d_i(l) the corresponding desired value; the error function is written out below
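
The error formula itself did not survive the transcript; the standard quadratic (sum-of-squares) error, consistent with the Widrow-Hoff adjustment used later in the lecture, is:

    E(W) = (1/2) * sum_{l=1..L} sum_{i=1..M} (d_i(l) - y_i(l))^2,   where y_i(l) = f(sum_j w_ij*x_j(l))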

Learning based on error minimization

Learning = optimization task

= find W which minimizes E(W)

Variants:

  • In the case of linear activation functions, W can be computed directly by using tools from linear algebra (least squares)

  • In the case of nonlinear activation functions, the minimum can be estimated by using a numerical method (e.g. the gradient method)

Learning based on error minimization

First variant. Particular case:

M=1 (one output unit with linear activation function)

L=1 (one example)
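
The worked formulas for this particular case are not in the transcript; a minimal reconstruction (w denotes the single row of weights, (X, d) the single example):

    E(w) = (1/2) * (d - sum_j w_j*x_j)^2
    dE/dw_j = -(d - sum_k w_k*x_k) * x_j

Setting all partial derivatives to zero gives (for a nonzero input) the condition sum_j w_j*x_j = d, i.e. the error is minimized, with value zero, by any w that reproduces the example exactly.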

Learning based on error minimization

First variant: with linear activation functions, minimizing E(W) is a linear least-squares problem that can be solved in closed form (see the NumPy sketch below)
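
The linear-algebra derivation on the slide is not in the transcript; the sketch below, assuming the quadratic error defined earlier (function name and array shapes are mine), obtains W by solving the corresponding least-squares problem with NumPy:

    import numpy as np

    def fit_linear_network(X, D):
        # Least-squares weights for a linear one-layer network.
        # X : (L, N) training inputs (one example per row)
        # D : (L, M) desired outputs
        # Returns W of shape (M, N+1), column 0 holding the thresholds.
        L = X.shape[0]
        X_ext = np.hstack([-np.ones((L, 1)), X])          # add the fictive -1 input
        W_t, *_ = np.linalg.lstsq(X_ext, D, rcond=None)   # minimizes ||X_ext @ W_t - D||^2
        return W_t.T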

Learning based on error minimization

Second variant: use of a numerical minimization method

Gradient method:

  • It is an iterative method based on the fact that the gradient of a function points in the direction in which the function increases

  • In order to estimate the minimum of a function, the current position is moved in the direction opposite to the gradient

Learning based on error minimization

Gradient method (one-dimensional illustration): starting from x_0, the successive points x_1, …, x_{k-1} are obtained by moving in the direction opposite to the gradient, i.e. to the right where f'(x) < 0 and to the left where f'(x) > 0.

Learning based on error minimization

Algorithm to minimize E(W) based on the gradient method:

  • Initialization:

    W(0):=initial values,

    k:=0 (iteration counter)

  • Iterative process

    REPEAT

    W(k+1)=W(k)-eta*grad(E(W(k)))

    k:=k+1

    UNTIL a stopping condition is satisfied
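
A minimal Python sketch of this iterative process (grad_E, the learning rate eta and the stopping test are placeholders, not anything prescribed by the slides):

    import numpy as np

    def gradient_descent(grad_E, W0, eta=0.01, kmax=1000, eps=1e-6):
        # Generic gradient method: W(k+1) = W(k) - eta * grad E(W(k))
        W = W0.copy()
        for k in range(kmax):
            g = grad_E(W)
            W = W - eta * g
            if np.linalg.norm(g) < eps:   # stopping condition: (near) zero gradient
                break
        return W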

Learning based on error minimization

Remark: the gradient method is a local optimization method, so it can easily be trapped in local minima

Widrow-Hoff algorithm

= a learning algorithm for linear networks (ADALINE/MADALINE)

= it minimizes E(W) by applying a gradient-like adjustment after each example from the training set

Gradient computation (see the reconstruction below):
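
For the quadratic error of a single example l and a linear unit, the standard per-example gradient (consistent with the adjustment rule on the next slide) is:

    dE_l/dw_ij = -(d_i(l) - y_i(l)) * x_j(l) = -delta_i(l) * x_j(l),   delta_i(l) = d_i(l) - y_i(l)

so the gradient step with learning rate eta is w_ij := w_ij + eta*delta_i(l)*x_j(l).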

Widrow-Hoff algorithm

Algorithm’s structure:

  • Initialization:

    wij(0):=rand(-1,1) (the weights are randomly initialized in [-1,1]),

    k:=0 (iteration counter)

  • Iterative process

    REPEAT

    FOR l:=1,L DO

    Compute y_i(l) and delta_i(l) = d_i(l) - y_i(l), i = 1..M

    Adjust the weights: w_ij := w_ij + eta*delta_i(l)*x_j(l)

    Compute E(W) for the new values of the weights

    k:=k+1

    UNTIL E(W)<E* OR k>kmax
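
A minimal Python/NumPy sketch of the algorithm above (the dataset layout, the hyper-parameter values and the function name are my own assumptions):

    import numpy as np

    def widrow_hoff(X, D, eta=0.05, E_star=1e-3, kmax=10000):
        # Widrow-Hoff (LMS) training of a linear one-layer network.
        # X : (L, N) training inputs, D : (L, M) desired outputs
        # Returns W of shape (M, N+1), column 0 holding the thresholds.
        rng = np.random.default_rng(0)
        L, N = X.shape
        M = D.shape[1]
        X_ext = np.hstack([-np.ones((L, 1)), X])      # fictive -1 input
        W = rng.uniform(-1, 1, size=(M, N + 1))       # weights randomly initialized in [-1, 1]
        for k in range(kmax):
            for l in range(L):                        # one adjustment per training example
                y = W @ X_ext[l]
                delta = D[l] - y
                W += eta * np.outer(delta, X_ext[l])  # w_ij := w_ij + eta*delta_i(l)*x_j(l)
            E = 0.5 * np.sum((D - X_ext @ W.T) ** 2)  # E(W) for the new weights
            if E < E_star:                            # stop when accurate enough or k > kmax
                break
        return W

The per-example (online) updating of the slide is kept here; a batch variant would accumulate the adjustments over all examples before applying them.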

Widrow-Hoff algorithm

Remarks:

  • If the error function has only one optimum, the algorithm converges (but not in a finite number of steps) to the optimal values of W

  • The convergence speed is influenced by the value of the learning rate (eta)

  • The value E* is a measure of the accuracy we expect to obtain

  • It is one of the simplest learning algorithms, but it can be applied only to one-layer networks with linear activation functions

Delta algorithm

= an algorithm similar to Widrow-Hoff, but for networks with nonlinear activation functions

= the only difference is in the gradient computation

Gradient computation (see the reconstruction below):
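
With a nonlinear activation f, the chain rule adds the factor f'(·) to the Widrow-Hoff expression; this is the standard delta rule, reconstructed here because the slide's formula is missing:

    dE_l/dw_ij = -(d_i(l) - y_i(l)) * f'(sum_k w_ik*x_k(l)) * x_j(l)

and the adjustment becomes w_ij := w_ij + eta*(d_i(l) - y_i(l)) * f'(sum_k w_ik*x_k(l)) * x_j(l).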

Delta algorithm

Particularities:

1. The error function can have many minima, thus the algorithm can be trapped in one of them (meaning that the learning may not be complete)

2. For sigmoidal activation functions the derivatives can be computed efficiently by using the following relations (see below)
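
The relations themselves are not in the transcript; assuming the two usual sigmoidal functions, they are the standard identities:

    Logistic:            f(x) = 1/(1 + exp(-x))   =>   f'(x) = f(x)*(1 - f(x))
    Hyperbolic tangent:  f(x) = tanh(x)           =>   f'(x) = 1 - f(x)^2

In both cases f'(x) is obtained directly from the already computed output f(x), with no extra evaluation of the exponential.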

Limits of one-layer networks

One-layer networks have limited capabilities, being able only to:

  • Solve simple (e.g. linearly separable) classification problems

  • Approximate simple (e.g. linear) dependences

    Solution: include hidden layers

    Remark: the hidden units should have nonlinear activation functions
