
One-layer neural networks: Approximation problems (Neural Networks, lecture 4)



  1. One-layer neural networks: Approximation problems
  • Approximation problems
  • Architecture and functioning (ADALINE, MADALINE)
  • Learning based on error minimization
  • The gradient algorithm
  • Widrow-Hoff and “delta” algorithms

  2. Approximation problems
  • Approximation (regression):
  • Problem: estimate a functional dependence between two variables
  • The training set contains pairs of corresponding values
  (Figure: linear approximation vs. nonlinear approximation)
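  As a possible formalization of the regression task (the symbols follow the notation introduced on slide 5; the least-squares criterion is an assumption, not stated on this slide):
    \[
      \text{given } \{(X^l, d^l)\}_{l=1}^{L},\; X^l \in \mathbb{R}^N,\; d^l \in \mathbb{R}^M,
      \quad \text{find } f : \mathbb{R}^N \to \mathbb{R}^M
      \quad \text{such that} \quad
      \sum_{l=1}^{L} \bigl\| f(X^l) - d^l \bigr\|^2 \ \text{is minimal.}
    \]
  Linear approximation restricts f to affine maps; nonlinear approximation allows a richer family, here the family of functions realizable by the network.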

  3. Architecture
  • One-layer NN = one layer of input units and one layer of functional units
  (Figure: N input units carrying the input vector X, M functional units (output units) producing the output vector Y, weight matrix W with total connectivity between the layers, plus a fictive unit with constant input -1)

  4. Functioning
  • Computing the output signal: see the formula below
  • Usually the activation function is linear
  • Examples:
  • ADALINE (ADAptive LINear Element)
  • MADALINE (Multiple ADAptive LINear Elements)
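  The output formula appeared as an image on the slide; a standard reconstruction consistent with the architecture on slide 3 (fictive unit with input -1, weight matrix W) is:
    \[
      y_i = f\Bigl(\sum_{j=1}^{N} w_{ij}\, x_j - w_{i0}\Bigr)
          = f\Bigl(\sum_{j=0}^{N} w_{ij}\, x_j\Bigr),
      \qquad x_0 = -1,\;\; i = 1,\dots,M,
    \]
  with f(u) = u in the linear (ADALINE) case, i.e. Y = W X in matrix form when X is extended with the fictive component x_0 = -1.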

  5. Learning based on error minimization
  • Training set: {(X^1,d^1), …, (X^L,d^L)}, X^l – vector from R^N, d^l – vector from R^M
  • Error function E(W): a measure of the “distance” between the output produced by the network and the desired output
  • Notations: see below
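  The error function and the notations were given as formulas on the slide; a common choice consistent with the Widrow-Hoff derivation used later (the factor 1/2 is an assumption) is the sum of squared errors:
    \[
      E(W) = \frac{1}{2} \sum_{l=1}^{L} \sum_{i=1}^{M} \bigl( d_i^{\,l} - y_i^{\,l} \bigr)^2,
      \qquad
      y_i^{\,l} = f\Bigl(\sum_{j=0}^{N} w_{ij}\, x_j^{\,l}\Bigr),
    \]
  where \(x_j^{\,l}\), \(d_i^{\,l}\), \(y_i^{\,l}\) denote the components of the l-th input, desired output, and network output, respectively.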

  6. Learning based on error minimization
  • Learning = optimization task: find W which minimizes E(W)
  • Variants:
  • In the case of linear activation functions, W can be computed using tools from linear algebra
  • In the case of nonlinear activation functions, the minimum can be estimated using a numerical method

  7. Learning based on error minimization
  • First variant. Particular case: M=1 (one output unit with linear activation function), L=1 (one example)

  8. Learning based on error minimization
  • First variant (continued): the weights are obtained directly with linear algebra (see the sketch below)
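  The derivation on slides 7-8 was given as formulas; a sketch of the linear-algebra solution for M=1 and a linear activation (the normal-equation form below is a standard reconstruction, not copied from the slides) is:
    \[
      E(w) = \frac{1}{2} \sum_{l=1}^{L} \bigl( d^{\,l} - w^{\top} X^l \bigr)^2,
      \qquad
      \nabla E(w) = -\sum_{l=1}^{L} \bigl( d^{\,l} - w^{\top} X^l \bigr)\, X^l = 0
      \;\Longrightarrow\;
      \Bigl( \sum_{l} X^l (X^l)^{\top} \Bigr) w = \sum_{l} d^{\,l} X^l .
    \]
  If the matrix \(\sum_l X^l (X^l)^{\top}\) is invertible, this linear system gives w directly; otherwise a pseudoinverse is used. For L = 1 the system is underdetermined, so many exact solutions exist.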

  9. Learning based on error minimization
  • Second variant: use of a numerical minimization method
  • Gradient method:
  • It is an iterative method based on the idea that the gradient of a function indicates the direction in which the function increases
  • In order to estimate the minimum of a function, the current position is moved in the direction opposite to the gradient

  10. Learning based on error minimization
  (Figure: 1D illustration of the gradient method; successive points x0, x1, …, xk-1 move in the direction opposite to the gradient, i.e. to the right where f'(x) < 0 and to the left where f'(x) > 0)

  11. Learning based on error minimization
  Algorithm to minimize E(W) based on the gradient method:
  • Initialization: W(0) := initial values, k := 0 (iteration counter)
  • Iterative process:
      REPEAT
        W(k+1) := W(k) - eta * grad E(W(k))
        k := k + 1
      UNTIL a stopping condition is satisfied
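  A minimal Python sketch of this iteration (the example error function, the learning rate, and the stopping test are illustrative assumptions, not taken from the lecture):

    import numpy as np

    def gradient_descent(grad_E, W0, eta=0.01, eps=1e-6, k_max=10000):
        """Generic gradient iteration: W(k+1) = W(k) - eta * grad E(W(k))."""
        W = W0.copy()
        for k in range(k_max):
            g = grad_E(W)
            if np.linalg.norm(g) < eps:      # stopping condition: gradient nearly zero
                break
            W = W - eta * g
        return W

    # Illustrative use: linear model with squared error on made-up data
    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])   # L=3 examples, N=2 inputs
    d = np.array([1.0, 2.0, 3.0])                        # desired outputs (M=1)
    grad_E = lambda w: -(d - X @ w) @ X                  # gradient of 0.5*sum_l (d_l - w.x_l)^2
    w_hat = gradient_descent(grad_E, W0=np.zeros(2))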

  12. Learning based on error minimization
  • Remark: the gradient method is a local optimization method, so it can easily be trapped in local minima

  13. Widrow-Hoff algorithm
  • = learning algorithm for a linear network
  • = it minimizes E(W) by applying a gradient-like adjustment for each example from the training set
  • Gradient computation: see below
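  The gradient computation was shown as a formula on the slide; for the per-example squared error with a linear activation (notation as on slide 5) the standard result is:
    \[
      E_l(W) = \frac{1}{2} \sum_{i=1}^{M} \bigl( d_i^{\,l} - y_i^{\,l} \bigr)^2,
      \qquad
      \frac{\partial E_l}{\partial w_{ij}} = -\bigl( d_i^{\,l} - y_i^{\,l} \bigr)\, x_j^{\,l},
    \]
  so each example produces the adjustment \( \Delta w_{ij} = \eta\, \delta_i^{\,l}\, x_j^{\,l} \) with \( \delta_i^{\,l} = d_i^{\,l} - y_i^{\,l} \).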

  14. Widrow-Hoff algorithm
  Algorithm's structure:
  • Initialization: wij(0) := rand(-1,1) (the weights are randomly initialized in [-1,1]), k := 0 (iteration counter)
  • Iterative process:
      REPEAT
        FOR l := 1, L DO
          Compute yi(l) and deltai(l) = di(l) - yi(l), i = 1, M
          Adjust the weights: wij := wij + eta * deltai(l) * xj(l)
        Compute E(W) for the new values of the weights
        k := k + 1
      UNTIL E(W) < E* OR k > kmax
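  A minimal Python sketch of this algorithm for a linear one-layer network (the handling of the fictive -1 input, the data shapes, and the default parameters are assumptions for illustration):

    import numpy as np

    def widrow_hoff(X, D, eta=0.05, E_star=1e-3, k_max=1000, seed=0):
        """Widrow-Hoff training of a linear one-layer network.
        X: (L, N) array of inputs, D: (L, M) array of desired outputs."""
        rng = np.random.default_rng(seed)
        L, N = X.shape
        M = D.shape[1]
        Xb = np.hstack([-np.ones((L, 1)), X])         # fictive unit: constant input -1
        W = rng.uniform(-1.0, 1.0, size=(M, N + 1))   # weights randomly initialized in [-1, 1]
        for k in range(k_max):
            for l in range(L):                        # one adjustment per training example
                y = W @ Xb[l]                         # linear activation: y_i = sum_j w_ij * x_j
                delta = D[l] - y                      # delta_i = d_i - y_i
                W += eta * np.outer(delta, Xb[l])     # w_ij := w_ij + eta * delta_i * x_j
            E = 0.5 * np.sum((D - Xb @ W.T) ** 2)     # E(W) for the new weights
            if E < E_star:
                break
        return W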

  15. Widrow-Hoff algorithm
  Remarks:
  • If the error function has only one optimum, the algorithm converges (but not in a finite number of steps) to the optimal values of W
  • The convergence speed is influenced by the value of the learning rate (eta)
  • The value E* is a measure of the accuracy we expect to obtain
  • It is one of the simplest learning algorithms, but it can be applied only to one-layer networks with linear activation functions

  16. Delta algorithm
  • = algorithm similar to Widrow-Hoff, but for networks with nonlinear activation functions
  • = the only difference is in the gradient computation
  • Gradient computation: see below
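  The gradient computation was shown as a formula on the slide; for a nonlinear activation f applied to the net input, the standard reconstruction is:
    \[
      \frac{\partial E_l}{\partial w_{ij}}
        = -\bigl( d_i^{\,l} - y_i^{\,l} \bigr)\, f'\!\bigl(u_i^{\,l}\bigr)\, x_j^{\,l},
      \qquad
      u_i^{\,l} = \sum_{j=0}^{N} w_{ij}\, x_j^{\,l}, \;\; y_i^{\,l} = f\bigl(u_i^{\,l}\bigr),
    \]
  so the adjustment becomes \( \Delta w_{ij} = \eta\, \bigl( d_i^{\,l} - y_i^{\,l} \bigr)\, f'(u_i^{\,l})\, x_j^{\,l} \); for a linear f (with f' = 1) it reduces to the Widrow-Hoff rule.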

  17. Delta algorithm
  Particularities:
  1. The error function can have many minima, thus the algorithm can be trapped in one of them (meaning that the learning is not complete)
  2. For sigmoidal functions the derivatives can be computed in an efficient way by using the relations below
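  The relations were given as formulas on the slide; for the two usual sigmoidal activation functions the standard identities are:
    \[
      f(u) = \frac{1}{1 + e^{-u}} \;\Rightarrow\; f'(u) = f(u)\bigl(1 - f(u)\bigr),
      \qquad
      f(u) = \tanh(u) \;\Rightarrow\; f'(u) = 1 - f(u)^2,
    \]
  so the derivative can be obtained directly from the already-computed output \( y_i = f(u_i) \), without evaluating the exponential again.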

  18. Limits of one-layer networks
  One-layer networks have limited capability, being able only to:
  • Solve simple (e.g. linearly separable) classification problems
  • Approximate simple (e.g. linear) dependences
  Solution: include hidden layers
  Remark: the hidden units should have nonlinear activation functions
