
Neural Networks

Neural Networks and Pattern Recognition. Giansalvo EXIN Cirrincione, Sabine Van Huffel. Unit #5. The non-linear function of many variables is represented in terms of compositions of non-linear functions of a single variable, called activation functions. The multi-layer perceptron.


Presentation Transcript


  1. Neural Networks and

  2. Pattern Recognition

  3. Giansalvo EXIN Cirrincione, Sabine Van Huffel

  4. unit #5

  5. The multi-layer perceptron: feed-forward network mappings. The non-linear function of many variables is represented in terms of compositions of non-linear functions of a single variable, called activation functions. Feed-forward neural networks provide a general framework for representing such non-linear functional mappings.
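A minimal NumPy sketch of such a two-layer mapping, with tanh hidden units and linear outputs (function and variable names are illustrative, not taken from the slides):

```python
import numpy as np

def two_layer_forward(x, W1, b1, W2, b2, g=np.tanh):
    """Forward pass of a two-layer feed-forward network with linear outputs.

    x  : (d,)   input vector
    W1 : (M, d) first-layer weights,  b1 : (M,) first-layer biases
    W2 : (c, M) second-layer weights, b2 : (c,) second-layer biases
    g  : hidden-unit activation (a non-linear function of a single variable)
    """
    a1 = W1 @ x + b1      # first-layer weighted sums
    z = g(a1)             # hidden-unit activations
    return W2 @ z + b2    # network outputs (linear output units)

# Example: a random 2-3-1 network evaluated at a random input
rng = np.random.default_rng(0)
y = two_layer_forward(rng.normal(size=2),
                      rng.normal(size=(3, 2)), rng.normal(size=3),
                      rng.normal(size=(1, 3)), rng.normal(size=1))
```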

  6. The multi-layer perceptron: feed-forward network mappings, layered networks. A network with two layers of adaptive weights implements a generalized linear discriminant function with adaptive basis functions.

  7. The multi-layer perceptron: feed-forward network mappings. Example of a network with six layers.

  8. The multi-layer perceptron: feed-forward network mappings, Hinton diagram. The size of a square is proportional to the magnitude of the corresponding parameter, and the square is black or white according to whether the parameter is positive or negative. In the example shown, the weights have the value 1.0 unless indicated otherwise.

  9. The multi-layer perceptron: feed-forward network mappings, general topologies. It is possible to attach successive numbers to the inputs and to all of the hidden and output units such that each unit only receives connections from inputs or units having a smaller number; the outputs can then be expressed as deterministic functions of the inputs.

  10. Threshold units, binary inputs. With d binary inputs there are 2^d possible patterns, each labelled 0 or 1. A two-layer network can generate any Boolean function, provided the number M of hidden units is sufficiently large.

  11. Threshold units, binary inputs (no generalization). For each input pattern labelled 1, take a hidden unit whose weights are +1 on the inputs that equal 1 and -1 on the inputs that equal 0, with bias 1 - b, where b is the number of non-zero inputs for that pattern. Each hidden unit then acts as a template for the corresponding input pattern and only generates an output when the input pattern matches the template pattern; an output unit combining the hidden units completes the construction, as sketched below.
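A minimal sketch of this template construction (assuming a strict threshold activation g(a) = 1 for a > 0 and an OR-like output unit; the function names and the XOR example are illustrative):

```python
import numpy as np

def step(a):
    return (a > 0).astype(float)   # threshold activation: 1 if a > 0 else 0

def boolean_net(patterns_labelled_one):
    """Two-layer threshold network that outputs 1 exactly on the given
    binary patterns (the template construction described above)."""
    T = np.asarray(patterns_labelled_one, dtype=float)   # (M, d) templates
    W1 = np.where(T == 1, 1.0, -1.0)                     # weights +1 / -1
    b1 = 1.0 - T.sum(axis=1)                             # bias = 1 - b
    w2 = np.ones(len(T))                                 # output unit acts as an OR
    b2 = -0.5
    def f(x):
        z = step(W1 @ np.asarray(x, dtype=float) + b1)   # one template unit per pattern
        return step(w2 @ z + b2)
    return f

# XOR on two binary inputs: the patterns labelled 1 are (0,1) and (1,0)
xor = boolean_net([[0, 1], [1, 0]])
print([int(xor(x)) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

Each hidden unit fires only on its own template pattern, so the construction memorizes the training patterns and, as the slide notes, provides no generalization.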

  12. Threshold units, continuous inputs: possible decision boundaries. With M hidden units combined by an AND-like output unit (output bias = -M), the network picks out a single convex region. Relaxing this construction, more general decision boundaries can be constructed.

  13. Continuous inputs: possible decision boundaries. The hyperplanes in the figure correspond to hidden units, whose activations transition from 0 to 1 across them. The second-layer weights are all set to 1, so the numbers labelling the regions represent the value of the linear sum presented to the output unit.

  14. Continuous inputs: possible decision boundaries, a non-convex decision region. The regions are again labelled by the linear sum presented to the output unit; with the output unit bias set to -3.5, the unit fires wherever that sum exceeds 3.5, which here yields a non-convex decision boundary.

  15. Continuous inputs: possible decision boundaries, disjoint decision regions. With the output unit bias set to -4.5, the output unit fires only where the linear sum exceeds 4.5, which here produces disjoint decision regions.

  16. Continuous inputs: impossible decision boundaries, an example. Not every decision boundary can be generated exactly by a two-layer network of threshold units; however, any given decision boundary can be approximated arbitrarily closely by a two-layer network having sigmoidal activation functions.

  17. Continuous inputs: possible decision boundaries, arbitrary decision regions (construction on the next slide).

  18. Continuous inputs: possible decision boundaries. Divide the input space into a fine grid of hypercubes. Each first-layer hidden unit defines a hyperplane aligned with one side of a hypercube; groups of 2d such units are combined by an AND unit, one group being assigned to each hypercube corresponding to class C1, and these AND units are in turn combined by an OR unit with bias = -1.

  19. Continuous inputs: possible decision boundaries. Conclusion: feed-forward neural networks with threshold units can generate arbitrarily complex decision boundaries. For the problem of classifying a dichotomy of N data points in general position in d-dimensional space, a network with N/d hidden units in a single hidden layer can separate them correctly into two classes.

  20. Sigmoidal units. Since the tanh and logistic sigmoid functions are related by a linear transformation, a neural network using tanh activation functions is equivalent to one using logistic activation functions but having different values for the weights and biases. Empirically, tanh activation functions often give rise to faster convergence of training algorithms than logistic functions.
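A short numerical illustration of that linear transformation (the single-hidden-unit case in the comments is an illustrative example):

```python
import numpy as np

sigma = lambda a: 1.0 / (1.0 + np.exp(-a))    # logistic sigmoid

# The linear transformation behind the equivalence: tanh(a) = 2*sigma(2a) - 1
a = np.linspace(-4.0, 4.0, 9)
assert np.allclose(np.tanh(a), 2.0 * sigma(2.0 * a) - 1.0)

# Consequence for a single tanh hidden unit feeding a linear output,
# y = w2 * tanh(w1 * x + b1) + b2:
#   y = 2*w2 * sigma(2*w1 * x + 2*b1) + (b2 - w2)
# i.e. double the first-layer weights and biases, double the second-layer
# weight, and subtract the second-layer weight from the output bias
# (summed over hidden units in the general case).
```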

  21. Sigmoidal units, linear output units. A sigmoidal hidden unit can approximate a linear hidden unit arbitrarily accurately (by keeping its weights small, so that it operates on the nearly linear part of the sigmoid) and can approximate a step function arbitrarily accurately (by making its weights large).

  22. Three-layer networks. A combination of two logistic sigmoids having the same orientation, but slightly displaced, gives a ridge; a combination of two such ridges with orthogonal orientations (in 2-d), fed through a second-layer sigmoidal unit, gives a localized, RBF-like bump. Networks of this kind approximate, to arbitrary accuracy, any smooth mapping.
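A minimal NumPy sketch of that construction (the steepness, offsets and evaluation grid are illustrative choices):

```python
import numpy as np

sigma = lambda a: 1.0 / (1.0 + np.exp(-a))

def ridge(t, lo=-1.0, hi=1.0, steep=10.0):
    """Difference of two displaced logistic sigmoids with the same
    orientation: roughly 1 for lo < t < hi and roughly 0 outside (a ridge)."""
    return sigma(steep * (t - lo)) - sigma(steep * (t - hi))

def bump(x, y, steep=10.0):
    """Two ridges with orthogonal orientations, combined by a second-layer
    sigmoid, give a localized, RBF-like bump."""
    s = ridge(x, steep=steep) + ridge(y, steep=steep)   # ~2 inside the central square
    return sigma(steep * (s - 1.5))                     # ~1 inside, ~0 outside

xx, yy = np.meshgrid(np.linspace(-3, 3, 7), np.linspace(-3, 3, 7))
print(np.round(bump(xx, yy), 2))   # close to 1 only in the central square
```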

  23. Sigmoidal units: two-layer networks. They approximate arbitrarily well any continuous functional mapping, any decision boundary, and both a function and its derivative.

  24. Sigmoidal units: two-layer networks. Example: a 1-5-1 network trained with the BFGS quasi-Newton algorithm.
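One possible way to reproduce such an experiment with scipy.optimize (the sinusoidal target, network conventions and random initialization are assumptions, not taken from the slide):

```python
import numpy as np
from scipy.optimize import minimize

# Training data: a smooth 1-D target (illustrative choice)
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 50)
t = np.sin(np.pi * x)

M = 5   # 1-5-1 architecture: 1 input, 5 tanh hidden units, 1 linear output

def unpack(w):
    W1, b1 = w[:M], w[M:2*M]
    W2, b2 = w[2*M:3*M], w[3*M]
    return W1, b1, W2, b2

def forward(w, x):
    W1, b1, W2, b2 = unpack(w)
    z = np.tanh(np.outer(x, W1) + b1)   # hidden activations, shape (N, M)
    return z @ W2 + b2                  # linear output

def sse(w):
    return 0.5 * np.sum((forward(w, x) - t) ** 2)   # sum-of-squares error

w0 = 0.5 * rng.standard_normal(3 * M + 1)
res = minimize(sse, w0, method='BFGS')   # quasi-Newton training
print(res.fun)   # final sum-of-squares error
```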

  25. The Generalized Mapping Regressor (GMR). Example figure: Pollock, Convergence.

  26. Weight-space symmetries. For a network with M tanh hidden units, sign flips of the weights feeding into and out of a hidden unit (2^M possibilities) and interchanges of the hidden units (M! possibilities) leave the network mapping unchanged, giving an overall symmetry factor of 2^M M! equivalent weight vectors.

  27. Error back-propagation. Back-propagation addresses the credit assignment problem and provides the derivatives needed by training algorithms such as gradient descent; the same machinery extends to Hessian matrix evaluation, Jacobian evaluation, several error functions and several kinds of networks.

  28. Error back-propagation. The scheme applies to an arbitrary feed-forward topology, an arbitrary differentiable non-linear activation function and an arbitrary differentiable error function. Each unit forms a weighted sum a_j = Σ_i w_ji z_i of the quantities z_i (activations of other units or network inputs) and emits z_j = g(a_j) (an activation or a network output).

  29. Error back-propagation. First step: forward propagation, computing a_j = Σ_i w_ji z_i and z_j = g(a_j) for each unit in turn, with z_0 = 1 for the bias.

  30. Error back-propagation: δ computation. For each hidden or output unit define δ_j = ∂E/∂a_j. For an output unit, δ_k follows directly from the error function and output activation; for a hidden unit, δ_j = g'(a_j) Σ_k w_kj δ_k, with the sum running over the units k to which unit j sends connections.

  31. Error back-propagation: example for a network with two layers of weights, in both on-line and batch learning.

  32. Error back-propagation
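A compact sketch of the procedure for a two-layer network with tanh hidden units, linear outputs and a sum-of-squares error (function and variable names are illustrative):

```python
import numpy as np

def backprop(x, t, W1, b1, W2, b2):
    """Gradients of the sum-of-squares error for one pattern of a
    two-layer network: tanh hidden units, linear outputs."""
    # Forward propagation
    a1 = W1 @ x + b1
    z = np.tanh(a1)
    y = W2 @ z + b2
    # Output-unit deltas: for SSE with linear outputs, delta_k = y_k - t_k
    delta_k = y - t
    # Hidden-unit deltas: delta_j = g'(a_j) * sum_k w_kj delta_k, with g' = 1 - z^2
    delta_j = (1.0 - z ** 2) * (W2.T @ delta_k)
    # Error derivatives: each is a delta times the activation feeding that weight
    return (np.outer(delta_j, x), delta_j,       # first-layer weight and bias grads
            np.outer(delta_k, z), delta_k)       # second-layer weight and bias grads

# Example call on a random 3-4-2 network and a random pattern
rng = np.random.default_rng(0)
grads = backprop(rng.normal(size=3), rng.normal(size=2),
                 rng.normal(size=(4, 3)), rng.normal(size=4),
                 rng.normal(size=(2, 4)), rng.normal(size=2))
```

For batch learning the per-pattern gradients returned here are summed over the training set; for on-line learning each one is used immediately to update the weights.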

  33. Homework 1 Show, for a feedforward network with tanh hidden unit activation functions and a sum of squares error function, that the origin in weight space is a stationary point of the error function.
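Before attempting the proof, the claim can be sanity-checked numerically; the sketch below assumes a small network without bias parameters and uses random data (all illustrative choices):

```python
import numpy as np

# Numerical check: with all weights at zero, every tanh hidden unit outputs 0,
# so the network output is 0 for every input and the central-difference
# derivatives of the sum-of-squares error should all vanish.
rng = np.random.default_rng(0)
X, T = rng.normal(size=(10, 3)), rng.normal(size=(10, 2))
M, d, c = 4, 3, 2           # hidden units, inputs, outputs (no biases here)

def error(w):
    W1 = w[:M*d].reshape(M, d)
    W2 = w[M*d:].reshape(c, M)
    Y = np.tanh(X @ W1.T) @ W2.T
    return 0.5 * np.sum((Y - T) ** 2)

n_w, eps = M*d + c*M, 1e-6
grad = np.array([(error(eps*e) - error(-eps*e)) / (2*eps) for e in np.eye(n_w)])
print(np.max(np.abs(grad)))   # ~0: the origin is a stationary point
```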

  34. Homework 2 Let W be the total number of weights and biases. Show that, for each input pattern, the cost of back-propagation for the evaluation of all the derivatives is O(W) (if the derivatives are instead evaluated numerically by forward propagation, the total cost is O(W^2)).

  35. Numerical differentiation. Simple finite differences, perturbing each weight in turn, cost O(W^2), as do the more accurate symmetrical central finite differences; both are mainly useful as a correctness check of back-propagation. Node perturbation reduces the cost to O(MW), where M is the number of hidden and output nodes.
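A minimal sketch of such a central-difference check; error_fn stands for any scalar error routine, and the names in the final comment are placeholders for the reader's own error and back-propagation functions:

```python
import numpy as np

def numerical_grad(error_fn, w, eps=1e-6):
    """Symmetrical central differences: two error evaluations per weight,
    each O(W), hence O(W^2) overall -- used only as a check of backprop."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (error_fn(w + e) - error_fn(w - e)) / (2 * eps)
    return g

# Typical usage (placeholder names for your own routines):
#   assert np.allclose(numerical_grad(sse, w), backprop_grad(w), atol=1e-5)
```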

  36. The Jacobian matrix, with elements J_ki = ∂y_k/∂x_i (each derivative taken with all other inputs held fixed). It provides a measure of the local sensitivity of the outputs to changes in each of the input variables; it is valid only for small perturbations of the inputs, and the Jacobian must be re-evaluated for each new input vector. It can be evaluated by a forward-propagation procedure.

  37. The Jacobian matrix

  38. The Jacobian matrix

  39. The Jacobian matrix: worked evaluation for row 1 of the Jacobian.
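For a two-layer network with tanh hidden units and linear outputs, the whole Jacobian has a simple closed form that is easy to check against small input perturbations; a sketch (names and sizes are illustrative):

```python
import numpy as np

def jacobian(x, W1, b1, W2, b2):
    """Jacobian J_ki = dy_k/dx_i of a two-layer network with tanh hidden
    units and linear outputs: J = W2 diag(1 - z^2) W1."""
    z = np.tanh(W1 @ x + b1)
    return (W2 * (1.0 - z ** 2)) @ W1    # broadcasting applies diag(1 - z^2)

# Finite-difference check (valid only for small input perturbations)
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
f = lambda x: W2 @ np.tanh(W1 @ x + b1) + b2
eps = 1e-6
J_num = np.column_stack([(f(x + eps*e) - f(x - eps*e)) / (2*eps) for e in np.eye(3)])
assert np.allclose(jacobian(x, W1, b1, W2, b2), J_num, atol=1e-5)
```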

  40. The Hessian matrix; its evaluation costs at least O(W^2), since it has O(W^2) elements. 1. Several non-linear optimization algorithms used for training neural networks are based on the second-order properties of the error surface. 2. The Hessian forms the basis of a fast procedure for re-training a feed-forward network following a small change in the training data. 3. The inverse Hessian is used to identify the least significant weights in a network as part of a pruning algorithm. 4. The inverse Hessian is used to assign error bars to the predictions made by a trained network.

  41. Diagonal approximation of the Hessian. Only the diagonal elements of the Hessian are computed and the off-diagonal elements are neglected; the inverse of a diagonal matrix is trivial to compute, and the cost is O(W).

  42. Levenberg-Marquardt (outer-product) approximation, cost O(W^2). For regression problems the Hessian of the sum-of-squares error is approximated by a sum of outer products of first derivatives of the network outputs; the neglected term involves the residuals, which near a good solution can be modelled as uncorrelated zero-mean random variables, and the extension to several outputs is straightforward.
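A sketch of the outer-product approximation for a two-layer network with a single linear output; the analytic output gradient assumes tanh hidden units, and all names are illustrative:

```python
import numpy as np

def output_grad(x, W1, b1, w2, b2):
    """Gradient of the single linear output y = w2 . tanh(W1 x + b1) + b2
    with respect to all weights and biases, flattened into one vector."""
    z = np.tanh(W1 @ x + b1)
    dz = (1.0 - z ** 2) * w2                         # dy/da_j for each hidden unit
    return np.concatenate([np.outer(dz, x).ravel(),  # dy/dW1
                           dz,                       # dy/db1
                           z,                        # dy/dw2
                           [1.0]])                   # dy/db2

def outer_product_hessian(X, W1, b1, w2, b2):
    """Levenberg-Marquardt (outer-product) approximation to the Hessian of
    the sum-of-squares error: H ~= sum_n g_n g_n^T, with g_n = grad_w y(x_n)."""
    G = np.array([output_grad(x, W1, b1, w2, b2) for x in X])
    return G.T @ G
```

Because the approximation drops the term proportional to the residuals, it is accurate mainly when the network already fits the data well, which is the regime the zero-mean residual assumption in the slide describes.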

  43. Sequential evaluation of the inverse Hessian. Because the outer-product approximation builds the Hessian up one pattern at a time, its inverse can be updated sequentially using the Sherman-Morrison-Woodbury formula.
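A sketch of that sequential update (the initialization H_0 = alpha*I and the final consistency check are illustrative):

```python
import numpy as np

def sequential_inverse_hessian(grads, alpha=1e-3):
    """Build the inverse of the outer-product Hessian one pattern at a time
    with the Sherman-Morrison formula; H_0 = alpha*I keeps it invertible."""
    W = len(grads[0])
    H_inv = np.eye(W) / alpha          # inverse of the initial H_0 = alpha*I
    for g in grads:
        Hg = H_inv @ g
        H_inv -= np.outer(Hg, Hg) / (1.0 + g @ Hg)   # rank-1 inverse update
    return H_inv

# Check against a direct inverse (illustrative)
rng = np.random.default_rng(0)
G = rng.normal(size=(20, 6))           # one output-gradient vector per pattern
H = 1e-3 * np.eye(6) + G.T @ G
assert np.allclose(sequential_inverse_hessian(G), np.linalg.inv(H), atol=1e-6)
```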

  44. Finite-difference evaluation of the Hessian. Applying central differences to the error function itself needs four forward propagations per element and O(W^3) operations in total; applying central differences to the first derivatives obtained by back-propagation costs only O(W^2) and is useful as a check of an analytic Hessian calculation.
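A sketch of the O(W^2) check, where grad_fn stands for a back-propagation routine returning dE/dw as a vector (an assumed interface, not defined in the slides):

```python
import numpy as np

def numerical_hessian(grad_fn, w, eps=1e-5):
    """Central differences applied to the backprop first derivatives:
    H_ij ~= (dE/dw_j(w + eps*e_i) - dE/dw_j(w - eps*e_i)) / (2*eps).
    Two O(W) gradient evaluations per weight give O(W^2) overall; used
    only to check an analytic Hessian, not to compute it in production."""
    W = len(w)
    H = np.zeros((W, W))
    for i in range(W):
        e = np.zeros(W)
        e[i] = eps
        H[i] = (grad_fn(w + e) - grad_fn(w - e)) / (2 * eps)
    return 0.5 * (H + H.T)   # symmetrize to suppress round-off asymmetry
```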

  45. Exact evaluation of the Hessian. The procedure applies to an arbitrary feed-forward topology, an arbitrary differentiable activation function and an arbitrary differentiable error function, and costs O(W^2). An element of the Hessian vanishes whenever the weight w_ij does not occur on any forward-propagation path connecting unit l to the outputs of the network.

  46. Exact evaluation of the Hessian: forward propagation. The quantities h_kj = ∂a_k/∂a_j are forward propagated via h_kj = Σ_l w_kl g'(a_l) h_lj, with the sum running over the units l sending connections to unit k. Initial conditions: for each unit j (except for input units) set h_jj = 1 and set h_kj = 0 for all units k ≠ j which do not lie on any forward-propagation path starting from unit j.

  47. Exact evaluation of the Hessian: back propagation. The quantities b_lj are back propagated, with the sum in the recursion running over all units to which unit l sends connections; the initial conditions are given for a sum-of-squares error and linear output units.

  48. ALGORITHM 1. Evaluate the activations of all of the hidden and output units, for a given input pattern, by forward propagation. Similarly, compute the initial conditions for the h_kj and forward propagate through the network to find the remaining non-zero elements of h_kj. 2. Evaluate δ_k for the output units and, similarly, evaluate H_kk' for all the output units. 3. Use back-propagation to find δ_j for all hidden units. Similarly, back propagate to find the {b_lj} by using the given initial conditions. 4. Evaluate the elements of the Hessian for this input pattern. 5. Repeat the above steps for each pattern in the training set and then sum to obtain the full Hessian.

  49. Exact Hessian for a two-layer network: separate expressions are obtained for the cases of both weights in the second layer, both weights in the first layer, and one weight in each layer. Legend: indices i and i' denote inputs, indices j and j' denote hidden units, and indices k and k' denote outputs.

  50. Homework.
