
Neural Networks

Neural Networks and Pattern Recognition. Giansalvo EXIN Cirrincione, Sabine Van Huffel. Unit #5. The non-linear function of many variables is represented in terms of compositions of non-linear functions of a single variable, called activation functions. The multi-layer perceptron.


Presentation Transcript


  1. Neural Networks and

  2. Pattern Recognition

  3. Giansalvo EXIN Cirrincione, Sabine Van Huffel

  4. unit #5

  5. The multi-layer perceptron: feed-forward network mappings. The non-linear function of many variables is represented in terms of compositions of non-linear functions of a single variable, called activation functions. Feed-forward neural networks provide a general framework for representing such non-linear functional mappings.
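A minimal NumPy sketch of such a two-layer mapping, with tanh hidden units and linear outputs (function and variable names are illustrative, not taken from the slides):

```python
import numpy as np

def two_layer_forward(x, W1, b1, W2, b2, g=np.tanh):
    """Forward pass of a two-layer feed-forward network with linear outputs.

    x  : (d,)   input vector
    W1 : (M, d) first-layer weights,  b1 : (M,) first-layer biases
    W2 : (c, M) second-layer weights, b2 : (c,) second-layer biases
    g  : hidden-unit activation (a non-linear function of a single variable)
    """
    a1 = W1 @ x + b1      # first-layer weighted sums
    z = g(a1)             # hidden-unit activations
    return W2 @ z + b2    # network outputs (linear output units)

# Example: a random 2-3-1 network evaluated at a random input
rng = np.random.default_rng(0)
y = two_layer_forward(rng.normal(size=2),
                      rng.normal(size=(3, 2)), rng.normal(size=3),
                      rng.normal(size=(1, 3)), rng.normal(size=1))
```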

  6. The multi-layer perceptron: feed-forward network mappings, layered networks. A network with two layers of adaptive weights implements a generalized linear discriminant function with adaptive basis functions.

  7. The multi-layer perceptron: feed-forward network mappings. Example of a network with six layers.

  8. The multi-layer perceptron: feed-forward network mappings, Hinton diagram. The size of a square is proportional to the magnitude of the corresponding parameter, and the square is black or white according to whether the parameter is positive or negative. In the example shown, the weights have the value 1.0 unless indicated otherwise.

  9. The multi-layer perceptron: feed-forward network mappings, general topologies. It is possible to attach successive numbers to the inputs and to all of the hidden and output units such that each unit only receives connections from inputs or units having a smaller number; the outputs can then be expressed as deterministic functions of the inputs.

  10. Threshold units, binary inputs. With d binary inputs there are 2^d possible patterns, each labelled 0 or 1. A two-layer network can generate any Boolean function, provided the number M of hidden units is sufficiently large.

  11. Threshold units, binary inputs (no generalization). For each input pattern labelled 1, take a hidden unit whose weights are +1 on the inputs that equal 1 and -1 on the inputs that equal 0, with bias 1 - b, where b is the number of non-zero inputs for that pattern. Each hidden unit then acts as a template for the corresponding input pattern and only generates an output when the input pattern matches the template pattern; an output unit combining the hidden units completes the construction, as sketched below.
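A minimal sketch of this template construction (assuming a strict threshold activation g(a) = 1 for a > 0 and an OR-like output unit; the function names and the XOR example are illustrative):

```python
import numpy as np

def step(a):
    return (a > 0).astype(float)   # threshold activation: 1 if a > 0 else 0

def boolean_net(patterns_labelled_one):
    """Two-layer threshold network that outputs 1 exactly on the given
    binary patterns (the template construction described above)."""
    T = np.asarray(patterns_labelled_one, dtype=float)   # (M, d) templates
    W1 = np.where(T == 1, 1.0, -1.0)                     # weights +1 / -1
    b1 = 1.0 - T.sum(axis=1)                             # bias = 1 - b
    w2 = np.ones(len(T))                                 # output unit acts as an OR
    b2 = -0.5
    def f(x):
        z = step(W1 @ np.asarray(x, dtype=float) + b1)   # one template unit per pattern
        return step(w2 @ z + b2)
    return f

# XOR on two binary inputs: the patterns labelled 1 are (0,1) and (1,0)
xor = boolean_net([[0, 1], [1, 0]])
print([int(xor(x)) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

Each hidden unit fires only on its own template pattern, so the construction memorizes the training patterns and, as the slide notes, provides no generalization.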

  12. Threshold units, continuous inputs: possible decision boundaries. With M hidden units combined by an AND-like output unit (output bias = -M), the network picks out a single convex region. Relaxing this construction, more general decision boundaries can be constructed.

  13. Continuous inputs: possible decision boundaries. The hyperplanes in the figure correspond to hidden units, whose activations transition from 0 to 1 across them. The second-layer weights are all set to 1, so the numbers labelling the regions represent the value of the linear sum presented to the output unit.

  14. Continuous inputs: possible decision boundaries, a non-convex decision region. The regions are again labelled by the linear sum presented to the output unit; with the output unit bias set to -3.5, the unit fires wherever that sum exceeds 3.5, which here yields a non-convex decision boundary.

  15. Continuous inputs: possible decision boundaries, disjoint decision regions. With the output unit bias set to -4.5, the output unit fires only where the linear sum exceeds 4.5, which here produces disjoint decision regions.

  16. Continuous inputs: impossible decision boundaries, an example. Not every decision boundary can be generated exactly by a two-layer network of threshold units; however, any given decision boundary can be approximated arbitrarily closely by a two-layer network having sigmoidal activation functions.

  17. Continuous inputs: possible decision boundaries, arbitrary decision regions (construction on the next slide).

  18. Continuous inputs: possible decision boundaries. Divide the input space into a fine grid of hypercubes. Each first-layer hidden unit defines a hyperplane aligned with one side of a hypercube; groups of 2d such units are combined by an AND unit, one group being assigned to each hypercube corresponding to class C1, and these AND units are in turn combined by an OR unit with bias = -1.

  19. Continuous inputs: possible decision boundaries. Conclusion: feed-forward neural networks with threshold units can generate arbitrarily complex decision boundaries. For the problem of classifying a dichotomy of N data points in general position in d-dimensional space, a network with N/d hidden units in a single hidden layer can separate them correctly into two classes.

  20. Sigmoidal units. Since the tanh and logistic sigmoid functions are related by a linear transformation, a neural network using tanh activation functions is equivalent to one using logistic activation functions but having different values for the weights and biases. Empirically, tanh activation functions often give rise to faster convergence of training algorithms than logistic functions.
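A short numerical illustration of that linear transformation (the single-hidden-unit case in the comments is an illustrative example):

```python
import numpy as np

sigma = lambda a: 1.0 / (1.0 + np.exp(-a))    # logistic sigmoid

# The linear transformation behind the equivalence: tanh(a) = 2*sigma(2a) - 1
a = np.linspace(-4.0, 4.0, 9)
assert np.allclose(np.tanh(a), 2.0 * sigma(2.0 * a) - 1.0)

# Consequence for a single tanh hidden unit feeding a linear output,
# y = w2 * tanh(w1 * x + b1) + b2:
#   y = 2*w2 * sigma(2*w1 * x + 2*b1) + (b2 - w2)
# i.e. double the first-layer weights and biases, double the second-layer
# weight, and subtract the second-layer weight from the output bias
# (summed over hidden units in the general case).
```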

  21. Sigmoidal units, linear output units. A sigmoidal hidden unit can approximate a linear hidden unit arbitrarily accurately (by keeping its weights small, so that it operates on the nearly linear part of the sigmoid) and can approximate a step function arbitrarily accurately (by making its weights large).

  22. Three-layer networks. A combination of two logistic sigmoids having the same orientation, but slightly displaced, gives a ridge; a combination of two such ridges with orthogonal orientations (in 2-d), fed through a second-layer sigmoidal unit, gives a localized, RBF-like bump. Networks of this kind approximate, to arbitrary accuracy, any smooth mapping.
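A minimal NumPy sketch of that construction (the steepness, offsets and evaluation grid are illustrative choices):

```python
import numpy as np

sigma = lambda a: 1.0 / (1.0 + np.exp(-a))

def ridge(t, lo=-1.0, hi=1.0, steep=10.0):
    """Difference of two displaced logistic sigmoids with the same
    orientation: roughly 1 for lo < t < hi and roughly 0 outside (a ridge)."""
    return sigma(steep * (t - lo)) - sigma(steep * (t - hi))

def bump(x, y, steep=10.0):
    """Two ridges with orthogonal orientations, combined by a second-layer
    sigmoid, give a localized, RBF-like bump."""
    s = ridge(x, steep=steep) + ridge(y, steep=steep)   # ~2 inside the central square
    return sigma(steep * (s - 1.5))                     # ~1 inside, ~0 outside

xx, yy = np.meshgrid(np.linspace(-3, 3, 7), np.linspace(-3, 3, 7))
print(np.round(bump(xx, yy), 2))   # close to 1 only in the central square
```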

  23. Sigmoidal units: two-layer networks. They approximate arbitrarily well any continuous functional mapping, any decision boundary, and both a function and its derivative.

  24. Sigmoidal units: two-layer networks. Example: a 1-5-1 network trained with the BFGS quasi-Newton algorithm.
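One possible way to reproduce such an experiment with scipy.optimize (the sinusoidal target, network conventions and random initialization are assumptions, not taken from the slide):

```python
import numpy as np
from scipy.optimize import minimize

# Training data: a smooth 1-D target (illustrative choice)
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 50)
t = np.sin(np.pi * x)

M = 5   # 1-5-1 architecture: 1 input, 5 tanh hidden units, 1 linear output

def unpack(w):
    W1, b1 = w[:M], w[M:2*M]
    W2, b2 = w[2*M:3*M], w[3*M]
    return W1, b1, W2, b2

def forward(w, x):
    W1, b1, W2, b2 = unpack(w)
    z = np.tanh(np.outer(x, W1) + b1)   # hidden activations, shape (N, M)
    return z @ W2 + b2                  # linear output

def sse(w):
    return 0.5 * np.sum((forward(w, x) - t) ** 2)   # sum-of-squares error

w0 = 0.5 * rng.standard_normal(3 * M + 1)
res = minimize(sse, w0, method='BFGS')   # quasi-Newton training
print(res.fun)   # final sum-of-squares error
```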

  25. The Generalized Mapping Regressor (GMR). Example figure: Pollock, Convergence.

  26. Weight-space symmetries. For a network with M tanh hidden units, sign flips of the weights feeding into and out of a hidden unit (2^M possibilities) and interchanges of the hidden units (M! possibilities) leave the network mapping unchanged, giving an overall symmetry factor of 2^M M! equivalent weight vectors.

  27. Error back-propagation. Back-propagation addresses the credit assignment problem and provides the derivatives needed by training algorithms such as gradient descent; the same machinery extends to Hessian matrix evaluation, Jacobian evaluation, several error functions and several kinds of networks.

  28. Error back-propagation. The scheme applies to an arbitrary feed-forward topology, an arbitrary differentiable non-linear activation function and an arbitrary differentiable error function. Each unit forms a weighted sum a_j = Σ_i w_ji z_i of the quantities z_i (activations of other units or network inputs) and emits z_j = g(a_j) (an activation or a network output).

  29. Error back-propagation. First step: forward propagation, computing a_j = Σ_i w_ji z_i and z_j = g(a_j) for each unit in turn, with z_0 = 1 for the bias.

  30. Error back-propagation: δ computation. For each hidden or output unit define δ_j = ∂E/∂a_j. For an output unit, δ_k follows directly from the error function and output activation; for a hidden unit, δ_j = g'(a_j) Σ_k w_kj δ_k, with the sum running over the units k to which unit j sends connections.

  31. Error back-propagation: example for a network with two layers of weights, in both on-line and batch learning.

  32. Error back-propagation
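A compact sketch of the procedure for a two-layer network with tanh hidden units, linear outputs and a sum-of-squares error (function and variable names are illustrative):

```python
import numpy as np

def backprop(x, t, W1, b1, W2, b2):
    """Gradients of the sum-of-squares error for one pattern of a
    two-layer network: tanh hidden units, linear outputs."""
    # Forward propagation
    a1 = W1 @ x + b1
    z = np.tanh(a1)
    y = W2 @ z + b2
    # Output-unit deltas: for SSE with linear outputs, delta_k = y_k - t_k
    delta_k = y - t
    # Hidden-unit deltas: delta_j = g'(a_j) * sum_k w_kj delta_k, with g' = 1 - z^2
    delta_j = (1.0 - z ** 2) * (W2.T @ delta_k)
    # Error derivatives: each is a delta times the activation feeding that weight
    return (np.outer(delta_j, x), delta_j,       # first-layer weight and bias grads
            np.outer(delta_k, z), delta_k)       # second-layer weight and bias grads

# Example call on a random 3-4-2 network and a random pattern
rng = np.random.default_rng(0)
grads = backprop(rng.normal(size=3), rng.normal(size=2),
                 rng.normal(size=(4, 3)), rng.normal(size=4),
                 rng.normal(size=(2, 4)), rng.normal(size=2))
```

For batch learning the per-pattern gradients returned here are summed over the training set; for on-line learning each one is used immediately to update the weights.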

  33. Homework 1 Show, for a feedforward network with tanh hidden unit activation functions and a sum of squares error function, that the origin in weight space is a stationary point of the error function.
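Before attempting the proof, the claim can be sanity-checked numerically; the sketch below assumes a small network without bias parameters and uses random data (all illustrative choices):

```python
import numpy as np

# Numerical check: with all weights at zero, every tanh hidden unit outputs 0,
# so the network output is 0 for every input and the central-difference
# derivatives of the sum-of-squares error should all vanish.
rng = np.random.default_rng(0)
X, T = rng.normal(size=(10, 3)), rng.normal(size=(10, 2))
M, d, c = 4, 3, 2           # hidden units, inputs, outputs (no biases here)

def error(w):
    W1 = w[:M*d].reshape(M, d)
    W2 = w[M*d:].reshape(c, M)
    Y = np.tanh(X @ W1.T) @ W2.T
    return 0.5 * np.sum((Y - T) ** 2)

n_w, eps = M*d + c*M, 1e-6
grad = np.array([(error(eps*e) - error(-eps*e)) / (2*eps) for e in np.eye(n_w)])
print(np.max(np.abs(grad)))   # ~0: the origin is a stationary point
```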

  34. Homework 2 Let W be the total number of weights and biases. Show that, for each input pattern, the cost of back-propagation for the evaluation of all the derivatives is O(W) (if the derivatives are instead evaluated numerically by forward propagation, the total cost is O(W^2)).

  35. Numerical differentiation. Simple finite differences, perturbing each weight in turn, cost O(W^2), as do the more accurate symmetrical central finite differences; both are mainly useful as a correctness check of back-propagation. Node perturbation reduces the cost to O(MW), where M is the number of hidden and output nodes.
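A minimal sketch of such a central-difference check; error_fn stands for any scalar error routine, and the names in the final comment are placeholders for the reader's own error and back-propagation functions:

```python
import numpy as np

def numerical_grad(error_fn, w, eps=1e-6):
    """Symmetrical central differences: two error evaluations per weight,
    each O(W), hence O(W^2) overall -- used only as a check of backprop."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (error_fn(w + e) - error_fn(w - e)) / (2 * eps)
    return g

# Typical usage (placeholder names for your own routines):
#   assert np.allclose(numerical_grad(sse, w), backprop_grad(w), atol=1e-5)
```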

  36. The Jacobian matrix, with elements J_ki = ∂y_k/∂x_i (each derivative taken with all other inputs held fixed). It provides a measure of the local sensitivity of the outputs to changes in each of the input variables; it is valid only for small perturbations of the inputs, and the Jacobian must be re-evaluated for each new input vector. It can be evaluated by a forward-propagation procedure.

  37. The Jacobian matrix

  38. The Jacobian matrix

  39. The Jacobian matrix: worked evaluation for row 1 of the Jacobian.
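For a two-layer network with tanh hidden units and linear outputs, the whole Jacobian has a simple closed form that is easy to check against small input perturbations; a sketch (names and sizes are illustrative):

```python
import numpy as np

def jacobian(x, W1, b1, W2, b2):
    """Jacobian J_ki = dy_k/dx_i of a two-layer network with tanh hidden
    units and linear outputs: J = W2 diag(1 - z^2) W1."""
    z = np.tanh(W1 @ x + b1)
    return (W2 * (1.0 - z ** 2)) @ W1    # broadcasting applies diag(1 - z^2)

# Finite-difference check (valid only for small input perturbations)
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
f = lambda x: W2 @ np.tanh(W1 @ x + b1) + b2
eps = 1e-6
J_num = np.column_stack([(f(x + eps*e) - f(x - eps*e)) / (2*eps) for e in np.eye(3)])
assert np.allclose(jacobian(x, W1, b1, W2, b2), J_num, atol=1e-5)
```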

  40. The Hessian matrix; its evaluation costs at least O(W^2), since it has O(W^2) elements. 1. Several non-linear optimization algorithms used for training neural networks are based on the second-order properties of the error surface. 2. The Hessian forms the basis of a fast procedure for re-training a feed-forward network following a small change in the training data. 3. The inverse Hessian is used to identify the least significant weights in a network as part of a pruning algorithm. 4. The inverse Hessian is used to assign error bars to the predictions made by a trained network.

  41. Diagonal approximation of the Hessian. Only the diagonal elements of the Hessian are computed and the off-diagonal elements are neglected; the inverse of a diagonal matrix is trivial to compute, and the cost is O(W).

  42. Levenberg-Marquardt (outer-product) approximation, cost O(W^2). For regression problems the Hessian of the sum-of-squares error is approximated by a sum of outer products of first derivatives of the network outputs; the neglected term involves the residuals, which near a good solution can be modelled as uncorrelated zero-mean random variables, and the extension to several outputs is straightforward.
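A sketch of the outer-product approximation for a two-layer network with a single linear output; the analytic output gradient assumes tanh hidden units, and all names are illustrative:

```python
import numpy as np

def output_grad(x, W1, b1, w2, b2):
    """Gradient of the single linear output y = w2 . tanh(W1 x + b1) + b2
    with respect to all weights and biases, flattened into one vector."""
    z = np.tanh(W1 @ x + b1)
    dz = (1.0 - z ** 2) * w2                         # dy/da_j for each hidden unit
    return np.concatenate([np.outer(dz, x).ravel(),  # dy/dW1
                           dz,                       # dy/db1
                           z,                        # dy/dw2
                           [1.0]])                   # dy/db2

def outer_product_hessian(X, W1, b1, w2, b2):
    """Levenberg-Marquardt (outer-product) approximation to the Hessian of
    the sum-of-squares error: H ~= sum_n g_n g_n^T, with g_n = grad_w y(x_n)."""
    G = np.array([output_grad(x, W1, b1, w2, b2) for x in X])
    return G.T @ G
```

Because the approximation drops the term proportional to the residuals, it is accurate mainly when the network already fits the data well, which is the regime the zero-mean residual assumption in the slide describes.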

  43. Sequential evaluation of the inverse Hessian. Because the outer-product approximation builds the Hessian up one pattern at a time, its inverse can be updated sequentially using the Sherman-Morrison-Woodbury formula.
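A sketch of that sequential update (the initialization H_0 = alpha*I and the final consistency check are illustrative):

```python
import numpy as np

def sequential_inverse_hessian(grads, alpha=1e-3):
    """Build the inverse of the outer-product Hessian one pattern at a time
    with the Sherman-Morrison formula; H_0 = alpha*I keeps it invertible."""
    W = len(grads[0])
    H_inv = np.eye(W) / alpha          # inverse of the initial H_0 = alpha*I
    for g in grads:
        Hg = H_inv @ g
        H_inv -= np.outer(Hg, Hg) / (1.0 + g @ Hg)   # rank-1 inverse update
    return H_inv

# Check against a direct inverse (illustrative)
rng = np.random.default_rng(0)
G = rng.normal(size=(20, 6))           # one output-gradient vector per pattern
H = 1e-3 * np.eye(6) + G.T @ G
assert np.allclose(sequential_inverse_hessian(G), np.linalg.inv(H), atol=1e-6)
```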

  44. Finite-difference evaluation of the Hessian. Applying central differences to the error function itself needs four forward propagations per element and O(W^3) operations in total; applying central differences to the first derivatives obtained by back-propagation costs only O(W^2) and is useful as a check of an analytic Hessian calculation.
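A sketch of the O(W^2) check, where grad_fn stands for a back-propagation routine returning dE/dw as a vector (an assumed interface, not defined in the slides):

```python
import numpy as np

def numerical_hessian(grad_fn, w, eps=1e-5):
    """Central differences applied to the backprop first derivatives:
    H_ij ~= (dE/dw_j(w + eps*e_i) - dE/dw_j(w - eps*e_i)) / (2*eps).
    Two O(W) gradient evaluations per weight give O(W^2) overall; used
    only to check an analytic Hessian, not to compute it in production."""
    W = len(w)
    H = np.zeros((W, W))
    for i in range(W):
        e = np.zeros(W)
        e[i] = eps
        H[i] = (grad_fn(w + e) - grad_fn(w - e)) / (2 * eps)
    return 0.5 * (H + H.T)   # symmetrize to suppress round-off asymmetry
```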

  45. Exact evaluation of the Hessian. The procedure applies to an arbitrary feed-forward topology, an arbitrary differentiable activation function and an arbitrary differentiable error function, and costs O(W^2). An element of the Hessian vanishes whenever the weight w_ij does not occur on any forward-propagation path connecting unit l to the outputs of the network.

  46. Exact evaluation of the Hessian: forward propagation. The quantities h_kj = ∂a_k/∂a_j are forward propagated via h_kj = Σ_l w_kl g'(a_l) h_lj, with the sum running over the units l sending connections to unit k. Initial conditions: for each unit j (except for input units) set h_jj = 1 and set h_kj = 0 for all units k ≠ j which do not lie on any forward-propagation path starting from unit j.

  47. Exact evaluation of the Hessian: back propagation. The quantities b_lj are back propagated, with the sum in the recursion running over all units to which unit l sends connections; the initial conditions are given for a sum-of-squares error and linear output units.

  48. ALGORITHM 1. Evaluate the activations of all of the hidden and output units, for a given input pattern, by forward propagation. Similarly, compute the initial conditions for the h_kj and forward propagate through the network to find the remaining non-zero elements of h_kj. 2. Evaluate δ_k for the output units and, similarly, evaluate H_kk' for all the output units. 3. Use back-propagation to find δ_j for all hidden units. Similarly, back propagate to find the {b_lj} by using the given initial conditions. 4. Evaluate the elements of the Hessian for this input pattern. 5. Repeat the above steps for each pattern in the training set and then sum to obtain the full Hessian.

  49. Exact Hessian for a two-layer network: separate expressions are obtained for the cases of both weights in the second layer, both weights in the first layer, and one weight in each layer. Legend: indices i and i' denote inputs, indices j and j' denote hidden units, and indices k and k' denote outputs.

  50. Homework.
