IE 585 Backpropagation Networks
BP Basics • supervised • can use multiple layers of “hidden” neurons • most common training procedure • generally uses a fully connected perceptron architecture • also called the generalized delta rule, error backpropagation or back error propagation • networks trained this way are often called Multilayer Perceptrons (MLPs) • good for continuous and binary inputs and outputs • a theoretical universal approximator when nonlinear transfer functions are used
More BP Basics • learns through gradient descent down the error surface • the error signal (generally squared error) is propagated back through the weights to adjust them in the direction of greatest error decrease • must have a continuous, differentiable transfer function (NO step functions!) • iterative training that usually requires many passes (epochs) through the training set • generally uses a sigmoid transfer function with inputs/outputs normalized to the range 0 to 1
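As a rough sketch of the gradient-descent idea above (not from the course materials; the learning rate value and variable names are illustrative assumptions), here is the squared-error update for a single sigmoid unit in Python:

    import numpy as np

    def sigmoid(net):
        # binary sigmoid transfer function, output between 0 and 1
        return 1.0 / (1.0 + np.exp(-net))

    # one training pattern: inputs x, target t, current weights w
    x = np.array([0.2, 0.7, 1.0])
    w = np.array([0.1, -0.3, 0.05])
    t = 1.0
    eta = 0.5                        # learning rate (step size)

    y = sigmoid(w @ x)               # forward pass
    error = 0.5 * (t - y) ** 2       # squared error for this pattern
    delta = (t - y) * y * (1 - y)    # error signal; y*(1-y) is the sigmoid derivative
    w = w + eta * delta * x          # step in the direction of greatest error decrease

The same kind of error signal, propagated back through the hidden-layer weights as well, is what the full procedure on the later slides computes.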
Origins • Extension of LMS (Widrow and Hoff) • Difficulty - how to adjust weights in the hidden layer? • Paul Werbos, Ph.D. dissertation, Harvard, 1974 • Parker, Internal Report, MIT, 1985 • Popularized by McClelland and Rumelhart (PDP Group), 1986
What is a universal approximator? • can theoretically approximate any relationship to any given degree of precision • in practice, this capability is limited by: • finite and imperfect samples • imperfect network training • missing input (independent) variables
Advantages of BP • works in a wide variety of applications • good theoretical approximation properties • very reliable in both training and operation (testing) • fairly straightforward to understand • lots of software available
Applications of BP • Function approximation • Pattern Classification
Drawbacks of BP • training can be very slow - requiring a large number of iterations • training can stall in a local minimum or saddle point area • must choose number of hidden neurons and number of hidden layers (practically, this is 1 or 2) • can be sensitive to both overparameterization (overfitting) and overtraining
Training Idea - Gradient Descent • (figure: error plotted against weight values, showing the gradient and the learning rate/step size, a saddle point, a local minimum and the global minimum)
Goal of BP • Balance the ability to respond correctly to the input patterns that are used for training (memorization) and the ability to give reasonable (good) responses to input that is similar, but not identical, to that used in training (generalization).
Overview of Training • select architecture, transfer function, learning rate and momentum rate • randomize weights to small +/- values • normalize the data and randomize the order of the training set • present a training pattern • calculate the output error and propagate it back through the output weight matrix • propagate back through the hidden layer(s) weight matrix • repeat until the stopping criterion is reached
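The sketch below is an assumption-laden illustration of one training pass, not the course code: the layer sizes, the learning rate and the absence of bias weights are all simplifications. It uses x, z, y for the input, hidden and output signals and v, w for the two weight matrices, matching the notation slide below.

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 3, 4, 2
    v = rng.uniform(-0.5, 0.5, (n_hid, n_in))     # input-to-hidden weights, small random values
    w = rng.uniform(-0.5, 0.5, (n_out, n_hid))    # hidden-to-output weights
    eta = 0.25                                    # learning rate

    def train_pattern(x, t):
        global v, w
        z = sigmoid(v @ x)                            # forward pass: hidden layer
        y = sigmoid(w @ z)                            # forward pass: output layer
        delta_out = (t - y) * y * (1 - y)             # output error signal
        delta_hid = (w.T @ delta_out) * z * (1 - z)   # error propagated back through w
        w += eta * np.outer(delta_out, z)             # adjust output weight matrix
        v += eta * np.outer(delta_hid, x)             # adjust hidden weight matrix
        return 0.5 * np.sum((t - y) ** 2)             # squared error for this pattern

Training repeats this for every pattern in the (shuffled) training set, epoch after epoch, until one of the stopping criteria on the next slide is met.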
Possible Stopping Criteria • Total number of epochs • Total squared error less than a threshold • Weights are stable (∆w’s are small)
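A minimal sketch of how the three criteria above might be checked together (all threshold values here are illustrative assumptions, not course-specified numbers):

    def keep_training(epoch, total_sq_error, max_delta_w,
                      max_epochs=1000, error_tol=0.01, weight_tol=1e-4):
        # stop on any of: epoch budget reached, total squared error small enough,
        # or weights stable (largest weight change is tiny)
        if epoch >= max_epochs:
            return False
        if total_sq_error < error_tol:
            return False
        if max_delta_w < weight_tol:
            return False
        return True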
Notation • x = input vector (input layer) • z = hidden layer outputs • y = network outputs (output layer) • t = target output • v = input-to-hidden weight matrix • w = hidden-to-output weight matrix
Sigmoid Transfer Function • Binary transfer function: y = 1/(1 + exp(-(wx))), output ranges from 0 to 1 • Bipolar transfer function: y = (1 - exp(-(wx))) / (1 + exp(-(wx))), output ranges from -1 to 1
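Both transfer functions above are easy to implement directly; here is a small sketch (function names are my own) together with their derivatives, which the backward pass needs:

    import numpy as np

    def binary_sigmoid(net):
        # output ranges from 0 to 1
        return 1.0 / (1.0 + np.exp(-net))

    def bipolar_sigmoid(net):
        # output ranges from -1 to 1 (this is tanh(net / 2))
        return (1.0 - np.exp(-net)) / (1.0 + np.exp(-net))

    def binary_sigmoid_deriv(y):
        # derivative written in terms of the output y
        return y * (1.0 - y)

    def bipolar_sigmoid_deriv(y):
        return 0.5 * (1.0 + y) * (1.0 - y)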
(figure: network architecture - input x feeds hidden neurons z1 … zk through weight matrix v; the hidden layer feeds output y through the w’s, the hidden-to-output weight matrix)
Momentum Momentum smooths descent down the error surface and helps prevent weights from getting “stuck”.
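A hedged sketch of the usual momentum update (the coefficient name mu and the values 0.25 and 0.9 are assumptions, not course-prescribed):

    def momentum_step(w, neg_gradient, prev_delta_w, eta=0.25, mu=0.9):
        # delta_w(t) = eta * (-dE/dw) + mu * delta_w(t-1)
        # the mu term carries part of the previous step forward, smoothing the descent
        delta_w = eta * neg_gradient + mu * prev_delta_w
        return w + delta_w, delta_w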
Other BP Variations • Learning rate - dynamic • Weight updating - batch (1/epoch), continuous (1/training vector) • Use 2nd derivative info (Hessian matrix) • Change gradient descent - genetic algorithms or other optimization methods • Training vector presentation - random, weighted • Pruning unneeded connections
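To make the batch-versus-continuous bullet concrete, a small sketch for a single sigmoid unit (the data and helper names are made-up assumptions):

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    def neg_gradient(w, x, t):
        # negative gradient of the squared error for one sigmoid unit
        y = sigmoid(w @ x)
        return (t - y) * y * (1 - y) * x

    training_set = [(np.array([0.0, 1.0]), 1.0), (np.array([1.0, 0.0]), 0.0)]
    eta = 0.5
    w = np.zeros(2)

    # continuous updating: weights change after every training vector
    for x, t in training_set:
        w = w + eta * neg_gradient(w, x, t)

    # batch updating: gradients are accumulated and applied once per epoch
    grad_sum = np.zeros_like(w)
    for x, t in training_set:
        grad_sum += neg_gradient(w, x, t)
    w = w + eta * grad_sum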
More BP Alterations • Fully connected vs partially connected (“articulated”) • Connections spanning more than 1 layer • Functional links - input functions of independent variables • Hierarchies of nets - nets feeding nets • Committee nets - multiple nets working together either for consensus or by partitioning the domain space to act as experts
Main BP Issues • Choosing number of hidden neurons • Overtraining and overfitting - poor generalization • Covering domain entirely and equally - training set adequacy • Validation - testing set adequacy • Identifying redundant and/or misleading inputs • Black box aspect
Overtraining • (figure: error versus epochs - training set error keeps decreasing while testing set error eventually begins to rise)
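The plot above is usually handled by watching the testing (or validation) set error during training; below is a sketch of one simple rule for deciding when to stop (the patience idea and its value are assumptions, not something the slides prescribe):

    def should_stop(test_errors, patience=5):
        # stop once the testing-set error has not improved for `patience` epochs
        best_epoch = test_errors.index(min(test_errors))
        return len(test_errors) - 1 - best_epoch >= patience

    # example: testing error falls, then starts rising again
    errors = [0.9, 0.6, 0.4, 0.35, 0.36, 0.38, 0.41, 0.45, 0.5]
    print(should_stop(errors))   # True -> further epochs would likely just overtrain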
BP Web Sites • http://www.shef.ac.uk/psychology/gurney/notes/l4/l4.html - some basic notes with drawings • http://neuron.eng.wayne.edu/ - java applets, cool, interactive site • http://www2.psy.uq.edu.au/~brainwav/Manual/BackProp.html - more notes and drawings