Empirical studies on the online learning algorithms based on combining weight noise injection and weight decay Advisor: Dr. John Sum, Student: Allen Liang
Outline • Introduction • Learning Algorithms • Experiments • Conclusion
Background • A neural network (NN) is a system composed of interconnected neurons. • Learning aims to make a NN achieve good generalization (small prediction error).
Fault tolerance is an unavoidable issue that must be considered in hardware implementation. • Multiplicative weight noise or additive weight noise. • Weights could randomly break down. • Hidden nodes could fail (stuck-at-zero & stuck-at-one). • The goal is to keep the network workable, with graceful degradation, in the presence of noise/faults.
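As a quick sketch of the two noise models (standard forms, stated here for concreteness rather than taken from the slide): multiplicative weight noise perturbs each weight as $\tilde{w}_i = w_i(1 + b_i)$, while additive weight noise uses $\tilde{w}_i = w_i + b_i$, where the $b_i$ are i.i.d. zero-mean random variables with variance $\sigma_b^2$.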
Weight noise injection during training • Murray & Edwards (1993): Modify BPA by injecting weight noise during training for MLP • By simulation: convergence, fault tolerance • By theoretical analysis: effect of weight noise on the prediction error of a MLP • A.F. Murray and P.J. Edwards. Synaptic weight noise during multilayer perceptron training: fault tolerance and training improvements. IEEE Transactions on Neural Networks, Vol.4(4), 722-725, 1993. • A.F. Murray and P.J. Edwards. Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training. IEEE Transactions on Neural Networks, Vol.5(5), 792-802, 1994.
Weight noise injection during training (Cont.) • Jim, Giles, Horne (1996): Modify RTRL by injecting weight noise during training for RNN • By simulation: convergence and generalization • By theoretical analysis: effect of weight noise on the prediction error of a RNN • Jim K.C., C.L. Giles and B.G. Horne, An analysis of noise in recurrent neural networks: Convergence and generalization, IEEE Transactions on Neural Networks, Vol.7, 1424-1438, 1996.
Regularization • Bernier and co-workers (2000): Add an explicit regularizer to the training MSE to form the objective function to be minimized. • The online learning algorithm is derived via gradient descent • No noise is injected during training • J. L. Bernier, J. Ortega, I. Rojas, and A. Prieto, “Improving the tolerance of multilayer perceptrons by minimizing the statistical sensitivity to weight deviations,” Neurocomputing, vol. 31, pp. 87-103, Jan. 2000 • J. L. Bernier, J. Ortega, I. Rojas, E. Ros, and A. Prieto, “Obtaining fault tolerant multilayer perceptrons using an explicit regularization,” Neural Process. Lett., vol. 12, no. 2, pp. 107-113, Oct. 2000
Regularization (Cont.) • Ho, Leung, & Sum (2009): Add a regularizer term to the training MSE as the objective function • Similar to the Bernier et al. approach, but the weighting factor for the regularizer can be determined by the noise variance • The online learning algorithm is derived via gradient descent • No noise is injected during training • J. Sum, C.S. Leung, and K. Ho. On objective function, regularizer and prediction error of a learning algorithm for dealing with multiplicative weight noise. IEEE Transactions on Neural Networks, Vol. 20(1), Jan. 2009.
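A minimal sketch of the regularization idea (generic form; the exact regularizer in Bernier et al. and in Ho, Leung & Sum is more specific than shown here): minimize
$$J(w) = \frac{1}{N}\sum_{k=1}^{N}\big(y_k - f(x_k, w)\big)^2 + \lambda\, R(w),$$
where $R(w)$ penalizes sensitivity of the network output to weight deviations, and in the Ho–Leung–Sum formulation the weighting factor $\lambda$ is determined by the assumed weight-noise variance.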
Misconception • Ho, Leung, & Sum (2009-): Convergence? • Show that the work by G. An (1996) is incomplete. • Essentially, his work is identical to the works done by Murray & Edwards (1993, 1994) and Bernier et al. (2000). Only the effect of weight noise on the prediction error of an MLP has been derived. • By theoretical analysis, injecting weight noise during the training of an RBF network is of no use. • By simulation, the MSE converges but the weights might not converge. • Injecting weight noise together with weight decay during training can improve convergence • K. Ho, C.S. Leung and J. Sum, Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks, IEEE Transactions on Neural Networks, in press. • K. Ho, C.S. Leung, and J. Sum. On weight-noise-injection training, M. Koeppen, N. Kasabov and G. Coghill (eds.), Advances in Neuro-Information Processing, Springer LNCS 5507, pp. 919–926, 2009. • J. Sum and K. Ho. SNIWD: Simultaneous weight noise injection with weight decay for MLP training. Proc. ICONIP 2009, Bangkok, Thailand, 2009.
Objective • Investigate the fault tolerance and convergence of a NN trained by combining weight noise injection with weight decay during BPA training. • Compare the results with NNs trained by • BPA training • weight noise injection during BPA training • adding weight decay during BPA training • Focus on the multilayer perceptron (MLP) network • Both multiplicative and additive weight noise injections
Learning Algorithms • BPA for linear output MLP (BPA1) • BPA1 with weight decay • BPA for sigmoid output MLP (BPA2) • BPA2 with weight decay • Weight noise injection training algorithms
BPA 1 • Data set: • Hidden node output: • MLP output:
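A standard-form sketch of these quantities (assumed notation for a single-hidden-layer MLP, not necessarily the slide's exact symbols): data set $\mathcal{D} = \{(x_k, y_k)\}_{k=1}^{N}$; hidden node output $a_j(x) = \varphi(u_j^{\top} x + \theta_j)$ for $j = 1, \dots, n$, with a sigmoidal activation $\varphi$; linear MLP output $f(x, w) = \sum_{j=1}^{n} w_j\, a_j(x)$.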
BPA 1 (Cont.) • Objective function: • Update equation: • For j = 1, ... , n
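In the notation sketched above (standard online BPA form, offered as an assumption): objective
$$E(w) = \frac{1}{N}\sum_{k=1}^{N}\big(y_k - f(x_k, w)\big)^2,$$
with the online update for the $k$-th sample, for $j = 1, \dots, n$,
$$w_j(t+1) = w_j(t) + \mu\,\big(y_k - f(x_k, w(t))\big)\, a_j(x_k),$$
where $\mu > 0$ is the step size; the hidden-layer weights are updated analogously via the chain rule.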
BPA 1 with weight decay • Objective function: • Update equation: • For j = 1, ... , n
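A hedged sketch of the weight-decay variant (standard form, with decay constant $\lambda$): objective $E(w) + \lambda \sum_{j} w_j^2$, and update
$$w_j(t+1) = w_j(t) + \mu\Big[\big(y_k - f(x_k, w(t))\big)\, a_j(x_k) - \lambda\, w_j(t)\Big].$$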
BPA 2 • Data set: • Hidden node output: • MLP output: • where
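As in BPA 1 but with a sigmoid output node (assumed standard form): $f(x, w) = \psi\big(\sum_{j=1}^{n} w_j\, a_j(x)\big)$, where $\psi(s) = 1/(1 + e^{-s})$.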
BPA 2 (Cont.) • Objective function: • Update equation: • For j = 1, ... , n
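A standard-form sketch (assuming the same MSE objective): the sigmoid output contributes the factor $f(1-f)$ to the gradient, giving
$$w_j(t+1) = w_j(t) + \mu\,\big(y_k - f_k\big)\, f_k (1 - f_k)\, a_j(x_k), \qquad f_k = f(x_k, w(t)).$$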
BPA 2 with weight decay • Objective function: • Update equation: • For j = 1, ... , n
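Under the same assumptions, weight decay again contributes the term $-\mu\,\lambda\, w_j(t)$ to the update:
$$w_j(t+1) = w_j(t) + \mu\Big[\big(y_k - f_k\big)\, f_k (1 - f_k)\, a_j(x_k) - \lambda\, w_j(t)\Big].$$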
Weight noise injection training algorithms • Update equation: • For multiplicative weight noise injection • For additive weight noise injection
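A hedged sketch of the noise-injection update (standard form, following the Murray–Edwards idea; notation as above): at each step draw a noise vector $b(t)$, form the perturbed weights $\tilde{w}_j(t) = w_j(t)\,(1 + b_j(t))$ for multiplicative noise, or $\tilde{w}_j(t) = w_j(t) + b_j(t)$ for additive noise, and apply the BPA gradient evaluated at $\tilde{w}(t)$:
$$w_j(t+1) = w_j(t) + \mu\,\big(y_k - f(x_k, \tilde{w}(t))\big)\,\frac{\partial f}{\partial w_j}\Big|_{\tilde{w}(t)}.$$
Under these assumptions, the combined algorithm studied here (weight noise injection plus weight decay) simply adds the decay term $-\mu\,\lambda\, w_j(t)$ to this update.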
Experiments • Data sets • Methodology • Results
Methodology • Training • BPA • BPA with weight noise injection • BPA with weight decay • BPA with weight noise injection and weight decay • Fault tolerance • MWNI-based training: effect of multiplicative weight noise on the prediction error of the trained MLP • AWNI-based training: effect of additive weight noise on the prediction error of the trained MLP • Convergence of the weight vectors
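A minimal Python sketch of how the fault-tolerance evaluation could be carried out (illustrative only; the model, noise levels, and trial counts are assumptions, not the thesis' actual settings):

```python
import numpy as np

def mlp_predict(X, U, theta, w, sigmoid_output=False):
    """Single-hidden-layer MLP with tanh hidden nodes.
    Assumed architecture for illustration; not the thesis' exact model."""
    A = np.tanh(X @ U + theta)            # hidden node outputs, shape (N, n_hidden)
    y = A @ w                             # linear combination of hidden outputs
    if sigmoid_output:                    # BPA 2 style sigmoid output node
        y = 1.0 / (1.0 + np.exp(-y))
    return y

def fault_tolerance_mse(X, y, U, theta, w, noise="multiplicative",
                        sigma=0.1, trials=1000, sigmoid_output=False, seed=None):
    """Average prediction MSE when zero-mean weight noise with standard
    deviation `sigma` is injected into the trained output weights `w`."""
    rng = np.random.default_rng(seed)
    mses = []
    for _ in range(trials):
        eps = rng.normal(0.0, sigma, size=w.shape)
        # Multiplicative: w * (1 + b); additive: w + b
        w_noisy = w * (1.0 + eps) if noise == "multiplicative" else w + eps
        pred = mlp_predict(X, U, theta, w_noisy, sigmoid_output)
        mses.append(np.mean((y - pred) ** 2))
    return float(np.mean(mses))
```

For example, comparing `fault_tolerance_mse(...)` for MLPs trained by the four algorithms, over a range of `sigma` values, would give the kind of fault-tolerance curves described above.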