Empirical studies on the online learning algorithms based on combining weight noise injection and weight decay Advisor: Dr. John Sum, Student: Allen Liang
Outline • Introduction • Learning Algorithms • Experiments • Conclusion
Background • A neural network (NN) is a system composed of interconnected neurons. • Learning aims to make a NN achieve good generalization (small prediction error).
Fault tolerance is an unavoidable issue that must be considered in hardware implementation. • Multiplicative weight noise or additive weight noise. • Weights could randomly break down. • Hidden nodes could fail (stuck-at-zero & stuck-at-one). • The goal is to keep the network workable, with graceful degradation, in the presence of noise/faults.
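As a quick sketch of the two noise models (standard forms, stated here for concreteness rather than taken from the slide): multiplicative weight noise perturbs each weight as $\tilde{w}_i = w_i(1 + b_i)$, while additive weight noise uses $\tilde{w}_i = w_i + b_i$, where the $b_i$ are i.i.d. zero-mean random variables with variance $\sigma_b^2$.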
Weight noise injection during training • Murray & Edwards (1993): Modify BPA by injecting weight noise during training for MLP • By simulation: convergence, fault tolerance • By theoretical analysis: effect of weight noise on the prediction error of a MLP • A.F. Murray and P.J. Edwards. Synaptic weight noise during multilayer perceptron training: fault tolerance and training improvements. IEEE Transactions on Neural Networks, Vol.4(4), 722-725, 1993. • A.F. Murray and P.J. Edwards. Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training. IEEE Transactions on Neural Networks, Vol.5(5), 792-802, 1994.
Weight noise injection during training (Cont.) • Jim, Giles, Horne (1996): Modify RTRL by injecting weight noise during training for RNN • By simulation: convergence and generalization • By theoretical analysis: effect of weight noise on the prediction error of a RNN • Jim K.C., C.L. Giles and B.G. Horne, An analysis of noise in recurrent neural networks: Convergence and generalization, IEEE Transactions on Neural Networks, Vol.7, 1424-1438, 1996.
Regularization • Bernier and co-workers (2000): Add an explicit regularizer to the training MSE to form the objective function to be minimized. • The online learning algorithm is derived via gradient descent • No noise is injected during training • J. L. Bernier, J. Ortega, I. Rojas, and A. Prieto, “Improving the tolerance of multilayer perceptrons by minimizing the statistical sensitivity to weight deviations,” Neurocomputing, vol. 31, pp. 87-103, Jan. 2000 • J. L. Bernier, J. Ortega, I. Rojas, E. Ros, and A. Prieto, “Obtaining fault tolerant multilayer perceptrons using an explicit regularization,” Neural Process. Lett., vol. 12, no. 2, pp. 107-113, Oct. 2000
Regularization (Cont.) • Ho, Leung, & Sum (2009): Add a regularizer term to the training MSE as the objective function • Similar to the Bernier et al. approach, but the weighting factor for the regularizer can be determined by the noise variance • The online learning algorithm is derived via gradient descent • No noise is injected during training • J. Sum, C.S. Leung, and K. Ho. On objective function, regularizer and prediction error of a learning algorithm for dealing with multiplicative weight noise. IEEE Transactions on Neural Networks, Vol. 20(1), Jan. 2009.
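A minimal sketch of the regularization idea (generic form; the exact regularizer in Bernier et al. and in Ho, Leung & Sum is more specific than shown here): minimize
$$J(w) = \frac{1}{N}\sum_{k=1}^{N}\big(y_k - f(x_k, w)\big)^2 + \lambda\, R(w),$$
where $R(w)$ penalizes sensitivity of the network output to weight deviations, and in the Ho–Leung–Sum formulation the weighting factor $\lambda$ is determined by the assumed weight-noise variance.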
Misconception • Ho, Leung, & Sum (2009-): Convergence? • Show that the work by G. An (1996) is incomplete. • Essentially, his work is identical to the works done by Murray & Edwards (1993, 1994) and Bernier et al. (2000). Only the effect of weight noise on the prediction error of an MLP has been derived. • By theoretical analysis, injecting weight noise during the training of an RBF network is of no use. • By simulation, the MSE converges but the weights might not converge. • Injecting weight noise together with weight decay during training can improve convergence • K. Ho, C.S. Leung and J. Sum, Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks, IEEE Transactions on Neural Networks, in press. • K. Ho, C.S. Leung, and J. Sum. On weight-noise-injection training, M. Koeppen, N. Kasabov and G. Coghill (eds.), Advances in Neuro-Information Processing, Springer LNCS 5507, pp. 919–926, 2009. • J. Sum and K. Ho. SNIWD: Simultaneous weight noise injection with weight decay for MLP training. Proc. ICONIP 2009, Bangkok, Thailand, 2009.
Objective • Investigate the fault tolerance and convergence of a NN trained by combining weight noise injection with weight decay during BPA training. • Compare the results with NNs trained by • BPA training • weight noise injection during BPA training • adding weight decay during BPA training • Focus on the multilayer perceptron (MLP) network • Both multiplicative and additive weight noise injections
Learning Algorithms • BPA for linear output MLP (BPA1) • BPA1 with weight decay • BPA for sigmoid output MLP (BPA2) • BPA2 with weight decay • Weight noise injection training algorithms
BPA 1 • Data set: • Hidden node output: • MLP output:
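A standard-form sketch of these quantities (assumed notation for a single-hidden-layer MLP, not necessarily the slide's exact symbols): data set $\mathcal{D} = \{(x_k, y_k)\}_{k=1}^{N}$; hidden node output $a_j(x) = \varphi(u_j^{\top} x + \theta_j)$ for $j = 1, \dots, n$, with a sigmoidal activation $\varphi$; linear MLP output $f(x, w) = \sum_{j=1}^{n} w_j\, a_j(x)$.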
BPA 1 (Cont.) • Objective function: • Update equation: • For j = 1, ... , n
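In the notation sketched above (standard online BPA form, offered as an assumption): objective
$$E(w) = \frac{1}{N}\sum_{k=1}^{N}\big(y_k - f(x_k, w)\big)^2,$$
with the online update for the $k$-th sample, for $j = 1, \dots, n$,
$$w_j(t+1) = w_j(t) + \mu\,\big(y_k - f(x_k, w(t))\big)\, a_j(x_k),$$
where $\mu > 0$ is the step size; the hidden-layer weights are updated analogously via the chain rule.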
BPA 1 with weight decay • Objective function: • Update equation: • For j = 1, ... , n
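A hedged sketch of the weight-decay variant (standard form, with decay constant $\lambda$): objective $E(w) + \lambda \sum_{j} w_j^2$, and update
$$w_j(t+1) = w_j(t) + \mu\Big[\big(y_k - f(x_k, w(t))\big)\, a_j(x_k) - \lambda\, w_j(t)\Big].$$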
BPA 2 • Data set: • Hidden node output: • MLP output: • where
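As in BPA 1 but with a sigmoid output node (assumed standard form): $f(x, w) = \psi\big(\sum_{j=1}^{n} w_j\, a_j(x)\big)$, where $\psi(s) = 1/(1 + e^{-s})$.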
BPA 2 (Cont.) • Objective function: • Update equation: • For j = 1, ... , n
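A standard-form sketch (assuming the same MSE objective): the sigmoid output contributes the factor $f(1-f)$ to the gradient, giving
$$w_j(t+1) = w_j(t) + \mu\,\big(y_k - f_k\big)\, f_k (1 - f_k)\, a_j(x_k), \qquad f_k = f(x_k, w(t)).$$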
BPA 2 with weight decay • Objective function: • Update equation: • For j = 1, ... , n
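Under the same assumptions, weight decay again contributes the term $-\mu\,\lambda\, w_j(t)$ to the update:
$$w_j(t+1) = w_j(t) + \mu\Big[\big(y_k - f_k\big)\, f_k (1 - f_k)\, a_j(x_k) - \lambda\, w_j(t)\Big].$$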
Weight noise injection training algorithms • Update equation: • For multiplicative weight noise injection • For additive weight noise injection
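A hedged sketch of the noise-injection update (standard form, following the Murray–Edwards idea; notation as above): at each step draw a noise vector $b(t)$, form the perturbed weights $\tilde{w}_j(t) = w_j(t)\,(1 + b_j(t))$ for multiplicative noise, or $\tilde{w}_j(t) = w_j(t) + b_j(t)$ for additive noise, and apply the BPA gradient evaluated at $\tilde{w}(t)$:
$$w_j(t+1) = w_j(t) + \mu\,\big(y_k - f(x_k, \tilde{w}(t))\big)\,\frac{\partial f}{\partial w_j}\Big|_{\tilde{w}(t)}.$$
Under these assumptions, the combined algorithm studied here (weight noise injection plus weight decay) simply adds the decay term $-\mu\,\lambda\, w_j(t)$ to this update.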
Experiments • Data sets • Methodology • Results
Methodology • Training • BPA • BPA with weight noise injection • BPA with weight decay • BPA with weight noise injection and weight decay • Fault tolerance • MWNI-based training: effect of multiplicative weight noise on the prediction error of the trained MLP • AWNI-based training: effect of additive weight noise on the prediction error of the trained MLP • Convergence of the weight vectors
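A minimal Python sketch of how the fault-tolerance evaluation could be carried out (illustrative only; the model, noise levels, and trial counts are assumptions, not the thesis' actual settings):

```python
import numpy as np

def mlp_predict(X, U, theta, w, sigmoid_output=False):
    """Single-hidden-layer MLP with tanh hidden nodes.
    Assumed architecture for illustration; not the thesis' exact model."""
    A = np.tanh(X @ U + theta)            # hidden node outputs, shape (N, n_hidden)
    y = A @ w                             # linear combination of hidden outputs
    if sigmoid_output:                    # BPA 2 style sigmoid output node
        y = 1.0 / (1.0 + np.exp(-y))
    return y

def fault_tolerance_mse(X, y, U, theta, w, noise="multiplicative",
                        sigma=0.1, trials=1000, sigmoid_output=False, seed=None):
    """Average prediction MSE when zero-mean weight noise with standard
    deviation `sigma` is injected into the trained output weights `w`."""
    rng = np.random.default_rng(seed)
    mses = []
    for _ in range(trials):
        eps = rng.normal(0.0, sigma, size=w.shape)
        # Multiplicative: w * (1 + b); additive: w + b
        w_noisy = w * (1.0 + eps) if noise == "multiplicative" else w + eps
        pred = mlp_predict(X, U, theta, w_noisy, sigmoid_output)
        mses.append(np.mean((y - pred) ** 2))
    return float(np.mean(mses))
```

For example, comparing `fault_tolerance_mse(...)` for MLPs trained by the four algorithms, over a range of `sigma` values, would give the kind of fault-tolerance curves described above.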