Empirical studies on the online learning algorithms based on combining weight noise injection and weight decay
Advisor: Dr. John Sum
Student: Allen Liang


Outline

  • Introduction

  • Learning Algorithms

  • Experiments

  • Conclusion



Introduction


Background

  • A neural network (NN) is a system composed of interconnected neurons.

  • Learning aims to make an NN achieve good generalization (small prediction error).


  • Fault tolerance is an unavoidable issue that must be considered in hardware implementation:

    • Multiplicative or additive weight noise.

    • Weights could randomly break down.

    • Hidden nodes could fail (stuck-at-zero or stuck-at-one).

    • The goal is a network that remains workable, with graceful degradation, in the presence of noise/faults.


Weight noise injection during training

  • Murray & Edwards (1993): modified BPA by injecting weight noise during MLP training

    • By simulation: convergence, fault tolerance

    • By theoretical analysis: effect of weight noise on the prediction error of an MLP

      • A.F. Murray and P.J. Edwards. Synaptic weight noise during multilayer perceptron training: fault tolerance and training improvements. IEEE Transactions on Neural Networks, Vol.4(4), 722-725, 1993.

      • A.F. Murray and P.J. Edwards. Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training. IEEE Transactions on Neural Networks, Vol.5(5), 792-802, 1994.


Weight noise injection during training (cont.)

  • Jim, Giles & Horne (1996): modified RTRL by injecting weight noise during RNN training

    • By simulation: convergence and generalization

    • By theoretical analysis: effect of weight noise on the prediction error of an RNN

      • Jim K.C., C.L. Giles and B.G. Horne, An analysis of noise in recurrent neural networks: Convergence and generalization, IEEE Transactions on Neural Networks, Vol.7, 1424-1438, 1996.


Regularization

  • Bernier and co-workers (2000): add an explicit regularizer to the training MSE to form the objective function to be minimized.

    • An online learning algorithm is derived via gradient descent.

    • No noise is injected during training.

      • J. L. Bernier, J. Ortega, I. Rojas, and A. Prieto, “Improving the tolerance of multilayer perceptrons by minimizing the statistical sensitivity to weight deviations,” Neurocomputing, vol. 31, pp. 87-103, Jan. 2000

      • J. L. Bernier, J. Ortega, I. Rojas, E. Ros, and A. Prieto, “Obtaining fault tolerant multilayer perceptrons using an explicit regularization,” Neural Process. Lett., vol. 12, no. 2, pp. 107-113, Oct. 2000


Regularization (cont.)

  • Ho, Leung & Sum (2009): add a regularizer term to the training MSE to form the objective function.

    • Similar to the approach of Bernier et al., but the weighting factor for the regularizer can be determined by the noise variance.

    • An online learning algorithm is derived via gradient descent.

    • No noise is injected during training.

      • J. Sum, C.S. Leung, and K. Ho. On objective function, regularizer and prediction error of a learning algorithm for dealing with multiplicative weight noise. IEEE Transactions on Neural Networks, Vol.20(1), Jan. 2009.


Misconception

  • Ho, Leung, & Sum (2009-): Convergence?

    • Showed that the work by G. An (1996) is incomplete.

      • Essentially, his work is identical to that of Murray & Edwards (1993, 1994) and Bernier et al. (2000): only the effect of weight noise on the prediction error of an MLP has been derived.

    • By theoretical analysis: injecting weight noise while training an RBF network is of no use.

    • By simulation: the MSE converges, but the weights might not converge.

    • Injecting weight noise together with weight decay during training can improve convergence.

      • K. Ho, C.S. Leung and J. Sum, Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks, IEEE Transactions on Neural Networks, in press.

      • K. Ho, C.S. Leung, and J. Sum. On weight-noise-injection training, M. Koeppen, N. Kasabov and G. Coghill (eds.). Advances in Neuro-Information Processing, Springer LNCS 5507, pp. 919–926, 2009.

      • J. Sum and K. Ho. SNIWD: Simultaneous weight noise injection with weight decay for MLP training. Proc. ICONIP 2009, Bangkok, Thailand, 2009.


Objective

  • Investigate the fault tolerance and convergence of an NN trained by combining weight noise injection with weight decay during BPA training.

  • Compare the results against an NN trained by:

    • plain BPA

    • weight noise injection during BPA training

    • weight decay during BPA training

  • Focus on the multilayer perceptron (MLP) network.

  • Consider both multiplicative and additive weight noise injection.


Learning Algorithms

  • BPA for linear output MLP (BPA1)

  • BPA1 with weight decay

  • BPA for sigmoid output MLP (BPA2)

  • BPA2 with weight decay

  • Weight noise injection training algorithms


BPA1

  • Data set:

  • Hidden node output:

  • MLP output:


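The slide's equations were embedded images and did not survive the transcript. A plausible reconstruction in standard MLP notation follows; the symbols (x_t, y_t for the samples, d_j and b_j for the j-th hidden node's weights and bias, c_j for the output weights, tanh as the hidden activation) are assumptions, not recovered from the source.

```latex
% Data set of N input-output pairs
\mathcal{D} = \{(x_t, y_t)\}_{t=1}^{N}

% Output of the j-th hidden node (tanh activation assumed)
z_j(x) = \tanh\left( d_j^{\top} x + b_j \right), \qquad j = 1, \ldots, n

% Linear-output MLP: a weighted sum of the hidden-node outputs,
% with \theta collecting all weights \{d_j, b_j, c_j\}
f(x, \theta) = \sum_{j=1}^{n} c_j \, z_j(x)
```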


BPA1 (cont.)

  • Objective function:

  • Update equation:

    • For j = 1, ... , n

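A reconstruction of the lost formulas, assuming plain online back-propagation with learning rate \mu and the notation of the sketch above; the per-step update is applied to the weights of each hidden node j = 1, ..., n and to the output weights.

```latex
% Objective: mean squared error over the training set
E(\theta) = \frac{1}{N} \sum_{t=1}^{N} \left( y_t - f(x_t, \theta) \right)^2

% Online gradient-descent update at step t
\theta(t+1) = \theta(t)
  + \mu \left( y_t - f(x_t, \theta(t)) \right) \nabla_{\theta} f(x_t, \theta(t))
```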


BPA1 with weight decay

  • Objective function:

  • Update equation:

    • For j = 1, ... , n

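Again a reconstruction under the same assumptions, now with an L2 weight-decay regularizer weighted by a constant \lambda:

```latex
% Objective: training MSE plus a weight-decay regularizer
E(\theta) = \frac{1}{N} \sum_{t=1}^{N} \left( y_t - f(x_t, \theta) \right)^2
  + \lambda \, \theta^{\top} \theta

% Online update: gradient step plus a decay term pulling the weights to zero
\theta(t+1) = \theta(t)
  + \mu \left[ \left( y_t - f(x_t, \theta(t)) \right)
      \nabla_{\theta} f(x_t, \theta(t)) - \lambda \, \theta(t) \right]
```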


BPA2

  • Data set:

  • Hidden node output:

  • MLP output:


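BPA2 differs from BPA1 only at the output node. A reconstruction assuming the usual logistic sigmoid:

```latex
% Sigmoid-output MLP: the linear combination passes through a sigmoid
f(x, \theta) = \sigma\!\left( \sum_{j=1}^{n} c_j \, z_j(x) \right),
  \qquad \text{where} \quad \sigma(s) = \frac{1}{1 + e^{-s}}
```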


BPA2 (cont.)

  • Objective function:

  • Update equation:

    • For j = 1, ... , n

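The objective and update take the same form as in BPA1; under the assumed notation the only change is that the gradient of the sigmoid output carries the extra factor f(1 - f).

```latex
E(\theta) = \frac{1}{N} \sum_{t=1}^{N} \left( y_t - f(x_t, \theta) \right)^2

\theta(t+1) = \theta(t)
  + \mu \left( y_t - f(x_t, \theta(t)) \right) \nabla_{\theta} f(x_t, \theta(t))
```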


BPA2 with weight decay

  • Objective function:

  • Update equation:

    • For j = 1, ... , n

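As with BPA1, adding weight decay appends \lambda \theta^{\top} \theta to the objective and a -\lambda \theta(t) term to the update (reconstruction, same assumptions):

```latex
E(\theta) = \frac{1}{N} \sum_{t=1}^{N} \left( y_t - f(x_t, \theta) \right)^2
  + \lambda \, \theta^{\top} \theta

\theta(t+1) = \theta(t)
  + \mu \left[ \left( y_t - f(x_t, \theta(t)) \right)
      \nabla_{\theta} f(x_t, \theta(t)) - \lambda \, \theta(t) \right]
```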


Weight noise injection training algorithms

  • Update equation:

    • For multiplicative weight noise injection

    • For additive weight noise injection

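A reconstruction of the noise-injection updates, following the SNIWD description in the cited Sum & Ho (2009) paper; the noise distribution (zero-mean Gaussian) and the symbols are assumptions.

```latex
% At each step, draw noise b(t) and perturb the weights before the gradient.

% Multiplicative weight noise: perturbation proportional to each weight
\tilde{\theta}(t) = \theta(t) + b(t) \otimes \theta(t)

% Additive weight noise: perturbation independent of the weight magnitude
\tilde{\theta}(t) = \theta(t) + b(t)

% Update: gradient evaluated at the perturbed weights; the decay term
% \lambda \theta(t) appears only in the combined (noise + decay) algorithms
\theta(t+1) = \theta(t)
  + \mu \left[ \left( y_t - f(x_t, \tilde{\theta}(t)) \right)
      \nabla_{\theta} f(x_t, \tilde{\theta}(t)) - \lambda \, \theta(t) \right]
```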


Experiments

  • Data sets

  • Methodology

  • Results


Data sets


2D mapping


Mackey-Glass


NAR


Astrophysical Data


XOR


Character Recognition


Methodology

  • Training

    • BPA

    • BPA with weight noise injection

    • BPA with weight decay

    • BPA with weight noise injection and weight decay (see the sketch after this list)

  • Fault tolerance

    • MWNI-based training: effect of multiplicative weight noise on the prediction error of the trained MLP

    • AWNI-based training: effect of additive weight noise on the prediction error of the trained MLP

  • Convergence of the weight vectors
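To make the four training variants concrete, here is a minimal Python sketch of the shared online update loop. It assumes the model f(x, theta) and its gradient grad_f(x, theta) are supplied by the caller; all function and parameter names are illustrative, not taken from the source.

```python
import numpy as np

def train_online(X, Y, theta0, f, grad_f, mu=0.01, lam=0.0,
                 noise_std=0.0, multiplicative=True, epochs=100, seed=0):
    """Shared online update loop for the four training variants:

    plain BPA             : lam = 0,  noise_std = 0
    BPA + weight decay    : lam > 0,  noise_std = 0
    BPA + noise injection : lam = 0,  noise_std > 0
    noise + decay (SNIWD) : lam > 0,  noise_std > 0
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            # Draw weight noise and perturb the weights before the step
            b = rng.normal(0.0, noise_std, size=theta.shape)
            theta_tilde = theta + (b * theta if multiplicative else b)
            err = y - f(x, theta_tilde)
            # Gradient evaluated at the perturbed weights, plus weight decay
            theta += mu * (err * grad_f(x, theta_tilde) - lam * theta)
    return theta
```

Here grad_f is expected to return the gradient of f with respect to the weights as an array of the same shape as theta; with multiplicative noise the perturbation scales with each weight's magnitude, matching the MWNI update sketched earlier.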


Methodology (cont.)


2D mapping (MWN)


2D mapping (AWN)


Mackey-Glass (MWN)


Mackey-Glass (AWN)


NAR (MWN)


NAR (AWN)


Astrophysical (MWN)


Astrophysical (AWN)


XOR (MWN)


XOR (AWN)


Character recognition (MWN)


Character recognition (AWN)


Summary


Astrophysical data


Conclusion

  • Convergence: injecting an appropriate weight noise and adding an appropriate weight decay during training ensures that the weights converge.

  • Fault tolerance: the fault tolerance of an MLP is also improved on most of the data sets.


Thank You

