Empirical studies on the online learning algorithms based on combining weight noise injection and weight decay
Advisor: Dr. John Sum
Student: Allen Liang

Outline
  • Introduction
  • Learning Algorithms
  • Experiments
  • Conclusion


Background
  • A neural network (NN) is a system of interconnected neurons.
  • Learning aims to make an NN achieve good generalization (small prediction error).

Fault tolerance is an unavoidable issue that must be considered in hardware implementation.

  • Multiplicative weight noise or additive weight noise.
  • Weights could randomly break down.
  • Hidden nodes could fail (stuck-at-zero and stuck-at-one).
  • The goal is a network that still works, with graceful degradation, in the presence of noise/faults (see the sketch below).
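A minimal NumPy sketch of the noise and fault models listed above; the variance (0.1) and fault probability (0.05) are illustrative values, not figures from this study.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=10)                  # a trained weight vector

# Multiplicative weight noise: w_i -> w_i * (1 + b_i), b_i ~ N(0, 0.1^2)
w_mult = w * (1 + rng.normal(0.0, 0.1, size=w.shape))

# Additive weight noise: w_i -> w_i + b_i, b_i ~ N(0, 0.1^2)
w_add = w + rng.normal(0.0, 0.1, size=w.shape)

# Random weight breakdown: each weight is forced to zero with probability 0.05
w_fault = np.where(rng.random(w.shape) < 0.05, 0.0, w)

# Stuck-at faults on hidden node outputs h: stuck-at-zero / stuck-at-one
h = rng.uniform(size=5)                  # hidden node outputs in (0, 1)
h_stuck0 = np.where(rng.random(h.shape) < 0.05, 0.0, h)
h_stuck1 = np.where(rng.random(h.shape) < 0.05, 1.0, h)
```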
Weight noise injection during training
  • Murray & Edwards (1993): modified the backpropagation algorithm (BPA) by injecting weight noise during MLP training
    • By simulation: convergence, fault tolerance
    • By theoretical analysis: effect of weight noise on the prediction error of an MLP
      • A.F. Murray and P.J. Edwards. Synaptic weight noise during multilayer perceptron training: fault tolerance and training improvements. IEEE Transactions on Neural Networks, Vol.4(4), 722-725, 1993.
      • A.F. Murray and P.J. Edwards. Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training. IEEE Transactions on Neural Networks, Vol.5(5), 792-802, 1994.
Weight noise injection during training (Cont.)
  • Jim, Giles & Horne (1996): modified RTRL by injecting weight noise during RNN training
    • By simulation: convergence and generalization
    • By theoretical analysis: effect of weight noise on the prediction error of an RNN
      • Jim K.C., C.L. Giles and B.G. Horne, An analysis of noise in recurrent neural networks: Convergence and generalization, IEEE Transactions on Neural Networks, Vol.7, 1424-1438, 1996.
Regularization
  • Bernier and co-workers (2000): added an explicit regularizer to the training MSE to form the objective function to be minimized (a generic form is sketched below)
    • An online learning algorithm is developed using gradient descent
    • No noise is injected during training
      • J. L. Bernier, J. Ortega, I. Rojas, and A. Prieto, “Improving the tolerance of multilayer perceptrons by minimizing the statistical sensitivity to weight deviations,” Neurocomputing, vol.31, pp.87-103, Jan, 2000
      • J. L. Bernier, J. Ortega, I. Rojas, E. Ros, and A. Prieto, “Obtaining fault tolerance multilayer perceptrons using an explicit regularization,” Neural Process. Lett., vol. 12, no. 2, pp. 107-113, Oct, 2000
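A generic form of this idea, where R(w) stands in for the statistical-sensitivity regularizer and λ for its weighting factor; the symbols are placeholders rather than Bernier et al.'s exact notation.

```latex
J(\mathbf{w}) \;=\; \underbrace{\frac{1}{N}\sum_{t=1}^{N}\bigl(y_t - f(\mathbf{x}_t,\mathbf{w})\bigr)^{2}}_{\text{training MSE}} \;+\; \lambda\, R(\mathbf{w})
```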
Regularization (Cont.)
  • Ho, Leung & Sum (2009): added a regularizer term to the training MSE to form the objective function
    • Similar to the approach of Bernier et al., but the weighting factor for the regularizer can be determined by the noise variance
    • An online learning algorithm is developed using gradient descent
    • No noise is injected during training
      • J. Sum, C.S. Leung, and K. Ho. On objective function, regularizer and prediction error of a learning algorithm for dealing with multiplicative weight noise. IEEE Transactions on Neural Networks, Vol. 20(1), Jan. 2009.
Misconception
  • Ho, Leung, & Sum (2009-): Convergence?
    • Showed that the work by G. An (1996) is incomplete.
      • Essentially, his work is identical to that of Murray & Edwards (1993, 1994) and Bernier et al. (2000); only the effect of weight noise on the prediction error of an MLP has been derived.
    • By theoretical analysis, injecting weight noise while training an RBF network is of no use.
    • By simulation, the MSE converges but the weights might not converge.
    • Injecting weight noise together with weight decay during training can improve convergence
      • K. Ho, C.S. Leung and J. Sum, Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks, IEEE Transactions on Neural Networks, in press.
      • K. Ho, C.S. Leung, and J. Sum. On weight-noise-injection training, M. Koeppen, N. Kasabov and G. Coghill (eds.). Advances in Neuro-Information Processing, Springer LNCS 5507, pp. 919–926, 2009.
      • J. Sum and K. Ho. SNIWD: Simultaneous weight noise injection with weight decay for MLP training. Proc. ICONIP 2009, Bangkok, Thailand, 2009.
Objective
  • Investigate the fault tolerance and convergence of an NN trained by
    • combining weight noise injection with weight decay during BPA training.
  • Compare the results with an NN trained by
    • BPA training
    • weight noise injection during BPA training
    • weight decay added during BPA training
  • Focus on the multilayer perceptron (MLP) network
  • Consider both multiplicative and additive weight noise injection
Learning Algorithms
  • BPA for linear output MLP (BPA1)
  • BPA1 with weight decay
  • BPA for sigmoid output MLP (BPA2)
  • BPA2 with weight decay
  • Weight noise injection training algorithms
BPA 1
  • Data set:
  • Hidden node output:
  • MLP output:


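The equations on this slide were images and are not in the transcript. What follows is a hedged reconstruction of the standard setup for a linear-output MLP with m sigmoidal hidden nodes; the symbols (a_j, b_j, w_j, φ) are assumed notation, not copied from the slide.

```latex
\begin{aligned}
\text{Data set: }           &\ \mathcal{D} = \{(\mathbf{x}_t, y_t)\}_{t=1}^{N} \\
\text{Hidden node output: } &\ z_j(\mathbf{x}) = \phi\bigl(\mathbf{a}_j^{\top}\mathbf{x} + b_j\bigr),
                              \quad \phi(s) = \frac{1}{1 + e^{-s}} \\
\text{MLP output: }         &\ f(\mathbf{x}, \mathbf{w}) = \sum_{j=1}^{m} w_j\, z_j(\mathbf{x})
\end{aligned}
```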
BPA 1 (Cont.)
  • Objective function:
  • Update equation:
    • For j = 1, ... , n


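The slide's formulas are likewise missing from the transcript. Assuming the standard per-sample BPA with step size μ, the objective and update would read:

```latex
\begin{aligned}
\text{Objective: } &\ \mathcal{E}(\mathbf{w}) = \frac{1}{N}\sum_{t=1}^{N}\bigl(y_t - f(\mathbf{x}_t, \mathbf{w})\bigr)^{2} \\
\text{Update: }    &\ w_j(t+1) = w_j(t) + \mu\,\bigl(y_t - f(\mathbf{x}_t, \mathbf{w}(t))\bigr)\,
                      \frac{\partial f(\mathbf{x}_t, \mathbf{w}(t))}{\partial w_j},
                      \qquad j = 1, \dots, n
\end{aligned}
```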
BPA 1 with weight decay
  • Objective function:
  • Update equation:
    • For j = 1, ... , n


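Again a reconstruction under the same assumed notation: the usual weight-decay variant with decay constant λ adds a penalty to the objective and a shrinkage term to the update.

```latex
\begin{aligned}
\text{Objective: } &\ \mathcal{E}_{\mathrm{wd}}(\mathbf{w}) =
      \frac{1}{N}\sum_{t=1}^{N}\bigl(y_t - f(\mathbf{x}_t, \mathbf{w})\bigr)^{2}
      + \lambda\,\|\mathbf{w}\|^{2} \\
\text{Update: }    &\ w_j(t+1) = w_j(t) + \mu\Bigl[\bigl(y_t - f(\mathbf{x}_t, \mathbf{w}(t))\bigr)\,
      \frac{\partial f(\mathbf{x}_t, \mathbf{w}(t))}{\partial w_j} - \lambda\, w_j(t)\Bigr],
      \qquad j = 1, \dots, n
\end{aligned}
```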
BPA 2
  • Data set:
  • Hidden node output:
  • MLP output:


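A hedged reconstruction: the data set and hidden layer are as in BPA 1, but the output node is also sigmoidal.

```latex
f(\mathbf{x}, \mathbf{w}) = \phi\bigl(u(\mathbf{x}, \mathbf{w})\bigr)
  = \phi\Bigl(\sum_{j=1}^{m} w_j\, z_j(\mathbf{x})\Bigr),
\qquad \phi(s) = \frac{1}{1 + e^{-s}}
```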
BPA 2 (Cont.)
  • Objective function:
  • Update equation:
    • For j = 1, ... , n


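Assuming the same squared-error objective, the sigmoid output only adds the chain-rule factor φ'(u) = f(1 − f) to the BPA 1 update:

```latex
w_j(t+1) = w_j(t) + \mu\,\bigl(y_t - f_t\bigr)\, f_t\,(1 - f_t)\,
    \frac{\partial u_t}{\partial w_j},
\qquad f_t = \phi(u_t),\;\; u_t = u(\mathbf{x}_t, \mathbf{w}(t)),
\qquad j = 1, \dots, n
```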
BPA 2 with weight decay
  • Objective function:
  • Update equation:
    • For j = 1, ... , n


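As for BPA 1, the assumed weight-decay variant simply subtracts μλ w_j(t) in each step:

```latex
w_j(t+1) = w_j(t) + \mu\Bigl[\bigl(y_t - f_t\bigr)\, f_t\,(1 - f_t)\,
    \frac{\partial u_t}{\partial w_j} - \lambda\, w_j(t)\Bigr],
\qquad j = 1, \dots, n
```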
Weight noise injection training algorithms
  • Update equation:
    • For multiplicative weight noise injection
    • For additive weight noise injection


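The update equations for this slide are not in the transcript. Below is a minimal NumPy sketch of one online step, assuming the noise is injected into the weights only when the gradient is evaluated while the stored (noise-free) weights are updated; grad, mu, sigma_b and lam are illustrative names and values, and setting lam > 0 gives the combined noise-injection-plus-weight-decay variant studied here.

```python
import numpy as np

def weight_noise_step(w, grad, x, y, mu=0.05, sigma_b=0.1, lam=0.0,
                      noise="mult", rng=None):
    """One online training step with injected weight noise.

    noise="mult": the gradient is evaluated at w * (1 + b), b ~ N(0, sigma_b^2)
    noise="add" : the gradient is evaluated at w + b,       b ~ N(0, sigma_b^2)
    lam > 0     : adds a weight-decay term to the update.
    """
    rng = rng or np.random.default_rng()
    b = rng.normal(0.0, sigma_b, size=w.shape)
    w_noisy = w * (1.0 + b) if noise == "mult" else w + b
    # Gradient of the per-sample error, evaluated at the perturbed weights.
    g = grad(w_noisy, x, y)
    # Descent step on the noise-free weights, plus the optional decay pull.
    return w - mu * g - mu * lam * w
```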
Experiments
  • Data sets
  • Methodology
  • Results
Methodology
  • Training
    • BPA
    • BPA with weight noise injection
    • BPA with weight decay
    • BPA with weight noise injection and weight decay
  • Fault tolerance (an evaluation sketch follows below)
    • MWNI-based training: effect of multiplicative weight noise on the prediction error of the trained MLP
    • AWNI-based training: effect of additive weight noise on the prediction error of the trained MLP
  • Convergence of the weight vectors
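A sketch of how the fault-tolerance measurement above could be carried out; predict, sigma_b and the number of trials are illustrative assumptions, not the actual experimental protocol.

```python
import numpy as np

def error_under_weight_noise(predict, w, X, Y, sigma_b=0.1,
                             noise="mult", trials=100, seed=0):
    """Average test MSE of a trained model with randomly perturbed weights.

    predict(w, X) returns the model outputs for inputs X; "mult" perturbs the
    weights as w*(1+b) and "add" as w+b, with b ~ N(0, sigma_b^2) per trial.
    """
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        b = rng.normal(0.0, sigma_b, size=w.shape)
        w_noisy = w * (1.0 + b) if noise == "mult" else w + b
        errs.append(np.mean((Y - predict(w_noisy, X)) ** 2))
    return float(np.mean(errs))
```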
Conclusion
  • For convergence: injecting appropriate weight noise together with appropriate weight decay during training can ensure that the weights converge.
  • The fault tolerance of an MLP can also be improved for most data sets.