
MML Inference of RBFs




  1. MML Inference of RBFs Enes Makalic Lloyd Allison Andrew Paplinski

  2. Presentation Outline • RBF architecture selection • Existing methods • Overview of MML • MML87 • MML inference of RBFs • MML estimators for RBF parameters • Results • Conclusion • Future work

  3. RBF Architecture Selection (1) • Determine the optimal network architecture for a given problem • Involves choosing: • Number and type of basis functions • This choice influences the success of the training process • If we choose an RBF that is: • Too small: poor performance • Too large: overfitting

  4. RBF Architecture Selection (2) Overfitting Poor Performance

  5. RBF Architecture Selection (3) • Architecture selection solutions • Use as many basis functions as there are data points • Expectation Maximization (EM) • K-means clustering • Regression trees (M. Orr) • BIC, GPE, etc. • Bayesian inference • Reversible jump MCMC

  6. Overview of MML (1) • Objective function to estimate the goodness of a model • A sender wishes to send data, x, to a receiver over a noiseless transmission channel • How well is the data encoded? • Message length (for example, in bits)

  7. Overview of MML (2) • Transmit the data in two parts: • Part 1: encoding of the hypothesis, costing -log Pr(H) • Part 2: encoding of the data given the hypothesis, costing -log Pr(D|H) • Quantitative form of Occam’s razor
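As an illustrative sketch (not from the slides), the two-part cost can be computed directly from the two probabilities; the example probabilities below are hypothetical:

```python
import math

def two_part_length(pr_h, pr_d_given_h):
    """Two-part message length in bits:
    part 1 encodes the hypothesis H, part 2 the data D given H."""
    return -math.log2(pr_h) - math.log2(pr_d_given_h)

# Hypothetical numbers: a hypothesis with prior probability 1/8 under
# which the observed data has probability 1/4.
length_bits = two_part_length(1/8, 1/4)
print(length_bits)  # 3 bits for H + 2 bits for D given H = 5.0
```

A more probable (simpler) hypothesis shortens part 1 but may fit the data worse, lengthening part 2; this trade-off is Occam's razor in quantitative form.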

  8. Overview of MML (3) • MML87 • Efficient approximation to strict MML • Total message length for a model with parameters θ: I(θ) = -log h(θ) - log f(x|θ) + (1/2) log |F(θ)| + (k/2)(1 + log κ_k)

  9. Overview of MML (4) • MML87 • h(θ) is the prior information • f(x|θ) is the likelihood function • k is the number of parameters • κ_k is a dimension constant • |F(θ)| is the determinant of the expected Fisher information matrix with entries (i, j): F_ij(θ) = -E[∂² log f(x|θ) / ∂θ_i ∂θ_j]
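A minimal sketch of how these pieces combine, using the one-dimensional lattice constant κ_1 = 1/12 and a hypothetical one-parameter model (the mean of Gaussian data with known σ, whose expected Fisher information is N/σ²); the data values and prior range below are invented for illustration:

```python
import math

KAPPA_1 = 1 / 12  # one-dimensional lattice constant used in MML87

def mml87_length(neg_log_prior, neg_log_like, log_det_fisher, k=1):
    """MML87 message length in nats, following the slide's form
    (kappa hard-coded here for k = 1):
    I = -log h(theta) - log f(x|theta) + 0.5*log|F| + k/2*(1 + log kappa)."""
    return (neg_log_prior + neg_log_like + 0.5 * log_det_fisher
            + 0.5 * k * (1 + math.log(KAPPA_1)))

# Hypothetical model: mean of N Gaussian points with known sigma = 1.
data = [1.2, 0.8, 1.1, 0.9]
n, sigma = len(data), 1.0
mu = sum(data) / n                             # ML estimate of the mean
nll = (0.5 * n * math.log(2 * math.pi * sigma**2)
       + sum((x - mu)**2 for x in data) / (2 * sigma**2))
length = mml87_length(math.log(20.0),          # uniform prior on [-10, 10]
                      nll,
                      math.log(n / sigma**2))  # Fisher info N / sigma**2
```

Comparing `length` across candidate models (here, trivially, just one) is how MML selects the model that minimises the total message length.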

  10. Overview of MML (5) • MML87 • Fisher Information: • Sensitivity of likelihood function to parameters • Determines the accuracy of stating the model • Small second derivatives state parameters less precisely • Large second derivatives state parameters more accurately • A model that minimises the total message length is optimal

  11. MML Inference of RBFs (1) • Regression problems • We require: • A likelihood function • Fisher information • Priors on all model parameters

  12. MML Inference of RBFs (2) • Notation

  13. MML Inference of RBFs (3) • RBF Network • m inputs, n parameters, o outputs • Mapping from parameters to outputs • w: vector of network parameters • Network output implicitly depends on the network input vector, x • Define output non-linearity
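The slides do not fix the exact basis-function form; a common choice is Gaussian bases. A single-output forward pass might be sketched as follows, with hypothetical centres, radii and weights:

```python
import math

def rbf_forward(x, centres, radii, weights, bias=0.0):
    """Single-output RBF network with Gaussian basis functions (an assumed
    form; the slides do not fix it): y(x) = bias + sum_j w_j * phi_j(x),
    where phi_j(x) = exp(-||x - c_j||^2 / (2 * r_j^2))."""
    phi = [math.exp(-sum((xi - ci)**2 for xi, ci in zip(x, c)) / (2.0 * r * r))
           for c, r in zip(centres, radii)]
    return bias + sum(w * p for w, p in zip(weights, phi))

# Hypothetical 1-D network with two basis functions.
y0 = rbf_forward([0.0], centres=[[0.0], [2.0]], radii=[1.0, 1.0],
                 weights=[1.0, 0.5])
```

The parameter vector w on the slide would collect all centres, radii, weights and the bias into one vector for optimisation.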

  14. MML Inference of RBFs (4) • Likelihood function • Learning: minimisation of a scalar function • We define L as the negative log likelihood • L implicitly depends on given targets, z, for network outputs • Different input-target pairs are considered independent
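Under the independence assumption above, the negative log-likelihood factorises into a sum over input-target pairs. For Gaussian errors it can be sketched as (σ passed in explicitly):

```python
import math

def neg_log_likelihood(targets, outputs, sigma):
    """Negative log-likelihood L for a single-output regression network,
    assuming independent pairs and Gaussian errors with std dev sigma:
    L = N/2 * log(2*pi*sigma^2) + sum_i (z_i - y_i)^2 / (2*sigma^2)."""
    n = len(targets)
    sse = sum((z - y)**2 for z, y in zip(targets, outputs))
    return 0.5 * n * math.log(2 * math.pi * sigma**2) + sse / (2 * sigma**2)

# A perfect fit leaves only the normalisation term.
L_perfect = neg_log_likelihood([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], sigma=1.0)
```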

  15. MML Inference of RBFs (5) • Likelihood function • Regression problems • The network error, ε, is assumed Gaussian with zero mean and variance σ²

  16. MML Inference of RBFs (6) • Fisher information • Expected Hessian matrix: F(w) = E[H(w)] • J: Jacobian matrix of L • H: Hessian matrix of L

  17. MML Inference of RBFs (7) • Fisher information • Taking expectations and simplifying we obtain • Positive semi-definite • Complete Fisher includes a summation over the whole data set D • We used an approximation to F • Block-diagonal • Hidden basis functions assumed to be independent • Simplified determinant – product of determinants for each block
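The practical payoff of the block-diagonal approximation is that the determinant factorises: |F| becomes a product of small per-block determinants, one block per hidden basis function. A sketch with hypothetical 2x2 blocks:

```python
import math

def det2(m):
    """Determinant of a 2x2 block given as [[a, b], [c, d]]."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Under the block-diagonal approximation, log|F| is the sum of per-block
# log-determinants. The block values below are hypothetical.
blocks = [[[4.0, 1.0], [1.0, 3.0]],
          [[2.0, 0.5], [0.5, 2.0]]]
log_det_F = sum(math.log(det2(b)) for b in blocks)
```

This turns one large determinant into several cheap small ones, at the cost of ignoring dependencies between basis functions.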

  18. MML Inference of RBFs (8) • Priors • Must specify a prior density for each parameter • Centres: uniform • Radii: uniform (log-scale) • Weights: Gaussian • Zero mean and standard deviation σ_w • σ_w is usually taken to be large (vague prior)

  19. MML Inference of RBFs (9) • Message length of an RBF: I(w) = I(k) - log h(w) + (1/2) log F(w) + L + C • where: • I(k) denotes the cost of transmitting the number of basis functions • F(w) is the determinant of the expected Fisher information • L is the negative log-likelihood • C is a dimension constant • Independent of w

  20. MML Inference of RBFs (10) • MML estimators for parameters • Standard unbiased estimator for the error s.d. • Numerical optimisation using • Differentiation of the expected Fisher information determinant
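A sketch of a standard unbiased-style estimator for the error standard deviation; the degrees-of-freedom correction below is an assumption, since the slide does not give the exact form:

```python
import math

def error_sd(targets, outputs, n_params):
    """Unbiased-style estimator of the error standard deviation:
    sigma_hat^2 = SSE / (N - n_params). The degrees-of-freedom correction
    (subtracting the number of fitted parameters) is an assumption here."""
    n = len(targets)
    sse = sum((z - y)**2 for z, y in zip(targets, outputs))
    return math.sqrt(sse / (n - n_params))
```

The remaining network parameters would then be found by numerical optimisation of the total message length.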

  21. Results (1) • MML inference criterion is compared to: • Conventional MATLAB RBF implementation • M. Orr’s regression tree method • Functions used for criteria evaluation • Correct answer known • Correct answer not known

  22. Results (2) • Correct answer known • Generate data from a known RBF (one, three and five basis functions respectively) • Inputs uniformly sampled in the range (-8,8) • 1D and 2D inputs were considered • Gaussian noise N(0,0.1) added to the network outputs • Training set and test set comprise 100 and 1000 patterns respectively
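The experimental setup above can be sketched as follows. The generating network below is a hypothetical single-basis-function RBF (the slides' actual networks are not given), and N(0,0.1) is read here as zero mean with variance 0.1:

```python
import math
import random

def target_rbf(x, centres, weights, radius=1.0):
    """Hypothetical known generating network: a 1-D Gaussian RBF."""
    return sum(w * math.exp(-(x - c)**2 / (2 * radius**2))
               for c, w in zip(centres, weights))

random.seed(1)
train = []
for _ in range(100):                       # 100 training patterns
    x = random.uniform(-8.0, 8.0)          # inputs uniform in (-8, 8)
    # Noise N(0, 0.1): 0.1 treated as the variance (convention assumed).
    z = (target_rbf(x, centres=[0.0], weights=[1.0])
         + random.gauss(0.0, math.sqrt(0.1)))
    train.append((x, z))
```

A 1000-pattern test set would be drawn the same way, and each selection criterion scored by its test-set MSE.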

  23. Results (3) • MSE • Correct answer known (1D input)

  24. Results (4) • MSE • Correct answer known (2D inputs)

  25. Results (5) • Correct answer not known • The following functions were used:

  26. Results (6) • Correct answer not known • Gaussian noise N(0,0.1) added to the network outputs • Training set and test set comprise 100 and 1000 patterns respectively

  27. Results (7)

  28. Results (8)

  29. Results (9) • MSE • Correct answer not known

  30. Results (10) • Sensitivity of criteria to noise

  31. Results (11) • Sensitivity of criteria to data set size

  32. Conclusion (1) • Novel approach to architecture selection in RBF networks • MML87 • Block-diagonal Fisher information matrix approximation • MATLAB code available from: • http://www.csse.monash.edu.au/~enesm

  33. Conclusion (2) • Results • Initial testing • Good performance when the level of noise and the dataset size are varied • No over-fitting • Future work • Further testing • Examine whether MML parameter estimators improve performance • MML and regularization

  34. Conclusion (3) • Questions?

  35. Conclusion (4) Thank you :)
