
Applying Sequential, Sparse Gaussian Processes – an Illustration Based on SIC2004


Presentation Transcript


  1. Applying Sequential, Sparse Gaussian Processes – an Illustration Based on SIC2004 Ben Ingram Neural Computing Research Group Aston University, Birmingham, UK

  2. Spatial Interpolation Comparison 2004 • What is SIC2004? • SIC2004 objectives are to: • generate results that are reliable • generate results in the smallest amount of time • generate results automatically • deal with anomalies • Data provided: background gamma radiation measurements in Germany

  3. Spatial Interpolation Comparison 2004 • Radiation data from 10 randomly selected days were given to participants so that they could devise a method meeting the SIC2004 criteria • For each day there were 200 observations made at the locations shown by red circles • The aim: to predict as fast and as accurately as possible at 808 locations (black crosses), given 200 observations for an 11th randomly selected day

  4. Sequential Sparse Gaussian Processes • Gaussian processes are equivalent to Kriging [Cornford 2002] • SSGP uses a subset of the dataset, called ‘basis vectors’, to best approximate the full Gaussian process • Traditional methods require a matrix inversion, an O(n³) operation; SSGP reduces this to O(nm²) (where m is the number of ‘basis vectors’) • Model complexity is controlled by the number of ‘basis vectors’, while important features in the data are retained
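A minimal sketch of the cost argument above, assuming a subset-of-regressors style approximation in Python/NumPy rather than the actual SSGP algorithm: the m ‘basis vectors’ are a subset of the inputs, and the only matrix that has to be factorised is m × m, so the dominant cost is the O(nm²) product that forms it.

    # Sketch only (not the SSGP algorithm): subset-of-regressors approximation
    # showing how m 'basis vectors' avoid the O(n^3) inverse of the full GP.
    import numpy as np

    def sq_exp(Xa, Xb, length=0.3, sigma_f=1.0):
        """Squared exponential covariance between two sets of 1-D inputs."""
        d = Xa[:, None] - Xb[None, :]
        return sigma_f**2 * np.exp(-0.5 * (d / length) ** 2)

    rng = np.random.default_rng(0)
    n, m = 200, 20                              # observations vs. 'basis vectors'
    X = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(n)
    Xb = X[rng.choice(n, m, replace=False)]     # basis vectors: a subset of the data
    noise = 0.1 ** 2

    Kmm = sq_exp(Xb, Xb)                        # m x m
    Knm = sq_exp(X, Xb)                         # n x m
    # Forming Knm.T @ Knm costs O(n m^2); only an m x m system is ever solved.
    A = Kmm + 1e-8 * np.eye(m) + Knm.T @ Knm / noise
    Xs = np.linspace(0, 1, 5)                   # prediction locations
    mean = sq_exp(Xs, Xb) @ np.linalg.solve(A, Knm.T @ y) / noise
    print(mean)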

  5. Sequential Sparse Gaussian Processes • Bayesian approach • Utilizes prior knowledge such as experience, expert knowledge or previous datasets • Model parameters are described by a prior probability distribution • Likelihood: how likely is it that the parameters w generated the data D • Posterior distribution of the parameters is proportional to the product of the likelihood and the prior • Bayes rule: posterior = likelihood × prior / normalising constant, i.e. p(w|D) = p(D|w) p(w) / p(D)
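To make the slide’s Bayes rule concrete, here is a toy numerical version (an illustration only, not part of the SSGP software): a Gaussian prior over a single parameter w, a Gaussian likelihood for some data D, and the posterior obtained as likelihood × prior divided by the normalising constant.

    # Toy Bayes rule on a grid: posterior proportional to likelihood x prior.
    import numpy as np

    w = np.linspace(-3, 3, 601)                    # grid over the parameter w
    prior = np.exp(-0.5 * w**2)                    # N(0, 1) prior, unnormalised
    D = np.array([0.8, 1.1, 0.9])                  # observed data
    likelihood = np.exp(-0.5 * ((D[:, None] - w[None, :]) ** 2 / 0.25).sum(axis=0))
    posterior = likelihood * prior
    posterior /= np.trapz(posterior, w)            # divide by the normalising constant
    print("posterior mean:", np.trapz(w * posterior, w))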

  6. Choosing Model for SSGP • The Machine Learning community treats estimating the covariance function differently • In Geostatistics, an experimental variogram is computed and an appropriate model fitted • In Machine Learning the model is chosen based on experience or informed intuition • How could the 10 prior datasets be used? • Assume the data are independent but identically distributed • Compute experimental variograms for a subset of the data (160 observations) for each of the 10 prior days • Fit various variogram models and use them in cross-validation for predicting at the 40 withheld locations
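The experimental variogram step can be sketched as follows (a generic isotropic implementation with assumed bin edges and synthetic stand-in values, not the authors’ exact code):

    # Experimental (semi-)variogram: bin half squared differences by separation distance.
    import numpy as np

    def experimental_variogram(coords, values, bin_edges):
        """Return (bin centres, semivariance) for an isotropic experimental variogram."""
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        sq = 0.5 * (values[:, None] - values[None, :]) ** 2
        i, j = np.triu_indices(len(values), k=1)   # count each pair once
        dist, gamma = d[i, j], sq[i, j]
        centres, semiv = [], []
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
            mask = (dist >= lo) & (dist < hi)
            if mask.any():
                centres.append(dist[mask].mean())
                semiv.append(gamma[mask].mean())
        return np.array(centres), np.array(semiv)

    # e.g. 160 retained observations for one day (synthetic stand-in values here)
    rng = np.random.default_rng(1)
    coords = rng.uniform(0, 100, size=(160, 2))
    values = rng.normal(100.0, 10.0, size=160)
    h, g = experimental_variogram(coords, values, np.linspace(0.0, 60.0, 13))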

  7. Variography • Several models were fitted, including mixtures of models [plot: experimental variogram, variance vs. lag distance] • The mixture model consistently fitted better
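One way to fit such a mixture is least squares on the experimental variogram. The sketch below assumes a nugget plus an exponential plus a squared-exponential component (the components named elsewhere in the talk); the authors’ exact parametrisation is not given here.

    # Fit a mixture variogram model (nugget + exponential + squared exponential).
    import numpy as np
    from scipy.optimize import curve_fit

    def mixture_variogram(h, nugget, s1, r1, s2, r2):
        exp_part = s1 * (1.0 - np.exp(-h / r1))            # exponential component
        sqexp_part = s2 * (1.0 - np.exp(-(h / r2) ** 2))   # squared-exponential component
        return nugget + exp_part + sqexp_part

    # h, g would come from the experimental variogram; synthetic values stand in here
    h = np.linspace(1.0, 50.0, 10)
    g = mixture_variogram(h, 0.5, 3.0, 10.0, 2.0, 25.0) \
        + 0.1 * np.random.default_rng(2).standard_normal(10)
    params, _ = curve_fit(mixture_variogram, h, g, p0=[0.1, 1, 5, 1, 20], maxfev=10000)
    print(dict(zip(["nugget", "sill_exp", "range_exp", "sill_sqexp", "range_sqexp"], params)))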

  8. Variography • The experimental variogram was used to select the covariance model for SSGP • Insufficient number of observations at smaller lag distances to learn the behaviour there • Assume little variation at short separation distances • Use a tighter prior variance on the hyper-parameters of the squared exponential component
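One simple way to express that “tighter variance” is a Gaussian prior on the log hyper-parameters with a small prior variance on the squared-exponential length scale. The penalty below is an illustrative assumption about how such a prior could enter the hyper-parameter objective, not the SSGP implementation.

    # Gaussian prior penalty on log hyper-parameters; a tight variance on the
    # squared-exponential length scale keeps it near its prior mean.
    import numpy as np

    def neg_log_prior(log_params, prior_mean, prior_var):
        """Penalty added to the negative log likelihood during hyper-parameter fitting."""
        return float(np.sum(0.5 * (log_params - prior_mean) ** 2 / prior_var))

    log_params = np.log([5.0, 1.0])        # e.g. [sq-exp length scale, signal variance]
    prior_mean = np.log([4.0, 1.0])        # centred on values suggested by the prior days
    prior_var = np.array([0.05, 0.5])      # tight on the length scale, looser elsewhere
    print(neg_log_prior(log_params, prior_mean, prior_var))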

  9. Boosting • Boosting used to estimate the ‘best’ hyper-parameters (nugget, sill and range) • Adjust the hyper-parameters to maximize the likelihood of the training data • An iterative method is used to search for optimal values of the hyper-parameters • Boosting assumes that each iterative step towards the optimal hyper-parameters is composed of a linear combination of the individual iterative steps calculated for each day • Leave-one-out cross-validation was used • 9 days used to estimate the optimal parameters • The resulting hyper-parameters were used as the mean values for the hyper-parameters on the left-out dataset • Some information about the hyper-parameters is learnt, but the values are not fixed: differing degrees of uncertainty are associated with each hyper-parameter
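The “linear combination of per-day steps” idea can be illustrated with a toy objective. This is only a sketch of the combination scheme described above; nll_for_day is a stand-in, not a real GP marginal likelihood.

    # Per-day steps combined linearly at each iteration of the hyper-parameter search.
    import numpy as np

    def nll_for_day(theta, day):
        # stand-in objective: each day prefers slightly different hyper-parameters
        target = np.array([1.0 + 0.05 * day, 0.5])
        return float(np.sum((theta - target) ** 2))

    def numerical_grad(f, theta, eps=1e-5):
        g = np.zeros_like(theta)
        for k in range(len(theta)):
            e = np.zeros_like(theta)
            e[k] = eps
            g[k] = (f(theta + e) - f(theta - e)) / (2 * eps)
        return g

    theta = np.array([2.0, 2.0])            # e.g. [range, sill] on some internal scale
    days = range(9)                          # the 9 training days (one day left out)
    for _ in range(100):
        steps = [numerical_grad(lambda t: nll_for_day(t, d), theta) for d in days]
        theta = theta - 0.05 * np.mean(steps, axis=0)   # linear combination of per-day steps
    print("combined hyper-parameter estimate:", theta)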

  10. Interpolating using SSGP • Anisotropic covariance functions were used because we believed that the variation was not uniform in all directions • The learnt hyper-parameters were used to set the initial hyper-parameter values for SSGP • How was the number of ‘basis vectors’ (model complexity) chosen? • Cross-validation • Accuracy decreases as the number of ‘basis vectors’ decreases
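For reference, an anisotropic squared-exponential covariance of the kind referred to above simply uses a separate length scale per coordinate direction; the values below are placeholders, not the learnt SIC2004 parameters.

    # Anisotropic squared exponential: one length scale per coordinate direction.
    import numpy as np

    def anisotropic_sq_exp(Xa, Xb, length_scales=(30.0, 10.0), sigma_f=1.0):
        scaled = (Xa[:, None, :] - Xb[None, :, :]) / np.asarray(length_scales)
        return sigma_f**2 * np.exp(-0.5 * np.sum(scaled**2, axis=-1))

    coords = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
    # with these length scales, correlation decays faster along the second axis
    print(anisotropic_sq_exp(coords, coords))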

  11. Using our method with the competition data • SSGP was used with the 11th day dataset to predict at 808 locations • In addition to the data for the 11th day, a ‘joker’ dataset was given • The ‘joker’ dataset simulated a radiation leak into the environment – but contestants did not know this until after the contest • SSGP was used with the ‘joker’ dataset to predict at the same 808 locations

  12. Results • To determine how well SSGP performed, we compared it with some standard machine learning techniques: • Multi-layer perceptrons • Radial basis functions • Gaussian processes • The Netlab Matlab toolbox was used for the calculations
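The comparison itself was run in Matlab with Netlab; the sketch below is only an analogous baseline comparison in Python with scikit-learn (an assumed substitute, using kernel ridge regression in place of an RBF network and synthetic stand-in data) to show the shape of such a comparison.

    # Baseline comparison of an MLP, an RBF-style model and a GP by cross-validation.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 100, size=(200, 2))     # 200 observation locations
    y = np.sin(X[:, 0] / 20) + np.cos(X[:, 1] / 30) + 0.1 * rng.standard_normal(200)

    models = {
        "MLP": MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0),
        "RBF (kernel ridge)": KernelRidge(kernel="rbf", gamma=0.01, alpha=0.1),
        "GP": GaussianProcessRegressor(kernel=RBF(20.0) + WhiteKernel(0.01), normalize_y=True),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
        print(f"{name}: MAE = {-scores.mean():.3f}")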

  13. Results

  14. Contour Maps [maps: SSGP, GP, Actual]

  15. Results – Joker dataset

  16. Contour Maps - Joker [maps: SSGP, GP, Actual]

  17. Learnt hyper-parameters • Exponential range parameters break down as the noise parameter becomes large • Squared exponential parameters remain relatively constant between datasets

  18. Conclusions • Once the nature of the covariance structure is understood, interpolation with SSGP is completely automatic • There were problems predicting when there were extreme values, as would be expected • Incorporating a robust estimation method for data with anomalies should be investigated • For the 11th day dataset, SSGP and GP produced similar results, but SSGP is faster • SSGP was devised for large datasets, but it can also improve speed on small datasets.

  19. Acknowledgements • Lehel Csato – developer of the SSGP algorithm • SSGP software available from: http://www.ncrg.aston.ac.uk
