GEM-SA: a tutorial John Paul Gosling University of Sheffield
Overview • GEM-SA: Gaussian Emulation Machine for Sensitivity Analysis • It is a Windows-based program with a graphical interface, created by Marc Kennedy during his time in CTCD • It does emulation for prediction, uncertainty analysis and sensitivity analysis • It also has a facility to create experimental designs for the analysis of computer models.
Starting the program • On the desktop, there is a folder <GEM-SA tutorial>. Opening it will reveal two other folders: • Inside the folder <GEM-SA1.1> is the program: • Double-clicking this will start the program
Main window • Labelled parts of the screenshot: menu, toolbar, Sensitivity Analysis output grid, log window
Generating input designs • There are two designs available: LP-TAU and Maximin Latin Hypercube. Both have good space-filling properties. • Press this button to create a file of inputs for your computer model
Generating input designs • Then we specify the range over which each input is of interest • These ranges must cover your beliefs about the plausible values of each input
The design • Here’s a 50-point LP-TAU design for three inputs • You’ll also find they’ve been written to the file you specified (LP_TAU50.txt) in GEM-SA’s working directory
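For readers who want to build a comparable design outside GEM-SA, here is a minimal sketch of a maximin Latin hypercube using only the Python standard library. It is not GEM-SA's algorithm: it simply draws a number of random Latin hypercubes and keeps the one whose closest pair of points is furthest apart. The function names, the number of candidates and the example ranges are all assumptions made for illustration.

```python
import math
import random


def latin_hypercube(n_points, n_inputs, rng):
    """Return one random Latin hypercube sample on the unit cube."""
    columns = []
    for _ in range(n_inputs):
        # One point per stratum [k/n, (k+1)/n), then shuffle the strata.
        column = [(k + rng.random()) / n_points for k in range(n_points)]
        rng.shuffle(column)
        columns.append(column)
    # Transpose so that each row is one design point.
    return [list(row) for row in zip(*columns)]


def min_distance(design):
    """Smallest Euclidean distance between any two design points."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(design)
        for q in design[i + 1:]
    )


def maximin_lhs(n_points, ranges, n_candidates=200, seed=0):
    """Keep the candidate hypercube with the largest minimum distance,
    then rescale each column to its (low, high) range."""
    rng = random.Random(seed)
    best = max(
        (latin_hypercube(n_points, len(ranges), rng) for _ in range(n_candidates)),
        key=min_distance,
    )
    return [
        [low + u * (high - low) for u, (low, high) in zip(point, ranges)]
        for point in best
    ]


# Hypothetical ranges for three inputs; replace with your own.
design = maximin_lhs(50, [(0.0, 1.0), (0.0, 1.0), (-5.0, 5.0)])
```

Each row of `design` is one run of the computer model. Choosing the ranges generously matters more than the search itself, since the emulator cannot be trusted outside the region the design covers.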
Creating/Editing a project • Now, we’ll run through some of the options available to us for emulator building. • We can create a new project or edit an existing project by selecting the appropriate item from the project menu. • Or we can use these toolbar buttons. New Edit
Edit Project - Files Names of input files Names of output files
Edit Project - Options Edit input names How many inputs?
Edit Project - Options What should be calculated, and how? Which joint effects should be calculated?
Edit Project - Options What prior mean for the output? Are the inputs uncertain?
Edit Project - Options What kind of predictions and cross validation?
Edit Project - Simulations MCMC control parameters Number of realisations for prediction and ME/JE How many points used to calculate main effects, joint effects
Input names • By clicking the <Names…> button, a window opens that allows us to name each of the inputs. • This can be handy when viewing the variance decomposition results and main effects plots.
Distributions for inputs • When we click the <OK> button, the following window opens. • This window allows us to specify our beliefs about the inputs.
A first run through • Consider the simple nonlinear model we saw earlier y = sin(x1)/{1+exp(x1+x2)} • We have 2 inputs, x1 and x2, and we assume they both must be valued in the range [0,1]. • 20 points will give us a decent coverage of the unit square that is the input space here. • Two files have already been saved in the folder <Examples\Eg1> to help save us time.
Monte Carlo method • Here’s the result of a Monte Carlo analysis using 30 input pairs. • Mean = 0.139, median = 0.142 • Std. dev. = 0.053 • Variance = 0.0028
Monte Carlo method • Here’s the result of a Monte Carlo analysis using 10,000 input pairs. • Mean = 0.114, median = 0.115 • Std. dev. = 0.054 • Variance = 0.0029
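The Monte Carlo summaries above can be reproduced approximately with a short script. This is a sketch that assumes x1 and x2 are independent and uniformly distributed on [0,1]; the slides only state the range, so the uniform choice is an assumption, as are the function names and the seed. Exact figures will vary with the random draws, particularly for the 30-pair run.

```python
import math
import random
import statistics


def model(x1, x2):
    """The simple nonlinear model y = sin(x1) / (1 + exp(x1 + x2))."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


def monte_carlo(n_samples, seed=1):
    """Summarise the output distribution from n_samples random input pairs."""
    rng = random.Random(seed)
    # Assumption: both inputs are independent and uniform on [0, 1].
    outputs = [model(rng.random(), rng.random()) for _ in range(n_samples)]
    return {
        "mean": statistics.fmean(outputs),
        "median": statistics.median(outputs),
        "std_dev": statistics.stdev(outputs),
        "variance": statistics.variance(outputs),
    }


small = monte_carlo(30)       # noisy: only 30 model runs
large = monte_carlo(10_000)   # stable, but needs 10,000 model runs
print(small)
print(large)
```

The gap between the two runs is the motivation for emulation: a model that is expensive to run cannot afford 10,000 evaluations, while 30 evaluations give an unreliable Monte Carlo answer.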
Prediction • Predictions can be either: • Correlated realisations of outputs at the prediction inputs (similar to main effect outputs) • Marginal means and variances of outputs at the prediction inputs (faster to compute, especially with many prediction points, and easy to interpret)
A plot of the predictions • Here are the contents of the prediction output files plotted against the real function, with x2 fixed at 0.5.
Cross validation • Choice of none, leave-one-out or leave final 20% out • Leave-one-out: hyperparameters are estimated using all the data and are then fixed when prediction is carried out for each omitted point • Leave final 20% out: hyperparameters are estimated using the reduced data subset
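The two cross-validation schemes differ only in which training runs are held out. The sketch below shows the index splits and nothing more; it does not fit an emulator, and the function names are hypothetical rather than part of GEM-SA.

```python
def leave_one_out_splits(n_points):
    """Yield (training indices, held-out indices), one pair per training run.

    In the scheme described above, the hyperparameters are estimated once
    from all the data; only the prediction step omits the held-out point.
    """
    for held_out in range(n_points):
        training = [i for i in range(n_points) if i != held_out]
        yield training, [held_out]


def leave_final_fraction_out(n_points, fraction=0.2):
    """Hold out the last `fraction` of the runs as a single validation set.

    Here the hyperparameters would be estimated from the training part only.
    """
    n_held_out = max(1, round(n_points * fraction))
    cut = n_points - n_held_out
    return list(range(cut)), list(range(cut, n_points))


loo = list(leave_one_out_splits(20))
training, validation = leave_final_fraction_out(20)
```

With 20 training runs, leave-one-out gives 20 single-point checks, while leave final 20% out gives one check on the last four runs.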
A real example • A dynamic vegetation model is being used to predict the NBP of deciduous broadleaf woodland in the vicinity of Whitby, North Yorkshire. • The scientists are uncertain about ten inputs of the model and want to know how this uncertainty affects the NBP output of the model. Monte Carlo methods are out of the question because the model is too expensive to run many times. • When they used their best guesses for these inputs, the model returned an NBP of 146.4 gC/m2.
The input names in order • Maximum age (years) N(200,625) • Water potential (MPa) N(3,0.25) • Leaf life span (days) N(190,1600) • Leaf mortality index N(0.005,6.25e-6) • Bud burst limit (degree days) N(135,6.25) • Seeding density (m2) N(0.1,0.0001) • Soil sand (%) N(43.27,222.12) • Soil clay (%) N(22.36,49.21) • log(stem growth rate) N(-5.116,0.041209) • Bulk density N(1.214,0.0325)
Main effects plots • The plug-in estimate of the NBP is far away from our mean for NBP because the main effect plot for bulk density is concave around its expected value of 1.214.
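The effect of concavity on the plug-in estimate can be illustrated with a toy calculation. The sketch below uses the bulk density distribution from the input list, assuming the second parameter of N(1.214, 0.0325) is a variance. The response function is entirely made up for illustration and is not the vegetation model; only the qualitative result carries over.

```python
import math
import random
import statistics

# Bulk density distribution from the slides; 0.0325 is assumed to be a variance.
MEAN, VARIANCE = 1.214, 0.0325


def toy_response(x):
    """Hypothetical concave response peaking at the input mean."""
    return 150.0 - 400.0 * (x - MEAN) ** 2


rng = random.Random(0)
sd = math.sqrt(VARIANCE)  # random.gauss expects a standard deviation
samples = [toy_response(rng.gauss(MEAN, sd)) for _ in range(20_000)]

plug_in = toy_response(MEAN)           # output at the best-guess input
expected = statistics.fmean(samples)   # mean output under input uncertainty

print(f"plug-in estimate: {plug_in:.1f}")
print(f"expected output:  {expected:.1f}")
```

For a concave response, the output evaluated at the mean input always overstates the mean output (Jensen's inequality), and the gap grows with the input variance and the curvature.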
Producing main/joint effects plots for publication • In the files section of the edit project window, there are two fields that allow the user to specify where the main/joint effects data should be written. • These files can be used to produce graphs like the one I showed earlier. • The main effects file is structured as follows: • There are a number of blocks of function realisations – one for each input. • These are controlled by
Limitations of GEM-SA • In theory, the methods used by GEM-SA are limitless; however, the program itself isn’t. • It can handle up to 30 inputs and 400 training data points. • Also, the distributions that are used to express our uncertainty about the inputs are limited to uniform or normal.
When it all goes wrong… • How do we know when the emulator is not working? • Large roughness parameters, especially ones hitting the limit of 99 • Large emulation variance on the UA mean • Poor CV standardised prediction errors, especially when some are extremely large • In such cases, see if a larger training set helps • Other ideas include transforming the output scale
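One way to screen cross-validation output for the warning signs above is to compute standardised prediction errors by hand. The sketch below assumes the usual definition, (observed output minus predicted mean) divided by the predicted standard deviation; the slides do not give GEM-SA's exact formula. The threshold of 3 and all the numbers are made up for illustration.

```python
import math


def standardised_errors(observed, predicted_mean, predicted_variance):
    """(observed - predicted mean) / predicted standard deviation, per point."""
    return [
        (y - m) / math.sqrt(v)
        for y, m, v in zip(observed, predicted_mean, predicted_variance)
    ]


def flag_large(errors, threshold=3.0):
    """Indices of points whose standardised error exceeds the threshold."""
    return [i for i, e in enumerate(errors) if abs(e) > threshold]


# Made-up cross-validation results for four held-out points.
observed = [0.10, 0.14, 0.21, 0.05]
predicted_mean = [0.11, 0.13, 0.12, 0.05]
predicted_variance = [0.0004, 0.0004, 0.0004, 0.0001]

errors = standardised_errors(observed, predicted_mean, predicted_variance)
suspect = flag_large(errors)
print(errors)
print(suspect)
```

If the emulator's uncertainty is well calibrated, most standardised errors should fall within about two in absolute value, so a few extremely large ones suggest the emulator is overconfident in part of the input space.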
Where to find the program • GEM-SA is available on the web along with tutorial slides from a longer course and further example data sets. • Links to it can be found on my website where there is also a technical report explaining the perils of using the “plug-in” approach: j-p-gosling.staff.shef.ac.uk