Crystal Linkletter and Derek Bingham Department of Statistics and Actuarial Science

Variable Selection for Gaussian Process Models in Computer Experiments. Crystal Linkletter and Derek Bingham Department of Statistics and Actuarial Science Simon Fraser University. David Higdon and Nick Hengartner Statistical Sciences Discrete Event Simulations


Presentation Transcript


  1. Variable Selection for Gaussian Process Models in Computer Experiments

Crystal Linkletter and Derek Bingham, Department of Statistics and Actuarial Science, Simon Fraser University
David Higdon and Nick Hengartner, Statistical Sciences, Discrete Event Simulations, Los Alamos National Laboratory
Kenny Q. Ye, Department of Epidemiology and Population Health, Albert Einstein College of Medicine

Introduction
Computer simulators often require a large number of inputs and are computationally demanding. A main goal of computer experimentation may be screening: identifying which inputs have a significant impact on the process being studied. Gaussian spatial process (GASP) models are commonly used to model computer simulators. These models are flexible, but make variable selection challenging. We present reference distribution variable selection (RDVS) as a new approach to screening for GASP models.

Results

Simulated Example
We used a 54-run space-filling Latin hypercube design with p = 10 factors. The response is generated by: [equation not preserved]
A GASP model is used to analyse the generated response, and the RDVS algorithm is used to identify the first four factors as active.
[Figure: Posterior distributions for the correlation parameters of the 10 factors. The horizontal line marks the 10th percentile of the reference distribution. Correlation parameters with posterior medians below this line indicate active factors.]

Taylor Cylinder Experiment
A 118-run, 5-level nearly-orthogonal design was used. Exploratory analysis suggests factor 6 is important; otherwise, significant factors are difficult to identify. RDVS identifies factor 6 and six other factors as having a significant impact on cylinder deformation.

Discussion
RDVS is able to correctly identify when none of the true factors are active. This variable selection technique complements methods in sensitivity analysis. It can be used as a precursor to alternative visualization and ANOVA approaches to screening.
The method is robust to the specification of the prior distributions. Since the inert variable is assigned the same prior as the true factors, the method self-calibrates.

Gaussian Spatial Process Model
To model the response from a computer experiment, we use a Bayesian version of the GASP model originally used by Sacks et al. (1989):

y(X) = z(X) + ε

• y(X): simulator response, an (n x 1) vector
• X: input to the computer code, an (n x p) design matrix
• ε: white-noise process, independent of z(X)

The Gaussian spatial process, z(X), is specified to have mean zero and covariance function: [equation not preserved]
Under this parameterization, if ρ_k is close to one, the kth input is not active. RDVS is a method for gauging the relative magnitudes of the correlation parameters ρ_k.

Conclusions and Future Research
• RDVS is a new method for variable selection for Bayesian Gaussian spatial process models.
• The methodology is motivated by asking: what would the posterior distribution of the correlation parameter for an inert factor look like, given the data?
• The approach is Bayesian and only requires the generation of an inert factor, but the screening has a frequentist flavour, using the distribution of the inert factor as a reference distribution.
• Future research:
  • Using a linear regression model for the mean of the GASP model
  • Using RDVS for variable selection for other models

Computer Experiment Example
Taylor Cylinder Experiment (Los Alamos National Lab)
This is a finite element code used to simulate the high-velocity impact of a cylinder. In the experiment, copper cylinders (length 5.08 cm, radius 1 cm) are fired into a fixed barrier at a velocity of 177 m/s. The cylinder length after impact is used as the outcome. The process is governed by 14 parameters which control the behaviour of the cylinder after impact. Over the limited range in which the computer experiment exercises the simulator, the response is expected to be dominated by only a few of the 14 parameters.
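To make the role of the correlation parameters in the Gaussian Spatial Process Model section concrete, here is a minimal sketch in Python. The transcript does not preserve the poster's covariance formula, so the product-power form used below, and the name `gasp_corr`, are assumptions chosen only to be consistent with the statement that a correlation parameter close to one means the input is not active. Inputs are assumed to be scaled to [0, 1].

```python
import numpy as np

def gasp_corr(X, rho):
    """Correlation matrix with entries prod_k rho[k] ** (4 * (X[i, k] - X[j, k]) ** 2).

    ASSUMPTION: this product-power form is a stand-in; the poster's own
    covariance formula is not preserved in the transcript.
    Requires 0 < rho[k] <= 1 for every input k.
    """
    X = np.asarray(X, dtype=float)
    rho = np.asarray(rho, dtype=float)
    sq_diff = (X[:, None, :] - X[None, :, :]) ** 2   # shape (n, n, p)
    # rho ** (4 d^2) = exp(4 d^2 log rho); the sum over k gives the product
    return np.exp(np.sum(4.0 * sq_diff * np.log(rho), axis=-1))

rng = np.random.default_rng(0)
X = rng.uniform(size=(6, 2))           # hypothetical 6-run design with 2 inputs

R_both = gasp_corr(X, [0.2, 1.0])      # second input has correlation parameter 1
R_first = gasp_corr(X[:, :1], [0.2])   # model that ignores the second input
print(np.allclose(R_both, R_first))    # prints True: the input with rho = 1 drops out
```

Under this assumed form, lowering a correlation parameter below one can only decrease the correlations between runs, which is why small values signal an active input.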
RDVS Algorithm
To implement RDVS, a factor which is known to be inert is appended to the design matrix X. This provides a benchmark against which the other input factors can be compared.

Algorithm
1. Augment the design matrix by adding a new design column corresponding to an inert factor.
2. Find the posterior median of the correlation parameter corresponding to the dummy factor.
3. Repeat steps 1 and 2 many times to obtain the distribution of the posterior median of an inert factor, to use as a reference distribution.
4. Compare the posterior medians of the correlation parameters of the true factors to the reference distribution.
The percentile of the reference distribution used for comparison reflects the rate of falsely identifying an inert factor as active.

Acknowledgements
This research was initiated while Linkletter, Bingham and Ye were visiting the Statistical Sciences group at Los Alamos National Laboratory. This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada. Ye's research was supported by NSF DMS-0306306.
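The four steps of the RDVS algorithm can be sketched in Python as follows. This is an illustration under several assumptions, not the authors' implementation. The correlation form, the flat prior on each correlation parameter, the fixed unit process variance and fixed white-noise variance, the random-walk Metropolis sampler, the way the inert column is generated (a random permutation of equally spaced levels), and the averaging of the true factors' posterior medians across repetitions are all stand-ins. Every function name is hypothetical, and inputs are assumed to be scaled to [0, 1].

```python
import numpy as np

def corr(X, rho):
    # ASSUMED correlation form: prod_k rho[k] ** (4 * (x_ik - x_jk) ** 2)
    sq_diff = (X[:, None, :] - X[None, :, :]) ** 2
    return np.exp(np.sum(4.0 * sq_diff * np.log(rho), axis=-1))

def log_target(eta, X, y, nugget=1e-2):
    # Log posterior of eta = logit(rho) under a flat prior on each rho[k].
    # ASSUMPTION: unit process variance and fixed white-noise variance `nugget`.
    rho = 1.0 / (1.0 + np.exp(-eta))
    L = np.linalg.cholesky(corr(X, rho) + nugget * np.eye(len(y)))
    a = np.linalg.solve(L, y)
    log_lik = -np.sum(np.log(np.diag(L))) - 0.5 * (a @ a)
    # Jacobian of the logit transform: log(rho) + log(1 - rho), summed over k
    log_jac = -np.sum(np.logaddexp(0.0, -eta) + np.logaddexp(0.0, eta))
    return log_lik + log_jac

def posterior_medians(X, y, n_iter, step, rng):
    # Component-wise random-walk Metropolis; the first half is discarded as burn-in.
    p = X.shape[1]
    eta = np.zeros(p)                      # start every rho[k] at 0.5
    cur = log_target(eta, X, y)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        for k in range(p):
            prop = eta.copy()
            prop[k] += step * rng.normal()
            new = log_target(prop, X, y)
            if np.log(rng.uniform()) < new - cur:
                eta, cur = prop, new
        draws[t] = 1.0 / (1.0 + np.exp(-eta))
    return np.median(draws[n_iter // 2:], axis=0)

def rdvs(X, y, n_rep=20, percentile=10, n_iter=300, step=0.7, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    y = (y - y.mean()) / y.std()           # standardise the response
    n, p = X.shape
    ref = np.empty(n_rep)
    true_med = np.empty((n_rep, p))
    for r in range(n_rep):
        # Step 1: append an inert column (assumed: permuted equally spaced levels)
        inert = (rng.permutation(n) + 0.5) / n
        X_aug = np.column_stack([X, inert])
        # Step 2: posterior medians, including the one for the dummy factor
        med = posterior_medians(X_aug, y, n_iter, step, rng)
        ref[r], true_med[r] = med[-1], med[:p]
    # Step 3: the repeated dummy medians form the reference distribution
    cutoff = np.percentile(ref, percentile)
    # Step 4: a true factor is flagged active if its median falls below the cutoff
    active = true_med.mean(axis=0) < cutoff
    return active, cutoff, ref

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(20, 3))          # hypothetical design: 20 runs, 3 inputs
    y = np.sin(2 * np.pi * X[:, 0])        # hypothetical response using only input 1
    active, cutoff, ref = rdvs(X, y, n_rep=5, n_iter=200)
    print("active factors:", active, "cutoff:", round(float(cutoff), 3))
```

The `percentile` argument plays the role of the comparison percentile described above: a lower value makes the screen more conservative. The small `n_rep` and `n_iter` values in the demo are chosen for speed only; a real analysis would need far more repetitions and MCMC iterations.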
