Fast Simulators for Assessment and Propagation of Model Uncertainty*

Jim Berger, M.J. Bayarri, German Molina. June 20, 2001. SAMO 2001, Madrid. *Project of the National Institute of Statistical Sciences

Some activities requiring numerous runs of a complex computer model • Output analysis: with random inputs, what is the distribution of output variables? • Optimization: finding the optimal setting for process control variables (e.g., signal timing). • Design: of computer or field experiments. • Bayesian Inference: learning about unknown model parameters or inputs from field data (i.e., data from the process being modeled).

The problem and solution • If runs of the computer model are too slow, the activity cannot be completed. • The natural solution is to approximate the computer model; the most common approach is approximation by a faster computer model, for example: • models of lower ‘resolution’ • linearized versions of the model • response surface (or Gaussian process) approximations • probability networks of various types.

An Example: Bayesian input analysis for CORSIM • The microsimulator CORSIM is a computer model of street and highway traffic. • It models vehicles entering the network and moving according to interaction rules. • The traffic network studied consists of a 44-intersection neighborhood in Chicago. • CORSIM was applied to model a one-hour period during rush hour.

Network (Chicago) [Figure: map of the study network. Labels: O’Hare, LOOP; streets: Kingsbury, Huron, Erie, Ontario, Ohio, Grand, Illinois, Hubbard, Dearborn, Orleans, Franklin, LaSalle, Clark, Wells.]

Key Unknown Inputs • Demands, λ: the means of the exponential inter-arrival time distributions that determine the (random) numbers of vehicles that enter the system from external streets. λ is 16-dimensional. • Turning probabilities, P: the probabilities that vehicles turn right, turn left, or go through at each intersection. P is 84-dimensional.

Data: vehicle counts, C • Demand counts: the numbers of vehicles entering the network at each street, recorded by observers placed on the external streets. • Turning counts: made by observers over short time intervals at all intersections. • Video counts: At central intersections, cameras were placed that produced an exact count of vehicles.

Problems with the Data • Demand counts are inaccurate, some by as much as 40%. • Turning counts were made over short time periods. • Some of the turning counts were missing. • The observer counts were incompatible with the video counts (reality), so they were tuned to bring them into accordance.

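Example of a tuning adjustment • An observer reported 1969 vehicles entering at one point of the network. • This was adjusted to 1790 vehicles to fit the video count observed at another point. [Figure: map detail with the two locations marked; streets shown: Erie, Ontario, LaSalle.]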

Problems with tuning • Often, too few inputs are tuned, and those that are tuned are then over-tuned. • The often considerable uncertainty in the tuned inputs is ignored, resulting in an overly optimistic assessment of output variance. • Tuning can mask model biases that actually exist, making the model less accurate for prediction outside the range of the data (not applicable here).

A solution: Bayesian analysis • Compute the posterior distribution of the true model inputs, given the data. • But this typically requires use of Markov chain Monte Carlo (MCMC) methods, involving thousands of model runs; too time consuming for CORSIM. • Thus a fast simulator is needed, one which represents those features of CORSIM that allow the data to be related to model inputs.

Structure of the fast simulator • It is a probability network • with the same nodal structure as CORSIM; • with unknown inputs λ (vehicle inter-arrival rates) and P (turning probabilities) that mean the same as in CORSIM; • but with ‘instantaneous’ vehicles that (i) enter the network; (ii) turn appropriately; (iii) exit. Note: fast simulators often have a limited purpose, and are not general replacements for the computer model; here, we ignore the key features of time, interactions, signals, etc.
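
The structure described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' simulator: the two-intersection network, the link names, the probabilities, and the 4-second mean inter-arrival time are all hypothetical. Vehicles enter according to exponential inter-arrival times, turn according to P, and exit, with no notion of travel time, interaction, or signals.

```python
import random

# Hypothetical toy network (not the Chicago network): each link maps a
# movement to the next link; a link with no movements is an exit.
NETWORK = {
    "west_in": {"left": "north_out", "through": "mid", "right": "south_out"},
    "mid": {"left": "north2_out", "through": "east_out", "right": "south2_out"},
    "north_out": {}, "south_out": {}, "north2_out": {},
    "east_out": {}, "south2_out": {},
}
# Turning probabilities P for each non-exit link (illustrative values).
TURN_PROBS = {
    "west_in": {"left": 0.2, "through": 0.7, "right": 0.1},
    "mid": {"left": 0.1, "through": 0.6, "right": 0.3},
}
# Demand input: mean inter-arrival time, in seconds, at each entry link.
MEAN_GAP_S = {"west_in": 4.0}


def count_arrivals(mean_gap_s, horizon_s, rng):
    """Count arrivals in [0, horizon_s] with exponential inter-arrival times."""
    n = 0
    t = rng.expovariate(1.0 / mean_gap_s)
    while t <= horizon_s:
        n += 1
        t += rng.expovariate(1.0 / mean_gap_s)
    return n


def run_fast_simulator(horizon_s, rng):
    """Route 'instantaneous' vehicles and return the vehicle count per link."""
    link_counts = {link: 0 for link in NETWORK}
    for entry, gap in MEAN_GAP_S.items():
        for _ in range(count_arrivals(gap, horizon_s, rng)):
            link = entry
            while True:
                link_counts[link] += 1
                moves = NETWORK[link]
                if not moves:  # exit link: the vehicle leaves the network
                    break
                names = list(moves)
                weights = [TURN_PROBS[link][m] for m in names]
                link = moves[rng.choices(names, weights=weights)[0]]
    return link_counts


counts = run_fast_simulator(horizon_s=3600.0, rng=random.Random(0))
```

Because every vehicle that enters an intersection also leaves it, the counts returned by such a sketch automatically satisfy flow conservation at each node.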

Modeling the demand counts data • Demand counts: Each demand count, C_i^D, is modelled by a Poisson distribution with mean b_i N_i, where N_i is the true count and b_i - 1 is the unknown “observer bias.” • The b_i are modelled as being i.i.d. Gamma(α, β). The hyperparameters α and β are constrained so that the mean of this distribution is less than 2 (so that the expected bias is less than 100%), but are otherwise unknown, and are assigned a uniform prior distribution.
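
A small sketch of the Poisson observation model for a single demand count may help fix ideas. Purely for illustration, it reuses the two numbers from the tuning example and treats 1790 as if it were the true count and 1969 as the observer's count; the function name is hypothetical. For a fixed true count N, the likelihood in b is maximized at b = C / N.

```python
import math


def demand_loglik(count, true_n, b):
    """Log-likelihood of an observed count under Poisson(b * true_n)."""
    mean = b * true_n
    return count * math.log(mean) - mean - math.lgamma(count + 1)


# Illustrative assumption: 1790 plays the role of the true count N_i.
observed, true_n = 1969, 1790

b_hat = observed / true_n   # value of b that maximizes the likelihood
bias_hat = b_hat - 1.0      # implied observer bias, as a fraction of N_i

# The likelihood is lower at neighbouring values of b.
ll_at_hat = demand_loglik(observed, true_n, b_hat)
ll_no_bias = demand_loglik(observed, true_n, 1.0)
ll_more_bias = demand_loglik(observed, true_n, 1.2)
```

In the full model the true counts are themselves unknown, so b_i and N_i cannot be separated from one demand count alone; the video counts and the Gamma prior on the b_i supply the additional information.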

Modeling the turning counts data • If N_i vehicles arrive at an intersection from a given direction, the numbers turning right, turning left, and going through, (N_i^R, N_i^L, N_i^T), are assumed to follow a multinomial distribution with probabilities (P_i^R, P_i^L, P_i^T). • The (P_i^R, P_i^L, P_i^T) are assigned the Jeffreys prior distribution, proportional to (P_i^R P_i^L P_i^T)^(-1/2). • The observed turning counts, C_i^T, were assumed to be accurate.
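
The Jeffreys prior above is a Dirichlet(1/2, 1/2, 1/2) distribution, which is conjugate to the multinomial: given counts (n_R, n_L, n_T), the conditional distribution of the turning probabilities is Dirichlet(n_R + 1/2, n_L + 1/2, n_T + 1/2). The sketch below illustrates that update for one approach to one intersection; the counts are made up and the function names are hypothetical.

```python
import random


def turning_posterior_params(counts, prior=0.5):
    """Dirichlet parameters after a multinomial update of the Jeffreys prior."""
    return [c + prior for c in counts]


def dirichlet_sample(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical counts of vehicles turning right, turning left, going through.
turn_counts = [30, 12, 158]

params = turning_posterior_params(turn_counts)
posterior_mean = [a / sum(params) for a in params]
one_draw = dirichlet_sample(params, random.Random(1))
```

In a Gibbs sampler the counts would be the current values of the latent N_i^R, N_i^L, N_i^T rather than fixed data, so this update is repeated at every iteration.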

Latent Variables and Restrictions • Introduce ‘latent’ counts N_i on all streets, subject to restrictions: • the total number of vehicles entering an intersection must equal the number leaving; • the video counts, assumed to be accurate, lead to known values of some sums of these N_i. • Eliminate ‘excess’ N_i (from an initial ?? to 74), in such a way that the restrictions have a simple structure. (Poster by G. Molina.) • Let N denote the constrained region of the N_i.

The posterior distribution • By Bayes' theorem, the posterior distribution π(N, λ, P, b, α, β | C) of all unknowns given the data C is proportional to the product of the likelihood and the prior, i.e., to • f_Poisson(C^D | N^D, b) f_multinomial(C^T | P) × π_multinomial(N | P) π_Poisson(N^D | λ) × π_Jeffreys(P, λ) π_Gamma(b | α, β) 1_N, where 1_N is the indicator that the latent counts lie in the constrained region.

Computation • The posterior has 192 unknown parameters. • Computation must be done by MCMC. We utilize a Gibbs sampling scheme. • The full conditional distributions for P, λ, b, and β are, respectively, Dirichlet, Gamma, Gamma, and restricted Gamma; these are easy to sample. • α has a log-concave density, so it is sampled by rejection sampling. • Each N_i is sampled directly from its discrete distribution (restricted range). • Roughly 100,000 iterations are needed.
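
To show what two of these full conditionals can look like, here is a deliberately reduced Gibbs sketch. It rests on assumptions that the slides do not state: the Gamma prior on the observer factors b_i is taken in shape/rate form, the restriction is read as "prior mean below 2", the shape hyperparameter and the true counts are held fixed at made-up values, and all names are hypothetical. The full sampler would also update the turning probabilities, the demands, the latent counts, and the shape hyperparameter.

```python
import random


def sample_bias(count, true_n, shape, rate, rng):
    """b_i | rest ~ Gamma(shape + count, rate + true_n) (Poisson-Gamma update)."""
    return rng.gammavariate(shape + count, 1.0 / (rate + true_n))


def sample_rate(biases, shape, rng):
    """Rate | rest ~ Gamma(n * shape + 1, sum(b)), restricted to shape / rate < 2."""
    post_shape = len(biases) * shape + 1.0
    post_rate = sum(biases)
    while True:  # simple rejection step enforcing the restriction
        rate = rng.gammavariate(post_shape, 1.0 / post_rate)
        if shape / rate < 2.0:
            return rate


def gibbs(counts, true_ns, shape, n_iter, rng):
    """Alternate the two conditional draws; return the sampled values."""
    rate = shape  # start from a prior mean of 1 for the b_i
    bias_draws, rate_draws = [], []
    for _ in range(n_iter):
        biases = [sample_bias(c, n, shape, rate, rng)
                  for c, n in zip(counts, true_ns)]
        rate = sample_rate(biases, shape, rng)
        bias_draws.append(biases)
        rate_draws.append(rate)
    return bias_draws, rate_draws


# Made-up data: three entry streets with known "true" counts.
obs_counts = [1100, 950, 1320]
true_counts = [1000, 1000, 1200]
SHAPE = 2.0  # fixed, hypothetical value of the shape hyperparameter

bias_draws, rate_draws = gibbs(obs_counts, true_counts, SHAPE, 2000,
                               random.Random(2))
mean_b0 = sum(d[0] for d in bias_draws) / len(bias_draws)
```

Both conditionals are standard distributions, which is what makes a Gibbs scheme attractive here; the restriction only requires discarding draws that violate it.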

Gridlock and model constraints • In CORSIM, gridlock (all vehicles stopped) can occur (20% of the runs in the last graph). • This essentially defines the unfeasibility region of the parameter space. • This can be handled in CORSIM by simply ignoring runs that yield gridlock (in the Bayesian inference, this corresponds to multiplying the posterior by an indicator function that is 1 outside the unfeasibility region and 0 inside it).
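
Ignoring gridlocked runs can be written as a simple filter when posterior draws are pushed through the slow model. The sketch below uses a made-up stand-in for the computer model (it is not CORSIM, and the capacity and travel-time formula are invented) that reports gridlock by returning None; draws that lead to gridlock receive weight zero.

```python
def toy_model(total_demand):
    """Hypothetical stand-in for a slow simulator; None signals gridlock."""
    capacity = 100
    if total_demand > capacity:
        return None
    return 10.0 + 0.1 * total_demand  # illustrative mean travel time


def propagate(draws, model):
    """Run the model on each draw, keeping only runs without gridlock."""
    outputs = []
    n_gridlock = 0
    for theta in draws:
        result = model(theta)
        if result is None:  # indicator is 0: the draw is discarded
            n_gridlock += 1
            continue
        outputs.append(result)
    return outputs, n_gridlock / len(draws)


# Made-up input draws standing in for a posterior sample.
input_draws = [50, 80, 120, 95, 150]
outputs, gridlock_fraction = propagate(input_draws, toy_model)
```

The retained outputs then form a sample from the output distribution conditional on the inputs lying outside the unfeasibility region.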

Conclusions • ‘Tuning’ should be replaced by Bayesian inference for unknown parameters or inputs. • It may be necessary to constrain the parameter space by ignoring model runs that fall inside the unfeasibility region. • If evaluation of the computer model is too slow, fast simulators should be sought for which Bayesian inference is feasible.
