
Fast Simulators for Assessment and Propagation of Model Uncertainty*

Jim Berger, M.J. Bayarri, German Molina. June 20, 2001, SAMO 2001, Madrid. *Project of the National Institute of Statistical Sciences.


Presentation Transcript


  1. Fast Simulators for Assessment and Propagation of Model Uncertainty*. Jim Berger, M.J. Bayarri, German Molina. June 20, 2001, SAMO 2001, Madrid. *Project of the National Institute of Statistical Sciences.

  2. Some activities requiring numerous runs of a complex computer model • Output analysis: with random inputs, what is the distribution of output variables? • Optimization: finding the optimal setting for process control variables (e.g., signal timing). • Design: of computer or field experiments. • Bayesian Inference: learning about unknown model parameters or inputs from field data (i.e., data from the process being modeled).

  3. The problem and solution • If runs of the computer model are too slow, the activity cannot be completed. • The natural solution is to approximate the computer model; the most common approach is approximation by a faster computer model, such as: • models of lower ‘resolution’; • linearized versions of the model; • response surface (or Gaussian process) approximations; • probability networks of various types.

  4. An Example: Bayesian input analysis for CORSIM • The microsimulator CORSIM is a computer model of street and highway traffic. • It models vehicles entering the network and moving according to interaction rules. • The traffic network studied consists of a 44-intersection neighborhood in Chicago. • CORSIM was applied to model a one-hour period during rush hour.

  5. Network (Chicago) [Figure: map of the study network. Labels on the map: O’Hare, Kingsbury, Huron, Erie, Ontario, Ohio, Grand, Illinois, Hubbard, Dearborn, Orleans, Franklin, LaSalle, Clark, Wells, LOOP.]

  6. Key Unknown Inputs • Demands, λ: the means of exponential inter-arrival time distributions that determine the (random) numbers of vehicles that enter the system from external streets. λ is 16-dimensional. • Turning probabilities, P: the probabilities that vehicles turn right, left, or go through each intersection. P is 84-dimensional.

  7. Data: vehicle counts, C • Demand counts: the numbers of vehicles entering the network at each street, recorded by observers placed on the external streets. • Turning counts: made by observers over short time intervals at all intersections. • Video counts: At central intersections, cameras were placed that produced an exact count of vehicles.

  8. Problems with the Data • Demand counts are inaccurate, some by as much as 40%. • Turning counts were made over short time periods. • Some of the turning counts were missing. • The observer counts were incompatible with the video counts (reality), so they were tuned to bring them into agreement.

  9. Example of a tuning adjustment • An observer reported 1969 vehicles entering at a location marked on the map. • This was adjusted to 1790 vehicles to fit the video count observed at a marked location. [Figure: street map showing Erie, Ontario, and LaSalle.]

  10. Problems with tuning • Often, too few inputs are tuned, and those that are tuned are then over-tuned. • The often considerable uncertainty in the tuned inputs is ignored, resulting in an overly optimistic assessment of output variance. • Tuning can mask model biases that actually exist, making the model less accurate for prediction outside the range of the data (not applicable here).

  11. A solution: Bayesian analysis • Compute the posterior distribution of the true model inputs, given the data. • This typically requires Markov chain Monte Carlo (MCMC) methods, involving thousands of model runs, which is too time-consuming for CORSIM. • Thus a fast simulator is needed, one which represents those features of CORSIM that allow the data to be related to model inputs.

  12. Structure of the fast simulator • It is a probability network • with the same nodal structure as CORSIM; • with unknown inputs λ (vehicle inter-arrival rates) and P (turning probabilities) that mean the same as in CORSIM; • but with ‘instantaneous’ vehicles that (i) enter the network; (ii) turn appropriately; (iii) exit. Note: fast simulators often have a limited purpose and are not general replacements for the computer model; here, we ignore the key features of time, interactions, signals, etc.
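The structure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' simulator: the two-approach toy network, the names `NEXT`, `P`, `LAMBDA`, `count_arrivals`, and `simulate`, and all numeric values are hypothetical, and λ is treated as an arrival rate per hour.

```python
import random

# Hypothetical toy network: each key is an approach to an intersection; each
# move leads to another approach, or to None when the vehicle exits.
NEXT = {
    "A_from_west": {"right": None, "left": None, "through": "B_from_west"},
    "B_from_west": {"right": None, "left": None, "through": None},
}
# Hypothetical turning probabilities for each approach.
P = {
    "A_from_west": {"right": 0.2, "left": 0.1, "through": 0.7},
    "B_from_west": {"right": 0.3, "left": 0.3, "through": 0.4},
}
# Hypothetical demand: arrival rate (vehicles per hour) at the one external entry.
LAMBDA = {"A_from_west": 600.0}


def count_arrivals(rate, hours, rng):
    """Count arrivals in `hours` when inter-arrival times are exponential."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(rate)
        if t > hours:
            return n
        n += 1


def simulate(lam, p, nxt, hours=1.0, seed=0):
    """Route 'instantaneous' vehicles: enter, turn at each approach, exit."""
    rng = random.Random(seed)
    counts = {a: {m: 0 for m in moves} for a, moves in nxt.items()}
    for entry, rate in lam.items():
        for _ in range(count_arrivals(rate, hours, rng)):
            approach = entry
            while approach is not None:
                moves = list(p[approach])
                weights = [p[approach][m] for m in moves]
                move = rng.choices(moves, weights=weights)[0]
                counts[approach][move] += 1
                approach = nxt[approach][move]
    return counts
```

Because vehicles are instantaneous, time, signals, and vehicle interactions never enter; the only outputs are turning counts per approach, which is what relates the data to λ and P.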

  13. Modeling the demand counts data • Demand counts: each demand count, C_i^D, is modelled by a Poisson distribution with mean b_i N_i, where N_i is the true count and b_i − 1 is the unknown “observer bias.” • The b_i are modelled as i.i.d. Gamma(α, β), with one hyperparameter restricted to be < 2 (so that the expected bias is less than 100%); the hyperparameters are otherwise unknown and are assigned a uniform prior distribution.
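A minimal sketch of the demand-count likelihood, assuming only the Poisson model stated above (observed count Poisson with mean b_i N_i). The function names `poisson_logpmf` and `demand_loglik` are hypothetical, and the counts 1969 and 1790 are borrowed from the tuning example purely for illustration.

```python
import math


def poisson_logpmf(k, mean):
    """Log of the Poisson probability of observing count k with the given mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def demand_loglik(observed, true_counts, biases):
    """Sum of log-likelihoods of observer counts C_i^D given N_i and b_i."""
    return sum(
        poisson_logpmf(c, b * n)
        for c, n, b in zip(observed, true_counts, biases)
    )


# Illustration: an observer count of 1969 against a true count of 1790 is
# better explained by b = 1.1 (a 10% over-count) than by b = 1.0 (no bias).
with_bias = demand_loglik([1969], [1790], [1.1])
without_bias = demand_loglik([1969], [1790], [1.0])
```

In the Bayesian analysis neither N_i nor b_i is fixed as it is here; both are unknowns, and this likelihood is one factor of the posterior.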

  14. Modeling the turning counts data • If N_i vehicles arrive at an intersection from a given direction, the numbers turning right, turning left, and going through, (N_i^R, N_i^L, N_i^T), are assumed to follow a multinomial distribution with probabilities (P_i^R, P_i^L, P_i^T). • The (P_i^R, P_i^L, P_i^T) are assigned the Jeffreys prior distribution, proportional to (P_i^R P_i^L P_i^T)^(-1/2). • The observed turning counts, C_i^T, were assumed to be accurate.
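The Jeffreys prior above is a Dirichlet(1/2, 1/2, 1/2) density, so with multinomial counts the turning probabilities at one approach have a Dirichlet distribution with parameters (count + 1/2). A minimal sketch of drawing from it, assuming the three counts are given; the function name `sample_turning_probs` and the example counts are hypothetical.

```python
import random


def sample_turning_probs(n_right, n_left, n_through, rng=random):
    """Draw (P_R, P_L, P_T) from Dirichlet(n_R + 1/2, n_L + 1/2, n_T + 1/2).

    A Dirichlet draw is obtained by normalizing independent Gamma draws
    whose shapes are the Dirichlet parameters.
    """
    shapes = [n + 0.5 for n in (n_right, n_left, n_through)]
    draws = [rng.gammavariate(a, 1.0) for a in shapes]
    total = sum(draws)
    return [g / total for g in draws]


# Hypothetical counts: 30 right turns, 10 left turns, 60 through movements.
probs = sample_turning_probs(30, 10, 60, random.Random(0))
```

In the full model the counts entering this update would combine the observed turning counts with the latent street counts, so this shows only the conjugate form of the step.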

  15. Latent Variables and Restrictions • Introduce ‘latent’ counts N_i on all streets, subject to restrictions: • the total number of vehicles entering an intersection must equal the number leaving; • the video counts, assumed to be accurate, lead to known values of some sums of these N_i. • Eliminate ‘excess’ N_i (from an initial ?? to 74), in such a way that the restrictions have a simple structure. (Poster by G. Molina.) • Let $\mathcal{N}$ denote the constrained region of the N_i.

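  16. The posterior distribution • By Bayes' theorem, the posterior distribution, $p(N, \lambda, P, b, \alpha, \beta \mid C)$, of all unknowns given the data C, is proportional to the product of the likelihood and the prior, i.e., $f_{\text{Poisson}}(C^D \mid N^D, b)\, f_{\text{multinomial}}(C^T \mid P) \times p_{\text{multinomial}}(N \mid P)\, p_{\text{Poisson}}(N^D \mid \lambda) \times p_{\text{Jeffreys}}(P, \lambda)\, p_{\text{Gamma}}(b \mid \alpha, \beta)\, 1_{\mathcal{N}}$, where $1_{\mathcal{N}}$ is the indicator function of the constrained region $\mathcal{N}$ of the latent counts.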

  17. Computation • The posterior has 192 unknown parameters. • Computation must be done by MCMC. We utilize a Gibbs sampling scheme. • The full conditional distributions for P, λ, b, and one of the Gamma hyperparameters are, respectively, Dirichlet, Gamma, Gamma, and restricted Gamma; these are easy to sample. • The other Gamma hyperparameter has a log-concave density, which is sampled by rejection sampling. • Each N_i is sampled directly from its discrete distribution (restricted range). • Roughly 100,000 iterations are needed.
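To make the Gibbs steps concrete, here is a heavily simplified sketch for a single street, not the authors' 192-parameter sampler. It assumes a shape/rate parameterization Gamma(alpha, beta) for the bias prior, holds `alpha`, `beta`, and the demand `lam` fixed, and uses a hypothetical range `[lo, hi]` (with `lo >= 1`) to stand in for the restrictions on the latent count. All names and numbers are illustrative.

```python
import math
import random


def log_poisson(k, mean):
    """Log of the Poisson probability of count k with the given mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def sample_b(c, n, alpha, beta, rng):
    """b | N, C is Gamma(shape alpha + c, rate beta + n) under the assumed prior."""
    # random.gammavariate takes a scale parameter, i.e. 1 / rate.
    return rng.gammavariate(alpha + c, 1.0 / (beta + n))


def sample_n(c, b, lam, lo, hi, rng):
    """Draw the latent count N directly from its discrete full conditional."""
    support = list(range(lo, hi + 1))
    logw = [log_poisson(c, b * n) + log_poisson(n, lam) for n in support]
    top = max(logw)  # subtract the maximum to avoid underflow
    weights = [math.exp(w - top) for w in logw]
    return rng.choices(support, weights=weights)[0]


def gibbs_one_street(c, lam, alpha, beta, lo, hi, iters, seed=0):
    """Alternate the two conditional draws; return the list of (N, b) draws."""
    rng = random.Random(seed)
    n = min(max(c, lo), hi)  # start from the observed count, clipped to range
    draws = []
    for _ in range(iters):
        b = sample_b(c, n, alpha, beta, rng)
        n = sample_n(c, b, lam, lo, hi, rng)
        draws.append((n, b))
    return draws
```

A call such as `gibbs_one_street(c=1969, lam=1800.0, alpha=2.0, beta=2.0, lo=1700, hi=1900, iters=1000)` yields joint draws of the true count and the observer bias. As a rough consistency check, 16 demands + 84 turning probabilities + 74 latent counts + 2 Gamma hyperparameters leave 16 of the 192 unknowns, which would match one bias per demand; this tally is an inference, not something the slides state.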

  18. Gridlock and model constraints • In CORSIM, gridlock (all vehicles stopped) can occur (20% of the runs in the last graph). • This essentially defines the infeasible region of the parameter space. • This can be handled in CORSIM by simply ignoring runs that yield gridlock (in the Bayesian inference, this corresponds to multiplying the posterior by the indicator function of the feasible region).
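Multiplying a posterior by an indicator function is equivalent, for sampling purposes, to discarding the draws that fall where the indicator is zero. A minimal sketch, assuming a hypothetical predicate `causes_gridlock` that reports whether a model run at a given input ended in gridlock:

```python
def keep_feasible(draws, causes_gridlock):
    """Keep only the posterior draws whose model run did not end in gridlock."""
    return [d for d in draws if not causes_gridlock(d)]


# Hypothetical usage: inputs above 2.0 are taken to produce gridlock.
kept = keep_feasible([0.5, 1.5, 2.5], lambda d: d > 2.0)
```

Since CORSIM runs are random, whether a given input produces gridlock can vary from run to run, which may be why the slide says gridlock "essentially" defines the region.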

  19. Conclusions • ‘Tuning’ should be replaced by Bayesian inference for unknown parameters or inputs. • It may be necessary to constrain the parameter space by ignoring model runs that fall in the infeasible region. • If evaluation of the computer model is too slow, fast simulators should be sought for which Bayesian inference is feasible.
