Presented by: Flavia Tsang, M.A.Sc. Student, 12th March 2004

Microsimulation And Modelling Applications: Methods, Issues and Analysis (MAMAMIA)
Module 1 – Unit 3: Estimation and Validation

Outline
• Part I: Simulation-Assisted Estimation
• Part II: Parameter Estimation Strategies for Large Scale Urban Models
• Part III: Validation of Microsimulation Models

Part I Simulation-Assisted Estimation

Simulation-Assisted Estimation
Purpose:
• To examine various methods of estimation:
  • Maximum Simulated Likelihood
  • Method of Simulated Moments
  • Method of Simulated Scores
• To understand the advantages and limitations of each method, thereby informing our choice among them

Maximum Simulated Likelihood
• Review: the log-likelihood function is
  $$LL(\theta) = \sum_{n=1}^{N} \ln P_n(\theta)$$
  where $\theta$ is a vector of parameters, $P_n(\theta)$ is the (exact) probability of the observed choice of observation $n$, and the summation is over a sample of $N$ observations
• The maximum likelihood (ML) estimator is the value of $\theta$ that maximizes $LL(\theta)$

Maximum Simulated Likelihood
• Since the gradient of $LL(\theta)$ is zero at the maximum, the ML estimator can also be defined as the value of $\theta$ at which
  $$\sum_{n=1}^{N} s_n(\theta) = 0$$
  where $s_n(\theta) = \partial \ln P_n(\theta) / \partial\theta$ is the score for observation $n$
• Maximum simulated likelihood (MSL) has the same formulation as ML, except that simulated probabilities are used in lieu of the exact probabilities
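As a minimal sketch of the two equivalent definitions of the ML estimator, the following uses a hypothetical one-parameter binary logit (the model, the synthetic data and all function names are assumptions made for illustration, not part of the presentation). It finds the parameter by solving the first-order condition, so the summed score is zero at the estimate and the log-likelihood is at its maximum there.

```python
import math
import random


def logit_prob(theta, x):
    """Probability of choosing alternative 1 in a one-parameter binary logit."""
    return 1.0 / (1.0 + math.exp(-theta * x))


def log_likelihood(theta, data):
    """LL(theta): sum over observations of ln P_n(theta) for the observed choice."""
    total = 0.0
    for x, y in data:
        p1 = logit_prob(theta, x)
        total += math.log(p1 if y == 1 else 1.0 - p1)
    return total


def total_score(theta, data):
    """Sum of scores; for this logit, s_n(theta) = (y_n - P_n1(theta)) * x_n."""
    return sum((y - logit_prob(theta, x)) * x for x, y in data)


def ml_estimate(data, theta=0.0, tol=1e-10, max_iter=100):
    """Newton-Raphson search for the theta at which the summed score is zero."""
    for _ in range(max_iter):
        hessian = -sum(
            logit_prob(theta, x) * (1.0 - logit_prob(theta, x)) * x * x
            for x, _ in data
        )
        step = total_score(theta, data) / hessian
        theta -= step
        if abs(step) < tol:
            break
    return theta


def make_sample(true_theta, n_obs, seed=0):
    """Synthetic (x, y) pairs drawn from the logit with the given true parameter."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n_obs):
        x = rng.uniform(-2.0, 2.0)
        y = 1 if rng.random() < logit_prob(true_theta, x) else 0
        sample.append((x, y))
    return sample
```

Calling `ml_estimate(make_sample(1.5, 2000))` should return a value near 1.5. MSL would follow the same steps with `logit_prob` replaced by a simulated probability.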

Maximum Simulated Likelihood
• The main issue with MSL arises from the log transformation
• Suppose $\hat{P}_n(\theta)$ is an unbiased simulator for $P_n(\theta)$
• Since the log is a nonlinear transformation, $\ln \hat{P}_n(\theta)$ is not unbiased for $\ln P_n(\theta)$
• The bias in the simulator of $\ln P_n(\theta)$ translates into bias in the MSL estimator
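The effect of the log transformation can be checked numerically. The sketch below assumes a hypothetical simulator: a logit probability averaged over R draws of a random coefficient that is normal with mean 0 and standard deviation 2, so that the exact probability is 0.5 by symmetry. The simulator itself is unbiased, yet the average of its logarithm falls below ln 0.5, and the gap shrinks as R grows.

```python
import math
import random


def simulated_prob(rng, n_draws):
    """Unbiased simulator: logit probability averaged over draws of a random coefficient."""
    total = 0.0
    for _ in range(n_draws):
        beta = rng.gauss(0.0, 2.0)  # hypothetical random coefficient, symmetric about 0
        total += 1.0 / (1.0 + math.exp(-beta))
    return total / n_draws


def mean_log_simulator(n_draws, n_reps=4000, seed=1):
    """Average of ln(P_hat) over many independent replications of the simulator."""
    rng = random.Random(seed)
    logs = (math.log(simulated_prob(rng, n_draws)) for _ in range(n_reps))
    return sum(logs) / n_reps


def log_bias(n_draws, n_reps=4000):
    """Estimated bias of ln(P_hat) relative to ln(P), where the exact P is 0.5."""
    return mean_log_simulator(n_draws, n_reps) - math.log(0.5)
```

For example, `log_bias(2)` should be clearly negative, while `log_bias(200)` should be close to zero.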

Maximum Simulated Likelihood
• The asymptotic properties of the MSL estimator depend on how the simulation bias behaves as the sample size increases
• This in turn depends on the relationship between the number of draws ($R$) and the sample size ($N$)
• If $R$ is fixed, the MSL estimator does not converge to the true parameters, because of the simulation bias in $\ln \hat{P}_n(\theta)$
• If $R$ rises with $N$, the simulation bias disappears as $N$ rises without bound

Maximum Simulated Likelihood
• In summary:
  • If $R$ is fixed, then MSL is inconsistent
  • If $R$ rises at any rate with $N$, then MSL is consistent
  • If $R$ rises faster than $\sqrt{N}$, then MSL is not only consistent but also asymptotically efficient

Method of Simulated Moments
• The method of moments (MOM) estimator is the value of $\theta$ that solves
  $$\sum_{n=1}^{N} \sum_{j} \left[ d_{nj} - P_{nj}(\theta) \right] z_{nj} = 0$$
  where $d_{nj}$ is the dependent variable that identifies the chosen alternative ($d_{nj} = 1$ if $n$ chose $j$, 0 otherwise) and $z_{nj}$ is a vector of exogenous variables called instruments (or weights)

Method of Simulated Moments
• $d_{nj} - P_{nj}(\theta)$ are the residuals
• The MOM estimator is the parameter value at which the residuals are uncorrelated with the instruments in the sample
• Instruments chosen so that MOM is efficient are called the ideal instruments
• For the standard logit model, the explanatory variables can be used as ideal instruments
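The claim about the standard logit can be verified with a short derivation. It assumes utility that is linear in parameters, $V_{nj} = \theta' x_{nj}$, and writes $i$ for the alternative chosen by $n$ (both are notational assumptions added here). With the explanatory variables as instruments, $z_{nj} = x_{nj}$, the moment condition coincides with the ML first-order condition, which is why these instruments are ideal.

```latex
\begin{align*}
P_{nj}(\theta) &= \frac{e^{\theta' x_{nj}}}{\sum_k e^{\theta' x_{nk}}} \\
s_n(\theta) &= \frac{\partial \ln P_{ni}(\theta)}{\partial\theta}
  = x_{ni} - \sum_j P_{nj}(\theta)\, x_{nj}
  = \sum_j \left[ d_{nj} - P_{nj}(\theta) \right] x_{nj} \\
\sum_{n=1}^{N} s_n(\theta) = 0
  &\iff \sum_{n=1}^{N} \sum_j \left[ d_{nj} - P_{nj}(\theta) \right] x_{nj} = 0
\end{align*}
```

The last equality in the second line uses $\sum_j d_{nj} x_{nj} = x_{ni}$, since $d_{nj}$ is 1 only for the chosen alternative.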

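Method of Simulated Moments
• The method of simulated moments (MSM) is obtained by replacing the exact probability $P_{nj}(\theta)$ with the simulated probability $\hat{P}_{nj}(\theta)$
• The MSM estimator is the value of $\theta$ that solves
  $$\sum_{n=1}^{N} \sum_{j} \left[ d_{nj} - \hat{P}_{nj}(\theta) \right] z_{nj} = 0$$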

Method of Simulated Moments
• $\hat{P}_{nj}(\theta)$ enters the equation linearly; therefore, if $\hat{P}_{nj}(\theta)$ is unbiased for $P_{nj}(\theta)$, the simulated moment condition is unbiased for the exact one
• Since there is no simulation bias in the estimation condition, the MSM estimator is consistent even when $R$ is fixed
• MSM still contains simulation noise (variance due to simulation); this noise becomes smaller as $R$ rises
• The disadvantage of MSM is that it loses efficiency when non-ideal weights are used

Method of Simulated Moments
• In summary:
  • The disadvantage of MSM is that it loses efficiency when non-ideal weights are used
  • If the instruments are simulated with bias and are not ideal, the MSM estimator has the same properties as MSL, except that it is not asymptotically efficient
  • Yet MSM overcomes the main problem of MSL: it is consistent when $R$ is fixed

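Method of Simulated Scores
• The method of simulated scores (MSS) estimator is the value of $\theta$ that solves
  $$\sum_{n=1}^{N} \hat{s}_n(\theta) = 0$$
  where $\hat{s}_n(\theta)$ is a simulator of the score
• The score can be rewritten as
  $$s_n(\theta) = \frac{\partial \ln P_{nj}(\theta)}{\partial\theta} = \frac{1}{P_{nj}(\theta)} \frac{\partial P_{nj}(\theta)}{\partial\theta}$$
  where $j$ is the alternative chosen by $n$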

Method of Simulated Scores
• An unbiased simulator for the second term, $\partial P_{nj}(\theta)/\partial\theta$, is easily obtained by taking the derivative of the simulated probability
• The difficulty arises in finding an unbiased simulator for the first term, $1/P_{nj}(\theta)$
• Simply taking the inverse of the simulated probability does not provide an unbiased simulator, because the inverse is a nonlinear transformation

Method of Simulated Scores
• Need to obtain $1/P_{nj}(\theta)$:
• Consider drawing balls from an urn that contains many balls of different colours. Suppose the probability of obtaining a red ball is 0.2. How many draws would it take, on average, to obtain a red ball?
• Answer: 1/0.2 = 5
• The same idea can be applied to choice probabilities
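The urn answer can be checked by simulation. This is a small sketch that assumes each draw is independent with the same probability of red (as when drawing with replacement, or from a very large urn); the function names are made up for the example.

```python
import random


def draws_until_red(rng, p_red):
    """Count draws, including the successful one, until the first red ball appears."""
    draws = 1
    while rng.random() >= p_red:
        draws += 1
    return draws


def average_draws(p_red, n_trials=20000, seed=0):
    """Average number of draws needed over many repetitions of the experiment."""
    rng = random.Random(seed)
    return sum(draws_until_red(rng, p_red) for _ in range(n_trials)) / n_trials
```

With `p_red = 0.2`, `average_draws` should return a value close to 5.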

Method of Simulated Scores
• To obtain $1/P_{nj}(\theta)$:
  1. Take a draw of the random terms from their density
  2. Calculate the utility of each alternative with this draw
  3. Determine whether alternative $j$ has the highest utility
  4. If so, call the draw an accept. If not, call the draw a reject and repeat steps 1 to 3 with a new draw

Method of Simulated Scores
• To obtain $1/P_{nj}(\theta)$ (cont’d):
  5. Define $B^r$ as the number of draws taken until the first accept is obtained. Perform steps 1 to 4 $R$ times, obtaining $B^r$ for $r = 1, \ldots, R$
• The simulator of $1/P_{nj}(\theta)$ is
  $$\frac{1}{R} \sum_{r=1}^{R} B^r$$
• This simulator is unbiased for $1/P_{nj}(\theta)$
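The accept/reject steps can be sketched as follows. The example assumes a standard logit, so the random terms are independent extreme value (Gumbel) draws and the exact probability is available in closed form for comparison. The systematic utilities `v`, the function names and the `max_draws` safety cap are all hypothetical choices for this illustration.

```python
import math
import random


def draw_gumbel(rng):
    """One draw from the standard extreme value (Gumbel) density."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def draws_until_accept(rng, v, j, max_draws=100000):
    """Steps 1 to 4: redraw until alternative j has the highest utility; return the count."""
    for count in range(1, max_draws + 1):
        utilities = [v_k + draw_gumbel(rng) for v_k in v]
        if utilities.index(max(utilities)) == j:
            return count
    # Practical guard only: an accept is not guaranteed within a given number of draws.
    raise RuntimeError("no accept obtained within max_draws")


def simulate_inverse_prob(rng, v, j, n_reps):
    """Step 5: average the draw counts over n_reps repetitions to simulate 1/P_nj."""
    return sum(draws_until_accept(rng, v, j) for _ in range(n_reps)) / n_reps


def exact_inverse_logit_prob(v, j):
    """Closed-form 1/P_nj for the logit, used here only as a check."""
    return sum(math.exp(v_k) for v_k in v) / math.exp(v[j])
```

For `v = [0.0, 0.5, -1.0]` and `j = 2`, the exact value of 1/P is about 8.2, and `simulate_inverse_prob` with a few thousand repetitions should land near it. Because the draw count jumps whenever a small change in the parameters flips an accept to a reject, the simulated value is a step function of the parameters.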

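Method of Simulated Scores
• In summary:
  • With the accept/reject simulator of $1/P_{nj}(\theta)$, there is no guarantee that an accept will be obtained within a given number of draws
  • The accept/reject simulator is not continuous in the parameters
  • MSS overcomes the main problem of MSL
  • For fixed $R$, MSS with an unbiased score simulator is consistent and asymptotically normal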

Part II Parameter Estimation Strategies for Large Scale Urban Models

Parameter Estimation Strategies
Purpose:
• Examine five strategies for parameter estimation in large scale models:
  • Limited view
  • Piecewise
  • Simultaneous
  • Sequential
  • Bayesian sequential

A Modular Modelling System
• There are connections between submodels, representing data flows within the modelling system
• The submodels have a certain degree of independence from each other; the degree to which they are treated as independent leads to different estimation strategies

Limited View Approach
• Focuses on the entire modelling system, ignoring individual submodels
• The modelling system is run with its inputs set to observed values, and the parameters are adjusted until the system’s outputs closely match the corresponding observed values
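A toy illustration of this adjust-until-match idea is sketched below. The two-submodel system (trip generation feeding a binary logit mode share), every parameter value and the observed target are hypothetical. Only one parameter is adjusted, and bisection is used as one simple way to do it because the system output rises monotonically with that parameter.

```python
import math


def trip_generation(population, rate):
    """Hypothetical submodel 1: total trips produced by a population."""
    return rate * population


def transit_share(cost, asc, cost_coef=-0.4):
    """Hypothetical submodel 2: binary logit share of trips made by transit."""
    return 1.0 / (1.0 + math.exp(-(asc + cost_coef * cost)))


def system_output(asc, population=10000, rate=2.5, cost=3.0):
    """Whole modelling system: transit trips, the only output compared with observations."""
    return trip_generation(population, rate) * transit_share(cost, asc)


def calibrate_asc(observed_transit_trips, lo=-10.0, hi=10.0, tol=1e-8):
    """Adjust the mode constant until the system output matches the observed value."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if system_output(mid) < observed_transit_trips:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For instance, `calibrate_asc(6000.0)` returns the constant at which the system reproduces 6000 transit trips. Nothing in the procedure looks inside the submodels, which is the defining feature of the limited view approach.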

Limited View Approach
Advantages:
• Allows the model-builder to concentrate on how the model will be used in application, instead of how the parameters will be estimated
• Focusing on the entire modelling system is likely to reveal structural problems
Disadvantages:
• Cannot make use of “extra data”, which are synthesized by other submodels (e.g. in a nested logit formulation, the result of the lower model informs the upper model)

Piecewise Estimation
• Connections between submodels are ignored
• The parameters of each submodel are estimated based on the data that directly affect that submodel

Piecewise Estimation
Advantages:
• Breaks the problem into more manageable pieces
• Easier to use extra data when considering each submodel (i.e. entirely different data can be used to inform parameter values, e.g. targeted sample data, stated preference data, and even data from a different city)
• Submodels often correspond to well-established theories and are often operationalized using fairly simple equations

Piecewise Estimation
Disadvantages:
• Sometimes it is impossible to consider a submodel on its own, if no observed data are available to replace the synthesized data
• If dependent submodels are non-linear, this can lead to a bias in the outputs
• Combining accurate submodels does not guarantee an accurate overall modelling system

Simultaneous Estimation
• The overall modelling system is run and its outputs are compared to various targets
• Concurrently, each of the individual submodels is also run to process the extra data available

Simultaneous Estimation
Advantages:
• Overcomes many data availability problems, since missing data can be synthesized from other submodels
• Particularly appropriate when there is a theoretical reason why a parameter in one submodel should be identical to a parameter in another submodel
Disadvantages:
• Computationally intensive

Sequential Estimation
• Combines piecewise estimation with the limited view approach
• The parameters of individual submodels are estimated first, and then the overall model is considered
• Various parameters are identified as being crucial to the higher level behaviour of the model, and these parameters are estimated by examining the entire modelling system

Sequential Estimation
Advantages:
• Various extra data can be used when estimating the lower level models, but the highest level of estimation can ignore these data
• Observed data are not required for all data flows, because previously calibrated models can provide synthetic data
• The entire modelling system is adjusted in a systematic way to match observed data

Sequential Estimation
Disadvantages:
• Less accurate than simultaneous estimation
• Error estimates on parameter values are biased
• For a parameter that is shared between submodels, its value is determined by the last estimation procedure; the information on the parameter from earlier estimation is discarded

Bayesian Sequential Estimation
• Allows a prior density function to specify what is already known about certain parameter values (e.g. the range of acceptable parameter values)
Advantages:
• The parameter estimates and confidence limits obtained when estimating individual submodels can be used when estimating at the highest level of the modelling system
• Less complex than full simultaneous estimation
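One simple way to picture how an earlier estimate can serve as a prior is the normal-normal update sketched below. It assumes, purely for illustration, that the submodel estimate and the system-level evidence about a single parameter can both be summarised as normal distributions with a mean and a standard deviation. The function name and the numbers in the usage note are hypothetical, and a real application would involve a full prior density and likelihood.

```python
def combine_normal(prior_mean, prior_sd, like_mean, like_sd):
    """Posterior mean and sd for a normal prior combined with normal evidence.

    Each source is weighted by its precision (1 / variance), so the
    better-determined source pulls the posterior towards itself.
    """
    prior_precision = 1.0 / prior_sd ** 2
    like_precision = 1.0 / like_sd ** 2
    post_var = 1.0 / (prior_precision + like_precision)
    post_mean = post_var * (prior_precision * prior_mean + like_precision * like_mean)
    return post_mean, post_var ** 0.5
```

For example, a submodel estimate of -0.40 (sd 0.10) used as the prior and system-level evidence of -0.60 (sd 0.20) combine to a posterior mean of -0.44. The earlier information is retained rather than overwritten by the last estimation step.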

Part III Validation of Microsimulation Models

Relationship of Validation, Verification and Establishing Credibility
[Figure: flow diagram. Stages: System, Conceptual Model, Simulation Program, Correct Results, Results Implemented. Steps between stages: Analysis and data; Programming; Make model runs; Sell results to management. Activities marked on the diagram: Validation, Verification, Validation, Establish Credibility.]
Source: Law and Kelton (1991) “Simulation Modelling and Analysis”, McGraw-Hill, Inc.

The Role of Validation
• Validation is a proactive, diagnostic effort to ensure that the model’s results are reasonable and credible
• A formal validation exercise produces an extensive battery of tests, measures and comparisons
• Validation is qualitatively distinct from verification, which only makes sure the model is doing what one has told it to do
• Quantitative measures are used for validation, but the ultimate impact is inherently qualitative

Special Challenges: Long Term Analysis
• Projections from these models extend well into the future, often several decades beyond the present
• There are few sources of “future data” against which to assess reasonableness

Special Challenges: Monte Carlo Nature
• Simulations driven by random inputs produce random outputs
• Decision-makers, however, dislike such variation; almost without exception, they would prefer to see point estimates
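A common way to reconcile random output with the wish for a point estimate is to repeat the run with different seeds and report the mean with an interval around it. The sketch below assumes a stand-in model in which each of 1000 agents experiences an event with probability 0.3. The model, the function names and the use of a normal-approximation 95% interval are illustrative assumptions, not a description of any particular microsimulation.

```python
import random
import statistics


def run_microsimulation(seed, n_agents=1000, p_event=0.3):
    """Stand-in for one model run: number of agents that experience the event."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_agents) if rng.random() < p_event)


def summarize_runs(n_runs=30):
    """Point estimate (mean over runs) with an approximate 95% confidence interval."""
    outputs = [run_microsimulation(seed) for seed in range(n_runs)]
    mean = statistics.mean(outputs)
    std_error = statistics.stdev(outputs) / n_runs ** 0.5
    return mean, (mean - 1.96 * std_error, mean + 1.96 * std_error)
```

The mean gives decision-makers a single number, while the interval records how much of the result is Monte Carlo variation and shrinks as more runs are added.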

Special Challenges: Which Items to Validate?
• A great mass of information is being projected, different portions of which are relevant for various analyses
• This poses the unavoidable question of which particular items to validate, given resource constraints, and of how those validations can most effectively be carried out

References
Part I: Train, K. (2003) Chapter 10, “Simulation-Assisted Estimation”, in “Discrete Choice Methods with Simulation”, Cambridge University Press.
Part II: Abraham, J. (2000) “Parameter Estimation in Urban Models: Theory and Application to a Land Use Transportation Interaction Model of the Sacramento, California Region”, Doctor of Philosophy Dissertation, Department of Civil Engineering, University of Calgary.
Part III: “Validation of longitudinal dynamic microsimulation models: experience with CORSIM and DYNACAN”, in Mitton, Sutherland and Weeks (eds.) (2000) “Microsimulation Modelling for Policy Analysis”, Cambridge University Press.
