Session 3: Calibration



1. Session 3: Calibration
Using observations of the real process

2. Outline
• Calibration and its relatives
• Calibration and inversion
• History matching
• Tuning and extrapolation
• Data assimilation
• Validation
• The role of emulators
• Model discrepancy
• Why we need to acknowledge model discrepancy
• Modelling model discrepancy
• Case study – history matching the galaxy

3. Calibration and its relatives

4. Using observations of the real process
• Simulation models are nearly always intended to represent some real-world process
• The issues that we address in this session all arise when we take that representation seriously, and try to relate the simulator to observations of the real process
• Three parts to this session:
  • Describing different ways that observational data can be used
  • Explaining the importance of model discrepancy – the link between model and reality
  • A case study in a serious and challenging model

5. Terminology
• A simulation model produces output from inputs
• It has two kinds of inputs:
  • Calibration parameters: unknown but fixed
  • Control variables: known parameters of the application context
• Calibration and the other tasks considered in this session have one common feature: using observations of the real process
• But they differ slightly in the way those observations are used, and in their underlying objectives

6. Notation
• The simulation model has the form y = f(x, θ)
  • where y is the output
  • θ denotes the calibration parameters
  • and x denotes the control variables
• So the model itself is the function f
• Observations take the form z_i = r(x_i) + ε_i
  • where ε_i denotes observation error
  • and r(x) denotes reality under conditions x
• Note that reality doesn’t depend on the calibration parameters
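A minimal sketch of this notation in code. The one-parameter simulator and the stand-in for reality are hypothetical toy choices, picked to foreshadow the simple-machine example later in the session:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy simulator f(x, theta): one control variable x, one calibration
# parameter theta (hypothetical form, for illustration only)
def f(x, theta):
    return theta * x

# Reality r(x) is unknown in practice; this stand-in deliberately
# differs from f(x, theta) for every value of theta
def r(x):
    return 0.65 * x - 0.03 * x**2

# Observations z_i = r(x_i) + eps_i, with eps_i ~ N(0, sigma^2)
sigma = 0.02
x_obs = np.linspace(0.2, 4.0, 30)
z_obs = r(x_obs) + rng.normal(0.0, sigma, size=x_obs.size)
```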

7. Calibration
• Calibration involves using the observational data to learn about the values of the calibration parameters
• The traditional method writes z_i = f(x_i, θ) + ε_i
  • Equating model output f(x_i, θ) with reality r(x_i)
• Estimate θ, e.g. by minimising the sum of squared residuals (sketched below)
• Call the estimate t and predict (extrapolate) the real-world process at a new x value by f(x, t)
• This ignores uncertainty about θ
  • It treats θ as now known to equal t
• A Total UQ philosophy demands that we quantify posterior uncertainty in θ after using the data to learn about it
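Continuing the toy setup above, the traditional approach takes only a few lines (the search bounds and the new control value are arbitrary choices):

```python
from scipy.optimize import minimize_scalar

# Traditional calibration: minimise the sum of squared residuals ...
def sum_sq(theta):
    return np.sum((z_obs - f(x_obs, theta))**2)

t_hat = minimize_scalar(sum_sq, bounds=(0.0, 2.0), method="bounded").x

# ... then plug the point estimate into the simulator for prediction.
# No uncertainty about theta is carried forward, which is exactly
# the criticism made above.
y_new = f(5.0, t_hat)
```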

8. Inversion
• Calibration is often referred to in some fields as inversion
• Implicitly, the idea is to take the observations, represented as z = f(x, θ) = f_x(θ), in which z and x are known, and solve for θ = f_x⁻¹(z)
• Inverse problems of this kind are extensively studied
• Since in practice we don’t have f_x⁻¹(·), inversion usually boils down to searching the parameter space, just as in calibration
• Note that inversion simply tries to find θ
  • But strict solutions do not exist because of observation error
• We need to recognise uncertainty in the observations, and then in θ
  • Bayesian methods are often used for this reason
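A brute-force Bayesian version for the toy setup: evaluate the likelihood over a grid of θ values and normalise. The flat prior and the grid limits are assumptions:

```python
# Likelihood of each theta on a grid, under z_i = f(x_i, theta) + eps_i
theta_grid = np.linspace(0.0, 1.5, 601)
log_lik = np.array([-0.5 * np.sum((z_obs - f(x_obs, th))**2) / sigma**2
                    for th in theta_grid])

# Posterior on the grid under a flat prior (an assumption), normalised
# so that it integrates to 1
post = np.exp(log_lik - log_lik.max())
post /= post.sum() * (theta_grid[1] - theta_grid[0])
```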

9. History matching
• Calibration (or inversion) is referred to by several other names in different fields
• For some communities, it is called history matching
• However, we will use this term with a slightly different meaning
• In calibration we (explicitly or implicitly) search the θ space to learn about its value from how close f(x, θ) gets to reality
• In history matching we simply try to identify the part of θ space in which the simulator gets close enough to the observations, according to a criterion of plausibility (sketched below)
• History matching is often a useful preliminary to calibration
  • Or just to see whether any acceptable matches exist
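A toy implausibility criterion for the running example. Only observation error enters the standardisation here; a real study would also include emulator (code) uncertainty and model discrepancy variance. The cutoff of 3 is a conventional choice:

```python
# Implausibility of theta: standardised distance between simulator
# output and each observation
def implausibility(theta):
    return np.abs(z_obs - f(x_obs, theta)) / sigma

# Keep only theta values whose worst-case implausibility is below the cutoff
theta_grid = np.linspace(0.0, 1.5, 601)
not_ruled_out = theta_grid[[implausibility(th).max() < 3.0
                            for th in theta_grid]]
```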

10. Tuning
• Tuning is another word that is in some communities synonymous with calibration
• However, it often implies a slightly different purpose
  • The purpose of calibration is typically to learn about the parameters θ, as a scientific question
  • Tuning is typically done in order to predict the real process
• The activity of tuning or calibration (or inversion) is the same: to derive a posterior distribution for θ
  • But this is used to predict f(x, θ) at new control inputs x
• When the prediction is for x outside the range of the observations, the prediction becomes extrapolation
  • Which is particularly challenging

11. Tuning and physical parameters
• Simulator parameters may be physical or just for tuning
• Physical parameters have true values in the real world
  • We are often really interested in their physical values
• Tuning parameters don’t have true physical values
  • They often represent crude adjustments for missing physics
  • Their values are whatever makes the model fit best to reality
• In the tuning task we learn about both sets
  • Together they make up the set of calibration parameters θ
• We may hope to learn about physical parameter values as a by-product of tuning

12. Data assimilation
• Many simulators are dynamic
  • At each time step, the current state vector ξ_t is updated, possibly depending on forcing inputs and other parameters
• In data assimilation, observations of the process become available at different time points
  • They are used to tune the state vector sequentially
  • Intended to improve the model’s tracking over time
• So data assimilation is a form of calibration or tuning
• Typically, uncertainty about ξ_t is accounted for and updated
  • Kalman filter, ensemble Kalman filter, etc. (a minimal example below)
• It is not usual to learn about other fixed calibration parameters
  • But we should do this in the interests of Total UQ
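A self-contained sketch of one predict/update cycle of a scalar Kalman filter, the simplest version of the sequential updating mentioned above. All the numbers (dynamics, noise variances, observations) are illustrative assumptions:

```python
def kalman_step(m, P, z, A=1.0, Q=0.01, H=1.0, R=0.02**2):
    """One predict/update cycle for a scalar state.
    m, P: current state mean and variance; z: new observation;
    A, Q: state transition and process-noise variance;
    H, R: observation operator and observation-noise variance."""
    # Predict: push the state estimate through the dynamics
    m_pred = A * m
    P_pred = A * P * A + Q
    # Update: blend prediction and observation via the Kalman gain
    K = P_pred * H / (H * P_pred * H + R)
    return m_pred + K * (z - H * m_pred), (1.0 - K * H) * P_pred

# Assimilate a stream of observations sequentially
m, P = 0.0, 1.0
for z in [0.10, 0.30, 0.25]:
    m, P = kalman_step(m, P, z)
```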

13. Validation
• The last use of observations is quite different
• Validation is concerned with assessing the validity of the simulator as a representation of reality
• Part of verification and validation (V&V)
  • Verification asks whether the simulation model has been implemented/coded correctly
  • Validation asks whether it can get sufficiently close to reality, once it has been tuned
• A simple form of validation is offered by history matching
  • The model can be declared valid if adequate matches exist

14. The role of emulation
• All of these tasks involve searching through the parameter space
  • Comparing f(x, θ) with z for many θ
• In principle they can be performed without emulation
  • As long as the simulator is fast enough
• But slow simulators and high-dimensional parameter spaces often make emulation essential
• As always, we need to allow for code uncertainty
• The toolkit has some pages on calibration and history matching with emulators
• But the second part of this session concentrates on another very important source of uncertainty

15. Model discrepancy
Relating the simulator to reality

16. A fundamental error
• When presenting calibration, I said the traditional approach equates the simulator to reality
  • The assumption is that for the true value of θ we have r(x) = f(x, θ)
• Unfortunately, all models are wrong
  • “All models are wrong but some are useful” (George E. P. Box, 1979)
• The following simple example explores what happens when we fail to acknowledge this key fact

17. Example: A simple machine (SM)
• A machine produces an amount of work y which depends on the amount of effort t put into it
• The model is y = f(t, β) = βt
  • Control variable t
  • Calibration parameter β is the rate at which effort is converted to work
• True value of β is 0.65
• Graph shows observed data
  • Points lie below y = 0.65t for large enough t
  • Because the model is wrong: losses due to friction etc.
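Synthetic SM data in the spirit of the description above. The quadratic friction loss is a hypothetical stand-in for the real deviation, chosen only so that the points fall below y = 0.65t as effort grows:

```python
import numpy as np

rng = np.random.default_rng(0)

beta_true = 0.65
noise_sd = 0.02

# Reality: work produced falls below beta_true * t as effort grows
# (hypothetical quadratic friction loss, for illustration only)
def sm_reality(t):
    return beta_true * t - 0.03 * t**2

t_obs = np.linspace(0.2, 4.0, 30)
z_sm = sm_reality(t_obs) + rng.normal(0.0, noise_sd, size=t_obs.size)
```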

18. SM – calibration with no discrepancy
• We wish to calibrate this model
  • To learn about the true value of β, using observations z_i
• With no model discrepancy, this case reduces to a simple linear regression: z_i = βt_i + ε_i
• The posterior distribution of β is found by simple regression analysis (sketched below)
  • Mean 0.602
  • Standard deviation 0.005
• The true value 0.65 is well outside this distribution
• More data makes things worse
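With a flat prior and known noise standard deviation, the posterior for β is Normal with the least-squares mean. A sketch, reusing the synthetic data above:

```python
# Conjugate posterior for beta in z_i = beta * t_i + eps_i (flat prior,
# known noise sd): Normal(beta_hat, noise_sd^2 / sum(t_i^2))
beta_hat = np.sum(t_obs * z_sm) / np.sum(t_obs**2)
beta_sd = np.sqrt(noise_sd**2 / np.sum(t_obs**2))

# Because reality curves below beta_true * t, beta_hat sits below 0.65,
# and beta_sd only shrinks as more data arrive: the posterior
# concentrates ever more tightly on the wrong value
```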

19. SM – calibration, no discrepancy
With increasing data the posterior becomes more and more concentrated on the wrong (best-fit) value

20. The problem is completely general
• Calibrating (inverting, tuning, matching) a wrong model gives parameter estimates that are wrong
  • Not equal to their true physical values: biased
• With more data we become more sure of these wrong values
• The simple machine is a trivial model, but the same conclusions apply to all simulation models
  • All models are wrong
  • In more complex models it is just harder to see what is going wrong
  • Even with the SM, it takes a lot of data to see any curvature in reality

21. Model discrepancy
• The SM example demonstrates that we need to accept that the model does not correctly represent reality
  • For any values of the calibration parameters
• The simulator outputs deviate systematically from reality
  • Call it model bias or model discrepancy
• There is a difference between the model with best/true parameter values and reality: r(x) = f(x, θ) + δ(x)
  • where δ(x) represents this discrepancy
  • δ(x) will typically itself have uncertain parameters

22. SM revisited
• Kennedy and O’Hagan (2001) introduced this model discrepancy
  • Modelled it as a zero-mean Gaussian process
• They claimed it acknowledges additional uncertainty and mitigates over-fitting of θ
• So add this model discrepancy term to the linear model of the simple machine: r(t) = βt + δ(t)
  • With δ(t) modelled as a zero-mean GP
• The posterior distribution of β now behaves quite differently (sketched below)
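A miniature version of this calibration. With δ ~ GP(0, k) integrated out, the observations are jointly Normal, z ~ N(βt, K + σ²I), so the posterior for β can again be computed on a grid. The GP variance and length-scale below are assumed, not estimated:

```python
# Squared-exponential covariance for the discrepancy GP
# (hyperparameters assumed for illustration)
def sq_exp(t1, t2, var=0.05**2, ell=1.0):
    return var * np.exp(-0.5 * (t1[:, None] - t2[None, :])**2 / ell**2)

# Marginal covariance of the data once delta is integrated out
C = sq_exp(t_obs, t_obs) + noise_sd**2 * np.eye(t_obs.size)
C_inv = np.linalg.inv(C)

# Posterior for beta on a grid, flat prior assumed
beta_grid = np.linspace(0.3, 1.0, 701)
log_lik = np.array([-0.5 * (z_sm - b * t_obs) @ C_inv @ (z_sm - b * t_obs)
                    for b in beta_grid])
post_beta = np.exp(log_lik - log_lik.max())
post_beta /= post_beta.sum() * (beta_grid[1] - beta_grid[0])
# This posterior is far wider than the no-discrepancy one, and
# typically covers beta_true = 0.65
```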

23. SM – calibration, with discrepancy
The posterior distribution covers the true value, and does not get worse with increasing data

24. Extrapolation
• To reinforce the message, look at extrapolation
  • Involves predicting the real process at control variable values outside where we have data
• Implicitly, the data are used to calibrate
• So with traditional calibration we know the model fits reality as well as possible in the range of the data
• But without model discrepancy:
  • The parameter estimates will be biased
  • Extrapolation will also be biased, because the best-fitting parameter values are different in different parts of the control variable space
  • With more data we become more sure of these wrong values

25. SM – extrapolation, no discrepancy
Even a minor extrapolation (t = 5) is hopelessly wrong, and gets worse with increasing data

26. SM – interpolation, no discrepancy
Even interpolation (t = 1) is hopelessly wrong too, and gets worse with increasing data

27. SM – extrapolation, with discrepancy
With model discrepancy, extrapolation is OK even for a large sample; interpolation is very good

28. SM – big extrapolation, with discrepancy
Although if we extrapolate far enough we find problems, despite including model discrepancy

29. Beyond simple model discrepancy
• With simple GP model discrepancy the posterior distribution for θ is typically very wide
  • Tends to ensure we cover the true value
  • But is not very helpful
  • And increasing data does not improve the precision
• Similarly, extrapolation with model discrepancy gives wide prediction intervals
  • And may still not be wide enough
• How can we do better?
  • Primarily by having better prior information

30. Nonidentifiability
• The formulation with model discrepancy is not identifiable
  • For any θ, there is a δ(x) to match reality perfectly
  • Reality is r(x) = f(x, θ) + δ(x)
  • Given θ, the model discrepancy is δ(x) = r(x) − f(x, θ)
• Suppose we had an unlimited number of observations
  • We would learn reality’s true function r(x) exactly
  • But we would still not learn θ; it could in principle be anything
  • And we would still not be able to extrapolate reliably

31. The joint posterior
• Calibration leads to a joint posterior distribution for θ and δ(x)
• But nonidentifiability means there are many equally good fits (θ, δ(x)) to the data
  • This induces strong correlation between θ and δ(x)
• This may be compounded by the fact that simulators often have large numbers of parameters
  • (Near-)redundancy means that different θ values produce (almost) identical predictions
  • Sometimes called equifinality
• Within this set, the prior distributions for θ and δ(x) count

32. The importance of prior information
• The nonparametric GP term allows the model to fit and predict reality accurately given enough data
  • Within the range of the data
• But it doesn’t mean the physical parameters are correctly estimated
  • The separation between the original model and the discrepancy is unidentified
  • Estimates depend on prior information
• Unless the real model discrepancy is just the kind expected a priori, the physical parameter estimates will still be biased
• To learn about θ in the presence of model discrepancy we need better prior information
  • And this is also crucial for extrapolation

33. Better prior information
• For calibration
  • Prior information about θ and/or δ(x)
  • We wish to calibrate because prior information about θ is not strong enough
  • So prior knowledge of model discrepancy is crucial, in the range of the data
  • In the SM, a model for δ(t) that says it is zero at t = 0, with gradient zero, but then increasingly negative, should do better (sketched below)
  • Talk on Monday by Jenný Brynjarsdóttir
• For extrapolation
  • All this, plus good prior knowledge of δ(x) outside the range of the calibration data
  • That’s seriously challenging!
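One simple way to encode that SM prior knowledge is a parametric discrepancy δ(t) = −γt² with γ ≥ 0: zero at t = 0, zero gradient there, and increasingly negative. This form, and the maximum-likelihood fit, are illustrative assumptions, not the model from the talk referred to above:

```python
from scipy.optimize import minimize

# Fit beta and gamma jointly by maximum likelihood under
# z_i = beta * t_i - gamma * t_i**2 + eps_i, with gamma constrained >= 0
def neg_log_lik(params):
    beta, gamma = params
    resid = z_sm - (beta * t_obs - gamma * t_obs**2)
    return 0.5 * np.sum(resid**2) / noise_sd**2

fit = minimize(neg_log_lik, x0=[0.6, 0.01], method="L-BFGS-B",
               bounds=[(0.0, None), (0.0, None)])
beta_fit, gamma_fit = fit.x
# With the discrepancy shape matching reality's actual deviation,
# beta_fit now lands close to beta_true = 0.65
```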

34. Careful modelling of discrepancy
• In principle, we can learn more if we put in more and better prior information about model discrepancy
• This is an important area of ongoing research
• But some illustrations of the issues that arise may be instructive

35. Hierarchies of Simulators
• Often we have hierarchies of simulators
• Usually the resolution is increasing, but additional processes could be added

36. Hierarchies of Simulators
• Rather than emulate each simulator separately:
  • Emulate simulator 1, and then emulate the difference between outputs at each level (sketched below)
  • Need to have some runs at common inputs
  • Need few runs of the expensive complex simulators
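The multilevel idea in miniature, assuming two hypothetical simulators f1 (cheap) and f2 (expensive). Simple polynomial fits stand in for the GP emulators that would normally be used:

```python
import numpy as np

def f1(x):   # cheap, low-resolution simulator (hypothetical)
    return 0.60 * x

def f2(x):   # expensive, higher-resolution simulator (hypothetical)
    return 0.65 * x - 0.02 * x**2

x_cheap = np.linspace(0.0, 4.0, 40)    # many runs of the cheap simulator
x_shared = np.linspace(0.0, 4.0, 6)    # few expensive runs, at common inputs

# "Emulate" the cheap simulator, then the level-1 -> level-2 difference
fit_f1 = np.polynomial.Polynomial.fit(x_cheap, f1(x_cheap), deg=2)
fit_diff = np.polynomial.Polynomial.fit(x_shared,
                                        f2(x_shared) - f1(x_shared), deg=2)

def emulate_f2(x):
    # Prediction for the expensive simulator: cheap fit plus difference fit
    return fit_f1(x) + fit_diff(x)
```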

37. Reified Simulators
Modelling the relationship between Simulator 1 and reality is complex
Much of its model discrepancy is linked to the improvements possible with Simulator 2, Simulator 3, …

38. Reified Simulators
Linking Simulator 2 to reality is almost as tricky
And data can’t be used twice

39. Reified Simulators
The reified simulator is at the end of currently foreseeable models
Its relationship with reality is simpler
Other simulators link to reality through the reified simulator

40. Reified Simulators
• Reified simulators are ‘imaginary’ simulators that we impose between our simulators and reality
  • They are the ‘best’ simulator we could visualise at this time
• Model discrepancy is split into two:
  • The discrepancy between the current simulator and the reified simulator
  • The discrepancy between the reified simulator and reality
• Reification does not reduce the discrepancy
  • But it might make it easier to elicit
• Reification is one quite formal way to think about model discrepancy

41. Conclusions …
• Several tasks rely on observational data
• All are deeply compromised if we don’t acknowledge and quantify model discrepancy
  • Calibration/inversion/tuning: parameter estimates wrong, distributions too tight; over-fitting and over-confidence
  • Tuning/prediction/extrapolation: predictions wrong and over-confident
  • Data assimilation: over-reaction to data, and over-confidence again
  • Validation: only through correcting the discrepancy can a model be valid

42. … and more conclusions
• Total UQ demands that we quantify all uncertainties
  • Or at least try to, and acknowledge those that are unquantified
• Model discrepancy is an important source of uncertainty
  • Quantifying prior beliefs about discrepancy is hard but important: an active research area
• Analyses incorporating model discrepancy are more complex, but also more honest and less self-deceptive
• Data assimilation is particularly challenging
  • Uncertainty about both the state vector and fixed calibration parameters: rarely done
  • Plus model discrepancy uncertainty
  • Plus code uncertainty when we need to emulate

43. Another conference
• UCM 2012
• Still open for poster abstracts
• Early bird registration deadline 30th April
• http://mucm.ac.uk/ucm2012
