
Validation of uncertain predictions against uncertain observations




  1. Validation of uncertain predictions against uncertain observations
  Scott Ferson, William Oberkampf and Lev Ginzburg
  20 February 2008, REC 2008, Savannah, Georgia

  2. V & V
  • Verification (checking the math)
    • Code testing
    • Interval analysis, probability bounds analysis
    • Units/dimension checking
  • Validation (checking against data)

  3. Goals
  • Objectively measure the conformance of predictions with empirical data
  • Use this measure to characterize the reliability of other predictions

  4. Initial setting
  • The model is fixed, at least for the time being
    • No changing it on the fly during validation
  • A prediction is a probability distribution
    • Expressing stochastic uncertainty
  • Observations are precise (scalar) numbers
    • Measurement uncertainty is negligible (relaxed later)

  5. Validation metric
  • A measure of the mismatch between the observed data and the model’s predictions
    • Low value means a good match
    • High value means they disagree
  • Distance between prediction and data

  6. Desirable properties of a metric
  • Expressed in physical units
  • Generalizes deterministic comparisons
  • Reflects full distribution
  • Not too sensitive to long tails
  • Mathematical metric
  • Unbounded (you can be really off)

  7. How the data come
  [Scatter plot of observations: Temperature [degrees Celsius], 200–400, versus Time [seconds], 600–1000]

  8. How we look at them
  [Empirical cumulative distribution of the observations: Probability, 0–1, versus Temperature, 200–450]

  9. One suggestion for a metric
  • Area, or average horizontal distance, between the empirical distribution Sn and the predicted distribution
  [Plot: Probability, 0–1, versus Temperature, 200–450, with the area between the two distribution functions shaded]

  10. Area metric
  • Minkowski L1 metric between distributions
  • Univariate version of the Wasserstein distance between the prediction F and data distribution Sn, where the minimum is over all possible stochastic dependencies between X and Y
  • Smallest mean absolute difference of deviates
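
  Below is a minimal Python sketch of this area metric, under the assumptions that the prediction is available as a CDF and that a grid over [lo, hi] covers both it and the data; the example prediction N(300, 30) and the observations are hypothetical, not values from the talk.

    import numpy as np
    from scipy import stats

    def area_metric(data, pred_cdf, lo, hi, n_grid=10000):
        # d(F, Sn) = integral of |F(x) - Sn(x)| dx, evaluated on a grid
        # wide enough to cover both the data and the prediction
        xs = np.linspace(lo, hi, n_grid)
        data = np.sort(np.asarray(data, dtype=float))
        Sn = np.searchsorted(data, xs, side="right") / data.size  # empirical CDF
        dx = xs[1] - xs[0]
        return float(np.sum(np.abs(pred_cdf(xs) - Sn)) * dx)

    rng = np.random.default_rng(1)
    obs = rng.normal(310.0, 25.0, size=40)                       # hypothetical data
    print(area_metric(obs, stats.norm(300, 30).cdf, 150, 450))   # in degrees Celsius

  Because probability is dimensionless, the result carries the physical units of the data, as the slides below emphasize.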

  11. Reflects full distribution
  [Three panels comparing the prediction a = L(2,1.6) + 5 (black, mean 7) with alternatives (blue): c = mix(N(4,0.5), N(10,0.5)) matches in mean; b = N(7,2.25) matches both mean and variance; d = 0.9a + 0.4 matches well overall. Probability, 0–1, versus value, 0–20]

  12. Single observation
  • A single datum can’t match an entire distribution (unless it’s degenerate)
  [Plot: the step function of one observation against a predicted distribution; Probability, 0–1, versus value, 0–4]

  13. When the prediction is really bad
  • The metric degenerates to simple distance
  • Probability is dimensionless, so units are the same
  [Plot: widely separated distribution functions with d ≈ 24; Probability, 0–1, versus value, 0–28]

  14. Depends on the local scale
  • The metric depends on the units
  • Could standardize (divide by s.d.), but this means the metric will no longer be in physical units
  [Two panels: the same shapes give d ≈ 0.45 on a 0–4 scale and d ≈ 45 on a 0–400 scale]

  15. Why physical units?
  • Distributions in the left graph don’t overlap but they seem closer than those on the right
  [Two panels: Probability, 0–1, versus value, 0–4, in both]

  16. Why an unbounded metric?
  • Neither overlaps, but left is better fit than right
  • Smirnov’s metric Dmax considers these two cases indistinguishable (they’re both just ‘far’)
  [Two panels: Probability, 0–1, versus value, 0–4 on the left and 0–40 on the right]

  17. The model says different things
  [Scatter plot as before, but each observation has its own prediction: Temperature [degrees Celsius], 200–400, versus Time [seconds], 600–1000]

  18. [Cumulative view of the same data: Probability, 0–1, versus Temperature, 200–450]

  19. Pooling data comparisons
  • When data are to be compared against a single distribution, they’re pooled into Sn
  • When data are compared against different distributions, this isn’t possible
  • Conformance must be expressed on some universal scale

  20. Universal scale
  • ui = Fi(xi), where the xi are the data and the Fi are their respective predictions
  [Three panels of example predictions on different physical scales: N(2, 0.6); exponential(1.7); mix(U(1,5), N(10,1)) × 2.3]
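
  A sketch of the transform to the universal scale; the observations are hypothetical and, for simplicity, a plain normal stands in for the slide’s mixture example.

    from scipy import stats

    # Each observation xi has its own predicted distribution Fi;
    # ui = Fi(xi) puts every prediction-observation pair on [0, 1]
    predictions = [stats.norm(2, 0.6),        # N(2, 0.6), as on the slide
                   stats.expon(scale=1.7),    # exponential with mean 1.7
                   stats.norm(10, 1)]         # stand-in for the mixture example
    observations = [2.3, 0.9, 11.5]           # hypothetical data, one per Fi

    u = [F.cdf(x) for F, x in zip(predictions, observations)]
    print(u)  # dimensionless, so the ui can be pooled across predictions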

  21. Backtransforming to physical scale
  [Plot: the pooled u’s are mapped through a common distribution G back to physical units; Probability, 0–1, versus value, 0–5]

  22. Backtransforming to physical scale
  • The distribution of G⁻¹(Fi(xi)) represents the empirical data (like Sn does) but in a common, transformed scale
  • Could pick any of many scales, and each leads to a different value for the metric
  • The distribution of interest is the one used for the regulatory statement
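
  Continuing the sketch, the pooled u’s can be carried back through the inverse CDF of a chosen common distribution G; the particular G and the u values here are hypothetical.

    from scipy import stats

    # ui = Fi(xi) from the previous sketch (values here are made up)
    u = [0.69, 0.41, 0.93]
    G = stats.norm(2, 0.6)            # hypothetical choice of the common scale G
    back = [G.ppf(ui) for ui in u]    # G^-1(Fi(xi)), in G's physical units
    print(back)                       # compare these against G with the area metric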

  23. Number of function evaluations
  • Some models are difficult to evaluate
  • Extracting distributional predictions may be expensive in terms of function evaluations
  • Is the validation metric applicable when only very coarse predictions based on few function evaluations are available?

  24. Coarse prediction
  • Prediction can be expressed as an ‘empirical’ distribution too
  [Plot: a step-function prediction against the data distribution; Probability, 0–1, versus value, 0–4]

  25. Statistical test for model accuracy
  • Kolmogorov-Smirnov test of the distribution of the ui’s against the uniform over [0,1]
  • This tests whether the empirical data are as though they were drawn from the respective prediction distributions
    • The probability integral transform theorem (Angus 1994) says the u’s will be distributed as uniform(0,1) if xi ~ Fi
  • Assumes the empirical data are independent of each other
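
  A sketch of this test with scipy; the u values are made up for illustration.

    from scipy import stats

    u = [0.69, 0.41, 0.93, 0.12, 0.55]     # hypothetical ui's from the transform
    stat, p = stats.kstest(u, "uniform")   # uniform(0, 1) is scipy's default
    print(stat, p)  # a small p-value says the data don't look drawn from the Fi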

  26. Epistemic uncertainty

  27. How should we compare intervals?
  [Diagram: an interval-valued Prediction and an interval-valued Data, shown as bars]

  28. Validation for intervals
  • Validation measure is the smallest difference
    • Overlapping intervals match perfectly
  • Validity is distinct from precision
    • Otherwise no value in an uncertainty analysis
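
  A sketch of this interval rule: the mismatch is the smallest distance between any point of the prediction interval and any point of the data interval, so overlapping intervals score zero. The helper name is hypothetical.

    def interval_mismatch(pred, data):
        # Smallest difference between any point of the prediction interval
        # and any point of the data interval; zero whenever they overlap
        (p_lo, p_hi), (d_lo, d_hi) = pred, data
        return max(0.0, p_lo - d_hi, d_lo - p_hi)

    print(interval_mismatch((2.0, 5.0), (4.0, 9.0)))  # 0.0: overlapping intervals
    print(interval_mismatch((2.0, 5.0), (7.0, 9.0)))  # 2.0: the gap between them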

  29. [Map of the English Channel] http://encarta.msn.com/map_701512318/English_Channel.html

  30. Epistemic uncertainty about distributions
  • Probability boxes (p-boxes): left and right bounds on the uncertain CDF
  [Three panels: Cumulative probability, 0–1, versus value, 0–10, showing a precise distribution, a p-box, and an empirical p-box built from interval data]

  31. Epistemic uncertainty in predictions
  • In left, the datum evidences no discrepancy at all (d = 0)
  • In middle, the discrepancy is relative to the edge of the p-box (d ≈ 4)
  • In right, the discrepancy is even smaller (d ≈ 0.4)
  [Three panels: the p-box prediction N([5,11],1) against a single datum; Probability, 0–1, versus value, 0–20]

  32. Epistemic uncertainty in both
  • Predictions in white, observations in blue
  [Three panels: predicted p-boxes against empirical p-boxes of interval data, with d = 0, d ≈ 0.05 and d ≈ 0.07; Probability, 0–1, versus value, 0–10]

  33. Area and distribution of differences
  [Two rows of panels: pairs of distributions and p-boxes alongside the corresponding distributions of their differences]

  34. Measure for uncertain numbers
  • Smallest possible expected absolute difference
  • Infimum taken over all possible distributions and over all possible dependencies between them
  • Not a metric
  • In general, hard to compute for imprecise probabilities
  • Quite easy for p-boxes
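
  One way the easy p-box case can be sketched, assuming (this is our reading, not the talk’s stated algorithm) that the smallest expected absolute difference reduces to averaging, over probability levels, the gap between the two quantile intervals; the p-boxes here are hypothetical.

    import numpy as np
    from scipy import stats

    def pbox_mismatch(a_lo, a_hi, b_lo, b_hi):
        # a_lo[i] <= a_hi[i] bound the quantile of p-box A at probability
        # level p[i]; the smallest absolute difference at each level is the
        # gap between the two quantile intervals, averaged over the levels
        gaps = np.maximum(0.0, np.maximum(a_lo - b_hi, b_lo - a_hi))
        return float(gaps.mean())

    p = np.linspace(0.005, 0.995, 199)     # probability levels
    z = stats.norm.ppf(p)                  # standard normal quantiles
    # Hypothetical p-boxes: unit-variance normals with interval-valued means
    print(pbox_mismatch(5 + z, 7 + z, 9 + z, 10 + z))  # disjoint: about 2
    print(pbox_mismatch(5 + z, 7 + z, 6 + z, 8 + z))   # overlapping: 0.0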

  35. Three other schemes
  • Pompeiu’s method
    • Standard metric for possibly overlapping sets
  • Range of areas
    • Natural approach for interval analysts
    • Upper limit is hard to compute
  • Double metric
    • Consider left and right edges separately

  36. Double metric
  • (0,0) only when left edges coincide and right edges coincide
    • Prediction and data match in location and precision
  [Three panels: Probability, 0–1, versus value, 0–20, with Δ = (7, 12), Δ = (9, 9) and Δ = (5, 6)]
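
  A sketch of the double metric as a pair of area metrics, one for the left edges and one for the right edges; the p-boxes, grid and function name are hypothetical.

    import numpy as np
    from scipy import stats

    def double_metric(pred_left, pred_right, data_left, data_right, dx):
        # Area metric applied separately to the left edges and to the right
        # edges (CDF values on a shared x-grid with spacing dx); (0, 0) only
        # when both pairs of edges coincide
        d_left = float(np.sum(np.abs(pred_left - data_left)) * dx)
        d_right = float(np.sum(np.abs(pred_right - data_right)) * dx)
        return d_left, d_right

    xs = np.linspace(0, 20, 2001)
    dx = xs[1] - xs[0]
    # Hypothetical p-boxes: unit normals with interval-valued means [6,8], [7,9]
    print(double_metric(stats.norm(6, 1).cdf(xs), stats.norm(8, 1).cdf(xs),
                        stats.norm(7, 1).cdf(xs), stats.norm(9, 1).cdf(xs), dx))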

  37. Validation for imprecise probabilities

  Measure             Scheme    Metric   Compute   Strictness
  Shortest distance   Shape     No       Medium    Reasonable
  Max-sup-inf         Element   Yes      Hard      Too strict
  Range of areas      Element   No       Hard      Reasonable
  Double metric       Shape     Yes      Easy      Too strict

  38. Validation: summary
  • Both assessment and reliability of extrapolation
    • How good is the model?
    • Should we trust its pronouncements?
  • Updating is a separate activity
  • Need the metric to be both ad hoc and universal
  • Epistemic uncertainty introduces some wrinkles
  • Full credit for being modest about predictions

  39. End

  40. Definition of a true metric
  • Positive, d(x, y) ≥ 0
  • Symmetric, d(x, y) = d(y, x)
  • Identicals indistinguishable, d(x, y) = 0 ⇔ x = y
  • Triangle inequality, d(x, y) + d(y, z) ≥ d(x, z)
  • Quasi-, semi-, pseudo-, ultra-metric

  41. Other metrics
  • Area is only one of many possible metrics
  • Area favors central tendency (median)
  • Could also use the medial distance from a datum to the distribution, or maybe the 95th percentile of distances
  • Might prefer conformance in the tails, or one tail in particular

  42. Degrees of impossibility
  • If a datum is completely outside the range of the prediction, it’s ‘impossible’
  • Transforming to the u scale makes it 0 or 1
  • We’d like to preserve how far outside it is

  43. Extended distribution functions
  For a prediction F supported on [0, 1]:
  F*(x) = F<(x) for x < 0;  F(x) for 0 ≤ x ≤ 1;  F>(x) for x > 1
  • Extension slopes can be set by the distribution’s dispersion, to mimic tails, or as just relocated 45° lines
  [Two panels: F with ordinate 0–1, and F* with extended ordinate −1 to 2]

  44. Using extensions in the metric
  • Extended functions Fi* can be used to get u’s (now no longer ranging only on [0,1])
  • The common backtransformation scale can also be extended to G* to accept these u’s
  • This allows values considered impossible by the prediction to be represented
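
  A sketch of such an extension using relocated straight lines with a configurable slope; the function name, the slope value and the example prediction are assumptions.

    from scipy import stats

    def extended_cdf(cdf, lo, hi, slope=0.1):
        # F* agrees with F on its support [lo, hi] and continues past it
        # with straight lines, so data outside the prediction's range get
        # u's below 0 or above 1 instead of being clipped to 'impossible'
        def F_star(x):
            if x < lo:
                return slope * (x - lo)        # negative u: how far below
            if x > hi:
                return 1.0 + slope * (x - hi)  # u above 1: how far beyond
            return cdf(x)
        return F_star

    F = stats.uniform(0, 10).cdf               # hypothetical prediction on [0, 10]
    F_star = extended_cdf(F, 0.0, 10.0)
    print(F_star(-3.0), F_star(5.0), F_star(14.0))  # -0.3  0.5  1.4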

  45. Vector of outputs
  • Usually want to treat dimensions separately
  • Possible to unify (pool) prediction-observation pairs even if they’re from different dimensions
    • Degrees, seconds, pascals, meters, etc.
  • But there’s no G for backcalculation and so there can’t be a physically meaningful scale

  46. Comparing accuracies
  • Questions like “Is the match for temperature as good as the match for conductivity?” also require a universal scale to which all physical dimensions must be transformed
  • If we do this, the metric becomes a norm
