Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts


Presentation Transcript


  1. Sampling Uncertainty in Verification Measures for Binary Deterministic Forecasts. Ian Jolliffe and David Stephenson. Outline: • Sampling uncertainty and sampling schemes for (2x2) tables • Hit rate • Extensions: other measures and serial correlation. EMS September 2013

  2. Binary deterministic forecasts • Such forecasts are fairly common – forecast whether or not an event will occur • Their format leads to a (2x2) contingency table

  3. (2 x 2) table and some verification measures (cell counts: a = hits, b = false alarms, c = misses, d = correct rejections; n = a+b+c+d) • Hit rate (H) = probability of detection: a/(a+c) • False alarm rate (F) = probability of false detection: b/(b+d) • Peirce's (1884) skill score (PSS): H - F • Proportion correct (PC): (a+d)/n • Frequency bias: (a+b)/(a+c) • Critical success index (CSI) = threat score: a/(a+b+c) • … and many more: 18 appear in Chapter 3 (by Hogan & Mason) of Jolliffe and Stephenson (2012), Forecast Verification: A Practitioner's Guide in Atmospheric Science, 2nd edition, Wiley.
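
As a minimal sketch, the measures listed on this slide can be computed directly from the four cell counts. The function name and the example counts below are hypothetical and chosen only for illustration; they are not the tornado data.

```python
def verification_measures(a, b, c, d):
    """Common (2x2) verification measures.

    a = hits, b = false alarms, c = misses, d = correct rejections.
    """
    n = a + b + c + d
    hit_rate = a / (a + c)
    false_alarm_rate = b / (b + d)
    return {
        "H": hit_rate,                      # hit rate
        "F": false_alarm_rate,              # false alarm rate
        "PSS": hit_rate - false_alarm_rate, # Peirce's skill score
        "PC": (a + d) / n,                  # proportion correct
        "bias": (a + b) / (a + c),          # frequency bias
        "CSI": a / (a + b + c),             # critical success index
    }


# Hypothetical counts, for illustration only.
measures = verification_measures(a=6, b=14, c=4, d=76)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The function assumes that (a+c), (b+d) and (a+b+c) are all non-zero; a table with no observed events gives no information about H.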

  4. Uncertainty/inference for verification measures • Given the value of some verification measure, some idea of its uncertainty is needed to make inferences, e.g. to construct confidence intervals • The example is a subset of the well-known Finley tornado data for May 1884. The figure resamples from these data.

  5. Sampling schemes • Could have: (i) a, b, c, d all independent Poisson; (ii) n fixed; a, b, c, d multinomial; (iii) row totals fixed or column totals fixed – independent binomials; (iv) row totals and column totals fixed – hypergeometric • Which is most plausible? Does it make much difference?
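
The difference between the schemes is easiest to see by drawing one table under each. The sketch below uses only the Python standard library and covers the multinomial, binomial (column totals fixed) and hypergeometric schemes; the Poisson scheme is omitted because the standard library has no Poisson sampler. All function names and parameter values are assumptions made for illustration, with theta_h and theta_f denoting the probabilities of a "yes" forecast given an event and a non-event.

```python
import random

CELLS = ("a", "b", "c", "d")


def sample_multinomial(n, probs, rng):
    """n fixed: each case lands in cell a, b, c or d with probabilities probs."""
    draws = rng.choices(CELLS, weights=probs, k=n)
    return {cell: draws.count(cell) for cell in CELLS}


def sample_binomial(n_event, n_nonevent, theta_h, theta_f, rng):
    """Column totals (a+c) and (b+d) fixed: a and b are independent binomials."""
    a = sum(rng.random() < theta_h for _ in range(n_event))
    b = sum(rng.random() < theta_f for _ in range(n_nonevent))
    return {"a": a, "b": b, "c": n_event - a, "d": n_nonevent - b}


def sample_hypergeometric(n_event, n_nonevent, n_forecast_yes, rng):
    """Row and column totals fixed: 'yes' forecasts are assigned at random."""
    cases = [1] * n_event + [0] * n_nonevent   # 1 marks an observed event
    a = sum(rng.sample(cases, n_forecast_yes)) # events among the 'yes' forecasts
    b = n_forecast_yes - a
    return {"a": a, "b": b, "c": n_event - a, "d": n_nonevent - b}


rng = random.Random(1)
s, theta_h, theta_f = 0.10, 0.6, 0.15          # hypothetical values
probs = (s * theta_h, (1 - s) * theta_f, s * (1 - theta_h), (1 - s) * (1 - theta_f))

table_multinomial = sample_multinomial(100, probs, rng)
table_binomial = sample_binomial(10, 90, theta_h, theta_f, rng)
table_hypergeometric = sample_hypergeometric(10, 90, 20, rng)
print(table_multinomial, table_binomial, table_hypergeometric, sep="\n")
```

Only the multinomial draw lets the number of observed events (a+c) vary from table to table; the other two hold it at 10.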

  6. Multinomial sampling vs. binomial sampling (figure panels) • Binomial sampling has fixed a+c=10, so the hit rate is always a multiple of 1/10 • Multinomial sampling lets (a+c) vary, so hit rates also take values between multiples of 1/10

  7. Sampling schemes • Could have: (i) a, b, c, d all independent (Poisson); (ii) n fixed; a, b, c, d multinomial; (iii) row totals fixed or column totals fixed – independent binomials; (iv) row totals and column totals fixed – hypergeometric • The second of these is the most plausible for much climate data • Hogan & Mason (Chapter 3 of Jolliffe & Stephenson) give (approximate) variances for 16 measures, but they assume column totals fixed.

  8. Sampling schemes • Could have: (i) a, b, c, d all independent (Poisson); (ii) n fixed; a, b, c, d multinomial; (iii) row totals fixed or column totals fixed – independent binomials; (iv) row totals and column totals fixed – hypergeometric • The second of these is the most plausible for much climate data – but you may disagree!! • Hogan & Mason (Chapter 3 of Jolliffe & Stephenson) give (approximate) variances for 16 measures, but they assume column totals fixed.

  9. Variance of hit rate • Hit rate or probability of detection is H = a/(a+c) • Suppose that (a+c) is fixed (binomial sampling) and that θ_H is the probability that the event has been forecast, given that it occurred • Then var(H) = θ_H(1-θ_H)/(a+c), which is estimated by ac/(a+c)^3 • The multinomial sampling scheme can be obtained by first sampling (a+c) from a binomial with n trials and probability of success equal to the probability of the event occurring (base rate); then, given the sampled value of (a+c), sampling a from a binomial with (a+c) trials and probability of success θ_H
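
The two-stage construction on this slide can be checked by simulation. The sketch below compares the variance of H with (a+c) fixed at 10 against the variance when (a+c) is itself drawn from a binomial with n = 100 and base rate 0.1. The parameter values, function names and number of replications are assumptions chosen for illustration; draws with (a+c) = 0 are discarded and redrawn.

```python
import random
from statistics import pvariance


def binomial_draw(trials, prob, rng):
    """One Binomial(trials, prob) draw as a sum of Bernoulli trials."""
    return sum(rng.random() < prob for _ in range(trials))


def estimated_var_h(a, c):
    """Plug-in estimate ac/(a+c)^3 of var(H) under binomial sampling."""
    return a * c / (a + c) ** 3


def hit_rate_fixed_events(n_event, theta_h, rng):
    """Binomial scheme: (a+c) = n_event is fixed."""
    return binomial_draw(n_event, theta_h, rng) / n_event


def hit_rate_two_stage(n, base_rate, theta_h, rng):
    """Multinomial scheme: draw (a+c) first, then a given (a+c)."""
    while True:
        n_event = binomial_draw(n, base_rate, rng)
        if n_event > 0:  # tables with no events carry no information on H
            return binomial_draw(n_event, theta_h, rng) / n_event


rng = random.Random(2013)
n, base_rate, theta_h, reps = 100, 0.10, 0.6, 20000  # hypothetical settings

var_fixed = pvariance([hit_rate_fixed_events(10, theta_h, rng) for _ in range(reps)])
var_two_stage = pvariance([hit_rate_two_stage(n, base_rate, theta_h, rng) for _ in range(reps)])

print("theory, (a+c) fixed:", theta_h * (1 - theta_h) / 10)
print("simulated, (a+c) fixed:", var_fixed)
print("simulated, two-stage:", var_two_stage)
print("plug-in estimate for a=6, c=4:", estimated_var_h(6, 4))
```

With these settings the fixed-total simulation should sit close to θ_H(1-θ_H)/(a+c), while the two-stage variance is larger because it also reflects the variability in (a+c).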

  10. Variances of hit rate II • It turns out that with multinomial sampling, var(H) = θ_H(1-θ_H)/(a+c) is replaced by var(H) = θ_H(1-θ_H) E[(a+c)^(-1)], with slight abuse of notation • Using a variance expression based on fixed (a+c) ignores the variability in (a+c) that occurs under multinomial sampling • There is a complication that (a+c) can equal zero, leading to an infinite value of E[(a+c)^(-1)], but data with (a+c) = 0 can be ignored as they provide no information on the performance of the forecasts
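
One way to see where the multinomial expression comes from is the law of total variance, conditioning on the number of observed events (a+c). The slides do not show the derivation, so the steps below are a sketch; the expectation is taken over tables with (a+c) > 0.

```latex
\begin{align*}
\operatorname{var}(H)
  &= \mathrm{E}\bigl[\operatorname{var}(H \mid a+c)\bigr]
   + \operatorname{var}\bigl(\mathrm{E}[H \mid a+c]\bigr) \\
  &= \mathrm{E}\!\left[\frac{\theta_H(1-\theta_H)}{a+c}\right]
   + \operatorname{var}(\theta_H) \\
  &= \theta_H(1-\theta_H)\,\mathrm{E}\bigl[(a+c)^{-1}\bigr].
\end{align*}
```

The second term vanishes because E[H | a+c] = θ_H does not depend on (a+c), so all of the extra variance enters through E[(a+c)^(-1)].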

  11. Multinomial vs. binomial comparison for hit rate • The table gives, for n=100, some values of the ratio of multinomial vs. binomial variances for various values of (a+c) • The diagram shows this ratio for more values of (a+c) and three values of n
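
The slides do not state exactly how the tabulated ratio was computed. One plausible reading, sketched below, takes the observed (a+c) as given, sets the base rate to (a+c)/n, and compares E[(a+c)^(-1)] over non-empty tables with 1/(a+c). The function names are hypothetical and the results should not be assumed to reproduce the authors' table.

```python
from math import comb


def inverse_count_expectation(n, p):
    """E[1/X | X > 0] for X ~ Binomial(n, p), by direct summation."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    total = sum(pmf[k] / k for k in range(1, n + 1))
    return total / (1.0 - pmf[0])  # condition on at least one event


def variance_ratio(n, n_event):
    """Multinomial var(H) divided by binomial var(H) for an observed (a+c)."""
    base_rate = n_event / n        # assumption: base rate set to (a+c)/n
    return n_event * inverse_count_expectation(n, base_rate)


for n_event in (1, 2, 4, 10, 20, 50):
    print(n_event, round(variance_ratio(100, n_event), 3))
```

Under this reading the ratio is below 1 for the smallest (a+c), because empty tables are discarded, and it approaches 1 as (a+c) grows.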

  12. Multinomial vs. binomial comparison • Inflation of variance for most values of (a+c) • Exception for very small values of (a+c) – due to frequently discarded zero values? • Maximum inflation of around 30% occurs around (a+c) = 4 • Inflation decreases towards 0 as (a+c) increases • A remarkable similarity of curves for different n • For the tornado data, multinomial variance is 12.7% larger than for binomial

  13. Extensions • Only one measure (hit rate) has been examined here • Exactly the same reasoning can be used for other measures with a similar ratio formula • Modifications are needed for other measures • Serial correlation is another complication: the results given assume independence, which is not necessarily true. It can have a bigger effect than the choice of sampling scheme.

  14. Conclusions • When reporting values of verification measures it is important to quantify the uncertainty associated with each value • For the seemingly simple case of data in a (2x2) contingency table this is a surprisingly subtle task because (i) different sampling schemes lead to different variances and (ii) serial correlation (or other forms of dependence) also changes variances • Some fairly general results can be found, but for many measures and situations tailor-made calculations may be needed • Notwithstanding the difficulties, the calculations should be done

  15. Questions? Comments? i.t.jolliffe@exeter.ac.uk

  16. Other verification measures • Exactly the same reasoning can be used to obtain multinomial-based variances for measures that are proportions, with the denominator equal to a sum of cell counts and the numerator a sum of a subset of the denominator counts, for example F = false alarm rate b/(b+d) and J = threat score a/(a+b+c) • The variance comparison table for H can be used: for F, replace (a+c) by (b+d); for J, replace (a+c) by (a+b+c) • For J, the comparison is with an unrealistic sampling scheme, which nonetheless corresponds to a variance estimate given in the literature.

  17. Other verification measures II • For proportion correct, there are exact analytic expressions for variance under both binomial and multinomial sampling, which can be compared • For the tornado data, the percentage increases in variance for multinomial sampling compared to the alternative scheme assumed by the table are 12.7 (H), 3.4 (J) and 17.5 (PC) • Asymptotic expressions are available for some other measures, but different considerations are needed for exact values, possibly including simulation
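
The slides do not reproduce the expressions for proportion correct. A sketch of one way to write them follows, assuming independent cases and using notation that is not in the original: s for the base rate, θ_F for the probability of a "yes" forecast given a non-event, and π for the probability that a single forecast is correct.

```latex
\begin{align*}
\text{Binomial (column totals fixed):}\quad
\operatorname{var}(PC) &= \frac{(a+c)\,\theta_H(1-\theta_H) + (b+d)\,\theta_F(1-\theta_F)}{n^2}, \\
\text{Multinomial ($n$ fixed):}\quad
\operatorname{var}(PC) &= \frac{\pi(1-\pi)}{n},
\qquad \pi = s\,\theta_H + (1-s)(1-\theta_F).
\end{align*}
```

If the fixed column totals are set to their expected values, (a+c) = ns and (b+d) = n(1-s), the multinomial variance exceeds the binomial one by s(1-s)(θ_H + θ_F - 1)^2 / n, which is the contribution from the variability in (a+c).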

  18. Serial correlation – another complication • All that has been said has assumed independence of the n observations being forecast • This is not necessarily true – there may be serial correlation. Rain today may be more likely if there was rain yesterday than if there was not • Serial correlation can have a bigger effect on variance than assuming the wrong sampling scheme

  19. Serial correlation – an example • Gabriel & Neumann (1962), QJRMS, 88, 90-95, give data on wet/dry days in Tel Aviv for 27 years of daily data, November-April • There is serial correlation – for example, for November the probability of a wet day following a wet (dry) day is 0.60 (0.13) • To assess how much such serial correlation affects variances of verification measures, use Markov chain simulation

  20. Markov chain simulation • Wilks (2010), QJRMS, 136, 2109-2118 considers probability forecasts and builds in serial dependence between forecasts directly • We consider binary deterministic forecasts with dependence built directly into the observations and hence indirectly into the forecasts • We simulate from a two-state Markov chain for various values of n (sample size), s (base rate) and ρ, the serial correlation
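
A two-state Markov chain with base rate s and lag-1 correlation ρ has transition probabilities P(event | event yesterday) = s + ρ(1-s) and P(event | no event yesterday) = s(1-ρ); their difference is ρ. The sketch below simulates only the observation series under that parameterisation. The function names are hypothetical, and how the forecasts are then generated from the observations is not specified in the slides, so that step is left out.

```python
import random


def transition_probabilities(base_rate, rho):
    """Transition probabilities of a two-state chain with given s and rho."""
    p_event_after_event = base_rate + rho * (1.0 - base_rate)
    p_event_after_nonevent = base_rate * (1.0 - rho)
    return p_event_after_event, p_event_after_nonevent


def simulate_observations(n, base_rate, rho, rng):
    """Simulate n binary observations (1 = event) from the chain."""
    p11, p01 = transition_probabilities(base_rate, rho)
    state = int(rng.random() < base_rate)  # start in the stationary distribution
    series = [state]
    for _ in range(n - 1):
        state = int(rng.random() < (p11 if state else p01))
        series.append(state)
    return series


# November values quoted for the Tel Aviv data: s = 0.24, rho = 0.47, n = 810.
p11, p01 = transition_probabilities(0.24, 0.47)
print("P(wet | wet) =", round(p11, 3), " P(wet | dry) =", round(p01, 3))

rng = random.Random(1962)
november_like = simulate_observations(810, 0.24, 0.47, rng)
print("simulated wet-day frequency:", sum(november_like) / len(november_like))
```

With s = 0.24 and ρ = 0.47 the transition probabilities come out close to the 0.60 and 0.13 quoted for November, which is a useful consistency check on the parameterisation.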

  21. Multinomial vs. binomial comparison for hit rate • The table gives, for n=100, some values of the ratio of variances with/without serial correlation for various values of (a+c) and ρ • The diagram shows this ratio for more values of (a+c) and three values of n

  22. Serial correlation – simulation results • Ratio gets bigger for increasing ρ • Largest values are bigger than when comparing sampling schemes • For given n, things get worse as (a+c) decreases • Things get worse for lower base rate

  23. Serial correlation - examples • The Gabriel/Neumann data have large n and moderate s and ρ, so the effect of serial correlation is small • For example, for November, ρ=0.47, s=0.24 and n=810 leading to only a 1% increase in variance • For the May tornado data, n is again large (540) but s is much smaller (0.02). We don’t know ρ but if it were 0.5, then variance would be increased by about 30% by serial correlation. • In reality non-independence is likely to exist in the tornado data but will be more complex with space and time both involved