Statistical Weather Forecasting 3

Daria Kluver

Independent Study

From Statistical Methods in the Atmospheric Sciences

By Daniel Wilks

Let's review a few concepts that were introduced last time on forecast verification.

Purposes of Forecast Verification

    • Forecast verification: the process of assessing the quality of forecasts.
    • Any given verification data set consists of a collection of forecast/observation pairs whose joint behavior can be characterized in terms of the relative frequencies of the possible combinations of forecast/observation outcomes.
    • This is an empirical joint distribution.

The Joint Distribution of Forecasts and Observations
  • Forecast = $y_i$, one of $I$ possible values, $i = 1, \dots, I$
  • Observation = $o_j$, one of $J$ possible values, $j = 1, \dots, J$
  • The joint distribution of the forecasts and observations is denoted $p(y_i, o_j)$
  • This is a discrete bivariate probability distribution function associating a probability with each of the $I \times J$ possible combinations of forecast and observation.

The joint distribution can be factored in two ways; the one used in a forecasting setting is:

$$p(y_i, o_j) = p(o_j \mid y_i)\, p(y_i)$$

  • Called the calibration-refinement factorization
  • $p(o_j \mid y_i)$: if $y_i$ has been forecast, this is the probability of $o_j$ happening. It specifies how often each possible weather event occurred on those occasions when the single forecast $y_i$ was issued, or how well each forecast is calibrated.
  • $p(y_i)$: the unconditional distribution, which specifies the relative frequencies of use of each of the forecast values $y_i$, sometimes called the refinement of a forecast.
  • The refinement of a set of forecasts refers to the dispersion of the distribution $p(y_i)$.
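
As an illustration of the calibration-refinement factorization, here is a minimal sketch that builds the empirical joint distribution from a list of forecast/observation pairs and then splits it into the conditional and unconditional parts. The function names and the small rain/dry data set are hypothetical, made up only for this example.

```python
from collections import Counter


def joint_distribution(pairs):
    """Empirical joint distribution p(y, o) as relative frequencies."""
    n = len(pairs)
    return {pair: count / n for pair, count in Counter(pairs).items()}


def calibration_refinement(joint):
    """Factor p(y, o) into p(o | y) (calibration) and p(y) (refinement)."""
    refinement = {}
    for (y, _o), p in joint.items():
        refinement[y] = refinement.get(y, 0.0) + p
    calibration = {(y, o): p / refinement[y] for (y, o), p in joint.items()}
    return calibration, refinement


# Hypothetical verification set of (forecast, observation) pairs.
pairs = [("rain", "rain"), ("rain", "dry"), ("dry", "dry"), ("dry", "dry"),
         ("rain", "rain"), ("dry", "rain"), ("dry", "dry"), ("rain", "rain")]

joint = joint_distribution(pairs)
calibration, refinement = calibration_refinement(joint)

for (y, o), p in sorted(joint.items()):
    # Multiplying the two factors recovers the joint probability.
    print(y, o, p, calibration[(y, o)] * refinement[y])
```

In this toy data set "rain" is forecast half the time, and rain is observed on three of those four occasions.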


Forecast Skill
  • Forecast skill: the relative accuracy of a set of forecasts with respect to some set of standard control, or reference, forecasts (such as climatological averages, persistence forecasts, or random forecasts based on climatological relative frequencies).
  • Skill score: a percentage improvement over the reference forecasts.

$$SS_{ref} = \frac{A - A_{ref}}{A_{perf} - A_{ref}} \times 100\%$$

  • $A$: accuracy of the forecasts being evaluated
  • $A_{ref}$: accuracy of the reference forecasts
  • $A_{perf}$: accuracy that would be achieved by perfect forecasts


On to new material…
  • 2x2 Contingency tables
  • Scalar attributes of contingency tables
    • Tornado example
    • NWS vs. weather.com vs. climatology
  • Skill Scores
  • Probabilistic Forecasts
  • Multicategory Discrete Predictands
  • Continuous Predictands
    • Plots and score
  • Probability forecasts for multicategory events
  • Non-Probabilistic Field forecasts

Nonprobabilistic Forecasts of Discrete Predictands
  • Nonprobabilistic: contains an unqualified statement that a single outcome will occur, with no expression of uncertainty.

The 2x2 Contingency Table
  • The simplest joint distribution arises when I = J = 2 (nonprobabilistic yes/no forecasts).
  • I = 2 possible forecasts:
    • i = 1, or y1: event will occur
    • i = 2, or y2: event will not occur
  • J = 2 outcomes:
    • j = 1, or o1: event subsequently occurs
    • j = 2, or o2: event doesn't subsequently occur
  • The four cells of the table, with n = a + b + c + d forecast-observation pairs in total:
    • a forecast-observation pairs are called "hits"; their relative frequency, a/n, is the sample estimate of the corresponding joint probability p(y1,o1)
    • b occasions are called "false alarms"; the relative frequency b/n estimates the joint probability p(y1,o2)
    • c occasions are called "misses"; the relative frequency c/n estimates the joint probability p(y2,o1)
    • d occasions are called "correct rejections" or "correct negatives"; the relative frequency d/n estimates the joint probability p(y2,o2)


Scalar Attributes Characterizing 2x2 Contingency Tables
  • Accuracy:
    • Proportion correct (PC)
    • Threat score (TS)
    • Odds ratio
  • Bias:
    • Comparison of the average forecast with the average observation
  • Reliability and resolution:
    • False alarm ratio (FAR)
  • Discrimination:
    • Hit rate (H)
    • False alarm rate (F)
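
The attributes listed above can be sketched in a few lines of code. This is only an illustration, using the usual textbook definitions in terms of the counts a, b, c, d, applied to the Finley tornado forecasts discussed in this presentation. The counts b = 72 and c = 23 are not stated explicitly in these slides; they are inferred here from the quoted values (a = 28, d = 2680, n = 2803, and 2752 non-tornado cases). The function name is hypothetical.

```python
def scalar_attributes(a, b, c, d):
    """Scalar attributes of a 2x2 table: a hits, b false alarms,
    c misses, d correct rejections."""
    n = a + b + c + d
    return {
        "PC": (a + d) / n,               # proportion correct
        "TS": a / (a + b + c),           # threat score (ignores d)
        "odds_ratio": (a * d) / (b * c),
        "bias": (a + b) / (a + c),       # "yes" forecasts / "yes" observations
        "FAR": b / (a + b),              # false alarm ratio
        "H": a / (a + c),                # hit rate
        "F": b / (b + d),                # false alarm rate
    }


# Finley tornado forecasts (b and c inferred, see the note above).
finley = scalar_attributes(a=28, b=72, c=23, d=2680)
for name, value in finley.items():
    print(f"{name}: {value:.4f}")
```

Note that the false alarm ratio (FAR) is conditioned on the forecasts, while the false alarm rate (F) is conditioned on the observations, so the two can differ a lot for rare events.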

NWS, weather.com, climatology example
  • 12 random nights from Nov 6 to Dec 1
  • Will the overnight low be at or below freezing?

Skill Scores for 2x2 Contingency Tables
  • Heidke Skill Score:
    • based on the proportion correct, referenced to the proportion correct that would be achieved by random forecasts that are statistically independent of the observations.
  • Peirce Skill Score:
    • similar to the Heidke Skill Score, except that the reference hit rate in the denominator is that for random forecasts that are constrained to be unbiased.
  • Clayton Skill Score
  • Gilbert Skill Score, or Equitable Threat Score
  • The odds ratio (θ) can be used as a skill score
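
A minimal sketch of these skill scores, written in terms of the 2x2 counts a, b, c, d. The formulas are the standard textbook forms and the function name is hypothetical. The odds ratio is turned into a score here with Yule's Q, (θ − 1)/(θ + 1), which is one common choice and an assumption on my part. The example uses the Finley tornado counts, with b = 72 and c = 23 inferred from the values quoted in this presentation.

```python
def skill_scores(a, b, c, d):
    """Skill scores for a 2x2 table: a hits, b false alarms,
    c misses, d correct rejections."""
    n = a + b + c + d
    cross = a * d - b * c
    a_ref = (a + b) * (a + c) / n        # hits expected from random forecasts
    theta = (a * d) / (b * c)            # odds ratio
    return {
        "HSS": 2 * cross / ((a + c) * (c + d) + (a + b) * (b + d)),  # Heidke
        "PSS": cross / ((a + c) * (b + d)),                          # Peirce
        "CSS": cross / ((a + b) * (c + d)),                          # Clayton
        "GSS": (a - a_ref) / (a - a_ref + b + c),                    # Gilbert / ETS
        "Q": (theta - 1) / (theta + 1),                              # Yule's Q
    }


scores = skill_scores(a=28, b=72, c=23, d=2680)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

All five scores are zero for random forecasts (ad = bc) and one for perfect forecasts (b = c = 0).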


Tornado example (Finley's forecasts)

  • Finley chose to evaluate his forecasts using the proportion correct, PC = (28+2680)/2803 = 0.966.
  • Gilbert pointed out that never forecasting a tornado produces an even higher proportion correct: PC = (0+2752)/2803 = 0.982.
  • The proportion correct is dominated by the correct "no" forecasts.
  • The threat score gives a better comparison, because the large number of correct "no" forecasts is ignored.
  • The odds ratio is 45.3 > 1, suggesting better-than-random performance.
  • The bias ratio is B = 1.96, indicating that approximately twice as many tornadoes were forecast as actually occurred.
  • FAR = 0.720, which expresses the fact that a fairly large fraction of the forecast tornadoes did not eventually occur.
  • H = 0.549 and F = 0.0262, indicating that more than half of the actual tornadoes were forecast to occur, whereas only a very small fraction of the non-tornado cases were falsely warned of a tornado.


What if your data are Probabilistic?
  • For a dichotomous predictand, converting from a probabilistic to a nonprobabilistic format requires selecting a threshold probability, above which the forecast will be "yes".
  • The choice ends up being somewhat arbitrary. Possible thresholds include:
    • The climatological probability of precipitation
    • The threshold that produces unbiased forecasts (B = 1)
    • The threshold that would maximize the threat score
    • 0.5, which gives nonprobabilistic forecasts of the more likely of the two events
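
To see how much the threshold matters, here is a small sketch that converts probability forecasts to yes/no forecasts and tabulates the resulting 2x2 counts, bias ratio, and threat score. The probabilities and outcomes are made-up illustration data and the function name is hypothetical.

```python
def contingency_counts(probs, observed, threshold):
    """Convert probability forecasts to yes/no and count (a, b, c, d)."""
    a = b = c = d = 0
    for p, o in zip(probs, observed):
        forecast_yes = p > threshold     # "yes" when above the threshold
        if forecast_yes and o:
            a += 1                       # hit
        elif forecast_yes:
            b += 1                       # false alarm
        elif o:
            c += 1                       # miss
        else:
            d += 1                       # correct rejection
    return a, b, c, d


# Hypothetical probability forecasts and binary outcomes (1 = event occurred).
probs = [0.05, 0.1, 0.2, 0.3, 0.35, 0.45, 0.6, 0.7, 0.8, 0.9]
observed = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

climatology = sum(observed) / len(observed)   # sample event frequency, 0.4 here
for threshold in (0.25, climatology, 0.5):
    a, b, c, d = contingency_counts(probs, observed, threshold)
    bias = (a + b) / (a + c)
    threat = a / (a + b + c)
    print(f"threshold={threshold:.2f}  a={a} b={b} c={c} d={d}  "
          f"B={bias:.2f}  TS={threat:.2f}")
```

For this toy data the 0.5 threshold happens to give unbiased forecasts (B = 1), while lower thresholds forecast the event too often.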

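
Multicategory Discrete Predictands
  • Make into 2x2 tables (for example, rain vs. non-rain)

[Figure: multicategory contingency table with categories labeled "R", "m", and "s", collapsed into a 2x2 table of "R" vs. "non-rain"]
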
Nonprobabilistic Forecasts of Continuous Predictands
  • It is informative to graphically represent aspects of the joint distribution of nonprobabilistic forecasts for continuous variables.

Conditional Quantile Plots

  • Contain 2 parts, representing the 2 factors in the calibration-refinement factorization of the joint distribution of forecasts and observations.
  • Conditional distributions of the observations given the forecasts are represented in terms of selected quantiles, with respect to the perfect 1:1 line.
  • These plots are examples of a diagnostic verification technique, allowing diagnosis of the particular strengths and weaknesses of a set of forecasts through exposition of the full joint distribution.

a) Performance of MOS forecasts
  • Observed temperatures are consistently colder than the MOS forecasts.

b) Performance of subjective forecasts
  • Subjective forecasts are essentially unbiased.
  • Subjective forecasts are somewhat sharper, or more refined, with more extreme temperatures being forecast more frequently.


Scalar Accuracy Measures
  • Only 2 scalar measures of forecast accuracy for continuous predictands are in common use:
  • Mean Absolute Error and Mean Squared Error

Mean Absolute Error
  • The arithmetic average of the absolute values of the differences between the members of each forecast-observation pair $(y_k, o_k)$:

$$MAE = \frac{1}{n} \sum_{k=1}^{n} |y_k - o_k|$$

  • MAE = 0 if forecasts are perfect. Often used to verify temperature forecasts.

Mean Squared Error
  • The average squared difference between the forecast and observation pairs:

$$MSE = \frac{1}{n} \sum_{k=1}^{n} (y_k - o_k)^2$$

  • More sensitive to larger errors than MAE
  • More sensitive to outliers
  • MSE = 0 for perfect forecasts
  • $RMSE = \sqrt{MSE}$, which has the same physical dimensions as the forecasts and observations
  • To calculate the bias of the forecasts, compute the Mean Error:

$$ME = \frac{1}{n} \sum_{k=1}^{n} (y_k - o_k) = \bar{y} - \bar{o}$$
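
A short sketch of these accuracy measures (mean absolute error, mean squared error, its square root, and the mean error), applied to a handful of made-up temperature forecasts. The function name and the data are hypothetical.

```python
from math import sqrt


def accuracy_measures(forecasts, observations):
    """MAE, MSE, RMSE and mean error (bias) for paired forecasts/observations."""
    errors = [y - o for y, o in zip(forecasts, observations)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": sqrt(mse),       # same units as the forecasts
        "ME": sum(errors) / n,   # positive: forecasts too high on average
    }


# Hypothetical temperature forecasts and observations (same units).
forecasts = [30.0, 28.0, 35.0, 40.0]
observations = [28.0, 29.0, 35.0, 34.0]

measures = accuracy_measures(forecasts, observations)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The single 6-degree error contributes 36 of the 41 squared-error units here, which shows why MSE is more sensitive to large errors than MAE.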
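
Skill Scores
  • Can be computed with MAE, MSE, or RMSE as the underlying accuracy statistic. For example, with MSE and climatology as the reference:

$$SS_{clim} = \frac{MSE - MSE_{clim}}{0 - MSE_{clim}} = 1 - \frac{MSE}{MSE_{clim}}, \qquad MSE_{clim} = \frac{1}{n} \sum_{k=1}^{n} (c_k - o_k)^2$$

  • $c_k$: climatological value for day $k$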
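
As a rough sketch, an MSE-based skill score relative to climatology can be computed as below. It assumes that perfect forecasts have MSE = 0, so the score reduces to one minus the ratio of the forecast MSE to the climatological MSE. The function names and numbers are hypothetical.

```python
def mse(forecasts, observations):
    """Mean squared error of paired forecasts and observations."""
    return sum((y - o) ** 2 for y, o in zip(forecasts, observations)) / len(forecasts)


def mse_skill_score(forecasts, climatology, observations):
    """Skill relative to climatology: 1 is perfect, 0 is no better than
    climatology, negative is worse than climatology."""
    return 1.0 - mse(forecasts, observations) / mse(climatology, observations)


# Hypothetical daily values; climatology[k] is the climatological value for day k.
forecasts = [29.0, 30.0, 34.0, 35.0]
climatology = [32.0, 32.0, 33.0, 33.0]
observations = [28.0, 29.0, 35.0, 34.0]

skill = mse_skill_score(forecasts, climatology, observations)
print(f"MSE skill score vs. climatology: {skill:.3f}")
```

Multiplying the result by 100 expresses it as a percentage improvement over the reference.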


Probability Forecasts of Discrete Predictands
  • The joint distribution for dichotomous events
  • Not just using probabilities of 0 and 1

For each possible forecast probability yi we see the relative frequency with which that forecast value was used, p(yi), and the probability that the event o1 occurred given the forecast yi, p(o1|yi).


The Brier Score
  • Scalar accuracy measure for verification of probabilistic forecasts of dichotomous events:

$$BS = \frac{1}{n} \sum_{k=1}^{n} (y_k - o_k)^2$$

  • This is the mean squared error of the probability forecasts, where $o_k = 1$ if the event occurs and $o_k = 0$ if the event doesn't occur.
  • A perfect forecast has BS = 0; less accurate forecasts receive higher BS.
  • Brier Skill Score:

$$BSS = \frac{BS - BS_{ref}}{0 - BS_{ref}} = 1 - \frac{BS}{BS_{ref}}$$
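
A minimal sketch of the Brier score and a Brier skill score. The reference forecast is assumed here to be the sample climatological relative frequency issued every time; other references are possible. Function names and data are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared error of probability forecasts; outcomes are 1 or 0."""
    return sum((y - o) ** 2 for y, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """Skill relative to always forecasting the sample climatological frequency
    (an assumed choice of reference)."""
    clim = sum(outcomes) / len(outcomes)
    bs_ref = brier_score([clim] * len(outcomes), outcomes)
    return 1.0 - brier_score(probs, outcomes) / bs_ref


# Hypothetical probability forecasts and outcomes (1 = event occurred).
probs = [0.9, 0.1, 0.8, 0.3]
outcomes = [1, 0, 1, 0]

bs = brier_score(probs, outcomes)
bss = brier_skill_score(probs, outcomes)
print(f"BS = {bs:.4f}, BSS = {bss:.2f}")
```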

The Reliability Diagram
  • Is a graphical device that shows the full joint distribution of forecasts and observations for probability forecasts of a binary predictand, in terms of its calibration-refinement factorization.
  • Allows diagnosis of particular strengths and weaknesses in a verification set.
  • Well-calibrated probability forecasts mean what they say, in the sense that subsequent event relative frequencies are equal to the forecast probabilities.

Characteristic patterns on the diagram:

  • Good calibration: the conditional event relative frequency is essentially equal to the forecast probability.
  • Overforecasting: forecasts are consistently too large relative to the conditional event relative frequencies; the average forecast is larger than the average observation.
  • Underforecasting: forecasts are consistently too small relative to the conditional event relative frequencies; the average forecast is smaller than the average observation.
  • Overconfident: extreme probabilities are forecast too often.
  • Underconfident: extreme probabilities are forecast too infrequently.


Hedging and Strictly Proper Scoring Rules
  • If forecasters are just trying to get the best score, they may improve their scores by hedging, or gaming: forecasting something other than their true beliefs in order to achieve a better score.
  • Strictly proper: a forecast evaluation procedure that awards a forecaster's best expected score only when his or her true beliefs are forecast.
    • Cannot be hedged
    • Example: the Brier score
    • You can derive that it is strictly proper, but I won't here.
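
Without doing the derivation, a quick numerical check makes the idea concrete. If the forecaster believes the event has probability p, the expected Brier score from issuing forecast y is p(1 − y)² + (1 − p)y² = (y − p)² + p(1 − p), which is smallest only at y = p. The sketch below searches a grid of candidate forecasts; the belief value 0.3 and the function name are arbitrary choices for illustration.

```python
def expected_brier(forecast, belief):
    """Expected Brier score when the event occurs with probability `belief`
    and the forecaster issues probability `forecast`."""
    return belief * (forecast - 1.0) ** 2 + (1.0 - belief) * forecast ** 2


belief = 0.3                                   # hypothetical true belief
candidates = [i / 100 for i in range(101)]     # forecasts 0.00, 0.01, ..., 1.00
best = min(candidates, key=lambda y: expected_brier(y, belief))

print(f"best forecast: {best:.2f}")
print(f"expected score at belief:      {expected_brier(belief, belief):.4f}")
print(f"expected score when hedged to 0.5: {expected_brier(0.5, belief):.4f}")
```

Any forecast other than the true belief gives a worse (larger) expected score, so there is nothing to gain by hedging.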

Probability Forecasts for Multiple-Category Events
  • For multiple-category ordinal probability forecasts:
    • Verification should penalize forecasts increasingly as more probability is assigned to event categories further removed from the actual outcome.
    • The score should be strictly proper.
  • Commonly used:
    • Ranked probability score (RPS)
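
A minimal sketch of the ranked probability score for a single forecast, computed as the sum of squared differences between the cumulative forecast probabilities and the cumulative observation vector. Some references normalize this sum differently, so treat the exact scaling as an assumption. The function name and the three-category example are hypothetical.

```python
from itertools import accumulate


def ranked_probability_score(probs, observed_category):
    """RPS for one forecast over ordered categories.

    probs: forecast probabilities for each category, in order.
    observed_category: index of the category that occurred.
    """
    outcome = [1.0 if j == observed_category else 0.0 for j in range(len(probs))]
    cum_forecast = accumulate(probs)     # cumulative forecast probabilities
    cum_observed = accumulate(outcome)   # steps from 0 to 1 at the observed category
    return sum((y - o) ** 2 for y, o in zip(cum_forecast, cum_observed))


# Three ordered categories; the lowest category (index 0) occurred.
rps_far = ranked_probability_score([0.2, 0.3, 0.5], observed_category=0)
rps_near = ranked_probability_score([0.2, 0.5, 0.3], observed_category=0)
print(f"more probability far from the outcome:  {rps_far:.2f}")
print(f"more probability near the outcome:      {rps_near:.2f}")
```

Both forecasts give the observed category the same probability (0.2), but the one that puts more of the remaining probability in the adjacent category gets the smaller (better) score.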

Probability Forecasts for Continuous Predictands
  • For an infinite number of predictand classes, the ranked probability score can be extended to the continuous case.
  • Continuous ranked probability score (CRPS):
    • Strictly proper
    • Smaller values are better
    • It rewards concentration of probability around the step function located at the observed value.
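
As a sketch, the continuous ranked probability score can be written as the integral of [F(y) − F_o(y)]², where F is the forecast cumulative distribution and F_o is a step function that jumps from 0 to 1 at the observed value. The example below assumes a Gaussian forecast distribution purely for illustration, integrates numerically with a simple midpoint rule, and compares the result with the known closed-form expression for a Gaussian forecast. Function names and parameter values are hypothetical.

```python
from math import erf, exp, pi, sqrt


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def crps_numeric(mu, sigma, obs, half_width=10.0, steps=20000):
    """Midpoint-rule integral of [F(y) - F_o(y)]^2 for a Gaussian forecast."""
    lo = min(mu, obs) - half_width * sigma
    hi = max(mu, obs) + half_width * sigma
    dy = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * dy
        step = 1.0 if y >= obs else 0.0      # step function at the observation
        total += (normal_cdf(y, mu, sigma) - step) ** 2 * dy
    return total


def crps_gaussian(mu, sigma, obs):
    """Closed-form CRPS for a Gaussian forecast distribution."""
    z = (obs - mu) / sigma
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)
    cdf = normal_cdf(z, 0.0, 1.0)
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / sqrt(pi))


print(crps_numeric(0.0, 1.0, 0.0), crps_gaussian(0.0, 1.0, 0.0))
# A sharper forecast centered on the observation scores better (smaller).
print(crps_gaussian(0.0, 0.5, 0.0), crps_gaussian(0.0, 2.0, 0.0))
```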


Nonprobabilistic Forecasts of Fields
  • General considerations for field forecasts:
    • Usually nonprobabilistic
    • Verification is done on a grid
  • Scalar accuracy measures of these fields:
    • S1 score
    • Mean Squared Error
    • Anomaly correlation
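
Two of these measures are easy to sketch for a field stored as a flat list of grid-point values. The anomaly correlation below is the uncentered form (anomalies are taken relative to climatology and the map-mean anomaly is not removed); a centered version also exists, so this choice is an assumption. The S1 score is not shown. Function names and the four-point "grid" of values are hypothetical.

```python
from math import sqrt


def field_mse(forecast, observed):
    """Mean squared error averaged over all grid points."""
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast)


def anomaly_correlation(forecast, observed, climatology):
    """Uncentered anomaly correlation between forecast and observed fields."""
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    o_anom = [o - c for o, c in zip(observed, climatology)]
    numerator = sum(f * o for f, o in zip(f_anom, o_anom))
    denominator = sqrt(sum(f * f for f in f_anom) * sum(o * o for o in o_anom))
    return numerator / denominator


# Hypothetical values at four grid points.
climatology = [5500.0, 5520.0, 5540.0, 5560.0]
forecast = [5510.0, 5515.0, 5550.0, 5580.0]
observed = [5505.0, 5510.0, 5545.0, 5590.0]

print(f"MSE = {field_mse(forecast, observed):.2f}")
print(f"AC  = {anomaly_correlation(forecast, observed, climatology):.3f}")
```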

Thank you for your participation throughout the semester

  • All presentations will be posted on my UD website
  • Additional information can be found in Statistical Methods in the Atmospheric Sciences (second edition) by Daniel Wilks