Bias Correction Methods Adjusting Moments

Bo Cui*, Zoltan Toth, Yuejian Zhu, Dingchen Hou*, and Richard Wobus
* Environmental Modeling Center, NCEP/NWS
* SAIC at Environmental Modeling Center, NCEP/NWS

Acknowledgements
Zoltan Toth, Yuejian Zhu, Dingchen Hou, Richard Wobus

Outline • Tasks & Goals • Bias-Correction Algorithm: Adjusting Moments • Experimental Design • Ensemble Forecast Verification • Future Plans

Ensemble Postprocessing
• NWP models and ensemble formation are imperfect
  • Deficiencies due to various problems in NWP models
  • Systematic errors in the analysis, both observation-induced and model-related
  • Ensemble formation
    • Inappropriate initial spread
    • Lack of representation of model-related uncertainty
    • Limited ensemble size
• Known model/ensemble problems are addressed at their sources, but no “perfect” solution exists
• Systematic errors remain and cause biases in
  • the 1st and 2nd moments of the ensemble distribution

Tasks & Goals
• Tasks
  • Develop and implement a statistical post-processing scheme to reduce the biases in ensemble forecasts (height, temperature and other variables)
  • Correct both the 1st and 2nd moments of the ensemble
• Goals
  • Bias-corrected forecasts will have reduced or no bias with respect to the verifying analysis fields, given on the model grid

Moment Adjustment
• Bias Assessment
  • First moment: B = difference between the ensemble mean forecast and the verifying analysis
  • Second moment: R = ratio between the RMS error of the ensemble mean and the ensemble spread
• Bias Correction
  • 1st moment = Ensemble mean - B
  • 2nd moment = Ensemble mean - B - (Ensemble forecast - Ensemble mean) * R
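The moment adjustment can be sketched at a single grid point and lead time as follows. This is a minimal illustration, not the operational code: the function names are hypothetical, B is assumed to be averaged over a sample of past cases, and R is assumed to use the RMS of the past spreads. The sketch adds the rescaled deviation to the corrected mean so that each member stays on the same side of the mean; the slide writes that term with a minus sign, which yields the same spread.

```python
from math import sqrt

def first_moment_bias(mean_forecasts, analyses):
    """B: mean of (ensemble mean forecast - verifying analysis) over past cases."""
    errors = [f - a for f, a in zip(mean_forecasts, analyses)]
    return sum(errors) / len(errors)

def second_moment_ratio(mean_forecasts, analyses, spreads):
    """R: RMS error of the ensemble mean divided by the RMS ensemble spread."""
    n = len(analyses)
    rms_error = sqrt(sum((f - a) ** 2 for f, a in zip(mean_forecasts, analyses)) / n)
    rms_spread = sqrt(sum(s ** 2 for s in spreads) / len(spreads))
    return rms_error / rms_spread

def adjust_members(members, bias, ratio):
    """Shift the ensemble mean by -bias and rescale deviations from the mean by ratio.

    Assumption: the scaled deviation is added (see the note above on the sign).
    """
    mean = sum(members) / len(members)
    corrected_mean = mean - bias
    return [corrected_mean + (m - mean) * ratio for m in members]
```

With R greater than 1 (an under-dispersive ensemble) the members move away from the corrected mean; with R below 1 they move toward it.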

Implementation Facts
• Bias assessment, carried out separately
  • at each forecast lead time
  • at each individual grid point
  • for the ensemble mean, GFS and ensemble control forecasts
• Bias correction tests, applied
  • on all ensemble member forecasts
  • for the 00Z initial cycle only
  • at 2.5° x 2.5° lat/lon resolution
  • to 500 mb height and 850 mb temperature

Alternatives or Refinements of Bias-Correction Algorithm
Adaptive methods:
• Consider the most recent past data, with decaying averaging
• Use data from surrounding grid points (with a Gaussian weighting function)
• Use a large (climatological) data sample if one is available and the forecast system is stable
• Adjust the temporal/spatial sampling domain to optimize performance
• Construct the forecast cumulative frequency distribution to match the observed one, as in QPF calibration (Yuejian Zhu)
• Regime-dependent method (Jun Du)
  • Use correlation coefficients between today's circulation field and those of the recent past to determine the weights given to data in estimating the bias

Experimental Design
Implementation of decaying averaging for 1st moment bias
[Timeline: T0-46 days, T0-16 days, T0]
decaying average mean error = (1-w) * prior t.m.e. + w * (f - a)
a) Prior estimate to start up the procedure: choose T0 as the current date (00Z) and calculate the time mean error (t.m.e.) between T0-46 and T0-16 days.
b) Update: multiply the prior estimate of the average state by a factor 1-w (<1), then add the most recent verification error (f - a) to the decaying average with a weight of w, separately for each lead time.
c) Cycling: repeat step (b) every day.
Three experiments, with w of 1%, 2% and 10%
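Steps (a) to (c) can be sketched for one grid point and one lead time as below. The function names are hypothetical, and the inputs are assumed to be plain lists of forecast-minus-analysis errors or (forecast, analysis) pairs ordered from oldest to newest.

```python
def startup_estimate(past_errors):
    """Step (a): time mean error over the startup window (T0-46 to T0-16 days)."""
    return sum(past_errors) / len(past_errors)

def update(prior, forecast, analysis, w):
    """Step (b): decaying average = (1-w) * prior + w * (f - a)."""
    return (1.0 - w) * prior + w * (forecast - analysis)

def cycle(prior, daily_pairs, w):
    """Step (c): apply the update once per day, oldest pair first."""
    estimate = prior
    for forecast, analysis in daily_pairs:
        estimate = update(estimate, forecast, analysis, w)
    return estimate
```

For example, `update(1.0, 5.0, 3.0, 0.1)` blends a prior bias of 1.0 with a new error of 2.0 and gives approximately 1.1. A small w such as 1% changes the estimate slowly; a larger w such as 10% follows recent errors more closely.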

Experimental Design
Centered running mean error test for 1st moment bias
[Timeline: T0-15 days, T0, T0+15 days]
Define the +/- 15 day time average of the error as the bias. Use this bias estimate (made with dependent data) as the “optimal” benchmark.
Implementation:
• Four experiments: optimal test, three decaying averaging experiments (1%, 2% and 10% weight)
• 8-month period for these experiments (Spring and Summer 2004)
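The centered running mean used as the benchmark can be sketched as follows. The function name is hypothetical, and returning `None` where the full window does not fit inside the record is an assumption of this sketch; the slides do not say how the ends of the period are handled.

```python
def centered_running_mean(errors, half_window=15):
    """Mean of daily errors over [t - half_window, t + half_window] for each day t.

    Days without a complete window get None (an assumption of this sketch).
    """
    n = len(errors)
    result = []
    for t in range(n):
        if t - half_window < 0 or t + half_window >= n:
            result.append(None)
        else:
            window = errors[t - half_window : t + half_window + 1]
            result.append(sum(window) / len(window))
    return result
```

Because the window includes days after T0, this estimate uses the errors it is later verified against. That is why it serves only as a benchmark and cannot be run in real time.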

Temporal Cross Section: 500 mb Height Time Mean Error (40° N, 95° W, Jan. to Aug. 2004)
[Figure: panels OPT, W=1%, W=2%, W=10%; dates annotated on the panels: May 22, Jun. 11, Jun. 22]

Temporal Cross Section: 850 mb Temp. Time Mean Error (40° N, 95° W, Jan. to Aug. 2004)
[Figure: panels OPT, W=1%, W=2%, W=10%; dates annotated on the panels: May 1, May 10, Jun. 2]

Ensemble Forecast Verification
• Verification of ensemble mean 500 mb height and 850 mb temperature
• Verification domains: NH, SH and Tropics
• Verification data set: GFS final analysis
• Verification scores:
  • AC = pattern anomaly correlation coefficient
  • RMS = root mean square error of the ensemble mean
  • ROC = relative operating characteristics
  • RPSS = ranked probability skill score
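The two deterministic scores can be sketched for a list of grid-point values as below. The function names are hypothetical. The anomaly correlation is written in its uncentered form and without latitude (area) weighting; both are simplifying assumptions of this sketch, and the slides do not state which variant was used.

```python
from math import sqrt

def rms_error(forecast, analysis):
    """Root mean square difference between forecast and analysis values."""
    n = len(forecast)
    return sqrt(sum((f - a) ** 2 for f, a in zip(forecast, analysis)) / n)

def anomaly_correlation(forecast, analysis, climatology):
    """Uncentered pattern correlation of forecast and analysis anomalies.

    Anomalies are departures from the climatology at each grid point.
    """
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    a_anom = [a - c for a, c in zip(analysis, climatology)]
    numerator = sum(x * y for x, y in zip(f_anom, a_anom))
    denominator = sqrt(sum(x * x for x in f_anom) * sum(y * y for y in a_anom))
    return numerator / denominator
```

A forecast identical to the analysis gives an RMS error of 0 and an AC of 1. Removing a uniform bias lowers the RMS error directly, while its effect on AC depends on how the bias projects onto the anomaly pattern.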

AC and RMS: 500 mb Height, Summer 2004
[Figure: panels AC and RMS]
Three bias-corrected ensembles with decaying averaging:
• AC scores slightly improved for week 1
• RMS error slightly reduced for the first several days

ROC: 500 mb Height, Summer 2004
[Figure: panels NH, SH, TR]
• The 2% weight experiment improves performance over NH, and slightly over SH, up to week 2
• The 10% weight experiment improves performance over the Tropics

ROC: 500 mb Height, Spring 2004
[Figure: panels NH, SH, TR]
• NH and SH: ROC improved with some weights for most lead times
• Tropics: ROC improved at all lead times, indicating that the bias is much reduced for sub-regions. The 10% weight experiment performs better

RPSS: 500 mb Height, Summer 2004
[Figure: panels NH, SH, TR]
• The 2% weight experiment improves performance over NH, and slightly over SH as well
• The 10% weight experiment improves performance over the Tropics, especially for week 2

Preliminary Results
• In general, the time mean errors of 500 mb height increase with forecast lead time. In some cases this growth is nearly linear. What determines linearity?
• The time mean error difference between the 1% and 2% weight experiments is small. The 10% weight experiment shows more high-frequency detail than the 1% and 2% experiments (better for short range?).
• The centered running mean error test (OPT) shows potential for significant improvement in the forecasts of both 500 mb height and 850 mb temperature in terms of all verification scores, compared to the raw ensembles.

Preliminary Results
• For days 1 through 6, the AC scores of the raw ensemble and the three bias-corrected ensembles with decaying averaging are relatively close to each other on average. With some weights, AC and RMS performance can be improved.
• The 2% ensemble shows large improvements in ROC, RPSS and BSS scores over the Northern and Southern Hemispheres. The improvement in these scores is more significant in summer than in spring. On the other hand, a 10% weight works better for the Tropics than 1% or 2%. Use different weights for the Tropics?
• The decaying averaging approach to improving NCEP's global ensemble forecast system seems promising. Problems remain with estimating the bias for longer lead times from a short sample.

Future Plans
• Test the 1st moment bias-correction algorithm on a longer period (four seasons, 5 years) for tuning.
• Start research on the 2nd moment calibration.
• Test the refinements of the bias-correction algorithm listed before.
• Run 4 cycles per day, adding 06Z, 12Z and 18Z forecasts, to provide more timely information and increase sample size. Use data with 1x1 lat/lon resolution.
• Add new ensemble forecast variables such as 2 m temperature, U, V, and the cumulative frequency distribution for forecast QPF.
• Consider other methods and/or the use of a larger sample, especially for longer lead times.

Refinements of Bias-Correction Algorithm
Details: Decaying averaging
• Use recent verification statistics in the calibration process, accumulated in a decaying averaging sense
• Achieved by using a recursive averaging procedure (Kalman filtering)
[Figure labels: 6.6%, 3.3%, 1.6%]
Toth, Z., and Y. Zhu, 2001
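The "decaying averaging sense" can be made concrete by unrolling the recursion: after n daily updates with weight w, the error from k days ago carries weight w(1-w)^k, and the startup estimate retains weight (1-w)^n. The sketch below checks that equivalence numerically. The function names are hypothetical, and the sketch does not attempt to reproduce the percentages shown in the figure.

```python
def unrolled_weights(w, n_days):
    """Weight carried by the error from k days ago, for k = 0 .. n_days-1."""
    return [w * (1.0 - w) ** k for k in range(n_days)]

def recursive_average(prior, errors, w):
    """Apply estimate = (1-w) * estimate + w * error, oldest error first."""
    estimate = prior
    for error in errors:
        estimate = (1.0 - w) * estimate + w * error
    return estimate

def unrolled_average(prior, errors, w):
    """Same result as recursive_average, written as one explicit weighted sum."""
    n = len(errors)
    most_recent_first = errors[::-1]
    weighted = sum(wk * e for wk, e in zip(unrolled_weights(w, n), most_recent_first))
    return (1.0 - w) ** n * prior + weighted
```

The weights and the residual prior weight always sum to 1, so the estimate stays a proper average; the recursive form simply avoids storing the full error history.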

Centered Running Mean Error: Summer 2004
[Figure: latitudinal cross section (95° W) and longitudinal cross section (40° N), for z500 and T850]
