Two consolidation projects

Two Consolidation Projects:

  • Towards an International MME: CFS + EUROSIP (UKMO, ECMWF, METFR), 11 slides

  • Towards a National MME: CFS and GFDL, 18 slides


Does the NCEP CFS add to the skill of the European DEMETER-3 to produce a viable International Multi Model Ensemble (IMME)?

Huug van den Dool

Climate Prediction Center, NCEP/NWS/NOAA

Suranjana Saha and Åke Johansson

Environmental Modeling Center, NCEP/NWS/NOAA

August 2007


DATA and DEFINITIONS USED

  • DEMETER-3 (DEM3) = ECMWF + METFR + UKMO

  • CFS

  • IMME = DEM3 + CFS

  • 1981 – 2001

  • 4 initial-condition months: Feb, May, Aug and Nov

  • Leads 1-5

  • Monthly means


DATA/Definitions USED (cont)

  • Deterministic: Anomaly Correlation (AC)

  • Probabilistic: Brier Score (BS) and Ranked Probability Score (RPS)

  • Ensemble Mean and PDF

  • T2m and Prate

  • Europe and United States

    “NO (fancy) consolidation, equal weights, NO cross-validation”
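
As a rough illustration of the equal-weights choice, the sketch below averages two models' anomaly series and scores the result with an anomaly correlation. The slides do not spell out the AC formula; the uncentered form used here, the function names and the five-year series are all assumptions made for illustration, not data from the study.

```python
import math


def anomaly_correlation(forecast_anoms, observed_anoms):
    """Uncentered correlation of two anomaly series.

    Assumes both series are already departures from their own climatology.
    """
    num = sum(f * o for f, o in zip(forecast_anoms, observed_anoms))
    den = math.sqrt(
        sum(f * f for f in forecast_anoms) * sum(o * o for o in observed_anoms)
    )
    return num / den if den > 0 else 0.0


def equal_weight_mean(model_anoms):
    """Average several models' anomaly series with equal weights."""
    n_models = len(model_anoms)
    return [sum(vals) / n_models for vals in zip(*model_anoms)]


# Hypothetical anomalies for five years (illustrative only).
obs = [1.0, -0.5, 0.2, -1.2, 0.5]
model_a = [0.8, -0.1, -0.3, -0.9, 0.1]
model_b = [0.2, -0.7, 0.6, -0.4, 0.9]

mme = equal_weight_mean([model_a, model_b])
for name, series in [("A", model_a), ("B", model_b), ("MME", mme)]:
    print(name, round(anomaly_correlation(series, obs), 3))
```

No weights are fitted here, so there is nothing to cross-validate; that is the sense in which the equal-weights IMME needs no "fancy" consolidation.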


DATA/Definitions USED (cont)

Verification Data:

  • T2m: CPC Monthly Analysis of CAMS + Global Historical Climatology Network (GHCN) data (Fan and Van den Dool 2007)

  • Prate: CMAP (Xie and Arkin 1997)


Number of times IMME improves upon DEM3, out of 20 cases (4 ICs x 5 leads):

“The bottom line”


Frequency of being the best model in 20 cases in terms of Anomaly Correlation of the Ensemble Mean

“Another bottom line”


Frequency of being the best model in 20 cases in terms of Brier Score of the PDF

“Another bottom line”


Frequency of being the best model in 20 cases in terms of Ranked Probability Score (RPS) of the PDF

“Another bottom line”


CONCLUSIONS

  • Overall, NCEP CFS contributes to the skill of IMME (relative to DEM3) for equal weights.

  • This is especially so in terms of the probabilistic Brier Score and for precipitation


CONCLUSIONS (Cont)

In comparison to ECMWF, METFR and UKMO, the CFS as an individual model does:

  • well in deterministic scoring (AC) for Prate, and

  • very well in probability scoring (BS) for Prate and T2m over both the USA and EUROPEAN domains


CONCLUSIONS (Cont)

  • The relative weakness of the CFS is in the deterministic scoring (AC) for T2m, where it is near the average of the other models, over both EUROPE and USA

  • Skill (if any) over EUROPE or USA is very modest for any model, or any combination of models

  • The Brier Score rarely shows improvement over climatological probabilities in this study

  • The AC for the ensemble mean gives a more “positive” impression about skill than the Brier Score


Study of the performance of GFDL seasonal forecasts in a Multi Model Ensemble at NCEP

Huug van den Dool

Climate Prediction Center/NCEP/NWS/NOAA

Suranjana Saha

Environmental Modeling Center/NCEP/NWS/NOAA


Data Used

  • 4 initial conditions: April 1, May 1, Oct 1 and Nov 1

  • 10-member one-year forecasts (leads 0 through 11)

  • Period 1981-2005 (25 years)

  • GFDL has a fully coupled model CM2.1 (IPCC version)


Verification Data Used

  • Focus on monthly mean 2m-temperature and precipitation over the continental US

  • Verification of 2m-temperature against GHCN+CAMS (land only)

  • Verification of precipitation against CMAP (land and ocean)

  • Area: valid grid points (2.5° x 2.5°) within the 25N-50N, 125W-65W box over the US


Comparison to the NCEP Climate Forecast System (CFS)

GFDL members start a few days before and on the first of the month.

CFS members are clustered around the 11th and 21st of the previous month and the 1st of the initial month.

In an NCEP operational setting, the GFDL model would be run every day (similar to the CFS).

Therefore, the calibration of the operational forecast would be obtained from an interpolation of two sets of forecasts, a month apart (one of which would be a month old), thus resulting in a possible degradation of skill.


VERIFICATION OF US PRATE ANOMALY CORRELATION


CFS US PRATE ANOMALY CORRELATION

There are 32 ENTRIES: 8 leads (1-8) for 4 initial months; the lead-0 row is shown but not counted in the summary below.

initial month
lead     apr     may     oct     nov
8       .126    .143    .059    .083
7       .135    .174    .101    .034
6      -.001    .071    .081    .112
5       .098    .035    .123    .041
4       .027    .087    .227    .168
3       .049    .025    .166    .231
2       .058    .061    .119    .220
1       .142    .030    .149    .161
0       .191    .244    .189    .277

Worst   mean-sd   mean    mean+sd   best
-.001   .043      .104    .166      .231

Some skill in ENSO months

NO CROSS VALIDATION
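
The summary row can be checked directly from the table. The sketch below recomputes it from the 32 values for leads 8 down to 1; that the lead-0 row is excluded and that "sd" is the population standard deviation are assumptions, chosen because they reproduce the quoted worst, mean and best values. The variable names are illustrative.

```python
import statistics

# CFS US Prate anomaly correlations without cross-validation, copied from the
# table above: rows are leads 8 down to 1, columns are Apr, May, Oct, Nov.
# Lead 0 is deliberately left out (assumption: this is what gives 32 entries).
cfs_ac = [
    [0.126, 0.143, 0.059, 0.083],   # lead 8
    [0.135, 0.174, 0.101, 0.034],   # lead 7
    [-0.001, 0.071, 0.081, 0.112],  # lead 6
    [0.098, 0.035, 0.123, 0.041],   # lead 5
    [0.027, 0.087, 0.227, 0.168],   # lead 4
    [0.049, 0.025, 0.166, 0.231],   # lead 3
    [0.058, 0.061, 0.119, 0.220],   # lead 2
    [0.142, 0.030, 0.149, 0.161],   # lead 1
]

values = [v for row in cfs_ac for v in row]
mean = sum(values) / len(values)
sd = statistics.pstdev(values)  # population sd (assumption)

print("worst  ", round(min(values), 3))
print("mean-sd", round(mean - sd, 3))
print("mean   ", round(mean, 3))
print("mean+sd", round(mean + sd, 3))
print("best   ", round(max(values), 3))
```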


CFS US PRATE ANOMALY CORRELATION

There are 32 ENTRIES: 8 leads (1-8) for 4 initial months; the lead-0 row is shown but not counted in the summary below.

initial month
lead     apr     may     oct     nov
8       .062    .093    .007    .024
7       .125    .107    .062   -.017
6      -.090   -.039    .039    .058
5       .038   -.056    .056   -.016
4      -.033    .033    .192    .086
3      -.023   -.073    .120    .191
2      -.028   -.016    .065    .182
1       .113   -.059    .122    .101
0       .139    .241    .116    .248

Worst   mean-sd   mean    mean+sd   best
-.090   -.032     .045    .121      .192

CV brings all numbers down

CROSS VALIDATION CV3RE


GFDL US PRATE ANOMALY CORRELATION

There are 32 ENTRIES: 8 leads (1-8) for 4 initial months; the lead-0 row is shown but not counted in the summary below.

initial month
lead     apr     may     oct     nov
8       .027    .044    .045    .048
7       .122   -.002    .032    .091
6      -.040    .138    .057    .066
5       .003    .047    .113    .016
4       .061   -.019    .153    .081
3       .093    .024    .071    .154
2       .059    .099    .166    .109
1       .120    .074    .085    .217
0       .184    .219    .087    .250

Worst   mean-sd   mean    mean+sd   best
-.040   .018      .074    .130      .217

Weak skill in ENSO months

NO CROSS VALIDATION


MME2 US PRATE ANOMALY CORRELATION

There are 32 ENTRIES: 8 leads (1-8) for 4 initial months; the lead-0 row is shown but not counted in the summary below.

initial month
lead     apr     may     oct     nov
8       .104    .135    .065    .082
7       .168    .112    .091    .075
6      -.032    .140    .092    .115
5       .062    .056    .149    .039
4       .060    .043    .241    .159
3       .093    .032    .147    .234
2       .076    .105    .183    .199
1       .165    .067    .145    .227
0       .223    .277    .170    .311

Worst   mean-sd   mean    mean+sd   best
-.032   .051      .113    .176      .241

NO CROSS VALIDATION


US PRATE (AC): BEST out of 32 cases (4 ICs x 8 leads)

[Table values not preserved in this transcript; column labels: NO CV, MEAN AC, CV3RE, MEAN AC]


US PRATE (summary)

  • CFS alone is slightly better than GFDL alone

  • MME2 is slightly better than CFS alone

  • MME2 is better than GFDL alone

  • Numerically, differences are minuscule, and the existence of any skill is debatable


PRATE OVER NINO 3.4 AREA (summary)

          CFS     MME2    GFDL
NO CV     .532    .520    .313
CV3RE     .511    .481    .252

  • Adding GFDL to CFS for MME2 degrades scores

  • GFDL has ENSOs, maybe even too strong in 1983 and 1998, but the precipitation anomalies are weak at the equator and are pushed away from the equator, mainly into the southern hemisphere.


VERIFICATION OF US SURFACE TEMPERATURE ANOMALY CORRELATION


US 2m TEMPERATURE (AC): BEST out of 32 cases (4 ICs x 8 leads)

[Table values not preserved in this transcript; column labels: NO CV, MEAN AC, CV3RE, MEAN AC]


US 2m TEMPERATURE (summary)

  • CFS alone is not better than GFDL alone

  • MME2 is slightly better than CFS alone

  • MME2 is not better than GFDL alone

  • Numerically, differences are minuscule, and the existence of any skill is debatable


TREND ANALYSIS OF US 2m TEMP

  • Effect of OCN (Optimal Climate Normals) filtering on AC scores for all 32 cases (NO-CV)

  • 9-year running mean is removed

            RAW      OCN-filtered
    GFDL    0.099    0.068
    CFS     0.080    0.073

  • GFDL loses its advantage over the CFS when the trend is removed
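
The filtering step can be pictured with a short sketch. The slide only says that a 9-year running mean is removed; the centered window, its truncation at the ends of the record, the function name and the synthetic series below are all assumptions, so this shows the general idea rather than the procedure actually used.

```python
def remove_running_mean(series, window=9):
    """Subtract a centered running mean from each value.

    Assumption: near the ends of the record the window is simply truncated
    to the years that exist.
    """
    half = window // 2
    filtered = []
    for i, value in enumerate(series):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        local_mean = sum(series[lo:hi]) / (hi - lo)
        filtered.append(value - local_mean)
    return filtered


# Hypothetical 25-year anomaly series (1981-2005): a linear warming trend
# plus an alternating year-to-year wiggle. Not data from the study.
years = list(range(1981, 2006))
series = [0.04 * (y - 1981) + (0.3 if y % 2 else -0.3) for y in years]

filtered = remove_running_mean(series)
print([round(v, 2) for v in filtered[:6]])
```

With the slow component removed from both forecasts and observations, whatever correlation remains reflects year-to-year variability rather than a shared decadal trend.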


CONCLUSIONS (1)

  • Skill of both CFS and GFDL is extremely low for both 2m temperature (T2M) and precipitation (PRATE) over the US, and this skill wilts further upon cross validation (CV3RE)

  • GFDL makes no contribution to the skill of MME2 for PRATE over the US

  • GFDL makes no contribution to the skill of MME2 for PRATE over the tropical Pacific (Nino 3.4 area)

  • GFDL has a small edge over the CFS and contributes to MME2 for T2M over the US


CONCLUSIONS (2)

  • The inconsistency between performance in PRATE and T2M is explained by the inclusion of historical CO2 etc., i.e. GFDL does a better job on the decadal temperature trends. This is supported by the drop in skill when the trend is removed.

  • The empirical tool, OCN (Optimal Climate Normals), is routinely used by CPC to incorporate decadal trends in the consolidation of the official seasonal forecasts for US T2M. Its performance is better than that of any of CPC’s dynamical tools.


From DelSole (2007)

  • Surprisingly, none of the regression models proposed here can consistently beat the skill of a simple multi-model mean

  • “Under suitable assumptions, both the Bayesian estimate and the constrained least squares solution reduce to standard ridge regression.”


Kharin and Zwiers (2002):

  • Several methods of combining individual forecasts from a group of climate models to produce an ensemble forecast are considered

  • In the extratropics, the regression-improved ensemble mean performs best.

  • The “superensemble” forecast that is obtained by optimally weighting the individual ensemble members does not perform as well as either the simple ensemble mean or the regression-improved ensemble mean.

  • The sample size evidently is too small to estimate reliably the relatively large number of optimal weights required for the superensemble approach.


Finally (Huug van den Dool, 2007)

  • There is essentially not enough hindcast data for these fancy consolidation methods to work (21-25 years is nothing!!). (There may be exceptions.)

  • There is no (or not enough) independent information in model A versus model B

  • We have to be rigorous in CV procedures!


The rest is EXTRA


[Figure labels: Classic, +CPC limit, +DelSole limit]


Appendix: Consolidation Techniques

  • A technique to linearly combine any set of models

    Example: Con3 = a*A + b*B + c*C,

    where A, B and C are forecasts and a, b, and c coefficients.

  • The coefficients ideally depend on skill and co-linearity among the models, as determined from many hindcasts

  • Because of near instability of the matrix problem, NCEP applies ‘ridging’ to the covariance matrix, and tries to pool as much data as possible (areas, leads, ...).

  • To arrive at a skill estimate, we perform a 3-years-out cross validation (CV3), withholding the year under consideration and two more years chosen at random (to reduce pathological CV problems)
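
The points above can be sketched in a few lines for the Con3 = a*A + b*B + c*C case. This is an illustration under stated assumptions, not NCEP's code: adding a constant to the diagonal of the model covariance matrix stands in for ‘ridging’, the ridge amount of 5.0 is arbitrary, no pooling over areas or leads is done, and the function names and synthetic hindcasts are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_weights(forecasts, obs, ridge):
    """Coefficients minimizing squared error plus ridge * sum(w**2).

    forecasts: one anomaly series per model; obs: observed anomaly series.
    """
    k = len(forecasts)
    cov = [[sum(x * y for x, y in zip(forecasts[i], forecasts[j]))
            for j in range(k)] for i in range(k)]
    for i in range(k):
        cov[i][i] += ridge  # 'ridging': inflate the diagonal
    rhs = [sum(x * o for x, o in zip(forecasts[i], obs)) for i in range(k)]
    return solve(cov, rhs)


def cv3_consolidate(forecasts, obs, ridge, seed=0):
    """Predict each year with weights fitted without that year and
    without two other years chosen at random."""
    rng = random.Random(seed)
    n = len(obs)
    predictions = []
    for t in range(n):
        extra = rng.sample([y for y in range(n) if y != t], 2)
        keep = [y for y in range(n) if y != t and y not in extra]
        w = ridge_weights([[f[y] for y in keep] for f in forecasts],
                          [obs[y] for y in keep], ridge)
        predictions.append(sum(wk * f[t] for wk, f in zip(w, forecasts)))
    return predictions


# Hypothetical 21-year hindcasts: three models sharing one signal plus noise.
rng = random.Random(1)
signal = [rng.gauss(0, 1) for _ in range(21)]
obs = [s + rng.gauss(0, 1) for s in signal]
A, B, C = ([s + rng.gauss(0, 1) for s in signal] for _ in range(3))

weights = ridge_weights([A, B, C], obs, ridge=5.0)
cv_pred = cv3_consolidate([A, B, C], obs, ridge=5.0)
print([round(w, 3) for w in weights])
```

A larger ridge value shrinks the coefficients toward zero and makes them less sensitive to the few years available, which is the trade-off the short hindcast record forces.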


BRIER SCORE FOR 3-CLASS SYSTEM

1. Calculate tercile boundaries from observations 1981-2001 (1982-2002 for longer leads) at each gridpoint.

2. Assign departures from the model’s own climatology (based on 21 years, all members) to one of three classes: Below (B), Normal (N) and Above (A). Find the fraction of forecasts (F) among all participating ensemble members in each class, denoted FB, FN and FA respectively, such that FB + FN + FA = 1.

3. Denoting observations as O, we calculate a Brier Score (BS) as:

BS = {(FB-OB)**2 + (FN-ON)**2 + (FA-OA)**2}/3,

aggregated over all years and all grid points.

(For example, when the observation is in the B class, we have (1,0,0) for (OB, ON, OA), etc.)

4. BS for random deterministic prediction: 0.444

BS for ‘always climatology’ (1/3, 1/3, 1/3): 0.222

5. RPS: the same as the Brier Score, but for the cumulative distribution (no-skill = 0.148)
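
The reference values in steps 4 and 5 follow from the formula in step 3, as the sketch below shows for a single forecast-observation pair averaged over equally likely observed classes. The function names are illustrative, and dividing the RPS by 3 (as in the BS formula) is an assumption, chosen because it reproduces the quoted no-skill value.

```python
def one_hot(obs_class):
    """(OB, ON, OA) for an observed class: 0 = Below, 1 = Normal, 2 = Above."""
    return [1.0 if k == obs_class else 0.0 for k in range(3)]


def brier_score(fcst, obs_class):
    """BS = {(FB-OB)**2 + (FN-ON)**2 + (FA-OA)**2}/3 for one case."""
    obs = one_hot(obs_class)
    return sum((f - o) ** 2 for f, o in zip(fcst, obs)) / 3.0


def ranked_probability_score(fcst, obs_class):
    """Same as the Brier Score, but on cumulative probabilities.

    Dividing by 3 is an assumption that matches the quoted no-skill value.
    """
    obs = one_hot(obs_class)
    cum_f = cum_o = total = 0.0
    for f, o in zip(fcst, obs):
        cum_f += f
        cum_o += o
        total += (cum_f - cum_o) ** 2
    return total / 3.0


climatology = (1 / 3, 1 / 3, 1 / 3)

# Average over the three equally likely observed classes.
bs_clim = sum(brier_score(climatology, k) for k in range(3)) / 3
rps_clim = sum(ranked_probability_score(climatology, k) for k in range(3)) / 3

# Random deterministic forecast: probability 1 on a class picked at random.
bs_random = sum(brier_score(one_hot(j), k)
                for j in range(3) for k in range(3)) / 9

print(round(bs_clim, 3), round(bs_random, 3), round(rps_clim, 3))
```

A forecast system therefore has to score below 0.222 (BS) or 0.148 (RPS) to improve on climatological probabilities.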


Anomaly correlation does not asymptote to 100 at forecast time = 0

Interpolation of initial conditions from Reanalysis 2 may not be correct or accurate


CROSS-VALIDATION

Anomaly Pattern correlation over the tropical Pacific. Average for all leads and initial months. Empty bar: Full (dependent), filled bar: 3-yr out cross-validated.


Peña and Van den Dool (2008): Consolidation of Multi Method Forecasts by Ridge Regression: Application to Pacific Sea Surface Temperature

 Strategies to increase the ratio of the effective sample size of the training data to the number of coefficients to be fitted are proposed and tested. These strategies include:

  • i) objective selection of a smaller subset of models, ii) pooling of information from neighboring gridpoints, and iii) consolidating all ensemble members rather than each model’s ensemble average.

  • In all variations of the ridge regression consolidation methods tested, increased effective sample size produces more stable weights and more skillful predictions on independent data.

  • In the western tropical Pacific, most consolidation methods outperform the simple equal weight ensemble average; in other regions they have similar skill as measured by both the anomaly correlation and the relative operating curve.

  • The main obstacle to progress is a short period of data and a lack of independent information among models.

  • CV3RE

