
Highlights of DTC Model Testing and Evaluation Results*

Presentation Transcript


  1. Highlights of DTC Model Testing and Evaluation Results*
  Bill Kuo (1), Louisa Nance (1), Barb Brown (1) and Zoltan Toth (2)
  Developmental Testbed Center
  (1) National Center for Atmospheric Research, (2) Earth System Research Laboratory
  *Contribution from all DTC Staff

  2. Objectives of the DTC*
  *DTC is jointly sponsored by NOAA, Air Force, NSF, & NCAR
  • Advance NWP research by providing the research community an environment that is functionally similar to that used in operations to test and evaluate the components of the NWP systems supported by the DTC;
  • Reduce the average time required to implement promising codes emerging from the research community by performing initial extensive testing to demonstrate the potential of new science and technologies for possible use in operations;
  • Sustain scientific interoperability of the community modeling system;
  • Manage and support the common baseline of end-to-end community software for users, including dynamic cores, physics and data assimilation codes, pre- and post-processors, and codes that support ensemble forecasting systems; and
  • Establish, maintain and support a community statistical verification system for use by the broad NWP community.

  3. Model Evaluation Tools (MET) http://www.dtcenter.org/met/users/
  • State-of-the-art tools
  • Traditional and advanced (e.g., spatial) methods
  • Database and display system
  • Supported for the community: tutorials, email help, etc.
  • Ensemble tools (two of these are sketched below):
    • Brier score + decompositions
    • ROC, Reliability
    • Ensemble quantiles
    • Rank histogram
    • CRPS (continuous ranked probability score)
  • See Verification Methods session Friday, 10:30 and P57 - Fowler
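As a rough illustration of two of the ensemble scores listed above, here is a minimal Python sketch of the Brier score with its Murphy (1973) reliability/resolution/uncertainty decomposition and the empirical CRPS for a single ensemble forecast. The function names and binning choices are purely illustrative and are not taken from MET itself, which has its own implementation and conventions.

```python
import numpy as np

def brier_score(p, o):
    """Brier score for probability forecasts p of a binary event o (0/1)."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return np.mean((p - o) ** 2)

def brier_decomposition(p, o, n_bins=10):
    """Murphy (1973) decomposition: BS = reliability - resolution + uncertainty.
    Forecast probabilities are pooled into n_bins equal-width bins."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    n, obar = len(p), o.mean()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    reliability = resolution = 0.0
    for k in range(n_bins):
        in_bin = idx == k
        nk = in_bin.sum()
        if nk == 0:
            continue
        reliability += nk * (p[in_bin].mean() - o[in_bin].mean()) ** 2
        resolution += nk * (o[in_bin].mean() - obar) ** 2
    return reliability / n, resolution / n, obar * (1.0 - obar)

def crps_ensemble(members, obs):
    """Empirical CRPS for one ensemble forecast of a scalar variable:
    E|X - y| - 0.5 * E|X - X'| over ensemble members X."""
    x = np.asarray(members, float)
    return np.mean(np.abs(x - obs)) - 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
```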

  4. DTC Test & Evaluation Activities • Mesoscale Modeling (WRF) • Assess performance of select configurations for new WRF releases • Test physics in a functionally similar operational environment P58 - Wolff , P59 - Harrold • Test SREF member configurations • QPF verification of high resolution models • Assess performance of microphysics schemes (HMT) • Hurricanes (HWRF) • Test HWRF configured from WRF repository for use at EMC 2.2 – Bernardet, P84 - Bao • Test HWRF physics options & assess their impact on rapid intensification 10.7 - Biswas • Perform diagnostics studies to examine the strengths & weaknesses of HWRF (HFIP) • Data Assimilation • Test GSI baseline (comparison w/ WRF-Var) 9.6 - Shao • Test regional EnKF systems P8 – Newman • Ensembles • Test bias-correction & down-scaling schemes for SREF • Verification of storm-scale ensemble systems for severe weather & QPF (HWT) • Demonstration of real-time QPF verification for mesoscale ensemble system (HMT) P55 – Jensen, P56 – Tollerud • Data Assimilation/Ensembles/Hurricanes • Assess the impact of GSI-hybrid DA on HWRF forecasts (HFIP) P7 - Zhou

  5. WRF Innovation T&E • Inter-comparison T&E allows for a quantitative assessment of forecast performance between • an operational baseline and community contributed scheme QNSE vs AFWA OC RRTMG vs AFWA OC • Inter-comparison T&E allows for a quantitative assessment of forecast performance between • an operational baseline and community contributed scheme P59 - Harrold

  6. WRF Innovation T&E • Inter-comparison T&E allows for a quantitative assessment of forecast performance between • an operational baseline and community contributed scheme AFWA OC V3.3.1 vs V3.1.1 • Inter-comparison T&E allows for a quantitative assessment of forecast performance between • an operational baseline and community contributed scheme • two different versions of WRF using the same physics scheme

  7. Comparison of V3.1.1 and V3.3.1 From: Wei Wang and Ming Chen

  8. WRF Member Testing for NCEP’s SREF
  New membership: NMMB(7), NMME(7) & ARW(7)
  • Tested performance of 5 WRF configurations for ~50 cases distributed over a year
  • Candidate configuration NMM-GFS replaced w/ NMM-NCAR
  • Cursory timing tests – ARW adaptive time step
  • Transition from 32/35 km to 16/17 km

  9. Mesoscale Model Evaluation Testbed (MMET) – P58 – Jamie Wolff et al.
  • Outcome of NWP Workshop on Model Physics with an Emphasis on Short-Range Weather Prediction, held at EMC 26-28 July 2011
  • Mechanism to assist the research community with the initial stage of testing and allow for efficient demonstration of the merits of a new development
  • Common framework for testing; allows for direct comparisons between different techniques
  • Model input and observational datasets provided to utilize for testing
  • Baseline results for select operational models established by the DTC
  • Hosted by the DTC; served through the Repository for Archiving, Managing and Accessing Diverse DAta (RAMADDA) http://dtcenter.org/repository

  10. MMET Cases
  • Initial solicitation of cases from DTC Science Advisory Board members and Physics Workshop participants – great response and enthusiasm towards the endeavor
  • Target cases during initial year:
    • 20090228 – Mid-Atlantic snow storm where the North American Mesoscale (NAM) model produced high QPF shifted too far north
    • 20090311 – High dew point predictions by NAM over the upper Midwest and in areas of snow
    • 20091007 – High-Resolution Window (HIRESW) runs underperformed compared to the coarser NAM model
    • 20091217 – “Snowapocalypse ‘09”: NAM produced high QPF over the mid-Atlantic, lack of cessation of precipitation associated with decreasing cloud top over eastern North Carolina
    • 20100428-0504 – Historic Tennessee flooding associated with an atmospheric river event
    • 20110404 – Record-breaking severe report day
    • 20110518-26 – Extended period of severe weather outbreak covering much of the Midwest and into the eastern states later in the period
    • 20111128 – Cutoff low over SW US; NAM had difficulties throughout the winter breaking down cutoff lows and progressing them eastward
    • 20120203-05 – Snow storm over Colorado, Nebraska, etc.; NAM predicted too little precipitation in the warm sector and too much snow north of the front (persistent bias)

  11. Innovations from HMT-West Research System: ESRL/GSD and HMT Ensemble Modeling System
  • WRF model 9-member ensemble; ARW and NMM cores
  • Outer domain 9 km; nested domain 3 km
  • Hybrid members: multiple physics packages, two model cores, and different GFS initial conditions
  • Outer domain runs to 5-day lead time; nest to 12 hr; DTC evaluated the first 72 hours
  • Comparisons made with current operational systems (GFS, SREF, NAM, HRRR, etc.)
  • Evaluation focus on QPF, with addition of state variables in 2011-2012
  • HMT-West typically runs December – March; DTC has evaluated approximately 3.5 months of data for the past 3 seasons
  see P55 - Jensen et al.

  12. Model Comparisons from 2010-2011 HMT-West
  [Figure: Gilbert Skill Score (or ETS) vs forecast lead time (6-72 hr) for 6-hr AccumPrecip > 1” – HRRR (3 km), NMM-B parallel (4 km), HMT-Ens Mean (9 km), NAM (12 km), GFS (0.5 deg)]
  • Meso- and fine-scale models tended to have higher median Gilbert Skill Scores than GFS for extreme precipitation events. Differences appear statistically significant at hours 18-30 and 66.
  • Including Parallel Runs in Testbed Evaluations: DTC testing of NMM-B parallel runs provided additional confidence (beyond EMC routine pre-implementation testing) and helped push forward an Oct. 2011 implementation of the NMM-B core.
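The Gilbert Skill Score (equitable threat score) used in these comparisons is computed from a 2x2 contingency table of forecast and observed exceedances of the precipitation threshold. The sketch below shows the standard formula; it is a generic illustration (with made-up example counts), not the MET code used for the evaluation.

```python
def gilbert_skill_score(hits, misses, false_alarms, correct_negatives):
    """Gilbert Skill Score (Equitable Threat Score) from a 2x2 contingency table
    of threshold exceedances (e.g., 6-hr accumulated precipitation > 1 inch).
    GSS = (hits - hits_random) / (hits + misses + false_alarms - hits_random),
    where hits_random is the number of hits expected by chance."""
    total = hits + misses + false_alarms + correct_negatives
    hits_random = (hits + misses) * (hits + false_alarms) / total
    return (hits - hits_random) / (hits + misses + false_alarms - hits_random)

# Hypothetical example: 40 hits, 25 misses, 30 false alarms, 905 correct negatives
# gives a GSS of about 0.39 (values near 0 indicate little skill; 1 is perfect).
```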

  13. Model Comparisons from HMT-West 2012
  [Figures: Gilbert Skill Score (or ETS) and Area Under ROC Curve vs lead time for 10-, 9-, and 21-member ensembles, with “Optimal”, “Better”, and “No Skill” reference markings]
  • Gilbert Skill Score – ability to forecast a given amount. 6-hr AccumPrecip > 1”: all scores are low, partially due to sample size, but SREF (32 km) shows very little skill whereas the HMT & AFWA (3 & 4 km) ensembles can score as high as 0.3.
  • Area Under ROC – ability to discriminate between event/non-event. Prob(6-hr AccumPrecip > 1”): all scores are low at 6-hr lead time; there are differences in the median AFWA and SREF values at 12-hr leads that may be significant.
  • Beyond higher resolution: different initialization sources (AFWA) and methods (HMT and AFWA) may prove useful for the next-generation ensemble system. Select innovations from HMT-West will be tested by the DTC during the coming year.
  see P55 - Jensen et al.
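The area under the ROC curve summarizes how well the ensemble-derived probabilities discriminate events from non-events. One compact way to compute it, shown below as an illustration only (not MET's implementation), uses the rank-sum relation: the AUC equals the probability that a randomly chosen event case receives a higher forecast probability than a randomly chosen non-event case.

```python
import numpy as np

def roc_area(probs, obs):
    """Area under the ROC curve for probability forecasts of a binary event,
    via the Mann-Whitney relation; ties count as one half.
    probs: forecast probabilities in [0, 1]; obs: 0/1 observed outcomes."""
    probs, obs = np.asarray(probs, float), np.asarray(obs, int)
    event, nonevent = probs[obs == 1], probs[obs == 0]
    greater = (event[:, None] > nonevent[None, :]).sum()
    ties = (event[:, None] == nonevent[None, :]).sum()
    return (greater + 0.5 * ties) / (event.size * nonevent.size)

# 0.5 corresponds to the "No Skill" diagonal; 1.0 is perfect discrimination.
```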

  14. HFIP GSI-Hybrid Data Assimilation Test: P7 - Zhou
  [Figure: track forecasts for No DA, GSI 3DVAR, and GSI-Hybrid compared with the Best Track]
  • GSI-Hybrid using the global ensemble improved the Bret track forecast.

  15. Summary & Outlook • The DTC is a community facility with a mission to: • Accelerate the transition of new NWP technology into operations • Maintain and support community modeling systems for research and operational NWP communities • Facilitate the interaction between research and operational NWP • The DTC seeks input from the community through: • Participation in DTC Testing and Evaluation activities (e.g., MMET): Funding is available for off-cycle visitor proposal • Suggestions for new DTC T&E activities • Defining future direction of the DTC through the DTC Science Advisory Board (Cliff Mass is the chair of DTC SAB)

  16. THANK YOU! http://www.dtcenter.org/
