DET: Testing and Evaluation Plan

Barbara Brown¹, Ed Tollerud², and Tara Jensen¹
¹ NCAR/RAL, Boulder, CO and DTC
² NOAA/GSD, Boulder, CO and DTC
Wally Clark


Presentation Transcript


  1. DET: Testing and Evaluation Plan
     Barbara Brown¹, Ed Tollerud², and Tara Jensen¹
     ¹ NCAR/RAL, Boulder, CO and DTC
     ² NOAA/GSD, Boulder, CO and DTC
     Wally Clark

  2. DTC and DET Testing and Evaluation
     • T&E is one of the most important activities undertaken by the DTC
     • DTC testing has involved WRF core comparisons, boundary layer schemes, and other aspects of NWP
     • DTC has created “Reference Configurations” (RCs) that are to be re-tested in conjunction with model changes
     • DET infrastructure is being developed to allow
         • Testing and evaluation and
         • Intercomparison of ensemble systems and system components

  3. Major categories of testing
     • Forecasting system comparisons
         • Compare forecasts based on one configuration with forecasts based on a different model configuration
         • Examples
             • Two types of model initialization
             • Two or more methods of statistical post-processing
     • Individual reference configuration
         • Model “setup” is evaluated
         • Setup is re-evaluated when model changes are implemented
         • Reference configurations may be defined by
             • Operational centers
             • Users
         • RCs may also be community-contributed
     • Forecasts contributed by a modeling group
         • Ex: Forecasts evaluated in HWT and HMT projects
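The comparison of two configurations over many cases can be sketched as a paired bootstrap on the per-case error difference. This is only an illustration, not the DTC's procedure: the function name `paired_bootstrap_diff`, the choice of absolute error as input, and the percentile interval are all assumptions made for the example.

```python
import random
from statistics import mean


def paired_bootstrap_diff(errors_a, errors_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired error difference (A - B).

    errors_a and errors_b hold one error value per case (for example,
    absolute error of surface temperature) for two hypothetical model
    configurations, verified on the same cases in the same order.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("both configurations must be verified on the same cases")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample cases with replacement and record the mean difference each time.
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = boot_means[int((alpha / 2) * n_resamples)]
    upper = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper
```

If the returned interval excludes zero, the two configurations differ at roughly the chosen level for this sample of cases. Pairing by case matters because both configurations see the same weather, so much of the case-to-case variability cancels in the difference.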

  4. DTC Testing and Evaluation Principles
     • A formal test plan is developed, defining all of the important aspects of the testing and evaluation
         • Developer may have a role in helping to create the test plan
         • Execution of test is independent of the developer
     • Focus of test depends on the questions that are of interest
         • Module being used
         • Variables of interest
     • Many cases evaluated for statistical significance
         • Not just a few case studies
         • Multiple seasons, times of day, etc.
     • Meaningful stratifications
         • Location/region
         • Season
         • Other user-based criteria
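Stratification by region and season can be sketched as a simple grouping step before scores are computed. The record layout (`region`, `season`, `forecast`, `observed` keys) and the function name `stratified_rmse` are hypothetical, chosen only for this example; RMSE stands in for whatever measure the test plan specifies.

```python
from collections import defaultdict
from math import sqrt


def stratified_rmse(cases, keys=("region", "season")):
    """Group forecast/observation pairs by stratum and score each group.

    Each case is a dict with the stratification keys plus "forecast" and
    "observed" values. The per-stratum sample size is returned alongside
    the score so that thinly populated strata are easy to spot.
    """
    groups = defaultdict(list)
    for case in cases:
        stratum = tuple(case[k] for k in keys)
        groups[stratum].append(case["forecast"] - case["observed"])
    return {
        stratum: {"n": len(errs), "rmse": sqrt(sum(e * e for e in errs) / len(errs))}
        for stratum, errs in groups.items()
    }
```

Reporting `n` with every score is a deliberate choice here: a stratification is only meaningful if each stratum still holds enough cases to support a significance statement.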

  5. Components of a test plan (example)
     • Goals
     • Experiment design
     • Codes
         • Specification of the codes that will be run as part of the test
     • Model output
         • What kinds of output will be produced?
     • Forecast periods
     • Post-processing
     • Verification
         • Statistical methods and measures
     • Graphics generation and display
     • Data archival and dissemination of results
     • Computer resources
     • Deliverables
     Example from QNSE evaluation (surface T and wind)
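One way to keep track of these components is a small record type with one field per component and a check for anything still unfilled. The class `EvaluationPlan`, its field names, and the field types are assumptions for illustration; they mirror one reading of the slide's list and do not represent an actual DTC schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class EvaluationPlan:
    """Hypothetical container mirroring the test-plan components listed above."""

    goals: list = field(default_factory=list)
    experiment_design: str = ""
    codes: list = field(default_factory=list)             # codes run as part of the test
    model_output: list = field(default_factory=list)      # kinds of output produced
    forecast_periods: list = field(default_factory=list)
    post_processing: str = ""
    verification: list = field(default_factory=list)      # statistical methods and measures
    graphics: str = ""
    archival_and_dissemination: str = ""
    computer_resources: str = ""
    deliverables: list = field(default_factory=list)

    def missing_components(self):
        """Return the names of components that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

Calling `missing_components()` on a draft plan gives a quick checklist of what remains to be specified before the test is executed.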

  6. Questions to address when developing a test plan
     • Which aspect(s) (or modules) of the ensemble system will be evaluated?
     • What performance aspects are we trying to compare or evaluate?
     • Who are the “users”?
     • What are the variables of interest?
     Answers to these questions will determine the other aspects of the plan

  7. Considerations for ensemble T&E
     • Number of cases will likely need to be increased (over non-ensemble evaluations)
         • Many probabilistic and ensemble verification scores (e.g., reliability) require relatively large subsamples
         • Subsamples must be large enough to assess statistical significance
         • But sampling must be focused enough for representativeness
     • Verification approaches and metrics are somewhat unique
     • Computer resources may be a limitation
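The sample-size point for reliability can be made concrete with a minimal sketch. The helper names `exceedance_probability` and `reliability_table`, the equal-width binning, and the default of five bins are assumptions for this example, not a description of DET software. Each probability bin needs its own share of cases before its observed frequency means anything, which is why the bin count `n` is reported.

```python
def exceedance_probability(members, threshold):
    """Forecast probability as the fraction of ensemble members above a threshold."""
    return sum(1 for m in members if m > threshold) / len(members)


def reliability_table(probs, outcomes, n_bins=5):
    """Bin forecast probabilities and compare them with observed frequencies.

    probs are forecast probabilities in [0, 1]; outcomes are 1 if the event
    occurred and 0 otherwise. Empty bins are omitted from the result.
    """
    bins = [{"n": 0, "prob_sum": 0.0, "events": 0} for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[i]["n"] += 1
        bins[i]["prob_sum"] += p
        bins[i]["events"] += o
    return [
        {
            "bin": i,
            "n": b["n"],
            "mean_forecast": b["prob_sum"] / b["n"],
            "observed_freq": b["events"] / b["n"],
        }
        for i, b in enumerate(bins)
        if b["n"] > 0
    ]
```

A reliable system has `observed_freq` close to `mean_forecast` in every bin. With few cases, most bins hold only a handful of forecasts, so the observed frequencies are too noisy to judge, and stratifying the sample thins the bins further.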

  8. Other considerations
     • Real-time vs. post-analysis
         • DTC intensive tests are generally done in post-analysis
         • Real-time demonstrations also have many benefits (e.g., HMT, HWT)
     • Subjective evaluations: should these be considered for DET T&E?
     • How much rigorous end-to-end testing is required vs. evaluation of individual components?
     Example for HMT evaluation, winter 2010
