
SPoRT-MET Scripting Package Tutorial for Regional NWP Model Verification

Learn how to effectively evaluate regional numerical weather prediction (NWP) model performance using the SPoRT-MET scripting package. This tutorial covers model evaluation tools, statistical evaluation, plotting scripts, and more.


Presentation Transcript


  1. Regional Numerical Weather Prediction (NWP) Modeling and Verification Workshop, Part 5: SPoRT-MET Scripting Package Tutorial. Jonathan Case (ENSCO, Inc. / NASA Short-term Prediction Research and Transition [SPoRT] Center)

  2. SPoRT's Model Evaluation Tools (MET) Scripting Package Tutorial. Prepared by Jonathan Case1, Bradley Zavodsky2, and Jayanthi Srikishen3; 1ENSCO, Inc., 2NASA/MSFC, and 3Universities Space Research Association; Short-term Prediction Research and Transition (SPoRT) Center, NASA/SERVIR

  3. Purpose of SPoRT MET Scripts • Effectively evaluating model performance requires a combination of quantitative metrics and case studies • SPoRT transitions techniques, data, and products • SPoRT values transitioning capabilities that enable its partners to perform evaluations that support forecaster-led conference presentations and journal articles • SPoRT directly interacts with U.S. National Weather Service (NWS) forecasters

  4. Purpose of SPoRT MET Scripts • Model Evaluation Tools (MET) is a software package developed by NCAR that contains a number of executable programs that will: • Reformat observations • Match the model grid to observations • Perform statistical evaluation • MET has a steep learning curve due to its large number of components • The SPoRT scripts address this with dynamic scripts to easily run the software, open-source plotting scripts to visualize statistics, and preparation of observations from PREPBUFR files

  5. SPoRT MET Scripts Contents • Once unzipped and untarred, a number of directories, Python scripts (*.py), and namelist.met files appear • The namelist is modified to define the variables / statistics to be generated; users should only edit namelist.met files in the configFiles/ directory • Scripts run the MET workflow; users only modify select fields in the headers of the workflow scripts • Modules within the utils/ folder contain code used by multiple scripts • Sub-directories contain documentation, templates used by the scripts, or serve as placeholders where the scripts output data • Users must download and compile the MET software (http://www.dtcenter.org/met/users/downloads/index.php) prior to running the scripts (v5.1 as of August 2016) • This presentation contains an overview of the various components of the scripts

  6. Model Verification Process with SPoRT-MET • Make UEMS model runs, archiving files in GRIB2 format with an appropriate naming convention • Acquire observational data • Surface and upper-air point observations • Select precipitation grids acquired automatically • Match forecast grids to observations for each run • Generate error statistics in various ways: • An individual model run, or summary of multiple runs • Two or more experiments (e.g., sensitivity simulations) • Entire grid, or select geographical subset(s), or select station(s) • Plot results

  7. Model Verification Process with SPoRT-MET • Make UEMS model runs, archiving files in GRIB2 format with an appropriate naming convention • Acquire observational data • Surface and upper-air point observations • Select precipitation grids acquired automatically • Match forecast grids to observations for each run • Generate error statistics in various ways: • An individual model run, or summary of multiple runs • Two or more experiments (e.g., sensitivity simulations) • Entire grid, or select geographical subset(s), or select station(s) • Plot results

  8. Acquire point observations: pb2nc_loop.sh • This script manages the acquisition and pre-processing of surface and upper-air observations for verification • Calls two Python scripts [runSPoRTMETScripts_pb2nc.py & obtainObservations.py] that download and pre-process the data • NOTE: Users must register an account at rda.ucar.edu, then edit obtainObservations.py, inserting the login information on lines 374-375 • Output consists of hourly surface & upper-air observations in NetCDF format for use in the MET programs • NOTE: Processing takes a long time!

  9. Scripts: obtainObservations.py • Manages MET data pre-processing utilities, including ASCII2NC and PB2NC • Automatically accesses the MADIS FTP server (U.S. only) or rda.ucar.edu (for PREPBUFR) to obtain files for each case study date and hour • Requires a MADIS account and an edited .netrc file • PREPBUFR requires an account at rda.ucar.edu • Runs ASCII2NC or PB2NC to create the NetCDF files used by Point Stat (output in the pointData directory) • sfcobs_YYYYMMDD_HHHH.nc • upperobs_YYYYMMDD_HHHH.nc • Also automatically downloads precipitation and gridded analysis data (the latter not currently supported)
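
As a quick illustration of the pointData naming convention above, the NetCDF file names can be reconstructed from the valid time. This is a hedged sketch only (that HHHH is hour-and-minute is an assumption), not the actual logic inside obtainObservations.py:

    # Sketch of the sfcobs_/upperobs_ naming convention (HHHH = hour+minute assumed).
    from datetime import datetime

    valid = datetime(2016, 8, 15, 12)           # hypothetical valid time
    stamp = valid.strftime("%Y%m%d_%H%M")       # -> "20160815_1200"
    sfc_file = f"pointData/sfcobs_{stamp}.nc"
    upper_file = f"pointData/upperobs_{stamp}.nc"
    print(sfc_file, upper_file)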

  10. Model Verification Process with SPoRT-MET • Make UEMS model runs, archiving files in GRIB2 format with an appropriate naming convention • Acquire observational data • Surface and upper-air point observations • Select precipitation grids acquired automatically • Match forecast grids to observations • Generate error statistics in various ways: • An individual model run, or summary of multiple runs • Two or more experiments (e.g., sensitivity simulations) • Entire grid, or select geographical subset(s), or select station(s) • Plot results

  11. Match forecast grid to observations: runSPoRTMETScripts_multiExperiments.py • This script interpolates forecast grids to point observation locations and/or observed precipitation grids, and calculates differences • Contains a series of TRUE or FALSE statements that are read from namelist.met.multiExperiments to determine which parts of MET will be run (e.g., sfc, upper-air, and/or precipitation) • Uses utils/readfiles.py to extract the information needed to run the scripts

  12. Match forecast grid to observations, cont.: runSPoRTMETScripts_multiExperiments.py • Calls obtainObservations.py to auto-acquire precipitation grids • Calls runPointStat.py, runPointStatAnalysis.py, runGridStat.py, and/or runGridStatAnalysis.py, depending on the entries in namelist.met.multiExperiments • This script is recommended for batch-processing a series of forecast runs and/or multiple experiments • With all options turned on, the script will produce overall errors for each individual model run by forecast hour
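
To make the TRUE/FALSE control flow concrete, here is a minimal sketch of how namelist flags can drive which MET steps run; the names and structure are illustrative assumptions, not the actual internals of runSPoRTMETScripts_multiExperiments.py:

    # Illustrative only: dispatch MET steps based on boolean namelist flags.
    flags = {"RunPointStat": True, "RunGridStat": True,
             "RunPointStatAnalysis": True, "RunGridStatAnalysis": False}

    def point_stat():          print("would call runPointStat.py")
    def grid_stat():           print("would call runGridStat.py")
    def point_stat_analysis(): print("would call runPointStatAnalysis.py")
    def grid_stat_analysis():  print("would call runGridStatAnalysis.py")

    steps = {"RunPointStat": point_stat, "RunGridStat": grid_stat,
             "RunPointStatAnalysis": point_stat_analysis,
             "RunGridStatAnalysis": grid_stat_analysis}

    for name, func in steps.items():
        if flags.get(name, False):   # run only the steps switched on in the namelist
            func()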

  13. Model Verification Process with SPoRT-MET • Make UEMS model runs, archiving files in GRIB2 format with an appropriate naming convention • Acquire observational data • Surface and upper-air point observations • Select precipitation grids acquired automatically • Match forecast grids to observations for each model run • Generate error statistics in various ways: • An individual model run, or summary of multiple runs • Two or more experiments (e.g., sensitivity simulations) • Entire grid, or select geographical subset(s), or select station(s) • Plot results

  14. Produce summary of composite results: runSPoRTMETScripts_aggregate.py • This script aggregates errors (generates overall results) across multiple model runs, using the *.stat files produced by runSPoRTMETScripts_multiExperiments.py • Contains a series of TRUE or FALSE statements that are read from namelist.met.aggregate to determine which parts of MET will be run (e.g., sfc, upper-air, and/or precipitation) • Uses utils/readfiles.py to extract the information needed to run the scripts

  15. Produce summary of composite results, cont.: runSPoRTMETScripts_aggregate.py • Calls runPointStatAnalysis_aggregate.py and/or runGridStatAnalysis_aggregate.py, depending on the entries in namelist.met.aggregate • This script is recommended for summarizing forecast error results over any desired number of forecasts • The script will produce overall errors for multiple consecutive model runs by forecast hour • This script is also recommended for creating plots to visualize results, whether plotting a single model run or aggregated results

  16. Individual script components: Python scripts called by the master run scripts described above

  17. Scripts: runPointStat.py • Red Point Stat circle under Statistics in the workflow • Runs Point Stat to interpolate gridded fields to observation locations (e.g., METARs and RAOBs) and generate text files to be read into Stat Analysis (temporary output in pointStatOutput/; archived in OUTPUT/pointStatOutput/YYYYMMDDCC/). Files produced are of the form: • point_stat_EXP_d##_VAR_LEV_FF0000L_YYYYMMDD_VV0000V.stat • EXP = experiment name • d## = WRF domain number (e.g., d01) • VAR = variable (e.g., TMP or DPT) • LEV = surface (sfc) or pressure level (PPPmb) • FF = forecast lead hour • YYYYMMDD = valid year, month, & day • VV = valid hour
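
For example, a name following that pattern can be unpacked with a simple split; this assumes the experiment name itself contains no underscores, and the file name used here is hypothetical:

    # Hypothetical file name following the convention above.
    fname = "point_stat_EXP1_d01_TMP_sfc_240000L_20160815_000000V.stat"
    fields = fname[len("point_stat_"):-len(".stat")].split("_")
    exp, dom, var, lev, lead, vdate, vtime = fields
    print(exp, dom, var, lev, lead[:2], vdate, vtime[:2])
    # -> EXP1 d01 TMP sfc 24 20160815 00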

  18. Scripts: runPointStatAnalysis.py • Red Stat Analysis circle under Analysis in the workflow • Uses the *.stat files output from Point Stat to generate statistics comparing the model output to the observations (temporary output in pointStatAnalysisOutput/; archived in OUTPUT/pointStatOutput/YYYYMMDDCC/) • LOC_VAR_POLY_LEV_EXP_d##_MPR.dat • LOC = surface (sfc) or upper air (upa) • VAR = variable • POLY = verification subdomain (e.g., 'USER') • LEV = surface (sfc) or pressure level (PPPmb) • EXP = experiment name • d## = WRF domain number (e.g., d01) • All forecast hours are concatenated into a space-delimited .dat file

  19. Scripts: runPointStatAnalysis_aggregate.py • Same as runPointStatAnalysis.py, except designed to aggregate over a series of consecutive daily forecasts (e.g., 7 or 30 days to examine weekly or monthly errors) • Uses the *.stat files output from multiple forecasts to generate statistics (temporary output in pointStatAnalysisOutput/; archived in OUTPUT/pointStatOutput/YYYYMMDDCC_#days/) • LOC_VAR_POLY_LEV_EXP_d##_MPR.dat • LOC = surface (sfc) or upper air (upa) • VAR = variable • POLY = verification subdomain (e.g., 'USER') • LEV = surface (sfc) or pressure level (PPPmb) • EXP = experiment name • d## = WRF domain number (e.g., d01) • #days = number of days of overall results • All forecast hours are concatenated into a space-delimited .dat file

  20. Scripts: runGridStat.py • Red GenVxMask, PCP Combine, and Regrid Data Plane circles under Reformat; Grid Stat circle under Statistics • Reprojects forecast grids to the observed grids and produces files of the differences (temporary output in gridStatOutput/; archived in OUTPUT/gridStatOutput/YYYYMMDDCC/) • grid_stat_EXP_d##_VAR_FF0000L_YYYYMMDD_VV0000V_*.stat • EXP = experiment name • VAR = variable name (e.g., APCP_06 for 6-h accumulated precipitation) • FF = forecast lead hour • YYYYMMDD = valid year, month, and day • VV = valid hour • d## = WRF domain number (e.g., d01) • Currently supports international satellite precipitation grids: CMORPH, IMERG-Late, and IMERG-Final

  21. Scripts: runGridStatAnalysis.py • Red Stat Analysis circle under Analysis in the workflow • Uses the *.stat files output from Grid Stat to generate statistics comparing the model output to the gridded verification dataset (temporary output in gridStatAnalysisOutput/; archived in OUTPUT/gridStatOutput/YYYYMMDDCC/) • VAR_POLY_sfc_EXP_d##_THRESH_NBHOOD_gtPCTCOV.dat • VAR = variable (e.g., APCP06) • POLY = verification subdomain • EXP = experiment name • THRESH = precipitation threshold (e.g., 5mm) • NBHOOD = number of surrounding grid points in the neighborhood box (e.g., a 7 x 7 box = 49 pts) • PCTCOV = fractional coverage in the neighborhood box • d## = WRF domain number (e.g., d01) • All forecast hours are concatenated into a space-delimited file

  22. Scripts: runGridStatAnalysis_aggregate.py • Same as runGridStatAnalysis.py, except designed to aggregate over a series of consecutive daily forecasts (e.g., 7 or 30 days to examine weekly or monthly errors) • Uses the *.stat files output from multiple forecasts to generate statistics (temporary output in gridStatAnalysisOutput/; archived in OUTPUT/gridStatOutput/YYYYMMDDCC_#days/) • VAR_POLY_sfc_EXP_d##_THRESH_NBHOOD_gtPCTCOV.dat • VAR = variable (e.g., APCP06) • POLY = verification subdomain • EXP = experiment name • THRESH = precipitation threshold (e.g., 5mm) • NBHOOD = number of surrounding grid points in the neighborhood box (e.g., a 7 x 7 box = 49 pts) • PCTCOV = fractional coverage in the neighborhood box • d## = WRF domain number (e.g., d01) • #days = number of days of overall results • All forecast hours are concatenated into a space-delimited file

  23. Scripts: makePlots.py and plotObs.py • Open-source plotting scripts to visualize the statistics • At the very minimum, the user must install the Python modules numpy and matplotlib • makePlots.py uses the concatenated *.dat files from stat_analysis to generate plots of the statistics (output in plotOutput/) • Recommended for a quick look at results; edits may be needed to the utils/plotgenerate.py routines to produce publication-quality graphics • plotObs.py queries all .stat files to extract the lat/lon of every unique ob and plots all points by data type (i.e., surface / upper-air / ship or buoy)
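
If you prefer to roll your own quick-look plot rather than use makePlots.py, a minimal numpy/matplotlib sketch is shown below; the file name and the column layout of the .dat file are assumptions and must be checked against your own stat_analysis output:

    # Minimal quick-look plot from a space-delimited .dat file (illustrative).
    # Column indices below are ASSUMED; inspect your *.dat file before using.
    import numpy as np
    import matplotlib.pyplot as plt

    data = np.loadtxt("sfc_TMP_USER_sfc_EXP1_d01_MPR.dat", usecols=(0, 1))
    fhr, err = data[:, 0], data[:, 1]   # assumed: col 0 = forecast hour, col 1 = error

    plt.plot(fhr, err, marker="o", color="red", label="EXP1 d01")
    plt.xlabel("Forecast hour")
    plt.ylabel("2-m temperature error (K)")
    plt.legend()
    plt.savefig("quicklook_TMP_error.png", dpi=150)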

  24. Set up scripts and namelist.met files for acquiring point observations: runSPoRTMETScripts_pb2nc.py, namelist.met.conf.pb2nc, and pb2nc_loop.sh

  25. runSPoRTMETScripts_pb2nc.py • Edit the verifyInitHH, startFH, stopFH, and fint fields • verifyInitHH: 2-digit model initialization hour • startFH: usually 0 (forecast hour zero, or initialization) • stopFH: usually the last forecast hour of the model run, but the user can test over any range of forecast hours • fint: output frequency of the UEMS GRIB2 files • Other fields to edit depend on the user-specific setup • experimentList/experimentName: leading text string on the exported UEMS GRIB2 files (consistent with the UEMS setup) • modelOutputDir: archive directory for each experiment • lowerLat/upperLat/leftLon/rightLon: approximate bounds of the model grids for d01 and (if a nested grid) d02 • Note: SPoRT-MET currently supports only a 2-domain nested grid setup
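
As a hypothetical example of those header edits (the values below are placeholders for a 48-h run with hourly output, not defaults shipped with the scripts):

    # Hypothetical header values; edit to match your own UEMS setup.
    verifyInitHH = "00"      # 2-digit model initialization hour
    startFH      = 0         # first forecast hour to verify
    stopFH       = 48        # last forecast hour of the model run
    fint         = 1         # output frequency (h) of UEMS GRIB2 files

    experimentList = ["exp1"]               # leading text string on UEMS GRIB2 files
    modelOutputDir = "/data/uems/archive"   # archive directory for each experiment
    lowerLat, upperLat = -12.0, 18.0        # approximate d01 bounds (example values)
    leftLon,  rightLon = 22.0, 52.0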

  26. namelist.met.conf.pb2nc • The only namelist block (section beginning with ‘&’) to edit is &DirectoryInfo: • RunDir: SPoRT-MET scripts top-level directory • EMSHomeDir: Installation directory of the UEMS (echo $UEMS) • METDir: Installation directory of MET software (top directory) • All other sections of namelist.met.conf.pb2nc are already set up for running pb2nc, and/or appropriate fields will be set up by runSPoRTMETScripts_pb2nc.py
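
An illustrative &DirectoryInfo block is shown below; the paths are hypothetical, and the exact value syntax should be checked against the template in configFiles/:

    &DirectoryInfo
      RunDir     = /home/user/SPoRT-MET
      EMSHomeDir = /usr1/uems
      METDir     = /usr/local/met-5.1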

  27. pb2nc_loop.sh • Edit startdate and stopdate to set the range of dates for acquiring point observations • The script then calls runSPoRTMETScripts_pb2nc.py to run the data acquisition through MET's pb2nc program • The script is currently set up to acquire observations for a 2-day (48-h) forecast, so it increments by 2 days • To run, type: ./pb2nc_loop.sh, or submit the job in the background, since it will take a LONG time!
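
The 2-day stepping is the only logic in the loop; here is an equivalent sketch in Python (the real loop lives in the Bash script, and the dates used here are examples):

    # Python illustration of the 2-day date stepping in pb2nc_loop.sh.
    from datetime import datetime, timedelta

    startdate = datetime(2016, 8, 1)    # example start date
    stopdate  = datetime(2016, 8, 31)   # example stop date

    current = startdate
    while current <= stopdate:
        print("acquire obs for the run initialized", current.strftime("%Y%m%d"))
        current += timedelta(days=2)    # 48-h forecasts, so step by 2 days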

  28. Set up scripts and namelist.met files for matching forecasts with observations and computing differences: namelist.met.conf.multiExperiments and runSPoRTMETScripts_multiExperiments.py

  29. namelist.met.conf.multiExperiments • &DirectoryInfo: defines software directories • RunDir: SPoRT-MET scripts top-level directory • EMSHomeDir: Installation directory of the UEMS (echo $UEMS) • METDir: Installation directory of MET software (top directory) • &ModelDomain: (substituted by script) • &ForecastInfo: (substituted by script)

  30. namelist.met.conf.multiExperiments, cont. • &ObservationInfo: controls acquisition of observations • ObtainMADIS: only applies to U.S. MADIS mesonet data • ObtainPrepBUFR: (substituted by the script; set to TRUE only on the first model grid, d01, to acquire data) • Set the Use* variables to TRUE to use a dataset; set to FALSE to exclude it from verification • The TimeRange* variables tell MET to match up observations that fall within ±n minutes of the forecast valid time (15 for surface; 30 for upper air) • Useful for stations that do not always report exactly at the top of the hour when the forecasts are valid • Set the *QCBounds for each variable to the lower and upper bounds of realistic observations for the elevations / time of year being verified • Set the ObtainPrecipitation variable to TRUE to obtain precipitation observations; set it to FALSE if only doing point verification or if obtaining the precipitation data manually
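
To make the ±n-minute matching concrete, the test MET effectively applies looks like the simplified sketch below (the actual matching happens inside Point Stat):

    # Simplified illustration of the TimeRange observation-matching window.
    from datetime import datetime, timedelta

    def within_window(ob_time, valid_time, window_minutes):
        # True if the observation falls within +/- window_minutes of the valid time.
        return abs(ob_time - valid_time) <= timedelta(minutes=window_minutes)

    valid = datetime(2016, 8, 15, 12, 0)
    print(within_window(datetime(2016, 8, 15, 11, 50), valid, 15))  # True  (surface)
    print(within_window(datetime(2016, 8, 15, 11, 20), valid, 30))  # False (upper air)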

  31. namelist.met.conf.multiExperiments, cont. • &ObservationInfo, cont.: • GriddedPrecipitationVerificationAnalysis: currently supports STIV, CMORPH, IMERGFNL, or IMERGL • PrecipitationFormat: NATIVE or GRIB2 • Applies only to CMORPH (NATIVE or GRIB2) or IMERGL (GRIB2 only) • Currently using IMERGL in GRIB2 format from SPoRT's FTP server • PrecipitationRegion: subset name of CMORPH or IMERGL when using GRIB2 files; set to "africa" for East Africa verification • ObtainGrids: set to TRUE to verify against a larger-scale analysis product (e.g., GFS or ECMWF analyses); [not currently supported for global grids] • GriddedVerificationModel: model name (e.g., GFS); [not currently supported]

  32. namelist.met.conf.multiExperiments, cont. • &METInfo: high-level control over running MET components • Run*: set to TRUE to run each component of the MET package; set to FALSE to skip selected components (unless testing, these should all be TRUE) • RunAggregate: set to FALSE for the multiExperiments script • NumDaysAggregate: list of days, separated by commas, over which to aggregate statistics; does not apply to the multiExperiments script • PressureLevels: list of upper-air pressure levels (in hPa), separated by commas (no spaces), to be verified by Point Stat and/or Grid Stat • VerificationRegions: pre-defined [NCEP] verification region on which to verify; USER for a user-defined domain • NOTE: New custom verification regions (simple lat/lon pairs in ASCII files) can be added to the MET software in $MET/data/poly; just examine the sample *.poly files for examples. They can be created easily from shapefiles! • UserVerify*: lower-left and upper-right corners of the user-defined verification grid (substituted by the script; same as the lat/lon values in the &ModelDomain block)
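
For instance, a new region can be generated programmatically and dropped into $MET/data/poly. In this hedged sketch the region name and coordinates are hypothetical, and the exact file layout should be compared against the sample *.poly files mentioned above:

    # Write a simple custom verification region as an ASCII polygon file.
    # Coordinates are hypothetical; compare with the sample *.poly files first.
    region_name = "EAST_AFRICA_SUB"
    corners = [(-5.0, 30.0), (-5.0, 45.0), (10.0, 45.0), (10.0, 30.0)]  # (lat, lon)

    with open(f"{region_name}.poly", "w") as f:
        f.write(region_name + "\n")
        for lat, lon in corners:
            f.write(f"{lat:.2f} {lon:.2f}\n")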

  33. namelist.met.conf.multiExperiments, cont. • &PointStatInfo: provides the information to run Point Stat • UseVerifySurfacePoint/UpperPoint: set to TRUE to verify against surface and/or upper-air observations, respectively • Surface/UpperPointVerificationVariables: GRIB table variable names for the variables on which to perform verification (e.g., Surface: TMP,DPT,WIND,PRMSL; Upper: TMP,DPT,WIND) • VerticalObsWindow: vertical pressure range (hPa) over which upper-air observations will be accepted for forecast matching • StatsFilter: easiest to just set this to MPR for now
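
An illustrative &PointStatInfo block follows; the expanded field names, the window value, and the value syntax are assumptions based on the descriptions above and should be checked against the template in configFiles/:

    &PointStatInfo
      UseVerifySurfacePoint = TRUE
      UseVerifyUpperPoint   = TRUE
      SurfacePointVerificationVariables = TMP,DPT,WIND,PRMSL
      UpperPointVerificationVariables   = TMP,DPT,WIND
      VerticalObsWindow = 25
      StatsFilter = MPR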

  34. namelist.met.conf.multiExperiments, cont. • &GridStatInfo: provides the information to run Grid Stat • NeighborhoodGridBox: comma-separated list (no spaces) that gives the widths of the neighborhood grids over which verification is performed • Must be an odd number • If set to 1, only grid-point-to-grid-point matching is done • PercentCoverage: the percentage of pixels in the neighborhood box where the forecast must meet the threshold criterion to count as a "hit"; also used to determine fractional skill scores (see the MET presentation) • UseVerify*: set to TRUE to verify against precipitation and/or surface/upper-air gridded analyses
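
To see how the neighborhood width and percent coverage interact, here is a simplified numpy sketch of the fractional coverage computed for one neighborhood box (Grid Stat does this over the whole grid; the precipitation values here are synthetic):

    # Simplified fractional-coverage check for a single neighborhood box.
    import numpy as np

    nbhood_width = 7            # must be odd; a 7 x 7 box = 49 points
    threshold_mm = 5.0          # precipitation threshold
    pct_coverage = 0.5          # "hit" if >= 50% of the box exceeds the threshold

    box = np.random.gamma(2.0, 3.0, size=(nbhood_width, nbhood_width))  # synthetic precip
    frac = np.mean(box >= threshold_mm)   # fraction of box points meeting the threshold
    print(f"fractional coverage = {frac:.2f}, hit = {frac >= pct_coverage}")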

  35. namelist.met.conf.multiExperiments, cont. • &GridStatInfo, cont.: • AccumulatedPrecipitationHours: list of comma-separated accumulated precipitation hours (totals must be >= forecastVerifyInterval) • AccumulationMethod: STAGGER or OVERLAP • If STAGGER, then stats will be produced at time intervals equal to AccumulatedPrecipitationHours (e.g., 12-h accumulated precip every 12 hours, at hours 12, 24, 36, etc.) • If OVERLAP, then precip stats will be produced in overlapping time windows equal to forecastVerifyInterval, the output frequency of the UEMS GRIB2 files (e.g., 12-h accumulated precip every hour, at hours 12, 13, 14, etc.). NOTE: This option produces considerably more precip output statistic files, but nice plots! • PrecipitationThresholds: list of comma-separated precipitation threshold intensities (in mm) • Surface/UpperGridVerificationVariables: GRIB table variable name of the variable on which to verify (same as in &PointStatInfo)
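
A small sketch of the difference in valid hours between the two accumulation methods, using 12-h accumulations, hourly output (fint = 1), and a 48-h run as example values:

    # Illustrative valid forecast hours for 12-h accumulated precipitation.
    accum_hours, fint, stop_fh = 12, 1, 48

    stagger = list(range(accum_hours, stop_fh + 1, accum_hours))  # [12, 24, 36, 48]
    overlap = list(range(accum_hours, stop_fh + 1, fint))         # [12, 13, 14, ..., 48]
    print("STAGGER:", stagger)
    print("OVERLAP:", overlap[:4], "...", overlap[-1])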

  36. namelist.met.conf.multiExperiments, cont. • &PlottingInfo: produces quick-look plots using Python's matplotlib plotting utility • MakePlots: set to TRUE to generate plots, or FALSE to make your own plots from the ASCII *.dat output files (e.g., import into Excel) • PlotStationLocations: set to TRUE to generate plots, on each model grid, of the stations used to produce the verification statistics • ContinuousPlotStatistics: point error statistics to plot • PrecipitationPlotStatistics: precipitation statistics to plot • PlotColors: color of the line(s) for each forecast (in the same order as the ExperimentNames variable under &ForecastInfo, defined in runSPoRTMETScripts_multiExperiments.py; next slide)

  37. runSPoRTMETScripts_multiExperiments.py • The script requires 3 input arguments • verifyInitYYYY: 4-digit year of model initialization • verifyInitMM: 2-digit month of model initialization • verifyInitDD: 2-digit day of model initialization • Edit the verifyInitHH, startFH, stopFH, and fint fields • verifyInitHH: 2-digit model initialization hour • startFH: usually 0 (forecast hour zero, or initialization) • stopFH: usually the last forecast hour of the model run, but the user can test over any range of forecast hours • fint: output frequency of the UEMS GRIB2 files • Edit the experiment and domain parameters • experimentList/experimentName: leading text string on the exported UEMS GRIB2 files (consistent with the UEMS setup) • domainList: '01' for only a single grid; '02' for a nested grid • modelOutputDir: archive directory for each experiment • lowerLat/upperLat/leftLon/rightLon: approximate bounds of the model grids for d01 and (if a nested grid) d02 • To run, type: ./runSPoRTMETScripts_multiExperiments.py YYYY MM DD
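
If several consecutive initialization dates need to be processed, a tiny driver like the following can invoke the script once per date (a hypothetical convenience sketch; a shell loop works just as well):

    # Hypothetical driver calling the multiExperiments script for consecutive dates.
    import subprocess
    from datetime import datetime, timedelta

    start, ndays = datetime(2016, 8, 1), 7    # example start date and number of days

    for i in range(ndays):
        d = start + timedelta(days=i)
        subprocess.run(["./runSPoRTMETScripts_multiExperiments.py",
                        d.strftime("%Y"), d.strftime("%m"), d.strftime("%d")],
                       check=True)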

  38. Finally, set up the scripts and namelist.met file for generating summary statistics across multiple forecast runs: namelist.met.conf.aggregate and runSPoRTMETScripts_aggregate.py

  39. namelist.met.conf.aggregate • Only a few settings should change compared to namelist.met.conf.multiExperiments: • &ObservationInfo ObtainPrecipitation: set to FALSE • &METInfo RunPointStat: set to FALSE • &METInfo RunGridStat: set to FALSE • &METInfo RunPointStatAnalysis: set to TRUE • &METInfo RunGridStatAnalysis: set to TRUE • &METInfo RunAggregate: set to TRUE • &METInfo NumDaysAggregate: set to the desired list of days (e.g., "7,30") • &PlottingInfo MakePlots and PlotStationLocations: TRUE if one wants to generate error plots • It is recommended to make plots only after running the multiExperiments and/or aggregate scripts • Multiple instances of MakePlots can be run for individual forecast plots (RunAggregate = FALSE) or summary plots across multiple runs (RunAggregate = TRUE and NumDaysAggregate set to a list of days) • It is recommended to use the aggregate script and namelist to generate plots

  40. runSPoRTMETScripts_aggregate.py • The script requires 3 input arguments • verifyInitYYYY: 4-digit year of model initialization • verifyInitMM: 2-digit month of model initialization • verifyInitDD: 2-digit day of model initialization • Edit the verifyInitHH, startFH, stopFH, and fint fields • verifyInitHH: 2-digit model initialization hour • startFH: usually 0 (forecast hour zero, or initialization) • stopFH: usually the last forecast hour of the model run, but the user can test over any range of forecast hours • fint: output frequency of the UEMS GRIB2 files • Edit the experiment and domain parameters • experimentList/experimentName: leading text string on the exported UEMS GRIB2 files (consistent with the UEMS setup) • domainList: '01' for only a single grid; '02' for a nested grid • modelOutputDir: archive directory for each experiment • lowerLat/upperLat/leftLon/rightLon: approximate bounds of the model grids for d01 and (if a nested grid) d02 • To run, type: ./runSPoRTMETScripts_aggregate.py YYYY MM DD

  41. Thank you for your attention! Clear as mud now? [Prepared for MANY questions…]
