
Improving HYSPLIT Forecasts with Data Assimilation*


Presentation Transcript


  1. Improving HYSPLIT Forecasts with Data Assimilation* Kostas Kalpakis, Associate Professor, Computer Science and Electrical Engineering Department, University of Maryland, Baltimore County. April 5, 2011. Joint work with Shiming Yang and Yaacov Yesha. *Supported in part by an IBM grant.

  2. Outline • Introduction • Motivation and Goal • Data Assimilation • Our approach • State-Space Models • The NOAA HYSPLIT Model • The LETKF Algorithm • Experiments and Evaluation • CAPTEX • California wildfires, August 2009 • Summary

  3. Motivation • High-volume real-time sensor data streams for monitoring and forecasting applications are becoming ubiquitous • Bridging the gap between model predictions and real-time observations is needed • Demands for environmental monitoring and hazard prediction are pressing • Need to incorporate measurements from the thousands of sensors underlying IBM’s “Smarter Planet” initiatives into models of various geophysical processes

  4. Goals • Our goals are to • incorporate a data assimilation capability into HYSPLIT • HYSPLIT is used routinely to produce many data products • utilize in-situ and remotely sensed observations for improved forecasts • apply the system to wildfire smoke prediction and monitoring • develop an efficient data assimilation system using InfoSphereStreams’ SPADE framework for distributed high-performance platforms

  5. Data assimilation • Data assimilation is a set of techniques that • Incorporate real-world observations into the model analysis and forecast cycle • Help reduce model error growth (by applying small corrections between short-range forecasts) • Improve the estimate of the model initial conditions for the next forecast cycle

  6. The state-space model • Model the system by a state-evolution equation and an observation equation (a standard form is sketched below) • where a model operator advances the system state and an observation operator relates the state to the observations
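A standard discrete-time state-space formulation, consistent with the operators and noise assumptions described on the following slides (the slide's own equations did not survive the transcript), is:

    x_k = M_k(x_{k-1}) + w_k        (state evolution)
    y_k = H_k(x_k) + v_k            (observation)

where M_k is the model operator, H_k the observation operator, and w_k ~ N(0, Q_k), v_k ~ N(0, R_k) are the model and observation noise processes.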

  7. Data assimilation in state-space • Data assimilation becomes an estimation problem • Find a maximum likelihood estimate of the trajectory of the system states given a set of observations • The problem reduces to minimizing a cost function (a standard form is given below) • Kalman filters, a recursive method, can minimize this cost function efficiently for low-dimensional state spaces with linear model and observation operators and Gaussian noise processes • Otherwise, the problem is often computationally difficult
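A hedged reconstruction of the cost function referred to above, in the standard weighted least-squares form used in variational and Kalman-filter data assimilation:

    J(x) = (x_0 - x_0^b)^T B^{-1} (x_0 - x_0^b)
           + sum_k [y_k - H_k(x_k)]^T R_k^{-1} [y_k - H_k(x_k)]

where x_0^b is the background (prior) state with error covariance B, and R_k is the observation error covariance at time k.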

  8. Data assimilation via Kalman filters • Graphical view of data assimilation using Kalman filters: background states are propagated forward in time and corrected by incoming observations to produce analysis states [timeline diagram of background states, analysis states, and observations]
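A minimal NumPy sketch of the forecast/analysis cycle pictured above, for the linear-Gaussian case; the operators and covariances are illustrative placeholders, not the authors' code:

    import numpy as np

    def kalman_cycle(x_a, P_a, M, Q, H, R, y):
        """One forecast/analysis cycle of a linear Kalman filter."""
        # Forecast: propagate the previous analysis to obtain the background state
        x_b = M @ x_a
        P_b = M @ P_a @ M.T + Q
        # Analysis: correct the background with the new observation y
        S = H @ P_b @ H.T + R                 # innovation covariance
        K = P_b @ H.T @ np.linalg.inv(S)      # Kalman gain
        x_a_new = x_b + K @ (y - H @ x_b)     # analysis state
        P_a_new = (np.eye(len(x_b)) - K @ H) @ P_b
        return x_a_new, P_a_new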

  9. The NOAA HYSPLIT Model • HYSPLIT • Hybrid Single-Particle Lagrangian Integrated Trajectory model • A modeling system that computes air parcel trajectories and the dispersion and deposition of pollutants • Computes particle dispersion with either the puff model or the particle model • Needs meteorology data and emission source information • Has been validated against ground-truth observations* • Used routinely to produce various data products • Air Quality Index (AQI) • Smoke Forecast System (SFS) *R.R. Draxler, J.L. Heffter, and G.D. Rolph. DATEM: Data Archive of Tracer Experiments and Meteorology. August 2001. http://www.arl.noaa.gov/DATEM.php, last checked Jul. 2010.

  10. System design

  11. Data assimilation for HYSPLIT • Utilize HYSPLIT as a model operator in a state-space model and assimilate observations into HYSPLIT • First, we need to carefully define the system state, so that we can extract it, modify it, and restart HYSPLIT • Second, since the model operator is non-linear and the system state is very large, standard extended Kalman filters are an expensive option for data assimilation • We use the LETKF algorithm, an ensemble transform Kalman filter

  12. Data assimilation for HYSPLIT • Use • the mass of the particles in HYSPLIT as the system state • the mapping from particle masses to grid concentrations as the default observation operator (a hypothetical sketch follows)
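A hypothetical sketch of such an observation operator (the function and grid parameters are illustrative, not HYSPLIT's internal routines): it maps a state of particle masses and positions to gridded concentrations by accumulating mass per grid cell and dividing by the cell volume:

    import numpy as np

    def particles_to_grid(masses, lats, lons, lat_edges, lon_edges, cell_volume):
        """Hypothetical observation operator: particle masses -> grid concentrations."""
        # Accumulate particle mass into latitude/longitude bins
        mass_grid, _, _ = np.histogram2d(lats, lons,
                                         bins=[lat_edges, lon_edges],
                                         weights=masses)
        # Convert accumulated mass per cell to concentration
        return mass_grid / cell_volume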

  13. LETKF Algorithm • LETKF (Local Ensemble Transform Kalman Filter)* • handles nonlinear model operators and linear observation operators • assumes Gaussian state and observation noise processes • reduces implementation costs since it does not need adjoints • performs the analysis locally in the ensemble space • which is typically of low dimension (< 100) • and avoids inverses of large matrices • is embarrassingly parallel • We have implemented LETKF in C with MPI, and in IBM InfoSphereStreams *Brian Hunt, Eric Kostelich, and Istvan Szunyogh, “Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter”, Physica D 230, pp. 112-126, 2007.

  14. The LETKF Algorithm • Global steps: maintain an ensemble of K system states • Forward each system state through the model: x^b(i) = M(x^a(i)) • Analysis: construct the background analysis ensemble X^b, the background observation ensemble Y^b = H(X^b), and their means and covariance matrices • Local steps: for each grid point, choose the local observations and local background state, then calculate (formulas as in Hunt et al., 2007; a NumPy sketch follows): • Analysis error covariance: P̃^a = [(K-1)I + (Y^b)^T R^{-1} Y^b]^{-1} • Perturbation: W^a = [(K-1) P̃^a]^{1/2} • Analysis ensemble in ensemble space: w^a(i) = P̃^a (Y^b)^T R^{-1} (y^o - ȳ^b) + W^a(i) • Analysis ensemble in state space: x^a(i) = x̄^b + X^b w^a(i)
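A compact NumPy sketch of the local analysis step above, following the formulas of Hunt et al. (2007); it is an illustrative re-derivation for one local region, not the authors' C/MPI or InfoSphereStreams implementation:

    import numpy as np

    def letkf_analysis(Xb, Yb, y_obs, R, rho=1.0):
        """LETKF analysis for one local region (formulas from Hunt et al., 2007).
        Xb: (n, K) background ensemble; Yb: (m, K) background ensemble in
        observation space; y_obs: (m,) local observations; R: (m, m) observation
        error covariance; rho: multiplicative covariance inflation factor."""
        n, K = Xb.shape
        xb_mean = Xb.mean(axis=1)
        yb_mean = Yb.mean(axis=1)
        Xp = Xb - xb_mean[:, None]                    # state perturbations X^b
        Yp = Yb - yb_mean[:, None]                    # observation perturbations Y^b
        C = Yp.T @ np.linalg.inv(R)                   # (K, m)
        # Analysis error covariance in the K-dimensional ensemble space
        Pa = np.linalg.inv((K - 1) / rho * np.eye(K) + C @ Yp)
        # Symmetric square root gives the perturbation weights W^a
        evals, evecs = np.linalg.eigh((K - 1) * Pa)
        Wa = evecs @ np.diag(np.sqrt(np.maximum(evals, 0.0))) @ evecs.T
        # Mean analysis weights, then the analysis ensemble back in state space
        wa_mean = Pa @ C @ (y_obs - yb_mean)
        W = Wa + wa_mean[:, None]
        return xb_mean[:, None] + Xp @ W

The local analyses for different grid points are independent of one another, which is why the algorithm parallelizes so easily.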

  15. Implementation using IBM InfoSphereStreams • InfoSphereStreams is a system developed by IBM for very fast processing of large, high-rate data streams that supports • parallel and high-performance stream processing • continuous ingestion and analysis • scaling over a range of hardware capabilities • adaptation to changing user objectives, available data, and computing resource availability • the bursty nature of real-time observations of rapidly evolving physical phenomena • Uses SPADE to describe the stream operators

  16. SPADE Implementation Flowchart

  17. Experiments and evaluation • Experimentally evaluate our approach using the controlled tracer releases available in the DATEM datasets • Demonstrate our approach using in-situ and remotely sensed real data from the California wildfires of August 2009 • Observations and emission rates are taken from EPA AQS and GBBEP, and from MODIS AOD when available

  18. Evaluation metrics • We use HYSPLIT’s statmain to compute evaluation metrics for a HYSPLIT forecast with respect to the ground truth • We report on the following metrics • The Normalized Mean Squared Error (NMSE) (see the note below) • The model rank, an overall measure of model quality (larger values are better; the maximum value is 4)
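For reference, the NMSE reported by statmain is commonly defined as the mean squared difference between predicted (P) and measured (M) concentrations, normalized by the product of their means:

    NMSE = mean((P - M)^2) / (mean(P) * mean(M))

so smaller values indicate better agreement with the ground truth.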

  19. CAPTEX • CAPTEX (Cross-Appalachian Tracer Experiment) • Time: 2100 UTC Sep 18 to 2100 UTC Oct 29, 1983 • Area: U.S. and Canada • 6 releases (3-hr duration each) of a perfluorocarbon tracer (PFT) • emission sources and rates are those in DATEM • Use the DATEM CAPTEX observations as the ground truth • Observations at 84 stations every 3 hrs for 48 hrs after each release • Run 160 iterations, each iteration simulating a 3-hr time period

  20. CAPTEX • Forecasts with data assimilation after 3 hr, 6 hr, 9 hr, and 12 hr [concentration maps]

  21. CAPTEX • CAPTEX with and w/o data assimilation

  22. CAPTEX • CAPTEX with and w/o data assimilation

  23. Modified CAPTEX • To assess whether our approach improves the forecasts given inaccurate emission rates, we do the following • Use the CAPTEX concentrations as ground truth • Run HYSPLIT with a modified emission rate for CAPTEX in two modes (with and w/o data assimilation) • For the 2nd release, which begins at 1700 UTC 25 Sep 1983, use an emission rate of 33.5 kg/h instead of the 67 kg/h given in DATEM • Compare with the unmodified CAPTEX emissions w/o data assimilation

  24. Modified CAPTEX

  25. California wildfire, August 2009 • Experiments to forecast particulate matter (PM2.5) concentrations from a wildfire in California in August 2009 • Data used • Ground observations from EPA’s Air Quality System (AQS) (hourly obs) • Satellite observations from • Terra/Aqua MODIS Aerosol Optical Depth (AOD) (daily obs) • Geostationary Operational Environmental Satellite (GOES) East/West AOD (hourly obs) • Emission rates from GBBEP (GOES-E/W Biomass Burning Emission Product) (hourly obs) • Data for SO2, NOx, CO, CO2, and relative humidity are also available from these data sources but are not used

  26. California wildfire, August 2009 • Experiment using AQS observations and GBBEP emission rates • Time: 2100 UTC Aug 9 to 2100 UTC Aug 20, 2009 • Area: California and Nevada • use hourly AQS data as ground-truth observations • use GBBEP hourly PM2.5 emissions from 2019 source points • emission rates range from 200 g/hr to 10 kg/hr • each iteration simulates a 1-hr period

  27. California wildfire, August 2009 • AQS+GBBEP

  28. California wildfire, August 2009 • AQS+GBBEP

  29. California wildfire, August 2009 • AQS+GBBEP

  30. Summary • Our data assimilation system: • demonstrates improvement on statistical metrics, e.g., an average 16.0% improvement in NMSE on DATEM/CAPTEX • uses a state-of-the-art prediction model and assimilation algorithm • shows that LETKF offers good algorithmic efficiency • can easily utilize other models and multiple data sources • uses data sources from ground sites and satellites for pollutant concentrations and emission rates • can be extended to other domains, e.g., volcanic ash • Demo website: • http://bluegrit.cs.umbc.edu/~shiming1/demo/

  31. Acknowledgments • We would like to thank • IBM for its generous support, and the InfoSphereStreams team for its indispensable help • Drs. Ben Kyger and Roland Draxler for providing the HYSPLIT model and answering many of our questions • Dr. Milt Halem for his encouragement and support, and the Multicore Computing Center at UMBC for providing the computing environment • Dr. Hai Zhang of the UMBC Atmospheric Lidar Group for his help with MODIS AOD • NASA for the MODIS data, NOAA for the GOES, GBBEP, and DATEM data, and EPA for the AQS data

  32. Thank you.
