
14 January 2009

14 January 2009. 2009 AMS Artificial Intelligence Conference: A Data Mining Approach to Soil Temperature and Moisture Prediction. Bill Myers, Seth Linden, Gerry Wiener. Project Overview and Goals: improve soil temperature and moisture prediction; integrate and evaluate NASA-MODIS data sets.


Presentation Transcript


  1. 14 January 2009 • 2009 AMS Artificial Intelligence Conference • A Data Mining Approach to Soil Temperature and Moisture Prediction • Bill Myers, Seth Linden, Gerry Wiener

  2. Project Overview and Goals • Improve soil temperature and moisture prediction • Integrate and Evaluate NASA-MODIS data sets • Leaf Area Index (LAI) • Green Vegetation Fraction (i.e. FPAR) • Albedo • Deliver tailored products to end users • Soil forecasts will drive Agriculture-specific models (e.g. pest models) • RAL partnered with DTN/Meteorlogix • DTN DSS delivers Ag-specific forecasts to 80,000 users

  3. Soil State Prediction • [Diagram labels: Solar Energy, Weather, Subsurface Nodes, Fixed Node] • Current soil state is modified by atmospheric forcing conditions • Heat and moisture are transferred between adjacent nodes • Typically done with a physical model, called a Land-Surface Model (LSM)

  4. Physical Model • This project uses the High Resolution Land Data Assimilation System (HRLDAS) and the Noah LSM • Used by NCEP as part of the NAM (WRF model) • Many parameters are necessary to model soil type and land surface characteristics • These affect incident solar energy, heat transfer, etc. • Parameters must be generalized • “Sandy loam” will have the same parameterization at all sites • Chemical compositions of “sandy loam” differ between sites • Heat and moisture transfer will not be exact at ANY site • Goal of this study: Determine if a data mining approach can produce results comparable to those of the physical model

  5. Data Mining System • Regression Tree (Cubist) • Available from www.rulequest.com • Looks for patterns in data • Builds rule-based numerical models • Rules are developed based on training data • At each leaf node, a regression equation is developed that best fits that subset of the training data • Effectively, linear approximations are being made when certain conditions are met • Soil state forecasts are generated by applying the rule set to forecast data • Training Data • 29 Soil Climate Analysis Network (SCAN) sites • Two years of observational history at each site used to develop rules • NCAR scientists were consulted to determine the most important inputs to soil state evolution • These were extracted or derived from the observed variable set
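The idea of "a regression equation at each leaf" can be illustrated with a minimal sketch. This is not Cubist and does not reproduce its algorithm: the split threshold is supplied by hand (Cubist finds its own conditions), there is only one input variable, and the data are synthetic values made up for illustration. The function names `fit_line`, `fit_two_leaf_tree`, and `predict` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on one subset of the data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b  # (intercept, slope)


def fit_two_leaf_tree(xs, ys, threshold):
    """Split the cases at a given threshold and fit one line per leaf."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": fit_line(*zip(*left)),
        "right": fit_line(*zip(*right)),
    }


def predict(tree, x):
    """Pick the leaf whose condition holds, then apply its linear equation."""
    a, b = tree["left"] if x <= tree["threshold"] else tree["right"]
    return a + b * x


# Synthetic, made-up data: two linear regimes that meet no physical claim.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.0 + 2.0 * x if x <= 3.0 else 10.0 - 0.5 * x for x in xs]
tree = fit_two_leaf_tree(xs, ys, threshold=3.0)
```

Each leaf recovers the linear relationship that holds in its own regime, which is the sense in which the rule set makes linear approximations "when certain conditions are met".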

  6. Regression Tree Model Generation • 10 regression trees were developed for each site • One tree each for soil temperature and soil moisture at each of five depths (5, 10, 20, 50, 100 cm) • Input variables: • Julian day • Air Temperature • Delta air temperature (in current hr) • Downward Shortwave Radiation • Wind Speed • Dew point temperature • Precip amt • Previous soil state: • Previous hour’s soil temperature and moisture at adjacent depths • A target variable (e.g. Current Soil Temp at 5 cm) was provided with each hour’s data
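As a rough sketch of how the ten per-site trees might be enumerated, the snippet below builds one target/input specification per predictand. The variable names follow the `ST5_prev` / `ST5_curr` pattern of the example names file; names for the deeper levels (e.g. `ST20_prev`) and the exact meaning of "adjacent depths" (here: the level itself plus its immediate neighbors) are assumptions, since the slides only show the 5 cm case.

```python
DEPTHS_CM = [5, 10, 20, 50, 100]

# Weather inputs, named as in the example names file for the 5 cm tree.
WEATHER_INPUTS = ["mon", "AirT", "deltaT", "dsw", "wspd", "TD", "qpf"]


def neighbor_depths(depth):
    """Assumed reading of 'adjacent depths': the level and its neighbors."""
    i = DEPTHS_CM.index(depth)
    return DEPTHS_CM[max(i - 1, 0): i + 2]


def tree_spec(kind, depth):
    """kind is 'ST' (soil temperature) or 'SM' (soil moisture)."""
    soil_inputs = [
        f"{k}{d}_prev" for k in ("ST", "SM") for d in neighbor_depths(depth)
    ]
    return {
        "target": f"{kind}{depth}_curr",
        "inputs": WEATHER_INPUTS + soil_inputs,
    }


# Two variables times five depths gives the ten trees per site.
specs = [tree_spec(kind, depth) for kind in ("ST", "SM") for depth in DEPTHS_CM]
```

For the 5 cm temperature tree this yields the same soil-state inputs as the example names file (`ST5_prev`, `ST10_prev`, `SM5_prev`, `SM10_prev`).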

  7. Example training data
     | Names file for 5cm temperature prediction
     ST5_curr | Predictand in list of variables below
     siteID: ignore | SCAN site ID
     date: ignore | YYYYMMDDHH
     mon: continuous | fraction of Julian year
     AirT: continuous | 2m air temp (avg over last hr)
     deltaT: continuous | air temp change over last hour
     dsw: continuous | avg downward shortwave radiation over last hr
     wspd: continuous | avg wind speed over last hour
     TD: continuous | avg dew point temp over last hour
     qpf: continuous | precip amt over last hour
     ST5_prev: continuous | 5 cm soil temp at previous hour
     ST10_prev: continuous | 10 cm soil temp at previous hour
     SM5_prev: continuous | 5 cm soil moisture at previous hour
     SM10_prev: continuous | 10 cm soil moisture at previous hour
     ST5_curr: continuous | 5 cm soil temp at current hour (predictand)
     Sample line of training data:
     2001, 2007110211, 0.9167, 4.53, -0.89, 0.00, 2.81, -3.28, 0.00, 8.158, 9.847, 33.858, 39.616, 8.32
     Reading the sample line: • 0.9167: time of year • 4.53: air temp • -0.89: air temp falling in this hour • 0.00: no downward radiation (night) • 2.81: wind speed • -3.28: dewpoint temp • 0.00: no precip • 8.158, 9.847: previous hour’s soil temperature at 5 cm and 10 cm • 33.858, 39.616: previous hour’s soil moisture at 5 cm and 10 cm • 8.32: current hour’s 5 cm soil T (predictand)
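To make the column layout concrete, here is a small sketch that pairs the sample line with the attribute names from the names file. The field order is taken from the slide; the parsing helper `parse_case` is hypothetical and is not part of Cubist, which reads its own names and data files directly.

```python
# Attribute order as listed in the names file on the slide.
FIELDS = [
    "siteID", "date", "mon", "AirT", "deltaT", "dsw", "wspd", "TD", "qpf",
    "ST5_prev", "ST10_prev", "SM5_prev", "SM10_prev", "ST5_curr",
]
IGNORED = {"siteID", "date"}  # marked "ignore" in the names file


def parse_case(line):
    """Turn one comma-separated training case into a dict keyed by attribute."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return {
        name: (value if name in IGNORED else float(value))
        for name, value in zip(FIELDS, values)
    }


sample = ("2001, 2007110211, 0.9167, 4.53, -0.89, 0.00, 2.81, -3.28, 0.00, "
          "8.158, 9.847, 33.858, 39.616, 8.32")
case = parse_case(sample)
hour_of_day = int(case["date"][-2:])  # date is YYYYMMDDHH
```

Keeping `siteID` and `date` as strings mirrors their "ignore" status: they identify the case but are not used as predictors.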

  8. Rules Development and Application • Regression Trees generated for each predictand at each site • Separate tree for Soil Temperature and Moisture at each depth • Two years of training data for most sites • Example rule and associated regression: if dsw <= 0.09 and ST5_prev > 12.05 ST5_curr = -0.211 + 0.3165 dsw + 0.83 ST5_prev + 0.13 ST10_prev + 0.02 AirT + 0.02 TD • 48 hour forecasts were generated iteratively • Starting with observed soil state and first hour’s weather predictions • Regression trees were applied for each predictand to generate forecast state at hour 1 • Using the forecast soil state and weather predictions, the next hours’ forecasts were generated iteratively • Soil forecasts generated for 2007 growing season (April-June) • Data Mining and HRLDAS forecasts were compared to observations
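The iterative forecast described above can be sketched as a loop that feeds each hour's predicted soil state into the next hour. Only `st5_rule` comes from the slide (it is the example rule and regression, copied as given). Everything else is an assumption for illustration: the `persistence` stand-in for the 10 cm tree, the behavior when no rule matches (carry the previous value forward), the initial state, and the weather values are all made up.

```python
def st5_rule(state, wx):
    """Example rule from the slide; returns None when its conditions fail."""
    if wx["dsw"] <= 0.09 and state["ST5"] > 12.05:
        return (-0.211 + 0.3165 * wx["dsw"] + 0.83 * state["ST5"]
                + 0.13 * state["ST10"] + 0.02 * wx["AirT"] + 0.02 * wx["TD"])
    return None


def persistence(name):
    """Hypothetical stand-in model: carry the previous value forward."""
    return lambda state, wx: state[name]


def forecast(initial_state, weather_by_hour, models):
    """Apply every model once per hour, always reading the previous state."""
    state = dict(initial_state)
    trajectory = []
    for wx in weather_by_hour:
        new_state = {}
        for name, model in models.items():
            value = model(state, wx)
            # Assumed fallback: keep the old value if no rule applies.
            new_state[name] = state[name] if value is None else value
        state = new_state
        trajectory.append(state)
    return trajectory


models = {"ST5": st5_rule, "ST10": persistence("ST10")}
initial = {"ST5": 15.0, "ST10": 16.0}                    # made-up observed state
weather = [{"dsw": 0.0, "AirT": 10.0, "TD": 5.0}] * 3    # made-up night hours
traj = forecast(initial, weather, models)
```

Because every model reads the state from the previous hour rather than a partially updated one, the order in which predictands are evaluated within an hour does not matter. A 48 hour forecast would simply pass 48 hourly weather predictions.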

  9. Results • Statistically, data mining better than HRLDAS at nearly all the 29 SCAN sites • Median (and quartile) MAEs significantly lower for data mining • Data mining errors generally 30%+ lower than HRLDAS errors
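For reference, the comparison metrics named above can be computed as in the following sketch: mean absolute error (MAE) per site, the median across sites, and the percentage by which one error is lower than another. The helper names are hypothetical, and no values from the study are reproduced here.

```python
from statistics import median


def mae(forecasts, observations):
    """Mean absolute error between paired forecasts and observations."""
    if len(forecasts) != len(observations):
        raise ValueError("forecasts and observations must be the same length")
    return sum(abs(f - o) for f, o in zip(forecasts, observations)) / len(observations)


def median_site_mae(per_site_pairs):
    """per_site_pairs: iterable of (forecasts, observations), one per site."""
    return median(mae(f, o) for f, o in per_site_pairs)


def percent_reduction(candidate_error, baseline_error):
    """How much lower the candidate error is, as a percentage of the baseline."""
    return 100.0 * (baseline_error - candidate_error) / baseline_error
```

Under this definition, a candidate MAE of 0.7 against a baseline MAE of 1.0 is a 30% reduction, which is the sense of "30%+ lower" in the results slide.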

  10. Summary • Data mining with Cubist Regression Trees • Reduces soil temperature and moisture errors • Simple to develop rules • Rules/Regressions can be displayed easily • Regression Tree forecasts are tuned to the site • HRLDAS forecast parameters are more generic • Applicability to non-observing sites • Rules, as developed, are site specific • Not valid away from that location • HRLDAS can generate forecasts at any location • Observing sites do not begin to cover all land use and soil type combinations

  11. Future Directions • Add vegetation state (from NASA MODIS data) to data mining training sets to determine whether these results can be improved upon • Train Cubist with all obs sites lumped together but include land use and soil type as input variables • Investigate combining the data mining approach and the LSM to get the best of both

  12. Acknowledgements • This research effort has been supported by a NASA-ROSES grant. • We appreciate the help provided by personnel at the USDA Natural Resources Conservation Service and various NASA labs. • Soil forecast web site: www.rap.ucar.edu/projects/nasa-ag/hrldas/display_hrldas_animation.html • Cubist is available at www.rulequest.com
