Evaluating the Uncertainty of Land-Use Regression Models

Halûk Özkaynak, US EPA, Office of Research and Development, National Exposure Research Laboratory, RTP, NC. Presented at the CMAS Special Symposium on Air Quality, October 13, 2010.


Presentation Transcript


  1. Evaluating the Uncertainty of Land-Use Regression Models
     Halûk Özkaynak, US EPA, Office of Research and Development, National Exposure Research Laboratory, RTP, NC
     Presented at the CMAS Special Symposium on Air Quality, October 13, 2010

  2. Land-Use Regression (LUR) Models
     • Widely used methodology for estimating individual exposure to ambient air pollution in epidemiologic studies

  3. LUR Strengths
     • Able to capture smaller-scale variability in community health studies
     • Less resource intensive: easier to develop and apply than other methods for measuring or estimating subject-specific values (e.g., household measurements, physical modelling)
     • Land-use data widely available

  4. LUR Limitations
     • Inputs
       • Require accurate monitoring data at a large number of sites (e.g., in highly industrialized urban areas with many types of emission sources)
     • Application in health studies
       • Not transferable from one urban area to another
       • Do not address multi-pollutant aspects of air pollution
       • Lack the fine-scale temporal resolution needed for estimating short-term exposure to air pollution
       • Often estimate ambient air pollution only, versus indoor and personal exposure
       • Lack the ability to connect specific sources of emissions to concentrations for developing pollution mitigation strategies

  5. Analysis Goals: New Haven Case Study*
     • Use air pollution predicted by coupled regional (CMAQ) and local (AERMOD) scale air-quality models
     • Develop and evaluate land-use regression models for:
       • Benzene
       • Nitrogen oxides (NOx)
       • Particulate matter (PM2.5)
     • Examine (in future) the implications of alternate LUR development strategies on model efficacy for multiple pollutants
     *Source: Johnson, M., Isakov, V., Touma, J.S., Mukerjee, S., and Özkaynak, H. (2010). Evaluation of Land Use Regression Models Used to Predict Air Quality Concentrations in an Urban Area. Atmospheric Environment, Vol. 44, pp. 3660-3668.

  6. Air Pollution Data
     • Air pollution concentrations were predicted at 318 census block group sites in New Haven, Connecticut, using a coupled air quality model (Isakov et al., 2009)
     • Predicted daily concentrations for 2-month periods in winter and summer (2001) were used to calculate seasonal average concentrations for benzene, NOx, and PM2.5 at each site
       • July-August for summer
       • January-February for winter
     • Annual averages were based on 365 daily means for 2001
     Isakov et al. 2009. Journal of the Air and Waste Management Association; 59(4):461-472.
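The averaging described on this slide can be sketched in a few lines of Python. This is a minimal illustration for a single site, assuming the daily values are held in a dictionary keyed by date; the function name, the data layout, and the synthetic concentrations are all hypothetical and are not taken from the study.

```python
from datetime import date, timedelta
from statistics import mean

# Season definitions from the slide: January-February and July-August.
SEASON_MONTHS = {"winter": {1, 2}, "summer": {7, 8}}


def seasonal_and_annual_means(daily):
    """Average a {date: concentration} series for one site into seasonal and annual means."""
    result = {"annual": mean(daily.values())}
    for season, months in SEASON_MONTHS.items():
        values = [conc for day, conc in daily.items() if day.month in months]
        result[season] = mean(values)
    return result


# Synthetic daily series for 2001 (365 days): 2.0 in January-February, 1.0 otherwise.
start = date(2001, 1, 1)
daily_2001 = {}
for offset in range(365):
    day = start + timedelta(days=offset)
    daily_2001[day] = 2.0 if day.month in (1, 2) else 1.0

averages = seasonal_and_annual_means(daily_2001)
```

In practice the same function would be applied to each of the 318 sites and to each pollutant separately.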

  7. LUR Model Structure and Inputs
     • Multivariate linear regression models
     • Initial pool of 60 potential predictors
     • Eliminated variables based on
       • High correlation (R-squared ~1.0) with other selected predictors and/or
       • Lack of interpretability
     • 19 land-use variables included in model selection
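One way to screen a predictor pool for near-duplicate variables is a greedy pairwise filter, sketched below. The slide does not say how the screening was actually done, so the greedy ordering, the 0.95 threshold, and the variable names are assumptions made purely for illustration. Screening on interpretability is a judgment call and is not represented.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return (sxy * sxy) / (sxx * syy)


def drop_redundant(predictors, threshold=0.95):
    """Keep predictors in order, skipping any whose R-squared with a kept one reaches threshold."""
    kept = []
    for name, values in predictors.items():
        if all(r_squared(values, predictors[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical land-use variables at five sites.
pool = {
    "road_length_100m": [1.0, 2.0, 3.0, 4.0, 5.0],
    "road_length_200m": [2.1, 4.0, 6.1, 8.0, 10.1],  # nearly a multiple of the first
    "population_density": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_redundant(pool)
```

Here the second road-length variable is dropped because it is almost perfectly correlated with the first, while the population variable is retained.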

  8. Site Selection
     • Sites
       • Census block group centroids
     • Training Sites
       • Sites used to fit LUR models
       • Selected from 318 census block groups in the study area
       • Stratified random selection among 4 census regions
     • Test Sites
       • Remaining sites withheld from training set; minimum of 10% used for independent model evaluation
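A stratified random split of this kind might look like the sketch below. Proportional allocation across regions is an assumption (the slide only says "stratified random selection"), and the region labels, region sizes, and site identifiers are hypothetical. Because the per-region counts are rounded, the training set size can differ slightly from the requested number.

```python
import random


def stratified_split(site_regions, n_train, seed=0):
    """Draw about n_train training sites, proportionally from each region; the rest are test sites."""
    rng = random.Random(seed)
    by_region = {}
    for site, region in site_regions.items():
        by_region.setdefault(region, []).append(site)

    total = len(site_regions)
    train = []
    for region in sorted(by_region):
        sites = sorted(by_region[region])
        # Proportional allocation per region (an assumption, not stated in the source).
        k = round(n_train * len(sites) / total)
        train.extend(rng.sample(sites, min(k, len(sites))))

    train_set = set(train)
    test = [site for site in site_regions if site not in train_set]
    return train, test


# 318 hypothetical sites spread over 4 hypothetical regions.
sizes = {"A": 80, "B": 80, "C": 79, "D": 79}
site_regions = {}
for region, size in sizes.items():
    for i in range(size):
        site_regions[f"{region}{i:03d}"] = region

train_sites, test_sites = stratified_split(site_regions, n_train=100)
```

With 285 training sites out of 318, the 33 remaining sites are just over 10% of the total, which matches the minimum hold-out fraction on the slide.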

  9. Model Development and Evaluation
     • Variable selection
       • Examined correlation structure for predictive variables
     • Model development
       • All subsets with 3-7 independent predictors
       • Model selection based on AIC, Mallows' C(p), adjusted R-squared, and variance inflation factor
     • Model evaluation
       • Cross-validation within training dataset
       • Hold-out evaluation within test dataset
     • Models for multiple pollutants and training sites
       • Benzene, NOx, PM2.5
       • 25, 50, 75, 100, 125, 150, 200, and 285 training sites
     • Automated, iterative process
       • Site selection -> model development
       • Repeated 100x for each pollutant and number of training sites
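The all-subsets step can be illustrated with a small pure-Python sketch. It ranks subsets by AIC only; Mallows' C(p), adjusted R-squared, and the variance inflation factor, which the slide also lists, are left out for brevity. The function names, the synthetic data, and the reduced subset sizes in the demo are assumptions, and the Gaussian AIC is computed up to an additive constant.

```python
from itertools import combinations
from math import log
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(columns, y):
    """Fit y = b0 + sum(b_j * x_j) via the normal equations; return (coefficients, RSS)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    return beta, rss


def best_subset(predictors, y, min_size=3, max_size=7):
    """Return (aic, names) for the predictor subset with the lowest AIC."""
    n = len(y)
    best = None
    for k in range(min_size, min(max_size, len(predictors)) + 1):
        for names in combinations(sorted(predictors), k):
            _, rss = fit_ols([predictors[name] for name in names], y)
            aic = n * log(rss / n) + 2 * (k + 2)  # k slopes + intercept + error variance
            if best is None or aic < best[0]:
                best = (aic, names)
    return best


# Synthetic demo: 6 candidate predictors at 50 sites; only x0, x1, x2 drive the response.
rng = random.Random(1)
n_sites = 50
predictors = {f"x{j}": [rng.random() for _ in range(n_sites)] for j in range(6)}
y = [2.0 * predictors["x0"][i] - 3.0 * predictors["x1"][i] + 1.5 * predictors["x2"][i]
     + rng.gauss(0.0, 0.1) for i in range(n_sites)]
aic, chosen = best_subset(predictors, y, min_size=3, max_size=4)
```

With 19 candidate variables and subset sizes of 3 to 7, an exhaustive search covers roughly 94,000 models per run, so repeating it 100 times per pollutant and training-set size is where the automation on the slide becomes necessary.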

  10. Model Performance in Test versus Training Sites: Benzene

  11. Model Performance in Test versus Training Sites: NOx

  12. Model Performance in Test versus Training Sites: PM2.5

  13. LUR Prediction Errors: NOx
      • Prediction error =
        • Average (+/- SD) of mean predicted minus observed input values
        • For 100 iterations (i.e., 100 LUR models)
      • Analyzed by low, medium, and high NOx concentration based on total NOx distribution
        • Low = 0 - 25th percentile
        • Medium = 25th - 75th percentile
        • High = 75th percentile - max
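The error summary by concentration class can be sketched as follows for a single LUR model; the slide averages over 100 such models, which is not shown here. The function name and the synthetic values are hypothetical, and "observed" stands for the air-quality-model concentrations used as inputs. The synthetic predictions are deliberately shrunk toward the mean so that the sign of the error differs between classes.

```python
from statistics import mean, quantiles, stdev


def errors_by_class(observed, predicted):
    """Mean and SD of (predicted - observed) within low, medium, and high concentration classes."""
    # Class boundaries at the 25th and 75th percentiles of the observed distribution.
    q25, _, q75 = quantiles(observed, n=4, method="inclusive")
    groups = {"low": [], "medium": [], "high": []}
    for obs, pred in zip(observed, predicted):
        if obs <= q25:
            label = "low"
        elif obs <= q75:
            label = "medium"
        else:
            label = "high"
        groups[label].append(pred - obs)
    return {label: (mean(errs), stdev(errs)) for label, errs in groups.items()}


# Synthetic example: 20 sites with concentrations 1..20 and predictions shrunk toward 10.5.
observed = [float(v) for v in range(1, 21)]
predicted = [10.5 + 0.8 * (obs - 10.5) for obs in observed]
summary = errors_by_class(observed, predicted)
```

In this synthetic case the low class is overpredicted and the high class is underpredicted, which is the pattern a regression that compresses the concentration range would produce.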

  14. Rotterdam Area LUR versus Dispersion Model (Hoek et al., 2010) [Figure panels: Dispersion Model; LUR Model]

  15. LUR Model Evaluation in Oslo from Hoek et al., 2010 Courtesy: Christian Madsen (Oslo)

  16. Comparison of Two LUR Models for Amsterdam (Hoek et al., 2010)

  17. Comparison of Two LUR Models for Amsterdam Denoting Sites Impacted by Traffic or Urban Sources (Hoek et al., 2010)

  18. Summary and Conclusions
      • We used air pollution concentrations predicted by coupled regional and local scale AQ models to develop and evaluate LUR models in New Haven, CT for benzene, PM2.5, and NOx
      • Model performance and robustness improved as the number of sites used to build the models increased
        • R-squared values were inflated for models based on pollutant concentrations from 25 training sites compared with models based on 100-285 training sites
        • R-squared for LUR model (training dataset) and R-squared predicted versus observed (test dataset) converged as training sites increased
      • It is critical to evaluate LUR performance using site-specific independent measurement data sets
      • Analysis suggests that coupled air quality models could provide a useful tool for improving LUR estimates of exposure to ambient air pollution in epidemiologic studies
      • LUR model performance may be considerably poorer than emissions-based modeling results for urban environments with complex sources and landscape characteristics
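One plausible contributor to the inflated training R-squared at small site counts is selection among many candidate predictors. The sketch below illustrates that mechanism with pure-noise data: it picks the best single predictor out of 19 unrelated candidates and records its training R-squared. It does not reproduce the study's analysis; the function names, the single-predictor simplification, the seed, and the number of repetitions are all assumptions.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def best_training_r2(n_sites, n_candidates, rng):
    """Highest training R-squared among pure-noise candidate predictors."""
    y = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]
    best = 0.0
    for _ in range(n_candidates):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]
        best = max(best, r_squared(x, y))
    return best


rng = random.Random(2010)
mean_r2 = {}
for n_sites in (25, 100, 285):
    # Average over 20 repetitions to smooth out run-to-run variation.
    runs = [best_training_r2(n_sites, 19, rng) for _ in range(20)]
    mean_r2[n_sites] = sum(runs) / len(runs)
```

Since none of the candidates is related to the response, any training R-squared here is spurious, and it shrinks as the number of sites grows. An independent test set would expose the same gap, which is why the hold-out comparison on this slide matters.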

  19. Acknowledgements*
      • Markey Johnson
      • Vlad Isakov
      • Joe Touma
      • Shaibal Mukerjee
      • Luther Smith (Alion Incorporated)
      • Ellen Kinnee (Computer Science Corporation)
      *Although this work was reviewed by EPA and approved for publication, it may not necessarily reflect official Agency policy

  20. Additional Slides

  21. Mean Contribution of Land-Use Factors in Benzene Models [Figure panels: Models Based on 25 Training Sites; Models Based on 100 Training Sites; Models Based on 285 Training Sites; Legend]

  22. Mean Contribution of Land-Use Factors in NOx Models [Figure panels: Models Based on 25 Training Sites; Models Based on 100 Training Sites; Models Based on 285 Training Sites; Legend]

  23. Mean Contribution of Land-Use Factors in PM2.5 Models [Figure panels: Models Based on 25 Training Sites; Models Based on 100 Training Sites; Models Based on 285 Training Sites; Legend]
