1 / 30

Stefan Falke Center for Air Pollution Impact and Trend Analysis

Demonstration of a Distributed Emissions Inventory using Web Technologies. Compilation and Design of a Functioning Distributed Database of North American Electric Generating Emissions. Stefan Falke Center for Air Pollution Impact and Trend Analysis Washington University in St. Louis

britain
Download Presentation

Stefan Falke Center for Air Pollution Impact and Trend Analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Demonstration of a Distributed Emissions Inventory using Web Technologies Compilation and Design of a Functioning Distributed Database of North American Electric Generating Emissions Stefan Falke Center for Air Pollution Impact and Trend Analysis Washington University in St. Louis Gregory Stella Alpine Geophysics, LLC Terry Keating US EPA – Office of Air & Radiation

  2. Background Air pollutant emission inventories for the US, Canada, and Mexico are compiled, stored and disseminated using different methods The development of a single comprehensive and accurate emissions inventory is essential for the coordinated reporting, policy development, transport analyses, and socio-economic studies that create an environment for collaboration among international researchers, policy-makers, and the interested public In support of this longer term goal, the Commission on Environmental Cooperation (CEC) and the US EPA have initiated a project to develop a prototype web tool for enabling uniform access to distributed emissions data from North American electricity generating power plants.

  3. Distributed Data and Management Networks Cyberinfrastructure NSF’s initiative to apply new IT to building new ways of conducting collaborative research http://www.communitytechnology.org/nsf_ci_report/ Earth Observation Summit International effort to build comprehensive, coordinated, and sustained Earth observation systems http://www.earthobservationsummit.gov Ecoinformatics EPA’s vision for national and international cooperation in data and technology development http://oaspub.epa.gov/sor/user_conference$.startup Advances in information science and technology are driving the trend toward distributed networks and virtual communities for science and management. Integrated Ocean Observing System International network of ocean related monitoring, assessment, and communication http://www.ocean.us/ Linked Environments for Atmospheric Discovery Network of high-performance computers and software to gain new insights into weather http://lead.ou.edu/ Virtual Observatory Network for astronomical data sharing and distributed analysis http://www.us-vo.org/

  4. Emissions Community Collaborative Activities • NIF Data standards • Standard format and submission • NEI XML schema • Environmental Information Exchange Network • Network linking EPA, States, and other partners through the Internet and standardized data formats • Facility Registry System • Standard facility codes and locations • Data Sharing Efforts • States, Tribes, Local agencies, RPOs • North America

  5. NEISGEI Networked Environmental Information System for Global Emissions Inventories …is both a conceptual framework and implementation effort for the development of a fully integrated, distributed air emissions inventory – and the foundation for an all-media environmental information network • Tie together data at all spatial and temporal scales using emerging distributed database technologies • Provide shared, online tools for processing and analysis • Provide for the seamless merging, manipulation and analysis of Internet accessible air quality-relevant data through the development of emerging Internet-oriented technologies • Make use of existing resources – link and partner with other efforts • Build a broad-based air quality user community: scientists, regulators, policy analysts and the public • Create the network and toolkit piece-wise through multiple, connected projects Ongoing Efforts: NSF-EPA Digital Government Funded Projects: The California Air Resources Network Fire & Air Quality Data and Tools Network Future Effort: EPA OAR RFA on Distributed Air Quality Data in Support of NEISGEI

  6. US EPA States AQMDs Municipalities Tribes CAREN: The California Air Resources Network Eduard Hovy, Jose-Luis Ambite, Andrew Philpot USC Information Sciences Institute • Environmental data sharing among international, national, state and local governments, the public and academic and other non-governmental research organizations is a difficult challenge. • Barrier: Technological incompatibilities • Barrier: Data format incompatibilities • Barrier: Financial (staff time) limitations RPOs • The Solution Strategy (First Step): • Automate the integration of heterogeneous databases Use semi-automated information integration methods to generate translation protocols between related information sources, e.g. AQMD and CARB. ???

  7. Fire and Air Quality Data Network Data Sources Data Wrappers CAPITA BLM Fire History FS Coarse Spatial Data Data Catalog NEI Fire Emissions ftp HMS Fire Detection Wildland Fire Assessment System text tables RDBMS Spatial Interpolation Service VIEWS • Data ‘wrappers’ are used to translate the format of data sets into a uniform format. • The data are either stored on the CAPITA database server or dynamically accessed from its original source • The datasets are registered with metadata in the data catalog • GIS-type interfaces provide users with ways to view, analyze, and export the data

  8. Envisioned Emissions Community Resource of Data & Tools Users & Projects Data Data Catalogs Geospatial One-Stop Wrappers XML Emissions Inventory Catalog Report Generation RDBMS Data Analysis Emissions Inventories Web Tools/Services Activity Data Comparison of Emissions Methods Spatial Allocation GIS Emissions Factors Model Development Estimation Methods Transport Models Surrogates

  9. Current Project Objectives • Recommend and demonstrate to the CEC approaches for the comparability of techniques and methodologies for data gathering and analysis, data management, and electronic data communications for promoting access to publicly available electric utility emissions • Identify, collect, and review existing sources of electric generating utility (EGU) emissions and activity databases, and provide a summary of the state-of-science • Build a prototype web browser tool to query, retrieve, and explore emissions data from heterogeneous databases • Demonstrate the utility of new information technologies in creating an integrated network of distributed data and tools • Dynamically link multiple existing data without requiring substantial modification on the provider’s end and provide interfaces that make the links transparent to the end user. • “Add value” to the linked data through the application of data analysis and processing tools The project’s focus is on criteria pollutants and toxics because of their availability and accessibility.

  10. Design Objectives • Distributed • Non-intrusive to data provider • Transparent to end user • Flexible • Extendable • Light on user requirements

  11. Process of Building Demonstration • Identify and access relevant data (build wrappers) • Build relational database to temporarily store the data that are not accessible in a distributed manner • Acquire authorization and access to those data that are dynamically accessible through internet interfaces • Create field name “mappings” among datasets • Identify available web technologies for building a distributed emissions tool • Develop new components necessary for the prototype • Build web tool prototype for demonstrating the feasibility of exploring emissions data

  12. Available Internet Accessible Emissions Data These are publicly available, on-line accessible emissions data. Other data resources are available, but at this time only in hard copy form and therefore not usable in demonstrating distributed database concepts. • NEI, NPRI, and eGrid data were downloaded and stored in relational databases on the CAPITA server • BRAVO Mexican emissions data were obtained in electronic format and imported to the CAPITA server • Clean Air Markets was identified as the source most suitable for demonstrating distributed access

  13. Emissions Data Characteristics Web browser query Web map server Recent database structure upgrade Web browser query Remotely accessible using SecuRemote Not yet publicly accessible Web browser query Remotely accessible using SecuRemote Oracle database Downloadable Excel Spreadsheets Plans for a dynamic web system were shelved

  14. Database Fields Mapping Emissions inventories are based on different underlying data models. Each inventory uses a uniquely defined set of field names. However, many of these field names are similar to (or their content is similar to) fields in another country’s inventory. Some of the key relationships among the inventories have been captured by developing a “mapping” among fields. These mappings provide a set of connections that can subsequently be applied to automated query and integration of data from multiple inventories. SO2 SO2_Ann SO2 SulfurDioxide SO2Yr

  15. Leveraging Multiple Projects The challenges in distributed information systems should be addressed by collaborative efforts across governments, agencies, researchers, and disciplines. Common underlying goals among projects provide opportunities to naturally design systems to interoperate Avoids one-time, stand alone solutions that cannot be reused

  16. DataFed.Net The Aerosol Data and Services Federation, (http://www.datafed.net), is a network of providers and users for sharing atmospheric data and processing services. DataFed includes a Community of participants who share and use data and processing services, Mediator software component to homogenize data access and Peer-to-peercomputing for composing web applications. R. Husar, CAPITA

  17. Catalog of Air Quality Related Data The emissions data are registered in the DataFed.net catalog select data domain: aerosols, emissions, fire … Metadata for each dataset are registered in a catalog allowing users to browse available datasets and determine which datasets to use for their particular application. The catalog entries includes data access instructions. http://capita.wustl.edu/dvoy_2.0.0/dvoy_services/datafed.aspx?

  18. Preview Data

  19. Modular Applications for Working with Online Data Emissions data is multi-dimensional (plant, year, pollutant, fuel type, boiler capacity, etc.). Its multidimensionality requires multiple “views” of the data in a variety of end-use applications. Data views can be created in the DataFeds.net framework, including maps, time series, and tables. Each view is independently linked to its data sources and described (i.e. geographic and temporal extents). Using the data access instructions registered in the catalog, a view can be dynamically assembled. The modularly designed views can be embedded and controlled in web pages using Javascript, ASP or other web application programming languages.

  20. Web Services Substantial progress has been achieved in data interoperability. One the next advances required is interoperable data analysis/processing tools. Web services are applications that are used over the Web. Because they are self-contained and use XML-based standards (SOAP, WSDL, UDDI) for describing themselves and communicating with other web resources, they can be reused in a variety of independent applications. Many of the analysis and processing tools used by the air quality management community could benefit from web service technology. Not only can their data be shared but their heterogeneous, distributed tools that operate on that data can be shared as well. A longer term vision for web services is to be able to “orchestrate” or “chain” multiple services from multiple providers so that new “third party” applications can be constructed.

  21. Example Web Services Application Emissions data from multiple databases are displayed on maps, time series, and tables. Tools are included for browsing and querying the data. In this example, the user can change the pollutant, date, map zoom data on/off, and map/time scales.

  22. CAM eGrid NPRI NEI BRAVO North American Emissions Demonstration Data Flow MapPointAccess MapPointRender DataSet = NPRI Year=1999 Parameter=SO2 Color = Yellow Symbol=Bar Width=8 Data Catalog MapPointAccess MapPointRender DataSet = eGrid Year=1999 Parameter=SO2 Color = Red Symbol=Bar Width=8 MapImageOverlay Layer Order = N.Am, NEI, eGRid, NPRI MapPointRender MapPointAccess Color = Blue Symbol=Bar Width=8 Wrappers DataSet = NEI Year=1999 Parameter=SO2 MapImageAccess MapImageRender Color= Maroon Size=2 DataSet = N.Am. Borders The settings of each web service can be changed by the user, creating a dynamic application Name = web service Settings

  23. Embedded Images and Controllers in Web Page Parameter Controller Date Controller Query Controller The controllers and map image view can be linked and assembled in a web page. Changing the settings of a controller changes the URL of the map image and updates the web page. The web page can be constructed using standard web application programming languages, such as JavaScript and ASP.

  24. Project Results • Technology has been demonstrated to be at the point where we can begin to apply some of the distributed database concepts to “real” situations • Emissions data present unique challenges due to their complex relational dimensionality • Collaborative efforts in the near future could generate a distributed North American emissions inventory • The initial versions of the inventory would help clarify the issues related to handling complex queries • Building and using a distributed emissions tool will assist in creating consensus data naming conventions

  25. Current Project Challenges • Dynamic Access to Data - technical snafus - security issues - slow performance • Complexity of Emissions Data • Quickly evolving technology

  26. What’s Missing • Metadata • More complete metadata would help in relating heterogeneous databases • More complete access to distributed datasets • A process for creating trusted provider-user agreements would help address issues of security and data misuse • More comprehensive content • Networked data and tools that spark additional interest in the technology’s potential • Actual Implementations • FASTNet will demonstrate the use of many of these technologies in the real time monitoring of major aerosol events this summer

  27. Next Steps Advance the prototype tool to make it more representative of what an emissions inventory would ultimately use EPA’s RFA on Distributed Air Quality Data in Support of NEISGEI Link to other web services (Geospatial One-Stop, TerraServer) and other relevant data sources, including Web Map Servers Establish collaborative partnerships with other researchers and agencies developing related networks and tools (ReVa, Earth Science Federation (NASA), NSF Cyberinfrastructure efforts) Build tools that add value to the distributed data and provide incentives for data providers to join networks Clarify the handling of distributed data for optimal system performance

  28. Mediators • The flow of data from provider to user passes through brokers, or mediators. • The data continues to be maintained by original providers • Contracts are used to retain a constant link to the original data • Data is still not centrally stored but is “cached” in a format that allows efficient queries and analyses • Mediators provide an interface between the user and the data that enhances the effective information exchange between the two sides

  29. Adding Value to data through Services Mediator Services analytical web services multidimensional data cubes Server Client Data Sources Data Users

  30. Demo Links http://capita.wustl.edu/NAMEN/DemoFeb27.htm

More Related