Abstract

Building an integrated information system for publishing heterogeneous Critical Zone Observatory data.


Presentation Transcript


  1. Building an integrated information system for publishing heterogeneous Critical Zone Observatory data

Thomas Whitenack (1), Mark Williams (2), David Tarboton (3), Ilya Zaslavsky (1), Matej Durcik (4), Ryan Lucas (5), Charles Dow (6), Xiande Meng (5), Brian Bills (7), Miguel Leon (8), Chi Yang (2), Melanie Arnold (6), Anthony Aufdenkampe (6), Kim Schreuders (3), Otto Alvarez (5)

(1) San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA, United States
(2) University of Colorado, Boulder, Boulder, CO, United States
(3) Utah State University, Logan, UT, United States
(4) Department of Hydrology and Water Resources, University of Arizona, Tucson, AZ, United States
(5) University of California, Merced, Merced, CA, United States
(6) Stroud Water Research Center, Chester, PA, United States
(7) Penn State University, State College, PA, United States
(8) University of Pennsylvania, Philadelphia, PA, United States

Abstract

The Critical Zone Observatory (CZO) program is a collaborative effort to advance scientific understanding of multi-scale environmental interactions in the critical zone from bedrock to the atmospheric boundary layer. CZO sites use a mix of established and novel data collection methods to examine hydrogeochemical and physical processes in the critical zone. Publishing, analyzing and archiving these data in a consistent and integrated manner across all CZO sites is challenging due to the inherent heterogeneity in data collection and processing techniques. A goal of the CZO program is an integrated data management model across all CZO sites that can be used to discover, browse, retrieve and unambiguously interpret CZO data.

This paper describes the ongoing development of such a system by a team comprising data managers from each CZO site and cyberinfrastructure researchers. The design follows a uniform, standards-based approach that draws on experience and software developed in related NSF-supported cyberinfrastructure projects (LTER, CUAHSI HIS, EarthChem, etc.). We describe the information model used (adapted from CUAHSI's Observations Data Model), present a web-based mechanism for publishing CZO point observations data, and discuss potential extensions of the publication model to other types of data.

In this system each CZO site maintains its own data management system and generates human- and computer-readable “display files” that follow an agreed-upon format and contain the necessary metadata. The display files reside on individual CZO web sites, but are harvested by a centralized CZO application that parses the files, identifies and validates new data and related metadata, and ultimately loads the new data into a CZO-modified version of the CUAHSI Observations Data Model. The ingested data values are then published using web services that follow the Water Markup Language (WaterML) specification, and can be retrieved and analyzed by WaterML-compatible client applications such as HydroDesktop. The display file format incorporates information model enhancements such as multiple types of named vertical and horizontal offsets, and data loggers collecting information from multiple sampling locations within a single ‘site’. Additionally, the display file provides a tiered approach to publishing environmental data. Harvesting these data via a centralized system provides continuity to the data collected across all CZO sites, which ultimately facilitates cross-site data exploration and analysis.

System design

CZO follows a service-oriented architecture design. Data in standard CZO formats are harvested into a community data repository and presented as standard pre-registered CZO services for each site. The services are used to harvest metadata and add it to the CZO metadata catalog. Using desktop and online tools, users can discover and retrieve the data and create derived CZO data products, which are in turn registered at CZO Central. We also plan to collaborate with DataNet on long-term preservation and to register CZO services in various domain registries (e.g. CUAHSI HydroCatalog, EarthChem Portal).

CZO data publication model

A display file describes the components of a measurement: where (location), when (datetime), what (attribute), how (method), who (investigator), plus the value. It has three parts:

\doc (title, abstract, investigator, var names, etc.)
\header
DEFAULT_PARAMETER (pertains to entire file unless overridden)
Column headers (define each column, i.e. a time series or group of time series):
COL4. label=VariableName, value=pH, units=pH units, missing value indicator=-9999
\data
GREEN LAKE 4,820311,,6.4,18,88.51,0.40,,114.77,…

Accessing CZO services

Once a CZO web service is updated and registered in CZO Central, it can be discovered in HydroDesktop (CZODesktop), an open source application with rich mapping and time series analysis capabilities. The services can also be accessed by other clients, including Matlab, R, Excel and ArcGIS. We are working with the Open Geospatial Consortium towards WaterML 2.0 as an OGC standard.

[Figure: HydroDesktop, showing one of 31 newly ingested time series from Boulder Creek CZO]
[Figure: OGC/WMO Hydrology Domain Working Group, http://external.opengis.org/twiki_public/bin/view/HydrologyDWG/WebHome]

CZO community data registry and repository

• A CZO scheduler application checks all CZO web sites for new display files at regular intervals.
• New or updated data are retrieved and parsed by the CZO Data Interpreter, and validated against shared vocabularies.
• After the Data Interpreter loads data into the respective ODM databases, the services are updated, and the CZO Central harvester uses the services to retrieve metadata and populate the CZO Central metadata catalog.
• The catalog is searchable from client applications via search web services.

[Figure: The CZO Central service registry]
[Figure: A water data service page for the Jemez River Basin CZO]
[Figure: A water data service page for the Boulder Creek CZO. The page is used by data managers to present general metadata about their observations (including an abstract and recommended citation), test the services, and associate variable names with hydrologic concepts.]
[Figure: This CZO Central user interface is used to associate CZO variables with terms in a hydrologic concept hierarchy, to support concept-based search.]

Leveraging earth science projects

Synthesizing information management experience and software from CZO partners and neighboring earth science projects into a standards-based system for publishing environmental data, to emphasize the “critical zone” nature of our shared data sets.

Towards CZO web services model

The CZO data publication system is designed to be extensible to different types of data collected at CZO sites: point time series, geochemical samples, geophysical and biological data, spatial data, etc. Each type of data will be available as web services following OGC service interface specifications and community standards for data exchange.

Conclusion

The CZO integrated data publishing system enables CZO participants to share data in standard formats via web services. The display file format is flexible enough to accommodate information model enhancements and is extensible to other types of CZO data beyond point time series.

The CZO data publication model provides an attractive option for publishing environmental data: a) all current data are available from individual CZO web sites in human-readable format, b) CZOs maintain their own data systems and are not required to install or maintain additional servers, and c) the data are harvested, validated, versioned and archived at a central site (eventually hosted in the cloud), which is responsible for making them available as standards-based web services and for evolving the web service format as new environmental standards are adopted.
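
The display file outline given in the transcript (a \doc part, a \header part with column definitions such as the COL4 line, and a \data part of comma-separated rows) can be illustrated with a small parsing sketch. This is not the CZO Data Interpreter and does not implement the actual display file specification. It assumes, hypothetically, that each section marker sits on its own line, that column definitions are comma-separated key=value pairs, and that the first two data columns hold the site name and the date. The function names and the sample file contents are made up for illustration; only the section names, the attribute keys and the missing value indicator of -9999 come from the poster.

```python
import csv
import re

# Hypothetical sample. Only the \doc / \header / \data outline, the attribute
# keys and the -9999 missing value indicator come from the poster; the rows and
# the title line are invented for illustration.
SAMPLE = r"""\doc
title=example
\header
COL1. label=SiteName, value=site
COL2. label=Date, value=date
COL3. label=VariableName, value=pH, units=pH units, missing value indicator=-9999
\data
GREEN LAKE 4,820311,6.4
GREEN LAKE 4,820318,-9999
"""

# Matches column definitions such as "COL3. label=..., value=..."
COL_RE = re.compile(r"^COL(\d+)\.\s*(.*)$")


def parse_column(line):
    """Turn one 'COLn. key=value, key=value' line into (n, attribute dict)."""
    match = COL_RE.match(line)
    index, rest = int(match.group(1)), match.group(2)
    attrs = {}
    for pair in rest.split(","):
        key, _, value = pair.partition("=")
        attrs[key.strip()] = value.strip()
    return index, attrs


def parse_display_file(text):
    """Split the text into sections and return (doc lines, columns, data rows)."""
    sections = {"doc": [], "header": [], "data": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):  # assumed: a section marker on its own line
            current = line[1:].lower()
            sections.setdefault(current, [])
            continue
        if current is not None:
            sections[current].append(line)
    columns = [parse_column(l) for l in sections["header"] if COL_RE.match(l)]
    rows = list(csv.reader(sections["data"]))
    return sections["doc"], columns, rows


def observations(columns, rows):
    """Yield (site, date, variable, value) tuples; missing values become None.

    Assumes (hypothetically) that columns 1 and 2 hold site name and date.
    """
    result = []
    for row in rows:
        for index, attrs in columns:
            if attrs.get("label") != "VariableName":
                continue
            cell = row[index - 1].strip() if index - 1 < len(row) else ""
            missing = attrs.get("missing value indicator")
            value = None if cell in ("", missing) else float(cell)
            result.append((row[0], row[1], attrs["value"], value))
    return result


if __name__ == "__main__":
    doc, columns, rows = parse_display_file(SAMPLE)
    for obs in observations(columns, rows):
        print(obs)
```

Keeping the column metadata separate from the rows mirrors the idea that the header defines each time series once while the data part stays a plain, human-readable table. A real harvester would additionally validate names against the shared vocabularies before loading values into the database.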
