Integration of hydrologic parameter ontology in CUAHSI HydroCatalog
IN41C-1367
Ilya Zaslavsky 1, David Valentine 1, Thomas Whitenack 1, Michael Piasecki 2, Richard Hooper 3, Yoori Choi 3, David Maidment 4



1 San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA, United States; 2 Drexel University, Philadelphia, PA, United States; 3 CUAHSI Central Office, Boston, MA, United States; 4 University of Texas, Austin, TX, United States

Abstract

Nomenclatures of hydrologic parameters are large and very fragmented. One of the key goals of the CUAHSI Hydrologic Information System project (http://his.cuahsi.org) is to unify semantically diverse hydrologic observations and organize them so that the data can be easily discovered, accessed and analyzed in different types of research scenarios, by different types of users. The core of the system is a hydrologic metadata catalog, which describes observational data available from multiple repositories via a standard set of CUAHSI water data web services.

To address the needs of different types of users, the HydroCatalog is being designed as a multi-level information system. At the lower level, a CUAHSI HIS time series catalog contains metadata about 23.3 million time series from government and academic data sources (hiscentral.cuahsi.org). The time series representation, organized by primary data source, is suitable for hydrologists and data managers who need to discover and access hydrologic observations in the format in which they were published, without additional interpretations or data conversions. However, such a representation doesn’t fully address the data discovery and access needs of hydrologic analysts and modelers, who prefer to work with curated and interpreted hydrologic data collections organized by thematic categories. Therefore, an additional layer of commonly requested hydrologic data products (“hydrologic themes”) is being constructed, where a theme represents a derived spatio-temporal aggregation of observational data.

Information supporting semantics-based discovery is needed at both levels of the HydroCatalog. At the time series catalog level, the focus is on discovery of observations based on a community-curated hierarchy of hydrologic concepts, on associating variables with these concepts, and on translating concept-based queries into queries specific to individual sources of primary data. At the theme catalog level, the variable-concept associations are used to group time series into “data carts”, which are the basis for generating hydrologic themes; thus the main issue is recording the themes’ semantic provenance and supporting reconciliation of units, time support and other characteristics that prepare a theme for visualization or modeling use.

We describe the organization of semantic information in the CUAHSI HydroCatalog, introduce software tools for managing the hydrologic parameter ontology, and present initial results of concept-variable tagging. In particular, we discuss the results of using a hydrologic concept hierarchy based on the USGS and EPA Substance Registry System (SRS) for tagging hydrologic parameters in the metadata catalog. Currently, over 2000 catalog variables are available for concept-based search, primarily from observations made in water or suspended sediment. Additional work is needed for tagging variables in other media, and for managing the concept-variable mapping as the concept hierarchy evolves.

What is HydroCatalog

CUAHSI HIS is an online distributed system that supports the sharing of hydrologic data from multiple repositories and databases via standard water data service protocols. It includes software for data publication, discovery, access and integration.
At the general level, CUAHSI HIS includes three key components: a data publication platform (HydroServer), a data discovery and integration platform (HydroCatalog), and a data synthesis and research platform (represented by HydroDesktop). Internally, HydroCatalog consists of services that are responsible for harvesting hydrologic metadata from registered services, managing the ontology and the variable-ontology mappings, monitoring, logging and validating services, and supporting a query API.

Water data web services are registered at the Central HIS service registry. The HISCentral application harvests observation metadata from each service (sites, variables, and periods of record that are accessed via the service) at regular intervals and appends it to the central metadata catalog. In addition, HISCentral supports semantic tagging of the registered data by associating the harvested variables with concepts from the hydrologic concept hierarchy.

Hydrologic Ontology

[Figure: Organization of the CUAHSI HIS concept hierarchy (the case of Nitrogen). Labels: Permissible Search Keywords; Tagging Targets; Substance Registry System.]

The current version of the CUAHSI HIS concept hierarchy includes 4095 concepts (3999 leaf concepts), which are organized into three major groups: physical, chemical and biological parameters. It incorporates concepts from several sources, including the EPA/USGS Substance Registry System (SRS) and biological nomenclatures. The hierarchy is visualized as an Inxight StarTree.

The CUAHSI concept hierarchy is stored in SQL Server databases as a set of four primary tables:
• Concepts: contains the entire list of concepts
• Synonyms: concepts whose definitions are equivalent to terms that exist in the Concepts table
• Hierarchy: maintains the parent/child relationships between the concepts
• ConceptPaths: derived from the Concepts and Hierarchy tables to create a “conceptPath” attribute for each concept, which simplifies determining the upstream/downstream lineage of each concept
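As a rough illustration of how a ConceptPaths table could be derived from the Concepts and Hierarchy tables described above, the sketch below walks parent links up to the root and joins the concept names into a path. The concept names, the integer IDs, the "/" separator and the function names are all assumptions made for this example; the source does not give the actual table columns or path format.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the Concepts table: concept ID -> concept name.
concepts: Dict[int, str] = {
    1: "Hydrosphere",
    2: "Chemical",
    3: "Nutrient",
    4: "Nitrogen",
    5: "Nitrogen, nitrate",
}

# Hypothetical stand-in for the Hierarchy table: child ID -> parent ID.
# None marks the root concept.
hierarchy: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 2, 4: 3, 5: 4}


def concept_path(concept_id: int) -> str:
    """Follow parent links to the root and return a root-first path."""
    names: List[str] = []
    seen = set()
    current: Optional[int] = concept_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in hierarchy at concept {current}")
        seen.add(current)
        names.append(concepts[current])
        current = hierarchy[current]
    return "/".join(reversed(names))


def build_concept_paths() -> Dict[int, str]:
    """Materialize one path per concept (the assumed ConceptPaths content)."""
    return {cid: concept_path(cid) for cid in concepts}


def descendants(concept_id: int, paths: Dict[int, str]) -> List[int]:
    """Find downstream concepts with a prefix match on the stored paths."""
    prefix = paths[concept_id] + "/"
    return sorted(cid for cid, path in paths.items() if path.startswith(prefix))


paths = build_concept_paths()
print(paths[5])
print(descendants(3, paths))
```

Storing the path once per concept means that lineage questions become simple string prefix tests instead of repeated recursive lookups, which is presumably why a derived table of this kind is convenient.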
The HISCentral web service enables data discovery by HIS client applications. The number of data requests brokered by the catalog has been growing.

Current catalog content:
• 60+ public services
• 18,000+ variables
• 1.96 million sites
• 23.3 million series
• 5.1 billion data values

[Figure: Federal agency data services at HISCentral.]

[Figure: Matching the content of the USGS National Water Information System catalog with SRS concepts in the CUAHSI HydroCatalog. Labels: NWIS unique parameter codes with associated period of record; variables found in the catalog dump, tagged, and have data; MappedVariables: 3567; Unmapped Variables; Have data; Don’t have data. Other values shown: 2103, 9178, 7218, 2236, 4339. Annotations: catalog variables not in SRS include taxonomic IDs, set number and similar metadata, context observations, surrogate measures, some organics…; variables not tagged yet are mostly in media other than water or suspended sediment.]

Towards Distributed HydroCatalogs

CUAHSI HydroCatalog is evolving towards compliance with the OGC Catalog Services for the Web (CSW) specifications. Water data services are exposed as Web Feature Services (WFS), which contain time series information. They are registered in ESRI’s GeoPortal, which supports browsing and querying the catalog via CSW methods. In turn, this allows us to develop the HydroCatalog into a distributed system of HydroCatalogs, which can harvest service information from each other.

Semantic annotation and search

While syntactic heterogeneity is managed by describing water data using ODM and WaterML and accessing it via uniform Web services, semantic differences across observation networks require a different approach. The vocabularies used by each data source are matched up with a common controlled vocabulary. In the process of water data service registration, variable names in each source are associated with concepts in the concept hierarchy. This provides for semantics-aware data discovery and integration regardless of the naming conventions or synonyms used by individual sources.
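The association of source variable names with concepts, as described above, can be pictured with a small lookup sketch. This is not the HISCentral tagging tool (which the source describes as a web application used by people); it only illustrates the idea of matching a source vocabulary against a controlled vocabulary plus synonyms. The terms, concept IDs and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical controlled vocabulary: normalized concept term -> concept ID.
CONCEPT_TERMS: Dict[str, int] = {
    "nitrogen, nitrate": 5,
    "discharge": 10,
}

# Hypothetical synonyms that point at the same concept IDs.
SYNONYMS: Dict[str, int] = {
    "nitrate nitrogen": 5,
    "no3-n": 5,
    "streamflow": 10,
}


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so naming variants compare equal."""
    return " ".join(name.lower().split())


def tag_variable(name: str) -> Optional[int]:
    """Return the concept ID for a variable name, or None if it is unmapped."""
    key = normalize(name)
    if key in CONCEPT_TERMS:
        return CONCEPT_TERMS[key]
    return SYNONYMS.get(key)


def tag_source(variable_names: List[str]) -> Tuple[Dict[str, int], List[str]]:
    """Split one source's variables into mapped and unmapped sets."""
    mapped: Dict[str, int] = {}
    unmapped: List[str] = []
    for name in variable_names:
        concept_id = tag_variable(name)
        if concept_id is None:
            unmapped.append(name)
        else:
            mapped[name] = concept_id
    return mapped, unmapped


mapped, unmapped = tag_source(["NO3-N", "  Streamflow ", "Set number"])
print(mapped)
print(unmapped)
```

Variables that fall into the unmapped list correspond to the kind of leftovers the poster mentions, such as metadata fields and context observations that have no counterpart in the concept hierarchy.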
About CUAHSI

The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) is an organization representing 120+ universities in the US and 11 international affiliates. As part of its mission, CUAHSI supports the development of cyberinfrastructure for the hydrologic sciences. The CUAHSI HIS (Hydrologic Information System) project is a multi-year, multi-institution effort focused on consistent management and integration of observational data available from several federal agencies (USGS, EPA, USDA, NOAA, etc.) as well as data published by academic investigators.

[Figure: The planned system of distributed HydroCatalogs. Labels: MetaCatalog; HIS Central Catalog, UTexas Catalog, NWIS Catalog; HIS Central, UTexas Services, NWIS Services; Series Metadata and Data for each.]

[Figure: An experimental HydroPortal registering WFS services from HISCentral. Labels: University of Texas; US Geological Survey; San Diego Supercomputer Center.]

Conclusion

The CUAHSI HIS HydroCatalog has been efficient in supporting semantics-based search over 23.3 million time series representing over 60 observation networks with different variable semantics. The concept-based query is supported by the CUAHSI concept hierarchy, which is composed of several vocabularies, and by mappings between source-specific vocabularies and leaf concepts in the hierarchy. The project is working on setting up a community-focused ontology management system, based on a semantic wiki, to enable crowd-sourcing of further ontology enhancement.

HISCentral Web Service

The content of HydroCatalog, including the concept hierarchy and semantic mappings, is exposed via the HISCentral Web Service. Applications such as HydroDesktop can use it to discover time series based on concepts.

[Figure: Query expansion based on conceptID-variableID mappings.]
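The query expansion based on conceptID-variableID mappings mentioned above can be sketched as follows: a concept chosen by the user is expanded to its leaf concepts, and the leaf concepts are then translated into the variable identifiers each source understands. The hierarchy fragment, source names, variable IDs and function names below are invented for illustration and do not come from the catalog.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical fragment of the concept hierarchy: parent -> child concepts.
CHILDREN: Dict[str, List[str]] = {
    "Nutrient": ["Nitrogen", "Phosphorus"],
    "Nitrogen": ["Nitrogen, nitrate", "Nitrogen, ammonia"],
}

# Hypothetical mappings: (leaf concept, source name, source variable ID).
MAPPINGS: List[Tuple[str, str, str]] = [
    ("Nitrogen, nitrate", "SourceA", "A-001"),
    ("Nitrogen, ammonia", "SourceA", "A-002"),
    ("Nitrogen, nitrate", "SourceB", "B-17"),
    ("Phosphorus", "SourceB", "B-40"),
]


def leaf_concepts(concept: str) -> List[str]:
    """Expand a concept to the leaf concepts beneath it (or itself if a leaf)."""
    kids = CHILDREN.get(concept, [])
    if not kids:
        return [concept]
    leaves: List[str] = []
    for kid in kids:
        leaves.extend(leaf_concepts(kid))
    return leaves


def expand_query(concept: str) -> Dict[str, List[str]]:
    """Translate one concept into per-source lists of variable IDs."""
    leaves = set(leaf_concepts(concept))
    by_source: Dict[str, List[str]] = defaultdict(list)
    for leaf, source, variable_id in MAPPINGS:
        if leaf in leaves:
            by_source[source].append(variable_id)
    return {source: sorted(ids) for source, ids in by_source.items()}


print(expand_query("Nitrogen"))
print(expand_query("Nutrient"))
```

Because the mappings attach only to leaf concepts, a search on a broad term such as "Nutrient" reaches every source variable tagged anywhere below it, while each source still receives a query phrased in its own variable identifiers.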
HISCentral web service methods:
• GetMappedVariables
• GetSearchableConcepts
• SetSeriesCatalogForBox
• GetServicesInBox
• GetSitesInBox
• GetWordList
• GetOntologyConceptCode
• GetOntologyKeyword
• GetOntologyTree

The HISCentral web application is used to associate variables in submitted datasets with terms in a hydrologic concept hierarchy, to support concept-based search.

Links:
Project web site: http://his.cuahsi.org
HISCentral: http://hiscentral.cuahsi.org
Visualization of the current concept hierarchy: http://hiscentral.cuahsi.org/startree.aspx
