1 / 25

A Semantically-Enabled Provenance-Aware Water Quality Portal

A Semantically-Enabled Provenance-Aware Water Quality Portal. Deborah L. McGuinness Tetherless World Senior Constellation Chair Professor of Computer and Cognitive Science Rensselaer Polytechnic Institute Troy, NY, USA.

lduarte
Download Presentation

A Semantically-Enabled Provenance-Aware Water Quality Portal

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Semantically-Enabled Provenance-Aware Water Quality Portal Deborah L. McGuinness Tetherless World Senior Constellation Chair Professor of Computer and Cognitive Science Rensselaer Polytechnic Institute Troy, NY, USA Joint work with: Jin Guang Zheng, Ping Wang, Evan Patton, Timothy Lebo, Joanne Luciano

  2. Introduction • Real Life Motivation Example: • In 2009, in Bristol County, Rhode Island, Children start getting sick with symptom like diarrhea. The cause was found to be polluted water. • Public concerns: “When did the contamination begin?”, “How did this happen?”, “How can we keep it from happening again?” • We need an environmental informatics systems that can automatically integrate and analyze water quality.

  3. Challenges • Raw data from multiple sources and in different format – difficult to integrate and query. • Semantics of the water quality data are not explicitly encoded in the data – machine can’t process data automatically. • Large amount of data due to large spatial region, long time span, and large number of pollutants and regulated limit – analysis can be time consuming and complex.

  4. TWC-SWQP • Identify point sources of water pollution, including water sites monitored by USGS and polluting facilities regulated by EPA. • Demonstrates the effectiveness of semantic web technologies in addressing the challenges faced by environmental informatics systems. • Enable/Enpower citizens & scientists to better explore water related information.

  5. System Architecture Virtuoso access

  6. SemantAQUA Workflow Publish CSV2RDF4LOD Direct visualize derive derive integrate archive Archive CSV2RDF4LOD Enhance

  7. Ontology • Core TWC Water ontology • Extends existing best practice ontologies, e.g. SWEET, OWL-Time. • Includes terms for relevant pollution concepts • Can use to conclude: “any water source that has a measurement outside of its allowable range” is a polluted water source. Portion of the TWC Water Ontology.

  8. Ontology • Regulation Ontology • model the federal and state water quality regulations for drinking water sources • Can use to define: for example, in California, “any measurement has value 0.01 mg/L is the limit for Arsenic” • Combine with core ontology, we can infer “any water source contains 0.01 mg/L of Arsenic is a polluted water source.” Portion of Cal. Regulation Ontology.

  9. Provenance • Preserves provenance in the Proof Markup Language (PML). • Data Source Level Provenance: • Thecaptured provenance data are used to support provenance-based queries. • Reasoning level provenance: • When water source been marked as polluted, user can access supporting provenance data for the explanations including the URLs of the source data, intermediate data and the converted data.

  10. Visualization • Presents analyzed results with Google Map • Presents explanation of water source pollution • Presents possible health effect of contaminant • Presents “Facet” type filter to select type of data • Presents link to the authority, where user can report problems. 5 4 3 2 1 http://was.tw.rpi.edu/swqp/map.html

  11. Visualization • Time series Visualization: • Presents data in time series visualization for user to explore and analyze the data Violation, measured value: 50 Limit value: 15 http://was.tw.rpi.edu/swqp/trend/epaTrend.html?state=RI&county=3&site=http%3A%2F%2Ftw2.tw.rpi.edu%2Fzhengj3%2Fowl%2Fepa.owl%23facility-110000312135

  12. Demo

  13. Data • EPA Data: • Provides measurements of pollutants in the water discharged by the facilities, and also the threshold values for up to five test types for each pollutant. • USGS Data: • Provides measurements of substances contained in water samples collected at USGS data-collection stations • Regulation Data: • Provides lists of pollutants and their maximum contaminant level

  14. Selected Follow-up options Violation Limit

  15. Results • Semantic Data Integration provides an effective and low cost approach for integrating data from various sources. • SWQP integrates data from various sources, including EPA, USGS, and state governments. • Linking to external data: “twcwater:Arsenic”, linked to “dbpedia:Arsenic” using owl:sameAs. • We have generated 89.58 million triples for the USGS datasets and 105.99 million triples for the EPA datasets. Requires only 2-person days.

  16. Results • Query and reasoning supported by semantic technologies improves responsiveness and simplifies the development of web applications. • SPARQL queries narrows down the data, we can reason over only the relevant data on one selected regulation. • Reasoning eases the complexity of queries a developer needs to write for software applications.

  17. Results • Provenance information encoded using semantic web technology supports transparency and trust. • SWQP provides detailed provenance information: • Original data, intermediate data, data source • “What if” Senario: user may trust data from certain authorities only. • User can apply a stricter regulation from another state to a local water source.

  18. Discussion • Future Work • Expand SWQP to support all 50 states. • Add flood/weather information, and their effect on water sources • model the health effects from exposure to the excessive pollutants in water and support reasoning over these effects. • Expand SWQP to other environmental topics: soil quality, air quality • Get community involved: user can put comment on each water source, or report problem to the authorities.

  19. Conclusion • SWQP is a web portal that allows citizens and professionals to easily explore water quality information. • SWQP illustrated benefits of applying semantic web technologies to water quality research. • Data integration, provenance, automatic reasoning. • Architecture of SWQP can be easily apply to other environment topics • Air quality, soil quality, etc.

  20. Questions? http://tw.rpi.edu/web/project/SemantAQUA http://inference-web.org/wiki/Semantic_Water_Quality_Portal

  21. BACKUP SLIDES

  22. Related work • Other work focuses on facilitating water quality management [13, 14] and wastewater treatment [15] via knowledge sharing and reuse. • [13] presents system that integrates water quality data from multiple sources and retrieves data using semantic relationships among data. • [14] presented an ontology-based Knowledge Management system (KMS) that can be integrated into the numerical flow and water quality modeling to provide assistance on the selection of a model and its pertinent parameters • [15] is an environmental decision-support system for wastewater management, which augments classic rule-based and case-based reasoning with a domain ontology. • SWQP: • SWQP differs from these projects in that it supports provenance based query. • SWQP is built upon standard semantic technologies (e.g. OWL, SPARQL, Pellet, Virtuoso) and thus can be easily replicated or expanded.

  23. Queries for result 2 SELECT * WHERE { ?watersource twcwater:hasMeasurement ?measurement. ?measurement twcwater:hasValue ?value; twcwater:hasCharacteristic ?charactericsitc; twcwater:hasUnit ?unit. (1) ?regulation twcwater:hasValue ?limit; twcwater:hasCharacteristic ?characteristic; twcwater:hasUnit ?unit. ?watersource geo:lat ?lat; geo:long ?long. FILTER( ?value > limit ) } SELECT * WHERE { ?watersource rdf:type twcwater:pollutedWaterSource. geo:lat ?lat; (2) geo:long ?long. }

  24. New Ontology • New Regulation ontology • Reuse sweet:Measurement instead of use owl:sameAs • Defines cardinality restriction • Defines Datatype restriction Portion of the EPA regulation ontology

  25. New Ontology • TWC Environment Monitoring Ontology • Can be extended to use different regulation • Uses sweet ontology • More general ontology: aim for not just monitoring water, but anything relate to environment: air quality.

More Related