
Ideas to Improve Semantics for CUAHSI Controlled Vocabularies

Ideas to Improve Semantics for CUAHSI Controlled Vocabularies. Gary Berg-Cross, SOCoP Executive Secretary; Co-PI, Spatial Ontology Community of Practice INTEROP Grant; Co-Chair, RDA WG on Data Foundations and Terminology. Presented at the 2013 CUAHSI Conference on Hydroinformatics and Modeling.

Presentation Transcript


  1. Ideas to Improve Semantics for CUAHSI Controlled Vocabularies. Gary Berg-Cross, SOCoP Executive Secretary; Co-PI, Spatial Ontology Community of Practice INTEROP Grant; Co-Chair, RDA WG on Data Foundations and Terminology. Presented at the 2013 CUAHSI Conference on Hydroinformatics and Modeling, July 17 – 19, 2013, Utah State University, Logan, Utah

  2. Outline of Talk • “Vocabulary”/“Ontology” Overview: • Leverage Controlled Vocabulary & work from the BioMed area • Three ideas for improvement • Internal & External Audits - Examples • Judging Vocabulary qualities using an audit of existing work • Incrementally adding Basic Semantics – Better Semantic Relations • Adding Semantics via Design Pattern Schema • Spatial Semantics & Semantic Trajectory example • Final Thoughts

  3. Controlled Vocabulary (CV) • A consensus, standardized set of terms used to refer to concepts • Term equivalence is important • “Temp” = “Temperature” = “Water Temperature” • “mg/l” has the synonym “milligrams per liter” • “Myocardial infarction” is synonymous with “heart attack” • Name-equivalent semantics: the terms refer to the same concept • Scale of a BioMed vocabulary (available at http://www.ihtsdo.org/): in 2004, 269,864 classes named by 407,510 names; as of Jan. 31, 2013, ~300,000 active concepts and ~1.1M active “descriptions” (terms) • Different meanings of “has”: • Water has salinity • County has city • River has tributary
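
Term equivalence of the kind listed on this slide can be sketched as a lookup from variant terms to one preferred term. This is only an illustration: the `SYNONYMS` table, the choice of preferred terms, and the function names are assumptions, not part of any CUAHSI tool.

```python
from typing import Optional

# Hypothetical synonym table: every variant maps to one preferred term.
# The groupings follow the slide; the choice of preferred term is an assumption.
SYNONYMS = {
    "temp": "water temperature",
    "temperature": "water temperature",
    "water temperature": "water temperature",
    "mg/l": "milligrams per liter",
    "milligrams per liter": "milligrams per liter",
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}


def preferred_term(term: str) -> Optional[str]:
    """Return the preferred term for a variant, or None if it is unknown."""
    return SYNONYMS.get(term.strip().lower())


def same_concept(a: str, b: str) -> bool:
    """Two terms are name-equivalent when both resolve to the same preferred term."""
    pa, pb = preferred_term(a), preferred_term(b)
    return pa is not None and pa == pb
```

With this table, `same_concept("Temp", "Water Temperature")` holds, while a term that is not in the vocabulary simply resolves to `None` rather than being guessed at.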

  4. Useful Comparison to BioMed Work to Standardize Vocabularies. Clinical quality measures, and Medicare & Medicaid incentive payments for adopting certified EHR technology and using it to achieve specified objectives. Olivier Bodenreider (olivier@nlm.nih.gov), “Quality assurance of biomedical ontologies”, talk at Ontology Summit, 2013

  5. Quick Look at Multiple Views - Variables, Tags & Ontology Concepts • ODM uses an RDB structure to integrate files & handle heterogeneity: good MD attributes, limited semantics • Classifier-type structures for navigation, tagging & keyword search • Classifier-type structures connect to variable terms • [Figure labels: HydroTagger, HasConstituent, HasConcentration, Diverse Classes?, Navigation, 1220Deca_Chloro_PCB5]

  6. 1. The Flavor of Quality Audits of Vocabularies. Types of Intrinsic/Internal to Vocabulary Analysis • Lexical – consider separating types of modifiers like dissolved, suspended, total from core “chemicals” concepts (GALEN, a BioMed ontology, does this) • “Dissolved” and “suspended” are features of some mass of chemical/element, while “total” is a qualifier of them • A modifier like TOTAL is applied to dissolved, suspended or # of organisms in sample • Structural – classification/hierarchy: not every level is used in the same way in the current CUAHSI CV/Ontology • E.g. the Storet WQX Domain has an enormous list of Characteristic(s), which includes very different things like 1-Naphthalenamine & Oxygen that the ontology helps organize (more on next slide) • Redundancy • In the spreadsheet there are two places (levels) for “acidity” • 122 acidity 2 (use at this level different than lower level) • 2277 Acidity 9999 (this is the real one for data) • Missing Concepts • Semantic Analysis – missing or inaccurate relations • Compliance with ontological principles etc.
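
The redundancy check mentioned on this slide can be sketched as grouping vocabulary rows by a normalized term name and reporting names that occur more than once. The row layout `(id, term, level)` is an assumed reading of the two spreadsheet entries quoted above, and the third row is invented for contrast.

```python
from collections import defaultdict

# Rows are assumed to be (id, term, level). The first two come from the slide;
# the third is a hypothetical row added only for contrast.
rows = [
    (122, "acidity", 2),
    (2277, "Acidity", 9999),
    (9001, "Alkalinity", 9999),  # hypothetical
]


def find_redundant_terms(rows):
    """Group rows by case-insensitive term name and keep names that repeat."""
    by_name = defaultdict(list)
    for term_id, term, level in rows:
        by_name[term.strip().lower()].append((term_id, level))
    return {name: entries for name, entries in by_name.items() if len(entries) > 1}


redundant = find_redundant_terms(rows)
```

Here `redundant` flags "acidity" with both of its entries, which a curator could then review to decide which level is the one intended for data.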

  7. Organization of Features • Similar physical items like volume @level 2 & severity @level 4 • Physical>Volume • Physical>Water>Water, descriptive>Severity • Both are characteristic/feature properties, like temperature or biomass. Is another organization useful to help handle heterogeneity? More relation types here.

  8. Better Conceptualization of Properties • Organize properties like size as physical qualities that inhere in a physical object • Include measured properties like stream flow, level, pollutants, evapotranspiration etc. • Currently we have them at many levels • E.g. 2291 Major, bulk properties 4 • [Figure labels: Water Body, Water Density, Water Density Unit, Grams/cm3, Unit, Chesapeake Bay, Area, Area Quantity, Real Number, Sq Miles; relations: hasLayer, hasDensity, hasConstituent, hasFeature, hasUnit, hasValue, hasQuantity, IsA] • External Audit • For connecting to Chem/BioChem ontologies there might be sub-categories of Physical Features for elements – optical, hardness, color • See Dumontier Lab ontologies to represent bio-scientific concepts and relations • http://dumontierlab.com/?page=ontologies

  9. 2. Adding Better Semantic Relations/Properties (External Audit) • Data models & SKOS offer some relations, but they are limited, with some relations embedded in variable attributes or variable names. SKOS is more useful for terms than concepts. • Consider irreflexive, anti-symmetric & transitive constructs that capture common understanding. Observation – streams flow into rivers etc. • Property “flows-into” is irreflexive • any one river or stream cannot flow into itself as a loop • “flows-into” is also anti-symmetric • if one river flows into a second, the second can’t flow into the first • Transitive property for Regions means that the subRegionOf property between Regions is transitive • <owl:TransitiveProperty rdf:ID="subRegionOf"> <rdfs:domain rdf:resource="#Region"/> <rdfs:range rdf:resource="#Region"/> </owl:TransitiveProperty> • If Logan, Cache County and Utah are regions, and Logan is a subRegion of Cache County, and Cache County is a subRegion of Utah, then Logan is also a subRegion of Utah.
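
The three relation properties on this slide can be checked on small sets of pairs. The sketch below is plain Python rather than an OWL reasoner; the helper names and the stream/river names in `flows_into` are made up for illustration, while the `sub_region_of` pairs come from the Logan example.

```python
def is_irreflexive(pairs):
    """No element is related to itself."""
    return all(a != b for a, b in pairs)


def is_antisymmetric(pairs):
    """If (a, b) and (b, a) are both present, then a must equal b."""
    pair_set = set(pairs)
    return all(a == b or (b, a) not in pair_set for a, b in pair_set)


def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Pairs from the slide's region example.
sub_region_of = {("Logan", "Cache County"), ("Cache County", "Utah")}

# Hypothetical water bodies, named only for illustration.
flows_into = {("Stream A", "River B"), ("River B", "River C")}

inferred = transitive_closure(sub_region_of)
```

After the closure, `("Logan", "Utah")` appears in `inferred`, which mirrors what a reasoner would derive from the `owl:TransitiveProperty` declaration; `flows_into` passes both the irreflexivity and anti-symmetry checks.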

  10. Organizing Relations - Three Kinds of “Structure” Relations in GeoSPARQL. What does “X has Y” mean? • X has material constituent Y only if Y is tangible and pervasive in X • Great Salt Lake has-constituent salty water • Gulf of Mexico has-part gulf fishing zone, which has-volume y and which is-inside Gulf pollution zone • fishing zone has-depth with average value x • Zone A has area Z … is-inside Gulf … has-constituent-nitrogen

  11. 3. Adding Relations Incrementally: Richer Schemata & Reusable Patterns • Simple Feature-State Model (from GRAIL) becomes a richer schema • [Figure labels: River, sub-surface water …; height, salinity, acidity …; salty, acidic …] • Every River is a Water Body described by a path, made of a mass of water & has parts source and mouth …

  12. Ontology Design Patterns (ODPs) of Semantic Trajectory: Hydro Observations as Annotations • ODPs (aka microtheories) are small, modular & coherent schemas, like Temperature • Relatively autonomous but conceivably composable with other schemas • E.g. compose a Semantic Trajectory Pattern • Trajectories/spatial paths/segments • Point Of Interest (POI) – observation area etc. • Environmental observations fit into this schema • Fixes may be hydrometric feature observations at some POI (and offset Fix) for some point or period of time denoting important activities and/or decision points that researchers may be interested in labeling and classifying • Observations including time-series sets might be applied to something like streamflow or temperature plots or a pollution plume • You may query the schema: • “Show locations within Gulf of Mexico fishing area with colored dissolved organic matter” • [Figure labels: Hydro Obs/Device; Hydro Var & attr/data or value type of interest; Paths & POIs have geometries including polygon areas; Hydro Object or moving device] • Yingjie Hu et al., “A Geo-Ontology Design Pattern for Semantic Trajectories”, COSIT 2013
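
The example query on this slide can be sketched against a very reduced trajectory model. Everything below is an assumption made for illustration: the `Fix` and `BoundingBox` classes, the rectangular stand-in for a polygon area, and the coordinates and values. It does not reproduce the published design pattern.

```python
from dataclasses import dataclass, field


@dataclass
class Fix:
    """One annotated point on a trajectory: position, time, observed variables."""
    lon: float
    lat: float
    time: str  # ISO 8601 timestamp
    observations: dict = field(default_factory=dict)  # variable name -> value


@dataclass
class BoundingBox:
    """Rectangular stand-in for an area polygon (a simplification)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, fix: Fix) -> bool:
        return (self.min_lon <= fix.lon <= self.max_lon
                and self.min_lat <= fix.lat <= self.max_lat)


def fixes_with_variable(trajectory, area, variable):
    """Return the fixes inside the area that carry an observation of the variable."""
    return [f for f in trajectory if area.contains(f) and variable in f.observations]


# Hypothetical area and fixes; none of these numbers come from real data.
fishing_area = BoundingBox(-92.0, 27.0, -88.0, 30.0)
trajectory = [
    Fix(-90.5, 28.5, "2013-07-17T10:00:00Z", {"colored dissolved organic matter": 3.1}),
    Fix(-90.0, 28.7, "2013-07-17T11:00:00Z", {"water temperature": 29.4}),
    Fix(-85.0, 26.0, "2013-07-17T15:00:00Z", {"colored dissolved organic matter": 2.2}),
]

hits = fixes_with_variable(trajectory, fishing_area, "colored dissolved organic matter")
```

Only the first fix is returned: the second lies inside the area but observes a different variable, and the third observes the right variable outside the area.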

  13. Wrapping Up • The 3 things discussed here – audits, standard relations, schema – are possible paths to improved semantics for Hydro and related vocabularies • Work can leverage existing efforts • Lots of work in BioMed on structures, processes & audits • Methods to build ODPs for general and specific use • DOLCE ROCKS Ontology – integrates DOLCE + GeoSciML or SSWO (Boyan Brodaric & Torsten Hahmann) • Work might be focused by a set of requirements and use cases • Work supported by the National Science Foundation under Grant No. 0955816

  14. Thank You Questions? For information on SOCoP free workshops on ontology building see http://ontolog.cim3.net/cgi-bin/wiki.pl?SocopWorkshops/Socop2012Workshop & VoCamps at http://vocamp.org/wiki/Main_Page#Previous_VoCamps

  15. Backup Slides

  16. Useful Schema - Content Ontology Design Patterns (ODPs) – Semantic Trajectory Pattern Example • [Figure: a data value v_i(s, t) located on three axes: “Where” (space S, location s), “When” (time T, instant t) and “What” (variables V, variable V_i)] • ODPs (aka microtheories) are small, modular & coherent schemas, like Temperature • Relatively autonomous but conceivably composable with other schemas • E.g. Trajectories/spatial paths, Point Of Interest (POI) – observation area • Semantic Trajectory example • Indexed by Space-Time-Variable dimensions • When we annotate path points of interest (aka Fix) & object motion, the result is called a semantic trajectory • ODPs developed at GeoVoCampSB2012 & DaytonGeoVocamp2012 • Zhixian Yan, “Towards Semantic Trajectory Data Analysis: A Conceptual and Computational Approach”, VLDB 2009
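
The Space-Time-Variable indexing described on this slide, where a data value v_i(s, t) is located by "where", "when" and "what", can be sketched as a mapping keyed by those three dimensions. The site names, timestamps and values are invented, and the function names are assumptions.

```python
from datetime import datetime

# Key: (site s, time t, variable v). Value: the observed number.
# All entries are hypothetical.
observations = {
    ("site-A", datetime(2013, 7, 17, 9), "water temperature"): 14.2,
    ("site-A", datetime(2013, 7, 17, 10), "water temperature"): 14.9,
    ("site-A", datetime(2013, 7, 17, 9), "streamflow"): 3.4,
    ("site-B", datetime(2013, 7, 17, 9), "water temperature"): 12.1,
}


def value_at(site, time, variable):
    """Look up one data value by its where/when/what coordinates."""
    return observations.get((site, time, variable))


def time_series(site, variable):
    """Fix 'where' and 'what', and return the values ordered along 'when'."""
    return sorted(
        (t, value)
        for (s, t, v), value in observations.items()
        if s == site and v == variable
    )
```

Fixing two of the three dimensions and varying the third yields the familiar slices: a time series at a site, a spatial snapshot at one instant, or the set of variables observed at one place and time.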

  17. Audit of Coverage - Anything Missing? • Omissions • In HydroTagger, is water acidity missing variables? • less coverage than alkalinity in the http://hiscentral.cuahsi.org tool • In ODM MCV too • Sub-surface water is missing variables compared to surface water etc. • Missing axioms to clarify things, like what causes or influences what • Missing primitives to connect things etc. • ODM MCV acidity terms: Acid neutralizing capacity (biochem?); Acidity, CO2 acidity; Acidity, hot; Acidity, mineral acidity; Acidity, total acidity • How about metal acidity?
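
A coverage audit of this kind can be sketched as a set difference between a reference term list and the terms another resource actually covers. The ODM MCV terms below are the ones listed on the slide; the `tagged_terms` set is invented and does not reflect the real HydroTagger contents.

```python
# Acidity-related terms listed on the slide for the ODM MCV.
odm_mcv_acidity = {
    "Acid neutralizing capacity",
    "Acidity, CO2 acidity",
    "Acidity, hot",
    "Acidity, mineral acidity",
    "Acidity, total acidity",
}

# Hypothetical set of terms covered by a second resource.
tagged_terms = {"Acidity, total acidity", "Acid neutralizing capacity"}


def coverage_gaps(reference, candidate):
    """Terms present in the reference vocabulary but absent from the candidate."""
    return sorted(reference - candidate)


gaps = coverage_gaps(odm_mcv_acidity, tagged_terms)
```

A check like this finds only terms missing relative to another list; concepts absent from both, such as the "metal acidity" question raised on the slide, still need a domain expert to spot.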

  18. External Audits - Simple Example: Body of Water Ontology • From RPI work (escience.rpi.edu/ontology/) • From SWEET • New sub-class uses IntersectionOf for definition with restricted measures • <owl:Restriction> <owl:onProperty rdf:resource="&pol;hasMeasurement"/> <owl:someValuesFrom rdf:resource="#WaterMeasurement"/> </owl:Restriction>

  19. What does “X has Y” Mean? Distinguish ideas of regions, material, possesses, part, component • What do we mean when we say X has-region Y? Salt Lake has-region Antelope Island; Gulf of Mexico has-region hypoxic zone • X has-region Y if • Y is a region of space defined in relation to X • We associate regions (e.g. Antelope Island) with measures such as length, area, or volume • X has material constituent Y only if Y is tangible and pervasive in X • Great Salt Lake has-constituent salty water • X possesses/characterized-by Y (for example, lake possesses temperature gradient) • X has-part Y • X has element/component Y – Chem
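
One way to keep these senses of "has" apart is to type each statement with an explicit relation instead of a bare "has". The sketch below uses the slide's examples as triples; the `Has` enum and the `objects_of` helper are illustrative names, not an existing vocabulary or API.

```python
from enum import Enum


class Has(Enum):
    """The distinct senses of 'has' listed on the slide."""
    REGION = "has-region"
    CONSTITUENT = "has-constituent"
    POSSESSES = "possesses"
    PART = "has-part"
    COMPONENT = "has-component"


# (subject, relation, object) statements taken from the slide's examples.
triples = [
    ("Great Salt Lake", Has.REGION, "Antelope Island"),
    ("Gulf of Mexico", Has.REGION, "hypoxic zone"),
    ("Great Salt Lake", Has.CONSTITUENT, "salty water"),
    ("lake", Has.POSSESSES, "temperature gradient"),
]


def objects_of(subject, relation):
    """Return what the subject 'has' under one specific sense of the relation."""
    return [o for s, r, o in triples if s == subject and r is relation]
```

Asking for `objects_of("Great Salt Lake", Has.REGION)` returns only Antelope Island, while the same subject under `Has.CONSTITUENT` returns salty water, so the two meanings no longer collapse into one undifferentiated "has".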

  20. Even Areas like RX with no Hierarchy have defined Conceptual Relations
