Dictionaries, Vocabularies, Namespaces, Thesauri, Ontologies, and all that

Rob Raskin, NASA/Jet Propulsion Laboratory, Raskin@jpl.nasa.gov, June 21, 2011

Presentation Transcript


  1. Dictionaries, Vocabularies, Namespaces, Thesauri, Ontologies, and all that
     Rob Raskin, NASA/Jet Propulsion Laboratory, Raskin@jpl.nasa.gov
     June 21, 2011

  2. Why care about data semantics?
     • Current data may need to be archived for decades or centuries
     • Global change analysis requires consistent comparisons across decades or centuries
     • Synonyms: multiple words, same meaning
     • Homonyms: same word, multiple meanings
     • Measurement ambiguities
        • Sea “surface” temperature: at what “height”?

  3. Semantic Understanding is Difficult!
     • Time flies like an arrow. / Fruit flies like a pie.
     • Let’s eat, Grandma. / Let’s eat Grandma.
     • “Mission accomplished. Major combat operations in Iraq have ended” - Pres. Bush, 2003
     • Variable t: temperature / Variable t: time
     • Data quality = 5 / Data quality = 3
     • LA Times headline
     • Surface wind: measured 3 m above surface / Surface wind: measured at surface

  4. Semantic Spectrum
     [Figure: semantics increase along a spectrum running from Vocabulary (human-readable) to Ontology (machine-readable)]
     • Catalog: list of controlled words
     • Informal Hierarchy: terms classified by categories (e.g., GCMD)
     • Formal Hierarchy: terms inherit properties/meaning of parent
     • Formal Hierarchy w/ Relations: relations between children defined

  5. Scope of Representation
     • Parameter names
     • Scientific units
     • Spatial/temporal extent/resolution
     • Data quality
     • Data provenance
     • Data type
     • Data services
     [Figure label: CF]

  6. What is an Ontology?
     • An approach to storing knowledge
     • Machine-readable and human-readable
     • Provides definitions of words or phrases, expressed relative to other terms
     • Offers shared understanding of concepts and knowledge reuse
     • Provides semantics for machine-to-machine (or human-to-human) communication

  7. Practically, an ontology is a…
     • Framework for classifying knowledge
     • Ensures there is a “place” to store components of knowledge

  8. Ontology Languages: RDF and OWL
     • W3C has adopted languages with a standard XML serialization
        • Resource Description Framework (RDF)
        • Web Ontology Language (OWL)
     • Languages predefine specific tags
        • RDF (with RDF Schema): Class, subclass, property, subproperty
     • The class-property model is similar to the entity-relationship model of database theory

  9. RDF Class and Subclass
     • Class
        • The basic element or “thing” or “noun”
     • Subclass
        • Inherits all attributes of parent class
        • Typically, adds Properties to distinguish subclass from its parent
        • Can have multiple parent classes
     [Figure example: Cat “is a” Animal; Cat “has Legs” 4]
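A minimal Python sketch of the subclass idea, reading "inherits all attributes" in the simple object-oriented sense: each class lists its parents (possibly several) and the properties it adds, and a class's full property set is gathered from all of its ancestors. Only Cat, Animal, and hasLegs come from the slide; the Pet class and the isLiving/hasOwner properties are hypothetical. Note that formal RDF Schema semantics are about class membership rather than copying attributes, so this is an intuition aid, not an RDF implementation.

```python
# Hypothetical mini-ontology: each class maps to its parent classes (multiple parents allowed).
PARENTS: dict[str, list[str]] = {
    "Animal": [],
    "Pet": [],
    "Cat": ["Animal", "Pet"],
}

# Properties each class adds to distinguish itself from its parents (hypothetical, except hasLegs).
OWN_PROPERTIES: dict[str, dict[str, object]] = {
    "Animal": {"isLiving": True},
    "Pet": {"hasOwner": True},
    "Cat": {"hasLegs": 4},
}


def ancestors(cls: str) -> list[str]:
    """Return all ancestor classes of cls, nearest first (breadth-first), without duplicates."""
    seen: list[str] = []
    queue = list(PARENTS.get(cls, []))
    while queue:
        parent = queue.pop(0)
        if parent not in seen:
            seen.append(parent)
            queue.extend(PARENTS.get(parent, []))
    return seen


def all_properties(cls: str) -> dict[str, object]:
    """Merge inherited properties; a subclass's own values override its ancestors'."""
    merged: dict[str, object] = {}
    for ancestor in reversed(ancestors(cls)):
        merged.update(OWN_PROPERTIES.get(ancestor, {}))
    merged.update(OWN_PROPERTIES.get(cls, {}))
    return merged


print(ancestors("Cat"))  # ['Animal', 'Pet']
print(all_properties("Cat"))
```

Merging from the most distant ancestor first lets a subclass refine a value it inherited, which mirrors the slide's point that a subclass "adds Properties to distinguish" itself from its parent.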

  10. RDF Property & Subproperty
     • Property
        • A “verb”
        • Examples: measures, hasLocation, hasArea, northOf
        • Properties can have attributes: domain, range, transitive, …
     • Subproperty
        • Inherits parent attributes
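Property attributes can be read as inference rules. This sketch assumes facts are stored as a plain set of (subject, property, object) triples and applies three RDF Schema-style rules until nothing new appears: a statement made with a subproperty also holds for its parent property, a property's domain types its subject, and its range types its object. The declarations themselves (northOf as a subproperty of a hypothetical hasRelativeLocation, and the Sensor/PhysicalQuantity domain and range of measures) are illustrative assumptions, not taken from SWEET.

```python
Triple = tuple[str, str, str]

# Hypothetical declarations built from the slide's property names.
SUBPROPERTY_OF = {"northOf": "hasRelativeLocation"}
DOMAIN = {"measures": "Sensor"}           # whatever "measures" something is a Sensor
RANGE = {"measures": "PhysicalQuantity"}  # whatever is measured is a PhysicalQuantity


def infer(triples: set[Triple]) -> set[Triple]:
    """Apply subproperty, domain, and range rules until no new triples appear."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):  # iterate over a snapshot while result grows
            new: set[Triple] = set()
            if p in SUBPROPERTY_OF:
                new.add((s, SUBPROPERTY_OF[p], o))
            if p in DOMAIN:
                new.add((s, "is a", DOMAIN[p]))
            if p in RANGE:
                new.add((o, "is a", RANGE[p]))
            if not new <= result:
                result |= new
                changed = True
    return result


facts = {("Radiometer", "measures", "Temperature"), ("Alaska", "northOf", "Oregon")}
for triple in sorted(infer(facts) - facts):
    print(triple)
```

Looping to a fixpoint means chains of subproperties (a subproperty of a subproperty) are also followed, which is what "Subproperty inherits parent attributes" implies.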

  11. OWL Language
     • Extends RDF to predefine further tags
        • cardinality
        • transitive relations
        • inverse relations
        • same as, different from
        • union, intersection
        • domain, range
        • import (from one ontology to another, to enable sharing and reuse of the work of others)
        • …
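Transitive and inverse relations let a reasoner derive facts nobody stated explicitly. A minimal sketch, assuming northOf is declared transitive and southOf is declared its inverse (both declarations are hypothetical here), computes the resulting closure over a set of triples with a simple fixpoint loop. A real OWL reasoner does far more; this only illustrates the two rules.

```python
Triple = tuple[str, str, str]

TRANSITIVE = {"northOf"}             # hypothetical: northOf declared transitive
INVERSE_OF = {"northOf": "southOf"}  # hypothetical: southOf declared the inverse of northOf


def close(triples: set[Triple]) -> set[Triple]:
    """Add triples implied by transitivity and inverse declarations until stable."""
    inverses = {**INVERSE_OF, **{v: k for k, v in INVERSE_OF.items()}}
    result = set(triples)
    while True:
        new: set[Triple] = set()
        for s, p, o in result:
            if p in inverses:
                new.add((o, inverses[p], s))
            if p in TRANSITIVE:
                # x p y and y p z imply x p z
                new |= {(s, p, z) for (y, q, z) in result if q == p and y == o}
        if new <= result:
            return result
        result |= new


facts = {("Alaska", "northOf", "Oregon"), ("Oregon", "northOf", "California")}
derived = close(facts)
print(("Alaska", "northOf", "California") in derived)  # True
print(("California", "southOf", "Alaska") in derived)  # True
```

Two stated facts grow into six: three northOf statements and their three southOf inverses.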

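  12. OWL Ontology Example (simplified notation)
     <Class “WaterPollution”>
       <SubClassOf “Pollution”>
         <Restriction>
           <OnProperty “hasSubstance”>
           <AllValuesFrom “Water”>
         </Restriction>
       </SubClassOf>
     </Class>
     Reading: WaterPollution is a kind of Pollution whose hasSubstance values are all Water.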
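The slide's tag notation is a simplification. In standard OWL RDF/XML the same class uses rdfs:subClassOf twice: once for the named parent Pollution and once for an anonymous owl:Restriction carrying owl:onProperty and owl:allValuesFrom. The sketch below builds that document with Python's standard xml.etree.ElementTree; the base namespace http://example.org/pollution# is a placeholder, not a published ontology.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/pollution#"  # placeholder namespace for this example

# Serialize with the conventional rdf:, rdfs:, owl: prefixes.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def q(ns: str, local: str) -> str:
    """Build an ElementTree qualified name of the form {namespace}local."""
    return f"{{{ns}}}{local}"


def water_pollution_owl() -> str:
    """Serialize: WaterPollution subClassOf Pollution, and subClassOf (hasSubstance only Water)."""
    root = ET.Element(q(RDF, "RDF"))
    cls = ET.SubElement(root, q(OWL, "Class"), {q(RDF, "about"): BASE + "WaterPollution"})
    # Named superclass
    ET.SubElement(cls, q(RDFS, "subClassOf"), {q(RDF, "resource"): BASE + "Pollution"})
    # Anonymous superclass: every value of hasSubstance is a Water
    sub = ET.SubElement(cls, q(RDFS, "subClassOf"))
    restriction = ET.SubElement(sub, q(OWL, "Restriction"))
    ET.SubElement(restriction, q(OWL, "onProperty"), {q(RDF, "resource"): BASE + "hasSubstance"})
    ET.SubElement(restriction, q(OWL, "allValuesFrom"), {q(RDF, "resource"): BASE + "Water"})
    return ET.tostring(root, encoding="unicode")


print(water_pollution_owl())
```

Under OWL's open-world semantics, allValuesFrom is used for inference (any substance of a WaterPollution instance is concluded to be Water), not as a validation check that rejects data.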

  13. Statements about Statements
     • RDF and OWL allow us to make statements about statements
        • Degree of belief
        • Timestamps
        • Provenance / Lineage
        • Probability / Uncertainty
        • Security issues
        • Author / Source / Community
        • Community dialect
        • …
     [Figure example: the statement “Corn Crop is an Observed Feature” has Source Landsat and has Probability 0.75]
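One standard way to attach metadata to a statement is RDF reification: the statement itself becomes a resource of type rdf:Statement with rdf:subject, rdf:predicate, and rdf:object links, and further properties can then describe it. A minimal sketch over plain Python tuples, using the slide's Corn Crop example; the statement identifier stmt1 and the lookup helper are hypothetical.

```python
Triple = tuple[str, str, object]


def reify(stmt_id: str, subject: str, predicate: str, obj: str) -> list[Triple]:
    """Describe the triple (subject, predicate, obj) as a resource named stmt_id."""
    return [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", subject),
        (stmt_id, "rdf:predicate", predicate),
        (stmt_id, "rdf:object", obj),
    ]


kb: list[Triple] = reify("stmt1", "Corn Crop", "is a", "Observed Feature")
# Statements about the statement: where it came from and how likely it is.
kb += [("stmt1", "has Source", "Landsat"), ("stmt1", "has Probability", 0.75)]


def about(kb: list[Triple], subject: str, predicate: str, obj: str) -> dict[str, object]:
    """Collect metadata attached to any reified copy of (subject, predicate, obj)."""
    ids = {s for s, p, o in kb if p == "rdf:subject" and o == subject}
    ids &= {s for s, p, o in kb if p == "rdf:predicate" and o == predicate}
    ids &= {s for s, p, o in kb if p == "rdf:object" and o == obj}
    return {p: o for s, p, o in kb if s in ids and not p.startswith("rdf:")}


print(about(kb, "Corn Crop", "is a", "Observed Feature"))
# {'has Source': 'Landsat', 'has Probability': 0.75}
```

Because the metadata hangs off stmt1 rather than off Corn Crop, the same crop can carry other, independently sourced classifications with their own probabilities.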

  14. Why are Ontologies Useful? (1)
     • Ontologies provide a common namespace
     • Documents, web pages, data, people, and other resources can be mapped/categorized to this namespace
     • Anybody can create or extend the namespace

  15. Why are Ontologies Useful? (2)
     • Dictionary
        • Concepts in the namespace are not just “listed” (a taxonomy), but “defined” (in terms of others)
        • Concepts are defined via specializations of broader concepts, with properties that distinguish each child from the broader parent concept
        • Reductionist approach of science
        • Arbitrary levels of specialization are possible
        • As with the Library of Congress and Dewey Decimal numbering systems

  16. Why are Ontologies Useful? (3)
     • Disambiguation
        • Reduces semantic mismatch
        • Synonym support (multiple terms with same meaning)
           • A label can indicate the preferred term for each community
        • Homonym support (multiple meanings of same term)
           • Separate namespaces (President:Bush vs Plant:Bush)
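Namespaces resolve homonyms because each prefix expands to a different URI, so President:Bush and Plant:Bush become distinct identifiers. Synonyms work the other way round: one identifier carries several labels, with a preferred label chosen per community. In this sketch the URIs, the community names, and the label table are all hypothetical.

```python
# Hypothetical prefix table: each prefix stands for a separate namespace URI.
PREFIXES = {
    "President": "http://example.org/people/president#",
    "Plant": "http://example.org/botany/plant#",
    "ocean": "http://example.org/ocean#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'Plant:Bush' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local


# Synonyms: one concept URI carries several labels plus a preferred label per community.
LABELS = {
    expand("ocean:SeaSurfaceTemperature"): {
        "all": ["sea surface temperature", "SST"],
        "preferred": {"oceanography": "SST", "climate": "sea surface temperature"},
    },
}


def preferred_label(uri: str, community: str) -> str:
    """Return the community's preferred label, falling back to the first listed label."""
    entry = LABELS[uri]
    return entry["preferred"].get(community, entry["all"][0])


print(expand("President:Bush") == expand("Plant:Bush"))  # False
print(preferred_label(expand("ocean:SeaSurfaceTemperature"), "oceanography"))  # SST
```

Data and queries refer to the URI, so switching a community's preferred label changes only what people see, not what machines match on.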

  17. Why are Ontologies Useful? (4)
     • Machine readable
        • Ontologies are generally stored in a format (e.g., XML) that is readable by both humans and computers
        • Computer accessibility enables automated reasoning

  18. Why are Ontologies Useful? (5)
     • Knowledge retention
        • Corporations use knowledge management to ensure institutional memory over time, as personnel come and go
        • Climate disciplines can do the same!
     • Facts/data can be represented and related in a consistent manner
     • Common sense knowledge is captured
        • Instrument characteristics

  19. Ontology Representation (1): Knowledge Base of Triples
     Noun-Verb-Noun representation
     • Parent-child relations:
        • Flood is a Weather Phenomena
        • GeoTIFF is a File Format
        • Soil Type is a Physical Property
        • Pacific Ocean is a Ocean
     • Or create your own relations:
        • Ocean has substance Water
        • Sensor measures Temperature
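The noun-verb-noun statements on this slide map directly onto a set of triples. A minimal sketch, assuming an in-memory Python set and using None as a wildcard in query patterns (a convention chosen here, not part of any standard), shows pattern matching over the slide's own examples, plus a hypothetical helper that lets an individual pick up properties stated for its parent class.

```python
from typing import Optional

Triple = tuple[str, str, str]

KB: set[Triple] = {
    ("Flood", "is a", "Weather Phenomena"),
    ("GeoTIFF", "is a", "File Format"),
    ("Soil Type", "is a", "Physical Property"),
    ("Pacific Ocean", "is a", "Ocean"),
    ("Ocean", "has substance", "Water"),
    ("Sensor", "measures", "Temperature"),
}


def match(s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return triples matching the pattern; None matches anything."""
    return sorted(
        t for t in KB
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


def inherited(subject: str, prop: str) -> list[str]:
    """Values of prop stated for subject or for anything it 'is a' (assumes no cycles)."""
    values = [o for _, _, o in match(subject, prop)]
    for _, _, parent in match(subject, "is a"):
        values += inherited(parent, prop)
    return values


print(match(s="Ocean"))                            # [('Ocean', 'has substance', 'Water')]
print(inherited("Pacific Ocean", "has substance"))  # ['Water']
```

Nobody stated that the Pacific Ocean contains water; the answer follows from combining a parent-child relation with a custom relation, which is the payoff of storing both as triples.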

  20. Ontology Representation (2): Visual

  21. Ontology Representation (3): XML, RDF, and OWL
     • W3C has adopted XML-based standard ontology languages
        • Resource Description Framework (RDF)
        • Web Ontology Language (OWL)
     • Languages predefine specific tags
        • RDF: Class, subclass, property, subproperty, …
        • OWL: Extends RDF to predefine further tags such as cardinality
        • Three sublanguages of OWL 1 (Lite, DL, and Full)
     • Use of standard languages makes it easy to extend (specialize) the work of others

  22. Global Warming Query in the Semantic Web
     Find data which demonstrates global warming at high latitudes during summertime and plot warming rate.
     • Extract information from the use case: encode knowledge
     • Translate this into a complete query for data: inference and integration of data from instruments, indices, and models
        • “Global warming” = Trend of increasing temperature over large spatial scales
        • “High latitude” = |Latitude| > 60 degrees
        • “Summertime” = June-Aug (NH) and Jan-Mar (SH)
        • “Find data” = Locate datasets using catalogs, then access and read them
        • “Plot warming rate” = Display temperature vs time
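These definitions are precise enough to execute. A minimal sketch, assuming temperature records arrive as hypothetical (year, month, latitude, temperature) tuples, encodes "high latitude" and "summertime" as the slide defines them and estimates the warming rate as an ordinary least-squares slope of temperature against year. Locating and reading datasets, and the plotting step, are out of scope; the sample data below is synthetic.

```python
from typing import NamedTuple


class Record(NamedTuple):
    year: int
    month: int          # 1-12
    latitude: float     # degrees, negative in the Southern Hemisphere
    temperature: float  # any consistent unit, e.g. kelvin


def is_high_latitude(lat: float) -> bool:
    return abs(lat) > 60


def is_summertime(month: int, lat: float) -> bool:
    # June-Aug in the Northern Hemisphere, Jan-Mar in the Southern, per the slide.
    return month in (6, 7, 8) if lat >= 0 else month in (1, 2, 3)


def warming_rate(records: list[Record]) -> float:
    """Least-squares slope of temperature vs year over high-latitude summer records."""
    pts = [(r.year, r.temperature) for r in records
           if is_high_latitude(r.latitude) and is_summertime(r.month, r.latitude)]
    if len(pts) < 2:
        raise ValueError("need at least two matching records")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("matching records must span more than one year")
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx  # temperature units per year


# Synthetic data: both hemispheres warm by 0.03 per year; the other records are filtered out.
data = [Record(y, 7, 70.0, 270.0 + 0.03 * (y - 2000)) for y in range(2000, 2011)]
data += [Record(y, 1, -75.0, 250.0 + 0.03 * (y - 2000)) for y in range(2000, 2011)]
data += [Record(y, 1, 70.0, 400.0) for y in range(2000, 2011)]  # NH winter: excluded
data += [Record(y, 7, 30.0, 400.0) for y in range(2000, 2011)]  # low latitude: excluded
print(round(warming_rate(data), 6))  # 0.03
```

Each predicate corresponds to one line of the slide's knowledge encoding, so changing a definition (for example, using Dec-Feb for the southern summer) is a one-line edit rather than a change to the query logic.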

  23. Semantic Web for Earth and Environmental Terminology (SWEET)
     • Concept space written in OWL
     • Initial focus to assist search for data resources
     • Funded by NASA
     • Later focus to serve as community standard (upper-level Earth system science ontology)
     • Enables scalable classification of Earth system science and associated data concepts
     • Specialists can further refine SWEET concepts
     • SWEET 2.2 has 6600 concepts in 200 modular ontologies
     • http://sweet.jpl.nasa.gov

  24. SWEET Top-Level View

  25. CF vs SWEET Representation
     • CF (traditional single-attribute parameter name):
        tendency_of_mole_concentration_of_dissolved_inorganic_phosphorus_in_sea_water_due_to_biological_processes
     • SWEET (multi-attribute parameter name):
        • Quantity = mole_concentration
        • Transformation = tendency
        • State = dissolved, inorganic
        • Substance = phosphorus
        • Medium = sea_water
        • Process = biological_processes
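The contrast between the two representations can be made concrete by decomposing the CF name into the SWEET-style attributes listed on this slide. The regular expression below covers only this one family of CF names ("[tendency_of_]QUANTITY_of_MATERIAL_in_MEDIUM[_due_to_PROCESS]"), and the small set of state words is a hypothetical stand-in for a real vocabulary; a general mapping would need the full CF naming rules.

```python
import re

# Hypothetical, deliberately small vocabulary of state words; a real mapping would draw on SWEET.
STATES = {"dissolved", "inorganic", "organic", "particulate"}

CF_PATTERN = re.compile(
    r"^(?:(?P<transformation>tendency)_of_)?"
    r"(?P<quantity>mole_concentration|mass_concentration)_of_"
    r"(?P<material>.+?)_in_(?P<medium>sea_water|air)"
    r"(?:_due_to_(?P<process>.+))?$"
)


def cf_to_attributes(name: str) -> dict[str, object]:
    """Split one family of CF standard names into SWEET-style attributes."""
    m = CF_PATTERN.match(name)
    if m is None:
        raise ValueError(f"name not covered by this sketch: {name}")
    words = m["material"].split("_")
    states = []
    while words and words[0] in STATES:  # leading state words, e.g. dissolved, inorganic
        states.append(words.pop(0))
    return {
        "Quantity": m["quantity"],
        "Transformation": m["transformation"],
        "State": states,
        "Substance": "_".join(words),
        "Medium": m["medium"],
        "Process": m["process"],
    }


name = "tendency_of_mole_concentration_of_dissolved_inorganic_phosphorus_in_sea_water_due_to_biological_processes"
print(cf_to_attributes(name))
```

Once split this way, each attribute can be searched independently (for example, all quantities whose Medium is sea_water), which is the advantage the slide attributes to the multi-attribute form.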

  26. SWEET Data Ontology
     • Dataset characteristics
        • Format, data model, dimensions, …
     • Provenance
        • Source, processing history, …
     • Parameters
        • Scale factors, offsets, …
     • Data services
        • Subsetting, reprojection, …
     • Quality measures
     • Special values
        • Missing, land, sea, ice, ...
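Scale factors, offsets, and special values are exactly the kind of parameter metadata a reader needs before raw numbers mean anything. A minimal sketch, assuming the common netCDF/CF packing convention (unpacked = packed × scale_factor + add_offset) and a hypothetical table of special codes for missing, land, and ice, decodes a row of packed integers.

```python
# Hypothetical special-value codes; real datasets declare their own
# (e.g. via _FillValue or CF flag_values / flag_meanings attributes).
SPECIAL_VALUES = {-32768: "missing", -32767: "land", -32766: "ice"}


def decode(packed: list[int], scale_factor: float, add_offset: float) -> list[object]:
    """Unpack integers as packed * scale_factor + add_offset; special codes become labels."""
    out: list[object] = []
    for value in packed:
        if value in SPECIAL_VALUES:
            out.append(SPECIAL_VALUES[value])
        else:
            out.append(value * scale_factor + add_offset)
    return out


print(decode([1500, -32767, 2000, -32768], scale_factor=0.01, add_offset=273.15))
```

Without the scale factor, offset, and special-value table, the same integers would be uninterpretable, which is why the slide groups them with dataset characteristics and provenance.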

  27. Best Practices
     • Keep ontologies small and modular
     • Use higher-level ontologies where possible
     • Identify hierarchy of concept spaces
     • Try to keep dependencies unidirectional
     • Gain community buy-in
        • Involve respected leaders

  28. Ontology Development Tools: CMAP
     • Free, downloadable tool for knowledge representation and ontology development
     • Visual language with import/export to OWL
     • Supports a subset of the OWL language
     • http://cmap.ihmc.us/coe

  29. Resources
     • ESIP Semantic Web Cluster
        • Monthly telecons
        • Tutorials
           • Ontology development
           • Datatypes
           • Data services
           • SWEET
     • http://sweet.jpl.nasa.gov
     • Rob Raskin, raskin@jpl.nasa.gov
