
Semantic Enhancement

Semantic Enhancement. Barry Smith 1/12/2012. Outline of Day 1. 10:00 What is Semantic Technology? Introduction: Miserable failures and glorious successes Semantic Technology and the DoD : some examples Best practices for ontology development 12:00 Lunch

Presentation Transcript


  1. Semantic Enhancement Barry Smith 1/12/2012

  2. Outline of Day 1 • 10:00 What is Semantic Technology? • Introduction: Miserable failures and glorious successes • Semantic Technology and the DoD: some examples • Best practices for ontology development • 12:00 Lunch • 13:00 A strategy to ensure consistency of data across multiple domains • A repeatable process for creating ontologies • The Semantic Enhancement approach • 14:30 Ontology for the intelligence analysts (B. Mandrick) • Ontology and military doctrine • A repeatable process for creating ontologies • 16:00 Close

  3. The roots of Semantic Technology • You build a site • Others discover the site and link to it • The more they link to it, the more important and well known the page becomes (this is what Google exploits) • Your page becomes important, and others begin to rely on it • The same network effect works on the raw data • Many people link to the data and use it • Many more (and more diverse) applications will be created than the authors would ever dream of • Secondary use • Ivan Herman: Network effect of the Web

  4. The problem: doing it this way, we end up with data in many, many silos • To avoid silos: • the raw data needs to be available in a standard way on the Web. • There must be links among the datasets Photo credit “nepatterson”, Flickr

  5. The roots of Semantic Technology: need for common terms & links • To avoid / connect the silos: • The raw data needs to be available in a standard way on the Web • There should be links among the datasets to create a web of data • Vocabularies should capture common meanings – computable definitions

  6. What is Semantic Technology? • Technology in which • meanings • data and content files • application code • are encoded separately • Standard languages for encoding meaning, which should evolve slowly

  7. Semantic technology • Tools • for autorecognition of topics • for information and meaning extraction, • for categorization • Goal of semantic interoperability • Goal of “linked open data”

  8. Semantic interoperability • Business models change rapidly • Hardware changes rapidly • Organizations rapidly forming and disbanding collaborations • Data is exploding • Recognition of the benefits of collective intelligence • Web architecture for interconnected communities and vocabularies

  9. Ontology success stories, and some reasons for failure A fragment of the Linked Open Data in the biomedical domain

  10. Semantic technology • Tools • for autorecognition of topics • for information and meaning extraction, • for categorization • Goal of semantic interoperability • Goal of “linked open data”

  11. Goals of Semantic Technology • Resource and data registries • Metadata management • Support for Natural Language Understanding • Semantic SOA • Semantic wikis • Education, human collaboration • Ontology-driven systems

  12. Where we stand today • HTML demonstrated the power of the Web to allow sharing of information • increasing availability of semantically enhanced data • increasing power of semantic technology software applications and of tools for reasoning with semantically enhanced data • increasing use of semantic technology to create a Web 2.0 which will allow algorithmic reasoning with online information based on XML, RDF and OWL • increasing use of RDF and OWL in attempts to break down silos and create useful integration of online data and information

  13. Problems in achieving these goals • Weak expressivity of OWL (e.g. re time) • Poor quality coding, poor quality ontologies, poor quality ontology management • Confusion as to the meaning of ‘linked’ • Strategy often serves only retrieval, not reasoning

  14. Uncontrolled proliferation of links

  15. Above all: The more Semantic Technology is successful, the more we fail to achieve our goals • OWL breaks down silos via controlled vocabularies for the formulation of data dictionaries • Unfortunately, the very success of this approach has led to the creation of multiple new semantic silos, because multiple ontologies are being created in ad hoc ways • The Semantic Web framework as currently conceived and governed by the W3C yields minimal standardization

  16. Reasons for this effect • Shrink-wrapped software mentality – you will not get paid for reusing old and good ontologies (Let a million ‘lite’ ontologies bloom) • Belief that there are no ‘good’ ontologies (just arbitrary choices of terms and relations …) • Information technology (hardware) changes constantly, not worth the effort of getting things right

  17. Reasons for this effect

  18. Ontology success stories, and some reasons for failure Can we solve the problem by means of mappings?

  19. What you get with ‘mappings’ All in Human Phenotype Ontology (= all phenotypes: excess hair loss, splayed feet ...) mapped to all organisms in NCBI organism classification allose in ChEBI chemistry ontology Acute Lymphoblastic Leukemia (A.L.L.) in National Cancer Institute Thesaurus

  20. What you get with ‘mappings’ all phenotypes (excess hair loss, duck feet) all organisms allose (a form of sugar) Acute Lymphoblastic Leukemia (A.L.L.)
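
One way to see how such a result can arise is a minimal Python sketch of purely lexical matching. The transcript does not say how the mappings on the slide were produced, so the matching rule here (normalize the label, then prefix-match) is an assumption made for illustration, and the label assumed for the NCBI entry is hypothetical.

```python
import re

# Labels loosely taken from the slide. The NCBI label "all" is an assumption;
# the matching rule below is illustrative, not the method actually used.
TERMS = {
    "Human Phenotype Ontology": "All",
    "NCBI organism classification": "all",
    "ChEBI": "allose",
    "NCI Thesaurus": "A.L.L.",
}


def normalize(label: str) -> str:
    """Lowercase a label and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def naive_matches(query: str, terms: dict[str, str]) -> list[str]:
    """Return the sources whose normalized label starts with the normalized query."""
    key = normalize(query)
    return [src for src, label in terms.items() if normalize(label).startswith(key)]


# Every source "matches", although the four terms mean entirely different things.
matches = naive_matches("All", TERMS)
print(matches)
```

The point of the sketch is only that string similarity says nothing about meaning: a root phenotype class, a root organism class, a sugar and a leukemia all collapse onto the same key.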

  21. Mappings are hard • They are fragile, and expensive to maintain • Need a new authority to maintain, yielding new risk of forking • The goal should be to minimize the need for mappings • Invest resources in disjoint ontology modules which work well together

  22. Why should you care? • you need to create systems for data mining and text processing which will yield useful digitally coded output • if the codes you use are constantly in need of ad hoc repair, huge resources will be wasted, and manual effort will be needed on each occasion of use

  23. How to do it right? OWL (Web Ontology Language) • Pro: • Part of the HTML, XML, RDF, … stack • State-of-the-art W3C standard • Leverages net-centricity • Many sophisticated tools • Editors (TopBraid, Protégé, …) • Reasoners (Racer, Fast, Pellet, …) • Thoroughly tested for many different kinds of data • T-box vs. A-box • Statement A: Approved for Public Release. Distribution is unlimited (01 September 2011).

  24. How to do it right? OWL (Web Ontology Language) • Con: • OWL reasoning breaks for very large data sets • Limited expressivity • Works only up to binary relations, so statements relating three things cannot be expressed directly: • Mary is in Baghdad on Wednesday • Mary is in Fairfax, VA on Thursday • Forces complex workarounds
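
The kind of workaround the slide alludes to can be sketched in Python. A common pattern is to introduce an extra node for each "being located" fact and attach person, place and day to it with binary relations. The property names (`hasParticipant`, `hasPlace`, `onDay`) and the node naming scheme are hypothetical, chosen only for this illustration; plain tuples stand in for RDF triples.

```python
from itertools import count

_ids = count(1)  # source of fresh node identifiers


def reify_location(person, place, day):
    """Encode 'person is in place on day' as three binary triples about a new node."""
    node = f"_:loc{next(_ids)}"
    return [
        (node, "hasParticipant", person),
        (node, "hasPlace", place),
        (node, "onDay", day),
    ]


triples = (
    reify_location("Mary", "Baghdad", "Wednesday")
    + reify_location("Mary", "Fairfax_VA", "Thursday")
)


def where_was(person, day, triples):
    """Find places by joining the three binary relations on their shared node."""
    by_person = {s for s, p, o in triples if p == "hasParticipant" and o == person}
    by_day = {s for s, p, o in triples if p == "onDay" and o == day}
    hits = by_person & by_day
    return sorted(o for s, p, o in triples if p == "hasPlace" and s in hits)


print(where_was("Mary", "Thursday", triples))
```

Two sentences become six triples, and every query has to join through the intermediate node. That overhead is one plausible sense of "complex workarounds".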

  25. How to do it right? From OWL 2 Primer, 5.2 Property Restrictions: EquivalentClasses( :HappyPerson ObjectIntersectionOf( ObjectAllValuesFrom( :hasChild :HappyPerson ) ObjectSomeValuesFrom( :hasChild :HappyPerson ) ) ) • The All() defines “a happy person exactly if all their children are happy persons” in the preceding example. • What is “the aforementioned intended reading”, and how does the Some() function help there?
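
One plausible answer to the slide's question can be sketched in Python: a universal restriction is vacuously satisfied by an individual with no children, and the existential restriction rules that case out. This is a sketch under strong assumptions. It evaluates the two restrictions over a small, closed, hand-written model with a fixed set of happy individuals, whereas OWL reasoning is open-world and the class definition on the slide is recursive. The names `ann`, `dan` and `fay` are hypothetical.

```python
def all_values_from(individual, prop, cls):
    """True if every prop-successor of individual is in cls (vacuously true if none)."""
    return all(child in cls for child in prop.get(individual, ()))


def some_values_from(individual, prop, cls):
    """True if at least one prop-successor of individual is in cls."""
    return any(child in cls for child in prop.get(individual, ()))


# Hypothetical closed model: who has which children, and who is asserted happy.
has_child = {"ann": ["bob", "cat"], "dan": ["eve"], "fay": []}
happy = {"bob", "cat"}

for person in has_child:
    only = all_values_from(person, has_child, happy)
    some = some_values_from(person, has_child, happy)
    print(person, "all:", only, "some:", some, "both:", only and some)
```

In this model `fay` has no children, so the All() restriction alone would admit her; the intersection with Some() admits only `ann`, who has children and whose children are all happy.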

  26. How to do it right? • create an incremental, evolutionary process, where what is good survives, and what is bad fails • create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested • silo effects will be avoided and results of investment in Semantic Technology will cumulate effectively

  27. Biomedical Ontology in PubMed

  28. By far the most successful: GO (Gene Ontology)

  29. Gene Ontology (GO) • GO provides a controlled vocabulary of terms for use in annotating (describing, tagging) data • multi-species, multi-disciplinary, open source • contributing to the cumulativity of scientific results obtained by distinct research communities • compare use of kilograms, meters, seconds in formulating experimental results • natural language and logical definitions for all terms to support consistent human application and computational exploitation

  30. Hierarchical view representing relations between represented types

  31. The Ontology Spectrum

  32. The ontology spectrum (data focus) • glossary: A simple list of terms and their definitions. • controlled vocabulary: A simple list of terms, definitions and naming conventions to ensure consistency. • data dictionary: Terms, definitions, naming conventions and representations of the data elements in a computer system. • data model (e.g. JC3IEDM): Terms, definitions, naming conventions, representations and the beginning of specification of the relationships between data elements. • taxonomy: A complete data model in an inheritance hierarchy where all data elements inherit their behaviors from a single "super data element". • ontology: A complete, machine-readable specification of a conceptualization.

  33. The ontology spectrum (reality focus) • glossary: A simple list of terms and their definitions. • controlled vocabulary: A simple list of terms, definitions and naming conventions to ensure consistency. • taxonomy: A controlled vocabulary in which the terms form a hierarchical representation of the types and subtypes of entities in a given domain. The hierarchy is organized by the is_a (subtype) relation. • ontology: A controlled vocabulary organized by is_a and by further formally defined relations, for example part_of.

  34. The Periodic Table

  35. Ontology • a controlled vocabulary which includes • a backbone taxonomy • logical definitions of all terms • logically defined relations between terms • In simple terms: A vocabulary machines can understand (a computerized dictionary) representing the entities in a given domain of reality and the relations between them

  36. [Diagram: anatomical terms connected by is_a and part_of relations. Node labels: Organ Part, Organ Subdivision, Anatomical Space, Anatomical Structure, Organ Cavity Subdivision, Organ Cavity, Organ, Organ Component, Serous Sac, Tissue, Serous Sac Cavity Subdivision, Serous Sac Cavity, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Interlobar recess, Mediastinal Pleura, Mesothelium of Pleura]

  37. In graph-theoretical terms: Ontology Components: • terms form nodes of the graph • relationships between terms form the edges of the graph • definitions and relations logically formulated
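
The graph view can be sketched in a few lines of Python: terms are nodes, typed relations are labelled edges, and following one relation transitively yields a term's ancestors. The class `OntologyGraph` is hypothetical, and the specific edges below are assumptions chosen for illustration. The term names come from the anatomy slide, but the transcript does not preserve which terms are actually connected to which.

```python
from collections import defaultdict


class OntologyGraph:
    """Terms as nodes, typed relations (is_a, part_of, ...) as labelled edges."""

    def __init__(self):
        # (term, relation) -> set of directly related target terms
        self.edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self.edges[(subject, relation)].add(obj)

    def ancestors(self, term, relation):
        """All terms reachable from `term` by following `relation` one or more times."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent in self.edges.get((current, relation), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Assumed edges, for illustration only.
g = OntologyGraph()
g.add("Pleural Sac", "is_a", "Serous Sac")
g.add("Serous Sac", "is_a", "Organ")
g.add("Organ", "is_a", "Anatomical Structure")
g.add("Parietal Pleura", "part_of", "Pleura")
g.add("Pleura", "part_of", "Pleural Sac")

print(sorted(g.ancestors("Pleural Sac", "is_a")))
print(sorted(g.ancestors("Parietal Pleura", "part_of")))
```

Keeping the relation as part of the edge key means is_a and part_of are traversed separately, which mirrors the distinction the slides draw between a taxonomy (is_a only) and an ontology (is_a plus further defined relations).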

  38. The Idea of Common Controlled Vocabularies GlyProt MouseEcotope sphingolipid transporter activity DiabetInGene GluChem

  39. The Idea of Common Controlled Vocabularies GlyProt MouseEcotope Holliday junction helicase complex DiabetInGene GluChem

  40. compare: legends for maps

  41. common legends allow (cross-border) integration compare: legends for maps

  42. California Land Cover Maps link legends to reality Legends: representations of types x

  43. Compare: legends for diagrams

  44. Legends • help human beings use and understand complex representations of reality • help human beings create useful complex representations of reality • help computers process complex representations of reality • help glue data together • help comparison as data changes over time

  45. Annotations using common ontologies can enhance access to and promote integration of data of all kinds
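
A minimal Python sketch of this idea: when two independently maintained datasets tag their records with identifiers from the same vocabulary, they can be joined on those identifiers without any mapping step. The dataset names GlyProt and MouseEcotope and the two term labels appear on the earlier slides; the record IDs and the `TERM:…` identifiers are made up for illustration and are not real GO identifiers.

```python
from collections import defaultdict

# Shared vocabulary (identifiers are hypothetical placeholders).
labels = {
    "TERM:0001": "sphingolipid transporter activity",
    "TERM:0002": "Holliday junction helicase complex",
}

# Two datasets whose records are annotated with terms from the shared vocabulary.
glyprot = [
    {"id": "gp-17", "annotation": "TERM:0001"},
    {"id": "gp-42", "annotation": "TERM:0002"},
]
mouse_ecotope = [
    {"id": "me-3", "annotation": "TERM:0001"},
]


def index_by_term(*datasets):
    """Group (dataset name, record id) pairs under the term that annotates them."""
    index = defaultdict(list)
    for name, records in datasets:
        for record in records:
            index[record["annotation"]].append((name, record["id"]))
    return index


index = index_by_term(("GlyProt", glyprot), ("MouseEcotope", mouse_ecotope))

# Terms used by more than one dataset are the points where the data integrate.
shared = {
    term: hits
    for term, hits in index.items()
    if len({name for name, _ in hits}) > 1
}

for term, hits in shared.items():
    print(labels[term], "->", hits)
```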

  46. What is the key to GO’s success? • GO is developed and maintained by experts who adhere to ontology best practices • over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO • experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO • $100 mill. invested in literature and data curation using GO • ontology building and ontology QA are two sides of the same coin

  47. Making it work • Already good, logical definitions can bring benefits • COIs that need to cooperate can learn that they disagree on use of terms • Defined terms contribute to authoritative descriptions

  48. Making it work: if controlled vocabularies are to serve data interoperability • they have to be used in annotations by many owners of data • they have to be updated by respected experts who are trained in best practices of ontology maintenance • they have to be respected by many owners of data as a framework for semantic enhancement that ensures accurate description of their data • for the GO, the benchmark for accuracy (the ground truth) is provided by the results of scientific experiment • what is the corresponding benchmark in military domains?

  49. DoD and Related Ontology Projects: Some Examples Barry Smith
