
Geri Steve, Aldo Gangemi, Domenico M. Pisanelli

Ontological Analysis & Integration of Terminologies: Towards An Environmental Reference Ontology Library. Geri Steve, Aldo Gangemi, Domenico M. Pisanelli. Istituto di Tecnologie Biomediche, CNR, Rome, Italy http://saussure.irmkant.rm.cnr.it {steve,gangemi,pisanelli}@saussure.irmkant.rm.cnr.it.


Presentation Transcript


  1. Ontological Analysis & Integration of Terminologies: Towards An Environmental Reference Ontology Library Geri Steve, Aldo Gangemi, Domenico M. Pisanelli Istituto di Tecnologie Biomediche, CNR, Rome, Italy http://saussure.irmkant.rm.cnr.it {steve,gangemi,pisanelli}@saussure.irmkant.rm.cnr.it

  2. Which part are you talking about? • If my liver is part of my digestive system, and that system is part of me, is my liver part of me? • If my liver is a part of me and I am part of the CNR, is my liver part of the CNR? • My liver is a component of my digestive system, while I am a member of the CNR. There is no rule for composing component and member relations • Moreover, I am a body, but I am also a person. A living person depends on a body. Nevertheless, a living person can be a member of the CNR, but a body cannot
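The liver example can be made concrete with a small sketch. It assumes, purely for illustration, two kinds of part relation (`component` and `member`) and a composition table in which only component-of chains with itself. The table, the facts, and the function names are hypothetical and are not taken from the ONIONS library.

```python
# Hypothetical composition table: which pairs of part relations chain into one.
# Only component-of composes with itself; there is no rule for component + member.
COMPOSE = {("component", "component"): "component"}

# Illustrative facts from the slide, as (part, relation, whole) triples.
FACTS = [
    ("liver", "component", "digestive system"),
    ("digestive system", "component", "me"),
    ("me", "member", "CNR"),
]


def direct_wholes(part):
    """Return the (relation, whole) pairs asserted directly for a part."""
    return [(rel, whole) for p, rel, whole in FACTS if p == part]


def part_relation(part, whole):
    """Return the relation linking part to whole, using only licensed compositions."""
    frontier = direct_wholes(part)
    seen = set()
    while frontier:
        rel, current = frontier.pop()
        if current == whole:
            return rel
        if (rel, current) in seen:
            continue
        seen.add((rel, current))
        for next_rel, next_whole in direct_wholes(current):
            composed = COMPOSE.get((rel, next_rel))
            if composed is not None:  # skip chains that have no composition rule
                frontier.append((composed, next_whole))
    return None


print(part_relation("liver", "me"))   # component
print(part_relation("liver", "CNR"))  # None
```

Under these assumptions the liver is a component of "me", but nothing follows about the liver and the CNR, because the component and member relations do not compose.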

  3. Object or place? • Is a body region an object that one could cut, or a place? • Is a gene a DNA fragment, or a DNA region (allele)? • Is a river an orographic object, or the geographic place of a watercourse? • Despite many differences, these three cases seem analogous: they share a polysemy that partly depends on an abstract difference between objects and regions, and on a related axiom specifying that objects must be located at some region

  4. River in the GEMET thesaurus

  5. Should we worry about those things? • Even in the presence of polysemous names, a standalone application using a local databank or terminological repository may be able to accomplish its task without serious flaws. • However, when it is integrated with another application, semantic mismatches constitute a serious obstacle for the agent or interface that is negotiating or sharing information. • The ever-increasing demand for data sharing has to rely on a solid conceptual foundation in order to give a semantics to the terabytes available in different databases and eventually traveling over the networks. • Ontologies are currently recognized as the answer to this need for a conceptual foundation.

  6. The advantages of ontologies • to allow a more effective data and knowledge sharing • to facilitate knowledge re-use in decision support systems • to give theoretical foundation to vocabulary standardization activity

  7. Our task • We learn domain ontologies (in medicine, environment) by integrating the conceptual models that can be extracted from terminological sources • The goal is building Domain Reference Ontologies in the form of modular libraries of formal theories • In our ONIONS methodology, ontology learning needs both incremental bottom-up learning from sources, and incremental definition and reuse of general theories that can account for the intended meaning of terms

  8. ONtologic Integration Of Naïve Sources

  9. Minimal history • The ONIONS methodology for ontology integration has been under development since the early 1990s to address the problem of conceptual heterogeneity. It tackles some problems encountered in the context of the European project GALEN and the Italian projects SOLMC (Ontological and Linguistic Tools for Conceptual Modeling) and ONTOINT (Ontological Integration of Information)

  10. Some related research projects • GALEN & GALEN-IN-USE • CYC anatomy • SNOMED RT • HL7 vocabulary committee • MED

  11. What is an ontology? • «A specification of a conceptualization» • (Gruber, 1993) • «The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D. [...] » • (Sowa, 1997) • «A partial and indirect specification of a conceptualization» • -restricted notion- (Guarino, 1998)

  12. What is an ontology (restricted notion)? • An ontology is a set of axioms that account for the intended meaning (the intended models) of a vocabulary (the namespace of a logical language) • A set of axioms usually only approximates such intended models, which in turn only approximate the conceptualization of the vocabulary items • A conceptualization is a set of conceptual relations that range over a domain and a set of relevant states of affairs (possible worlds) for that domain • Therefore, a precise definition of "ontology" (in a restricted, formal sense) might be "a partial specification of the intended models of the conceptualization of a vocabulary"

  13. Types of ontologies (broad notion) • Catalog of normalized terms, e.g. a list of terms used in the reports from a laboratory: no taxonomy, no axioms, and no glosses • Glossed catalog, e.g. a dictionary of medicine: a catalog with glosses • Thesaurus, e.g. many parts of the UMLS Metathesaurus, GEMET: a hierarchical collection of terms; the hierarchical link is usually polysemous • Taxonomy, e.g. the ICD10: a collection of classes with a partial order induced by inclusion (classification) • Axiomatized taxonomy, e.g. the GALEN Core Model: a taxonomy with axioms • Ontology library, e.g. the Ontolingua repository: a set of axiomatized taxonomies with relations among them. Each element of the library is a module, which can be included in another one. Also, a single concept from a module can simply be used in another one. Ontology modules can be considered subdivisions of the namespace of a model

  14. From Data Integration to Conceptual Integration • Heterogeneous texts • Heterogeneous semi-structured texts (retrieval of web data types and descriptions) • Heterogeneous databases (schema integration, information brokering) • => In all these cases, heterogeneity concerns the conceptualization of the terminology used in the sources

  15. Polysemy and overlapping • The primary causes of heterogeneity are • polysemy (conceptual misalignment: one name with different intended meanings), and • conceptual overlapping (different names with overlapping meanings) • Since both arise in the union of the vocabularies of any two sources, ontologies are a major component in providing semantic access to (and integration of) terminological resources • Incidentally, polysemy is usually found within a single source as well (views, themes, homonyms)
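As a rough illustration of the two causes of heterogeneity named on this slide, the sketch below compares two toy vocabularies. It assumes that each source can be reduced to a mapping from a name to a set of sense labels, which is a drastic simplification of an intended meaning; the sense labels and the function name are invented for the example.

```python
def find_heterogeneity(source_a, source_b):
    """Return (polysemous names, overlapping name pairs) for two vocabularies.

    Each vocabulary maps a name to a set of sense labels (a stand-in for its
    intended meaning). Polysemy: the same name with different senses.
    Overlapping: different names whose senses intersect.
    """
    polysemous = sorted(
        name for name in source_a.keys() & source_b.keys()
        if source_a[name] != source_b[name]
    )
    overlapping = sorted(
        (name_a, name_b)
        for name_a in source_a
        for name_b in source_b
        if name_a != name_b and source_a[name_a] & source_b[name_b]
    )
    return polysemous, overlapping


# Toy vocabularies; the sense labels are illustrative only.
A = {"body region": {"cuttable body part"}, "river": {"watercourse"}}
B = {"body region": {"place on the body"}, "stream": {"watercourse", "brook"}}

print(find_heterogeneity(A, B))
```

Real sources do not come with explicit sense sets, which is why the slides argue that the comparison has to be grounded in an ontology rather than in the names themselves.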

  16. Ontology Learning • From Natural Language • From Semi-structured Data • From Structured Data • From Terminologies • => Integration of sources needs: • (Principled) Conceptual Abstraction

  17. Conceptual abstraction: an example • The domain ontology A has body region with the intended meaning of «loosely specified part of the body that can be cut, filled, etc.» • The domain ontology B has body region with the intended meaning of «region of the body at which body parts are located» • There is a metonymy acting on body region in A, whose intended meaning concerns body parts located at some region, although they are denoted by referring to the region itself (the intended meaning in B) • Hence, the metonymic name should be distinguished from the plain name, and correctly related to it • The distinction between objects (body parts) and regions, and the notion of a localization relation holding between objects and regions are both necessary to make the metonymy clear, and cannot be found in the specifications given in A or B. They have to be found in some generic theory
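A minimal sketch of the object/region distinction described above, assuming a generic theory that supplies just two types and one localization relation. The class names, the `located_at` field, and the anatomical facts are illustrative assumptions, not definitions taken from ontology A or B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A place on the body (the plain sense of 'body region', as in B)."""
    name: str


@dataclass(frozen=True)
class BodyPart:
    """An object that can be cut, filled, etc., located at some region."""
    name: str
    located_at: Region


def parts_denoted_by(region, parts):
    """Metonymic reading (as in A): the objects located at the named region."""
    return [part.name for part in parts if part.located_at == region]


abdominal = Region("abdominal region")
thoracic = Region("thoracic region")
parts = [
    BodyPart("liver", abdominal),
    BodyPart("stomach", abdominal),
    BodyPart("heart", thoracic),
]

print(parts_denoted_by(abdominal, parts))  # ['liver', 'stomach']
```

Keeping `Region` and `BodyPart` as separate types, linked only through `located_at`, is what lets the metonymic name be related to the plain one instead of being conflated with it.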

  18. Ontology integration: conceptual issues • Ontology integration is, generally speaking, the construction of an ontology C that formally specifies the union of the vocabularies of two other ontologies A and B • To be sure that A and B can be integrated at some level, C has to commit to both A's and B's conceptualizations. In other words, the intensions of the concepts in A and B should be mapped to the intensions of C's concepts • Unfortunately, this cannot be achieved using only the conceptual relations specified in A and B for local tasks (for a specific context). The methodological principle adopted here is that generic ontologies reused from the philosophical, linguistic, mathematical, and AI literature must ground the comparison of different intensions. Our approach may be called principled conceptual integration

  19. Aspects of integration • Three aspects of an ontology are taken into account: • the intended models of the conceptualizations of its vocabulary • the domain of interest of such models, i.e. the 'topic' of the ontology • the namespace of the ontology • The most interesting case is when A and B are supposed to commit to the conceptualization of the same domain of interest or of two overlapping domains. In particular, A and B may be:

  20. Some integration cases for the same topic • Alternative ontologies: the intended models of the conceptualizations of A and B are different (they partially overlap or are completely disjoint) while the domain of interest is (mostly) the same. This is a typical case that requires integration: different descriptions of the same topic are to be integrated • Truly overlapping ontologies: both the intended models of the conceptualizations of A and B and their domains of interest have a substantial overlap. This is another frequent case of required integration: descriptions of strongly related topics are to be integrated • Equivalent ontologies with vocabulary mismatches: the intended models of the conceptualizations of A and B are the same, as well as the domain of interest, but the namespaces of A and B are overlapping or disjoint. This is the case of equivalent theories with alternative vocabularies
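The three cases above can be told apart mechanically once the three aspects of an ontology are made comparable. The sketch below assumes each aspect is a finite set, reads "(mostly) the same" as strict equality and "substantial overlap" as any non-empty intersection; these thresholds, the `Ontology` class, and the function name are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ontology:
    models: frozenset  # stand-in for the intended models
    domain: frozenset  # stand-in for the domain of interest (the topic)
    names: frozenset   # the namespace


def classify_pair(a, b):
    """Name the integration case for two ontologies, under the assumptions above."""
    same_models = a.models == b.models
    same_domain = a.domain == b.domain
    if same_models and same_domain:
        if a.names != b.names:
            return "equivalent with vocabulary mismatches"
        return "identical"
    if same_domain:
        return "alternative"
    if a.models & b.models and a.domain & b.domain:
        return "truly overlapping"
    return "unrelated in this sketch"


anatomy = frozenset({"anatomy"})
a = Ontology(frozenset({"m1"}), anatomy, frozenset({"body region"}))
b = Ontology(frozenset({"m2"}), anatomy, frozenset({"body region"}))
print(classify_pair(a, b))  # alternative
```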

  21. Ontological integration: operational issues • Depending on the amount of change necessary for the operational integration of A and B, different levels of interoperability can be distinguished: • Mediation: it requires no changes to A and B, but only mapping relations that describe the equivalence (partial or total) of A's and B's elements to C's elements. This may result in weak interoperability, since the intended models of A and B usually only overlap: some concepts from A may not have a correspondent in B, and vice versa. This is the design choice for some recent information brokering architectures. However, such architectures have a weak commitment towards a principled way of conceptual integration, possibly because of its additional cost • Alignment: it requires some change to fill the biggest gaps of A and B with respect to an ideal C that completely integrates A and B. Therefore, alignment requires at least a partial conceptual integration. It may support a limited interoperability; for example, deep inferences may be excluded • Unification: it may require a major reorganization of A and B, which are 'harmonized'. Unification intervenes on the inferential features of the systems, and consists in a complete operational integration: everything that can be done in one system can be done in the other. It results in the most complete interoperability but requires a complete conceptual integration as well. From the conceptual viewpoint, unification consists in the adoption of C as a standard in the systems using A or B

  22. Ontology integration: practical issues • Lack of hierarchies • Ambiguous hierarchies • Informality • Lack of modularity • Polysemy • Uncertain semantics • Prototypical descriptions • Ontological opaqueness • Lack of a (minimal) set of axioms • Confusing lexical clues • Awkward naming policy • 'Remainder' partitions • 'Exception' partitions • Terminological cycles • Meta-level soup • Low maintenance capabilities

  23. Ontologies: some desiderata • An explicit taxonomy with subsumption among concepts • Semantic explicitness of links • Modularity of namespace • A stratified design of the modules • Absence of polysemy within a module • Disjointness of concepts within a module and within the top-level • A proper interface between the ontology namespace and one or more sets of lexical realizations • Linguistically meaningful naming policy (cognitive transparency) • Rich documentation • Some minimal axiomatization to detail the difference among sibling concepts • Explicit linkage to concepts and relations from generic theories • Meta-level assignments to distinguish among the formal primitives assigned to concepts • Languages and implementations that support the previous needs as well as the possibility of collaborative modeling

  24. The ONIONS Methodology • The ONIONS implementation is meant to provide extensive axiomatization, clear semantics, and ontological depth to a domain terminology • Extensive axiomatization is obtained through a conceptual analysis of the terminological sources and their representation in a logical language with a rigorous semantics • Ontological depth is obtained by reusing a library of generic ontologies, on which the axiomatization depends. Such a library may include multiple choices among partially incompatible ontologies. In particular, we suggest the importance of: mereology, or theory of parts; topology, or theory of wholes, connexity and boundaries; morphology, or theory of form and congruence; localization, or theory of regions; time theory; actors, or theory of participants in a process; dependence theory; and the theory of environmental niches

  25. The main steps (I) • 0. Semantically opaque hierarchies and lists are pre-processed in order to create ‘clean’ taxonomies • 1. All concepts, relations, templates, rules, and axioms from a source ontology are represented in the ONIONS formalisms, currently Loom, Ontolingua, and OKBC • 2. When available, plain text descriptions are analyzed and axiomatized (text formalization) • 3. The union of such products is integrated by means of a set of generic ontologies. This is the most characteristic activity in ONIONS, which can be briefly described as follows:

  26. II • 3.1. For any set of sibling concepts in a taxonomy, the conceptual differences among them are inferred and formalized by axioms that reuse the relations and concepts already in the library. If no concept is available to represent a difference, new concepts are added to the library • 3.2. For any set of polysemous senses of a term, different concepts are stated and placed within the library according to their topic and to the available modules. (Polysemy occurs when two concepts with overlapping or disjoint intended models have the same name.) • 3.3. Often, polysemous senses of a term, as well as different 'alternative' concepts, are metonymically related. For example: process/outcome (as in inflammation), region/object (as in body region), etc. Alternatives must be properly defined by making explicit the relationship between them: e.g. "has-product" for inflammation, "location" for body-region • 3.4. When stating new concepts, the relations necessary to maintain consistency with the existing concepts are instantiated. If conflicts arise with existing theories, a more general and more comprehensive theory is sought. If this is impracticable, an alternative theory is created

  27. III • 3.5. Relevant integration cases. Since ONIONS requires the use of generic theories to axiomatize alternative theories, the integration of a concept C from an ontology O is performed by comparing C with the concepts D1,…,n already present in the evolving ontology library L, whose ontology set M1,…,n contains at least a significant subset of generic ontologies and the set of domain ontologies at that state in the evolution of L. The following cases appear relevant to the methodology: • 3.5.1. C's name is polysemous in O (internal polysemy). Iterate steps 3.2 to 3.4 • 3.5.2. C's name is a homonym of the name of a Di. (Homonymy occurs when both the intended models and the domains of two concepts with the same name are disjoint.) Homonyms must be differentiated by modifying the name, or by preventing the homonyms from being included in the same module namespace • 3.5.3. C's name is a synonym of the name of a Di. (Synonymy is the converse of homonymy and occurs when two concepts with different names have both the same intended model and the same domain.) Synonyms must be preserved, or included in the set of lexical realizations related to the concept • 3.5.4. C is subsumed by some Di in L, but it has no total mapping on any Dj in L. The gap in L must be filled by adding C as a subconcept of Di

  28. IV • 3.5.5. C is an intersection between two concepts Di and Dj in L. Solved by distinguishing types and roles, or different defining elements • 3.5.6. C has an alternative concept Di in L (same domain, but overlapping or disjoint intended models): • 3.5.6.1. If C metonymically depends on Di, C is properly related to Di • 3.5.6.2. If C and Di are different viewpoints on the same domain of interest, both concepts are kept; if necessary, they are included in separate modules • 3.5.6.3. If the intended model of C is finer than Di's, Di is substituted with C • 3.5.6.4. If the intended model of C is coarser than Di's, C is ignored (but track of it is kept for mapping between sources)
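Some of the cases in step 3.5 can be sketched as a decision function. The sketch below covers only 3.5.2, 3.5.3 and 3.5.6, and it assumes that intended models and domains can be represented as finite sets, with "finer" read as a proper subset of models and "coarser" as a proper superset. That reading, the `Concept` class, and the returned labels are illustrative assumptions rather than part of the methodology as stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    models: frozenset  # stand-in for the intended models
    domain: frozenset  # stand-in for the domain of interest


def integration_case(c, d):
    """Compare an incoming concept c with a library concept d."""
    disjoint_models = not (c.models & d.models)
    disjoint_domains = not (c.domain & d.domain)
    if c.name == d.name and disjoint_models and disjoint_domains:
        return "3.5.2 homonym: rename, or keep in separate module namespaces"
    if c.name != d.name and c.models == d.models and c.domain == d.domain:
        return "3.5.3 synonym: keep the name as a lexical realization"
    if c.domain == d.domain and c.models != d.models:
        if c.models < d.models:  # assumed reading of 'finer'
            return "3.5.6.3 finer: substitute D with C"
        if c.models > d.models:  # assumed reading of 'coarser'
            return "3.5.6.4 coarser: ignore C, keep track of it for mapping"
        return "3.5.6.1/3.5.6.2 alternative: relate metonymically or keep both"
    return "no case covered by this sketch"


library_cell = Concept("cell", frozenset({"biological unit"}), frozenset({"biology"}))
incoming_cell = Concept("cell", frozenset({"prison room"}), frozenset({"buildings"}))
print(integration_case(incoming_cell, library_cell))
```

Deciding between 3.5.6.1 and 3.5.6.2 needs the generic theories themselves (for example a localization or has-product relation), so the sketch leaves that choice to the modeler.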

  29. V • 4. The library of generic, intermediate, and domain ontologies should be stratified, i.e. domain modules should include intermediate modules, which should in turn include generic modules, so that each set of modules can be plugged into or unplugged from its more general set without affecting the coherence of the entire library • 5. The source ontologies are explicitly mapped to the integrated ontology, in order to allow interoperability. The only admitted mappings are equivalent and coarser equivalent. Formally: for any source ontology SO and an ontology IO that is supposed to result (also) from the integration of SO, for any concept Ci in SO, there is a Di in IO such that $C_i^{I} = D_i^{I}$ (equivalence of possible interpretations), or there is a disjunctive concept (or Di Dj) in IO such that $C_i^{I} = D_i^{I} \cup D_j^{I}$ (equivalence of possible interpretations to a disjunction of concepts, i.e. to a union of finer concepts) • 5.1. Partial mappings must already have been resolved through the methodology: if any remain, some step in the integration procedure must be iterated
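The mapping condition in step 5 can be checked mechanically if interpretations are approximated by finite extensions. The sketch below assumes each concept is represented by a set of instance identifiers and, following the slide, considers only disjunctions of two integrated concepts. The function name and the sample data are hypothetical.

```python
from itertools import combinations


def find_mapping(source_ext, integrated):
    """Look for an admitted mapping of a source concept into the integrated ontology.

    source_ext: set of instances standing in for the interpretation of Ci.
    integrated: dict from concept name to its set of instances (the Di).
    Returns ("equivalent", (name,)), ("coarser equivalent", (name_i, name_j)),
    or None when only a partial mapping exists.
    """
    for name in sorted(integrated):
        if integrated[name] == source_ext:
            return ("equivalent", (name,))
    for name_i, name_j in combinations(sorted(integrated), 2):
        if integrated[name_i] | integrated[name_j] == source_ext:
            return ("coarser equivalent", (name_i, name_j))
    return None


# Illustrative integrated ontology: the instance ids are arbitrary.
IO = {
    "fracture-condition": {1, 2},
    "fracture-event": {3},
    "wound": {4},
}

print(find_mapping({1, 2, 3}, IO))
```

A `None` result corresponds to the partial mappings of step 5.1, which signal that an earlier integration step has to be repeated.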

  30. Ambiguous hierarchies

  31. A principled formalization
      (defconcept ununited-fracture
        :is-primitive (and fracture
                           (some morphology
                                 (and bone
                                      (or (some embodies malunion)
                                          (not integral))))
                           (some dependently-postdates fracture)
                           (all interpretant clinical-condition)))

  32. Some UMLS concepts pertaining to the intersection: Amino Acid, Peptide, or Protein & Carbohydrate • (|hamster oviduct-specific glycoprotein|) • (|Par j I|) • (|(Man)6(GlcNAc)2Asn|) • (|Zn(+2)-IAA|) • (|collapsing factor|) • (|BDV 18K glycoprotein|) • (|SI-gene-associated glycoprotein, Nicotiana|) • (|FdI allergen|) • (|sca gene product|) • (|EPV20 protein|) • (|lubricin|) • (|Pluritene|) • (|Par h 1 allergen|) • (|Wnt11 gene product|) • (|I-D-Gal-BSA|) • (|mannose-bovine serum albumin conjugate|) • (|acrosome granule lysin|) • (|sulfatide activator|) • (|vaccinia virus A34R protein|) • => More than 118,000 UMLS concepts (25%) are classified under an intersection

  33. Ontological analysis of the intersection
      (defconcept |Amino Acid, Peptide, or Protein & Carbohydrate|
        "834 instances. This conjunct includes two sibling types.
         A protein containing a carbohydrate."
        :annotations ((Sugg.Name "carbohydrate-containing-protein")
                      (onto-status integrated))
        :is-primitive (:and protein
                            (:some has-component carbohydrate))
        :context :substances)

  34. Morphologies • Names of anatomical morphologies are often polysemous: • Both a condition and the function that caused the condition ("inflammation", "ulcer", "fracture", "wound", "hyperplasia") • Both an object and the function that produced the object ("neoplasm", "hemorrhage") • Both an object O and the condition created in another object O' by O ("obstruction") • For example: "the fracture has been caused by a fall" vs. "the fracture is transverse"; "the obstruction occurred in the jejunum" vs. "the obstruction has been removed" • Conceptual analysis brings out further issues concerning morphologies: • The dependence between a morphological condition, a function, and the related organ. For example, an "ulcer" (as a condition) of a stomach implies that the stomach embodies an ulceration function (an ulcer as a function) • The mereological import of morphologies: some are exhibited by a whole organ, some only by a part of an organ. For instance, an "ectopic heart" is wholly ectopic, but an "ulcerated stomach" is only partly ulcerated

  35. Morphologies analyzed • a property ("color", "consistency", "thickness", "size", "number", "shape") • a condition: • a topologically relevant condition: • an alteration of connection: • that creates a configuration (a new property) in an object ("fracture", "wound") • in the holey interior of an object ("obstruction") • between several objects ("fusion") • an alteration of the boundary between an object's holey interior and the object's complement: • creating a configuration in the boundary ("cavitation", "ulcer") • producing a substance flow ("hemorrhage", "ulcer") • an abnormal placement ("dislocation", "ectopia", "absence") • a form alteration condition ("deformity", "hyperplasia", "hypoplasia") • a condition involving the alteration of several properties ("inflammation", "eruption") • an abnormal, foreign object ("mass", "neoplasm", "calculus", "obstruction")

  36. Making relations explicit

  37. Medical source ontologies • The UMLS top-level (1998 edition: 132 "semantic types", 91 "relations", and 412 "templates") • The Snomed-III top-level (510 "terms" and 25 "links") • The GMN top-level (708 "terms") • The ICD10 top-level (185 "terms") • The GALEN Core Model v.5h (2,730 "entities", 413 "attributes" and 1,692 axioms), etc. • The 1998 edition of the UMLS Metathesaurus (476,000 "concepts", 93,000 explicit templates, and 599,000 thesaurus-like templates)

  38. The current ON9.2 library

  39. The current top-level

  40. Tool for representation: ONTOLINGUA • Tool for representation and classification: LOOM • Tool for intermediate representation and interchange: OKBC • Tool for browsing and editing: ONTOSAURUS

  41. Results • ON9.2: integration of the medical top levels within a library of generic theories. It includes a set of 50 modules with about 1,500 concepts. It is available in both the Ontolingua and Loom languages • Making the Metathesaurus terminological knowledge explicit: intersections of UMLS semantic types, relations defined by sources (IS_A and other relations) • Integration of the Metathesaurus intersections within ON9.2 • Contextualization of the Metathesaurus • An integrated model of clinical guidelines

  42. What is a Domain Reference Ontology? • An ontology that can be used to build new ontologies in a domain, or into which existing ontologies can be plugged • Our research in medical conceptual structures aims at defining a Medical Reference Ontology (library) • The current research in environmental metadata could be reconsidered as the construction of an Environmental Reference Ontology • We are confident that our methodology is suitable for this task without substantial revision • Warning: at first sight, conceptual heterogeneity seems harder to handle in the environmental domain than in medicine

  43. "Es gibt nichts Praktischeres als eine gute Theorie" • (Ludwig von Boltzmann)

  44. "Es gibt nichts Praktischeres als eine gute Theorie" • "There is nothing more practical than a good theory" • (Ludwig von Boltzmann)

  45. References • for generalities, the library, and conceptual investigations: • Gangemi A, Pisanelli DM, Steve G, "An overview of the ONIONS project: Applying ontologies to the integration of medical terminologies", Data and Knowledge Engineering, 31 (1999), 183-220 • for the investigation of the UMLS: • Pisanelli DM, Gangemi A, Steve G, "An Ontological Analysis of the UMLS Metathesaurus", Journal of the American Medical Informatics Association, vol. 5 (symposium supplement), 1998 • for the pre-processing of informal terminological repositories: • Steve G, Gangemi A, Pisanelli DM, "Integrating Medical Terminologies with ONIONS Methodology", in Kangassalo H, Charrel JP (eds.) Information Modelling and Knowledge Bases VIII, Amsterdam, IOS Press 1997 • for the integration of clinical guidelines: • Pisanelli DM, Gangemi A, Steve G, "Toward a Standard for Guideline Representation: an Ontological Approach", Journal of the American Medical Informatics Association, vol. 6 (symposium supplement), 1999
