
Bio-Ontologies: Their Creation and Design

Bio-Ontologies: Their Creation and Design. Dr. Peter Karp, SRI, http://www.ai.sri.com/~pkarp/; Dr. Robert Stevens & Professor Carole Goble, University of Manchester, UK, http://img.cs.man.ac.uk/tambis


Presentation Transcript


  1. Bio-Ontologies: Their Creation and Design Dr. Peter Karp SRI, http://www.ai.sri.com/~pkarp/ Dr. Robert Stevens & Professor Carole Goble University of Manchester, UK http://img.cs.man.ac.uk/tambis

  2. Advertisement The Fourth Annual Bio-Ontologies Meeting “Sharing Experiences and Spreading Best Practice” Sponsored by GlaxoSmithKline Pharmaceuticals Tivoli Gardens, Copenhagen, Denmark, 26th July 2001 Organised by: Richard Chen, Carole Goble, Robert Stevens, Peter Karp, Pat Hayes, Robin McEntire and Eric Neumann. http://img.cs.man.ac.uk/stevens/workshop01

  3. Outline • What is an ontology? • Motivation for ontologies in bioinformatics • Definition of an ontology • Naming the parts & comparing the types • Knowledge representation • Building an ontology • Methodologies, principles and pitfalls • Running example: a macromolecule fragment • Ontology Tools • Development tools

  4. Ontologies: Definitions, Components, Subtypes

  5. Outline • Motivations for ontologies in bioinformatics • Definition of ontology • Principles and pitfalls of ontology design • GKB Editor ontology development tool

  6. Definition of an Ontology • Conceptualization of a domain of interest • Concepts, relations, attributes, constraints, objects, values • An ontology is a specification of a conceptualization • Formal notation • Documentation • A variety of forms, but includes: • A vocabulary of terms • Some specification of the meaning of the terms • Ontologies are defined for reuse

  7. Roles of Ontologies in Bioinformatics • Success of many biological DBs depends on • High fidelity ontologies • Clearly communicating their ontologies • Prevent errors on data entry and interpretation • Common framework for multidatabase queries • Controlled vocabularies for genome annotation • Riley ontology, GO • EC numbers

  8. Roles of Ontologies in Bioinformatics • Information-extraction applications • Reuse is a core aspect of ontologies • Reuse of existing ontologies faster than designing new ones • Reuse decreases semantic heterogeneity of DBs • Schema-driven Software • Knowledge-acquisition tools • Query tools

  9. Definitions • Data Model: • Primitive data structuring mechanism in which an ontology is expressed • Relational data model, object-oriented data model, frame data model • Ontology: • Domain specific conceptualization expressed within some data model

  10. Components of an Ontology • Concepts • AKA: Class, Set, Type, Predicate • Gene, Reaction, Macromolecule • Taxonomy of concepts • Generalization ordering among concepts • Concept A is a parent of concept B iff every instance of B is also an instance of A • Superset / subset • “A kind of” vs “a part of”
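The generalization rule on slide 10 can be sketched in Python, assuming each concept is modelled simply as the set of its known instances. The instance names below are illustrative assumptions, not taken from the slides. Note that a check over a finite sample can only show that the data are consistent with a parent/child link; the rule itself quantifies over every possible instance.

```python
def is_parent(a: set, b: set) -> bool:
    """True if every known instance of b is also an instance of a."""
    return b <= a

# Hypothetical instance sets for two concepts.
macromolecule = {"tryptophan-synthetase", "hexokinase", "trpA-mRNA"}
protein = {"tryptophan-synthetase", "hexokinase"}

print(is_parent(macromolecule, protein))  # True: Protein is a kind of Macromolecule
print(is_parent(protein, macromolecule))  # False: trpA-mRNA is not a Protein
```

This covers only the "a kind of" (superset / subset) relation; "a part of" is a different relation and would need its own structure.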

  11. Taxonomy of Concepts

  12. Components of an Ontology • Objects • AKA: Instances, members of the set • trpA Gene, Reaction 1.1.2.4 • Strictly speaking, an ontology with instances is a knowledge base • Relations and Attributes • AKA: Slots, properties • Product of Gene, Map-Position of Gene • Reactants of Reaction, Keq of Reaction • Values • The Product of the trpA Gene is tryptophan-synthetase • trpA.Product = tryptophan-synthetase

  13. Components of an Ontology • Constraints and other meta information about relations • Slot Product: • Value type: Polypeptide or RNA • Domain: Genes • Slot Map-Position: • Value type: Number • Domain: Genes • Cardinality: At-Most 1 • Range: 0 <= X <= 100 • General Axioms • Nucleic acids < 20 residues are oligonucleotides
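As a rough illustration of how the Map-Position constraints on slide 13 might be enforced at data entry, here is a minimal Python sketch. The `Slot` class, the `violations` helper and the sample values are assumptions made for this example; they are not part of any frame system named in the slides.

```python
from dataclasses import dataclass
from numbers import Real
from typing import Optional, Tuple

@dataclass
class Slot:
    name: str
    domain: str                                        # concept the slot applies to
    value_type: type                                   # Python type standing in for the value type
    at_most: Optional[int] = None                      # cardinality upper bound
    value_range: Optional[Tuple[float, float]] = None  # inclusive numeric range

def violations(slot, concept, values):
    """Return a list of human-readable constraint violations."""
    problems = []
    if concept != slot.domain:
        problems.append(f"{slot.name}: domain is {slot.domain}, not {concept}")
    if slot.at_most is not None and len(values) > slot.at_most:
        problems.append(f"{slot.name}: at most {slot.at_most} value(s) allowed")
    for v in values:
        if not isinstance(v, slot.value_type):
            problems.append(f"{slot.name}: {v!r} is not a {slot.value_type.__name__}")
        elif slot.value_range is not None:
            low, high = slot.value_range
            if not low <= v <= high:
                problems.append(f"{slot.name}: {v!r} outside {low}..{high}")
    return problems

# Slot Map-Position as described on the slide.
map_position = Slot("Map-Position", "Genes", Real, at_most=1, value_range=(0, 100))

print(violations(map_position, "Genes", [27.9]))         # no violations
print(violations(map_position, "Genes", [27.9, 120.0]))  # cardinality and range
```

Keeping the constraints as data rather than hard-coding them is what lets schema-driven tools reuse them for both entry checking and documentation.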

  14. More on Concepts • Primitive: properties are necessary • Globular protein must have hydrophobic core, but a protein with a hydrophobic core need not be a globular protein • Defined: properties are necessary + sufficient • Eukaryotic cells must have a nucleus. Every cell that contains a nucleus must be Eukaryotic.

  15. Ontology Subtypes Expressiveness • Controlled vocabulary • List of terms • Taxonomy • Terms in a generalization hierarchy • DB schemas (relational and object-oriented) • More implementation specific • No instance information • Limited constraints • Frame knowledge bases • Description Logics

  16. Ontology Subtypes • Database schema • Concepts, relations, constraints • Perhaps no taxonomy • At most hundreds of concepts • Taxonomy • Concepts, taxonomy, perhaps a few relations • Thousands of concepts • Knowledge base • Concepts, relations, constraints, objects, values • Hundreds to hundreds of thousands of concepts and objects

  17. Ontology Subtypes • Generic (a.k.a. upper, core or reference) • common high level concepts • “Physical”, “Abstract”, “Structure”, “Substance” • useful for ontology re-use • important when generating or analysing natural language expressions • Domain-oriented • domain specific (e.g. E.coli) • domain generalisations (e.g. gene function) • Task-oriented • task specific (e.g. annotation analysis) • task generalisations (e.g. problem solving)

  18. Knowledge Representation • Ontologies are best delivered in some computable representation • Variety of choices with different: • Expressiveness • The range of constructs that can be used to formally, flexibly, explicitly and accurately describe the ontology • Ease of use • Computational complexity • Is the language computable in real time? • Rigour • Satisfiability and consistency of the representation • Systematic enforcement mechanisms • Unambiguous, clear and well defined semantics • "A subclassOf B": don’t be fooled by syntax!

  19. Languages • Vocabularies using natural language • Hand crafted, flexible but difficult to evolve, maintain and keep consistent, with poor semantics • Gene Ontology • Object-based KR: frames • Extensively used, good structuring, intuitive. Semantics defined by OKBC standard • EcoCyc (uses Ocelot) and RiboWeb (uses Ontolingua) • Logic-based: Description Logics • Very expressive, model is a set of theories, well defined semantics • Automatic derived classification taxonomies • Concepts are defined and primitive • Expressivity vs. computational complexity balance • TAMBIS Ontology (uses FaCT)

  20. Vocabularies: Gene Ontology • Hand crafted with simple tree-like structures • Position of each concept and its relationships wholly determined by a person • Flexible but… • Maintenance and consistency preservation difficult and arduous • Poor semantics • Single hierarchies are limiting

  21. Description Logics • Describe knowledge in terms of concepts and relations • Concept defined in terms of other roles and concepts • Enzyme = protein which catalyses reaction • Reason that enzyme is a kind of protein • Model built up incrementally and descriptively • Uses logical reasoning to figure out: • Automatically derived (and evolved) classifications • Consistency -- concept satisfaction
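To make "automatically derived classification" concrete, here is a toy Python sketch of structural subsumption between defined concepts. It is only an illustration: reasoners such as FaCT use far more general algorithms. The definitions of enzyme and kinase follow the OIL example later in the deck; the link from phosphorylation-reaction to reaction is an assumption made for this sketch.

```python
# Told (primitive) subclass links. This single link is an assumption.
PRIMITIVE_PARENTS = {
    "phosphorylation-reaction": {"reaction"},
}

def ancestors(name):
    """Return name plus all told superclasses, transitively."""
    seen, stack = set(), [name]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PRIMITIVE_PARENTS.get(c, ()))
    return seen

# Defined concepts: (superclasses, existential restrictions as (slot, filler)).
DEFINED = {
    "enzyme": ({"protein"}, {("catalyse", "reaction")}),
    "kinase": ({"protein"}, {("catalyse", "phosphorylation-reaction")}),
}

def subsumes(general, specific):
    """True if every condition defining `general` is implied by `specific`."""
    g_supers, g_restr = DEFINED[general]
    s_supers, s_restr = DEFINED[specific]
    s_all_supers = set().union(*(ancestors(s) for s in s_supers))
    supers_ok = g_supers <= s_all_supers
    restr_ok = all(
        any(slot == s_slot and filler in ancestors(s_filler)
            for s_slot, s_filler in s_restr)
        for slot, filler in g_restr
    )
    return supers_ok and restr_ok

print(subsumes("enzyme", "kinase"))  # True: kinase is classified under enzyme
print(subsumes("kinase", "enzyme"))  # False
```

Nobody asserted that kinase is a kind of enzyme; the link falls out of the two definitions, which is the point of the slide.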

  22. Frames and Logics • Frames • Rich set of language constructs • Impose restrictive constraints on how they are combined or used to define a class • Only support primitive concepts • Taxonomy hand-crafted • Description logics • Limited set of language constructs • Primitives combined to create defined concepts • Taxonomy for defined concepts established through logical reasoning • Expressivity vs. computational complexity • Less intuitive • Ideal: both! Current OIL activity uses a mixture. Logics provide reasoning services for frame schemes.

  23. Ontology Exchange • To reuse an ontology we need to share it with others in the community • Exchanging ontologies requires a language with: • common syntax • clear and explicit shared meaning • Tools for parsing, delivery, visualising etc • Exchanging the structure, semantics or conceptualisation?

  24. Ontology Exchange Languages • XOL eXtensible Ontology Language • XML markup • Frame based • Rooted in OKBC • http://www.ai.sri.com/pkarp/xol/ • OIL Ontology Interface Layer / Ontology Inference Layer • Gives a semantics to RDF-Schema • http://www.ontoknowledge.org/oil • Roots of OIL: • Frames: modelling primitives, OKBC • Description Logics: formal semantics & reasoning support • Web languages: XML & RDF based syntax

  25. OIL: Ontology Metadata (Dublin Core) Ontology-container title “macromolecule fragment” creator “robert stevens” subject “macromolecule generic ontology” description “example for a tutorial” description.release “2.0” publisher “R Stevens” type “ontology” formal “pseudo-xml” identifier “http://www.ontoknowledge.org/oil/oil.pdf” source “http://img.cs.man.ac.uk/stevens/tambis-oil.html” language “OIL” language “en-uk” relation.haspart “http://www.ontoRus.com/bio/mmole.onto”

  26. The Three Roots of OIL • Description Logics: Formal Semantics & Reasoning Support • Frame-based Systems: Epistemological Modelling Primitives • Web Languages: XML- and RDF-based syntax

  27. OIL primitive ontology definitions slot-def has-backbone inverse is-backbone-of slot-def has-component inverse is-component-of properties transitive class-def nucleic-acid class-def rna subclass-of nucleic-acid slot-constraint has-backbone value-type ribophosphate class-def ribophosphate class-def deoxyribophosphate subclass-of NOT ribophosphate
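What the `transitive` property and the `inverse` declaration on has-component imply can be sketched in Python as a closure over asserted pairs. The two asserted facts are illustrative assumptions, not part of the tutorial ontology.

```python
def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Hypothetical asserted has-component facts.
has_component = {("nucleic-acid", "nucleotide"), ("nucleotide", "base")}

inferred = transitive_closure(has_component)
# The inverse slot is obtained by swapping each pair.
is_component_of = {(b, a) for a, b in inferred}

print(("nucleic-acid", "base") in inferred)         # True, by transitivity
print(("base", "nucleic-acid") in is_component_of)  # True, by inverse
```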

  28. OIL defined ontology definitions class-def defined dna subclass-of nucleic-acid AND NOT rna slot-constraint has-backbone value-type deoxyribophosphate class-def defined enzyme subclass-of protein slot-constraint catalyse has-value reaction class-def defined kinase subclass-of protein slot-constraint catalyse has-value phosphorylation-reaction

  29. OIL in XML • OIL has a DTD, an XML Schema and a mapping to RDF-Schema. See web site for details. <slot-def> <slot-name = “has-component”/> <inverse> <slot-name = “is-component-of”/> </inverse> <properties> <transitive/> </properties> </slot-def> <class-def> <class-name= “nucleic-acid”/></class-def> <class-def> <class-name= “rna”/> <subclass-of> <class name = “nucleic-acid”/> </subclass-of> <slot-constraint> <slot-name = “has-backbone”/> <value-type> <class name= “ribophosphate”/> </value-type> </slot-constraint> </class-def>

  30. OIL Remarks • Tools: • Protégé II editor • FaCT reasoner • Other projects: • Semantic Web projects (http://www.semanticweb.org) • Agents for the web projects (e.g. DAML) • A knowledge representation language and inference mechanism for the web

  31. OIL Features • Based on standard frame languages • Extends expressive power with DL style logical constructs • Still has frame look and feel • Can still function as a basic frame language • OILcore language restricted in some respects so as to allow for reasoning support • No constructs with ill defined semantics • No constructs that compromise decidability • Has both XML and RDF(S) based syntax

  32. OIL Features • Semantics clearly defined by mapping to very expressive Description Logic, e.g.: • slot-constraint reverse-transcribe-from has-value mRNA or (part-of has-value mRNA) • ∃eats.meat ⊓ ∃eats.fish • Note the importance of clear semantics: • ∃eats.(meat ⊓ fish) • is inconsistent (assuming meat and fish are disjoint) • Mapping can also be used to provide reasoning support from a Description Logic system (e.g., FaCT)
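The difference between the two "eats" expressions on slide 32 can be checked on a small model in Python. This sketch assumes the expressions read as ∃eats.meat ⊓ ∃eats.fish (one filler that is meat and one that is fish) and ∃eats.(meat ⊓ fish) (a single filler that is both). The individuals are made up for illustration.

```python
def exists(role, cls, ind):
    """Does `ind` have at least one `role`-filler that satisfies `cls`?"""
    return any(cls(f) for f in role.get(ind, ()))

# Hypothetical model in which meat and fish are disjoint.
meat = {"steak"}
fish = {"cod"}
eats = {"omnivore": ["steak", "cod"]}

def is_meat(x):
    return x in meat

def is_fish(x):
    return x in fish

# Exists eats.meat AND exists eats.fish: two fillers may be different things.
both_separately = exists(eats, is_meat, "omnivore") and exists(eats, is_fish, "omnivore")

# Exists eats.(meat AND fish): one filler must be meat and fish at once.
one_filler_both = exists(eats, lambda x: is_meat(x) and is_fish(x), "omnivore")

print(both_separately)   # True
print(one_filler_both)   # False
```

A single model does not prove unsatisfiability, but the reason generalises: if meat and fish are disjoint, no filler can ever belong to both, so the second expression can have no instances.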

  33. Why Reasoning Support? • Key feature of OIL core language is availability of reasoning support • Reasoning intended as design support tool • Check logical consistency of classes • Compute implicit class hierarchy • May be less important in small local ontologies • Can still be useful tool for design and maintenance • More important with larger ontologies/multiple authors • Valuable tool for integrating and sharing ontologies • Use definitions/axioms to establish inter-ontology relationships • Check for consistency and (unexpected) implied relationships • Already shown to be useful technique for DB schema integration

  34. Classifying by Reasoning

  35. Finding Inconsistencies

  36. Changing Classifications

  37. DAML+OIL • OIL merged with DAML • Originally retained frame syntax • DAML more concerned with deployment than with building and managing • OIL mapped to DAML+OIL, but not reliably reversed • FRAME look and feel may return • Web ontology language

  38. Building Ontologies

  39. Building Ontologies • No field of Ontological Engineering equivalent to Knowledge or Software Engineering; • No standard methodologies for building ontologies; • Such a methodology would include: • a set of stages that occur when building ontologies; • guidelines and principles to assist in the different stages; • an ontology life-cycle which indicates the relationships among stages. • Gruber's guidelines for constructing ontologies are well known.

  40. The Development Lifecycle • Two kinds of complementary methodologies emerged: • Stage-based, e.g. TOVE [Uschold96] • Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94]. • Most have TWO stages: • Informal stage • ontology is sketched out using either natural language descriptions or some diagram technique • Formal stage • ontology is encoded in a formal knowledge representation language, that is machine computable • An ontology should ideally be communicated to people and unambiguously interpreted by software • the informal representation helps the former • the formal representation helps the latter.

  41. A Provisional Methodology • A skeletal methodology and life-cycle for building ontologies; • Inspired by the software engineering V-process model; • The overall process moves through a life-cycle. The left side charts the processes in building an ontology The right side charts the guidelines, principles and evaluation used to ‘quality assure’ the ontology

  42. The V-model Methodology • Process (left side): • Identify purpose and scope; Knowledge acquisition (User Model) • Conceptualisation; Integrating existing ontologies (Conceptualisation Model) • Encoding; Representation (Implementation Model) • Ontology in Use • Quality assurance (right side): • Evaluation: coverage, verification, granularity • Conceptualisation principles: commitment, conciseness, clarity, extensibility, coherency • Encoding/Representation principles: encoding bias, consistency, house styles and standards, reasoning system exploitation

  43. The ontology building life-cycle • Identify purpose and scope • Knowledge acquisition • Building • Conceptualisation • Integrating existing ontologies • Encoding • Language and representation • Available development tools • Evaluation

  44. User Model: Identify purpose and scope • Decide what applications the ontology will support • EcoCyc: Pathway engineering, qualitative simulation of metabolism, computer-aided instruction, reference source • TAMBIS: retrieval across a broad range of bioinformatics resources • The use to which an ontology is put affects its content and style • Impacts re-usability of the ontology

  45. User Model: Knowledge Acquisition • Specialist biologists; standard text books; research papers and other ontologies and database schema. • Motivating scenarios and informal competency questions – informal questions the ontology must be able to answer • Evaluation: • Fitness for purpose • Coverage and competency

  46. Ontology Scenario • A molecule ontology; • Describes the molecules stored in bioinformatics databases and annotated therein; • It should cover the molecules and other chemicals described in the resources; • The ontology will be used for querying and annotating information in bioinformatics resources.

  47. Competency Questions • Cover the macromolecules found in molecular biology resources and courses; • Should accommodate various views on the macromolecules; • Should cover the queries people want to ask of macromolecules; • In reality, need more detail on these questions, e.g. “give me tRNA genes with anticodon x, from aardvark”.
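One way to make a competency question concrete is to phrase it as a query that the ontology-backed data must be able to answer. Below is a minimal Python sketch of the example question from slide 47; the gene records, field names and anticodons are entirely hypothetical.

```python
# Hypothetical gene records; names and anticodons are made up for illustration.
GENES = [
    {"name": "gene-a", "product": "tRNA", "anticodon": "CAU", "organism": "aardvark"},
    {"name": "gene-b", "product": "tRNA", "anticodon": "GGC", "organism": "aardvark"},
    {"name": "gene-c", "product": "tRNA", "anticodon": "CAU", "organism": "E. coli"},
    {"name": "gene-d", "product": "mRNA", "anticodon": None, "organism": "aardvark"},
]

def trna_genes(records, anticodon, organism):
    """Names of tRNA genes with the given anticodon from the given organism."""
    return [
        r["name"]
        for r in records
        if r["product"] == "tRNA"
        and r["anticodon"] == anticodon
        and r["organism"] == organism
    ]

print(trna_genes(GENES, "CAU", "aardvark"))  # ['gene-a']
```

If the question cannot be written against the concepts and slots in the ontology (here: gene, product type, anticodon, source organism), that gap is a sign the conceptualisation is missing something.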

  48. Acquiring Knowledge • Find your knowledge! • An important source is your head, but… • Use text books, glossaries (many of which lie on the web) and domain experts; • Use other ontologies – what did they include and how did they do it? • Record your sources of knowledge. • Use your competency questions;

  49. Starting Concept List • Chemicals – atom, ion, molecule, compound, element; • Molecular-compound, ionic-compound, ionic-molecular-compound, …; • Ionic-macromolecular-compound and ionic-small-macromolecular-compound; • Protein, peptide, polyprotein, enzyme, holo-protein, apo-protein,… • Nucleic acid – DNA, RNA, tRNA, mRNA, snRNA, …

  50. Conceptualisation Model: Conceptualisation • Identify the key concepts, their properties and the relationships that hold between them; • Which ones are essential? • What information will be required by the applications? • Structure domain knowledge into explicit conceptual models. • Identify natural language terms to refer to such concepts, relations and attributes;
