Bio-Ontologies: Their Creation and Design

Dr. Peter Karp

SRI, http://www.ai.sri.com/~pkarp/

Dr. Robert Stevens & Professor Carole Goble

University of Manchester, UK

http://img.cs.man.ac.uk/tambis

Advertisement

The Fourth Annual Bio-Ontologies Meeting

“Sharing Experiences and Spreading Best Practice”

Sponsored by

GlaxoSmithKline Pharmaceuticals

Tivoli Gardens, Copenhagen, Denmark,

26th July 2001

Organised by: Richard Chen, Carole Goble, Robert Stevens, Peter Karp, Pat Hayes, Robin McEntire and Eric Neumann.

http://img.cs.man.ac.uk/stevens/workshop01

Outline
  • What is an ontology?
    • Motivation for ontologies in bioinformatics
    • Definition of an ontology
    • Naming the parts & comparing the types
    • Knowledge representation
  • Building an ontology
    • Methodologies, principles and pitfalls
    • Running example: a macromolecule fragment
    • Ontology Tools
    • Development tools
Outline
  • Motivations for ontologies in bioinformatics
  • Definition of ontology
  • Principles and pitfalls of ontology design
  • GKB Editor ontology development tool
Definition of an Ontology
  • Conceptualization of a domain of interest
    • Concepts, relations, attributes, constraints, objects, values
  • An ontology is a specification of a conceptualization
    • Formal notation
    • Documentation
  • A variety of forms, but includes:
    • A vocabulary of terms
    • Some specification of the meaning of the terms
  • Ontologies are defined for reuse
Roles of Ontologies in Bioinformatics
  • Success of many biological DBs depends on
    • High fidelity ontologies
    • Clearly communicating their ontologies
      • Prevent errors on data entry and interpretation
  • Common framework for multidatabase queries
  • Controlled vocabularies for genome annotation
    • Riley ontology, GO
    • EC numbers
Roles of Ontologies in Bioinformatics
  • Information-extraction applications
  • Reuse is a core aspect of ontologies
    • Reuse of existing ontologies faster than designing new ones
    • Reuse decreases semantic heterogeneity of DBs
  • Schema-driven Software
    • Knowledge-acquisition tools
    • Query tools
Definitions
  • Data Model:
    • Primitive data structuring mechanism in which an ontology is expressed
    • Relational data model, object-oriented data model, frame data model
  • Ontology:
    • Domain specific conceptualization expressed within some data model
Components of an Ontology
  • Concepts
    • AKA: Class, Set, Type, Predicate
    • Gene, Reaction, Macromolecule
  • Taxonomy of concepts
    • Generalization ordering among concepts
    • Concept A is a parent of concept B iff every instance of B is also an instance of A
    • Superset / subset
    • “A kind of” vs “a part of”
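The parent/child criterion above can be checked mechanically when the instances of each concept are known. The following is a minimal sketch, assuming concepts are modelled as plain Python sets; the instance names are invented for illustration and do not come from any database.

```python
# Concepts modelled as sets of instance names (illustrative data only).
instances = {
    "Nucleic-Acid": {"tRNA-Ala", "trpA-mRNA", "plasmid-1"},
    "RNA": {"tRNA-Ala", "trpA-mRNA"},
    "DNA": {"plasmid-1"},
}

def is_parent(a: str, b: str) -> bool:
    """A is a parent of B iff every instance of B is also an instance of A."""
    return instances[b] <= instances[a]

print(is_parent("Nucleic-Acid", "RNA"))  # RNA instances are a subset of Nucleic-Acid
print(is_parent("RNA", "DNA"))           # plasmid-1 is not an RNA
```

A subset test over a finite sample is only evidence for the generalization, since the definition quantifies over every possible instance. It also says nothing about "a part of" links, which need a separate relation.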
Components of an Ontology
  • Objects
    • AKA: Instances, members of the set
    • trpA Gene, Reaction 1.1.2.4
    • Strictly speaking, an ontology with instances is a knowledge base
  • Relations and Attributes
    • AKA: Slots, properties
    • Product of Gene, Map-Position of Gene
    • Reactants of Reaction, Keq of Reaction
  • Values
    • The Product of the trpA Gene is tryptophan-synthetase
    • trpA.Product = tryptophan-synthetase
Components of an Ontology
  • Constraints and other meta information about relations
    • Slot Product:
      • Value type: Polypeptide or RNA
      • Domain: Genes
    • Slot Map-Position:
      • Value type: Number
      • Domain: Genes
      • Cardinality: At-Most 1
      • Range: 0 <= X <= 100
  • General Axioms
    • Nucleic acids < 20 residues are oligonucleotides
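To make the slot constraints concrete, here is a small sketch of how a tool might check the Map-Position slot before accepting a value. The `SlotSpec` class and `violations` function are hypothetical names invented for this example; they are not part of Ocelot, OKBC or any other system named in these slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class SlotSpec:
    name: str
    domain: str                                   # class whose frames may carry the slot
    value_type: Tuple[type, ...]                  # allowed Python types for values
    at_most: Optional[int] = None                 # cardinality upper bound
    value_range: Optional[Tuple[float, float]] = None

# Map-Position as described on the slide: a number in 0..100, at most one per gene.
MAP_POSITION = SlotSpec("Map-Position", "Genes", (int, float),
                        at_most=1, value_range=(0, 100))

def violations(spec: SlotSpec, frame_class: str, values: list) -> List[str]:
    """Return human-readable descriptions of every constraint the values break."""
    problems = []
    if frame_class != spec.domain:
        problems.append(f"{spec.name} applies to {spec.domain}, not {frame_class}")
    if spec.at_most is not None and len(values) > spec.at_most:
        problems.append(f"{spec.name} allows at most {spec.at_most} value(s), got {len(values)}")
    for v in values:
        if not isinstance(v, spec.value_type):
            problems.append(f"{spec.name} value {v!r} has the wrong type")
        elif spec.value_range is not None:
            low, high = spec.value_range
            if not low <= v <= high:
                problems.append(f"{spec.name} value {v!r} is outside {low}..{high}")
    return problems

print(violations(MAP_POSITION, "Genes", [28.3]))        # no violations
print(violations(MAP_POSITION, "Genes", [28.3, 120]))   # cardinality and range
```

Checks of this kind are what let schema-driven editors prevent errors at data entry rather than at interpretation time.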
More on Concepts
  • Primitive: properties are necessary
    • Globular protein must have hydrophobic core, but a protein with a hydrophobic core need not be a globular protein
  • Defined: properties are necessary + sufficient
    • Eukaryotic cells must have a nucleus. Every cell that contains a nucleus must be Eukaryotic.
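The practical difference shows up when classifying an individual: a primitive concept can only rule membership out, whereas a defined concept can also rule it in. A minimal sketch, assuming individuals are described by simple dictionaries whose keys are invented for this example:

```python
from typing import Optional

def globular_protein(x: dict) -> Optional[bool]:
    """Primitive concept: a hydrophobic core is necessary but not sufficient.

    Returns False when the necessary condition fails, otherwise None,
    meaning membership cannot be inferred and has to be asserted by a person.
    """
    if not x.get("has_hydrophobic_core", False):
        return False
    return None

def eukaryotic_cell(x: dict) -> bool:
    """Defined concept: being a cell with a nucleus is necessary and sufficient."""
    return bool(x.get("kind") == "cell" and x.get("has_nucleus", False))

print(globular_protein({"kind": "protein", "has_hydrophobic_core": True}))
print(eukaryotic_cell({"kind": "cell", "has_nucleus": True}))
```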
Ontology Subtypes Expressiveness
  • Controlled vocabulary
    • List of terms
  • Taxonomy
    • Terms in a generalization hierarchy
  • DB schemas (relational and object-oriented)
    • More implementation specific
    • No instance information
    • Limited constraints
  • Frame knowledge bases
  • Description Logics
Ontology Subtypes
  • Database schema
    • Concepts, relations, constraints
    • Perhaps no taxonomy
    • At most hundreds of concepts
  • Taxonomy
    • Concepts, taxonomy, perhaps a few relations
    • Thousands of concepts
  • Knowledge base
    • Concepts, relations, constraints, objects, values
    • Hundreds to hundreds of thousands of concepts and objects
Ontology Subtypes
  • Generic (a.k.a. upper, core or reference)
    • common high level concepts
    • “Physical”, “Abstract”, “Structure”, “Substance”
    • useful for ontology re-use
    • important when generating or analysing natural language expressions
  • Domain-oriented
    • domain specific (e.g. E.coli)
    • domain generalisations (e.g. gene function)
  • Task-oriented
    • task specific (e.g. annotation analysis)
    • task generalisations (e.g. problem solving)
Knowledge Representation
  • Ontologies are best delivered in some computable representation
  • Variety of choices with different:
    • Expressiveness
      • The range of constructs that can be used to formally, flexibly, explicitly and accurately describe the ontology
    • Ease of use
    • Computational complexity
      • Is the language computable in real time?
    • Rigour
      • Satisfiability and consistency of the representation
      • Systematic enforcement mechanisms
    • Unambiguous, clear and well defined semantics
      • A subclassOf B: don’t be fooled by syntax!
Languages
  • Vocabularies using natural language
    • Hand crafted, flexible but difficult to evolve, maintain and keep consistent, with poor semantics
    • Gene Ontology
  • Object-based KR: frames
    • Extensively used, good structuring, intuitive. Semantics defined by OKBC standard
    • EcoCyc (uses Ocelot) and RiboWeb (uses Ontolingua)
  • Logic-based: Description Logics
    • Very expressive, model is a set of theories, well defined semantics
    • Automatic derived classification taxonomies
    • Concepts can be defined or primitive
    • Expressivity vs. computational complexity balance
    • TAMBIS Ontology (uses FaCT)
Vocabularies: Gene Ontology
  • Hand crafted with simple tree-like structures
  • Position of each concept and its relationships wholly determined by a person
  • Flexible but…
  • Maintenance and consistency preservation difficult and arduous
  • Poor semantics
  • Single hierarchies are limiting
Description Logics
  • Describe knowledge in terms of concepts and relations
  • Concept defined in terms of other roles and concepts
    • Enzyme = protein which catalyses reaction
    • Reason that enzyme is a kind of protein
  • Model built up incrementally and descriptively
  • Uses logical reasoning to figure out:
    • Automatically derived (and evolved) classifications
    • Consistency -- concept satisfaction
Frames and Logics
  • Frames
    • Rich set of language constructs
    • Impose restrictive constraints on how they are combined or used to define a class
    • Only support primitive concepts
    • Taxonomy hand-crafted
  • Description logics
    • Limited set of language constructs
    • Primitives combined to create defined concepts
    • Taxonomy for defined concepts established through logical reasoning
    • Expressivity vs. computational complexity
    • Less intuitive
  • Ideal: both! Current OIL activity uses a mixture. Logics provide reasoning services for frame schemes.
Ontology Exchange
  • To reuse an ontology we need to share it with others in the community
  • Exchanging ontologies requires a language with:
    • common syntax
    • clear and explicit shared meaning
  • Tools for parsing, delivery, visualising etc
  • Exchanging the structure, semantics or conceptualisation?
Ontology Exchange Languages
[Diagram: OIL combines frames (modelling primitives, OKBC), description logics (formal semantics and reasoning support) and web languages (XML- and RDF-based syntax)]
  • XOL eXtensible Ontology Language
    • XML markup
    • Frame based
    • Rooted in OKBC
    • http://www.ai.sri.com/pkarp/xol/
  • OIL: Ontology Interface Layer / Ontology Inference Layer
    • Gives a semantics to RDF-Schema
    • http://www.ontoknowledge.org/oil
OIL: Ontology Metadata (Dublin Core)

ontology-container
  title "macromolecule fragment"
  creator "robert stevens"
  subject "macromolecule generic ontology"
  description "example for a tutorial"
  description.release "2.0"
  publisher "R Stevens"
  type "ontology"
  format "pseudo-xml"
  identifier "http://www.ontoknowledge.org/oil/oil.pdf"
  source "http://img.cs.man.ac.uk/stevens/tambis-oil.html"
  language "OIL"
  language "en-uk"
  relation.haspart "http://www.ontoRus.com/bio/mmole.onto"

The Three Roots of OIL
  • Description Logics: formal semantics and reasoning support
  • Frame-based systems: epistemological modelling primitives
  • Web languages: XML- and RDF-based syntax

OIL primitive ontology definitions

slot-def has-backbone
  inverse is-backbone-of
slot-def has-component
  inverse is-component-of
  properties transitive
class-def nucleic-acid
class-def rna
  subclass-of nucleic-acid
  slot-constraint has-backbone
    value-type ribophosphate
class-def ribophosphate
class-def deoxyribophosphate
  subclass-of NOT ribophosphate

OIL defined ontology definitions

class-def defined dna
  subclass-of nucleic-acid AND NOT rna
  slot-constraint has-backbone
    value-type deoxyribophosphate
class-def defined enzyme
  subclass-of protein
  slot-constraint catalyse
    has-value reaction
class-def defined kinase
  subclass-of protein
  slot-constraint catalyse
    has-value phosphorylation-reaction
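Because enzyme and kinase are defined classes, a reasoner can place kinase under enzyme without anyone asserting the link. The sketch below illustrates the idea with a simple structural comparison. It is not how FaCT works (FaCT is a tableau-based reasoner), and it assumes that phosphorylation-reaction is a kind of reaction, which the slide does not state.

```python
# Asserted (primitive) taxonomy: child -> parent.
# ASSUMPTION: phosphorylation-reaction is a kind of reaction.
PRIMITIVE_PARENT = {"phosphorylation-reaction": "reaction"}

# Defined classes: name -> (superclass, {slot: required filler class}).
DEFINED = {
    "enzyme": ("protein", {"catalyse": "reaction"}),
    "kinase": ("protein", {"catalyse": "phosphorylation-reaction"}),
}

def primitive_subsumes(general, specific):
    """True if `specific` equals `general` or reaches it via asserted parent links."""
    while specific is not None:
        if specific == general:
            return True
        specific = PRIMITIVE_PARENT.get(specific)
    return False

def subsumes(general, specific):
    """Structural test: every condition of `general` is implied by `specific`."""
    g_super, g_slots = DEFINED[general]
    s_super, s_slots = DEFINED[specific]
    if not primitive_subsumes(g_super, s_super):
        return False
    return all(
        slot in s_slots and primitive_subsumes(filler, s_slots[slot])
        for slot, filler in g_slots.items()
    )

print(subsumes("enzyme", "kinase"))  # kinase is classified under enzyme
print(subsumes("kinase", "enzyme"))  # the converse does not hold
```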

OIL in XML
  • OIL has a DTD, an XML Schema and a mapping to RDF-Schema. See web site for details

<slot-def>
  <slot name="has-component"/>
  <inverse> <slot name="is-component-of"/> </inverse>
  <properties> <transitive/> </properties>
</slot-def>
<class-def> <class name="nucleic-acid"/> </class-def>
<class-def>
  <class name="rna"/>
  <subclass-of> <class name="nucleic-acid"/> </subclass-of>
  <slot-constraint>
    <slot name="has-backbone"/>
    <value-type> <class name="ribophosphate"/> </value-type>
  </slot-constraint>
</class-def>
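One benefit of an XML syntax is that generic parsers can read the ontology. The sketch below uses Python's standard `xml.etree.ElementTree` to pull out the asserted superclasses. The markup is an assumed well-formed rendering of the slide's pseudo-XML, with `name` attributes and a hypothetical `<ontology>` root element; the actual OIL DTD should be consulted before relying on these element names.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: well-formed variant of the slide's pseudo-XML; the <ontology>
# wrapper and the name="..." attributes are illustrative, not taken from the OIL DTD.
OIL_XML = """
<ontology>
  <class-def><class name="nucleic-acid"/></class-def>
  <class-def>
    <class name="rna"/>
    <subclass-of><class name="nucleic-acid"/></subclass-of>
    <slot-constraint>
      <slot name="has-backbone"/>
      <value-type><class name="ribophosphate"/></value-type>
    </slot-constraint>
  </class-def>
</ontology>
"""

root = ET.fromstring(OIL_XML)

# Map each defined class to the list of its asserted superclasses.
parents = {}
for class_def in root.findall("class-def"):
    name = class_def.find("class").get("name")
    parents[name] = [c.get("name") for c in class_def.findall("subclass-of/class")]

print(parents)
```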

OIL Remarks
  • Tools:
    • Protégé II editor
    • FaCT reasoner
  • Other projects:
    • Semantic Web projects (http://www.semanticweb.org)
    • Agents for the web projects (e.g. DAML)

A knowledge representation language and inference mechanism for the web

OIL Features
  • Based on standard frame languages
  • Extends expressive power with DL style logical constructs
    • Still has frame look and feel
    • Can still function as a basic frame language
  • OIL core language restricted in some respects so as to allow for reasoning support
    • No constructs with ill defined semantics
    • No constructs that compromise decidability
  • Has both XML and RDF(S) based syntax
OIL Features
  • Semantics clearly defined by mapping to very expressive Description Logic, e.g.:
    • slot-constraint reverse-transcribe-from has-value mRNA or (part-of has-value mRNA)
    • ∃eats.meat ⊓ ∃eats.fish
  • Note the importance of clear semantics:
    • ∃eats.(meat ⊓ fish) is inconsistent (assuming meat and fish are disjoint)
  • Mapping can also be used to provide reasoning support from a Description Logic system (e.g., FaCT)
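The two "eats" expressions on this slide differ only in where the conjunction sits. Reading them as ∃eats.meat ⊓ ∃eats.fish and ∃eats.(meat ⊓ fish) (the existential quantifier is an assumption here, since the symbols did not survive in the transcript), the difference can be seen by evaluating both over a tiny hand-built interpretation. All individuals below are invented.

```python
# A tiny interpretation: the extension of each class and of the `eats` relation.
meat = {"steak"}
fish = {"cod"}  # disjoint from meat
eats = {("cat", "steak"), ("cat", "cod"), ("seal", "cod")}

def exists(relation, fillers):
    """Individuals with at least one successor in `fillers` (the ∃R.C constructor)."""
    return {x for (x, y) in relation if y in fillers}

# ∃eats.meat ⊓ ∃eats.fish: eats some meat and also eats some fish.
both_separately = exists(eats, meat) & exists(eats, fish)

# ∃eats.(meat ⊓ fish): eats one thing that is both meat and fish.
one_filler_both = exists(eats, meat & fish)

print(both_separately)
print(one_filler_both)
```

Whenever meat and fish are disjoint, `meat & fish` is empty in every interpretation, so the second expression can never have an instance. That is the sense in which it is inconsistent.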
Why Reasoning Support?
  • Key feature of OIL core language is availability of reasoning support
  • Reasoning intended as design support tool
    • Check logical consistency of classes
    • Compute implicit class hierarchy
  • May be less important in small local ontologies
    • Can still be useful tool for design and maintenance
    • More important with larger ontologies/multiple authors
  • Valuable tool for integrating and sharing ontologies
    • Use definitions/axioms to establish inter-ontology relationships
    • Check for consistency and (unexpected) implied relationships
    • Already shown to be useful technique for DB schema integration
DAML+OIL
  • OIL merged with DAML
  • Originally retained frame syntax
  • DAML more concerned with deployment than with building and managing
  • OIL mapped to DAML+OIL, but not reliably reversed
  • FRAME look and feel may return
  • Web ontology language
Building Ontologies
  • No field of Ontological Engineering equivalent to Knowledge or Software Engineering;
  • No standard methodologies for building ontologies;
  • Such a methodology would include:
    • a set of stages that occur when building ontologies;
    • guidelines and principles to assist in the different stages;
    • an ontology life-cycle which indicates the relationships among stages.
  • Gruber's guidelines for constructing ontologies are well known.
The Development Lifecycle
  • Two kinds of complementary methodologies emerged:
    • Stage-based, e.g. TOVE [Uschold96]
    • Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94].
  • Most have TWO stages:
    • Informal stage
      • ontology is sketched out using either natural language descriptions or some diagram technique
    • Formal stage
      • ontology is encoded in a formal knowledge representation language, that is machine computable
  • An ontology should ideally be communicated to people and unambiguously interpreted by software
    • the informal representation helps the former
    • the formal representation helps the latter.
A Provisional Methodology
  • A skeletal methodology and life-cycle for building ontologies;
  • Inspired by the software engineering V-process model;
  • The overall process moves through a life-cycle.

The left side charts the processes in building an ontology

The right side charts the guidelines, principles and evaluation used to ‘quality assure’ the ontology

The V-model Methodology
  • User Model
    • Identify purpose and scope
    • Knowledge acquisition
    • Evaluation: coverage, verification, granularity
  • Conceptualisation Model
    • Conceptualisation
    • Integrating existing ontologies
    • Conceptualisation principles: commitment, conciseness, clarity, extensibility, coherency
  • Implementation Model
    • Encoding
    • Representation
    • Encoding/Representation principles: encoding bias, consistency, house styles and standards, reasoning system exploitation
  • Ontology in Use

The ontology building life-cycle
  • Identify purpose and scope
  • Knowledge acquisition
  • Building
  • Language and representation
  • Conceptualisation
  • Integrating existing ontologies
  • Available development tools
  • Encoding
  • Evaluation

User Model: Identify purpose and scope
  • Decide what applications the ontology will support
  • EcoCyc: Pathway engineering, qualitative simulation of metabolism, computer-aided instruction, reference source
  • TAMBIS: retrieval across a broad range of bioinformatics resources
  • The use to which an ontology is put affects its content and style
  • Impacts re-usability of the ontology
User Model: Knowledge Acquisition
  • Sources: specialist biologists, standard textbooks, research papers, other ontologies and database schemas.
  • Motivating scenarios and informal competency questions – informal questions the ontology must be able to answer
  • Evaluation:
    • Fitness for purpose
    • Coverage and competency
Ontology Scenario
  • A molecule ontology;
  • Describes the molecules stored in bioinformatics databases and annotated therein;
  • It should cover the molecules and other chemicals described in the resources;
  • The ontology will be used for querying and annotating information in bioinformatics resources.
Competency Questions
  • Cover the macromolecules found in molecular biology resources and courses;
  • Should accommodate various views on the macromolecules;
  • Should cover the queries people want to ask of macromolecules;
  • In reality, need more detail on these questions, e.g. “give me tRNA genes with anticodon x, from aardvark”.
Acquiring Knowledge
  • Find your knowledge!
  • An important source is your head, but…
  • Use text books, glossaries (many of which lie on the web) and domain experts;
  • Use other ontologies – what did they include and how did they do it?
  • Record your sources of knowledge.
  • Use your competency questions;
Starting Concept List
  • Chemicals – atom, ion, molecule, compound, element;
  • Molecular-compound, ionic-compound, ionic-molecular-compound, …;
  • Ionic-macromolecular-compound and ionic-small-macromolecular-compound;
  • Protein, peptide, polyprotein, enzyme, holo-protein, apo-protein,…
  • Nucleic acid – DNA, RNA, tRNA, mRNA, snRNA, …
Conceptualisation Model: Conceptualisation
  • Identify the key concepts, their properties and the relationships that hold between them;
    • Which ones are essential?
    • What information will be required by the applications?
  • Structure domain knowledge into explicit conceptual models.
  • Identify natural language terms to refer to such concepts, relations and attributes;
Conceptualisation Sketch
[Diagram: concept hierarchy rooted at Chemical, with nodes Molecule, Compound, Element, Ion, Atom, Molecular Compound, Ionic Compound, Molecular Element, Ionic Molecule, Ionic Molecular Compound, Non-Metal, Metal and Metalloid]

Molecule Conceptualisation Sketch
[Diagram: hierarchy over Ionic Macromolecular Compound, Macromolecule, Small Molecule, Polysaccharide, Protein, Nucleic Acid, Peptide, Starch, Glycogen, Enzyme, DNA, RNA, snRNA, mRNA, tRNA and rRNA]

Conceptualisation Model: Naming
  • Determine naming conventions
    • Consistent naming for classes and slots
    • EcoCyc:
      • Classes are capitalized, hyphenated, plural
      • Slot names are uppercase

A quality ontology captures relevant biological distinctions with high fidelity

Conceptualisation Model: Pitfalls
  • Pitfall: Missing ontological elements
    • Missing classes: Swiss-Prot Protein complexes
    • Lack of Lipid and Cofactor in example ontology
    • Missing attributes: Genetic code identifier
    • Confuse 1:1 with 1:Many, or 1:Many with Many:Many
      • Cofactor as an attribute of reaction as well as protein
    • Important data is stored within text/comment fields
  • Pitfall: Extra ontological elements
  • Pitfall: Stop over-elaborating – when do I stop?
  • Pitfall: Relevance – do I really need all this detail?
Conceptualisation: Partonomy
  • Part-of relationships very important
  • Several kinds of part-of: component-of, region-of, mixture-of
  • Alpha-helix is a region of a protein, but a protein is a component of a complex
  • Care in placing transitivity
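The warning about transitivity can be illustrated by keeping each kind of part-of separate and chaining links only within a kind that is declared transitive. This is a sketch under stated assumptions: treating only component-of as transitive is a choice made for this example, and the complex/organelle link is invented.

```python
# Part-of assertions tagged with their kind (illustrative data).
PART_OF = [
    ("alpha-helix", "region-of", "protein"),
    ("protein", "component-of", "complex"),
    ("complex", "component-of", "organelle"),
]

# ASSUMPTION: only component-of is treated as transitive in this example.
TRANSITIVE = {"component-of"}

def closure(kind):
    """All (part, whole) pairs of one kind, chaining links only if the kind is transitive."""
    pairs = {(p, w) for (p, k, w) in PART_OF if k == kind}
    if kind not in TRANSITIVE:
        return pairs
    changed = True
    while changed:
        new = {(a, d) for (a, b) in pairs for (c, d) in pairs if b == c}
        changed = not new <= pairs
        pairs |= new
    return pairs

print(closure("component-of"))  # includes the inferred (protein, organelle)
print(closure("region-of"))     # stays as asserted; no chaining into component-of
```

Collapsing all three kinds into one transitive part-of relation would instead infer that an alpha-helix is a part of the organelle in the same sense that a protein is, which is the kind of unintended conclusion the slide warns about.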
Integrating Existing Ontologies
  • Reuse or adapt existing ontologies when possible
    • Save time
    • Correctness
    • Facilitate interoperation
    • Reuse GO to give example ontology Function, Process and Location
  • Integration of ontologies
    • Ontologies have to be aligned
    • Hindered by poor documentation and argumentation
    • Hindered by implicit assumptions
    • Shared generic upper level ontologies should make integration easier
Encoding: Implementation Toolkit
  • Construct ontology using an ontology-development system
    • Does the data model have the right expressivity?
      • Is it just a taxonomy or are relationships needed?
      • Is multiple parentage needed? Inverse relationships?
      • What types of constraints are needed?
    • Are reasoning services needed?
    • What are authoring features of the development tool?
    • Can ontology be exported to a DBMS schema?
    • Can ontology be exported to an ontology exchange language?
    • Is simultaneous updating by multiple authors needed?
    • Size limitations of development tool?
Encoding

Encode sketch in KRL;

  • Use OIL – a frame syntax with reasoning support if we want it;
  • Wide range of expressivity (see cofactor example later);
  • Hand craft a hierarchy – implement the sketch made earlier;
  • This hand-crafted version can be migrated to a more descriptive form later.
Initial Encoding

class-def chemical
  subclass-of substance
class-def molecule
  subclass-of chemical
class-def compound
  subclass-of chemical
class-def molecular-compound
  subclass-of molecule and compound

Encoding: Ontology Implementation Pitfalls
  • Pitfall: Semantic ambiguity
    • Multiple ways to encode the same knowledge
    • Meaning of class definitions unclear
  • Pitfall: Encoding Bias
    • Encoding the ontology changes the ontology
Encoding: Ontology Implementation Pitfalls
  • Pitfall: Redundancy (lack of normalization)
    • Exact same information repeated
    • Presence of computationally derivable information
      • Date of birth and age
      • Sequence length
      • DNA sequence and reverse complement
    • More effort required for entry and update
    • In KB partial updates lead to inconsistency
    • OK if redundant information is maintained automatically
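One way to avoid this kind of redundancy is to store only the primary datum and compute the rest on demand. A minimal sketch, using a hypothetical `DnaRecord` class and a made-up accession:

```python
from dataclasses import dataclass

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

@dataclass(frozen=True)
class DnaRecord:
    accession: str   # hypothetical identifier
    sequence: str    # the only sequence data that is stored

    @property
    def length(self) -> int:
        # Derived on demand, so it cannot drift out of step with the sequence.
        return len(self.sequence)

    @property
    def reverse_complement(self) -> str:
        # Complement every base, then reverse the strand.
        return self.sequence.translate(_COMPLEMENT)[::-1]

rec = DnaRecord("X00001", "ATGCCG")
print(rec.length, rec.reverse_complement)
```

Since length and reverse complement are never stored, a partial update cannot leave them inconsistent with the sequence.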
Encoding: The Interaction Problem
  • Task influences what knowledge is represented and how it is represented
    • Molecular biology: chemical and physical properties of proteins
    • Bioinformatics: accession number, function, gene
    • Underlying perspectives mean they may not be reconcilable
  • If an ontology has too many conflicting tasks it can end up compromised – TaO experience
Evaluate it - A guide for reusability
  • Conciseness
    • No redundancy
    • Appropriateness – protein molecules at the atomic resolution when amino acid level would do
  • Clarity
  • Consistency
    • Satisfiability: it doesn’t contradict itself
    • Molecule and Compound disjoint, but molecular-compound is (molecule and compound)
  • Commitment
    • Do I have to buy into a load of stuff I don’t really need or want just to get the bit I do?
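The molecule/compound example is the kind of contradiction a reasoner reports automatically. The sketch below is a much simpler stand-in that only looks for classes inheriting from two classes declared disjoint; the data structures are invented for illustration and are not OIL or FaCT syntax.

```python
# Asserted superclasses (read conjunctively) and disjointness axioms.
SUPERCLASSES = {
    "molecule": {"chemical"},
    "compound": {"chemical"},
    "molecular-compound": {"molecule", "compound"},
}
DISJOINT = [{"molecule", "compound"}]

def ancestors(cls):
    """All classes reachable via superclass links, including cls itself."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(SUPERCLASSES.get(c, ()))
    return seen

def unsatisfiable_classes():
    """Classes that inherit from both members of some disjoint pair."""
    return sorted(
        c for c in SUPERCLASSES
        if any(pair <= ancestors(c) for pair in DISJOINT)
    )

print(unsatisfiable_classes())
```

Dropping the disjointness axiom, or redefining one of the parent classes, would make molecular-compound satisfiable again; which repair is right is a modelling decision.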
Documentation: Make Ontology Understandable!
  • Produce clear informal and formal documentation
    • An ontology that cannot be understood will not be reused
    • Genbank feature table
    • NCBI ASN.1 definitions
  • There exists a space of alternative ontology design decisions
    • Semantics / Granularity
    • Terminology
  • Pitfall: Neglecting to record design rationale
Molecules Revisited
[Diagram: hierarchy over Non-Ionic Macromolecular Compound, Ionic Macromolecular Compound, Macromolecule, Small Molecule, Polysaccharide, Protein, Nucleic Acid, Peptide, Starch, Glycogen, Enzyme, DNA, RNA, snRNA, mRNA, tRNA and rRNA]

More Encoding

class-def chemical
  subclass-of substance
class-def defined molecule
  subclass-of chemical
  slot-constraint contains-bond min-cardinality 1 has-value covalent-bond
class-def defined compound
  subclass-of chemical
  slot-constraint has-atom-types greater-than 1
class-def defined molecular-compound
  subclass-of molecule and compound

Cofactor Knowledge
  • Gather knowledge about cofactors, coenzymes and prosthetic groups from glossaries and dictionaries etc.
  • Note that definitions are inconsistent and even contradictory.
  • Synthesise knowledge and make judgements.
Encoding Cofactor

class-def defined cofactor
  subclass-of metal-ion or small-organic-molecule
  slot-constraint binds-to has-value protein
class-def defined coenzyme
  subclass-of cofactor
  slot-constraint binds-loosely-to has-value protein
class-def defined prosthetic-group
  subclass-of cofactor and (not metal-ion)
  slot-constraint binds-strongly-to has-value protein

Cofactor Discussion
  • Classifies as a kind of chemical;
  • Taken from IUPAC definition – document – not a child of organic-molecule and metal-ion;
  • Can express both disjunction and negation in OIL;
  • Uses a slot hierarchy in describing binds-to.
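The cofactor definitions combine disjunction, negation and a slot hierarchy. The sketch below shows how these interact when recognising individuals. It assumes that binds-loosely-to and binds-strongly-to are sub-slots of binds-to, and it treats a missing type as negation (a closed-world shortcut that a Description Logic reasoner would not take). The two example individuals are invented.

```python
# ASSUMPTION: the two specific slots are sub-slots of binds-to.
SUPER_SLOT = {"binds-loosely-to": "binds-to", "binds-strongly-to": "binds-to"}

def has_slot(individual, slot, filler_type):
    """True if the individual has `slot`, or one of its sub-slots, filled by filler_type."""
    for used, filler in individual["slots"]:
        s = used
        while s is not None:
            if s == slot and filler == filler_type:
                return True
            s = SUPER_SLOT.get(s)
    return False

def is_cofactor(x):
    # Disjunction: metal-ion or small-organic-molecule, and binds to a protein.
    return bool(x["types"] & {"metal-ion", "small-organic-molecule"}) \
        and has_slot(x, "binds-to", "protein")

def is_coenzyme(x):
    return is_cofactor(x) and has_slot(x, "binds-loosely-to", "protein")

def is_prosthetic_group(x):
    # Negation: a cofactor that is not a metal ion.
    return is_cofactor(x) and "metal-ion" not in x["types"] \
        and has_slot(x, "binds-strongly-to", "protein")

organic = {"types": {"small-organic-molecule"}, "slots": [("binds-loosely-to", "protein")]}
ion = {"types": {"metal-ion"}, "slots": [("binds-strongly-to", "protein")]}

print(is_cofactor(organic), is_coenzyme(organic))
print(is_cofactor(ion), is_prosthetic_group(ion))
```

The loosely binding organic molecule is recognised as a cofactor only because the slot hierarchy lets binds-loosely-to count as binds-to.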
More Discussion
  • Can we define sufficiency conditions for peptide?
  • Mass and length are not easy to use in a definition, e.g. “a protein is > 100 kDa”;
  • What about a 99 kDa protein?
Publish the Ontology
  • Formal and informal specifications
  • Intended domain of application
  • Design rationale
  • Limitations
  • See EcoCyc paper in ISMB-93/Bioinformatics 00
  • See TAMBIS paper in Bioinformatics 99
Ontological Pitfalls
  • Stop-over – when do I stop over elaborating?
    • Proteins → amino acid residues → side chains → physical chemical properties …
  • Relevance
    • Do we need to mention all the types of nucleic acid?
Ontology Development Tools
  • Development environments
  • Ontology Libraries
  • Ontology publishing and exchange
      • Across all representational forms (logic, frame, etc..)
      • Web compliant
  • Ontology delivery
      • Ontology servers
Development Environments
  • Considerations depend on ontology subtype!
    • Expressiveness of data model
    • Authoring features
    • DBMS export capabilities
    • Ontology-exchange language export capabilities
    • Distributed authoring
    • Size limitations
  • WebOnto
  • Ontosaurus
  • GKB Editor
  • Protégé II
  • Ontolingua
  • GRAIL toolkit etc…
  • Wondertools
GKB Editor: Ontology Development Toolkit
  • Graphical editor for KBs and ontologies
  • Ontologies stored in Ocelot object-oriented knowledge base
    • Expressive, scalable, distributed
    • EcoCyc ontology contains 1K classes, 15K instances
  • Knowledge is graphically portrayed in 3 viewers
  • All operations are schema driven
  • See http://www.ai.sri.com/~gkb/user-man.html
Ocelot Capabilities
  • Frame data model
  • KBs and ontologies stored in files or Oracle
  • Oracle KBs and ontologies:
    • Better scalability -- frame faulting on demand and in background
    • Concurrency control system coordinates changes by multiple users
    • Transaction logging (recall operation history)
  • GFP API provides programmatic interface
Distributed Ontology Development
[Diagram: Users 1 to 4 connected over the Internet to a shared Oracle server]

GKB Editor
  • Taxonomy Viewer
    • Create/delete classes and instances
    • Browse class taxonomy
    • Alter class/subclass links
  • Frame editor
    • Add/remove slots to/from classes
    • Create/delete/edit slot values for instances
  • Frame relationships viewer
    • View and update a network of relationships among instances
Summary
  • A definition of ontology as a characterisation of conceptualisation -- capturing the things we know about a domain;
  • The knowledge within an ontology can be applied to a variety of tasks;
  • Building an ontology -- process and life-cycle;
  • Influences on the choice of encoding language;
  • The desirability of tools for the building, management and exchange of ontologies;
Final remarks
  • The use of ontologies is growing within the bio-molecular world
  • They are a high-cost, but high-benefit solution to a variety of problems confronting the bioinformatics community.
Some References (1)

Review

  • Stevens R., Goble C.A. and Bechhofer, S. Ontology-based Knowledge Representation for Bioinformatics. Accepted for Briefings in Bioinformatics.

Bio-ontologies & Systems

  • Karp P. D. An ontology for biological function based on molecular interactions. Bioinformatics 2000; 16: 269-285.
  • Ashburner et al. Gene Ontology: Tool for the Unification of Biology. Nature Genetics, Vol 25, pages 25-29.
  • R. Altman, M. Bada, X.J. Chai, M. Whirl Carillo, R.O. Chen, and N.F. Abernethy. RiboWeb: An Ontology-Based System for Collaborative Molecular Biology. IEEE Intelligent Systems, 14(5):68-76, 1999.
  • P.G. Baker, C.A. Goble, S. Bechhofer, N.W. Paton, R. Stevens, and A. Brass. An Ontology for Bioinformatics Applications. Bioinformatics, 15(6):510-520, 1999.
  • R.O. Chen, R. Felciano, and R.B. Altman. RiboWeb: Linking Structural Computations to a Knowledge Base of Published Experimental Data. In Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology, pages 84-87. AAAI Press, 1997.
  • Guarino, N. 1992. Concepts, Attributes and Arbitrary Relations: Some Linguistic and Ontological Criteria for Structuring Knowledge Bases. Data & Knowledge Engineering, 8: 249-261.
  • Guarino, N., Carrara, M., and Giaretta, P. 1994a. An Ontology of Meta-Level Categories. In J. Doyle, E. Sandewall and P. Torasso (eds.), Principles of Knowledge Representation and Reasoning: Proceedings of the Fourth International Conference (KR94). Morgan Kaufmann, San Mateo, CA: 270-280.
  • P. Karp and S. Paley. Integrated Access to Metabolic and Genomic Data. Journal of Computational Biology, 3(1):191-212, 1996.
  • P. Karp, M. Riley, S. Paley, A. Pellegrini-Toole, and M. Krummenacker. EcoCyc: Electronic Encyclopedia of E. coli Genes and Metabolism. Nucleic Acids Research, 27(1):55-58, 1999.
  • S. Schulze-Kremer. Ontologies for Molecular Biology. In Proceedings of the Third Pacific Symposium on Biocomputing, pages 693-704. AAAI Press, 1998.
  • P.G. Baker, A. Brass, S. Bechhofer, C. Goble, N. Paton, and R. Stevens. TAMBIS: Transparent Access to Multiple Bioinformatics Information Sources. An Overview. In Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology, pages 25-34. AAAI Press, June 28-July 1, 1998.
Some References (2)

Ontology development and exchange

  • T.R. Gruber. Towards Principles for the Design of Ontologies Used for Knowledge Sharing. In Roberto Poli and Nicola Guarino, editors, International Workshop on Formal Ontology, Padova, Italy, 1993. Available as technical report KSL-93-04, Knowledge Systems Laboratory, Stanford University: ftp.ksl.stanford.edu/pub/KSL_Reports/KSL-93-04.ps.
More References (3)
  • I. Horrocks, D. Fensel, J. Broekstra, M. Crubezy, S. Decker, M. Erdmann, W. Grosso, C. Goble, F. Van Harmelen, M. Klein, M. Musen, S. Staab, and R. Studer. The Ontology Interchange Language OIL: The Grease Between Ontologies. http://www.cs.vu.nl/~dieter/oil.
  • R. Jasper and M. Uschold A Framework for Understanding and Classifying Ontology Applications. In Twelfth Workshop on Knowledge Acquisition Modeling and Management KAW'99, 1999.
  • M. Uschold and M. Gruninger. Ontologies: Principles, Methods and Applications. Knowledge Engineering Review, 11(2), June
  • Guarino, N. and Welty, C. Identity, Unity, and Individuality: Towards a Formal Toolkit for Ontological Analysis. In H. Werner (Ed), Proceedings of ECAI-2000: The European Conference on Artificial Intelligence, IOS Press, Amsterdam, August 2000, 219-223.