Principles for Building Biomedical Ontologies

This resource provides an overview of the principles and challenges involved in constructing biomedical ontologies, with case studies from the National Center for Biomedical Ontology. It covers topics such as ontology definition, organizational challenges, and the role of ontologies in decision making.

Presentation Transcript


  1. Principles for Building Biomedical Ontologies • Suzanna Lewis, National Center for Biomedical Ontology • 22 October 2005 • Advanced Bioinformatics, Cold Spring Harbor

  2. National Center for Biomedical Ontology, http://bioontology.org/ • Mark Musen, PI & Core 1: computer science (SMI) • Suzanna Lewis, Co-PI & Core 2: bioinformatics (BiKR; GO) • Barry Smith, Core 6: Outreach and training (ECOR) • Sima Misra, Associate Program Director • Daniel Rubin, Program Director • Michael Ashburner, Core 3: Phenotype Project (Cambridge; FlyBase; and GO) • Monte Westerfield, Core 3: Phenotype Project (UOregon; PI of ZFIN) • Ida Sim, Core 3: HIV clinical trials Project (UCSF)

  3. BiKRs • Sima Misra • Shu Shengqiang • Christopher J. Mungall • Nomi Harris • John Day-Richter • Karen Eilbeck • Mark Gibson

  4. Outline for the Morning • A definition of “ontology” • Four sessions: • Organizational Challenges • Principles for Ontology Construction • Case Studies from the GO • Case Studies for group discussion.

  5. My newbie questions, and what I’ve heard • What data is missing? Organism, environment, data quality and attribution • Where is the data generated? TIGR, Sanger, JGI, and coming soon to a 954 near you! • How will it be gathered? Still an issue. Low threshold of effort relative to benefits of complying • What is the motivation? Data is accumulating on disks across the world and we’d like to be able to locate and use it • The hardest part: Sharing (semantics)

  6. Ontologies help with decision making Where should I eat…? handy ontology tells us what’s there…

  7. [Image labels: type of cuisine; (presumable) country of origin] Ontologies don’t just organize data; they also facilitate inference, and that creates new knowledge, often unconsciously in the user.

  8. What a computer would likely infer about the world from this helpful ontology: • Fresh Juice is a national cuisine… • Flag of fresh juice • Where delicatessen food hails from… • ‘Frozen Yogurt’ cuisine in search of a national identity?

  9. Ontology is all about meaning • Communities form (scientific) theories • that seek to explain all of the existing evidence • and can be used for prediction • We make inferences and decisions based upon what we know about (biological) reality.

  10. Make our meanings clear enough for a computer to understand • An ontology is a computable representation of this underlying (biological) reality. • An ontology enables a computer to reason over the data in (some of) the ways that we do • particularly to query and locate relevant data. • A shared, common, backbone taxonomy of relevant entities, and the relationships between them, within an application domain. • Referred to by information scientists as an ‘Ontology’.

  11. But really… • What is an Ontology? • From Aristotle to Artificial Intelligence • It is “a formalism of what exists” • Follows formal rules for creating definitions originally laid down by Aristotle. • A definition is: the specification of the essence (nature, invariant structure) shared by all the members of a class or natural kind.

  12. The Aristotelian Methodology • Topmost nodes are the undefinable primitives. • The definition of a class lower down in the hierarchy is provided by specifying the parent of the class together with the relevant differentia. • Differentia tell us what marks out instances of the defined class within the wider parent class, as in: • Plasma membrane • is a cell part [immediate parent] • that surrounds the cytoplasm [differentia]
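
The parent-plus-differentia pattern on this slide can be sketched in a few lines of Python. This is only an illustration: the `OntologyClass` type and its `definition` method are hypothetical names, and "cell part" is treated as a primitive purely to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyClass:
    """A class defined by its immediate parent plus a differentia."""
    name: str
    parent: Optional["OntologyClass"] = None  # None marks an undefinable primitive
    differentia: Optional[str] = None

    def definition(self) -> str:
        """Compose the text definition from the parent and the differentia."""
        if self.parent is None:
            return f"{self.name}: primitive (undefined)"
        return f"{self.name}: a {self.parent.name} that {self.differentia}"


# "cell part" is a primitive here only for the sake of the sketch.
cell_part = OntologyClass("cell part")
plasma_membrane = OntologyClass(
    "plasma membrane", parent=cell_part, differentia="surrounds the cytoplasm"
)

print(plasma_membrane.definition())
```

Storing the parent and the differentia separately, rather than one free-text string, is what lets a definition be checked or regenerated mechanically.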

  13. [Diagram: class hierarchy with nodes physical object (substance), organism, animal, mammal, cat, Siamese and frog; labels distinguish classes, leaf classes and instances] All members of the class frog share a froggy nature

  14. Anatomical structures Lung Heart Thorax Cell Cornelius Rosse

  15. Content of FMA Challenge: Duplicate graphical model in symbolic model Universals or classes: Kinds of anatomical entities Adapted from Bloom & Fawcett: Textbook of Histology 1994 12th ed Chapman & Hall

  16. Content of FMA

  17. 1. Organizational Challenges http://obo.sourceforge.net

  18. So you want an ontology… What do you have to do to make/get/use/steal/beg one?

  19. [Flowchart: Why → Survey → decision points (Domain covered? Public? Community? Active? Applied?) with yes/no branches leading to Salvage, Develop, or Improve, then Collaborate & Learn]

  20. What you must do • Justify exactly why there is a need • Scope it very, very tightly • Communicate with people

  21. The decisions you must make • What domain does it cover? • Is it privately held? • Is it active? • Is it applied?

  22. [Flowchart: Why → Survey → decision points (Domain covered? Public? Community? Active? Applied?) with yes/no branches leading to Salvage, Develop, or Improve, then Collaborate & Learn] (Listen to Barry)

  23. Due diligence & background research • Step 1: Learn what is out there • The most comprehensive list is on the OBO site. http://obo.sourceforge.net • Assess ontologies critically and realistically. • Make contact

  24. [Flowchart: Why → Survey → decision points (Domain covered? Public? Community? Active? Applied?) with yes/no branches leading to Salvage, Develop, or Improve, then Collaborate & Learn] (Listen to Barry)

  25. Ontologies must be shared • Proprietary ontologies • Belief that ownership of the terminology gives the owners a competitive edge • For example, Incyte or Monsanto in the past, SNOMED for non-US. • Data cannot be shared if the ontologies describing the data are not shared. • Don’t reinvent—Use the power of combination and collaboration

  26. [Flowchart: Why → Survey → decision points (Domain covered? Public? Community? Active? Applied?) with yes/no branches leading to Salvage, Develop, or Improve, then Collaborate & Learn] (Listen to Barry)

  27. Pragmatic assessment of an ontology • Is there access to help, e.g.: help-me@weird.ontology.net ? • Does a warm body answer help mail within a ‘reasonable’ time—say 2 working days ?

  28. [Flowchart: Why → Survey → decision points (Domain covered? Public? Community? Active? Applied?) with yes/no branches leading to Salvage, Develop, or Improve, then Collaborate & Learn] (Listen to Barry)

  29. Use it to improve it • Every ontology improves when it is applied to actual data • It improves even more when these data are used to answer questions • There will be fewer problems in the ontology and more commitment to fixing remaining problems when important research data is involved that scientists depend upon • Be very wary of ontologies that have never been applied

  30. Improve Collaborate and Learn Work with that community • To improve (if you found one) • To develop (if you did not) • Getting it right • It is impossible to get it right the 1st (or 2nd, or 3rd, …) time. • What we know about reality is continually growing

  31. Implication: “prepare for change” • Establish a mechanism for change. • Use CVS or Subversion. • Changes must be reviewed by experts • Unique Identifiers • Versions • Archives

  32. Ontology development is hard • Have a stake in seeing it work. • Have broad, detailed domain knowledge. • Will engage in vigorous debate without engaging egos. • Will do concrete work and attend frequent working sessions (quarterly), phone conferences (weekly), e-mail correspondence (daily).

  33. 2. Principles for Ontology Construction

  34. Why do we need rules for good ontology? • Ontologies must be intelligible • to humans (for annotation) and • to machines (for reasoning and error-checking) • Unintuitive rules for classification lead to entry errors (problematic links) • Facilitate training of curators • Overcome obstacles to alignment with other ontology and terminology systems • Enhance harvesting of content through automatic reasoning systems • Following basic rules makes more useful ontologies

  35. Aristotle’s categories: Substance, Quantity, Quality, Relation, Location, Time, Position, Possession, Doing, Undergoing. This is Aristotle’s list of types of predication, that is, the different ways in which things can be said to be. He identifies 10 mutually exclusive categories.

  36. SNOMED-CT Top Level • Substance • Body Structure • Specimen • Context-Dependent Categories* • Attribute • Finding* • Staging and Scales • Organism • Physical Object • Events • Environments and Geographic Locations • Qualifier Value • Special Concept* • Pharmaceutical and Biological Products • Social Context • Disease • Procedure • Physical Force

  37. Examples of Rules • Don’t confuse instances with universals • Your navel (instance) is not the abstract representation of all navels • Your microarray result is not the abstract representation of all microarray results • The meaning of an ontology should not change when the programming language changes

  38. First Rule: Univocity • Terms (including those describing relations) should have the same meanings on every occasion of use. • In other words, they should refer to the same kinds of instances in reality

  39. Example of univocity problem in case of part_of relation (Old) Gene Ontology: • ‘part_of’ = ‘may be part of’ • flagellum part_of cell • ‘part_of’ = ‘is at times part of’ • replication fork part_of the nucleoplasm • ‘part_of’ = ‘is included as a sub-list in’

  40. Second Rule: Positivity • Complements of classes are not themselves classes. • Terms such as ‘non-mammal’, or ‘non-frog’, or ‘non-membrane’ do not designate genuine classes.

  41. Third Rule: Objectivity • Which classes exist is not a function of our biological knowledge. • Terms such as ‘unknown’ or ‘unclassified’ do not designate biological natural kinds.
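
The positivity and objectivity rules lend themselves to a simple name-based check. The sketch below is a heuristic only, not an established tool: the regular expressions and the `lint_term` function are assumptions made for illustration, and a flagged term would still need a curator's judgment.

```python
import re
from typing import List

# Hypothetical patterns; a real curation pipeline would tune these.
COMPLEMENT = re.compile(r"^non[- ]", re.IGNORECASE)
KNOWLEDGE_STATE = re.compile(r"\b(unknown|unclassified)\b", re.IGNORECASE)


def lint_term(name: str) -> List[str]:
    """Return the rules a term name appears to violate (empty list if none)."""
    problems = []
    if COMPLEMENT.search(name):
        # 'non-X' names a complement of a class, not a genuine class.
        problems.append("positivity")
    if KNOWLEDGE_STATE.search(name):
        # 'unknown' describes our knowledge, not a biological kind.
        problems.append("objectivity")
    return problems


for term in ["non-mammal", "unclassified enzyme", "plasma membrane"]:
    print(term, lint_term(term))
```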

  42. Fourth Rule: Single Inheritance • No class in a classificatory hierarchy should have more than one is_a parent on the immediate higher level • I.e. no diamonds [Diagram: classes A, B and C joined by is_a1 and is_a2 links]

  43. Following the single inheritance rule • The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it. • The entire information content of the term hierarchy can be translated very cleanly into a computer representation
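
A minimal sketch of how the single inheritance rule might be checked, and of how a term picks up everything above it once the rule holds. The edge list and both function names are hypothetical, and the edges are assumed to be acyclic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (child, parent) is_a edges; A has two parents, forming a diamond.
IS_A: List[Tuple[str, str]] = [
    ("B", "root"),
    ("C", "root"),
    ("A", "B"),
    ("A", "C"),
    ("D", "B"),
]


def multiple_parents(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each class with more than one is_a parent to its sorted parents."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


def ancestors(term: str, edges: List[Tuple[str, str]]) -> List[str]:
    """List the terms above `term`, nearest first (assumes acyclic edges)."""
    parent_of: Dict[str, str] = {}
    for child, parent in edges:
        if child in parent_of:
            raise ValueError(f"{child} has more than one is_a parent")
        parent_of[child] = parent
    chain = []
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain


print(multiple_parents(IS_A))                   # classes that break the rule
tree = [e for e in IS_A if e != ("A", "C")]     # drop one edge to restore a tree
print(ancestors("A", tree))                     # unambiguous chain of parents
```

With single inheritance the chain returned by `ancestors` is unique, which is what allows a term's definition to incorporate the definitions above it without ambiguity.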

  44. Problems with multiple inheritance • ‘is_a’ no longer univocal [Diagram: classes A, B and C joined by is_a1 and is_a2 links]

  45. Fifth Rule: Clarity of Text Definitions • The terms used in a definition should be simpler (more intelligible) than the term to be defined • otherwise the definition provides no assistance to human understanding • Machines can cope with the full formal representation (they don’t need the text)

  46. Sixth Rule: Basis in Reality • When building or maintaining an ontology, always think carefully about how classes (types, kinds, species) relate to instances in reality • Axioms governing instances • Every class has at least one instance (exceptions will occur at top levels) • Each child class has a smaller collection of instances than its parent class

  47. Axiom: Every parent class has at least two children
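
The three axioms on the last two slides (every class has an instance, each child has fewer instances than its parent, every parent has at least two children) can be checked mechanically on a toy hierarchy. Everything in this sketch is made up for illustration, including the class names, the instance names and the `check_axioms` function; it also ignores the top-level exceptions the slide allows for.

```python
from typing import Dict, List, Set

# Hypothetical toy data: class -> child classes, and class -> direct instances.
CHILDREN: Dict[str, List[str]] = {
    "animal": ["mammal", "frog"],
    "mammal": ["cat"],  # only one child: violates the two-children axiom
}
DIRECT_INSTANCES: Dict[str, Set[str]] = {
    "cat": {"tibbles"},
    "frog": {"kermit"},
}


def instances(cls: str) -> Set[str]:
    """All instances of a class: its own plus those of every descendant."""
    found = set(DIRECT_INSTANCES.get(cls, set()))
    for child in CHILDREN.get(cls, []):
        found |= instances(child)
    return found


def check_axioms() -> List[str]:
    """Report violations of the three axioms as readable messages."""
    problems = []
    classes = set(CHILDREN) | {c for kids in CHILDREN.values() for c in kids}
    for cls in sorted(classes):
        if not instances(cls):
            problems.append(f"{cls}: no instances")
        kids = CHILDREN.get(cls, [])
        if kids and len(kids) < 2:
            problems.append(f"{cls}: fewer than two children")
        for kid in kids:
            if len(instances(kid)) >= len(instances(cls)):
                problems.append(f"{kid}: not smaller than parent {cls}")
    return problems


for message in check_axioms():
    print(message)
```

In this toy data the two reported problems share one cause: a parent with a single child cannot have more instances than that child unless it also has instances of its own.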

  48. The reason that rules are important: Interoperability • Ontologies should work together • Avoid redundancy in ontology building • Support reuse • Ontologies should be capable of being used by other ontologies (cumulation)

  49. The problem of ontology re-use • SNOMED, MeSH, UMLS, NCIT, HL7-RIM … • None of these have clearly defined relations • Still remain too much at the level of TERMINOLOGY • Not based on a common set of rules • Not based on a common set of relations

  50. An example of unclear relationship use • A is_a B • ‘A’ is more specific in meaning than ‘B’ • HL7-RIM: • Individual Allele is_a Act of Observation • cancer documentation is_a cancer • disease prevention is_a disease
