Principles for Building Biomedical Ontologies

Principles for Building Biomedical Ontologies. Barry Smith. Computers are tools for scientists. This fact does not mean that the sciences themselves have new kinds of objects (data, information). Bio-ontologies are about genes, cells, organisms, not about terms, symbols, concepts, data.

Presentation Transcript


  1. Principles for Building Biomedical Ontologies Barry Smith

  2. Computers are tools for scientists • this fact does not mean that the sciences themselves have new kinds of objects (data, information) • bio-ontologies are about genes, cells, organisms • not about terms, symbols, concepts, data

  3. Overview • Following basic rules helps make better ontologies • We will work through the principles-based treatment of relations in ontologies, to show how ontologies can become more reliable and more powerful

  4. Why do we need rules for good ontology? • Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking) • Unintuitive rules for classification lead to entry errors (problematic links) • Facilitate training of curators • Overcome obstacles to alignment with other ontology and terminology systems • Enhance harvesting of content through automatic reasoning systems

  5. First Rule: Univocity • Terms (including those describing relations) should have the same meanings on every occasion of use. • In other words, they should refer to the same kinds of entities in reality

  6. MedDRA • a cold • cold (vs. hot) • C.O.L.D. (Chronic-Obstructive-Lung-Disease) code with ‘C.O.L.D.’ or call to check

  7. Second Rule: Positivity • Complements of types are not themselves types. • Terms such as ‘non-mammal’ or ‘non-membrane’ do not designate genuine types.

  8. Third Rule: Objectivity • Which types exist is not a function of our biological knowledge. • Terms such as ‘unknown’ or ‘unclassified’ or ‘unlocalized’ do not designate biological natural kinds.

  9. Fourth Rule: Single Inheritance • No type in a classificatory hierarchy should have more than one is_a parent on the immediate higher level

  10. Rule of Single Inheritance • no diamonds • [Diagram: a diamond formed by types A, B and C, with edges labeled is_a1 and is_a2]

  11. Problems with multiple inheritance • [Diagram: types A, B and C linked by edges labeled is_a1 and is_a2] • ‘is_a’ no longer univocal
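The single-inheritance rule on the slides above can be checked mechanically. The following is a minimal sketch, assuming the hierarchy is available as a list of (child, parent) is_a pairs; the function name and the toy type names are hypothetical and do not come from the presentation.

```python
from collections import defaultdict


def multiple_inheritance_violations(is_a_edges):
    """Return {child: sorted parents} for every type with more than one is_a parent."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# Hypothetical toy hierarchy containing one "diamond".
edges = [
    ("B1", "A"),
    ("B2", "A"),
    ("C", "B1"),
    ("C", "B2"),  # second is_a parent: violates single inheritance
]

violations = multiple_inheritance_violations(edges)
print(violations)  # {'C': ['B1', 'B2']}
```

A report like this only flags candidates; as the slides note, such shortfalls are clues that a curator should inspect, not automatic proof of an error.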

  12. ‘is_a’ is pressed into service to mean a variety of different things • shortfalls from single inheritance are often clues to incorrect entry of terms and relations • the resulting ambiguities make the rules for correct entry difficult to communicate to human curators

  13. is_a Overloading • serves as obstacle to integration with neighboring ontologies • The success of ontology alignment depends crucially on the degree to which basic ontological relations such as is_a and part_of can be relied on as having the same meanings in the different ontologies to be aligned.

  14. Use of multiple inheritance • The resultant mélange makes coherent integration across ontologies achievable (at best) only under the guidance of human beings with relevant biological knowledge • How much should reasoning systems be forced to rely on human guidance?

  15. Fifth Rule: Intelligibility of Terms and Definitions • Terms should be intelligible • ‘apoptosis inhibitor activity’ is a function in GO • relations between function and the processes they enable become very difficult to state unless function terms designate functions in an intelligible way • structural constituent of tooth enamel

  16. extracellular matrix structural constituent • puparial glue (sensu Diptera) • structural constituent of bone • structural constituent of chorion (sensu Insecta) • structural constituent of chromatin • structural constituent of cuticle • structural constituent of cytoskeleton • structural constituent of epidermis • structural constituent of eye lens • structural constituent of muscle • structural constituent of myelin sheath • structural constituent of nuclear pore • structural constituent of peritrophic membrane (sensu Insecta) • structural constituent of ribosome – note possibility of confusion with ‘major ribosome unit’ (check) • structural constituent of tooth enamel • structural constituent of vitelline membrane (sensu Insecta)

  17. Fifth Rule: Intelligibility of Terms and Definitions • The terms used in a definition should be simpler (more intelligible) than the term to be defined • otherwise the definition provides no assistance • to human understanding • for machine processing

  18. To the degree that the above rules are not satisfied, error checking and ontology alignment will be achievable, at best, only with human intervention and via brute force

  19. Some rules are Rules of Thumb • The world of biomedical research is a world of difficult trade-offs • The benefits of formal (logical and ontological) rigor need to be balanced • Against the constraints of computer tractability, • Against the needs of biomedical practitioners. • BUT alignment and integration of biomedical information resources will be achieved only to the degree that such resources conform to these standard principles of classification and definition

  20. Definitions should be intelligible to both machines and humans • Machines can cope with the full formal representation • Humans need to use modularity • Plasma membrane • is a cell part [immediate parent] • that surrounds the cytoplasm [differentia]

  21. Terms and relations should have clear definitions • These tell us how the ontology relates to the world of biological instances, meaning the actual particulars in reality: • actual cells, actual portions of cytoplasm, and so on…

  22. Sixth Rule: Basis in Reality • When building or maintaining an ontology, always think carefully about how types (universals, kinds, species) relate to instances in reality

  23. Axioms governing instances • Every type has at least one instance • Every genus (parent type) has an instantiated species (differentia + genus) • Each species (child type) has a smaller class of instances than its genus (parent type)

  24. Axioms governing instances • Distinct types on the same level never share instances • Distinct leaf types within a classification never share instances
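Several of the axioms on the two slides above can be tested against data. This is a minimal sketch, assuming a single-inheritance hierarchy given as a child-to-parent mapping and an explicitly listed set of instances for each type; the function name and the named individuals are hypothetical, and the type names are borrowed from the example taxonomy in the deck.

```python
from itertools import combinations


def check_instance_axioms(parent_of, extension):
    """Return a list of human-readable axiom violations (empty if none)."""
    problems = []
    # Axiom: every type has at least one instance.
    for type_name, instances in extension.items():
        if not instances:
            problems.append(f"{type_name} has no instances")
    # Axiom: each child type has strictly fewer instances than its parent.
    for child, parent in parent_of.items():
        if not extension[child] < extension[parent]:  # proper-subset test
            problems.append(f"{child} is not a proper sub-extension of {parent}")
    # Axiom: distinct leaf types never share instances.
    leaves = sorted(t for t in extension if t not in parent_of.values())
    for a, b in combinations(leaves, 2):
        if extension[a] & extension[b]:
            problems.append(f"leaf types {a} and {b} share instances")
    return problems


# Hypothetical individuals; the hierarchy is single-inheritance.
parent_of = {"mammal": "animal", "frog": "animal", "cat": "mammal", "siamese": "cat"}
extension = {
    "animal": {"tibbles", "felix", "rex", "kermit"},
    "mammal": {"tibbles", "felix", "rex"},
    "cat": {"tibbles", "felix"},
    "siamese": {"tibbles"},
    "frog": {"kermit"},
}

print(check_instance_axioms(parent_of, extension))  # []
```

The sketch does not cover the "same level" axiom, since that would need a notion of depth in the hierarchy that the slides do not spell out.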

  25. [Diagram: taxonomy of species and genera (substance, organism, animal, mammal, cat, siamese, frog), with a leaf type and its instances marked]

  26. Interoperability • Ontologies should work together • ways should be found to avoid redundancy in ontology building and to support reuse • ontologies should be capable of being used by other ontologies (cumulation)

  27. Main obstacle to integration • Current ontologies do not deal well with • Time and • Space and • Instances (particulars) • Our definitions should link the terms in the ontology to instances in spatio-temporal reality

  28. Benefits of well-defined relationships • If the relations in an ontology are well-defined, then reasoning can cascade from one relational assertion (A R1 B) to the next (B R2 C). Relations used in ontologies thus far have not been well defined in this sense. • A query for all DNA binding proteins should also find all transcription factor proteins, because • Transcription factor is_a DNA binding protein
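The cascading query described on this slide can be sketched as a walk down the is_a hierarchy. This is an illustration under stated assumptions, not how any particular ontology tool works: the protein identifiers, the "kinase" annotation and the "hypothetical TF subtype" term are invented for the example; only the transcription factor / DNA binding protein link comes from the slide.

```python
def descendants(term, is_a_edges):
    """Return every term reachable from `term` by following is_a links downward."""
    children = {}
    for child, parent in is_a_edges:
        children.setdefault(parent, []).append(child)
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def find_proteins(term, is_a_edges, annotations):
    """Return proteins annotated to `term` or to any of its is_a descendants."""
    terms = {term} | descendants(term, is_a_edges)
    return sorted(p for p, annotated in annotations.items() if annotated in terms)


IS_A = [
    ("transcription factor", "DNA binding protein"),        # from the slide
    ("hypothetical TF subtype", "transcription factor"),    # invented second level
]
ANNOTATIONS = {  # invented protein identifiers
    "protein_X": "hypothetical TF subtype",
    "protein_Y": "transcription factor",
    "protein_Z": "DNA binding protein",
    "protein_W": "kinase",
}

print(find_proteins("DNA binding protein", IS_A, ANNOTATIONS))
# ['protein_X', 'protein_Y', 'protein_Z']
```

The query is only trustworthy if every is_a link means the same thing, which is the point the slide makes about well-defined relations.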

  29. How to define A is_a B • A is_a B =def. • A and B are names of types (natural kinds, universals) in reality • all instances of A are as a matter of biological science also instances of B

  30. Biomedical ontology integration / interoperability • Will never be achieved through integration of meanings or concepts • The problem is precisely that different user communities use different concepts • What’s really needed is to have well-defined commonly used relationships

  31. Idea: • Move from associative relations between meanings to strictly defined relations between the entities themselves. • The relations can then be used computationally in the way required

  32. Key idea: To define ontological relations • For example: part_of, develops_from • Definitions will enable computation • It is not enough to look just at classes or types. • We need also to take account of instances and time

  33. Kinds of relations • Between types: • is_a, part_of, ... • Between an instance and a type • this explosion instance_of the type explosion • Between instances: • Mary’s heart part_of Mary

  34. Seventh Rule: Distinguish types and Instances • A good ontology must distinguish clearly between • types (universals, kinds, species) and • instances (tokens, individuals, particulars)

  35. Don’t forget instances when defining relations • part_of as a relation between types versus part_of as a relation between instances • nucleus part_of cell • your heart part_of you

  36. Part_of as a relation between types is more problematic than is standardly supposed • testis part_of human being ? • heart part_of human being ? • human being has_part human testis ?

  37. Why distinguish types from instances? • What holds on the level of instances may not hold on the level of types • nucleus adjacent_to cytoplasm • Not: cytoplasm adjacent_to nucleus • seminal vesicle adjacent_to urinary bladder • Not: urinary bladder adjacent_to seminal vesicle

  38. part_of • part_of must be time-indexed for spatial types • A part_of B is defined as: Given any instance a and any time t, If a is an instance of the type A at t, then there is some instance b of the type B such that a is an instance-level part_of b at t
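The time-indexed definition of type-level part_of can be written as a check over instance-level facts. This is a minimal sketch, assuming facts are stored as (individual, type, time) and (part, whole, time) tuples; it also assumes that the whole b must instantiate B at the same time t, which is one reading of the slide. All identifiers and time points are hypothetical.

```python
def type_level_part_of(type_a, type_b, instance_of, inst_part_of):
    """True if every instance a of type_a at time t is, at t, part of some instance of type_b."""
    for a, a_type, t in instance_of:
        if a_type != type_a:
            continue
        # Candidate wholes: individuals instantiating type_b at the same time t.
        wholes = (b for b, b_type, t_b in instance_of if b_type == type_b and t_b == t)
        if not any((a, b, t) in inst_part_of for b in wholes):
            return False
    return True


# Hypothetical facts: one nucleus inside one cell at two time points.
INSTANCE_OF = {
    ("n1", "nucleus", 1), ("c1", "cell", 1),
    ("n1", "nucleus", 2), ("c1", "cell", 2),
}
INST_PART_OF = {("n1", "c1", 1), ("n1", "c1", 2)}

print(type_level_part_of("nucleus", "cell", INSTANCE_OF, INST_PART_OF))  # True
print(type_level_part_of("cell", "nucleus", INSTANCE_OF, INST_PART_OF))  # False
```

Because the definition quantifies over all instances of A, a single counterexample (for instance a nucleus recorded with no containing cell at some time) is enough to make the type-level assertion fail.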

  39. derives_from (ovum, sperm → zygote ...) • [Diagram: instances c and c' of types C and C' at time t, and instance c1 of type C1 at time t1, along a time axis]

  40. transformation_of • [Diagram: the same instance c is of type C at time t and of type C1 at time t1, along a time axis] • pre-RNA → mature RNA • child → adult

  41. transformation_of • C2 transformation_of C1 =def. any instance of C2 was at some earlier time an instance of C1
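The definition of transformation_of can be sketched in the same style. This is an illustration, assuming instance facts are stored as (individual, type, time) tuples with comparable time values; the individuals and years are hypothetical, while the child/adult pair is the example used in the deck.

```python
def transformation_of(type_c2, type_c1, instance_of):
    """True if every instance of type_c2 was, at some earlier time, an instance of type_c1."""
    for x, x_type, t in instance_of:
        if x_type != type_c2:
            continue
        was_c1_earlier = any(
            y == x and y_type == type_c1 and t_y < t
            for y, y_type, t_y in instance_of
        )
        if not was_c1_earlier:
            return False
    return True  # note: vacuously True when type_c2 has no recorded instances


# Hypothetical individuals and time points.
FACTS = {
    ("mary", "child", 1990), ("mary", "adult", 2010),
    ("tom", "child", 2000), ("tom", "adult", 2020),
}

print(transformation_of("adult", "child", FACTS))  # True
print(transformation_of("child", "adult", FACTS))  # False
```

The identity test `y == x` is what separates transformation_of from derives_from: the same individual persists and changes type, instead of a new individual arising from earlier ones.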

  42. embryological development • [Diagram: instance c of type C at time t, and the same instance c of type C1 at time t1]

  43. tumor development • [Diagram: instance c of type C at time t, and the same instance c of type C1 at time t1]

  44. Time • menopause part_of aging • aging part_of death • therefore: menopause part_of death

  45. The simple, formal details “Relations in Biomedical Ontologies” Genome Biology, 2005, 6 (5)

  46. Principles for Building Biomedical Ontologies: A GO Perspective • David Hill • Mouse Genome Informatics • The Jackson Laboratory

  47. How has GO dealt with some specific aspects of ontology development? • Univocity • Positivity • Objectivity • Single Inheritance • Definitions • Formal definitions • Written definitions • Basis in Reality • Universals & Instances • Ontology Alignment

  48. The Challenge of Univocity: People call the same thing by different names • Taction • Tactition • Tactile sense • ?

  49. Univocity: GO uses 1 term and many characterized synonyms • Taction • Tactition • Tactile sense • perception of touch ; GO:0050975
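The one-term, many-synonyms approach on this slide amounts to a lookup from any accepted label to a single identifier. The sketch below is a toy illustration and not GO's actual data model or API; the dictionary names and the `resolve` function are hypothetical, and only the labels and the identifier GO:0050975 are taken from the slide.

```python
# Primary term name for each identifier (label as given on the slide).
PRIMARY_TERMS = {"GO:0050975": "perception of touch"}

# Synonyms all point at the same identifier, so each label has one meaning.
SYNONYMS = {
    "taction": "GO:0050975",
    "tactition": "GO:0050975",
    "tactile sense": "GO:0050975",
}


def resolve(label):
    """Map a primary name or synonym to its term identifier, or None if unknown."""
    key = " ".join(label.lower().split())  # normalize case and whitespace
    for term_id, name in PRIMARY_TERMS.items():
        if name == key:
            return term_id
    return SYNONYMS.get(key)


print(resolve("Tactile  sense"))       # GO:0050975
print(resolve("perception of touch"))  # GO:0050975
```

Annotations then store only the identifier, so curators who prefer different names still end up referring to the same type.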

  50. The Challenge of Univocity: People use the same words to describe different things • [Three images, each labeled ‘bud initiation’]
