
Principles for Building Biomedical Ontologies: A GO Perspective

Principles for Building Biomedical Ontologies: A GO Perspective. David Hill, Mouse Genome Informatics, The Jackson Laboratory. How has GO dealt with some specific aspects of ontology development? Univocity, Positivity, Objectivity, Single Inheritance, Definitions, Formal definitions


Presentation Transcript


  1. Principles for Building Biomedical Ontologies: A GO Perspective. David Hill, Mouse Genome Informatics, The Jackson Laboratory

  2. How has GO dealt with some specific aspects of ontology development? • Univocity • Positivity • Objectivity • Single Inheritance • Definitions • Formal definitions • Written definitions • Basis in Reality • Universals & Instances • Ontology Alignment

  3. The Challenge of Univocity: People call the same thing by different names • Taction • Tactition • Tactile sense • ?

  4. Univocity: GO uses 1 term and many characterized synonyms • Term: perception of touch ; GO:0050975 • Synonyms: Taction, Tactition, Tactile sense
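The one-term, many-synonyms idea can be sketched as a lookup table. This is only an illustration: the term ID and labels come from the slide, while the dictionary layout and the `resolve` helper are hypothetical and are not GO's actual data model.

```python
# One primary term with its synonyms (labels and ID taken from the slide).
TERMS = {
    "GO:0050975": {
        "name": "perception of touch",
        "synonyms": ["taction", "tactition", "tactile sense"],
    },
}

# Build a case-insensitive index from every label to its single term ID.
LABEL_TO_ID = {}
for term_id, term in TERMS.items():
    for label in [term["name"], *term["synonyms"]]:
        LABEL_TO_ID[label.lower()] = term_id


def resolve(label):
    """Return the term ID for a name or synonym, or None if the label is unknown."""
    return LABEL_TO_ID.get(label.strip().lower())
```

Whatever word a user types, every synonym resolves to the same identifier, which is what lets a computer treat the three names as one thing.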

  5. The Challenge of Univocity: People use the same words to describe different things [Three images, each labeled "bud initiation"]

  6. Bud initiation? How is a computer to know?

  7. Univocity: GO adds “sensu” descriptors to discriminate among organisms • bud initiation sensu Metazoa • bud initiation sensu Saccharomyces • bud initiation sensu Viridiplantae

  8. The Importance of synonyms for utility: How do we represent the function of tRNA? • Biologically, what does the tRNA do? • Identifies the codon and inserts the amino acid in the growing polypeptide • Molecular_function: Triplet_codon amino acid adaptor activity • GO Definition: Mediates the insertion of an amino acid at the correct point in the sequence of a nascent polypeptide chain during protein synthesis. • Synonym: tRNA

  9. The Challenge of Positivity: Some organelles are membrane-bound. A centrosome is not a membrane-bound organelle, but it may still be considered an organelle.

  10. The Challenge of Positivity: Sometimes absence is a distinction in a biologist’s mind • non-membrane-bound organelle GO:0043228 • membrane-bound organelle GO:0043227

  11. Positivity • Note the logical difference between • “non-membrane-bound organelle” and • “not a membrane-bound organelle” • The latter includes everything that is not a membrane-bound organelle!
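The difference between the two phrases can be shown with sets. This is a toy sketch: the small universe of entities and their flags are hypothetical examples chosen for illustration, not GO content.

```python
# Hypothetical toy universe: entity -> (is an organelle?, is membrane-bound?)
ENTITIES = {
    "nucleus": (True, True),
    "mitochondrion": (True, True),
    "centrosome": (True, False),
    "ribosome": (True, False),
    "cytosol": (False, False),
    "hemoglobin": (False, False),
}

organelles = {name for name, (is_org, _) in ENTITIES.items() if is_org}
membrane_bound = {name for name, (is_org, bound) in ENTITIES.items() if is_org and bound}

# "non-membrane-bound organelle": still an organelle, so it stays inside the parent class.
non_membrane_bound = organelles - membrane_bound

# "not a membrane-bound organelle": the complement over everything in the universe.
not_membrane_bound = set(ENTITIES) - membrane_bound
```

Here `non_membrane_bound` holds only the centrosome and ribosome, while `not_membrane_bound` also sweeps in cytosol and hemoglobin, which are not organelles at all.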

  12. The Challenge of Objectivity: Database users want to know if we don’t know anything (Exhaustiveness with respect to knowledge) • We don’t know anything about the ligand that binds this type of GPCR • We don’t know anything about a gene product with respect to these

  13. Objectivity • How can we use GO to annotate gene products when we know that we don’t have any information about them? • Currently GO has terms in each ontology to describe unknown • An alternative might be to annotate genes to root nodes and use an evidence code to describe that we have no data. • Similar strategies could be used for things like receptors where the ligand is unknown.
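The root-node alternative could look something like the sketch below. The three root term IDs are GO's root nodes; the evidence code label `ND` (no data), the tuple layout, the `annotate_unknown` helper, and the gene product name `GeneX` are assumptions made for illustration.

```python
# Root term of each GO ontology.
ROOTS = {
    "molecular_function": "GO:0003674",
    "biological_process": "GO:0008150",
    "cellular_component": "GO:0005575",
}

# Assumed label for an evidence code meaning "we looked and found no data".
NO_DATA = "ND"


def annotate_unknown(gene_product, aspects_with_data):
    """Annotate a gene product to the root of every ontology that has no data for it.

    Returns a list of (gene product, root term ID, evidence code) tuples.
    """
    return [
        (gene_product, root_id, NO_DATA)
        for aspect, root_id in ROOTS.items()
        if aspect not in aspects_with_data
    ]
```

An annotation to the root with a "no data" code says "a curator checked and nothing is known", which is different from a gene product that simply has not been curated yet.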

  14. GPCRs with unknown ligands We could annotate to this

  15. Single Inheritance • GO has a lot of is_a diamonds • Some are due to incompleteness of the graph • Some are due to a mixture of dissimilar classes within the graph at the same level

  16. Is_a diamond in GO Process behavior locomotory behavior larval behavior larval locomotory behavior

  17. Is_a diamond in GO Function enzyme regulator activity enzyme activator activity GTPase regulator activity GTPase activator activity

  18. Is_a diamond in GO Cellular Component organelle intracellular organelle non-membrane bound organelle non-membrane bound intracellular organelle
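A minimal sketch of how such is_a diamonds could be found automatically. The term names are the ones on the three diamond slides; the dictionary representation and the `ancestors` and `find_diamonds` helpers are hypothetical, not part of any GO tooling.

```python
from itertools import combinations

# is_a parents for the three diamonds shown on the slides (child -> parents).
IS_A = {
    "locomotory behavior": ["behavior"],
    "larval behavior": ["behavior"],
    "larval locomotory behavior": ["locomotory behavior", "larval behavior"],
    "enzyme activator activity": ["enzyme regulator activity"],
    "GTPase regulator activity": ["enzyme regulator activity"],
    "GTPase activator activity": ["enzyme activator activity", "GTPase regulator activity"],
    "intracellular organelle": ["organelle"],
    "non-membrane bound organelle": ["organelle"],
    "non-membrane bound intracellular organelle": [
        "intracellular organelle",
        "non-membrane bound organelle",
    ],
}


def ancestors(term, is_a):
    """All terms reachable from `term` by following is_a edges upward."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def find_diamonds(is_a):
    """Return (top, left, right, bottom) for each pair of parents that share an ancestor.

    Every shared ancestor is reported as a possible top, not only the nearest one.
    """
    diamonds = []
    for bottom, parents in is_a.items():
        for left, right in combinations(sorted(parents), 2):
            shared = ancestors(left, is_a) & ancestors(right, is_a)
            for top in sorted(shared):
                diamonds.append((top, left, right, bottom))
    return diamonds
```

Run on this toy graph, `find_diamonds(IS_A)` reports one diamond per ontology, matching the three slides.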

  19. Technically the diamonds are correct, but could be eliminated. What do these pairs have in common? • locomotory behavior / larval behavior • GTPase regulator activity / enzyme activator activity • non-membrane bound organelle / intracellular organelle

  20. What do the middle pairs of terms all have in common? • locomotory behavior / larval behavior • GTPase regulator activity / enzyme activator activity • non-membrane bound organelle / intracellular organelle

  21. In each pair, the two terms are differentiated from the parent term by different factors • locomotory behavior / larval behavior: type of behavior vs. what is behaving • GTPase regulator activity / enzyme activator activity: what is regulated vs. type of regulator • non-membrane bound organelle / intracellular organelle: type of organelle vs. location of organelle

  22. Insert an intermediate grouping term [Graph: behavior has children "behavior of a thing" and "descriptive behavior"; locomotory behavior, larval behavior, larval locomotory behavior]

  23. Why insert terms that no one would use? By the structure of this graph, locomotory behavior has the same relationship to larval behavior as to rhythmic behavior [Graph: behavior has children locomotory behavior, larval behavior, rhythmic behavior, adult behavior]

  24. Why insert terms that no one would use? But actually, locomotory behavior/rhythmic behavior and larval behavior/adult behavior group naturally [Graph: behavior has children "Behavior of a thing" and "Descriptive behavior"; locomotory behavior, larval behavior, rhythmic behavior, adult behavior]

  25. Is_a diamond in GO Process behavior locomotory behavior larval behavior larval locomotory behavior The relationships differentiate behavior in different ways

  26. GO Definitions • A definition written by a biologist: necessary & sufficient conditions; written definition (not computable) • Graph structure: necessary conditions; formal (computable)

  27. Relationships and definitions • The set of necessary conditions is determined by the graph • This can be considered a partial definition • Important considerations: • Placement in the graph: selecting parents • Appropriate relationships to different parents • True path violation

  28. Placement in the graph • Example: Proteasome complex

  29. The importance of relationships • Cyclin dependent protein kinase • Complex has a catalytic and a regulatory subunit • How do we represent these activities (function) in the ontology? • Do we need a new relationship type (regulates)? Molecular_function Catalytic activity Enzyme regulator activity protein kinase activity Protein kinase regulator activity protein Ser/Thr kinase activity Cyclin dependent protein kinase activity Cyclin dependent protein kinase regulator activity

  30. True path violation: What is it? "...the pathway from a child term all the way up to its top-level parent(s) must always be true". nucleus Part_of relationship chromosome Is_a relationship Mitochondrial chromosome

  31. True path violation: What is it? "...the pathway from a child term all the way up to its top-level parent(s) must always be true". nucleus chromosome Part_of relationship Is_a relationships Nuclear chromosome Mitochondrial chromosome
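The true path rule can be checked mechanically by walking every path upward from a term. The sketch below assumes one reading of the two diagrams: in the first, chromosome is part_of nucleus and mitochondrial chromosome is_a chromosome; in the second, only nuclear chromosome is part_of nucleus. The edge dictionaries and the `inferred_part_of` helper are hypothetical.

```python
def inferred_part_of(term, is_a, part_of):
    """Collect every part_of target reached by walking up is_a and part_of edges."""
    wholes = set()
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for whole in part_of.get(current, []):
            wholes.add(whole)
            stack.append(whole)
        stack.extend(is_a.get(current, []))
    return wholes


# First diagram (assumed reading): the path forces a false statement.
BEFORE_IS_A = {"mitochondrial chromosome": ["chromosome"]}
BEFORE_PART_OF = {"chromosome": ["nucleus"]}

# Second diagram (assumed reading): part_of moves down to the nuclear chromosome.
AFTER_IS_A = {
    "nuclear chromosome": ["chromosome"],
    "mitochondrial chromosome": ["chromosome"],
}
AFTER_PART_OF = {"nuclear chromosome": ["nucleus"]}
```

In the first graph a mitochondrial chromosome is inferred to be part of the nucleus, which is false; in the second the inference disappears while the nuclear chromosome keeps it.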

  32. GO textual definitions: Related GO terms have similarly structured (normalized) definitions

  33. Structured definitions contain both genus and differentiae Essence = Genus + Differentiae neuron cell differentiation = Genus: differentiation (processes whereby a relatively unspecialized cell acquires the specialized features of..) Differentiae: acquires features of a neuron

  34. Basis in Reality • GO is designed by a consortium • As long as egos don’t get in the way, GO represents universals rather than concepts • Large-scale developments of the GO are a result of compromise • Gene Annotators have a large say in GO content • Annotators are experts in their fields • Annotators constantly read the scientific literature

  35. Ontology alignment: One of the current goals of GO is to align cell types in GO with cell types in the Cell Ontology (GO term / Cell Ontology term) • cone cell fate commitment / retinal_cone_cell • keratinocyte differentiation / keratinocyte • adipocyte differentiation / fat_cell • dendritic cell activation / dendritic_cell • lymphocyte proliferation / lymphocyte • T-cell homeostasis / T_lymphocyte • garland cell differentiation / garland_cell • heterocyst cell differentiation / heterocyst

  36. Alignment of the Two Ontologies will permit the generation of consistent and complete definitions • Cell type: id: CL:0000062 name: osteoblast def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629] is_a: CL:0000055 relationship: develops_from CL:0000008 relationship: develops_from CL:0000375 • GO + Cell type = New Definition • Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.

  37. Alignment of the Two Ontologies will permit the generation of consistent and complete definitions id: GO:0001649 name: osteoblast differentiation synonym: osteoblast cell differentiation genus: differentiation GO:0030154 (differentiation) differentium: acquires_features_of CL:0000062 (osteoblast) definition (text): Processes whereby a relatively unspecialized cell acquires the specialized features of an osteoblast, the mesodermal cell that gives rise to bone Formal definitions with necessary and sufficient conditions, in both human readable and computer readable forms
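A minimal sketch of how a genus plus differentia record could generate both the term name and a normalized text definition. The IDs and wording come from the slides; the `CellType` and `DifferentiationTerm` classes, their fields, and the `article` helper are hypothetical.

```python
from dataclasses import dataclass


def article(noun):
    """Pick 'a' or 'an' with a simple first-letter rule (good enough for this sketch)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


@dataclass(frozen=True)
class CellType:
    cl_id: str
    name: str
    description: str  # short phrase describing the cell type


@dataclass(frozen=True)
class DifferentiationTerm:
    go_id: str
    genus_id: str  # the genus: differentiation
    cell: CellType  # the differentia: acquires_features_of this cell type

    @property
    def name(self):
        return f"{self.cell.name} differentiation"

    def text_definition(self):
        return (
            "Processes whereby a relatively unspecialized cell acquires the "
            f"specialized features of {article(self.cell.name)} {self.cell.name}, "
            f"{self.cell.description}."
        )


osteoblast = CellType(
    cl_id="CL:0000062",
    name="osteoblast",
    description="a bone-forming cell which secretes extracellular matrix",
)
term = DifferentiationTerm(go_id="GO:0001649", genus_id="GO:0030154", cell=osteoblast)
```

Because the text is assembled from the formal parts, every differentiation term built this way gets a similarly structured definition, and a change to the cell type description propagates automatically.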

  38. Other Ontologies that can be aligned with GO • Chemical ontologies • 3,4-dihydroxy-2-butanone-4-phosphate synthase activity • Anatomy ontologies • metanephros development • GO itself • mitochondrial inner membrane peptidase activity

  39. But Eventually…

  40. But, what about instances? What are the instances we are dealing with in our work as ontology builders and scientific curators?

  41. What knowledge are we trying to capture? We are interested in understanding how genes contribute to the biology of an organism.

  42. What do we mean by gene product? • Gene Product Type • An abstract representation of a gene • These are the representations we have in MODs • Gene Product Instance • A molecule of a gene product • It can be physically isolated • It takes up space

  43. How do wet-bench biologists learn about gene products? They do experiments! Experiments are designed to study the properties of gene product instances. Experimental biologists take on “The Burden of Proof”.

  44. How do we represent the accumulated knowledge? We make annotations! Annotations connect what wet-bench biologists see in the lab with how we represent our understanding of biology.

  45. So, where are the instances? The instances are in the lab. We use what people report about instances, but we never actually deal with them directly.

  46. Examples of how we connect instances with knowledge representation in the GO What follows are examples of annotation of the biomedical literature using GO types, gene product types and evidence codes

  47. Example #1: Molecular Function using IDA. Figure from Zhang M, Chen W, Smith SM, Napoli JL. Molecular characterization of a mouse short chain dehydrogenase/reductase active with all-trans-retinol in intact cells, mRDH1. J Biol Chem. 2001 Nov 23;276(47):44083-90.

  48. The Observation; The Annotation [Figure: reaction diagram labeled NAD+, NADH, H+]

  49. What are the instances in this experiment? • Gene product instances • Molecules of retinol dehydrogenase • Molecular function instances • Instances of execution of the molecular function revealed by the assay • Instances of molecular function associated with instances of retinol dehydrogenase. These instances are the potential of a molecule of retinol dehydrogenase to execute the function retinol dehydrogenase activity.

  50. Example #2: Molecular Function using IMP. Figure from Schulz S, Lopez MJ, Kuhn M, Garbers DL. Disruption of the guanylyl cyclase-C gene leads to a paradoxical phenotype of viable but heat-stable enterotoxin-resistant mice. J Clin Invest. 1997 Sep 15;100(6):1590-5.
