
Practical Ontologies

Practical Ontologies: Lessons from the GO, February 2011. The time was 1998-99. None of the model organism databases used standard terminology to describe biological function. The Drosophila sequence was imminent; it was the largest genome sequenced at that time.


Presentation Transcript


  1. Practical Ontologies Lessons from the GO February 2011

  2. The time was 1998-99 • None of the model organism databases used standard terminology to describe biological function • The Drosophila sequence was imminent • It was the largest genome sequenced at that time • Two weeks, 3 dozen scientists, all new software • How could we organize the annotation? • Microarray technology was the latest research tool, and results needed to be described • AI folk and ontologists organized the first “bio-ontologies” workshop at ISMB

  3. The Gene Ontology—the beginning • A handful of biologists (4) met in a bar in Montreal after the bio-ontologies workshop to share their frustrations and decided to just do it*… • Would demonstrate possibilities for data integration across the MODs (FlyBase, SGD, MGD) • Provided an organizing principle for the Drosophila genome annotation jamboree * i.e. Describe gene products in a biologically meaningful way.

  4. Late summer 1999 AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT AGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGT GCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGT AGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTGCTTGAG TCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTG GGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCT CGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTAT ATCTCTTTCTCTGTCGTGCTGCTTGAGATCGTTCGTTTTTTTATGCT GATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCT AGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGA AGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCT

  5. reads sequence assemble analysis Mountains of data Tentative function filtering Love-at-first-sight ‘GO’ directories Piles of data converging Functional knowns First-pass predictions

  6. The Gene Ontology project • Annotated now • The importance of stress-testing • Don’t delay, use your ontology today • Do no harm (KISS) • i.e. Target the low hanging fruit, work on the obvious, high-confidence steps • Collaborate on concrete projects • Focusing the mind

  7. Annotations • Have 3 primary components • The ontology term(s) • The entity instance (e.g. gene product) • The evidence for that assertion • An annotation is an evidence-based assertion which indicates that this entity is best classified/described by this term(s)

  8. Read paper(s): PMID:17449867 • Identify genes: SPCC622.16c • Identify GO terms associated with each gene: GO:0005720 • What type of evidence? IDA • Resulting annotation: SPCC622.16c GO:0005720 IDA PMID:17449867
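The three components from slide 7, filled in with the values from slide 8, can be sketched as a small record. This is only an illustration: the class name `Annotation`, its field names, and the tab-separated rendering are assumptions made here, not a format the slides specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One evidence-based assertion (field names are hypothetical)."""
    entity: str     # the entity instance, e.g. a gene product
    term: str       # the ontology term
    evidence: str   # evidence code backing the assertion
    reference: str  # where the evidence comes from


# Values copied from the slide 8 example.
example = Annotation(
    entity="SPCC622.16c",
    term="GO:0005720",
    evidence="IDA",
    reference="PMID:17449867",
)


def as_row(a: Annotation) -> str:
    """Render the annotation as one tab-separated row."""
    return "\t".join([a.entity, a.term, a.evidence, a.reference])


print(as_row(example))
```

Keeping the evidence and its reference on the record itself is what makes the assertion checkable later.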

  9. Classification rule: Disambiguation • Three different pictures, each labeled “bud initiation” • The same name can be used to describe different things.

  10. Classification rule: Disambiguation • tooth bud initiation • cellular bud initiation • flower bud initiation • Include plain “bud initiation” as a synonym for each of these terms

  11. Disambiguation • Glucose synthesis • Glucose biosynthesis • Glucose formation • Glucose anabolism • Gluconeogenesis • Exactly the same thing can be described with different terms • Comparison is difficult, especially across species or across databases that each use one of these different variants • Use a single term, and plenty of synonyms
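Both disambiguation rules (one term with many synonyms, and one ambiguous synonym shared by several specific terms) can be sketched as a label index. The `EX:` identifiers are made-up placeholders, not real GO IDs, and picking "gluconeogenesis" as the primary name is an assumption for the example.

```python
from collections import defaultdict

# Hypothetical term table; the "EX:" IDs are placeholders, not GO IDs.
TERMS = {
    "EX:0000001": {
        "name": "gluconeogenesis",
        "synonyms": ["glucose synthesis", "glucose biosynthesis",
                     "glucose formation", "glucose anabolism"],
    },
    "EX:0000002": {"name": "tooth bud initiation", "synonyms": ["bud initiation"]},
    "EX:0000003": {"name": "cellular bud initiation", "synonyms": ["bud initiation"]},
    "EX:0000004": {"name": "flower bud initiation", "synonyms": ["bud initiation"]},
}


def build_index(terms):
    """Map every lower-cased name or synonym to the set of term IDs using it."""
    index = defaultdict(set)
    for term_id, record in terms.items():
        for label in [record["name"], *record["synonyms"]]:
            index[label.lower()].add(term_id)
    return index


INDEX = build_index(TERMS)


def lookup(label):
    """Return the sorted term IDs that a free-text label could refer to."""
    return sorted(INDEX.get(label.strip().lower(), set()))


print(lookup("Glucose anabolism"))  # a single term: variants converge
print(lookup("bud initiation"))     # several candidates: curator must choose
```

A lookup that returns more than one ID is the signal that the label is ambiguous and needs a more specific term.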

  12. Annotation for a healthy ontology • Easier to find the most accurate term(s) to use • Avoids annotation errors • Easier for new curators to learn and understand • Develop annotation guidelines and training material • Enables automatic reasoning for searching & inference • Bottom line: • Following basic construction rules makes more useful ontologies

  13. Improvement needed: Closing the loop Typical ontology developer Typical wet lab PI annotating data Doh! I get it now, says the computer.

  14. The Gene Ontology project • Annotated now • The importance of stress-testing • Don’t delay, use your ontology today • Do no harm (KISS) • i.e. Target the low hanging fruit, work on the obvious, high-confidence steps • Collaborate on concrete projects • Focusing the mind

  15. GO in 2000-2008

  16. Filling in annotation gaps (July 2008) • F = GO:0016301 kinase activity, P = GO:0016310 phosphorylation • F only: 3823 • F and P: 2230 • P only: 1410 • |P| = 3640 • |F| = 6053 • |F ∩ P| = 2230 • |F ∩ not P| = 3823
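The counts on this slide are mutually consistent, which a few lines of arithmetic confirm. The sketch assumes F and P are the sets of gene products annotated to kinase activity and to phosphorylation respectively; the variable names are chosen here for illustration.

```python
# Region sizes read off the slide's Venn diagram.
both = 2230    # |F ∩ P|
f_only = 3823  # |F ∩ not P|
p_only = 1410  # annotated to phosphorylation but not kinase activity

size_f = both + f_only
size_p = both + p_only

# The totals quoted on the slide follow from the three regions.
assert size_f == 6053
assert size_p == 3640

# Share of kinase-activity annotations with no phosphorylation annotation:
# these are the "gaps" the slide title refers to.
gap_fraction = f_only / size_f
print(f"{f_only} of {size_f} ({gap_fraction:.1%}) lack a phosphorylation annotation")
```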

  17. part_of

  18. part_of annotations propagate over part_of KIC1 IDA

  19. part_of annotations propagate over part_of KIC1 IDA

  20. part_of annotations propagate over part_of NDK1 IDA

  21. part_of annotations propagate over part_of NDK1 IDA
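Propagation over part_of can be sketched as a walk up the relation from each directly annotated term. Two things are assumed here rather than stated on the slides: that the edge in question is "kinase activity part_of phosphorylation", and that the KIC1 and NDK1 IDA annotations are to kinase activity. The "direct"/"inferred" labels are likewise just for this illustration.

```python
from typing import Dict, List, Set, Tuple

# Assumed edge: GO:0016301 (kinase activity) part_of GO:0016310 (phosphorylation).
PART_OF: Dict[str, List[str]] = {
    "GO:0016301": ["GO:0016310"],
}

# Assumed direct annotations: (gene, term, evidence code).
DIRECT: List[Tuple[str, str, str]] = [
    ("KIC1", "GO:0016301", "IDA"),
    ("NDK1", "GO:0016301", "IDA"),
]


def wholes(term: str, edges: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable from `term` by following part_of edges."""
    found: Set[str] = set()
    stack = list(edges.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(edges.get(current, []))
    return found


def propagate(direct, edges) -> Set[Tuple[str, str, str]]:
    """Return (gene, term, status) triples, status 'direct' or 'inferred'."""
    result: Set[Tuple[str, str, str]] = set()
    for gene, term, _evidence in direct:
        result.add((gene, term, "direct"))
        for whole in wholes(term, edges):
            result.add((gene, whole, "inferred"))
    return result


for row in sorted(propagate(DIRECT, PART_OF)):
    print(row)
```

Under these assumptions each kinase-activity annotation yields one inferred phosphorylation annotation, which is how the gap counted on slide 16 could be filled.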

  22. Filling in annotation gaps GO:0016310 phosphorylation GO:0016301 kinase activity 2009

  23. The H word—2011 time divergence • Characters in common are due to inheritance • Allows inferences about common ancestor

  24. Evolution of MSH2 subfamily: biological process • Somatic hypermutation of immunoglobulin genes • Apoptosis • Maintenance of DNA repeats • Homologous recombination • DNA repair

  25. Ancestral inference E.c. Biochemistry: purification and assay A.t. MTHFR1 A.t. MTHFR2 D.d. S.p. S.c. MET13 S.p. S.c. MET12 C.e. D.m. A.g. D.r. G.g. H.s. MTHFR R.n. M.m. Genetics: mutant phenotypes divergence • Integration at points of common ancestry • Infer “hidden” character of living organisms • Explicitly leverage evolutionary relationships

  26. Integrating different GO annotations • PAINT: Phylogenetic Annotation and Inference Tool

  27. The Gene Ontology project • Annotated now • The importance of stress-testing • Don’t delay, use your ontology today • Do no harm (KISS) • i.e. Target the low hanging fruit, work on the obvious, high-confidence steps • Collaborate on concrete projects • Focusing the mind

  28. Scoping 2009 • The ontology has a clearly specified and clearly delineated content. SGD MGD FlyBase GO

  29. Decisions to make the work easier • Provide definitions for everything • Intelligible ontologies are more useful • To humans (for annotation) and • To machines (for searching, reasoning and error-checking) • Use content-free unique identifiers • Drive all semantics away from tracking • Don’t confuse the representational technology with the conceptual modeling

  30. Implicit ontologies within the GO: • cysteine biosynthesis (ChEBI) • myoblast fusion (Cell Type Ontology) • hydrogen ion transporter activity (ChEBI) • snoRNA catabolism (Sequence Ontology) • wing disc pattern formation (Drosophila anatomy) • epidermal cell differentiation (Cell Type Ontology) • regulation of flower development (Plant anatomy) • B-cell differentiation (Cell Type Ontology)

  31. Implicit anatomy ontology within the GO: • brain development • hindbrain development • metencephalon development • pons development • trigeminal motor nucleus development
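The "implicit ontology" point can be made concrete by stripping the shared process suffix from the term names on this slide: what remains is a list of anatomical structures. This is a deliberately naive string-based sketch; the function name and the suffix handling are assumptions, and it does not say how the structures relate to one another.

```python
# Term names as listed on the slide.
GO_TERMS = [
    "brain development",
    "hindbrain development",
    "metencephalon development",
    "pons development",
    "trigeminal motor nucleus development",
]

SUFFIX = " development"


def implicit_anatomy(terms):
    """Return the anatomical structure embedded in each '<X> development' name."""
    return [t[: -len(SUFFIX)] for t in terms if t.endswith(SUFFIX)]


print(implicit_anatomy(GO_TERMS))
```

Each recovered name is a candidate for a term in a separate anatomy ontology rather than something the GO has to define itself.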

  32. of is bearer of has part Alpha-Synuclein Mouse number Lewy body Substantia nigra Ischemic Mouse is bearer of number of Condensed Mitochondrion Condensed Mitochondrion Nucleus Golgi Apparatus Condensed Mitochondrion Lysosome Condensed Mitochondrion Dark Material Orthodox Mitochondrion

  33. Common Interest • Sociology—to enlist the community, the ontology must meet each individual group’s immediate needs. • Too many people => Too many requirements • Outstanding problems • Closing the loop between ontology construction and ontology application • QC improvements • Prioritizing tasks • Visualization • …

  34. A cast of thousands
