The Gene Ontology project and its application to fission yeast functional genomics data

Presentation Transcript


  1. The Gene Ontology project and its application to fission yeast functional genomics data Valerie Wood

  2. Introduction to the Gene Ontology (GO) project • What is GO? (requirement, implementation) • How does it work? (annotation and ontology development) • What can I use it for? (applications) • How can I use it? Practical exercises • Tools for using GO for data analysis • Data mining the fission yeast genome data

  3. Gene Ontology Why?

  4. Gene 1 mRNA export protein phosphorylation transcription mitotic cell cycle … Gene 2 mRNA export DNA recombination RNA elongation (pol II) … Traditional analysis • requires literature searching • gene by gene basis • time-consuming

  5. Gene 1 mRNA export protein phosphorylation transcription mitotic cell cycle … Gene 2 mRNA export DNA recombination RNA elongation (pol II) … Not scalable! Gene 3 mRNA export transcription (pol II) … Gene 4 mRNA export transcription polyadenylation … Gene 5 mRNA export RNA elongation … Gene 6 mRNA export rRNA transcription DNA topological change … Gene 5000 cell cycle chromosome segregation kinetochore assembly protein localization …

  6. Help! The problem gets bigger and bigger and bigger! http://www.teamtechnology.co.uk/f-scientist.jpg

  7. What is the size of the ‘annotation problem’? The literature corpus: • Fission yeast + pombe gives 8170 results • Including cell cycle gives 3467 • Including DNA repair gives 555 How will we ever extract all of this information?

  8. Grouping by process • cell cycle: Gene 1, Gene 7, Gene 8, … • transcription: Gene 1, Gene 2, Gene 3, Gene 4, Gene 5, … • mRNA export: Gene 1, Gene 2, Gene 3, Gene 4, Gene 5 • protein phosphorylation: Gene 1, Gene 7, Gene 10, … • cell wall organization and biogenesis: Gene 10, Gene 15, Gene 18, …
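The regrouping from a gene-by-gene view to a process-by-process view can be sketched in a few lines of Python. This is a minimal illustration only: the two genes and their terms are taken from the earlier example slides, and the function name `group_by_process` is hypothetical.

```python
from collections import defaultdict

# Per-gene annotations for Gene 1 and Gene 2, as listed on the example slides.
gene_to_processes = {
    "Gene 1": ["mRNA export", "protein phosphorylation",
               "transcription", "mitotic cell cycle"],
    "Gene 2": ["mRNA export", "DNA recombination", "RNA elongation (pol II)"],
}


def group_by_process(annotations):
    """Invert a gene -> processes mapping into a process -> genes mapping."""
    grouped = defaultdict(list)
    for gene, processes in annotations.items():
        for process in processes:
            grouped[process].append(gene)
    return dict(grouped)


process_to_genes = group_by_process(gene_to_processes)
```

Looking up `process_to_genes["mRNA export"]` then returns every gene sharing that process in one step, which is what makes the grouped view scale to thousands of genes.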

  9. GO can be used to spot patterns in the thousands of genes typically obtained from functional genomics experiments. [Figure: time course, control vs. attacked; cluster labels: defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes, puparial adhesion, molting cycle, hemocyanin, amino acid catabolism, lipid metabolism, peptidase activity, protein catabolism.] Credit: Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.

  10. A controlled vocabulary GO is also necessary for handling different terminology used between and within scientific communities: • Different phrases have the same or related meanings • The same phrase is used to describe different ‘entities’

  11. Different phrases, one term: “late endosome to vacuole transport”, “MVB sorting” and “multivesicular body sorting” all map to late endosome to vacuole transport ; GO:0045324
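One way to picture how a controlled vocabulary resolves different phrases to one term is a synonym lookup table. The sketch below uses only the three phrases and the GO ID shown on this slide; the table `SYNONYMS` and the function `normalize` are hypothetical names, not part of any GO tool.

```python
# Phrases from the slide, lower-cased, all pointing at the same GO ID.
SYNONYMS = {
    "late endosome to vacuole transport": "GO:0045324",
    "mvb sorting": "GO:0045324",
    "multivesicular body sorting": "GO:0045324",
}


def normalize(phrase):
    """Return the GO ID for a known phrase, or None if it is not in the table."""
    return SYNONYMS.get(phrase.strip().lower())


go_id = normalize("MVB sorting")
unknown = normalize("bud initiation")  # ambiguous phrase, not in the table
```

An ambiguous phrase such as "bud initiation" deliberately has no entry here: it would need a more specific term (tooth, cellular or flower bud initiation) before it could be mapped.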

  12. Bud initiation?

  13. Bud initiation? tooth bud initiation cellular bud initiation flower bud initiation

  14. So what is GO? • GO provides a “controlled vocabulary” for biological knowledge that can be interpreted identically both within and between genomes • It is species independent, therefore enabling cross-species comparisons • It provides a way to capture and represent biological knowledge in a computable form

  15. Gene Ontology Content and structure

  16. What is Ontology? • Dictionary: A branch of metaphysics concerned with the nature and relations of being. • In philosophy, the most fundamental branch of metaphysics. It studies being or existence as well as the basic categories thereof—trying to find out what entities and what types of entities exist. – Wikipedia 1606 1700s

  17. So what does that mean? From a practical view, an ontology is the representation of something we know about. “Ontologies” consist of a representation of things that are detectable or directly observable, and the relationships between those things. Ontologies provide controlled, consistent vocabularies to describe concepts and relationships, thereby enabling knowledge sharing (Gruber 1993). [Diagram label: is part of]

  18. Ontology Includes: • A vocabulary of terms (names for concepts) • Definitions • Defined logical relationships to each other

  19. What information might we want to capture about a gene product? • GO is divided into three parts: • What does the gene product do? (molecular function) • Where and when does it act? (cellular component) • Why does it perform these activities? (biological process)

  20. Cellular Component • where a gene product acts (location or complex) Images from http://microscopy.fsu.edu

  21. Molecular Function • What a gene product does (activity) • Examples: insulin binding, insulin receptor activity, drug transporter activity, glucose-6-phosphate isomerase activity

  22. Biological Process • Broad objective or goal • Examples: transcription, cell division, gluconeogenesis

  23. Analogy: Gene Product = hammer. Function (what) → Process (why): • Drive stake (into soil) → Gardening • Drive nail (into wood) → Carpentry • Smash roach → Pest Control

  24. Ontology Structure • The Gene Ontology is structured as a directed acyclic graph (DAG) • A DAG is similar to a hierarchy except terms can have more than one parent • Terms can have zero, one or more children • Terms are linked by two relationships • is-a • part-of

  25. Parent-Child Relationships • DAG (Directed Acyclic Graph): many-to-many parental relationship; each child may have one or more parents • Hierarchy: one-to-many parental relationship; each child has only one parent

  26. cell membrane chloroplast mitochondrial chloroplast membrane membrane is-a part-of Ontology Structure

  27. Ontology structure • This allows the modelling of biology more realistically than a hierarchy

  28. Ontology structure An important feature of GO is that broader parents give rise to more specific children. When a gene is annotated to a term, it is automatically annotated to all of its parent terms. This allows curators to assign terms at different levels of granularity, depending on what is known or can be inferred. [Diagram label: gene A]
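The propagation of an annotation to all parent terms can be sketched as a walk up the DAG. The term names below come from the ontology-structure slide, but the specific is-a and part-of edges are assumptions made for illustration, as are the names `PARENTS`, `ancestors` and `propagate`.

```python
# child term -> list of (relationship, parent term).
# Edges are illustrative assumptions, not copied from the ontology.
PARENTS = {
    "chloroplast membrane": [("is-a", "membrane"), ("part-of", "chloroplast")],
    "mitochondrial membrane": [("is-a", "membrane")],
    "membrane": [("part-of", "cell")],
    "chloroplast": [("part-of", "cell")],
    "cell": [],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links from `term`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct_annotations, parents=PARENTS):
    """Extend each gene's direct terms with all of their ancestor terms."""
    implied = {}
    for gene, terms in direct_annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        implied[gene] = full
    return implied


annotations = propagate({"gene A": {"chloroplast membrane"}})
```

Because "chloroplast membrane" has two parents here, gene A picks up terms along both paths, which a strict one-parent hierarchy could not express.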

  29. True Path Rule • Every path from any term back to its top-level parent(s) must always be true (biologically accurate), or the ontology must be revised cell • cytoplasm • chromosome • nuclear chromosome • cytoplasmic chromosome • mitochondrial chromosome • nucleus • nuclear chromosome • is-a • part-of

  30. Anatomy of a GO term
      id: GO:0006094 (unique GO ID)
      name: gluconeogenesis (term name)
      namespace: process (ontology)
      def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. (definition; def source: http://cancerweb.ncl.ac.uk/)
      exact_synonym: glucose biosynthesis (synonym)
      is_a: GO:0006006 (parentage)
      is_a: GO:0006092 (parentage)
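Since each line of a term record is a `tag: value` pair, the record is easy to read into a data structure. The following is a minimal sketch, not a full parser for the ontology file format: it handles only the plain tag-value lines shown on this slide, and `parse_stanza` is a hypothetical helper name.

```python
from collections import defaultdict

# The term record from the slide, one tag-value pair per line.
STANZA = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.
exact_synonym: glucose biosynthesis
is_a: GO:0006006
is_a: GO:0006092
"""


def parse_stanza(text):
    """Split each line at the first ': ' and collect values per tag.

    Values are kept in lists because a tag such as is_a can repeat.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(": ")
        fields[tag].append(value)
    return dict(fields)


term = parse_stanza(STANZA)
```

Here `term["is_a"]` holds both parent IDs, reflecting that a term in a DAG can have more than one parent.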

  31. No GO Areas • GO covers ‘normal’ functions and processes • No pathological processes • No experimental conditions • No evolutionary relationships • A function term refers to a reaction or activity, NOT a gene product • NOT a system of nomenclature for genes

  32. So how does the GO annotation happen?

  33. For each gene ******* GO:******* IDA PMID:******* IDA Read and record paper ***** PMID: ***** Identify GO terms What type of evidence? GO:****

  34. Submit to the GO Consortium

  35. Annotation appears in GO database

  36. Who uses GO?

  37. http://www.geneontology.org

  38. Pias3 Pias4 Pias2 ATSIZ1 MGI TAIR Miz1 RGD Pias3 Pias4 GeneDB S.pombe SGD CST9 pli1 pli1 NFI1 MMS21 nse2 SIZ1 many groups annotate, we see the results of research across species GO:0019789 SUMO ligase activity

  39. Fission yeast GO annotation status

  40. MolecularFunction: Acetyl-CoA CoA-SH Citrate synthase Biological Process: TCA Cycle 7519 Cellular Component: 9459 13494 Fission yeast annotation progress Total 30,616 annotations to 3080 terms Data from 06/06/07

  41. Evidence Codes used
      • IDA, inferred from direct assay: 8618
      • IPI, inferred from physical interaction: 776
      • IGI, inferred from genetic interaction: 901
      • TAS, traceable author statement: 1089
      • IC, inferred by curator: 1073
      • ISS, inferred from sequence similarity: 9045
      • IMP, inferred from mutant phenotype: 1912
      • NAS, non-traceable author statement: 522
      • IEA, inferred from electronic annotation: 6397
      Total: 30333
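The per-code counts can be checked against the stated total with a short tally. The numbers below are copied from this slide; splitting them into IEA and non-IEA is an illustrative choice, and the variable names are hypothetical.

```python
# Annotation counts per evidence code, as given on the slide.
EVIDENCE_COUNTS = {
    "IDA": 8618,
    "IPI": 776,
    "IGI": 901,
    "TAS": 1089,
    "IC": 1073,
    "ISS": 9045,
    "IMP": 1912,
    "NAS": 522,
    "IEA": 6397,
}

total = sum(EVIDENCE_COUNTS.values())

# IEA is the only purely electronic code in this list.
electronic = EVIDENCE_COUNTS["IEA"]
non_electronic = total - electronic
electronic_share = electronic / total
```

The tally reproduces the slide's total of 30333, with roughly a fifth of the annotations carrying the IEA code.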

  42. GO Curation Strategy Manual Curation • Emphasis on Primary Literature (IDA, IMP, IGI, IPI) • Manual inspection of sequence similarity (ISS) Computational Mappings (IEA) • InterPro (domain or family) to GO • UniProt (Swissprot keyword to GO) • E.C. number to GO 1617 PMIDs 15230 annotations 9569 annotations 5815 annotations Data from 06/06/07

  43. GO Curation Progress pombe manual pombe electronic pombe total cerevisiae total Total 30,616 annotations to 3080 GO terms S. cerevisiae has 27662 annotations to 2971 GO terms (no IEA) Data from 06/06/07

  44. Function 3542 (includes protein binding) 993 Biological Process 4019 Cellular Component 4821 GO aspect coverage 18 191 54 3279 (3455) 679 672 14 Total 5004 (5780 S. cerevisiae) All three aspects unknown 105 (564 S. cerevisiae)

  45. A gene product can have several functions, cellular locations and be involved in many processes • Groups of functions make up a biological process • Annotation of a gene product to one ontology is independent from its annotation to other ontologies • Genes with ‘no data’ are annotated to the ‘root node’

  46. Developing GO

  47. Developing GO Adding new terms and biological concepts to the Gene Ontology • GO under constant development • International group of developers (all the major model organism databases contribute) • central editorial office at EBI - 4 members • Developed in consultation with domain experts • Term suggestions handled through online tracking system

  48. Why GO changes • Advances in biology • New organisms join, need new terms • Fix errors and legacy terms • Improve logical consistency • Suggestions for changes come from • the GO editors and organism curators • the user community • Analysis of logical consistency

  49. flybase SGD SGD MGI
