
Introduction to the Gene Ontology and GO Annotation Resources

Introduction to the Gene Ontology and GO Annotation Resources. EBI Bioinformatics Roadshow, 13th June 2012, Rotterdam, Netherlands. Duncan Legge.


Presentation Transcript


  1. Introduction to the Gene Ontology and GO Annotation Resources. EBI Bioinformatics Roadshow, 13th June 2012, Rotterdam, Netherlands. Duncan Legge

  2. OUTLINE OF TUTORIAL: • PART I: Ontologies and the Gene Ontology (GO) • PART II: GO Annotations • How to access GO annotations • How scientists use GO annotations

  3. PART I: Gene Ontology

  4. What does an ontology provide? 1. Consistent terminology – controlled vocabulary. 2. Relationships between terms – hierarchy.

  5. Controlled vocabulary Q: What is a cell? A: It really depends who you ask!

  6. Different things can be described by the same name

  7. The same thing can be described by different names: • Glucose synthesis • Glucose biosynthesis • Glucose formation • Glucose anabolism • Gluconeogenesis

  8. Inconsistency in naming of biological concepts • Same name for different concepts • Different names for the same concept • Comparison is difficult, in particular across species or across databases • Just one reason why the Gene Ontology (GO) is needed…

  9. Why do we need GO? • Inconsistency in naming of biological concepts • Increasing amounts of biological data available • Large datasets need to be interpreted quickly • Increasing amounts of biological data to come

  10. Increasing amounts of biological data available Search on mesoderm development…. you get 9441 results! Expansion of sequence information

  11. What is an ontology? • Dictionary: • A branch of metaphysics concerned with the nature and relations of being (philosophy) • A formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts (computer science) • Barry Smith: • The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.

  12. What is an ontology? • More usefully: • An ontology is the representation of something we know about. "Ontologies" consist of a representation of things that are detectable or directly observable, and the relationships between those things (for example, "is part of").

  13. What’s in an Ontology?

  14. What is the Gene Ontology (GO)? A way to capture biological knowledge in a written and computable form Describes attributes of gene products (RNA and protein)

  15. E. Coli hub http://www.geneontology.org Reactome

  16. The scope of GO • What information might we want to capture about a gene product? • What does the gene product do? • Where does it act? • How does it act?

  17. Biological Process: what does a gene product do? A commonly recognised series of events • transcription • cell division

  18. Cellular Component: where is a gene product located? • plasma membrane • mitochondrion • mitochondrial membrane • mitochondrial matrix • mitochondrial lumen • ribosome • large ribosomal subunit • small ribosomal subunit

  19. Molecular Function: how does a gene product act? • insulin binding • insulin receptor activity • glucose-6-phosphate isomerase activity

  20. Three separate ontologies or one large one? • GO was originally three completely independent hierarchies, with no relationships between them • Since 2009, relationships between biological process and molecular function terms have been added to the live ontology

  21. [Figure: Process and Function terms linked by "part of" and "is a" relationships]

  22. GO is: • species independent • a description of normal processes • GO is NOT: • a catalogue of pathological/disease processes • a record of experimental conditions • a description of evolutionary relationships • a nomenclature system

  23. Aims of the GO project • Edit the ontologies • Annotate gene products using ontology terms • Provide a public resource of data and tools

  24. Anatomy of a GO term Unique identifier Term name Definition Synonyms Cross-references
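The five parts of a term listed on this slide can be pictured as a small record. This is only an illustrative sketch: the class and field names are assumptions made for this example, not an official GO data model, and the definition string is a placeholder rather than the official GO text. The identifier and name are the ones used elsewhere in this tutorial.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GOTerm:
    """One ontology term, mirroring the five parts listed on the slide."""
    go_id: str                                         # unique identifier
    name: str                                          # term name
    definition: str                                    # free-text definition
    synonyms: List[str] = field(default_factory=list)  # alternative names
    xrefs: List[str] = field(default_factory=list)     # cross-references


# Placeholder definition: consult the ontology itself for the real text.
term = GOTerm(
    go_id="GO:0006869",
    name="lipid transport",
    definition="<placeholder definition>",
)
print(term.go_id, term.name)
```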

  25. Ontology structure • Nodes = terms in the ontology • Edges = relationships between the concepts • [Figure: nodes joined by edges, running from a less specific node to more specific nodes] • GO is structured as a hierarchical directed acyclic graph (DAG) • Terms can have more than one parent and zero, one or more children • Terms are linked by relationships, which add to the meaning of the term

  26. Relationships between GO terms • is_a • part_of • regulates • positively regulates • negatively regulates • has_part

  27. is_a • If A is a B, then A is a subtype of B • mitotic cell cycle is a cell cycle • lyase activity is a catalytic activity • Transitive relationship: can infer up the graph
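Because is_a is transitive, every ancestor of a term can be collected by walking upward through its parents. The sketch below shows this on a toy graph. The first edge of each chain comes from the slide; the links to the root terms are a simplification for illustration (the real ontology has intermediate terms), and the function name is made up for this example.

```python
from typing import Dict, List, Set

# Toy is_a edges (child -> parents). Links to the roots are simplified.
IS_A: Dict[str, List[str]] = {
    "mitotic cell cycle": ["cell cycle"],
    "cell cycle": ["biological_process"],
    "lyase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
}


def is_a_ancestors(term: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable from `term` by following is_a edges upward."""
    found: Set[str] = set()
    stack = list(edges.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(edges.get(parent, []))
    return found


print(sorted(is_a_ancestors("mitotic cell cycle", IS_A)))
```

The `found` set also guards against visiting a shared ancestor twice, which matters in a DAG where a term can have several parents.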

  28. part_of • Necessarily part of • Wherever B exists, it is as part of A, but A does not necessarily contain a B • Transitive relationship (can infer up the graph)
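Inferring up the graph often means crossing a mix of is_a and part_of edges. The usual composition rule is that a path made only of is_a edges gives is_a, while any path that also contains part_of gives part_of. The helper below is a minimal sketch of that rule; the function name is an assumption for this example, and it deliberately handles only these two relations.

```python
from typing import Optional


def compose(first: str, second: str) -> Optional[str]:
    """Combine two consecutive relations on a path going up the graph.

    Handles only is_a and part_of; any other relation returns None.
    """
    allowed = {"is_a", "part_of"}
    if first not in allowed or second not in allowed:
        return None
    # is_a then is_a stays is_a; anything involving part_of yields part_of.
    return "is_a" if first == second == "is_a" else "part_of"


# Toy path: mitochondrial matrix part_of mitochondrion, and mitochondrion
# is_a organelle (simplified), so the matrix is part_of an organelle.
print(compose("part_of", "is_a"))
```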

  29. regulates • One process directly affects another process or quality • Necessarily regulates: if both A and B are present, B always regulates A, but A may not always be regulated by B

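  30. has_part • The relationship points in the opposite direction to is_a and part_of: from the whole to the part • Necessarily has part: if A has_part B, A always has B as a part, but B may exist without A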

  31. is_a complete • For all terms in the ontology, you have to be able to reach the root through a complete path of is_a relationships: • we call this being is_a complete • important for reasoning over the ontology, and ontology development
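A check for is_a completeness can be sketched as a reachability test that follows only is_a edges. The graph below is entirely hypothetical (terms "A", "B", "C" and "root" are invented for illustration), as are the function names.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (relation, parent)

# Hypothetical toy graph: term "C" has only a part_of parent.
GRAPH: Dict[str, List[Edge]] = {
    "root": [],
    "A": [("is_a", "root")],
    "B": [("is_a", "A"), ("part_of", "root")],
    "C": [("part_of", "A")],
}


def reaches_root_via_is_a(term: str, graph: Dict[str, List[Edge]], root: str) -> bool:
    """True if `root` can be reached from `term` using is_a edges only."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parent for rel, parent in graph.get(current, []) if rel == "is_a")
    return False


def incomplete_terms(graph: Dict[str, List[Edge]], root: str) -> List[str]:
    """List the terms that break is_a completeness."""
    return sorted(t for t in graph if not reaches_root_via_is_a(t, graph, root))


print(incomplete_terms(GRAPH, "root"))
```

Here "C" is reported because its only link upward is part_of, so no unbroken is_a path leads to the root.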

  32. True path rule • Child terms inherit the meaning of all their parent terms.
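One practical consequence of the true path rule is that an annotation to a specific term also implies annotation to all of its ancestors. The sketch below propagates a direct annotation upward. The protein accession and the term "response to wounding" appear later in this tutorial; the parent chain is a simplified toy hierarchy and the function name is an assumption.

```python
from typing import Dict, List, Set

# Simplified toy hierarchy (child -> parents), for illustration only.
PARENTS: Dict[str, List[str]] = {
    "response to wounding": ["response to stress"],
    "response to stress": ["response to stimulus"],
    "response to stimulus": ["biological_process"],
}

DIRECT: Dict[str, Set[str]] = {"Q9ARH1": {"response to wounding"}}


def propagate(direct: Dict[str, Set[str]], parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Add every ancestor of each directly annotated term."""
    implied: Dict[str, Set[str]] = {}
    for product, terms in direct.items():
        closure: Set[str] = set()
        stack = list(terms)
        while stack:
            term = stack.pop()
            if term not in closure:
                closure.add(term)
                stack.extend(parents.get(term, []))
        implied[product] = closure
    return implied


for term in sorted(propagate(DIRECT, PARENTS)["Q9ARH1"]):
    print(term)
```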

  33. How is GO maintained? • GO editors and annotators work with experts to remodel specific areas of the ontology • Signaling • Kidney development • Transcription • Pathogenesis • Cell cycle • Deal with requests from the community • database curators, researchers, software developers • Some simple requests can be dealt with automatically • GO Consortium meetings for large changes • Mailing lists, conference calls, content workshops

  34. Requesting changes to the ontology • Public SourceForge (SF) tracker for term-related issues: https://sourceforge.net/projects/geneontology/

  35. Why modify the GO? • GO reflects current knowledge of biology • Information from new organisms can make existing terms and arrangements incorrect • Not everything perfect from the outset • Improving definitions • Adding in synonyms and extra relationships

  36. Searching for GO terms http://www.ebi.ac.uk/QuickGO/ http://amigo.geneontology.org … there are more browsers available on the GO Tools page: http://www.geneontology.org/GO.tools.browsers.shtml The latest OBO Gene Ontology file can be downloaded from: http://www.geneontology.org/ontology/gene_ontology.obo
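Once downloaded, the OBO file is plain text organised into `[Term]` stanzas of `tag: value` lines, so a first look at it needs very little code. The sketch below parses a small hand-written sample rather than the real file. The sample stanzas are illustrative only: the two identifiers and names are taken from this tutorial, and the `is_a` target `GO:0000000` is a deliberately hypothetical placeholder. The parser ignores escaping rules and other stanza types, so it is a starting point rather than a full OBO reader.

```python
from typing import Dict, List, Optional

# Hand-written sample in OBO style; not copied from the real file.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0006869
name: lipid transport
namespace: biological_process
is_a: GO:0000000 ! hypothetical parent term

[Term]
id: GO:0004069
name: aspartate transaminase activity
namespace: molecular_function
"""


def parse_terms(text: str) -> List[Dict[str, List[str]]]:
    """Collect the tag/value pairs of every [Term] stanza."""
    terms: List[Dict[str, List[str]]] = []
    current: Optional[Dict[str, List[str]]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ", 1)[0]  # drop a trailing "! comment"
            current.setdefault(tag, []).append(value)
    return terms


for term in parse_terms(SAMPLE_OBO):
    print(term["id"][0], term["name"][0])
```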

  37. Exercise • Browsing the Gene Ontology using QuickGO • Exercise 1 • 15 mins

  38. PART II: GO Annotation

  39. A GO annotation is… A statement that a gene product: 1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component 2. as determined by a particular type of evidence 3. as described in a particular reference
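The three parts of this statement map naturally onto a small record: what is annotated, with which term, on what evidence, and from which reference. The sketch below is one way to picture it. The class and field names are assumptions for this example, not the official annotation file format. The accession, GO identifier and PubMed ID come from the PERK1 example in this tutorial, while the evidence code shown is an assumed value chosen only for illustration.

```python
from dataclasses import dataclass

ASPECTS = {
    "F": "molecular function",
    "P": "biological process",
    "C": "cellular component",
}


@dataclass(frozen=True)
class GOAnnotation:
    gene_product: str  # e.g. a UniProt accession
    go_id: str         # the GO term being asserted
    aspect: str        # "F", "P" or "C"
    evidence: str      # evidence code
    reference: str     # where the evidence is described

    def describe(self) -> str:
        return (
            f"{self.gene_product} -> {self.go_id} ({ASPECTS[self.aspect]}), "
            f"evidence {self.evidence}, reference {self.reference}"
        )


# The evidence code "IDA" is an assumption made for this illustration.
example = GOAnnotation("Q9ARH1", "GO:0004674", "F", "IDA", "PMID:12374299")
print(example.describe())
```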

  40. Evidence codes (http://www.geneontology.org/GO.evidence.shtml) • IDA: enzyme assay • IPI: e.g. Y2H • BLASTs, orthology comparison, HMMs • subcategories of ISS • review papers

  41. GO evidence code decision tree

  42. GOA makes annotations using two methods • Electronic • Quick way of producing large numbers of annotations • Annotations are less detailed • Manual • Time-consuming process producing lower numbers of annotations • Annotations are very detailed and accurate

  43. Electronic annotation by GOA • 1. Mapping of external concepts to GO terms • InterPro2GO (protein domains) • SPKW2GO (UniProt/Swiss-Prot keywords) • HAMAP2GO (Microbial protein annotation) • EC2GO (Enzyme Commission numbers) • SPSL2GO (Swiss-Prot subcellular locations)

  44. Electronic annotation by GOA • aspartate transaminase activity; GO:0004069 • lipid transport; GO:0006869
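The mapping approach amounts to a lookup table: if a protein already carries an external concept that has a GO equivalent, the GO term is attached automatically. The sketch below illustrates the idea. The protein accessions, the key format of the mapping table and the function name are all hypothetical, and the pairing of each key with its GO term is an assumption for illustration. The two GO identifiers are the ones shown on the slide, and "IEA" is the evidence code used for electronically inferred annotations.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpt of an external-concept-to-GO mapping table.
EXTERNAL2GO: Dict[str, Tuple[str, str]] = {
    "EC:2.6.1.1": ("GO:0004069", "aspartate transaminase activity"),
    "KW:Lipid transport": ("GO:0006869", "lipid transport"),
}

# Hypothetical proteins and the external concepts already attached to them.
PROTEIN_FEATURES: Dict[str, List[str]] = {
    "PROT_A": ["EC:2.6.1.1"],
    "PROT_B": ["KW:Lipid transport", "KW:Unmapped keyword"],
}


def electronic_annotations(
    features: Dict[str, List[str]],
    mapping: Dict[str, Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (protein, GO id, evidence code) for every mapped concept."""
    annotations = []
    for protein, concepts in features.items():
        for concept in concepts:
            if concept in mapping:
                go_id, _name = mapping[concept]
                annotations.append((protein, go_id, "IEA"))
    return annotations


for row in electronic_annotations(PROTEIN_FEATURES, EXTERNAL2GO):
    print(row)
```

Concepts with no entry in the table are simply skipped, which is one reason electronic annotation is fast but less detailed than manual curation.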

  45. Electronic annotation by GOA • 2. Automatic transfer of annotations to orthologs

  46. Manual annotation by GOA • High-quality, specific annotations using: • Peer-reviewed papers • A range of evidence codes to categorize the types of evidence found in a paper www.ebi.ac.uk/GOA

  47. Finding annotations in a paper …for B. napus PERK1 protein (Q9ARH1): "In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…" (PubMed ID: 12374299) • Highlighted phrases: serine/threonine kinase activity; integral membrane protein; wound response • Function: protein serine/threonine kinase activity GO:0004674 • Component: integral to plasma membrane GO:0005887 • Process: response to wounding GO:0009611

  48. Additional information • Qualifiers: modify the interpretation of an annotation • NOT (protein is not associated with the GO term) • colocalizes_with (protein associates with complex but is not a bona fide member) • contributes_to (describes action of a complex of proteins) • 'With' column: can include further information on the method being referenced, e.g. the protein accession of an interacting protein

  49. The NOT qualifier • NOT is used to make an explicit note that the gene product is not associated with the GO term • Also used to document conflicting claims in the literature • NOT can be used with ALL three gene ontologies

  50. In these cells, SIPP1 was mainly present in the nucleus, where it displayed a non-uniform, speckled distribution and appeared to be excluded from the nucleoli. excluded from the nucleoli
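When annotations are processed by a program, the NOT qualifier has to be checked explicitly, otherwise a negative statement is read as a positive one. The sketch below shows one way the SIPP1 sentence might be captured and then split into positive and negated locations. The record layout and function name are assumptions for this example, and the two annotations are an illustrative reading of the sentence rather than the curated entries.

```python
from typing import Dict, List, Tuple

# Illustrative reading of the sentence above, not the curated annotations.
ANNOTATIONS: List[Dict[str, str]] = [
    {"product": "SIPP1", "go_id": "GO:0005634", "term": "nucleus", "qualifier": ""},
    {"product": "SIPP1", "go_id": "GO:0005730", "term": "nucleolus", "qualifier": "NOT"},
]


def split_by_not(annotations: List[Dict[str, str]], product: str) -> Tuple[List[str], List[str]]:
    """Return (positive terms, negated terms) for one gene product."""
    positive: List[str] = []
    negated: List[str] = []
    for ann in annotations:
        if ann["product"] != product:
            continue
        # Several qualifiers may be joined with "|", so test membership.
        qualifiers = ann["qualifier"].split("|")
        (negated if "NOT" in qualifiers else positive).append(ann["term"])
    return positive, negated


located, excluded = split_by_not(ANNOTATIONS, "SIPP1")
print("located in:", located)
print("NOT located in:", excluded)
```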
