The Gene Ontology Annotation (GOA) Database and enhancement of GO annotations through InterPro2GO

Presentation Transcript


  1. The Gene Ontology Annotation (GOA) Database and enhancement of GO annotations through InterPro2GO Nicky Mulder mulder@ebi.ac.uk

  2. Contents • Introduction to GOA • Manual GOA annotation • Electronic annotation: InterPro2GO • GOA data flow • Uses of GOA • Future plans

  3. What is GO annotation? • An annotation is a statement that a gene product: • has a particular molecular function • is involved in a particular biological process • is located within a certain cellular component • …as determined by a particular method • …as described in a particular reference. Each annotation therefore records a GO term ID, an evidence code and a reference.
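
A GO annotation can be pictured as a small record holding the pieces named on this slide. The Python sketch below is illustrative only: the class name `GOAnnotation`, its field names and the example identifiers (`EXAMPLE1`, `PMID:12345678`) are hypothetical; the one-letter aspect codes F, P and C are the ones used in GO gene association files.

```python
from dataclasses import dataclass

# Hypothetical wording for the three GO aspects, keyed by their one-letter codes.
VERBS = {
    "F": "has molecular function",
    "P": "is involved in biological process",
    "C": "is located in cellular component",
}

@dataclass(frozen=True)
class GOAnnotation:
    """One annotation: gene product + GO term + evidence code + reference."""
    protein_id: str     # gene product, e.g. a UniProtKB accession (hypothetical here)
    go_id: str          # GO term ID, e.g. "GO:0004022"
    aspect: str         # "F", "P" or "C"
    evidence_code: str  # method, e.g. "IDA" (Inferred from Direct Assay) or "IEA"
    reference: str      # source, e.g. a PubMed ID

    def __post_init__(self):
        # Reject records that cannot be valid GO annotations.
        if self.aspect not in VERBS:
            raise ValueError(f"unknown aspect {self.aspect!r}")
        if not self.go_id.startswith("GO:"):
            raise ValueError(f"not a GO ID: {self.go_id!r}")

    def describe(self) -> str:
        return (f"{self.protein_id} {VERBS[self.aspect]} {self.go_id} "
                f"[{self.evidence_code}, {self.reference}]")

example = GOAnnotation("EXAMPLE1", "GO:0004022", "F", "IDA", "PMID:12345678")
print(example.describe())  # EXAMPLE1 has molecular function GO:0004022 [IDA, PMID:12345678]
```

Making the record immutable (`frozen=True`) keeps an annotation tied to the evidence and reference it was created with.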

  4. Gene Ontology Annotation (GOA) Database • GOA’s priority is to annotate the human, mouse and rat proteomes • Largest open-source contributor of annotations to GO • Provides 10 million annotations for more than 111,000 species • Shares and integrates GO annotations

  5. How do we annotate GO terms? • Manual annotation • Electronic annotation • All annotations must: • be attributed to a source • indicate what evidence was found to support the GO term-gene/protein association

  6. Manual annotation • High quality • Specific gene or gene product associations made using: • Peer-reviewed papers • Evidence codes • BUT: • Time-consuming • Requires trained biologists

  7. Manual GO annotation • Read papers → find GO term → annotate to protein (PubMed ID, evidence code) → Oracle RDBMS → GOA association file → GO and EBI ftp sites

  8. Protein2GO tool Online

  9. Information captured by GOA

  10. How successful is manual GOA? [Chart: 111,740 taxa, July 2006]

  11. GO Electronic Annotation • Large-scale assignment of GO terms to UniProtKB entries using existing information within database entries and manual mappings • These annotations get the IEA (Inferred from Electronic Annotation) evidence code • Diagram: information in UniProt entries (InterPro, keywords, HAMAP, EC) → curated or electronic rule-based mappings → high-quality electronic protein-to-GO associations • Example curated mapping: EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022
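
The mapping-based assignment on this slide can be sketched in a few lines of Python. This is a hypothetical illustration, not the GOA pipeline: it assumes mapping lines laid out like the slide's example (`EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022`), assumes lines starting with `!` are comments, and represents each UniProtKB entry simply as the set of external identifiers (EC numbers, keywords, InterPro accessions) it contains. The function names are our own.

```python
import re

# Assumed line layout, following the slide's example:
#   <external ID> > GO:<term name> ; <GO ID>
MAPPING_LINE = re.compile(
    r"^(?P<ext>\S+)\s*>\s*GO:(?P<name>.+?)\s*;\s*(?P<go_id>GO:\d{7})\s*$"
)

def parse_mapping(lines):
    """Build {external ID: [(GO ID, term name), ...]} from mapping lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blanks and assumed '!' comments
            continue
        m = MAPPING_LINE.match(line)
        if m is None:
            raise ValueError(f"unparseable mapping line: {line!r}")
        mapping.setdefault(m["ext"], []).append((m["go_id"], m["name"]))
    return mapping

def electronic_annotations(entries, mapping):
    """entries: {protein ID: set of external IDs found in that entry}.
    Returns sorted (protein ID, GO ID, evidence code, source ID) tuples,
    every one tagged IEA because no curator reviewed the individual protein."""
    result = set()
    for protein, xrefs in entries.items():
        for ext in xrefs:
            for go_id, _name in mapping.get(ext, []):
                result.add((protein, go_id, "IEA", ext))
    return sorted(result)
```

Keeping the source identifier in each tuple records where the electronic annotation came from, which the slide requires of every annotation.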

  12. www.uniprot.org/

  13. Mappings of external concepts to GO http://www.geneontology.org/GO.indices.shtml

  14. InterPro2GO mapping • InterPro is a resource that integrates protein signature databases, e.g. Pfam, PRINTS, PROSITE, ProDom, SMART, TIGRFAMs, etc. • It provides a means of classifying proteins into families and identifying domains. • Each InterPro entry groups proteins that belong to the same family and potentially have the same function

  15. InterPro2GO mapping • Done manually, but with the help of tools • Look at the InterPro entry and the protein annotation • For all Swiss-Prot proteins that truly match the entry: • Get statistics on DE (description) lines, keywords and comments • Check how conserved the common annotation is • Find the appropriate GO term at the most specific level that applies to all proteins (not necessarily domains)
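
The conservation check in the last two steps amounts to counting how many of the matching proteins share each piece of annotation. The functions below (`annotation_conservation`, `shared_annotations`) are a hypothetical sketch of that idea, not the SQUID tool; the default `min_fraction=1.0` encodes the rule that the chosen term must apply to all proteins.

```python
from collections import Counter

def annotation_conservation(protein_annotations):
    """protein_annotations: {protein ID: set of keywords (or DE/comment terms)}.
    Returns {annotation: fraction of proteins that carry it}."""
    n = len(protein_annotations)
    if n == 0:
        return {}
    # set() guards against counting a repeated term twice for one protein.
    counts = Counter(a for anns in protein_annotations.values() for a in set(anns))
    return {a: c / n for a, c in counts.items()}

def shared_annotations(protein_annotations, min_fraction=1.0):
    """Annotations carried by at least min_fraction of the proteins,
    most conserved first (ties broken alphabetically)."""
    frac = annotation_conservation(protein_annotations)
    kept = [(a, f) for a, f in frac.items() if f >= min_fraction]
    return sorted(kept, key=lambda af: (-af[1], af[0]))
```

Lowering `min_fraction` shows near-universal annotations that a curator might investigate before deciding whether a more general GO term is the safer choice.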

  16. Tools used: "SQUID" • Statistics options: keyword, description, gene name, organism, comments, etc.

  17. SQUID statistics output

  18. SQUID statistics output

  19. InterPro2GO mapping in entry

  20. InterProScan output with GO terms

  21. InterPro2GO sanity checks • Run weekly • Reports: • Obsolete GO terms • Obsolete (deleted) IPRs • Secondary IPRs
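
A check like this weekly report reduces to set lookups over the mapping. In this hypothetical sketch the mapping is a list of `(IPR accession, GO ID)` pairs, and the sets of obsolete GO terms, deleted IPR accessions and secondary-to-primary IPR accessions are assumed to be supplied from the current GO and InterPro releases; the function name and report layout are illustrative.

```python
def sanity_check(interpro2go, obsolete_go, deleted_ipr, secondary_ipr):
    """interpro2go: iterable of (IPR accession, GO ID) pairs.
    obsolete_go: set of GO IDs that are now obsolete.
    deleted_ipr: set of IPR accessions that were deleted.
    secondary_ipr: {secondary IPR accession: current primary accession}.
    Returns a report of mappings that need curator attention."""
    report = {"obsolete_go": [], "deleted_ipr": [], "secondary_ipr": []}
    for ipr, go_id in interpro2go:
        if go_id in obsolete_go:
            report["obsolete_go"].append((ipr, go_id))
        if ipr in deleted_ipr:
            report["deleted_ipr"].append((ipr, go_id))
        elif ipr in secondary_ipr:
            # Suggest the primary accession the mapping should move to.
            report["secondary_ipr"].append((ipr, secondary_ipr[ipr], go_id))
    return report
```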

  22. Quality of GO mapping
      • BioCreAtIvE test set: 635 GO annotations through InterPro2GO
          Exact term: 151 (24%)
          Same lineage, < granularity: 273 (43%)
          Same lineage, > granularity: 24 (4%)
          New lineage: 187 (29%)
          Minimal correct: 424 (67%)
          Potentially incorrect: 211 (33%)
          Precision: 67-100%
      • Manually checked 44 proteins, 107 predictions: 97 correct (90%): 40 exact, 57 same lineage; 10 new lineage (unknown); 0 incorrect
      Camon et al., 2005, BMC Bioinformatics
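
The percentages in this evaluation follow from the four category counts, and the two summary rows are sums of them (151 + 273 = 424; 24 + 187 = 211). The short Python check below reproduces the rounded figures. Reading the 67-100% precision range as "only the minimal-correct rows count" versus "all rows count" is an interpretation, not something the slide states.

```python
# Category counts from the BioCreAtIvE evaluation of 635 InterPro2GO annotations.
counts = {
    "Exact term": 151,
    "Same lineage, < granularity": 273,
    "Same lineage, > granularity": 24,
    "New lineage": 187,
}
total = sum(counts.values())

def pct(n):
    """Share of all annotations, rounded to a whole percent."""
    return round(100 * n / total)

minimal_correct = counts["Exact term"] + counts["Same lineage, < granularity"]
potentially_incorrect = counts["Same lineage, > granularity"] + counts["New lineage"]

for name, n in counts.items():
    print(f"{name:30s} {n:4d} {pct(n):3d}%")
print(f"{'Minimal correct':30s} {minimal_correct:4d} {pct(minimal_correct):3d}%")
print(f"{'Potentially incorrect':30s} {potentially_incorrect:4d} {pct(potentially_incorrect):3d}%")

# Assumed reading of the precision range: lower bound counts only the
# minimal-correct rows as correct, upper bound counts every row.
precision_low = pct(minimal_correct)
precision_high = pct(total)
```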

  23. InterPro2GO mapping statistics

  24. How successful is IEA-GOA in general? • Provides large coverage • High quality • However, these annotations often use high-level GO terms and provide little detail. [Chart, June 2006; manual ones: 336237, 70728]

  25. Total GO statistics

  26. GOA data flow Gene association files

  27. Gene Association file format http://www.geneontology.org/GO.annotation.shtml

  28. Example GOA cow file
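
A gene association file is tab-separated text with one annotation per line. The parser below is a minimal sketch assuming the 15-column GAF 1.0 layout and `!`-prefixed header lines; later GAF versions append extra columns, which this sketch ignores. The column key names are this sketch's own labels, not an official API.

```python
# Column order of a GAF 1.0 line (15 tab-separated fields).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by",
]
# Columns that may hold several values separated by '|'.
MULTI_VALUED = {"qualifier", "db_reference", "with_from", "db_object_synonym", "taxon"}

def parse_gaf(lines):
    """Yield one dict per annotation line of a gene association file."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("!"):  # skip blank and header lines
            continue
        fields = line.split("\t")
        if len(fields) < len(GAF_COLUMNS):
            raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}")
        # zip() stops at the last GAF 1.0 column, dropping any extra fields.
        record = dict(zip(GAF_COLUMNS, fields))
        for col in MULTI_VALUED:
            record[col] = [v for v in record[col].split("|") if v]
        yield record
```

Iterating `parse_gaf(open(path))` over a downloaded file (the `path` being whatever local copy you use) yields one dict per annotation, with the multi-valued columns returned as lists.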

  29. Output from the GOA database • New: non-redundant files, based on IPI • GOA Cow • Redundant: GA slim for UniProt + GO slims • Data also available in SRS, UniProt, QuickGO, MODs, Ensembl, etc.

  30. GA Files for Non-redundant species • Non-redundant complete protein set for each proteome is identified (>25% GO coverage) • Includes UniProt, IPI and MOD-specific IDs, e.g. mouse (MGI), rat (RGD), zebrafish (ZFIN) etc. • Xref files available with identifiers from: UniProt, IPI, RefSeq, Ensembl, UniGene etc. ftp://ftp.ebi.ac.uk/pub/databases/GO/goa ftp://ftp.ebi.ac.uk/pub/databases/integr8
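
The ">25% GO coverage" criterion can be sketched as a simple ratio. Here coverage is assumed to mean the share of a proteome's proteins carrying at least one GO annotation; that reading, the function names and the strict `>` comparison are assumptions for illustration rather than GOA's documented procedure.

```python
def go_coverage(proteome, annotated):
    """Fraction of proteins in `proteome` that have at least one GO annotation.
    `annotated` is the set of protein IDs with any annotation."""
    proteome = set(proteome)
    if not proteome:
        return 0.0
    return len(proteome & set(annotated)) / len(proteome)

def species_with_files(proteomes, annotated, threshold=0.25):
    """proteomes: {species: iterable of protein IDs}.
    Returns species whose coverage exceeds the threshold, sorted by name."""
    return sorted(sp for sp, prots in proteomes.items()
                  if go_coverage(prots, annotated) > threshold)
```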

  31. Uses of GOA data • Access protein functional information • Look at relationships between proteins, e.g. IntAct • Connect biological information to gene expression data • Determine the functional composition of a proteome using GO slim
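
Determining functional composition with a GO slim means mapping each specific annotation up the ontology to the broad slim terms that subsume it, then counting proteins per slim term. The sketch below is a hypothetical illustration on a toy ontology given as a `{term: [parent terms]}` dictionary (terms `T1` to `T4` are made up); it treats all parent links alike, whereas a real GO slim mapping has to decide which relationship types, such as is_a and part_of, to follow.

```python
from collections import Counter

def ancestors(term, parents):
    """All terms reachable from `term` via parent links, including `term` itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def slim_profile(annotations, parents, slim_terms):
    """annotations: {protein: set of GO IDs}. Counts proteins per slim term;
    each protein counts at most once per slim term."""
    profile = Counter()
    for terms in annotations.values():
        hits = set()
        for t in terms:
            hits |= ancestors(t, parents) & slim_terms
        profile.update(hits)
    return profile

# Toy ontology: T3 -> T2 -> T1 and T4 -> T1; the slim keeps T1 and T2.
toy_parents = {"T3": ["T2"], "T2": ["T1"], "T4": ["T1"]}
toy_annotations = {"pA": {"T3"}, "pB": {"T4"}, "pC": {"T3", "T4"}}
print(sorted(slim_profile(toy_annotations, toy_parents, {"T1", "T2"}).items()))
# [('T1', 3), ('T2', 2)]
```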

  32. Uses of GOA Find functional information on proteins http://www.ebi.ac.uk/ego

  33. Uses of GOA Find functional information on interacting proteins (IntAct) http://www.ebi.ac.uk/intact

  34. Uses of GOA Overview proteome with GO Slim http://www.ebi.ac.uk/integr8

  35. Uses of GOA: analysis of high-throughput data according to GO • Microarray data analysis: GO classification • Proteomics data analysis: GO classification • Larkin JE et al., Physiol Genomics, 2004; Kislinger T et al., Mol Cell Proteomics, 2003; Cunliffe HE et al., Cancer Res, 2003

  36. Future plans • Continue deep level annotation of human, mouse and rat • Manually annotate splice variants • Outreach and inclusion of new datasets e.g. grape • New electronic mappings, e.g. unipathway2go • Ortholog prediction for electronic GO annotation • Develop tools for annotation training

  37. Acknowledgements • Rolf Apweiler, Head of sequence database group • Evelyn Camon, GOA Coordinator • Daniel Barrell, GOA Programmer • Emily Dimmer, GOA Curator • Rachael Huntley, GOA Curator • David Binns & John Maslen, QuickGO, GOA tools • All EBI UniProtKB Curators, HAMAP (SIB), IntAct, GO Editorial Office @ EBI • All GO Consortium & associate members
