
Introduction to the GO: a user’s guide

Introduction to the GO: a user’s guide. Iowa State Workshop 11 June 2009. All workshop materials are available at AgBase. Genomic Annotation. Genome annotation is the process of attaching biological information to genomic sequences. It consists of two main steps:


Presentation Transcript


  1. Introduction to the GO: a user’s guide. Iowa State Workshop, 11 June 2009

  2. All workshop materials are available at AgBase.

  3. Genomic Annotation
     • Genome annotation is the process of attaching biological information to genomic sequences. It consists of two main steps:
       • identifying functional elements in the genome: “structural annotation”
       • attaching biological information to these elements: “functional annotation”
     • Biologists often use the term “annotation” when they are referring only to structural annotation.

  4. Structural annotation: DNA annotation and protein annotation. [Figure: genome browser views; labels shown: TRAF 1, 2 and 3; TRAF 1 and 2; CHICK_OLF6.] Data from Ensembl Genome browser.

  5. Functional annotation: catenin

  6. Structural & Functional Annotation
     Structural Annotation:
     • Open reading frames (ORFs) predicted during genome assembly
     • predicted ORFs require experimental confirmation
     • the Sequence Ontology (SO) provides a structured controlled vocabulary for sequence annotation
     Functional Annotation:
     • annotation of gene products = Gene Ontology (GO) annotation
     • initially, predicted ORFs have no functional literature and GO annotation relies on computational methods (rapid)
     • functional literature exists for many genes/proteins prior to genome sequencing
     • GO annotation does not rely on a completed genome sequence!

  7. • Provides structural annotation for agriculturally important genomes
     • Provides functional annotation (GO)
     • Provides tools for functional modeling
     • Provides bioinformatics & modeling support for research community

  8. Introduction to GO
     • pre-GO: managing large datasets
     • Bio-ontologies
     • the Gene Ontology (GO)
     • a GO annotation example
     • GO evidence codes
     • literature biocuration & computational analysis
     • ND vs no GO
     • sources of GO

  9. 1. pre-GO: managing large datasets

  10. AgBase User Support
      • Functional modeling training
      • Database ID mapping
        • approx. 75% of requests
      • Providing GO annotation for datasets/arrays
      • Assistance with GO modeling tools
      • Intermediary between research community and public databases
        • NCBI, UniProtKB, GO Consortium
      • Computational assistance

  11. Converting database accessions
      • UniProt database
      • Ensembl BioMart
      • Online analysis tools
        • DAVID, g:profiler, etc.
      • AgBase database
        • ArrayIDer tool
      More information about these tools is available from the online workshop resources.

  12. 1. UniProt ID Mapping

  13. 2. Ensembl BioMart NOTE: Ensembl is scheduled to add plant & microbe species in 2009.

  14. 3. Online analysis tools g:profiler conversion tool http://biit.cs.ut.ee/gprofiler/gconvert.cgi This tool works for all species found in Ensembl.

  15. 3. Online analysis tools Database for Annotation, Visualization and Integrated Discovery (DAVID) http://david.abcc.ncifcrf.gov/conversion.jsp This tool works for a wide range of species.

  16. 4. AgBase: ArrayIDer Contact AgBase to request additional species.
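
Whichever of the conversion tools above is used, the result is typically a two-column table of source and target accessions. The following is a minimal sketch of applying such a table to a dataset and reporting what could not be converted. It assumes a tab-delimited export; the function names and the `SOURCE_ID_*` / `TARGET_ID_*` identifiers are hypothetical placeholders, not output from any of the tools named in the slides.

```python
import csv
import io


def load_mapping(handle):
    """Read a two-column, tab-delimited mapping table into a dict.

    One source accession may map to several target accessions,
    so each value is a list.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, short rows and comment lines.
        if len(row) >= 2 and row[0] and not row[0].startswith("#"):
            mapping.setdefault(row[0], []).append(row[1])
    return mapping


def convert(accessions, mapping):
    """Split accessions into those that map and those that do not."""
    mapped = {acc: mapping[acc] for acc in accessions if acc in mapping}
    unmapped = [acc for acc in accessions if acc not in mapping]
    return mapped, unmapped


# Hypothetical placeholder table standing in for a tool's export file.
EXAMPLE_TABLE = (
    "SOURCE_ID_1\tTARGET_ID_A\n"
    "SOURCE_ID_1\tTARGET_ID_B\n"
    "SOURCE_ID_2\tTARGET_ID_C\n"
)

mapping = load_mapping(io.StringIO(EXAMPLE_TABLE))
mapped, unmapped = convert(["SOURCE_ID_1", "SOURCE_ID_3"], mapping)
```

Keeping the unmapped accessions as a separate list makes it easy to retry them with a second tool.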

  17. 2. Bio-ontologies

  18. Bio-ontologies
      • Bio-ontologies are used to capture biological information in a way that can be read by both humans and computers.
        • necessary for high-throughput “omics” datasets
        • allows data sharing across databases
      • Objects in an ontology (e.g. genes, cell types, tissue types, stages of development) are well defined.
      • The ontology shows how the objects relate to each other.

  19. Bio-ontologies: http://www.obofoundry.org/

  20. Ontologies
      • relationships between terms
      • digital identifier (computers)
      • description (humans)
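
The three parts of an ontology term listed above (identifier, description, relationships) can be illustrated with a small data structure. This is a sketch only: the `Term` class and `ancestors` function are hypothetical names, and the three-term hierarchy is deliberately simplified for illustration; the real GO graph has many more terms and relationship types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Term:
    term_id: str                  # digital identifier (for computers)
    name: str                     # description (for humans)
    is_a: List[str] = field(default_factory=list)  # relationships to parent terms


# Simplified, illustrative fragment (intermediate terms are omitted).
ONTOLOGY: Dict[str, Term] = {
    "GO:0008150": Term("GO:0008150", "biological_process"),
    "GO:0008610": Term("GO:0008610", "lipid biosynthetic process", ["GO:0008150"]),
    "GO:0006633": Term("GO:0006633", "fatty acid biosynthetic process", ["GO:0008610"]),
}


def ancestors(term_id: str, ontology: Dict[str, Term]) -> Set[str]:
    """Collect every term reachable by following is_a links upward."""
    seen: Set[str] = set()
    stack = list(ontology[term_id].is_a)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(ontology[parent].is_a)
    return seen
```

Because the relationships are explicit, a gene product annotated to a specific term can also be found by a query for any of its more general ancestor terms.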

  21. 3. The Gene Ontology

  22. Functional Annotation
      • Gene Ontology (GO) is the de facto method for functional annotation
      • Widely used for functional genomics (high throughput)
      • Many tools available for gene expression analysis using GO
      • The GO Consortium homepage: http://www.geneontology.org

  23. NDUFAB1 GO Mapping Example
      NDUFAB1 (UniProt P52505): Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa
      Biological Process (BP or P)
        GO:0006633  fatty acid biosynthetic process                        TAS
        GO:0006120  mitochondrial electron transport, NADH to ubiquinone   TAS
        GO:0008610  lipid biosynthetic process                             IEA
      Molecular Function (MF or F)
        GO:0005504  fatty acid binding                                     IDA
        GO:0008137  NADH dehydrogenase (ubiquinone) activity               TAS
        GO:0016491  oxidoreductase activity                                TAS
        GO:0000036  acyl carrier activity                                  IEA
      Cellular Component (CC or C)
        GO:0005759  mitochondrial matrix                                   IDA
        GO:0005747  mitochondrial respiratory chain complex I              IDA
        GO:0005739  mitochondrion                                          IEA
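
The NDUFAB1 annotations above can be held as simple records and queried by aspect or by evidence code. This is a minimal sketch: the tuple layout and the function names are assumptions made for illustration, while the GO IDs, term names and evidence codes are taken from the slide.

```python
from collections import defaultdict

# (aspect, GO ID, GO term name, evidence code) for NDUFAB1 (UniProt P52505).
ANNOTATIONS = [
    ("P", "GO:0006633", "fatty acid biosynthetic process", "TAS"),
    ("P", "GO:0006120", "mitochondrial electron transport, NADH to ubiquinone", "TAS"),
    ("P", "GO:0008610", "lipid biosynthetic process", "IEA"),
    ("F", "GO:0005504", "fatty acid binding", "IDA"),
    ("F", "GO:0008137", "NADH dehydrogenase (ubiquinone) activity", "TAS"),
    ("F", "GO:0016491", "oxidoreductase activity", "TAS"),
    ("F", "GO:0000036", "acyl carrier activity", "IEA"),
    ("C", "GO:0005759", "mitochondrial matrix", "IDA"),
    ("C", "GO:0005747", "mitochondrial respiratory chain complex I", "IDA"),
    ("C", "GO:0005739", "mitochondrion", "IEA"),
]


def group_by_aspect(rows):
    """Return {aspect: [GO IDs]} preserving the input order."""
    grouped = defaultdict(list)
    for aspect, go_id, _name, _evidence in rows:
        grouped[aspect].append(go_id)
    return dict(grouped)


def with_evidence(rows, codes):
    """Return the GO IDs whose evidence code is in `codes`."""
    return [go_id for _aspect, go_id, _name, evidence in rows if evidence in codes]


by_aspect = group_by_aspect(ANNOTATIONS)
electronic = with_evidence(ANNOTATIONS, {"IEA"})
```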

  24. NDUFAB1 GO Mapping Example (same annotation listing as the previous slide, with callouts labelling its parts)
      • GO:ID (unique)
      • GO term name
      • GO evidence code
      • aspect or ontology: Biological Process (BP or P), Molecular Function (MF or F), Cellular Component (CC or C)

  25. GO EVIDENCE CODES (shown alongside the NDUFAB1 GO Mapping Example)
      Direct Evidence Codes
      • IDA - inferred from direct assay
      • IEP - inferred from expression pattern
      • IGI - inferred from genetic interaction
      • IMP - inferred from mutant phenotype
      • IPI - inferred from physical interaction
      Indirect Evidence Codes
        inferred from literature
        • IGC - inferred from genomic context
        • TAS - traceable author statement
        • NAS - non-traceable author statement
        • IC - inferred by curator
        inferred by sequence analysis
        • RCA - inferred from reviewed computational analysis
        • IS* - inferred from sequence*
          • ISS - inferred from sequence or structural similarity
          • ISA - inferred from sequence alignment
          • ISO - inferred from sequence orthology
          • ISM - inferred from sequence model
        • IEA - inferred from electronic annotation
      Other
      • NR - not recorded (historical)
      • ND - no biological data available
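
The grouping of evidence codes on this slide can be written as a lookup table, which is handy for filtering annotations by how they were derived. This is a sketch that follows the slide's grouping as transcribed; the group labels and the function name are assumptions for illustration.

```python
# Evidence code groups as laid out on the slide.
EVIDENCE_GROUPS = {
    "direct": {"IDA", "IEP", "IGI", "IMP", "IPI"},
    "indirect: literature": {"IGC", "TAS", "NAS", "IC"},
    "indirect: sequence analysis": {"RCA", "ISS", "ISA", "ISO", "ISM", "IEA"},
    "other": {"NR", "ND"},
}


def evidence_group(code):
    """Return the slide's group for an evidence code, or raise ValueError."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    raise ValueError(f"unknown evidence code: {code}")


# Evidence codes used in the NDUFAB1 example.
ndufab1_groups = {code: evidence_group(code) for code in ("TAS", "IDA", "IEA")}
```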

  26. GO EVIDENCE CODES (same code list as the previous slide, with an added callout)
      • Biocuration of literature
        • detailed function
        • “depth”
        • slower (manual)

  27. Biocuration of Literature: detailed gene function. Find a paper about the protein (example: P05147, PMID: 2976880).

  28. Read paper to get experimental evidence of function
      • Use most specific term possible
      • experiment assayed kinase activity: use IDA evidence code

  29. GO EVIDENCE CODES (same code list as the previous slides, with two callouts)
      • Biocuration of literature
        • detailed function
        • “depth”
        • slower (manual)
      • Sequence analysis
        • rapid (computational)
        • “breadth” of coverage
        • less detailed

  30. Computational GO annotation (“breadth”)
      IEA PIPELINE
      • fasta file of sequences (aa or nt)
      • InterPro analysis (domains/motifs)
      • GO2InterPro mapping file
      • domains/motifs in sequence: assign GO (IEA)
      • no GO: “ND”
      • ga file
      ISO PIPELINE
      • accessions from your species (species 1)
      • public orthology prediction tool(s)
      • 1:1 orthologs
      • existing GO annotations
      • transfer GO annotation to your species (ISO)
      • accessions with no ISO
      • ga file
      (integrate output into one ga file)
      Ranjit Kumar
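
The two pipelines above can be sketched as a pair of functions whose outputs are merged into one annotation list. This is an illustrative outline under stated assumptions, not the workshop's implementation: the function names, the input dictionaries, and every `DOMAIN_*`, `GO:term_*`, `seq*`, `acc*` and `orth*` identifier are hypothetical placeholders. Recording "ND" against the three aspect root terms is also an assumption, based on the root term usage described for ND annotations.

```python
# Root terms of the three GO aspects, used here for "ND" annotations (assumption).
ROOT_TERMS = ("GO:0008150", "GO:0003674", "GO:0005575")


def iea_annotate(domain_hits, domain_to_go):
    """IEA step: assign GO terms from domains/motifs found in each sequence.

    domain_hits:  {sequence ID: [domain IDs found by domain analysis]}
    domain_to_go: {domain ID: [GO IDs]} from a domain-to-GO mapping file
    """
    rows = []
    for seq_id, domains in domain_hits.items():
        go_ids = sorted({go for d in domains for go in domain_to_go.get(d, [])})
        if go_ids:
            rows.extend((seq_id, go, "IEA") for go in go_ids)
        else:
            # No GO could be assigned: record "ND" instead of leaving it blank.
            rows.extend((seq_id, root, "ND") for root in ROOT_TERMS)
    return rows


def iso_transfer(accessions, orthologs, existing_go):
    """ISO step: transfer existing GO annotations across 1:1 orthologs.

    orthologs:   {your accession: ortholog accession} (1:1 pairs only)
    existing_go: {ortholog accession: [GO IDs]}
    Returns (annotation rows, accessions that received no ISO annotation).
    """
    rows, no_iso = [], []
    for acc in accessions:
        go_ids = existing_go.get(orthologs.get(acc), [])
        if go_ids:
            rows.extend((acc, go, "ISO") for go in go_ids)
        else:
            no_iso.append(acc)
    return rows, no_iso


# Hypothetical placeholder inputs.
iea_rows = iea_annotate(
    {"seq1": ["DOMAIN_X", "DOMAIN_Y"], "seq2": ["DOMAIN_Z"], "seq3": []},
    {"DOMAIN_X": ["GO:term_A"], "DOMAIN_Y": ["GO:term_A", "GO:term_B"]},
)
iso_rows, no_iso = iso_transfer(
    ["acc1", "acc2", "acc3"],
    {"acc1": "orthA", "acc2": "orthB"},
    {"orthA": ["GO:term_C"]},
)
ga_rows = iea_rows + iso_rows  # integrate output into one annotation list
```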

  31. Unknown Function vs No GO
      • ND – no data
        • Biocurators have tried to add GO but there is no functional data available
        • Previously: “process_unknown”, “function_unknown”, “component_unknown”
        • Now: “biological process”, “molecular function”, “cellular component”
      • No annotations (including no “ND”): biocurators have not annotated
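
The difference between "ND" and no annotation at all can be made explicit when summarizing a dataset. This is a minimal sketch assuming annotations are held as (accession, GO ID, evidence code) tuples; the function name, the status labels and the `acc*` / `GO:term_*` identifiers are hypothetical.

```python
def annotation_status(accession, annotations):
    """Classify an accession as annotated, ND only, or never annotated."""
    rows = [row for row in annotations if row[0] == accession]
    if not rows:
        # No rows at all, not even "ND": biocurators have not annotated it.
        return "not annotated"
    if all(evidence == "ND" for _acc, _go_id, evidence in rows):
        # Biocurators looked but found no functional data.
        return "no data (ND)"
    return "annotated"


# Hypothetical placeholder annotations.
EXAMPLE = [
    ("acc1", "GO:term_A", "IDA"),
    ("acc2", "GO:0008150", "ND"),
    ("acc2", "GO:0003674", "ND"),
]

statuses = {acc: annotation_status(acc, EXAMPLE) for acc in ("acc1", "acc2", "acc3")}
```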

  32. • Primary sources of GO: from the GO Consortium (GOC) & GOC members
        • most up to date
        • most comprehensive
      • Secondary sources: other resources that use GO provided by GOC members
        • public databases (e.g. NCBI, UniProtKB)
        • genome browsers (e.g. Ensembl)
        • array vendors (e.g. Affymetrix)
        • GO expression analysis tools

  33. • Different tools and databases display the GO annotations differently.
      • Since GO terms are continually changing and GO annotations are continually added, users need to know when the GO annotations they are using were last updated.

  34. Secondary Sources of GO annotation
      • EXAMPLES:
        • public databases (e.g. NCBI, UniProtKB)
        • genome browsers (e.g. Ensembl)
        • array vendors (e.g. Affymetrix)
      • CONSIDERATIONS:
        • What is the original source?
        • When was it last updated?
        • Are evidence codes displayed?
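
When annotations are obtained as a gene association (ga) file, the three considerations above can be checked directly from each line. This sketch assumes the tab-delimited gene association layout with the evidence code in column 7, the date (YYYYMMDD) in column 14 and the assigning source in column 15. The function name is hypothetical, and the reference, date and source values in the example line are placeholders rather than real data.

```python
from datetime import date


def summarize_ga_line(line):
    """Extract source, last-updated date and evidence code from one ga line."""
    fields = line.rstrip("\n").split("\t")
    raw_date = fields[13]  # column 14: YYYYMMDD
    return {
        "object_id": fields[1],    # column 2: database object ID
        "go_id": fields[4],        # column 5: GO ID
        "evidence": fields[6],     # column 7: evidence code
        "last_updated": date(int(raw_date[:4]), int(raw_date[4:6]), int(raw_date[6:8])),
        "source": fields[14],      # column 15: assigned by
    }


# Example line built from the NDUFAB1 slide; the reference, date and
# assigning source are placeholders, not real values.
EXAMPLE_LINE = "\t".join([
    "UniProtKB", "P52505", "NDUFAB1", "", "GO:0005739", "REF:PLACEHOLDER",
    "IEA", "", "C", "", "", "protein", "taxon:9913", "20090611", "EXAMPLE_SOURCE",
])

info = summarize_ga_line(EXAMPLE_LINE)
```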

  35. For more information about GO
      • GO Evidence Codes: http://www.geneontology.org/GO.evidence.shtml
      • gene association file information: http://www.geneontology.org/GO.format.annotation.shtml
      • tools that use the GO: http://www.geneontology.org/GO.tools.shtml
      • GO Consortium wiki: http://wiki.geneontology.org/index.php/Main_Page
      All websites are available from the workshop website & handout.
