1 / 23

Functional annotation with Blast2GO

Functional annotation with Blast2GO. Index. Why Blast2GO? Concepts on functional annotations Function assignment Vocabularies GO and GOA The GO Direct Acyclic Graph The problem of function transfer Blast2GO Java application + Practicals Blast2GO @ Babelomics + Practicals.

jed
Download Presentation

Functional annotation with Blast2GO

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Functional annotation with Blast2GO

  2. Index • Why Blast2GO? Concepts on functional annotations • Function assignment • Vocabularies • GO and GOA • The GO Direct Acyclic Graph • The problem of function transfer • Blast2GO Java application + Practicals • Blast2GO @ Babelomics + Practicals

  3. Why Blast2GO? Workflow analysis Experiment Data-Analysis Gene-List MNAT1 CTNNBL1 ENOX2 GTPBP1 RALY TAGLN2 RAB3A PPP2R5A MAPRE1 ..... ... Functional Annotation + Functional interpretation Functional Profiling

  4. What does Blast2GO do? Generates annotations Visualization of funcional annotations

  5. Concepts of functional annotation • Gene/Protein function • Referes to the molecular function of a gene or a protein: Tyrosine kinase • Functional annotation • More general, referes to the characterization of functional aspect of the protein. Stress-related, cytoplasm, ABC transporter • Also referes to the process of assingment of a function label • Habitually, standard vocabularies are used to assign function

  6. Functional Vocabularies Molecular Function Biological Process Cellular Component Metabolic pathways KEGG orthologues Functional motifs

  7. The Gene Ontology • Project developed by the Gene Ontology Consortium • Provides a controlled vocabulary to describe gene and gene product attributes in any organism • Includes both the development of the Ontology and the maintenance of a Database of annotations

  8. The Ontology • Annotations are given to te most specific (low) level. • True path rule: annotation at a term implies annotation to all its parent terms • Annotation is given with an Evidence Code: • IDA: inferred by direct assay • TAS: traceable author statement • ISS: infered by sequence similarity • IEA: electronic annotation • …. More general More specific

  9. The GO has a DAG structure

  10. The Gene Ontology Database (GOA) • There is a collaborating institution per organism to provide annotations • Most of the GOA annotations come from UniProt • Most of the annotations are electronic annotations http://www.geneontology.org/GO.current.annotations.shtml

  11. InterPro • Collection of databases with functional annotation of protein motifs • Functional vocabulary at UniProt • There is an equivalence table between GO and InterPro http://www.ebi.ac.uk/interpro/databases.html

  12. Functional assignment Annotation Empirical Transference Literature reference Filogeny Molecular interactions Biochemical assay Sequence analysis Structure Comparison Sequence homology Gene/protein expression Identification of folds Motif identification

  13. Automatic annotation • GO annotations can be created by comparision to annotated sequences • To achieve enough coverage, high-throughput, automatic annotation is required • The most effective (also error prone) automatic annotation method is transfer from sequence similarity

  14. Concerns in functional transfer by similarity • Level of homology (~ from 40-60% is possible) • The overlap query and hit sequences (not much a problem) • The domain or structure function association • The paralog problem: genes with similar sequences might have different functional specifications • The evidence for the original annotation • Balance between quality and quantity: depends on the use GO1, GO2, GO3, GO4 HIT QUERY GO1, GO2, GO3, GO4

  15. Blast2GO • Suite for functional annotation and data mining on functional data • Considerations for annotation • Simlarity • Length of the overlap • Percentage of hit sequence spanned by the overlap • Evidence original annotation • Blast hits and motif hits • Refinement by additional methods • Visualization: • Annotation charts • Knowledge discovery on the DAG • Desktop Java application • web interface @ Babelomics: Babelomics for non-model

  16. Blast2GO Annotation strategy go1,go2, go3 go1,go3, go4 Sq1 go3,go5, go6,go8 go1,go4 Hit1 Hit1 Hit1 Hit1 Hit1 Hit1 go1,go2, go3 go6,go9,go8 go2 go6,go9, go8 Sq2 go1,go8 go2,go4, go4 go1,go8 go1,go3, go4 Hit2 Hit2 Hit2 Hit2 Hit2 Hit2 go2,go5, go6 go4,go1, go8,go9 go4,go1,go8,go9 go3,go5, go6,go8 Hit3 Hit3 Hit3 Hit3 Hit3 Hit3 go2,go4 go1,go4 Hit4 Hit4 Hit4 Hit4 Hit4 Hit4 go2 go2,go4, go4 Sq3 go2,go5, go6 go2,go4 Sq4 Hit1 Hit1 Hit2 Hit2 Sq1 Sq1 Sq1 Sq2 Sq2 Sq2 Blast Mapping Annotation Sq3 Sq3 Sq3 Sq4 Sq4 Sq4

  17. Blast2GO Annotation Strategy go1,go2, go3 go1,go3, go4 Sq1 go3,go5, go6,go8 go1,go4 go6,go9,go8 Sq2 go1,go8 go4,go1,go8,go9 go2 go2,go4, go4 Sq3 go2,go5, go6 go2,go4 Sq4 go1,go2, go3, GO11 Sq1 Refinement go8, GO12, GO13 Sq2 InterPro Annex GOSlim Manual Sq3 go2,go4 GO15 Sq4

  18. Blast2GO annotation rule Quality of annotation source Lowest term above threshold Possibility of abstraction Evidence codes Similarity requirement Recall vs. Precision Lowest.node [(max.sim x ECw) + (#GO-1 x GOw)] >= threshold

  19. Annotation example • Consider a given query sequence with three hit sequences,with the following GO terms: Hit sequence 1: 60% similarity; One GO term : GO1 with Evidence Code = IDA Hit sequence 2: 65% similarity; One GO terms: GO2 with Evidence Code = ISS Hit sequence 3: 67% similarity; One GO terms: GO3 with Evidence Code = IEA GO2 and GO3 are brother terms with parent term GO4 • Let compute the Annotation Score (AS) and annotation ouput in a number of scenarios:

  20. Annotation example * Scenario 1 • ECw (IDA)=1; ECw(ISS) = 0.8; ECw(IEA) = 0.7 (Evidence Code Control) • Annotation threshold is set to 55 • GOw = 0 (no contribution from childer terms) AS(GO1) = (60 * 1) + (1-1 * 0) = 60 > 55 --> GO1 is transfered to the query sequence AS(GO2) = (65 * 0.8) + (1-1 * 0) = 48 < 55 --> GO2 is NOT transfered AS(GO3) = (67 * 0.7) + (1-1 * 0) = 52 < 55 --> GO3 is NOT transfered AS(GO4) = (67 * 0.7) + (2-1 * 0) = 52 < 55 --> GO4 is NOT transfered * Scenario 2 • ECw (IDA)=1; ECw(ISS) = 0.8; ECw(IEA) = 0.7 (Evidence Code Control) • Annotation threshold is set to 55 • GOw = 5 (the childern contribution is enabeled) AS(GO1) = (60 * 1) + (1-1 * 5) = 60 > 55 --> GO1 is transfered to the query sequence AS(GO2) = (65 * 0.8) + (1-1 * 5) = 48 < 55 --> GO2 is NOT transfered AS(GO3) = (67 * 0.7) + (1-1 * 5) = 52 < 55 --> GO3 is NOT transfered AS(GO4) = (67 * 0.7) + (2-1 * 5) = 58 > 55 --> GO4 is transfered

  21. Annotation example * Scenario 3 • ECw (IDA)=1; ECw(ISS) = 0.8; ECw(IEA) = 0.7 (Evidence Code control) • Annotation threshold is set to 50 • GOw = 5 (the childern contribution is enabeled) AS(GO1) = (60 * 1) + (1-1 * 5) = 60 > 50 --> GO1 is transfered to the query sequence AS(GO2) = (65 * 0.8) + (1-1 * 5) = 52 > 50 --> GO2 is transfered to the query sequence AS(GO3) = (67 * 0.7) + (1-1 * 5) = 47 < 50 --> GO3 is NOT transfered AS(GO4) = (67 * 0.7) + (2-1 * 5) = 52 > 50 --> GO4 is NOT transfered (transferred child) * Scenario 4 • ECw (IDA)=1; ECw(ISS) = 1; ECw(IEA) = 1 (no Evidence Code control) • Annotation threshold is set to 55 • GOw = 5 (the childern contribution is enabeled) AS(GO1) = (60 * 1) + (1-1 * 5) = 60 > 55 --> GO1 is transfered to the query sequence AS(GO2) = (65 * 1) + (1-1 * 5) = 65 > 55 --> GO2 is transfered AS(GO3) = (67 * 1) + (1-1 * 5) = 67 > 55 --> GO3 is transfered AS(GO4) = (67 * 1) + (2-1 * 5) = 72 > 55 --> GO4 is NOT transfered (transferred child)

  22. B2G Highlighting on the DAG: The B2G Score • Coloring strategy to highlight regions in the DAG where the most interesting information is concentrated • The confluence score (B2G score) keeps a balance between the number of annotated sequences at one node and the distance to the origin of annotation

  23. Hands-on

More Related