
Gene Ontology Analysis

Gene Ontology Analysis. Dr. Lars Eijssen. Contents. Gene ontology annotations The gene ontology tree Gene ontology based analysis. Part 1:. Gene ontology annotations. Gene Ontology.


Presentation Transcript


  1. Gene Ontology Analysis Dr. Lars Eijssen

  2. Contents • Gene ontology annotations • The gene ontology tree • Gene ontology based analysis

  3. Part 1: Gene ontology annotations

  4. Gene Ontology • The Gene Ontology (GO) project gives a consistent description of gene products from different databases • GO consortium: http://www.geneontology.org

  5. Protein annotation with GO terms: Human DNA topoisomerase IIA (P11388) • Cellular component: nucleus; chromosome; DNA topoisomerase complex • Molecular function: chromatin binding; DNA topoisomerase activity; DNA-dependent ATPase activity • Biological process: DNA replication; DNA topological change; DNA ligation; DNA repair

  6. Evidence codes

  7. . . . Entrez Gene

  8. Ensembl

  9. Part 2: The gene ontology tree

  10. Relationships within the tree

  11. Part 3: Gene ontology based analysis

  12. Basic principle • The principle is the same as with biological pathway analysis • Find the terms that contain the relatively highest number of significantly changed genes

  13. What’s different • In Gene Ontology analysis there is a high redundancy of terms • Also it is a tree structure • These must be taken into account…
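
A minimal sketch of this principle, assuming each term is scored with a one-sided hypergeometric (Fisher's exact) test for over-representation. The function name, term names, and gene counts are hypothetical and chosen only for illustration.

```python
from math import comb


def enrichment_p_value(changed_in_term, term_size, changed_total, measured_total):
    """One-sided hypergeometric p-value for over-representation.

    Probability of seeing at least `changed_in_term` changed genes in a term
    of `term_size` genes when `changed_total` of `measured_total` genes changed.
    """
    max_overlap = min(term_size, changed_total)
    tail = sum(
        comb(changed_total, k) * comb(measured_total - changed_total, term_size - k)
        for k in range(changed_in_term, max_overlap + 1)
    )
    return tail / comb(measured_total, term_size)


# Hypothetical experiment: 100 genes measured, 20 significantly changed.
# Each term maps to (changed genes in term, genes in term).
terms = {"term_A": (7, 10), "term_B": (2, 20)}

# Rank terms from most to least over-represented.
ranked = sorted(
    terms,
    key=lambda name: enrichment_p_value(terms[name][0], terms[name][1], 20, 100),
)
```

Here `term_A` ranks first: 7 of its 10 genes changed against a background rate of 20%, whereas `term_B` contains fewer changed genes than expected by chance.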

  14. Gene Ontology analysis tools • GO-Elite • topGO • DAVID/EASE • Note: there are many more, but these illustrate several approaches. The European Nutrigenomics Organisation

  15. How to deal with redundant nodes in a tree? • Only keep the ‘best’ node in each branch in the results • How to determine the best? • Several ways…

  16. topGO analysis • topGO (Bioconductor) • Integrates the knowledge about the relationship between GO terms (BP, MF, CC) for the calculation of statistical significance (Alexa et al., 2006). • Test statistics • Fisher's exact test (define threshold, e.g. FDR<0.05) • Kolmogorov-Smirnov (KS) test (looks at distribution of P values) • GO scoring algorithms (classic, elim, weight) • classic scores each node independently • elim scores nodes bottom up, scores parent nodes after elimination of genes present in significant child node • weight scores nodes bottom up, assigns weights to genes based on P values obtained for each node Slide from: Caroline Reiff, RRI, Aberdeen

  17. topGO • Load limma table • Enter threshold (P value or FDR) • Enter cdfname Slide from: Caroline Reiff, RRI, Aberdeen
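
The thresholding step can be sketched as follows, assuming the FDR is the Benjamini-Hochberg adjusted p-value. The gene names and p-values are hypothetical, and this plain-Python function is only an illustration, not the code used in the workflow on the slide.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values, e.g. taken from a differential expression table.
genes = ["geneA", "geneB", "geneC", "geneD"]
raw_p = [0.01, 0.04, 0.03, 0.5]

fdr = benjamini_hochberg(raw_p)

# Genes passing the FDR < 0.05 threshold form the "significantly changed" set.
selected = [gene for gene, q in zip(genes, fdr) if q < 0.05]
```

With these values only `geneA` passes the FDR threshold, although three genes have a raw p-value below 0.05. The choice between a P value and an FDR threshold therefore changes which genes enter the GO analysis.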

  18. Scoring the tree (I) • Figure legend: x/y = this node; (x/y) = this node plus subtree • Classic: the values for this node plus subtree are used to score (because the genes in fact belong to that term as well) • Figure values: 2/20 (20/100); 5/10 (7/30); 3/25 (11/50); 2/20; 1/15; 7/10 • Suppose all the bold values are significant: then the classic algorithm would return all these processes!

  19. Scoring the tree (II) • However, it would be better to only return the best term in every branch • Best could mean: the most specific significant one • This can be achieved by removing genes that are present in significant child leaves from the parent’s score • Elim does this • Figure values: 2/20 (20/100); 5/10 (7/30); 3/25 (11/50) (4/40); 2/20; 1/15; 7/10
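
The bracketed values on this slide can be reproduced with a small sketch. The tree shape and the node names are assumptions, inferred only from the fact that the numbers add up this way, and each x/y pair is read as changed genes over annotated genes. The sketch also assumes that no gene is annotated directly to more than one node.

```python
# Per-node (changed genes, annotated genes), counting only direct annotations.
own_counts = {
    "root": (2, 20),
    "A": (5, 10),
    "B": (3, 25),
    "A1": (2, 20),
    "B1": (1, 15),
    "B2": (7, 10),
}

# Assumed parent -> children relationships (hypothetical node names).
children = {"root": ["A", "B"], "A": ["A1"], "B": ["B1", "B2"]}


def subtree_counts(node):
    """Counts for a node plus its whole subtree, as the classic scoring uses."""
    changed, total = own_counts[node]
    for child in children.get(node, []):
        child_changed, child_total = subtree_counts(child)
        changed += child_changed
        total += child_total
    return changed, total


root_score = subtree_counts("root")  # (20, 100)
a_score = subtree_counts("A")        # (7, 30)
b_score = subtree_counts("B")        # (11, 50)
```

Because every parent inherits the genes of its children, a strongly changed leaf also raises the score of all its ancestors. This is why the classic approach tends to report several redundant terms from the same branch.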
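
A simplified sketch of the elimination idea, under several assumptions: the tree shape and node names are inferred from the slide's numbers, the leaf with 7/10 is taken to be the significant child, and counts stand in for gene sets. The real elim algorithm in topGO decides significance node by node with a statistical test, which is not reproduced here.

```python
# Per-node (changed genes, annotated genes), counting only direct annotations.
own_counts = {
    "root": (2, 20),
    "A": (5, 10),
    "B": (3, 25),
    "A1": (2, 20),
    "B1": (1, 15),
    "B2": (7, 10),
}

# Assumed parent -> children relationships (hypothetical node names).
children = {"root": ["A", "B"], "A": ["A1"], "B": ["B1", "B2"]}


def elim_counts(node, significant):
    """Counts for a node plus subtree, leaving out significant descendants."""
    changed, total = own_counts[node]
    for child in children.get(node, []):
        if child in significant:
            continue  # genes of a significant child no longer count for the parent
        child_changed, child_total = elim_counts(child, significant)
        changed += child_changed
        total += child_total
    return changed, total


classic_b = elim_counts("B", significant=set())  # (11, 50)
elim_b = elim_counts("B", significant={"B2"})    # (4, 40)
```

With the 7/10 leaf removed, its parent drops from 11/50 to 4/40, which matches the extra bracketed value on the slide. The parent is then far less likely to be reported alongside its more specific child.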

  20. topGO analysis output Example results table for elim Fisher test (top 15 GO biological processes) Slide from: Caroline Reiff, RRI, Aberdeen

  21. A GO Graph (squares = 15 most significant GO IDs) Slide from: Caroline Reiff, RRI, Aberdeen

  22. Scoring the tree (III) • Another option to score branches would be to compute the significance of each leaf just as in the classic algorithm • Thereafter, for every branch the most significant leaf is the one that is reported back
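
One way to read this option is sketched below. It assumes that a "branch" is a root-to-leaf path and that any node on the path may be reported, which is an interpretation rather than something the slide states. The tree, node names, and p-values are hypothetical.

```python
# Hypothetical tree and per-node p-values, as a classic scoring would give them.
children = {"root": ["A", "B"], "A": ["A1"], "B": ["B1", "B2"]}
p_values = {
    "root": 0.20,
    "A": 0.01,
    "A1": 0.30,
    "B": 0.04,
    "B1": 0.60,
    "B2": 0.001,
}


def branches(node, path=()):
    """Yield every root-to-leaf path as a tuple of node names."""
    path = path + (node,)
    kids = children.get(node, [])
    if not kids:
        yield path
    for kid in kids:
        yield from branches(kid, path)


def best_per_branch(root):
    """Map each branch (named by its leaf) to its most significant node."""
    return {path[-1]: min(path, key=p_values.get) for path in branches(root)}


best = best_per_branch("root")  # {"A1": "A", "B1": "B", "B2": "B2"}
```

Unlike elimination, this keeps the classic scores unchanged and only filters the reported list, so a parent can still be reported for one branch while its child is reported for another.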

  23. GO-Elite • Smart algorithm • Produces full and pruned results • Runs on Windows and Linux • Under development

  24. GO-Elite • Searches relationships in a hierarchical manner • Identifies the most significant scoring GO term: the one with a higher score than all sibling terms • For sibling terms, if one sibling branch scores higher than the parent and another branch does not, the highest scoring term from the latter sibling branch is also selected for the GO-Elite output, but the parent term is not

  25. GO analysis versus pathway analysis • Biological pathways contain more information; GO classes are just sets of genes that share an annotation • And pathways allow better data visualisation • Pathways are generally more curated • GO classes are, however, organised in a tree; biological pathways are (in practice) not • GO classes also cover the space of biological processes more uniformly; pathway analysis depends heavily on the pathways that have been contributed/added • GO also covers cellular localisation and biochemical function

  26. Thanks! • Questions?
