Chapter 8: Biological Knowledge Assembly and Interpretation - PowerPoint PPT Presentation

chapter 8 biological knowledge assembly and interpretation n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Chapter 8: Biological Knowledge Assembly and Interpretation PowerPoint Presentation
Download Presentation
Chapter 8: Biological Knowledge Assembly and Interpretation

play fullscreen
1 / 56
Chapter 8: Biological Knowledge Assembly and Interpretation
129 Views
Download Presentation
mandell
Download Presentation

Chapter 8: Biological Knowledge Assembly and Interpretation

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Chapter 8: Biological Knowledge Assembly and Interpretation JuHan Kim Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea, Presenter: Zhen Gao

  2. Outline • Review of major computational approaches to facilitate biological interpretation of • high-throughput microarray • and RNA-Seq experiments.

  3. Input: Microarray / RNA seq • DEG: Differentially Expressed Genes co-expression / clustering Gene Set-Wise Differential Expression Analysis Differential Co-Expression Analysis Interest gene, genes list, gene pair or gene list pair • FAA: Functional Annotation Analysis: • Gene Ontology (GO) or Pathway analysis Gene list with annotations • Visualization, sematic assembling and knowledge learning: • Concept lattice analysis : BioLattice

  4. Glossary • FAA: Functional Annotation Analysis • GO: Gene Ontology • Pathway • DEG: Differentially Expressed Genes • GSEA: Gene Set Enrichment Analysis • Biological Interpretation and Biological Semantics • Concept lattice analysis

  5. Pathway and Ontology-Based Analysis • GO and biological pathway-based analysis: • one of the most powerful methods for inferring the biological meanings of expression changes • list of genes obtained by: • differential expression analysis • co-expression analysis (or clustering)

  6. Pathway and Ontology-Based Analysis

  7. Pathway and Ontology-Based Analysis • Attributes can be applied for FAA: • transcription factor binding • clinical phenotypes like disease associations • MeSH(Medical Subject Heading) terms • microRNA binding sites • protein family memberships • chromosomal bands, etc • GO terms • biological pathways

  8. Pathway and Ontology-Based Analysis • Features may have their own ontological structures • GO has a structure as a DAG (Directed Acyclic Graph)

  9. Pathway and Ontology-Based Analysis • DEGs:

  10. Input: Microarray / RNA seq • DEG: Differentially Expressed Genes co-expression / clustering Gene Set-Wise Differential Expression Analysis Differential Co-Expression Analysis Interest gene, genes list, gene pair or gene list pair • FAA: Functional Annotation Analysis: • Gene Ontology (GO) or Pathway analysis Gene list with annotations • Visualization, sematic assembling and knowledge learning: • Concept lattice analysis : BioLattice

  11. Pathway and Ontology-Based Analysis • DEGs: • 3 techniques which help obtain DEGs: • t-test • Wilcoxon’s rank sum test • ANOVA • Need to note that multiple-hypothesis-testing problem should be properly managed

  12. Pathway and Ontology-Based Analysis • Co-expression analysis

  13. Pathway and Ontology-Based Analysis • Co-expression analysis • puts similar expression profiles together and different ones apart • Returning genes that are assumed to be co-regulated • Clustering algorithms: • hierarchical-tree clustering • partitionalclustering

  14. Pathway and Ontology-Based Analysis • Pathwaysare powerful resources for the understanding of shared biological processes • E.g.: KEGG, MetaCycand BioCarta (signaling pathways)

  15. Pathway and Ontology-Based Analysis • MetaCyc: • an experimentally determined non-redundant metabolic pathway database • It is the largest collection • containing over 1400 metabolic pathways

  16. Pathway and Ontology-Based Analysis • Ontology / GO: • providing a shared understanding of a certain domain of information • controlled vocabularies • DAG structures with 3 vocabularies of GO: • Molecular Function (MF) • Cellular Compartment (CC) • Biological Process (BP)

  17. Pathway and Ontology-Based Analysis • Common Gos: • MIPS: integrated source, protein properties, variety of complete genomes • MeSH: clinical including disease names • OMIM (Online Mendelian Inheritance in Man) • UMLS (Unified Medical Language System)

  18. Pathway and Ontology-Based Analysis • GO enrichment test: • For example • if 20% of the genes in a gene list are annotated with a GO term ‘apoptosis’ • only 1% of the genes in the whole human genome fall into this functional category

  19. Pathway and Ontology-Based Analysis • Common statistical tests: • Chi-square • binomial • hypergeometric tests

  20. Pathway and Ontology-Based Analysis • hypergeometrictest:

  21. Pathway and Ontology-Based Analysis • Avoid pitfalls when using hypergeometrictest • Choice of background, that makes substantial impact on the result. • All genes having at least one GO annotation • all genes ever known in genome databases • all genes on the microarray • GO has a hierarchical tree (or graphical) structure while hypergeometrictest assumes independence of categories

  22. Pathway and Ontology-Based Analysis • Common Tools • DAVID • ArrayX- Path • Pathway Miner • EASE • GOFish • GOTree etc.

  23. Gene Set-Wise Differential Expression Analysis

  24. Input: Microarray / RNA seq • DEG: Differentially Expressed Genes co-expression / clustering Gene Set-Wise Differential Expression Analysis Differential Co-Expression Analysis Interest gene, genes list, gene pair or gene list pair • FAA: Functional Annotation Analysis: • Gene Ontology (GO) or Pathway analysis Gene list with annotations • Visualization, sematic assembling and knowledge learning: • Concept lattice analysis : BioLattice

  25. Gene Set-Wise Differential Expression Analysis • Evaluates coordinated differential expression of gene groups • Gene Set Enrichment Analysis (GSEA) • The first developed in this category • evaluates for each a pre-defined gene set the significant association with phenotypic classes

  26. Gene Set-Wise Differential Expression Analysis • Difference between FAA and GSEA: • FAA: find over-represented GO terms from a interesting gene list • GSEA: obtain the pre-defined gene list first and test the changes under different conditions.

  27. Gene Set-Wise Differential Expression Analysis • Advantages of gene set-wise differential expression analysis: • successfully identified modest but coordinated changes in gene expression that might have been missed by conventional ‘individual gene-wise’ differential expression analysis. • (many tiny expression changes can collectively create a big change) • straightforward biological interpretation because the gene sets are defined by biological knowledge

  28. Gene Set-Wise Differential Expression Analysis • Enrichment Score (ES) is calculated by evaluating the fractions of genes in S (‘‘hits’’) weighted by their correlation and the fractions of genes not in S (‘‘misses’’) present up to a given position i in the ranked gene list, L, where N genes are ordered according to the correlation,

  29. Gene Set-Wise Differential Expression Analysis • Typical gene sets: • regulatory-motif • function-related • disease-related sets • Database: • MSigDB: • 6769 gene sets • classified into five different collections • Has some interesting extensions

  30. Differential Co-Expression Analysis

  31. Input: Microarray / RNA seq • DEG: Differentially Expressed Genes co-expression / clustering Gene Set-Wise Differential Expression Analysis Differential Co-Expression Analysis Interest gene, genes list, gene pair or gene list pair • FAA: Functional Annotation Analysis: • Gene Ontology (GO) or Pathway analysis Gene list with annotations • Visualization, sematic assembling and knowledge learning: • Concept lattice analysis : BioLattice

  32. Differential Co-Expression Analysis • Co-expression analysis: • determines the degree of co-expression of a cluster of genes under a certain condition • Differential co-expression analysis: • determines the degree of co-expression difference of a gene pair or a gene cluster across different conditions

  33. Differential Co-Expression Analysis • 3 major types: • (a) differential co-expression of gene cluster(s) • (b) gene pair-wise differential co- expression • (c) differential co-expression of paired gene sets

  34. Differential Co-Expression Analysis • Type (a), identify differentially co-expressed gene cluster(s) between two conditions • Let conditions and genes be denoted by J and I, respectively. The mean squared residual of model is a measurement of co-expression of genes:

  35. Differential Co-Expression Analysis Type (a) cont.

  36. Differential Co-Expression Analysis • Type (b)

  37. Differential Co-Expression Analysis • Type (b), identify differentially co-expressed gene pairs • Techniques: • F-statistic • A meta-analytic approach

  38. Differential Co-Expression Analysis • Note that identification of differentially co-expressed gene clusters or gene pairs usually do not use a pre-defined gene sets or pairs. • Thus the interpretation may also be improved by ontology and pathway-based annotation analysis.

  39. Differential Co-Expression Analysis • Type (c), dCoxS(differential co-expression of gene sets) algorithm identifies gene set pairs differentially co-expressed across different conditions • Biological pathways can be used as pre-defined gene sets and the differential co-expression of the biological pathway pairs between conditions is analyzed.

  40. Differential Co-Expression Analysis • Type (c) cont. • To measure the expression similarity between paired gene-sets under the same condition, dCoxS defines the interaction score (IS) as the correlation coefficient between the sample-wise entropies. Even when the numbers of the genes in different pathways are different, IS can always be obtained because it uses only sample-wise distances regardless of whether the two pathways have the same number of genes or not.

  41. Differential Co-Expression Analysis • Type (c) cont.

  42. Biological Interpretation and Biological Semantics

  43. Input: Microarray / RNA seq • DEG: Differentially Expressed Genes co-expression / clustering Gene Set-Wise Differential Expression Analysis Differential Co-Expression Analysis Interest gene, genes list, gene pair or gene list pair • FAA: Functional Annotation Analysis: • Gene Ontology (GO) or Pathway analysis Gene list with annotations • Visualization, sematic assembling and knowledge learning: • Concept lattice analysis : BioLattice

  44. Biological Interpretation and Biological Semantics • Biomedical semantics provides rich descriptions for biomedical domain knowledge. • Motivation for Biological Semantics: • GO has limitations: • The result of GO is typically a long unordered list of annotations • Most of the analysis tools evaluate only one cluster at a time • time-consuming to read the massive annotation lists • hard to manually assemble • Many annotations are redundant

  45. Biological Interpretation and Biological Semantics • Introducing BioLattice: • a mathematical framework • based on concept lattice analysis • organize traditional clusters and associated annotations into a lattice of concepts • A graphical summary • considers gene expression clusters as objects and annotations as attributes • Thus, complex relations among clusters and annotations are clarified, ordered and visualized.

  46. Biological Interpretation and Biological Semantics • Another advantage of BioLattice is that heterogeneous biological knowledge resources can be added