
Meta Analysis and Differential Network Analysis with Applications in Mouse Expression Data

Meta Analysis and Differential Network Analysis with Applications in Mouse Expression Data. Steve Horvath. Outline. Standard differential expression analysis Statistical power studies Important network concepts Single versus differential network analysis Differential network construction.



  1. Meta Analysis and Differential Network Analysis with Applications in Mouse Expression Data Steve Horvath

  2. Outline • Standard differential expression analysis • Statistical power studies • Important network concepts • Single versus differential network analysis • Differential network construction

  3. Standard (gene-based) differential expression analysis • Many software packages and R functions calculate t-tests, p-values, false discovery rates, fold changes, etc. • WGCNA R functions: • For a binary trait (e.g. case-control status), use standardScreeningBinaryTrait • For a numeric trait (e.g. body weight), use standardScreeningNumericTrait • For a right-censored time variable, use standardScreeningCensoredTime

  4. metaAnalysis R function in the WGCNA R package

  5. helpfile metaAnalysis

  6. Stouffer Z statistics from metaAnalysis
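
The slide names Stouffer's Z statistic without spelling it out. As a minimal sketch of the general Stouffer method (not the WGCNA `metaAnalysis` implementation, whose arguments and outputs are not shown in this transcript), per-study Z statistics for one gene can be combined as a weighted sum divided by the square root of the sum of squared weights. The function names `stouffer_z` and `two_sided_p` are hypothetical, and the choice of weights is left to the caller.

```python
import math


def stouffer_z(z_scores, weights=None):
    """Combine per-study Z statistics for one gene (Stouffer's method).

    With no weights, this is sum(Z_i) / sqrt(number of studies).
    """
    if not z_scores:
        raise ValueError("need at least one Z statistic")
    if weights is None:
        weights = [1.0] * len(z_scores)
    if len(weights) != len(z_scores):
        raise ValueError("weights and z_scores must have the same length")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value of a Z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Four studies that each report Z = 2 for the same gene.
meta_z = stouffer_z([2.0, 2.0, 2.0, 2.0])
print(meta_z, two_sided_p(meta_z))
```

Four moderately significant studies pointing in the same direction give a much stronger combined statistic, which is the motivation for meta-analysis.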

  7. Ranking based metaAnalysis statistics

  8. Combine several gene rankings using the rankPvalue function

  9. Statistical Power Studies

  10. Statistical power calculations. According to Google Scholar, it was cited 11,708 times (as of July 2013).

  11. Network concept=network statistics

  12. Network=Adjacency Matrix • A network can be represented by an adjacency matrix, A=[aij], that encodes whether/how a pair of nodes is connected. • A is a symmetric matrix with entries in [0,1] • For unweighted network, entries are 1 or 0 depending on whether or not 2 nodes are adjacent (connected) • For weighted networks, the adjacency matrix reports the connection strength between node pairs • Our convention: diagonal elements of A are all 1.
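
The conventions listed on this slide (symmetric, entries in [0,1], unit diagonal) can be checked mechanically. The following is a small sketch using plain nested lists; the function name `is_valid_adjacency` and the tolerance `tol` are assumptions made for illustration.

```python
def is_valid_adjacency(A, tol=1e-12):
    """Check the slide's conventions for an adjacency matrix.

    A must be square and symmetric, with entries in [0, 1]
    and all diagonal elements equal to 1.
    """
    n = len(A)
    if any(len(row) != n for row in A):
        return False
    for i in range(n):
        if abs(A[i][i] - 1.0) > tol:
            return False
        for j in range(n):
            if not (0.0 <= A[i][j] <= 1.0):
                return False
            if abs(A[i][j] - A[j][i]) > tol:
                return False
    return True


# A weighted network on three nodes.
weighted = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
print(is_valid_adjacency(weighted))
```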

  13. Motivational example I: Pairwise relationships between genes across different mouse tissues and genders. Challenge: Develop simple descriptive measures that describe the patterns. Solution: The following network concepts are useful: density, centralization, clustering coefficient, heterogeneity.

  14. Motivational example (continued) Challenge: Find a simple measure for describing the relationship between gene significance and connectivity Solution: network concept called hub gene significance

  15. Backgrounds • Network concepts are also known as network statistics or network indices • Examples: connectivity (degree), clustering coefficient, topological overlap, etc • Network concepts underlie network language and systems biological modeling. • Dozens of potentially useful network concepts are known from graph theory.

  16. Review of some fundamental network concepts, which are defined for all networks (not just co-expression networks). References: Horvath (2011) Weighted Network Analysis. Springer Book. Hardcover ISBN: 978-1-4419-8818-8. Dong, Horvath (2007) Understanding network concepts in modules. BMC Syst Biol. Horvath, Dong (2008) Geometric Interpretation of Gene Co-expression Network Analysis. PLoS Comp Biol.

  17. Connectivity • Node connectivity = row sum of the adjacency matrix • For unweighted networks=number of direct neighbors • For weighted networks= sum of connection strengths to other nodes

  18. Density • Density= mean adjacency • Highly related to mean connectivity

  19. Centralization • = 1 if the network has a star topology • = 0 if all nodes have the same connectivity. Figure examples: a network in which every node has connectivity 2 (centralization = 0) and a star network (centralization = 1).

  20. Heterogeneity • Heterogeneity: coefficient of variation of the connectivity • Highly heterogeneous networks exhibit hubs
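
The concepts on slides 17 to 20 can be computed directly from an adjacency matrix. The sketch below is an illustration under stated assumptions, not the WGCNA code: connectivity is taken as the sum of off-diagonal entries (so the unit diagonal is not counted), density is the mean off-diagonal adjacency, centralization uses the formula n/(n-2) * (max(k)/(n-1) - density), and heterogeneity uses the population (not sample) standard deviation. The helpers `star_network` and `ring_network` are hypothetical names that build the two examples from the centralization slide.

```python
import math


def connectivity(A):
    """Connectivity k_i: sum of connection strengths to the other nodes."""
    n = len(A)
    return [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]


def density(A):
    """Mean off-diagonal adjacency; equals mean(k) / (n - 1)."""
    n = len(A)
    return sum(connectivity(A)) / (n * (n - 1))


def centralization(A):
    """1 for a star topology, 0 if all connectivities are equal (n >= 3)."""
    n = len(A)
    k = connectivity(A)
    return n / (n - 2) * (max(k) / (n - 1) - density(A))


def heterogeneity(A):
    """Coefficient of variation of the connectivity."""
    k = connectivity(A)
    mean_k = sum(k) / len(k)
    var_k = sum((x - mean_k) ** 2 for x in k) / len(k)
    return math.sqrt(var_k) / mean_k


def star_network(n):
    """Unweighted star: node 0 is connected to every other node."""
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(1, n):
        A[0][j] = A[j][0] = 1.0
    return A


def ring_network(n):
    """Unweighted ring: every node has exactly two neighbors."""
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        j = (i + 1) % n
        A[i][j] = A[j][i] = 1.0
    return A


for name, net in [("star", star_network(5)), ("ring", ring_network(5))]:
    print(name, connectivity(net), density(net),
          centralization(net), heterogeneity(net))
```

The ring reproduces the slide's first example (all connectivities equal to 2, centralization 0) and the star reproduces the second (centralization 1); the star is also the more heterogeneous network because it contains a hub.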

  21. Clustering Coefficient • Measures the cliquishness of a particular node • "A node is cliquish if its neighbors know each other" • This generalizes directly to weighted networks (Zhang and Horvath 2005). Figure examples: a node whose neighbors are not connected to each other (clustering coefficient = 0) and a node whose neighbors are all connected (clustering coefficient = 1).
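
As a sketch of how the clustering coefficient extends to weighted networks, the function below uses the ratio of summed products a_ij * a_jl * a_li over neighbor pairs to (sum_j a_ij)^2 - sum_j a_ij^2, which reduces to the usual unweighted definition when entries are 0 or 1. This is an illustration in plain Python, not the WGCNA implementation; returning 0 when the denominator is zero is an assumed convention.

```python
def clustering_coefficient(A, i):
    """Weighted clustering coefficient of node i."""
    n = len(A)
    others = [j for j in range(n) if j != i]
    # Each connected neighbor pair (j, l) is counted twice, matching the
    # denominator, which also counts ordered pairs.
    numerator = sum(
        A[i][j] * A[j][l] * A[l][i]
        for j in others for l in others if l != j
    )
    s1 = sum(A[i][j] for j in others)
    s2 = sum(A[i][j] ** 2 for j in others)
    denominator = s1 ** 2 - s2
    return numerator / denominator if denominator > 0 else 0.0


# Node 0 is the center of a star: its neighbors do not know each other.
star = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
# In a triangle, every pair of neighbors is connected.
triangle = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]
print(clustering_coefficient(star, 0), clustering_coefficient(triangle, 0))
```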

  22. The topological overlap dissimilarity is used as input of hierarchical clustering • Generalized in Zhang and Horvath (2005) to the case of weighted networks • Generalized in Li and Horvath (2006) to multiple nodes • Generalized in Yip and Horvath (2007) to higher order interactions
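
The slide does not state the topological overlap formula. A commonly used form for weighted networks, following the generalization attributed above to Zhang and Horvath (2005), is TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), with the sum over nodes u other than i and j. The sketch below assumes that form; the function names are hypothetical, and the dissimilarity 1 - TOM is what would be passed to a hierarchical clustering routine.

```python
def topological_overlap(A, i, j):
    """Topological overlap between nodes i and j of adjacency matrix A."""
    if i == j:
        return 1.0
    n = len(A)
    k_i = sum(A[i][u] for u in range(n) if u != i)
    k_j = sum(A[j][u] for u in range(n) if u != j)
    # Connection strength routed through shared neighbors.
    shared = sum(A[i][u] * A[u][j] for u in range(n) if u not in (i, j))
    return (shared + A[i][j]) / (min(k_i, k_j) + 1.0 - A[i][j])


def tom_dissimilarity(A):
    """Matrix of 1 - TOM values, usable as a clustering distance."""
    n = len(A)
    return [[1.0 - topological_overlap(A, i, j) for j in range(n)]
            for i in range(n)]


# Path 0 - 1 - 2: nodes 0 and 2 are not adjacent but share neighbor 1.
path = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
]
print(topological_overlap(path, 0, 2))
print(tom_dissimilarity(path))
```

Nodes 0 and 2 have zero adjacency but nonzero topological overlap, which shows why the measure is more robust than the raw adjacency when grouping nodes into modules.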

  23. Network Significance • Defined as average gene significance • We often refer to the network significance of a module network as module significance.

  24. Maximum adjacency ratio
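
The transcript gives only the title of this slide. As a sketch under the assumption that the maximum adjacency ratio of node i is defined as sum_j a_ij^2 / sum_j a_ij (sums over the other nodes), the measure separates nodes with a few strong connections from nodes with many weak ones. It is only informative for weighted networks, since every connected node of an unweighted network gets the value 1.

```python
def maximum_adjacency_ratio(A, i):
    """MAR_i = sum_j a_ij^2 / sum_j a_ij over nodes j != i (assumed form)."""
    n = len(A)
    s1 = sum(A[i][j] for j in range(n) if j != i)
    s2 = sum(A[i][j] ** 2 for j in range(n) if j != i)
    return s2 / s1 if s1 > 0 else 0.0


# Node 0 has two moderately strong connections of strength 0.5.
weighted = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]
print(maximum_adjacency_ratio(weighted, 0))
```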

  25. Network concepts for comparing two networks

  26. Differential network concepts • Node-specific statistics: • Diff.ClusterCoef(i) = CC1(i) - CC2(i) • Diff.Mar(i) = MAR1(i) - MAR2(i) • Global statistics: • Diff.MeanClusterCoef = Mean.CC1 - Mean.CC2 • Diff.MeanConnectivity = Mean.k1 - Mean.k2 • Diff.MeanMAR = Mean.MAR1 - Mean.MAR2 • Diff.MeanKME=Mean.KME • Diff.Density = Density1 - Density2 • These can be calculated via the modulePreservation function

  27. Measuring the similarity between two networks

  28. R code for computing network concepts

  29. R code, help file

  30. Data analysis strategies Single network analysis versus differential network analysis

  31. Goals of Single Network Analysis • Identifying genetic pathways (modules) • Finding key drivers (hub genes) • Modeling the relationships between: • Transcriptome • Clinical traits / Phenotypes • Genetic marker data

  32. Single Network WGCNA • 1 gene co-expression network • Multiple data sets may be used for validation (figure labels: Validation set 1, Validation set 2)

  33. Goals of Differential Network Analysis • Uncover differences in modules and connectivity in different data sets • Example: human versus chimpanzee brains (Oldham et al. 2006) • Differing topology in multiple networks reveals genes/pathways that are wired differently in different sample populations. References: Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, … (2007) Weighted Gene Co-expression Network Analysis Strategies Applied to Mouse Weight. Mamm Genome 18(6):463-472. Oldham MC, … Geschwind DH (2006) Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc Natl Acad Sci U S A 103, 17973-17978.

  34. Differential Network WGCNA • 2+ gene co-expression networks (figure labels: Network 1, Network 2) • Identify genes and pathways that are: • Differentially expressed • Differentially wired

  35. BxH Mouse Data from AJ Lusis (figure labels: 135 females; Network 1; Network 2) • Single network analysis of female BxH mice revealed a weight-related module (Ghazalpour et al. 2006) • Samples: networks were constructed from mice at the extremes of the weight spectrum: • Network 1: 30 leanest mice • Network 2: 30 heaviest mice • Transcripts: the 3421 most connected and varying transcripts were used. Reference: Ghazalpour A, Doss S, Zhang B, Wang S, Plaisier C, Castellanos R, Brozell A, Schadt EE, Drake TA, Lusis AJ, Horvath S (2006) Integrating genetic and network analysis to characterize genes related to mouse weight. PLoS Genetics 2, e130.

  36. Methods • Compute comparison metrics: • Difference in expression: t-test statistic • Difference in connectivity: DiffK • Identify significantly different genes/pathways: • Permutation test • Functional analysis of significant genes/pathways: • DAVID database • Primary literature

  37. Computing Comparison Metrics • DIFFERENTIAL EXPRESSION: t-test statistic computed for each gene, $t(i)$ • DIFFERENTIAL CONNECTIVITY: normalized connectivities $K_1(i) = k_1(i) / \max(k_1)$ and $K_2(i) = k_2(i) / \max(k_2)$ • DiffK(i): difference in normalized connectivities for each gene: $\mathrm{DiffK}(i) = K_1(i) - K_2(i)$
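
The DiffK definition on this slide translates directly into code. The sketch below assumes the whole-network connectivities of each gene have already been computed in both networks; the function names and the toy connectivity values are made up for illustration.

```python
def normalized_connectivity(k):
    """K(i) = k(i) / max(k): scale connectivities to a maximum of 1."""
    k_max = max(k)
    return [value / k_max for value in k]


def diff_k(k1, k2):
    """DiffK(i) = K1(i) - K2(i) for genes listed in the same order."""
    if len(k1) != len(k2):
        raise ValueError("both networks must contain the same genes")
    K1 = normalized_connectivity(k1)
    K2 = normalized_connectivity(k2)
    return [a - b for a, b in zip(K1, K2)]


# Toy connectivities of three genes in network 1 and network 2.
k_network1 = [10.0, 5.0, 2.0]
k_network2 = [4.0, 4.0, 1.0]
print(diff_k(k_network1, k_network2))
```

Dividing by the maximum puts both networks on the same scale, so a positive DiffK means the gene is relatively more connected in network 1 than in network 2 even if the raw connectivities differ in magnitude.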

  38. Sector Plot • We visualize the comparison metrics via a sector plot: • x-axis: DiffK • y-axis: t statistic • We establish sector boundaries to identify regions of differentially expressed and/or differentially connected genes: • |t| = 1.96, corresponding to p = 0.05 • |DiffK| = 0.4
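
The sector boundaries above amount to a simple classification of each gene by its DiffK and t values. The sketch below uses descriptive labels rather than sector numbers, because the transcript does not say which number belongs to which region; the label strings and the function name `sector_label` are assumptions.

```python
def sector_label(diffk, t, diffk_cut=0.4, t_cut=1.96):
    """Classify a gene by the sector-plot thresholds from the slide.

    Returns a (connectivity, expression) pair of labels. Genes inside
    both thresholds fall in the central region of the plot.
    """
    if diffk > diffk_cut:
        conn = "high DiffK"
    elif diffk < -diffk_cut:
        conn = "low DiffK"
    else:
        conn = "mid DiffK"
    if t > t_cut:
        expr = "high t"
    elif t < -t_cut:
        expr = "low t"
    else:
        expr = "mid t"
    return conn, expr


genes = {"geneA": (0.55, 3.1), "geneB": (0.62, 0.4), "geneC": (0.05, -0.3)}
for name, (dk, t) in genes.items():
    print(name, sector_label(dk, t))
```

The eight outer combinations of labels correspond to the sectors of the plot, and genes labeled mid on both axes are neither differentially expressed nor differentially connected.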

  39. Permutation test: Identifying significant sectors • Samples are permuted between Network 1 and Network 2 • no.perms: number of permutations • For each sector j, we compare the number of genes in the unpermuted and permuted sectors (nobs and nperm)
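
The transcript does not say how nobs and nperm are turned into a p-value. One common choice, assumed here rather than taken from the slides, is the empirical p-value with an add-one correction: the fraction of permutations whose sector count is at least as large as the observed count. The helper `permute_groups` sketches the relabeling step; recomputing networks, DiffK and t for each permutation is left out.

```python
import random


def permute_groups(samples, n1, rng):
    """Randomly reassign samples to two groups of sizes n1 and len - n1."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[:n1], shuffled[n1:]


def sector_permutation_pvalue(n_obs, n_perm_counts):
    """Empirical p-value for one sector (add-one corrected).

    n_obs: number of genes in the sector for the unpermuted data.
    n_perm_counts: gene counts in the same sector, one per permutation.
    """
    no_perms = len(n_perm_counts)
    at_least_as_large = sum(1 for count in n_perm_counts if count >= n_obs)
    return (at_least_as_large + 1) / (no_perms + 1)


rng = random.Random(0)
lean, heavy = permute_groups(range(60), 30, rng)
print(len(lean), len(heavy))

# Hypothetical counts: no permutation reaches the observed 63 genes.
print(sector_permutation_pvalue(63, [10] * 999))
```

With this estimator the smallest attainable p-value is 1 / (no.perms + 1), so the number of permutations limits the resolution of the test.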

  40. Sector Plot Results (figure: sector plot with per-sector annotations, in extraction order: X, 0.001, 0.001, X, X, 0.01, 0.001, X)

  41. Functional Analysis • SECTOR 3: high t statistic, high DiffK; yellow module in lean, grey in obese (63 genes) • SECTOR 5: low t statistic, high DiffK (28 genes) • Genes in these sectors have higher connectivity in lean than in obese mice, suggesting pathways potentially dysregulated in obesity

  42. Sector 3: Functional Analysis Results, DAVID Database • "Extracellular": • extracellular region (38% of genes, p = 1.8 × 10^-4) • extracellular space (34% of genes, p = 5.7 × 10^-4) • signaling (36% of genes, p = 5.4 × 10^-4) • cell adhesion (16% of genes, p = 7.7 × 10^-4) • glycoproteins (34% of genes, p = 1.6 × 10^-3) • 12 terms for epidermal growth factor or its related proteins: • EGF-like 1 (8.2% of genes, p = 8.7 × 10^-4) • EGF-like 3 (6.6% of genes, p = 1.6 × 10^-3) • EGF-like 2 (6.6% of genes, p = 6.0 × 10^-3) • EGF (8.2% of genes, p = 0.013) • EGF_CA (6.6% of genes, p = 0.015)

  43. Sector 3: Functional Analysis Results, Primary Literature • Results supported by a study on EGF levels in mice (Kurachi et al. 1993) • EGF found to be increased in obese mice • Obesity was reversed in these mice by: • Administration of anti-EGF • Sialoadenectomy. Reference: Kurachi H, Adachi H, Ohtsuka S, Morishige K, Amemiya K, Keno Y, Shimomura I, Tokunaga K, Miyake A, Matsuzawa Y, et al. (1993) Involvement of epidermal growth factor in inducing obesity in ovariectomized mice. The American Journal of Physiology 265, E323-331.

  44. Sector 5: Functional Analysis Results, DAVID Database • Enzyme inhibitor activity (p = 2.9 × 10^-3)* • Protease inhibitor activity (p = 6.0 × 10^-3) • Endopeptidase inhibitor activity (p = 6.0 × 10^-3) • Dephosphorylation (p = 0.012) • Protein amino acid dephosphorylation (p = 0.012) • Serine-type endopeptidase inhibitor activity (p = 0.042) * p-values shown are corrected using the Bonferroni correction

  45. Sector 5: Functional Analysis Results, Primary Literature • Itih1 and Itih3: • Enriched for all categories shown previously • Located near a QTL for hyperinsulinemia (Almind and Kahn 2004) • Itih3 identified as a candidate gene for obesity-related traits based on differential expression in murine hypothalamus (Bischof and Wevrick 2005) • Serpina3n and Serpina10: • Enriched for enzyme inhibitor, protease inhibitor, and endopeptidase inhibitor • Serpina10, or protein Z-dependent protease inhibitor (ZPI), has been found to be associated with venous thrombosis (Van de Water et al. 2004). References: Almind K, Kahn CR (2004) Genetic determinants of energy expenditure and insulin resistance in diet-induced obesity in mice. Diabetes 53, 3274-3285. Bischof JM, Wevrick R (2005) Genome-wide analysis of gene transcription in the hypothalamus. Physiological Genomics 22, 191-196. Van de Water N, Tan T, Ashton F, O'Grady A, Day T, Browett P, Ockelford P, Harper P (2004) Mutations within the protein Z-dependent protease inhibitor gene are associated with venous thromboembolic disease: a new form of thrombophilia. Br J Haematol 127, 190-194.

  46. Discussion • If applicable, always report findings from a standard differential expression analysis as well. • A host of network concepts exists for describing the network topology. • Relatively few people use differential network analysis which may reflect the fact that large sample sizes are needed. • A large sample size is needed to compare two correlation coefficients • To check whether a module is preserved in another network use the modulePreservation function.
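
The remark about sample size can be made concrete with the standard Fisher z-test for the difference between two independent correlation coefficients. This test is a textbook illustration chosen here, not a method taken from the slides; the function name and the example correlations are hypothetical.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher z-test for two independent Pearson correlations.

    Returns the Z statistic and its two-sided p-value. Each sample
    size must exceed 3.
    """
    z1 = math.atanh(r1)  # Fisher transformation
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# The same pair of correlations at two sample sizes per group.
for n in (10, 30):
    z, p = compare_correlations(0.8, n, 0.5, n)
    print(n, round(z, 3), round(p, 3))
```

Even a difference as large as 0.8 versus 0.5 is not significant at the 0.05 level with 10 samples per group and only barely so with 30, which is consistent with the point that differential network analysis needs large sample sizes.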

  47. Acknowledgements • HORVATH LAB: Dissertation work of Tova Fuller; Jun Dong; Peter Langfelder • Mouse data collaboration, LUSIS LAB: Jake Lusis, Anatole Ghazalpour, Thomas Drake • An R tutorial may be found at: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/DifferentialNetworkAnalysis
