
Multivariate Analysis of Pathways


lysa

Presentation Transcript


  1. Multivariate Analysis of Pathways

  2. Multivariate Approaches to Gene Set Selection

  3. Key Multivariate Ideas • PCA (Principal Components Analysis) • SVD (Singular Value Decomposition) • MDS (Multi-dimensional Scaling) • Hotelling's T²

  4. PCA • PC1 lies along the direction of maximal variation • PC2 lies at right angles to PC1, along the direction of next-highest variation • Example shown: three correlated variables
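The PCA slide can be illustrated with a minimal sketch that computes principal components through the SVD of the centered data. The function name `pca` and the simulated data (three variables driven by one shared factor) are assumptions made for illustration, not part of the original slides.

```python
import numpy as np

def pca(X):
    """Return (components, variances) for X with rows = samples, columns = variables."""
    Xc = X - X.mean(axis=0)                      # center each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)          # variance captured by each component
    return Vt, variances                         # rows of Vt are PC1, PC2, ...

# Hypothetical data: three variables that share one latent factor, so they are correlated.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.column_stack([z + 0.1 * rng.normal(size=200) for _ in range(3)])

components, variances = pca(X)
```

With data like this, PC1 should point close to the (1, 1, 1) direction and carry nearly all the variance, while PC2 and PC3 are orthogonal to it and carry only noise.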

  5. Multivariate Representation of Pathways • BAD pathway (groups plotted: Normal, IBC, Other BC) • Clear separation between groups • Differences in variation between groups

  6. Hotelling’s T² • Compute the distance between sample means using a (common) metric of covariation • Multidimensional analog of the t (actually F) statistic
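The formula on this slide did not survive in the transcript. As a hedged sketch, the standard textbook two-sample statistic is T² = (n1·n2 / (n1 + n2)) · dᵀ S⁻¹ d, where d is the difference of the group mean vectors and S is the pooled covariance matrix. The function name `hotelling_t2` is an assumption, and this is the textbook form rather than necessarily the exact expression the slide displayed.

```python
import numpy as np

def hotelling_t2(X1, X2):
    """Two-sample Hotelling T² with pooled covariance (rows = samples, columns = genes).

    Returns (t2, f), where f is T² rescaled to the F statistic with
    (p, n1 + n2 - p - 1) degrees of freedom. Requires n1 + n2 - 2 >= p
    so that the pooled covariance is invertible.
    """
    n1, p = X1.shape
    n2 = X2.shape[0]
    d = X1.mean(axis=0) - X2.mean(axis=0)        # difference of sample means
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    S = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)   # pooled covariance
    t2 = (n1 * n2) / (n1 + n2) * (d @ np.linalg.solve(S, d))
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f
```

For a single variable (p = 1), T² reduces to the square of the ordinary pooled-variance t statistic, which is a convenient sanity check.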

  7. Principles of the Kong et al. Method • Normal covariation generally acts to preserve homeostasis • The transcription of genes that participate in many processes will be changed • The joint changes in genes will be most distinctive for those genes active in pathways that are working differently

  8. Critiques of Hotelling’s T² • Small samples: unreliable estimates of S • Often N < p (fewer samples than genes) • Estimates of Δx (the difference of means) and S are not robust to outliers • Assumes the same covariance in each group • Is S1 = S2? Usually not in disease • Kong et al. propose an analog of the Welch t-test • Permutation of samples for significance

  9. Making it Stable • Insufficient information to capture all relationships – too much correlation! • Power of Hotelling’s method comes from identifying directions of rare variation • Many (spurious) directions of 0 variation • Random variation in data leads to random variation in PCA • Regularization strategy: force covariance to be more like IID
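One common way to "force covariance to be more like IID" is linear shrinkage toward a scaled identity matrix. The sketch below assumes that reading; the function name `shrink_covariance` and the tuning parameter `alpha` are hypothetical, and the slides do not say which regularizer was actually used.

```python
import numpy as np

def shrink_covariance(S, alpha):
    """Blend S with a scaled identity: (1 - alpha) * S + alpha * (mean variance) * I.

    alpha = 0 returns S unchanged; alpha = 1 returns a fully IID-like covariance.
    The trace (total variance) is preserved for any alpha.
    """
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)       # IID-like target with the same total variance
    return (1 - alpha) * S + alpha * target
```

Any alpha greater than zero lifts the spurious zero-variance directions away from zero, so the shrunken matrix can be inverted even when N < p.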

  10. Making it Robust • Microarray data has many outliers • Multivariate methods are very much distorted by outliers • Robust estimates of covariance could give robust PCA • Simple approach: trim outliers

  11. Handling Changes of Covariance • Power of Hotelling’s method comes from identifying directions of rare variation • If one group shows little variation in some direction but the other group shows a lot, how do we test for the change? • If one group is the control, its directions of rare variation should be taken as the standard • Use robust measures of the means in both groups

  12. Detecting changes of covariance

  13. Meaning of Covariance Change • Meaning of covariance across individuals • Homeostasis in face of individual variation • e.g. BAD pathway: largest loadings of PC1 on PRKARB & ADCY1 • PRKARB represses CREB1; ADCY activates CREB1 • Gene sets whose covariance diminishes may • be responding to different inputs • have escaped their usual regulatory control • Characteristic of cancers

  14. Testing Covariance Changes • Idea: directions of small variation in one group should match directions of small variation in the other • Mathematical approach • Find the solutions λ of det(S1 − λS2) = 0 • Solutions should all be near 1 if there is no change • Test statistic: easily computed • Computational approach • Ratio of largest to smallest solution: λmax / λmin
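The eigenvalue idea above can be sketched as follows, assuming S2 is invertible (in practice that may require regularization first). The function names are hypothetical; the solutions of det(S1 − λS2) = 0 are computed as the eigenvalues of S2⁻¹S1.

```python
import numpy as np

def covariance_ratio_eigs(S1, S2):
    """Solutions lam of det(S1 - lam * S2) = 0, sorted ascending.

    Computed as the eigenvalues of inv(S2) @ S1; assumes S2 is invertible.
    For symmetric positive-definite inputs the solutions are real and positive.
    """
    lam = np.linalg.eigvals(np.linalg.solve(S2, S1))
    return np.sort(lam.real)

def spread_statistic(S1, S2):
    """Ratio of the largest to the smallest solution; near 1 when S1 and S2 agree."""
    lam = covariance_ratio_eigs(S1, S2)
    return lam[-1] / lam[0]
```

Significance of the ratio would still need a reference distribution, for example by permuting sample labels as the earlier slides suggest.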

  15. Network Connectivity Methods

  16. Network Topology • Connections represent interactions: • Regulatory (one-way) • Protein interaction (two-way) • Hubs are genes with many connections • Bottlenecks are single genes that connect two parts of a functional network

  17. Devising Tests Based on Topology • Issues: how to weight more heavily the genes that are hubs • How to assess directionality of change • How to measure co-operativity (activation or repression changes in appropriate ways)

  18. Draghici et al. Approach • Overall measure • Effective contribution (perturbation factor)

  19. Analysis of Outliers

  20. Outliers: Clues to Disease Process? • Outliers usually reflect idiosyncratic events • Recurrent outliers reflect rare events that are selected • If a particular pathway is disrupted in disease, but by many different mechanisms, then the expression profiles should • Lose healthy covariance • Show recurrent outliers • How to test for ‘consistent’ outliers? • COPA: a method for flagging recurrent outliers in expression data • Finds consistent fusion gene

  21. A Test Statistic for Consistent Outliers • Ratio of quantile differences to normal variation: (q0.90 − q0.10)_tumor / max((q0.90 − q0.10)_normal, 0.4) • Compare to null distribution by permutation • Many genes show much higher ratios
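A minimal sketch of the quantile-ratio statistic for one gene, with a label-permutation null. The function names, the default number of permutations, and the seed are assumptions; the floor of 0.4 is taken from the slide.

```python
import numpy as np

def quantile_ratio(tumor, normal, floor=0.4):
    """(q0.90 - q0.10) in tumors divided by max((q0.90 - q0.10) in normals, floor)."""
    def spread(x):
        return np.quantile(x, 0.90) - np.quantile(x, 0.10)
    return spread(tumor) / max(spread(normal), floor)

def permutation_null(tumor, normal, n_perm=1000, seed=0):
    """Null ratios obtained by shuffling the tumor/normal labels for one gene."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([tumor, normal])
    n_t = len(tumor)
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(pooled)
        null[i] = quantile_ratio(perm[:n_t], perm[n_t:])
    return null
```

The floor keeps genes with almost no variation in normals from producing huge ratios out of measurement noise alone.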

  22. Statistical Significance • Find confidence limits on the number of false positives by permutation • Several hundred genes appear significant at 10-20% FDR • Actual scores: 267 scores are greater than 5, whereas 90% of permutations have fewer than 34 scores over 5

  23. A Test for Functional Groups • For each group G of genes • sG <- sum(scores[G])/sqrt(length(G)) • Scores: t-scores or range ratios • PAGE (BMC Bioinformatics, 2005)
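The R expression on this slide translates directly to Python. In this sketch `scores` is assumed to be a mapping from gene name to score (t-scores or range ratios), and the gene names are hypothetical.

```python
import math

def group_score(scores, group):
    """Sum of the member genes' scores divided by sqrt(group size)."""
    members = [scores[g] for g in group]
    return sum(members) / math.sqrt(len(members))

# Hypothetical scores for four genes in one functional group.
example_scores = {"geneA": 2.0, "geneB": 2.0, "geneC": -1.0, "geneD": 5.0}
example_value = group_score(example_scores, ["geneA", "geneB", "geneC", "geneD"])
```

Dividing by the square root of the group size keeps group scores comparable across groups of different sizes when the individual scores are roughly independent with similar variance.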

  24. Do Genes Make Sense? • Quantile Ratio • [1] "DNA replication" • [2] "response to pathogenic fungi" • [6] "cleavage of lamin" • [7] "spindle organization and biogenesis" • [15] "response to osmotic stress" • [16] "nutrient import" • [22] "response to mercury ion" • T-test • [2] "sodium ion homeostasis" • [3] "leukocyte adhesive activation" • [4] "positive regulation of calcium-independent cell-cell adhesion" • [5] "oxytocin receptor activity" • [6] "ADP biosynthesis" • [7] "dADP biosynthesis" • [10] "regulation of muscle contraction" • [11] "caveolar membrane" • [12] "response to cold" • [16] "stress fiber formation" • [18] "positive regulation of complement activation" • [19] "astrocyte activation" • [22] "regulation of long-term neuronal synaptic plasticity" • [24] "positive regulation of endocytosis" • [25] "embryonic hemopoiesis"

  25. Cancer Functional Groups • Do very probable cancer genes show high-discrepancy in few samples? • Program: identify genes that might contribute to cancer processes: growth signaling, loss of cell-matrix adhesion, apoptosis • Do most samples from these categories show at least one gross mis-regulation? • Are they the same genes in most samples?

  26. Example: Cell Growth • Select genes in GO:0001558 ‘regulation of cell growth’ • Expect most samples to have at least one seriously mis-regulated gene from this category • Compute the maximum aberration score across the category

  27. Aberrations • Aberration score indicated by color: vanilla: 0; red: 4 • Nine normals at left • No gene misregulated in even 50% of samples • BUT: Only a few genes commonly misregulated

  28. Simplest Summary • Maximum aberration score for samples

  29. Testing the Pathway for Outliers • Many genes show aberrations in tumor group • Null distribution: medians of maxima from randomly selected gene groups of size 37 • P < .01 • NB: The results for cell-matrix interaction are very similar; angiogenesis not so strong
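One reading of this test, sketched under stated assumptions: for a gene group, take each sample's maximum aberration score over the group's genes, then take the median of those maxima across samples, and compare that with the same quantity for randomly drawn gene groups of equal size. The function names, the genes-by-samples layout of `scores`, and the add-one p-value are assumptions, not details confirmed by the slides.

```python
import numpy as np

def median_of_maxima(scores, gene_idx):
    """Median across samples of the per-sample maximum score within the gene group.

    scores: array of aberration scores with rows = genes, columns = samples.
    """
    return np.median(scores[gene_idx].max(axis=0))

def random_group_pvalue(scores, gene_idx, n_draws=1000, seed=0):
    """Fraction of random same-size gene groups scoring at least as high as the observed group."""
    rng = np.random.default_rng(seed)
    observed = median_of_maxima(scores, gene_idx)
    k = len(gene_idx)
    null = np.array([
        median_of_maxima(scores, rng.choice(scores.shape[0], size=k, replace=False))
        for _ in range(n_draws)
    ])
    # Add-one correction so the estimated p-value is never exactly zero.
    return (1 + np.sum(null >= observed)) / (1 + n_draws)
```

For the cell-growth example, `gene_idx` would hold the row indices of the 37 genes in the category.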
