1 / 37

Joint analysis of regulatory networks and expression profiles

Joint analysis of regulatory networks and expression profiles. Ron Shamir School of Computer Science Tel Aviv University April 2013. Sources: Igor Ulitsky and Ron Shamir. Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007).

elmo
Download Presentation

Joint analysis of regulatory networks and expression profiles

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Joint analysis of regulatory networks and expression profiles Ron Shamir School of Computer Science Tel Aviv University April 2013 Sources: Igor Ulitsky and Ron Shamir. Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007). Igor Ulitsky and Ron Shamir. Identifying functional modules using expression profiles and confidence-scored protein interactions. Bioinformatics Vol. 25 no. 9 1158-1164 (2009) . 1

  2. Outline Background Joint network and expression profiles Matisse Cezanne

  3. Background

  4. DNA protein RNA One program Its output The hard disk transcription translation

  5. DNA Microarrays / RNA-seq Simultaneous measurement of expression levels of all genes / transcripts. Perform 105-109 measurements in one experiment Allow global view of cellular processes. The most important biotechnological breakthroughs of the last /current decade http://www.biomedcentral.com/1471-2105/12/323/figure/F2

  6. The Raw Data genes experiments • Entries of the Raw Data matrix: expression levels. • Ratios/absolute values/… • expression patternfor each gene • Profile for eachexperiment • /condition/sample/chip • Needs normalization!

  7. A. Maron, R. Sharan Bioinformatics 03 A. Maron-Katz, A. Tanay, C. Linhart, I. Steinfeld, R. Sharan, Y. Shiloh, R. Elkon BMC Bioinformatics 05 EXPression ANalyzer and DisplayER Ulitsky et al. Nature Protocols 10 Clustering Identify clusters of co-expressed genes CLICK, KMeans, SOM, hierarchical Promoter analysis Analyze TF binding sites of co-regulated genes PRIMA Biclustering Identify homogeneous submatrices SAMBA Visualization Function. enrichment GO, TANGO microRNA function inference: FAME • http://acgt.cs.tau.ac.il/expander

  8. Networks of Protein-protein interactions (PPIs) Large, readily available resource Representation: Network with nodes=proteins/genes edges=interactions • Analysis methods: • Global properties • Motif content analysis • Complex extraction • Cross-species comparison

  9. The hairball syndrome

  10. Potential inroad into pathways and function Can the network help to improve the analysis?

  11. Analysis of gene expression profiles + a network

  12. Goal • Challenge: Detect active functional modules: connected subnetwork of proteins whose genes are co-expressed • “Where is the action in the network in a particular experiment?”

  13. 13 Ron Shamir, RNA Antalia, April 08

  14. Ulitsky & Shamir BMC Systems Biology 07

  15. Interaction High expression similarity Modular Analysis for Topology of Interactions and Similarity SEts • Input: Expression data and a PPI network • Output: a collection of modules • Connected PPI subnetworks • Correlated expression profiles http://acgt.cs.tau.ac.il/matisse

  16. Probabilistic model • Event Mij: i,j are mates = highly co-expressed • P(Sij|Mij) ~ N(m , 2m) • P(Sij|Mij) ~ N(n , 2n) • H0: U is a set of unrelated genes • H1: U is a module = connected subnetwork with high internal similarity • Ri: gene i transcriptionally regulated • m: fraction of mates out of module gene pairs that are transcriptionally regulated • m= P(Mij| Ri  Rj, H1) • pm: fraction of mates out of all gene pairs that are transcriptionally regulated

  17. Probabilistic model (2) • Is connected gene set U a module? Assuming pair indep: • Define mij= m P(Ri)P(Rj) • Define nij= pm P(Ri)P(Rj). • Likelihood ratio Pr(Data|H1)/Pr Data|H0) • Taking log: sum of terms ij:

  18. Probabilistic model - summary • Similarities:mixture of two Gaussians • For a candidate group U, the likelihood ratio of originating from a module or from the background is • Module score = Gene group likelihood ratio = sum over all the gene pairs • Find connected subgraphs U with high WU

  19. Complexity Finding heaviest connected subgraph: NP hard even without connectivity constraints (+/- edge weights) Devised a heuristic for the problem

  20. MATISSE workflow • Seed generation • Greedy optimization • Significance filtering

  21. Finding seeds • Three seeding alternatives tested • All alternatives build a seed and delete it from the network • Building small seeds around single nodes: • Best neighbors • All neighbors • Approximating the heaviest subgraph • Delete low-degree nodes and record the heaviest subnetwork found

  22. Greedy optimization • Simultaneous optimization of all the seeds • The following steps are considered: • Node addition • Node removal • Assignment change • Module merge

  23. Front vs. Back nodes • Only a fraction of the genes (front nodes) have meaningful similarity values • MATISSE can link them using other genes (backnodes). • Back nodes correspond to: • Unmeasured transcripts • Post-translational regulation • Partially regulated pathways

  24. Test case: Yeast osmotic shock Network: 65,990 PPIs & protein-DNA interactions among 6,246 genes Expression: 133 experimental conditions – response of perturbed strains to osmotic shock (O’Rourke & Herskowitz 04) Front nodes: 2,000 genes with the highest variance

  25. Pheromone response subnetwork Back Front

  26. Performance comparison % of modules with category enrichment at p< 10-3 % annotations enriched at p<10-3 in modules

  27. GO and promoter analysis (c)

  28. F. Müller, L. Laurent, D. Kostka, I. Ulitsky, R. Williams, C. Lu, I. Park, M. Rao, P. Schwartz, N. Schmidt, J. LoringNature 08 ~150 human stem cell lines of diverse types profiled using microarrays Clustered profiles into groups Adjusted Matisse to seek subnetworks that characteristic to each group Focused analysis on pluripotent stem cells Application to stem cells

  29. Pluripotent stem cells network Highlights the key protein machinery underlying pluripotency

  30. Ulitsky & Shamir Bioinformatics 2009

  31. Accounting for PPI confidence PPI-based analysis is made difficult by abundant false positive / negative interactions Various methods can assign confidence (probability) to individual edges Idea: seek modules that are connected with high probability Ulitsky & Shamir Bioinformatics, 2009

  32. CEZANNE: (Co-Expression Zone ANalysis using NEtworks) Edge probability p(e)Edge weight –log(1-p(e)) For any WU, ≥1 edge connects W with U\Wwith probability q (e.g. 0.95) The weight of the minimum cut of U is at least -log(1-q) Algorithm: among the subnets whose minimum cut exceeds -log(1-q) find the one with the maximum co-expression score P({A},{B,C,D})=1-0.3*0.3=0.91 A P({A,B,D},{C})=0.994 0.7 C minimum cut 0.7 0.8 0.9 D B P({A,B},{C,D})=0.94 P({A,C,D},{B})=0.94

  33. DNA damage response in S. cerevisiae 47 DNA Damage Response expression profiles(Gasch et al., 01) Front nodes: 2,074 genes with at least two-fold expression change Network and confidence values: purification enrichment (PE) scores (Collins et al. 07)

  34. DNA damage response modules Cytoplasmic ribosome biogenesis Suggests SWS2 a novel member Novel pathway enriched with actin-localized proteins; Supported in other datasets; Similar deletion phenotypes Proteasome Mitochondrial ribosome – small subunit Spliceosome Novel actin-localized pathway? Mitochondrial ribosome – large subunit Ribonuclease P Trehalose biosynthesis PKA Hsp90

  35. Comparison with prior work Combined measure of sensitivity (% of annotations enriched) and specificity (% of modules enriched) with p<0.001 Clustering expression & network (Hanisch et al., 2002) Expression similarity + confident network connectivity Expression similarity + network connectivity Clustering of only expression data

  36. Summary • Algorithms using co-expression + networks to detect functionally coherent modules • Accommodate both sparse and dense subnetworks • Subnetworks linked to osmotic shock and DNA damage • A general framework for confident connectivity in PPI networks • The next steps: • Co-expression is not the only interesting way to utilize GE data • Scaling to complex human datasets

More Related