
Introduction to Molecular Networks

Introduction to Molecular Networks. BMI/CS 576 www.biostat.wisc.edu/bmi576.html Sushmita Roy sroy@biostat.wisc.edu Nov 27th, 2012. Different types of networks. Physical networks Protein-DNA: interactions between regulatory proteins (transcription factors) and regulatory DNA


Presentation Transcript


  1. Introduction to Molecular Networks BMI/CS 576 www.biostat.wisc.edu/bmi576.html Sushmita Roy sroy@biostat.wisc.edu Nov 27th, 2012

  2. Different types of networks • Physical networks • Protein-DNA: interactions between regulatory proteins (transcription factors) and regulatory DNA • Protein-protein: interactions among proteins • Signaling networks: interactions between proteins and small molecules, and among proteins that relay signals from outside the cell to the nucleus • Functional networks • Metabolic: describe reactions through which enzymes convert substrates to products • Genetic: describe interactions among genes that, when genetically perturbed together, produce a more significant phenotype than when perturbed individually • Co-expression: describe the dependency between expression patterns of genes under different conditions

  3. Protein-DNA interactions Transcriptional regulatory networks S. cerevisiae: E. coli 153 TFs (green & light red), 1319 targets 157 TFs and 4410 targets Vargas and Santillan, 2008

  4. Detecting protein-DNA interactions • ChIP-chip • ChIP-seq • Promoter scanning for sequence-specific motifs • DNase I hypersensitivity mapping • Chromatin marks to identify “regulatory regions”, followed by scanning with sequence-specific motifs

  5. Protein-DNA interaction example • Goal: determine the (approximate) locations in the genome where a given protein binds • ChIP-chip and ChIP-seq binding profiles for transcription factors. Peter Park, Nature Reviews Genetics, 2009

  6. Protein-protein interaction networks (figure panels: Yeast, Human) • Node colors: red: lethal, green: non-lethal, yellow: slow growth • Edge colors: red: Rual et al., blue: literature. Barabasi et al. 2003, Rual et al. 2005

  7. Detecting protein-protein interactions • Binary interactions • Yeast two-hybrid: uses a transcription factor with two domains, each fused to a protein of interest, and a reporter gene • Protein Complementation Assay • Complexes • Tandem Affinity Purification (TAP) with mass spectrometry • Makes use of a TAP tag attached to a protein of interest. The protein and its complex are pulled down and purified in two steps. (figure panels: yeast two-hybrid, TAP, protein complementation) Shoemaker and Panchenko, 2007, PLoS Computational Biology; Xu et al, Protein Expression and Purification, 2010

  8. Signaling networks

  9. Metabolic networks gene products other molecules Figure from KEGG database

  10. Genetic interaction networks Dixon et al., 2009, Annu. Rev. Genet

  11. Yeast genetic interaction network Costanzo et al, 2011

  12. Computational challenges in networks • Identifying the connectivity • Structure and parameter learning • Using the connectivity to infer function and activation • Network-based predictive models • Analyzing the network structure • Graph clustering • Graph properties • Network motifs We will study these questions in the context of transcriptional regulatory networks

  13. Network model representations • Unweighted graphs • Boolean networks • Bayesian networks and related graphical models • Differential equations • Petri nets • Constraint-based models • etc.

  14. Transcriptional gene regulation • Input: transcription factor level (trans), e.g., Sko1, Hot1 • Input: transcription factor binding sites (cis) • Output: mRNA levels, e.g., HSP12 • Transcriptional regulatory network connects TFs to target genes

  15. Regulatory network inference from expression Expression-based network inference

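  16. Modeling a regulatory network • Structure: who are the regulators? (e.g., Hot1 regulates HSP12; HSP12 is a target of Hot1) • Function: how do they determine expression levels? A function ψ(X1, X2) maps the levels of the regulators X1, X2 (Sko1 and Hot1) to the level Y of the target (HSP12) • Choices for the function: Boolean, linear, differential equations, probabilistic, …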

  17. Network inference from expression is a computationally difficult problem • Given 2 TFs and 3 nodes, how many possible networks can there be? • (figure: example networks, not an exhaustive set of possible networks) • There can be a total of 2^6 = 64 possible networks.

  18. Why is this problem so hard? • Assume we have n target genes and m TFs. • Number of possible edges: n × m • For example, with 4500 target genes and 300 TFs we have 1.35 million possible edges! • Number of possible networks is 2^(n × m) • Need clever methods to address this large space of possibilities.
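The counting argument above can be checked with a few lines of Python. This is a minimal sketch, assuming every TF-target edge is independently present or absent; the function names are illustrative and not from the lecture.

```python
import math


def count_possible_edges(n_targets: int, n_tfs: int) -> int:
    """Each TF can regulate each target, so there are n * m candidate edges."""
    return n_targets * n_tfs


def count_possible_networks(n_targets: int, n_tfs: int) -> int:
    """Every candidate edge is either present or absent: 2^(n * m) networks."""
    return 2 ** count_possible_edges(n_targets, n_tfs)


def decimal_digits_of_network_count(n_targets: int, n_tfs: int) -> int:
    """Number of decimal digits of 2^(n * m), computed without building the huge integer."""
    edges = count_possible_edges(n_targets, n_tfs)
    return math.floor(edges * math.log10(2)) + 1


if __name__ == "__main__":
    print(count_possible_edges(4500, 300))             # candidate edges for the slide's example
    print(count_possible_networks(3, 2))               # small case with n = 3, m = 2
    print(decimal_digits_of_network_count(4500, 300))  # size of the search space, in digits
```

Even for this moderate example the number of candidate networks has hundreds of thousands of digits, which is why exhaustive search is not an option.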

  19. Two classes of expression-based methods • Per-gene/direct methods • Module based methods

  20. Per-gene methods • Key idea: find the regulators that “best explain” expression of a gene • Mutual Information • Context Likelihood of relatedness • ARACNE • Probabilistic methods • Bayesian network: Sparse Candidates • Regression • TIGRESS • GENIE-3

  21. Per-gene methods can be further classified based on how regulators are added • Pairwise: • Ask if TF Y and gene X have a high statistical correlation/mutual information • Examples are CLR and ARACNE • Higher-order: • Ask if TFs {Y1, Y2, …, YK} best explain the expression of X • Regression, Bayesian networks, Dependency networks

  22. Pairwise methods • ARACNE • CLR • Both need a good way to pick the cutoff that separates edges from non-edges

  23. Information theory for measuring dependence • I(X,Y) is the mutual information between two variables X and Y • Knowing X, how much information do we gain about Y? • P(Z) is the probability distribution of Z
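As a concrete illustration, mutual information for discrete variables follows the standard definition I(X,Y) = Σ P(x,y) log [ P(x,y) / (P(x) P(y)) ]. The sketch below estimates it from paired samples using empirical frequencies. It assumes expression values have already been discretized (for example into low/high), which is a simplification; the function name is illustrative.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X,Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)            # empirical counts for P(X)
    count_y = Counter(ys)            # empirical counts for P(Y)
    count_xy = Counter(zip(xs, ys))  # empirical counts for P(X, Y)
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


if __name__ == "__main__":
    tf = [0, 0, 1, 1]
    print(mutual_information(tf, [0, 0, 1, 1]))  # target copies the TF
    print(mutual_information(tf, [0, 1, 0, 1]))  # target independent of the TF
```

A target that perfectly tracks a binary TF carries one bit of information about it, while an independent target carries none.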

  24. ARACNE • Getting rid of indirect links: for each triplet (X1, X2, X3) with pairwise mutual information I(X1,X2), I(X1,X3), and I(X2,X3), exclude the edge with the lowest information in the triplet, e.g., exclude X2–X3 if I(X2,X3) < min(I(X1,X2), I(X1,X3)) • These typically correspond to low mutual information. Margolin et al 2006
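The triplet rule can be sketched as follows. This is a simplified illustration of the pruning step only, assuming the mutual information values are already computed and thresholded; it omits the tolerance parameter and significance testing of the published method, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet


def prune_indirect_edges(mi: Dict[FrozenSet[str], float]) -> Dict[FrozenSet[str], float]:
    """Drop the weakest edge of every fully connected triplet.

    mi maps an unordered gene pair, frozenset({a, b}), to its mutual information.
    Edges are only marked while scanning and removed at the end, so the result
    does not depend on the order in which triplets are visited.
    """
    genes = sorted({g for pair in mi for g in pair})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        triplet = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(edge in mi for edge in triplet):
            # Ties are broken arbitrarily (first minimum found).
            to_remove.add(min(triplet, key=lambda edge: mi[edge]))
    return {edge: value for edge, value in mi.items() if edge not in to_remove}


if __name__ == "__main__":
    mi = {
        frozenset(("X1", "X2")): 0.8,
        frozenset(("X1", "X3")): 0.6,
        frozenset(("X2", "X3")): 0.2,  # weakest edge, treated as indirect
    }
    for edge in prune_indirect_edges(mi):
        print(sorted(edge))
```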

  25. Context Likelihood of Relatedness (CLR) • For a gene j and regulator i, context is defined by the mutual information of j with all other regulators, and the mutual information of i with all other target genes. • Use the contexts to compute two background distributions of mutual information • Get a z-value for Mij with respect to each of these distributions. • Final z-value is the square root of the sum of squares of these two z-values • Call an edge if the z-value is greater than a cutoff.

  26. Context Likelihood of Relatedness (figure: Mij compared against the background distributions of i and j) • zij is the likelihood of observing Mij from either distribution by chance • Use zij to decide if gene i regulates gene j.
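A minimal sketch of the CLR scoring step, assuming a precomputed mutual information matrix M with regulators as rows and target genes as columns. It combines the two background z-values as the square root of the sum of their squares and clips negative z-values to zero, following the commonly described formulation of CLR; the function name and the handling of zero-variance backgrounds are assumptions.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import List


def clr_scores(M: List[List[float]]) -> List[List[float]]:
    """Return z[i][j] for each regulator i (row) and target gene j (column)."""
    n_reg, n_tgt = len(M), len(M[0])
    # Background of regulator i: its mutual information with all targets (row i).
    row_stats = [(mean(row), pstdev(row)) for row in M]
    # Background of gene j: its mutual information with all regulators (column j).
    columns = [[M[i][j] for i in range(n_reg)] for j in range(n_tgt)]
    col_stats = [(mean(col), pstdev(col)) for col in columns]

    def z_value(value: float, mu: float, sigma: float) -> float:
        if sigma == 0:  # flat background: no evidence either way (assumption)
            return 0.0
        return max(0.0, (value - mu) / sigma)

    return [
        [
            sqrt(z_value(M[i][j], *row_stats[i]) ** 2
                 + z_value(M[i][j], *col_stats[j]) ** 2)
            for j in range(n_tgt)
        ]
        for i in range(n_reg)
    ]


if __name__ == "__main__":
    M = [[0.9, 0.1, 0.1],
         [0.1, 0.1, 0.1],
         [0.1, 0.1, 0.1]]
    cutoff = 1.5  # hypothetical cutoff, chosen only for illustration
    scores = clr_scores(M)
    edges = [(i, j) for i, row in enumerate(scores) for j, z in enumerate(row) if z > cutoff]
    print(edges)
```

Only the pair whose mutual information stands out against both of its backgrounds exceeds the cutoff.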

  27. Higher order models for network inference • Bayesian networks • Dependency networks • Random variables encode expression levels • Structure: regulators X1 and X2 and target Y3 (figure genes: Sho1, Msb2, Ste20) • Function: Y3 = f(X1, X2) • Goal: learn the structure and function of these networks

  28. Bayesian networks • a BN is a Directed Acyclic Graph (DAG) in which • the nodes denote random variables • each node X has a conditional probability distribution (CPD) representing P(X | Parents(X)) • the intuitive meaning of an arc from X to Y is that X directly influences Y • Provides a tractable way to work with large joint distributions

  29. Bayesian networks for representing regulatory networks … ? ? ? Regulators (parents) Yi Conditional probability distribution (CPD) Target (child)

  30. Example Bayesian network (figure: nodes X1 to X5, with parents and child labeled) • Assume Xi is binary • No independence assertions: needs 2^5 measurements • Independence assertions: needs 2^3 measurements
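The saving from independence assertions can be made concrete by counting table entries. This is a sketch under stated assumptions: all variables are binary, and the parent sets below are a hypothetical structure chosen for illustration, not the network drawn on the slide.

```python
from typing import Dict, List


def full_joint_entries(n_vars: int) -> int:
    """A joint table over n binary variables has 2^n entries."""
    return 2 ** n_vars


def cpt_entries(n_parents: int) -> int:
    """One entry per joint setting of a binary child and its binary parents."""
    return 2 ** (n_parents + 1)


def network_entries(parents: Dict[str, List[str]]) -> int:
    """Total entries across the CPDs of all nodes in the network."""
    return sum(cpt_entries(len(ps)) for ps in parents.values())


# Hypothetical DAG over five binary variables; no node has more than two parents.
hypothetical_structure = {
    "X1": [],
    "X2": [],
    "X3": ["X1", "X2"],
    "X4": ["X3"],
    "X5": ["X3", "X4"],
}

if __name__ == "__main__":
    print(full_joint_entries(5))                    # no independence assertions
    print(cpt_entries(2))                           # largest single CPD in the structure
    print(network_entries(hypothetical_structure))  # all CPDs together
```

With at most two parents per node, the largest table to estimate has 2^3 entries instead of the 2^5 needed for the full joint.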

  31. Representing CPDs for discrete variables • CPDs can be represented using tables or trees • Consider the following case with Boolean variables A, B, C, D • P(D | A, B, C) as a table • P(D | A, B, C) as a tree (figure: internal nodes A, B, C with branches f/t; leaves Pr(D = t) = 0.9, 0.5, 0.8, 0.5)

  32. Representing CPDs for continuous variables Parameters X2 X1 X3 Conditional Gaussian

  33. Dependency networks: a set of regression problems • For each target Yi, the regulators among X1, …, Xp are unknown (figure: Yi = [X1 … Xp] bj, with dimension labels d and p) • Function: linear regression with a regularization term (figure label: number of genes)
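One way to set up a single per-gene regression problem is sketched below. The slide does not say which regularization term is used, so the L1 (lasso) penalty here is an assumption, solved with plain coordinate descent on the objective 0.5 * ||y - Xw||^2 + lam * ||w||_1. Data are assumed centered (no intercept), and all names are illustrative.

```python
from typing import List


def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def fit_sparse_regulators(X: List[List[float]], y: List[float],
                          lam: float, n_iter: int = 100) -> List[float]:
    """Coordinate descent for L1-regularized linear regression.

    X[i][j] is the expression of candidate regulator j in sample i and y[i] is
    the expression of the target gene. Nonzero weights are predicted regulators.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, denom = 0.0, 0.0
            for i in range(n):
                # Prediction from all regulators except j.
                partial = sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                denom += X[i][j] ** 2
            w[j] = soft_threshold(rho, lam) / denom if denom > 0 else 0.0
    return w


if __name__ == "__main__":
    # Toy data in which the target depends only on the first regulator.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    y = [2.0, 0.0, 2.0, 4.0]
    print(fit_sparse_regulators(X, y, lam=0.0))  # unregularized fit
    print(fit_sparse_regulators(X, y, lam=1.0))  # the penalty shrinks the weights
```

Running one such regression per target gene and collecting the regulators with nonzero weights yields the edges of the dependency network.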

  34. Two classes of expression-based methods • Per-gene/direct methods • Module based methods

  35. An expression module • Set of genes that are co-expressed in a set of conditions (figure: genes grouped into modules) Gasch & Eisen, 2002

  36. Expression modules identified by expression clustering (figure: genes × experiments matrix clustered into modules M1, M2, M3)

  37. Module Networks • Revisit the modules • Learn regulators per module • Every gene in a module has the same regulatory program (figure: modules M1, M2, M3 over genes X1 to X8, with regulators Y1 and Y2) Lee et al 2009, Segal et al 03

  38. Modeling the relationship between regulators and targets • suppose we have a set of (8) genes that all have in their upstream regions the same activator/repressor binding sites

  39. Modeling the relationship between regulators and targets (figure: regression tree with tests X1 > e1 and X2 > e2 and YES/NO branches) • Each path captures a mode of regulation (activating or repressing) • Expression of target modeled using Gaussians at each leaf node
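A regression tree of this kind can be sketched as a small data structure: internal nodes test whether a regulator exceeds a threshold, and each leaf holds the Gaussian (mean, standard deviation) that models the target's expression. The tree shape, thresholds, and leaf parameters below are hypothetical values chosen for illustration, not those on the slide.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    mean: float  # mean expression of the module's genes in this context
    std: float   # standard deviation of the Gaussian at this leaf


@dataclass
class Split:
    regulator: str                 # regulator whose level is tested
    threshold: float               # test is: level > threshold
    if_no: Union["Leaf", "Split"]
    if_yes: Union["Leaf", "Split"]


def regulation_mode(node: Union[Leaf, Split], levels: Dict[str, float]) -> Leaf:
    """Follow one root-to-leaf path; the path taken is the mode of regulation."""
    while isinstance(node, Split):
        node = node.if_yes if levels[node.regulator] > node.threshold else node.if_no
    return node


# Hypothetical program: X1 acts as an activator, and X2 represses when X1 is high.
hypothetical_tree = Split(
    regulator="X1", threshold=0.5,
    if_no=Leaf(mean=0.0, std=1.0),
    if_yes=Split(
        regulator="X2", threshold=0.5,
        if_no=Leaf(mean=2.0, std=0.5),    # activating regulation
        if_yes=Leaf(mean=-1.0, std=0.5),  # repressing regulation
    ),
)

if __name__ == "__main__":
    print(regulation_mode(hypothetical_tree, {"X1": 1.0, "X2": 0.0}))
    print(regulation_mode(hypothetical_tree, {"X1": 1.0, "X2": 1.0}))
```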

  40. The Respiration and Carbon Module

  41. Global View of Modules • modules for common processes often share common • regulators • binding site motifs

  42. Comparing module (LeMoNe) and per-gene (CLR) methods
