Introduction to Molecular Networks

Introduction to Molecular Networks BMI/CS 576 www.biostat.wisc.edu/bmi576.html Sushmita Roy sroy@biostat.wisc.edu Nov 27th, 2012

Different types of networks
• Physical networks
  • Protein-DNA: interactions between regulatory proteins (transcription factors) and regulatory DNA
  • Protein-protein: interactions among proteins
  • Signaling networks: interactions between proteins and small molecules, and among proteins, that relay signals from outside the cell to the nucleus
• Functional networks
  • Metabolic: describe reactions through which enzymes convert substrates to products
  • Genetic: describe interactions among genes which, when genetically perturbed together, produce a more significant phenotype than when perturbed individually
  • Co-expression: describe the dependency between expression patterns of genes under different conditions

Protein-DNA interactions: transcriptional regulatory networks
• E. coli: 153 TFs (green & light red), 1319 targets
• S. cerevisiae: 157 TFs and 4410 targets
Vargas and Santillan, 2008

Detecting protein-DNA interactions
• ChIP-chip
• ChIP-seq
• Promoter scanning for sequence-specific motifs
• DNase I hypersensitivity mapping
• Chromatin marks to identify “regulatory regions”, followed by scanning using sequence-specific motifs

Protein-DNA interaction example
• Goal: determine the (approximate) locations in the genome where a given protein binds
• ChIP-chip and ChIP-seq binding profiles for transcription factors
Peter Park, Nature Reviews Genetics, 2009

Protein-protein interaction networks
• Yeast (node colors: red = lethal, green = non-lethal, yellow = slow growth)
• Human (edge colors: red = Rual et al., blue = literature)
Barabasi et al. 2003, Rual et al. 2005

Detecting protein-protein interactions
• Binary interactions
  • Yeast two-hybrid: uses a transcription factor with two domains, each fused to a protein of interest, and a reporter gene
  • Protein complementation assay
• Complexes
  • Tandem Affinity Purification (TAP) with mass spectrometry
  • Makes use of a TAP tag attached to a protein of interest; the protein and its complex are pulled down and purified in two steps
[Figure panels: yeast two-hybrid, TAP, protein complementation]
Shoemaker and Panchenko, 2007, PLoS Computational Biology; Xu et al., Protein Expression and Purification, 2010

Signaling networks

Metabolic networks
[Figure from KEGG database; legend: gene products, other molecules]

Genetic interaction networks Dixon et al., 2009, Annu. Rev. Genet

Yeast genetic interaction network Costanzo et al, 2011

Computational challenges in networks
• Identifying the connectivity
  • Structure and parameter learning
• Using the connectivity to infer function and activation
  • Network-based predictive models
• Analyzing the network structure
  • Graph clustering
  • Graph properties
  • Network motifs
We will study these questions in the context of transcriptional regulatory networks.

Network model representations
• Unweighted graphs
• Boolean networks
• Bayesian networks and related graphical models
• Differential equations
• Petri nets
• Constraint-based models
• etc.

Transcriptional gene regulation
• Input: transcription factor levels (trans), e.g. Sko1 and Hot1
• Input: transcription factor binding sites (cis)
• Output: mRNA levels of the target, e.g. HSP12
A transcriptional regulatory network connects TFs to target genes

Regulatory network inference from expression Expression-based network inference

Modeling a regulatory network
• Structure: who are the regulators?
  • Hot1 regulates HSP12; HSP12 is a target of Hot1
• Function: how do they determine expression levels?
  • A function ψ(X1, X2) relates the regulator levels X1 and X2 (Hot1, Sko1) to the target level Y (HSP12)
  • Choices for ψ: Boolean, linear, differential equations, probabilistic, …

Network inference from expression is a computationally difficult problem
• Given 2 TFs and 3 target nodes, how many possible networks can there be?
• [Figure: a non-exhaustive set of possible networks]
• There are 2 × 3 = 6 possible edges, so there can be a total of 2^6 = 64 possible networks.

Why is this problem so hard?
• Assume we have n target genes and m TFs.
• Number of possible edges: n × m
• For example, with 4500 target genes and 300 TFs we have 1.35 million possible edges!
• Number of possible networks is 2^(n × m)
Need clever methods to address this large space of possibilities.
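The counting argument on this slide can be checked with a short sketch. It assumes every TF-to-target edge is either present or absent, independently of the others; the function names are illustrative and not from the lecture.

```python
import math


def count_possible_edges(n_targets: int, n_tfs: int) -> int:
    """Each TF may or may not regulate each target: n x m candidate edges."""
    return n_targets * n_tfs


def count_possible_networks(n_targets: int, n_tfs: int) -> int:
    """Every subset of the candidate edges is a distinct network."""
    return 2 ** count_possible_edges(n_targets, n_tfs)


# Small example from the previous slide: 2 TFs and 3 targets.
small = count_possible_networks(n_targets=3, n_tfs=2)  # 64

# Genome-scale example: 4500 targets and 300 TFs.
edges = count_possible_edges(n_targets=4500, n_tfs=300)  # 1350000

# 2**edges is too large to print, so report its number of decimal digits.
digits = math.floor(edges * math.log10(2)) + 1

print(small, edges, digits)
```

The number of networks has hundreds of thousands of decimal digits, which is why exhaustive search over structures is not an option.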

Two classes of expression-based methods
• Per-gene/direct methods
• Module-based methods

Per-gene methods
• Key idea: find the regulators that “best explain” expression of a gene
• Mutual information
  • Context Likelihood of Relatedness (CLR)
  • ARACNE
• Probabilistic methods
  • Bayesian network: Sparse Candidates
• Regression
  • TIGRESS
  • GENIE3

Per-gene methods can be further classified based on how regulators are added
• Pairwise:
  • Ask if TF Y and gene X have a high statistical correlation/mutual information
  • Examples are CLR and ARACNE
• Higher-order:
  • Ask if TFs {Y1, Y2, …, YK} best explain the expression of X
  • Examples are regression, Bayesian networks, and dependency networks

Pairwise methods
• ARACNE
• CLR
Both need a good way to pick the cutoff that decides what is an edge and what is not

Information theory for measuring dependence
• I(X,Y) is the mutual information between two variables
• I(X,Y) = Σx Σy P(x,y) log [ P(x,y) / (P(x) P(y)) ]
• Knowing X, how much information do we gain about Y?
• P(Z) is the probability distribution of Z
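As a minimal sketch of how mutual information can be estimated from data, the code below uses the standard definition I(X,Y) = Σ P(x,y) log [P(x,y) / (P(x) P(y))] with empirical frequencies as probabilities. It assumes the expression values have already been discretized (for example into low/high); the discretization step and the function name are assumptions, not part of the slides.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X,Y) in bits from paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


# A TF and a gene whose discretized levels always agree share 1 bit.
tf_levels = [0, 0, 1, 1]
print(mutual_information(tf_levels, [0, 0, 1, 1]))  # 1.0

# A gene whose levels are unrelated to the TF shares 0 bits.
print(mutual_information(tf_levels, [0, 1, 0, 1]))  # 0.0
```

With few samples this plug-in estimate is biased upward, which is one reason pairwise methods need a careful cutoff.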

ARACNE
• Getting rid of indirect links: consider each triplet of genes X1, X2, X3 (regulators and target) with pairwise mutual information I(X1,X2), I(X1,X3), I(X2,X3)
• Exclude the edge with the lowest information in a triplet, e.g. exclude X2-X3 if I(X2,X3) < min(I(X1,X2), I(X1,X3))
• These typically correspond to low mutual information.
Margolin et al. 2006
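The triplet rule can be sketched as follows. This is an illustrative sketch of the pruning step only, not the ARACNE software: the input format (a dictionary from gene pairs to mutual information) and the function name are assumptions, and it presumes edges below the mutual-information cutoff were already dropped.

```python
from itertools import combinations


def prune_weakest_in_triplets(mi):
    """Remove, in every fully connected triplet, the edge with the lowest MI.

    mi maps frozenset({gene_a, gene_b}) to a mutual information value.
    Edges are only marked during the scan, so every triplet is judged
    against the original graph.
    """
    genes = sorted({g for edge in mi for g in edge})
    marked = set()
    for a, b, c in combinations(genes, 3):
        triplet = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(edge in mi for edge in triplet):
            continue  # not a closed triangle
        weakest = min(triplet, key=lambda edge: mi[edge])
        others = [edge for edge in triplet if edge != weakest]
        # Strict inequality: ties are left untouched.
        if mi[weakest] < min(mi[edge] for edge in others):
            marked.add(weakest)
    return set(mi) - marked


mi = {
    frozenset(("X1", "X2")): 0.8,
    frozenset(("X1", "X3")): 0.7,
    frozenset(("X2", "X3")): 0.2,  # weakest edge in the triplet
}
kept = prune_weakest_in_triplets(mi)
print(sorted(sorted(edge) for edge in kept))  # [['X1', 'X2'], ['X1', 'X3']]
```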

Context Likelihood of Relatedness (CLR)
• For a gene j and regulator i, the context is defined by the mutual information of j with all other regulators, and the mutual information of i with all other target genes.
• Use the contexts to compute two background distributions of mutual information.
• Get a z-value for Mij with respect to each of these distributions.
• The final z-value is the square root of the sum of squares of these two z-values.
• Call an edge if the final z-value is greater than a cutoff.

Context Likelihood of Relatedness
• Mij: mutual information between regulator i and gene j
• zij scores how unlikely it is to observe Mij from either background distribution by chance
• Use zij to decide if regulator i regulates gene j.
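A minimal sketch of the CLR score, assuming a precomputed regulators-by-genes mutual information matrix M. Each entry gets one z-value against its row (regulator i with all genes) and one against its column (gene j with all regulators), and the two are combined as the square root of the sum of squares. Clipping negative z-values at zero and using the population standard deviation are choices made here for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def clr_scores(M):
    """Return the matrix of combined z-values for an MI matrix M."""
    n_regs, n_genes = len(M), len(M[0])
    row_stats = [(mean(row), pstdev(row)) for row in M]
    columns = [[M[i][j] for i in range(n_regs)] for j in range(n_genes)]
    col_stats = [(mean(col), pstdev(col)) for col in columns]

    def z_value(value, mu, sd):
        # Background with no spread carries no evidence; clip negatives at 0.
        if sd == 0:
            return 0.0
        return max(0.0, (value - mu) / sd)

    scores = []
    for i in range(n_regs):
        row = []
        for j in range(n_genes):
            z_i = z_value(M[i][j], *row_stats[i])
            z_j = z_value(M[i][j], *col_stats[j])
            row.append(sqrt(z_i ** 2 + z_j ** 2))
        scores.append(row)
    return scores


# Toy MI matrix: regulator 0 stands out for gene 0, regulator 2 for gene 2.
M = [
    [0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1],
    [0.1, 0.1, 0.9],
]
scores = clr_scores(M)
cutoff = 1.5  # hypothetical cutoff
edges = [(i, j) for i in range(3) for j in range(3) if scores[i][j] > cutoff]
print(edges)
```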

Higher-order models for network inference
• Bayesian networks
• Dependency networks
• Random variables encode expression levels
• Structure: regulators X1 and X2 point to target Y3 (figure example genes: Sho1, Msb2, Ste20)
• Function: Y3 = f(X1, X2)
Goal: learn the structure and function of these networks

Bayesian networks
• A BN is a Directed Acyclic Graph (DAG) in which
  • the nodes denote random variables
  • each node X has a conditional probability distribution (CPD) representing P(X | Parents(X))
• The intuitive meaning of an arc from X to Y is that X directly influences Y
• Provides a tractable way to work with large joint distributions

Bayesian networks for representing regulatory networks
• Regulators (parents): unknown (?), to be learned
• Target (child): Yi
• A conditional probability distribution (CPD) relates the target to its regulators

Example Bayesian network
• Variables X1, …, X5 (parents and child); assume each Xi is binary
• No independence assertions: needs 2^5 measurements
• Independence assertions: needs 2^3 measurements
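The saving from independence assertions can be illustrated with a small count. The graph structure below is hypothetical, since the slide's figure is not recoverable; it is chosen so that no node has more than two parents, which makes the largest CPD table cover three binary variables.

```python
def full_joint_entries(n_vars: int) -> int:
    """Entries in a joint table over n binary variables."""
    return 2 ** n_vars


def cpd_table_entries(parents: dict) -> dict:
    """Entries in each node's CPD table: 2^(number of parents + 1)."""
    return {child: 2 ** (len(ps) + 1) for child, ps in parents.items()}


# Hypothetical structure (an assumption, not the slide's actual graph).
hypothetical_parents = {
    "X1": [],
    "X2": [],
    "X3": ["X1"],
    "X4": ["X2"],
    "X5": ["X3", "X4"],
}

joint = full_joint_entries(len(hypothetical_parents))
per_node = cpd_table_entries(hypothetical_parents)
largest = max(per_node.values())

print(joint)     # 32 entries with no independence assertions
print(largest)   # 8 entries in the largest CPD table
```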

Representing CPDs for discrete variables
• CPDs can be represented using tables or trees
• Consider the following case with Boolean variables A, B, C, D
[Figure: P(D | A, B, C) as a table]
[Figure: P(D | A, B, C) as a tree with tests on A, B, and C (branches f/t) and leaves Pr(D = t) = 0.9, 0.5, 0.8, 0.5]
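To make the table-versus-tree contrast concrete, here is a sketch that encodes a tree-structured CPD as nested tests and expands it into the equivalent table. The leaf probabilities are the ones listed on the slide, but which branch leads to which leaf is an assumption, because the figure layout is not recoverable.

```python
from itertools import product


def pr_d_true(a: bool, b: bool, c: bool) -> float:
    """Pr(D = t | A, B, C) as a tree (branch layout assumed)."""
    if a:
        return 0.9           # leaf reached without testing B or C
    if not b:
        return 0.5           # leaf reached without testing C
    return 0.8 if c else 0.5


# Expanding the tree gives the full table: one row per parent configuration.
cpd_table = {
    (a, b, c): pr_d_true(a, b, c)
    for a, b, c in product([True, False], repeat=3)
}

for config, prob in cpd_table.items():
    print(config, prob)

# The table needs 8 rows, while the tree needs only 4 leaves.
print(len(cpd_table))
```

The tree is more compact because several parent configurations share the same distribution over D.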

Representing CPDs for continuous variables
• Conditional Gaussian
[Figure: network over X1, X2, X3 with the parameters of the conditional Gaussian]
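One common form of a conditional Gaussian CPD is the linear Gaussian, where the child's mean is a linear function of its parents. The sketch below assumes X3 is the child of X1 and X2 and uses hypothetical parameters b0, b1, b2, and sigma; neither the parent/child roles nor the parameterization is confirmed by the slide.

```python
from math import log, pi


def conditional_gaussian_logpdf(x3, x1, x2, b0, b1, b2, sigma):
    """log P(X3 = x3 | X1 = x1, X2 = x2) under a linear Gaussian CPD.

    Mean: b0 + b1 * x1 + b2 * x2; standard deviation: sigma.
    """
    mu = b0 + b1 * x1 + b2 * x2
    return -0.5 * log(2 * pi * sigma ** 2) - (x3 - mu) ** 2 / (2 * sigma ** 2)


# Hypothetical parameters: X1 activates and X2 represses X3.
params = dict(b0=0.0, b1=1.0, b2=-0.5, sigma=1.0)

at_mean = conditional_gaussian_logpdf(x3=1.5, x1=2.0, x2=1.0, **params)
off_mean = conditional_gaussian_logpdf(x3=4.0, x1=2.0, x2=1.0, **params)
print(at_mean > off_mean)  # True: values near the predicted mean are more likely
```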

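Dependency networks: a set of regression problems
• One regression problem per target gene Yi, with candidate regulators X1, …, Xp
• Function: linear regression with a regularization term
[Figure: Yi (d × 1) = [X1 … Xp] (d × p) times coefficient vector bj (p × 1); labels: regulators, regularization term, number of genes]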
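A minimal sketch of one such regression problem, assuming an L1 (lasso) regularization term solved by coordinate descent. The slide does not say which regularizer is used, so the penalty, the solver, and the parameter names lam and n_iters are all assumptions. Expression values are assumed to be centered, so no intercept is fitted.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_regression(X, y, lam, n_iters=200):
    """Minimize (1/(2d)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X: d samples by p regulators (list of rows); y: d target values.
    """
    d, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iters):
        for k in range(p):
            # Correlation of regulator k with the residual that ignores k.
            rho = sum(
                X[s][k] * (y[s] - sum(X[s][j] * b[j] for j in range(p) if j != k))
                for s in range(d)
            ) / d
            scale = sum(X[s][k] ** 2 for s in range(d)) / d
            b[k] = soft_threshold(rho, lam) / scale if scale > 0 else 0.0
    return b


# Toy data: the target depends on regulator 0 only.
X = [[1, 1], [2, -1], [3, 1], [4, -1]]
y = [2, 4, 6, 8]
coefficients = lasso_regression(X, y, lam=0.1)
regulators = [k for k, weight in enumerate(coefficients) if weight != 0.0]
print(coefficients, regulators)
```

Regulators with nonzero coefficients become the incoming edges of the target gene; repeating this for every gene yields the network.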

Two classes of expression-based methods
• Per-gene/direct methods
• Module-based methods

An expression module
• Set of genes that are co-expressed in a set of conditions
[Figure: genes grouped into modules]
Gasch & Eisen, 2002

Expression modules identified by expression clustering
[Figure: genes × experiments expression matrix, clustered into modules M1, M2, M3]

Module Networks
• Learn regulators per module
• Revisit the modules
• Every gene in a module has the same regulatory program
[Figure: genes X1 to X8 assigned to modules M1, M2, M3, with regulators Y1 and Y2]
Lee et al. 2009, Segal et al. 2003

Modeling the relationship between regulators and targets
• Suppose we have a set of (8) genes that all have in their upstream regions the same activator/repressor binding sites

Modeling the relationship between regulators and targets
• A tree of tests on regulator expression: X1 > e1, then X2 > e2, each with YES/NO branches
• Each path captures a mode of regulation (activating or repressing)
• Expression of the target is modeled using Gaussians at each leaf node
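A regulatory program of this kind can be sketched as a small decision tree whose leaves hold Gaussian parameters. The branch layout (the X2 test sitting under the YES branch of the X1 test), the thresholds, and the leaf means and standard deviations below are all hypothetical values chosen for illustration.

```python
def regulatory_program(x1, x2, e1, e2, leaves):
    """Return the (mean, std) Gaussian of the leaf reached by the regulators.

    x1, x2: expression of the two regulators; e1, e2: split thresholds.
    """
    if x1 > e1:
        if x2 > e2:
            return leaves["x1_high_x2_high"]
        return leaves["x1_high_x2_low"]
    return leaves["x1_low"]


# Hypothetical leaves: X1 acts as an activator, X2 as a repressor.
leaves = {
    "x1_low": (-1.0, 0.5),          # activator absent: target low
    "x1_high_x2_low": (2.0, 0.5),   # activator present, repressor absent
    "x1_high_x2_high": (0.0, 0.5),  # repressor counteracts the activator
}

induced = regulatory_program(x1=1.2, x2=-0.3, e1=0.5, e2=0.5, leaves=leaves)
repressed = regulatory_program(x1=1.2, x2=0.9, e1=0.5, e2=0.5, leaves=leaves)
print(induced, repressed)  # (2.0, 0.5) (0.0, 0.5)
```

In a module network, every gene in the module would share this one program, so the leaf Gaussian describes the expression of all of the module's genes in the matching conditions.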

The Respiration and Carbon Module

Global View of Modules
• Modules for common processes often share common
  • regulators
  • binding site motifs

Comparing module (LeMoNe) and per-gene (CLR) methods
