Introduction to molecular networks

Introduction to Molecular Networks. BMI/CS 576 www.biostat.wisc.edu/bmi576.html Sushmita Roy [email protected] Nov 27th, 2012. Different types of networks. Physical networks Protein-DNA: interactions between regulatory proteins (transcription factors) and regulatory DNA

Introduction to Molecular Networks

BMI/CS 576

www.biostat.wisc.edu/bmi576.html

Sushmita Roy

[email protected]

Nov 27th, 2012


Different types of networks

  • Physical networks

    • Protein-DNA: interactions between regulatory proteins (transcription factors) and regulatory DNA

    • Protein-protein: interactions among proteins

    • Signaling networks: interactions between proteins and small molecules, and among proteins that relay signals from outside the cell to the nucleus

  • Functional networks

    • metabolic: describe reactions through which enzymes convert substrates to products

    • genetic: describe interactions among genes which, when genetically perturbed together, produce a more significant phenotype than when perturbed individually

    • co-expression: describes the dependency between expression patterns of genes under different conditions


Protein-DNA interactions: transcriptional regulatory networks

E. coli: 153 TFs (green & light red), 1319 targets

S. cerevisiae: 157 TFs and 4410 targets

Vargas and Santillan, 2008


Detecting protein-DNA interactions

  • ChIP-chip

  • ChIP-seq

  • Promoter scanning of sequence-specific motifs

  • DNase I hypersensitivity mapping

  • Chromatin marks to identify “regulatory regions” followed by scanning using sequence-specific motifs


Protein-DNA interaction example

  • goal: determine the (approximate) locations in the genome where a given protein binds

  • ChIP-chip and ChIP-seq binding profiles for transcription factors

Peter Park, Nature Reviews Genetics, 2009


Protein-protein interaction networks

Yeast network, node colors: red: lethal, green: non-lethal, yellow: slow growth

Human network, edge colors: red: Rual et al., blue: literature

Barabasi et al. 2003, Rual et al. 2005


Detecting protein-protein interactions

  • Binary interactions

    • Yeast two-hybrid: uses a transcription factor with two domains, each fused to one of the two proteins of interest, and a reporter gene

    • Protein Complementation Assay

  • Complexes

    • Tandem Affinity Purification (TAP) with Mass-spectrometry

    • Makes use of a TAP tag attached to a protein of interest. Protein and complex are pulled and purified in two steps.

(Figure panels: yeast two-hybrid, TAP, protein complementation.)

Shoemaker and Panchenko, 2007, PLoS Computational Biology; Xu et al., 2010, Protein Expression and Purification



Metabolic networks

gene products

other molecules

Figure from KEGG database


Genetic interaction networks

Dixon et al., 2009, Annu. Rev. Genet


Yeast genetic interaction network

Costanzo et al, 2011


Computational challenges in networks

  • Identifying the connectivity

    • Structure and parameter learning

  • Using the connectivity to infer function and activation

    • Network-based predictive models

  • Analyzing the network structure

    • Graph clustering

    • Graph properties

    • Network motifs

We will study these questions in the context of transcriptional regulatory networks


Network model representations

  • Unweighted graphs

  • Boolean networks

  • Bayesian networks and related graphical models

  • Differential equations

  • Petri nets

  • Constraint-based models

  • etc.


Transcriptional gene regulation

Input: transcription factor levels (trans), e.g. Sko1 and Hot1

Input: transcription factor binding sites (cis)

Output: mRNA levels, e.g. HSP12

A transcriptional regulatory network connects TFs to target genes


Regulatory network inference from expression

Expression-based network inference


Modeling a regulatory network

Structure: who are the regulators?

  • Hot1 regulates HSP12; HSP12 is a target of Hot1

  • Example: regulators Hot1 and Sko1 (X1, X2), target HSP12 (Y)

Function: how do they determine expression levels?

  • Y = ψ(X1, X2)

  • ψ can be Boolean, linear, differential equations, probabilistic, ...


Network inference from expression is a computationally difficult problem

  • Given 2 TFs and 3 target nodes, how many possible networks can there be?

(Figure: examples of possible networks; not an exhaustive set.)

There can be a total of 2^6 = 64 possible networks.


Why is this problem so hard?

  • Assume we have n target genes and m TFs.

  • Number of possible edges: n × m

  • For example, with 4500 target genes and 300 TFs we have 1.35 million edges!

  • Number of possible networks is 2^(n × m)

Need clever methods to address this large space of possibilities.
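
To make the size of this search space concrete, here is a minimal Python sketch of the counts above. The function names are illustrative and not part of the lecture; the only modeling assumption is the one stated on the slide, namely that every TF-target pair is a candidate edge that is either present or absent.

```python
from math import floor, log10


def count_possible_edges(n_targets: int, n_tfs: int) -> int:
    """Candidate TF -> target edges: n x m."""
    return n_targets * n_tfs


def count_possible_networks(n_targets: int, n_tfs: int) -> int:
    """Each candidate edge is present or absent: 2^(n x m) networks."""
    return 2 ** count_possible_edges(n_targets, n_tfs)


def decimal_digits_of_network_count(n_targets: int, n_tfs: int) -> int:
    """Digits of 2^(n x m), computed via logarithms instead of expanding it."""
    return floor(count_possible_edges(n_targets, n_tfs) * log10(2)) + 1


small = count_possible_networks(n_targets=3, n_tfs=2)
edges = count_possible_edges(n_targets=4500, n_tfs=300)
digits = decimal_digits_of_network_count(n_targets=4500, n_tfs=300)
print(small, edges, digits)
```

For the genome-scale example, the number of networks has several hundred thousand decimal digits, which is why exhaustive enumeration is not an option.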


Two classes of expression-based methods

  • Per-gene/direct methods

  • Module-based methods


Per-gene methods

  • Key idea: find the regulators that “best explain” expression of a gene

  • Mutual Information

    • Context Likelihood of Relatedness (CLR)

    • ARACNE

  • Probabilistic methods

    • Bayesian network: Sparse Candidates

  • Regression

    • TIGRESS

    • GENIE3


Per-gene methods can be further classified based on how regulators are added

  • Pairwise:

    • Ask if TF Y and gene X have a high statistical correlation/mutual information

    • Examples are CLR and ARACNE

  • Higher-order:

    • Ask if TFs {Y1, Y2, ..., YK} explain expression of X best

    • Regression, Bayesian networks, Dependency networks


Pairwise methods

  • ARACNE

  • CLR

Both need a good way to pick the cutoff that separates edges from non-edges


Information theory for measuring dependence

  • I(X,Y) is the mutual information between two variables

    • Knowing X, how much information do I have about Y?

  • P(Z) is the probability distribution of Z
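
For discrete variables, the standard definition in terms of these distributions is I(X,Y) = sum over x, y of P(x,y) log [ P(x,y) / (P(x) P(y)) ]. The following is a minimal sketch of a plug-in estimate of this quantity from paired, already discretized samples. The function name and the toy expression vectors are hypothetical, and real pipelines typically need a discretization or density estimation step that is not shown here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X,Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts for P(x, y)
    px = Counter(xs)               # counts for P(x)
    py = Counter(ys)               # counts for P(y)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical discretized expression (0 = low, 1 = high) over 8 conditions
tf = [0, 0, 0, 0, 1, 1, 1, 1]
target_dependent = [0, 0, 0, 0, 1, 1, 1, 1]    # follows the TF exactly
target_independent = [0, 1, 0, 1, 0, 1, 0, 1]  # unrelated to the TF

mi_dependent = mutual_information(tf, target_dependent)
mi_independent = mutual_information(tf, target_independent)
print(mi_dependent, mi_independent)
```

In this toy case the dependent target shares one full bit of information with the TF, while the independent target shares none.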


ARACNE

Getting rid of indirect links:

(Figure: a triplet of genes X1, X2, X3, shown as regulators and a target, with pairwise mutual information I(X1,X2), I(X1,X3) and I(X2,X3).)

Exclude edges with lowest information in a triplet

I(X2,X3) < min(I(X1,X2),I(X1,X3))

These typically correspond to low mutual information.

Margolin et al 2006
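
The triplet rule above can be sketched as follows. This is a simplified illustration and not the ARACNE implementation: the function name `dpi_prune` and the data layout are assumptions, the mutual information values are made up, and the tolerance parameter that practical implementations use for near-ties is omitted. Every triplet is checked against the original scores, so the result does not depend on the order in which triplets are visited.

```python
from itertools import combinations


def dpi_prune(mi):
    """Drop the weakest edge of every fully connected triplet.

    `mi` maps frozenset({a, b}) to the mutual information of that candidate edge.
    Returns the set of edges that survive pruning.
    """
    nodes = sorted({gene for edge in mi for gene in edge})
    to_remove = set()
    for a, b, c in combinations(nodes, 3):
        triplet = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(edge in mi for edge in triplet):
            # The edge with the lowest MI is treated as the indirect link
            to_remove.add(min(triplet, key=lambda edge: mi[edge]))
    return set(mi) - to_remove


# Hypothetical scores satisfying I(X2,X3) < min(I(X1,X2), I(X1,X3))
mi = {
    frozenset(("X1", "X2")): 0.8,
    frozenset(("X1", "X3")): 0.7,
    frozenset(("X2", "X3")): 0.2,
}
kept = dpi_prune(mi)
print(sorted(sorted(edge) for edge in kept))
```

With these scores the X2-X3 edge is removed and the two edges involving X1 are kept.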


Context Likelihood of Relatedness (CLR)

  • For a gene j and regulator i, context is defined by the mutual information of j with all other regulators, and mutual information of i with all other target genes.

  • Use the contexts to compute two background distributions of mutual information

  • Get a z-value for Mij, the mutual information between i and j, with respect to each of these distributions.

  • Final z-value is the square root of the sum of squares of these two z-values

  • Call an edge if the z-value is greater than a cutoff.


Context Likelihood of Relatedness

(Figure: the mutual information Mij of regulator i and gene j.)

zij measures how unlikely it is to observe Mij by chance under either background distribution

Use zij to decide if gene i regulates gene j.
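
A minimal sketch of the CLR scoring steps, under stated assumptions: the mutual information matrix is given, each background is summarized by its mean and standard deviation, negative z-values are clipped to zero, and the two z-values are combined as the square root of the sum of their squares. For simplicity each background here includes Mij itself. The function names, the toy matrix and the cutoff are all hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def z_score(value, background):
    """Non-negative z-score of `value` against a background list of MI values."""
    sd = pstdev(background)
    if sd == 0:
        return 0.0  # flat background: no evidence either way
    return max(0.0, (value - mean(background)) / sd)


def clr_scores(mi):
    """mi[i][j] is the mutual information between regulator i and gene j."""
    n_regs, n_genes = len(mi), len(mi[0])
    scores = []
    for i in range(n_regs):
        row_scores = []
        for j in range(n_genes):
            z_i = z_score(mi[i][j], mi[i])                              # regulator i vs all genes
            z_j = z_score(mi[i][j], [mi[k][j] for k in range(n_regs)])  # gene j vs all regulators
            row_scores.append(sqrt(z_i ** 2 + z_j ** 2))
        scores.append(row_scores)
    return scores


# Hypothetical MI matrix: 3 regulators (rows) x 4 target genes (columns)
mi = [
    [0.9, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
scores = clr_scores(mi)
cutoff = 2.0  # arbitrary illustrative cutoff
edges = [(i, j) for i, row in enumerate(scores) for j, z in enumerate(row) if z > cutoff]
print(edges)
```

Only the pair that stands out against both of its backgrounds passes the cutoff; choosing that cutoff well remains the open design question for pairwise methods.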


Higher order models for network inference

  • Bayesian networks

  • Dependency networks

Random variables encode expression levels

Structure: regulators Sho1 and Msb2 (X1, X2) point to target Ste20 (Y3)

Function: Y3 = f(X1, X2)

Goal: learn the structure and function of these networks


Bayesian networks

  • a BN is a Directed Acyclic Graph (DAG) in which

    • the nodes denote random variables

    • each node X has a conditional probability distribution (CPD) representing P(X | Parents(X))

  • the intuitive meaning of an arc from X to Y is that X directly influences Y

  • Provides a tractable way to work with large joint distributions
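
The tractability comes from factoring the joint distribution into one CPD per node: P(X1, ..., Xn) is the product over nodes of P(X | Parents(X)). Below is a minimal sketch for binary variables. The gene names are borrowed from earlier slides, but the structure is only an illustrative assumption and every probability value is made up.

```python
from itertools import product

# Assumed structure: two regulators without parents, one target with both as parents
parents = {"Hot1": (), "Sko1": (), "HSP12": ("Hot1", "Sko1")}

# Hypothetical CPDs: P(node = 1 | parent values), keyed by the tuple of parent values
p_on = {
    "Hot1": {(): 0.3},
    "Sko1": {(): 0.6},
    "HSP12": {(0, 0): 0.1, (0, 1): 0.05, (1, 0): 0.9, (1, 1): 0.4},
}


def joint_probability(assignment):
    """P(assignment) as the product of P(X | Parents(X)) over all nodes."""
    prob = 1.0
    for node, pa in parents.items():
        p1 = p_on[node][tuple(assignment[p] for p in pa)]
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob


def n_free_parameters():
    """One free parameter per parent configuration of each binary node."""
    return sum(2 ** len(pa) for pa in parents.values())


# Summing the factored joint over all assignments should give 1
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
example = joint_probability({"Hot1": 1, "Sko1": 0, "HSP12": 1})
print(total, example, n_free_parameters())
```

In this small case the factored form needs 6 parameters where an unconstrained joint over three binary variables needs 7; the gap grows quickly as more nodes with few parents are added.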


Bayesian networks for representing regulatory networks

(Figure: candidate regulators (parents), marked "?", point to a target Yi (child); a conditional probability distribution (CPD) relates the target to its regulators.)


Example Bayesian network

(Figure: network over X1, X2, X3, X4 and X5, with parent and child nodes labeled.)

Assume Xi is binary

No independence assertions: needs 2^5 measurements

Independence assertions: needs 2^3 measurements


Representing CPDs for discrete variables

  • CPDs can be represented using tables or trees

  • consider the following case with Boolean variables A, B, C, D

P( D | A, B, C) as a table

P( D | A, B, C) as a tree

(Figure: the root tests A, with branches labeled f and t. One branch is a leaf with Pr(D = t) = 0.9 and the other tests B. One branch of B is a leaf with Pr(D = t) = 0.5 and the other tests C. The two leaves under C are Pr(D = t) = 0.8 and Pr(D = t) = 0.5.)
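
A small sketch of how a tree CPD for P(D | A, B, C) relates to the table form. The leaf probabilities are the ones that appear on the slide, but which branch (f or t) leads to which leaf cannot be recovered from the transcript, so the branch directions used below are an assumption made only for illustration.

```python
from itertools import product


def pr_d_true(a: bool, b: bool, c: bool) -> float:
    """Tree CPD for Pr(D = t | A, B, C); branch directions are assumed."""
    if not a:                      # assumed: A = f ends in a leaf
        return 0.9
    if not b:                      # assumed: A = t, B = f ends in a leaf
        return 0.5
    return 0.5 if c else 0.8       # assumed: A = t, B = t tests C


# Expanding the tree gives the equivalent table: one row per parent setting
table = {abc: pr_d_true(*abc) for abc in product((False, True), repeat=3)}
n_table_rows = len(table)
n_tree_leaves = 4  # the tree stores one probability per leaf instead of per row

for (a, b, c), p in table.items():
    print(a, b, c, p)
```

The table needs a row for each of the 8 parent settings, while the tree stores 4 leaf values because several settings share the same distribution.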


Representing CPDs for continuous variables

(Figure: variables X1, X2 and X3 with a conditional Gaussian CPD and its parameters.)


Dependency networks: a set of regression problems

(Figure: target Yi is modeled from regulators X1 ... Xp through coefficients bj, written as Yi = [X1 ... Xp] bj with dimension labels 1, p and d; candidate regulators are marked "?".)

Function: Linear regression

Regularization term

Number of genes


Two classes of expression-based methods

  • Per-gene/direct methods

  • Module-based methods


An expression module

Set of genes that are co-expressed in a set of conditions

(Figure: genes grouped into modules.)

Gasch & Eisen, 2002


Expression modules identified by expression clustering

(Figure: clustering a genes × experiments expression matrix yields modules M1, M2 and M3.)


Module Networks

Revisit the modules

Learn regulators per module

(Figure: variables X1 to X8 and Y1, Y2 organized into modules M1, M2 and M3.)

Every gene in a module has the same regulatory program

Lee et al. 2009, Segal et al. 2003


Modeling the relationship between regulators and targets

  • suppose we have a set of (8) genes that all have in their upstream regions the same activator/repressor binding sites


Modeling the relationship between regulators and targets

(Figure: regression tree with tests X1 > e1 and X2 > e2, YES/NO branches, and paths labeled as activating or repressing regulation.)

Each path captures a mode of regulation

Expression of target modeled using Gaussians at each leaf node
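
A minimal sketch of such a regulation program, assuming a fixed two-level tree: conditions are routed by thresholds on two regulators, and the module's expression at each leaf is summarized by a Gaussian (mean and variance). The tree layout, the thresholds e1 and e2, and the data are hypothetical, and learning the splits themselves is not shown.

```python
from statistics import mean, pvariance


def leaf_for(x1, x2, e1, e2):
    """Route one condition through a two-level tree (layout is hypothetical)."""
    if x1 <= e1:
        return "X1 <= e1"
    if x2 > e2:
        return "X1 > e1, X2 > e2"
    return "X1 > e1, X2 <= e2"


def fit_leaf_gaussians(x1s, x2s, module_expr, e1, e2):
    """Group conditions by leaf, then fit (mean, variance) of expression per leaf."""
    by_leaf = {}
    for x1, x2, y in zip(x1s, x2s, module_expr):
        by_leaf.setdefault(leaf_for(x1, x2, e1, e2), []).append(y)
    return {leaf: (mean(ys), pvariance(ys)) for leaf, ys in by_leaf.items()}


# Hypothetical data: regulator levels X1, X2 and module expression in 6 conditions
x1s = [0.1, 0.2, 0.9, 0.8, 0.9, 0.7]
x2s = [0.5, 0.1, 0.2, 0.1, 0.9, 0.8]
module_expr = [0.0, 0.2, 2.0, 2.2, -1.0, -1.2]

leaves = fit_leaf_gaussians(x1s, x2s, module_expr, e1=0.5, e2=0.5)
for leaf, (mu, var) in leaves.items():
    print(leaf, round(mu, 3), round(var, 3))
```

In this toy data the path with high X1 and low X2 has high module expression, which reads as activation by X1, while the path where X2 is also high has low expression, which reads as repression by X2.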


The Respiration and Carbon Module


Global View of Modules

  • modules for common processes often share common

    • regulators

    • binding site motifs


Comparing module (LeMoNe) and per-gene (CLR) methods

