Reverse engineering gene regulatory networks

Learning signalling pathways and regulatory networks from postgenomic data

Does the extracted network provide a good prediction of the true interactions?

Reverse Engineering of Regulatory Networks

- Can we learn the network structure from postgenomic data themselves?
- Statistical methods to distinguish between
  - Direct interactions
  - Indirect interactions
- Challenge: Distinguish between
  - Correlations
  - Causal interactions
- Breaking symmetries with active interventions:
  - Gene knockouts (VIGS, RNAi)

- Relevance networks
- Graphical Gaussian models
- Bayesian networks

Relevance networks (Butte and Kohane, 2000)

1. Choose a measure of association A(.,.)
2. Define a threshold value tA
3. For all pairs of domain variables (X,Y) compute their association A(X,Y)
4. Connect those variables (X,Y) by an undirected edge whose association A(X,Y) exceeds the predefined threshold value tA
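
The four steps can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of the cited paper: the association measure is taken to be the absolute Pearson correlation (an assumption, since the slide leaves A open), and the names `pearson`, `relevance_network` and the toy profiles are made up for the example.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def relevance_network(data, threshold):
    """Return the undirected edges whose association exceeds the threshold.

    data maps a variable name to its list of measurements.
    The association A is |Pearson correlation| here (an assumption).
    """
    edges = set()
    for x, y in combinations(sorted(data), 2):
        if abs(pearson(data[x], data[y])) > threshold:
            edges.add((x, y))
    return edges

# Toy expression profiles (made-up numbers, for illustration only).
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.1, 3.9, 6.2, 8.0, 9.8],  # tracks g1 closely
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],  # only weakly related to g1 and g2
}
edges = relevance_network(toy, threshold=0.8)
```

With these toy values only the pair (g1, g2) passes the threshold. Each pair is judged on its own, which is exactly why the method cannot tell a direct interaction from an indirect one.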

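[Figure: three scenarios that each produce a strong correlation σ12 between nodes 1 and 2: ‘direct interaction’ (between 1 and 2), ‘common regulator’ (X acts on both 1 and 2), and ‘indirect interaction’ (between 1 and 2 via X)]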

Pairwise associations without taking the context of the system into consideration

Graphical Gaussian Models

[Figure: direct interaction between nodes 1 and 2, indicated by a strong partial correlation π12]

Partial correlation, i.e. correlation conditional on all other domain variables: Corr(X1,X2|X3,…,Xn)
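
For multivariate Gaussian data, the partial correlations can be read off the inverse covariance (precision) matrix. The symbols Σ, Ω and ω below are notation introduced here for illustration; they do not appear on the slide.

```latex
\Omega = \Sigma^{-1}, \qquad
\pi_{ij} = \mathrm{Corr}(X_i, X_j \mid X_{\text{rest}})
         = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}}
```

Here Σ is the covariance matrix of (X1,…,Xn) and ω_ij are the entries of Ω. Computing Ω requires Σ to be invertible.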

Distinguish between direct and indirect interactions

[Figure: direct interaction, common regulator, and indirect interaction. Under co-regulation by a common regulator, A and B have a low partial correlation.]
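
A small numerical sketch of the co-regulation case, assuming only three variables so that the first-order partial correlation formula applies. The correlation values are hypothetical and chosen so that the A-B correlation is fully explained by the common regulator C.

```python
from math import sqrt

def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of A and B given a single third variable C."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical correlations for a common regulator C driving both A and B:
# the A-B correlation is entirely explained by C (0.64 = 0.8 * 0.8).
r_ab, r_ac, r_bc = 0.64, 0.8, 0.8
pi_ab = partial_corr(r_ab, r_ac, r_bc)
```

The marginal correlation of 0.64 would pass most relevance-network thresholds, while `pi_ab` is zero up to floating-point rounding, so a graphical Gaussian model would not draw the A-B edge.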

Problem: #observations < #variables

Shrinkage estimation and the lemma of Ledoit-Wolf
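
The sketch below illustrates why shrinkage helps when there are fewer observations than variables. It shrinks the sample covariance towards its own diagonal with a hand-picked intensity `lam`; both the diagonal target and the fixed `lam` are assumptions made for the example, whereas the Ledoit-Wolf result is what supplies a data-driven choice of the intensity. All function names are illustrative.

```python
def sample_cov(rows):
    """Unbiased sample covariance matrix of observations given as rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def shrink_to_diagonal(S, lam):
    """Return lam * diag(S) + (1 - lam) * S for an intensity 0 <= lam <= 1."""
    p = len(S)
    return [[S[i][j] if i == j else (1 - lam) * S[i][j] for j in range(p)]
            for i in range(p)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Two observations of three variables (made-up numbers): fewer observations
# than variables, so the sample covariance matrix cannot have full rank.
rows = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5]]
S = sample_cov(rows)
S_shrunk = shrink_to_diagonal(S, lam=0.2)  # lam chosen by hand (assumption)
det_S, det_shrunk = det3(S), det3(S_shrunk)
```

`det_S` is zero, so S cannot be inverted to obtain partial correlations, while `det_shrunk` is strictly positive and the shrunken matrix can.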

Graphical Gaussian Models

[Figure: direct interaction, common regulator, and indirect interaction]

P(A,B)=P(A)·P(B)

But: P(A,B|C)≠P(A|C)·P(B|C)
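
The two statements above can be checked on a tiny discrete example. The model below is hypothetical: A and B are independent fair coins and C = A OR B, so both point into C. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Joint distribution for a hypothetical collider A -> C <- B:
# A and B are independent fair coins and C = A OR B.
joint = {}
for a, b in product((0, 1), repeat=2):
    joint[(a, b, a | b)] = Fraction(1, 4)

NAMES = ("a", "b", "c")

def prob(**fixed):
    """Probability that the named variables (a, b, c) take the given values."""
    return sum(p for state, p in joint.items()
               if all(state[NAMES.index(k)] == v for k, v in fixed.items()))

# Marginally: P(A,B) = P(A) P(B)
marginal_ok = prob(a=1, b=1) == prob(a=1) * prob(b=1)

# Given C = 1: P(A,B|C) differs from P(A|C) P(B|C)
pc = prob(c=1)
lhs = prob(a=1, b=1, c=1) / pc
rhs = (prob(a=1, c=1) / pc) * (prob(b=1, c=1) / pc)
conditional_ok = lhs == rhs
```

Here `marginal_ok` is true and `conditional_ok` is false: conditioning on the common child induces a dependence between A and B that is absent marginally.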

Undirected versus directed edges

- Relevance networks and Graphical Gaussian models can only extract undirected edges.
- Bayesian networks can extract directed edges.
- But can we trust these edge directions? It may be better to learn undirected edges than to learn directed edges with false orientations.

Bayesian networks

- Marriage between graph theory and probability theory.
- Directed acyclic graph (DAG) representing conditional independence relations.
- It is possible to score a network in light of the data: P(D|M), D:data, M: network structure.
- We can infer how well a particular network explains the observed data.

[Figure: example directed acyclic graph with nodes A to F connected by directed edges]

Bayesian networks versus causal networks

Bayesian networks represent conditional (in)dependence relations - not necessarily causal interactions.

[Figure: true causal graph over A, B and C, compared with the graph over B and C when node A is unknown]

[Figure: three network structures over nodes A, B and C]

- Equivalence classes: networks with the same scores: P(D|M).
- Equivalent networks cannot be distinguished in light of the data.
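
One standard graphical test for equivalence is the Verma and Pearl criterion: two DAGs are equivalent when they share the same skeleton and the same v-structures. The sketch below implements that criterion; it assumes the scoring function assigns equal scores to such graphs, and the function names are illustrative.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a set of directed edges (parent, child)."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """All colliders a -> c <- b whose parents a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def markov_equivalent(g1, g2):
    """Same skeleton and same v-structures (Verma and Pearl criterion)."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain_ab = {("A", "C"), ("C", "B")}  # A -> C -> B
chain_ba = {("B", "C"), ("C", "A")}  # B -> C -> A
fork = {("C", "A"), ("C", "B")}      # A <- C -> B
collider = {("A", "C"), ("B", "C")}  # A -> C <- B
```

The two chains and the fork fall into one equivalence class, so their edge directions cannot be recovered from observational data alone; the collider forms a class of its own.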

Equivalence classes of BNs

[Figure: networks over A, B and C in which C lies between A and B and there is no v-structure]

P(A,B)≠P(A)·P(B)

P(A,B|C)=P(A|C)·P(B|C)

[Figure: v-structure, with A and B both pointing into C]

P(A,B)=P(A)·P(B)

P(A,B|C)≠P(A|C)·P(B|C)

Equivalence classes are represented by completed partially directed graphs (CPDAGs).

Interventional data

A and B are correlated.

[Figure: inhibition of A. If A regulates B, the result is a down-regulation of B; if B regulates A, there is no effect on B.]
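
The asymmetry can be reproduced in a toy simulation. The linear model, its parameters and the function name `simulate` are assumptions made for illustration, and the inhibition is treated as ideal, i.e. A is clamped to zero regardless of its regulators.

```python
import random

def simulate(direction, inhibit_a, n=5000, seed=0):
    """Mean of B in a toy linear model where A and B are correlated.

    direction is "A->B" or "B->A"; inhibit_a clamps A to 0 regardless of
    its parents (an idealised intervention).
    """
    rng = random.Random(seed)
    total_b = 0.0
    for _ in range(n):
        if direction == "A->B":
            a = 0.0 if inhibit_a else rng.gauss(2.0, 0.5)
            b = a + rng.gauss(0.0, 0.5)  # B depends on A
        else:
            b = rng.gauss(2.0, 0.5)      # B has no parents
            # A depends on B; clamping A leaves B untouched
            a = 0.0 if inhibit_a else b + rng.gauss(0.0, 0.5)
        total_b += b
    return total_b / n

drop_if_a_regulates_b = simulate("A->B", False) - simulate("A->B", True)
drop_if_b_regulates_a = simulate("B->A", False) - simulate("B->A", True)
```

Under A->B the mean of B falls by about 2 when A is inhibited, whereas under B->A it stays essentially unchanged. The two structures are indistinguishable from the observational correlation alone.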

Learning Bayesian networks from data

P(M|D) = P(D|M) P(M) / Z

M: Network structure. D: Data
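
For a handful of candidate structures the posterior can be computed by direct normalisation. The sketch below assumes the log marginal likelihoods log P(D|M) are already available; the numbers and structure labels are hypothetical. The normalising constant Z is a sum over all structures, which is only feasible to enumerate for very small networks.

```python
from math import exp, log

def structure_posterior(log_marginal_likelihood, log_prior):
    """Normalise log P(D|M) + log P(M) over an enumerated set of structures."""
    log_joint = {m: log_marginal_likelihood[m] + log_prior[m]
                 for m in log_marginal_likelihood}
    top = max(log_joint.values())
    # log Z via the log-sum-exp trick, which avoids numerical underflow
    log_z = top + log(sum(exp(v - top) for v in log_joint.values()))
    return {m: exp(v - log_z) for m, v in log_joint.items()}

# Hypothetical log marginal likelihoods for three candidate structures.
log_ml = {"A->B": -1040.2, "B->A": -1040.2, "no edge": -1047.9}
uniform = {m: log(1.0 / len(log_ml)) for m in log_ml}
posterior = structure_posterior(log_ml, uniform)
```

The two directed structures were given identical scores on purpose, to mimic an equivalence class: they share the posterior mass equally and the data cannot separate them.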

Evaluation

- On real experimental data, using the gold standard network from the literature
- On synthetic data simulated from the gold-standard network

Evaluation: Raf signalling pathway

- Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells
- Deregulation → carcinogenesis
- Extensively studied in the literature → gold standard network

Raf regulatory network

[Figure: Raf regulatory network. From Sachs et al., Science 2005]

Flow cytometry data

- Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins
- 5400 cells have been measured under 9 different cellular conditions (cues)
- Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments

Two types of experiments

Comparison with simulated data 1

Raf pathway

Comparison with simulated data 2

Steady-state approximation

Real versus simulated data

- Real biological data: full complexity of biological systems. The “gold-standard” only represents our current state of knowledge; it is not guaranteed to represent the true network.
- Simulated data: simplifications that might be biologically unrealistic. We know the true network.

How can we evaluate the reconstruction accuracy?

[Figure: the extracted network is compared with the true network, or with biological knowledge (gold standard network), to evaluate learning performance]

Performance evaluation: ROC curves

- We use the Area Under the Receiver Operating Characteristic Curve (AUC).

[Figure: example ROC curves with AUC=1, 0.5<AUC<1, and AUC=0.5]
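
The AUC can be computed without drawing the curve, using its rank interpretation: the probability that a randomly chosen true edge receives a higher score than a randomly chosen non-edge. The sketch below is a plain-Python illustration with made-up edge scores; the function name and inputs are hypothetical.

```python
def auc(scores, is_true_edge):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    scores[i] is the score of candidate edge i; is_true_edge[i] says whether
    that edge is in the gold standard. Ties count as half.
    """
    pos = [s for s, t in zip(scores, is_true_edge) if t]
    neg = [s for s, t in zip(scores, is_true_edge) if not t]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

truth = [True, True, False, False]
perfect = auc([0.9, 0.8, 0.3, 0.1], truth)        # every true edge ranked first
mixed = auc([0.9, 0.3, 0.8, 0.1], truth)          # one true edge ranked too low
uninformative = auc([0.5, 0.5, 0.5, 0.5], truth)  # scores carry no information
```

These three cases give 1.0, 0.75 and 0.5 respectively, matching the perfect, intermediate and random-expectation regimes.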

Alternative performance evaluation: True positive (TP) scores

We set the threshold such that we obtain 5 spurious edges (5 FPs) and count the corresponding number of true edges (TP count).
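
A minimal sketch of this count, assuming edges are scanned from the highest score downwards and scanning stops just before a sixth false edge would be admitted. Tie handling and the exact placement of the threshold are assumptions; the scores and labels are made up.

```python
def tp_at_fp(scores, is_true_edge, max_fp=5):
    """Count true edges ranked above the point where max_fp false edges
    have been admitted, scanning edges from highest to lowest score."""
    ranked = sorted(zip(scores, is_true_edge), key=lambda st: -st[0])
    tp = fp = 0
    for _, is_true in ranked:
        if is_true:
            tp += 1
        else:
            if fp == max_fp:
                break  # admitting this edge would exceed the FP budget
            fp += 1
    return tp

scores = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45]
truth = [True, False, True, False, False, True, False, False, True, False, True]
tp_count = tp_at_fp(scores, truth, max_fp=5)
```

For this toy ranking, four true edges are recovered at the cost of five spurious ones. Unlike the AUC, this score focuses on the high-confidence end of the ranking.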

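Directed graph evaluation - DGE

[Figure: edge scores (high to low) are learned from the data and thresholded to give concrete network predictions, which are compared with the true regulatory network. Two example thresholds give TP: 1/2, FP: 0/4 and TP: 2/2, FP: 1/4.]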

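Undirected graph evaluation - UGE

[Figure: undirected edge scores (high to low) are learned from the data and thresholded to give concrete network (skeleton) predictions, which are compared with the skeleton of the true regulatory network. Two example thresholds give TP: 1/2, FP: 0/1 and TP: 2/2, FP: 1/1.]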
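
The difference between the two evaluations can be sketched on a three-node network, which is consistent with the denominators in the figures (2 true edges, 4 possible false directed edges, 1 possible false undirected edge). The concrete networks and function names below are hypothetical.

```python
from itertools import combinations, permutations

def skeleton(edges):
    """Undirected version of a set of directed edges."""
    return {frozenset(e) for e in edges}

def dge_counts(true_edges, predicted, nodes):
    """(TP, #true, FP, #false) counting each directed edge separately."""
    possible = set(permutations(nodes, 2))
    tp = len(predicted & true_edges)
    fp = len(predicted - true_edges)
    return tp, len(true_edges), fp, len(possible - true_edges)

def uge_counts(true_edges, predicted, nodes):
    """Same counts on skeletons, i.e. ignoring edge directions."""
    possible = {frozenset(p) for p in combinations(nodes, 2)}
    t, p = skeleton(true_edges), skeleton(predicted)
    return len(p & t), len(t), len(p - t), len(possible - t)

nodes = ["A", "B", "C"]
true_edges = {("A", "B"), ("B", "C")}  # hypothetical true network
predicted = {("A", "B"), ("C", "B")}   # second edge has the wrong direction
dge = dge_counts(true_edges, predicted, nodes)
uge = uge_counts(true_edges, predicted, nodes)
```

The wrongly oriented edge costs one true positive and adds one false positive under DGE (TP 1/2, FP 1/4), but counts as correct under UGE (TP 2/2, FP 0/1).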

Synthetic data, observations

How can we explain the difference between synthetic and real data?

Simulated data are “simpler”.

No mismatch between models used for data generation and inference.

Complications with real data

Can we trust our gold-standard network?

Raf regulatory network

[Figure: Raf regulatory network. From Sachs et al., Science 2005]

Disputed structure of the gold-standard network

Dougherty et al., Regulation of Raf-1 by Direct Feedback Phosphorylation. Molecular Cell, Vol. 17, 2005

Interventions might not be “ideal” owing to negative feedback loops.

[Figure: inhibition, with stabilisation through negative feedback loops]

Conclusions 1

- BNs and GGMs outperform RNs, most notably on Gaussian data.
- No significant difference between BNs and GGMs on observational data.
- For interventional data, BNs clearly outperform GGMs and RNs, especially when taking the edge direction (DGE score) rather than just the skeleton (UGE score) into account.

Conclusions 2

Performance on synthetic data better than on real data.

- Real data: more complex
- Real interventions are not ideal
- Errors in the gold-standard network

Unfolding in time