
Reverse engineering gene and protein regulatory networks using Graphical Models.

A comparative evaluation study. Marco Grzegorczyk, Dirk Husmeier, Adriano Werhli.





Presentation Transcript


  1. Reverse engineering gene and protein regulatory networks using Graphical Models. A comparative evaluation study. Marco Grzegorczyk, Dirk Husmeier, Adriano Werhli

  2. Systems biology Learning signalling pathways and regulatory networks from postgenomic data

  3. possibly completely unknown

  4. possibly completely unknown. E.g. data from flow cytometry experiments; here: concentrations of (phosphorylated) proteins

  5. possibly completely unknown. E.g. data from flow cytometry experiments → Machine Learning / statistical methods

  6. extracted network true network Is the extracted network a good prediction of the real relationships?

  7. extracted network true network Evaluation of learning performance biological knowledge (gold standard network)

  8. Reverse Engineering of Regulatory Networks • Can we learn network structures from postgenomic data themselves? • Are there statistical methods to distinguish between direct and indirect correlations? • Is it worth applying time-consuming Bayesian network approaches when computationally cheaper methods are available? • Do active interventions improve the learning performance? • Gene knockouts (VIGs, RNAi)

  9. direct interaction common regulator indirect interaction co-regulation

  10. Reverse Engineering of Regulatory Networks • Can we learn network structures from postgenomic data themselves? • Are there statistical methods to distinguish between direct and indirect correlations? • Is it worth applying time-consuming Bayesian network approaches when computationally cheaper methods are available? • Do active interventions improve the learning performance? • Gene knockouts (VIGs, RNAi)

  11. Three widely applied methodologies: • Relevance networks • Graphical Gaussian models • Bayesian networks

  12. Relevance networks • Graphical Gaussian models • Bayesian networks

  13. Relevance networks (Butte and Kohane, 2000) 1. Choose a measure of association A(.,.) 2. Define a threshold value tA 3. For all pairs of domain variables (X,Y) compute their association A(X,Y) 4. Connect those variables (X,Y) by an undirected edge whose association A(X,Y) exceeds the predefined threshold value tA
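The four steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: absolute Pearson correlation stands in for the association measure A(.,.), and the function name, threshold value, and toy data are all made up for the example.

```python
import numpy as np

def relevance_network(data, t_a=0.75):
    """Connect variable pairs (undirected edges) whose absolute Pearson
    correlation exceeds the threshold t_a.
    data: (n_samples, n_variables) array."""
    corr = np.corrcoef(data, rowvar=False)   # association measure A(.,.)
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) > t_a]

# Toy example: X2 is a noisy copy of X0, X1 is independent noise
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
data = np.column_stack([x0, rng.normal(size=200),
                        x0 + 0.1 * rng.normal(size=200)])
print(relevance_network(data))  # only the strongly coupled pair (0, 2) survives
```

With 200 samples, the chance correlations of the noise variable stay far below the threshold, so only the genuine association is connected.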

  14. Relevance networks(Butte and Kohane, 2000)

  15. Relevance networks (Butte and Kohane, 2000) 1. Choose a measure of association A(.,.) 2. Define a threshold value tA 3. For all pairs of domain variables (X,Y) compute their association A(X,Y) 4. Connect those variables (X,Y) by an undirected edge whose association A(X,Y) exceeds the predefined threshold value tA

  16. direct interaction common regulator indirect interaction co-regulation

  17. Pairwise associations without taking the context of the system into consideration: direct interactions vs. pseudo-correlations between A and B. E.g. the correlation between A and C is disturbed (weakened) by the influence of B

  18. [Figure: 'direct interaction', 'common regulator', and 'indirect interaction' between nodes 1 and 2 all give rise to a strong correlation σ12]

  19. Relevance networks • Graphical Gaussian models • Bayesian networks

  20. Graphical Gaussian Models: direct interaction between nodes 1 and 2 ⇔ strong partial correlation π12. Partial correlation, i.e. correlation conditional on all other domain variables: Corr(X1,X2|X3,…,Xn). But usually: #observations < #variables
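The partial correlation Corr(Xi,Xj | all others) can be read off the inverse covariance (precision) matrix via the standard identity π_ij = −Ω_ij / √(Ω_ii·Ω_jj). A minimal sketch (the function name and the toy covariance matrix are illustrative, not from the study):

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlations Corr(Xi, Xj | all other variables) from the
    precision matrix Omega = cov^-1: pi_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj)."""
    omega = np.linalg.inv(cov)
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Chain X1 -> X2 -> X3: X1 and X3 are marginally correlated (0.64),
# but conditioning on X2 screens off the indirect interaction.
cov = np.array([[1.00, 0.80, 0.64],
                [0.80, 1.00, 0.80],
                [0.64, 0.80, 1.00]])
pcor = partial_correlations(cov)
print(round(pcor[0, 2], 3))  # ~0: indirect interaction vanishes
```

This is exactly the property that lets GGMs separate direct from indirect interactions, which plain relevance networks cannot do.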

  21. Shrinkage estimation of the covariance matrix (Schäfer and Strimmer, 2005): the sample covariance S is shrunk towards a target T, Σ* = λT + (1−λ)S, with estimated (optimal) shrinkage intensity 0 < λ < 1, where positive definiteness of Σ* is guaranteed
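A sketch of the idea behind the shrinkage estimator: a convex combination of the sample covariance and a simple target stays invertible even when #observations < #variables. The shrinkage intensity is fixed by hand here for illustration; Schäfer and Strimmer derive an analytic estimate of the optimal intensity, which this sketch does not reproduce.

```python
import numpy as np

def shrink_covariance(data, lam):
    """Convex combination of the sample covariance S and a diagonal
    target T = diag(S). For 0 < lam < 1 the estimate is positive
    definite even when n_samples < n_variables (where S is singular)."""
    s = np.cov(data, rowvar=False)
    t = np.diag(np.diag(s))            # shrinkage target
    return lam * t + (1.0 - lam) * s

# n_samples < n_variables: S is singular, the shrunken estimate is not
rng = np.random.default_rng(1)
data = rng.normal(size=(5, 10))        # 5 observations, 10 variables
sigma = shrink_covariance(data, lam=0.2)
print(np.all(np.linalg.eigvalsh(sigma) > 0))  # True: positive definite
```

Positive definiteness matters because the GGM step above requires inverting the covariance matrix.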

  22. direct interaction common regulator indirect interaction co-regulation

  23. Graphical Gaussian Models: direct interaction / common regulator / indirect interaction. But co-regulation: P(A,B)=P(A)·P(B), yet P(A,B|C)≠P(A|C)·P(B|C)

  24. Further drawbacks • Relevance networks and Graphical Gaussian models can extract undirected edges only. • Bayesian networks promise to extract at least some directed edges. But can we trust these edge directions? It may be better to learn undirected edges than to learn directed edges with false orientations.

  25. Relevance networks • Graphical Gaussian models • Bayesian networks

  26. Bayesian networks • Marriage between graph theory and probability theory. • Directed acyclic graph (DAG) represents conditional independence relations. • Markov assumption leads to a factorization of the joint probability distribution: P(X1,…,Xn) = ∏i P(Xi | parents(Xi)) [Figure: example DAG over nodes A–F]
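The Markov factorization can be made concrete with a tiny binary network. The DAG (A → C ← B) and all conditional probability table values below are invented purely to illustrate the factorization; they are not from the study.

```python
def joint(a, b, c):
    """P(A,B,C) = P(A) * P(B) * P(C | A, B) for the DAG A -> C <- B.
    All CPT numbers are made-up illustration values."""
    p_a = {0: 0.7, 1: 0.3}
    p_b = {0: 0.6, 1: 0.4}
    p_c1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # P(C=1 | A, B)
    p_c = p_c1[(a, b)] if c == 1 else 1.0 - p_c1[(a, b)]
    return p_a[a] * p_b[b] * p_c

# The factorized distribution sums to one over all 8 configurations:
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(round(total, 10))  # 1.0
```

Each node only needs a table conditional on its parents, which is what makes the joint distribution tractable to parameterise and score.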

  27. Bayesian networks versus causal networks Bayesian networks represent conditional (in)dependency relations - not necessarily causal interactions.

  28. Bayesian networks versus causal networks [Figure: true causal graph with A a common cause of B and C; if node A is unobserved, the learned Bayesian network links B and C directly]

  29. Bayesian networks • Marriage between graph theory and probability theory. • Directed acyclic graph (DAG) represents conditional independence relations. • Markov assumption leads to a factorization of the joint probability distribution: P(X1,…,Xn) = ∏i P(Xi | parents(Xi)) [Figure: example DAG over nodes A–F]

  30. Bayesian networks. Parameterisation: Gaussian BGe scoring metric: data ~ N(μ,Σ) with normal-Wishart distribution of the (unknown) parameters, i.e. μ ~ N(μ*, (vW)⁻¹) and W ~ Wishart(T0)

  31. Bayesian networks BGe metric: closed form solution

  32. Learning the network structure: graph → scoreBGe(graph). Idea: heuristically search for the graph M* that is most supported by the data, P(M*|data) ≥ P(graph|data) for all candidate graphs; e.g. greedy search
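A greedy (hill-climbing) structure search can be sketched as follows. The score function here is a deliberately trivial stand-in for the BGe score, and all names and the toy "true" graph are invented for the example; a real implementation would also exploit score decomposability rather than rescoring whole graphs.

```python
import itertools

def is_acyclic(n, edges):
    """Kahn's algorithm: the edge set is a DAG iff every node can be removed."""
    indeg = [0] * n
    for _, v in edges:
        indeg[v] += 1
    stack = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    stack.append(b)
    return seen == n

def greedy_search(n, score):
    """Hill climbing: toggle (add/delete) the single edge that most
    improves the score, until no move helps."""
    graph = frozenset()
    while True:
        candidates = [graph ^ {e}                    # toggle one edge
                      for e in itertools.permutations(range(n), 2)
                      if is_acyclic(n, graph ^ {e})]
        best = max(candidates, key=score, default=graph)
        if score(best) <= score(graph):
            return graph
        graph = best

# Toy score rewarding exactly the edges of a "true" graph 0 -> 1 -> 2
true_edges = {(0, 1), (1, 2)}
score = lambda g: len(g & true_edges) - len(g - true_edges)
print(sorted(greedy_search(3, score)))  # [(0, 1), (1, 2)]
```

Greedy search returns a single local optimum; the next slide's MCMC approach averages over many plausible graphs instead.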

  33. MCMC sampling of Bayesian networks • Better idea: Bayesian model averaging via Markov Chain Monte Carlo (MCMC) simulations • Construct and simulate a Markov chain (Mt)t in the space of DAGs whose distribution converges to the graph posterior distribution as its stationary distribution, i.e. P(Mt = graph | data) → P(graph | data) as t → ∞, to generate a DAG sample G1, G2, G3, …, GT
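A compact structure-MCMC sketch of the idea (not the Order MCMC scheme of the next slide): propose toggling one edge and accept via a Metropolis-Hastings step. The log-score is a toy stand-in for the BGe log marginal likelihood, all names and numbers are illustrative, and proposal-asymmetry corrections are omitted for brevity.

```python
import math
import random

def is_acyclic(n, edges):
    indeg = [0] * n
    for _, v in edges:
        indeg[v] += 1
    stack = [v for v in range(n) if indeg[v] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    stack.append(b)
    return seen == n

def structure_mcmc(n, log_score, n_steps=2000, seed=0):
    """Metropolis-Hastings over DAGs: propose toggling a single edge,
    accept with probability min(1, exp(new - old)), so the chain's
    stationary distribution is proportional to exp(log_score(graph))."""
    rng = random.Random(seed)
    graph, sample = frozenset(), []
    for _ in range(n_steps):
        u, v = rng.sample(range(n), 2)
        proposal = graph ^ {(u, v)}
        delta = log_score(proposal) - log_score(graph)
        if is_acyclic(n, proposal) and rng.random() < math.exp(min(0.0, delta)):
            graph = proposal
        sample.append(graph)
    return sample

# Toy score strongly favouring the edge 0 -> 1; estimate its posterior
log_score = lambda g: 3.0 if (0, 1) in g else 0.0
sample = structure_mcmc(2, log_score)
burn_in = sample[len(sample) // 2:]
print(sum((0, 1) in g for g in burn_in) / len(burn_in))  # ~0.9
```

After discarding a burn-in phase, the fraction of sampled graphs containing an edge estimates that edge's posterior probability, which is exactly how the edge features on slide 36 are computed.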

  34. Order MCMC (Friedman and Koller, 2003) • Order MCMC generates a sample of node orders (Metropolis-Hastings acceptance probability), from which in a second step DAGs can be sampled: DAG sample G1, G2, G3, …, GT [Figure: a node order over A–F and a DAG consistent with it]

  35. Equivalence classes of BNs: the graphs A→C→B, A←C←B and A←C→B all encode P(A,B)≠P(A)·P(B) and P(A,B|C)=P(A|C)·P(B|C); the v-structure A→C←B encodes P(A,B)=P(A)·P(B) and P(A,B|C)≠P(A|C)·P(B|C). Equivalence classes are represented by completed partially directed graphs (CPDAGs)

  36. CPDAG representations: DAGs → CPDAGs. Utilise the CPDAG sample for estimating the posterior probability of edge relation features: P(A→B | data) ≈ (1/T) Σi I(Gi), where I(Gi) is 1 if the CPDAG Gi contains the directed edge A→B, and 0 otherwise
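The feature estimate is just the fraction of sampled graphs containing the edge. A one-function sketch (function name and the three-element toy sample are illustrative):

```python
def edge_posterior(cpdag_sample, edge):
    """Posterior probability estimate of a directed edge feature:
    the fraction of sampled CPDAGs G_i for which I(G_i) = 1,
    i.e. that contain the given directed edge."""
    return sum(edge in g for g in cpdag_sample) / len(cpdag_sample)

# Three sampled CPDAGs, each represented as a set of directed edges
sample = [{("A", "B"), ("B", "C")}, {("A", "B")}, {("C", "B")}]
print(edge_posterior(sample, ("A", "B")))  # 2 of 3 graphs contain A -> B
```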

  37. CPDAG representations: DAGs (superposition) → CPDAGs (interpretation). Utilise the DAG (CPDAG) sample for estimating the posterior probability of edge relation features: P(A→B | data) ≈ (1/T) Σi I(Gi), where I(Gi) is 1 if the CPDAG of Gi contains the directed edge A→B, and 0 otherwise

  38. Interventional data: A and B are correlated. Inhibition of A distinguishes the two explanations: if A→B, inhibiting A down-regulates B; if B→A, inhibiting A has no effect on B

  39. Evaluation of Performance • Relevance networks and Graphical Gaussian models extract undirected edges (scores = (partial) correlations) • Bayesian networks extract undirected as well as directed edges (scores = posterior probabilities of edges) • Undirected edges can be interpreted as superposition of two directed edges with opposite direction. • How to cross-compare the learning performances when the true regulatory network is known? • Distinguish between DGE (directed graph evaluation) and UGE (undirected graph evaluation)

  40. Probabilistic inference – DGE: true regulatory network vs. edge scores from the data. Thresholding yields concrete network predictions, e.g. high threshold: TP: 1/2, FP: 0/4; low threshold: TP: 2/2, FP: 1/4

  41. Probabilistic inference – UGE: skeleton of true regulatory network vs. undirected edge scores from the data: add up the scores of the two directed edges with opposite direction

  42. Probabilistic inference – UGE: skeleton of true regulatory network vs. undirected edge scores from the data: add up the scores of the two directed edges with opposite direction

  43. Probabilistic inference skeleton of true regulatory network undirected edge scores data

  44. Probabilistic inference: skeleton of true regulatory network vs. undirected edge scores from the data. Thresholding yields concrete network (skeleton) predictions, e.g. high threshold: TP: 1/2, FP: 0/1; low threshold: TP: 2/2, FP: 1/1
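Turning directed edge scores into undirected (UGE) scores is a one-liner: the score of an undirected edge is the sum of the scores of the two opposite orientations. A minimal sketch with a made-up score matrix:

```python
import numpy as np

def undirected_scores(directed):
    """UGE: score of an undirected edge = sum of the scores of the
    two directed edges with opposite direction."""
    return directed + directed.T

# Hypothetical posterior edge probabilities (rows: from, columns: to)
directed = np.array([[0.00, 0.50, 0.10],
                     [0.25, 0.00, 0.60],
                     [0.00, 0.30, 0.00]])
print(undirected_scores(directed)[0, 1])  # 0.5 + 0.25 = 0.75
```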

  45. Evaluation 1: AUC scores – area under the Receiver Operator Characteristic (ROC) curve, i.e. sensitivity plotted against inverse specificity. Random expectation: AUC = 0.5; perfect prediction: AUC = 1; in general 0.5 < AUC ≤ 1
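The AUC can be computed without drawing the curve, via the rank (Mann-Whitney) statistic: it equals the probability that a randomly chosen true edge receives a higher score than a randomly chosen non-edge. The scores and labels below are made up for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    fraction of (true edge, non-edge) pairs ranked correctly
    (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Edge scores and ground-truth labels (1 = edge in the gold standard)
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0]
print(auc(scores, labels))  # 5 of the 6 positive/negative pairs ordered correctly
```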

  46. Evaluation 2: TP scores We set the threshold such that we obtained 5 spurious edges (5 FPs) and counted the corresponding number of true edges (TP count).
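The TP-score criterion can be sketched directly: lower the threshold edge by edge in order of decreasing score and record the true positives accumulated when the fifth false positive appears. All edge scores and the toy gold-standard network below are hypothetical.

```python
def tp_at_fp(scored_edges, true_edges, max_fp=5):
    """Count true positives accepted before the max_fp-th spurious
    edge (false positive) is reached, scanning edges by decreasing score."""
    tp = fp = 0
    for edge, _ in sorted(scored_edges.items(), key=lambda kv: -kv[1]):
        if edge in true_edges:
            tp += 1
        else:
            fp += 1
            if fp == max_fp:
                break
    return tp

# Hypothetical edge scores against a small gold-standard network
scored = {(i, i + 1): 1.0 - 0.1 * i for i in range(9)}   # 9 scored edges
true = {(0, 1), (1, 2), (4, 5)}
print(tp_at_fp(scored, true))  # 3: all true edges found before the 5th FP
```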

  47. Evaluation 2: TP scores [Figure: TP counts at 5 FP counts]

  48. Evaluation 2: TP scores [Bar chart comparing BN, GGM and RN at 5 FP counts]
