Dirk Husmeier. Probabilistic modelling in computational biology. Biomathematics & Statistics Scotland. James Watson & Francis Crick, 1953. Frederick Sanger, 1980. Network reconstruction from postgenomic data. Model Parameters θ. Marriage between
Dirk Husmeier Probabilistic modelling in computational biology Biomathematics & Statistics Scotland
Marriage between graph theory and probability theory. Friedman et al. (2000), J. Comp. Biol. 7, 601-620
Bayes net ODE model
Model, parameters θ. Probability theory: likelihood
Model, parameters θ. Bayesian networks: the integral is analytically tractable!
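The integral referred to here is presumably the marginal likelihood, in which the parameters are integrated out. Writing the model (network structure) as M, the parameters as θ and the data as D (the symbols M and D are assumptions, not taken from the slides), it reads:

```latex
P(D \mid M) = \int P(D \mid M, \theta)\, P(\theta \mid M)\, d\theta
```

With conjugate priors this integral has a closed form, which is what makes scoring a network structure cheap.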
Identify the best network structure. Ideal scenario: large data sets, low noise
Uncertainty about the best network structure. Limited number of experimental replications, high noise
Sample of high-scoring networks. Feature extraction, e.g. marginal posterior probabilities of the edges. Uncertainty about edges: high-confidence edge, high-confidence non-edge
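A minimal sketch of the feature-extraction step, assuming the sample of networks is available as a list of directed-edge sets. The function name `edge_posteriors` and the toy node labels are hypothetical, chosen only for illustration:

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def edge_posteriors(sampled_networks: List[Set[Edge]]) -> Dict[Edge, float]:
    """Estimate marginal posterior edge probabilities as the fraction
    of sampled networks that contain each directed edge."""
    counts: Dict[Edge, int] = {}
    for network in sampled_networks:
        for edge in network:
            counts[edge] = counts.get(edge, 0) + 1
    n = len(sampled_networks)
    return {edge: c / n for edge, c in counts.items()}


# Toy sample of four networks (illustrative data, not from the talk).
samples = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("C", "A")},
    {("A", "B"), ("B", "C")},
]
posteriors = edge_posteriors(samples)
```

Edges with a value close to 1 are high-confidence edges; edges that never appear in the sample have estimated probability 0 and are high-confidence non-edges.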
Sampling with MCMC. Number of structures as a function of the number of nodes
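To illustrate why exhaustive enumeration is infeasible and sampling is needed, the sketch below counts labelled directed acyclic graphs with Robinson's recurrence. It assumes the structures in question are DAGs over labelled nodes; the function name `count_dags` is hypothetical:

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labelled DAGs on n nodes (Robinson's recurrence):
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


# Print the count for small networks to see the super-exponential growth.
for nodes in range(1, 9):
    print(nodes, count_dags(nodes))
```

The count grows super-exponentially with the number of nodes, so the posterior over structures can only be explored by sampling.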
Overview
• Introduction
• Limitations
• Methodology
• Application to morphogenesis
• Application to synthetic biology
Homogeneity assumption: interactions don’t change with time
Changepoint model: parameters can change with time
Extension of the model: parameters θ, allocation vector h, number of components k (here: 3)
Analytically integrate out the parameters θ: allocation vector h, number of components k (here: 3)
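A small sketch of how an allocation vector relates to a set of changepoints, assuming changepoints are given as the time indices at which a new component starts. The function name `allocation_vector` and the example values are hypothetical:

```python
from typing import List


def allocation_vector(n_timepoints: int, changepoints: List[int]) -> List[int]:
    """Map each time point to a component label (0, 1, 2, ...).
    A changepoint c means that a new component starts at time index c."""
    starts = set(changepoints)
    h = []
    component = 0
    for t in range(n_timepoints):
        if t > 0 and t in starts:
            component += 1
        h.append(component)
    return h


# Ten time points with two changepoints give k = 3 components.
h = allocation_vector(10, [3, 7])
k = max(h) + 1
```

With two changepoints the time series is split into three segments, matching the three components shown on the slide.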
RJMCMC within Gibbs
P(network structure | changepoints, data)
P(changepoints | network structure, data)
Birth, death, and relocation moves
Dynamic programming, complexity N²
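The sketch below is not the changepoint scheme of the talk. It swaps in a simpler, related technique (penalized least-squares segmentation of a one-dimensional series) only to show where a quadratic cost in the number of time points N comes from: every end point is combined with every possible start of its last segment. The function name `segment_dp`, the cost function and the penalty are all assumptions:

```python
from typing import List, Tuple


def segment_dp(y: List[float], penalty: float) -> Tuple[float, List[int]]:
    """Optimal partitioning by dynamic programming in O(N^2) time.
    Segment cost: sum of squared deviations from the segment mean,
    plus a fixed penalty per segment."""
    n = len(y)
    # Prefix sums make the cost of any segment an O(1) computation.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(a: int, b: int) -> float:
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] + [float("inf")] * n  # best[e]: optimal cost of y[:e]
    prev = [0] * (n + 1)               # start index of the last segment
    for end in range(1, n + 1):        # N end points ...
        for start in range(end):       # ... times up to N start points
            c = best[start] + cost(start, end) + penalty
            if c < best[end]:
                best[end] = c
                prev[end] = start

    # Backtrack to recover the changepoints (segment start indices).
    changepoints = []
    end = n
    while end > 0:
        start = prev[end]
        if start > 0:
            changepoints.append(start)
        end = start
    return best[n], sorted(changepoints)


y = [0.0] * 5 + [5.0] * 5 + [1.0] * 5
total_cost, cps = segment_dp(y, penalty=1.0)
```

The two nested loops over end and start indices give the N² scaling; the same double loop structure appears in dynamic programming schemes that sum over, rather than optimize, changepoint positions.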
Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University (Andrew Millar’s group)
Circadian rhythms in Arabidopsis thaliana
- Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3
- Transcriptional profiles at 4×13 time points in 2 h intervals under constant light for 4 experimental conditions
Comparison with the literature
Precision: proportion of identified interactions that are correct
Recall = Sensitivity: proportion of true interactions that we successfully recovered
Specificity: proportion of non-interactions that are successfully avoided
Which interactions from the literature are found? True positives (TP) = 8, false negatives (FN) = 5, Recall = 8/13 = 62%
[Figure: network over ELF3, CCA1, LHY, PRR9, GI, TOC1, PRR5, PRR3, ELF4, with a true positive and a false negative marked. Blue: activations. Red: inhibitions.]
Which proportion of predicted interactions are confirmed by the literature? True positives (TP) = 8, false positives (FP) = 13, Precision = 8/21 = 38%
[Figure: predicted network with a true positive and false positives marked. Blue: activations. Red: inhibitions.]
Precision = 38%, Recall = 62%
[Figure: network over ELF3, CCA1, LHY, PRR9, GI, TOC1, PRR5, PRR3, ELF4]
True positives (TP) = 8
False positives (FP) = 13
False negatives (FN) = 5
True negatives (TN) = 9² - 8 - 13 - 5 = 55
Sensitivity (recall) = TP/[TP+FN] = 62%
Specificity (proportion of avoided non-interactions) = TN/[TN+FP] = 81%
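The arithmetic above can be checked with a few lines of Python. The function name `network_metrics` is hypothetical; the count of true negatives follows the slide in treating all 9² ordered gene pairs as candidate interactions:

```python
from typing import Dict


def network_metrics(tp: int, fp: int, fn: int, n_nodes: int) -> Dict[str, float]:
    """Precision, recall (sensitivity) and specificity for a predicted
    network, with n_nodes**2 candidate directed interactions in total."""
    tn = n_nodes ** 2 - tp - fp - fn
    return {
        "tn": tn,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts from the Arabidopsis comparison with the literature.
metrics = network_metrics(tp=8, fp=13, fn=5, n_nodes=9)
for name, value in metrics.items():
    print(name, value)
```

Rounded to whole percentages, the values agree with the 38% precision, 62% recall and 81% specificity quoted on the slides.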
Model extension. So far: non-stationarity in the regulatory process
Model, parameters θ. Use prior knowledge!
Flexible network structure with regularization Hyperparameter Normalization factor
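The equation on this slide did not survive extraction. One common form of a structure prior with a hyperparameter and a normalization factor is a Gibbs distribution, sketched below as an assumption rather than as the formula shown in the talk. Here M is a network structure, β the hyperparameter, Z(β) the normalization factor, and E(M) a hypothetical "energy" that measures the mismatch between M and the prior knowledge:

```latex
P(M \mid \beta) = \frac{\exp\bigl(-\beta\, E(M)\bigr)}{Z(\beta)},
\qquad
Z(\beta) = \sum_{M'} \exp\bigl(-\beta\, E(M')\bigr)
```

For β = 0 the prior is uniform over structures; larger β penalizes deviations from the prior knowledge more strongly.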
Flexible network structure with regularization Exponential prior versus Binomial prior with conjugate beta hyperprior