A Probabilistic Dynamical Model for Quantitative Inference of the Regulatory Mechanism of Transcription

Guido Sanguinetti, Magnus Rattray and Neil D. Lawrence

Talk plan

  • Overview of the problem

  • Extending regression

  • Introducing dynamics

  • Modelling concentrations separately

  • What next?

The problem

  • The Central Dogma

[Figure: diagram of the Central Dogma, with the labels "Easy to measure", "Hard to measure" and "Protein interactions"]

Specific problem

  • Transcription factor genes encode proteins that promote or repress the transcription of other genes; these proteins play a fundamental role in gene regulatory networks

  • Deduce the activity of the transcription factors’ proteins (in an experimental condition) from the mRNA expression data.

Why not use the TFs' expression levels?

TFs are often expressed at low levels, so their measurements are noisy

TFs are post-transcriptionally regulated

TFs interact non-trivially with each other

Current approaches

  • Integrate with ChIP-on-chip data

  • ChIP-on-chip gives a binary matrix X of transcription factors binding genes (connectivity matrix)

  • Regress microarray expression data on X

b_mt is the transcription factor activity (TFA) of TF m at time t, monotonically linked to the protein concentration (Liao et al., Boulesteix and Strimmer, Gao et al., ...)
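
The regression step can be sketched in a few lines of plain Python. The form assumed here is y_nt = sum_m X_nm b_mt + noise, with y_nt the (log) expression of gene n at time t; that equation is inferred from the description on this slide, and the function names and toy numbers are hypothetical. The sketch fits a single time point by ordinary least squares.

```python
def solve_linear(a, rhs):
    """Solve a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def regress_tfa(X, y):
    """Least-squares TFAs b minimising ||y - X b||^2 for one time point."""
    m = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    xty = [sum(row[i] * yn for row, yn in zip(X, y)) for i in range(m)]
    return solve_linear(xtx, xty)


# Toy binary connectivity matrix: 4 genes (rows), 2 TFs (columns). Hypothetical values.
X = [[1, 0], [1, 1], [0, 1], [1, 0]]
true_b = [2.0, -1.0]
# Noise-free expression generated from the assumed linear model.
y = [sum(x * b for x, b in zip(row, true_b)) for row in X]
b_hat = regress_tfa(X, y)
```

Every gene bound by a TF enters the fit with the same weight and shares the same b_mt.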



  • All genes bound by the TF contribute equally to the estimate of the TFA, regardless of the regulation type.

  • TFAs are gene-independent, but the influence of a transcription factor varies from gene to gene (and according to condition)

  • The model is linear (inevitable)

Extending regression

Modify the regression model to allow different TFAs for different genes and experiments

Reduce the number of parameters by placing a prior distribution over the gene-specific TFAs. The choice of the prior distribution depends on the situation we model. E.g., for independent samples we may assume TFAs at different time points to be independent

Introducing dynamics

  • To model time series data, we choose a Kalman filter prior on the rows of B


This is equivalent to assuming TFAs vary smoothly
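
To illustrate what a smoothness prior of this kind does, here is a minimal sketch of a stationary first-order Gauss-Markov chain, the kind of state model that underlies a Kalman filter. The parametrisation (mean `mu`, standard deviation `sigma`, persistence `gamma`) is an assumption made for illustration; the slide's own equation is not preserved in this transcript.

```python
import random


def sample_tfa_path(n_times, mu, sigma, gamma, rng):
    """Draw one TFA time course from a stationary first-order Gauss-Markov prior."""
    path = [rng.gauss(mu, sigma)]
    # Innovation scale chosen so that every time point keeps variance sigma^2.
    step_sd = sigma * (1.0 - gamma ** 2) ** 0.5
    for _ in range(n_times - 1):
        mean = gamma * path[-1] + (1.0 - gamma) * mu
        path.append(rng.gauss(mean, step_sd))
    return path


def prior_covariance(n_times, sigma, gamma):
    """Covariance implied by the chain: sigma^2 * gamma^|s - t|."""
    return [[sigma ** 2 * gamma ** abs(s - t) for t in range(n_times)]
            for s in range(n_times)]


rng = random.Random(0)
smooth = sample_tfa_path(10, mu=0.0, sigma=1.0, gamma=0.95, rng=rng)  # slowly varying
rough = sample_tfa_path(10, mu=0.0, sigma=1.0, gamma=0.0, rng=rng)    # independent points
```

With `gamma` close to 1 consecutive time points are strongly correlated, and with `gamma = 0` the prior reduces to independent time points.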

Likelihood function

  • Given the model and the prior, we can obtain a likelihood

The likelihood can be evaluated efficiently by exploiting the sparsity of the covariance and recursion relations.
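
As a sketch of how a recursion gives the likelihood in linear time, consider a deliberately simplified case: one gene, one TF, observations y_t = b_t + noise, and a stationary first-order Gauss-Markov prior on b_t. This simplification and all parameter names (`mu`, `sigma2`, `gamma`, `noise2`) are assumptions for illustration, not the slide's model in full.

```python
import math


def kalman_loglik(y, mu, sigma2, gamma, noise2):
    """Log-likelihood of y_t = b_t + eps_t under an AR(1) prior on b_t.

    The prior has stationary mean mu and variance sigma2, persistence gamma,
    and eps_t is Gaussian noise with variance noise2.
    """
    loglik = 0.0
    pred_mean, pred_var = mu, sigma2  # prior on the first time point
    for obs in y:
        # One-step predictive density of the observation.
        s = pred_var + noise2
        resid = obs - pred_mean
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + resid * resid / s)
        # Kalman update of the state given this observation.
        gain = pred_var / s
        post_mean = pred_mean + gain * resid
        post_var = (1.0 - gain) * pred_var
        # Propagate the state to the next time point.
        pred_mean = gamma * post_mean + (1.0 - gamma) * mu
        pred_var = gamma * gamma * post_var + (1.0 - gamma * gamma) * sigma2
    return loglik


ll = kalman_loglik([1.0, -0.2, 0.4], mu=0.5, sigma2=2.0, gamma=0.7, noise2=0.3)
```

The cost grows linearly with the number of time points, whereas evaluating the joint Gaussian density directly would require factorising a full covariance matrix.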

Estimating the TFAs

TFAs can be estimated a posteriori using Bayes' theorem and moment matching.

Error bars associated with each TFA are given by the square roots of the diagonal entries of the posterior covariance.

Mean TFAs can be obtained by averaging gene-specific TFAs over the target genes.
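
The three steps above can be sketched for the simplest Gaussian case: a single TFA observed once with noise. This is a toy illustration under assumed names and made-up numbers (including the gene labels); it does not reproduce the moment-matching step of the full model.

```python
import math


def gaussian_posterior(prior_mean, prior_var, obs, noise_var):
    """Posterior over a TFA b given obs = b + noise, with everything Gaussian."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + obs / noise_var)
    return post_mean, post_var


def error_bars(post_cov):
    """Square roots of the diagonal entries of a posterior covariance."""
    return [math.sqrt(post_cov[i][i]) for i in range(len(post_cov))]


def mean_tfa(gene_tfas, targets):
    """Average gene-specific TFA time courses over the given target genes."""
    n_times = len(gene_tfas[targets[0]])
    return [sum(gene_tfas[g][t] for g in targets) / len(targets)
            for t in range(n_times)]


post_mean, post_var = gaussian_posterior(0.0, 1.0, obs=2.0, noise_var=1.0)
bars = error_bars([[4.0, 0.5], [0.5, 0.25]])
avg = mean_tfa({"geneA": [1.0, 3.0], "geneB": [3.0, 5.0]}, ["geneA", "geneB"])
```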

Testing the model

  • We compared our averaged TFAs with those obtained by regression on the Spellman et al. dataset (Mol. Biol. Cell, 1998), using ChIP data from Lee et al. (Science, 2002). The diagrams show the TFA for ACE2p.

...but we also get...

[Figure: four panels showing the TFA for CTS1, SCW11, YER124C and YKL151C]

...and we can do more!

  • Error bars make it possible to determine which regulations are significant

  • Correlations among TFs can be obtained from Σ
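
Both points can be sketched with a few lines of plain Python. The two-standard-deviation threshold and the function names are assumptions for illustration, and Σ is treated here simply as a covariance matrix over TFs.

```python
import math


def significant(mean, sd, n_sd=2.0):
    """Flag a TFA whose error bar (n_sd standard deviations) excludes zero."""
    return abs(mean) > n_sd * sd


def correlation_matrix(cov):
    """Turn a covariance matrix into a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]


# Made-up numbers: a covariance between two TFs.
corr = correlation_matrix([[4.0, 1.0], [1.0, 1.0]])
```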

Decoupling action and concentration

  • It is not clear in the model whether a high gene-specific TFA is the result of a high affinity or of a high protein concentration

  • We modify the model to distinguish the effects of protein concentration and affinity

  • Specifically, we model

Estimating the parameters

  • Inference in the model is no longer exact.

  • Approximate inference is performed using a variational EM algorithm

  • This exploits Jensen’s inequality to get a bound on the log likelihood

Under a factorization assumption on the approximating distribution q, the E-step becomes exactly solvable via fixed point equations.
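
For reference, the generic form of the bound obtained from Jensen's inequality is shown here. The symbols are assumptions introduced for illustration: Y denotes the observed data, Z the latent variables, θ the parameters, and H[q] the entropy of q.

```latex
\log p(Y \mid \theta)
  = \log \int q(Z)\, \frac{p(Y, Z \mid \theta)}{q(Z)}\, dZ
  \;\ge\; \int q(Z)\, \log \frac{p(Y, Z \mid \theta)}{q(Z)}\, dZ
  = \mathbb{E}_{q}\!\left[ \log p(Y, Z \mid \theta) \right] + H[q]
```

The bound is tight when q(Z) equals the true posterior p(Z | Y, θ). Restricting q to a factorized form, q(Z) = \prod_i q_i(Z_i), trades some tightness for tractable updates.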



The left-hand picture shows the expression level of ACE2 in the yeast cell cycle, the middle one shows the inferred protein concentration, and the right-hand one shows the significance of the activities.



  • ChIP data is notoriously noisy; for example the same transcription factor (MSN4) in the same conditions (rich medium) is found to bind 32 genes in Lee et al. and 57 genes in Harbison et al. (the intersection is 20 genes).

  • Posterior estimation helps with false positives, not with false negatives.

  • The model is additive (in log space) and doesn’t model combinatorial effects.
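
The MSN4 figures quoted above can be turned into overlap fractions with simple arithmetic. The function name is hypothetical; the counts are the ones given in the text.

```python
def overlap_stats(n_a, n_b, n_both):
    """Overlap between two target-gene sets, given their sizes and intersection."""
    union = n_a + n_b - n_both
    return {
        "frac_of_a": n_both / n_a,   # share of set A confirmed by set B
        "frac_of_b": n_both / n_b,   # share of set B confirmed by set A
        "jaccard": n_both / union,   # intersection over union
    }


# 32 genes in Lee et al., 57 in Harbison et al., 20 in common.
stats = overlap_stats(32, 57, 20)
```

Here 62.5% of the Lee et al. targets reappear in Harbison et al., about 35% of the Harbison et al. targets appear in Lee et al., and the intersection over union is 20/69, roughly 0.29.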

What next?

  • Collaborate with biologists to validate our predictions on novel data

  • Microarray and ChIP data from same lab should be more consistent

  • Use the model results as a starting point for systems biology modeling

  • Introduce combinatorial effects
