
Bayesian detection of non-sinusoidal periodic patterns in circadian expression data

Darya Chudova, Alexander Ihler, Kevin K. Lin, Bogi Andersen and Padhraic Smyth. BIOINFORMATICS, Gene expression, Vol. 25 no. 23 2009, pages 3114-3120.


Presentation Transcript


  1. Bayesian detection of non-sinusoidal periodic patterns in circadian expression data Darya Chudova, Alexander Ihler, Kevin K. Lin, Bogi Andersen and Padhraic Smyth BIOINFORMATICS Gene expression Vol. 25 no. 23 2009, pages 3114-3120

  2. Outline • Introduction • Methodology • Experimental Results • Conclusion

  3. Introduction • Cyclical biological processes: • Cell cycle, hair growth cycle, mammary cycle and circadian rhythms • Produce coordinated periodic expression of thousands of genes. • Existing computational methods are biased toward discovering genes that follow sine-wave patterns. • The objective is to identify or rank which of these genes are most likely to be periodically regulated.

  4. Introduction • Two major categories of existing methods: • Frequency domain • Compute the spectrum of the average expression profile for each probe. • Test the significance of the dominant frequency against a suitable null hypothesis such as uncorrelated noise. • Not well suited for short time courses. • Time domain • Identify sinusoidal expression patterns. • Simple and computationally efficient. • Not effective at finding periodic signals that violate the sinusoidal assumption.

  5. Introduction • This article presents a general statistical framework for detecting periodic profiles from time course data: • Analyze the similarity of observed profiles across cycles. • Discover periodic transcripts of arbitrary shape from replicated gene expression profiles. • Provide an empirical Bayes procedure for estimating the parameters of the prior distribution. • Derive closed-form expressions for the posterior probability of periodicity.

  6. Introduction • Expression profiles from the murine liver time course data set. • Two of these probe sets (Nr1d1 and Arntl) correspond to well-established clock-control genes.

  7. Methodology • Probabilistic mixture model: • Differentially expressed genes • Change their expression level in response to changes in experimental conditions • Background genes • Remain constant throughout the experiment • Coordinated expression across multiple cycles • Model periodic phenomena

  8. Methodology • Model the data using a mixture of three components: background, differentially expressed and periodically expressed profiles. • Compute the posterior probability that a given probe set was generated by the periodic component.

  9. Methodology • A probabilistic model for periodicity • N probe sets are observed over C cycles of known length. • Each cycle is represented by the same grid of T time points, indexed from 1 to T. • Each probe set i has some number of replicate observations at time point j of cycle c. • Each observation is the expression intensity value for a particular probe set i, time point j and replicate k for cycle c. • Yi: the entire set of observations for probe set i.

  10. Methodology • Our probabilistic model for expression then consists of three components: background (b), differentially expressed but aperiodic (d) and periodically expressed (p) profiles. • Let zi denote the component associated with probe set i. • Each of the three component models consists of a Normal/Inverse Gamma (NIG) prior distribution on the latent profile and additional Normal noise on the observations.

  11. Methodology • The Normal/Inverse Gamma (NIG) prior is a flexible and computationally convenient distribution commonly used as a prior model for latent expression levels and replicate variability. • A scalar mean and variance are jointly distributed as NIG with four parameters. • The inverse Gamma distribution has a degrees of freedom and scale parameter b, and is evaluated at x.
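
As a rough illustration of the NIG prior on this slide, the sketch below draws a (mean, variance) pair and evaluates an inverse Gamma log density. It assumes the common shape/scale form of the inverse Gamma and a conditional Normal for the mean with variance `variance / lam`. The names `mu0`, `lam`, `a`, `b` and this exact parameterization are assumptions for illustration and may differ from the paper's.

```python
import math
import random

def log_inv_gamma(x, a, b):
    """Log density of an inverse Gamma with shape a and scale b at x > 0."""
    return a * math.log(b) - math.lgamma(a) - (a + 1) * math.log(x) - b / x

def sample_nig(mu0, lam, a, b, rng):
    """Draw (mean, variance) from an assumed NIG form.

    variance ~ inverse Gamma(a, b); mean | variance ~ Normal(mu0, variance / lam).
    """
    # If G ~ Gamma(shape=a, scale=1/b), then 1/G ~ inverse Gamma(shape=a, scale=b).
    variance = 1.0 / rng.gammavariate(a, 1.0 / b)
    mean = rng.gauss(mu0, math.sqrt(variance / lam))
    return mean, variance

rng = random.Random(0)
# Hypothetical parameter values, chosen only for the demonstration.
draws = [sample_nig(mu0=0.0, lam=1.0, a=3.0, b=2.0, rng=rng) for _ in range(20000)]
avg_variance = sum(v for _, v in draws) / len(draws)
# For a > 1 the inverse Gamma mean is b / (a - 1), so this should be close to 1.
print(round(avg_variance, 2))
```

Drawing the variance first and the mean second mirrors the usual hierarchical reading of the prior: replicate variability is sampled, then the latent expression level is placed around the location mean on that scale.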

  12. Methodology • Three types of unknown quantities: • The prior parameters • Determined via an empirical Bayes procedure • Subsequently treated as known and fixed • Probe set-specific hidden variables: the latent profiles (consisting of a mean and variance) for each component. • The component identity, indicating from which component the data were generated.

  13. Methodology • Graphical model: the plate is repeated N times, once per probe set, and contains the observed profiles Y, the latent variables Z (component identity) and the latent profile (mean and variance).

  14. Methodology • The background component model: • NIG prior shared by all background probe sets and parameterized by four scalars • The observations Yi are modeled as independent samples from a Gaussian distribution with the latent mean and variance

  15. Methodology • The differentially expressed component model: • The latent mean and variance are (C × T)-dimensional vectors. • The prior distribution for this component is defined by four (C × T)-dimensional parameters. • Model observations as being independent given the latent mean and variance.

  16. Methodology • The periodic component model: • Assume repeated expression of the same pattern across multiple cycles. • The latent mean and variance are T-dimensional variables encoding expression levels and replicate variability in the 'ideal' cycle.

  17. Methodology • The complete set of prior parameters includes the prior probabilities of the component identity z (corresponding to the relative frequencies of background, differentially expressed and periodic probe sets).

  18. Methodology • Inference • Detect periodic expression by computing the posterior probability of the periodic component
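
The inference step amounts to Bayes' rule over the three mixture components. The sketch below assumes that a log marginal likelihood of the data is already available for each component (the closed-form expressions themselves are not reproduced here) and combines it with the prior component probabilities. The function name, the dictionary layout and all numbers are hypothetical.

```python
import math

def component_posteriors(log_marginals, priors):
    """Posterior over components from per-component log marginal likelihoods.

    log_marginals and priors are dicts keyed by component label
    ("b" background, "d" differential, "p" periodic).
    """
    log_joint = {c: math.log(priors[c]) + log_marginals[c] for c in priors}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_joint.values())
    unnorm = {c: math.exp(v - top) for c, v in log_joint.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}

# Hypothetical numbers for one probe set; not taken from the paper.
priors = {"b": 0.80, "d": 0.15, "p": 0.05}
log_marginals = {"b": -120.0, "d": -104.0, "p": -100.0}
post = component_posteriors(log_marginals, priors)
print({c: round(v, 3) for c, v in post.items()})
```

Ranking probe sets by `post["p"]` would then give the ordering by posterior probability of periodicity that the slides refer to.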

  19. Methodology • An analysis of variance periodicity detector • The resulting inferential test for periodicity is quite close to a simplified, non-Bayesian test based on analysis of variance (ANOVA). • Construct the ANOVA test: • Divide the data into groups by their associated time points, regardless of cycle number. • All replicates from all cycles c = 1, …, C at a given time point fall into the same group.

  20. Methodology • Test whether the data support separation into these groups: • whether the variation between groups is significantly larger than the variation found within the groups. • High values of the ratio of these quantities indicate that most of the variability in the observations can be explained by a time-dependent, cycle-independent profile.
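
The ANOVA detector described on the last two slides can be sketched as a one-way F statistic with one group per time point. This is a minimal illustration, not the authors' code: the nested-list data layout and the toy values are assumptions, and turning the statistic into a P-value (which needs the F distribution) is left out.

```python
def periodicity_f_statistic(cycles):
    """One-way ANOVA F statistic with one group per time point.

    cycles[c][j] is the list of replicate values at time point j of cycle c.
    Observations are pooled across cycles and replicates within each time point.
    """
    n_timepoints = len(cycles[0])
    groups = [
        [y for cycle in cycles for y in cycle[j]]
        for j in range(n_timepoints)
    ]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the time-point means around the grand mean.
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own time-point mean.
    within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    df_between = n_timepoints - 1
    df_within = n_total - n_timepoints
    return (between / df_between) / (within / df_within)

# Hypothetical toy data: 2 cycles, 4 time points, 2 replicates each.
repeating = [
    [[1.0, 1.2], [5.0, 5.1], [1.1, 0.9], [1.0, 1.1]],
    [[0.9, 1.1], [5.2, 4.9], [1.0, 1.2], [0.8, 1.0]],
]
# Same values, but the spike falls at a different time point in cycle 2.
shifted = [
    [[1.0, 1.2], [5.0, 5.1], [1.1, 0.9], [1.0, 1.1]],
    [[0.9, 1.1], [1.0, 1.2], [5.2, 4.9], [0.8, 1.0]],
]
print(periodicity_f_statistic(repeating) > periodicity_f_statistic(shifted))
```

The spike in `repeating` is far from sinusoidal, yet it recurs at the same time point in both cycles, so the between-group variation dominates. In `shifted` the same spike lands in different groups and inflates the within-group variation instead.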

  21. Methodology • Estimating parameters of the prior distribution: • Develop an empirical Bayes procedure to determine the prior parameters. • Determine a tentative assignment of probe sets to each component. • Use this assignment to find approximate maximum likelihood estimates of the location scale and the parameters (a, b) of the inverse Gamma distribution; the location mean is set to 0 in all three components.

  22. Methodology • To find a tentative initial assignment of probe sets for estimating the prior parameters: • Run the ANOVA detectors of differential expression and periodicity. • Parameters of the differentially expressed component: • Probe sets that vary significantly over time (P<0.01) • Parameters of the background component: • Probe sets that fail this test (P>0.1) • Parameters of the periodic component: • Probe sets with P<0.001, a threshold that yields a number of probe sets similar to that previously identified in the literature.
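
The thresholding step above can be sketched as follows. The P-value thresholds come from the slide. The function name, the dictionary layout, the example P-values and the reading that the P<0.001 cut-off applies to the periodicity test are assumptions.

```python
def tentative_seed_sets(p_differential, p_periodic):
    """Pick the probe sets used to fit each component's prior parameters.

    p_differential maps probe-set id -> P-value of the ANOVA test for
    variation over time; p_periodic maps probe-set id -> P-value of the
    ANOVA periodicity test (assumed reading of the slide).
    """
    differential = {g for g, p in p_differential.items() if p < 0.01}
    background = {g for g, p in p_differential.items() if p > 0.1}
    periodic = {g for g, p in p_periodic.items() if p < 0.001}
    return {"b": background, "d": differential, "p": periodic}

# Hypothetical P-values for four made-up probe sets.
p_diff = {"g1": 0.5, "g2": 0.002, "g3": 0.0001, "g4": 0.05}
p_per = {"g1": 0.7, "g2": 0.2, "g3": 0.0002, "g4": 0.03}
seeds = tentative_seed_sets(p_diff, p_per)
print(seeds)
```

In this sketch a probe set can seed more than one component (`g3`) or none at all (`g4`, which falls between the 0.01 and 0.1 thresholds). The slides do not say how such cases are handled.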

  23. Experimental Results • Demonstrate that the model can effectively identify both sinusoidal and non-sinusoidal periodic expression patterns. • It is widely believed that 5-10% of transcribed genes may be under circadian regulation, with some studies suggesting a higher proportion, up to 50% in murine liver. • The datasets analyzed in this article contain gene expression profiles of liver and skeletal muscle tissues in mice.

  24. Experimental Results • Sine-wave detection: • Use the sine-wave matching algorithm of Straume (2004). • Identifies 848 distinct rhythmic probe sets in liver and 383 such probe sets in skeletal muscle. • Model-based detection: • Among the top 25 probe sets there are nine that were not among the top 400 ranked by sine-wave matching. • Profiles that peak or drop at a single time point are poorly matched to a sinusoidal shape.
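
To see why a single-time-point peak matches a sinusoid poorly, the sketch below fits a sinusoid of known period by least squares and reports the fraction of variance it explains. This is a generic illustration and not the algorithm of Straume (2004). The function name, the sampling grid and the toy profiles are assumptions.

```python
import math

def sinusoid_r_squared(profile, period):
    """Fraction of variance explained by a least-squares sinusoid of known period.

    profile[t] is the expression at equally spaced time index t, and the
    samples are assumed to cover a whole number of periods.
    """
    n = len(profile)
    mean = sum(profile) / n
    cos_b = [math.cos(2 * math.pi * t / period) for t in range(n)]
    sin_b = [math.sin(2 * math.pi * t / period) for t in range(n)]
    centered = [y - mean for y in profile]
    # Over whole periods the two basis vectors are orthogonal to each other
    # and to the constant, so each coefficient is a simple projection.
    a = sum(y * c for y, c in zip(centered, cos_b)) / sum(c * c for c in cos_b)
    b = sum(y * s for y, s in zip(centered, sin_b)) / sum(s * s for s in sin_b)
    fitted = [a * c + b * s for c, s in zip(cos_b, sin_b)]
    ss_res = sum((y - f) ** 2 for y, f in zip(centered, fitted))
    ss_tot = sum(y * y for y in centered)
    return 1.0 - ss_res / ss_tot

period = 6  # samples per cycle (hypothetical)
sine_like = [math.sin(2 * math.pi * t / period) for t in range(12)]
spike = [0.0, 0.0, 4.0, 0.0, 0.0, 0.0] * 2  # identical in both cycles
r2_sine = sinusoid_r_squared(sine_like, period)
r2_spike = sinusoid_r_squared(spike, period)
print(round(r2_sine, 3), round(r2_spike, 3))
```

Both toy profiles repeat exactly from one cycle to the next, but the sinusoid explains much less of the spike's variance, so a sine-based ranking would place it lower.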

  25. Experimental Results

  26. Experimental Results • Tns3 is the only probe set ranked in the top 25 by the sine-wave method but below 400 by the model. • It conforms to the sine-wave pattern but has a very small amplitude, and is assigned to the background component by the model. • All of the other probe sets ranked this highly by the sine-wave method received posterior probabilities of periodicity >0.9 from our model.

  27. Conclusion • We argue that in typical experiments with only a small number of samples per cycle, we should test for arbitrary patterns which are repeated between cycles, rather than parametric shapes. • To this end, we propose a Bayesian mixture model for identifying patterns of unconstrained shape, which stand out as both differentially and periodically expressed.
