Regulatory element discovery for developmental time series

Regulatory element discovery for developmental time series. Computational Biology Program Sloan-Kettering Institute Memorial Sloan-Kettering Cancer Center. Joint work with Xuejing Li, Chris Wiggins, Valerie Reinke Christina Leslie. http://cbio.mskcc.org. Regulatory networks in development.


Presentation Transcript


  1. Regulatory element discovery for developmental time series Computational Biology Program Sloan-Kettering Institute Memorial Sloan-Kettering Cancer Center Joint work with Xuejing Li, Chris Wiggins, Valerie Reinke Christina Leslie http://cbio.mskcc.org

  2. Regulatory networks in development • Reinke lab: genome-wide expression for C. elegans developmental time series + germ cell/gametogenesis mutants • Problem: decipher regulatory networks governing germline- and sex-regulated genes

  3. Previous work: MEDUSA in yeast • Predict up/down expression of target genes from promoter + regulator expression • Learns from a set of mRNA expression experiments without clustering • Problem: high correlation of nearby time points, many regulator profiles

  4. Sequence to expression profile • Can we learn mapping from promoter sequence to full expression trajectory (with some level of statistical significance)? • Retain some properties of MEDUSA: • No clustering of expression profiles • Learn motifs de novo from promoters by building from k-mers …AGCTATGCCATCGACTGCTCCA…
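To make the "building from k-mers" idea concrete, here is a minimal sketch of turning a promoter sequence into a k-mer count vector. The function names `kmer_counts` and `motif_vector` are hypothetical, the choice k = 3 is arbitrary, and the input is only the short fragment shown on the slide, not a full promoter.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a DNA sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def motif_vector(sequence, k):
    """Return (vocabulary, counts) over all 4**k possible k-mers, in a fixed order."""
    counts = kmer_counts(sequence, k)
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return vocab, [counts.get(kmer, 0) for kmer in vocab]


# Fragment shown on the slide; the flanking sequence is elided there.
fragment = "AGCTATGCCATCGACTGCTCCA"
vocab, vector = motif_vector(fragment, k=3)
```

Stacking one such vector per gene gives a genes-by-k-mers matrix of the kind the later slides call the motif matrix.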

  5. Regression problem • $X$ ($G \times M$): row $g$ is the motif vector (k-mer counts) for gene $g$ • $Y$ ($G \times E$): row $g$ is the expression profile for gene $g$ • Idea: learn latent factors $T = XW$ that “explain” $Y$ • Then regress $X \approx TP^\top$, $Y \approx TQ^\top$, or $Y \approx XB$ where $B = WQ^\top$ • Columns $w_i$ of $W$ = weight vectors; columns of $P$, $Q$ = loadings
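The link between the factor regression and the direct regression follows by substitution. The short derivation below assumes $X$ is genes by k-mers ($G \times M$), $Y$ is genes by time points ($G \times E$), and $K$ latent factors are used; the dimension labels are inferred from the slide layout rather than stated in the text.

```latex
\begin{aligned}
T &= XW, & W &\in \mathbb{R}^{M \times K},\quad T \in \mathbb{R}^{G \times K},\\
Y &\approx TQ^{\top} = XWQ^{\top} = XB, & B &= WQ^{\top} \in \mathbb{R}^{M \times E}.
\end{aligned}
```

With these shapes the coefficient matrix multiplies $X$ on the right, so each column of $B$ gives k-mer weights for one time point.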

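  6. First step: PLS regression • Sequentially build latent factors $t_i = Xw_i$: • Maximize covariance between factors and $Y$ • Constrain $t_1, \ldots, t_K$ to be uncorrelated • SIMPLS, in the 1D case, for $i = 1, \ldots, K$: [optimization problem and its constraints were shown as an equation on the slide; not preserved in this transcript]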
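The sequential construction can be sketched in NumPy. This follows the SIMPLS algorithm as it is commonly described (deflating the cross-product matrix $X^\top Y$ rather than $X$ itself); it is an illustrative sketch, not the authors' code, and the function name `simpls` and its return layout are assumptions made here.

```python
import numpy as np


def simpls(X, Y, n_factors):
    """Sketch of SIMPLS. Returns weights W, scores T, loadings P and Q, and B = W Q^T."""
    X0 = X - X.mean(axis=0)          # center predictors (k-mer counts)
    Y0 = Y - Y.mean(axis=0)          # center responses (expression profiles)
    S = X0.T @ Y0                    # cross-product matrix, M x E
    n, m = X0.shape
    W = np.zeros((m, n_factors))
    T = np.zeros((n, n_factors))
    P = np.zeros((m, n_factors))
    Q = np.zeros((Y0.shape[1], n_factors))
    V = np.zeros((m, n_factors))     # orthonormal basis of the X loadings
    for i in range(n_factors):
        # Direction of maximal covariance: leading left singular vector of S.
        u, _, _ = np.linalg.svd(S, full_matrices=False)
        w = u[:, 0]
        t = X0 @ w
        norm_t = np.linalg.norm(t)
        t = t / norm_t               # unit-norm score vector
        w = w / norm_t               # rescale so that t = X0 @ w still holds
        p = X0.T @ t                 # X loadings
        q = Y0.T @ t                 # Y loadings
        # Orthogonalize p against earlier loadings, then deflate S so that
        # later scores are uncorrelated with the ones already built.
        v = p - V[:, :i] @ (V[:, :i].T @ p)
        v = v / np.linalg.norm(v)
        S = S - np.outer(v, v @ S)
        W[:, i], T[:, i], P[:, i], Q[:, i], V[:, i] = w, t, p, q, v
    return W, T, P, Q, W @ Q.T
```

With a single response column, the first weight vector is simply proportional to $X^\top y$ (after centering), which is the 1D case mentioned on the slide. Predictions for new genes would be formed as the training mean of $Y$ plus the centered $X$ times `B`.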

  7. Equivalent formulation • Learn latent factors $t_i = Xw_i$ and $u_i = Yc_i$ for both predictor and response variables • $w_i$ and $c_i$ chosen to maximize $\mathrm{Cov}(t_i, u_i)$ for $i = 1, \ldots, K$, subject to constraints [equation not preserved in this transcript] • $w_i$: motif weight vector • $c_i$: expression weight vector
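The constraint on the slide did not survive extraction. One standard way to write the first step of this formulation, assuming centered columns of $X$ and $Y$ and unit-norm weight vectors (both are assumptions here, not taken from the slide), is:

```latex
(w_1, c_1)
  = \arg\max_{\|w\| = \|c\| = 1} \operatorname{Cov}(Xw,\, Yc)
  = \arg\max_{\|w\| = \|c\| = 1} \frac{1}{G - 1}\, w^{\top} X^{\top} Y c
```

Under these assumptions $w_1$ and $c_1$ are the leading left and right singular vectors of $X^\top Y$. Later factors additionally require the scores $t_i$ to be uncorrelated with the earlier ones.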

  8. Next steps: sparsity, graph Laplacian • For regularization and interpretability of weight vectors, want: • sparsity in w: most components should be 0 • smoothness in w: define a graph on the set of k-mers, with edge k ~ l if the corresponding k-mers are close in Hamming distance
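The smoothness idea can be illustrated with a small sketch that builds the k-mer graph and its Laplacian. The threshold "Hamming distance at most 1", the use of the unnormalized Laplacian $L = D - A$, and the function names are assumptions for illustration; the slide does not say how "close" is defined or which Laplacian is used.

```python
from itertools import product

import numpy as np


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def kmer_laplacian(kmers, max_dist=1):
    """Adjacency A (edge if Hamming distance <= max_dist) and Laplacian L = D - A."""
    n = len(kmers)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if hamming(kmers[i], kmers[j]) <= max_dist:
                A[i, j] = A[j, i] = 1.0
    D = np.diag(A.sum(axis=1))
    return D - A, A


kmers = ["".join(p) for p in product("ACGT", repeat=3)]
L, A = kmer_laplacian(kmers)

# Smoothness penalty for a weight vector w: w^T L w equals the sum over
# edges of (w_k - w_l)^2, so it is small when neighboring k-mers have
# similar weights.
rng = np.random.default_rng(0)
w = rng.normal(size=len(kmers))
penalty = w @ L @ w
```

A penalty of this form could be added to the PLS objective alongside a sparsity term; how the two are combined in the actual method is not described in the transcript.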

  9. Preliminary results: worm time series • Reinke data: ~9000 genes, 12 time points (3 replicates), wild type germline development • Gene sets, from mutant expression data: • Sperm genes: high expression in spermatogenesis • Oocyte genes: high expression in oogenesis • Motif matrix: filter k-mers based on expected counts

  10. Standard PLS • 10-fold c.v. on held-out genes
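Cross-validation on held-out genes means splitting the rows (genes) of the data into folds, fitting on nine folds and scoring on the tenth. The sketch below shows that protocol only. The least-squares model is a stand-in for PLS, mean squared error is an assumed score (the slides refer to a chi-square measure whose definition is not given), and all function names are hypothetical.

```python
import numpy as np


def kfold_gene_splits(n_genes, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs that partition the genes into folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_genes), n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, folds[i]


def cross_validate(X, Y, fit, predict, n_folds=10):
    """Average held-out mean squared error over the folds."""
    errors = []
    for train_idx, test_idx in kfold_gene_splits(X.shape[0], n_folds):
        model = fit(X[train_idx], Y[train_idx])
        residual = Y[test_idx] - predict(model, X[test_idx])
        errors.append(np.mean(residual ** 2))
    return float(np.mean(errors))


# Stand-in model (ordinary least squares), used only to exercise the protocol.
def fit_lstsq(X, Y):
    return np.linalg.lstsq(X, Y, rcond=None)[0]


def predict_lstsq(B, X):
    return X @ B
```

Any model exposing a fit and a predict step, such as a PLS regression with a chosen number of latent factors, could be passed in place of the stand-in.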

  11. Regularized PLS • 10-fold c.v. on held-out genes

  12. Regularized PLS • Sperm/oocyte gene sets: largest chi-square reduction for 3rd/1st latent factor

  13. Interpretation of factor weights • To infer motifs relevant for an expression pattern: • Latent factors $t_i = Xw_i$ and $u_i = Yc_i$ for both predictor and response variables • $w_i$ and $c_i$ chosen to maximize $\mathrm{Cov}(t_i, u_i)$ • $c_i$ gives weights over time points: interpret as expression pattern • $w_i$ gives weights over motifs: highly weighted motifs relevant for this expression pattern
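Reading motifs off a weight vector amounts to ranking k-mers by their weight. A minimal sketch, assuming "highly weighted" means largest absolute weight (the slide does not specify) and using made-up weights purely for illustration:

```python
import numpy as np


def top_motifs(kmers, w, n_top=50):
    """Return the n_top (k-mer, weight) pairs with the largest absolute weight."""
    w = np.asarray(w, dtype=float)
    order = np.argsort(-np.abs(w))[:n_top]
    return [(kmers[i], float(w[i])) for i in order]


# Hypothetical k-mers and weights; these are not results from the talk.
example_kmers = ["GATAA", "ACGTG", "TTTTT", "CCCCC"]
example_w = [0.9, -0.7, 0.05, 0.0]
ranked = top_motifs(example_kmers, example_w, n_top=2)
```

The matching vector $c_i$ can be plotted against the time points to see which expression pattern the selected k-mers are associated with.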

  14. Sperm genes • c3 correlated with sperm gene expression, consistent with drop in chi-square

  15. Motif graph for sperm genes • Top 50 k-mer graph for w3, clusters around GATAA (ELT-1) and ACGTG (bHLH)

  16. Oocyte genes • Oocyte genes correlate with c1 pattern

  17. Oocyte motif map • Top 50 k-mer graph for w1, log(p) vs weight

  18. Some related work • Zhang et al., 2008: PCA in Y for motif discovery • Naughton et al., 2006: algorithmic motif search using graph representation • Beer and Tavazoie, 2004; Segal et al., 2002: sequence to expression via clustering
