Outline

Presentation Transcript


  1. Outline • Who regulates whom and when? • Model • Learning algorithm • Evaluation • Wet lab experiments • Perspective: why does it work?

  2. Gene Regulation: Simple Example [Figure: activators and repressors acting on a regulated gene across three states (State 1, State 2, State 3); expression of the regulators and of the regulated gene is read out by DNA microarray.]

  3. Regulation program • Regulation tree over the module genes: tests "Activator?" (activator expression, false/true) and then "Repressor?" (repressor expression, false/true) • The resulting contexts correspond to State 1, State 2 and State 3 • Genes in the same module share the same regulation program

  4. Module Networks • Goal: discover regulatory modules and their regulators • Module genes: set of genes that are similarly controlled • Regulation program: expression as a function of regulators [Figure: modules with an example regulation tree that splits on HAP4 and CMK1 (false/true branches).]

  5. Module Network Probabilistic Model • Module: what module does gene "g" belong to? • Regulator1, Regulator2, Regulator3: expression level of each regulator in the experiment • Level: P(Level | Module, Regulators); the expression level in each module is a function of the expression of its regulators [Figure: model over Experiment, Gene, Module and Expression, with example regulators HAP4, CMK1, BMH1 and GIC2.]
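A minimal sketch of how P(Level | Module, Regulators) could be evaluated for one module, assuming (as the assignment slides suggest) that each leaf of the regulation tree holds a single Gaussian. The class names `Leaf` and `Split`, the helper functions, the threshold values and the leaf parameters are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    mean: float  # Gaussian mean of expression in this context
    std: float   # Gaussian standard deviation in this context


@dataclass
class Split:
    regulator: str    # regulator whose expression is tested
    threshold: float  # follow `above` when expression > threshold
    below: "Node"
    above: "Node"


Node = Union[Leaf, Split]


def find_leaf(tree: Node, regulators: Dict[str, float]) -> Leaf:
    """Walk the regulation tree using the regulators' expression in one array."""
    node = tree
    while isinstance(node, Split):
        if regulators[node.regulator] > node.threshold:
            node = node.above
        else:
            node = node.below
    return node


def log_p_level(level: float, tree: Node, regulators: Dict[str, float]) -> float:
    """Gaussian log density of `level` at the leaf selected by the regulators."""
    leaf = find_leaf(tree, regulators)
    z = (level - leaf.mean) / leaf.std
    return -0.5 * z * z - math.log(leaf.std) - 0.5 * math.log(2 * math.pi)


# Hypothetical module: induced when HAP4 is up, unless CMK1 is also up.
example_tree = Split(
    "HAP4", 0.5,
    below=Leaf(0.0, 1.0),
    above=Split("CMK1", 0.5, below=Leaf(2.0, 1.0), above=Leaf(-1.0, 1.0)),
)
```

For example, `log_p_level(1.8, example_tree, {"HAP4": 1.2, "CMK1": 0.1})` scores a level of 1.8 against the leaf reached when HAP4 is above and CMK1 is below their thresholds.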

  6. Outline • Who regulates whom and when? • Model • Learning algorithm • Evaluation • Wet lab experiments • Perspective: why does it work?

  7. Learning Problem • Goal: find gene module assignments and tree structures that maximize P(M|D) • Hard: both the gene module assignments and the tree structures must be learned • Genes: 5000-10000 • Regulators: ~500

  8. Learning Algorithm Overview • Clustering gives the initial gene module assignment • Learn regulation programs (e.g. trees over HAP4 and CMK1) • Relearn gene assignments to modules • Output: regulatory modules
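The "relearn gene assignments" step can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes each module's regulation program has already been turned into one Gaussian (mean, std) per array, and it moves every gene independently to the module under which its profile is most likely. The names `reassign_genes`, `expression` and `module_models` are hypothetical.

```python
import math
from typing import List, Tuple

Gaussian = Tuple[float, float]  # (mean, std) predicted for one array


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2 * math.pi)


def reassign_genes(
    expression: List[List[float]],        # expression[g][a]: gene g in array a
    module_models: List[List[Gaussian]],  # module_models[m][a]: module m's prediction for array a
) -> List[int]:
    """Assign each gene to the module that gives its profile the highest log-likelihood."""
    assignments = []
    for profile in expression:
        scores = [
            sum(gaussian_logpdf(x, mu, sd) for x, (mu, sd) in zip(profile, model))
            for model in module_models
        ]
        assignments.append(max(range(len(scores)), key=scores.__getitem__))
    return assignments
```

Alternating this step with relearning the regulation programs gives the iterative loop the overview describes; a real implementation would also have to guard against empty modules.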

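  9. Learning Regulation Programs • Candidate regulators: HAP4, CMK1, SIP4 • No split (module genes by experiments, experiments in original order): $\log P(M \mid D) \propto \log P(D \mid \mu, \sigma) + \log P(\mu, \sigma)$ • Split on HAP4 (experiments sorted by Hap4 expression): $\log P(M \mid D) \propto \log P(D_{HAP4\downarrow} \mid \mu_{HAP4\downarrow}, \sigma_{HAP4\downarrow}) + \log P(D_{HAP4\uparrow} \mid \mu_{HAP4\uparrow}, \sigma_{HAP4\uparrow}) + \log P(\mu_{HAP4\downarrow}, \sigma_{HAP4\downarrow}, \mu_{HAP4\uparrow}, \sigma_{HAP4\uparrow})$ • Split on SIP4: $\log P(M \mid D) \propto \log P(D_{SIP4\downarrow} \mid \mu_{SIP4\downarrow}, \sigma_{SIP4\downarrow}) + \log P(D_{SIP4\uparrow} \mid \mu_{SIP4\uparrow}, \sigma_{SIP4\uparrow}) + \log P(\mu_{SIP4\downarrow}, \sigma_{SIP4\downarrow}, \mu_{SIP4\uparrow}, \sigma_{SIP4\uparrow})$ • Split on HAP4, then on CMK1: $\log P(M \mid D) \propto \log P(D_{HAP4\downarrow} \mid \mu_{HAP4\downarrow}, \sigma_{HAP4\downarrow}) + \log P(D_{CMK1\downarrow} \mid \mu_{CMK1\downarrow}, \sigma_{CMK1\downarrow}) + \log P(D_{CMK1\uparrow} \mid \mu_{CMK1\uparrow}, \sigma_{CMK1\uparrow}) + \dots$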
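To make the comparison of candidate splits concrete, here is a rough sketch that scores a split by the gain in Gaussian log-likelihood. It swaps in maximum-likelihood fits for the slide's Bayesian score, so the prior term over the leaf parameters is deliberately left out. The function names, the `min_std` floor and the toy data in the usage note are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def gaussian_loglik(values: Sequence[float], min_std: float = 1e-3) -> float:
    """Log-likelihood of `values` under a Gaussian fitted to them by maximum likelihood."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = max(math.sqrt(var), min_std)  # floor avoids log(0) on constant data
    return sum(
        -0.5 * ((v - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)
        for v in values
    )


def split_gain(
    module_expr: List[List[float]],   # module_expr[g][a]: module gene g in array a
    regulator_expr: Sequence[float],  # regulator_expr[a]: regulator expression in array a
    threshold: float,
) -> float:
    """Log-likelihood gained by splitting the arrays on regulator expression > threshold."""
    low = [row[a] for row in module_expr
           for a, r in enumerate(regulator_expr) if r <= threshold]
    high = [row[a] for row in module_expr
            for a, r in enumerate(regulator_expr) if r > threshold]
    if not low or not high:
        return float("-inf")  # not a real split
    everything = [x for row in module_expr for x in row]
    return gaussian_loglik(low) + gaussian_loglik(high) - gaussian_loglik(everything)
```

With toy data such as `module_expr = [[0.0, 0.1, 2.0, 2.1]]`, a regulator with expression `[0, 0, 1, 1]` yields a much larger gain at threshold 0.5 than one with expression `[0, 1, 0, 1]`, which is the behaviour the slide's HAP4 versus SIP4 comparison relies on.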

  10. Learning Algorithm Performance [Plots: Bayesian score (avg. per gene) vs. algorithm iterations; gene module assignment changes (% of total) vs. algorithm iterations.] • Significant improvements across learning iterations • Many genes (50%) change module assignment in learning

  11. Outline • Who regulates whom and when? • Model • Learning algorithm • Evaluation • Wet lab experiments • Perspective: why does it work?

  12. Yeast Stress Data • Genes: selected 2355 that showed activity • Experiments (173): diverse environmental stress conditions (heat shock, nitrogen depletion, …)

  13. Comparison to Bayesian Networks • Bayesian Network (Friedman et al. ’00; Hartemink et al. ’01): expression level of each gene is a function of expression of regulators • 2355 variables (genes) • 173 instances (experiments) • Problems: robustness, interpretability [Figure: fragment of learned Bayesian network over Hap4, Mig1, Yap1, Cmk1, Ste12 and Gic1.]

  14. Comparison to Bayesian Networks • Bayesian Network (Friedman et al. ’00; Hartemink et al. ’01), problems: robustness, interpretability • Module Network (SPRKF ’03, UAI), solutions: robustness → sharing parameters; interpretability → module-level model [Figures: Bayesian network fragment over Hap4, Mig1, Yap1, Cmk1, Ste12 and Gic1; module network over Regulator1, Regulator2, Regulator3, Module and Level.]

  15. Comparison to Bayesian Networks [Plot: test data log-likelihood (gain per instance) vs. number of modules (0 to 500), with Bayesian network performance as the reference.] • Learn which parameters are shared (by learning which genes are in the same module) • Problems: robustness, interpretability • Solutions: robustness → sharing parameters; interpretability → module-level model

  16. From Model to Regulatory Modules • Biologically relevant? [Figure: the module-level model (Regulator1, Regulator2, Regulator3, Module, Level) next to learned regulation trees over HAP4 and CMK1.]

  17. Respiration Module • Module genes functionally coherent? Energy production (oxid. phos. 26/55, $P < 10^{-30}$) • Module genes known targets of predicted regulators? Hap4+Msn4 known to regulate module genes [Figure: regulation program and module genes, with the predicted regulator marked.]

  18. Energy, Osmolarity, & cAMP Signaling • Tpk1: regulation by non-TFs (Tpk1 is a catalytic subunit of cAMP-dependent protein kinase) • Module contains known Tpk1 targets (e.g. Tps1) • Tpk1-mediated STRE motif (50/64 genes; $p < 3 \times 10^{-11}$)

  19. EM: Biological Improvement [Scatter plot: negative log p-value (module network) vs. negative log p-value (standard clustering), both axes 0 to 45.]

  20. [Figure: network of inferred regulation between regulators, numbered modules and motifs. Legend: module (number); regulator (transcription factor); regulator (signaling molecule); experimentally tested regulator; enriched cis-regulatory motif; inferred regulation; regulation supported in literature. Regulators: Not3, Gcn20, Bmh1, Ime4, Ypl230w, Yap6, Gac1, Tpk2, Pph3, Gis1, Lsg1, Ppt1, Cmk1, Yer184c, Tpk1, Kin82, Sip2, Xbp1, Msn4, Hap4, Gat1. Motif labels: N36, N30, N26, N18, N13, N14, N41, N11, HSF, MIG1, CAT8, XBP1, HAC1, STRE, GATA, ADR1, GCR1, GCN4, MCM1, ABF_C, HAP234, CBF1_B, REPCAR. Functional groups: DNA and RNA processing; energy and cAMP signaling; amino acid metabolism; nuclear.]

  21. Biological Evaluation Summary • Are the module genes functionally coherent? 46/50 (functionally coherent = module genes enriched for GO annotations with hypergeometric p-value < 0.01, corrected for multiple hypotheses) • Are some module genes known targets of the predicted regulators? 30/50 (known targets = direct biological experiments reported in the literature)
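The hypergeometric enrichment test behind "functionally coherent" can be sketched as below. The slide does not say which multiple-hypothesis correction was used; the Bonferroni correction here is an assumption, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population: int, annotated: int, module_size: int, hits: int) -> float:
    """P(X >= hits) when `module_size` genes are drawn without replacement from
    `population` genes, `annotated` of which carry the annotation."""
    total = comb(population, module_size)
    upper = min(annotated, module_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, module_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value (assumed correction, capped at 1)."""
    return min(1.0, p_value * n_tests)
```

A module would then count as coherent for an annotation when `bonferroni(hypergeom_pvalue(...), n_tests) < 0.01`, where `n_tests` is the number of annotations tested.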

  22. Outline • Who regulates whom and when? • Model • Learning algorithm • Evaluation • Wet lab experiments • Perspective: why does it work?

  23. From Model to Detailed Predictions • Prediction: regulator ‘X’ regulates process ‘Y’ • Experiment: knock out ‘X’ and repeat experiment [Figure: regulation tree over HAP4 and Ypl230w, with the predicted regulator marked "?".]

  24. Does ‘X’ Regulate Predicted Genes? • Experiment: knock out Ypl230w (stationary phase) and compare wild-type with mutant • 1334 regulated genes (312 expected by chance), >4x • Rank modules by regulated genes; predicted modules are those predicted to be regulated by Ypl230w • Ypl230w regulates computationally predicted genes

  25. Does ‘X’ Regulate Predicted Genes? (wild-type vs. mutant) • Ppt1 knockout (hypo-osmotic stress) • Kin82 knockout (heat shock) • Regulated genes (1014) • Regulated genes (1034)

  26. Wet Lab Experiments Summary • 3/3 regulators regulate computationally predicted genes • New yeast biology suggested • Ypl230w activates protein-folding, cell wall and ATP-binding genes • Ppt1 represses phosphate metabolism and rRNA processing • Kin82 activates energy and osmotic stress genes

  27. Outline • Who regulates whom and when? • Model • Learning algorithm • Evaluation • Wet lab experiments • Perspective: why does it work?

  28. Why does it work? • Statistical methods can detect associations between regulators and their targets • Underlying assumption: regulators are transcriptionally regulated • Regulators are part of regulatory structures in which they are themselves regulated* (* [Shen-Orr et al., ’02] find many such structures)

  29. Regulator Chain • Respiration module: Phd1 (TF), Hap4 (TF); targets Cox4, Cox6, Atp17 [Figure: active protein level and mRNA expression level over time for Phd1, Hap4 and the targets.] • Black: regulators that cannot be detected • Red: correctly predicted regulator • Blue: targets

  30. Auto Regulation • Snf kinase regulated processes module: Yap6 (TF); Vid24, Tor1, Gut2 • Black: regulators that cannot be detected • Red: correctly predicted regulator • Blue: targets

  31. Positive Signaling Loop • Sporulation and cAMP pathway module: Sip2 (SM), Msn4 (TF); Vid24, Tor1, Gut2 • Black: regulators that cannot be detected • Red: correctly predicted regulator • Blue: targets

  32. Negative Signaling Loop • Energy and osmotic stress module: Tpk1 (SM), Msn4 (TF); Nth1, Tps1, Glo1 • Black: regulators that cannot be detected • Red: correctly predicted regulator • Blue: targets

  33. Why Does it Work? • Some transcription factors and signal transduction molecules have a detectable expression signature • Module Networks infers their regulatory relationships • Feed-forward and feedback loops

  34. Assignment • Download the yeast stress expression dataset • Download the list of transcription factor regulators • Randomly partition the dataset in a 5-fold cross validation scheme • For k=50: • Create a hard-clustering model (use code from earlier exercise). At each array, this model has a separate Gaussian distribution for each of the 50 values of the cluster variable • Use the assignment of genes to clusters that you learned in the hard-clustering, and for each cluster, learn a decision tree with at most: (1) one split (2) two splits (3) three splits • Note 1: allow only splits with >=5 arrays in each side of the split • Note 2: split question is whether the expression level of the transcription factor is greater than some value
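Two pieces of the assignment setup can be sketched with the standard library alone: the random 5-fold partition and the enumeration of legal split thresholds under Note 1. The sketch assumes the partition is over arrays (the slide only says "the dataset"), and the function names and the fixed seed are hypothetical.

```python
import random
from typing import List, Sequence


def k_fold_partition(n_arrays: int, n_folds: int = 5, seed: int = 0) -> List[List[int]]:
    """Randomly partition array indices 0..n_arrays-1 into n_folds disjoint folds."""
    indices = list(range(n_arrays))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[f::n_folds]) for f in range(n_folds)]


def candidate_thresholds(regulator_expr: Sequence[float], min_side: int = 5) -> List[float]:
    """Thresholds for the question "expression > value" that leave at least
    `min_side` arrays on each side of the split (Note 1)."""
    values = sorted(regulator_expr)
    thresholds = []
    # i is the number of arrays that would fall on the low side of the split.
    for i in range(min_side, len(values) - min_side + 1):
        low, high = values[i - 1], values[i]
        if low < high:  # tied values cannot be separated by a threshold
            thresholds.append((low + high) / 2)
    return thresholds
```

Each fold in turn serves as test data while the other four are used for learning, and the "at most one, two or three splits" variants then amount to running the split search greedily for that many rounds over the thresholds returned here.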

  35. Assignment Continued • Note 3: at each leaf of the resulting model, there is a single Gaussian distribution that is used for all arrays that map to that leaf • Compute the log-likelihood of the test data for each model (hard-clustering, and each of the three regulation models) • Plot the avg. and std. test log-likelihood for each model • For the model with two splits on each cluster, use the Gaussian distribution at each array to sample a new expression dataset with exactly the same number of genes and number of arrays. For each original gene and array, you sample from the Gaussian distribution associated with that gene and that array • Learn a model with two splits for each cluster • Plot the number of regulation tree splits that are identical between the model that sampled the data and the new model that you learned
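The evaluation and resampling steps can be sketched as follows, assuming each learned model has been expanded into one Gaussian (mean, std) per gene and array. All names are hypothetical, and "identical split" is taken here to mean the same regulator at the same position of a module's tree, ignoring the threshold, which is one possible reading of the slide.

```python
import math
import random
import statistics
from typing import Dict, List, Tuple

Gaussian = Tuple[float, float]  # (mean, std)


def heldout_loglik(test_expr: List[List[float]], model: List[List[Gaussian]]) -> float:
    """Total Gaussian log-likelihood of test_expr[g][a] under model[g][a]."""
    total = 0.0
    for row, params in zip(test_expr, model):
        for x, (mu, sd) in zip(row, params):
            z = (x - mu) / sd
            total += -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)
    return total


def summarize(fold_scores: List[float]) -> Tuple[float, float]:
    """Average and sample standard deviation of per-fold test log-likelihoods."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


def sample_dataset(model: List[List[Gaussian]], seed: int = 0) -> List[List[float]]:
    """Draw one value per gene and array from that gene's Gaussian for that array."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sd) for (mu, sd) in row] for row in model]


def count_identical_splits(
    generating: Dict[int, List[str]],  # module id -> regulators in split order
    relearned: Dict[int, List[str]],
) -> int:
    """Count tree positions where both models split on the same regulator."""
    return sum(
        1
        for module, regulators in generating.items()
        for a, b in zip(regulators, relearned.get(module, []))
        if a == b
    )
```

The sampled dataset has the same number of genes and arrays as the model it was drawn from, so the two-split learner can be rerun on it unchanged and the recovered splits compared with `count_identical_splits`.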
