
Discovering Regulatory Networks from Gene Expression and Promoter Sequence



Presentation Transcript


  1. Discovering Regulatory Networks from Gene Expression and Promoter Sequence Eran Segal Stanford University

  2. Modules Interactions Activity From Parts to Systems Parts

  3. Gene 1 Gene 2 RNA Protein is a tightly regulated process DNA RNA Gene Regulation DNA

  4. Gene 1 Gene 2 Coding Coding Regulator Control Control Swi5 RNA ACGTGC Motif Swi5 Regulator (transcription factor) Gene Regulation DNA

  5. Gene 1 Gene 2 Coding Coding Control Control Genome-wide Available Data • DNA Sequence • Gene Expression • mRNA level of all genes • Measured in different conditions ……ACTAGCGGCTATAATGACTGGACCTACGTACCGATATAATGTCAGCTAGCA…… RNA DNA Microarray

  6. Gene Regulation • How are genes regulated? • Who regulates whom? • Under which conditions? • Which genes are co-regulated? Many diagnostic, prognostic and therapeutic implications [Diagram: Gene 1 and Gene 2, each with a coding and a control region; regulator Swi5; motif ACGTGC]

  7. Procedural • Apply a different method to each type of data • Use output of one method as input to the next Example: Finding Motifs • Cluster gene expression profiles • Search for motifs in control regions of clustered genes [Diagram: expression matrix (Genes I to VI by Experiments), clustering, control regions of the clustered genes, motif GACTGC] Control regions: AGCTAGCTGAGACTGCACAC TTCGGACTGCGCTATATAGA GACTGCAGCTAGTAGAGCTC CTAGAGCTCTATGACTGCCG ATTGCGGGGCGTCTGAGCTC TTTGCTCTTGACTGCCGCTT

  8. What is a model? probabilistic stochastic A description of the biological process that could have generated the observed data Our Approach: Model Based

  9. Our Approach: Model Based • Statistical modeling language for biological domains • Based on Bayesian networks • Classes of objects • Properties • Observed: gene sequence, experiment conditions • Hidden: gene module • Interactions • Expression level as a function of gene and experiment properties Gene Experiment Condition Tumor Module Expression STGFK ’01 (ISMB)

  10. Bayesian Network Condition1 Condition2 Tumor1 Tumor2 Module1 Level1,1 Level1,2 Module2 Level2,1 Level2,2 P(Level2,1 | Module2,Condition2,Tumor2) Probabilistic Model • Defines a joint distribution Exper. Condition Gene Module Tumor Level Expression STGFK ’01 (ISMB)

  11. Probabilistic Model • Defines a joint distribution • Learned automatically from data • Parameterization • Structure • Assignment to hidden variables Find model M that maximizes P(M | D) Learn parameterization and structure of distributions Learn network structure • Thousands of variables • Space of possible networks is super-exponential Probabilistic inference in the Bayesian network • Millions of hidden variables • Variables are highly dependent NP-Hard Problem-specific structure • Modularity in biological systems • Convex optimization • Graph theoretic algorithms • Dynamic programming • Heuristic search Exper. Condition Gene Module Tumor Level Expression STGFK ’01 (ISMB)

  12. Scheme Biological problem Data Model design • Classes of objects • Properties • Interactions Learn model • Automatically from data • Structure • Parameterization Analyze results • Visualization • Literature • Statistics Derive biological insights from model STGFK ’01 (ISMB)

  13. Reg. ACGTGC Outline • Who regulates whom and when? • Model • Learning algorithm • Evaluation • Wet lab experiments • How are genes regulated? • Regulation of multi-functional genes • Evolution of gene regulation

  14. Ongoing Biological Debate Can we discover actual regulators from gene expression data alone?

  15. Gene Regulation: Simple Example [Diagram: regulators (Activator, Repressor) and a regulated gene shown in State 1, State 2 and State 3; regulators and regulated gene measured by DNA microarray]

  16. Regulation Tree SSRPBKF ’03 (Nature Genetics) Genes in the same module share the same regulation program [Diagram: regulation program and module genes; regulation tree with tests Activator? (activator expression) and Repressor? (repressor expression), false/true branches, leading to State 1, State 2, State 3]
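
A regulation tree of this kind can be sketched as a small decision tree over regulator expression. The snippet below is only an illustration: the tree shape (test the activator first, then the repressor), the threshold of 0.0, and the names `Leaf`, `Split`, `TREE` and `context` are assumptions made here, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    state: str  # label of the regulatory context, e.g. "State 1"


@dataclass
class Split:
    regulator: str      # regulator whose expression is tested
    threshold: float    # hypothetical cut-off on expression
    if_false: "Node"    # branch taken when expression <= threshold
    if_true: "Node"     # branch taken when expression > threshold


Node = Union[Leaf, Split]


def context(tree: Node, regulator_expr: Dict[str, float]) -> str:
    """Walk the tree for one experiment and return the leaf label."""
    node = tree
    while isinstance(node, Split):
        up = regulator_expr[node.regulator] > node.threshold
        node = node.if_true if up else node.if_false
    return node.state


# Assumed shape: activator tested first, repressor tested only if it is up.
TREE = Split(
    "Activator", 0.0,
    Leaf("State 1"),
    Split("Repressor", 0.0, Leaf("State 2"), Leaf("State 3")),
)

print(context(TREE, {"Activator": 1.2, "Repressor": -0.4}))
```

All genes in a module would share one such tree, so each experiment maps to a single context for the whole module.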

  17. Module Networks SSRPBKF ’03 (Nature Genetics) Goal: Discover regulatory modules and their regulators • Module genes: set of genes that are similarly controlled • Regulation program: expression as function of regulators [Diagram: modules, each with a regulation tree (false/true branches) on regulators such as HAP4 and CMK1]

  18. Module Network Probabilistic Model SSRPBKF ’03 (Nature Genetics) P(Level | Module, Regulators) • Module: what module does gene “g” belong to? • Regulator1, Regulator2, Regulator3: expression level of each regulator in the experiment • Level: expression level in each module is a function of expression of regulators [Diagram: classes Experiment, Gene (Module) and Expression (Level); example regulation trees on HAP4, CMK1, BMH1, GIC2]

  19. Reg. ACGTGC Outline • Who regulates whom and when? • Model • Learning algorithm • Evaluation • Wet lab experiments • How are genes regulated? • Regulation of multi-functional genes • Evolution of gene regulation

  20. Learning Problem SSRPBKF ’03 (Nature Genetics) Goal: Find gene module assignments and tree structures that maximize P(M|D) Hard • Genes: 5000-10000 • Regulators: ~500 [Diagram: gene module assignments and tree structures (HAP4, CMK1) over Regulator1, Regulator2, Regulator3, Module, Level; classes Experiment, Gene, Expression]

  21. Learning Algorithm Overview SSRPBKF ’03 (Nature Genetics) Clustering → gene module assignment → learn regulation programs → relearn gene assignments to modules → regulatory modules [Diagram: example regulation tree on HAP4 and CMK1]
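
The alternation on this slide (learn programs, relearn assignments, repeat) can be sketched as a generic loop. This is a skeleton under stated assumptions, not the authors' algorithm: the function names are hypothetical, and the two stand-in steps below replace regulation trees with plain per-module mean profiles, which reduces the toy to a k-means-style procedure.

```python
from typing import Callable, Dict, List, Tuple

Expr = Dict[str, List[float]]   # gene -> expression across experiments
Assignment = Dict[str, int]     # gene -> module id


def learn_module_network(
    expr: Expr,
    initial: Assignment,
    learn_programs: Callable[[Expr, Assignment], dict],
    reassign: Callable[[Expr, dict], Assignment],
    max_iters: int = 20,
) -> Tuple[Assignment, int]:
    """Alternate program learning and gene reassignment until stable."""
    assignment = dict(initial)
    iteration = 0
    for iteration in range(1, max_iters + 1):
        programs = learn_programs(expr, assignment)
        new_assignment = reassign(expr, programs)
        changed = sum(new_assignment[g] != assignment[g] for g in expr)
        assignment = new_assignment
        if changed == 0:
            break
    return assignment, iteration


# Stand-in "program": mean expression profile of each module (assumption).
def mean_profiles(expr: Expr, assignment: Assignment) -> dict:
    rows: Dict[int, List[List[float]]] = {}
    for gene, module in assignment.items():
        rows.setdefault(module, []).append(expr[gene])
    return {m: [sum(c) / len(c) for c in zip(*r)] for m, r in rows.items()}


# Stand-in reassignment: move each gene to the closest mean profile.
def nearest_profile(expr: Expr, programs: dict) -> Assignment:
    def dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return {g: min(programs, key=lambda m: dist(v, programs[m]))
            for g, v in expr.items()}


toy_expr = {"a": [1.0, 1.0, 0.0], "b": [0.9, 1.1, 0.0],
            "c": [-1.0, -1.0, 0.0], "d": [-1.1, -0.9, 0.0]}
toy_initial = {"a": 0, "b": 0, "c": 1, "d": 0}  # gene "d" starts misplaced
final, n_iters = learn_module_network(
    toy_expr, toy_initial, mean_profiles, nearest_profile)
print(final, n_iters)
```

Passing the two steps in as functions keeps the loop itself independent of how a regulation program is represented.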

  22. Learning Regulation Programs [Figure: module genes by experiments, shown with experiments in original order and with experiments sorted by Hap4 expression; candidate regulators HAP4, CMK1, SIP4] Score of a regulation program: log P(M|D) ∝ log P(D | μ, σ) + log P(μ, σ) Split on HAP4: log P(M|D) ∝ log P(D_HAP4↓ | μ_HAP4↓, σ_HAP4↓) + log P(D_HAP4↑ | μ_HAP4↑, σ_HAP4↑) + log P(μ_HAP4↓, σ_HAP4↓, μ_HAP4↑, σ_HAP4↑) Split on SIP4: log P(M|D) ∝ log P(D_SIP4↓ | μ_SIP4↓, σ_SIP4↓) + log P(D_SIP4↑ | μ_SIP4↑, σ_SIP4↑) + log P(μ_SIP4↓, σ_SIP4↓, μ_SIP4↑, σ_SIP4↑) Split on HAP4, then on CMK1 within one HAP4 branch (D_HAP4 denotes the unsplit HAP4 branch): log P(M|D) ∝ log P(D_HAP4 | μ_HAP4, σ_HAP4) + log P(D_CMK1↓ | μ_CMK1↓, σ_CMK1↓) + log P(D_CMK1↑ | μ_CMK1↑, σ_CMK1↑) + …
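
The idea of scoring a split by how well each side of it explains the module's expression can be sketched as follows. This is a simplified stand-in, not the Bayesian score on the slide: it uses the maximum-likelihood Gaussian log-likelihood of each side and a fixed `penalty` in place of the prior term, and it summarizes the module by one value per experiment. All names and default values are assumptions.

```python
import math
from typing import Optional, Sequence, Tuple


def gaussian_loglik(values: Sequence[float], min_var: float = 1e-3) -> float:
    """Log-likelihood of values under a Gaussian fitted to them.

    The variance is floored at min_var so tiny groups do not blow up.
    """
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    var = max(ss / n, min_var)
    return -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)


def best_split(
    regulator: Sequence[float],
    module_expr: Sequence[float],
    penalty: float = 2.0,
) -> Tuple[Optional[float], float]:
    """Return (threshold, gain) of the best split on regulator expression.

    regulator[i] and module_expr[i] refer to the same experiment i.
    gain is the log-likelihood improvement over no split, minus penalty.
    """
    order = sorted(range(len(regulator)), key=lambda i: regulator[i])
    expr_sorted = [module_expr[i] for i in order]
    base = gaussian_loglik(expr_sorted)
    best: Tuple[Optional[float], float] = (None, float("-inf"))
    for cut in range(1, len(order)):
        lo, hi = regulator[order[cut - 1]], regulator[order[cut]]
        if lo == hi:
            continue  # cannot place a threshold between equal values
        gain = (gaussian_loglik(expr_sorted[:cut])
                + gaussian_loglik(expr_sorted[cut:])
                - base - penalty)
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best


# Toy data: the module is low when the regulator is low, high otherwise.
reg = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
mod = [-1.1, -0.9, -1.0, 1.0, 0.9, 1.1]
threshold, gain = best_split(reg, mod)
print(threshold, gain)
```

Running `best_split` once per candidate regulator and keeping the highest gain mirrors the comparison of HAP4 and SIP4 as alternative splits.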

  23. Learning Algorithm Performance SPRKF ’03 (UAI) • Significant improvements across learning iterations • Many genes (50%) change module assignment in learning [Plot: Bayesian score (avg. per gene), axis from -131 to -128, against algorithm iterations, 0 to 20] [Plot: gene module assignment changes (% from total), axis from 0 to 50, against algorithm iterations, 0 to 20]

  24. Reg. ACGTGC Outline • Who regulates whom and when? • Model • Learning algorithm • Evaluation • Wet lab experiments • How are genes regulated? • Regulation of multi-functional genes • Evolution of gene regulation

  25. Yeast Stress Data • Genes • Selected 2355 that showed activity • Experiments (173) • Diverse environmental stress conditions: heat shock, nitrogen depletion,…

  26. Comparison to Bayesian Networks Bayesian Network (Friedman et al. ’00; Hartemink et al. ’01) • Expression level of each gene is a function of expression of regulators • 2355 variables (genes) • 173 instances (experiments) [Figure: fragment of learned Bayesian network over Hap4, Mig1, Yap1, Cmk1, Ste12, Gic1] Problems • Robustness • Interpretability

  27. Comparison to Bayesian Networks Bayesian Network (Friedman et al. ’00; Hartemink et al. ’01) [Diagram: Hap4, Mig1, Yap1, Cmk1, Ste12, Gic1] Problems • Robustness • Interpretability Module Network (SPRKF ’03, UAI) [Diagram: Regulator1, Regulator2, Regulator3, Module, Level] Solutions • Robustness → sharing parameters • Interpretability → module-level model

  28. Comparison to Bayesian Networks SPRKF ’03 (UAI) Learn which parameters are shared (by learning which genes are in the same module) [Plot: test data log-likelihood (gain per instance), axis from -150 to 150, against number of modules, 0 to 500; reference label: Bayesian Network performance] Problems • Robustness • Interpretability Solutions • Robustness → sharing parameters • Interpretability → module-level model

  29. From Model to Regulatory Modules SSRPBKF ’03 (Nature Genetics) Biologically relevant? [Diagram: learned model (Regulator1, Regulator2, Regulator3, Module, Level) and the resulting regulation trees on HAP4 and CMK1]

  30. Respiration Module SSRPBKF ’03 (Nature Genetics) • Module genes functionally coherent? Energy production (oxid. phos. 26/55, P < 10^-30) • Module genes known targets of predicted regulators? Hap4+Msn4 known to regulate module genes [Figure: regulation program, module genes, predicted regulator]

  31. Energy, Osmolarity, & cAMP Signaling • Regulation by non-TFs (Tpk1 – cAMP-dependent protein kinase) • Module genes known targets of predicted regulators? [Figure: regulation program, module genes]

  32. Are the module genes functionally coherent? Are some module genes known targets of the predicted regulators? Biological Evaluation Summary SSRPBKF ’03 (Nature Genetics) 46/50 Functionally coherent = module genes enriched for GO annotations with hypergeometric p-value < 0.01 (corrected for multiple hypotheses) 30/50 Known targets = direct biological experiments reported in the literature
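
The hypergeometric enrichment test named on this slide can be sketched with the standard library alone. The function below computes the upper tail probability exactly; the Bonferroni helper is only one possible multiple-hypothesis correction and is an assumption here, since the slide does not say which correction was used. The example numbers are invented for illustration.

```python
from math import comb


def hypergeom_pvalue(population: int, annotated: int,
                     module_size: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric.

    population  : total number of genes considered
    annotated   : genes in the population carrying the annotation
    module_size : number of genes in the module
    overlap     : annotated genes observed in the module
    """
    total = comb(population, module_size)
    upper = min(annotated, module_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value (assumed correction, capped at 1)."""
    return min(1.0, p_value * n_tests)


# Invented example: 2000 genes, 60 annotated, module of 50 with 12 annotated.
p = hypergeom_pvalue(2000, 60, 50, 12)
print(p, bonferroni(p, n_tests=500))
```

Exact integer arithmetic via `math.comb` avoids underflow for very small tail probabilities until the final division.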

  33. Reg. ACGTGC Outline • Who regulates whom and when? • Model • Learning algorithm • Evaluation • Wet lab experiments • How are genes regulated? • Regulation of multi-functional genes • Evolution of gene regulation

  34. From Model to Detailed Predictions SSRPBKF ’03 (Nature Genetics) • Prediction: Regulator ‘X’ regulates process ‘Y’ • Experiment: Knock out ‘X’ and repeat experiment [Diagram: regulation tree with HAP4 and Ypl230w (?)]

  35. Does ‘X’ Regulate Predicted Genes? SSRPBKF ’03 (Nature Genetics) Experiment: knock out Ypl230w (stationary phase) wild-type mutant 1334 regulated genes (312 expected by chance) Rank modules by regulated genes Modules predicted to be regulated by Ypl230w >4x Regulated genes Predicted modules Ypl230w regulates computationally predicted genes

  36. Does ‘X’ Regulate Predicted Genes? SSRPBKF ’03 (Nature Genetics) wild-type mutant wild-type mutant Ppt1 knockout (hypo-osmotic stress) Kin82 knockout (heat shock) Regulated genes (1014) Regulated genes (1034)

  37. New yeast biology suggested • Ypl230w activates protein-folding, cell wall and ATP-binding genes • Ppt1 represses phosphate metabolism and rRNA processing • Kin82 activates energy and osmotic stress genes Wet Lab Experiments Summary SSRPBKF ’03 (Nature Genetics) 3/3 regulators regulate computationally predicted genes

  38. Many regulatory relationships can be induced from gene expression data Ongoing Biological Debate SSRPBKF ’03 (Nature Genetics) Can we discover actual regulators from gene expression data alone?

  39. Feedforward, auto-regulatory “motifs” (Shen-Orr et al. 2002) TFs and SMs have detectable expression signature Sip2 (SM) Phd1 (TF) Msn4 (TF) Yap6 (TF) Hap4 (TF) Statistical methods can infer their regulatory relationships from gene expression data Vid24 Tor1 Gut2 Vid24 Tor1 Gut2 Cox4 Cox6 Atp17 Positive signaling loop Auto regulation Regulator chain (Sporulation & cAMP) (Respiration) (Snf kinase regulated processes) Undetected regulators Detected regulators Detected target Why Does it Work? SSRPBKF ’03 (Nature Genetics) Assumption: Regulators are transcriptionally regulated

  40. Reg. ACGTGC Reg. ACGTGC Motif Outline • Who regulates whom and when? • How are genes regulated? • Model • Evaluation • Regulation of multi-functional genes • Evolution of gene regulation

  41. DNA control sequence GATAG GATAG ACGTGC Motif ACGTGC + No motifs GATAG GATAG From Sequence to Expression DNA Microarray Repressor Activator ? ? ? Activator Activator Repressor Gene 1 Gene 2 Gene 3

  42. Sequence Expression ACGTGC + No motifs GATAG GATAG From Sequence to Expression Goal: Explain how expression arises from sequence • Construct mechanistic model of gene regulation • Learn the model from sequence and expression data

  43. Procedural • Apply a different method to each type of data • Use output of one method as input to the next Two Phase Approach (I) • Cluster gene expression profiles • Search for motifs in control regions of clustered genes [Diagram: expression matrix (Genes I to VI by Experiments), clustering, control regions of the clustered genes, motif GACTGC] Control regions: AGCTAGCTGAGACTGCACAC TTCGGACTGCGCTATATAGA GACTGCAGCTAGTAGAGCTC CTAGAGCTCTATGACTGCCG ATTGCGGGGCGTCTGAGCTC TTTGCTCTTGACTGCCGCTT

  44. Two Phase Approach: Problems • Expression clustering is not perfect [Diagram: Clustering A and Clustering B, each dividing the genes into Cluster I and Cluster II, with a shared motif marked in each]

  45. Two Phase Approach (II) • Iterate over all sequences of length k • Find all genes that have each k-mer in their promoter • Keep k-mers whose genes are coherent in expression [Example k-mers: TCGACT CGATGG AAATTA ACGAGA GATACC TTCGCA ACGACT AAATGC CGCTGA]
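
The three steps of this k-mer approach can be sketched as below. Several details are assumptions made for illustration: coherence is measured as the mean pairwise Pearson correlation of the genes' expression profiles, the cut-offs `min_genes` and `min_corr` are arbitrary, and only k-mers that actually occur in some promoter are enumerated (k-mers that occur nowhere have no genes to test). The toy promoters and expression values are invented.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Set


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def kmer_index(promoters: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the set of genes whose promoter contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for gene, seq in promoters.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(gene)
    return index


def coherent_kmers(promoters: Dict[str, str],
                   expression: Dict[str, List[float]],
                   k: int, min_genes: int = 3,
                   min_corr: float = 0.8) -> Dict[str, float]:
    """Keep k-mers whose genes have high mean pairwise correlation."""
    kept: Dict[str, float] = {}
    for kmer, genes in kmer_index(promoters, k).items():
        if len(genes) < max(min_genes, 2):  # need at least one gene pair
            continue
        pairs = list(combinations(sorted(genes), 2))
        score = sum(pearson(expression[a], expression[b])
                    for a, b in pairs) / len(pairs)
        if score >= min_corr:
            kept[kmer] = score
    return kept


toy_promoters = {"g1": "AAGATACCTT", "g2": "CGGATACCGA",
                 "g3": "TTTGATACCA", "g4": "ACGTACGTAC"}
toy_expression = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [2.0, 4.0, 6.0, 8.5],
                  "g3": [0.0, 1.0, 2.0, 3.2], "g4": [4.0, 1.0, 3.0, 0.0]}
hits = coherent_kmers(toy_promoters, toy_expression, k=6, min_corr=0.9)
print(hits)
```

A k-mer survives here only if the genes carrying it are coherent on their own, without reference to any other motif.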

  46. TCGACTGC TCGACTGC TCGACTGC GATAC TCGACTGC + + + GATAC GATAC GATAC Two Phase Approach: Problems • Single motifs may not have coherent expression • Activator: • Repressor: TCGACTGC GATAC

  47. OR TCGACTGC ? + CCAAT Two Phase Approach: Problems • Are we missing motifs? TCGACTGC

  48. Unified Model of Gene Regulation SYK ’03 (ISMB) [Diagram: promoter sequence of each gene → motifs (TCGACTGC, GATAC, CCAAT, GCAGTT) → motif profiles (combinations of motifs) → expression profiles, shown across genes]

  49. Unified Model of Gene Regulation Sequence cis-regulatory modules Motifs TCGACTGC GCAGTT Motif Profiles CCAAT + + GATAC CCAAT Expression Profiles

  50. Regulatory Module DNA control sequences of module genes Expression of module genes TCGACTGC GATAC Motif Profile: + Unified Model of Gene Regulation Experiments Modules SYK ’03 (ISMB)
