
Learning Regulatory Networks that Represent Regulator States and Roles

Keith Noto (noto@cs.wisc.edu) and Mark Craven. K. Noto and M. Craven, Learning Regulatory Network Models that Represent Regulator States and Roles. To appear in Lecture Notes in Bioinformatics.


Presentation Transcript


  1. Learning Regulatory Networks that Represent Regulator States and Roles
     Keith Noto (noto@cs.wisc.edu) and Mark Craven
     K. Noto and M. Craven, Learning Regulatory Network Models that Represent Regulator States and Roles. To appear in Lecture Notes in Bioinformatics.

  2. Task
     • Given:
       • Gene expression data
       • Other sources of data, e.g. sequence data, transcription factor binding sites, transcription unit predictions
     • Do:
       • Construct a model that captures regulatory interactions in a cell

  3. Key Ideas: States and Roles
     • Regulator states
       • Cannot be observed
       • Depend on more than regulator expression
       • We use cellular conditions as surrogates/predictors of regulation effectors
     • Regulator roles
       • Is a regulator an activator or a repressor?
       • We use sequence analysis to predict these roles
     [Figure: diagram with nodes Regulator Expression, Effector, Cellular Condition, Regulator State, and Regulatee Expression]

  4. Network Variables and Structure
     • Regulators: expression states represented as a mixture of Gaussians
     • Cellular Conditions: “stationary growth phase”, “heat shock”, ...
     • Edges into hidden regulator states: select relevant parents
     • Hidden Regulator States: “activated” or “inactivated”
     • Edges from hidden regulator states to regulatees: connect where we have evidence of regulation
     • Regulatees: expression states represented as a mixture of Gaussians
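To make "expression states represented as a mixture of Gaussians" concrete, here is a minimal sketch that turns one expression measurement into a posterior over a "low" and a "high" state. The two-component setup, the function names, and the parameter values are assumptions chosen for illustration; the slides do not give the number of components or their parameters.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def state_posterior(x, components):
    """Posterior over expression states for one measurement x.

    components maps a state name to (mixture weight, mean, std).
    """
    joint = {
        state: weight * gaussian_pdf(x, mean, std)
        for state, (weight, mean, std) in components.items()
    }
    total = sum(joint.values())
    return {state: value / total for state, value in joint.items()}


# Hypothetical mixture parameters, not taken from the presentation.
EXAMPLE_COMPONENTS = {
    "low": (0.5, -1.0, 1.0),
    "high": (0.5, 1.0, 1.0),
}
```

Calling `state_posterior(2.0, EXAMPLE_COMPONENTS)` would assign most of the probability to "high", while a measurement halfway between the two means is split evenly.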

  5. Network Parameters: Hidden Nodes use CPD-Trees
     • Parents selected from regulator expression, cellular conditions
     • May contain context-sensitive independence
     [Figure: nodes metJ, Growth Phase, Growth Medium, Heat Shock, and metJ state]
     CPD-tree for metJ state:
       metJ = Low expression: P(metJ state = activated): 0.001
       metJ ≠ Low expression:
         Growth Phase ≠ Log Phase: P(metJ state = activated): 0.004
         Growth Phase = Log Phase: P(metJ state = activated): 0.994
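The CPD-tree on this slide can be read as a small decision tree whose leaves hold P(metJ state = activated). The sketch below encodes one plausible reading of the flattened figure: a split on metJ expression first, then on growth phase only when metJ expression is not low. The tuple encoding and all names are assumptions for illustration, not the authors' data structure.

```python
# Internal node: (variable, test_value, subtree_if_equal, subtree_if_not_equal)
# Leaf: P(metJ state = activated)
# Assumed reading of the slide's tree; the figure itself was lost in extraction.
METJ_TREE = (
    "metJ_expression", "low",
    0.001,
    ("growth_phase", "log", 0.994, 0.004),
)


def p_activated(tree, assignment):
    """Walk the tree with the given parent values until a leaf is reached."""
    node = tree
    while isinstance(node, tuple):
        variable, value, if_equal, if_not_equal = node
        node = if_equal if assignment[variable] == value else if_not_equal
    return node
```

The context-sensitive independence shows up directly: when `metJ_expression` is `"low"`, the lookup never consults `growth_phase`, so that key can even be absent from `assignment`.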

  6. Initializing Roles
     [Figure: metA transcription unit, with labels Transcription Start Site*, -35, DNA, Binding sites, Upstream, Downstream]
     • metR binds upstream; considered an activator
     • metJ binds downstream; considered a repressor
     CPT for regulatee metA:
       metR state     metJ state     P(Low)   P(High)
       activated      activated      0.6      0.4
       activated      inactivated    0.2      0.8
       inactivated    activated      0.9      0.1
       inactivated    inactivated    0.5      0.5
     *Predicted transcription start sites from Bockhorst et al., ISMB '03
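The role heuristic on this slide (upstream binders start as activators, downstream binders as repressors) can be sketched as a rule that seeds a regulatee's CPT before training. Everything numeric here is an assumption: the reference position of -35, the binding positions used in the usage note, and the `step` by which each activated regulator shifts P(High). The slide's own CPT values were not produced by this rule.

```python
from itertools import product


def initial_role(site_position, reference=-35):
    """Guess a role from a binding position relative to the transcription start.

    Positions further upstream (more negative) than the reference are treated
    as activator sites. The reference value is an illustrative assumption.
    """
    return "activator" if site_position < reference else "repressor"


def initial_cpt(roles, step=0.2):
    """Build a starting CPT over regulator-state combinations.

    roles maps regulator name -> "activator" or "repressor". Each activated
    activator raises P(high) by step and each activated repressor lowers it.
    Keys are state tuples ordered by sorted regulator name.
    """
    names = sorted(roles)
    cpt = {}
    for states in product(("activated", "inactivated"), repeat=len(names)):
        p_high = 0.5
        for name, state in zip(names, states):
            if state == "activated":
                p_high += step if roles[name] == "activator" else -step
        p_high = min(max(p_high, 0.05), 0.95)  # keep away from 0 and 1
        cpt[states] = {"low": 1.0 - p_high, "high": p_high}
    return cpt
```

For example, with hypothetical sites `roles = {"metR": initial_role(-60), "metJ": initial_role(-10)}`, the entry for metJ activated and metR inactivated starts with P(high) below 0.5, which EM is then free to revise.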

  7. Training the Model
     • Initialize the parameters
       • Activators tend to bind more upstream than repressors
     • Use an EM algorithm to set parameters
       • E-Step: Determine expected states of regulators
       • M-Step: Update CPDs
       • Repeat until convergence
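The E-step/M-step loop above can be illustrated on a deliberately reduced model: one hidden binary regulator state with several regulatees whose expression has been discretized to low (0) or high (1). This is a sketch under those simplifying assumptions, with add-one smoothing in the M-step; the presented model uses Gaussian mixtures and CPD-trees, which this code does not implement.

```python
import math


def em_hidden_state(data, p_high, prior=0.5, max_iter=100, tol=1e-6):
    """EM for one hidden regulator state with binary regulatee observations.

    data: list of experiments, each a list of 0/1 values (one per regulatee).
    p_high: dict with keys "activated" and "inactivated", each a list of
        initial P(regulatee = high | state) values.
    Returns (prior, p_high, responsibilities), where responsibilities[i] is
    P(state = activated | experiment i) from the last E-step.
    """
    n_genes = len(data[0])
    prev_ll = -math.inf
    resp = []
    for _ in range(max_iter):
        # E-step: expected state of the regulator in every experiment.
        resp = []
        ll = 0.0
        for row in data:
            lik_on, lik_off = prior, 1.0 - prior
            for x, p_on, p_off in zip(row, p_high["activated"], p_high["inactivated"]):
                lik_on *= p_on if x else 1.0 - p_on
                lik_off *= p_off if x else 1.0 - p_off
            total = lik_on + lik_off
            resp.append(lik_on / total)
            ll += math.log(total)
        # M-step: re-estimate the prior and each conditional distribution
        # from expected counts (add-one smoothing keeps values off 0 and 1).
        w_on = sum(resp)
        w_off = len(data) - w_on
        prior = w_on / len(data)
        p_high = {
            "activated": [
                (sum(r * row[j] for r, row in zip(resp, data)) + 1.0) / (w_on + 2.0)
                for j in range(n_genes)
            ],
            "inactivated": [
                (sum((1.0 - r) * row[j] for r, row in zip(resp, data)) + 1.0) / (w_off + 2.0)
                for j in range(n_genes)
            ],
        }
        # Repeat until the log likelihood stops changing.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return prior, p_high, resp
```

The initial `p_high` plays the part of the role-based initialization: starting the "activated" state with a higher P(high) for regulatees of a presumed activator breaks the symmetry between the two hidden states.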

  8. Experimental Data and Procedure
     • Expression measurements from Affymetrix microarrays (Fred Blattner’s lab, University of Wisconsin-Madison)
     • Regulator binding site predictions from TRANSFAC, EcoCyc, cross-species comparison (McCue et al., Genome Research 12, 2002)
     • Experimental data consists of:
       • 90 experiments
       • 6 cellular condition variables (between two and seven values each)
       • 296 regulatees
       • 64 regulators
     • Cross-fold validation
       • Microarrays held aside for testing
       • Conditions from test microarrays do not appear in training set
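The requirement that conditions from test microarrays never appear in the training set amounts to splitting by condition rather than by individual experiment. The sketch below shows one way to build such folds; the greedy balancing strategy and the function name are assumptions, since the slide does not say how folds were formed. A condition label here would be the full combination of an experiment's cellular condition values.

```python
from collections import defaultdict


def condition_folds(experiment_conditions, n_folds):
    """Yield (train_indices, test_indices) with no condition shared across them.

    experiment_conditions[i] is a hashable label describing the cellular
    conditions of experiment i. All experiments with the same label are kept
    in the same fold.
    """
    groups = defaultdict(list)
    for index, condition in enumerate(experiment_conditions):
        groups[condition].append(index)

    # Assign whole condition groups, largest first, to the smallest fold so far.
    folds = [[] for _ in range(n_folds)]
    for condition in sorted(groups, key=lambda c: (-len(groups[c]), str(c))):
        min(folds, key=len).extend(groups[condition])

    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(experiment_conditions)) if i not in held_out]
        yield train, sorted(test)
```

Because whole groups move together, fold sizes can be uneven when a few conditions account for many experiments.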

  9. Results
     Model                                                         Classification Error   Average Squared Error   Log Likelihood
     Baseline #1 (No hidden nodes, no cellular conditions)         22.16%                 0.75                    -13,363
     Baseline #2 (No hidden nodes, using cellular conditions)      12.42%                 0.51                    -12,193
     Our Model (3 iterations of adding missing TFs)                13.34%                 0.51                    -12,004
     Random Initialization (3 iterations of adding missing TFs)    14.19%                 0.54                    -11,893
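The slide reports three scores without defining them. The sketch below gives common definitions that fit the column names, stated here as assumptions: classification error as the fraction of held-out expression states predicted incorrectly, average squared error as the mean squared difference between predicted and observed expression values, and log likelihood as the summed log probability assigned to the observed states. The function names are illustrative.

```python
import math


def classification_error(p_high, truth):
    """Fraction of cases where the more probable state differs from the truth.

    p_high[i] is the predicted P(state = "high"); truth[i] is "high" or "low".
    """
    wrong = sum(
        1 for p, t in zip(p_high, truth) if ("high" if p >= 0.5 else "low") != t
    )
    return wrong / len(truth)


def average_squared_error(predicted, observed):
    """Mean squared difference between predicted and observed values."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def log_likelihood(p_high, truth):
    """Sum of log probabilities the model assigns to the observed states."""
    return sum(math.log(p if t == "high" else 1.0 - p) for p, t in zip(p_high, truth))
```

Under these definitions, lower is better for the first two scores and higher (closer to zero) is better for log likelihood.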
