
Inferring gene regulatory networks from multiple microarray datasets (Wang 2006)

Inferring gene regulatory networks from multiple microarray datasets (Wang 2006). Tiffany Ko ELE571 Spring 2009. Outline. Introduction Gene Regulatory Networks DNA Microarrays Objectives Methods Approach: SVD GNR Algorithm Confidence Evaluation Results Simulated Data



Presentation Transcript


  1. Inferring gene regulatory networks from multiple microarray datasets (Wang 2006) Tiffany Ko ELE571 Spring 2009

  2. Outline • Introduction • Gene Regulatory Networks • DNA Microarrays • Objectives • Methods • Approach: SVD • GNR Algorithm • Confidence Evaluation • Results • Simulated Data • Experimental Data • Discussion • Limitations • Conclusions

  3. Intro

  4. Gene Regulatory Networks http://upload.wikimedia.org/wikipedia/commons/0/07/Gene.png http://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Gene2-plain.svg/708px-Gene2-plain.svg.png

  5. Gene Regulatory Networks http://upload.wikimedia.org/wikipedia/commons/thumb/d/df/Gene_Regulatory_Network_2.jpg/800px-Gene_Regulatory_Network_2.jpg

  6. Gene Regulatory Networks http://www.pnas.org/content/104/31/12890/F2.large.jpg

  7. DNA Microarrays • Y-direction: genes • X-direction: data points • M x N matrix S • M genes, N experiments • Expression (color magnitude) is representative of the number of probes which have bound to the complementary DNA templates present. • High number of genes, low number of samples/data points.

  8. Objectives • Purpose • Construct a novel method of gene network reconstruction (GNR) which is able to process multiple microarray datasets from different experiments to infer the most consistent gene network (GN) while taking into consideration the sparsity of connections. • Motivation • Multiple datasets: addresses data scarcity and the “dimensionality problem” • Improve inferred gene network reliability • Derive gene networks with higher, biologically plausible sparsity

  9. Methods

  10. Approach • Express Gene Networks (GN) as differential equations. • Derive a solution for a single time-course dataset using singular value decomposition (SVD). • Find the most consistent network structure with respect to all datasets. • Optimal solution has minimal connections (edges).

  11. Approach • Express Gene Networks (GN) as differential equations. • Gene regulation dynamics are typically nonlinear; however, linear equations capture the main features of the network.
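
The equation image from this slide did not survive in the transcript. As a sketch only, a linear model of this kind is commonly written as below; the symbols x, b, n and m are assumed notation, chosen to agree with the connectivity matrix J and the derivative matrix X’ that appear on later slides.

```latex
% Assumed notation: x(t) = expression levels of n genes, J = n x n connectivity matrix,
% b(t) = external stimuli or bias. J_{ij} is the net influence of gene j on gene i.
\frac{dx(t)}{dt} = J\,x(t) + b(t), \qquad x(t) \in \mathbb{R}^{n},\quad J \in \mathbb{R}^{n \times n}

% Stacking m time points column-wise gives the matrix form used for one dataset:
X' = J\,X + B, \qquad X,\, X',\, B \in \mathbb{R}^{n \times m}
```

With far fewer time points than genes (m much smaller than n), this system is under-determined, which is why a single dataset admits a whole family of solutions for J.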

  12. Approach 2. Derive a solution for a single time-course dataset using singular value decomposition (SVD).

  13. Approach • Derive a solution for a single time-course dataset using singular value decomposition (SVD). SVD: • Nonzero elements of e_k are listed last, s.t. e_1 = … = e_l = 0 and e_{l+1}, …, e_n ≠ 0. • Allows for a particular solution with the smallest L2 norm for the connectivity matrix, Ĵ.
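
A minimal sketch of the SVD step, assuming the linear model dX = J X for one dataset (no external input term), where X holds expression values (genes × time points) and dX the estimated time derivatives. The function name `particular_solution`, the tolerance, and the random test data are hypothetical and not taken from the slides.

```python
import numpy as np

def particular_solution(X, dX, tol=1e-10):
    """Minimum-norm J_hat satisfying J_hat @ X = dX (assumed model dX = J X)."""
    # Thin SVD of X^T: X^T = U diag(e) V^T.
    # NumPy sorts singular values in descending order, so any zeros come last
    # here rather than first as on the slide; the ordering does not affect J_hat.
    U, e, Vt = np.linalg.svd(X.T, full_matrices=False)
    # Invert only the nonzero singular values; 1/e_k is set to 0 where e_k = 0.
    inv_e = np.zeros_like(e)
    keep = e > tol * e.max()
    inv_e[keep] = 1.0 / e[keep]
    # J_hat = dX U E^{-1} V^T
    return ((dX @ U) * inv_e) @ Vt

# Hypothetical under-determined example: 5 genes, only 3 time points.
rng = np.random.default_rng(0)
n_genes, n_times = 5, 3
J_true = rng.normal(size=(n_genes, n_genes))
X = rng.normal(size=(n_genes, n_times))
dX = J_true @ X

J_hat = particular_solution(X, dX)
```

`J_hat` reproduces the data exactly but generally differs from `J_true`: it is only the smallest-norm member of the solution family, which is why the later slides combine several datasets.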

  14. Approach • Derive a solution for a single time-course dataset using singular value decomposition (SVD).

  15. Approach • Find the most consistent network structure with respect to all datasets. • Multiple (N) microarray datasets exist for one organism; each corresponds to its own general solution, Jk. • Jk is already normalized in time due to the definition of X’. • LP problem posed:

  16. Approach • Find the most consistent network structure with respect to all datasets. • Matching Term • Match the most consistent solution with dataset k’s solution • Weighted by reliability • Sparsity Term • Forces sparsity by minimizing the L1 norm • Relative importance balanced by λ
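
The objective itself was an image and is missing from the transcript. The sketch below evaluates one plausible form of it, under two stated assumptions: each dataset's general solution is written as Ĵk + Yk Vkᵀ (particular solution plus a free null-space part), and both terms use the L1 norm, as an LP formulation suggests. The function and argument names are hypothetical.

```python
import numpy as np

def gnr_objective(J, J_hats, Ys, Vts, weights, lam):
    """Assumed objective: weighted L1 matching term plus lam times L1 sparsity term."""
    matching = 0.0
    for J_hat, Y, Vt, w in zip(J_hats, Ys, Vts, weights):
        J_k = J_hat + Y @ Vt                 # assumed general solution for dataset k
        matching += w * np.abs(J - J_k).sum()
    sparsity = np.abs(J).sum()               # L1 norm of the consensus network
    return matching + lam * sparsity

# Tiny hypothetical example with two datasets and a 2-gene network.
I2 = np.eye(2)
Z2 = np.zeros((2, 2))
value = gnr_objective(
    J=I2,
    J_hats=[I2, Z2],     # dataset 1 agrees with J, dataset 2 does not
    Ys=[Z2, Z2],
    Vts=[I2, I2],
    weights=[0.5, 0.5],
    lam=0.1,
)
```

Raising `lam` makes every nonzero entry of J more expensive, which pushes the minimizer toward sparser networks at the cost of a looser match to the individual datasets.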

  17. GNR Algorithm • When J is fixed, the problem can be divided into N independent subproblems, one per dataset. • Through iteration, J is then updated based on the results for Y. • STEP 0: Initialize; set iteration index q = 1. • STEP 1: Fix J(q-1) and solve for Y(q). • STEP 2: Fix Y(q) and solve for J(q). • STEP 3: Check for convergence; else return to STEP 1.

  18. Algorithm: Step 0 • Initialize: • Using SVD, solve for the particular solution Ĵ of each dataset. • Set initial values. • Ensure given parameters are positive. • Set the iteration index q = 1.

  19. Algorithm: Step 1 • Update J: • At iteration q, with J(q-1) fixed, solve the LP for Y:

  20. Algorithm: Step 2 & 3 • STEP 2: Having solved for Y(q), fix all of Y(q) and solve for J(q). • STEP 3: Check for convergence. • Has the convergence criterion been met? • Yes → Terminate computation. • No → Return to STEP 1.
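
A sketch of the J update in STEP 2, assuming the L1 objective form (weighted L1 matching term plus λ times the L1 norm of J). Under that assumption the problem separates per matrix element, and each element is a weighted median of the per-dataset values together with an extra value 0 carrying weight λ. This is a swapped-in closed form for illustration, not the LP solver used in the paper; `weighted_median`, `update_J` and the numbers are hypothetical.

```python
import numpy as np

def weighted_median(values, weights):
    """Return a minimizer of sum_i weights[i] * |x - values[i]|."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    # First sorted position where the cumulative weight reaches half the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return v[idx]

def update_J(targets, weights, lam):
    """targets[k] is dataset k's current solution (Y fixed); returns the new J."""
    N, n, m = targets.shape
    # The sparsity term lam * |J_ij| acts like one more "dataset" that votes for 0.
    values = np.concatenate([targets, np.zeros((1, n, m))], axis=0)
    w = np.append(np.asarray(weights, dtype=float), lam)
    J = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            J[i, j] = weighted_median(values[:, i, j], w)
    return J

# Three hypothetical per-dataset solutions for a 2-gene network.
targets = np.array([
    [[1.0, 0.0], [0.2, -2.0]],
    [[1.2, 0.1], [0.0, -2.0]],
    [[0.9, 0.0], [-0.1, -1.8]],
])
J_new = update_J(targets, weights=[1 / 3, 1 / 3, 1 / 3], lam=0.3)
```

Entries that the datasets agree on are kept, while small, inconsistent entries are pulled to exactly zero, which is the sparsifying behavior the slides attribute to the L1 term.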

  21. GNR Algorithm Overview

  22. Confidence Evaluation • Given the optimal solution J, we can compute for each element Jij: • Variance • Deviation • Overall average deviation
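
The formulas on this slide are missing from the transcript. The sketch below shows one way such quantities could be computed, assuming the variance of an element is the reliability-weighted squared spread of the per-dataset solutions around the consensus J, the deviation is its square root, and the overall figure is the mean deviation over all elements. These definitions and the name `confidence` are assumptions, not the slide's exact formulas.

```python
import numpy as np

def confidence(J, J_ks, weights):
    """Per-element variance and deviation of dataset solutions around J (assumed definitions)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize reliabilities
    diffs = np.asarray(J_ks) - J                      # shape (N, n, n)
    variance = np.einsum("k,kij->ij", w, diffs ** 2)  # weighted mean of squared differences
    deviation = np.sqrt(variance)
    return variance, deviation, deviation.mean()

# Hypothetical example: two datasets that disagree on a single edge.
J = np.array([[1.0, 0.0], [0.0, 0.0]])
J_ks = [np.array([[1.0, 0.0], [0.0, 0.0]]),
        np.array([[3.0, 0.0], [0.0, 0.0]])]
variance, deviation, avg_dev = confidence(J, J_ks, weights=[0.5, 0.5])
```

A small deviation for an element means the datasets agree on that edge; a large one flags an edge whose inferred strength depends heavily on which dataset is used.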

  23. Results (figure: true network alongside networks inferred from 1, 2, and 3 datasets)

  24. Simulated Data • Constructed a small simulated network with five genes and a time-dependent noise term. • Randomly chose 3 initial starting conditions. • Produced 3 datasets with 4, 4, and 3 time points, respectively.
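
A sketch of how such simulated datasets could be generated, assuming a linear model with additive Gaussian noise and simple Euler integration. The slide's actual five-gene network and noise function are not in the transcript, so `J_true`, the step size `dt`, the noise level `sigma`, and the function name are all hypothetical.

```python
import numpy as np

def simulate(J, x0, n_points, dt=0.1, sigma=0.0, rng=None):
    """Euler-integrate dx/dt = J x + noise and return an (n_genes, n_points) array."""
    if rng is None:
        rng = np.random.default_rng()
    X = np.empty((J.shape[0], n_points))
    X[:, 0] = x0
    for t in range(1, n_points):
        noise = sigma * rng.standard_normal(J.shape[0])
        X[:, t] = X[:, t - 1] + dt * (J @ X[:, t - 1] + noise)
    return X

# Hypothetical sparse 5-gene network (not the one used on the slide).
J_true = np.array([
    [-1.0,  0.0,  0.0,  0.0,  0.5],
    [ 0.8, -1.0,  0.0,  0.0,  0.0],
    [ 0.0,  0.6, -1.0,  0.0,  0.0],
    [ 0.0,  0.0, -0.7, -1.0,  0.0],
    [ 0.0,  0.0,  0.0,  0.9, -1.0],
])

rng = np.random.default_rng(0)
# Three datasets with 4, 4, and 3 time points, each from a random initial condition.
datasets = [
    simulate(J_true, rng.uniform(0.0, 1.0, size=5), n_points, sigma=0.1, rng=rng)
    for n_points in (4, 4, 3)
]
```

With 5 genes and at most 4 time points per dataset, each dataset alone is under-determined, which mirrors the setting the slides use to test the method.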

  25. Simulated Data • Assessed network recovery ability (Yeung 2002 criterion) • Assessed accuracy of GNR

  26. Simulated Data • No sparsity or noise factor • Variant: # of datasets • Figure panels: True Network; 1 dataset; 2 datasets; 3 datasets (sparsity and noise parameters both 0 in every panel)

  27. Simulated Data • Gaussian noise distribution • Variant: # of datasets, λ • Figure panels: True Network; 1 dataset, λ = 0; 2 datasets, λ = 0; 3 datasets, λ = 0; 3 datasets, λ = 0.3

  28. Simulated Data • Adding datasets improves the accuracy of network reconstruction. • GNR must balance between topology reconstruction accuracy and interaction strength accuracy. • λ controls the trade-off between E0 and E1 (or E2). • Adding datasets improves the confidence of network reconstruction.

  29. Simulated Data • GNR is able to accurately infer the GN solution to a highly under-determined problem given datasets with few time points and differing initial conditions. • Network topology may still be correctly inferred in the presence of high noise by including a sparsity constraint at the expense of interaction strength accuracy. • Larger simulated network structures were tested with similarly effective results.

  30. Experimental Data • Heat-Shock Response Data for Yeast • 10 transcription factors • 4 microarray datasets (Stanford Microarray Database) → 7, 5, 5, 4 time points • Correctly inferred 4 edges with documented, known regulation, and 1 edge with documented potential regulation.

  31. Experimental Data • Cell-cycle Data for Yeast • 140 differentially expressed genes • 4 datasets with differing experimental conditions Constructed sub-GN involving several genes with proven function within cell wall organization. (Circles in same color indicate same biological function.)

  32. Experimental Data • Stress Response Data for Arabidopsis • Root experiments: 226 genes; Shoot experiments: 246 genes • 9 datasets with 6+ time points for each root and shoot (www.arabidopsis.org)

  33. Discussion

  34. Limitations • Assumes the regulatory network remains stationary regardless of differing environmental conditions. • Requires high-resolution, high-quality time-course datasets. • Noise in gene expression data, intrinsic to microarray technologies, is a major source of error. • Hidden regulatory factors may lead to implicit description errors. • Inferred GN models predict, indiscriminately, both direct and indirect regulations due to hidden variables. • Model edges correspond to the net effect. • A predicted regulatory relationship does not inherently correspond to regulation by a transcription factor.

  35. Conclusions • Created a novel method to derive GN substructure using multiple microarray datasets instead of multiple inferred network alignment. • Model can capture regulatory mechanisms at the protein and metabolite levels which cannot be physically measured. • Capable of deriving a more global structure with dense connections, in addition to more local substructures with sparse connections, by modifying the trade-off parameter, λ. • Model is used most effectively in tandem with other information sources. • FUTURE WORK: Extend GNR to identify conserved network patterns or motifs from the datasets of differing species.

  36. The End Thank you for listening!
