This study presents a computational tool called StepMiner to identify gene regulation transitions in microarray time-course data. It evaluates one-step and two-step patterns using F-statistic and P-value calculations, with applications in gene expression analysis. Results from simulated data and experimental data analysis, like diauxic shift in yeast, are discussed, along with comparisons with existing methods. The algorithm's effectiveness in identifying transitions is demonstrated through heat maps and gene ontology annotations.
Extracting binary signals from microarray time-course data • Debashis Sahoo (1), David L. Dill (2), Rob Tibshirani (3) and Sylvia K. Plevritis (4) • (1) Department of Electrical Engineering, (2) Department of Computer Science, (3) Department of Radiology and (4) Department of Health Research and Policy and Department of Statistics, Stanford University • Roli Shrivastava
Introduction • Problem Statement • Identify up- and down-regulated genes • Identify the time of transition • Experimental Technique • Microarray (tens of thousands of distinct probes on an array, performing the equivalent number of genetic tests in parallel) • Computational Technique • A tool called StepMiner that extracts biologically meaningful results from large amounts of data
Types of Transitions 1. One Step 2. Two Step 3. Genes for which the one- or two-step patterns do not fit appreciably better than a constant mean value (the null hypothesis).
Fitting One or Two-Step Function • F1 statistic: measures how well the one-step model fits the data • F2 statistic: measures how well the two-step model fits the data • F12 statistic: compares the fit of the one-step model and the two-step model on the same data • P-value: a low P-value indicates a good fit of the model to the data • Procedure: calculate the F statistic for the model and data set, then calculate the P-value (Pthreshold = 0.05) • If P < Pthreshold, the model fits • If P > Pthreshold, the model does not fit
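The F statistic for a one-step fit can be sketched as follows. This is a minimal illustration, not the StepMiner source: it assumes a least-squares fit over every candidate step position and the usual regression form F = (SSR / (m - 1)) / (SSE / (n - m)), where m = 3 fitted parameters (two levels and one position) is an assumption. All function names and the example profile are hypothetical.

```python
def sse_about_mean(values):
    """Sum of squared deviations of values from their own mean."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)


def fit_one_step(x):
    """Try every step position k and keep the one with the smallest SSE.

    The fitted curve is mean(x[:k]) before the step and mean(x[k:]) after it.
    Returns (k, sse).
    """
    best_k, best_sse = None, float("inf")
    for k in range(1, len(x)):
        sse = sse_about_mean(x[:k]) + sse_about_mean(x[k:])
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, best_sse


def f_statistic(x, sse, m):
    """F = (SSR / (m - 1)) / (SSE / (n - m)), with SSR = SSTOT - SSE.

    m is the number of fitted parameters (assumed to be 3 for one step).
    Requires sse > 0 and len(x) > m.
    """
    n = len(x)
    ssr = sse_about_mean(x) - sse
    return (ssr / (m - 1)) / (sse / (n - m))


# Hypothetical expression profile: low for 4 time points, then high.
profile = [0.1, -0.1, 0.0, 0.1, 5.0, 5.1, 4.9, 5.0]
step, sse = fit_one_step(profile)
f1 = f_statistic(profile, sse, m=3)
print(step, round(f1, 1))
```

The P-value is then the upper tail of the F distribution with (m - 1, n - m) degrees of freedom. If SciPy is available, `scipy.stats.f.sf(f1, m - 1, n - m)` would compute it.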
StepMiner Algorithm • One-step: the one-step model fits the data AND fits better than the two-step model • Two-step: the two-step model fits the data AND the one-step model does not • Other: neither the one-step nor the two-step model fits the data
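The decision rule on this slide can be sketched as a small function. This is one possible reading, not the StepMiner source: the three boolean inputs are assumed to come from comparing the F1, F2 and F12 P-values with the threshold, and the branches are assumed to cascade, so "one-step does not fit" is read as "the one-step branch was not taken". The function name is hypothetical.

```python
def classify(one_step_fits, two_step_fits, two_step_better):
    """Assign a gene to 'one-step', 'two-step' or 'other'.

    one_step_fits   -- the one-step model fits the data
    two_step_fits   -- the two-step model fits the data
    two_step_better -- the two-step model fits significantly better
                       than the one-step model

    Assumption: branches are tried in order, so the two-step branch is
    reached only when the one-step branch was not taken.
    """
    if one_step_fits and not two_step_better:
        return "one-step"
    if two_step_fits:
        return "two-step"
    return "other"


print(classify(True, True, False))    # one-step
print(classify(False, True, True))    # two-step
print(classify(False, False, False))  # other
```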
Comparison of 4 Algorithms • StepMiner algorithm • Step height = 5σ • Number of time points = 15 • A total of 2000 random signals, 2000 one-step signals and 2000 two-step signals with random step positions
Generation of Simulated Data • Microarray data with 15 non-uniform time points • 4000 genes with 2000 one-step and 2000 two-step patterns • Gaussian noise was added to the above data • A P-value threshold of 0.05 was used
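A simulated data set of this kind could be generated roughly as below. This is a sketch under stated assumptions: time points are treated as plain indices (the non-uniform spacing is ignored), steps go upward from a baseline of 0, a two-step signal rises and then returns to baseline, and the step height of 5σ is just one of the heights discussed in the slides. All names are hypothetical.

```python
import random


def one_step_signal(n_points, position, height, sigma, rng):
    """Baseline 0 before `position`, `height` from `position` on, plus noise."""
    return [(height if t >= position else 0.0) + rng.gauss(0.0, sigma)
            for t in range(n_points)]


def two_step_signal(n_points, first, second, height, sigma, rng):
    """Rises to `height` at `first`, falls back to 0 at `second`, plus noise."""
    return [(height if first <= t < second else 0.0) + rng.gauss(0.0, sigma)
            for t in range(n_points)]


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n_points, sigma = 15, 1.0
height = 5 * sigma       # assumed step height

genes = []
for _ in range(2000):    # one-step genes with a random step position
    pos = rng.randrange(1, n_points)
    genes.append(one_step_signal(n_points, pos, height, sigma, rng))
for _ in range(2000):    # two-step genes with two random step positions
    first = rng.randrange(1, n_points - 1)
    second = rng.randrange(first + 1, n_points)
    genes.append(two_step_signal(n_points, first, second, height, sigma, rng))

print(len(genes), len(genes[0]))
```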
Results of Simulated Data - I • σ is the standard deviation of the noise • Step position is fixed at 5 for one-step signals • Step positions are fixed at 5 and 9 for two-step signals • The higher the step, the easier it is to identify
Results of Simulated Data - II • σ is the standard deviation of the noise • Random step positions • Small reduction in accuracy • More matches occur when every constant segment of a curve has several time points • It is desirable to design experiments so that there are several points before the first interesting transition and after the last interesting transition
Results of Simulated Data - III • Shows sensitivity to the P-value threshold and the number of time points • Random step position and a step height of 5σ • Two-step signals require more time points than one-step signals • Raising the P-value threshold increases the number of matches, but at the cost of a higher false discovery rate (FDR)
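The trade-off between the P-value threshold and the false discovery rate can be illustrated with simple arithmetic. The sketch below uses a common rough estimate (expected false matches = threshold × number of genes tested, divided by the number of genes matched); whether StepMiner computes its FDR exactly this way is an assumption, and the counts are hypothetical.

```python
def estimated_fdr(p_threshold, n_tested, n_matched):
    """Rough FDR estimate: expected false matches / observed matches.

    Under the null hypothesis about p_threshold * n_tested genes pass
    the test by chance. The estimate is capped at 1.0.
    """
    if n_matched == 0:
        return 0.0
    return min(1.0, p_threshold * n_tested / n_matched)


# Hypothetical counts: 6000 genes tested, 4000 matched at P < 0.05.
print(estimated_fdr(0.05, 6000, 4000))
# A looser threshold matches more genes but admits more false matches.
print(estimated_fdr(0.10, 6000, 4200))
```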
Results of Simulated Data - IV • Shows sensitivity to the spacing between steps • With 15 time points, the first step is fixed at position 4 • A spacing of at least 3 time points is required when the step height is > 3σ • Steps must be placed at least 3 time points from the end points
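These spacing guidelines can be turned into a quick design check. The sketch below assumes that a step "at position k" means k time points lie before the step and n - k after it, which is an interpretation rather than something the slides state. The function name is hypothetical.

```python
def steps_well_placed(n_points, step_positions, min_gap=3):
    """Check that steps are at least `min_gap` time points apart
    and at least `min_gap` time points from either end.

    Assumption: position k means k points before the step, n - k after.
    """
    positions = sorted(step_positions)
    if not positions:
        return True
    if positions[0] < min_gap or n_points - positions[-1] < min_gap:
        return False
    return all(b - a >= min_gap for a, b in zip(positions, positions[1:]))


print(steps_well_placed(15, [4, 9]))   # True
print(steps_well_placed(15, [4, 6]))   # False: steps only 2 points apart
print(steps_well_placed(15, [13]))     # False: only 2 points after the step
```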
Diauxic Shift • In the initial phases of a growing batch culture, yeast prefers to metabolize glucose and produce ethanol even when oxygen is abundant. • When the glucose is exhausted, cells undergo a “diauxic shift,” in which they switch abruptly to an oxidative metabolism. This pathway allows the oxidation of the accumulated fermentation products and is highly efficient as a mechanism for generating ATP. • Source: Brauer et al., Mol Biol Cell. 2005 May; 16(5): 2503–2517
Analysis of Experimental Data • Fitting functions for 3 genes • 2284 genes with diauxic shift • 1088 matched a one-step transition • 267 matched a two-step transition • 929 did not match either pattern
Heat Maps • Analysis by Brauer et al. • Same data reanalyzed using StepMiner • The heat map shows two transitions at 8.25 and 9.25 h
Comparison With Brauer et al.’s Results • The GO annotations and FDR-corrected P-values for the clusters reported in Brauer et al. were recomputed with the latest yeast gene annotations from the Gene Ontology Consortium website • The table shows the P-values from GO-TermFinder as well as StepMiner
Results of Comparison • The annotations that had the lowest P-values in Brauer et al. also had low P-values in the StepMiner groups. • In most cases, the P-values in the reanalysis are lower than Brauer et al.’s, which implies that grouping by time of change is at least as effective as hierarchical clustering at identifying relevant genes. • GO annotations are obtained fully automatically using StepMiner; it is not necessary to select interesting clusters manually. • The clusters that have no P-values from StepMiner were “less interpretable in terms of diauxic shift”, in the words of Brauer et al.
Comparison of StepMiner to Other Tools • Hierarchical clustering: finds clusters that transition at the same time point • A manual search is required to find transitions • SAM: finds transitions by looking for significant differences in average expression before and after a specified time point • However, many of the genes selected by this method do not, in fact, have a transition at the specified time point • EDGE: identifies genes whose expression changes systematically over time and differs significantly from the mean expression over time • This method does not directly provide the direction or position of a significant change
Hierarchical vs. StepMiner • Hierarchical clustering: cluster that transitions at 3 hours • StepMiner clearly shows other transition times
Comparison of StepMiner to Other Tools - STEM • Provides model profiles and their significance values • But the profiles do not look like step functions and are therefore not helpful for locating transitions
Strengths and Limitations • Strengths: • Easy to understand • Few parameters • Biologically, transitions can be more interesting • Very fast: < 15 s for 15 microarrays of 40,000 genes • Can deal with missing measurements • Provides statistical parameters such as P-value and FDR • Limitations: • Binary model • Other cases exist, e.g., a transition that is not a step • Not well suited to short or long time courses; most appropriate for 10-30 time measurements
Post-StepMiner Analysis • Once StepMiner has been run, genes undergoing binary transitions can easily be partitioned into sets based on the number, direction, and timing of transitions. • These sets can be merged at the user’s discretion (e.g., the set of one-step genes that rise at time 3 could be merged with the two-step genes that rise at time 3), or further subdivided.
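The partitioning and merging described here can be sketched with ordinary dictionaries and lists. The record layout below (gene, kind, directions, times) and the gene names are hypothetical, chosen only for illustration; StepMiner's actual output format is not shown in the slides.

```python
from collections import defaultdict

# Hypothetical per-gene results: (gene, kind, directions, transition times)
results = [
    ("geneA", "one-step", ("up",), (3,)),
    ("geneB", "one-step", ("down",), (3,)),
    ("geneC", "two-step", ("up", "down"), (3, 7)),
    ("geneD", "one-step", ("up",), (5,)),
]


def partition(results):
    """Group genes by number, direction and timing of their transitions."""
    groups = defaultdict(list)
    for gene, kind, directions, times in results:
        groups[(kind, directions, times)].append(gene)
    return dict(groups)


def rising_at(results, time):
    """Merge sets: every gene with an upward step at `time`, of any kind."""
    return [gene for gene, _, directions, times in results
            if any(d == "up" and t == time
                   for d, t in zip(directions, times))]


print(partition(results))
print(rising_at(results, 3))  # one-step and two-step genes that rise at time 3
```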
Replication vs. Resolution • For accuracy, it is better to take more frequent measurements than to collect replicates • However, this comes at the cost of less reliable identification of the kind of step