Outline • Experimental Design • Two-Color Examples • Affy Data Example – Trt vs. Control
The Importance of Good Experimental Designs • Designed experiments are a principal vehicle for transcriptomic knowledge discovery. • The best designs maximize efficiency wrt time, money, and information gain. • Basic strategy is to identify, orthogonalize, and randomize primary sources of variability. • Remember field trials: e.g. split plots. • Rich history in stat literature; see also recent papers by Kerr and Churchill (NB: loop designs = incomplete block designs).
Two-Color Example: Toxicogenomics Study from the NIEHS Microarray Center • Hamadeh et al. (2002) Toxicological Sciences 67: 219-231 and 232-240 • Effects of peroxisome proliferators (clofibrate, Wyeth 14643, gemfibrozil), phenobarbital, and D-mannitol on rat liver at 24 hours and 2 weeks • 85 two-color arrays of 1700 genes; treated samples labeled with Cy5 and reference samples with Cy3
Toxicogenomics Example: Experimental Design • Reference sample design • 28 rats, most with 3 reps • 85 total arrays • http://dir.niehs.nih.gov/microarray/datasets
Difficulties • Dye effects are completely confounded with treatments: treated samples are always labeled with Cy5 and reference samples with Cy3. • Reference samples account for half of the measurements. • We need to account for various sources of variability: rat, array, spot, treatment, gene, and their interactions.
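To see why dye and treatment cannot be separated in a reference design, it helps to write out the indicator columns a linear model would use. The sketch below assumes a hypothetical four-array layout (arrays `A1` to `A4`, helper `indicator_columns`) in which every treated sample is on Cy5 and every reference on Cy3; the two columns come out identical, so any estimated "treatment" effect is really treatment plus dye. Reversing the dyes on a single array is shown only for contrast.

```python
def indicator_columns(channels):
    """Return the dye (Cy5 = 1) and treatment (treated = 1) indicator columns."""
    dye = [1 if d == "Cy5" else 0 for _, d, _ in channels]
    trt = [t for _, _, t in channels]
    return dye, trt

# Hypothetical reference design: each array pairs a treated sample (Cy5)
# with the common reference sample (Cy3).
arrays = ["A1", "A2", "A3", "A4"]
reference_design = [(a, "Cy5", 1) for a in arrays] + [(a, "Cy3", 0) for a in arrays]

dye, trt = indicator_columns(reference_design)
print(dye == trt)  # True: identical columns, so dye and treatment are aliased

# For contrast: a dye swap on one array (A4) separates the two columns.
swapped = [(a, ("Cy3" if d == "Cy5" else "Cy5") if a == "A4" else d, t)
           for a, d, t in reference_design]
dye_s, trt_s = indicator_columns(swapped)
print(dye_s == trt_s)  # False
```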
The Biologists (Greg, Wei, Rebecca, Kevin, & Gisele) and The Statistician (Russ) • The Goal: Excellent Science • Image source: Kitano, Science, March 1, 2002
For Effective Collaboration The Biologists Must… • …trust the expertise of the statistician, e.g. • experimental design • data modeling of sources of variability • evaluation of inferential evidence • computer coding • visualization • …be willing to change routine practice in order to do better science
For Effective Collaboration The Statistician Must… • …trust the expertise of the biologist, e.g. • practical designs and protocols • knowledge of underlying mechanisms • evaluation of molecular evidence • RNA coding • visualization • …be willing to change routine practice in order to do better science
Drosophila Example from Greg’s Lab at NC State • 2 sexes, 2 lines, 2 ages • 6 reps including dye swaps • 24 total two-color arrays • Split-plot design, with age treatments always together on same chip Jin et al. (2001) Nature Genetics 29: 389-395 http://brooks.statgen.ncsu.edu/ggibson/Pubs.htm
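The counts on this slide fit together as a split plot: 4 sex-by-line combinations times 6 arrays gives 24 arrays, and because both ages share each array, the 24 arrays hold 48 samples (8 groups times 6 reps). Age contrasts are therefore made within arrays, while sex and line are compared between arrays. The sketch below generates one such layout; the labels, the helper `split_plot_layout`, and the rule of alternating dye orientation across replicates are assumptions for illustration, not the lab's actual assignment.

```python
from itertools import product

SEXES = ["female", "male"]
LINES = ["line1", "line2"]  # hypothetical labels for the two lines
REPS = 6                    # arrays per sex x line combination

def split_plot_layout():
    """Return one dict per array; both ages of a sex x line group share the array.

    Assumption: dye orientation alternates across replicates, giving three
    arrays with the young sample on Cy3 and three with it on Cy5.
    """
    layout = []
    for sex, line in product(SEXES, LINES):
        for rep in range(REPS):
            young_dye, old_dye = ("Cy3", "Cy5") if rep % 2 == 0 else ("Cy5", "Cy3")
            layout.append({"array": len(layout) + 1, "sex": sex, "line": line,
                           "young_dye": young_dye, "old_dye": old_dye})
    return layout

layout = split_plot_layout()
print(len(layout))      # 24 arrays
print(2 * len(layout))  # 48 samples = 8 sex x line x age groups x 6 reps
```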
Analysis Approach: Mixed-Model ANOVA • Potential for direct modeling of the log2 intensity measurements instead of ratios • accommodation of known sources of variability across all of the arrays • a comprehensive analysis framework for complex experimental designs and unbalanced data • output useful for quality control, inference, and classification
Toxicogenomics Example Stage 1: Fit Normalization Model $\log_2 Y = \mu + \text{trt} + \text{time} + \text{trt*time} + \text{animal} + \text{array(animal)} + \text{dye(array animal)} + \text{error}$ Notes: • Random effects: animal, array(animal), dye(array animal), and error; the others are fixed. • Some preprocessing may be desirable (e.g. loess). • This model averages across all genes.
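If every term in the Stage 1 model were treated as fixed, the fitted value for a spot would simply be the mean of its (array, dye) channel across all genes, because trt, time, animal, and array are all constant within a channel. The sketch below uses that simplification to produce Stage 1 residuals; it is not a mixed-model (REML) fit, and the record layout and the name `stage1_residuals` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def stage1_residuals(records):
    """Subtract each channel's mean log2 intensity (taken over all genes).

    records: list of (array, dye, gene, log2_intensity) tuples.
    Simplification: all effects treated as fixed, so the fitted value for a
    spot is the mean of its (array, dye) channel.
    """
    by_channel = defaultdict(list)
    for array, dye, _gene, y in records:
        by_channel[(array, dye)].append(y)
    channel_mean = {key: mean(values) for key, values in by_channel.items()}
    return [(array, dye, gene, y - channel_mean[(array, dye)])
            for array, dye, gene, y in records]

records = [("A1", "Cy5", "g1", 10.0), ("A1", "Cy5", "g2", 12.0),
           ("A1", "Cy3", "g1", 8.0), ("A1", "Cy3", "g2", 8.0)]
print(stage1_residuals(records))
# [('A1', 'Cy5', 'g1', -1.0), ('A1', 'Cy5', 'g2', 1.0), ('A1', 'Cy3', 'g1', 0.0), ('A1', 'Cy3', 'g2', 0.0)]
```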
Toxicogenomics Example (continued) Stage 2: Fit Gene-Specific Models For each of the 1700 genes, construct residuals from Stage 1 and use them as input to the following model: $\text{resid} = \mu + \text{trt*time} + \text{animal} + \text{array(animal)} + \text{error}$
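A much-reduced version of the gene-specific step takes, for each gene, the Cy5 minus Cy3 residual difference on every array (differencing removes the array effect) and forms a paired t statistic. This sketch ignores the random animal effect and the trt*time structure of the Stage 2 model, and, because of the dye confounding noted earlier, each difference estimates treatment plus dye. The helper `stage2_paired_t` and the toy residuals are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def stage2_paired_t(residuals):
    """Per-gene paired t statistics on Cy5 - Cy3 residual differences.

    residuals: (array, dye, gene, resid) tuples from Stage 1, treated on Cy5
    and reference on Cy3. Simplification: no random animal effect and no
    trt*time terms; genes with fewer than 2 arrays or zero spread are skipped.
    """
    cells = defaultdict(dict)
    for array, dye, gene, r in residuals:
        cells[(gene, array)][dye] = r
    diffs = defaultdict(list)
    for (gene, _array), by_dye in cells.items():
        if "Cy5" in by_dye and "Cy3" in by_dye:
            diffs[gene].append(by_dye["Cy5"] - by_dye["Cy3"])
    t_stats = {}
    for gene, d in diffs.items():
        if len(d) >= 2 and stdev(d) > 0:
            t_stats[gene] = mean(d) / (stdev(d) / sqrt(len(d)))
    return t_stats

residuals = []
for array, diff in [("A1", 1.0), ("A2", 2.0), ("A3", 3.0)]:
    residuals += [(array, "Cy5", "g1", diff), (array, "Cy3", "g1", 0.0)]
print(stage2_paired_t(residuals))  # g1: approximately 3.464, i.e. 2 / (1 / sqrt(3))
```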
Drosophila Example Stage 1: Fit Normalization Model $\log_2 Y = \mu + \text{sex} + \text{line} + \text{age} + \text{sex*line} + \text{sex*age} + \text{line*age} + \text{sex*line*age} + \text{array} + \text{dye*array} + \text{error}$ Notes: • Random effects: array, dye*array, and error; the others are fixed. • Some preprocessing may be desirable (e.g. loess). • This model averages across all genes.
Drosophila Example (continued) Stage 2: Fit Gene-Specific Models For each gene, construct residuals from Stage 1 and use them as input: $\text{resid} = \mu + \text{sex} + \text{line} + \text{age} + \text{sex*line} + \text{sex*age} + \text{line*age} + \text{sex*line*age} + \text{array} + \text{error}$ Note: The dye*array effect is dropped because it is completely confounded with error.
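The confounding can be checked by counting degrees of freedom. Assuming each gene is spotted once per array, a single gene contributes one observation per (array, dye) cell, so a dye*array term with one mean per cell leaves zero degrees of freedom for error. The helper `within_cell_df` and the duplicate-spotting case are hypothetical illustrations.

```python
from collections import Counter

def within_cell_df(spots):
    """Error degrees of freedom left after fitting one dye*array mean per cell.

    spots: (array, dye) pairs, one entry per spot measured for a single gene.
    """
    counts = Counter(spots)
    return sum(n - 1 for n in counts.values())

# Assumption: 24 arrays, one spot per gene per channel
single = [(a, d) for a in range(24) for d in ("Cy3", "Cy5")]
print(within_cell_df(single))     # 0: dye*array and error cannot be separated

# Hypothetical duplicate spotting would leave one error df per cell
duplicate = single * 2
print(within_cell_df(duplicate))  # 48
```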
Rich Output Stage 3: Analyze Results from Stage 2 • Quality control using parameter estimates and residuals • Inference on effects and variance components of interest • Construct data to be used for clustering • Use dynamic visualization for exploration of data and results
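One common way to turn Stage 2 output into clustering input is to give each gene a profile of estimated effects (for example, one value per sex-by-line-by-age group) and use 1 minus the Pearson correlation as the dissimilarity. That choice is an assumption here, not necessarily the authors' procedure; the profiles below are made up and the helper names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_distances(profiles):
    """1 - correlation for every ordered pair of genes."""
    genes = sorted(profiles)
    return {(g, h): 1.0 - pearson(profiles[g], profiles[h])
            for g in genes for h in genes}

# Hypothetical Stage 2 estimates for three genes across four groups
profiles = {"g1": [1.0, 2.0, 3.0, 4.0],
            "g2": [2.0, 4.0, 6.0, 8.0],
            "g3": [4.0, 3.0, 2.0, 1.0]}
dist = correlation_distances(profiles)
print(dist[("g1", "g2")], dist[("g1", "g3")])  # 0.0 2.0: same shape vs. mirror image
```

The resulting dissimilarities can be passed to any hierarchical or partitioning clustering routine.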
Six Affy Yeast Chips from Chi-Hse Teng, Pharmacia (MBSW, 2002)
Seven Models (continued) (3) Mixed Model: (4) Adjust (3) as in Irizarry et al., using
Seven Models (continued) Bivariate Mixed Model: (5) Test using (6) Test using
Seven Models (continued) (7) Li-Wong:
Treatment Effect Estimates (figure) • Methods in the same order as listed previously • Red = gene has distance $\|PM - MM\|^2 / J$ greater than the median • Blue = distance smaller than the median
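The red/blue coloring can be reproduced as follows, assuming the distance is the squared Euclidean distance between a probe set's PM and MM vectors divided by the number of probe pairs $J$, computed on a single array. The probe values and helper names are hypothetical; a gene exactly at the median is labeled blue here, a tie rule the slide does not specify.

```python
from statistics import median

def pm_mm_distance(pm, mm):
    """Squared Euclidean distance between PM and MM vectors, divided by J."""
    J = len(pm)
    return sum((p - m) ** 2 for p, m in zip(pm, mm)) / J

def color_by_median(probe_sets):
    """Label each gene red (distance above the median) or blue (otherwise)."""
    dist = {gene: pm_mm_distance(pm, mm) for gene, (pm, mm) in probe_sets.items()}
    cut = median(list(dist.values()))
    return {gene: "red" if d > cut else "blue" for gene, d in dist.items()}

# Hypothetical probe sets with J = 2 probe pairs each
probe_sets = {"g1": ([2.0, 2.0], [1.0, 1.0]), "g2": ([5.0, 5.0], [1.0, 1.0]),
              "g3": ([3.0, 3.0], [1.0, 1.0]), "g4": ([4.0, 4.0], [1.0, 1.0])}
print(color_by_median(probe_sets))
# {'g1': 'blue', 'g2': 'red', 'g3': 'blue', 'g4': 'red'}
```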
That Pesky Mismatch Data • The MM value almost surely contains both true signal and cross-hybridization signal, but in what proportion? • Direct subtraction from PM adds noise; probe-level analyses are much more precise. • If the amount of cross-hybridization is proportionally constant (a common factor c), then analyzing on the log scale is appealing because c cancels: $\log(c\,PM_1) - \log(c\,PM_2) = \log(PM_1) - \log(PM_2)$
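Both points on this slide can be checked numerically under simple assumptions. First, a common multiplicative factor `c` (an arbitrary illustrative value) drops out of a log ratio. Second, if PM and MM carry independent noise of equal spread, then $\mathrm{Var}(PM - MM) = \mathrm{Var}(PM) + \mathrm{Var}(MM)$, so subtraction roughly doubles the noise. Real PM and MM intensities are correlated, which would shrink this inflation, so treat the simulation as a toy illustration only.

```python
import math
import random
import statistics

def log2_ratio(a, b):
    return math.log2(a) - math.log2(b)

# Assumption: cross-hybridization scales both PM values by the same constant c
c = 1.7
pm1, pm2 = 1200.0, 300.0
print(log2_ratio(pm1, pm2), log2_ratio(c * pm1, c * pm2))  # both approximately 2.0

# Assumption: PM and MM carry independent Gaussian noise with the same sigma
rng = random.Random(0)
sigma, n = 50.0, 20000
pm = [1000.0 + rng.gauss(0.0, sigma) for _ in range(n)]
mm = [300.0 + rng.gauss(0.0, sigma) for _ in range(n)]
diff = [p - m for p, m in zip(pm, mm)]
ratio = statistics.variance(diff) / statistics.variance(pm)
print(ratio)  # close to 2: subtraction roughly doubles the variance here
```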
Simulation Study: Li-Wong vs. Mixed (figure) • Data simulated under the Li-Wong model (left) and the mixed model (right) • Rows: simulated Type I error (top), testing power (middle), and fitting $R^2$ values (bottom) • Red = Li-Wong C.I. test; green = Wald test from Li-Wong; blue = t-test from mixed model
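Simulated Type I error and power are both rejection rates over many simulated data sets: with no true effect the rate estimates the Type I error, and with a true effect it estimates power. The sketch below shows the mechanics with an ordinary pooled two-sample t-test standing in for the Li-Wong and mixed-model tests; the sample size (6 per group), effect size, seed, and helper names are illustrative assumptions.

```python
import random
from math import sqrt

T_CRIT_DF10 = 2.228  # two-sided 5% critical value of Student's t with 10 df (n = 6 per group)

def sample_var(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * sample_var(x) + (ny - 1) * sample_var(y)) / (nx + ny - 2)
    return (sum(x) / nx - sum(y) / ny) / sqrt(sp2 * (1 / nx + 1 / ny))

def rejection_rate(delta, n_sims=4000, n=6, seed=1):
    """Share of simulated data sets with |t| above the 5% critical value.

    delta = 0 gives the simulated Type I error; delta > 0 gives power.
    The critical value above is only valid for n = 6 per group.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(delta, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(pooled_t(x, y)) > T_CRIT_DF10:
            rejections += 1
    return rejections / n_sims

type1 = rejection_rate(0.0)  # should land near the nominal 0.05
power = rejection_rate(2.0)  # much larger when the true shift is 2 SDs
print(type1, power)
```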
Curiosity: Mixed vs. Bivariate Mixed (figure) • "V" shape • Covariance within probe set (gene): red = $\sigma_a^2 = 0$; black = $\sigma_a^2 > 0$
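One common way a within-probe-set covariance such as $\sigma_a^2$ arises is a random effect shared by all probes of a gene, which gives a compound-symmetric covariance matrix: every pair of probes has covariance $\sigma_a^2$, and each probe has variance $\sigma_a^2 + \sigma^2$. This structure is an assumption for illustration, not the stated form of the bivariate model; the helper name and variance values are hypothetical.

```python
def probe_set_covariance(n_probes, sigma_a2, sigma_e2):
    """Covariance matrix of n_probes values sharing one random effect.

    Each value = shared effect (variance sigma_a2) + own error (variance
    sigma_e2), so off-diagonal entries equal sigma_a2 and diagonal entries
    equal sigma_a2 + sigma_e2.
    """
    return [[sigma_a2 + (sigma_e2 if j == k else 0.0) for k in range(n_probes)]
            for j in range(n_probes)]

no_shared = probe_set_covariance(3, 0.0, 1.0)  # sigma_a2 = 0: probes uncorrelated
shared = probe_set_covariance(3, 0.5, 1.0)     # sigma_a2 > 0: common covariance 0.5
print(no_shared)
print(shared)
```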
Conclusions from Affy Example • Statistical methods differ in their results • Subtracting mismatch adds noise; probe-level analyses are much more precise • The bivariate mixed model, though computationally intensive, appears to have an edge • Some papers are online at http://statgen.ncsu.edu/ggibson/Pubs.htm
Wrap Up • Excellent science involves: • well-formulated hypotheses • efficient experimental designs • an intelligent combination of statistical inference and graphics • Collaboration must work through communication, personal, scientific, and organizational barriers; it requires mutual openness and trust.