Outline

Outline • Experimental Design • Two-Color Examples • Affy Data Example – Trt vs. Control

The Importance of Good Experimental Designs • Designed experiments are a principal vehicle for transcriptomic knowledge discovery. • The best designs maximize efficiency with respect to time, money, and information gain. • The basic strategy is to identify, orthogonalize, and randomize the primary sources of variability. • Remember field trials, e.g. split plots. • There is a rich history in the statistics literature; see also recent papers by Kerr and Churchill (NB: loop designs = incomplete block designs).

Two-Color Example: Toxicogenomics Study from NIEHS Microarray Center • Hamadeh et al. (2002) Toxicological Sciences 67: 219-231 and 232-240 • Effects of peroxisome proliferators (Clofibrate, Wyeth 14643, Gemfibrozil), Phenobarbital, and D-Mannitol on rat liver at 24 hours and 2 weeks • 85 two-color arrays of 1700 genes; treated samples labeled with Cy5 and reference samples with Cy3

Toxicogenomics Example: Experimental Design • Reference sample design • 28 rats, most with 3 reps • 85 total arrays • http://dir.niehs.nih.gov/microarray/datasets

Difficulties • Dye effects are completely confounded with treatments. • Reference samples are used for half of the measurements. • We need to account for various sources of variability: rat, array, spot, treatment, gene, and their interactions.
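
The first difficulty can be checked mechanically. The following is a minimal Python sketch, assuming a toy layout with one treated and one reference sample per array; the function name `completely_confounded` and all labels are illustrative and are not taken from the study's data files.

```python
def completely_confounded(factor_a, factor_b):
    """Return True when the levels of two factors map one-to-one onto each other."""
    observed_pairs = set(zip(factor_a, factor_b))
    return len(observed_pairs) == len(set(factor_a)) == len(set(factor_b))

# Toy reference design (invented): treated always on Cy5, reference always on Cy3.
dye_fixed = ["Cy5", "Cy3"] * 3
sample_fixed = ["treated", "reference"] * 3

# Toy dye-swap design (invented): the labeling is reversed on the second array.
dye_swap = ["Cy5", "Cy3", "Cy5", "Cy3"]
sample_swap = ["treated", "reference", "reference", "treated"]

fixed_is_confounded = completely_confounded(dye_fixed, sample_fixed)
swap_is_confounded = completely_confounded(dye_swap, sample_swap)
print(fixed_is_confounded, swap_is_confounded)
```

When the check returns True, any model term for dye is indistinguishable from the treated-versus-reference contrast, which is why a dye swap is needed to separate the two.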

The Biologists: Greg, Wei, Rebecca, Kevin, & Gisele • The Statistician: Russ • The Goal: Excellent Science • Image source: Kitano, Science, March 1, 2002

For Effective Collaboration The Biologists Must… • …trust the expertise of the statistician, e.g. • experimental design • data modeling of sources of variability • evaluation of inferential evidence • computer coding • visualization • …be willing to change routine practice in order to do better science

For Effective Collaboration The Statistician Must… • …trust the expertise of the biologist, e.g. • practical designs and protocols • knowledge of underlying mechanisms • evaluation of molecular evidence • RNA coding • visualization • …be willing to change routine practice in order to do better science

Drosophila Example from Greg’s Lab at NC State • 2 sexes, 2 lines, 2 ages • 6 reps including dye swaps • 24 total two-color arrays • Split-plot design, with age treatments always together on the same chip • Jin et al. (2001) Nature Genetics 29: 389-395 • http://brooks.statgen.ncsu.edu/ggibson/Pubs.htm
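
The counts on this slide are mutually consistent: 24 arrays with two channels each give 48 measurements, or 6 per sex-by-line-by-age combination. The Python sketch below builds one layout that satisfies them. It assumes that each array carries both ages of a single sex-by-line combination and that dye orientation alternates from array to array; the level names are placeholders, and the actual assignment used in the study may differ.

```python
from itertools import product

sexes = ["female", "male"]
lines = ["line1", "line2"]   # placeholder names
ages = ["young", "old"]      # placeholder names
ARRAYS_PER_SEX_LINE = 6      # 4 sex-by-line combinations x 6 = 24 arrays

def build_layout():
    """Return one record per channel: array, dye, sex, line, age."""
    records = []
    array_id = 0
    for sex, line in product(sexes, lines):
        for rep in range(ARRAYS_PER_SEX_LINE):
            array_id += 1
            # Assumption: alternate dye orientation so half the arrays are dye swaps.
            cy3_age, cy5_age = ages if rep % 2 == 0 else ages[::-1]
            records.append({"array": array_id, "dye": "Cy3",
                            "sex": sex, "line": line, "age": cy3_age})
            records.append({"array": array_id, "dye": "Cy5",
                            "sex": sex, "line": line, "age": cy5_age})
    return records

layout = build_layout()
n_arrays = len({r["array"] for r in layout})

# Count replicates for every sex-by-line-by-age cell.
reps_per_cell = {}
for r in layout:
    key = (r["sex"], r["line"], r["age"])
    reps_per_cell[key] = reps_per_cell.get(key, 0) + 1

print(n_arrays, len(layout), sorted(set(reps_per_cell.values())))
```

Because age is always compared within an array while sex and line vary only between arrays, age acts as the sub-plot factor and sex and line as whole-plot factors.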

Analysis Approach: Mixed-Model ANOVA • Potential for direct modeling of the log2 intensity measurements instead of ratios • Accommodation of known sources of variability across all of the arrays • A comprehensive analysis framework for complex experimental designs and unbalanced data • Output useful for quality control, inference, and classification

Toxicogenomics Example Stage 1: Fit Normalization Model log2(Y) = μ + trt + time + trt*time + animal + array(animal) + dye(array animal) + error Notes: • Random effects are in red; others are fixed. • Some preprocessing may be desirable (e.g. loess). • This model averages across all genes.

Toxicogenomics Example (continued) Stage 2: Fit Gene-Specific Models For each of the 1700 genes, construct residuals from Stage 1 and use them as input to the following model: resid = μ + trt*time + animal + array(animal) + error
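
The two-stage flow can be illustrated with a deliberately simplified Python sketch. It is not the mixed model from the slides: the random effects are dropped, Stage 1 is replaced by centering log2 intensities within each array-by-dye channel across all genes, and Stage 2 is replaced by a per-gene difference of mean residuals. The data, function names, and the two-level "treated"/"reference" factor are all invented for illustration.

```python
import math
from collections import defaultdict
from statistics import mean

# Toy records (all values invented): (gene, array, dye, sample, raw intensity).
records = [
    ("g1", 1, "Cy5", "treated", 1600.0), ("g1", 1, "Cy3", "reference", 400.0),
    ("g2", 1, "Cy5", "treated", 800.0),  ("g2", 1, "Cy3", "reference", 400.0),
    ("g3", 1, "Cy5", "treated", 400.0),  ("g3", 1, "Cy3", "reference", 400.0),
    ("g1", 2, "Cy5", "treated", 3200.0), ("g1", 2, "Cy3", "reference", 800.0),
    ("g2", 2, "Cy5", "treated", 1600.0), ("g2", 2, "Cy3", "reference", 800.0),
    ("g3", 2, "Cy5", "treated", 800.0),  ("g3", 2, "Cy3", "reference", 800.0),
]

def stage1_residuals(rows):
    """Center log2 intensities within each array-by-dye channel, across all genes."""
    by_channel = defaultdict(list)
    for gene, array, dye, sample, y in rows:
        by_channel[(array, dye)].append(math.log2(y))
    channel_mean = {key: mean(vals) for key, vals in by_channel.items()}
    return [(gene, sample, math.log2(y) - channel_mean[(array, dye)])
            for gene, array, dye, sample, y in rows]

def stage2_gene_effects(residuals):
    """Per gene: mean treated residual minus mean reference residual."""
    by_gene = defaultdict(lambda: defaultdict(list))
    for gene, sample, resid in residuals:
        by_gene[gene][sample].append(resid)
    return {gene: mean(groups["treated"]) - mean(groups["reference"])
            for gene, groups in by_gene.items()}

effects = stage2_gene_effects(stage1_residuals(records))
for gene, effect in sorted(effects.items()):
    print(gene, round(effect, 3))
```

In this toy data the second array is twice as bright as the first, yet both arrays yield the same residuals, which is the purpose of the normalization stage. Because treatment and dye are confounded here, the gene effects are relative to the average treated-versus-reference difference across genes.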

Drosophila Example Stage 1: Fit Normalization Model log2(Y) = μ + sex + line + age + sex*line + sex*age + line*age + sex*line*age + array + dye*array + error Notes: • Random effects are in red; others are fixed. • Some preprocessing may be desirable (e.g. loess). • This model averages across all genes.

Drosophila Example (continued) Stage 2: Fit Gene-Specific Models For each gene, construct residuals from Stage 1 and use them as input: resid = μ + sex + line + age + sex*line + sex*age + line*age + sex*line*age + array + error Note: The dye*array effect is dropped because it is completely confounded with error.

Rich Output Stage 3: Analyze Results from Stage 2 • Quality control using parameter estimates and residuals • Inference on effects and variance components of interest • Construct data to be used for clustering • Use dynamic visualization for exploration of data and results

Live Software Analysis

Six Affy Yeast Chips from Chi-Hse Teng, Pharmacia (MBSW, 2002)

Compare Seven Statistical Models

Seven Models (continued) (3) Mixed Model: (4) Adjust (3) as in Irizarry et al, using

Seven Models (continued) Bivariate Mixed Model: (5) Test using (6) Test using

Seven Models (continued) (7) Li-Wong:

Treatment Effect Estimates • Methods are in the same order as listed previously • Red = gene has distance (||PM − MM||²/J) greater than the median • Blue = distance smaller than the median
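
The color coding can be sketched in Python as follows. The sketch assumes the distance is the squared Euclidean norm of the PM-minus-MM vector divided by J, the number of probe pairs in the probe set; whether the slides computed it on raw or log intensities is not stated, and the probe values and function names below are invented.

```python
from statistics import median

def pm_mm_distance(pm, mm):
    """Squared Euclidean distance between PM and MM vectors, divided by J probes."""
    if len(pm) != len(mm) or len(pm) == 0:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return sum((p - m) ** 2 for p, m in zip(pm, mm)) / len(pm)

def color_by_median(distances):
    """Label each gene red (above median), blue (below), or 'at median'."""
    cutoff = median(distances.values())
    colors = {}
    for gene, d in distances.items():
        if d > cutoff:
            colors[gene] = "red"
        elif d < cutoff:
            colors[gene] = "blue"
        else:
            colors[gene] = "at median"
    return colors

# Toy probe sets (invented values): gene -> (PM vector, MM vector).
toy_probes = {
    "geneA": ([3.0, 5.0], [1.0, 1.0]),   # (4 + 16) / 2 = 10
    "geneB": ([2.0, 2.0], [1.0, 1.0]),   # (1 + 1) / 2 = 1
    "geneC": ([4.0, 4.0], [2.0, 2.0]),   # (4 + 4) / 2 = 4
}
distances = {g: pm_mm_distance(pm, mm) for g, (pm, mm) in toy_probes.items()}
colors = color_by_median(distances)
print(distances, colors)
```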

Negative Log P-Values

That Pesky Mismatch Data • The MM value almost surely contains both true signal and cross-hybridization signal, but in what proportion? • Direct subtraction from PM adds noise; probe-level analyses are much more precise. • If the amount of cross-hybridization is proportionally constant, then analyzing on the log scale is appealing, because the constant c cancels: log(c PM1) – log(c PM2) = log(PM1) – log(PM2)
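
A quick numerical check of the cancellation, as a minimal Python sketch. The intensities and the factor `c` are arbitrary illustrative numbers, and the function names are not from the slides.

```python
import math

def log_scale_difference(pm1, pm2, c=1.0):
    """Difference of log intensities after scaling both by the same factor c."""
    return math.log(c * pm1) - math.log(c * pm2)

def raw_scale_difference(pm1, pm2, c=1.0):
    """Difference of raw intensities after scaling both by the same factor c."""
    return c * pm1 - c * pm2

with_c = log_scale_difference(1200.0, 300.0, c=0.7)
without_c = log_scale_difference(1200.0, 300.0)
print(math.isclose(with_c, without_c))   # the factor cancels on the log scale

raw_with_c = raw_scale_difference(1200.0, 300.0, c=0.7)
raw_without_c = raw_scale_difference(1200.0, 300.0)
print(math.isclose(raw_with_c, raw_without_c))   # it does not cancel on the raw scale
```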

Simulation Study: Li-Wong vs. Mixed • Data simulated under the Li-Wong model (left) and the mixed model (right) • Simulated Type I error (top), testing power (middle), and fitting R² values (bottom) • Red = LW C.I. test • Green = Wald test from LW • Blue = t-test from mixed model

Curiosity: Mixed vs. Bivariate Mixed • “V” shape • Covariance within probe set (gene) - red: σa² = 0 - black: σa² > 0

Conclusions from Affy Example • Statistical methods differ in their results. • Subtracting mismatch adds noise; probe-level analyses are much more precise. • The bivariate mixed model, though computationally intensive, appears to have an edge. • Some papers are online at http://statgen.ncsu.edu/ggibson/Pubs.htm

Wrap Up • Excellent science involves: • well-formulated hypotheses • efficient experimental designs • an intelligent combination of statistical inference and graphics • Collaboration must work through communication, personal, scientific, and organizational barriers; it requires mutual openness and trust.

All Possible Pairs of Genes
