
Outline




  1. Outline • Experimental Design • Two-Color Examples • Affy Data Example – Trt vs. Control

  2. The Importance of Good Experimental Designs • Designed experiments are a principal vehicle for transcriptomic knowledge discovery. • The best designs maximize efficiency with respect to time, money, and information gain. • The basic strategy is to identify, orthogonalize, and randomize the primary sources of variability. • Remember field trials, e.g. split plots. • There is a rich history in the statistical literature; see also recent papers by Kerr and Churchill (NB: loop designs = incomplete block designs).

  3. Two-Color Example: Toxicogenomics Study from NIEHS Microarray Center • Hamadeh et al. (2002) Toxicological Sciences 67: 219-231 and 232-240 • Peroxisome Proliferators (Clofibrate, Wyeth 14643, Gemfibrozil), Phenobarbital, and D-Mannitol effects on rat liver at 24 hours and 2 weeks • 85 two-color arrays of 1700 genes, with treated samples labeled with Cy5 and reference samples with Cy3

  4. Toxicogenomics Example Experimental Design • Reference sample design • 28 rats, most with 3 reps • 85 total arrays • http://dir.niehs.nih.gov/microarray/datasets

  5. Difficulties • Dye effects are completely confounded with treatments. • Reference samples are used for half of the measurements. • We need to account for various sources of variability: rat, array, spot, treatment, gene, and their interactions.
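
The first difficulty can be illustrated with a small design matrix. This is a minimal sketch, assuming a toy layout of four arrays (not the study's 85) in which the treated sample is always on Cy5 and the reference sample always on Cy3; the variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy layout: each array carries one treated sample (always Cy5)
# and one reference sample (always Cy3), mirroring the reference design.
n_arrays = 4  # illustrative only, not the 85 arrays of the study
treated = np.tile([1, 0], n_arrays)  # 1 = treated sample, 0 = reference sample
cy5 = np.tile([1, 0], n_arrays)      # 1 = Cy5 channel, 0 = Cy3 channel

intercept = np.ones(2 * n_arrays)
X = np.column_stack([intercept, treated, cy5])

# The treated and Cy5 columns are identical, so the matrix has fewer
# independent columns than effects: the two cannot be estimated separately.
rank = np.linalg.matrix_rank(X)
print("columns:", X.shape[1], "rank:", rank)
```

Because the rank is lower than the number of columns, no amount of replication within this layout separates the dye effect from the treated-versus-reference effect; only a dye swap would.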

  6. The Goal: Excellent Science • The Biologists: Greg, Wei, Rebecca, Kevin, & Gisele • The Statistician: Russ • Image Source: Kitano, Science, March 1, 2002

  7. For Effective Collaboration The Biologists Must… • …trust the expertise of the statistician, e.g. • experimental design • data modeling of sources of variability • evaluation of inferential evidence • computer coding • visualization • …be willing to change routine practice in order to do better science

  8. For Effective Collaboration The Statistician Must… • …trust the expertise of the biologist, e.g. • practical designs and protocols • knowledge of underlying mechanisms • evaluation of molecular evidence • RNA coding • visualization • …be willing to change routine practice in order to do better science

  9. Drosophila Example from Greg’s Lab at NC State • 2 sexes, 2 lines, 2 ages • 6 reps including dye swaps • 24 total two-color arrays • Split-plot design, with age treatments always together on same chip Jin et al. (2001) Nature Genetics 29: 389-395 http://brooks.statgen.ncsu.edu/ggibson/Pubs.htm
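
The layout on this slide can be enumerated to check the array count. This is a minimal sketch: the labels for sexes, lines, and ages are hypothetical placeholders, and the assumption that dye orientation alternates across replicates (three of six arrays per sex-line combination being swaps) is mine, not stated on the slide.

```python
from itertools import product

sexes = ["F", "M"]
lines = ["line1", "line2"]  # hypothetical labels
ages = ["age1", "age2"]     # hypothetical labels
reps_per_combo = 6

design = []
n_arrays = 0
for sex, line in product(sexes, lines):
    for rep in range(reps_per_combo):
        n_arrays += 1
        # Assumption: alternate dye orientation so half the replicates are swaps.
        cy3_age, cy5_age = ages if rep % 2 == 0 else ages[::-1]
        # Both ages always share one array: age is the split-plot factor.
        for dye, age in (("Cy3", cy3_age), ("Cy5", cy5_age)):
            design.append({"array": n_arrays, "sex": sex, "line": line,
                           "age": age, "dye": dye})

print(n_arrays, "arrays,", len(design), "channel measurements per gene")
```

Sex and line vary between arrays (whole plots), while age varies within each array (subplot), which is what makes this a split-plot design.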

  10. Analysis Approach: Mixed-Model ANOVA • Potential for direct modeling of the log2 intensity measurements instead of ratios • accommodation of known sources of variability across all of the arrays • a comprehensive analysis framework for complex experimental designs and unbalanced data • output useful for quality control, inference, and classification

  11. Toxicogenomics Example Stage 1: Fit Normalization Model log2(Y) = μ + trt + time + trt*time + animal + array(animal) + dye(array animal) + error Notes: • Random effects are in red; others are fixed. • Some preprocessing may be desirable (e.g. loess). • This model averages across all genes.

  12. Toxicogenomics Example (continued) Stage 2: Fit Gene-Specific Models For each of the 1700 genes, construct residuals from Stage 1 and use them as input to the following model: resid = μ + trt*time + animal + array(animal) + error
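
The two-stage flow can be sketched on simulated data. This is a deliberately simplified stand-in, not the mixed model from the slides: it uses ordinary least squares with no random effects and no dye channel, treats array as a fixed effect (which reduces Stage 1 to subtracting each array's mean across genes), and uses a hypothetical toy layout of 8 arrays and 50 genes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy layout: 2 trt x 2 time x 2 reps = 8 single-channel arrays.
n_genes = 50
trt = np.repeat([0.0, 1.0], 4)
time = np.tile(np.repeat([0.0, 1.0], 2), 2)
n_arrays = trt.size

# Simulated log2 intensities: gene means, array-wide shifts, and noise.
array_shift = rng.normal(0.0, 0.5, n_arrays)
gene_mean = rng.normal(8.0, 1.0, n_genes)
log2y = (gene_mean[:, None] + array_shift[None, :]
         + rng.normal(0.0, 0.1, (n_genes, n_arrays)))
log2y[0] += 2.0 * trt  # gene 0 is given a true treatment effect of 2

# Stage 1 (simplified): remove array-wide effects estimated across all genes.
stage1_resid = log2y - log2y.mean(axis=0, keepdims=True)

# Stage 2 (simplified): fit trt, time, and trt*time separately for every gene.
X = np.column_stack([np.ones(n_arrays), trt, time, trt * time])
beta, *_ = np.linalg.lstsq(X, stage1_resid.T, rcond=None)  # shape (4, n_genes)
trt_effect = beta[1]  # treatment effect at the baseline time, per gene

print("gene with largest |trt effect|:", int(np.argmax(np.abs(trt_effect))))
```

A faithful version would fit animal and array as random effects in mixed-model software; the sketch only shows how Stage 1 residuals feed the gene-by-gene fits.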

  13. Drosophila Example Stage 1: Fit Normalization Model log2(Y) = μ + sex + line + age + sex*line + sex*age + line*age + sex*line*age + array + dye*array + error Notes: • Random effects are in red; others are fixed. • Some preprocessing may be desirable (e.g. loess). • This model averages across all genes.

  14. Drosophila Example (continued) Stage 2: Fit Gene-Specific Models For each gene, construct residuals from Stage 1 and use them as input: resid = μ + sex + line + age + sex*line + sex*age + line*age + sex*line*age + array + error Note: The dye*array effect is dropped because it is completely confounded with error.
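
The note about dropping dye*array can be checked by counting degrees of freedom. This is a minimal sketch, assuming each gene contributes exactly one measurement per dye channel per array (an assumption; replicate spots would change the count).

```python
import numpy as np

n_arrays, n_dyes = 24, 2

# One row per measurement of a single gene: 24 arrays x 2 dye channels.
array = np.repeat(np.arange(n_arrays), n_dyes)
dye = np.tile(np.arange(n_dyes), n_arrays)

# A dye*array term needs one indicator column per (array, dye) cell.
cell = array * n_dyes + dye
X_cells = np.eye(n_arrays * n_dyes)[cell]

n_obs = array.size
rank = np.linalg.matrix_rank(X_cells)
resid_df = n_obs - rank
print("observations:", n_obs, "rank:", rank, "residual df:", resid_df)
```

With one column per observation, the dye*array term fits every gene-level data point exactly, so nothing is left over to estimate the error variance separately.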

  15. Rich Output Stage 3: Analyze Results from Stage 2 • Quality control using parameter estimates and residuals • Inference on effects and variance components of interest • Construct data to be used for clustering • Use dynamic visualization for exploration of data and results

  16. Live Software Analysis

  17. Six Affy Yeast Chips from Chi-Hse Teng, Pharmacia (MBSW, 2002)

  18. Compare Seven Statistical Models

  19. Seven Models (continued) (3) Mixed Model: (4) Adjust (3) as in Irizarry et al, using

  20. Seven Models (continued) Bivariate Mixed Model: (5) Test using (6) Test using

  21. Seven Models (continued) (7) Li-Wong:

  22. Treatment Effect Estimates • Methods in same order as listed previously • Red = gene has distance (||PM-MM||^2/J) greater than median • Blue = distance smaller than median
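
The coloring rule can be sketched as follows. This is an illustration on simulated values only: it assumes J is the number of PM/MM pairs per gene, and the gene and probe counts are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes; J is assumed to be the number of PM/MM pairs per gene.
n_genes, J = 6, 16
pm = rng.normal(9.0, 1.0, (n_genes, J))
gap = rng.uniform(0.0, 2.0, (n_genes, 1))  # gene-specific PM-MM separation
mm = pm - gap + rng.normal(0.0, 0.2, (n_genes, J))

# Squared Euclidean distance between the PM and MM vectors, scaled by J.
distance = np.sum((pm - mm) ** 2, axis=1) / J

# Red if the gene's distance exceeds the median across genes, blue otherwise.
color = np.where(distance > np.median(distance), "red", "blue")
print(list(zip(np.round(distance, 2), color)))
```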

  23. Negative Log P-Values

  24. That Pesky Mismatch Data • The MM value almost surely contains both true signal and cross-hybridization signal, but in what proportion? • Direct subtraction from PM adds noise; probe-level analyses are much more precise. • If the amount of cross-hybridization is proportionally constant, then analyzing on the log scale is appealing: log(c PM1) - log(c PM2) = log(PM1) - log(PM2)
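
The log-scale identity on this slide is easy to confirm numerically. A minimal sketch, with made-up PM intensities and a hypothetical constant c; log base 2 is used to match the earlier slides, though the identity holds for any base.

```python
import numpy as np

# Made-up PM intensities for the same probes under two conditions.
pm1 = np.array([1200.0, 850.0, 4300.0])
pm2 = np.array([600.0, 900.0, 2100.0])
c = 1.35  # hypothetical constant proportional inflation from cross-hybridization

diff_raw = np.log2(pm1) - np.log2(pm2)
diff_scaled = np.log2(c * pm1) - np.log2(c * pm2)

# log(c*a) - log(c*b) = log(c) + log(a) - log(c) - log(b), so c cancels.
same = np.allclose(diff_raw, diff_scaled)
print(same)
```

The cancellation only holds if c is the same in both conditions; if the cross-hybridization proportion differs between them, a log(c1/c2) term remains in the difference.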

  25. Simulation Study: Li-Wong vs. Mixed • Data simulated under Li-Wong model (left) and mixed model (right) • Simulated Type I error (top), testing power (middle), and fitting R^2 values (bottom) • Red = LW C.I. test • Green = Wald test from LW • Blue = t-test from mixed model

  26. Curiosity: Mixed vs. Bivariate Mixed • “V” shape • Covariance within probe set (gene) - red: σ_a^2 = 0 - black: σ_a^2 > 0

  27. Conclusions from Affy Example • Statistical methods differ in their results. • Subtracting mismatch adds noise; probe-level analyses are much more precise. • The bivariate mixed model, though computationally intensive, appears to have an edge. • Some papers online at http://statgen.ncsu.edu/ggibson/Pubs.htm

  28. Wrap Up • Excellent science involves: • well-formulated hypotheses • efficient experimental designs • intelligent combo of statistical inference and graphics • Collaboration must work through communication, personal, scientific, and organizational barriers; it requires mutual openness and trust.

  29. All Possible Pairs of Genes
