Quantitation of Gene Expression for High-Density Oligonucleotide Arrays: A SAFER Approach

Daniel Holder, Bill Pikounis, Richard Raubertas, Vladimir Svetnik, and Keith Soper. Biometrics Research, Merck Research Laboratories.


Presentation Transcript


  1. Quantitation of Gene Expression for High-Density Oligonucleotide Arrays: A SAFER Approach. Daniel Holder, Bill Pikounis, Richard Raubertas, Vladimir Svetnik, and Keith Soper. Biometrics Research, Merck Research Laboratories

  2. • Scale Matters • Additive Fits (probes and chips) • Experimental-Unit Variability • Robustness and Resistance

  3. Goals of Data Analysis • Which genes have we detected? • Which genes have changed? • Which genes change together? • Prerequisites: • Quantify transcript abundance (“gene expression index”) • Quantify precision • Assess quality

  4. Our Data Analysis Method • Normalize chips for overall fluorescence (based on MM)* • Transform data (linear-log hybrid scale) • Fit probe-specific model using all chips (highly resistant to outliers)* • Normalize for chip bias (scatterplot smooth)* • Assess differences (include between-experimental-unit (EU) variability, e.g., ANOVA)* (* = offers opportunities for QC)

  5. Fig 1: Hybrid Transformation (knot at c = 20). Axes: x vs. f(x). Curves: f(x) = x; f(x) = c*ln(x/c) + c; f(x) = hybrid(0, c)

  6. Linear-log Hybrid Scale: f(x) = a if x < a; f(x) = x if x in [a, c); f(x) = c*ln(x/c) + c if x ≥ c • Typically choose a = 0 • Value of c chosen for additivity • Improved homogeneity of variance • For low expression genes compare differences, not ratios
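The piecewise definition on this slide can be written out directly. The following is a minimal Python sketch, not the authors' implementation; the function name `hybrid` and the default values a = 0 and c = 20 (the knot shown in Fig 1) are chosen here for illustration.

```python
import math

def hybrid(x, a=0.0, c=20.0):
    """Linear-log hybrid scale: floor at a, identity on [a, c), log-like for x >= c.

    The defaults a=0 and c=20 are illustrative assumptions taken from the slides.
    """
    if x < a:
        return a
    if x < c:
        return x
    # At x == c this equals c, so the two pieces join without a jump.
    return c * math.log(x / c) + c

# Made-up example values, for illustration only.
values = [-5.0, 10.0, 20.0, 40.0, 400.0]
transformed = [hybrid(v) for v in values]
```

The logarithmic piece has slope c/x, which equals 1 at x = c, so the curve is smooth at the knot as well as continuous.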

  7. Probe Specific Effects • “Probe specific biases…are highly reproducible and predictable, and their adverse effect can be reduced by proper modeling and analysis methods” (Li and Wong, PNAS 2000) • Multiplicative model for PM - MM, for each probeset (ith chip, jth probe): PM_ij - MM_ij = θ_i φ_j + ε_ij • Resistance achieved by iteratively omitting extreme points (or chips) and refitting using least squares

  8. Probe Specific Effects (Our Approach) • For each probeset, resistant, additive fit to PM - MM * • Use a fitting procedure that is highly resistant to extreme values (median polish) *Since logs are undefined for non-positive values and unstable for small values, we use a linear-log hybrid scale
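Median polish, the resistant fitting procedure named on this slide, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the SAFER code: rows are taken to be chips and columns probes, the values are assumed to be already on the hybrid scale, the sweep order (rows first) follows Tukey's usual description, and the data matrix is made up.

```python
from statistics import median

def median_polish(table, max_iter=10):
    """Resistant additive fit: value = overall + row effect + column effect + residual."""
    resid = [list(r) for r in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0
    row_eff = [0] * n_rows
    col_eff = [0] * n_cols
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        row_med = [median(r) for r in resid]
        resid = [[v - m for v in r] for r, m in zip(resid, row_med)]
        row_eff = [e + m for e, m in zip(row_eff, row_med)]
        # Move the median of the column effects into the overall term.
        shift = median(col_eff)
        col_eff = [e - shift for e in col_eff]
        overall += shift
        # Sweep column medians out of the residuals into the column effects.
        col_med = [median(r[j] for r in resid) for j in range(n_cols)]
        resid = [[v - col_med[j] for j, v in enumerate(r)] for r in resid]
        col_eff = [e + m for e, m in zip(col_eff, col_med)]
        # Move the median of the row effects into the overall term.
        shift = median(row_eff)
        row_eff = [e - shift for e in row_eff]
        overall += shift
        # Stop once a full pass leaves every median at zero.
        if all(m == 0 for m in row_med) and all(m == 0 for m in col_med):
            break
    return overall, row_eff, col_eff, resid

# Made-up 3 chips x 5 probes matrix: exactly additive except one outlier (159).
chips = [
    [0, 26, 34, 91, 107],
    [2, 28, 36, 93, 159],
    [17, 43, 51, 108, 124],
]
overall, chip_effects, probe_effects, residuals = median_polish(chips)
```

In this example the single aberrant value ends up entirely in the residuals and leaves the chip and probe effects untouched, which is the sense in which the fit is resistant.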

  9. Adjusting for Chip Bias • Initial centering of chips • Chip bias may depend on gene expression level • Plot chip effects vs. overall expression level (grand median) for each probeset • Omit probesets that appear to change • Rank by the ratio of between-group |dev| to within-group |dev| • Omit probesets in the top 25% • Fit a resistant scatterplot smoother (loess)

  10. Fig 4: Typical Chip Normalization Plot. Axes: grand median (x) vs. chip effects, hybrid scale (y). Data: 5 groups × 2 chips/group, 7.1K probesets

  11. Terry Speed questions 3. How do you tell that one approach to quantifying expression at the probe set level (e.g., SAFER) is better than another (e.g., dChip)? • Compare on data for which we ‘know’ the answer • Spiking experiments (limited # genes) • Validation (e.g., TaqMan) • Create POS and NEG groups as best we can • How to compare (depends on down-stream usage) • repeatability • e.g., signal to noise ⇛ t-statistic ⇛ p-value • fold changes

  12. Fibroblast/Adipocyte Mixing Expt • Mixture percentages (100/0, 75/25, 50/50, 25/75, 0/100) • 3 chips/mix (15 chips total, Mg74A) • 3 methods (SAFER, SAFER(log), dChip) • Create groups of probesets using 100/0 vs. 0/100 • POS (max p < 0.01, correct oligos, n=1049) • NEG (incorrect oligos, n=2611) • p-value from t-test (pooled variance, hybrid scale) • We will change the POS, NEG and p-value definitions on some of the later slides
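The pooled-variance t-statistic behind these p-values is standard and can be sketched as follows. This is a generic two-sample computation with made-up inputs, not the authors' code; the function name `pooled_t` is hypothetical, and the inputs are assumed to be expression values for one probeset on the hybrid scale.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t-statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df

# Made-up expression values for 3 chips per group.
t_stat, df = pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A two-sided p-value would then come from the t distribution with `df` degrees of freedom, for example via `scipy.stats.t.sf` if SciPy is available.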

  13. Fibroblast/Adipocyte Mixing Expt (2) • Performance based on 75/25 vs 25/75 • p-values from t-test (pooled variance, hybrid) • for POS require same sign as 100/0 vs 0/100 • pos rate, false pos rate (FPR), pos rate vs FPR • Linearity?
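The positive rate and false positive rate described here reduce to simple counting once p-values and signs are in hand. The sketch below is an assumption-laden illustration: the function name, the threshold `alpha`, and all input values are made up, and a POS probeset counts as a hit only when its 75/25 vs 25/75 difference has the same sign as in 100/0 vs 0/100.

```python
def positive_rates(pos_p, pos_same_sign, neg_p, alpha):
    """Return (positive rate among POS, false positive rate among NEG) at threshold alpha."""
    pos_hits = sum(1 for p, ok in zip(pos_p, pos_same_sign) if p < alpha and ok)
    neg_hits = sum(1 for p in neg_p if p < alpha)
    return pos_hits / len(pos_p), neg_hits / len(neg_p)

# Made-up p-values from the 75/25 vs 25/75 comparison.
pos_p = [0.001, 0.02, 0.2, 0.004]
pos_same_sign = [True, True, True, False]  # sign agreement with 100/0 vs 0/100
neg_p = [0.5, 0.03, 0.8, 0.9, 0.6]

pos_rate, fpr = positive_rates(pos_p, pos_same_sign, neg_p, alpha=0.05)
```

Sweeping `alpha` over a grid and plotting the resulting pairs gives a positive rate vs FPR curve of the kind shown in the later figures.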

  14. Fig 5: CDF for 0% vs 100% (all probesets). Curves: SAFER, SAFER log, dChip. n = 12,654

  15. Fig 6: CDFs for POS and NEG probesets. Panels: 0% vs 100% POS; 0% vs 100% NEG; 25% vs 75% POS; 25% vs 75% NEG. Curves: SAFER, SAFER log, dChip, uniform dist. POS: max p < 0.01 (n = 1049); NEG: wrong sequence (n = 2611)

  16. Fig 7: Positive Rate vs ‘False’ Positive Rate, 25% vs 75%. Curves: SAFER, SAFER log, dChip. POS: max p < 0.01 (n = 1049); NEG: wrong seq. (n = 2611)

  17. Fig 8: Positive Rate vs ‘False’ Positive Rate (log scale), 25% vs 75%. Curves: SAFER, SAFER log, dChip. POS: max p < 0.01 (n = 1049); NEG: wrong seq. (n = 2611)

  18. Fig 9: Positive Rate vs ‘False’ Positive Rate (log scale), 25% vs 75%, dChip p-values used for dChip. Curves: SAFER, SAFER log, dChip. POS: max p < 0.01 (n = 1038); NEG: wrong seq. (n = 2611)

  19. Fig 10: Positive Rate vs ‘False’ Positive Rate (log scale), 25% vs 75%. Curves: SAFER, SAFER log, dChip. POS: rank(dChip(p)) < 1000; NEG: wrong seq. & rank(dChip(p)) > 2611-1000

  20. Fig 11: Boxplot of R2 values for POS probesets. Boxes: SAFER, SAFER(log), dChip. POS: max p < 0.01 (n = 1049)

  21. Fig 12: Boxplot of R2 values for POS probesets, excluding the 100/0 and 0/100 groups. Boxes: SAFER, SAFER(log), dChip. POS: max p < 0.01 (n = 1049)

  22. Terry Speed questions 1. Do you lose anything by not being able to down-weight non-performing probe pairs in the way Li & Wong can with their phi's (i.e., probe effects)? (Models shown: Li & Wong, SAFER.) Response: We don’t know. • Down-weighting non-performing probes seems like a good idea. • Is up-weighting ‘bright’ probes good? (variability, saturation) • Possible to incorporate weighting in polishing step.

  23. Terry Speed questions 2. Is SAFER QC as thorough as Li & Wong's (in detecting aberrant chips, probe-sets, probe pairs)? Response: QC is not as thorough, but: • Primary goal is to quantitate mRNA detection (and error). Explicit QC methods aimed at avoiding the effects of aberrant arrays, probes, and individual observations are less important when resistant methods are used. • SAFER provides the same raw materials (fitted values and residuals) for QC as Li and Wong. QC summaries can easily be made available.

  24. Conclusions • For these data, it appears that the SAFER method performs better than dChip. • Better sensitivity (ROC Curve) • Slightly Better Linearity • Caveat: This is one analysis of one dataset.

  25. Acknowledgments • Biometrics Research • Bert Gunter • Other • David Gerhold (Pharmacology) • John Thompson (Immunology) • Eric Muise (Immunology) • Karen Richards (Drug Metabolism) • Jian Xu (Pharmacology) • Yuhong Wang (Bioinformatics)

  26. Backups

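  27. Example Median Polish (3 chips, 5 probes) • Grand median: 36 • Probe effects (probes 1 to 5): -34, -8, 0, 57, 73 • Chip effects (chips 1 to 3): -2, 0, 15 • Intensities: chip 1: 0, 26, 29, 92, 111; chip 2: 0, 0, 36, 93, 109; chip 3: 31, 43, 51, 106, 121 • Residuals: chip 1: 0, 0, -5, 1, 4; chip 2: -2, 28, 0, 0, 0; chip 3: 14, 0, 0, -2, -3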

  28. Fig 2: Choose c using P-values from Tukey Non-additivity Test. Axis: P-value. Scales compared: Hybrid(0,1), Hybrid(0,20), Hybrid(0,40), raw scale. Data: 5 groups × 2 chips/group, 7.1K probesets

  29. Fig 3: Within Group SD, Hybrid Scale. Axes: grand effect vs. within group SD. Data: 5 groups × 2 chips/group, 7.1K probesets

  30. Fig 9: Between EU variability as a percentage of Total variability. Panels: all probesets; probesets with mean > 50 (hybrid). Axes: grand median vs. 100*VarBetween/(VarBetween + VarWithin). P = known expressed; line = loess smooth. Data: 15 human livers × 2 chips/liver, 1.5K probesets
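The quantity on the vertical axis, 100*VarBetween/(VarBetween + VarWithin), can be estimated for one probeset with a one-way random-effects ANOVA. The sketch below is one common way to do that, not necessarily the estimator used for the figure: it assumes a balanced design (the same number of chips per experimental unit), method-of-moments estimates, and truncation of negative between-unit estimates at zero. The function name and data are hypothetical.

```python
from statistics import mean

def percent_between(units):
    """Percent of total variance attributed to between-unit variability.

    units: one list of replicate values per experimental unit (balanced design assumed).
    """
    k, n = len(units), len(units[0])
    unit_means = [mean(u) for u in units]
    grand = mean(unit_means)
    ms_between = n * sum((m - grand) ** 2 for m in unit_means) / (k - 1)
    ms_within = sum(
        (v - m) ** 2 for u, m in zip(units, unit_means) for v in u
    ) / (k * (n - 1))
    # Method-of-moments estimate; negative estimates are truncated at zero.
    var_between = max((ms_between - ms_within) / n, 0.0)
    total = var_between + ms_within
    return 100.0 * var_between / total if total > 0 else 0.0

# Made-up values: 3 experimental units x 2 chips each.
pct = percent_between([[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]])
```

Here the units differ far more than the replicate chips within a unit, so nearly all of the variability is attributed to the between-unit component.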

  31. dChip vs SAFER differences. Panels: 0% vs 100% (all probesets); 0% vs 100% (POS probesets); 25% vs 75% (all probesets); 25% vs 75% (POS probesets). POS: max p < 0.01 (n = 1049)

  32. Positive Rate vs ‘False’ Positive Rate (log scale), 25% vs 75%. Curves: SAFER, SAFER log, dChip. POS: max p < 0.01 (n = 1049); NEG: wrong seq. & min p > 0.5 (n = 270)
