
AffyDEComp: towards a benchmark for differential expression methods




  1. AffyDEComp: towards a benchmark for differential expression methods Richard Pearson School of Computer Science University of Manchester

  2. Overview • Why benchmark DE methods? • The Golden Spike data set • AffyDEComp • Conclusions • Recommendations

  3. The need for benchmarks • Microarray analysis has many stages • Competing methods at each stage • Methodologists good at showing superiority • Results can appear contradictory • End users confused: choice driven by… • What they are familiar with • What colleagues use • What was used in their favourite paper • …and not by a scientific comparison

  4. Benchmarking requirements • Methods: a set we wish to compare • Benchmark data: where truth is known • Metrics: by which to compare methods • Affycomp • Methods: Summarisation methods • Benchmark data: various spike-in studies • Metrics: various, including, e.g. area under ROC curve for a fold change classifier • Affycomp doesn’t compare DE methods
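The "area under ROC curve for a fold change classifier" metric above can be sketched as a small Python helper. This is a hypothetical illustration (the function name and data layout are mine, not Affycomp's): AUC computed as the Mann-Whitney statistic, i.e. the probability that a randomly chosen true positive outranks a randomly chosen true negative.

```python
def auc_fold_change(scores, labels):
    """AUC for a classifier that ranks probesets by score
    (e.g. absolute log fold change); labels marks the truly
    differentially expressed probesets.

    Counts ties as half a win, giving the Mann-Whitney form of AUC.
    """
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation of spiked-in (True) from background (False) gives AUC 1.0:
auc_fold_change([2.0, 1.5, 0.3, 0.1], [True, True, False, False])  # → 1.0
```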

  5. A benchmark for DE methods • Methods: • DE methods depend on summarisation • Compare summarisation/DE combinations • Benchmark data: • Affycomp spike-ins have few DE genes • Golden spike data has many DE genes, but also a few “issues”! • Metrics: • Based around areas under ROC curves

  6. The Golden Spike data • 3 “sample”, 3 “control” arrays • Many RNAs “spiked-in” at known levels • “DE”, “Equal” and “Empty” probesets. • Controversial data set • Non-uniform null p-value distributions - use ROC • Spike-in concentrations high - unrepresentative • “DE” spike-ins all up-regulated - unrepresentative • Concentrations and FC confounded - loess • Different FC between “Equal” and “Empty”

  7. “Empty” shows greater FC than “Equal” • Most analyses have treated both Empty and Equal as True Negatives - to what effect?

  8. “Empty” shows greater FC than “Equal” • To illustrate how analysis choices affect results, I’ll treat both Empty and Equal as true negatives (TN) and DE ≤ 1.2 spike-ins as true positives (TP)

  9. 2-sided test • Large apparent difference between methods • Can you guess which paper used this chart?

  10. 2-sided test • Large apparent difference between methods • Are TP correctly identified as up-regulated?

  11. 1-sided test of up-regulation • Probesets identified as up-regulated are not TP

  12. 1-sided test of down-regulation • DE probesets are mostly identified as down-regulated, despite in truth being up-regulated. We appear to be identifying TP as down-regulated
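The contrast between the two-sided and one-sided tests in slides 9-12 can be reproduced with SciPy's independent-samples t-test. The expression values below are made up for illustration; only the test directions come from the slides.

```python
from scipy import stats

# Hypothetical summarised log2 expression values for one up-regulated
# spike-in: three "sample" arrays vs three "control" arrays.
sample = [8.4, 8.6, 8.5]
control = [7.0, 7.2, 7.1]

two_sided = stats.ttest_ind(sample, control)                  # DE in either direction
up = stats.ttest_ind(sample, control, alternative='greater')  # test of up-regulation
down = stats.ttest_ind(sample, control, alternative='less')   # test of down-regulation

# A genuinely up-regulated probeset scores well on the two-sided and
# one-sided "greater" tests, but badly on the test of down-regulation.
```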

  13. DE ≤ 1.2 lower than Empty • TP are identified as down-regulated because most TN are “Empty” probesets, which have higher FC than the DE ≤ 1.2 spike-ins

  14. Remove “empty” probesets • We can remedy this by using just Equal probesets as our TN… • …bearing in mind that this makes the data somewhat atypical

  15. Up-regulation - Empty in TN • Probesets identified as up-regulated are generally not TP when Empty is included in TN

  16. Up-regulation - TN Equal • Probesets identified as up-regulated are more likely to be TP when using only Equal as TN

  17. Down-regulation - Empty in TN • DE probesets are mostly identified as down-regulated, despite in truth being up-regulated. We appear to be identifying TP as down-regulated when including Empty in TN

  18. Down-regulation - TN Equal • We generally don’t identify TP as down-regulated when excluding Empty from TN

  19. “Recommended” test • We recommend using just Equal as TN, and all DE as TP
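The recommended choice of truth sets can be written as a small helper. The probeset categories ("DE", "Equal", "Empty") follow the Golden Spike labels, but the function name and data layout here are hypothetical:

```python
def truth_sets(categories, empty_in_tn=False):
    """Boolean TP/TN masks for Golden Spike probeset categories
    ('DE', 'Equal' or 'Empty').

    The recommended choice (empty_in_tn=False) takes all DE spike-ins
    as true positives and only Equal spike-ins as true negatives,
    dropping Empty probesets from the evaluation altogether.
    """
    tp = [c == 'DE' for c in categories]
    tn = [c == 'Equal' or (empty_in_tn and c == 'Empty') for c in categories]
    return tp, tn

cats = ['DE', 'DE', 'Equal', 'Empty']
tp, tn = truth_sets(cats)  # recommended: Empty is neither TP nor TN
# tp == [True, True, False, False]; tn == [False, False, True, False]
```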

  20. Recommended Up-reg • Using our recommendations, tests of up-regulation generally find TP, as expected

  21. Recommended Down-reg • Using our recommendations, tests of down-regulation generally don’t find TP, as expected

  22. Analysis decisions to make • Summarisation method • DE method • Direction of DE (recommend up) • Choice of true negatives (equal only) • Choice of true positives (all DE) • Post-summarisation normalisation (loess using equal only) • Type of ROC chart (standard ROC) • Proportion of x-axis to display (all)
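The "loess using equal only" decision above can be sketched with statsmodels' lowess as a stand-in for loess: fit the intensity-dependent trend on Equal probesets only, then subtract it from every probeset. The function, data layout, and synthetic check are all assumptions for illustration, not AffyDEComp's implementation.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def loess_normalise(M, A, is_equal, frac=0.5):
    """Post-summarisation loess-style normalisation: fit a lowess curve
    of M (log-ratio) against A (mean log-intensity) using only the
    'Equal' probesets, then subtract the fitted trend everywhere."""
    fit = lowess(M[is_equal], A[is_equal], frac=frac)  # rows sorted by A
    trend = np.interp(A, fit[:, 0], fit[:, 1])
    return M - trend

# Synthetic check: an intensity-dependent bias on the Equal probesets
# should be flattened to roughly zero after normalisation.
rng = np.random.default_rng(0)
A = rng.uniform(4, 12, 500)
is_equal = np.arange(500) < 400
M = 0.1 * (A - 8) + rng.normal(0, 0.01, 500)  # bias plus small noise
M_norm = loess_normalise(M, A, is_equal)
```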

  23. AffyDEComp - charts

  24. AffyDEComp - comparison

  25. AUCs - recommended choices

  26. Conclusions • First step towards a reliable benchmark for DE • Golden Spike data has some value if use of empty probesets is revisited • Certain combinations of summarisation/DE methods seem poor • Keep it open (Bioconductor) - because science should be reproducible!

  27. Recommendations • Create a new spike-in data set where • Spike-in concentrations are realistic • DE spike-ins both up- and down-regulated • Concentrations and FC not confounded • Larger number of arrays • Benchmarks using regulatory information • Benchmarks for Illumina data • Benchmarks for SNP chips (GWA studies) • manchester.ac.uk/bioinformatics/affydecomp
