
A review of quality control and pre-processing measures for the Illumina 450K BeadChip

Randa Stringer. Supervisor: Dr. Guillaume Paré


Presentation Transcript


  1. A review of quality control and pre-processing measures for the Illumina 450K BeadChip • Randa Stringer • Supervisor: Dr. Guillaume Paré

  2. Steps for Review • Sample Quality • Probe Quality • Background correction • Normalization • Cellular composition • Batch effects

  3. Array Design • > 485,000 CpG sites • Covers 99% of RefSeq genes • Average of 17 sites per gene • Distributed across promoter, 5’ UTR, first exon, gene body, and 3’ UTR • Covers 96% of known CpG islands

  4. Sample Quality • Reported vs. predicted sex • Use DNA methylation to predict sex • minfi: getSex function • If yMed - xMed is less than the cutoff, the sample is predicted to be female; otherwise male • Sample detection cut-offs • Threshold of failed probes in a sample (usually < 0.05 or 0.01)
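The two sample-level checks above can be sketched in a few lines of Python. This is a minimal illustration, not minfi's implementation: the function names, the input layout (lists of log2 intensities and detection p-values), and the default cutoff of -2 are assumptions made for the example.

```python
from statistics import median

def predict_sex(x_intensities, y_intensities, cutoff=-2.0):
    """Predict sex from log2 total intensities of X- and Y-chromosome probes.

    Follows the rule on the slide: if yMed - xMed is below the cutoff the
    sample is called female ("F"), otherwise male ("M").
    The default cutoff is an assumption for illustration.
    """
    x_med = median(x_intensities)
    y_med = median(y_intensities)
    return "F" if (y_med - x_med) < cutoff else "M"

def failed_probe_fraction(detection_pvalues, p_cutoff=0.01):
    """Fraction of probes in one sample whose detection p-value exceeds p_cutoff."""
    failed = sum(1 for p in detection_pvalues if p > p_cutoff)
    return failed / len(detection_pvalues)
```

A sample whose predicted sex disagrees with the reported sex, or whose failed-probe fraction is above the chosen threshold, would be flagged for review or exclusion.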

  5. Probe Quality • Probe detection cut-offs • Bead count (> 3) • Remove probes on sex chromosomes • Probes containing SNPs (MAF > 1%) • Cross-reactive probes
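The probe-level filters above could be combined as in the following sketch. The dictionary layout of a probe record, the function name, and all default thresholds are hypothetical and chosen only for illustration; in practice the SNP and cross-reactivity flags would come from published annotation lists.

```python
def filter_probes(probes, p_cutoff=0.01, min_beads=3, max_fail_fraction=0.05):
    """Return the IDs of probes that pass all quality filters.

    Each probe is a dict (hypothetical layout) with keys:
      id, chrom, has_snp, cross_reactive,
      detection_p (one p-value per sample), bead_counts (one count per sample).
    """
    kept = []
    for probe in probes:
        # Drop probes on the sex chromosomes.
        if probe["chrom"] in ("X", "Y"):
            continue
        # Drop probes flagged as SNP-containing or cross-reactive.
        if probe["has_snp"] or probe["cross_reactive"]:
            continue
        # Count samples in which the probe fails detection or has too few beads.
        failed = sum(
            1
            for p, beads in zip(probe["detection_p"], probe["bead_counts"])
            if p > p_cutoff or beads < min_beads
        )
        if failed / len(probe["detection_p"]) > max_fail_fraction:
            continue
        kept.append(probe["id"])
    return kept
```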

  6. Background Correction • Background subtraction method • Available in GenomeStudio • Background calculated from negative control probes is subtracted from all probes, separately for each channel (red vs. green) (GenomeStudio Methylation Module v1.8 User Guide)
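A minimal sketch of per-channel background subtraction follows. The slide does not say which summary of the negative controls is used, so the median default here is an assumption, as are the function names and the choice to floor corrected intensities at zero.

```python
from statistics import median

def background_subtract(intensities, negative_controls, summary=median):
    """Subtract a background level estimated from negative control probes.

    `summary` is the statistic applied to the negative controls; the median
    default is an assumption, not the documented GenomeStudio choice.
    Corrected values are floored at zero (also an assumption).
    """
    background = summary(negative_controls)
    return [max(value - background, 0.0) for value in intensities]

def correct_sample(red, green, neg_red, neg_green):
    """Apply the subtraction separately to the red and green channels."""
    return (
        background_subtract(red, neg_red),
        background_subtract(green, neg_green),
    )
```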

  7. Normalization • Goal: reduce non-biological variation • Equalizes probe intensity and signal distributions across arrays and between colour channels • New challenges with DNA methylation vs. gene expression techniques • Systematic/technical variation • Novel probe design

  8. Normalization for Illumina 450K • Problem: 2-type probe design • Infinium I probe: 2 different probes per CpG • Infinium II probe: single base extension at CpG Maksimovic et al. Genome Biology 2012

  9. CpG Content • Infinium II ≤ 3, Infinium I ≥ 3 • Compressed β value distribution in InfII • Solution: scale Infinium II probes to InfI probes Maksimovic et al. Genome Biology 2012

  10. Normalization to Internal Controls • Illumina GenomeStudio • Probe intensity multiplied by constant normalization factor (NF) • NF calculated as average of controls in a reference sample (GenomeStudio Methylation Module v1.8 User Guide) • Doesn’t account for the InfI vs. InfII probe issues
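As a rough illustration of scaling by a constant factor, the sketch below multiplies every probe intensity in a sample by the ratio of the reference sample's control average to that sample's own control average. The exact way GenomeStudio forms and applies the factor is not given on the slide, so this ratio, like the function name, is an assumption.

```python
from statistics import mean

def control_normalize(sample_intensities, sample_controls, reference_controls):
    """Scale one sample so its control-probe average matches the reference sample.

    Assumed form of the normalization factor (NF):
        NF = mean(reference controls) / mean(this sample's controls)
    """
    factor = mean(reference_controls) / mean(sample_controls)
    return [value * factor for value in sample_intensities]
```

Because the same factor is applied to every probe, the relative difference between InfI and InfII probes within a sample is left unchanged.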

  11. Peak-Based Correction (PBC) • Uses peak summits to correct β values • Convert β to M values • Determine peaks for I and II probes with kernel density estimation • Rescale M values by peak summits • Rescale these corrected M values to the I range and convert back to β values • Figure panels: Raw; PBC Dedeurwaerder et al. Epigenomics 2011
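The conversion and rescaling steps can be sketched as follows. The β-to-M conversion uses the standard logit relationship M = log2(β / (1 - β)). The rescaling function is a simplified reading of the method: peak summits are passed in as inputs (the kernel density step is not shown), and the function names and the clamping constant are assumptions.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a beta value to an M value, clamping away from 0 and 1."""
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))

def m_to_beta(m):
    """Convert an M value back to a beta value."""
    return 2 ** m / (1 + 2 ** m)

def rescale_type2(m_value, peaks_type1, peaks_type2):
    """Rescale one InfII M value toward the InfI range.

    Each `peaks_*` argument is (unmethylated_peak, methylated_peak) in M space.
    Negative M values are scaled by the ratio of unmethylated peaks,
    non-negative M values by the ratio of methylated peaks (simplified).
    """
    u1, m1 = peaks_type1
    u2, m2 = peaks_type2
    if m_value < 0:
        return m_value * (u1 / u2)
    return m_value * (m1 / m2)
```

A corrected InfII β value would then be `m_to_beta(rescale_type2(beta_to_m(beta), peaks_type1, peaks_type2))`.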

  12. Subset Quantile Normalization (SQN) • Modeled after SQN methods in expression • Probes separated and probes with poor detection removed • ‘Anchors’ (RQs) chosen from InfI probes • Target quantiles are estimated for InfI and II • InfI and II normalized to their RQs • Dataset is rebuilt Touleimat and Tost, Epigenomics, 2012
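The core operation in the normalization step, mapping a set of values onto reference quantiles by rank, can be sketched as below. This shows only that mapping; how the anchors are chosen and how target quantiles are estimated per probe category is not reproduced, and the function name is hypothetical.

```python
def quantile_normalize_to_reference(values, reference_quantiles):
    """Replace each value with the reference quantile at the same relative rank.

    Ranks are mapped linearly onto the sorted reference quantiles, with linear
    interpolation when the two lists differ in length.
    """
    n = len(values)
    ref = sorted(reference_quantiles)
    order = sorted(range(n), key=lambda i: values[i])
    normalized = [0.0] * n
    for rank, idx in enumerate(order):
        # Position of this rank on the reference quantile axis.
        pos = rank / (n - 1) * (len(ref) - 1) if n > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, len(ref) - 1)
        frac = pos - lo
        normalized[idx] = ref[lo] + frac * (ref[hi] - ref[lo])
    return normalized
```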

  13. SQN Cont’d • Figure panels: No normalization; Unique RQs; RQs by ‘relation to CpG’; RQs by ‘relation to gene sequence’ Maksimovic et al. Genome Biology 2012

  14. Subset Within-Array Normalization (SWAN) • Allows InfI and InfII probes to be normalized together • Subset of N InfI and InfII probes chosen based on underlying CpG content • Separate methylated and unmethylated channels • Mean intensity for each of 3N calculated • InfI and II probes adjusted separately by linear interpolation Maksimovic et al. Genome Biology 2012

  15. Beta-Mixture Quantile normalization (BMIQ) • Novel normalization method • Fit a 3-state (U/H/M) beta mixture model to InfI and InfII probes separately • Transform InfII U and M probes using the inverse of the cumulative beta distribution estimated from the respective InfI probes • For H probes, perform a dilation transformation to fit the data into the gap Teschendorff et al. Bioinformatics 2012

  16. START Data • Figure panels: Raw Data; SWAN Normalized

  17. Cellular Composition • Figure adapted from Correa-Rocha et al. Pediatric Research 2012

  18. Estimations by Houseman • Houseman et al. BMC Bioinformatics 2012

  19. Batch Effects • Can be assessed using principal component analysis or variations on singular value decomposition (e.g., surrogate variable analysis, sva) • ComBat method uses a parametric or non-parametric empirical Bayes framework to adjust for a known source of batch effects
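To make the idea of adjusting for a known batch concrete, the sketch below applies a plain per-batch location and scale standardization to the values of one probe. This is not ComBat: there is no empirical Bayes shrinkage across probes and no protection of biological covariates. It only illustrates the kind of batch-wise mean and variance adjustment that ComBat refines, and the function name is hypothetical.

```python
from statistics import mean, pstdev

def standardize_by_batch(values, batches):
    """Shift and scale each batch of one probe's values to a common mean and spread.

    Each batch is centered on its own mean, scaled by its own standard
    deviation, then mapped onto the grand mean and the pooled within-batch
    standard deviation. Batches with zero spread are only re-centered.
    """
    grand_mean = mean(values)
    groups = {}
    for i, label in enumerate(batches):
        groups.setdefault(label, []).append(i)

    batch_mean = {b: mean(values[i] for i in idx) for b, idx in groups.items()}
    batch_sd = {b: pstdev([values[i] for i in idx]) for b, idx in groups.items()}

    # Pooled within-batch variance across all samples.
    pooled_var = sum(
        (values[i] - batch_mean[b]) ** 2 for b, idx in groups.items() for i in idx
    ) / len(values)
    pooled_sd = pooled_var ** 0.5

    adjusted = [0.0] * len(values)
    for b, idx in groups.items():
        for i in idx:
            z = (values[i] - batch_mean[b]) / batch_sd[b] if batch_sd[b] > 0 else 0.0
            adjusted[i] = grand_mean + z * pooled_sd
    return adjusted
```

If batch is confounded with the biological variable of interest, an adjustment like this removes real signal along with the batch effect, which is one reason covariate-aware methods are preferred.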

  20. Singular Value Decomposition (START)

  21. Questions & Discussion
