Microarray Pre-processing, quality control and normalization

braith

Presentation Transcript


  1. Microarray Pre-processing, quality control and normalization

  2. Practical Problems 1 • Comet Tails • Likely caused by insufficiently rapid immersion of the slides in the blocking solution.

  3. Practical Problems 2

  4. Practical Problems 3 • High Background • Two likely causes: • Insufficient blocking. • Precipitation of the labeled probe. • Weak Signals

  5. Practical Problems 4 • Spot overlap • Likely cause: too much rehydration during post-processing.

  6. Practical Problems 5 Dust

  7. Steps in Image Processing 1. Addressing: locate the spot centers. 2. Segmentation: classify pixels as either signal or background (e.g. using seeded region growing). 3. Information extraction: for each spot of the array, calculate signal intensity pairs, background and quality measures.

  8. Steps in Image Processing 3. Information Extraction • Spot Intensities • mean (pixel intensities). • median (pixel intensities). • Pixel variation (IQR of log (pixel intensities)). • Background values • Local • Morphological opening • Constant (global) • None • Quality Information (Figure labels: Signal, Background)
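
The spot summaries listed on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `summarize_spot`, the dictionary keys, and the choice of the median of the local background pixels as the background estimate are assumptions, not something the slides specify.

```python
import math
import statistics


def summarize_spot(signal_pixels, background_pixels):
    """Summarize one spot from its segmented pixel intensities.

    Hypothetical helper: the names and the background estimator
    (median of local background pixels) are illustrative assumptions.
    """
    # Pixel variation: interquartile range of the log2 pixel intensities.
    logs = [math.log2(p) for p in signal_pixels if p > 0]
    q1, _, q3 = statistics.quantiles(logs, n=4)

    local_background = statistics.median(background_pixels)
    spot_median = statistics.median(signal_pixels)
    return {
        "mean": statistics.mean(signal_pixels),
        "median": spot_median,
        "log2_iqr": q3 - q1,
        "background": local_background,
        "median_minus_background": spot_median - local_background,
    }


spot = summarize_spot([100, 200, 300, 400, 500], [40, 50, 60])
print(spot)
```

Choosing "None" as the background option from the slide would correspond to skipping the subtraction and using the mean or median directly.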

  9. Addressing • This is the process of assigning coordinates to each of the spots. • Automating this part of the procedure permits high-throughput analysis. (Figure: 4 by 4 grids, 19 by 21 spots per grid)

  10. Addressing • Registration

  11. Problems in automatic addressing • Misregistration of the red and green channels • Rotation of the array in the image • Skew in the array (Figure label: Rotation)

  12. Segmentation methods • Fixed circles • Adaptive Circle • Adaptive Shape • Edge detection. • Seeded Region Growing (R. Adams and L. Bischof, 1994): regions grow outwards from the seed points preferentially according to the difference between a pixel’s value and the running mean of values in an adjoining region. • Histogram Methods • Adaptive threshold.
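
A minimal sketch of seeded region growing, written to illustrate the description on this slide rather than to reproduce any particular software. The function name, the 4-neighbour connectivity, and the toy image are assumptions. One simplification to note: each candidate pixel's priority is computed from the region mean at the time it is queued and is not refreshed afterwards.

```python
import heapq


def seeded_region_growing(image, seeds):
    """Label every pixel of a 2D list `image` by growing from seed pixels.

    `seeds` maps a region label to a list of (row, col) seed positions.
    Pixels are absorbed in order of |pixel value - region mean|.
    """
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    sums, counts = {}, {}
    heap = []
    order = 0  # tie-breaker so heap entries never compare labels

    def queue_neighbours(r, c, label):
        nonlocal order
        mean = sums[label] / counts[label]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] is None:
                delta = abs(image[rr][cc] - mean)
                heapq.heappush(heap, (delta, order, rr, cc, label))
                order += 1

    # Initialise each region with its seed pixels.
    for label, points in seeds.items():
        sums[label], counts[label] = 0.0, 0
        for r, c in points:
            labels[r][c] = label
            sums[label] += image[r][c]
            counts[label] += 1
    for label, points in seeds.items():
        for r, c in points:
            queue_neighbours(r, c, label)

    # Grow: always absorb the candidate closest to its region's mean.
    while heap:
        _, _, r, c, label = heapq.heappop(heap)
        if labels[r][c] is not None:
            continue  # already claimed by some region
        labels[r][c] = label
        sums[label] += image[r][c]
        counts[label] += 1
        queue_neighbours(r, c, label)
    return labels


# Toy 5x5 image: a bright 3x3 "spot" on a dim background.
image = [[100 if 1 <= r <= 3 and 1 <= c <= 3 else 10 for c in range(5)]
         for r in range(5)]
labels = seeded_region_growing(image, {"spot": [(2, 2)], "background": [(0, 0)]})
spot_size = sum(row.count("spot") for row in labels)
```

Because regions follow pixel values rather than a fixed geometry, the resulting spot mask does not have to be circular.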

  13. Limitation of fixed circle method (Figure panels: SRG, Fixed Circle)

  14. Limitation of circular segmentation • Small spot • Not circular (Figure: results from SRG)

  15. Information Extraction • Spot Intensities • mean (pixel intensities). • median (pixel intensities). • Background values • Local • Morphological opening • Constant (global) • None • Quality Information (Figure annotation: take the average)

  16. Local Backgrounds

  17. Quality Measurements • Array • Correlation between spot intensities. • Percentage of spots with no signals. • Distribution of spot signal area. • Spot • Signal / Noise ratio. • Variation in pixel intensities. • Identification of “bad spot” (spots with no signal). • Ratio (2 spots combined) • Circularity

  18. QC implementation • marray and arrayQuality packages in Bioconductor (R) can help identify dye, hybridization and other experimental artifacts • Bioconductor: http://www.bioconductor.org/ • R: http://www.r-project.org/

  19. Why Normalization? • Many sources of systematic variation that affect measured gene expression. • Differences in labeling efficiency of red and green dyes • Print-tip effects • Array batch effects

  20. Within-Slide Normalization • Normalization balances red and green intensities. • Imbalances can be caused by • Different incorporation of dyes • Different amounts of mRNA • Different scanning parameters • In practice, we usually need to increase the red intensity a bit to balance the green

  21. Methods? • log2(R/G) -> log2(R/G) - c = log2(R/(kG)), where k = 2^c • Standard practice (in most software): c is a constant such that normalized log-ratios have zero mean or median. • Speed approach: c is a function of overall spot intensity and print-tip-group. What genes to use? • All genes on the array • Constantly expressed genes (housekeeping) • Controls • Spiked controls (e.g. plant genes) • Genomic DNA titration series • Other set of genes
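
As an illustration of the constant-shift case, the sketch below subtracts the median log-ratio so that the normalized log-ratios have zero median. The function name `median_normalize` and the toy intensities are assumptions; it uses all spots, which corresponds to the "all genes on the array" option.

```python
import math
import statistics


def median_normalize(red, green):
    """Return (normalized log-ratios, c), where c is the median of log2(R/G)."""
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    c = statistics.median(log_ratios)
    return [m - c for m in log_ratios], c


# Toy data with the red channel systematically brighter than the green.
red = [200, 400, 800]
green = [100, 100, 100]
normalized, c = median_normalize(red, green)
k = 2 ** c  # equivalent multiplicative factor applied to G
print(normalized, c, k)
```

Restricting the median to housekeeping genes or spiked controls would only change which spots are passed in when estimating c.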

  22. Experiment Probes: ~6,000 cDNAs, including 200 related to lipid metabolism.

  23. M vs. A • M = log2(R / G) • A = log2(R*G) / 2
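
The two quantities plotted in an M vs. A plot can be computed directly from the definitions above. This is a small sketch; the function name `ma_values` and the example intensities are illustrative assumptions.

```python
import math


def ma_values(red, green):
    """Return lists (M, A) with M = log2(R/G) and A = log2(R*G) / 2."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [math.log2(r * g) / 2 for r, g in zip(red, green)]
    return m, a


# Two toy spots: one four-fold red-biased, one balanced.
m, a = ma_values([8, 4], [2, 4])
print(m, a)
```

M is the log fold change between channels and A is the average log intensity, so both spots here share the same A but differ in M.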

  24. Normalization - Median • Assumption: Changes roughly symmetric • First panel: smooth density of log2G and log2R. • Second panel: M vs. A plot with median set to zero

  25. Normalization - lowess • Global lowess • Assumption: changes roughly symmetric at all intensities.

  26. Normalisation - print-tip-group • Assumption: for every print-tip-group, changes roughly symmetric at all intensities.

  27. M vs. A - after print-tip-group normalization

  28. Within print-tip-group box plots for print-tip-group normalized M

  29. Taking scale into account • Assumption: all print-tip-groups have the same spread. • The true log-ratio is m_ij, where i indexes print-tip-groups and j indexes spots. • The observed log-ratio is M_ij, where M_ij = a_i * m_ij. • A robust estimate of a_i is MAD_i = median_j { |M_ij - median_j(M_ij)| }
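
A minimal sketch of scale normalization across print-tip-groups, assuming the location normalization has already been applied. Following the summary slide, each group is rescaled so that its MAD equals the geometric mean of the MADs over all groups. The function names, the dictionary layout, and the toy values are assumptions, and the sketch requires every group to have a non-zero MAD.

```python
import math
import statistics


def mad(values):
    """Median absolute deviation from the median."""
    centre = statistics.median(values)
    return statistics.median([abs(v - centre) for v in values])


def scale_normalize(groups):
    """Rescale each group's log-ratios so all groups share the same MAD.

    `groups` maps a print-tip-group name to its list of log-ratios M_ij.
    The scale factor a_i is MAD_i divided by the geometric mean of all MADs.
    """
    mads = {name: mad(values) for name, values in groups.items()}
    target = statistics.geometric_mean(mads.values())
    return {
        name: [m / (mads[name] / target) for m in values]
        for name, values in groups.items()
    }


groups = {"tip1": [-1.0, 0.0, 1.0], "tip2": [-4.0, 0.0, 4.0]}
scaled = scale_normalize(groups)
print({name: mad(values) for name, values in scaled.items()})
```

Using the MAD rather than the standard deviation keeps the scale estimate robust to the small number of genes that are truly differentially expressed.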

  30. Effect of location + scale normalization

  31. Comparing different normalisation methods

  32. Paired-slides: dye swap • Slide 1: M = log2(R/G) - c • Slide 2: M’ = log2(R’/G’) - c’ • Combine by subtracting the normalized log-ratios: [ (log2(R/G) - c) - (log2(R’/G’) - c’) ] / 2 = [ log2(R/G) + log2(G’/R’) ] / 2 = [ log2(RG’/(GR’)) ] / 2, provided c = c’. • Assumption: the separate normalizations are the same.
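
The self-normalizing property of a dye-swap pair can be checked numerically. In the sketch below, the function name `dye_swap_combine` and the intensities are illustrative assumptions; the second call multiplies the red channel of both slides by the same factor to mimic an identical dye bias (c = c’) on each slide.

```python
import math


def dye_swap_combine(r1, g1, r2, g2):
    """Combine a dye-swap pair: (log2(R/G) - log2(R'/G')) / 2 per spot."""
    return [
        (math.log2(r / g) - math.log2(rp / gp)) / 2
        for r, g, rp, gp in zip(r1, g1, r2, g2)
    ]


# Unbiased pair: slide 1 gives M = 2, the swapped slide gives M' = -2.
unbiased = dye_swap_combine([8], [2], [2], [8])

# Same pair with the red channel doubled on both slides (same bias c = c').
biased = dye_swap_combine([16], [2], [4], [8])
print(unbiased, biased)
```

Both calls give the same combined log-ratio, which is the cancellation the slide relies on; if the two slides had different biases, the difference would remain in the result.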

  33. Summary • Case 1: A few genes that are likely to change • Within-slide: • Location: print-tip-group lowess normalization. • Scale: for all print-tip-groups, adjust the MAD to equal the geometric mean of the MADs across all print-tip-groups. • Between slides (experiments): • An extension of within-slide scale normalization (future work). • Case 2: Many genes changing (paired-slides) • Self-normalization: taking the difference of the two log-ratios. • Check using controls or known information.

  34. Affymetrix Arrays

  35. A probe set = 11-20 PM, MM pairs • There may be 5,000-55,000 probe sets per chip

  36. Chip QC: Defect Classes • In order of occurrence: • Dimness • High Background • Unevenness • Spots • Haze Band • Scratches • Brightness • Crop Circle • Cracked • Snow • Grid Misalignment • Training set of 7K chips (Human, Rat, Mouse)

  37. Spots, Scratches, etc.

  38. Spots, Scratches, etc.

  39. Grid Alignment
