Microarray Design and Analysis
Jeremy D. Glasner
Genetics 875, November 20, 2007
What is a Microarray?
A collection of DNA sequences arrayed on a solid substrate,
usually with thousands of individual DNA spots
Gene expression analysis
Massively parallel biochemistry aimed at measuring RNA levels
Genome wide localization of insertion mutations
Sassetti CM, Boyd DH, Rubin EJ. Proc Natl Acad Sci U S A. 2001 98(22):12712-7
Rajashekara G, Glasner JD, Glover DA, Splitter GA. J Bacteriol. 2004 186(15):5040-51.
a.k.a. genome-wide occupancy profiling
Identify the chromosomal locations of a DNA binding protein
Number of Replicates
What samples should be compared?
Directly on same chip/across arrays?
Calibrators and common references
What controls are necessary?
ORF, UTR, functional RNA prediction
Array Design, Replication
They are inversely related
Hughes TR, et al. Nat Biotechnol. 2001 Apr;19(4):342-7.
Perfect Match and Mismatch
Average Difference Values
A “probe set”
A “probe pair”
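As a rough illustration of how an average difference value could be computed for one probe set, the sketch below takes the mean of the perfect match minus mismatch intensity over all probe pairs. The function name and the intensity values are hypothetical, and any outlier handling that production software applies is omitted.

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) across the probe pairs of one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and the same length")
    # One difference per probe pair: perfect match minus mismatch.
    diffs = [p - m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs)


# Hypothetical intensities for a probe set made of four probe pairs.
pm_intensities = [1200.0, 950.0, 1430.0, 800.0]
mm_intensities = [300.0, 410.0, 380.0, 500.0]

avg_diff = average_difference(pm_intensities, mm_intensities)
print(avg_diff)
```

A probe set whose mismatch probes are brighter than its perfect match probes gives a negative value under this definition.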
RNA samples are extracted for the experiment and fluorescent dyes are incorporated
Direct vs. indirect labeling
Sum signal intensities in X and Y directions
Estimating Foreground and Background with the “Fixed Circle” Method
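A minimal sketch of the fixed circle idea, assuming the spot center and radius are already known from gridding: pixels inside the circle count as foreground and the remaining pixels in the patch count as background. The function name, the choice of mean for foreground and median for background, and the pixel values are assumptions made for illustration.

```python
from statistics import mean, median


def fixed_circle(image, cx, cy, radius):
    """Split a spot patch into foreground (inside circle) and background."""
    fg, bg = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            # Pixels within `radius` of the assumed spot center are foreground.
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                fg.append(value)
            else:
                bg.append(value)
    return mean(fg), median(bg)


# Hypothetical 5x5 pixel patch around one spot.
patch = [
    [10, 12, 11, 10, 12],
    [11, 90, 95, 92, 10],
    [12, 94, 99, 96, 11],
    [10, 91, 97, 93, 12],
    [11, 10, 12, 11, 10],
]

foreground, background = fixed_circle(patch, cx=2, cy=2, radius=1)
signal = foreground - background  # background-corrected spot intensity
print(foreground, background, signal)
```

With this small radius, four bright corner pixels of the spot fall outside the circle and are counted as background, which shows how a fixed circle can misclassify pixels when the spot does not match the assumed size or shape. The median keeps those few pixels from shifting the background estimate here.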
Estimating Foreground and Background with the “Histogram” Method
From Tseng et al., 2001. NAR 29(12):2549-2557
Data normalization is necessary when the overall signal differs between experiments, and it becomes more complicated when the relationship between experiments is nonlinear.
Internal controls can also be used for normalization.
Data from Schadt et al 2001. Journal of Cellular Biochemistry Supplement 37:120-125.
Assume linear relationship
Apply non-linear normalization
Normalize to “house-keeping genes”
Normalize to internal Standards
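The linear case among the options above can be sketched as a single scaling factor. In this sketch the factor is the ratio of medians, computed either from all genes or from a chosen control subset such as house-keeping genes or internal standards. The function name, the use of the median, and the intensity values are assumptions; non-linear normalization is not covered.

```python
from statistics import median


def scaling_factor(reference, target, control_idx=None):
    """Factor that brings `target` onto the scale of `reference`.

    With control_idx, only those positions (for example house-keeping
    genes or internal standards) set the factor; otherwise all genes do.
    """
    idx = range(len(reference)) if control_idx is None else control_idx
    return median(reference[i] for i in idx) / median(target[i] for i in idx)


# Hypothetical intensities for the same five genes on two arrays.
array_a = [100.0, 200.0, 400.0, 800.0, 1600.0]
array_b = [50.0, 100.0, 200.0, 400.0, 800.0]

# Global scaling: assume most genes are unchanged between arrays.
global_factor = scaling_factor(array_a, array_b)

# Control-based scaling: assume genes 0 and 1 are the controls.
control_factor = scaling_factor(array_a, array_b, control_idx=[0, 1])

normalized_b = [value * global_factor for value in array_b]
print(global_factor, control_factor, normalized_b)
```

A single factor is only appropriate when the two arrays differ by a constant multiple across the whole intensity range.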
Determine which changes are significant:
Fixed cutoff (fold-change>4)
Replication allows assessment of variability
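A fixed fold-change cutoff can be sketched as follows, assuming replicate measurements are first averaged per condition and that changes in either direction count. The gene names, intensities, and function name are hypothetical.

```python
from statistics import mean


def passes_fold_change(control, treated, cutoff=4.0):
    """True if the mean treated/control ratio changes by more than `cutoff`."""
    ratio = mean(treated) / mean(control)
    # Count both increases (ratio > cutoff) and decreases (ratio < 1/cutoff).
    return ratio > cutoff or ratio < 1.0 / cutoff


# Hypothetical replicate intensities: gene -> (control, treated).
genes = {
    "geneA": ([100.0, 110.0, 90.0], [450.0, 500.0, 430.0]),
    "geneB": ([200.0, 210.0, 190.0], [300.0, 310.0, 290.0]),
    "geneC": ([800.0, 820.0, 780.0], [150.0, 160.0, 140.0]),
}

changed = [name for name, (ctrl, trt) in genes.items()
           if passes_fold_change(ctrl, trt)]
print(changed)
```

A fixed cutoff ignores how much the replicates vary, which is the gap that replication and statistical testing address.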
Common statistics such as the t-test are often used for gene expression data. The significance of the resulting t statistic is then determined by referring to the t distribution. This assumes that the data are normally distributed, which may not be true.
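For one gene, the t statistic described here could be computed as in the sketch below. It assumes a two-sample test with pooled variance, which is one of several t-test variants, and the log2 expression values are hypothetical. The p-value lookup against the t distribution is left out to keep the example free of external libraries.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled estimate of the common variance of the two groups.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical log2 expression values for one gene, four replicates each.
control = [7.0, 7.2, 6.8, 7.0]
treated = [9.0, 9.1, 8.9, 9.0]

t_stat, dof = two_sample_t(treated, control)
print(round(t_stat, 4), dof)
```

The statistic is then compared with the t distribution at the reported degrees of freedom to obtain a p-value.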
Gene expression experiments may require thousands of statistical tests, and significance thresholds should be adjusted to reflect this. A standard Bonferroni correction multiplies each p-value by the number of tests, but it is likely too conservative.
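The Bonferroni adjustment described above amounts to the following sketch. The raw p-values and the 0.05 threshold are hypothetical, and adjusted values are capped at 1 so they remain valid probabilities.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1.0."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values from five tests.
raw = [0.00001, 0.0004, 0.02, 0.03, 0.3]

adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

With thousands of genes the multiplier becomes very large, so only very small raw p-values survive, which illustrates why the correction is considered conservative.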
Millenaar et al., BMC Bioinformatics 2006, 7:137