Microarray Design and Analysis
Jeremy D. Glasner
Genetics 875, November 20, 2007
What is a Microarray?
A collection of DNA sequences arrayed on a solid substrate, usually with thousands of individual DNA spots
Gene expression analysis
Massively parallel biochemistry aimed at measuring RNA levels
Why do gene expression analysis?
Typical gene expression experiment
TraSH: transposon site hybridization
Genome wide localization of insertion mutations
Sassetti CM, Boyd DH, Rubin EJ. Proc Natl Acad Sci U S A. 2001 98(22):12712-7
CGH: Comparative genome hybridization
Rajashekara G, Glasner JD, Glover DA, Splitter GA. J Bacteriol. 2004 186(15):5040-51.
ChIP-Chip: Chromatin immunoprecipitation, chip hybridization
a.k.a. genome-wide occupancy profiling
Identify the chromosomal locations of a DNA binding protein
Flow of Information in Array Analyses
Experimental Design Issues
Number of Replicates
What samples should be compared?
Directly on same chip/across arrays?
Calibrators and common references
What controls are necessary?
Considerations when designing the sequences for a chip
ORF, UTR, functional RNA prediction
Array Design, Replication
Sensitivity vs. specificity as a function of oligo length
Sensitivity and specificity are inversely related as oligo length changes
Hughes TR, et al. Nat Biotechnol. 2001 19(4):342-7.
Two methods for array production
Perfect Match and Mismatch
Average Difference Values
A “probe set”
A “probe pair”
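The average difference for one probe set can be sketched as the mean of (PM - MM) over its probe pairs. This is a minimal illustration only: the intensities are made up, the function name is hypothetical, and production software may also exclude outlier pairs, which is not modeled here.

```python
from statistics import mean


def average_difference(probe_pairs):
    """Mean of (PM - MM) over the probe pairs of one probe set.

    probe_pairs: list of (perfect_match, mismatch) intensity tuples.
    """
    if not probe_pairs:
        raise ValueError("probe set has no probe pairs")
    return mean(pm - mm for pm, mm in probe_pairs)


# Hypothetical probe set with four probe pairs (PM, MM).
probe_set = [(1200.0, 300.0), (950.0, 400.0), (1500.0, 500.0), (800.0, 750.0)]
avg_diff = average_difference(probe_set)
print(avg_diff)
```

The last pair, where MM is nearly as bright as PM, contributes little to the average; this is how the mismatch probe discounts signal that may come from cross-hybridization.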
Flexibility of NimbleGen Arrays
RNA is extracted from each sample and labeled by incorporating fluorescent dyes
Direct vs. indirect labeling
Hybridization & Scanning
Automatic Grid Finding
Sum signal intensities in X and Y directions
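One way to picture grid finding by summed intensities: project the image onto each axis and look for peaks in the two profiles. The sketch below uses a tiny made-up image and a deliberately simple peak rule (a local maximum above the profile mean); both are assumptions for illustration, not the method of any particular scanner software.

```python
def projection_profiles(image):
    """Sum pixel intensities along each row (Y profile) and each column (X profile)."""
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums, col_sums


def find_peaks(profile):
    """Indices of local maxima that rise above the profile mean."""
    threshold = sum(profile) / len(profile)
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] > threshold
        and profile[i] > profile[i - 1]
        and profile[i] >= profile[i + 1]
    ]


# Toy 7x7 image: background of 10 with bright spots on a 2x2 grid.
image = [[10] * 7 for _ in range(7)]
for r in (1, 4):
    for c in (2, 5):
        image[r][c] = 110

row_sums, col_sums = projection_profiles(image)
spot_rows = find_peaks(row_sums)
spot_cols = find_peaks(col_sums)
print(spot_rows, spot_cols)
```

Crossing the row peaks with the column peaks gives candidate spot centres; real images would need smoothing and a tolerance for rotated or uneven grids.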
Estimating Foreground and Background with the “Fixed Circle” Method
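A minimal sketch of the fixed circle idea: pixels inside a circle of fixed radius centred on the spot count as foreground, and the remaining pixels in the spot's cell count as background. Summarizing foreground by its mean and background by its median is an assumption made here, as are the cell size and radius.

```python
from statistics import mean, median


def fixed_circle_estimate(cell, radius):
    """Return (foreground, background) for a square pixel cell.

    Foreground: mean of pixels inside a centred circle of the given radius.
    Background: median of the pixels outside that circle.
    """
    centre = (len(cell) - 1) / 2
    inside, outside = [], []
    for y, row in enumerate(cell):
        for x, value in enumerate(row):
            in_circle = (x - centre) ** 2 + (y - centre) ** 2 <= radius ** 2
            (inside if in_circle else outside).append(value)
    return mean(inside), median(outside)


# Toy 5x5 cell: a small bright spot (200) on a dim background (20).
cell = [[20] * 5 for _ in range(5)]
for x, y in [(2, 2), (1, 2), (3, 2), (2, 1), (2, 3)]:
    cell[y][x] = 200

foreground, background = fixed_circle_estimate(cell, radius=1)
print(foreground - background)
```

Because the circle is fixed, a spot that is smaller, larger, or off-centre will mix foreground and background pixels, which is the main weakness of this approach.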
Estimating Foreground and Background with the “Histogram” Method
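The histogram idea can be sketched by sorting the pixel intensities in a region around the spot and averaging a low percentile band for background and a high percentile band for foreground. The band limits used below (5 to 20 percent and 80 to 95 percent) are illustrative assumptions, not values taken from the slides.

```python
def band_mean(sorted_values, low_pct, high_pct):
    """Mean of the values lying between two percentile ranks (integers 0-100)."""
    n = len(sorted_values)
    start = n * low_pct // 100
    stop = max(start + 1, n * high_pct // 100)
    band = sorted_values[start:stop]
    return sum(band) / len(band)


def histogram_estimate(pixels, bg_band=(5, 20), fg_band=(80, 95)):
    """Return (foreground, background) from the intensity distribution of a region."""
    ordered = sorted(pixels)
    background = band_mean(ordered, *bg_band)
    foreground = band_mean(ordered, *fg_band)
    return foreground, background


# Toy region: 70 dim background pixels and 30 bright spot pixels.
pixels = [20] * 70 + [200] * 30
fg, bg = histogram_estimate(pixels)
print(fg, bg)
```

No spot outline is needed, so the estimate tolerates irregular spot shapes, but it can be misleading when the spot covers only a small fraction of the region.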
Quality Filtering of Data
From Tseng et al., 2001. NAR 29(12):2549-2557
Data normalization is necessary when the overall signal differs between experiments, and it becomes more complicated when the relationship between experiments is nonlinear.
Internal controls can also be used for normalization.
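For the simple case where two experiments differ only by an overall scale factor, normalization can be sketched as rescaling one array so that its median matches a reference array. Matching medians (rather than means or totals) is an assumption here, and the intensities are invented.

```python
from statistics import median


def scale_to_reference(reference, target):
    """Rescale target intensities so their median equals the reference median."""
    factor = median(reference) / median(target)
    return [value * factor for value in target]


# Hypothetical arrays: array_b is uniformly half as bright as array_a.
array_a = [100.0, 200.0, 400.0, 800.0, 1600.0]
array_b = [50.0, 100.0, 200.0, 400.0, 800.0]
array_b_scaled = scale_to_reference(array_a, array_b)
print(array_b_scaled)
```

A single scale factor only corrects a linear difference; when the relationship between arrays bends with intensity, one factor cannot fix both the low and high ends.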
Data from Schadt et al., 2001. Journal of Cellular Biochemistry Supplement 37:120-125.
Print Tip Group Normalization
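A simplified sketch of print tip group normalization: spots are grouped by the print tip that deposited them, and each group's log ratios are centred on zero. Centring on the group median is an assumption chosen for brevity; a fuller treatment might fit an intensity-dependent curve within each group instead.

```python
from collections import defaultdict
from statistics import median


def print_tip_normalize(log_ratios, tip_ids):
    """Subtract each print tip group's median log ratio from its spots."""
    groups = defaultdict(list)
    for m, tip in zip(log_ratios, tip_ids):
        groups[tip].append(m)
    offsets = {tip: median(values) for tip, values in groups.items()}
    return [m - offsets[tip] for m, tip in zip(log_ratios, tip_ids)]


# Hypothetical log2 ratios: tip 1 reads high, tip 2 reads low.
log_ratios = [0.5, 0.7, 0.3, -0.2, 0.0, -0.4]
tip_ids = [1, 1, 1, 2, 2, 2]
tip_normalized = print_tip_normalize(log_ratios, tip_ids)
print(tip_normalized)
```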
Intensity Dependent Normalization
Assume linear relationship
Apply non-linear normalization
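Intensity dependent normalization can be illustrated on log ratio (M) versus average log intensity (A) values. The sketch below replaces a smooth curve fit with a cruder stand-in: spots are sorted by A, split into bins, and each bin's median M is subtracted. The binning scheme, function names, and data are all assumptions for illustration.

```python
import math
from statistics import median


def ma_values(red, green):
    """Log ratio M and average log intensity A for paired channel intensities."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def binned_median_normalize(m, a, n_bins=2):
    """Subtract the median M within each intensity (A) bin."""
    order = sorted(range(len(a)), key=lambda i: a[i])
    bin_size = math.ceil(len(order) / n_bins)
    normalized = [0.0] * len(m)
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        centre = median(m[i] for i in members)
        for i in members:
            normalized[i] = m[i] - centre
    return normalized


# Hypothetical spots: dim spots carry a 2-fold red bias, bright spots do not.
red = [200.0, 400.0, 100.0, 1600.0, 3200.0, 800.0]
green = [100.0, 100.0, 100.0, 1600.0, 1600.0, 1600.0]
m, a = ma_values(red, green)
m_normalized = binned_median_normalize(m, a, n_bins=2)
print(m_normalized)
```

After normalization the dim and bright spots show the same pattern of log ratios, which a single global scale factor could not achieve because the bias exists only at low intensity.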
Normalize to “house-keeping genes”
Normalize to internal standards
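Normalizing to a control set, whether presumed house-keeping genes or spiked-in internal standards, can be sketched as shifting all log ratios so that the controls centre on zero. The gene names and values below are hypothetical, and using the median of the controls is an assumption.

```python
from statistics import median


def normalize_to_controls(log_ratios, control_ids):
    """Shift log ratios so the median of the control features is zero.

    log_ratios: dict mapping feature name to log2 ratio.
    control_ids: names of house-keeping genes or internal standards.
    """
    controls = [log_ratios[name] for name in control_ids if name in log_ratios]
    if not controls:
        raise ValueError("no control features found on this array")
    offset = median(controls)
    return {name: value - offset for name, value in log_ratios.items()}


# Hypothetical data: three controls and two genes of interest.
raw_ratios = {"hk1": 0.5, "hk2": 0.5, "hk3": 0.75, "geneA": 2.5, "geneB": -0.5}
control_normalized = normalize_to_controls(raw_ratios, ["hk1", "hk2", "hk3"])
print(control_normalized)
```

The result is only as good as the assumption that the controls do not change between samples.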
Ratio Calculation Methods
Detecting differential expression
Determine which changes are significant:
Fixed cutoff (fold-change > 4)
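A fixed fold-change cutoff can be sketched as a simple filter on expression ratios. Treating increases and decreases symmetrically (a 4-fold drop counts the same as a 4-fold rise) is an assumption made here, and the gene names and intensities are invented.

```python
def fold_change_hits(control, treated, cutoff=4.0):
    """Return genes whose expression changes by more than `cutoff`-fold in either direction."""
    hits = {}
    for gene, baseline in control.items():
        ratio = treated[gene] / baseline
        fold = ratio if ratio >= 1 else 1 / ratio
        if fold > cutoff:
            hits[gene] = ratio
    return hits


# Hypothetical normalized intensities for three genes.
control = {"geneA": 100.0, "geneB": 100.0, "geneC": 1000.0}
treated = {"geneA": 500.0, "geneB": 150.0, "geneC": 200.0}
hits = fold_change_hits(control, treated)
print(sorted(hits))
```

A fixed cutoff ignores variability: a noisy gene can exceed it by chance while a small but reproducible change is missed.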
Replication allows assessment of variability
Common statistics such as the t-test are often used for gene expression data. The significance of the t statistic is then determined by referring to the t distribution. This assumes that the data are normally distributed, which may not be true.
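As a worked illustration for one gene, the two-sample t statistic with pooled variance can be computed from replicate measurements as sketched below. The replicate values are invented, and the equal-variance form of the test is an assumption; the p-value lookup against the t distribution is left to a statistics package or table.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic (equal-variance form) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical log2 expression values for one gene, three replicates per condition.
treated_reps = [10.0, 11.0, 12.0]
control_reps = [7.0, 8.0, 9.0]
t_stat, df = pooled_t_statistic(treated_reps, control_reps)

# For df = 4, the two-sided 5% critical value of the t distribution is about 2.776.
print(round(t_stat, 3), df)
```

With only three replicates per group the variance estimates are themselves noisy, which is one reason replication matters for this kind of test.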
Gene expression experiments may require thousands of statistical tests, and significance thresholds should be adjusted to reflect this. The standard Bonferroni correction multiplies each p-value by the number of tests, but it is likely too conservative.
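The Bonferroni adjustment is short enough to sketch directly. The cap at 1.0 and the optional `n_tests` parameter are conveniences added here; the p-values and the figure of 10,000 tests are made up.

```python
def bonferroni(p_values, n_tests=None):
    """Multiply each p-value by the number of tests, capping the result at 1.0."""
    n = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * n) for p in p_values]


# Hypothetical p-values for two genes from an experiment with 10,000 tests.
raw_p = [0.001, 0.000001]
adjusted_p = bonferroni(raw_p, n_tests=10000)
print(adjusted_p)
```

A gene with a raw p-value of 0.001 would pass a 0.05 threshold on its own, yet it is adjusted all the way to 1.0 here, which shows how conservative the correction becomes with thousands of tests.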
Millenaar et al., BMC Bioinformatics 2006, 7:137