Microarray Pitfalls Stem Cell Network Microarray Course, Unit 3 October 2006
Goals • To provide some guidelines on Affymetrix microarrays: • How to use them • How not to use them • Things to keep in mind when designing experiments and analyzing data • This is a general discussion of issues and is by no means exhaustive
Inconsistent Annotations • Affymetrix-provided probeset annotations change over time • The gene symbol associated with a given probeset is not necessarily stable • This is due to changes in gene predictions as new information becomes available
Inconsistent Annotations (2) [Figure: an inconsistently annotated probeset] • Perez-Iratxeta, C. and M.A. Andrade. 2005. Inconsistencies over time in 5% of NetAffx probe-to-gene annotations. BMC Bioinformatics 6, 183. • 5% of probesets had gene identifiers that changed over the two-year time span covered by this analysis
Inconsistent Annotations (3) • How do we deal with this? • Always note the annotation version used in an analysis, especially when it is for publication • Report the probeset name as well as the gene symbol • Remember that re-analysis with later annotations may yield different results • Keep your annotation files up to date
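The advice above can be sketched in a few lines of Python. This is a minimal illustration and not an Affymetrix or NetAffx tool: the probeset IDs, gene symbols and version label are all hypothetical, and a real analysis would load the two annotation releases from files.

```python
# Hypothetical annotation snapshots: probeset ID -> gene symbol.
# All IDs, symbols and version labels here are made up for illustration.
annotations_old = {
    "PS_0001_at": "GENE_A",
    "PS_0002_at": "GENE_B",
    "PS_0003_at": "GENE_C",
}
annotations_new = {
    "PS_0001_at": "GENE_A",
    "PS_0002_at": "GENE_B2",  # symbol changed between releases
    "PS_0003_at": "GENE_C",
}


def changed_probesets(old, new):
    """Return {probeset: (old_symbol, new_symbol)} where the symbol differs."""
    return {
        ps: (old[ps], new.get(ps))
        for ps in old
        if new.get(ps) != old[ps]
    }


def report_row(probeset, annotation, version):
    """Format a result line carrying probeset ID, gene symbol and annotation version."""
    return f"{probeset}\t{annotation.get(probeset, 'NA')}\t{version}"


changes = changed_probesets(annotations_old, annotations_new)
for ps, (before, after) in sorted(changes.items()):
    print(f"{ps}: {before} -> {after}")

# Reporting the probeset ID next to the symbol keeps the result traceable
# even if the symbol changes in a later release.
print(report_row("PS_0002_at", annotations_new, "annotation-release-B"))
```

Keeping the probeset ID as the primary key and treating the gene symbol as a versioned label means a result can still be traced after the symbol has changed.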
Old chips, new data • Expression microarrays are designed based on the best model of the genome of interest available at the time • The model for the HG-U133 microarrays was a human genome assembly that was only 25% complete! • The human assembly is >99% complete now
Old chips, new data (2) • How do we deal with this? • A number of groups provide re-mappings of probes to probesets based upon the latest data available, for example: • Dai M, et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 2005;33:e175
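The idea of re-mapping can be illustrated with a toy example. This is only a sketch of the general principle (regroup individual probes by the gene they match in a current annotation) and not the actual pipeline of Dai et al. or any other group; the probe IDs, gene names, the `regroup_probes` helper and the `min_probes` cutoff are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical probe-level results: probe ID -> genes it matches
# in an up-to-date annotation. All names are made up.
probe_hits = {
    "probe_1": ["GENE_A"],
    "probe_2": ["GENE_A"],
    "probe_3": ["GENE_A", "GENE_B"],  # ambiguous: matches two genes
    "probe_4": ["GENE_B"],
    "probe_5": [],                    # no longer matches any gene
    "probe_6": ["GENE_B"],
}


def regroup_probes(hits, min_probes=2):
    """Group uniquely matching probes by gene; drop small groups.

    Probes that match no gene or more than one gene are discarded.
    The min_probes cutoff is an arbitrary choice for this sketch.
    """
    groups = defaultdict(list)
    for probe, genes in hits.items():
        if len(genes) == 1:
            groups[genes[0]].append(probe)
    return {
        gene: sorted(probes)
        for gene, probes in groups.items()
        if len(probes) >= min_probes
    }


new_probesets = regroup_probes(probe_hits)
for gene, probes in sorted(new_probesets.items()):
    print(gene, probes)
```

Here the ambiguous probe and the probe with no match are dropped, so each redefined probeset contains only probes that point to a single gene in the current annotation.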
Multiple Testing Corrections • A single expression microarray experiment actually consists of hundreds of thousands of simultaneous parallel experiments • This means you can test many hypotheses simultaneously • This is not free: the significance of any given result decreases as a function of the number of hypotheses tested
Multiple Testing Corrections (2) • How do we deal with this? • Limit the number of hypotheses you are testing instead of just ‘fishing’ in the whole data set. • Do this by selecting a set of candidate genes ahead of time based on your knowledge of the biology of the system.
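A little arithmetic shows why limiting the number of hypotheses helps. The sketch below assumes, purely for illustration, an array with 50,000 probesets, a candidate list of 50 genes and a significance level of 0.05; it uses the Bonferroni rule (divide the significance level by the number of tests) as one simple example of a correction.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives if every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


alpha = 0.05
whole_array = 50_000  # assumed number of probesets tested
candidates = 50       # assumed size of a pre-selected candidate list

for label, n in [("whole array", whole_array), ("candidate list", candidates)]:
    print(
        f"{label}: {n} tests, "
        f"~{expected_false_positives(n, alpha):.1f} false positives expected uncorrected, "
        f"Bonferroni cutoff p <= {bonferroni_threshold(n, alpha):.2g}"
    )
```

With these assumed numbers, the per-test cutoff for the candidate list is a thousand times less stringent than for the whole array, so a real effect of moderate size is far more likely to survive correction.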
Multiple Testing Corrections (3) • Sandrine Dudoit, Juliet Popper Shaffer and Jennifer C. Boldrick. Multiple Hypothesis Testing in Microarray Experiments. Statistical Science 2003, Vol. 18, No. 1, 71–103 • “The biological question of differential expression can be restated as a problem in multiple hypothesis testing: the simultaneous test for each gene of the null hypothesis of no association between the expression levels and the responses” • Talk to a statistician if you have doubts
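To make the idea concrete, here is a minimal pure-Python sketch of two widely used corrections: Bonferroni (controls the family-wise error rate) and the Benjamini-Hochberg step-up procedure (controls the false discovery rate). The p-values are made up for illustration, and this is not presented as the method recommended by the paper above; a real analysis would normally use an established statistics package.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is <= alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the p-values, find the largest rank k with p_(k) <= (k / m) * fdr,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Made-up p-values for ten genes
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

uncorrected = sum(p <= 0.05 for p in p_values)
print("uncorrected:", uncorrected)
print("Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)))
print("Bonferroni:", sum(bonferroni(p_values)))
```

On this toy input, five genes pass an uncorrected 0.05 cutoff, two pass Benjamini-Hochberg and one passes Bonferroni, which illustrates how the choice of correction trades sensitivity against false positives.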
Not everything is in the array • Probesets are designed with a bias towards the 3’ end of the gene. • They won’t distinguish splice variants • They won’t pick up alternative 3’ ends
Not everything is in the array (2) • What can we do about this? • You should be aware of this, but not much can be done. • Use other technologies to complement your microarray results (PCR, sequencing)
What are you measuring? • Remember that you are detecting the average mRNA level over a population of cells. • Is your sample homogeneous? • If it’s not homogeneous, then what are you measuring? How many types of cells, and in what state? • Time series of differentiating cells are particularly problematic.
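A small worked example shows why this matters. The sketch below treats the measured signal as a weighted average of per-cell-type expression, which is a simplification; the expression values and cell fractions are assumed numbers chosen for illustration.

```python
def bulk_signal(fractions, expression):
    """Population-average expression: sum of (cell fraction * expression per cell type)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("cell-type fractions must sum to 1")
    return sum(f * e for f, e in zip(fractions, expression))


# Assumed per-cell expression of one gene in two cell types (arbitrary units).
# These values stay the same at both time points.
expression = [100.0, 1000.0]

# Only the mixture changes: cell type B grows from 10% to 30% of the sample.
early = bulk_signal([0.9, 0.1], expression)
late = bulk_signal([0.7, 0.3], expression)

print(f"early: {early:.0f}, late: {late:.0f}, apparent fold change: {late / early:.2f}")
```

In this toy case the array would report nearly a two-fold increase even though no individual cell changed its expression of the gene; only the composition of the sample shifted.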
Inhomogeneous Samples? • Many sources of inhomogeneity: • Sex of the source organism • Cell cycle • Tissue source • Diet • Some can be eliminated • All should be documented where possible
Chips don’t detect protein • Central assumption of microarray analysis: the level of mRNA is positively correlated with the level of protein expression. • Higher mRNA levels mean higher protein expression; lower mRNA levels mean lower protein expression • Other factors can weaken this relationship: • Protein degradation, mRNA degradation, polyadenylation, codon preference, translation rates, …
Conclusion • This is a general discussion of issues and doesn’t cover all pitfalls. • Please contact ogicinfo@ohri.ca if you have any comments, corrections or questions. • See the associated bibliography for references from this presentation and further reading. • Thanks for your attention!