Introduction to Microarrays Dr. Özlem İLK & İbrahim ERKAN 2011, Ankara
Gene • The fundamental unit in a living organism. It holds information necessary for building cells and passing genetic traits to offsprings. • Every single cell in the human body have the same set of genes. • However, different genes are active (and therefore "expressed") in different kinds of cells and tissues.
Gene • The amount of activity (expression) in a gene is an indicator whether it is being used to form an organic structure or not.
What is a microarray? • DNA microarrays are small solid surfaces on which thousands of gene sequences are contained.
Microarray • The spots of gene sequences are placed in an order. • Therefore, the researcher can keep track of the gene sequences and uses the location of each spot in the array to identify a particular gene sequence.
Microarray • Microarrays help easily and quickly analyze thousands of genes at once. • Analyzing genes refers to the determination of activity and also the amount of activity in a specific gene. • This can help finding active genes in the existence of a certain disease or a treatment providing guidance for medical researchers. • Interdisciplinary work – statisticians, biologists, doctors, engineers, …
Methods Preprocessing – MAS5, dChip,RMA, gcRMA ... fold change Clustering ANOVA, t-test multiple testing correction
Fold change: (GE1 / GE2) • Easy • Can use in data sets without any replicates • Not a statistical method, doesn’t take the variance into account, its sensitivity and reliability are in doubt
Clustering • Easy • Can’t measure the statistical significance and would find clusters in the data even if there aren’t reasonable clusters in the dataset • Can be affected from the data transformation or from the measurement unit (Xu et al., Human Molecular Genetics, 2002)
ANOVA H0: µ1i=µ2i=µ3i , H1: at least one is different, i: prob set
Interpretation of the results • Rejection regionRejection region -1.96 1.96 z • If p-value < alpha, then reject H0 • Alpha is usually 0.05, 0.01 or 0.1 • E.g., expected false positive is 10,000 gene * 0.05 = 500
ANOVA • Can test the difference between groups • Takes the variance into account • Can’t be applied to data w/o replicates • Assumes data come from Normal distribution • The rejection of H0 does not provide the information on which groups are different, we need pairwise comparisons (t-tests) for this.
t-test H0: µ1i=µ2i , H1: µ1i≠µ2i • Can measure the differences between two groups • Takes the variance into account • Can’t be applied to data w/o replicates • Assumes data come from Normal distribution • Warning: paired/ unpairedt-test
Multiple testing correction • Assume that we are holding two independent tests (!) for two genes. Let the probability of each being correct be 0.95. Due to independence, the probability of both test being correct is 0.95*0.95 = 0.9025. • Bonferroni • New alpha = alpha / test number • Too conservative • Benjamini-Hochberg FDR • Example, suppose you found 100 expressed genesout of 10,000 at alpha= 0.0001 • Expected false positive number 10,000 gene * 0.0001 = 1 • False Discovery Rate (FDR) = 1/100 = 0.01
Recommended references • Pavlidis, P.. Using ANOVA for gene selection from microarray studies of the nervous system. Methods, (2003); 31, 282–289. • Books in the series of QP624.5 in the library • A few courses in Computer Engineering Department (Tolga Can)