
Homework on the Analysis of Affymetrix GeneChip Data


Presentation Transcript


  1. Homework on the Analysis of Affymetrix GeneChip Data
     EPP 245 Statistical Analysis of Laboratory Data

  2. Importing Affy Data
     • Make sure all the .CEL files are in a single directory (with no other .CEL files).
     • Make sure the working directory is the one containing the .CEL files (check it with getwd(), change it with setwd()).
     • Make sure Bioconductor has been installed since R was last updated.
     • Make sure the affy package is loaded.
     • Run ReadAffy().
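Called with no arguments, ReadAffy() reads the .CEL files it finds in the working directory, which is why the first two checks matter. A quick base-R check can confirm that the directory holds exactly the arrays you expect. This is a sketch: list_cel_files() is a hypothetical helper, and the demo uses a temporary directory with empty placeholder files rather than real arrays.

```r
# Hypothetical helper: list the .CEL files in a directory, sorted,
# matching the extension case-insensitively (.CEL or .cel)
list_cel_files <- function(dir = getwd()) {
  sort(list.files(dir, pattern = "\\.cel$", ignore.case = TRUE))
}

# Demo in a temporary directory with empty placeholder files
demo_dir <- file.path(tempdir(), "cel_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("LN0B.CEL", "LN0A.CEL", "notes.txt"))))

cel_files <- list_cel_files(demo_dir)
print(cel_files)  # only the two .CEL files; notes.txt is ignored
```

In a real session, calling list_cel_files() with no argument checks the current working directory before ReadAffy() is run.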

  3. > getwd()
     [1] "C:/TD/CLASS/EPP245 2007 Fall/RData"
     > library(affy)
     Loading required package: Biobase
     Loading required package: tools
     Welcome to Bioconductor
       Vignettes contain introductory material. To view, type 'openVignette()'.
       To cite Bioconductor, see 'citation("Biobase")' and for packages 'citation(pkgname)'.
     Loading required package: affyio
     Loading required package: preprocessCore
     > rrdata <- ReadAffy()
     > colnames(exprs(rrdata))
      [1] "LN0A.CEL" "LN0B.CEL" "LN1A.CEL" "LN1B.CEL" "LN2A.CEL" "LN2B.CEL"
      [7] "LN3A.CEL" "LN3B.CEL" "LN4A.CEL" "LN4B.CEL" "LN5A.CEL" "LN5B.CEL"
     Check the column names to see which sample is in which column.

  4. Constructing the Expression Index
     • We use RMA, via either the rma() function or the justRMA() function. Use the latter if the former fails for lack of memory.
     • The input to rma() is the AffyBatch object returned by ReadAffy(); justRMA() instead reads the .CEL files directly without building an AffyBatch first, which is why it needs less memory. Either way, the output is an expression set (ExpressionSet) object.
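RMA runs in three stages, which rma() reports as it works: background correction, normalization, and summarization of each probe set. The normalization stage is quantile normalization, which forces every array to share the same intensity distribution. A minimal base-R sketch on a toy matrix follows; quantile_normalize() is a hypothetical helper, not the affy or preprocessCore code, and it breaks ties by order of appearance as a simplification.

```r
# Hypothetical helper: quantile normalization of a matrix whose columns are arrays.
# Each value is replaced by the mean, across arrays, of the values sharing its rank,
# so every column ends up with the same distribution.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  ref <- rowMeans(apply(x, 2, sort))                  # reference distribution
  out <- apply(ranks, 2, function(r) ref[r])          # map ranks back to values
  dimnames(out) <- dimnames(x)
  out
}

m <- matrix(c(5, 2, 3,
              4, 1, 6), nrow = 3)  # toy data: 3 probes x 2 arrays (filled by column)
qn <- quantile_normalize(m)
print(qn)
```

After normalization both columns contain the values 1.5, 3.5 and 5.5, each in the order of its original ranks.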

  5. > eset <- rma(rrdata)
     trying URL 'http://bioconductor.org/packages/2.1/…
     Content type 'application/zip' length 1352776 bytes (1.3 Mb)
     opened URL
     downloaded 1.3 Mb
     package 'hgu95av2cdf' successfully unpacked and MD5 sums checked
     The downloaded packages are in C:\Documents and Settings\dmrocke\Local Settings…
     updating HTML package descriptions
     Background correcting
     Normalizing
     Calculating Expression
     Here rma() first downloads and installs the CDF package for this chip type (hgu95av2cdf), then reports each stage of RMA as it runs.
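The last stage, "Calculating Expression", summarizes each probe set. RMA fits Tukey's median polish to the log2 probe intensities (probes in rows, arrays in columns) and takes the overall effect plus each array's column effect as that array's expression value. A toy version with stats::medpolish() follows; the data are made up and are exactly a probe effect plus an array effect.

```r
# Toy log2 intensities for one probe set: 3 probes (rows) x 4 arrays (columns),
# built as probe effect + array effect
probe_effect <- c(0, 1, 2)
array_effect <- c(5, 6, 7, 8)
x <- outer(probe_effect, array_effect, "+")

fit <- medpolish(x, trace.iter = FALSE)
expression_index <- fit$overall + fit$col  # one value per array
print(expression_index)
```

Because the toy data are exactly additive, the residuals are zero and the index recovers the array effects shifted by the median probe effect (here 1), giving 6, 7, 8, 9. Median polish is used because it is resistant to outlying probes.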

  6. Using LMGene to Analyze the Data
     • For each “gene” (probe set), we want to see whether there are statistically significant differences between the groups.
     • LMGene can fit a linear model (regression or ANOVA) to each gene and compute one or more p-values.
     • It can use an improved test, the moderated F statistic, and can adjust the p-values for multiple testing.
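Without moderation, the per-gene analysis is an ordinary one-way ANOVA of expression on the group factor. The base-R sketch below runs that on simulated data with the same design as this homework (six groups of two arrays). The data, the number of genes and the helper gene_pvalue() are hypothetical, and this is not LMGene's own code. With 12 arrays and 6 groups, each gene's model has 12 - 6 = 6 residual degrees of freedom.

```r
set.seed(1)
# Same design as the homework: six groups, two arrays each
group <- factor(c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5))
n_genes <- 100  # hypothetical
# Simulated log2 expression: genes in rows, arrays in columns, no true effect
expr <- matrix(rnorm(n_genes * 12, mean = 8), nrow = n_genes)
# Give gene 1 a strong group effect so it should be detected
expr[1, ] <- expr[1, ] + 5 * as.numeric(group)

# Hypothetical helper: one-way ANOVA p-value for a single gene
gene_pvalue <- function(y, group) {
  fit <- lm(y ~ group)
  anova(fit)[["Pr(>F)"]][1]  # p-value for the group effect
}

pvals <- apply(expr, 1, gene_pvalue, group = group)
resid_df <- df.residual(lm(expr[1, ] ~ group))
print(resid_df)
print(head(signif(pvals, 3)))
```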

  7. • First, LMGene needs to be installed from CRAN using the Packages menu in R.
     • Then it needs to be loaded.
     • The variables in the experiment need to be defined.
     • The model needs to be specified.
     • Then run genediff() and pvadjust() to obtain the results.
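Rather than typing the group factor by hand, it can be derived from the column names. This sketch assumes every file follows the pattern seen here (LN, then the group number, then a replicate letter, then .CEL); that convention is inferred from these file names and is an assumption. In a real session, cel_names would be colnames(exprs(eset)).

```r
# Column names as shown above; in practice: cel_names <- colnames(exprs(eset))
cel_names <- c("LN0A.CEL", "LN0B.CEL", "LN1A.CEL", "LN1B.CEL", "LN2A.CEL", "LN2B.CEL",
               "LN3A.CEL", "LN3B.CEL", "LN4A.CEL", "LN4B.CEL", "LN5A.CEL", "LN5B.CEL")

# Keep the digits between "LN" and the replicate letter (assumed naming convention)
group <- factor(sub("^LN([0-9]+)[A-Z]\\.CEL$", "\\1", cel_names))
vlist <- list(group = group)
print(vlist)
```

Deriving the factor this way ties each array to its group by name, so a reordering of the files cannot silently mislabel samples.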

  8. > library(LMGene)
     > colnames(exprs(eset))
      [1] "LN0A.CEL" "LN0B.CEL" "LN1A.CEL" "LN1B.CEL" "LN2A.CEL" "LN2B.CEL"
      [7] "LN3A.CEL" "LN3B.CEL" "LN4A.CEL" "LN4B.CEL" "LN5A.CEL" "LN5B.CEL"
     > group <- factor(c(0,0,1,1,2,2,3,3,4,4,5,5))
     > vlist <- list(group=group)
     > vlist
     $group
     [1] 0 0 1 1 2 2 3 3 4 4 5 5
     Levels: 0 1 2 3 4 5
     > eset.lmg <- neweS(exprs(eset),vlist)
     > lmg.results <- LMGene(eset.lmg)
     This yields a list of 1173 genes that are differentially expressed at an FDR of 0.05 using the moderated F statistic, compared with 119 if the moderated statistic is not used.
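The moderated F statistic replaces each gene's own MSE, which rests on only 6 degrees of freedom, with a steadier estimate that borrows information from all genes. The sketch below illustrates the idea with a limma-style weighted average of the gene's MSE and a value pooled across genes. This is a swapped-in technique, not LMGene's estimator; the prior settings d0 = 4 and s0sq = median of all MSEs are arbitrary placeholders, and the simulated data are hypothetical.

```r
set.seed(2)
group <- factor(rep(0:5, each = 2))  # six groups of two arrays
n_genes <- 200                       # hypothetical
expr <- matrix(rnorm(n_genes * 12, mean = 8), nrow = n_genes)
expr[1:10, ] <- sweep(expr[1:10, ], 2, 2 * as.numeric(group), "+")  # 10 truly changed genes

# Per-gene ANOVA: between-group mean square, residual mean square (MSE), and dfs
gene_stats <- t(apply(expr, 1, function(y) {
  tab <- anova(lm(y ~ group))
  c(ms_between = tab[["Mean Sq"]][1], mse = tab[["Mean Sq"]][2],
    df1 = tab[["Df"]][1], df2 = tab[["Df"]][2])
}))
mse <- gene_stats[, "mse"]
df1 <- gene_stats[1, "df1"]  # 5 = number of groups - 1
df2 <- gene_stats[1, "df2"]  # 6 = 12 arrays - 6 groups

# Placeholder prior: d0 extra degrees of freedom centred on a pooled variance
d0 <- 4
s0sq <- median(mse)

# Moderated variance: weighted average of the pooled value and each gene's MSE
post_var <- (d0 * s0sq + df2 * mse) / (d0 + df2)

f_ordinary  <- gene_stats[, "ms_between"] / mse
f_moderated <- gene_stats[, "ms_between"] / post_var
p_ordinary  <- pf(f_ordinary,  df1, df2,      lower.tail = FALSE)
p_moderated <- pf(f_moderated, df1, df2 + d0, lower.tail = FALSE)

print(c(ordinary = sum(p_ordinary < 0.05), moderated = sum(p_moderated < 0.05)))
```

Pooling pulls genes with an unusually small MSE, which produce spuriously large ordinary F statistics, toward the common value, and it adds d0 denominator degrees of freedom. That is why a moderated test is more stable when there are only two arrays per group.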

  9. > genediff.results <- genediff(eset.lmg)
     > names(genediff.results)
     [1] "Gene.Specific" "Posterior"
     > hist(genediff.results$Gene.Specific)
     > hist(genediff.results$Posterior)
     > pv2 <- pvadjust(genediff.results)
     > names(pv2)
     [1] "Gene.Specific"     "Posterior"         "Gene.Specific.FDR"
     [4] "Posterior.FDR"
     > sum(pv2$Gene.Specific < .05)
     [1] 2615
     > sum(pv2$Posterior < .05)
     [1] 3082
     > sum(pv2$Gene.Specific.FDR < .05)
     [1] 119
     > sum(pv2$Posterior.FDR < .05)
     [1] 1173
     genediff() returns two lists of 12625 p-values, one per probe set. Gene.Specific uses the standard F statistic, whose denominator is each gene's own MSE with 6 df (12 arrays minus 6 groups); Posterior uses the moderated F statistic, whose denominator is derived from an analysis of the MSEs of all the linear models. pvadjust() adds FDR-adjusted versions of both.
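The FDR columns correct for testing 12625 probe sets at once, which is why far fewer genes pass 0.05 after adjustment. Assuming pvadjust() applies a Benjamini-Hochberg-style false discovery rate adjustment (check the LMGene documentation), the calculation looks like this in base R. bh_adjust() is a hypothetical helper, checked against R's own p.adjust(..., method = "BH"), and the ten p-values are made up.

```r
# Hypothetical helper: Benjamini-Hochberg adjusted p-values
bh_adjust <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)  # largest p-value first
  ro <- order(o)                    # positions to restore the original order
  # p * n / rank, made monotone with a running minimum, capped at 1
  pmin(1, cummin(n / (n:1) * p[o]))[ro]
}

p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216)
adj <- bh_adjust(p)
print(round(adj, 4))
print(c(raw = sum(p < 0.05), fdr = sum(adj < 0.05)))  # 5 raw vs 2 after adjustment
```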

  10. > library(annaffy)
      Loading required package: GO
      Loading required package: KEGG
      > generank <- order(pv2$Posterior)
      > ranked.gene.pvs <- pv2$Posterior.FDR[generank]
      > ranked.gene.pvs[1:5]
      [1] 9.329426e-05 9.329426e-05 9.329426e-05 9.329426e-05 1.618971e-04
      > ranked.gene.ids <- featureNames(eset)[generank]
      > ranked.gene.ids[1:5]
      [1] "38608_at"         "37208_at"         "AFFX-M27830_5_at" "34301_r_at"       "33646_g_at"
      > ranked.lls <- aafLocusLink(ranked.gene.ids,"hgu95av2")
      > browseURL(getURL(ranked.lls[1]))
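The ranking step orders probe sets by raw p-value and then reads the FDR-adjusted values in that order. A Benjamini-Hochberg-style adjustment is monotone in the raw p-value, so both orderings agree, and its running minimum is a likely reason several of the top genes share the same adjusted value. A toy illustration, with made-up p-values and hypothetical probe set names g1 to g5, using p.adjust() with method "BH" as a stand-in for pvadjust():

```r
# Made-up raw p-values for five hypothetical probe sets
p <- c(g1 = 0.0001, g2 = 0.0004, g3 = 0.0002, g4 = 0.03, g5 = 0.5)
fdr <- p.adjust(p, method = "BH")  # names are kept

generank <- order(p)               # most significant first
ranked_ids <- names(p)[generank]
ranked_fdr <- fdr[generank]

print(ranked_ids)  # "g1" "g3" "g2" "g4" "g5"
print(ranked_fdr)  # g1 and g3 share the adjusted value 5e-04
```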

  11. > ranked.gene.symbols <- aafSymbol(ranked.gene.ids,"hgu95av2")
      Loading required package: hgu95av2
      > ranked.gene.symbols[1:5]
      An object of class "aafList"
      [[1]]
      An object of class "aafSymbol"
      [1] "LGALS7"
      [[2]]
      An object of class "aafSymbol"
      [1] "PSPHL"
      [[3]]
      An object of class "aafSymbol"
      character(0)
      [[4]]
      An object of class "aafSymbol"
      [1] "KRT17"
      [[5]]
      An object of class "aafSymbol"
      [1] "GM2A"
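The third symbol is empty because AFFX-M27830_5_at is an Affymetrix control probe set (the AFFX prefix), which has no gene symbol. When building a results table it helps to turn such empty entries into NA. This sketch assumes the symbols have already been converted to a plain R list of character vectors, a stand-in for the aafList above; first_or_na() is a hypothetical helper.

```r
# Stand-in for ranked.gene.symbols[1:5]: character(0) marks a missing symbol
symbols <- list("LGALS7", "PSPHL", character(0), "KRT17", "GM2A")
ids <- c("38608_at", "37208_at", "AFFX-M27830_5_at", "34301_r_at", "33646_g_at")

# Hypothetical helper: first symbol, or NA when none is annotated
first_or_na <- function(s) if (length(s) > 0) s[[1]] else NA_character_

symbol_vec <- setNames(vapply(symbols, first_or_na, character(1)), ids)
is_control <- startsWith(ids, "AFFX")  # Affymetrix control probe sets

print(data.frame(id = ids, symbol = symbol_vec, control = is_control, row.names = NULL))
```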
