Overview of Bioconductor

Aedín Culhane aedin@jimmy.harvard.edu. Overview of Bioconductor. http://bcb.dfci.harvard.edu/~aedin http://www.hsph.harvard.edu/research/aedin-culhane. Bioconductor. Biannual release (normally April, October) to coincide with R release. Current: Bioconductor 2.9


Presentation Transcript


  1. Aedín Culhane aedin@jimmy.harvard.edu Overview of Bioconductor http://bcb.dfci.harvard.edu/~aedin http://www.hsph.harvard.edu/research/aedin-culhane

  2. Bioconductor
     • Biannual release (normally April and October) to coincide with the R release.
     • Current: Bioconductor 2.9 (released to coincide with R 2.14).
     • To install, use the script on the Bioconductor website:
         source("http://www.bioconductor.org/biocLite.R")
         biocLite()
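The biocLite() script on this slide belongs to the Bioconductor 2.9 era. In Bioconductor 3.8 and later, the BiocManager package from CRAN does the same job. The following is a minimal sketch only: the helper name install_bioc and its dry_run switch are illustrative assumptions and are not part of Bioconductor or BiocManager.

```r
# Hypothetical helper (not part of Bioconductor): install Bioconductor
# packages via BiocManager, the successor of biocLite().
install_bioc <- function(pkgs = character(), dry_run = TRUE) {
  if (dry_run) {
    # Only report what would be installed; nothing is downloaded.
    message("Would install: ", paste(pkgs, collapse = ", "))
    return(invisible(pkgs))
  }
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")  # BiocManager itself lives on CRAN
  }
  BiocManager::install(pkgs)
}

# Dry run: prints the package names without touching the network.
install_bioc(c("Biobase", "GEOquery"))
```

Calling install_bioc(..., dry_run = FALSE) would perform the actual installation; the dry run is the default so that the sketch can be sourced safely.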

  3. Packages Overview (Bioconductor web site)
     • Bioconductor BiocViews task views: Software, Annotation Data, Experiment Data

  4. What Packages Do I Need?
     Specific to your data and analysis pipeline, but for examples see:
     • Bioconductor Workshops
     • Bioconductor Workflows

  5. Main types of Annotation Packages
     • Gene-centric AnnotationDbi packages:
       • Organism: org.Mm.eg.db
       • Technology/Platform: hgu133plus2.db
       • Gene sets and pathways (biology level): GO.db or KEGG.db
       • .db packages can be queried with SQL or accessed through the annotation package interface (toTable, get, mget)
     • Genome-centric GenomicFeatures packages:
       • Transcriptome level: TxDb.Hsapiens.UCSC.hg19.knownGene
       • Generic features: can be generated via GenomicFeatures
     • biomaRt:
       • Query the web-based `biomart' resource for genes, sequence, SNPs, etc.
     • See http://www.bioconductor.org/help/course-materials/2011/BioC2011/LabStuff/AnnotationSlidesBioc2011.pdf
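As an illustration of the toTable and mget access mentioned on this slide, here is a hedged sketch that looks up gene symbols in the mouse organism package. It assumes org.Mm.eg.db is installed and skips the lookup otherwise; the variable names are illustrative.

```r
# Sketch only: runs the lookup if org.Mm.eg.db is installed, otherwise skips.
if (requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  suppressPackageStartupMessages(library(org.Mm.eg.db))

  symbol_map <- org.Mm.egSYMBOL                # Entrez Gene ID -> gene symbol
  ids <- head(mappedkeys(symbol_map), 3)       # first few IDs that have a symbol

  print(mget(ids, symbol_map))                 # list: one symbol per Entrez ID
  print(head(toTable(symbol_map)))             # same mapping as a data.frame
  annotation_checked <- TRUE
} else {
  message("org.Mm.eg.db is not installed; skipping the annotation lookup")
  annotation_checked <- FALSE
}
```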

  6. Bioconductor resources
     • Mailing list (sign up for the daily digest)
     • Documentation and workshop/course material online
       • Slides from talks, PDFs of tutorials, R code
     • Help available for each software package
       • Each package MUST contain a vignette (how-to)
     • Other resources: www.Rseek.org, www.r-bloggers.com

  7. Vignette
     • Tutorials that provide a worked example of a package
     • Required in Bioconductor packages
     • Written in Sweave (Leisch, 2002):
       • LaTeX dynamic reports in which R code is embedded and executable
       • All R code in a vignette is checked (and executed) by R CMD check
     • http://www.bioconductor.org/docs/vignettes.html
         library("Biobase")
         library("GOstats")   # Load package of interest
         openVignette()
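Besides openVignette() from Biobase, base R can list the vignettes of any installed package. A small sketch, using the utils package only because it ships with every R installation:

```r
# List the vignettes shipped with an installed package.
available <- vignette(package = "utils")
print(available$results[, c("Item", "Title"), drop = FALSE])

# For a Bioconductor package the calls look the same, for example:
#   vignette(package = "Biobase")   # list the vignettes
#   browseVignettes("Biobase")      # open an HTML index in the browser
```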

  8. S4 classes and ExpressionSet
     • Within Bioconductor, you will encounter packages that are structured around S4 object-oriented programming, proposed by John Chambers (developer of S).
     • A class provides a software abstraction of a real-world object.
     • A method performs an action on an object of a class (think of a class as a noun and a method as a verb).

  9. Object (S4)
     • An object is an instance of a class.
     • Descriptions are stored in slots.
     • slotNames(ob1) lists all slots in an object; str(ob1) also shows their contents.
     • To access slots:
       • ob1@slotname
       • slotname(ob1), where the package defines an accessor function for that slot, or
       • slot(ob1, "slotname")
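The class/method/slot vocabulary can be tried out in base R without any Bioconductor package. The sketch below defines a toy class; the names Gene, symbol, values, and meanExpression are made up for illustration and do not come from any package.

```r
library(methods)

# A class is the "noun": a gene with a symbol and some expression values.
setClass("Gene", representation(symbol = "character", values = "numeric"))

# A generic plus a method is the "verb": the mean expression of a Gene.
setGeneric("meanExpression", function(object) standardGeneric("meanExpression"))
setMethod("meanExpression", "Gene", function(object) mean(object@values))

# An accessor named after the slot; it exists only because we define it here.
setGeneric("symbol", function(object) standardGeneric("symbol"))
setMethod("symbol", "Gene", function(object) object@symbol)

# An object is an instance of the class.
ob1 <- new("Gene", symbol = "TP53", values = c(2, 4, 6))

slotNames(ob1)        # names of all slots
ob1@symbol            # direct slot access
slot(ob1, "symbol")   # same, with the slot name given as a string
symbol(ob1)           # access through the accessor method
meanExpression(ob1)   # method dispatch on the class; mean of 2, 4, 6 is 4
```

Package code usually prefers accessor functions over `@`, since accessors keep working if the internal slot layout changes.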

  10. Example: ExpressionSet
         library(ALL)
         data(ALL)
         > ALL
         ExpressionSet (storageMode: lockedEnvironment)
         assayData: 12625 features, 128 samples
           element names: exprs
         protocolData: none
         phenoData
           sampleNames: 01005 01010 ... LAL4 (128 total)
           varLabels: cod diagnosis ... date last seen (21 total)
           varMetadata: labelDescription
         featureData: none
         experimentData: use 'experimentData(object)'
           pubMedIds: 14684422 16243790
         Annotation: hgu95av2
      Ways to inspect the object:
         slotNames(ALL)
         ALL@phenoData
         phenoData(ALL)
         class(ALL)
         ?ExpressionSet
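To see how the pieces of an ExpressionSet fit together, it can help to build a tiny one by hand. The sketch below assumes the Biobase package is installed (it skips the example otherwise); the matrix values and sample labels are invented toy data.

```r
# Sketch only: builds a toy ExpressionSet if Biobase is available.
if (requireNamespace("Biobase", quietly = TRUE)) {
  suppressPackageStartupMessages(library(Biobase))

  # Toy expression matrix: 3 features (rows) by 2 samples (columns).
  expr <- matrix(c(5.1, 6.3, 7.2, 5.0, 6.8, 7.9), nrow = 3,
                 dimnames = list(c("f1", "f2", "f3"), c("s1", "s2")))

  # Sample annotation; its row names must match the matrix column names.
  pheno <- data.frame(group = c("control", "treated"),
                      row.names = c("s1", "s2"))

  eset <- ExpressionSet(assayData = expr,
                        phenoData = AnnotatedDataFrame(pheno))

  print(dim(exprs(eset)))   # features x samples
  print(pData(eset))        # sample annotation as a data.frame
  eset_built <- TRUE
} else {
  message("Biobase is not installed; skipping the ExpressionSet example")
  eset_built <- FALSE
}
```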

  11. Methods that act on an S4 class
         showMethods(classes = "ExpressionSet")
         getMethod("write.exprs", "ExpressionSet")
      Or, if you wish to see how the package really works, download and look at the source code.

  12. Aedín Culhane aedin@jimmy.harvard.edu Getting Data into R & Bioconductor http://www.hsph.harvard.edu/research/aedin-culhane/

  13. Simple Excel Spreadsheet data
      • Simple table:
        • read.table()
        • read.csv()
        • scan()
      • However, more specialized readers exist for particular data types. See Technologies on BiocViews:
        • http://www.bioconductor.org/packages/release/BiocViews.html
      • Large data files: also see http://www.revolutionanalytics.com
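A minimal sketch of the simple-table case, using inline text in place of a file exported from a spreadsheet. The gene names and values are invented; with a real file, pass its path as the first argument instead of `text =`.

```r
# Inline text stands in for a CSV file exported from a spreadsheet.
csv_text <- "gene,sample1,sample2
TP53,5.1,6.3
BRCA1,7.2,5.0"

# read.csv() assumes a header row and comma separators;
# row.names = 1 uses the first column as row names.
expr_table <- read.csv(text = csv_text, row.names = 1)
str(expr_table)

# read.table() reads the same data once separator and header are spelled out.
same_table <- read.table(text = csv_text, sep = ",", header = TRUE,
                         row.names = 1)
all.equal(expr_table, same_table)

# Many analysis functions expect a numeric matrix rather than a data.frame.
expr_matrix <- as.matrix(expr_table)
```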

  14. Some common data types • Microarray • SNP • NGS

  15. A Microarray Overview

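  16. Reading Affymetrix Data
         library(affy)    # or: require(affy)
         # Read the CEL files into an AffyBatch, then normalize with RMA
         affybatch <- ReadAffy(celfile.path = "[Location of your data]")
         eSet <- rma(affybatch)
         # Alternative: read and RMA-normalize the CEL files in one step
         eSet <- justRMA(celfile.path = "[Location of your data]")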

  17. Sample R code

  18. ExpressionSet Class in R

  19. Assessing Data Quality

  20. Public Microarray Data (statistics as of May 2011)
      • ArrayExpress: 21,997 studies (622,617 profiles)
      • GEO: 22,735 studies (558,074 profiles)

  21. R Code

  22. More on GEOquery
         require(GEOquery)
      Let's try to load the GDS810 dataset, which contains data on Alzheimer's disease at various stages of severity:
         GDS810 <- getGEO("GDS810")
      For a GDS accession, getGEO returns a GDS object, a class that extends GEOData. You can get a description of this class like this:
         help("GEOData-class")
      The accessors below show the metadata, the column descriptions, and the first rows of the data table:
         Meta(GDS810)
         Columns(GDS810)
         head(Table(GDS810))

  23. Affy SNP Arrays

  24. Process – Affy SNP Arrays (oligo package)

  25. Other Arrays
      • Illumina: lumi package
      • Two-color spotted arrays: limma package
      • Other arrays: http://www.bioconductor.org/help/workflows/oligo-arrays/

  26. Next Generation Sequencing Data

  27. R Code

  28. Exercise
      • Install the GEOquery package
      • Download the dataset GSE1297 using getGEO
      • For a GSE accession, getGEO returns a list of ExpressionSet (eSet) objects; take the first element, then use exprs and pData to see the expression data and the phenoData
      • Use the arrayQualityMetrics package to assess the quality of these data
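One possible shape for a solution is sketched below. It assumes the GEOquery, Biobase, and arrayQualityMetrics packages are installed and that a network connection is available. The function name run_exercise and the output directory are illustrative choices; the function is only defined here, not run, because calling it downloads data from GEO and writes an HTML report.

```r
# Hypothetical helper for the exercise (defined only; call it yourself to run).
run_exercise <- function(outdir = file.path(tempdir(), "GSE1297_QC")) {
  gse <- GEOquery::getGEO("GSE1297")   # a list of ExpressionSet objects
  eset <- gse[[1]]                     # first element of that list

  print(dim(Biobase::exprs(eset)))     # expression matrix: features x samples
  print(head(Biobase::pData(eset)))    # sample annotation (phenoData)

  # Write a quality report into outdir; force = TRUE overwrites an old report.
  arrayQualityMetrics::arrayQualityMetrics(expressionset = eset,
                                           outdir = outdir,
                                           force = TRUE)
  invisible(eset)
}
```

After run_exercise() finishes, the report can be opened from the index.html file inside the output directory.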

  29. R basics: Getting help
      • ?mean
      • help(mean)
      • help.search("mean")
      • apropos("mean")
      • example(mean)
      • http://www.bioconductor.org/help/
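Of these, apropos() is handy in scripts because it returns a plain character vector that can be filtered further. A short sketch using only base R:

```r
# apropos() returns the names of objects on the search path that match.
hits <- apropos("mean")
head(hits)
"mean" %in% hits          # the function mean() itself is among the matches

# A regular expression narrows the search, e.g. names starting with "read."
readers <- apropos("^read\\.")
head(readers)

# args() shows a function's arguments without opening the help page.
args(read.table)
```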

  30. With thanks to • www.bioconductor.org/help/course.../Bioconductor-Introduction-lab.pdf
