Getting Data into R & Bioconductor

    Presentation Transcript
    1. Aedín Culhane aedin@jimmy.harvard.edu Getting Data into R & Bioconductor http://www.hsph.harvard.edu/research/aedin-culhane/

    2. Simple Excel spreadsheet data • Already described • read.table() • read.csv() • scan() • There are other formats, e.g. netCDF • However, Bioconductor data input is more data-type specialized. • Look at Technologies on BiocViews. • http://www.bioconductor.org/packages/release/BiocViews.html
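The three base R readers listed on slide 2 can be compared on one small file. This is a minimal sketch, assuming the spreadsheet has first been saved from Excel as a comma-separated (CSV) file; the gene names, sample names and values are hypothetical and are written to a temporary file so the example runs on its own.

```r
# Hypothetical spreadsheet, saved from Excel as CSV, written to a temp file
csv_file <- tempfile(fileext = ".csv")
writeLines(c("gene,sample1,sample2",
             "geneA,5.1,6.3",
             "geneB,7.2,7.0"), csv_file)

# read.csv() is read.table() with header = TRUE and sep = "," as defaults;
# row.names = 1 turns the first column (gene) into row names
expr_csv <- read.csv(csv_file, row.names = 1)

# The equivalent read.table() call with the options spelled out
expr_tab <- read.table(csv_file, header = TRUE, sep = ",", row.names = 1)

# scan() is lower level: it returns a flat vector of fields.
# skip = 1 drops the header line; what = character() keeps every field as text
raw_fields <- scan(csv_file, what = character(), sep = ",", skip = 1, quiet = TRUE)

print(expr_csv)
print(raw_fields)
```

read.csv() and read.table() return a data frame, which is usually what you want for tabular data; scan() is useful when the file is not a regular table and you want to parse the fields yourself.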

    3. Some common data types • Microarray • SNP • Increasingly, NGS (next-generation sequencing)

    4. A Microarray Overview

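    5. Reading Affymetrix Data
       library(affy)      # require(affy) is an alternative
       affybatch <- ReadAffy(celfile.path = "[Location of your data]")   # read the CEL files into an AffyBatch
       eSet <- justRMA(celfile.path = "[Location of your data]")         # read and RMA-normalize in one step; returns an ExpressionSet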

    6. Sample R code

    7. ExpressionSet Class in R
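Readers such as justRMA() return an ExpressionSet, and so does getGEO() for GEO series. A minimal sketch of the class, assuming the Biobase package is installed (e.g. via BiocManager::install("Biobase")); the toy values and the "disease" column are hypothetical and only illustrate how the expression matrix and the sample annotation fit together.

```r
library(Biobase)

# Expression matrix: features (genes) in rows, samples in columns.
# matrix() fills column by column: sample1 = (5.1, 7.2), sample2 = (6.3, 7.0)
exprs_mat <- matrix(c(5.1, 7.2, 6.3, 7.0),
                    nrow = 2,
                    dimnames = list(c("geneA", "geneB"),
                                    c("sample1", "sample2")))

# Sample annotation (phenoData): one row per sample,
# row names must match colnames(exprs_mat)
pheno <- data.frame(disease = c("control", "Alzheimer"),
                    row.names = c("sample1", "sample2"))

eset <- ExpressionSet(assayData = exprs_mat,
                      phenoData = AnnotatedDataFrame(pheno))

exprs(eset)   # the expression matrix
pData(eset)   # the sample (phenotype) data frame
dim(eset)     # number of features and samples
```

Keeping the matrix and the sample table in one object means subsetting, e.g. eset[, 1], keeps the expression values and their sample annotation in step.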

    8. Assessing Data Quality
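The idea behind checking data quality is that arrays from the same experiment should have similar intensity distributions, so one that stands apart deserves a closer look. The sketch below is a simple base R check of that idea on simulated data, not the method used by arrayQualityMetrics; the simulated matrix, the shifted "array4" and the 1 log2-unit threshold are all hypothetical choices for illustration.

```r
# Hypothetical data: 1000 probes x 4 arrays of simulated log2 intensities
set.seed(1)
expr <- matrix(rnorm(1000 * 4, mean = 8, sd = 1), ncol = 4,
               dimnames = list(NULL, paste0("array", 1:4)))
expr[, "array4"] <- expr[, "array4"] + 2   # simulate one array with shifted intensities

# Visual check: per-array intensity distributions should look alike
boxplot(expr, main = "Per-array log2 intensities")

# Numeric check: distance of each array's median from the median of all arrays
array_medians <- apply(expr, 2, median)
deviation <- abs(array_medians - median(array_medians))

# Flag arrays whose median deviates by more than an arbitrary 1 log2 unit
suspect <- names(deviation)[deviation > 1]
suspect
```

A flagged array is a prompt to investigate (hybridization problem, batch effect), not an automatic reason to drop it.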

    9. Public Microarray Data • ArrayExpress: 21,997 studies (622,617 profiles) • GEO: 22,735 studies (558,074 profiles) • Statistics from May 2011

    10. >500,000 arrays × $500 per array = >$250,000,000 • Cancer studies account for >14% of all studies in these databases…

    11. R Code

    12. More on GEOquery
        require(GEOquery)
        Let's try to load the GDS810 dataset, which contains data on Alzheimer's disease at various stages of severity.
        GDS810 <- getGEO("GDS810")
        The getGEO function returns an object of class GEOData (here a GDS, a subclass of GEOData). You can get a description of this class like this:
        help("GEOData-class")
        Meta(GDS810)            # header metadata for the dataset
        Columns(GDS810)         # description of each sample column
        head(Table(GDS810))     # first rows of the data table

    13. Affy SNP Arrays

    14. Process – Affy SNP Arrays (Oligo package)

    15. Other Arrays • Illumina: lumi package • Two-color spotted arrays: limma package • Other arrays: http://www.bioconductor.org/help/workflows/oligo-arrays/

    16. Next Generation Sequencing Data

    17. R Code

    18. Exercise: From GEO, bring down a GSE • Download the dataset GSE1297 using getGEO(). • getGEO() returns the series as a list of ExpressionSets (one per platform), so take the first element to get the eSet. • Use exprs() to see the expression data and pData() to see the phenoData. • Use arrayQualityMetrics to assess the quality of these data.

    19. With thanks to www.bioconductor.org/help/course.../Bioconductor-Introduction-lab.pdf

    20. Quick Aside: Interpreting hierarchical clustering trees • Hierarchical clustering results are viewed as a dendrogram (tree). • The distance between nodes is read from the height scale. • The left-to-right ordering of nodes is not important (branches can swing around, like a baby's mobile). • [Figure: trees A and B] Trees A and B are equivalent.
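The claim that node ordering does not matter can be checked in base R. A minimal sketch, assuming hypothetical toy data for four samples: tree B is tree A with the branches swapped at every node (rev() on a dendrogram), and the cophenetic distances (the height at which two samples first join) are identical for both trees.

```r
# Toy data (hypothetical): 4 samples measured on 2 variables
x <- matrix(c(1.0, 1.2, 5.0, 5.3,    # v1 for s1..s4
              2.0, 2.1, 8.0, 8.4),   # v2 for s1..s4
            ncol = 2,
            dimnames = list(c("s1", "s2", "s3", "s4"), c("v1", "v2")))

# Average-linkage clustering on Euclidean distances between samples
hc <- hclust(dist(x), method = "average")
tree_A <- as.dendrogram(hc)

# Tree B: the same tree with the branches swapped at every node
tree_B <- rev(tree_A)

# Draw them side by side: the leaves appear in opposite orders
par(mfrow = c(1, 2))
plot(tree_A, main = "Tree A")
plot(tree_B, main = "Tree B")

# Cophenetic distance = height at which two samples first join.
# Put both matrices in the same sample order before comparing.
samples <- rownames(x)
coph_A <- as.matrix(cophenetic(tree_A))[samples, samples]
coph_B <- as.matrix(cophenetic(tree_B))[samples, samples]
all.equal(coph_A, coph_B)   # TRUE: only the drawing order changed
```

So when reading a dendrogram, compare samples by how high up the tree they join, not by how close they sit along the bottom.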