RNA-Seq datasets

Dan Lawson

New buzz word (old data)
  • In the beginning there were ESTs...
  • and then there was Roche 454...
  • and then Solexa/Illumina.
  • Why do we generate data sets?
  • Who is producing data sets?
  • Where do we obtain these?
  • What can we use them for?
  • How do we organise these?

VectorBase 2012

Why do we produce RNA-Seq data sets?
  • Access to the transcriptome of an organism (speed v cost)
  • Technical issues with the genome of that species (size, repeat content)
  • Quantification of gene expression levels (absolute & relative)
  • Analysing these data sets both requires, and can deliver, improvements to the quality of predicted gene structures

Who is producing RNA-Seq data sets?
  • Almost all de novo genome sequencing projects, to provide a substrate for gene prediction
  • Large studies (such as the Vosshall and Krzywinski DBPs)
  • Small studies (such as Zwiebel chemosensors)
  • XXXXX[orgn] AND study_type_transcriptome_analysis[prop]
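
The search string above leaves the organism as a placeholder (XXXXX). A minimal sketch of filling that placeholder per species, assuming an NCBI Entrez-style search (the slide does not name the database). `transcriptome_query` is a hypothetical helper, not part of any NCBI tool; the `[prop]` filter is copied verbatim from the slide.

```python
def transcriptome_query(organism: str) -> str:
    """Fill the organism placeholder in the slide's search string (hypothetical helper)."""
    # Quote multi-word names so the [orgn] tag applies to the whole phrase.
    term = f'"{organism}"' if " " in organism else organism
    # The AND clause and the [prop] filter are taken unchanged from the slide.
    return f"{term}[orgn] AND study_type_transcriptome_analysis[prop]"


if __name__ == "__main__":
    for species in ("Anopheles gambiae", "Aedes aegypti", "Culex quinquefasciatus"):
        print(transcriptome_query(species))
```

A string built this way could, for example, be passed as the `term` argument of Biopython's `Bio.Entrez.esearch`; which `db` to query is left open here, as on the slide.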

RNA-Seq data sets in VectorBase
  • We do not want to be the archival database for these data sets (as they are large and will be very common)
  • We do want to identify important sets and present some level of processed/analysed data
  • All sets require some level of QC/filtering
  • All sets require alignment back to a reference genome
    • Default aligner has been Bowtie (but we know this is sub-optimal)
    • Other tools used include the aligners GSNAP and BWA, and Inchworm (the first stage of the Trinity de novo assembler)
    • Output is a BAM file
    • Use SAMtools to index the BAM files so that Ensembl tools (a viewer and a slicer) can use these sets
  • {To Do} Move indexed BAM files to the FTP site
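
The QC and alignment steps listed above could be scripted roughly as follows. This is a sketch under stated assumptions: the slide gives no QC criteria or aligner parameters, so the mean-quality threshold, minimum read length, Phred+33 encoding and file names are illustrative, and `passes_qc` and `alignment_commands` are hypothetical helpers. The commands use Bowtie 1's `-S` (SAM output) flag plus `samtools sort -o` and `samtools index`; the sort step is included because `samtools index` needs a coordinate-sorted BAM.

```python
import shlex


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_qc(quality: str, min_mean_q: float = 20.0, min_len: int = 25) -> bool:
    """Hypothetical read filter: keep reads that are long enough with good mean quality."""
    return len(quality) >= min_len and mean_phred(quality) >= min_mean_q


def alignment_commands(index: str, reads: str, prefix: str) -> list[list[str]]:
    """Command lines for align -> sort -> index; all file names are placeholders."""
    sam, bam = f"{prefix}.sam", f"{prefix}.bam"
    return [
        ["bowtie", "-S", index, reads, sam],   # Bowtie 1: -S writes SAM output
        ["samtools", "sort", "-o", bam, sam],  # coordinate-sorted BAM
        ["samtools", "index", bam],            # writes <prefix>.bam.bai alongside
    ]


if __name__ == "__main__":
    print(passes_qc("I" * 36))  # 'I' encodes Phred 40 in Phred+33
    for cmd in alignment_commands("agambiae_index", "lane1.filtered.fastq", "lane1"):
        print(shlex.join(cmd))
```

Returning argument lists instead of running them keeps the sketch testable; once real paths are substituted, each list could be passed to `subprocess.run(cmd, check=True)`.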

Using RNA-Seq data: Gene prediction
  • Aligned RNA-Seq data sets provide
    • Coverage plots, which can be processed into transfrags (transcribed fragments)
    • Exon-intron junction data
  • Use in automated annotation (MAKER)
    • Requires assembly/clustering for performance reasons
    • Useful for providing training data for ab initio predictors
    • Transfrags should be used with caution in early rounds of MAKER
  • Use in manual annotation (Apollo/Artemis)
    • Identification of novel gene predictions and exons
    • Confirmation/correction of intron junction data
    • Manual inclusion of untranslated regions (UTRs)
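
To make the two kinds of evidence concrete, the sketch below derives (a) transfrags as runs of consecutive bases whose read depth meets a threshold, and (b) intron junctions from the `N` (skipped region) operations in a SAM CIGAR string. The depth threshold, the 0-based half-open coordinates and the function names are illustrative assumptions; real pipelines apply further filtering.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def transfrags(coverage: list[int], min_depth: int = 1, start: int = 0) -> list[tuple[int, int]]:
    """Merge consecutive positions with depth >= min_depth into half-open intervals."""
    frags, run_start = [], None
    for i, depth in enumerate(coverage):
        pos = start + i
        if depth >= min_depth and run_start is None:
            run_start = pos                      # a covered run begins
        elif depth < min_depth and run_start is not None:
            frags.append((run_start, pos))       # the run ended at the previous base
            run_start = None
    if run_start is not None:                    # run extends to the end of the region
        frags.append((run_start, start + len(coverage)))
    return frags


def junctions(ref_start: int, cigar: str) -> list[tuple[int, int]]:
    """Introns implied by 'N' operations; ref_start is the 0-based leftmost position."""
    introns, ref = [], ref_start
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            introns.append((ref, ref + length))
        if op in "MDN=X":                        # operations that consume the reference
            ref += length
    return introns


if __name__ == "__main__":
    print(transfrags([0, 2, 3, 0, 0, 1, 1]))
    print(junctions(100, "10S40M5D20M300N30M"))
```

Half-open intervals are used so that an interval's length is simply `end - start`; note that SAM's `POS` column is 1-based, so it would need converting before being passed as `ref_start`.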

Using RNA-Seq data: Gene expression
  • Use the abundance of reads in an RNA-Seq experiment to assay the level of expression for a locus
  • Requires:
    • Aligned RNA-Seq data sets (BAM)
    • Annotation sets (GFF/GTF)
  • Processed to give FPKM/RPKM values for expression levels
  • Storage of these data in BASE2/GDAV (as discussed by Bob yesterday)
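
FPKM (fragments per kilobase of feature per million mapped fragments) normalises a locus's read count by both feature length and sequencing depth; RPKM is the same calculation with reads in place of fragments. A minimal sketch, assuming the per-gene counts and feature lengths (for example, summed exon lengths taken from the GFF/GTF) have already been computed; tools such as Cufflinks estimate these values more carefully, for example by handling isoforms. `fpkm` and `fpkm_table` are hypothetical names.

```python
def fpkm(fragments: int, feature_length_bp: int, total_mapped: int) -> float:
    """FPKM = fragments * 10^9 / (feature length in bp * total mapped fragments)."""
    # 10^9 = 10^3 (per kilobase) * 10^6 (per million mapped fragments)
    return fragments * 1e9 / (feature_length_bp * total_mapped)


def fpkm_table(counts: dict[str, int], lengths: dict[str, int],
               total_mapped: int) -> dict[str, float]:
    """Apply fpkm() to every feature; counts and lengths are keyed by gene ID."""
    return {gene: fpkm(n, lengths[gene], total_mapped) for gene, n in counts.items()}


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    counts = {"geneA": 500, "geneB": 50}
    lengths = {"geneA": 2000, "geneB": 500}
    print(fpkm_table(counts, lengths, total_mapped=10_000_000))
    # {'geneA': 25.0, 'geneB': 10.0}
```

Although geneA has ten times as many fragments, its FPKM is only 2.5 times higher, because it is four times longer.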

RNA-Seq visualization of coverage
  • BAM viewer (VectorBase)
    • Good for a single lane (or a small number of lanes)
    • Flexible, user chooses which experiments to visualize
    • Becomes slow and unwieldy with a medium-large number of lanes
  • Multiple experiments (FlyBase)
    • Good for multiple experiments
    • Pre-defined set of experiments
    • Fast response time

RNA-Seq questions #1
  • Given limited space/speed
    • What are the key experiments we can support?
    • Criteria for defining these?
    • Pre/post publication data sets?
    • Shelf life for an RNA-Seq experiment?
  • How do we aggregate across different experiments?
    • Coverage/Junctions
    • By species, developmental stage, body part, condition
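
The slide leaves aggregation as an open question. One possible approach, sketched purely as an illustration, pools lanes that share chosen metadata fields. Coverage is scaled to depth per million mapped reads before summing, so that one deep library does not swamp shallower ones, and junction observations are counted. The lane record layout (`species`, `stage`, `mapped_reads`, `coverage`, `junctions`) and the `aggregate` function are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate(lanes, group_by=("species", "stage")):
    """Pool lanes sharing the same metadata values (hypothetical record layout).

    Returns (coverage, junction_counts), both keyed by a tuple of the group_by values.
    """
    coverage = {}
    junction_counts = defaultdict(Counter)
    for lane in lanes:
        group = tuple(lane[field] for field in group_by)
        # Scale depth to "per million mapped reads" so libraries are comparable.
        scale = 1e6 / lane["mapped_reads"]
        scaled = [depth * scale for depth in lane["coverage"]]
        if group in coverage:
            if len(coverage[group]) != len(scaled):
                raise ValueError("lanes in a group must cover the same region")
            coverage[group] = [a + b for a, b in zip(coverage[group], scaled)]
        else:
            coverage[group] = scaled
        # Each (start, end) junction seen in a lane adds one observation.
        junction_counts[group].update(lane["junctions"])
    return coverage, dict(junction_counts)


if __name__ == "__main__":
    lanes = [
        {"species": "Anopheles gambiae", "stage": "adult", "mapped_reads": 1_000_000,
         "coverage": [1, 2], "junctions": [(10, 50)]},
        {"species": "Anopheles gambiae", "stage": "adult", "mapped_reads": 2_000_000,
         "coverage": [2, 4], "junctions": [(10, 50), (60, 90)]},
    ]
    cov, junc = aggregate(lanes)
    print(cov)
    print(junc)
```

Changing `group_by` (for example to a hypothetical `("species", "body_part")`) regroups the same lanes without reprocessing the underlying alignments.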
