RNA-Seq datasets

Dan Lawson

New buzz word (old data)
  • In the beginning there were ESTs...
  • and then there was Roche 454..
  • and then Solexa/Illumina.
  • Why do we generate data sets?
  • Who is producing data sets?
  • Where do we obtain these?
  • What can we use them for?
  • How do we organise these?

VectorBase 2012


Why do we produce RNA-Seq data sets?
  • Access to the transcriptome of an organism (speed v cost)
  • Technical issues with the genome of that species (size, repeat content)
  • Quantification of gene expression levels (absolute & relative)
  • Analysis of these data sets both requires and can deliver improvements to the quality of the predicted gene structures

Who is producing RNA-Seq data sets?
  • Almost all de novo genome sequencing projects, in order to produce a substrate for gene prediction
  • Large studies (such as the Vosshall and Krzywinski DBPs)
  • Small studies (such as Zwiebel chemosensors)
  • XXXXX[orgn] AND study_type_transcriptome_analysis[prop]
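
The last bullet reads like an NCBI Entrez search term with the organism left as a placeholder. A minimal sketch of filling that placeholder and building an E-utilities ESearch URL follows. The choice of the `sra` database, the `retmax` value, and the function names are assumptions made for illustration; the slide states only the query pattern.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def transcriptome_query(organism: str) -> str:
    """Fill the organism placeholder (XXXXX on the slide) in the query pattern."""
    return f"{organism}[orgn] AND study_type_transcriptome_analysis[prop]"


def esearch_url(organism: str, db: str = "sra", retmax: int = 20) -> str:
    """Build an ESearch URL; db='sra' is an assumption, not stated on the slide."""
    params = {"db": db, "term": transcriptome_query(organism), "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical example organism; nothing is fetched, the URL is only printed.
    print(esearch_url("Anopheles gambiae"))
```

The sketch only constructs the URL, so it can be checked offline before any request is sent.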

RNA-Seq data sets in VectorBase
  • We do not want to be the archival database for these data sets (as they are large and will be very common)
  • We do want to identify important sets and present some level of processed/analysed data
  • All sets require some level of QC/filtering
  • All sets require alignment back to a reference genome
    • Default aligner has been Bowtie (but we know this is sub-optimal)
    • Other tools used include GSNAP and BWA (aligners) and Inchworm (an assembler)
    • Output is a BAM file
    • Use SAMtools to index the BAM files so that the Ensembl tools (a viewer and a slicer) can use these sets
  • {To Do} Move indexed BAM files onto the FTP site
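
The align, convert, sort and index steps above can be sketched as a list of command lines. This is an illustration under stated assumptions, not the VectorBase pipeline: it assumes single-end FASTQ input, Bowtie 1 with SAM output (`-S`), and SAMtools 1.x syntax (older SAMtools releases used a different form for `sort`). File names and the function name are hypothetical, and the QC/filtering step is omitted.

```python
from pathlib import Path
from typing import List


def alignment_commands(index_prefix: str, reads_fastq: str, out_dir: str) -> List[List[str]]:
    """Return the commands that turn one FASTQ file into a sorted, indexed BAM."""
    out = Path(out_dir)
    stem = Path(reads_fastq).stem
    sam = out / f"{stem}.sam"
    unsorted_bam = out / f"{stem}.unsorted.bam"
    bam = out / f"{stem}.bam"
    return [
        # Align reads against a prebuilt Bowtie index, writing SAM.
        ["bowtie", "-S", index_prefix, reads_fastq, str(sam)],
        # Convert SAM to BAM.
        ["samtools", "view", "-b", "-o", str(unsorted_bam), str(sam)],
        # Coordinate-sort, which indexing requires.
        ["samtools", "sort", "-o", str(bam), str(unsorted_bam)],
        # Write the .bai index next to the sorted BAM.
        ["samtools", "index", str(bam)],
    ]


if __name__ == "__main__":
    for cmd in alignment_commands("genome_index", "lane1.fastq", "out"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed; the sketch itself only prints the commands.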

Using RNA-Seq data: Gene prediction
  • Aligned RNA-Seq data sets provide
    • Coverage plots which can be processed to transfrags
    • Exon-Intron junction data
  • Use in automated annotation (MAKER)
    • Requires assembly/clustering for performance reasons
    • Useful for providing training data for ab initio predictors
    • Transfrags should be used with caution in early rounds of MAKER
  • Use in manual annotation (Apollo/Artemis)
    • Identification of novel predictions, exons
    • Confirmation/correction of intron junction data
    • Manual inclusion of untranslated regions (UTRs)
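
As a rough illustration of how a coverage plot can be processed to transfrags, the sketch below collapses per-base read depth into contiguous covered intervals. The depth and length thresholds and the function name are assumptions for illustration; the slides do not say how the transfrags were actually derived.

```python
from typing import List, Sequence, Tuple


def coverage_to_transfrags(
    coverage: Sequence[int], min_depth: int = 1, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where depth >= min_depth.

    Intervals shorter than min_length bases are discarded.
    """
    transfrags: List[Tuple[int, int]] = []
    start = None  # start of the covered run currently being extended
    for pos, depth in enumerate(coverage):
        if depth >= min_depth:
            if start is None:
                start = pos
        elif start is not None:
            if pos - start >= min_length:
                transfrags.append((start, pos))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(coverage) - start >= min_length:
        transfrags.append((start, len(coverage)))
    return transfrags


if __name__ == "__main__":
    depth = [0, 3, 5, 4, 0, 0, 1, 2, 2, 0]
    print(coverage_to_transfrags(depth, min_depth=2))  # [(1, 4), (7, 9)]
```

Raising `min_depth` or `min_length` gives fewer, more conservative transfrags, which fits the caution advised for early MAKER rounds.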

Using RNA-Seq data: Gene expression
  • Use the abundance of reads in an RNA-Seq experiment to assay the level of expression for a locus
  • Requires:
    • Aligned RNA-Seq data sets (BAM)
    • Annotation sets (GFF/GTF)
  • Processed to give FPKM/RPKM values for expression levels
  • Storage of these data in BASE2/GDAV (as discussed by Bob yesterday)
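
For reference, the standard FPKM definition is the number of fragments mapped to a feature, per kilobase of feature length, per million mapped fragments. RPKM uses the same formula with reads counted instead of fragments. The sketch below applies that definition directly; the function name and the example numbers are hypothetical, and real tools apply further corrections that are not shown here.

```python
def fpkm(fragments: float, feature_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of feature per million mapped fragments.

    fragments / (feature_length_bp / 1e3) / (total_mapped_fragments / 1e6)
    simplifies to fragments * 1e9 / (feature_length_bp * total_mapped_fragments).
    """
    if feature_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (feature_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical locus: 500 fragments on a 2,000 bp transcript,
    # in a library of 10 million mapped fragments.
    print(fpkm(500, 2000, 10_000_000))  # 25.0
```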

RNA-Seq visualization of coverage
  • BAM viewer (VectorBase)
    • Good for a single lane (or a small number of lanes)
    • Flexible, user chooses which experiments to visualize
    • Becomes slow and unwieldy with a medium-large number of lanes
  • Multiple experiments (FlyBase)
    • Good for multiple experiments
    • Pre-defined set of experiments
    • Fast response time

RNA-Seq questions #1
  • Given limited space/speed
    • What are the key experiments we can support?
    • Criteria for defining these?
    • Pre/post publication data sets?
    • Shelf life for an RNA-Seq experiment?
  • How do we aggregate across different experiments?
    • Coverage/Junctions
    • By species, developmental stage, body part, condition
