RNA-Seq: An assessment of technical reproducibility and comparison with gene expression arrays


Presentation Transcript


  1. RNA-Seq: An assessment of technical reproducibility and comparison with gene expression arrays • Wei Zhang • University of Minnesota - Twin Cities • March 21st, 2011

  2. Outline • DNA Microarrays • RNA-Seq Overview • Experimental Design • Normalization of RNA-Seq Data • RNA-Seq Global Data Properties • Identifying Differentially Expressed Genes • Comparison of Results Across Technologies

  3. Outline • DNA Microarrays • RNA-Seq Overview • Experimental Design • Normalization of RNA-Seq Data • RNA-Seq Global Data Properties • Identifying Differentially Expressed Genes • Comparison of Results Across Technologies

  4. DNA Microarrays • Since the mid-1990s, DNA microarrays have been the technology of choice for large-scale studies of gene expression levels. • The ability of these arrays to simultaneously interrogate thousands of transcripts has led to important advances in a wide range of biological problems. • Identification of differentially expressed genes. • New insights into developmental processes and pharmacogenomic responses. • Evolution of gene regulation.

  5. Limitations of DNA Microarrays • Post-transcriptional regulation: not all mRNAs are translated. • Seriously limited detection of RNA splice patterns and previously unmapped genes. • Lack of synchronization: translation rates, alternative splicing, translation lag.

  6. Outline • DNA Microarrays • RNA-Seq Overview • Experimental Design • Normalization of RNA-Seq Data • RNA-Seq Global Data Properties • Identifying Differentially Expressed Genes • Comparison of Results Across Technologies

  7. RNA-Seq Overview • High-throughput sequencing technology for sequencing RNAs (actually cDNAs, which carry the RNAs' content). • Avoids the need for bacterial cloning of the cDNA input. • The resulting sequence reads are individually mapped to the source genome and counted to obtain the number and density of reads corresponding to RNA from each known exon.

  8. RNA-Seq Motivation • Allows researchers to obtain information such as: • gene/transcript/exon expression • alternative splicing • gene fusions • post-transcriptional mutations • single nucleotide variations

  9. RNA-Seq Details • Briefly, long RNAs are first converted into a library of cDNA fragments through either RNA fragmentation or DNA fragmentation. Sequencing adaptors (blue in the figure) are subsequently added to each cDNA fragment, and a short sequence is obtained from each cDNA using high-throughput sequencing technology. The resulting sequence reads are aligned with the reference genome or transcriptome and classified into three types: exonic reads, junction reads and poly(A) end-reads. These three types are used to generate a base-resolution expression profile for each gene.

  10. RNA-Seq Details

  11. Alternative Splicing

  12. Fusion Gene • A fusion gene is a hybrid gene formed from two previously separate genes. It can occur as the result of a translocation, interstitial deletion, or chromosomal inversion. Often, fusion genes are oncogenes. • A large set of reads would be expected to map to known exon-exon junctions. The remaining unmapped short reads would then be further analyzed to determine whether they match an exon-exon junction where the exons come from different genes.

  13. Outline • DNA Microarrays • RNA-Seq Overview • Experimental Design • Normalization of RNA-Seq Data • RNA-Seq Global Data Properties • Identifying Differentially Expressed Genes • Comparison of Results Across Technologies

  14. Experimental Design

  15. Outline • DNA Microarrays • RNA-Seq Overview • Experimental Design • Normalization of RNA-Seq Data • RNA-Seq Global Data Properties • Identifying Differentially Expressed Genes • Comparison of Results Across Technologies

  16. Histograms of the distribution of reads across genes

  17. Histograms of the distribution of reads across genes

  18. Affymetrix probe intensity distribution

  19. Normalization Method • Reads that fell onto exons were summed for each locus and normalized by the predicted mRNA length into expanded exonic read density. • Reads per kilobase per million mapped reads (RPKM): $\mathrm{RPKM} = \frac{10^9 \cdot C}{N \cdot L}$ • C is the number of mappable reads that fell onto the gene's exons • N is the total number of mappable reads in the experiment • L is the sum of the exon lengths in base pairs
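The RPKM normalization on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the commonly used definition RPKM = 10^9 · C / (N · L). The function name `rpkm` and the example numbers are hypothetical and do not come from the presentation.

```python
def rpkm(exon_reads, total_mapped_reads, exon_length_bp):
    """Reads per kilobase of exon per million mapped reads.

    exon_reads         -- C, mappable reads that fell onto the gene's exons
    total_mapped_reads -- N, total mappable reads in the experiment
    exon_length_bp     -- L, summed exon length of the gene in base pairs
    """
    if total_mapped_reads <= 0 or exon_length_bp <= 0:
        raise ValueError("N and L must be positive")
    # 10^9 = 10^3 (per kilobase) * 10^6 (per million reads)
    return 1e9 * exon_reads / (total_mapped_reads * exon_length_bp)


# Made-up example: 500 reads on a gene with 2,000 bp of exons,
# in a run with 10 million mappable reads.
example_value = rpkm(500, 10_000_000, 2_000)
print(example_value)
```

Dividing by both N and L is what makes values comparable across runs of different depth and across genes of different length.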

  20. Outline • DNA Microarrays • RNA-Seq Overview • Experimental Design • Normalization of RNA-Seq Data • RNA-Seq Global Data Properties • Identifying Differentially Expressed Genes • Comparison of Results Across Technologies

  21. RNA-Seq Global Data Properties • From the technical replicate data, the Spearman correlation is 0.96. This result suggests that RNA-Seq is highly reproducible.

  22. RNA-Seq Global Data Properties • Distribution of uniquely mappable reads across gene parts in the liver sample: 93% of the reads fall onto exons or predicted exon regions, 4% fall onto introns, and 3% fall in intergenic regions.
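A Spearman correlation between two technical replicates can be computed by ranking each replicate's per-gene counts and taking the Pearson correlation of the ranks. The sketch below uses only the standard library. The helper names and the toy counts are hypothetical and are not the data behind the reported value.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-gene read counts for two replicate lanes (made-up numbers).
replicate_1 = [12, 250, 3, 87, 1400, 0]
replicate_2 = [15, 230, 5, 90, 1350, 1]
print(spearman(replicate_1, replicate_2))
```

Because only ranks enter the calculation, the result is insensitive to the very different scales of lowly and highly expressed genes.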

  23. Outline • DNA Microarrays • RNA-Seq Overview • Experimental Design • Normalization of RNA-Seq Data • RNA-Seq Global Data Properties • Identifying Differentially Expressed Genes • Comparison of Results Across Technologies

  24. Identifying Differentially Expressed Genes • Affymetrix microarray: t-test • RNA-Seq: likelihood ratio test • Each procedure yields a P-value for each gene. The significance threshold to control the FDR at a given value was calculated using the method of Storey and Tibshirani (2003).
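To illustrate how per-gene P-values are turned into an FDR threshold, here is a simplified q-value sketch in the spirit of Storey and Tibshirani. It is not their full procedure: the proportion of true nulls (pi0) is estimated at a single fixed tuning value `lam = 0.5`, which is an assumption made here for brevity, whereas the published method smooths the estimate over a range of values. Function names and the example P-values are hypothetical.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the fraction of true null hypotheses from P-values above lam."""
    m = len(pvalues)
    pi0 = sum(p > lam for p in pvalues) / (m * (1.0 - lam))
    return min(1.0, pi0)


def qvalues(pvalues, lam=0.5):
    """Return one q-value per P-value, in the original order."""
    m = len(pvalues)
    pi0 = estimate_pi0(pvalues, lam)
    if pi0 == 0:
        raise ValueError("pi0 estimate is 0; try a different lam")
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pi0 * m * pvalues[idx] / rank
        running_min = min(running_min, candidate)
        q[idx] = running_min
    return q


# Made-up P-values for six genes.
example_p = [0.001, 0.01, 0.02, 0.6, 0.8, 0.9]
example_q = qvalues(example_p)
# Genes called differentially expressed at an FDR of 5%.
called = [i for i, qv in enumerate(example_q) if qv <= 0.05]
print(example_q, called)
```

Calling every gene with a q-value at or below the chosen level keeps the expected proportion of false discoveries among the called genes near that level.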

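  25. Likelihood ratio test • Given samples from group A and group B, estimate $\lambda_A$, $\lambda_B$ and $\lambda$ for the Poisson model, where $\lambda$ is the average of all the samples. • Now compute the likelihood ratio as follows:
  $$\Lambda = \frac{\prod_{x \in X} \mathrm{Pois}(x; \lambda)}{\prod_{x \in X_A} \mathrm{Pois}(x; \lambda_A) \, \prod_{x \in X_B} \mathrm{Pois}(x; \lambda_B)}$$
  where $X_A$, $X_B$ and $X$ are the samples in A, the samples in B and all samples. • $\mathrm{Pois}(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$ is the probability that there are exactly $k$ occurrences.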
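A Poisson likelihood ratio test of this kind can be sketched as follows. Two assumptions go beyond what the slide states: the per-group rates are estimated by the group means (the Poisson maximum-likelihood estimates), and the P-value comes from the usual chi-square approximation with one degree of freedom for the statistic -2 log(ratio). The function names and example counts are hypothetical.

```python
import math


def poisson_log_pmf(k, lam):
    """log of Pois(k; lam) = lam^k * exp(-lam) / k!"""
    if lam == 0:
        return 0.0 if k == 0 else float("-inf")
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def poisson_lrt(counts_a, counts_b):
    """Return (statistic, p_value) for H0: one shared rate vs. one rate per group."""
    all_counts = list(counts_a) + list(counts_b)
    lam_a = sum(counts_a) / len(counts_a)      # rate estimate for group A
    lam_b = sum(counts_b) / len(counts_b)      # rate estimate for group B
    lam = sum(all_counts) / len(all_counts)    # shared rate under the null

    loglik_null = sum(poisson_log_pmf(x, lam) for x in all_counts)
    loglik_alt = (sum(poisson_log_pmf(x, lam_a) for x in counts_a)
                  + sum(poisson_log_pmf(x, lam_b) for x in counts_b))

    # -2 log(likelihood ratio); clamp tiny negative rounding errors to 0.
    statistic = max(0.0, -2.0 * (loglik_null - loglik_alt))
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Made-up read counts for one gene across lanes of two groups.
stat, p = poisson_lrt([10, 12, 11], [30, 28, 33])
print(stat, p)
```

A large statistic means that separate rates for A and B explain the counts much better than a single shared rate, which is the evidence for differential expression.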

  26. Outline • DNA Microarrays • RNA-Seq Overview • Experimental Design • Normalization of RNA-Seq Data • RNA-Seq Global Data Properties • Identifying Differentially Expressed Genes • Comparison of Results Across Technologies

  27. Comparing counts from Illumina sequencing with normalized intensities from the array

  28. Comparing counts from Illumina sequencing with normalized intensities from the array • Compare the number of sequence reads mapped to each gene with the corresponding (normalized) absolute intensities from the array. • These two independent measures of transcript abundance are highly correlated (Spearman correlation = 0.73 for liver, 0.75 for kidney). • The array intensities are large and the sequence counts small.

  29. Comparison of estimated log2 fold changes from Illumina and Affymetrix • Red: DE by ILM, >250 • Green: DE by ILM, <250 • Black: not DE by ILM

  30. Comparison of Results Across Technologies

  31. Summary

  32. Questions?
