NGS Transcriptomic Workflows
Hugh Shanahan & Jamie al-Nasir
Royal Holloway, University of London
Setting the scene
• Transcriptome: the total sequence and abundance of RNA generated by a cell
• RNA is transcribed from DNA
• The genome is fixed for an organism
• The transcriptome is dynamic
  - Variation between tissues
  - Variation over time
• RNA transcripts are typically 1,000s to 10,000 bases in length
Interested in
• How many copies of a particular transcript are there?
• What is its sequence?
  - The sequence comes from the genome, but alternative splicing means a transcript may not be a single contiguous block of DNA
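A toy sketch of why a transcript need not be one contiguous block of DNA. The gene sequence, the exon coordinates and the helper names (`splice`, `transcribe`) are all hypothetical, made up for illustration; the code simply joins chosen exons, so two isoforms arise from the same stretch of genome.

```python
# Hypothetical gene: three exons separated by two introns (invented sequence).
GENE = "ATGGCCGTAAGTTTAGGTTCTGGTGAGCTCAGGCATAA"

# Hypothetical exon coordinates as half-open (start, end) intervals on GENE.
EXONS = {
    "e1": (0, 6),
    "e2": (16, 22),
    "e3": (32, 38),
}


def splice(gene: str, exon_order: list[str]) -> str:
    """Concatenate the chosen exons in order; everything between them is dropped."""
    return "".join(gene[EXONS[name][0]:EXONS[name][1]] for name in exon_order)


def transcribe(dna: str) -> str:
    """Write a coding-strand DNA sequence as RNA by replacing T with U."""
    return dna.replace("T", "U")


# Two isoforms of the same gene: one keeps all exons, one skips exon e2.
isoform_a = transcribe(splice(GENE, ["e1", "e2", "e3"]))
isoform_b = transcribe(splice(GENE, ["e1", "e3"]))
print(isoform_a)  # AUGGCCGUUCUGGCAUAA
print(isoform_b)  # AUGGCCGCAUAA
```

Neither isoform appears as a contiguous substring of the transcribed gene, which is what makes reconstructing transcripts from short reads harder than reading off a genome region.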
Sequencing steps
• Fragment transcripts into shorter pieces (reads)
  - 100-300 bases long
  - Many overlapping reads
• Amplify (make many copies of) the short reads
• Sequence these short reads, then assemble them to reconstruct the transcripts
• The size of the data set depends on the size of the transcriptome, but also on how many reads are sequenced (sequencing depth)
• Assembly can be guided by a reference genome or done de novo (very hard)
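A toy sketch of these steps, assuming error-free reads of fixed length sampled uniformly at random from a single known transcript. The function names (`fragment`, `mean_depth`, `overlap`, `greedy_assemble`) are hypothetical. `mean_depth` uses the standard average-coverage estimate N × L / G (number of reads × read length / sequence length). The greedy overlap merge is a simple stand-in for de novo assembly, not the method used by real assemblers; reference-guided assembly is not shown.

```python
import random


def fragment(transcript: str, n_reads: int, read_len: int, seed: int = 0) -> list[str]:
    """Sample n_reads substrings of length read_len at random start positions.

    Stands in for fragmentation, amplification and sequencing; real reads
    also carry sequencing errors, which this toy ignores.
    """
    if read_len > len(transcript):
        raise ValueError("read_len longer than transcript")
    rng = random.Random(seed)
    max_start = len(transcript) - read_len
    starts = (rng.randint(0, max_start) for _ in range(n_reads))
    return [transcript[s:s + read_len] for s in starts]


def mean_depth(n_reads: int, read_len: int, seq_len: int) -> float:
    """Average number of reads covering each base: N * L / G."""
    return n_reads * read_len / seq_len


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a equal to a prefix of b, if at least min_len; else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap.

    Ignores sequencing errors and repeats, which are exactly what make
    real de novo assembly hard. Returns the remaining contigs.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates (amplification copies)
    # Drop reads wholly contained in another read; they add no new sequence.
    contigs = [c for c in contigs if not any(c != d and c in d for d in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
```

For example, `greedy_assemble(["ATGGCCGT", "GCCGTTCT", "GTTCTGGC", "CTGGCATA"], 3)` merges four overlapping 8-base reads back into `["ATGGCCGTTCTGGCATA"]`. With randomly sampled reads, low depth can leave gaps, so the result may be several contigs rather than one.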
Final points
• File formats have moved to binary; they used to be flat text, so file sizes were huge (reference genome: 39 GB -> 0.8 GB)
• Raw image data is actually discarded
• The discussion focuses on assembly and downstream analysis
• Much of this data is deposited in the Sequence Read Archive (SRA)
• We've papered over everything that happens before sequencing, i.e. the biochemical steps carried out
  - These steps are highly variable
  - These steps are not properly annotated
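One reason binary formats are smaller than flat text: a base stored as an ASCII character takes 8 bits, while A/C/G/T need only 2. The sketch below packs four bases per byte. It is a hypothetical illustration, not the layout of BAM, CRAM or any real format; it cannot represent `N` or other ambiguity codes, and packing alone gives only about a 4x saving. Real binary formats also apply compression (for example, BAM uses BGZF, a blocked gzip variant).

```python
# Hypothetical 2-bit code for the four unambiguous DNA bases.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack 4 bases per byte, 2 bits each, first base in the lowest bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases; the length must be stored separately."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


seq = "ACGTACGTTTGCA"
packed = pack(seq)
print(len(seq), "bytes as text ->", len(packed), "bytes packed")
```

The sequence length has to be kept alongside the packed bytes, because the last byte may be only partly filled.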