
NESCENT : NGS : Measuring expression

NESCENT : NGS : Measuring expression. Jen Taylor, Bioinformatics Team, CSIRO Plant Industry. Measuring Expression: What & Why (what is expression and why do we care?); How (platforms / technology: closed approaches – microarray; open approaches – sequencing); Experimental Design; Analysis.


Presentation Transcript


  1. NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry

  2. Measuring Expression • What & Why • What is expression and why do we care? • How • Platforms / Technology • Closed approaches – Microarray • Open approaches - Sequencing • Experimental Design • Analysis • Biases • Bioinformatics • Statistical Issues and Analysis • In action • Workshop – Detection of Differential Expression • Case Studies in Plant functional genomics CSIRO. Nescent August 2011 - Measuring Expression

  3. What is expression / the transcriptome? (Diagram: DNA; mRNA, tRNA, rRNA, siRNA, microRNA, piRNA, tasiRNA, lncRNA)

  4. Beyond the Genome: 1995: Human Genome sequencing begins in earnest, “Mapping the Book of Life”; 2000: first draft; 2003: essential completion. Estimated gene number: ≈ 140,000 genes → 30,000–40,000 genes?? → 24,195 genes!!!??? (Image: commemorative stained glass window for F.C. Crick, designed by Maria McClafferty; photograph: Paul Forster; Gonville & Caius College, Cambridge, UK.)

  5. “The failure of the human genome”: “despite more than 700 genome-scanning publications and nearly $100bn spent, geneticists still had not found more than a fractional genetic basis for human disease” (Manolio et al., Nature, 2009). “The most likely explanation for why genes for common diseases have not been found is that, with few exceptions, they do not exist. …, if inherited genes are not to blame for our commonest illnesses, can we find out what is?” (Guardian, 2011)

  6. Beyond the Genome: Gene Number ≠ Complexity. (Diagram labels: Gene, Transcriptome, Regulation, Complexity; image: Crick stained glass window, as in slide 4.)

  7. Why expression? (Diagram: Genome, Transcriptome, Proteome, Regulatory network; labels: high-throughput friendly, context dependent, predicts biology**.) **Li et al., 2004

  8. Measuring Expression? • Parts description • Function? • Interconnectedness? • Comparisons • Population-level • Between genomes

  9. Measuring Expression? • What are the important members of a transcriptome? • mRNA • polyadenylated, coding • alternatively spliced • Non-coding RNA • small RNAs of varying lengths and functions (18–32 bases): microRNA, siRNA, piRNA, tasiRNA • long non-coding RNA • “Dark” RNA • transcription outside of annotated genes • non-polyadenylated • antisense transcription

  10. Measuring Expression? • How does the transcriptome vary to give rise to phenotype? • Changes in abundance • Abundance reflects the balance between transcription and decay (rate of change of abundance = rate of transcription – rate of decay) • Changes in function • Availability for function: polyadenylation, silencing, localisation • Suitability for function: alternative splicing
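
The abundance relation on this slide is easiest to read as a rate equation: the change in abundance per unit time is transcription minus decay. A minimal sketch, assuming a constant transcription rate k_s and first-order decay with rate constant k_d (illustrative parameters, not taken from the slides), so that dM/dt = k_s – k_d·M and abundance settles at the steady state M* = k_s / k_d:

```python
import math


def mrna_abundance(t, k_s, k_d, m0=0.0):
    """Closed-form solution of dM/dt = k_s - k_d * M.

    k_s: transcription (synthesis) rate, molecules per unit time (assumed constant)
    k_d: first-order decay rate constant, per unit time (assumed constant)
    m0:  abundance at t = 0
    """
    m_star = k_s / k_d  # steady state: synthesis exactly balances decay
    return m_star + (m0 - m_star) * math.exp(-k_d * t)


def half_life(k_d):
    """mRNA half-life implied by first-order decay."""
    return math.log(2) / k_d


# Two hypothetical genes: same transcription rate, different mRNA stability
stable = mrna_abundance(t=100.0, k_s=10.0, k_d=0.1)
unstable = mrna_abundance(t=100.0, k_s=10.0, k_d=0.5)
print(round(stable, 2), round(unstable, 2))
```

With these hypothetical rates both genes are transcribed equally fast, yet the less stable transcript settles at a five-fold lower abundance (about 20 versus 100 molecules), so a change in measured abundance on its own cannot say whether transcription or decay changed.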

  11. How to measure expression: PLATFORMS / TECHNOLOGY

  12. Measuring Expression: platforms • Closed systems – microarray • Probes immobilised on a substrate profile target species in the transcriptome

  13. (Figure slide.)

  14. Single and two colour arrays. (Diagram: array manufacture from a probe library; labelling, either single colour (Sample A) or two colour (Experimental vs Control); hybridisation to the array; scanning.)

  15. Array profiling: Affymetrix array targets • Arabidopsis Genome 24,000 • C. elegans Genome 22,500 • Drosophila Genome 18,500 • E. coli Genome 20,366 • Human Genome U133 Plus 47,000 • Mouse Genome 39,000 • Yeast Genome: S. cerevisiae 5,841; S. pombe 5,031 • Rat Genome 30,000 • Zebrafish 14,900 • Plasmodium / Anopheles: P. falciparum 4,300; A. gambiae 14,900 • Barley (25,500), Soybean (37,500 + 23,300 pathogen), Grape (15,700) • Canine (21,700), Bovine (23,000) • B. subtilis (5,000), S. aureus (3,300 ORFs), Xenopus (14,400)

  16. (Figure slide.)

  17. (Figure slide.)

  18. Closed System – Microarray • Pros • High-throughput • Targeted profiling • Inexpensive – “population friendly” • Standardised analytical methods • Cons • “Closed system”: novel = invisible • Difficult to see allele-specific expression • Biases due to hybridisation • SNPs • Competitive and non-specific hybridisation

  19. Open systems – RNA Sequencing • Technology: • Illumina • SOLiD, Ion Torrent • 454 • Pros: • Transcript discovery • Allelic expression • High resolution abundance measures • Cons: • Analysis can be complex • Expensive • Sensitivity depends on sequencing depth

  20. RNA Sequencing (Mortazavi et al., 2008)

  21. RNASeq – Correspondence (Marioni et al., 2009) • Range > 5 orders of magnitude • Better detection of low abundance transcripts

  22. Platform Choice / Sample Preparation Choice: What do you want to profile? • Polyadenylated • PolyA RNA extraction • Small RNA (< 100 bases) • Size filtering by gel • Strand-specific • RNA–protein interactions • RNA immunoprecipitation (IP)

  23. RNASeq – Workflow: Sample → Total RNA / PolyA RNA / Small RNA → Library construction → Sequencing → Base calling & QC → Mapping to genome, or Assembly to contigs → Differential expression, SNP detection, Transcript structure, Secondary structure, Targets or products

  24. Illumina RNASeq: TruSeq

  25. Small RNA sequencing • Small RNA separation: PAGE (gel image with size markers 134, 110, 75, 25) • small RNA < 35 bp

  26. Strand-specificity (Levin et al., 2010) • Using adaptors • Ligation: 3′ and 5′ adaptors added sequentially • SMART: addition of C's on the 5′ end • Using chemical modification • dUTP: addition, and removal after selection
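
With the dUTP method the dUTP-marked second cDNA strand is removed, so read 1 comes from the first strand (the reverse complement of the RNA); with ligation of 3′ and 5′ adaptors, read 1 comes from the RNA's own strand. A minimal sketch of how that convention maps an alignment back to the transcript's strand, assuming paired-end reads in which read 2 comes from the opposite strand of the fragment to read 1. The function name and protocol labels are hypothetical; TopHat exposes the same distinction as `--library-type fr-firststrand` (dUTP) and `fr-secondstrand` (ligation):

```python
def transcript_strand(mate, aligned_strand, protocol="dUTP"):
    """Return the genomic strand ('+' or '-') the source RNA came from.

    mate:           1 or 2 (first or second read of a pair)
    aligned_strand: '+' or '-', the strand the read aligned to
    protocol:       'dUTP'     -> read 1 is antisense to the RNA
                    'ligation' -> read 1 is sense to the RNA
    """
    read1_antisense = {"dUTP": True, "ligation": False}
    if protocol not in read1_antisense:
        raise ValueError(f"unknown protocol: {protocol}")
    if mate not in (1, 2) or aligned_strand not in ("+", "-"):
        raise ValueError("mate must be 1 or 2 and aligned_strand '+' or '-'")
    # Read 2 is sequenced from the other strand of the fragment, so it
    # has the opposite orientation to read 1 relative to the RNA.
    antisense = read1_antisense[protocol] if mate == 1 else not read1_antisense[protocol]
    flip = {"+": "-", "-": "+"}
    return flip[aligned_strand] if antisense else aligned_strand


print(transcript_strand(1, "+"))              # -  (dUTP read 1 is antisense)
print(transcript_strand(2, "+"))              # +  (its mate is sense)
print(transcript_strand(1, "+", "ligation"))  # +
```

Getting this convention wrong silently swaps sense and antisense counts, which matters most for the antisense transcription listed among the “dark” RNA classes.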

  27. (Figure slide: Levin et al., 2010.)

  28. Non-polyA methods • Total RNA extraction • Ribosomal RNA and tRNA make up > 95–97% of total RNA • Ribosomal reduction methods • Subtractive hybridisation with rRNA probes • Exonuclease cleavage of rRNA • NuGEN – “proprietary combination of reverse transcriptase and primers in the Ovation RNA-Seq System” • cDNA normalisation methods • Partial digestion of any highly abundant species (Evrogen)

  29. Platform Choice / Sample Preparation Choice: What do you want to profile? • Polyadenylated • PolyA RNA extraction • Small RNA (< 100 bases) • Size filtering by gel • Strand-specific • RNA–protein interactions • RNA immunoprecipitation (IP) • Non-polyA • rRNA reduction

  30. EXPERIMENTAL DESIGN and ANALYSIS

  31. RNASeq Experimental Design • Issues: • sequencing depth: how much? • number of replicates: how many? • Aims of the data: • Transcriptome assembly / transcript characterisation • Maximise depth • Detection of differential expression (de novo or reference) • Balance depth and replication. CSIRO. Sequencing Depth vs. Number of Replicates

  32. Defining Replicates • Technical replicates • Biological replicates (Diagram: multiplexing Libraries 1–4 (L1–L4) in Lane 1 gives 25% of a lane per sample; technical replicates: one individual, Library 1 and Library 2 run across Lanes 1–4, depth = 2 × 100% lane / sample; biological replicates: Individual 1 and Individual 2, each with its own library and lane, 100% lane / sample.)

  33. (Figure slide.)

  34. Coverage Depth

  35. Number of Replicates (edgeR ≤ 0.01, DESeq ≤ 0.01) • For differential expression, there is more information in biological replicates than in depth
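
One way to see why replicates beat depth is through the negative binomial model that underlies edgeR and DESeq, in which a gene with mean count μ has variance μ + φμ² (φ is the biological dispersion, in edgeR's parametrisation). A minimal sketch, assuming a fixed read budget B per condition split evenly over n biological replicates and a gene taking a fraction p of every library; B, p and φ below are hypothetical values, not figures from the slides:

```python
def group_mean_cv2(total_reads, n_reps, gene_fraction, dispersion):
    """Squared coefficient of variation of one condition's mean count.

    Assumptions (illustrative): total_reads are split evenly over n_reps
    biological replicates, the gene takes gene_fraction of every library,
    and counts are negative binomial with variance mu + dispersion * mu**2.
    """
    mu = gene_fraction * total_reads / n_reps  # expected count per replicate
    cv2_single = 1.0 / mu + dispersion         # sampling (Poisson) noise + biological noise
    return cv2_single / n_reps                 # averaging n independent replicates


budget = 40_000_000  # hypothetical reads per condition
frac = 1e-5          # hypothetical gene at 10 reads per million
phi = 0.1            # hypothetical biological dispersion
for n in (1, 2, 4, 8):
    print(n, round(group_mean_cv2(budget, n, frac, phi), 4))
```

Algebraically the result is 1/(pB) + φ/n: extra depth shrinks only the first term, which does not depend on how the budget is split, while only more replicates shrink the biological term. With these numbers that term dominates (0.1 against 0.0025 for a single replicate).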

  36. RNASeq Analysis • Overall aim: • To get an accurate measurement of transcript abundance, structure and identity • Biases and compositions • Alignment • TopHat / Cufflinks • Assembly • ABySS

  37. Assumptions • Every transcript / k-mer has an equal chance of being sequenced • No. of sequences observed ≈ transcript abundance • Gene A = z reads / million; Gene B = y reads / million; z = 2 × y, so Gene A > Gene B
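
The “reads per million” scaling is a rescaling by library size: CPM_g = 10^6 × c_g / N, where c_g is the number of reads assigned to gene g and N the total number of mapped reads. A minimal sketch with hypothetical counts; it does nothing about transcript length, which is the subject of the length-bias slide:

```python
def counts_per_million(counts):
    """Scale raw read counts to reads per million mapped reads (CPM)."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


# Hypothetical library of 2,000,000 mapped reads
counts = {"geneA": 400, "geneB": 200, "other": 1_999_400}
cpm = counts_per_million(counts)
print(cpm["geneA"], cpm["geneB"])  # 200.0 100.0, i.e. z = 2 x y
```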

  38. Length Bias (Oshlack and Wakefield, 2009)
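
A back-of-envelope version of the length-bias argument: if the expected read count is proportional to expression × transcript length, then for the same fold change a longer transcript yields more reads and a larger test statistic, growing roughly as √length. A minimal sketch, assuming Poisson counts, equal library sizes and a simple Wald z statistic on the log ratio (a test chosen here for illustration, not necessarily the one analysed by Oshlack and Wakefield); the expression level and lengths are hypothetical:

```python
import math


def expected_z(expr_per_kb, length_kb, fold_change):
    """Wald z for a log fold change between two Poisson counts.

    Expected counts are expr_per_kb * length_kb in condition 1 and
    fold_change times that in condition 2; equal library sizes assumed.
    """
    c1 = expr_per_kb * length_kb
    c2 = fold_change * c1
    # log ratio divided by its approximate standard error sqrt(1/c1 + 1/c2)
    return math.log(c2 / c1) / math.sqrt(1.0 / c1 + 1.0 / c2)


for length in (0.5, 2.0, 8.0):
    print(length, round(expected_z(10, length, 2.0), 2))
```

Each four-fold increase in length doubles z, so at a fixed significance threshold long transcripts are called differentially expressed more often even with an identical fold change. Dividing by length (as in reads per kb) rescales the abundance estimate but does not remove this difference in statistical power.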

  39. Alignment Bias

  40. Alignment Bias

  41. Sequencing Bias (Hansen et al., 2010)

  42. Bias • Every transcript / k-mer has an equal chance of being sequenced • No. of sequences observed ≈ transcript abundance • Gene A = z reads / million / kb; Gene B = y reads / million / kb • Weighting schemes (e.g. Cufflinks): • Mappability • k-mer / fragment frequencies
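
Dividing by transcript length as well as library size gives reads per kilobase per million mapped reads (RPKM, introduced by Mortazavi et al., 2008): RPKM_g = 10^9 × c_g / (N × L_g), with L_g the transcript length in bases. Cufflinks reports the closely related FPKM, counting fragments rather than reads. A minimal sketch with hypothetical counts and lengths; the mappability and k-mer weighting that Cufflinks applies is not modelled here:

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts.values())
    return {g: 1e9 * counts[g] / (total * lengths_bp[g]) for g in counts}


# Hypothetical library of 1,000,000 mapped reads
counts = {"geneA": 1000, "geneB": 500, "other": 998_500}
lengths = {"geneA": 4000, "geneB": 1000, "other": 2_000_000}
vals = rpkm(counts, lengths)
print(vals["geneA"], vals["geneB"])  # 250.0 500.0
```

Gene A has twice as many reads as gene B but is four times longer, so per kilobase it is the less abundant of the two.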

  43. Bias • Every transcript / k-mer has an equal chance of being sequenced • No. of sequences observed ≈ transcript abundance • Sample A vs Sample B: Gene A1 = z reads per million; Gene A2 = y reads per million; z = 2 × y

  44. Read density variability

  45. RNASeq – Compositional properties: Depth of sequence • Sequence count ≈ transcript abundance • A small number of highly abundant transcripts can dominate the majority of the data • The ability to observe low-abundance transcripts depends on sequencing depth • Fixed budget of reads
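
If reads are drawn at random from the transcript pool, a transcript making up a fraction f of the library contributes on average N·f reads out of N, and is seen at least once with probability close to 1 – e^(–N·f). A minimal sketch of how depth controls detection of rare transcripts, assuming Poisson sampling; the fraction, depths and read thresholds are hypothetical:

```python
import math


def p_detect(fraction, depth, min_reads=1):
    """Probability of observing at least min_reads reads from a transcript.

    Poisson approximation with mean fraction * depth:
    P(X >= k) = 1 - sum_{i < k} exp(-lam) * lam**i / i!
    """
    lam = fraction * depth
    p_below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(min_reads))
    return 1.0 - p_below


rare = 1e-7  # hypothetical transcript: one read in ten million
for depth in (1_000_000, 10_000_000, 100_000_000):
    print(depth, round(p_detect(rare, depth), 3), round(p_detect(rare, depth, min_reads=5), 3))
```

Because highly abundant transcripts consume most of the fixed budget, the fraction f of a rare transcript stays small, and only more depth raises N·f enough to observe it reliably.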

  46. A simple example – compositional bias. Sequencing budget / depth: 4000 reads. Expected counts: sample I: A = 1000, B = 1000, C = 1000, D = 1000; sample II: A = 2000, B = 2000.
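
The example can be reproduced by sharing the 4000-read budget in proportion to molecule numbers. A minimal sketch, assuming (hypothetically) that A and B have the same absolute abundance in both samples and that C and D are switched off in sample II; the second step rescales each sample to genes assumed unchanged, which is the idea behind normalisations such as edgeR's TMM and DESeq's median-of-ratios, neither of which is implemented here:

```python
def expected_counts(abundance, budget):
    """Share a fixed read budget among transcripts in proportion to abundance."""
    total = sum(abundance.values())
    return {g: budget * a / total for g, a in abundance.items()}


def scale_to_reference(counts, reference):
    """Express counts relative to genes assumed unchanged between samples."""
    ref_total = sum(counts[g] for g in reference)
    return {g: c / ref_total for g, c in counts.items()}


# Hypothetical molecule numbers: A and B identical in both samples,
# C and D switched off in sample II.
sample_1 = {"A": 10, "B": 10, "C": 10, "D": 10}
sample_2 = {"A": 10, "B": 10, "C": 0, "D": 0}

c1 = expected_counts(sample_1, 4000)  # A, B, C, D: 1000 each
c2 = expected_counts(sample_2, 4000)  # A, B: 2000 each; C, D: 0
print(c2["A"] / c1["A"])              # 2.0: an apparent two-fold "increase"

r1 = scale_to_reference(c1, ["A", "B"])
r2 = scale_to_reference(c2, ["A", "B"])
print(r2["A"] / r1["A"])              # 1.0 once scaled to the unchanged genes
```

In real data the hard part is choosing which genes can be treated as unchanged; the published methods estimate that scaling robustly from all genes rather than from a hand-picked set.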

  47. Soil diversity by phylogenetic analysis, phylum level: 454 sequence analysis of the bacterial 16S rRNA gene, ~410,000 sequences. (Figure: % distribution, 0–100%, of recognised bacterial phyla in samples A, B and C; A. Richardson, CSIRO.)

  48. RNASeq Bioinformatics Analysis • Aims: • To get an accurate measurement of transcript abundance, structure and identity • Biases and compositions • Relative abundances, NOT absolute • Alignment • TopHat • Assembly • ABySS

  49. RNA Sequencing analysis: Sequence data → Genome? (yes: alignment; no: assembly to contigs) → Read density → Differential expression, SNPs, Transcript characterisation

  50. RNASeq – Alignment Considerations • Reads with multiple locations • Discard / random allocation • Clustering: local coverage • Weighting • Reads spanning exons • Make and align to exon junction libraries • De novo junction detection • Summarisation of counts • Exons • Transcript boundaries • Inferred read boundaries
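
The “weighting” option for reads with multiple locations can be sketched as a proportional split: each multi-mapping read is shared among its candidate loci according to how many uniquely mapping reads each locus already has. A minimal sketch, assuming counts are already summarised per locus; the locus names and numbers are hypothetical, and real tools differ in details such as using local coverage windows rather than whole-gene counts:

```python
from collections import defaultdict


def allocate_multireads(unique_counts, multireads):
    """Distribute multi-mapping reads in proportion to unique coverage.

    unique_counts: dict locus -> number of uniquely mapped reads
    multireads:    list of lists, each giving the candidate loci of one read
    Returns a dict of (fractional) total counts per locus.
    """
    totals = defaultdict(float, {k: float(v) for k, v in unique_counts.items()})
    for loci in multireads:
        if not loci:
            continue
        weights = [unique_counts.get(locus, 0) for locus in loci]
        norm = sum(weights)
        for locus, w in zip(loci, weights):
            # Fall back to an even split when no candidate has unique support
            share = w / norm if norm > 0 else 1.0 / len(loci)
            totals[locus] += share
    return dict(totals)


unique = {"geneA": 90, "geneA_paralog": 10}
multi = [["geneA", "geneA_paralog"]] * 20
result = allocate_multireads(unique, multi)
print({k: round(v, 2) for k, v in result.items()})
```

Here the 20 shared reads give geneA 108 and its paralog 12, compared with 100 and 20 under an even (or, on average, random) split, and 90 and 10 if the multireads were discarded.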
