
Next Generation Sequencing and its data analysis challenges


vanida




Presentation Transcript


  1. Next Generation Sequencing and its data analysis challenges. Outline: Background; Alignment and Assembly; Applications (Genome, Epigenome, Transcriptome)

  2. References. This lecture is about the opportunities and challenges, not detailed statistical techniques. The material is drawn from the following review articles: Cell 2013, 155:27; Cell 2013, 155:39; Annu. Rev. Plant Biol. 2009, 60:305; Annu. Rev. Genomics Hum. Genet. 2009, 10:135; Curr. Opin. Biotechnol. 24:22; Nat. Biotechnol. 2009, 25:195; Nat. Methods 2009, 6:S6; Nat. Rev. Genet. 2009, 10:669; Nat. Rev. Genet. 2010, 11(1):31-46; Genomics 2010, 95(6):315-327.

  3. Background. Named “Method of the Year” 2007 by Nature Methods. Also known as “next generation sequencing”, “deep sequencing”, “high-throughput sequencing”, or “second-generation sequencing”. Key characteristics: massively parallel sequencing; a single run produces roughly as much data as the entire Human Genome Project; reads are short (a few hundred bases per read at most).

  4. Background. Potential impact: the “$1000 genome” will soon become a reality; genome sequencing will become a routine medical procedure; personalized medicine; predictive medicine; ethical issues. For statisticians: data mining across hundreds of thousands of genomes; finding rare SNPs/mutations associated with diseases; new methods to analyze epigenomics/transcriptomics data; finding interventions to improve quality of life.

  5. Background. Different companies use different sequencing techniques; we use Illumina’s as an example. (http://seqanswers.com/forums/showthread.php?t=21)

  6. Background

  7. Background

  8. Background

  9. Background. An incomplete list of common platforms (BMC Genomics 2012, 13:341).

  10. Background

  11. Background. Advantages: fast and cost-effective; no need to clone DNA fragments. Drawbacks: short read length (platform-dependent); some platforms have trouble with identical repeats; confidence in base calls is not uniform along a read, and data are less reliable near the 3’ end of each read.
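Base-call confidence is usually reported as a Phred quality score, Q = -10 log10(p), where p is the estimated probability that the base call is wrong. The sketch below computes the mean quality at each read position, which is the usual way to visualize the drop-off toward the 3’ end. It assumes quality strings in the common Phred+33 ASCII encoding (older Illumina pipelines used a different offset) and uses made-up toy quality strings.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score Q into an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_quality_by_position(quality_strings, offset=33):
    """Mean Phred score at each position (5' to 3'), assuming Phred+33 encoding."""
    max_len = max(len(qs) for qs in quality_strings)
    totals = [0] * max_len
    counts = [0] * max_len
    for qs in quality_strings:
        for i, ch in enumerate(qs):
            totals[i] += ord(ch) - offset  # ASCII character -> Phred score
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Hypothetical quality strings for three reads; the 5' end is on the left.
quals = ["IIIIIHHGF#", "IIIIHHGFE5", "IIIHHHGEB("]
for pos, q in enumerate(mean_quality_by_position(quals), start=1):
    print(f"position {pos:2d}: mean Q = {q:5.1f}, error prob ~ {phred_to_error_prob(q):.4f}")
```

In this toy input the mean quality stays near Q40 (error probability about 1 in 10,000) for the first positions and collapses at the last position.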

  12. Background What deep sequencing can do:

  13. Background Nat Methods. 2009 Nov;6(11 Suppl):S2-5.

  14. Alignment and Assembly. Sequencing the genome of a person? --- Alignment. The existing human reference genome can serve as a blueprint: align the short reads onto it. A few-fold coverage is needed to cover most regions. Sequencing a whole new genome? --- Assembly. Overlaps between reads are required to construct the genome; because the reads are short, ~30-fold coverage is needed. At 3 Gb of data per run, a new genome similar in size to the human genome (~3 Gb) needs 30 runs (30 × 3 Gb = 90 Gb).
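The coverage arithmetic on this slide can be checked with a short script. Mean fold coverage is C = (number of reads × read length) / genome size. Under the classical Lander-Waterman assumption that read start positions are uniform and independent, the expected fraction of bases covered by at least one read is 1 - e^(-C), which is why a few-fold coverage already reaches most regions. Real coverage is less uniform, so these numbers are optimistic; the 3 Gb figures come from the slide.

```python
import math


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Mean fold coverage C = total sequenced bases / genome size."""
    return total_bases / genome_size


def fraction_covered(c: float) -> float:
    """Lander-Waterman approximation of the fraction of bases hit by >= 1 read.

    Assumes read start positions are uniform and independent (an idealization).
    """
    return 1 - math.exp(-c)


def runs_needed(target_coverage: float, genome_size: float, bases_per_run: float) -> int:
    """Number of sequencing runs needed to reach the target mean coverage."""
    return math.ceil(target_coverage * genome_size / bases_per_run)


GENOME_SIZE = 3e9    # human-sized genome, ~3 Gb
BASES_PER_RUN = 3e9  # ~3 Gb of data per run, as on the slide

print(runs_needed(30, GENOME_SIZE, BASES_PER_RUN))        # 30
print(fold_coverage(30 * BASES_PER_RUN, GENOME_SIZE))     # 30.0
for c in (1, 3, 5):
    print(f"{c}x coverage -> {fraction_covered(c):.3f} of bases covered")
```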

  15. Alignment and Assembly. Hash table-based alignment, similar to BLAST in principle: (1) find potential locations using a hash table; (2) perform local alignment at each candidate location.
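A toy version of the two steps, under stated assumptions: the sketch indexes every k-mer of the reference in a Python dictionary (the hash table), uses exact k-mer hits in a read to propose candidate start positions, and then scores each candidate window with Smith-Waterman local alignment. The k-mer size, scoring values, window padding, and sequences are illustrative choices; production aligners differ considerably in indexing strategy and heuristics.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict:
    """Hash table mapping every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_starts(read: str, index: dict, k: int) -> set:
    """Step 1: a read k-mer at offset j hitting reference position p implies start p - j."""
    starts = set()
    for j in range(len(read) - k + 1):
        for p in index.get(read[j:j + k], []):
            starts.add(p - j)
    return starts


def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Step 2: best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def align(read: str, reference: str, index: dict, k: int, pad: int = 3):
    """Return (score, start) of the best-scoring candidate, or None if no seed hits."""
    results = []
    for s in candidate_starts(read, index, k):
        lo, hi = max(0, s - pad), min(len(reference), s + len(read) + pad)
        results.append((smith_waterman(read, reference[lo:hi]), s))
    return max(results) if results else None


reference = "TTGACCATGCGTACGATCGGATCCTTAGCAAT"
read = "GTACGATCGCATCC"  # reference[10:24] with one mismatch (G -> C)
index = build_index(reference, k=5)
print(candidate_starts(read, index, k=5))  # {10}
print(align(read, reference, index, k=5))  # (25, 10): 13 matches, 1 mismatch
```

The seed step keeps the expensive dynamic-programming step confined to a few short windows instead of the whole reference.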

  16. Alignment and Assembly From read to graph:

  17. Alignment and Assembly

  18. Alignment and Assembly. de Bruijn graph assembly (red marks a read error).
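A minimal de Bruijn sketch using toy reads: every k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, k-mers seen fewer than `min_count` times are discarded as likely read errors (like the red branch above), and contigs are spelled by walking from source nodes while each node has a single successor. The value of k, the count threshold, the genome, and the reads are illustrative assumptions; real assemblers also handle reverse complements, tips, bubbles, and repeats far more carefully.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(kmers):
    """de Bruijn graph: each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk(graph, start):
    """Spell a contig by following nodes that have exactly one outgoing edge."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:  # stop if the path loops back on itself
            break
        seen.add(node)
        contig += node[-1]
    return contig


def assemble(reads, k, min_count=2):
    """Drop low-count (likely erroneous) k-mers, build the graph, walk from source nodes."""
    counts = count_kmers(reads, k)
    solid = [kmer for kmer, c in counts.items() if c >= min_count]
    graph = build_graph(solid)
    targets = {v for succs in graph.values() for v in succs}
    sources = [node for node in graph if node not in targets]
    return [walk(graph, s) for s in sources]


# Toy reads from the hypothetical genome GATTACACCGTAGC: each error-free read
# appears twice, and one extra read carries a single A -> T error.
good = ["GATTACAC", "TACACCGT", "ACCGTAGC"]
reads = good * 2 + ["TACTCCGT"]
print(assemble(reads, k=5))               # ['GATTACACCGTAGC']
print(assemble(reads, k=5, min_count=1))  # error k-mers kept: a spurious extra contig
```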

  19. Alignment and Assembly de Bruijn graph assembly

  20. Alignment and Assembly de Bruijn graph assembly

  21. Whole genome/exome/transcriptome sequencing

  22. Genomics. Whole-genome sequencing can detect all kinds of variants (SNP alleles, rare variants, mutations). Those that could be associated with disease include: rare variants (burden testing, collapsing variants by gene); de novo mutations (require family pedigree data); rare Mendelian disorders; structural variants in cancer.
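Burden testing pools the rare variants within a gene because each variant on its own is too rare to test. The sketch below assumes a simple carrier-based collapsing scheme (a person counts as a carrier if they have at least one rare variant in the gene) followed by a one-sided Fisher's exact test on the resulting 2×2 case/control table. Gene names, variant IDs, and genotypes are hypothetical; published rare-variant association methods add variant weighting, covariates, and other refinements.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for [[a, b], [c, d]]
    (rows: cases, controls; columns: carriers, non-carriers),
    testing whether cases carry more variants than expected."""
    n_cases, n_ctrls, carriers = a + b, c + d, a + c
    total = comb(n_cases + n_ctrls, carriers)
    # Hypergeometric upper tail: P(#case carriers >= a) given the margins.
    tail = sum(comb(n_cases, x) * comb(n_ctrls, carriers - x)
               for x in range(a, min(n_cases, carriers) + 1))
    return tail / total


def burden_test(genotypes, is_case, gene_of_variant):
    """genotypes: sample -> set of rare-variant IDs carried.
    Returns gene -> (case carriers, control carriers, p-value)."""
    results = {}
    for gene in sorted(set(gene_of_variant.values())):
        a = b = c = d = 0
        for sample, variants in genotypes.items():
            carrier = any(gene_of_variant.get(v) == gene for v in variants)  # collapse
            if is_case[sample]:
                a, b = a + carrier, b + (not carrier)
            else:
                c, d = c + carrier, d + (not carrier)
        results[gene] = (a, c, fisher_exact_greater(a, b, c, d))
    return results


samples = [f"case{i}" for i in range(10)] + [f"ctrl{i}" for i in range(10)]
is_case = {s: s.startswith("case") for s in samples}
gene_of_variant = {"v1": "GENE1", "v2": "GENE1", "v3": "GENE2"}  # hypothetical
genotypes = {s: set() for s in samples}
for s in ("case0", "case1", "case2"):
    genotypes[s].add("v1")
for s in ("case3", "case4"):
    genotypes[s].add("v2")
genotypes["case5"].add("v3")
genotypes["ctrl0"].add("v3")

for gene, (case_carriers, ctrl_carriers, p) in burden_test(genotypes, is_case, gene_of_variant).items():
    print(gene, case_carriers, ctrl_carriers, f"p = {p:.4f}")
```

Here no single variant in GENE1 is carried by more than three people, but after collapsing, five cases and no controls are carriers, which gives a much stronger signal than any individual variant.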

  23. Medical Genomics Example: Extreme-case sequencing to find rare variants associated with a disease. Nature Reviews Genetics 11, 415

  24. Medical Genomics Example: Cancer genome

  25. Epigenomics http://www.roadmapepigenomics.org/

  26. ChIP-Seq. Purpose: identify which parts of the DNA sequence are bound by a given protein, such as a transcription factor (regulome) or a modified histone (epigenome).
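After alignment, protein-bound regions show up as places where ChIP reads pile up. A common idea, used in more refined form by peak callers such as MACS, is to ask whether the ChIP read count in a window is improbably high under a Poisson background estimated from a control sample. The sketch below is a simplified illustration, not MACS. It assumes reads have already been counted in fixed-size bins, scales the control by total read count, and uses made-up counts and an arbitrary significance threshold.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def call_peaks(chip_counts, control_counts, alpha=1e-3, min_lambda=1.0):
    """Return indices of bins whose ChIP count is significant against the scaled control."""
    scale = sum(chip_counts) / sum(control_counts)  # library-size normalization
    peaks = []
    for i, (chip, ctrl) in enumerate(zip(chip_counts, control_counts)):
        lam = max(ctrl * scale, min_lambda)  # local background rate for this bin
        if poisson_sf(chip, lam) < alpha:
            peaks.append(i)
    return peaks


# Hypothetical read counts in eight consecutive bins.
chip = [5, 4, 6, 30, 28, 5, 3, 5]
control = [10, 9, 11, 12, 10, 11, 9, 10]
print(call_peaks(chip, control))  # [3, 4]
```

Using the control sample rather than a flat genome-wide rate helps keep regions that are enriched in both samples (for example, open chromatin or mapping artifacts) from being called as binding sites.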

  27. ChIP-Seq Overall ChIP-Seq workflow

  28. ChIP-Seq. Before deep sequencing, the same information was obtained using microarrays in place of sequencing (ChIP-chip).

  29. ChIP-Seq

  30. ChIP-Seq. Different applications produce different kinds of profiles (figure panels: Elongation, Silencing).

  31. ChIP-Seq. Example of the chromatin pattern of an active gene found by ChIP-Seq (figure labels: initiation site, elongation).

  32. RNA-Seq

  33. RNA-Seq

  34. RNA-Seq Deep sequencing provides more information about each mRNA

  35. RNA-Seq. Finding novel exons. Detecting splicing? (Short reads could be an issue.)
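When a splice-aware aligner maps a read across an exon-exon junction, it reports the read in SAM format with an `N` (skipped reference region) operation in the CIGAR string. Candidate junctions can therefore be tallied directly from the alignments. The sketch assumes 1-based SAM `POS` values and uses hypothetical alignments. It ignores strand, anchor length, and the minimum-support filters that real junction callers apply. Those filters matter because a short read often overlaps a junction by only a few bases.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def junctions(pos: int, cigar: str):
    """Return (intron_start, intron_end), 1-based inclusive, for each N operation."""
    ref = pos
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            found.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return found


# Hypothetical (POS, CIGAR) pairs for reads on one chromosome.
alignments = [(100, "30M500N20M"), (105, "25M500N25M"), (200, "5S45M"), (1000, "10M200N10M100N30M")]
support = Counter(j for pos, cigar in alignments for j in junctions(pos, cigar))
for (start, end), n_reads in sorted(support.items()):
    print(f"intron {start}-{end}: {n_reads} supporting read(s)")
```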

  36. RNA-Seq. Gene expression profiling: a replacement for arrays? Exon-specific abundance.
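Raw read counts must be normalized before expression levels can be compared, because longer features and deeper libraries both attract more reads. Two widely used measures are RPKM (reads per kilobase of feature per million mapped reads) and TPM (transcripts per million). The sketch below applies the standard formulas to hypothetical counts and lengths. As a simplifying assumption, it treats the total of the listed counts as the number of mapped reads. The same calculation applies to exon-level counts.

```python
def rpkm(counts, lengths_bp):
    """RPKM = count * 1e9 / (feature length in bp * total mapped reads)."""
    total = sum(counts.values())  # assumption: the listed features hold all mapped reads
    return {f: counts[f] * 1e9 / (lengths_bp[f] * total) for f in counts}


def tpm(counts, lengths_bp):
    """TPM: divide by length first, then rescale so the values sum to one million."""
    rates = {f: counts[f] / lengths_bp[f] for f in counts}
    denom = sum(rates.values())
    return {f: r / denom * 1e6 for f, r in rates.items()}


counts = {"geneA": 100, "geneB": 300, "geneC": 600}      # hypothetical read counts
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}  # hypothetical lengths (bp)
for name, func in (("RPKM", rpkm), ("TPM", tpm)):
    print(name, {f: round(v, 1) for f, v in func(counts, lengths).items()})
```

geneB has three times as many reads as geneA but is also three times longer, so both measures give the two genes the same expression value. TPM values always sum to one million, which makes proportions comparable across samples.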

  37. RNA-Seq. Sequencing small RNA.

  38. RNA-Seq. Quantification of known miRNAs and de novo detection of new miRNAs. MicroRNAs are 21-23 nucleotides long, regulate gene expression by complementary binding to target mRNAs, and are derived from non-coding RNAs that form stem-loop structures.
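The following is a minimal quantification sketch. It assumes adapter-trimmed reads and exact matching, whereas real pipelines tolerate mismatches and length variants. It keeps reads in the 21-23 nt range given on the slide and counts exact matches to a table of known mature miRNA sequences. Those counts are then expressed per million size-selected reads. Unmatched miRNA-sized reads are kept as raw material for de novo discovery, which would additionally look for a stem-loop precursor in the genome. The miRNA names and sequences are hypothetical placeholders, written in the DNA alphabet as sequenced cDNA.

```python
from collections import Counter


def quantify_mirnas(reads, known, min_len=21, max_len=23):
    """Return (raw counts, counts per million, unmatched miRNA-sized reads)."""
    sized = [r for r in reads if min_len <= len(r) <= max_len]  # size selection
    name_of = {seq: name for name, seq in known.items()}
    counts, unmatched = Counter(), Counter()
    for r in sized:
        name = name_of.get(r)  # exact match only (simplifying assumption)
        if name is not None:
            counts[name] += 1
        else:
            unmatched[r] += 1  # candidates for de novo miRNA discovery
    total = len(sized)
    cpm = {name: c * 1e6 / total for name, c in counts.items()} if total else {}
    return counts, cpm, unmatched


# Hypothetical mature miRNA table and adapter-trimmed reads.
known = {"miR-hypo1": "ACGTTAGCATCGGATCCATGCA", "miR-hypo2": "TTGCAGGCTAACGTAGCTAGT"}
reads = (["ACGTTAGCATCGGATCCATGCA"] * 3 + ["TTGCAGGCTAACGTAGCTAGT"]
         + ["GGGGCCCCAAAATTTTGGGGCC"] * 2          # miRNA-sized but unknown
         + ["ACGTTAGC", "ACGTTAGCATCGGATCCATGCAAGATCG"])  # too short / too long
counts, cpm, unmatched = quantify_mirnas(reads, known)
print(dict(counts), cpm, dict(unmatched))
```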

  39. RNA-Seq Directly probe mRNA targets of miRNA.
