
Scalable Algorithms for Next-Generation Sequencing Data Analysis

Scalable Algorithms for Next-Generation Sequencing Data Analysis. Ion Mandoiu UTC Associate Professor in Engineering Innovation Department of Computer Science & Engineering. Next Generation Sequencing. Illumina HiSeq. Roche/454. SOLiD 5500. Ion Proton. PacBio RS. Oxford Nanopore.


Presentation Transcript


  1. Scalable Algorithms for Next-Generation Sequencing Data Analysis. Ion Mandoiu, UTC Associate Professor in Engineering Innovation, Department of Computer Science & Engineering

  2. Next Generation Sequencing: Illumina HiSeq, Roche/454, SOLiD 5500, Ion Proton, PacBio RS, Oxford Nanopore

  3. Ongoing Projects
     • Transcriptome Analysis
       - Transcriptome quantification and differential expression analysis
       - Computational deconvolution of heterogeneous samples
       - Transcriptome and meta-transcriptome assembly
     • Viral quasispecies
       - Quasispecies reconstruction from NGS reads
       - IBV evolution and vaccine optimization
       - Transmission graphs
     • Immunoinformatics
       - Genomics-guided immunotherapy
       - Deep panning for early cancer detection
     • Sequencing error correction, genome assembly and scaffolding, metabolomics, biomarker selection, …
     • More info & software at http://dna.engr.uconn.edu

  4. Transcriptome Quantification
     • IsoEM algorithm for isoform expression estimation
       - Incorporates fragment length distribution, hexamer bias correction, …
     • RNA-PhASE pipeline for allele-specific isoform expression
     [Figure: Ion Torrent MAQC datasets]
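The expectation-maximization idea behind isoform expression estimation can be sketched in a few lines. This is a toy illustration, not the IsoEM implementation: it ignores the fragment length distribution and bias correction mentioned on the slide, and the function name, the input layout (read classes with counts), and the example numbers are all assumptions made for illustration.

```python
def estimate_isoform_frequencies(read_classes, lengths, n_iter=200):
    """Toy EM for relative isoform frequencies (hypothetical helper).

    read_classes: list of (compatible_isoforms, read_count) pairs.
    lengths: dict mapping isoform name -> isoform length.
    """
    isoforms = sorted(lengths)
    freq = {t: 1.0 / len(isoforms) for t in isoforms}  # uniform start
    for _ in range(n_iter):
        # E-step: split each read class among its compatible isoforms
        # in proportion to the current frequency estimates.
        expected = {t: 0.0 for t in isoforms}
        for compatible, count in read_classes:
            total = sum(freq[t] for t in compatible)
            if total == 0.0:
                continue
            for t in compatible:
                expected[t] += count * freq[t] / total
        # M-step: expected reads per base, renormalized to sum to 1.
        per_base = {t: expected[t] / lengths[t] for t in isoforms}
        norm = sum(per_base.values())
        if norm == 0.0:
            break
        freq = {t: per_base[t] / norm for t in isoforms}
    return freq


# Made-up example: two equal-length isoforms, 400 reads compatible with both.
example = estimate_isoform_frequencies(
    [(("A",), 300), (("B",), 100), (("A", "B"), 400)],
    {"A": 1000, "B": 1000},
)
print(example)
```

In this made-up example the ambiguous reads end up split according to the evidence from the uniquely assigned reads, which is the behavior an EM-based estimator is meant to provide.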

  5. Differential Expression
     • Fast estimation enables the use of accurate bootstrapping-based methods
     [Figure: MAQC 454 datasets, UHRR (SRX002934) vs. HBRR (SRX002935)]
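A bootstrapping-based test of this kind might look roughly like the sketch below: resample the reads of each condition with replacement, re-estimate expression on every resample, and report how often the fold change clears a threshold. The slide gives no details of the actual method, so everything here is an assumption: the function name, the fold-change threshold, and the use of a simple read-count fraction (with a pseudocount) in place of a real expression estimator.

```python
import random


def bootstrap_fold_change_support(reads_a, reads_b, gene,
                                  min_fold=2.0, n_boot=200, seed=0):
    """Fraction of bootstrap replicates whose fold change is >= min_fold
    in either direction (hypothetical helper).

    reads_a, reads_b: lists of gene labels, one entry per read.
    """
    rng = random.Random(seed)

    def resampled_fraction(reads):
        # Resample reads with replacement, then re-estimate expression.
        # A count fraction with a pseudocount stands in for a real estimator.
        sample = rng.choices(reads, k=len(reads))
        return (sample.count(gene) + 1) / (len(sample) + 1)

    hits = 0
    for _ in range(n_boot):
        ratio = resampled_fraction(reads_a) / resampled_fraction(reads_b)
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            hits += 1
    return hits / n_boot


# Made-up data: gene "G" accounts for 20% of reads in A and 5% in B.
reads_a = ["G"] * 200 + ["other"] * 800
reads_b = ["G"] * 50 + ["other"] * 950
support = bootstrap_fold_change_support(reads_a, reads_b, "G")
print(support)
```

Each replicate requires a full re-estimation, which is why a fast estimator matters for this style of analysis.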

  6. Computational Deconvolution of Heterogeneous Samples
     • Goal: characterize the expression of mesoderm progenitor cells
     • Whole-transcriptome expression data for NSB cell mixtures + single-cell qPCR data for a few genes
     • Three-step approach
       - Cluster single-cell qPCR data and infer “reduced” cell type signatures
       - Infer mixing proportions based on reduced signatures using quadratic programming
       - Infer full expression signatures based on mixing proportions, solving one quadratic program per gene
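The mixing-proportion step can be illustrated as a small constrained least-squares problem: find non-negative proportions summing to 1 that best reproduce the mixture from the reduced signatures. The sketch below is an assumption-laden stand-in: it solves that quadratic program with projected gradient descent rather than a dedicated QP solver, and the function names, signature values, and mixture are invented for the example.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def estimate_mixing_proportions(signatures, mixture, n_iter=5000):
    """Minimize ||S p - m||^2 over the simplex (hypothetical helper).

    signatures: one row per gene, one column per cell type.
    mixture: observed expression of each gene in the mixed sample.
    """
    n_types = len(signatures[0])
    # Conservative step size: 1 / (2 * squared Frobenius norm of S).
    step = 1.0 / (2.0 * sum(x * x for row in signatures for x in row))
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(s * q for s, q in zip(row, p)) - m
                    for row, m in zip(signatures, mixture)]
        grad = [2.0 * sum(row[k] * r for row, r in zip(signatures, residual))
                for k in range(n_types)]
        p = project_to_simplex([q - step * g for q, g in zip(p, grad)])
    return p


# Made-up signatures for two cell types over three genes,
# mixed at 30% / 70%.
signatures = [[10.0, 0.0], [0.0, 8.0], [5.0, 5.0]]
mixture = [3.0, 5.6, 5.0]
proportions = estimate_mixing_proportions(signatures, mixture)
print(proportions)
```

The third step on the slide has the same shape with the roles swapped: the proportions are held fixed and one small problem is solved per gene for the unknown signature values.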

  7. Reference-Guided Transcriptome Reconstruction
     [Figure: a gene with exons 1-7 and four candidate transcripts t1-t4; the exon chains shown are 1-2-3-4-5-6-7, 1-3-4-5-6-7, 1-2-3-4-5-7, and 1-3-4-5-7]

  8. TRIP: Transcriptome Reconstruction using Integer Programming
     • Select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distribution
       - empirically determined during library preparation
       - implied by “mapping” read pairs
     [Figure: the same read pair implies a fragment length of 500 on a transcript with exons 1, 2, 3 (each of length 200) and 300 on a transcript with exons 1, 3; fragment length mean: 500, std. dev.: 50]
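The selection criterion can be illustrated on the slide's numbers (fragment length mean 500, std. dev. 50). The sketch below is not TRIP's integer program: it brute-forces subsets of candidate transcripts in order of size and accepts the first subset in which every read pair has an implied fragment length within a chosen number of standard deviations of the mean. The function name, the input layout, the two-standard-deviation cutoff, and the transcript and read-pair labels are assumptions.

```python
from itertools import combinations


def smallest_explaining_set(implied_lengths, mean, std_dev, max_dev=2.0):
    """Smallest transcript set that explains every read pair (hypothetical).

    implied_lengths: dict read_pair -> {transcript: implied fragment length}.
    A read pair counts as explained if some chosen transcript implies a
    fragment length within max_dev standard deviations of the mean.
    """
    candidates = sorted({t for lengths in implied_lengths.values()
                         for t in lengths})

    def explained(pair_lengths, chosen):
        return any(t in pair_lengths
                   and abs(pair_lengths[t] - mean) <= max_dev * std_dev
                   for t in chosen)

    # Exhaustive search by increasing size; an integer program would
    # replace this loop on realistic inputs.
    for size in range(1, len(candidates) + 1):
        for chosen in combinations(candidates, size):
            if all(explained(lengths, chosen)
                   for lengths in implied_lengths.values()):
                return chosen
    return None


# Read pair r1 implies length 500 on t123 but 300 on t13, as in the figure.
# Read pair r2 is made up: it is compatible only with t13.
pairs = {
    "r1": {"t123": 500, "t13": 300},
    "r2": {"t13": 480},
}
print(smallest_explaining_set({"r1": pairs["r1"]}, mean=500, std_dev=50))
print(smallest_explaining_set(pairs, mean=500, std_dev=50))
```

Exhaustive enumeration is exponential in the number of candidates, which is the motivation for casting the selection as an integer program.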

  9. De Novo (Meta)Transcriptome Assembly of Bugula neritina and its Symbiont
     • Uncultured bacterial symbiont produces bryostatins
       - Symbiont absent in Northern Atlantic populations

  10. De Novo (Meta)Transcriptome Assembly of Bugula neritina and its Symbiont
     • Developing a scalable multi-sample metatranscriptome assembly pipeline based on differential-coverage clustering of reads
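The slide does not describe how the pipeline works, so the following is only a generic illustration of differential-coverage clustering: sequences whose coverage rises and falls together across samples are grouped, which can separate organisms that are present in different subsets of samples. The greedy cosine-similarity scheme, the threshold, the function names, and the coverage values are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two coverage profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0


def cluster_by_coverage(profiles, min_similarity=0.95):
    """Greedy clustering of per-sample coverage profiles (hypothetical).

    profiles: dict name -> list of coverage values, one per sample.
    Each profile joins the first cluster whose representative is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative_profile, member_names)
    for name, profile in profiles.items():
        for representative, members in clusters:
            if cosine(profile, representative) >= min_similarity:
                members.append(name)
                break
        else:
            clusters.append((profile, [name]))
    return [members for _, members in clusters]


# Made-up coverage over three samples; units 2 and 4 vanish in sample 2.
profiles = {
    "unit_1": [30, 28, 31],
    "unit_2": [20, 0, 22],
    "unit_3": [15, 14, 16],
    "unit_4": [9, 0, 10],
}
groups = cluster_by_coverage(profiles)
print(groups)
```

Cosine similarity compares the shape of a profile rather than its magnitude, so highly and lowly expressed sequences from the same organism can still land in the same group.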

  11. Acknowledgements: Sahar Al Seesi, Abdul Banday, Amir Bayegan, Gabriel Ilie, Caroline Jakuba, James Lindsay, Rahul Kanadia, Craig Nelson, Marius Nicolae, Adrian Caciula, Nicole Lopanik, Serghei Mangul, Yvette Temate Tiagueu, Alex Zelikovsky
