
Reconstruction of Haplotype Spectra from NGS Data



Presentation Transcript


  1. Reconstruction of Haplotype Spectra from NGS Data • Ion Mandoiu, UTC Associate Professor in Engineering Innovation, Department of Computer Science & Engineering, University of Connecticut

  2. Haplotype Spectra Reconstruction • Given NGS reads, reconstruct: • Full length sequences • Sequence frequencies • Example applications: • Single individual haplotyping • Allele specific transcriptome reconstruction • Viral quasispecies reconstruction

  3. Single Individual Haplotyping • Somatic cells are diploid, containing two nearly identical copies of each autosomal chromosome • Heterozygous loci found by mapping reads to reference genome • Long haplotype fragments can be generated by sequencing fosmid pools [Duitama et al. 2012]

  4. RefHap Algorithm [Duitama et al. 2012] • Reduce the problem to Max-Cut • Solve Max-Cut • Build haplotypes according to the cut • [Figure: graph on fragments f1 to f4 with signed edge weights; the cut yields haplotypes h1 = 00110 and h2 = 11001] • Chr. 22, 32k SNPs, 14k fragments
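The three steps on this slide can be illustrated with a minimal sketch. It is not RefHap's actual scoring or heuristic: the edge weight (mismatches minus matches on shared SNPs), the single-flip local search, and the majority-vote consensus are all simplifying assumptions, and the function names are hypothetical.

```python
from itertools import combinations


def edge_weight(f, g):
    """Mismatches minus matches over the SNPs covered by both fragments.

    Fragments are dicts {snp_index: allele}, alleles in {0, 1}.
    A positive weight suggests the fragments come from different haplotypes.
    """
    shared = f.keys() & g.keys()
    return sum(1 if f[s] != g[s] else -1 for s in shared)


def local_search_max_cut(frags):
    """Heuristic Max-Cut: flip one fragment at a time while the cut weight grows."""
    n = len(frags)
    w = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = edge_weight(frags[i], frags[j])
    side = [0] * n
    improved = True
    while improved:
        improved = False
        for v in range(n):
            # Flipping v adds its same-side edges to the cut and removes
            # its cross-side edges from the cut.
            gain = sum(w[v][u] if side[u] == side[v] else -w[v][u]
                       for u in range(n) if u != v)
            if gain > 0:
                side[v] ^= 1
                improved = True
    return side


def build_haplotypes(frags, side, num_snps):
    """Majority vote per SNP after complementing the fragments on side 1."""
    votes = [0] * num_snps
    for f, s in zip(frags, side):
        for snp, allele in f.items():
            votes[snp] += 1 if (allele ^ s) == 1 else -1
    h1 = [1 if v > 0 else 0 for v in votes]
    h2 = [1 - a for a in h1]
    return h1, h2


# Hypothetical error-free fragments sampled from 00110 and 11001.
fragments = [
    {0: 0, 1: 0, 2: 1},
    {1: 1, 2: 0, 3: 0},
    {2: 1, 3: 1, 4: 0},
    {0: 1, 3: 0, 4: 1},
]
cut = local_search_max_cut(fragments)
hap1, hap2 = build_haplotypes(fragments, cut, num_snps=5)
```

Each flip strictly increases the cut weight, so the local search terminates, but it may stop at a local optimum rather than the true maximum cut.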

  5. Haplotype Spectra Reconstruction • Given short sequence fragments, reconstruct: • Full length sequences • Sequence frequencies • Example applications: • Single individual haplotyping • Allele specific transcriptome reconstruction • Viral quasispecies reconstruction

  6. Transcriptome Reconstruction Challenge: Alternative Splicing [Griffith and Marra 07]

  7. [Figure: a gene with exons 1 2 3 4 5 6 7 and four transcripts] • t1: 1 2 3 4 5 6 7 • t2: 1 3 4 5 6 7 • t3: 1 2 3 4 5 7 • t4: 1 3 4 5 7

  8. TRIP: Transcriptome Reconstruction using Integer Programming • Map the RNA-Seq reads to genome • Construct Splice Graph G(V,E) • V: exons • E: splicing events • Generate candidate transcripts • Depth-first search (DFS) • Filter candidate transcripts • Fragment length distribution (FLD) • Integer programming
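The candidate-generation step can be sketched as a depth-first enumeration of source-to-sink paths in the splice graph. This is an illustration only: the adjacency-dict representation, the function name, and the seven-exon example graph are assumptions, and the graph is assumed to be acyclic.

```python
def enumerate_transcripts(splice_graph, sources, sinks):
    """Return every source-to-sink exon path of a DAG splice graph via DFS.

    splice_graph maps an exon to the exons reachable by one splicing event.
    """
    sinks = set(sinks)
    paths = []

    def dfs(exon, path):
        path.append(exon)
        if exon in sinks:
            paths.append(tuple(path))
        for nxt in splice_graph.get(exon, ()):
            dfs(nxt, path)
        path.pop()

    for s in sources:
        dfs(s, [])
    return paths


# Hypothetical gene: exon 2 and exon 6 can each be skipped.
graph = {1: [2, 3], 2: [3], 3: [4], 4: [5], 5: [6, 7], 6: [7]}
candidates = enumerate_transcripts(graph, sources=[1], sinks=[7])
```

The number of paths can grow exponentially with the number of skippable exons, which is why a filtering step follows.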

  9. How to filter? • Select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distribution (FLD) • empirically determined during library preparation, and • implied by “mapping” read pairs • [Figure: exons 1, 2, 3 of length 200 each; the same read pair implies a fragment length of 500 on transcript 1-2-3 and 300 on transcript 1-3; FLD mean 500, std. dev. 50]
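To make the idea of an "implied" fragment length concrete, the sketch below computes the length a read pair would have on each candidate transcript and how many standard deviations it lies from the library mean. It does not reproduce the integer program; the coordinate convention (exon, 0-based offset), the function names, and the read-pair positions are assumptions chosen to match the numbers on the slide.

```python
def transcript_position(transcript, exon_len, exon, offset):
    """0-based position in transcript coordinates of (exon, offset)."""
    pos = 0
    for e in transcript:
        if e == exon:
            return pos + offset
        pos += exon_len[e]
    raise ValueError(f"exon {exon} is not part of transcript {transcript}")


def implied_fragment_length(transcript, exon_len, pair):
    """Fragment length implied by a read pair if it came from this transcript.

    pair = ((exon, offset) of the fragment's first base,
            (exon, offset) of the fragment's last base).
    """
    (e1, o1), (e2, o2) = pair
    start = transcript_position(transcript, exon_len, e1, o1)
    end = transcript_position(transcript, exon_len, e2, o2)
    return end - start + 1


def z_score(length, mean=500.0, std=50.0):
    """Deviation from the library's fragment length mean, in std. devs."""
    return (length - mean) / std


exon_len = {1: 200, 2: 200, 3: 200}
read_pair = ((1, 50), (3, 149))  # hypothetical mapping positions
len_full = implied_fragment_length((1, 2, 3), exon_len, read_pair)
len_skip = implied_fragment_length((1, 3), exon_len, read_pair)
```

Under these assumptions the pair fits transcript 1-2-3 exactly at the mean, while on transcript 1-3 it sits four standard deviations below it, so it is weak evidence for the exon-skipping transcript.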

  10. Allele Specific Expression

  11. Haplotype Spectra Reconstruction • Given short sequence fragments, reconstruct: • Full length sequences • Sequence frequencies • Example applications: • Single individual haplotyping • Allele specific transcriptome reconstruction • Viral quasispecies reconstruction

  12. RNA Virus Replication • High mutation rate (~10^-4) [Lauring & Andino, PLoS Pathogens 2011]

  13. Shotgun vs. Amplicon Reads • Shotgun reads starting positions distributed ~uniformly • Amplicon reads have predefined start/end positions covering fixed overlapping windows

  14. Reconstruction from Shotgun Reads: ViSpA • Shotgun reads → Read Error Correction → Read Alignment → Preprocessing of Aligned Reads → Read Graph Construction → Contig Assembly → Frequency Estimation → Quasispecies sequences w/ frequencies

  15. Reconstruction from Amplicon Reads: VirA • Inputs: error-corrected read data in SAM/BAM format; reference in FASTA format • Estimate Amplicons → Amplicon Read Graph → Max-Bandwidth Paths → Frequency Estimation → Viral population variants with frequencies
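The max-bandwidth path step can be sketched with a layer-by-layer dynamic program. The sketch assumes that the bandwidth of a path is the smallest read count along it and that the graph is layered with one layer per amplicon; the data layout and function name are hypothetical and are not taken from VirA.

```python
def max_bandwidth_path(layers, succ):
    """Find a first-to-last-layer path maximizing the minimum vertex count.

    layers: non-empty list of dicts {vertex: count}, one dict per layer.
    succ:   dict {(layer_index, vertex): iterable of vertices in next layer}.
    Returns (path, bandwidth), or (None, 0) if no complete path exists.
    """
    # history[k][v] = (best bandwidth of a path ending at v, predecessor of v)
    history = [{v: (c, None) for v, c in layers[0].items()}]
    for k in range(1, len(layers)):
        cur = {}
        for u, (bw, _) in history[-1].items():
            for v in succ.get((k - 1, u), ()):
                cand = min(bw, layers[k][v])
                if v not in cur or cand > cur[v][0]:
                    cur[v] = (cand, u)
        history.append(cur)
    if not history[-1]:
        return None, 0
    end = max(history[-1], key=lambda v: history[-1][v][0])
    bandwidth = history[-1][end][0]
    # Walk predecessors back to the first layer.
    path = [end]
    for k in range(len(layers) - 1, 0, -1):
        path.append(history[k][path[-1]][1])
    path.reverse()
    return path, bandwidth


# Hypothetical 3-layer graph with read counts.
layers = [{"a": 5, "b": 9}, {"c": 4, "d": 7}, {"e": 6}]
succ = {(0, "a"): ["c", "d"], (0, "b"): ["c"], (1, "c"): ["e"], (1, "d"): ["e"]}
best_path, best_bw = max_bandwidth_path(layers, succ)
```

Here the path a, d, e wins with bandwidth 5, even though vertex b has the largest individual count, because every path through b is limited by c.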

  16. Amplicon Read Graph • K amplicons represented by K-layer read graph • Vertices ⇔ distinct reads • Edges ⇔ reads with consistent overlap • Vertices have count function c(v)
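A minimal sketch of such a layered graph follows. It assumes that consecutive amplicons overlap by a fixed, known number of bases and that two reads overlap consistently when the suffix of one equals the prefix of the other; the function name and the toy reads are hypothetical.

```python
from collections import Counter


def build_amplicon_read_graph(reads_per_amplicon, overlap):
    """Build a K-layer read graph from K lists of amplicon reads.

    Returns (layers, edges):
      layers[k] is a Counter mapping each distinct read of amplicon k
                to its count c(v);
      edges holds ((k, u), (k + 1, v)) for reads whose overlap agrees.
    """
    if overlap <= 0:
        raise ValueError("overlap must be a positive number of bases")
    layers = [Counter(reads) for reads in reads_per_amplicon]
    edges = []
    for k in range(len(layers) - 1):
        for u in layers[k]:
            for v in layers[k + 1]:
                if u[-overlap:] == v[:overlap]:
                    edges.append(((k, u), (k + 1, v)))
    return layers, edges


# Hypothetical reads from two amplicons that overlap by 2 bases.
reads = [["ACGT", "ACGT", "ACGA"], ["GTCC", "GACC", "GTCC"]]
graph_layers, graph_edges = build_amplicon_read_graph(reads, overlap=2)
```

Collapsing identical reads into one vertex with a count keeps the graph small when coverage is deep, since the number of vertices then depends on the number of distinct variants rather than on the number of reads.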

  17. Read Graph Transformation • Heuristic to reduce edges in dense graphs • Replace bipartite cliques with star subgraphs
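One way to picture the transformation: if every read ending in a given overlap string connects to every read starting with it, those reads form a complete bipartite subgraph, and routing them through a single hub vertex preserves connectivity with far fewer edges. The sketch below assumes bicliques arise exactly this way; the hub naming and function name are hypothetical.

```python
from collections import defaultdict


def star_edges(layer_a, layer_b, overlap):
    """Connect two adjacent layers through one hub per shared overlap string.

    A biclique with |A| * |B| edges becomes a star with |A| + |B| edges.
    """
    by_suffix = defaultdict(list)
    by_prefix = defaultdict(list)
    for u in layer_a:
        by_suffix[u[-overlap:]].append(u)
    for v in layer_b:
        by_prefix[v[:overlap]].append(v)
    edges = []
    for key in sorted(by_suffix.keys() & by_prefix.keys()):
        hub = ("hub", key)
        edges.extend((u, hub) for u in by_suffix[key])
        edges.extend((hub, v) for v in by_prefix[key])
    return edges


# Hypothetical reads: all three on the left can pair with all three on the right.
left = ["AAGT", "CCGT", "TTGT"]
right = ["GTAA", "GTCC", "GTGG"]
reduced = star_edges(left, right, overlap=2)
```

In this toy case the direct graph would need 3 × 3 = 9 edges, while the star uses 6; the saving grows quadratically as the groups get larger.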

  18. Challenges • Scalability • Exploit inherent sparsity of biological instances • E.g., exact scaffolding algorithm using non-serial dynamic programming based on SPQR trees • Flexibility • Long (noisy) reads + short • Heterogeneous data, e.g., RNA-Seq + TSSeq + PolyA-Seq • Quantifying reconstruction uncertainty • Compute intensive, e.g., bootstrapping
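As a rough illustration of why bootstrapping is compute intensive, the sketch below resamples read assignments with replacement and re-estimates a haplotype frequency each time. It is a deliberately simplified stand-in: a real pipeline would rerun the whole reconstruction on every resample, and the read-to-haplotype assignments, function name, and parameters here are assumptions.

```python
import random


def bootstrap_frequency_interval(assignments, target, n_boot=1000,
                                 alpha=0.05, seed=0):
    """Percentile bootstrap interval for the frequency of one haplotype.

    assignments: list with one haplotype label per read.
    Returns (lower, upper) bounds of the (1 - alpha) interval.
    """
    rng = random.Random(seed)
    n = len(assignments)
    estimates = []
    for _ in range(n_boot):
        # Resample the reads with replacement and re-estimate the frequency.
        sample = rng.choices(assignments, k=n)
        estimates.append(sample.count(target) / n)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical data: 70 reads assigned to h1, 30 to h2.
labels = ["h1"] * 70 + ["h2"] * 30
low, high = bootstrap_frequency_interval(labels, "h1")
```

The cost scales with the number of resamples times the cost of one estimate, which is why the slide lists it as a scalability challenge.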

  19. Acknowledgements • Sahar Al Seesi • Mazhar Kahn • Rachel O’Neill • Alexander Artyomenko • Adrian Caciula • Nicholas Mancuso • Serghei Mangul • Bassam Tork • Alex Zelikovsky • Jorge Duitama • Irina Astrovskaya • Pavel Skums
