1 / 48

Genotype and Haplotype Reconstruction from Low-Coverage Short Sequencing Reads

Genotype and Haplotype Reconstruction from Low-Coverage Short Sequencing Reads. Ion Mandoiu Computer Science and Engineering Department University of Connecticut Joint work with S. Dinakar, J. Duitama, Y. Hernández, J. Kennedy, and Y. Wu. Outline. Introduction Single SNP Genotype Calling

jbennington
Download Presentation

Genotype and Haplotype Reconstruction from Low-Coverage Short Sequencing Reads

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Genotype and Haplotype Reconstruction from Low-Coverage Short Sequencing Reads Ion Mandoiu Computer Science and Engineering Department University of Connecticut Joint work with S. Dinakar, J. Duitama, Y. Hernández, J. Kennedy, and Y. Wu

  2. Outline Introduction Single SNP Genotype Calling Multilocus Genotyping Problem Experimental Results Conclusion

  3. Ultra-high throughput DNA sequencing • Recent massively parallel sequencing technologies deliver orders of magnitude higher throughput compared to classic Sanger sequencing Roche/454 FLX Titanium 400bp reads 400-600Mb/10h run ABI SOLiD 3 35-50bp reads 5-7.5Gb/3.5-7 day run Helicos HeliScope 25-55bp reads ~2.5Gb/day Illumina Genome Analyzer II 35-75bp reads 2-3Gb/2 day run

  4. C.Venter Sanger@7.5x J. Watson 454@7.4x NA18507 Illumina@36x SOLiD@12x UHTS enables personal genomics

  5. Sequencing can potentially provide all genetic variations (SNPs, CNVs, genome rearrangements) at single-base resolution… However, medical use requires determination of both alleles(genotype) at variable loci Accurate genotype calling is limited by coverage depth due to random nature of shotgun sequencing For the Venter and Watson genomes (both sequenced at ~7.5x average coverage), comparison with SNP genotyping chips has shown only ~75% accuracy for sequencing based calls of heterozygous SNPs [Levy et al 07, Wheeler et al 08] Challenges for medical applications of sequencing

  6. Allele coverage for heterozygous SNPs (Watson 454 @ 5.85x avg. coverage)

  7. Allele coverage for heterozygous SNPs (Watson 454 @ 2.93x avg. coverage)

  8. Allele coverage for heterozygous SNPs (Watson 454 @ 1.46x avg. coverage)

  9. Prior work • Most prior genotype calling methods are based on allele coverage • [Levy et al 07] and [Wheeler et al 08] require that each allele be covered by at least 2 reads in order to be called • Combined with hypothesis testing based on the binomial distribution when calling hets • Binomial probability for the observed number of alleles must be at least 0.01 • [Wendl&Wilson 08] generalize coverage methods to allow an arbitrary minimum allele coverage k

  10. Prior work (contd.) • MAQ [Li,Ruan&Durbin 08] • Widely used read mapping program • Single SNP genotype calling incorporating read mapping confidence and quality scores • Mostly tuned for de novo SNP discovery…

  11. [Wendl&Wilson 08] estimate that 21x coverage will be required for sequencing of normal tissue samples based on idealized theory that “neglects any heuristic inputs” What coverage is required?

  12. We propose methods incorporating additional sources of information extracted from a reference panel such as Hapmap: Allele/genotype frequencies Linkage disequilibrium Experimental results show significantly improved genotyping accuracy Do heuristic inputs help?

  13. Outline Introduction Single SNP Genotype Calling Multilocus Genotyping Problem Experimental Results Conclusion

  14. Known SNP positions Biallelic SNPs 0 = major allele, 1 = minor allele SNP genotypes: 0/2 = homozygous major/minor, 1=heterozygous Basic assumptions

  15. ri = set of mapped reads covering SNP locus i For each read r in ri r(i) = the allele observed at locus i = probability that r(i) is incorrect, where qr(i) is the phred quality score of r(i) mr= mapping confidence of r 012100120 Inferred genotypes Incorporating base call and read mapping uncertainty Mapped reads with allele 0 Mapped reads with allele 1 Sequencing errors

  16. ri = set of mapped reads covering SNP locus i For each read r in ri r(i) = the allele observed at locus i = probability that r(i) is incorrect, where qr(i) is the phred quality score of r(i) mr= mapping confidence of r Incorporating base call and read mapping uncertainty

  17. Applying Bayes’ formula: Where are genotype frequencies inferred from a representative panel Single SNP genotype calling

  18. Outline Introduction Single SNP Genotype Calling Multilocus Genotyping Problem Experimental Results Conclusion

  19. Haplotype structure in human populations

  20. HMM model of haplotype frequencies • Similar models proposed in [Schwartz 04, Rastas et al. 05, Kennedy et al. 07, Kimmel&Shamir 05, Scheet&Stephens 06]

  21. F1 F2 Fn H1 H2 Hn Graphical Model Representation • Random variables • Fi = founder haplotype at locus i • Hi = observed allele at locus i • For fully specified model and given haplotype h, P(H=h|M) can be computed in O(nK2) using forward algorithm, where n=#SNPs, K=#founders

  22. P(f1), P(f’1), P(fi+1|fi), P(f’i+1|f’i), P(hi|fi), P(h’i|f’i) trained using Baum-Welch algorithm on haplotypes inferred from the populations of origin for mother/father HF-HMM for multilocus genotype inference … F1 F2 Fn H1 H2 Hn … F'1 F'2 F'n H'1 H'2 H'n G1 G2 Gn R1,1 … R1,c R2,1 … R2,c Rn,1 … Rn,c n 1 2

  23. P(gi|hi,h’i) set to 1 if h+h’i=gi and to 0 otherwise HF-HMM for multilocus genotype inference … F1 F2 Fn H1 H2 Hn … F'1 F'2 F'n H'1 H'2 H'n G1 G2 Gn R1,1 … R1,c R2,1 … R2,c Rn,1 … Rn,c n 1 2

  24. HF-HMM for multilocus genotype inference … F1 F2 Fn H1 H2 Hn … F'1 F'2 F'n H'1 H'2 H'n G1 G2 Gn R1,1 … R1,c R2,1 … R2,c Rn,1 … Rn,c n 1 2

  25. Multilocus genotyping problem • GIVEN: • Shotgun read sets r=(r1, r2, … , rn) • Trained HMM models representing LD in populations of origin for mother/father • Quality scores & read mapping confidence values • FIND: • Multilocus genotype g*=(g*1,g*2,…,g*n) with maximum posterior probability, i.e., g*=argmaxg P(g | r)

  26. Computational complexity Idea: reduction from the clique problem Theorem: maxgP(g | r) cannot be approximated within unless ZPP=NP

  27. Posterior decoding algorithm For each i = 1..n, compute Return

  28. Forward-backward computation fi … … hi f’i … … h’i gi r1,c ri,1 ri,c r1,1 … … Rn,1 … Rn,c 1 n i

  29. Forward-backward computation fi … … hi f’i … … h’i gi r1,c ri,1 ri,c r1,1 … … Rn,1 … Rn,c 1 n i

  30. Forward-backward computation fi … … hi f’i … … h’i gi r1,c ri,1 ri,c r1,1 … … Rn,1 … Rn,c 1 n i

  31. Forward-backward computation fi … … hi f’i … … h’i gi r1,c ri,1 ri,c r1,1 … … Rn,1 … Rn,c 1 n i

  32. Forward-backward computation fi … … hi f’i … … h’i gi r1,c ri,1 ri,c r1,1 … … Rn,1 … Rn,c 1 n i

  33. Runtime reduced to O(nK3) by reusing common terms: where Runtime • Direct recurrences for computing forward probabilities:

  34. Outline Introduction Single SNP Genotype Calling Multilocus Genotyping Problem Experimental Results Conclusion

  35. … … … … … … Pipeline for LD-Based Genotype Calling Mapped reads & confidence values Reference genome sequence Read sequences >gnl|ti|1779718824 name:EI1W3PE02ILQXT TCAGTGAGGGTTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTTGAGACAGAATTTTGCTCTT GTCGCCCAGGCTGGTGTGCAGTGGTGCAACCTCAGCTCACTGCAACCTCTGCCTCCAGGTTCAAGCAATT CTCTGCCTCAGCCTCCCAAGTAGCTGGGATTACAGGCGGGCGCCACCACGCCCAGCTAATTTTGTATTGT TAGTAAAGATGGGGTTTCACTACGTTGGCTGAGCTGTTCTCGAACTCCTGACCTCAAATGAC >gnl|ti|1779718825 name:EI1W3PE02GTXK0 TCAGAATACCTGTTGCCCATTTTTATATGTTCCTTGGAGAAATGTCAATTCAGAGCTTTTGCTCAGCTTT TAATATGTTTATTTGTTTTGCTGCTGTTGAGTTGTACAATGTTGGGGAAAACAGTCGCACAACACCCGGC AGGTACTTTGAGTCTGGGGGAGACAAAGGAGTTAGAAAGAGAGAGAATAAGCACTTAAAAGGCGGGTCCA GGGGGCCCGAGCATCGGAGGGTTGCTCATGGCCCACAGTTGTCAGGCTCCACCTAATTAAATGGTTTACA >gnl|ti|1779718824 name:EI1W3PE02ILQXT TCAGTGAGGGTTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTTGAGACAGAATTTTGCTCTT GTCGCCCAGGCTGGTGTGCAGTGGTGCAACCTCAGCTCACTGCAACCTCTGCCTCCAGGTTCAAGCAATT CTCTGCCTCAGCCTCCCAAGTAGCTGGGATTACAGGCGGGCGCCACCACGCCCAGCTAATTTTGTATTGT TAGTAAAGATGGGGTTTCACTACGTTGGCTGAGCTGTTCTCGAACTCCTGACCTCAAATGAC >gnl|ti|1779718825 name:EI1W3PE02GTXK0 TCAGAATACCTGTTGCCCATTTTTATATGTTCCTTGGAGAAATGTCAATTCAGAGCTTTTGCTCAGCTTT TAATATGTTTATTTGTTTTGCTGCTGTTGAGTTGTACAATGTTGGGGAAAACAGTCGCACAACACCCGGC AGGTACTTTGAGTCTGGGGGAGACAAAGGAGTTAGAAAGAGAGAGAATAAGCACTTAAAAGGCGGGTCCA GGGGGCCCGAGCATCGGAGGGTTGCTCATGGCCCACAGTTGTCAGGCTCCACCTAATTAAATGGTTTACA >gnl|ti|1779718824 name:EI1W3PE02ILQXT TCAGTGAGGGTTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTTGAGACAGAATTTTGCTCTT GTCGCCCAGGCTGGTGTGCAGTGGTGCAACCTCAGCTCACTGCAACCTCTGCCTCCAGGTTCAAGCAATT CTCTGCCTCAGCCTCCCAAGTAGCTGGGATTACAGGCGGGCGCCACCACGCCCAGCTAATTTTGTATTGT TAGTAAAGATGGGGTTTCACTACGTTGGCTGAGCTGTTCTCGAACTCCTGACCTCAAATGAC >gnl|ti|1779718825 name:EI1W3PE02GTXK0 TCAGAATACCTGTTGCCCATTTTTATATGTTCCTTGGAGAAATGTCAATTCAGAGCTTTTGCTCAGCTTT TAATATGTTTATTTGTTTTGCTGCTGTTGAGTTGTACAATGTTGGGGAAAACAGTCGCACAACACCCGGC AGGTACTTTGAGTCTGGGGGAGACAAAGGAGTTAGAAAGAGAGAGAATAAGCACTTAAAAGGCGGGTCCA GGGGGCCCGAGCATCGGAGGGTTGCTCATGGCCCACAGTTGTCAGGCTCCACCTAATTAAATGGTTTACA >gi|88943037|ref|NT_113796.1|Hs1_111515 Homo sapiens chromosome 1 genomic contig, reference assembly GAATTCTGTGAAAGCCTGTAGCTATAAAAAAATGTTGAGCCATAAATACCATCAGAAATAACAAAGGGAG CTTTGAAGTATTCTGAGACTTGTAGGAAGGTGAAGTAAATATCTAATATAATTGTAACAAGTAGTGCTTG GATTGTATGTTTTTGATTATTTTTTGTTAGGCTGTGATGGGCTCAAGTAATTGAAATTCCTGATGCAAGT AATACAGATGGATTCAGGAGAGGTACTTCCAGGGGGTCAAGGGGAGAAATACCTGTTGGGGGTCAATGCC CTCCTAATTCTGGAGTAGGGGCTAGGCTAGAATGGTAGAATGCTCAAAAGAATCCAGCGAAGAGGAATAT TTCTGAGATAATAAATAGGACTGTCCCATATTGGAGGCCTTTTTGAACAGTTGTTGTATGGTGACCCTGA AATGTACTTTCTCAGATACAGAACACCCTTGGTCAATTGAATACAGATCAATCACTTTAAGTAAGCTAAG TCCTTACTAAATTGATGAGACTTAAACCCATGAAAACTTAACAGCTAAACTCCCTAGTCAACTGGTTTGA ATCTACTTCTCCAGCAGCTGGGGGAAAAAAGGTGAGAGAAGCAGGATTGAAGCTGCTTCTTTGAATTTAC >gi|88943037|ref|NT_113796.1|Hs1_111515 Homo sapiens chromosome 1 genomic contig, reference assembly GAATTCTGTGAAAGCCTGTAGCTATAAAAAAATGTTGAGCCATAAATACCATCAGAAATAACAAAGGGAG CTTTGAAGTATTCTGAGACTTGTAGGAAGGTGAAGTAAATATCTAATATAATTGTAACAAGTAGTGCTTG GATTGTATGTTTTTGATTATTTTTTGTTAGGCTGTGATGGGCTCAAGTAATTGAAATTCCTGATGCAAGT AATACAGATGGATTCAGGAGAGGTACTTCCAGGGGGTCAAGGGGAGAAATACCTGTTGGGGGTCAATGCC CTCCTAATTCTGGAGTAGGGGCTAGGCTAGAATGGTAGAATGCTCAAAAGAATCCAGCGAAGAGGAATAT TTCTGAGATAATAAATAGGACTGTCCCATATTGGAGGCCTTTTTGAACAGTTGTTGTATGGTGACCCTGA AATGTACTTTCTCAGATACAGAACACCCTTGGTCAATTGAATACAGATCAATCACTTTAAGTAAGCTAAG TCCTTACTAAATTGATGAGACTTAAACCCATGAAAACTTAACAGCTAAACTCCCTAGTCAACTGGTTTGA ATCTACTTCTCCAGCAGCTGGGGGAAAAAAGGTGAGAGAAGCAGGATTGAAGCTGCTTCTTTGAATTTAC >gi|88943037|ref|NT_113796.1|Hs1_111515 Homo sapiens chromosome 1 genomic contig, reference assembly GAATTCTGTGAAAGCCTGTAGCTATAAAAAAATGTTGAGCCATAAATACCATCAGAAATAACAAAGGGAG CTTTGAAGTATTCTGAGACTTGTAGGAAGGTGAAGTAAATATCTAATATAATTGTAACAAGTAGTGCTTG GATTGTATGTTTTTGATTATTTTTTGTTAGGCTGTGATGGGCTCAAGTAATTGAAATTCCTGATGCAAGT AATACAGATGGATTCAGGAGAGGTACTTCCAGGGGGTCAAGGGGAGAAATACCTGTTGGGGGTCAATGCC CTCCTAATTCTGGAGTAGGGGCTAGGCTAGAATGGTAGAATGCTCAAAAGAATCCAGCGAAGAGGAATAT TTCTGAGATAATAAATAGGACTGTCCCATATTGGAGGCCTTTTTGAACAGTTGTTGTATGGTGACCCTGA AATGTACTTTCTCAGATACAGAACACCCTTGGTCAATTGAATACAGATCAATCACTTTAAGTAAGCTAAG TCCTTACTAAATTGATGAGACTTAAACCCATGAAAACTTAACAGCTAAACTCCCTAGTCAACTGGTTTGA ATCTACTTCTCCAGCAGCTGGGGGAAAAAAGGTGAGAGAAGCAGGATTGAAGCTGCTTCTTTGAATTTAC Quality scores >gnl|ti|1779718824 name:EI1W3PE02ILQXT 28 28 28 28 26 28 28 40 34 14 44 36 23 13 2 27 42 35 21 7 27 42 35 21 6 28 43 36 22 10 27 42 35 20 6 28 43 36 22 9 28 43 36 22 9 28 44 36 24 14 4 28 28 28 27 28 26 26 35 26 40 34 18 3 28 28 28 27 33 24 26 28 28 28 40 33 14 28 36 27 26 26 37 29 28 28 28 28 27 28 28 28 37 28 27 27 28 36 28 37 28 28 28 27 28 28 28 24 28 28 27 28 28 37 29 36 27 27 28 27 28 33 23 28 33 23 28 36 27 33 23 28 35 25 28 28 36 27 36 27 28 28 28 24 28 37 29 28 19 28 26 37 29 26 39 33 13 37 28 28 28 21 24 28 27 41 34 15 28 36 27 26 28 24 35 27 28 40 34 15 >gnl|ti|1779718824 name:EI1W3PE02ILQXT 28 28 28 28 26 28 28 40 34 14 44 36 23 13 2 27 42 35 21 7 27 42 35 21 6 28 43 36 22 10 27 42 35 20 6 28 43 36 22 9 28 43 36 22 9 28 44 36 24 14 4 28 28 28 27 28 26 26 35 26 40 34 18 3 28 28 28 27 33 24 26 28 28 28 40 33 14 28 36 27 26 26 37 29 28 28 28 28 27 28 28 28 37 28 27 27 28 36 28 37 28 28 28 27 28 28 28 24 28 28 27 28 28 37 29 36 27 27 28 27 28 33 23 28 33 23 28 36 27 33 23 28 35 25 28 28 36 27 36 27 28 28 28 24 28 37 29 28 19 28 26 37 29 26 39 33 13 37 28 28 28 21 24 28 27 41 34 15 28 36 27 26 28 24 35 27 28 40 34 15 >gnl|ti|1779718824 name:EI1W3PE02ILQXT 28 28 28 28 26 28 28 40 34 14 44 36 23 13 2 27 42 35 21 7 27 42 35 21 6 28 43 36 22 10 27 42 35 20 6 28 43 36 22 9 28 43 36 22 9 28 44 36 24 14 4 28 28 28 27 28 26 26 35 26 40 34 18 3 28 28 28 27 33 24 26 28 28 28 40 33 14 28 36 27 26 26 37 29 28 28 28 28 27 28 28 28 37 28 27 27 28 36 28 37 28 28 28 27 28 28 28 24 28 28 27 28 28 37 29 36 27 27 28 27 28 33 23 28 33 23 28 36 27 33 23 28 35 25 28 28 36 27 36 27 28 28 28 24 28 37 29 28 19 28 26 37 29 26 39 33 13 37 28 28 28 21 24 28 27 41 34 15 28 36 27 26 28 24 35 27 28 40 34 15 SNP genotype calls Hapmap haplotypes rs12095710 T T 9.988139e-01 rs12127179 C T 9.986735e-01 rs11800791 G G 9.977713e-01 rs11578310 G G 9.980062e-01 rs1287622 G G 8.644588e-01 rs11804808 C C 9.977779e-01 rs17471528 A G 5.236099e-01 rs11804835 C C 9.977759e-01 rs11804836 C C 9.977925e-01 rs1287623 G G 9.646510e-01 rs13374307 G G 9.989084e-01 rs12122008 G G 5.121655e-01 rs17431341 A C 5.290652e-01 rs881635 G G 9.978737e-01 rs9700130 A A 9.989940e-01 rs11121600 A A 6.160199e-01 rs12121542 A A 5.555713e-01 rs11121605 T T 8.387705e-01 rs12563779 G G 9.982776e-01 rs11121607 C G 5.639239e-01 rs11121608 G T 5.452936e-01 rs12029742 G G 9.973527e-01 rs562118 C C 9.738776e-01 rs12133533 A C 9.956655e-01 rs11121648 G G 9.077355e-01 rs9662691 C C 9.988648e-01 rs11805141 C C 9.928786e-01 rs1287635 C C 6.113270e-01 90 209342 16 F 0 0 2110001?0100210010011002122201210211?1221220212000 18 F 15 16 21100012010021001001100?100201?10111110111?0212000 15 M 0 0 21120010012001201001120010110101011111011110212000 7 M 0 0 2110001001000200122110001111011100111?121210222000 8 F 0 0 011202100120022012211200101101210211122111?0120000 12 F 9 10 21100010010002001221100010110111001112121210220000 9 M 0 0 011?001?012002201221120010?10121021112211110120000 11 M 7 8 21100210010002001221100012110111001112121210222000 90 209342 16 F 0 0 2110001?0100210010011002122201210211?1221220212000 18 F 15 16 21100012010021001001100?100201?10111110111?0212000 15 M 0 0 21120010012001201001120010110101011111011110212000 7 M 0 0 2110001001000200122110001111011100111?121210222000 8 F 0 0 011202100120022012211200101101210211122111?0120000 12 F 9 10 21100010010002001221100010110111001112121210220000 9 M 0 0 011?001?012002201221120010?10121021112211110120000 11 M 7 8 21100210010002001221100012110111001112121210222000 90 209342 16 F 0 0 2110001?0100210010011002122201210211?1221220212000 18 F 15 16 21100012010021001001100?100201?10111110111?0212000 15 M 0 0 21120010012001201001120010110101011111011110212000 7 M 0 0 2110001001000200122110001111011100111?121210222000 8 F 0 0 011202100120022012211200101101210211122111?0120000 12 F 9 10 21100010010002001221100010110111001112121210220000 9 M 0 0 011?001?012002201221120010?10121021112211110120000 11 M 7 8 21100210010002001221100012110111001112121210222000

  36. Datasets Watson Sequencing data: 74.4 million 454 reads (of 106.5 million reads used in [Wheeler et al 08]) Reference panel: CEU genotypes from Hapmap r23a phased using the ENT algorithm [Gusev et al. 08] Ground truth: duplicate Affymetrix 500k SNP genotypes

  37. Datasets (contd.) NA18507 (Illumina & SOLiD) Sequencing data: 525 million Illumina reads (36bp, paired) and 764 million SOLiD reads (24 - 44bp, unpaired) Reference panel: YRI haplotypes from Hapmap r22 excluding NA18507 haplotypes Ground truth: Hapmap r22 genotypes

  38. Mapping Procedure 454 reads mapped on human genome build 36.3 using the NUCMER tool of the MUMmer package [Kurtz et al 04] with default parameters Additional filtering: at least 90% of the read length matched to the genome, no more than 10 errors (mismatches or indels) Reads meeting above conditions at multiple genome positions (likely coming from genomic repeats) were discarded Illumina and SOLiD reads mapped using MAQ [Li,Ruan&Durbin 08] with default parameters For reads mapped at multiple positions MAQ returns best position (breaking ties arbitrarily) together with mapping confidence We filtered bad alignments and discarded paired end reads that are not mapped in pairs using the “submap -p” command

  39. Mapping statistics

  40. Concordance vs. avg. coverage(Watson 454 reads)

  41. Tradeoff with call rate (5.85x Watson 454 reads, homo SNPs)

  42. Tradeoff with call rate (5.85x Watson 454 reads, het SNPs)

  43. Concordance vs. avg. coverage for NA18507 (Illumina & SOLiD reads)

  44. Effect of local recombination rate (NA18507 Illumina)

  45. Effect of SNP coverage (NA18507 Illumina)

  46. Posterior decoding algorithm has scalable running time and yields significant improvements in genotyping calling accuracy Improvement depends on the coverage depth (higher at lower coverage), e.g., accuracy achieved by previously proposed binomial test at 5-6x average coverage is achieved by HMM-based posterior decoding algorithm using less than 1/4 of the reads Open source code available at http://dna.engr.uconn.edu/software/GeneSeq/ LD-based genotype calling increasingly attractive as reference panels improve (denser, more samples, more populations) Allows sequencing larger populations for the same cost Conclusions

  47. Haplotype reconstruction Promising preliminary results using Viterbi-like algorithm based on HF-HMM Extension to population sequencing data Removes need for reference panels! Integrated read mapping, SNP identification, and haplotype reconstruction EM algorithm that iteratively refines two full haplotype sequences and read mapping probabilities Integrates read data with LD info available for known SNPs Takes advantage of reads overlapping multiple SNP loci Allows reconstruction of complete sequences for CNVs Reconstruction of complex haplotype spectra mRNA isoforms, quasispecies Ongoing work

  48. Acknowledgments • Work supported in part by NSF awards IIS-0546457 and DBI-0543365 to IM and IIS-0803440 to YW. SD and YH performed this research as part of the Summer REU program “Bio-Grid Initiatives for Interdisciplinary Research and Education" funded by NSF award CCF-0755373.

More Related