
Variant discovery


trish

Presentation Transcript


  1. BIT 815: Analysis of Deep Sequencing Data. Variant discovery
     • Different approaches: with or without a reference?
     • With a reference
     • Limiting factors are CPU time and memory required
     • Crossbow: a cluster-based cloud computing approach
     • Without a reference
     • CPU time and RAM requirements are still limiting
     • Now error rate and error distribution become limiting also
     • Statistical methods for estimating the probability that a putative SNP is a true SNP are still developing
     • Some analytical methods require experimental designs specifically for the variant discovery objective

  2. Structural variants in 7 flavors. Figure from Alkan et al., Nature Reviews Genetics 2011, doi:10.1038/nrg2958

  3. Different technologies have different resolutions. Figure from Alkan et al., Nature Reviews Genetics 2011, doi:10.1038/nrg2958

  4. Four strategies for discovery of structural variation using parallel sequencing technologies. All are affected by the repetitive sequence content of the genome and by sampling error. De novo assembly of a complete genome sequence is the most expensive but most complete approach. Figure from Alkan et al., Nature Reviews Genetics 2011, doi:10.1038/nrg2958

  5. BIT 815: Deep Sequencing
     • Comparison of various methods for variant discovery, based on analysis of 185 human genomes
     • (a) Two microarray-based methods compared with Sanger sequencing of fosmid ends (40 kb inserts); counts include only variants > 5 kb
     • (b) Three parallel-sequence-based methods compared
     • The number of variants discovered is several-fold higher than in part (a)
     • There is relatively little overlap among the variants discovered using different methods
     Figure from Alkan et al., Nature Reviews Genetics 2011, doi:10.1038/nrg2958

  6. Small indels create problems for SNP-calling programs. Figure labels: incorrectly aligned; correctly aligned; incorrectly aligned; artifactual SNP calls. Figure from http://samtools.sourceforge.net/mpileup.shtml

  7. Base Alignment Quality (BAQ) is one approach to dealing with the problem. Figure from http://samtools.sourceforge.net/mpileup.shtml

  8. Small indels create problems for SNP-calling programs. Figure labels: incorrectly aligned; correctly aligned; incorrectly aligned. BAQ downgrades the quality scores of these bases so they are not considered reliable by SNP-calling programs. Figure from http://samtools.sourceforge.net/mpileup.shtml
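The downgrading described on this slide can be sketched in a few lines. This is only an illustration, not the samtools implementation: it assumes each base call carries its original Phred quality plus a separately computed BAQ value, and that the quality handed to the SNP caller is the lower of the two. The pileup column, the quality values, and the threshold of 20 are all hypothetical.

```python
from collections import Counter

def cap_quality(base_qual, baq):
    """Return the lower of the original Phred quality and the alignment-based quality."""
    return min(base_qual, baq)

def count_reliable_bases(column, min_qual=20):
    """Count base calls in one pileup column whose capped quality reaches min_qual."""
    counts = Counter()
    for base, base_qual, baq in column:
        if cap_quality(base_qual, baq) >= min_qual:
            counts[base] += 1
    return counts

# Hypothetical pileup column: (base call, base quality, BAQ).
# The two "G" calls stand in for read ends misaligned next to a small indel:
# the sequencer was confident in the base, but the alignment is doubtful.
column = [
    ("A", 38, 40),
    ("A", 35, 37),
    ("A", 40, 39),
    ("G", 37, 4),
    ("G", 36, 6),
]

raw = Counter(base for base, _, _ in column)
filtered = count_reliable_bases(column)
print("before capping:", dict(raw))
print("after capping: ", dict(filtered))
```

With the raw qualities, two of five reads support an alternate base; after capping, those two calls fall below the threshold and no longer contribute to a SNP call.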

  9. Short Read Multiple Aligner (SRMA) is another approach: it actually re-aligns reads. Figure from Homer and Nelson, Genome Biology 2010, 11:R99

  10. Short Read Multiple Aligner (SRMA). The downside is computational intensity. From Homer and Nelson, Genome Biology 2010, 11:R99

  11. What about rare alleles? Efficient screening in pooled samples
      • Druley TE, et al. (2009) Quantification of rare allelic variants from pooled genomic DNA. Nat Methods 6(4):263-5.
      • Vallania FL, et al. (2010) High-throughput discovery of rare insertions and deletions in large cohorts. Genome Res 20(12):1711-8.
      • Bansal V, et al. (2010) Accurate detection and genotyping of SNPs utilizing population sequencing data. Genome Res 20(4):537-45.
      • Bansal V. (2010) A statistical method for the detection of variants from next-generation resequencing of DNA pools. Bioinformatics 26(12):i318-24.
      • Bansal V, et al. (2011) Efficient and cost effective population resequencing by pooling and in-solution hybridization. PLoS One 6(3):e18353.
      • Altmann A, et al. (2011) vipR: variant identification in pooled DNA using R. Bioinformatics 27(13):i77-84.

  12. What about rare alleles? Comparing across pools adds to power. Five alternate base calls in one of four pooled samples is unlikely to arise by chance sequencing errors alone: the p-value from the contingency table is 0.002. Five of nine alternate base calls in one pool, with one call in each of the other three, is a pattern that cannot be distinguished from that expected of sequencing errors: the contingency table p-value is 0.24.
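The contingency-table reasoning on this slide can be tried out with a small exact test. The slide does not say which test was used or how deeply each pool was sequenced, so the sketch below makes two assumptions: it uses a Fisher-Freeman-Halton style exact test on a 2 x 4 table (alternate vs. reference calls by pool), and it assumes a hypothetical depth of 100 reads per pool. The alternate-call counts (5, 0, 0, 0) and (5, 1, 1, 1) are one reading of the two patterns described. The resulting p-values therefore illustrate the contrast and need not equal the 0.002 and 0.24 quoted above.

```python
from math import comb, prod

def compositions(total, caps):
    """Yield every tuple of per-pool counts that sums to `total`
    without exceeding any pool's read depth in `caps`."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest

def pooled_exact_p(alt_counts, depths):
    """Exact p-value for the null that alternate calls are spread across
    pools in proportion to depth (as uniform sequencing error would be)."""
    total_alt = sum(alt_counts)

    def weight(table):
        # Number of ways to pick these alternate reads from each pool;
        # dividing by comb(sum(depths), total_alt) gives the table's probability.
        return prod(comb(n, a) for n, a in zip(depths, table))

    observed = weight(alt_counts)
    # Sum the weights of all tables no more probable than the observed one.
    extreme = sum(
        w for w in map(weight, compositions(total_alt, depths)) if w <= observed
    )
    return extreme / comb(sum(depths), total_alt)

depths = (100, 100, 100, 100)            # hypothetical reads per pool
p_one_pool = pooled_exact_p((5, 0, 0, 0), depths)
p_spread = pooled_exact_p((5, 1, 1, 1), depths)
print(f"all five alternate calls in one pool: p = {p_one_pool:.4f}")
print(f"five in one pool, one in each other:  p = {p_spread:.4f}")
```

Weights are kept as integers so that ties between equally probable tables are compared exactly rather than through floating-point probabilities.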
