This informative text covers the differences between SVs and CNVs in genome analysis, their importance in understanding human genetic variation, and the challenges in their detection using NGS data. It discusses the role of CNVs in human diseases and personalized medicine, the transition from aCGH to NGS for CNV detection, and various approaches like read count, paired-end reads, and split-read alignments. The text provides insights into the methods, tools, and challenges associated with detecting CNVs in NGS data.
Detection of Structural Variants (SVs) and Copy Number Variations (CNVs) on NGS Data
SVs and CNVs • They are often confused. • SVs: regions containing insertions, deletions (indels), or inversions. • CNVs: regions that appear a different number of times in different individuals. • They are closely related phenomena. • SVs are operationally defined as genomic events involving more than 50 bp. They include CNVs as well as rearrangements such as inversions and translocations.
Why study SVs and CNVs? • It has been acknowledged only very recently that human genomes differ more as a consequence of SVs (including CNVs) than of single-base differences [the first "hypothesis", in 2004-2005, was not taken seriously; evidence came only in 2010 with NGS]. • In particular, many studies observed CNVs and used them for genotyping: they are the easiest SVs to detect. • I am not aware of tools for detecting inversions and translocations of DNA directly on NGS data. • There are some tools for detecting CNVs: from now on we discuss only them.
Copy Number Variations • The challenge is to discover the effects of CNVs on human diseases, complex traits (combinations of genetic and environmental effects), and evolution. • Genotyping of human CNVs is far from being a routine procedure. This limits personalized medicine, for which CNV detection is a crucial step. • No standard method exists: there are many very recent tools, but not yet a fair, sharp comparison among them.
Detecting CNVs • From the late nineties until very recently (and still today), CNVs have been detected using aCGH: array Comparative Genomic Hybridization. • Array platforms are not as rapid and cheap as NGS, and their data cannot be re-used once processed. • aCGH has inherent limits on the size and frequency of detectable CNVs. • NGS opened a new era in CNV detection!
CNV calling • There are three approaches to CNV calling: • Based on read count (RC), or read alignment coverage (Breakdancer, CNVnator, CNV-seq, and tools of [Campbell et al., 2008], [Alkan et al., 2009], [Sudmant et al., 2010], [Yoon et al., 2009]). • Based on paired-end reads (PEMer, CNVer, VariationHunter, MoDIL, Breakdancer, tools of [Sindi et al., 2009], [Quinlan et al., 2010]). • Based on split-read alignments (Pindel, Mosaik, tool of [Mills et al., 2006]).
Read Count Approaches • Measure the number of reads mapped to each location in the reference genome. • Identify CNV regions. • Estimate the copy number. • Some tools use a sliding window. • Problems arise when coverage is not uniform within a CNV…
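The read-count steps above can be illustrated with a minimal Python sketch. It is not the algorithm of any tool named in this deck: the function names, the fixed non-overlapping windows, the median-as-diploid baseline, and the toy read positions are all assumptions chosen for illustration.

```python
from statistics import median

def window_counts(read_starts, genome_length, window):
    """Count reads whose start position falls in each fixed-size window."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    return counts

def estimate_copy_number(counts, ploidy=2):
    """Scale each window by the median count, assumed to be the normal level.

    Assumption: most windows are copy-neutral, so the median is non-zero
    and represents `ploidy` copies.
    """
    baseline = median(counts)
    return [round(ploidy * c / baseline) for c in counts]

def call_cnv_regions(copy_numbers, window, ploidy=2):
    """Merge adjacent windows sharing a non-normal copy number into regions."""
    regions = []
    for i, cn in enumerate(copy_numbers):
        if cn == ploidy:
            continue
        start, end = i * window, (i + 1) * window
        if regions and regions[-1][1] == start and regions[-1][2] == cn:
            regions[-1] = (regions[-1][0], end, cn)  # extend previous region
        else:
            regions.append((start, end, cn))
    return regions

# Toy data (hypothetical): 10 windows of 100 bp with these read counts.
per_window = [20, 20, 20, 10, 10, 20, 20, 30, 20, 20]
read_starts = [w * 100 + (k * 5) % 100
               for w, n in enumerate(per_window) for k in range(n)]

counts = window_counts(read_starts, genome_length=1000, window=100)
copy_numbers = estimate_copy_number(counts)
regions = call_cnv_regions(copy_numbers, window=100)
print(regions)  # [(300, 500, 1), (700, 800, 3)]
```

The sketch assumes perfectly uniform coverage inside each region. With real data, coverage varies with GC content and mappability, which is the non-uniformity problem noted in the last bullet.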
Paired-End Read Approaches • Mate-pair/paired-end reads are mapped to the reference genome.
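The slide states only that the pairs are mapped. One common way such mappings are then used (an assumption added here, not taken from the slide) is to compare the mapped distance between the two mates with the insert size expected from the library: a much larger distance hints at a deletion in the sample, a much smaller one at an insertion. The sketch below shows that idea only; `classify_pair`, its parameters, and the numbers are hypothetical.

```python
def classify_pair(left_pos, right_pos, expected_span, span_sd, n_sd=3.0):
    """Label a mapped read pair by how its span compares with the library's.

    expected_span and span_sd describe the library insert-size distribution
    (assumed known here); n_sd is an illustrative cutoff.
    """
    span = right_pos - left_pos
    if span > expected_span + n_sd * span_sd:
        return "deletion"    # mates map too far apart on the reference
    if span < expected_span - n_sd * span_sd:
        return "insertion"   # mates map too close together on the reference
    return "concordant"

# Toy pairs (hypothetical positions), library of 300 bp +/- 20 bp.
pairs = [(1000, 1310), (5000, 5900), (8000, 8150)]
labels = [classify_pair(l, r, expected_span=300, span_sd=20) for l, r in pairs]
print(labels)  # ['concordant', 'deletion', 'insertion']
```

A real caller would also cluster several discordant pairs that support the same event before reporting it. This sketch labels pairs one at a time.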
Split-Read Approach • A read cannot be mapped to a single location because of a possible structural variation. • All unaligned reads are split, and a mapping is sought again for the pieces. • Iterate to find the actual breakpoint.
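The split-and-remap loop can be sketched on a toy example. This is a simplification under stated assumptions: exact string matching with `str.find` stands in for a real aligner, only deletions are considered, and `find_deletion_breakpoint`, `min_anchor`, and the sequences are hypothetical.

```python
def find_deletion_breakpoint(reference, read, min_anchor=5):
    """Try each split point of an unmapped read; return the deleted interval.

    Returns (deletion_start, deletion_end) in reference coordinates
    (end exclusive), or None if the read maps whole or no split fits.
    """
    if reference.find(read) != -1:
        return None  # the read maps as a whole, no split is needed
    # Iterate over split points, keeping at least min_anchor bases per side.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        p = reference.find(prefix)
        if p == -1:
            continue
        prefix_end = p + len(prefix)
        # The suffix must map downstream of the prefix, leaving a gap.
        q = reference.find(suffix, prefix_end)
        if q != -1 and q > prefix_end:
            return (prefix_end, q)
    return None

# Toy reference: left flank + deleted segment + right flank.
reference = "ACGTTGCAAG" + "GCTTACCT" + "ATGGATCCTA"
# Toy read sampled from a genome lacking the middle segment.
read = "GTTGCAAG" + "ATGGATCC"

breakpoint_interval = find_deletion_breakpoint(reference, read)
print(breakpoint_interval)  # (10, 18)
```

In this toy case the reported interval, `reference[10:18]`, is exactly the deleted segment. With repeated sequence around the breakpoint, several split points can fit equally well, and the sketch simply returns the first one it finds.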