Join Dr. Elisha Roberson of Washington University in St. Louis as he explores the complexities of clinical-grade next-generation sequencing (NGS). He compares Sanger sequencing with several NGS technologies, emphasizing their differences in error rate and throughput. You'll learn about detecting actionable somatic variants, the importance of sample collection, and the nuances of bioinformatics in genomic analysis. From DNA quality to sequencing depth, discover how these approaches inform cancer research and patient care.
Clinical-grade next-generation sequencing of UM
Elisha Roberson, Ph.D.
Depts. of Internal Medicine and Genetics
Washington University in St. Louis
All talk content can be tweeted and blogged @thatdnaguy
Sanger ≠ next-gen sequencing
• Sanger
  • Consensus of a population of molecules
  • ~600-800 bp sequence
  • Very low error rate
  • Low-throughput
  • Targeted
  • Expensive* (256 bp/$)
• NGS
  • Single molecules
  • 35 bp – 3+ kb
  • High error rates (1-15+%)
  • High-throughput
  • Can shotgun
  • Cheap* (11.5 Mbp/$)
*Our lab’s current Sanger & HiSeq2500 costs
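To put the two bp/$ figures side by side, here is a rough back-of-the-envelope sketch. The genome size (~3.1 Gbp) and the 30X depth applied to both methods are assumptions for illustration only; the calculation counts raw bases and ignores library prep, labor, and analysis.

```python
# Throughput figures quoted on the slide (the lab's own costs at the time).
SANGER_BP_PER_DOLLAR = 256
NGS_BP_PER_DOLLAR = 11.5e6

# Assumption, not from the talk: a ~3.1 Gbp haploid human genome.
GENOME_SIZE_BP = 3.1e9


def raw_sequencing_cost(target_bp, mean_depth, bp_per_dollar):
    """Dollars needed to generate target_bp * mean_depth raw bases."""
    return target_bp * mean_depth / bp_per_dollar


ngs_cost = raw_sequencing_cost(GENOME_SIZE_BP, 30, NGS_BP_PER_DOLLAR)
sanger_cost = raw_sequencing_cost(GENOME_SIZE_BP, 30, SANGER_BP_PER_DOLLAR)

print(f"30X genome by NGS:    ${ngs_cost:,.0f}")
print(f"30X genome by Sanger: ${sanger_cost:,.0f}")
print(f"Bases per dollar, NGS vs Sanger: {NGS_BP_PER_DOLLAR / SANGER_BP_PER_DOLLAR:,.0f}x")
```

Nobody would shotgun a genome by Sanger; the point is only the scale of the gap, which is why Sanger stays a targeted method.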
UM clinical sequencing
• CLIA/CAP
  • Detect actionable somatic variants
  • Insurance pre-approval
  • Submit DNA
  • Wait a long time
  • Pay $7000+ / sample
• Research (discovery, epidemiology, etc.)
  • IRB-approved DNA collection
  • You sequence and interpret
My sequencing wishlist
• Tissue
  • Fresh tissue
  • Large amount of high-quality DNA
  • No PCR amplification
• Sequencing (whole genome)
  • Low-error, long reads
  • Paired-end
  • 30X or greater germline coverage
  • 60X or greater tumor coverage
• Bioinformatics
  • De novo assembly of both germline and tumor; compare to reference & each other
  • Genotyping algorithm that is pair-aware
NGS technologies
• ABI – SOLiD
  • Emulsion PCR, then sequencing by di-base ligation
• Illumina – Solexa
  • Single-molecule fluor sequencing
• Life Tech – Ion Torrent
  • Single-molecule semiconductor sequencing
• PacBio – SMRT
  • Zero-mode waveguide with a single polymerase in each well
Where should I get DNA???
• Germline
  • NO blood
  • Sequencing single molecules!!!
  • Spit kits, washed skin biopsies
• Tumor tissue
  • Fresh primary tumor
  • Fresh metastatic tumor
  • FFPE?
Calculating sequencing depth
• NGS technology dependent
• How deeply do you want to see mosaicism?
• General guidelines
  • 30X germline
  • 60X or higher tumor
• BUT sequencing follows a Poisson distribution
  • i.e. 30X average coverage ≠ every target at 30X
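The Poisson point can be made concrete with a small sketch. It assumes per-position depth is exactly Poisson with the stated mean; real coverage is usually more uneven than this idealized model, so treat the numbers as a best case. The function names are illustrative, not from the talk.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean); returns 0.0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)      # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i        # P(X = i) from P(X = i - 1)
        total += term
    return total


def fraction_below(min_depth, mean_depth):
    """Expected fraction of positions covered by fewer than min_depth reads."""
    return poisson_cdf(min_depth - 1, mean_depth)


for threshold in (10, 20, 30):
    frac = fraction_below(threshold, 30)
    print(f"mean 30X: {frac:.2%} of positions below {threshold}X")
```

Under this model, close to half of all positions fall below the 30X average, which is why a depth target should be phrased as "at least N reads at X% of positions" rather than as a mean.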
Coverage also varies by tissue*
*Plot available on Figshare
FASTQ preprocessing
• Demultiplex samples
  • Discard reads with no index
• Convert to PHRED quality scale
  • -10 × log10(probability of base error)
• Remove adapter contamination
  • cutadapt
• Trim low-quality trailing bases and 3’ Ns
  • No 5’ trimming!!!
• Run FastQC!
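The PHRED conversion and the 3’-only trimming rule can be sketched in a few lines. This is a naive illustration, not cutadapt's algorithm; the quality cutoff of 20 and the offset-33 encoding are assumptions chosen for the example.

```python
import math


def phred_from_error_prob(p_error):
    """PHRED quality: Q = -10 * log10(P(base call is wrong))."""
    return -10 * math.log10(p_error)


def error_prob_from_phred(q):
    """Inverse of the PHRED formula."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string, offset=33):
    """Decode a FASTQ quality string (offset 33 assumed, i.e. Sanger-style)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq, quals, min_q=20):
    """Drop trailing bases that are N or below min_q; the 5' end is never touched."""
    end = len(seq)
    while end > 0 and (seq[end - 1] in "Nn" or quals[end - 1] < min_q):
        end -= 1
    return seq[:end], quals[:end]


# Made-up read: eight good bases followed by two low-quality Ns.
seq = "ACGTACGTNN"
quals = decode_qualities("IIIIIIII##")
trimmed_seq, trimmed_quals = trim_3prime(seq, quals)
print(trimmed_seq, trimmed_quals)
```

A PHRED score of 30 corresponds to a 1-in-1000 chance that the base call is wrong; 20 corresponds to 1 in 100.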
Alignment to reference
• Current human reference is GRCh37
• Repeat masking
  • Hard mask: repeats are N
  • Soft mask: repeats are lowercase
  • Prefer soft masking
• Reference aligners generally have low memory use
  • Mostly use the Burrows-Wheeler transform
    • bowtie 1 & 2
    • bwa-aln & bwa-mem
  • Novoalign (high memory)
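A tiny sketch of why soft masking is the safer default: a soft-masked sequence can always be converted to a hard-masked or unmasked one, but hard masking throws the repeat bases away. The example sequence is made up.

```python
def hard_mask(soft_masked_seq):
    """Convert a soft-masked sequence (repeats lowercase) to hard-masked (repeats as N)."""
    return "".join("N" if base.islower() else base for base in soft_masked_seq)


def masked_fraction(soft_masked_seq):
    """Fraction of bases flagged as repeat in a soft-masked sequence."""
    if not soft_masked_seq:
        return 0.0
    return sum(base.islower() for base in soft_masked_seq) / len(soft_masked_seq)


example = "ACGTacgtacgtTTGA"       # lowercase run = annotated repeat
print(hard_mask(example))          # repeat bases replaced by N (lossy)
print(example.upper())             # unmasked sequence is still recoverable
print(masked_fraction(example))
```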
De novo assembly
• Most assemblers build De Bruijn graphs from k-mers of the sequence
• Mostly very high memory usage
  • Depends on depth and number of k-mers
  • Try running diginorm first (C. Titus Brown)
• Assemblers
  • ABySS
  • MIRA
  • SOAPdenovo
  • Velvet
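To show what "De Bruijn graph of k-mers" means and why memory tracks the number of distinct k-mers, here is a toy sketch. It is not how any of the listed assemblers is implemented; the reads and k are made up, and real tools also handle reverse complements, errors, and coverage.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


reads = ["ACGTAC", "GTACGG"]       # two overlapping toy reads
graph = de_bruijn_graph(reads, k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))

distinct = {km for read in reads for km in kmers(read, 4)}
print("distinct k-mers:", len(distinct))
```

Every distinct k-mer has to be held in the graph, and each sequencing error can introduce up to k new ones, so memory grows with both depth and error rate. That is the motivation for reducing redundant reads before assembly.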
Post-processing
• Picard tools
  • Convert to BAM format
  • Add read-group tags
• Mark duplicates (Picard tools)
• Genome Analysis Toolkit (GATK)
  • Local realignment of indels
  • Base quality score recalibration
Genotyping
• General
  • Genome Analysis Toolkit
    • UnifiedGenotyper
  • Samtools
    • mpileup
• Somatic specific
  • MuTect
  • SomaticSniper
  • Virmid
Variant filtering strategies: sequential evolution
[Diagram labels]
• Initiating event
• Mosaic primary
• Mutation with metastatic advantage leaves the eye
• Metastasis has primary mutations & metastatic mutation (maybe in primary?) & new mutations
Variant filtering strategies: parallel evolution
[Diagram labels]
• Initiating event
• Mosaic primary
• Very little overlap in mutations between primary and met!
• Metastasis has few primary mutations & metastatic mutation (not in primary) & new mutations
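One way to frame the two models as a filtering question is to compare variant sets across germline, primary, and metastasis. The sketch below is an illustration under simple assumptions: variants are exact-match (chrom, pos, ref, alt) tuples, calls are taken at face value, and the example coordinates are invented.

```python
def classify_variants(germline, primary, metastasis):
    """Split somatic calls by where they appear; inputs are sets of variant keys."""
    primary_somatic = primary - germline
    met_somatic = metastasis - germline
    return {
        "shared": primary_somatic & met_somatic,
        "primary_only": primary_somatic - met_somatic,
        "met_only": met_somatic - primary_somatic,
    }


def primary_retention(groups):
    """Fraction of somatic primary variants also seen in the metastasis."""
    n_primary = len(groups["shared"]) + len(groups["primary_only"])
    return len(groups["shared"]) / n_primary if n_primary else 0.0


# Hypothetical calls, for illustration only.
germline = {("chr1", 100, "A", "G")}
primary = {("chr1", 100, "A", "G"), ("chr3", 500, "C", "T"), ("chr9", 42, "G", "A")}
metastasis = {("chr1", 100, "A", "G"), ("chr3", 500, "C", "T"), ("chr12", 7, "T", "C")}

groups = classify_variants(germline, primary, metastasis)
print({name: sorted(calls) for name, calls in groups.items()})
print("primary retention:", primary_retention(groups))
```

High retention is what sequential evolution predicts; low retention points toward parallel evolution. Because the primary is mosaic, a "met-only" variant may simply be a subclonal primary variant missed at the sequenced depth.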
Variant confirmation
• Sanger sequencing
• Fluidigm
• TaqMan
• castPCR
• Sequenom
• Illumina GoldenGate
Discover in a focused set with sequencing.
Type with these technologies in everything.