Irys data analysis January 10th, 2014
Irys Workflow – Data Analysis
[Workflow diagram; labels recovered from the figure]
• Stages: Sample, Scanning, Image processing, Assembly, Analysis, Integration
• Software: irys™ ICS, irysView™
• Files: Single molecule maps (.bnx), Genome Map (.cmap), Anchoring (.xmap)
• Analysis routes: Using a reference (e.g. hg19); Using a second genome map; Using NGS contigs
• Inputs shown: Short NGS Contigs, RefSeq Reference, Consed, Genome Map 2
• Structural variation detection: missing sites, extra sites, interval differences, structural differences
• Sequence Assembly Validation; Gross assembly quality (reiterate)
• Sequence contig scaffolding: Alignment in irysView, manual editing, AGP output, Conversion to FASTA, Reimport superscaffolds to reiterate
• Sequence scaffolding without de novo assembly
• Mapping based variant calling
• Two color applications: epigenetics, DNA damage
Workshops
• De novo assembly (using irysView: Alex; using Python/command line – Heng/Ernest)
• SV detection – Warren/Andy
Core workflow: Data QC: molecule quality report
• Always consider the mapping rate with respect to the stringency setting
• Mapping rate helps us estimate the useful coverage depth as well as data quality
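The estimate of useful coverage depth from the mapping rate can be sketched as below. This is a minimal illustration, not the molecule quality report itself: the function name `useful_coverage` is hypothetical, and the example inputs are the figures quoted on the de novo assembly QC slide (1.8 Gb of molecules, 40% mapping rate, 20 Mb expected genome size).

```python
def useful_coverage(total_bp, mapping_rate, genome_size_bp):
    """Return (useful bases, useful coverage depth).

    total_bp       -- total length of the filtered molecules, in bp
    mapping_rate   -- fraction of molecules that map at the chosen stringency (0..1)
    genome_size_bp -- expected genome size, in bp
    """
    if not 0.0 <= mapping_rate <= 1.0:
        raise ValueError("mapping_rate must be a fraction between 0 and 1")
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    useful_bp = total_bp * mapping_rate  # data expected to contribute
    return useful_bp, useful_bp / genome_size_bp


useful_bp, depth = useful_coverage(1_800_000_000, 0.40, 20_000_000)
print(f"useful data: {useful_bp / 1e9:.2f} Gb, useful depth: {depth:.0f}x")
```

With these inputs the useful data comes to about 0.72 Gb, in the same range as the ~0.8 Gb quoted on that slide. Because the mapping rate depends on the stringency setting, the same data set will give a different useful depth at a different stringency.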
Stretch normalization
• Evaporation (increasing [salt]) during prolonged scanning of version 2 chips shortens the molecules in the nanochannels. This can be corrected by measuring the average stretch in each scan and applying a normalization factor.
• Determining average stretch:
  • Internal ruler based normalization
  • Reference mapping based normalization
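The per-scan correction described above can be sketched as follows. This is an illustration under stated assumptions, not the irysView implementation: the average stretch of each scan is taken as an input (however it was measured, by internal ruler or by reference mapping), stretch is assumed to be expressed as observed length divided by expected length, and the function names are hypothetical.

```python
from statistics import mean


def normalization_factors(stretch_by_scan, target=None):
    """Map scan id -> factor that rescales that scan to the target stretch.

    stretch_by_scan -- dict of scan id -> average stretch measured in that scan
    target          -- stretch to normalize to; defaults to the mean over scans
    """
    if target is None:
        target = mean(stretch_by_scan.values())
    return {scan: target / stretch for scan, stretch in stretch_by_scan.items()}


def normalize_lengths(molecules, factors):
    """Rescale (scan id, measured length) pairs by the factor of their scan."""
    return [(scan, length * factors[scan]) for scan, length in molecules]


# Scan 2 ran later: molecules appear 20% shorter than in scan 1.
factors = normalization_factors({1: 1.0, 2: 0.8}, target=1.0)
print(normalize_lengths([(1, 150_000.0), (2, 120_000.0)], factors))
```

In this example the molecule from scan 2 is scaled up by 1.25, so both molecules end up at roughly the same corrected length.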
Core workflow: De novo assembly: optArg
• From molecule quality report and .err file
• p value based on genome size or as stringent as possible
• Stringencies vary based on step
No reference?
• With no reference, we can run a de novo assembly based on expectations and data QC observations:
  • Expected genome size
  • Site density (in silico)
  • Label density (empirical)
  • Molecule N50 (empirical)
• Run de novo assembly (relaxed)
• Use the result of the de novo assembly to run molecule quality report
• Update error characteristics (stretch normalization) and rerun de novo assembly
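The two empirical QC values in the list above, molecule N50 and label density, can be computed from molecule lengths and label counts. The sketch below assumes those numbers have already been extracted from the molecule data; it does not parse .bnx files, and the function names are hypothetical. Label density is expressed per 100 kb here as an assumed convention.

```python
def n50(lengths):
    """Length L such that molecules of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")


def label_density_per_100kb(total_labels, total_length_bp):
    """Average number of labels per 100 kb of molecule length."""
    if total_length_bp <= 0:
        raise ValueError("total_length_bp must be positive")
    return total_labels * 100_000 / total_length_bp


lengths_bp = [100_000, 200_000, 300_000, 400_000]
print(n50(lengths_bp))
print(label_density_per_100kb(total_labels=100, total_length_bp=sum(lengths_bp)))
```

Comparing the empirical label density with the in silico site density gives a quick check on labeling before the relaxed assembly is run.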
De novo assembly QC
• We started with 1.8 Gb of molecules (>100 kb) that mapped at 40%. We had a good quality reference, so we expect to use ~0.8 Gb.
• Genome has 14 chromosomes
• Expected size is 20 Mb
• Map N50 is good; we may be able to improve it further with additional depth or optimized sample prep
De novo assembly QC: higher stringency assembly
• The higher stringency assembly misses some of the genome but resolves the chimera
Applications: Sequence anchoring
DNA sequence scaffolding
[Figure; panel labels: Short-Read NGS Only, NGS + Cosmids, 3rd-Gen Reads, BioNano Genomics]
Sequence anchoring
[Figure; scale bar: 1 Mb]
• Illumina: 9.08 Mb, 124 contigs, N50 length: 92 kb, 8.9 Mb anchored
• Illumina + cosmids: 11.38 Mb, 97 contigs, N50 length: 154 kb, 11.38 Mb anchored
• PacBio: 11.63 Mb, 20 contigs, N50 length: 918 kb, 11.63 Mb anchored
• Validate sequence assembly
• Find errors
• Scaffold/Orient/Size gaps
• Output FASTA or AGP (soon)
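To illustrate the scaffold, orient and size-gaps step with AGP output, the sketch below writes AGP 2.0 style lines for contigs that have already been ordered, oriented and spaced against a genome map. It is not the irysView exporter: the function name and the input tuple layout are hypothetical, and overlapping contigs (gaps of zero or less) are simply written back to back.

```python
def scaffold_to_agp(scaffold_name, placements):
    """Build AGP 2.0 style lines for one scaffold.

    placements -- list of (contig_id, contig_length, orientation, gap_after_bp)
                  in map order; orientation is "+" or "-", and gap_after_bp is
                  the map-derived gap to the next contig (ignored for the last).
    """
    lines = []
    pos = 1   # next free 1-based coordinate on the scaffold
    part = 1  # AGP part number
    for i, (contig_id, length, orient, gap_after) in enumerate(placements):
        end = pos + length - 1
        # Component line: type "W" is used here for a sequence contig
        fields = [scaffold_name, pos, end, part, "W", contig_id, 1, length, orient]
        lines.append("\t".join(map(str, fields)))
        pos, part = end + 1, part + 1
        is_last = i == len(placements) - 1
        if not is_last and gap_after > 0:
            end = pos + gap_after - 1
            # Gap line: known size, within-scaffold gap, linkage evidence "map"
            fields = [scaffold_name, pos, end, part, "N", gap_after,
                      "scaffold", "yes", "map"]
            lines.append("\t".join(map(str, fields)))
            pos, part = end + 1, part + 1
    return lines


for line in scaffold_to_agp("scaf1", [("ctgA", 1000, "+", 500), ("ctgB", 2000, "-", 0)]):
    print(line)
```

The gap length comes directly from the genome map, which is what distinguishes map-based scaffolding from scaffolds with placeholder gaps of unknown size.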
Applications: Structural variation
Structural Variation: Insertion/Deletion Calls (vs hg19)
• 95 regions in BioNano Genome Maps correspond to N-based gaps in hg19 (not included in graph).
• The gaps may contain repeats and polymorphic regions, where SVs are enriched.
Structural Variant Examples: Insertions and Deletions
[Figures; tracks in each: hg19, Genome Map, Molecules]
• Insertion: +4.9 kb (17.5 kb region vs 12.6 kb region)
• Deletion: -176.265 kb (4.9 kb region vs 181.2 kb region)
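The sizes in these examples follow from the difference between the same label-to-label interval measured on the genome map and on hg19. A minimal sketch, assuming the longer region in the insertion example and the shorter region in the deletion example are the genome map intervals (the sign of each call implies this, but the slide text does not say so explicitly); the function name and the `min_size_bp` threshold are hypothetical:

```python
def call_indel(map_interval_bp, ref_interval_bp, min_size_bp=1000):
    """Classify an interval difference between a genome map and the reference.

    Returns (call, size difference in bp); the difference is map minus
    reference, so insertions are positive and deletions negative.
    """
    diff = map_interval_bp - ref_interval_bp
    if abs(diff) < min_size_bp:
        return ("none", diff)  # below the assumed sizing threshold
    return ("insertion" if diff > 0 else "deletion", diff)


print(call_indel(17_500, 12_600))   # insertion example
print(call_indel(4_900, 181_200))   # deletion example
```

Using the rounded region sizes from the slide, the differences come to 4.9 kb for the insertion and about 176.3 kb for the deletion.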
Workshops
• De novo assembly (using irysView: Alex; using Python/command line – Heng/Ernest)
  • OptArg: iterations, stringencies, merging, ref mapping
  • Output
  • .err file
  • Alignref
  • Visualization of genome maps to molecules
  • Identification of chimeras
• SV detection – Warren/Andy
  • Explain the SV detection application (consider IP issues)
  • Discuss stringency parameters
  • Show resulting table
    • ranges
    • explain types