Irys data analysis

January 10th, 2014

Irys Workflow – Data Analysis
• Sample, scanning, image processing (irys™ICS), single molecule maps (.bnx)
• Assembly: Genome Map (.cmap)
• Anchoring (.xmap):
  • Using a reference (e.g. hg19, RefSeq reference)
  • Using a second genome map (Genome Map 2)
  • Using NGS contigs, including short NGS contigs
• Analysis and integration in irysView™:
  • Structural variation detection; mapping-based variant calling
  • Gross assembly quality (reiterate): missing sites, extra sites, interval differences, structural differences
  • Sequence assembly validation (Consed)
  • Sequence contig scaffolding; sequence scaffolding without de novo assembly
  • Alignment in irysView, manual editing, AGP output, conversion to FASTA; reimport superscaffolds to reiterate
• Two-color applications: epigenetics, DNA damage

Workshops
• De novo assembly (using irysView – Alex; using Python/command line – Heng/Ernest)
• SV detection – Warren/Andy

Core workflow: Data QC: basic molecule stats
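As a rough illustration of the basic molecule stats, the sketch below summarizes molecules that have already been parsed out of a .bnx file (the parsing step is not shown). The function name `molecule_stats`, the `(length_bp, n_labels)` input shape and the 100 kb cutoff default are illustrative assumptions, not IrysView or BioNano tool behavior.

```python
from statistics import mean


def molecule_stats(molecules, min_length_bp=100_000):
    """Hypothetical helper: summarize molecules given as (length_bp, n_labels) pairs.

    Only molecules at or above min_length_bp are counted, mirroring the
    common practice of reporting stats for molecules >100 kb.
    """
    kept = [(length, labels) for length, labels in molecules if length >= min_length_bp]
    total_bp = sum(length for length, _ in kept)
    total_labels = sum(labels for _, labels in kept)
    return {
        "n_molecules": len(kept),
        "total_bp": total_bp,
        "mean_length_bp": mean(length for length, _ in kept) if kept else 0,
        # Empirical label density, expressed as labels per 100 kb of molecule length.
        "labels_per_100kb": 100_000 * total_labels / total_bp if total_bp else 0.0,
    }


print(molecule_stats([(50_000, 3), (150_000, 12), (250_000, 18)]))
```

The 50 kb molecule is filtered out, so the summary covers two molecules totalling 400 kb.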

Core workflow: Data QC: molecule quality report
Always interpret the mapping rate relative to the stringency setting. The mapping rate helps us estimate both the useful coverage depth and the data quality.
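The useful coverage estimate is simple arithmetic: only the mapped fraction of the input contributes, so useful depth is roughly total molecule length × mapping rate ÷ genome size. The helper name `effective_coverage` is hypothetical; the example numbers are the 1.8 Gb, 40% and 20 Mb figures quoted on the de novo assembly QC slide of this deck.

```python
def effective_coverage(total_bp, mapping_rate, genome_size_bp):
    """Hypothetical helper: estimate useful coverage depth from the mapping rate."""
    if not 0.0 <= mapping_rate <= 1.0:
        raise ValueError("mapping_rate must be a fraction between 0 and 1")
    # Only molecules that map are assumed to contribute to usable depth.
    useful_bp = total_bp * mapping_rate
    return useful_bp / genome_size_bp


print(effective_coverage(1.8e9, 0.40, 20e6))
```

With those inputs, about 0.72 Gb of mapped data over a 20 Mb genome gives roughly 36x useful coverage. Because the mapping rate depends on the stringency setting, the same data can yield different estimates at different stringencies.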

Stretch normalization
• During prolonged scanning of version 2 chips, evaporation increases the salt concentration, which shortens the molecules in the nanochannels. This can be corrected by measuring the average stretch in each scan and applying a normalization factor.
• Determining the average stretch:
  • Internal ruler-based normalization
  • Reference mapping-based normalization
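A minimal sketch of reference mapping-based normalization, assuming we already have, for each aligned molecule, its measured span and the span of the reference interval it aligns to. The per-scan factor is the ratio of summed reference length to summed measured length, and label positions are rescaled by that factor. The function names and input tuples are hypothetical; this is not the IrysView implementation.

```python
from collections import defaultdict


def stretch_factors(alignments):
    """Hypothetical helper: per-scan normalization factors.

    alignments: iterable of (scan_id, molecule_span_bp, reference_span_bp).
    A factor > 1 means molecules in that scan appear shorter than the reference.
    """
    measured = defaultdict(float)
    expected = defaultdict(float)
    for scan_id, mol_span, ref_span in alignments:
        measured[scan_id] += mol_span
        expected[scan_id] += ref_span
    return {scan: expected[scan] / measured[scan] for scan in measured}


def normalize_positions(label_positions_bp, factor):
    """Rescale label positions of one molecule by its scan's factor."""
    return [pos * factor for pos in label_positions_bp]


factors = stretch_factors([(1, 95_000, 100_000), (1, 190_000, 200_000), (2, 100_000, 100_000)])
print(factors)
print(normalize_positions([0, 95_000], factors[1]))
```

Scan 1 molecules are about 5% short, so their positions are scaled up; scan 2 needs no correction.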

Core workflow: De novo assembly: optArg
• Derive parameters from the molecule quality report and the .err file
• Set the p-value threshold based on genome size, or as stringent as possible
• Stringencies vary by assembly step
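One way to make a p-value threshold scale with genome size is a Bonferroni-style correction: divide a base significance level by the number of independent placements being tested. The sketch below approximates that count as genome length divided by a window size. The `base_alpha` and `window_bp` defaults are hypothetical values chosen for illustration; this is not BioNano's published formula.

```python
def pvalue_threshold(genome_size_bp, base_alpha=0.01, window_bp=10_000):
    """Hypothetical Bonferroni-style cutoff that tightens as the genome grows.

    n_tests approximates the number of candidate alignment placements.
    """
    n_tests = max(1, genome_size_bp // window_bp)
    return base_alpha / n_tests


for size in (20_000_000, 3_100_000_000):
    print(size, pvalue_threshold(size))
```

Under these assumptions, a 3.1 Gb genome gets a threshold roughly 150 times stricter than a 20 Mb genome.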

No reference?
• Without a reference, we can run a de novo assembly based on expectations and data QC observations:
  • Expected genome size
  • Site density (in silico)
  • Label density (empirical)
  • Molecule N50 (empirical)
• Run a de novo assembly with relaxed settings
• Use the resulting assembly as the reference for the molecule quality report
• Update the error characteristics (stretch normalization) and rerun the de novo assembly
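The in silico site density can be estimated by counting nicking-enzyme motifs on both strands of any available sequence. The sketch below assumes the Nt.BspQI motif GCTCTTC; the enzyme is an assumption, since this deck does not name one. Comparing the result with the empirical label density from molecule stats gives a rough sense of how completely sites are labeled.

```python
import re

NICKING_MOTIF = "GCTCTTC"  # assumption: Nt.BspQI recognition sequence


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def sites_per_100kb(sequence, motif=NICKING_MOTIF):
    """Count motif sites on both strands, reported per 100 kb of sequence."""
    seq = sequence.upper()
    motifs = {motif, revcomp(motif)}  # a set avoids double counting palindromes
    # Zero-width lookahead so that overlapping matches are all counted.
    n_sites = sum(len(re.findall(f"(?={m})", seq)) for m in motifs)
    return 100_000 * n_sites / len(seq) if seq else 0.0


demo = "A" * 1000 + "GCTCTTC" + "A" * 1000 + "GAAGAGC" + "A" * 986
print(sites_per_100kb(demo))
```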

De novo assembly QC
• We started with 1.8 Gb of molecules (>100 kb), of which 40% mapped. Because we had a good-quality reference, we expect to use ~0.8 Gb.
• The genome has 14 chromosomes, with an expected size of 20 Mb.
• Map N50 is good; we may be able to improve it further with additional depth or optimized sample prep.
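The QC checks on this slide compare the assembled consensus maps against expectations: total assembled length versus expected genome size, number of maps versus chromosome count, and map N50. The sketch below assumes map lengths have already been read from the .cmap output; `assembly_qc` and the example lengths are hypothetical.

```python
def n50(lengths):
    """Length L such that maps of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def assembly_qc(map_lengths_bp, expected_genome_bp, n_chromosomes):
    """Hypothetical summary of a de novo assembly against expectations."""
    total = sum(map_lengths_bp)
    return {
        "n_maps": len(map_lengths_bp),
        "maps_per_chromosome": len(map_lengths_bp) / n_chromosomes,
        "fraction_of_expected": total / expected_genome_bp,
        "map_n50_bp": n50(map_lengths_bp),
    }


print(assembly_qc([4_000_000, 3_000_000, 2_000_000, 1_000_000], 20_000_000, 14))
```

A fraction well below 1 suggests missing genome; a fraction well above 1 can point to redundant or chimeric maps.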

De novo assembly QC

De novo assembly QC: higher-stringency assembly
The higher-stringency assembly misses some of the genome but resolves the chimera.
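One common heuristic for spotting a chimeric join in a consensus map is to look for positions spanned by very few single molecules. The sketch below assumes molecule alignments to one map are available as `(start_bp, end_bp)` pairs. The function name, step size and `min_spanning` cutoff are all hypothetical choices, and the simple loop is not optimized.

```python
def weak_junctions(alignments, map_length_bp, step_bp=10_000, min_spanning=3):
    """Flag positions on a consensus map that few molecules span.

    alignments: (start_bp, end_bp) of molecules aligned to one map.
    Returns (position, spanning_count) for positions below min_spanning.
    """
    flagged = []
    for pos in range(step_bp, map_length_bp, step_bp):
        # A molecule spans pos only if it extends strictly past it on both sides.
        spanning = sum(1 for start, end in alignments if start < pos < end)
        if spanning < min_spanning:
            flagged.append((pos, spanning))
    return flagged


left = [(0, 50_000)] * 4
right = [(50_000, 100_000)] * 4
bridge = [(40_000, 60_000)]
print(weak_junctions(left + right + bridge, 100_000))
```

Here only one molecule bridges the 50 kb position, so that junction is flagged as a possible chimera.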

Applications: Sequence anchoring

DNA sequence scaffolding (panels: short-read NGS only; NGS + cosmids; 3rd-gen reads; BioNano Genomics)

Sequence anchoring
• Illumina: 9.08 Mb in 124 contigs, N50 92 kb; 8.9 Mb anchored
• Illumina + cosmids: 11.38 Mb in 97 contigs, N50 154 kb; 11.38 Mb anchored
• PacBio: 11.63 Mb in 20 contigs, N50 918 kb; 11.63 Mb anchored
Genome maps can be used to:
• Validate sequence assemblies
• Find errors
• Scaffold and orient contigs and size the gaps
• Output FASTA or AGP (soon)
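To show how gap sizing feeds into AGP output, the sketch below turns contig placements on a genome map into AGP 2.0 lines. Gap lengths are taken as the distance between consecutive placements in map coordinates. Overlapping or abutting placements get a 100 bp gap of unknown size (`U`), following the AGP convention. The input tuple layout and function name are hypothetical; this is not the IrysView exporter.

```python
def agp_lines(scaffold, placements, unknown_gap_bp=100):
    """Hypothetical AGP 2.0 writer for one scaffold.

    placements: list of (contig_id, contig_len, map_start, map_end, orientation),
    sorted by map_start, with map coordinates taken from the genome map alignment.
    """
    lines, pos, part, prev_end = [], 1, 1, None
    for contig_id, contig_len, map_start, map_end, orient in placements:
        if prev_end is not None:
            gap = map_start - prev_end
            # Sized gap from the map; otherwise a conventional 100 bp unknown gap.
            gap_type, gap_len = ("N", gap) if gap > 0 else ("U", unknown_gap_bp)
            fields = [scaffold, pos, pos + gap_len - 1, part, gap_type,
                      gap_len, "scaffold", "yes", "map"]
            lines.append("\t".join(map(str, fields)))
            pos += gap_len
            part += 1
        fields = [scaffold, pos, pos + contig_len - 1, part, "W",
                  contig_id, 1, contig_len, orient]
        lines.append("\t".join(map(str, fields)))
        pos += contig_len
        part += 1
        prev_end = map_end
    return lines


for line in agp_lines("scaf1", [("ctg1", 50_000, 0, 50_000, "+"),
                                ("ctg2", 30_000, 62_000, 92_000, "-")]):
    print(line)
```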

Applications: Structural variation

Structural variation: insertion/deletion calls (vs hg19)
95 regions in BioNano GenomeMaps correspond to N-based gaps in hg19 (not included in the graph). These gaps may contain repeats and polymorphic regions, where SVs tend to be enriched.
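Setting aside calls that fall in hg19 N-gaps, as done for the 95 regions above, amounts to an interval-overlap test. The sketch below assumes calls and gaps are given as half-open `(chrom, start, end)` tuples. The names are hypothetical, and a real pipeline would likely use an interval tree rather than this linear scan.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def split_by_gap_overlap(calls, n_gaps):
    """Separate SV calls that touch reference N-gaps from the rest."""
    in_gap, outside = [], []
    for call in calls:
        chrom, start, end = call
        hit = any(c == chrom and overlaps(start, end, gs, ge) for c, gs, ge in n_gaps)
        (in_gap if hit else outside).append(call)
    return in_gap, outside


calls = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
gaps = [("chr1", 150, 300)]
print(split_by_gap_overlap(calls, gaps))
```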

Structural variant examples: insertions and deletions (hg19 vs genome map vs molecules)
• Insertion (+4.9 kb): a 12.6 kb region in hg19 spans 17.5 kb in the genome map
• Deletion (-176,265 bp, ~176.3 kb): a 181.2 kb region in hg19 spans 4.9 kb in the genome map
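The insertion and deletion sizes above are the difference between the genome map interval and the matching hg19 interval between the same flanking labels. A minimal sketch of that classification follows. The `min_size_bp` cutoff is a hypothetical value, not a BioNano default.

```python
def classify_indel(map_interval_bp, ref_interval_bp, min_size_bp=1_000):
    """Classify an aligned interval pair by its size difference (map minus reference)."""
    delta = map_interval_bp - ref_interval_bp
    if delta >= min_size_bp:
        return "insertion", delta
    if delta <= -min_size_bp:
        return "deletion", delta
    return "no call", delta


print(classify_indel(17_500, 12_600))   # ('insertion', 4900)
print(classify_indel(4_900, 181_200))   # ('deletion', -176300)
```

The difference computed from the rounded interval lengths shown on the slide (-176.3 kb) agrees with the reported -176,265 bp.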

Workshops
• De novo assembly (using irysView – Alex; using Python/command line – Heng/Ernest)
  • OptArg: iterations, stringencies, merging, reference mapping
  • Output
  • .err file
  • Alignref
  • Visualization of genome maps against molecules
  • Identification of chimeras
• SV detection – Warren/Andy
  • Explain the SV detection application (consider IP issues)
  • Discuss stringency parameters
  • Show the resulting table
  • Ranges
  • Explain SV types
