
Irys data analysis



Presentation Transcript


  1. Irys data analysis January 10th, 2014

  2. Irys Workflow – Data Analysis (workflow diagram labels)
     • Analysis routes: using a reference (e.g. hg19); using a second genome map; using NGS contigs
     • Software: irys™ ICS; irysView™
     • File types: Single molecule maps (.bnx); Genome Map (.cmap); Anchoring (.xmap)
     • Other diagram labels: Short NGS Contigs; RefSeq Reference; Structural variation detection; Gross assembly quality (reiterate); Missing sites, extra sites, interval differences, structural differences; Consed; Sequence Assembly Validation; Sample; Image processing; Alignment in irysView, manual editing, AGP output, conversion to FASTA; Reimport superscaffolds to reiterate; Analysis; Integration; Sequence contig scaffolding; Scanning; Assembly; Genome Map 2; Sequence scaffolding without de novo assembly; Two color applications: epigenetics, DNA damage; Mapping based variant calling

  3. Workshops
     • De novo assembly (using IrysView – Alex; using Python/command line – Heng/Ernest)
     • SV detection – Warren/Andy

  4. Core workflow: Data QC: basic molecule stats

  5. Core workflow: Data QC: molecule quality report. Always consider the mapping rate with respect to the stringency setting. The mapping rate helps us estimate the useful coverage depth as well as data quality.

  6. Stretch normalization
     • Evaporation (increasing [salt]) during prolonged scanning of version 2 chips shortens the molecules in the nanochannels. This can be corrected by measuring the average stretch in each scan and applying a normalization factor.
     • Determining average stretch:
       • Internal ruler based normalization
       • Reference mapping based normalization
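The per-scan correction described on this slide can be sketched as below. This is a minimal illustration, not the Irys software's implementation: the function names, the data layout, and the target stretch value are all hypothetical. It assumes the average stretch of each scan has already been measured (by an internal ruler or by reference mapping) and rescales every molecule length in a scan by target / measured.

```python
def normalization_factors(scan_stretch, target_stretch):
    """Return a per-scan factor that rescales each scan to the target stretch.

    scan_stretch: dict mapping scan number -> measured average stretch
                  (hypothetical relative units; 1.00 = nominal stretch).
    """
    return {scan: target_stretch / stretch for scan, stretch in scan_stretch.items()}


def normalize_lengths(molecules, factors):
    """Apply each scan's factor to its molecule lengths.

    molecules: list of (scan number, measured length in kb) tuples.
    """
    return [(scan, length * factors[scan]) for scan, length in molecules]


# Made-up example: molecules appear shorter as the scans progress.
scan_stretch = {1: 1.00, 2: 0.96, 3: 0.92}
factors = normalization_factors(scan_stretch, target_stretch=1.00)

molecules = [(1, 150.0), (2, 144.0), (3, 138.0)]
normalized = normalize_lengths(molecules, factors)
```

In this made-up data the same 150 kb molecule is measured progressively shorter in later scans, and the per-scan factors bring all three measurements back to roughly the same length.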

  7. Core workflow: De novo assembly: optArg. From the molecule quality report and .err file; p value based on genome size or as stringent as possible; stringencies vary by step.

  8. No reference?
     • With no reference, we can run a de novo assembly based on expectations and data QC observations:
       • Expected genome size
       • Site density (in silico)
       • Label density (empirical)
       • Molecule n50 (empirical)
     • Run de novo assembly (relaxed)
     • Use the result of the de novo assembly to run the molecule quality report
     • Update error characteristics (stretch normalization) and rerun de novo assembly
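Two of the empirical QC observations listed on this slide, molecule n50 and label density, are simple to compute from a list of molecule lengths and a label count. The sketch below uses made-up numbers and hypothetical function names. It uses the common definition of n50 (the length at which molecules of that length or longer hold at least half of the total length) and assumes label density is reported per 100 kb.

```python
def n50(lengths):
    """Length L such that molecules of length >= L hold at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")


def label_density_per_100kb(total_labels, total_length_kb):
    """Average number of labels per 100 kb of molecule length."""
    return 100.0 * total_labels / total_length_kb


# Made-up molecule lengths in kb and a made-up label count.
lengths_kb = [120, 150, 200, 310, 450]
molecule_n50 = n50(lengths_kb)
density = label_density_per_100kb(total_labels=123, total_length_kb=sum(lengths_kb))
```

Comparing the empirical label density with the in silico site density is one way to sanity-check the data before a relaxed first assembly.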

  9. De novo assembly QC. We started with 1.8 Gb of molecules (>100 kb) that mapped at 40%. We had a good quality reference, so we expect to use ~0.8 Gb. The genome has 14 chromosomes. Expected size is 20 Mb. Map n50 is good; we may be able to further improve it with additional depth or optimized sample prep.
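The estimate of usable data on this slide is simple arithmetic: input data times mapping rate, divided by genome size to get depth. A minimal sketch follows; the function name is hypothetical, and it assumes the usable amount is exactly the product of the two figures. With the slide's inputs the product is about 0.72 Gb, in the neighborhood of the ~0.8 Gb the slide quotes.

```python
def useful_coverage(total_gb, mapping_rate, genome_size_mb):
    """Estimate usable data (Gb) and coverage depth (x) from the mapping rate."""
    useful_gb = total_gb * mapping_rate
    depth = useful_gb * 1000.0 / genome_size_mb  # 1 Gb = 1000 Mb
    return useful_gb, depth


# Figures from the slide: 1.8 Gb of molecules >100 kb, 40% mapping, 20 Mb genome.
useful_gb, depth = useful_coverage(total_gb=1.8, mapping_rate=0.40, genome_size_mb=20.0)
```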

  10. De novo assembly QC

  11. De novo assembly QC: higher stringency assembly. The higher stringency assembly misses some of the genome but resolves the chimera.

  12. Applications: Sequence anchoring

  13. DNA sequence scaffolding: Short-Read NGS Only; NGS + Cosmids; 3rd-Gen Reads; BioNano Genomics

  14. Sequence anchoring (scale bar: 1 Mb)
     • Illumina: 9.08 Mb, 124 contigs, n50 length: 92 kb, 8.9 Mb anchored
     • Illumina + cosmids: 11.38 Mb, 97 contigs, n50 length: 154 kb, 11.38 Mb anchored
     • PacBio: 11.63 Mb, 20 contigs, n50 length: 918 kb, 11.63 Mb anchored
     • Validate sequence assembly
     • Find errors
     • Scaffold/Orient/Size gaps
     • Output FASTA or AGP (soon)
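The anchoring figures on this slide can be compared as the fraction of each assembly's total length that was anchored. The sketch below only restates the slide's numbers; the variable and function names are hypothetical.

```python
# (total assembly size in Mb, anchored size in Mb), as quoted on the slide.
assemblies = {
    "Illumina": (9.08, 8.9),
    "Illumina + cosmids": (11.38, 11.38),
    "PacBio": (11.63, 11.63),
}


def anchored_fraction(total_mb, anchored_mb):
    """Fraction of the assembly length anchored to the genome map."""
    return anchored_mb / total_mb


fractions = {
    name: anchored_fraction(total, anchored)
    for name, (total, anchored) in assemblies.items()
}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} anchored")
```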

  15. Applications: Structural variation

  16. Structural Variation: Insertion/Deletion Calls (vs hg19). 95 regions in BioNano Genome Maps correspond to N-based gaps in hg19 (not included in graph). The gaps may contain repeats and polymorphic regions, where SVs are enriched.

  17. Structural Variant Examples: Insertions and Deletions (tracks in each panel: hg19, Genome Map, Molecules)
     • Insertion: +4.9 kb (17.5 kb region vs 12.6 kb region)
     • Deletion: -176.265 kb (4.9 kb region vs 181.2 kb region)
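The size differences in these two examples follow from subtracting the interval sizes. The sketch below assumes, since the calls are made against hg19, that the genome map holds the larger interval for the insertion and the smaller one for the deletion. The function names and the tolerance parameter are hypothetical and are not part of the SV detection application.

```python
def size_difference_kb(map_kb, ref_kb):
    """Genome map interval minus reference interval, in kb."""
    return map_kb - ref_kb


def classify(diff_kb, tolerance_kb=0.0):
    """Label a size difference relative to the reference.

    tolerance_kb is a hypothetical threshold below which no call is made.
    """
    if diff_kb > tolerance_kb:
        return "insertion"
    if diff_kb < -tolerance_kb:
        return "deletion"
    return "no call"


# Interval sizes quoted on the slide (genome map, hg19), in kb.
insertion_diff = size_difference_kb(map_kb=17.5, ref_kb=12.6)
deletion_diff = size_difference_kb(map_kb=4.9, ref_kb=181.2)

insertion_call = classify(insertion_diff)
deletion_call = classify(deletion_diff)
```

The deletion computed from the rounded interval sizes comes out at about -176.3 kb, which agrees with the slide's more precise figure to within rounding.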

  18. Workshops
     • De novo assembly (using IrysView – Alex; using Python/command line – Heng/Ernest)
       • OptArg: iterations, stringencies, merging, ref mapping
       • Output
         • .err file
         • Alignref
       • Visualization of genome maps to molecules
       • Identification of chimeras
     • SV detection – Warren/Andy
       • Explain the SV detection application (consider IP issues)
       • Discuss stringency parameters
       • Show resulting table
         • ranges
         • explain types
