
Illumin8er: Software for the Illumina GAII

Ian Carr, Joanne Morgan, Phil Chambers, Alex Markham, David Bonthron & Graham Taylor. Leeds Institute of Molecular Medicine, Leeds Teaching Hospitals & Cancer Research UK.


Presentation Transcript


  1. Illumin8er: Software for the Illumina GAII • Ian Carr, Joanne Morgan, Phil Chambers, Alex Markham, David Bonthron & Graham Taylor • Leeds Institute of Molecular Medicine, Leeds Teaching Hospitals & Cancer Research UK

  2. Sipping from the hosepipe • The cost of DNA sequencing is plummeting • Current sequence output from an Illumina GAII is over 1 Gigabase per day • Managing the data is the single biggest challenge to bringing the benefits to patients and the cost savings to the healthcare budget • The next biggest challenge is optimising the workflow to achieve cost efficiency

  3. What should the software do? • Scan for and report mutations against a defined reference sequence. • Be able to handle bar-code sequence tags • Be easy to use • Report on data quality • Export to a database

  4. Why Illumina? • Cost: 0002p per base • Capacity: 3.5 Gigabase per run • Simplicity: library>cluster station>sequence>data

  5. 500,000,000 bases per channel

  6. Software requirements • Runs in MS Windows • User definable reference sequence • Quality scores • Automatic mutation calling • SNPs • Indels • Speed

  7. Initial data manipulation • Illuminator can transform data in prb.txt or seq.txt files into fasta files • If tagged data are used, the reads for each tag are separated into an individual file • The prb.txt files can be filtered to remove low-quality data
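The three steps on this slide can be sketched as follows. This is a minimal illustration, not Illumin8er's own code. It assumes that each prb.txt line holds one tab-separated group of four space-separated scores (A, C, G, T) per cycle, and that the bar-code tag is the leading bases of the read; the function names, the `min_score` threshold and the `untagged` bin are hypothetical.

```python
from collections import defaultdict

BASES = "ACGT"  # assumed score order within each prb group


def parse_prb_line(line):
    """Return (sequence, best score per cycle) for one prb-style line."""
    seq, scores = [], []
    for group in line.strip().split("\t"):
        values = [int(v) for v in group.split()]
        best = max(range(4), key=lambda i: values[i])  # highest-scoring base
        seq.append(BASES[best])
        scores.append(values[best])
    return "".join(seq), scores


def passes_quality(scores, min_score=10):
    """Keep a read only if every called base reaches min_score (hypothetical cut-off)."""
    return all(s >= min_score for s in scores)


def split_by_tag(reads, tags):
    """Group reads by leading tag and strip the tag from each read."""
    bins = defaultdict(list)
    for read in reads:
        for tag in tags:
            if read.startswith(tag):
                bins[tag].append(read[len(tag):])
                break
        else:
            bins["untagged"].append(read)  # no known tag found
    return dict(bins)


def to_fasta(reads, prefix="read"):
    """Format reads as fasta text with sequential names."""
    return "".join(f">{prefix}_{i}\n{r}\n" for i, r in enumerate(reads, 1))
```

Each bin returned by `split_by_tag` can then be passed to `to_fasta` and written to its own file, one file per tag.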

  8. Reference files • Reference files are created from a plain text file of the genomic sequence and a cDNA sequence supplied as either a plain text file or a GenBank web page. • If a GenBank page is used, the SNP data in the page are imported along with the cDNA sequence. • The reference file contains the positions of the exons and the ORF relative to the genomic sequence to aid mutation annotation.

  9. Indexing the reference sequence • Each octamer in the reference sequence is mapped to an array of 65537 octamers (the extra one is for unmapped rubbish such as ‘nnnnnnnn’). • Some octamers have no positions in the reference while others have several. • Example reference: GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG • Array entries: nnnnnnnn, aaaaaaaa, aaaaaaac, aaaaaaag, aaaaaaat, aaaaaaca, aaaaaacc, ... (~65000 entries) ..., tttttttc, tttttttg, tttttttt
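The index described on this slide can be sketched as a list of 4^8 + 1 = 65537 slots, each holding the reference positions where that octamer occurs. The slot ordering (slot 0 for unmapped octamers, then aaaaaaaa through tttttttt) follows the order shown on the slide; the function names and the use of 0-based positions are assumptions made for illustration.

```python
K = 8  # octamer length
CODE = {"a": 0, "c": 1, "g": 2, "t": 3}
UNMAPPED = 0  # slot for octamers containing anything other than a/c/g/t


def octamer_slot(octamer):
    """Map an octamer to its slot: 0 if unmapped, else 1..65536 (base-4 value + 1)."""
    value = 0
    for base in octamer.lower():
        if base not in CODE:
            return UNMAPPED
        value = value * 4 + CODE[base]
    return value + 1


def build_index(reference):
    """Return a 65537-slot list; each slot lists the 0-based positions of that octamer."""
    index = [[] for _ in range(4 ** K + 1)]
    for pos in range(len(reference) - K + 1):
        index[octamer_slot(reference[pos:pos + K])].append(pos)
    return index
```

Looking up a read octamer is then a single list access, `index[octamer_slot(octamer)]`, which returns an empty list for octamers absent from the reference and several positions for repeated ones.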

  10. Mapping reads with 3’ mismatches • Reference: GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG • Read: TGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGGAAA • Positions where each octamer of the read is found in the reference sequence: 1830, 2500, 606, 2900, 5000, 614, 8900, 306, 622, 1400 • Positions are matched up where successive octamers increase by 8 bp: 606, 614, 622 (+8 bp, +8 bp); the remaining octamer has no position in phase (NA) • 3’ mismatches have a run of 3 footprints with the last octamer missing. These reads go into array 2 (Phase 2)

  11. Mapping reads with 5’ mismatches • Reference: GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG • Read: GTGAGGGGGGGGCAGGAGTGCTTGGGTTGTGGTGAA • Positions where each octamer of the read is found in the reference sequence: 630, 5700, 614, 8900, 306, 622, 1400 • Positions are matched up where successive octamers increase by 8 bp: 614, 622, 630 (+8 bp, +8 bp); the remaining octamer has no position in phase (NA) • 5’ mismatches have a run of 3 footprints with the first octamer missing. These reads go into array 3 (Phase 3)

  12. Mapping reads with internal mismatches • Reference: GCTGGTGAGGGGTGGGGCAGGAGTGCTTGGGTTGTGGTGAAACATTGG • Read: TGAGGGGTGGGGCAGAAGTGCTTGGGTTGTGGTGAA • Positions where each octamer of the read is found in the reference sequence: 630, 606, 2900, 5000, 1664, 5900, 306, 622, 1400 • Positions are matched up where successive octamers increase by 8 bp: 606, 622, 630, giving one step of +16 bp across the out-of-phase octamer and one step of +8 bp • Internal mismatches have a run of 3 footprints with either the second or third octamer out of phase. These reads go into array 4 (Phase 4)

  13. What each phase is used for • Phase 1 = perfect matches • Phase 2 = indels and small mutations at the end of a read • Phase 3 = indels and small mutations at the start of a read • Phase 4 = small mutations in the middle of a read
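The phase assignment described on the mapping slides can be sketched as below. This is an illustration under stated assumptions, not the program's actual implementation: the read is cut into four consecutive non-overlapping octamers, a plain dictionary stands in for the 65537-slot array, positions are 0-based, and the names `PHASES`, `index_reference` and `classify_read` are hypothetical.

```python
from collections import defaultdict

K = 8  # octamer length

# Which octamers are in phase (True) -> phase number, following the slides.
PHASES = {
    (True, True, True, True): 1,   # perfect match
    (True, True, True, False): 2,  # last octamer missing: 3' mismatch
    (False, True, True, True): 3,  # first octamer missing: 5' mismatch
    (True, False, True, True): 4,  # second octamer out of phase
    (True, True, False, True): 4,  # third octamer out of phase
}


def index_reference(reference):
    """Map each octamer to the list of 0-based positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - K + 1):
        index[reference[pos:pos + K]].append(pos)
    return index


def classify_read(read, index, n_octamers=4):
    """Return (phase, implied read start) or None if no pattern fits."""
    support = defaultdict(set)  # implied read start -> octamers agreeing with it
    for i in range(n_octamers):
        octamer = read[i * K:(i + 1) * K]
        for pos in index.get(octamer, []):
            support[pos - i * K].add(i)  # octamer i at pos implies this start
    best = None
    for start, hits in support.items():
        pattern = tuple(i in hits for i in range(n_octamers))
        phase = PHASES.get(pattern)
        if phase is not None and (best is None or phase < best[0]):
            best = (phase, start)
    return best
```

Octamer i found at position p implies that the read starts at p - 8i, so octamers whose positions increase in steps of 8 bp all vote for the same start. Preferring the lowest phase number when several starts fit is a design choice made for this sketch only.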

  14. Small changes • These are found by looking at Phase 4 data. • Homozygous mutations are seen in Phase 4 but not in Phase 1 (where they appear as a hole). • Heterozygous variants are seen in Phase 4 data, with the wild-type (wt) allele seen in Phase 1 data. • Figure labels: mutation in Phase 4 data (the wt allele is also present due to sequencing errors elsewhere in the read); wt in Phase 1 data
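The rule on this slide can be written as a small decision function. This is a sketch only: the depth inputs, the `min_reads` threshold and the returned labels are assumptions, and the slides do not say how the program sets its cut-offs.

```python
def call_zygosity(wt_depth_phase1, mut_depth_phase4, min_reads=5):
    """Classify a position from wild-type depth (Phase 1) and mutant depth (Phase 4)."""
    has_wt = wt_depth_phase1 >= min_reads    # wt allele supported by perfect matches
    has_mut = mut_depth_phase4 >= min_reads  # mutant allele supported by Phase 4 reads
    if has_mut and has_wt:
        return "heterozygous"
    if has_mut:
        return "homozygous"  # hole in Phase 1, variant present in Phase 4
    if has_wt:
        return "wild type"
    return "no call"         # too few reads in either phase
```

For example, a position with 40 wild-type reads in Phase 1 and 35 mutant reads in Phase 4 would be labelled heterozygous, while one with no Phase 1 coverage and 80 mutant reads in Phase 4 would be labelled homozygous.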

  15. InDels • Phase 2 data gets indels from end of the read while Phase 3 gets them from the start of the read. • In a perfect world Phase 2 and 3 data should mirror each other.

  16. Global view • The red and blue lines show the read depth of forward and reverse reads. • Data for a PCR product containing two exons; blue = exonic DNA, pink = protein-coding DNA. • The lower panel shows the reference and deduced sequences around a point on the upper panel, selected by clicking on the panel with the mouse.

  17. Data view • Patient sequence • Score for each nucleotide • Reference genomic, cDNA and protein sequence • Patient’s other allele sequence • Read depth • Heterozygous base • Forward and reverse sequences

  18. Indel interface • Forward and reverse sequences • Reference sequence • Patient sequences with indel at start and end of read • Consensus sequence of patient reads across indel • Alignment of patient and reference sequence to identify indel

  19. Data export • The program can both export and import the alignment data as a plain text file • Create an updatable library of sequence variants • Export sequence variants as a text file • Create a LOVD import file for the sequence variants

  20. Validation: BRCA1 & BRCA2 • Illuminator detected all the mutations previously identified by dye-terminator Sanger sequencing of the exons of BRCA1 and BRCA2 in 10 individuals. • Each nucleotide had a read depth of at least 75 reads (approximately 6.6 × 10^3 sequences per gene). • The alignment and mutation annotation took ~50 seconds per gene per person.

  21. Conclusions • Illumin8er is • Easy to use • Rapid • Runs on Windows desktop • Uses standard Illumina output files • Reports mutations in a sensitive and specific manner

  22. Next steps.. • Make freely available by download • http://dna.leeds.ac.uk/illumin8er/ • Design compatible LOVD • Large scale validation trial
