
Aligning Reads Ramesh Hariharan Strand Life Sciences IISc



Presentation Transcript


  1. Aligning Reads. Ramesh Hariharan, Strand Life Sciences, IISc

  2. What is Read Alignment?

  3. Subject’s Genome: AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC. Where do these match in the Reference? Reference Genome: AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC. Close to, but not quite the same as, the Subject’s Genome.

  4. What does “Match” mean?

  5. Exact Match: GCTACGCA. With Mismatches: CATAAAGAC. With Gaps: CACTT_AGT. Reference Genome: AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC

  6. Why mismatches and gaps?

  7. The subject genome could be different from the reference

  8. Mismatches and Gaps [Figure: Reads aligned against the Reference Genome, showing a SNP and a Deletion]

  9. The reading process could be erroneous

  10. How many mismatches and gaps?

  11. Short reads (~50): few mismatches and gaps. Long reads (~1000): many more mismatches and gaps.

  12. How do aligners fare?

  13. Separate handling for RNASeq. No handling of adaptor trimming for small RNA. BWA: very few mismatches and gaps. BowTie: only mismatches, no gaps. CoBWeb. BWA-SW: many mismatches and gaps. BowTie2. No paired read handling.

  14. How does an Aligner work?

  15. For simplicity, assume Exact Match

  16. For each read, scan the entire reference genome sequence: SLOW!
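
A minimal sketch of this brute-force scan, to make the cost concrete. The function name `naive_find` is hypothetical; the reference string and the read are the ones shown on the earlier slides.

```python
def naive_find(reference: str, read: str) -> list[int]:
    """Return every 0-based position where `read` occurs exactly in `reference`."""
    hits = []
    # Try every possible start position in the reference.
    # Cost per read is O(len(reference) * len(read)), which is why this is slow.
    for start in range(len(reference) - len(read) + 1):
        if reference[start:start + len(read)] == read:
            hits.append(start)
    return hits


reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
print(naive_find(reference, "GCTACGCA"))
```

Repeating this for millions of reads against a genome-sized reference is what motivates indexing the reference once and reusing the index for every read.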

  17. The Reference: T C C G A C G. Index the Reference. [Figure: index tree with edges labeled by bases]

  18. How can we find Exact Matches of a read quickly with this index?

  19. The Reference: T C C G A C G [Figure: index tree over the Reference, with edges labeled by bases]
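
One way to picture such an index is a suffix trie: every suffix of the reference is inserted into a tree, so a read is found by walking down from the root one base at a time. The sketch below assumes that is the kind of tree pictured; `build_suffix_trie` and `find_exact` are hypothetical names, and the uncompressed trie is for illustration only.

```python
def build_suffix_trie(reference: str) -> dict:
    """Insert every suffix of `reference`; each node stores the start
    positions of all suffixes that pass through it."""
    root = {"children": {}, "starts": []}
    for start in range(len(reference)):
        node = root
        for base in reference[start:]:
            node = node["children"].setdefault(base, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root


def find_exact(trie: dict, read: str) -> list[int]:
    """Walk the trie along `read`; the time depends on len(read), not on the reference."""
    node = trie
    for base in read:
        node = node["children"].get(base)
        if node is None:
            return []  # the read does not occur in the reference
    return node["starts"]


trie = build_suffix_trie("TCCGACG")
print(find_exact(trie, "CG"))
```

The lookup no longer scans the reference, but the tree itself is large: this naive trie has a number of nodes quadratic in the reference length, and even compressed tree indexes need many bytes per base.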

  20. The problem: 24GB

  21. Can this structure be compressed?

  22. The Burrows-Wheeler based Index. The Reference: C G A C $. All its circular shifts, sorted lexicographically (here with $ sorting after the letters): AC$CG, CGAC$, C$CGA, GAC$C, $CGAC. The last column, G $ A C C, is the BWT. The Index: now an array instead of a tree. Sampled to reduce memory at the expense of speed (Ferragina and Manzini)
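
The construction and the exact-match search over it can be sketched as follows. This is an illustrative, unoptimized version: the function names are hypothetical, `$` is made to sort after the letters so that the output agrees with the sorted shifts on the slide, and occurrence counts are recomputed with `str.count` instead of being read from sampled tables as a real index would do.

```python
def bwt_index(reference: str) -> tuple[str, list[int]]:
    """Return the BWT of reference + '$' and the start position of each sorted shift."""
    text = reference + "$"
    n = len(text)
    # Sort all circular shifts; replacing '$' by '~' makes it sort after A, C, G, T.
    order = sorted(range(n), key=lambda i: (text[i:] + text[:i]).replace("$", "~"))
    # The last character of the shift starting at i is text[i - 1].
    bwt = "".join(text[i - 1] for i in order)
    return bwt, order


def backward_search(bwt: str, order: list[int], read: str) -> list[int]:
    """Find all exact occurrences of `read`, processing it from its last base to its first."""
    # first[c] = number of characters in the text that sort before c.
    first, total = {}, 0
    for c in sorted(set(bwt), key=lambda ch: ch.replace("$", "~")):
        first[c] = total
        total += bwt.count(c)
    lo, hi = 0, len(bwt)  # current range of sorted shifts that start with the suffix seen so far
    for c in reversed(read):
        if c not in first:
            return []
        lo = first[c] + bwt[:lo].count(c)
        hi = first[c] + bwt[:hi].count(c)
        if lo >= hi:
            return []
    return sorted(order[lo:hi])


bwt, order = bwt_index("CGAC")
print(bwt)
print(backward_search(bwt, order, "C"))
```

Only the BWT string, the `first` table, and (sampled) occurrence counts and positions need to be stored, which is where the memory saving over a tree comes from.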

  23. How about Mismatches and Gaps?

  24. BWA, BWA-SW and BowTie force mismatches and gaps into the BW Index searching procedure

  25. CoBWeb uses the BW Index to find a ‘seed’ exact match and does Smith-Waterman around this seed. [Figure: a read split into seeds. This 15-mer occurs at locations x1, x2… This 15-mer occurs at locations x3, x4… This whole 30-mer occurs at location x5]
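
The seeding step can be sketched as below. This is not CoBWeb's implementation: a plain dictionary of k-mers stands in for the BW Index, the seed length is shortened from 15 to 4 so the slide's small reference can be used, and all function names are hypothetical. Each seed hit votes for the reference position where the whole read would start; the best-supported positions are the ones worth passing to Smith-Waterman.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_starts(index: dict, read: str, k: int) -> list[tuple[int, int]]:
    """Split the read into non-overlapping k-mer seeds and collect, for each
    implied read start in the reference, how many seeds support it."""
    votes = defaultdict(int)
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            votes[pos - offset] += 1
    # Most-supported candidates first; ties broken by position.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
index = build_kmer_index(reference, 4)
print(candidate_starts(index, "CATAAAGAC", 4))
```

In this example the read carries one mismatch, so only one of its two seeds matches exactly, yet that single seed is enough to nominate the correct location for the dynamic-programming step.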

  26. Dynamic Programming • Given a location in the reference with a read anchor, how well does the read match here? • Smith-Waterman (optimized for large gaps) [Figure: Reference and Read joined at a 14-mer Anchor]
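
For reference, a textbook Smith-Waterman local alignment score can be sketched as follows. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions, the gap penalty is a simple linear one, and the optimization for large gaps mentioned on the slide is not reproduced here. In a seed-and-extend aligner the `reference` argument would be only the window around the anchor, not the whole genome.

```python
def smith_waterman(reference: str, read: str,
                   match: int = 2, mismatch: int = -1, gap: int = -2) -> tuple[int, tuple[int, int]]:
    """Return the best local alignment score and the (read, reference) end indices of that alignment."""
    rows, cols = len(read) + 1, len(reference) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_end = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if read[i - 1] == reference[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may start anywhere.
            score[i][j] = max(0,
                              diagonal,                 # align read[i-1] with reference[j-1]
                              score[i - 1][j] + gap,    # read base against a gap
                              score[i][j - 1] + gap)    # reference base against a gap
            if score[i][j] > best:
                best, best_end = score[i][j], (i, j)
    return best, best_end


# The "With Mismatches" and "With Gaps" reads from the earlier slide,
# scored against the matching stretch of the reference.
print(smith_waterman("CATAATGAC", "CATAAAGAC")[0])
print(smith_waterman("CACTTAAGT", "CACTTAGT")[0])
```

The table costs time proportional to the read length times the window length, which is affordable only because the seed has already narrowed the search to a few candidate locations.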

  27. Comparison with BWA. BWA: 2 mismatches + 1 gap of possibly multiple length. CoBWeb: 3 mismatches and 2 gaps. [Figure: results for Read Length 50 and Read Length 150] 20% faster than BWA with comparable results.

  28. Comparison with BWA-SW. 8 mismatches plus 10 gaps. Read Length 400. 5650 mapped incorrectly by BWA-SW. The remainder has poor BWA mapping quality.

  29. Avadis NGS

  30. Avadis NGS: Alignment, DNA Var Detection, RNASeq, ChIPSeq, Small RNASeq

  31. Thank You
