Genomics Quick Start

Presentation Transcript

  1. Genomics Quick Start
     Mikhail Dvorkin, Vladislav Isenbaev, Eugene Kapun
     Scientific advisors:
     • Acad. Konstantin Skryabin, Bioengineering RAS
     • Prof. Anatoly Shalyto, SPbSU ITMO

  2. Collaboration with Bioengineering RAS
     • Bioengineering RAS
       • Conducts biological experiments
       • Sets problems
       • Provides biological data
     • SPbSU ITMO
       • Develops algorithms and programs
     • Started at the end of 2009
     • Why us?
     SPbSU ITMO: Genomics Quick Start

  3. SPbSU ITMO at ACM ICPC
     We train ETH Zürich. Maybe MIT? :-)
     Eugene Kapun, Vladislav Isenbaev, Mikhail Dvorkin, Georgiy Korneev

  4. Mikhail Dvorkin

  5. Eugene Kapun, Vladislav Isenbaev

  7. Genome Team
     Coach
     • Georgiy Korneev
     Members
     • Mikhail Dvorkin
     • Vladislav Isenbaev
     • Eugene Kapun

  8. Problems Being Solved
     • De novo DNA assembly based on paired reads
       • Generalized suffix tree traversal
       • Reduction to single reads
     • DNA alignment with transfers

  9. DNA Assembly 1: Generalized suffix tree traversal

  10. Suffix Tree
      • Built from the reads
      • Arc weight: number and quality of reads
      • Possible extensions
      • Detection of erroneous nucleotides
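To make the arc-weight idea concrete, here is a minimal sketch. It builds a naive, depth-capped suffix trie over the reads rather than a compressed generalized suffix tree, and it weights each arc by a plain count of reads, ignoring read quality. The function names and the `max_depth` cap are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def build_suffix_trie(reads, max_depth=8):
    """Naive suffix trie over all reads (hypothetical helper).

    A node is identified by the string spelled from the root, so the
    weight stored under key `s` is the weight of the arc entering node `s`.
    """
    arc_weight = defaultdict(int)
    for read in reads:
        for start in range(len(read)):
            # Walk the suffix that begins at `start`, at most max_depth deep.
            stop = min(len(read), start + max_depth)
            for end in range(start + 1, stop + 1):
                arc_weight[read[start:end]] += 1
    return arc_weight


def extensions(arc_weight, context):
    """Possible next nucleotides after `context`, with their arc weights."""
    return {
        c: arc_weight[context + c]
        for c in "ACGT"
        if (context + c) in arc_weight
    }


reads = ["ACGTA", "ACGTA", "ACGTA", "ACGTT"]
weights = build_suffix_trie(reads)
# The weakly supported extension "T" is a candidate erroneous nucleotide.
print(extensions(weights, "ACGT"))
```

A lightly weighted arc next to a heavily weighted sibling is the kind of signal the slide's "detection of erroneous nucleotides" bullet points at. A real implementation would presumably fold base quality into the weight as well.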

  11. Building up a Contig
      • Start with a high-quality read
      • Use paired reads to select a nucleotide
        • “Backward”: match the past
        • “Forward”: match the future
      • Build up to a branch
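The "build up to a branch" step can be sketched as a greedy forward extension. This is a simplified illustration under stated assumptions: it uses k-mer counts instead of a suffix tree, it omits the paired-read step that the slide uses to choose between candidate nucleotides, and `kmer_counts` and `extend_forward` are hypothetical names.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer that occurs in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend_forward(seed, counts, k):
    """Extend `seed` rightwards while exactly one next nucleotide is supported."""
    if k < 2 or len(seed) < k - 1:
        raise ValueError("need k >= 2 and a seed of at least k - 1 nucleotides")
    contig = seed
    visited = set()  # k-mers already used, to avoid looping forever on a cycle
    while True:
        context = contig[-(k - 1):]
        candidates = [c for c in "ACGT" if counts[context + c] > 0]
        if len(candidates) != 1:
            break  # dead end (no candidate) or branch (several candidates)
        kmer = context + candidates[0]
        if kmer in visited:
            break
        visited.add(kmer)
        contig += candidates[0]
    return contig


reads = ["ACGTTG", "GTTGCA", "TGCAAT"]
print(extend_forward("ACGT", kmer_counts(reads, 4), 4))
```

In the approach described on the slide, a branch is where paired reads would be consulted ("backward" against the already built contig, "forward" against what should follow) instead of simply stopping.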

  12. Results
      • Caenorhabditis elegans
      • Escherichia coli K-12

  13. DNA Assembly 2: Reduction to single reads

  14. Concept
      • De Bruijn graph with all reads
      • Paired reads
      • Path in the graph
      • Low density: backtracking
      • Slow: meet-in-the-middle
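One way to read this slide is that the two mates of a paired read are joined into a single long read by finding a path between them in the De Bruijn graph. The sketch below follows that reading, which is an assumption, and shows only the plain backtracking variant without the meet-in-the-middle speed-up. All function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def find_path(graph, source, target, max_steps):
    """Backtracking search for a walk from source to target using at most max_steps arcs."""
    path = [source]

    def dfs(node, steps_left):
        if node == target:
            return True
        if steps_left == 0:
            return False
        for nxt in sorted(graph.get(node, ())):
            path.append(nxt)
            if dfs(nxt, steps_left - 1):
                return True
            path.pop()  # backtrack and try the next arc
        return False

    return path if dfs(source, max_steps) else None


def spell(path):
    """Spell the sequence along a path of overlapping (k-1)-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])


graph = de_bruijn(["ACGTTG", "GTTGCA", "TGCAAT"], 4)
path = find_path(graph, "ACG", "AAT", max_steps=10)
print(spell(path))
```

The `max_steps` bound stands in for the expected insert size of a read pair. Backtracking stays cheap while few vertices have more than one outgoing arc, which fits the slide's "low density" remark.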

  15. Error detection
      • Poorly covered vertices
        • Erroneous
        • Delete them
        • Repeat
      • Paths
        • Single reads
        • Use another tool
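The "poorly covered, delete, repeat" loop might look like the sketch below. It is an illustration under assumptions: coverage is measured per k-mer rather than per graph vertex, a read is flagged as soon as it contains one poorly covered k-mer, and `filter_reads` with its `min_coverage` threshold is a hypothetical name and parameter.

```python
from collections import Counter


def filter_reads(reads, k, min_coverage):
    """Repeatedly drop reads that contain a k-mer seen fewer than min_coverage times.

    Returns (kept_reads, flagged_reads).
    """
    kept = list(reads)
    flagged = []
    while True:
        # Recount coverage using only the reads that are still kept.
        counts = Counter(
            read[i:i + k] for read in kept for i in range(len(read) - k + 1)
        )
        bad = {
            read
            for read in kept
            if any(
                counts[read[i:i + k]] < min_coverage
                for i in range(len(read) - k + 1)
            )
        }
        if not bad:
            return kept, flagged
        flagged.extend(read for read in kept if read in bad)
        kept = [read for read in kept if read not in bad]


reads = ["ACGTAC"] * 3 + ["ACGTTC"]
kept, flagged = filter_reads(reads, k=4, min_coverage=2)
print(len(kept), flagged)
```

Because coverage is recomputed after each deletion, an overly strict threshold can cascade and remove good reads too, so the threshold would need tuning against the actual sequencing depth.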

  16. Results
      • 60% of erroneous reads detected
      • < 0.1% errors left after one iteration
      • 99.5% DNA coverage

  17. DNA Alignment with transfers

  18. Concept
      • Parts
        • Matched (small edit distance)
        • Unmatched
      • Swapping allowed
      • Penalties
        • Number of parts
        • Edit distance in matched parts
        • Length of unmatched parts
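The three penalty terms can be combined into a single score, as in the sketch below. The slide does not say how the terms are weighted, so the linear form and the weights `w_part`, `w_edit` and `w_gap` are assumptions chosen only for illustration, as is the tuple encoding of the parts.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def alignment_penalty(parts, w_part=5.0, w_edit=1.0, w_gap=0.5):
    """Score a decomposition into parts (hypothetical encoding).

    Each part is ("match", segment_a, segment_b) or ("unmatched", segment).
    """
    total = w_part * len(parts)
    for part in parts:
        if part[0] == "match":
            total += w_edit * edit_distance(part[1], part[2])
        else:
            total += w_gap * len(part[1])
    return total


parts = [
    ("match", "ACGT", "ACCT"),
    ("unmatched", "GGGG"),
    ("match", "TTAA", "TTAA"),
]
print(alignment_penalty(parts))
```

The per-part charge is what stops an alignment from shattering into many tiny perfect matches. Since swapping is allowed, the order in which matched parts appear in the second sequence is not penalized in this sketch.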

  19. Implementation
      First DNA
      • Tear into small pieces
      • Hash ‘em and store ‘em
      Second DNA
      • Tear into small pieces
      • Look them up
      • Build them up
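The hash-and-look-up scheme above can be sketched with a k-mer index. This is a minimal illustration, assuming that the "small pieces" are overlapping k-mers and that "build them up" means merging consecutive hits on the same diagonal into longer matched blocks. The function names are hypothetical, and only exact matches are handled.

```python
from collections import defaultdict


def index_kmers(seq, k):
    """First DNA: hash every k-mer and store the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def matched_blocks(first, second, k):
    """Second DNA: look up its k-mers and merge consecutive hits into blocks.

    Returns a list of (pos_in_first, pos_in_second, length) tuples.
    """
    index = index_kmers(first, k)
    hits = set()
    for j in range(len(second) - k + 1):
        for i in index.get(second[j:j + k], ()):
            hits.add((i, j))
    blocks = []
    for i, j in sorted(hits):
        if (i - 1, j - 1) in hits:
            continue  # this hit continues an earlier run, not a new block
        length = k
        # Each further hit on the same diagonal extends the block by one base.
        while (i + length - k + 1, j + length - k + 1) in hits:
            length += 1
        blocks.append((i, j, length))
    return blocks


first = "ACGTGCATTGAC"
second = "ATTGACACGTGC"  # the two halves of `first`, swapped
print(matched_blocks(first, second, 4))
```

In the example the two halves of the first sequence come back as two blocks in swapped order, which is the kind of rearrangement that "alignment with transfers" is meant to tolerate. Tolerating small edit distances inside a block would need an extra extension step on top of these exact seeds.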

  20. Results