
An Improved Search Algorithm for Optimal Multiple-Sequence Alignment

An Improved Search Algorithm for Optimal Multiple-Sequence Alignment. Paper by: Stefan Schroedl Presentation by: Bryan Franklin. Outline. Multiple-Sequence Alignment (MSA) Graph Representation Computing Path Costs Heuristics Experimental Results Other Optimizations.


Presentation Transcript


  1. An Improved Search Algorithm for Optimal Multiple-Sequence Alignment • Paper by: Stefan Schroedl • Presentation by: Bryan Franklin

  2. Outline • Multiple-Sequence Alignment (MSA) • Graph Representation • Computing Path Costs • Heuristics • Experimental Results • Other Optimizations

  3. Multiple-Sequence-Alignment • Sequence: • DNA: String over alphabet {A,C,G,T} • Protein: String with |Σ|=20 (one symbol for each amino acid) • Alignment: • Insert gaps (_) into sequences to line up matching characters.

  4. Multiple-Sequence-Alignment Sequences: ABCB, BCD, DB

  5. Multiple-Sequence-Alignment • Edit operations: insertion and deletion (indels), and point mutation (single-symbol replacement) • Find a minimum-cost set of edit operations relating two or more sequences • NP-hard for an arbitrary number of sequences

  6. Multiple-Sequence-Alignment • Applications • Common ancestry between species • Locating useful portions of DNA • Predicting structure of folded proteins

  7. Graph Representation
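The transcript keeps only the title of this slide. A common formulation, assumed here rather than taken from the slide, treats an alignment of k sequences as a path through a k-dimensional lattice: a node records how many symbols of each sequence have been consumed, and each edge advances a non-empty subset of sequences by one symbol (the others receive a gap in that column). The function name `successors` is hypothetical.

```python
from itertools import product

def successors(node, lengths):
    """Yield the lattice neighbours of `node`.

    `node` is a tuple of positions, one per sequence; `lengths` holds the
    sequence lengths. Each step advances a non-empty subset of positions.
    """
    k = len(node)
    for step in product((0, 1), repeat=k):
        if not any(step):
            continue  # at least one sequence must consume a symbol
        nxt = tuple(p + s for p, s in zip(node, step))
        if all(p <= n for p, n in zip(nxt, lengths)):
            yield nxt

lengths = (4, 3, 2)   # lengths of ABCB, BCD, DB
start = (0, 0, 0)     # nothing consumed yet
goal = lengths        # every sequence fully consumed
first_moves = list(successors(start, lengths))
```

Under this assumption an interior node has up to 2^k - 1 successors, which is why the search space grows quickly with the number of sequences.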

  8. Computing G(n) • Considerations • Biological meaning • Cost of computation • Sum-of-pairs • Substitution matrix ((|Σ|+1)² entries) • Sum alignment costs for all pairs • Costs can depend on neighbors
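A minimal sketch of the sum-of-pairs idea follows. It assumes unit costs (mismatch 1, gap 1, gap against gap 0) in place of a real substitution matrix, whose values the transcript does not give, and it ignores neighbor-dependent costs. The function names are illustrative.

```python
from itertools import combinations

GAP = "_"

def pair_cost(a, b, mismatch=1, gap=1):
    """Cost of one aligned symbol pair under assumed unit costs."""
    if a == b:
        return 0          # match, or gap against gap
    if a == GAP or b == GAP:
        return gap        # symbol against gap
    return mismatch       # two different symbols

def sum_of_pairs(rows):
    """Sum the column-wise pair costs over all pairs of rows."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same width")
    return sum(pair_cost(x, y)
               for r1, r2 in combinations(rows, 2)
               for x, y in zip(r1, r2))

# One possible alignment of ABCB, BCD, DB (an assumption for illustration).
rows = ["ABCB_",
        "_BCD_",
        "___DB"]
total = sum_of_pairs(rows)
```

With these assumed costs the three row pairs contribute 2, 5, and 3. A real scoring scheme would look each symbol pair up in the substitution matrix instead of calling `pair_cost`.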

  9. Computing G(n) Sequences: ABCB, BCD, DB

  10. Computing G(n) • [Figure: cost values 7, 7, 8, 7, 6]

  11. Computing G(n)

  12. Heuristics • Methods Examined • 2-fold (h_pair) • divide-and-conquer • 3-fold (h_3,all) • 4-fold (h_4,all)
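The slide does not define the heuristics. The sketch below assumes the usual reading of the 2-fold heuristic: from a given lattice node, sum the optimal pairwise alignment costs of the remaining suffixes over all pairs of sequences. Under sum-of-pairs scoring this never overestimates the remaining cost. Unit costs and the names `pairwise_cost` and `h_pair` are assumptions for illustration.

```python
from itertools import combinations

def pairwise_cost(s, t, mismatch=1, gap=1):
    """Optimal two-sequence alignment cost via dynamic programming."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            sub = prev[j - 1] + (0 if s[i - 1] == t[j - 1] else mismatch)
            cur.append(min(sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def h_pair(seqs, node):
    """Lower bound on the remaining sum-of-pairs cost from `node`.

    `node` gives, per sequence, how many symbols are already aligned.
    """
    suffixes = [s[p:] for s, p in zip(seqs, node)]
    return sum(pairwise_cost(a, b) for a, b in combinations(suffixes, 2))

seqs = ["ABCB", "BCD", "DB"]
estimate_at_start = h_pair(seqs, (0, 0, 0))
estimate_at_goal = h_pair(seqs, (4, 3, 2))
```

A practical implementation would precompute the pairwise tables once rather than rerun the dynamic program at every node. The 3-fold and 4-fold variants presumably apply the same idea to optimal alignments of triples and quadruples, which tightens the bound at a higher precomputation cost.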

  13. Heuristic Comparison

  14. Resource Usage

  15. Optimizations • Sparse Path Representation • Curve fitting for predicting threshold values • Sub-optimal paths periodically deleted

  16. The End • Any Questions?
