Optimal Sum of Pairs Multiple Sequence Alignment

David Kelley

Dynamic Programming Extension
• Standard pairwise sequence alignment methods can be extended to handle k strings
But…
• Runtime is $O(2^k N^k)$
  • k = number of sequences
  • N = average length of sequences
• Space is $O(N^k)$
• Quickly becomes infeasible
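
The k-string extension can be sketched as a small Python function. It assumes a hypothetical linear scoring scheme (match +1, mismatch -1, gap -2, gap-gap 0) that is not taken from the slides, and it returns only the optimal score, not the alignment itself. The `best` table holds one entry per lattice cell, which is where the $O(N^k)$ space comes from. Each cell tries $2^k - 1$ moves, which gives the $2^k$ factor in the runtime; scoring each candidate column adds a further $k^2$ factor.

```python
from itertools import product

# Hypothetical linear scoring scheme (not from the slides)
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one aligned pair of characters; '-' denotes a gap."""
    if a == '-' and b == '-':
        return 0  # gap-gap pairs are free in sum-of-pairs scoring
    if a == '-' or b == '-':
        return GAP
    return MATCH if a == b else MISMATCH


def column_score(col):
    """Sum-of-pairs score of one alignment column: O(k^2) pairs."""
    k = len(col)
    return sum(pair_score(col[p], col[q])
               for p in range(k) for q in range(p + 1, k))


def optimal_sp_score(seqs):
    """Exact sum-of-pairs optimum by DP over the full (N+1)^k lattice."""
    k = len(seqs)
    origin = (0,) * k
    # 2^k - 1 moves: each sequence either consumes a residue (1) or gets a gap (0)
    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    best = {origin: 0}
    # product() yields cells in lexicographic order, so every predecessor
    # (cell minus a 0/1 move) is finished before the cell itself
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if cell == origin:
            continue
        candidates = []
        for m in moves:
            prev = tuple(c - d for c, d in zip(cell, m))
            if min(prev) < 0:
                continue  # move would step outside the lattice
            col = [seqs[i][cell[i] - 1] if m[i] else '-' for i in range(k)]
            candidates.append(best[prev] + column_score(col))
        best[cell] = max(candidates)
    return best[tuple(len(s) for s in seqs)]


print(optimal_sp_score(["AB", "AB", "A"]))  # → 0
```

Even three sequences of length 300 would need about $301^3 \approx 2.7 \times 10^7$ table entries, which is why the full lattice quickly becomes infeasible.
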
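Enter Carrillo-Lipman
• Obtain a lower bound on the optimal score
• Estimate the best achievable score from a cell to the end
  • Use the sum of all optimal pairwise alignment scores from that cell to the end
• If current score + estimate < lower bound
  • The path cannot lead to an optimal alignment, so ignore it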
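
Following the slide's description, here is a minimal sketch of the pruning test. It assumes a hypothetical linear scoring scheme (match +1, mismatch -1, gap -2, gap-gap 0). `suffix_table` precomputes, for every pair of sequences, the optimal pairwise score of each pair of suffixes. Summing those entries gives an optimistic estimate of the remaining sum-of-pairs score: with linear gap costs and free gap-gap pairs, each pair of rows in a multiple alignment scores no better than that pair's optimal pairwise alignment. The function names and the `lower_bound` parameter are illustrative only and do not come from MSA or from the project.

```python
from itertools import combinations, product

# Hypothetical linear scoring scheme (not from the slides)
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one aligned pair of characters; '-' denotes a gap."""
    if a == '-' and b == '-':
        return 0
    if a == '-' or b == '-':
        return GAP
    return MATCH if a == b else MISMATCH


def suffix_table(s, t):
    """T[i][j] = optimal pairwise score of aligning s[i:] with t[j:]."""
    n, m = len(s), len(t)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue  # two empty suffixes score 0
            options = []
            if i < n and j < m:
                options.append(T[i + 1][j + 1] + pair_score(s[i], t[j]))
            if i < n:
                options.append(T[i + 1][j] + GAP)
            if j < m:
                options.append(T[i][j + 1] + GAP)
            T[i][j] = max(options)
    return T


def pruned_sp_score(seqs, lower_bound):
    """Sum-of-pairs DP that drops cells whose optimistic total < lower_bound.

    Returns (optimal score or None, number of cells kept).
    """
    k = len(seqs)
    pairs = list(combinations(range(k), 2))
    tables = {(p, q): suffix_table(seqs[p], seqs[q]) for p, q in pairs}

    def estimate(cell):
        # Optimistic remaining score: each pair can do no better than its
        # own optimal pairwise alignment of the remaining suffixes
        return sum(tables[p, q][cell[p]][cell[q]] for p, q in pairs)

    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    origin = (0,) * k
    best = {origin: 0}
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if cell == origin:
            continue
        candidates = []
        for m in moves:
            prev = tuple(c - d for c, d in zip(cell, m))
            if prev not in best:
                continue  # outside the lattice, or already pruned
            col = [seqs[i][cell[i] - 1] if m[i] else '-' for i in range(k)]
            col_score = sum(pair_score(col[p], col[q]) for p, q in pairs)
            candidates.append(best[prev] + col_score)
        # Keep the cell only if current score + estimate can reach the bound
        if candidates and max(candidates) + estimate(cell) >= lower_bound:
            best[cell] = max(candidates)
    return best.get(tuple(len(s) for s in seqs)), len(best)
```

Passing `float("-inf")` disables pruning and keeps every cell. Passing the optimum itself, the tightest valid bound, returns the same score with no more stored cells. A bound above the optimum returns `None`. This sketch still loops over every lattice cell, so it only shows which cells the bound discards. A practical implementation would avoid visiting them at all.
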
MSA
• Implemented in the 1989 program MSA
• Used a simple progressive alignment procedure to obtain a lower bound
• “generally can align 6 to 8 sequences of length 200-300 residues”
Gupta 1995 update
• Re-implemented MSA more efficiently
• Uses a star-tree heuristic for the lower bound
• Ran on a Sun SPARCstation 10 with 128 MB of RAM
• Runtimes varied, depending in part on how similar the sequences were
  • 10 Globin B proteins of ~150 amino acids took 10 min
Can we do better?
• Better hardware
  • More RAM
  • Multi-core processors
• Better heuristics
  • MUSCLE and MAFFT are very fast and accurate
  • A higher lower bound means more of the matrix can be ignored
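
Any feasible alignment yields a valid lower bound. Its sum-of-pairs score, computed under the same scoring scheme the exact search maximizes, cannot exceed the optimum. A better heuristic alignment therefore gives a higher bound. Here is a minimal sketch, assuming the heuristic alignment has already been read into a list of equal-length gapped rows (parsing a MUSCLE or MAFFT output file is not shown). It also assumes a hypothetical linear scoring scheme (match +1, mismatch -1, gap -2, gap-gap 0).

```python
from itertools import combinations

# Hypothetical linear scoring scheme; it must match the one the exact search uses
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one aligned pair of characters; '-' denotes a gap."""
    if a == '-' and b == '-':
        return 0
    if a == '-' or b == '-':
        return GAP
    return MATCH if a == b else MISMATCH


def sp_score(alignment):
    """Sum-of-pairs score of a gapped alignment given as equal-length rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(
        pair_score(a, b)
        for p, q in combinations(range(len(alignment)), 2)
        for a, b in zip(alignment[p], alignment[q])
    )


# Rows of a heuristic alignment (e.g. parsed from MUSCLE or MAFFT output)
heuristic = ["AC-", "ACG", "A-G"]
lower_bound = sp_score(heuristic)
print(lower_bound)  # → -3
```
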
My Project
• Implement concepts from Carrillo-Lipman
• Use the score of a MUSCLE alignment as the lower bound
• Look for opportunities to parallelize
  • Using OpenMP
• Run on modern hardware
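
The slides do not say which parallelization strategy the project used. Wavefront parallelism is one standard option for this kind of DP and is shown here only as an assumption. Every move decreases at least one index, so all predecessors of a cell have a smaller index sum $d = i_1 + \dots + i_k$. Cells sharing the same $d$ can therefore be computed independently. In C or C++, the loop over one wavefront's cells would be the natural target for an OpenMP `#pragma omp parallel for`. This Python sketch uses `ThreadPoolExecutor` only to show the structure; because of CPython's global interpreter lock, it will not speed up this pure-Python work. The scoring scheme (match +1, mismatch -1, gap -2, gap-gap 0) and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations, product

# Hypothetical linear scoring scheme (not from the slides)
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one aligned pair of characters; '-' denotes a gap."""
    if a == '-' and b == '-':
        return 0
    if a == '-' or b == '-':
        return GAP
    return MATCH if a == b else MISMATCH


def wavefronts(lengths):
    """Group lattice cells by index sum d = i_1 + ... + i_k."""
    fronts = [[] for _ in range(sum(lengths) + 1)]
    for cell in product(*(range(n + 1) for n in lengths)):
        fronts[sum(cell)].append(cell)
    return fronts


def wavefront_sp_score(seqs, workers=4):
    """Sum-of-pairs DP evaluated one wavefront at a time."""
    k = len(seqs)
    pairs = list(combinations(range(k), 2))
    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    best = {}

    def solve(cell):
        # Reads only cells with a smaller index sum, which are already final
        if not any(cell):
            return 0
        candidates = []
        for m in moves:
            prev = tuple(c - d for c, d in zip(cell, m))
            if min(prev) < 0:
                continue
            col = [seqs[i][cell[i] - 1] if m[i] else '-' for i in range(k)]
            col_score = sum(pair_score(col[p], col[q]) for p, q in pairs)
            candidates.append(best[prev] + col_score)
        return max(candidates)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for front in wavefronts([len(s) for s in seqs]):
            # Cells on one wavefront are independent: score them concurrently,
            # then publish the results before moving to the next wavefront
            values = list(pool.map(solve, front))
            best.update(zip(front, values))
    return best[tuple(len(s) for s in seqs)]
```

Wavefronts near the two corners of the lattice contain only a few cells, so the available parallelism grows and shrinks as the sweep progresses.
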
Can optimal alignment be made practical?
• How much better can we do than the previous attempts?
• How will maximizing the sum-of-pairs score compare with more popular alignment programs?
  • Compare on the multiple sequence alignment benchmark database BAliBASE
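
BAliBASE comparisons are commonly reported with two accuracy scores measured against the benchmark's reference alignment. The SP score is the fraction of reference residue pairs that the test alignment also aligns. The TC score is the fraction of reference columns reproduced exactly. Despite the shared name, this SP score measures agreement with a reference, not the substitution-score objective that the exact method maximizes. Here is a minimal sketch of both, assuming these simple definitions. It ignores BAliBASE's core-block annotations, which its own scoring tool can use to restrict the comparison. The function names are hypothetical.

```python
from itertools import combinations


def residue_columns(alignment):
    """Columns as tuples of (sequence index, residue index), gaps dropped."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        residues = []
        for s, ch in enumerate(col):
            if ch != '-':
                residues.append((s, counters[s]))
                counters[s] += 1
        if residues:  # skip all-gap columns
            columns.append(tuple(residues))
    return columns


def aligned_pairs(alignment):
    """Set of residue pairs that the alignment places in the same column."""
    return {pair for col in residue_columns(alignment)
            for pair in combinations(col, 2)}


def sp_accuracy(test, reference):
    """Fraction of reference residue pairs also aligned in the test alignment."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref)


def tc_accuracy(test, reference):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_cols = residue_columns(reference)
    test_cols = set(residue_columns(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)


reference = ["ACG", "A-G"]
test = ["ACG", "AG-"]
print(sp_accuracy(test, reference), tc_accuracy(test, reference))
```
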