
Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path



Presentation Transcript


  1. Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path Ilya N. Shindyalov, Philip E. Bourne

  2. Why Align Structures? • Additional measure of protein similarity • Structure generally preserved better than sequence over the course of evolution • May help in protein fold identification • Interesting combinatorial problem

  3. The Structural Alignment Problem • We know how to optimally superimpose two proteins of the same length so as to minimize RMSD (Hendrickson, 1979) • However, no obvious way to compare objects of different length, or to optimally add or remove gaps • Heuristic methods for structural alignment are the best we can do at the moment

  4. Alignment Fragment Pairs • For a pair of proteins A and B, an alignment fragment pair (AFP) is defined as a continuous segment of A aligned against a continuous segment of B of the same size (without gaps). • If $n_1$ and $n_2$ are the lengths of A and B, and AFP length is set to m, then there is a total of $(n_1 - m)(n_2 - m)$ AFPs.
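AFP enumeration can be sketched in a few lines of Python. This is a minimal illustration, assuming 0-based start positions; the helper name `enumerate_afps` is hypothetical. Counting every valid start position gives (n1 - m + 1)(n2 - m + 1) pairs, slightly more than the count quoted on the slide; the two agree closely when n1 and n2 are much larger than m.

```python
from itertools import product


def enumerate_afps(n1: int, n2: int, m: int):
    """Yield (start_a, start_b) for every gapless fragment pair of length m.

    Hypothetical helper: positions are 0-based, so the last valid start
    in a protein of length n is n - m.
    """
    if m <= 0 or m > min(n1, n2):
        return  # no fragment of length m fits in both proteins
    starts_a = range(n1 - m + 1)
    starts_b = range(n2 - m + 1)
    yield from product(starts_a, starts_b)


afps = list(enumerate_afps(n1=12, n2=10, m=8))
print(len(afps))  # 15, i.e. (12 - 8 + 1) * (10 - 8 + 1)
```

Even for small proteins the number of AFPs grows with the product of the two lengths, which is why the path is built incrementally rather than by exhaustive search.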

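  5. Defining an Alignment • An alignment is defined as a continuous path of AFPs of fixed length m such that, between any two consecutive AFPs, gaps may be inserted into either A or B, but not into both. That is, for every two consecutive AFPs i and i+1, we have • $p_{i+1}^A = p_i^A + m$ and $p_{i+1}^B = p_i^B + m$ (1), or • $p_{i+1}^A > p_i^A + m$ and $p_{i+1}^B = p_i^B + m$ (2), or • $p_{i+1}^A = p_i^A + m$ and $p_{i+1}^B > p_i^B + m$ (3), where $p_i^A$ represents the starting position of AFP i in protein A and $p_i^B$ its starting position in protein B.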
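The rule that consecutive AFPs may skip residues in A or in B, but not in both, can be checked mechanically. The sketch below assumes each AFP is stored as a `(start_a, start_b)` tuple; the function names and the labels it returns are hypothetical.

```python
def step_kind(prev, nxt, m):
    """Classify the step between two consecutive AFPs of length m.

    Each AFP is a (start_a, start_b) tuple. Returns "no_gap",
    "skip_in_a", "skip_in_b", or None when the step is not allowed
    (overlap, backward move, or residues skipped in both proteins).
    """
    skipped_a = nxt[0] - (prev[0] + m)  # residues of A left unaligned
    skipped_b = nxt[1] - (prev[1] + m)  # residues of B left unaligned
    if skipped_a == 0 and skipped_b == 0:
        return "no_gap"
    if skipped_a > 0 and skipped_b == 0:
        return "skip_in_a"
    if skipped_a == 0 and skipped_b > 0:
        return "skip_in_b"
    return None


def is_valid_path(path, m):
    """True if every consecutive pair of AFPs in the path is an allowed step."""
    return all(step_kind(p, q, m) is not None for p, q in zip(path, path[1:]))


print(step_kind((0, 0), (8, 8), 8))    # no_gap
print(step_kind((0, 0), (10, 8), 8))   # skip_in_a
print(step_kind((0, 0), (10, 11), 8))  # None
```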

  6. The CE Algorithm • Goal: Find a “good” local alignment for structures of proteins A and B. • Basic idea: • Select some initial AFP. • Build an alignment path by incrementally adding AFPs in a way that satisfies the conditions on the previous slide. • Repeat step (2) until the length of each protein is traversed, or until no “good” AFPs remain.
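The basic idea above can be sketched as a greedy loop. This is an illustration under stated assumptions, not the published implementation: the acceptance test is passed in as a hypothetical callback `accept(path, candidate)`, the first acceptable candidate is taken (smallest gap first) rather than the best-scoring one, and `max_gap` is an assumed parameter.

```python
def extend_path(seed, n1, n2, m, accept, max_gap=30):
    """Greedily grow an alignment path from a seed AFP.

    AFPs are (start_a, start_b) tuples of length m. `accept` is a
    hypothetical predicate deciding whether a candidate AFP may join
    the current path. Candidates skip residues in A or in B, never both.
    """
    path = [seed]
    while True:
        a_end = path[-1][0] + m
        b_end = path[-1][1] + m
        # Gapless continuation first, then increasingly large gaps.
        candidates = [(a_end, b_end)]
        for gap in range(1, max_gap + 1):
            candidates.append((a_end + gap, b_end))
            candidates.append((a_end, b_end + gap))
        # Keep only candidates whose fragments fit inside both proteins.
        candidates = [c for c in candidates if c[0] + m <= n1 and c[1] + m <= n2]
        chosen = next((c for c in candidates if accept(path, c)), None)
        if chosen is None:
            return path  # end of a protein reached, or no "good" AFP left
        path.append(chosen)


# Toy run: accept everything, so the path is gapless until a protein ends.
print(extend_path((0, 0), 20, 20, 8, lambda path, c: True))  # [(0, 0), (8, 8)]
```

The loop always terminates because every accepted AFP advances the start position in both proteins by at least m.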

  7. Algorithm Specifics • How do we choose the starting AFP? • What are the criteria for adding AFPs to our alignment path? • How do we know when to stop? That is, at what point do we know that there are no “good” AFPs left? There are various heuristics that could be used to supply answers to the above questions.

  8. Sample Heuristics: AFP Distances • We can define a distance $D_{ij}$ between two different AFPs i and j in terms of distances measured within each protein, where $d^A(p,q)$ represents the distance between the alpha carbon atoms at positions p and q in protein A. Setting i=j, and using the same formula, we can define the distance $D_{ii}$ between the two fragments of the same AFP.
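The exact formula is not reproduced in this transcript. One definition consistent with the description, offered here as an assumption, averages the absolute difference between corresponding intra-protein distances over all m × m residue pairs: $D_{ij} = \frac{1}{m^2} \sum_{k=0}^{m-1} \sum_{l=0}^{m-1} \left| d^A(p_i^A + k,\, p_j^A + l) - d^B(p_i^B + k,\, p_j^B + l) \right|$. A minimal Python sketch of that assumed definition, with alpha-carbon coordinates given as (x, y, z) tuples and the hypothetical function name `afp_distance`:

```python
import math


def afp_distance(coords_a, coords_b, afp_i, afp_j, m):
    """Assumed AFP distance: mean |d^A - d^B| over all m*m residue pairs.

    coords_a, coords_b: lists of (x, y, z) alpha-carbon coordinates.
    afp_i, afp_j: (start_a, start_b) tuples; passing the same AFP twice
    gives the within-AFP distance D_ii.
    """
    (ia, ib), (ja, jb) = afp_i, afp_j
    total = 0.0
    for k in range(m):
        for l in range(m):
            d_a = math.dist(coords_a[ia + k], coords_a[ja + l])
            d_b = math.dist(coords_b[ib + k], coords_b[jb + l])
            total += abs(d_a - d_b)
    return total / (m * m)


# Toy chains: A has residues 1 unit apart, B is the same chain stretched 2x.
chain_a = [(float(i), 0.0, 0.0) for i in range(6)]
chain_b = [(2.0 * i, 0.0, 0.0) for i in range(6)]
print(afp_distance(chain_a, chain_b, (0, 0), (0, 0), 2))  # 0.5
```

Because the measure uses only distances inside each protein, it is unchanged by rotating or translating either structure, so no superposition is needed to evaluate it.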

  9. Sample Heuristics: Extending the Alignment Path • Suppose our alignment path already consists of AFPs 0…n−1, and we are trying to decide whether to add AFP n to the path. We will do so only if three conditions, numbered (4), (5), and (6), are all satisfied.

  10. Extending Alignment Path (Cont) Where: • $D_0$ and $D_1$ are specified cut-off distances. • The decision whether AFP n is “fit” is based on (4). • The decision whether AFP n “works” with all the other AFPs in the path is based on (5). • The decision whether we should extend the alignment path at all is based on (6).
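The three inequalities themselves are not reproduced in this transcript. One reading consistent with the descriptions above, offered as an assumption, is: (4) the within-AFP distance of the candidate is below $D_0$; (5) the mean distance from the candidate to the AFPs already in the path is below $D_0$; (6) the mean distance over all AFP pairs in the extended path is below $D_1$. The Python sketch below encodes that reading; `can_extend`, the `dist` callback, and the cut-off values in the demo are all hypothetical.

```python
def can_extend(path, candidate, dist, d0, d1):
    """Decide whether `candidate` may be appended to `path`.

    dist(afp_i, afp_j) is a hypothetical callback returning the AFP
    distance; d0 and d1 are the cut-off distances.
    """
    # (4) the candidate is a "fit" fragment pair on its own
    if dist(candidate, candidate) >= d0:
        return False
    # (5) the candidate "works" with the AFPs already in the path
    if path:
        mean_to_path = sum(dist(p, candidate) for p in path) / len(path)
        if mean_to_path >= d0:
            return False
    # (6) the extended path as a whole is still good enough
    extended = path + [candidate]
    mean_all = sum(dist(p, q) for p in extended for q in extended) / len(extended) ** 2
    return mean_all < d1


# Toy distance: how much the A/B offset differs between two AFPs.
def toy_dist(p, q):
    return abs((p[0] - p[1]) - (q[0] - q[1]))


path = [(0, 0), (8, 8)]
print(can_extend(path, (16, 16), toy_dist, d0=3.0, d1=4.0))  # True
print(can_extend(path, (16, 26), toy_dist, d0=3.0, d1=4.0))  # False
```

Checking (4) first is a cheap way to discard poor candidates before computing the sums that (5) and (6) require.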

  11. Alignment Assessment and Post-alignment Optimization • To assess how good the alignment produced by CE is, we can compare it to alignments of random pairs of structures, and compute a Z-score based on the RMSD and the number of gaps in the final alignment. • Since CE does not penalize gaps, we can perform additional optimization after CE has completed, using dynamic programming to remove excess gaps.
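The Z-score idea can be illustrated for a single quantity such as RMSD. This is only a sketch: the background values are made up for the example, the population standard deviation is an assumed choice, and how RMSD and gap count are combined into one score is not specified here.

```python
import statistics


def z_score(observed, background):
    """Standard score of `observed` against a background sample.

    `background` holds the same quantity (for example RMSD) measured on
    alignments of random structure pairs. For RMSD, a strongly negative
    value means the alignment is much tighter than random.
    """
    mu = statistics.mean(background)
    sigma = statistics.pstdev(background)
    if sigma == 0:
        raise ValueError("background has zero spread")
    return (observed - mu) / sigma


# Made-up background RMSD values, for illustration only.
random_rmsds = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(1.0, random_rmsds))  # -2.0
```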

  12. Results and Conclusion • The CE method is highly configurable, which is at once its strength and its weakness. Adjusting its parameters, such as the AFP length m, the cut-off distances $D_0$ and $D_1$, and the definition of AFP distance, can result in varying alignments and execution speeds.

  13. Results and Conclusion • In general, CE does not outperform previously existing structural alignment methods, such as Dali and VAST: it does better for some pairs of structures, and worse for others. • Since it is fairly straightforward and easy to implement, CE provides an interesting addition to the toolbox of structural alignment algorithms.
