An Extended Dead-End Elimination Algorithm to Determine Gap-Free Lists of Low Energy States

An Extended Dead-End Elimination Algorithm to Determine Gap-Free Lists of Low Energy States. EDDA KLOPPMANN, G. MATTHIAS ULLMANN, TORSTEN BECKER. Improved Pruning algorithms and Divide-and-Conquer strategies for Dead-End Elimination, with application to protein design.

Presentation Transcript


  1. An Extended Dead-End Elimination Algorithm to Determine Gap-Free Lists of Low Energy States EDDA KLOPPMANN, G. MATTHIAS ULLMANN, TORSTEN BECKER

  2. Improved Pruning algorithms and Divide-and-Conquer strategies for Dead-End Elimination, with application to protein design Ivelin Georgiev, Ryan H. Lilien, Bruce R. Donald 2006

  3. Dead End Elimination Motivation • Structure determines function • Lowest free energy state is most probable by laws of thermodynamics • Direct calculation rarely possible So: • Conformation space is discretized • Allows for exhaustive search • Desire for an algorithm which deterministically finds the lowest energy state while circumventing combinatorial exhaustion

  4. DEE Overview (Desmet et al., 1992) • Originally applied to predict side chain positions in homology modeling • Views proteins as a set of residues (sites), each of which may adopt a finite number of rotamers (forms) • DEE identifies forms of sites which are incompatible with the state of lowest energy • These forms are considered dead ends and pruned from consideration

  5. DEE Overview Continued • DEE solves the combinatorial problem of identifying the global energy minimum for a discrete pairwise system • Energy is expressed in terms of intrinsic energies of sites and pairwise interactions between sites • Each site adopts a discrete form that determines its contribution to the total energy
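
Written out, a pairwise energy of this kind could take the following form. The symbols are illustrative assumptions rather than notation from the slides: x = (x_1, ..., x_N) is a state assigning a form x_μ to each of the N sites, E^intr_μ is the intrinsic energy of site μ in that form, and W_μν is the interaction between the forms chosen at sites μ and ν.

```latex
E(x) = \sum_{\mu=1}^{N} E^{\mathrm{intr}}_{\mu}(x_\mu)
     + \sum_{\mu=1}^{N} \sum_{\nu=1}^{\mu-1} W_{\mu\nu}(x_\mu, x_\nu)
```

Each pair of sites is counted once, so changing the form of a single site only changes that site's intrinsic term and its N - 1 pair terms.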

  6. DEE Theory • DEE identifies and eliminates forms of sites which cannot contribute to the lowest energy conformation in order to circumvent an exhaustive search • The DEE criterion employs rotameric energy interactions to identify and prune rotamers that are provably not part of the global minimum energy conformation (GMEC) • The DEE criterion compares the energy of two forms of a site μ, dμ and cμ • If all states that contain dμ are higher in energy than the corresponding states that contain cμ, dμ is a dead end and is removed from consideration
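
A minimal sketch of the comparison in the last bullet, assuming a pairwise energy stored in plain Python containers. The names `pair`, `is_dead_end`, `forms`, `intrinsic` and `W` are hypothetical and chosen for illustration only. Because the two states being compared differ only at one site, the smallest possible energy gap can be computed site by site instead of enumerating all states.

```python
def pair(W, s, f, t, g):
    """Interaction between form f at site s and form g at site t (0.0 if not listed)."""
    key = (s, f, t, g) if s < t else (t, g, s, f)
    return W.get(key, 0.0)


def is_dead_end(site, d, c, forms, intrinsic, W):
    """True if every state containing d at `site` is higher in energy than
    the corresponding state (same forms elsewhere) containing c."""
    # Smallest possible value of E(state with d) - E(state with c).
    gap = intrinsic[site][d] - intrinsic[site][c]
    for other in range(len(forms)):
        if other != site:
            gap += min(
                pair(W, site, d, other, x) - pair(W, site, c, other, x)
                for x in forms[other]
            )
    return gap > 0


# Toy system (assumed values): two sites with two forms each.
forms = [(0, 1), (0, 1)]
intrinsic = [{0: 0.0, 1: 5.0}, {0: 0.0, 1: 1.0}]
W = {(0, 1, 1, 0): -1.0, (0, 1, 1, 1): -2.0}

print(is_dead_end(0, 1, 0, forms, intrinsic, W))  # True: form 1 of site 0 is a dead end
print(is_dead_end(0, 0, 1, forms, intrinsic, W))  # False
```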

  7. Motivation for X-DEE (Kloppmann et al., 2007) • Proteins are flexible systems which may adopt several functionally relevant states • Preference for a more complete picture of the available low energy states • X-DEE produces a gap-free list of low energy states (i.e., complete up to a given distance from the global energy minimum) • Implemented to determine the lowest energy protonation states of proteins

  8. X-DEE Intuition • General idea is to exclude a list of states from the search space explored by DEE in order to construct a gap-free list • Basic idea: If a gap-free list of k low energy states {x1, · · ·, xk} is already known, the (k + 1)th state can be found by restricting the search for the lowest energy state to the set of all states M excluding the set of already known states • More generally: restrict the DEE search space to the set M \ L, where M is the complete set of states and L is any given list of states to be excluded • If L is not gap-free, repeatedly identify the state of lowest energy not included in L and add it to L until a gap-free list of low energy states is obtained.
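
To make "gap-free" concrete, here is a brute-force check on a toy system. It enumerates every state, so it only illustrates the definition and is not how X-DEE works. The function name `is_gap_free` and the toy energy function are assumptions made for this sketch.

```python
from itertools import product


def is_gap_free(low_states, forms, energy):
    """True if no state outside `low_states` lies below the highest energy inside it."""
    known = set(low_states)
    if not known:
        return True
    ceiling = max(energy(x) for x in known)
    return all(energy(x) >= ceiling for x in product(*forms) if x not in known)


# Toy system (assumed values): three sites with two forms each.
forms = [(0, 1), (0, 1), (0, 1)]
intrinsic = [(0.0, 1.0), (0.0, 2.0), (0.0, 0.5)]


def energy(x):
    # Intrinsic terms plus a single pairwise interaction between sites 0 and 1.
    return sum(intrinsic[s][f] for s, f in enumerate(x)) + 0.3 * x[0] * x[1]


print(is_gap_free([(0, 0, 0), (0, 0, 1)], forms, energy))  # True
print(is_gap_free([(0, 0, 0), (1, 0, 1)], forms, energy))  # False: (0, 0, 1) is missing
```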

  9. Excluding a list of states from consideration • There is no straightforward way to exclude an arbitrary list of states L from the search space explored by DEE • So, we aim to restrict a DEE search to a specific type of subset of M: • Fixing a number of sites during a DEE search yields the state of lowest energy of a subset S of M characterized by the forms of the fixed sites • So, applying DEE to the subset S of those states that have form f at site s will determine the state of lowest energy with form f at site s • How do we do this?

  10. Constructing a Search Basis • The idea of X-DEE is to derive a search basis B composed of a set of search keys bS, such that L is excluded from the search and the complete set M \ L is searched. • The authors present a recursive procedure “CreateSearchBasis” which, given the list of states L to be excluded, constructs the search keys of the search basis • Initial conditions • List L of states to be excluded from the search • Associated list vector T that contains an element for each site and keeps track of the sites which are already fixed to specific forms • Initially, all sites are unfixed (i.e., undefined)

  11. Constructing a Search Basis: Overview • With each recursion, L is divided into sublists and one additional site is fixed in the associated list vectors. CreateSearchBasis terminates when all sites of a list vector are fixed. • With each recursion, search keys can be generated that differ from the list vector in one form. The search keys are added to the search basis B. • CreateSearchBasis generates a set of search keys bS characterizing subsets S whose union represents M \ L.

  12. Introducing Search Keys • This subset S can be represented by a so-called search key bS = (h1, ∗2, · · · , ∗μ, · · · , ∗N), where: • h is the specified form of site 1 and ∗ indicates that this site is undefined (the idea being undefined sites will be determined during the DEE search) • For each site μ of the system, these search keys have a component bμ which is either fixed to a specific form or undefined. • X-DEE will define search keys bS = (b1, · · · , bμ, · · · , bN) such that the subsets S represented by the individual search keys together represent M \ L. • Determining the state of lowest energy of all subsets via the DEE algorithm yields the desired state of lowest energy of M \ L.
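
One possible in-memory representation of a search key is a tuple with one entry per site, using `None` for an undefined site (∗). This encoding and the helper names `states_of` and `matches` are assumptions for illustration. The sketch lists the subset S that a key represents and tests whether a state belongs to it.

```python
from itertools import product

UNDEFINED = None  # stands for the "∗" entries of a search key


def states_of(key, forms):
    """All states in the subset S represented by `key`."""
    choices = [forms[s] if k is UNDEFINED else (k,) for s, k in enumerate(key)]
    return list(product(*choices))


def matches(key, state):
    """True if `state` agrees with `key` at every fixed site."""
    return all(k is UNDEFINED or k == x for k, x in zip(key, state))


# Three sites with two forms each; site 0 is fixed to form 1, the rest are undefined.
forms = [(0, 1), (0, 1), (0, 1)]
key = (1, UNDEFINED, UNDEFINED)

print(states_of(key, forms))     # the 4 states with form 1 at site 0
print(matches(key, (0, 1, 1)))   # False
```

A DEE search restricted to a key only has to choose forms for the undefined sites, which is why fixing sites shrinks the search space.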

  14. Recursive CreateSearchBasis (L, T) 1. Base case: Return if T does not contain any undefined sites. 2. Find an undefined site μ with unused forms (i.e., forms which are not present in any of the state vectors in L). If no such site exists, choose the first undefined site and jump to step 4.

  15. Recursive CreateSearchBasis (L, T) 3. Create search keys: For each unused form h of site μ, a search key b is defined by copying the list vector t to b and fixing site μ to form h in b; bμ = h. So, each search key differs from the current list vector only at site μ. Fixing site μ to a form h not occurring in L guarantees that the subset represented by b and L are disjoint, i.e., b represents a subset of M \ L. Now add b to the search basis B.

  17. Recursive CreateSearchBasis (L, T) 4. Divide the state vectors in L into sublists Lsub such that site μ has the same form g in all state vectors x in Lsub, i.e., xμ = g for all states in Lsub. To each sublist Lsub, a separate list vector tsub is assigned by copying list vector t to tsub and fixing site μ to the form g common to all state vectors in Lsub; (tsub)μ = g. 5. Recurse on each sublist Lsub and its list vector tsub.
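
The five steps can be sketched as a short recursive function. This is an illustrative reading of the slides, not the authors' implementation: states and keys are assumed to be tuples with `None` marking an undefined site, `forms[s]` lists the forms of site s, and the helper `expand` exists only to check the result by brute force.

```python
from itertools import product


def expand(key, forms):
    """All states in the subset represented by a search key (None = undefined site)."""
    choices = [forms[s] if k is None else (k,) for s, k in enumerate(key)]
    return set(product(*choices))


def create_search_basis(excluded, forms, t=None, basis=None):
    """Recursively collect search keys whose subsets together cover M minus L."""
    if t is None:
        t = (None,) * len(forms)          # initially all sites are undefined
    if basis is None:
        basis = []
    undefined = [s for s, f in enumerate(t) if f is None]
    if not undefined:                      # step 1: base case
        return basis
    # Forms of each undefined site that occur in the excluded states.
    used = {s: {x[s] for x in excluded} for s in undefined}
    # Step 2: prefer a site with unused forms, else take the first undefined site.
    site = next((s for s in undefined if len(used[s]) < len(forms[s])), undefined[0])
    # Step 3: one search key per unused form of the chosen site.
    for h in forms[site]:
        if h not in used[site]:
            basis.append(t[:site] + (h,) + t[site + 1:])
    # Steps 4 and 5: split the list by the form at the chosen site and recurse.
    for g in forms[site]:
        sublist = [x for x in excluded if x[site] == g]
        if sublist:
            create_search_basis(sublist, forms, t[:site] + (g,) + t[site + 1:], basis)
    return basis


forms = [(0, 1), (0, 1), (0, 1)]
L = [(0, 0, 0), (1, 0, 1)]
B = create_search_basis(L, forms)
covered = set().union(*(expand(b, forms) for b in B))

print(B)                                          # [(None, 1, None), (0, 0, 1), (1, 0, 0)]
print(covered == set(product(*forms)) - set(L))   # True
```

In this example three keys cover the six states of M \ L, and the first key alone accounts for four of them because site 1 has a form that never occurs in L.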

  18. Using the Search Keys • All search keys in B are subjected to a DEE search yielding the states of lowest energies of the represented subsets S. • These states include the state of lowest energy of M \ L. • The completeness of the search basis B is provable • The basic idea is to show that (i) all subsets S represented by the search keys are subsets of M \ L and (ii) the union of all subsets S represents the complete set M \ L
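
A sketch of how a set of search keys yields the lowest-energy state of M \ L. The per-key DEE search is replaced here by a brute-force minimum over the key's subset, which is an assumption made to keep the example short. The toy energy function, the hand-written basis (which covers M \ L for L = {(0, 0, 0), (1, 0, 1)}) and the function names are likewise illustrative.

```python
from itertools import product


def lowest_in_subset(key, forms, energy):
    """Stand-in for a DEE search restricted to one search key (None = undefined site)."""
    choices = [forms[s] if k is None else (k,) for s, k in enumerate(key)]
    return min(product(*choices), key=energy)


def lowest_outside_list(basis, forms, energy):
    """Lowest-energy state over all subsets represented by the search keys."""
    candidates = [lowest_in_subset(b, forms, energy) for b in basis]
    return min(candidates, key=energy)


# Toy system (assumed values): three sites with two forms each.
forms = [(0, 1), (0, 1), (0, 1)]
intrinsic = [(0.0, 1.0), (0.0, 2.0), (0.0, 0.5)]


def energy(x):
    # Intrinsic terms plus a single pairwise interaction between sites 0 and 1.
    return sum(intrinsic[s][f] for s, f in enumerate(x)) + 0.3 * x[0] * x[1]


# Search keys covering every state except (0, 0, 0) and (1, 0, 1).
basis = [(None, 1, None), (0, 0, 1), (1, 0, 0)]
best = lowest_outside_list(basis, forms, energy)

print(best, energy(best))  # (0, 0, 1) 0.5
```

Adding the returned state to L and rebuilding the basis repeats the process, which is how a list that is not yet gap-free can be filled in one state at a time.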

  19. X-DEE Application Domain • On the right: light absorption triggers bacteriorhodopsin’s pumping cycle, during which a proton is transferred from the cytoplasm to the extracellular space. • Basic idea: Proteins contain protonatable residues whose charged state depends on their interaction with the protein environment. • These protonatable residues are treated as sites, with each site adopting one of two forms (protonated, unprotonated).

  20. X-DEE Application Domain • Charge distribution of a protein is essential to its function • In proteins, not only the state of lowest energy but also the next higher protonation states are commonly significantly populated and often play a functional role

  21. X-DEE Performance Characteristics • The total number of search keys generated depends approximately linearly on the number of states in L, which influences the number of search keys in two different ways: • Each additional state in L increases the number of states to be excluded from the search and thereby tends to increase the number of generated keys • Each additional state in L decreases the search space M \ L and thereby tends to decrease the number of generated keys • Ultimately, the number of search keys will decrease with the number of states in L. However, as long as L is small compared to M \ L, an approximately linear increase of the total number of search keys can be observed

  22. X-DEE Performance Characteristics • Computational cost of X-DEE depends approximately linearly on the size of the system and the number of states to be excluded from the search • For low energy states which are built up one after the other, the computational cost to determine an additional state remains on average constant.

  23. Improved Pruning algorithms and Divide-and-Conquer strategies for Dead-End Elimination, with application to protein design Ivelin Georgiev, Ryan H. Lilien, Bruce R. Donald 2006

  24. DACS Motivation • DACS: a provably-accurate divide-and-conquer enhancement to traditional-DEE. • Protein design for a rigid backbone and using rotamers and a pairwise energy function is provably NP-hard • Desire for provable, deterministic algorithms which make real guarantees (as opposed to heuristic methods, Monte Carlo, genetic algorithms, etc)

  25. Traditional DEE • The DEE criterion uses rotameric energy interactions to identify and prune rotamers that are provably not part of the GMEC • A target rotamer is pruned if a competitor rotamer is found such that the best (lowest) possible energy among conformations containing the target rotamer is higher than the worst (highest) possible energy among conformations containing the competitor • DEE does not guarantee a unique solution: multiple unpruned conformations may remain after pruning with DEE is exhausted. • If this happens, the DEE pruning stage is followed by an enumeration stage, in which the remaining conformations are examined and the GMEC is identified, which takes exponential time • One improvement is to partition the search space
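
The two stages can be sketched as follows: a pruning loop that removes a target rotamer whenever its best-case energy exceeds a competitor's worst-case energy, followed by enumeration of whatever remains. Plain exhaustive enumeration is used here instead of a smarter search, and the data layout (`rotamers`, `self_e`, `pair_e`) and all function names are assumptions for illustration.

```python
from itertools import product


def pe(pair_e, i, r, j, s):
    """Pairwise energy of rotamer r at residue i with rotamer s at residue j."""
    return pair_e.get((i, r, j, s) if i < j else (j, s, i, r), 0.0)


def bound(i, r, rotamers, self_e, pair_e, pick):
    """Best-case (pick=min) or worst-case (pick=max) energy contribution of rotamer r."""
    return self_e[i][r] + sum(
        pick(pe(pair_e, i, r, j, s) for s in rotamers[j])
        for j in range(len(rotamers)) if j != i
    )


def dee_prune(rotamers, self_e, pair_e):
    """Repeat the pruning criterion until no further rotamer can be removed."""
    rotamers = [list(r) for r in rotamers]
    changed = True
    while changed:
        changed = False
        for i in range(len(rotamers)):
            for r in list(rotamers[i]):
                best_target = bound(i, r, rotamers, self_e, pair_e, min)
                if any(t != r and best_target > bound(i, t, rotamers, self_e, pair_e, max)
                       for t in rotamers[i]):
                    rotamers[i].remove(r)
                    changed = True
    return rotamers


def total_energy(conf, self_e, pair_e):
    n = len(conf)
    return (sum(self_e[i][conf[i]] for i in range(n))
            + sum(pe(pair_e, i, conf[i], j, conf[j])
                  for i in range(n) for j in range(i + 1, n)))


def gmec(rotamers, self_e, pair_e):
    """Pruning stage followed by the enumeration stage."""
    remaining = dee_prune(rotamers, self_e, pair_e)
    return min(product(*remaining), key=lambda c: total_energy(c, self_e, pair_e))


# Toy system (assumed values): two residues with two rotamers each.
rotamers = [[0, 1], [0, 1]]
self_e = [{0: 0.0, 1: 5.0}, {0: 0.0, 1: 1.0}]
pair_e = {(0, 1, 1, 0): -1.0, (0, 1, 1, 1): -2.0}

print(dee_prune(rotamers, self_e, pair_e))  # [[0], [0]]
print(gmec(rotamers, self_e, pair_e))       # (0, 0)
```

In this toy case pruning alone leaves a single conformation. When it does not, the final `min` over the remaining conformations is the exponential step mentioned above.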

  26. split-DEE and DACS • By partitioning the conformational search space, split-DEE enhances the pruning efficiency of traditional-DEE • In split-DEE, the conformational space can be divided into several partitions, such that for each partition, there is some competitor that has better conformational energies than a rotamer within that partition • The advantage of split-DEE is that no single competitor is required to outperform a rotamer for every conformation: as long as there exists a (possibly different) dominant competitor for each partition, the rotamer can be pruned • We can still do better: • DACS enhances split-DEE by performing DEE pruning within individual partitions

  30. DACS as an enhancement to split-DEE (Divide-And-Conquer Splitting) • As in split-DEE, the conformational space is divided into partitions • Within each partition, DEE pruning is applied to determine if there is a competitor rotamer at a residue that always outperforms our original rotamer • If DEE pruning does not produce a unique solution, enumeration of the conformations in the current partition must be performed by A* • The lowest-energy conformation among the local rigid-GMECs for all partitions is the overall rigid-GMEC
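
The partition-and-combine structure of these steps can be sketched as below. The per-partition work (DEE pruning followed by A* enumeration) is replaced by a brute-force stand-in, so only the divide-and-conquer skeleton is shown. Partitioning on the rotamers of a single residue, the data layout and all names are assumptions made for this sketch.

```python
from itertools import product


def total_energy(conf, self_e, pair_e):
    """Self energies plus pairwise energies; pair_e is keyed (i, r, j, s) with i < j."""
    n = len(conf)
    return (sum(self_e[i][conf[i]] for i in range(n))
            + sum(pair_e.get((i, conf[i], j, conf[j]), 0.0)
                  for i in range(n) for j in range(i + 1, n)))


def solve_partition(rotamers, self_e, pair_e):
    """Stand-in for DEE pruning plus A* enumeration inside one partition."""
    return min(product(*rotamers), key=lambda c: total_energy(c, self_e, pair_e))


def dacs_gmec(rotamers, split_pos, self_e, pair_e):
    """Fix each rotamer at `split_pos` in turn, solve the partition, keep the best."""
    local_gmecs = []
    for r in rotamers[split_pos]:
        partition = list(rotamers)
        partition[split_pos] = [r]       # this partition only contains rotamer r here
        local_gmecs.append(solve_partition(partition, self_e, pair_e))
    return min(local_gmecs, key=lambda c: total_energy(c, self_e, pair_e))


# Toy system (assumed values): three residues with two rotamers each.
rotamers = [[0, 1], [0, 1], [0, 1]]
self_e = [{0: 1.0, 1: 0.0}, {0: 0.0, 1: 0.5}, {0: 0.2, 1: 0.0}]
pair_e = {(0, 1, 1, 0): 2.0, (1, 1, 2, 1): -1.0}

print(dacs_gmec(rotamers, 0, self_e, pair_e))  # (1, 1, 1)
```

Each call to `solve_partition` is independent of the others, so the partitions could be solved in parallel.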

  31. split-Flags • The general advantage of DACS over split-DEE is the ability to prune an additional combinatorial subset of the conformational space by exploiting partition-specific prunings • The DEE pruning stage in DACS can incorporate any combination of the available provably-accurate traditional-DEE techniques • The split-flags (Gordon et al., 2003) algorithm has similar intent • If a target rotamer cannot be pruned for all partitions, the partitions in which it can be pruned are flagged as dead-ending. • Like DACS, split-flags uses pruning information discarded by split-DEE

  32. split-Flags vs DACS • One advantage of DACS over split flags stems from the divide-and-conquer paradigm. • The cost of expanding the A* search tree depends combinatorially on the number of rotamers for each residue position • A divide-and-conquer approach (which reduces the number of rotamers in each partition) is more efficient than directly finding the global solution • A bonus of divide-and-conquer approaches is that they are naturally parallelizable, reducing real-world running time

  33. min-DEE Overview • Used when the protein design process incorporates rotameric energy minimization (DEE no longer provably-accurate) • MinDEE is similar to traditional-DEE in that rotameric energy interactions are used to determine which rotamers are provably not part of the minGMEC and can be pruned. • MinDEE guarantees that no rotamers are pruned which belong to the conformation with the lowest energy among all energy-minimized conformations • Since rotamers are allowed to energy-minimize, lower and upper bounds on the self- and pairwise rotamer energies must be used, instead of the rigid-energy terms

  34. min-DEE vs. DEE • Without energy minimization, a rotamer stays in the same rigid conformation, independent of the rotamer identities for the remaining residues. • With energy minimization, a rotamer may minimize from its initial conformation in order to accommodate a change in another rotamer • So that one rotamer does not minimize into another, rotameric movement is constrained to a voxel of conformation space • The most significant difference between traditional-DEE and MinDEE is the accounting for possible energy changes during minimization

  35. DACS and minDEE • It’s straightforward to modify DACS to incorporate energy minimization • To only prune rotamers that are provably not part of the minGMEC, the traditional-DEE criteria in the DEE cycle of DACS must be discarded and their MinDEE equivalents used instead

  36. MinDEE/A* • Incorporates splitting, MinBounds (an approach analogous to that of Gordon et al., 2003, for traditional-DEE that remains provably correct with energy minimization), and DACS for MinDEE • A* is then applied in the enumeration stage to extract the minGMEC from the set of remaining conformations. • Similar to DACS, the lowest-energy conformation among the rigid-GMECs for all mutation sequences is identified as the overall rigid-GMEC

  37. DACS / MinDEE-A* Performance • Partition-specific prunings • By using a divide-and-conquer approach to partition the conformational space and identify partition-specific prunings, DACS allows for additional elimination after pruning with the original split-DEE and split flags techniques is exhausted. • Reduced cost of expanding A* search trees • The improved execution times of DACS stem from the reduced cost of expanding the A* search trees for each partition, resulting from the divide-and-conquer approach, as opposed to expanding the single A* tree for the full conformational space. • Increased pruning efficiency • MinDEE/A* benefits from increased pruning efficiency, and so works best on larger systems where the cost of expanding the search tree in the enumeration stage dominates the computation (rather than the energy minimization).
