An Extended Dead-End Elimination Algorithm to Determine Gap-Free Lists of Low Energy States

An Extended Dead-End Elimination Algorithm to Determine Gap-Free Lists of Low Energy States EDDA KLOPPMANN, G. MATTHIAS ULLMANN, TORSTEN BECKER

Improved Pruning Algorithms and Divide-and-Conquer Strategies for Dead-End Elimination, with Application to Protein Design Ivelin Georgiev, Ryan H. Lilien, Bruce R. Donald (2006)

Dead End Elimination Motivation • Structure determines function • Lowest free energy state is most probable by laws of thermodynamics • Direct calculation rarely possible So: • Conformation space is discretized • Allows for exhaustive search • Desire for an algorithm which deterministically finds the lowest energy state while circumventing combinatorial exhaustion

DEE Overview (Desmet et al., 1992) • Originally applied to predict side chain positions in homology modeling • Views proteins as a set of residues (sites), each of which may adopt a finite number of rotamers (forms) • DEE identifies high energy forms of sites which are incompatible with the state of lowest energy • These high energy forms are considered dead-ends and pruned from consideration

DEE Overview Continued • DEE solves the combinatorial problem of identifying the global energy minimum for a discrete pairwise system • Energy is expressed in terms of intrinsic energies of sites and pairwise interactions between sites • Each site adopts a discrete form that determines its contribution to the total energy
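The energy model described above can be sketched in a few lines. This is a minimal illustration, not code from either paper: the function name `total_energy`, the data layout (a list of intrinsic energies per site and a dictionary of pairwise matrices keyed by site pairs), and all numbers are assumptions chosen for the example.

```python
def total_energy(state, intrinsic, pairwise):
    """Energy of one state of a discrete pairwise system.

    state[i]                 -> form adopted by site i
    intrinsic[i][f]          -> intrinsic energy of site i in form f
    pairwise[(i, j)][f][g]   -> interaction of site i (form f) with
                                site j (form g), stored for i < j only
    """
    n = len(state)
    # Sum of the intrinsic energies of all sites.
    energy = sum(intrinsic[i][state[i]] for i in range(n))
    # Sum of the pairwise interactions, each pair counted once.
    for i in range(n):
        for j in range(i + 1, n):
            energy += pairwise[(i, j)][state[i]][state[j]]
    return energy


# Hypothetical system: three sites with two forms each (made-up numbers).
intrinsic = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
pairwise = {
    (0, 1): [[0.0, 0.3], [0.3, -1.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.4]],
    (1, 2): [[0.1, 0.0], [0.0, 0.0]],
}

print(total_energy((1, 1, 0), intrinsic, pairwise))
```

Because the energy is a sum of one-site and two-site terms only, changing the form of a single site changes only the terms that involve that site. DEE relies on this property.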

DEE Theory • DEE identifies and eliminates forms of sites which cannot contribute to the lowest energy conformation in order to circumvent an exhaustive search • The DEE criterion employs rotameric energy interactions to identify and prune rotamers that are provably not part of the global minimum energy conformation (GMEC) • The DEE criterion compares the energy of two forms of a site μ, dμ and cμ • If all states that contain dμ are higher in energy than the corresponding states that contain cμ, dμ is a dead end and is removed from consideration
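The comparison of "corresponding states" can be checked without enumerating any states, because all terms that do not involve site μ cancel. The sketch below uses this pairwise-difference form of the criterion (commonly attributed to Goldstein); the slides do not say which variant the authors implement, so treat the choice, the names `pair_energy` and `is_dead_end`, and the toy numbers as assumptions.

```python
def pair_energy(pairwise, i, fi, j, fj):
    """Look up the interaction of sites i and j given in either order.

    Matrices are stored only for i < j as pairwise[(i, j)][form_i][form_j].
    """
    if i < j:
        return pairwise[(i, j)][fi][fj]
    return pairwise[(j, i)][fj][fi]


def is_dead_end(site, d, c, intrinsic, pairwise, forms):
    """True if form d of `site` is worse than form c in every state.

    For each other site we take the form that is most favourable for d
    relative to c; if d still loses, it loses in all states.
    """
    gap = intrinsic[site][d] - intrinsic[site][c]
    for other in range(len(forms)):
        if other == site:
            continue
        gap += min(
            pair_energy(pairwise, site, d, other, f)
            - pair_energy(pairwise, site, c, other, f)
            for f in forms[other]
        )
    return gap > 0


# Hypothetical two-site system with two forms per site.
forms = [(0, 1), (0, 1)]
intrinsic = [[0.0, 5.0], [0.0, 0.0]]
pairwise = {(0, 1): [[0.0, 0.0], [1.0, -1.0]]}

print(is_dead_end(0, 1, 0, intrinsic, pairwise, forms))  # form 1 of site 0 vs form 0
print(is_dead_end(0, 0, 1, intrinsic, pairwise, forms))  # the reverse comparison
```

In this toy system form 1 of site 0 carries an intrinsic penalty of 5.0 that no interaction with site 1 can compensate, so it is reported as a dead end, while the reverse comparison is not.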

Motivation for X-DEE (Kloppmann et al., 2007) • Proteins are flexible systems which may adopt several functionally relevant states • Preference for a more complete picture of the available low energy states • X-DEE produces a gap-free list of low energy states (i.e., complete up to a given distance from the global energy minimum) • Implemented to determine the lowest energy protonation states of proteins

X-DEE Intuition • General idea: exclude a list of states from the search space explored by DEE in order to construct a gap-free list • Basic step: if a gap-free list of k low energy states {x1, · · ·, xk} is already known, the (k + 1)th state can be found by restricting the search for the lowest energy state to the set of all states M excluding the already known states • More generally: restrict the DEE search space to M \ L, where M is the complete set of states and L is any given list of states to be excluded • If L is not gap-free, repeatedly identify the state of lowest energy not included in L and add it to L until a gap-free list of low energy states is obtained
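The iteration described above can be illustrated on a system small enough to enumerate. In this sketch a brute-force minimum over M \ L stands in for the DEE search that X-DEE actually performs; the function name `gap_free_list` and the energy table are hypothetical.

```python
def gap_free_list(all_states, energy, k):
    """Build a gap-free list of the k lowest energy states.

    Each round searches M \\ L (all states minus those already listed)
    for its lowest energy state and appends it to L. The brute-force
    min() is only a stand-in for a DEE search over M \\ L.
    """
    listed = []
    for _ in range(k):
        remaining = [s for s in all_states if s not in listed]
        listed.append(min(remaining, key=energy))
    return listed


# Hypothetical energies for a system of two sites with two forms each.
energy_of = {(0, 0): 1.2, (0, 1): 0.0, (1, 0): 0.4, (1, 1): 2.5}

low_states = gap_free_list(list(energy_of), energy_of.get, 3)
print(low_states)
```

The list is gap-free by construction: every state not on the list has an energy at least as high as the last state added.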

Excluding a list of states from consideration • There is no straightforward way to exclude an arbitrary list of states L from the search space explored by DEE • So, we aim to restrict a DEE search to a specific type of subset of M: • Fixing a number of sites during a DEE search yields the state of lowest energy of a subset S of M characterized by the forms of the fixed sites • So, applying DEE to the subset S of those states that have form f at site s will determine the state of lowest energy with form f at site s • How do we do this?

Constructing a Search Basis • The idea of X-DEE is to derive a search basis B composed of a set of search keys bS, such that L is excluded from the search and the complete set M \ L is searched • The authors present a recursive procedure "CreateSearchBasis" which, given the list of states L to be excluded, constructs the search keys of the basis • Initial conditions: • List L of states to be excluded from the search • Associated list vector t that contains an element for each site and keeps track of the sites which are already fixed to specific forms • Initially, all sites are unfixed (i.e., undefined)

Constructing a Search Basis: Overview • With each recursion, L is divided into sublists and one additional site is fixed in the associated list vectors. CreateSearchBasis terminates when all sites of a list vector are fixed. • With each recursion, search keys can be generated that differ from the list vector in one form. The search keys are added to the search basis B. • CreateSearchBasis generates a set of search keys bS characterizing subsets S whose union represents M \ L.

Introducing Search Keys • This subset S can be represented by a so-called search key bS = (h1, ∗2, · · · , ∗μ, · · · , ∗N), where: • h is the specified form of site 1 and ∗ indicates that this site is undefined (the idea being undefined sites will be determined during the DEE search) • For each site μ of the system, these search keys have a component bμ which is either fixed to a specific form or undefined. • X-DEE will define search keys bS = (b1, · · · , bμ, · · · , bN) such that the subsets S represented by the individual search keys together represent M \ L. • Determining the state of lowest energy of all subsets via the DEE algorithm yields the desired state of lowest energy of M \ L.
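A search key can be represented as a tuple with a placeholder for undefined sites. The sketch below uses `None` for the ∗ entries and lists the subset S that a key stands for; the names `UNDEFINED`, `matches`, and `states_of_key` are illustrative and do not come from the paper.

```python
from itertools import product

UNDEFINED = None  # plays the role of "*" in a search key


def matches(key, state):
    """True if `state` belongs to the subset S represented by `key`."""
    return all(b is UNDEFINED or b == x for b, x in zip(key, state))


def states_of_key(key, forms):
    """Enumerate the subset S: fixed sites keep their form, undefined
    sites run over all of their forms."""
    choices = [forms[i] if b is UNDEFINED else (b,) for i, b in enumerate(key)]
    return list(product(*choices))


# Three sites with two forms each; the key fixes site 1 to form 1.
forms = [(0, 1), (0, 1), (0, 1)]
key = (1, UNDEFINED, UNDEFINED)

print(states_of_key(key, forms))
print(matches(key, (1, 0, 1)), matches(key, (0, 0, 1)))
```

In X-DEE the subset is never enumerated like this. The fixed components of the key are handed to DEE as constraints, and DEE determines the undefined sites.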

Recursive CreateSearchBasis (L, t)
1. Base case: return if the list vector t does not contain any undefined sites.
2. Find an undefined site μ with unused forms (i.e., forms which are not present at site μ in any of the state vectors in L). If no such site exists, choose the first undefined site and jump to step 4.
3. Create search keys: for each unused form h of site μ, a search key b is defined by copying the list vector t to b and fixing site μ to form h in b; bμ = h. So, each search key differs from the current list vector only at site μ. Fixing site μ to a form h not occurring in L guarantees that the subset represented by b and L are disjoint, i.e., b represents a subset of M \ L. Add b to the search basis B.
4. Divide the state vectors in L into sublists Lsub such that site μ has the same form g in all state vectors x in Lsub, i.e., xμ = g for all states in Lsub. To each sublist Lsub, a separate list vector tsub is assigned by copying list vector t to tsub and fixing site μ to the form g common to all state vectors in Lsub; tsub,μ = g.
5. Recurse on each sublist Lsub and its list vector tsub.
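The five steps can be sketched as a short recursive function. This is one reading of the slides rather than the authors' implementation: it assumes that step 2 only considers sites that are still undefined in the list vector, that L contains no duplicate states, and that `None` marks an undefined site. The helper `expand_key` exists only to check the result on a toy system.

```python
from itertools import product


def create_search_basis(L, t, forms, basis):
    """Append to `basis` search keys whose subsets together cover M \\ L."""
    undefined = [i for i, b in enumerate(t) if b is None]
    if not undefined:                      # step 1: base case
        return
    # Step 2: prefer an undefined site that has forms unused in L.
    site = undefined[0]
    for i in undefined:
        if any(f not in {x[i] for x in L} for f in forms[i]):
            site = i
            break
    used = {x[site] for x in L}
    # Step 3: one search key per unused form of the chosen site.
    for h in forms[site]:
        if h not in used:
            key = list(t)
            key[site] = h
            basis.append(tuple(key))
    # Steps 4 and 5: split L by the form at `site` and recurse.
    for g in sorted(used):
        L_sub = [x for x in L if x[site] == g]
        t_sub = list(t)
        t_sub[site] = g
        create_search_basis(L_sub, tuple(t_sub), forms, basis)


def expand_key(key, forms):
    """All states in the subset represented by `key` (checking only)."""
    choices = [forms[i] if b is None else (b,) for i, b in enumerate(key)]
    return set(product(*choices))


# Three sites with two forms each; exclude two states.
forms = [(0, 1), (0, 1), (0, 1)]
L = [(0, 0, 0), (0, 1, 1)]
basis = []
create_search_basis(L, (None, None, None), forms, basis)
print(basis)

covered = set().union(*(expand_key(k, forms) for k in basis))
everything = set(product(*forms))
print(covered == everything - set(L))
```

For this input the sketch produces three keys: one with site 1 fixed to the form unused in L, and two fully specified keys that pick out the remaining states sharing form 0 at site 1 with the excluded states.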

Using the Search Keys • All search keys in B are subjected to a DEE search yielding the states of lowest energy of the represented subsets S • These states include the state of lowest energy of M \ L • The completeness of the search basis B is provable • The basic idea is to show that (i) all subsets of states S represented by the search keys are subsets of M \ L and that (ii) the union of all subsets S represents the complete set M \ L
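A small worked example shows how the per-key minima combine. Here the excluded list L holds only the global minimum of a toy system, the three search keys are written out by hand for that L, and a brute-force search within each subset stands in for the DEE search. The energy model, the numbers, and all function names are assumptions made for this sketch.

```python
from itertools import product


def total_energy(state, intrinsic, pairwise):
    """Intrinsic energies plus pairwise interactions (pairs i < j)."""
    n = len(state)
    energy = sum(intrinsic[i][state[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pairwise[(i, j)][state[i]][state[j]]
    return energy


def lowest_state_for_key(key, forms, energy):
    """Lowest energy state of the subset S represented by one search key."""
    choices = [forms[i] if b is None else (b,) for i, b in enumerate(key)]
    return min(product(*choices), key=energy)


def lowest_state_outside_list(basis, forms, energy):
    """Lowest energy state of M \\ L: the best of the per-key winners."""
    winners = [lowest_state_for_key(key, forms, energy) for key in basis]
    return min(winners, key=energy)


# Hypothetical three-site system; (1, 1, 0) is its global minimum.
forms = [(0, 1), (0, 1), (0, 1)]
intrinsic = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
pairwise = {
    (0, 1): [[0.0, 0.3], [0.3, -1.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.4]],
    (1, 2): [[0.1, 0.0], [0.0, 0.0]],
}


def energy(state):
    return total_energy(state, intrinsic, pairwise)


# Search keys covering all states except L = [(1, 1, 0)].
basis = [(0, None, None), (1, 0, None), (1, 1, 1)]
second_state = lowest_state_outside_list(basis, forms, energy)
print(second_state)
```

The result is the second-lowest state of the full system, which is what X-DEE appends to the gap-free list in the next round.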

X-DEE Application Domain • Example: light absorption triggers bacteriorhodopsin's pumping cycle, during which a proton is transferred from the cytoplasm to the extracellular space • Basic idea: proteins contain protonatable residues whose charge state depends on their interaction with the protein environment • These protonatable residues are treated as sites, with each site adopting one of two forms (protonated, unprotonated)

X-DEE Application Domain • Charge distribution of a protein is essential to its function • In proteins, not only the state of lowest energy but also the next higher protonation states are commonly significantly populated and often play a functional role

X-DEE Performance Characteristics • The total number of search keys generated depends approximately linearly on the number of states in L, which influences the number of search keys in two opposing ways: • Each additional state in L increases the number of states to be excluded from the search and thereby tends to increase the number of generated keys • Each additional state in L decreases the search space M \ L and thereby tends to decrease the number of generated keys • Once L covers a large part of M, the number of search keys decreases as further states are added to L. However, as long as L is small compared to M \ L, an approximately linear increase of the total number of search keys is observed

X-DEE Performance Characteristics • Computational cost of X-DEE depends approximately linearly on the size of the system and the number of states to be excluded from the search • For low energy states which are built up one after the other, the computational cost to determine an additional state remains on average constant.

Improved Pruning Algorithms and Divide-and-Conquer Strategies for Dead-End Elimination, with Application to Protein Design Ivelin Georgiev, Ryan H. Lilien, Bruce R. Donald (2006)

DACS Motivation • DACS: a provably-accurate divide-and-conquer enhancement to traditional-DEE. • Protein design for a rigid backbone and using rotamers and a pairwise energy function is provably NP-hard • Desire for provable, deterministic algorithms which make real guarantees (as opposed to heuristic methods, Monte Carlo, genetic algorithms, etc)

Traditional DEE • The DEE criterion uses rotameric energy interactions to identify and prune rotamers that are provably not part of the GMEC • A target rotamer is pruned if a competitor rotamer is found such that the lowest possible energy among conformations containing the target rotamer is higher than the highest (worst) possible energy among conformations containing the competitor • DEE does not guarantee a unique solution: multiple unpruned conformations may remain after pruning with DEE is exhausted • If this happens, the DEE pruning stage is followed by an enumeration stage, in which the remaining conformations are examined and the GMEC is identified; this stage takes exponential time in the worst case • One improvement is to partition the search space
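The pruning test for traditional DEE compares a best case for the target against a worst case for the competitor, restricted to the terms that involve the residue in question (all other terms are shared). The sketch below is an illustration under assumed names and made-up energies, not code from the paper.

```python
def pair_energy(pairwise, i, ri, j, rj):
    """Pairwise term for residues i and j given in either order.

    Matrices are stored only for i < j as pairwise[(i, j)][rot_i][rot_j].
    """
    if i < j:
        return pairwise[(i, j)][ri][rj]
    return pairwise[(j, i)][rj][ri]


def can_prune(residue, target, competitor, intrinsic, pairwise, rotamers):
    """Traditional DEE test: prune `target` if its best-case energy is
    still higher than the worst-case energy of `competitor`."""
    best_target = intrinsic[residue][target]
    worst_competitor = intrinsic[residue][competitor]
    for other in range(len(rotamers)):
        if other == residue:
            continue
        best_target += min(
            pair_energy(pairwise, residue, target, other, s)
            for s in rotamers[other]
        )
        worst_competitor += max(
            pair_energy(pairwise, residue, competitor, other, s)
            for s in rotamers[other]
        )
    return best_target > worst_competitor


# Hypothetical two-residue system with two rotamers per residue.
rotamers = [(0, 1), (0, 1)]
intrinsic = [[0.0, 5.0], [0.0, 0.0]]
pairwise = {(0, 1): [[0.0, 0.0], [1.0, -1.0]]}

print(can_prune(0, 1, 0, intrinsic, pairwise, rotamers))
print(can_prune(0, 0, 1, intrinsic, pairwise, rotamers))
```

Because the minimum and maximum are taken independently for each neighbouring residue, this test is conservative: it never prunes a rotamer of the GMEC, but it can fail to prune rotamers that a tighter criterion would remove.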

split-DEE and DACS • By partitioning the conformational search space, split-DEE enhances the pruning efficiency of traditional-DEE • In split-DEE, the conformational space can be divided into several partitions, such that for each partition, there is some competitor that has better conformational energies than a rotamer within that partition • The advantage of split-DEE is that no single competitor is required to outperform a rotamer for every conformation: as long as there exists a dominant competitor for each partition (possibly a different one per partition), the rotamer can be pruned • We can still do better: • DACS enhances split-DEE by performing DEE pruning within individual partitions

DACS as an enhancement to split-DEE (Divide-And-Conquer Splitting) • As in split-DEE, the conformational space is divided into partitions • Within each partition, DEE pruning is applied to determine whether there is a competitor rotamer at a residue that always outperforms the original rotamer • If DEE pruning does not produce a unique solution, the conformations in the current partition must be enumerated with A* • The lowest-energy conformation among the local rigid-GMECs for all partitions is the overall rigid-GMEC
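The divide-and-conquer structure can be sketched as follows. The partitions are formed by fixing one residue to each of its rotamers in turn, which is an assumption about how the split is chosen. A brute-force minimum inside each partition stands in for the DEE pruning and A* enumeration that DACS actually performs, and the function name and energy table are hypothetical.

```python
from itertools import product


def dacs_gmec(rotamers, energy, split_residue):
    """Return the overall GMEC and the local GMEC of every partition.

    One partition is created per rotamer of `split_residue`. The min()
    over each partition is only a placeholder for DEE pruning followed
    by A* enumeration.
    """
    local_gmecs = []
    for r in rotamers[split_residue]:
        choices = [
            (r,) if i == split_residue else rotamers[i]
            for i in range(len(rotamers))
        ]
        local_gmecs.append(min(product(*choices), key=energy))
    # The best local GMEC is the GMEC of the whole conformational space.
    return min(local_gmecs, key=energy), local_gmecs


# Hypothetical energies for two residues with two rotamers each.
energy_of = {(0, 0): 1.2, (0, 1): 0.0, (1, 0): 0.4, (1, 1): 2.5}
rotamers = [(0, 1), (0, 1)]

gmec, local_gmecs = dacs_gmec(rotamers, energy_of.get, split_residue=0)
print(gmec, local_gmecs)
```

The partitions share no state, so the loop body could run in parallel with one worker per partition.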

split-Flags • The general advantage of DACS over split-DEE is the ability to prune an additional combinatorial subset of the conformational space by exploiting partition-specific prunings • The DEE pruning stage in DACS can incorporate any combination of the available provably-accurate traditional-DEE techniques • The split-flags (Gordon et al., 2003) algorithm has similar intent • If a target rotamer cannot be pruned for all partitions, the partitions in which it can be pruned are flagged as dead-ending. • Like DACS, split-flags uses pruning information discarded by split-DEE

split-Flags vs DACS • One advantage of DACS over split-flags stems from the divide-and-conquer paradigm • The cost of expanding the A* search tree depends combinatorially on the number of rotamers for each residue position • A divide-and-conquer approach (which reduces the number of rotamers in each partition) is more efficient than directly finding the global solution • A bonus of divide-and-conquer approaches is that they are naturally parallelizable, reducing real-world running time

min-DEE Overview • Used when the protein design process incorporates rotameric energy minimization (DEE no longer provably-accurate) • MinDEE is similar to traditional-DEE in that rotameric energy interactions are used to determine which rotamers are provably not part of the minGMEC and can be pruned. • MinDEE guarantees that no rotamers are pruned which belong to the conformation with the lowest energy among all energy-minimized conformations • Since rotamers are allowed to energy-minimize, lower and upper bounds on the self- and pairwise rotamer energies must be used, instead of the rigid-energy terms

min-DEE vs. DEE • Without energy minimization, a rotamer stays in the same rigid conformation, independent of the rotamer identities for the remaining residues. • With energy minimization, a rotamer may minimize from its initial conformation in order to accommodate a change in another rotamer • So that one rotamer does not minimize into another, rotameric movement is constrained to a voxel of conformation space • The most significant difference between traditional-DEE and MinDEE is the accounting for possible energy changes during minimization

DACS and minDEE • It’s straightforward to modify DACS to incorporate energy minimization • To only prune rotamers that are provably not part of the minGMEC, the traditional-DEE criteria in the DEE cycle of DACS must be discarded and their MinDEE equivalents used instead

MinDEE/A* • Incorporates splitting, MinBounds (an approach that remains provably correct with energy minimization, analogous to that of Gordon et al., 2003, for traditional-DEE), and DACS for MinDEE • A* is then applied in the enumeration stage to extract the minGMEC from the set of remaining conformations • Similar to DACS, the lowest-energy conformation among the rigid-GMECs for all mutation sequences is identified as the overall rigid-GMEC

DACS / MinDEE-A* Performance • Partition-specific prunings • By using a divide-and-conquer approach to partition the conformational space and identify partition-specific prunings, DACS allows for additional elimination after pruning with the original split-DEE and split-flags techniques is exhausted • Reduced cost of expanding A* search trees • The improved execution times of DACS stem from the reduced cost of expanding the A* search trees for each partition, as opposed to expanding the single A* tree for the full conformational space • Increased pruning efficiency • MinDEE/A* benefits from increased pruning efficiency, and so works best on larger systems where the cost of expanding the search tree in the enumeration stage (rather than the energy minimization) dominates the computation
