Systematic Conformational Search with Constraint Satisfaction

Lisa Tucker Kellogg, Ph.D. Thesis, Massachusetts Institute of Technology, June 2002.

Presentation Transcript


  1. Systematic Conformational Search with Constraint Satisfaction • Lisa Tucker Kellogg • Ph.D. Thesis, Massachusetts Institute of Technology • June 2002

  2. Protein Conformation • Proteins can have more than one shape • Three rotatable bonds per residue • Conformations: • Set of possible 3-D arrangements of atoms • Related to the protein folding problem: • Find the conformation with the lowest Gibbs free energy • Finding conformations is a high scientific priority

  3. NMR Constraints • Two types: • Distances between particular pairs of atoms • Dihedral angles for rotatable bonds • The ability to determine all conformations consistent with the constraints aids NMR error analysis and confidence measures • NMR is an important tool for solving structures • The algorithms can also be used with docking and homology modeling

  4. This is Hard! • Proteins can have 10^3 to 10^4 degrees of freedom • The size of conformation space is exponential in the number of degrees of freedom • Non-linear changes • Searching the whole space may be as hard as folding • GOAL: Provide methods to guarantee a minimum interval of coverage

  5. Conformational Search Algorithms • Usually consist of two parts: • Engine that generates conformations • Module to evaluate conformations • Systematic (exhaustive) • Search using predefined intervals, order • Guaranteed coverage • Slow • Stochastic • Simulated Annealing, Distance Geometry algorithms • Currently more popular • Creates good hypotheses quickly • Some algorithms rank conformations
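
The engine/evaluator split for a systematic search can be sketched as below. This is a minimal illustration, not the thesis code: `grid_search`, `toy_constraint`, and the 60° step are hypothetical names and values chosen for the example.

```python
import itertools


def grid_search(num_torsions, step_deg, satisfies):
    """Engine: enumerate every torsion combination on a regular grid.

    `satisfies` plays the role of the evaluation module and decides
    which conformations are kept.
    """
    angles = range(0, 360, step_deg)  # predefined interval and order
    return [conf
            for conf in itertools.product(angles, repeat=num_torsions)
            if satisfies(conf)]


def toy_constraint(conf):
    """Hypothetical stand-in for real constraints: the second torsion
    must be 120 degrees ahead of the first."""
    return (conf[1] - conf[0]) % 360 == 120


solutions = grid_search(num_torsions=2, step_deg=60, satisfies=toy_constraint)
```

Coverage is guaranteed at the grid spacing, but the number of evaluations grows as (360 / step)^num_torsions, which is why the slide labels this approach slow.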

  6. Cool Stuff in this Thesis • Improvements to Systematic Conformational Search • Voxel Model • Divide & Conquer • OmniMerge • Propagation • A* • Uses Systematic Conformational Search to solve a new structure • Goals: • Enumerate all conformations that satisfy set of constraints • Use a systematic method to guarantee interval of coverage • Invest in up-front computations to save time later

  7. Systematic Conformational Search • Basic Algorithms • Voxel Model • Minimization • Divide-and-Conquer

  8. Gridsearch

  9. TreeSearch

  10. Comparing Gridsearch & TreeSearch

  11. Comparing Gridsearch & TreeSearch

  12. Search Completeness • Want to rule out a higher-dimensional slab of space based on evaluating a lower-dimensional fragment • However, conformations that correspond to regularly spaced grid points don't capture everything
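
Ruling out a higher-dimensional slab from a lower-dimensional fragment is the pruning step of a tree search. The sketch below assumes a hypothetical `partial_ok` predicate that can judge a partially instantiated chain; the toy constraint and the function names are illustrative only.

```python
def tree_search(num_torsions, angles, partial_ok):
    """Depth-first search that instantiates one torsion at a time.

    A prefix rejected by `partial_ok` is never extended, so the whole
    slab of conformations sharing that prefix is ruled out at once.
    Returns the full solutions and the number of prefixes evaluated.
    """
    solutions = []
    evaluated = 0

    def extend(prefix):
        nonlocal evaluated
        evaluated += 1
        if not partial_ok(prefix):
            return  # prune everything below this fragment
        if len(prefix) == num_torsions:
            solutions.append(prefix)
            return
        for angle in angles:
            extend(prefix + (angle,))

    for angle in angles:
        extend((angle,))
    return solutions, evaluated


def toy_partial_ok(prefix):
    """Hypothetical constraint: each torsion is 120 degrees ahead of
    the previous one. Checkable on any prefix."""
    return all((b - a) % 360 == 120 for a, b in zip(prefix, prefix[1:]))


found, evaluated = tree_search(3, (60, 180, 300), toy_partial_ok)
```

Here 21 fragments are evaluated, most of them short, whereas a plain grid over the same three torsions would evaluate 27 full conformations.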

  13. Rotamers • Certain angles of rotation cause steric clashes • This divides the set of torsional angles into ranges • Rotamers refer to the likely neighborhoods of conformations for any molecular fragment • Patterns of low-energy conformations • Ranges of 60° ± ø, 180° ± ø, 300° ± ø • Have the option of searching rotamers first or calibrating the resolution to match regular intervals of rotamers • Treesearch uses [60°, 180°, 300°] • "Succeed first" approach

  14. Voxels • Unit of higher-dimensional volume • Analogous to a pixel in 2-D • New idea • This thesis is the first application of the voxel model • Evaluate voxels instead of points on a grid • Ask whether there exists any conformation in the voxel that satisfies the constraints • No general, perfectly accurate way to test whether a conformation exists in a volume • But we can do much better than just one point • Try to center voxels on rotamers

  15. Voxelized Treesearch

  16. Minimization as Search • Convert constraints to an objective function • Use minimization of the constraint violation function as a heuristic for searching within a voxel for satisfying conformations • Always start in the same place • Search for points that minimize the objective function • If sufficiently close to zero: SUCCESS • If max iterations reached: FAILURE • Choose local minimization for finding conformations in a voxel
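
The success/failure rule above can be sketched with any local minimizer. The version below uses a simple coordinate pattern search clamped to the voxel; that choice of minimizer, the tolerance, the iteration cap, and all names are assumptions made for the example, not details from the thesis.

```python
def minimize_in_voxel(violation, lower, upper, tol=1e-6, max_iters=200):
    """Look for a point inside the voxel [lower, upper] whose constraint
    violation is (nearly) zero. Returns (success, best_point)."""
    # Always start in the same place: the voxel center.
    point = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]
    step = max(hi - lo for lo, hi in zip(lower, upper)) / 4.0
    best = violation(point)
    for _ in range(max_iters):
        if best <= tol:
            return True, point  # sufficiently close to zero: SUCCESS
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                # Clamp so the search never leaves the voxel.
                trial[i] = min(upper[i], max(lower[i], trial[i] + delta))
                value = violation(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # refine the local search
    return best <= tol, point  # max iterations reached: FAILURE unless converged


def toy_violation(p):
    """Hypothetical objective: squared distance from torsions (70, 200)."""
    return (p[0] - 70.0) ** 2 + (p[1] - 200.0) ** 2


ok, where = minimize_in_voxel(toy_violation, [0.0, 120.0], [120.0, 240.0])
# A voxel that cannot contain the target torsion reports failure.
miss, edge = minimize_in_voxel(lambda p: (p[0] - 200.0) ** 2, [0.0], [120.0])
```

A failure here is only heuristic evidence that the voxel is empty, which matches the slide's caveat that no perfectly accurate test exists.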

  17. Minimization within Treesearch • First voxels will have one dimension, then two dimensions, etc. • Initialize starting conformations so the first d torsions keep the solution found previously • Only the (d+1)th torsion is chosen arbitrarily • If the first pass fails, start at the midpoint • If performing more than two passes, start at random points • This only works with minimization, as grid-based algorithms always start in the same place

  18. Multi-Resolution Search • Evaluating voxels using minimization followed by gridsearch • If low resolution search fails, search systematically at higher resolution • Start at 120°, then go to 30° • Much faster than performing gridsearch on entire space • Allows for stochastic nature of minimization

  19. Results • Tetrapeptide of polyalanine • Two tests with treesearch • Easily satisfied distance constraints • Extensive set of distance constraints • Random sampling & gridsearch on both problems • Minimization used on both • Compare different methods of voxel evaluation

  20. Voxel Evaluation • Minimization with voxels finds the most conformations.

  21. Evaluating Voxels with Tighter Constraints

  22. Divide-and-Conquer • Want to prune regions of conformational space based on evaluation of low-dimensional pieces • Improves on treesearch • Evaluates every piece before adding it • Once a subproblem is solved, the answer is saved • Can define subproblems so their average size is smaller than in treesearch

  23. Active Constraints • Constraints on a subchain denote set of constraints active when the torsions in only that subchain are instantiated • Satisfying conformation is one that satisfies all active constraints • Example: • Inter-atomic distance constraint on residue 6 • Applies to subchain 1-99 • Not subchain 3-5
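
Selecting the active constraints for a subchain can be sketched as a simple filter. This simplification assumes a constraint is active exactly when both of its residues fall inside the subchain; the `DistanceConstraint` record and its fields are hypothetical.

```python
from typing import NamedTuple


class DistanceConstraint(NamedTuple):
    """Hypothetical record: an upper bound on the distance between
    atoms located in two residues (which may be the same residue)."""
    residue_a: int
    residue_b: int
    max_distance: float  # angstroms, illustrative value only


def active_constraints(constraints, first, last):
    """Keep the constraints that can be checked when only the torsions
    of subchain first..last are instantiated (assumed here to mean that
    both residues lie inside the subchain)."""
    return [c for c in constraints
            if first <= c.residue_a <= last and first <= c.residue_b <= last]


on_residue_6 = DistanceConstraint(residue_a=6, residue_b=6, max_distance=5.0)
in_long_chain = active_constraints([on_residue_6], 1, 99)  # active
in_short_chain = active_constraints([on_residue_6], 3, 5)  # not active
```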

  24. Combine Operation • Use satisfying conformations for two pieces of a subchain to define possible candidates for the whole subchain • For residues x-z • with subchains x-y and (y+1)-z • Evaluate each candidate individually as a voxel with minimization • Higher-dimensional voxels are significantly more difficult • Each candidate is a unique voxel

  25. Merge-Trees • Each node corresponds to subchain • Root is whole molecule • Traverse starting at leaves • Legal Tree • Leaves have one-to-one correspondence with residues • Left-to-right order of leaves is same as residues • Each internal node has two children
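
The legality rules can be checked mechanically. The sketch below assumes a merge-tree is written as nested Python tuples with integer residue numbers at the leaves; that encoding and the function names are choices made for the example.

```python
def leaves(tree):
    """Left-to-right list of leaf residues. A leaf is an int; an
    internal node must unpack into exactly two children."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree  # raises if the node does not have two children
    return leaves(left) + leaves(right)


def is_legal(tree, num_residues):
    """True if the leaves are exactly residues 1..num_residues in order
    and every internal node has two children."""
    try:
        return leaves(tree) == list(range(1, num_residues + 1))
    except (TypeError, ValueError):
        return False


balanced = ((1, 2), (3, 4))
linear = (1, (2, (3, 4)))
```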

  26. D&C as Merge-Tree Traversal • Algorithm • Create default merge tree based on number of residues • Traverse tree from bottom to top • Solve subproblem at each node • Subproblem: • Enumerate satisfying voxels for subchain at node • Leaf node: use treesearch w/voxels, minimization • Internal node: combine operation on child subchains • Tree should be as balanced as possible
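
The traversal above can be sketched as a short recursion. This is an illustration under simplifying assumptions: one torsion per residue, voxels identified by their center angles, and hypothetical `leaf_search` and `subchain_ok` callbacks standing in for treesearch and voxel evaluation.

```python
import itertools


def balanced_tree(first, last):
    """Default merge-tree as nested tuples; a leaf is a residue number."""
    if first == last:
        return first
    mid = (first + last) // 2
    return (balanced_tree(first, mid), balanced_tree(mid + 1, last))


def solve(node, leaf_search, subchain_ok):
    """Bottom-up traversal. Returns (first, last, satisfying_voxels)."""
    if isinstance(node, int):
        return node, node, leaf_search(node)
    l_first, _, left = solve(node[0], leaf_search, subchain_ok)
    _, r_last, right = solve(node[1], leaf_search, subchain_ok)
    # Combine operation: every pairing of child solutions is a candidate.
    merged = [l + r for l, r in itertools.product(left, right)
              if subchain_ok(l_first, r_last, l + r)]
    return l_first, r_last, merged


def toy_leaf_search(residue):
    """Hypothetical leaf result: three rotamer-centered voxels."""
    return [(60,), (180,), (300,)]


def toy_subchain_ok(first, last, conf):
    """Hypothetical constraint: neighboring torsions differ by 120 degrees."""
    return all((b - a) % 360 == 120 for a, b in zip(conf, conf[1:]))


tree = balanced_tree(1, 4)
_, _, voxels = solve(tree, toy_leaf_search, toy_subchain_ok)
```

Because each child's answer is computed once and reused, the root only pairs conformations that already satisfy everything active inside each half.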

  27. Treesearch vs D & C (a) Treesearch (b) D&C, linear merge tree (c) D&C, balanced merge tree • Treesearch is like divide-and-conquer with a linear tree

  28. Effect of Divide-and-Conquer 1. Polyalanine tetrapeptide 2. 16-residue polyalanine helix

  29. Divide-and-Conquer Results • 1RST Peptide • 9-residue Strep-tag peptide from peptide-streptavidin complex • Long, flexible side chains • 40 rotatable torsions • Searched sidechains at 120° vs 40° for backbone

  31. Real World Results • Solving a new structure!

  32. Structure Experiment • Used systematic conformational search to solve new tripeptide • N-formyl-L-Met-L-Leu-L-Phe-OH • (f-MLF-OH) • NMR Data • 16 distances • 18 Torsion angle constraints • Simulated annealing found 24 conformations • New algorithm found 56,975 conformations

  33. …And the Rest of the Story • First fMLF structure different from the published one. • The constraints were slightly different • The "correct" structure was disallowed by the constraints. • Another completely different structure was allowed. • NMR folks tried some fixes: • Added error padding to all the constraints to allow the "right" answer • Came up with one more new constraint to rule out the "wrong" answer • Rethought their methods for processing the raw data • New analysis yielded constraints that were satisfied by the "right" answer and didn't allow any other answers.

  34. The Need for Systematic Search • If the NMR folks had used simulated annealing/distance geometry: • They never would have discovered the flaw in their data processing • This is the greatest real-world triumph of systematic search methods that the author knows of • Many of the better structural biologists are aware of the potential for this sort of thing and are therefore primed to embrace systematic methods

  35. Break! • When we return: • Merge Strategy Optimization

  36. Merge Strategy Optimization • How does choice of Merge-Tree influence runtime? • Low-dimensional searches are so much faster than high-dimensional searches, practically free • Do extra searches • Search all possible subchains of size n before searching size n+1 • Choose which subchains to merge • Adapt divide-and-conquer for constraint satisfaction • Invest computational time to save time later

  37. Innovations • Omni-Merge • Search all possible subchains • Merge-Tree cost functions • Find optimal merge-tree • Propagation • Enforces compatibility between overlapping subchains • Share information to filter out bad conformations • A* • Order subchain searches based on costs

  38. Importance of Choosing a Good Merge-Tree • Divide-and-conquer works best with equal sized subproblems • Not all subproblems have equal runtime • Constraints are not uniformly distributed • Default balanced-tree • Not always optimal • Can be exponentially worse than optimal

  39. Example • Polyalanine alpha-helix, N = 2^k residues • 2^k is ideal for default trees: k+1 levels • Constrain atom i and atom j if: • Residue of i is <= N/2 • Residue of j is > N/2 • No constraints within either half • Constraints won't appear until the final combine operation • High-dimensional, unconstrained subchains • Slow!

  40. Counter-Example • Same molecule • Same constraints • Residue of i is <= N/2 • Residue of j is > N/2 • No constraints within either half • Constraints won't appear until the final combine operations • All merges cross the boundary • Usually this tree is bad!

  41. Example - Results • Number of good conformations for a subchain is exponential in the length of the subchain • So the cost of searching this molecule with the default merge-tree is O(e^N)
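
A back-of-the-envelope version of this bound, assuming each unconstrained residue contributes c satisfying voxels (c is a hypothetical per-residue count introduced here, not a figure from the thesis) and writing S for the set of satisfying voxels of a subchain:

```latex
\begin{align*}
|S_{1..N/2}| &= c^{N/2}, \qquad |S_{N/2+1..N}| = c^{N/2} \\
\text{candidates at the root} &= |S_{1..N/2}| \cdot |S_{N/2+1..N}| = c^{N} = e^{N \ln c}
\end{align*}
```

Since no constraint is active inside either half, nothing is pruned before the final combine, so the root must evaluate a number of candidates that is exponential in N.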

  42. Choosing Good Merge-Trees • Considerations: • Small subchains can be searched more quickly than large ones • Constraints not distributed evenly • Constraints more important consideration than the fullness of tree • Want to construct merge-trees that reflect this • Divide and Conquer still useful for isolating highly constrained parts of molecule • Exploit Locality and Ordering

  43. Locality • Good merge-trees define sub-problems with as few satisfying conformations as possible • Want to define subchains with lots of constraints • Few satisfying conformations • If there is a constraint between atoms i and j • It is more likely that atoms near i and j will also be constrained • Subchains with short length are not necessarily the ones with the least number of satisfying conformations

  44. Ordering • Search order has a big effect on treesearch • We want to search subchains in similar manner • Add poorly-constrained subchains as late as possible • Put unconstrained chains near root

  45. Cost Function • Provide lower bound on run time • Cost of non-leaf from scratch is cost of all nodes in subtree

  46. Computing the Optimum Cost • Root just another internal node • TreeCost • Enumerate all possible merge-trees

  47. Computing the Optimum Merge Tree • Use dynamic programming • Computing table of BestTreeCosts gives optimal merge tree • Start at top, work down
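
The dynamic program can be sketched as a memoized top-down recursion over subchains, taking the cost of a tree to be the sum of its node costs as stated on the Cost Function slide. The per-node cost model in `toy_node_cost` is invented for the example, and the function names are hypothetical.

```python
from functools import lru_cache


def best_merge_tree(num_residues, node_cost):
    """Return (cost, tree) minimizing the summed node costs.

    `node_cost(i, j)` estimates the work at the node for subchain i..j.
    Trees are nested tuples with residue numbers at the leaves.
    """
    @lru_cache(maxsize=None)
    def best(i, j):  # one entry of the BestTreeCosts table
        if i == j:
            return node_cost(i, j), i
        options = []
        for k in range(i, j):  # split into i..k and k+1..j
            left_cost, left_tree = best(i, k)
            right_cost, right_tree = best(k + 1, j)
            options.append((left_cost + right_cost, k, (left_tree, right_tree)))
        children_cost, _, tree = min(options)  # ties go to the smallest k
        return children_cost + node_cost(i, j), tree

    return best(1, num_residues)


def toy_node_cost(i, j):
    """Hypothetical model: 3 voxels per residue, and a constraint between
    residues 3 and 4 cuts the count ninefold once both are present."""
    count = 3 ** (j - i + 1)
    return count // 9 if i <= 3 and j >= 4 else count


cost, tree = best_merge_tree(4, toy_node_cost)
```

In this toy case the optimum is the linear tree (1, (2, (3, 4))) with cost 25, beating the balanced tree's 31, because it merges the constrained pair first and adds the unconstrained residues near the root.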

  48. Omnimerge • Final combine operation consumes a large part of the total runtime • Can we benefit from extra lower-dimensional searches? • Search all subchains

  49. Locally Optimal Merges • Perform all possible dipeptide merges • (1-2, 2-3, 3-4, etc...) • Assume cost of extra work is insignificant • Won't help if we're making tetrapeptides • Gives us choices in making tripeptides • Choose cheaper merge: • 1 & 2-3 or 1-2 & 3

  50. Omnimerge • Search all possible subchains in order of increasing size • Choose combinations based on minimizing node cost • Add cheapest merge to D.P. table • Trivially fills in successive rows of BestTreeCosts table
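
A sketch of this bottom-up scheme follows. It assumes the node cost of a merge is the number of candidate pairings it would have to evaluate, and it reuses the toy setup of one torsion per residue; `omnimerge`, `toy_leaf_search`, and `toy_subchain_ok` are hypothetical names.

```python
import itertools


def omnimerge(num_residues, leaf_search, subchain_ok):
    """Solve every subchain in order of increasing size, each time using
    the split whose combine operation has the fewest candidates."""
    solutions = {(i, i): leaf_search(i) for i in range(1, num_residues + 1)}
    splits = {}
    for length in range(2, num_residues + 1):
        for i in range(1, num_residues - length + 2):
            j = i + length - 1
            # Cheapest merge: fewest (left, right) pairings to evaluate.
            k = min(range(i, j),
                    key=lambda s: len(solutions[i, s]) * len(solutions[s + 1, j]))
            splits[i, j] = k
            solutions[i, j] = [
                l + r
                for l, r in itertools.product(solutions[i, k], solutions[k + 1, j])
                if subchain_ok(i, j, l + r)
            ]
    return solutions[1, num_residues], splits


def toy_leaf_search(residue):
    """Hypothetical leaf result: three rotamer-centered voxels."""
    return [(60,), (180,), (300,)]


def toy_subchain_ok(first, last, conf):
    """Hypothetical constraint: residues 2 and 3 share the same torsion.
    It is only active once both residues are in the subchain."""
    if first <= 2 and last >= 3:
        return conf[2 - first] == conf[3 - first]
    return True


result, splits = omnimerge(3, toy_leaf_search, toy_subchain_ok)
```

For the tripeptide, the merge 1 & 2-3 needs 9 candidates while 1-2 & 3 needs 27, so the first is chosen; this is the choice described on the Locally Optimal Merges slide.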
