1 / 64

Robotics Algorithms for the Study of Protein Structure and Motion

Robotics Algorithms for the Study of Protein Structure and Motion. Jean-Claude Latombe Computer Science Department Stanford University. Protein. Long sequence of amino-acids (dozens to thousands), from a dictionary of 20 distinct amino-acids. Central Dogma of Molecular Biology.

dagmar
Download Presentation

Robotics Algorithms for the Study of Protein Structure and Motion

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Robotics Algorithms for the Study of Protein Structure and Motion Jean-Claude Latombe Computer Science DepartmentStanford University

  2. Protein Long sequence of amino-acids (dozens to thousands), from a dictionary of 20 distinct amino-acids

  3. Central Dogma of Molecular Biology Physiological conditions: aqueous solution, 37°C, pH 7, atmospheric pressure

  4. Why Proteins? • They are the workhorses of living organisms • They perform many vital functions, e.g.: • catalysis of reactions • storage of energy • transmission of signals • building blocks of muscles • They raise challenging computational issues • Large molecules (100s to several 1000s of atoms) • Made of building blocks drawn from a small “dictionary” • Unusual kinematic structure • They are associated with many critical problems • Folded structure determination • Global and local structural similarities • Prediction of folding and binding motions

  5. peptide group side-chain group f-y Kinematic Linkage Model

  6. Molecule and Robot

  7. Two problems • Structure determination from electron density maps • Inverse kinematics techniques [Itay Lotan, Henry van den Bedem, Ashley Deacon (Joint Center for Structural Genomics)] • Energy maintenance during Monte Carlo simulation • Collision detection techniques [Itay Lotan, Fabian Schwarzer, and Danny Halperin (Tel Aviv University)]

  8. NMR spectrometry Structure Determination/Prediction • Experimental tools • Computational tools • Homology, threading • Molecular dynamics X-ray crystallography

  9. Protein Data Bank Only about 10% of structures have been determined for known protein sequences  Protein Structure Initiative (PSI) 1990  250 new structures 1999  2500 new structures 2000  >20,000 structures total 2004  ~30,000 structures total

  10. X-Ray Crystallography

  11. Automated Model Building • Software systems: RESOLVE, TEXTAL, ARP/wARP, MAID • 1.0Å < d < 2.3Å ~ 90% completeness • 2.3Å ≤ d < 3.0Å ~ 67% completeness (varies widely)1 1.0Å 3.0Å JCSG: 43% of data sets  2.3Å •  Manually completing a model: • Labor intensive, time consuming • Existing tools are highly interactive  Model completion is high-throughput bottleneck 1Badger (2003) Acta Cryst. D59

  12. The Completion Problem • Input: • Electron-density map • Partial structure • Two anchor residues • Amino-acid sequence of missing fragment (typically 4 – 15 residues long) • Output: • Few candidate conformation(s) of fragment that • Respect the closure constraint (IK) • Maximize match with electron-density map

  13. T IK Problem • Input: • Closed kinematic chain with n > 6 degrees of freedom • Relative positions/orientations X of end frames • Target functionT(Q) → R • Output: • Joint angles Q that • Achieve closure • OptimizeT

  14. Related Work Biology/Crystallography • Exact IK solvers • Wedemeyer & Scheraga ’99 • Coutsias et al. ’04 • Optimization IK solvers • Fine et al. ’86 • Canutescu & Dunbrack Jr. ’03 • Ab-initio loop closure • Fiser et al. ’00 • Kolodny et al. ’03 • Database search loop closure • Jones & Thirup ’86 • Van Vlijman & Karplus ’97 • Semi-automatic tools • Jones & Kjeldgaard ’97 • Oldfield ’01 Robotics/Computer Science • Exact IK solvers • Manocha & Canny ’94 • Manocha et al. ’95 • Optimization IK solvers • Wang & Chen ’91 • Redundant manipulators • Khatib ’87 • Burdick ’89 • Motion planning for closed loops • Han & Amato ’00 • Yakey et al. ’01 • Cortes et al. ’02, ’04

  15. Two-Stage IK Method • Candidate generations Closed fragments • Candidate refinement Optimize fit with EDM

  16. Stage 1: Candidate Generation • Generate random conformation of fragment (only one end attached to anchor) • Close fragment (i.e., bring other end to second anchor) using Cyclic Coordinate Descent (CCD) (Wang & Chen ’91, Canutescu & Dunbrack ’03)

  17. moving end fixed end Closure Distance Closure Distance: A.A. Canutescu and R.L. Dunbrack Jr.Cyclic coordinate descent: A robotics algorithm for protein loop closure. Prot. Sci. 12:963–972, 2003. Compute + bias toward EDM + avoid steric clashes

  18. dq3 dq2 (q1,q2,q3) dq1 Stage 2: Candidate Refinement • Target function T (Q)measuring quality of the fit with the EDM • Minimize T while retaining closure • Closed conformations lie on a self-motion manifold of lower dimension Null space 1-D manifold

  19. y = T(Q) Closure and Null Space • dX = J dQ, where J is the 6n Jacobian matrix (n > 6) • Null space {dQ | J dQ = 0} has dim = n – 6 • N: orthonormal basis of null space • Pseudo-inverse J+ such that JJ+ = I • dQ = J+dX + NNTy

  20. 0 NT (n-6) basis N of null space s1 s2 Gram-Schmidt orthogonalization s6 Computation of J+ and N SVD of J S66 dX U66 VT6n dQ = J+ = V S+ UT where S+=diag[1/si]

  21. Refinement Procedure Repeat until minimum is reached: • Compute J, J+ and N at current Q • Compute T at current Q(analytical expression of T + linear-time recursive computation [Abe et al., Comput. Chem., 1984]) • Move along dQ = J+dX + NNT T until minimum is reached or closure is broken + Monte Carlo + simulated annealing protocol to deal with local minima

  22. Monte Carlo Optimization Repeat: • Perform a random move of the fragment: • either by picking a random direction in null space • or by using an exact IK solver over 6 dofs [Coutsias et al, 2004] ( big jumps) • Minimize T(Q) • Accept move with Metropolis-criterion probability ~exp(-DT/Temp)

  23. Tests #1: Artificial Gaps • TM1621 (234 residues) and TM0423 (376 residues), SCOP classification a/b • Complete structures (gold standard) resolved with EDM at 1.6Å resolution • Compute EDM at 2, 2.5, and 2.8Å resolution • Remove fragments and rebuild

  24. TM1621 103 Fragments from TM1621 at 2.5Å Short Fragments: 100% < 1.0Å aaRMSD Long Fragments: 12: 96% < 1.0Å aaRMSD 15: 88% < 1.0Å aaRMSD Produced by H. van den Bedem

  25. Comparison Across Resolutions Resolution = 2.0Å Resolution = 2.5Å Resolution = 2.8Å

  26. Example: TM0423 PDB: 1KQ3, 376 res. 2.0Å resolution 12 residue gap Best: 0.3Å aaRMSD

  27. Tests #2: True Gaps • Structure computed by RESOLVE • Gaps completed independently (gold standard) • Example: TM1742 (271 residues) • 2.4Å resolution; 5 gaps left by RESOLVE Produced by H. van den Bedem

  28. TM0813 PDB: 1J5X, 342 res. 2.8Å resolution 12 residue gap GLU-83 GLY-96

  29. TM0813 PDB: 1J5X, 342 res. 2.8Å resolution 12 residue gap Best 0.6Å aaRMSD GLU-83 GLY-96

  30. TM1621 • Green: manually completed conformation • Cyan: conformation computed by stage 1 • Magenta: conformation computed by stage 2 • The aaRMSD improved by 2.4Å to 0.31Å

  31. Alr1529 D72-D78 resolution: 2.0Å initial model: ARP/wARP contour: 1.0s PDB: 1VJG aaRMSD: 0.33Å

  32. TM0542 • Top-scoring fragment in cyan • Manually completed fragment in green • Residues A259 and A260 are flipped

  33. A B Current/Future Work • Software actively being used at the JCSG • What about multi-modal loops?

  34. A323 Hist A316 Ser • TM0755: data at 1.8Å • 8-residue fragment crystallized in 2 conformations • Overlapping density: Difficult to interpret manually Algorithm successfully identified and built both conformations

  35. A B Current/Future Work • Software actively being used at the JCSG • What about multi-modal loops? • Fuzziness in EDM can then be exploited • Use EDM to infer probability measure over the conformation space of the loop

  36. Amylosucrase J. Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran. J. Comp. Chemistry, 25:956-967, 2004

  37. Energy maintenance during Monte Carlo simulation joint work with Itay Lotan, Fabian Schwarzer, and Dan Halperin11 Computer Science Department, Tel Aviv University

  38. Monte Carlo Simulation (MCS) • Random walk through conformation space • At each attempted step: • Perturb current conformation at random • Accept step with probability: • The conformations generated by an arbitrarily long MCS are Boltzman distributed, i.e., #conformations in V ~

  39. Monte Carlo Simulation (MCS) • Used to: • sample meaningful distributions of conformations • generate energetically plausible motion pathways • A simulation run may consist of millions of steps  energy must be evaluated frequently Problem: How to maintain energy efficiently?

  40. Energy Function • E = S bonded terms + S non-bonded terms+S solvation terms • Bonded terms- O(n) • Non-bonded terms- E.g., e.g. Van der Waals and electrostatic- Depend on distances between pairs of atoms-O(n2)  Expensive to compute • Solvation terms-Mayrequire computing molecular surface

  41. Non-Bonded Terms • Energy terms go to 0 when distance increases  Cutoff distance (6 - 12Å) • vdW forces prevent atoms from bunching up Only O(n) interacting pairs[Halperin&Overmars 98] Problem: How to find interacting pairswithout enumerating all atom pairs?

  42. dcutoff Grid Method • Subdivide 3-space into cubic cells • Compute cell that contains each atom center • Represent grid as hashtable

  43. dcutoff Grid Method • Θ(n) time to build grid • O(1) time to find interactive pairs for each atom • Θ(n) to find all interactive pairs of atoms [Halperin&Overmars, 98] • Asymptotically optimal in worst-case

  44. 0 Number k of DOF changes 20 5 30 10 Can we do better on average? • Few DOFs are changed at each MC step simulationof 100,000 attempted steps

  45. Can we do better on average? • Few DOFs are changed at each MC step • Proteins are long chain kinematics • Long sub-chains stay rigid at each step • Many partial energy sums remain constant Problem: How to retrieve the unchanged partial sums?

  46. Hierarchical Collision Checking • Widely used technique in robotics/graphics to approximate distances between objects • Pre-computation of bounding-volume hierarchy • How to update this hierarchy if the objects deform

  47. Two New Data Structures • ChainTree Fast detection of interacting atom pairs • EnergyTree Retrieval of unchanged partial energy sums

  48. ChainTree(Twofold Hierarchy: BVs + Transforms) links

  49. TNO TJK TAB ChainTree(Twofold Hierarchy: BVs + Transforms) joints

  50. Updating the ChainTree Update path to root: • Recompute transforms that “shortcut” the DOF change • Recompute BVs that contain the DOF change • O(k log(n/k)) work for k changes

More Related