1 / 52

Application of Probabilistic Roadmaps to the Study of Protein Motion

Application of Probabilistic Roadmaps to the Study of Protein Motion. Proteins. Proteins are the workhorses of all living organisms They perform many vital functions, e.g: Catalysis of reactions Transport of molecules Building blocks of muscles Storage of energy Transmission of signals

dewei
Download Presentation

Application of Probabilistic Roadmaps to the Study of Protein Motion

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Application of Probabilistic Roadmaps to the Study of Protein Motion

  2. Proteins • Proteins are the workhorses of all living organisms • They perform many vital functions, e.g: • Catalysis of reactions • Transport of molecules • Building blocks of muscles • Storage of energy • Transmission of signals • Defense against intruders • They are large molecules (few 100s to several 1000s of atoms) • They are made of building blocks (amino acids) drawn from a small “library” of 20 amino-acids • They have an unusual kinematic structure: long serial linkage (backbone) with short side-chains

  3. O N N N N O O O Protein Sequence (residue i-1) • Long sequence of amino-acids (dozens to thousands), also called residues • Dictionary of 20 amino-acids (several billion years old)

  4. Central Dogma of Molecular Biology Physiological conditions: aqueous solution, 37°C, pH 7, atmospheric pressure

  5. Molecular motion is an essential process of life Mad cow disease is caused by mis-folding Drug molecules act bybinding to proteins

  6. Stanford BioX cluster NMR spectrometer So, studying molecular motion is of critical importance in molecular biology However, few tools are available • Computer simulation: • Monte Carlo simulation • Molecular Dynamics

  7. Motion occurs at very different frequencies HIV-1 protease Low-frequency motions (diffusive motions) are more directly related to protein functions

  8. Two Major Drawbacks of MD and MC Simulation • Each simulation run yields a single pathway, while molecules tend to move along many different pathways  Interest in ensemble properties

  9. Two Major Drawbacks ofMD and MC Simulation • Each simulation run yields a single pathway, while molecules tend to move along many different pathways • Each simulation run tends to waste much time in local minima

  10. (x4,y4,z4) (x5,y5,z5) (x6,y6,z6) (x8,y8,z8) (x7,y7,z7) (x1,y1,z1) Kinematic Models • Atomistic model: The position of each atom is defined by its coordinates in 3-D space (x3,y3,z3) (x2,y2,z2) p atoms  3p parameters Drawback: The bond structure is not taken into account

  11. Kinematic Models • Linkage model: The protein consists of atoms connected by rotatable bonds

  12. Roadmap-Based Representation • Compact representation of many motion pathways • Coarse resolution relative to MC and MD simulation ( only low-frequency motions are represented) • Efficient algorithms for analyzing multiple pathways

  13. Initial WorkA.P. Singh, J.C. Latombe, and D.L. Brutlag. A Motion Planning Approach to Flexible Ligand Binding. Proc. 7th ISMB, pp. 252-261, 1999 • Study of ligand-protein binding • The ligand is a small flexible molecule, but the protein is assumed rigid • A fixed coordinate system P is attached to the protein and a moving coordinate system L is defined using three bonded atoms in the ligand • A conformation of the ligand is defined by the position and orientation of L relative to P and the torsional angles of the ligand

  14. Roadmap Construction (Node Generation) • The nodes of the roadmap are generated by sampling conformations of the ligand uniformly at random in the parameter space (around the protein) • The energy E at each sampled conformation is computed: E = Einteraction + Einternal Einteraction = electrostatic + van der Waals potential Einternal = Snon-bonded pairs of atoms electrostatic + van der Waals

  15. P = if Emin E  Emax Roadmap Construction (Node Generation) • The nodes of the roadmap are generated by sampling conformations of the ligand uniformly at random in the parameter space (around the protein) • The energy E at each sampled conformation is computed: E = Einteraction + Einternal Einteraction = electrostatic + van der Waals potential Einternal = Snon-bonded pairs of atoms electrostatic + van der Waals • A sampled conformation is retained as a node of the roadmap with probability:0 if E > Emax Emax-E Emax-Emin 1 if E < Emin  Denser distribution of nodes in low-energy regions of conformational space

  16. qi qi+1 q q’ Emax Roadmap Construction (Edge Generation) • Each node is connected to its closest neighbors by straight edges • Each edge is discretized so that between qi and qi+1 no atom moves by more than some ε (= 1Å) • If any E(qi) > Emax , then the edge is rejected E

  17. qi qi+1 q q’ Heuristic measureof energetic difficultyor moving from q to q’ Roadmap Construction (Edge Generation) • Any two nodes closer apart than some threshold distance are connected by a straight edge • Each edge is discretized so that between qi and qi+1 no atom moves by more than some ε (= 1Å) • If all E(qi)  Emax , then the edge is retained and is assigned two weights w(qq’) and w(q’q) where: (probability that the ligand moves from qi to qi+1 when it is constrained to move along the edge)

  18. Querying the Roadmap • For a given goal node qg (e.g., binding conformation), the Dijkstra’s single-source algorithm computes the lowest-weight paths from qg to each node (in either direction) in O(NlogN) time, where N = number of nodes • Various quantities can then be easily computed in O(N) time, e.g., average weights of all paths entering qg and of all paths leaving qg(~ binding and dissociation rates Kon and Koff) Protein: Lactate dehydrogenase Ligand: Oxamate (7 degrees of freedom)

  19. active site Computation of Potential Binding Conformations • Sample many (several 1000’s) ligand’s conformations at random around protein • Repeat several times: • Select lowest-energy conformations that are close to protein surface • Resample around them • Retain k (~10) lowest-energy conformations whose centers of mass are at least 5Å apart lactate dehydrogenase

  20. Experiments on 3 Complexes • PDB ID: 1ldm Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues) Ligand: Oxamate (6 atoms, 7 dofs) • PDB ID: 4ts1 Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423 atoms, 319 residues) Ligand: L- leucyl-hydroxylamine (13 atoms, 9 dofs) • PDB ID: 1stp Receptor: Streptavidin (901 atoms, 121 residues) Ligand: Biotin (16 atoms, 11 dofs)

  21. Results for 1ldm • Some potential binding sites have slightly lower energy than the active site  Energy is not a discriminating factor • Average path weights (energetic difficulty) to enter and leave binding site are significantly greater for the active site  Indicates that the active site is surrounded by an energy barrier that “traps” the ligand

  22. Energy Conformation Potential bindingsite Potential bindingsite Active site

  23. Application of Roadmaps to Protein FoldingN.M. Amato, K.A. Dill, and G. Song. Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures. J. Comp. Biology, 10(2):239-255, 2003 • Known native state • Degrees of freedom: φ-ψ angles • Energy: van der Waals, hydrogen bonds, hydrophobic effect • New idea: Sampling strategy • Application: Finding order of SSE formation

  24. Sampling Strategy(Node Generation) • High dimensionality  non-uniform sampling • Conformations are sampled using Gaussian distribution around native state • Conformations are sorted into bins by number of native contacts (pairs of C atoms that are closeapart in native structure) • Sampling ends when all bins have minimum number of conformations  “good” coverage of conformational space

  25. Application: Order of Formation of Secondary Structures • The lowest-weight path is extracted from each denatured conformation to the folded one • The order of formation of SSE’s is computed along each path • The formation order that appears the most often over all paths is considered the SSE formation order of the protein

  26. Method • The contact matrix showing the time step when each native contact appears is built

  27. Protein CI2 (1a + 4 b)

  28. 60 5 The native contact between residues 5 and 60 appears at step 216 Protein CI2 (1a + 4 b)

  29. Method • The contact matrix showing the time step when each native contact appears is built • The time step at which a structure appears is approximated as the average of the appearance time steps of its contacts

  30. a forms at time step 122 (II) b3 and b4 come together at 187 (V) b2 and b3 come together at 210 (IV) b1 and b4 come together at 214 (I) a and b4 come together at 214 (III) Protein CI2 (1a + 4 b)

  31. Method • The contact matrix showing the time step when each native contact appears is built • The time step at which a structure appears is approximated as the average of the appearance time steps of its contacts

  32. SSE’s roadmap size 1a+4b 5126, 70k 5471, 104k 3a 7975, 104k 1a+4b 8357, 119k 1a+5b CI2 Comparison with Experimental Data

  33. vi Pij vj Stochastic RoadmapsM.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Comp. Biol., 10(3-4):257-281, 2003 New Idea:Capture the stochastic nature of molecular motion by assigning probabilities to edges

  34. Pii vi Pij Edge probabilities Follow Metropolis criteria: Self-transition probability: vj [Roadmap nodes are sampled uniformly at random and energy profilealong edges is not considered]

  35. Stochastic roadmap simulation and Monte Carlo simulation converge to the Boltzmann distribution, i.e., the number of times SRS is at a node in V converges towardwhen the number of nodes grows (and they are uniformly distributed) Stochastic Roadmap Simulation V Pij

  36. Roadmap as Markov Chain j Pij i • Transition probability Pij depends only on i and j

  37. 1- pfold pfold Example #1: Probability of Folding pfold Unfolded state Folded state

  38. U: Unfolded state F: Folded state =1 =1 First-Step Analysis • One linear equation per node • Solution gives pfold for all nodes • No explicit simulation run • All pathways are taken into account • Sparse linear system l k j Pik Pil Pij m Pim i Pii Let fi = pfold(i) After one step: fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm

  39. Number of Self-Avoiding Walks on a 2D Grid 1, 2, 12, 184, 8512, 1262816, 575780564, 789360053252, 3266598486981642, (10x10) 41044208702632496804, (11x11) 1568758030464750013214100, (12x12) 182413291514248049241470885236 > 1028 http://mathworld.wolfram.com/Self-AvoidingWalk.html

  40. In contrast … Computing pfold with MC simulation requires: Foreveryconformation q of interest • Perform many MC simulation runs from q • Count number of times F is attained first

  41. Computational Tests • 1HDD (Engrailed homeodomain) • 3 a helices • 12 DOF • 1ROP (repressor of primer) • 2 a helices • 6 DOF H-P energy model with steric clash exclusion [Sun et al., 95]

  42. Correlation with MC Approach 1ROP

  43. pfold for ß hairpin Immunoglobin binding protein (Protein G) Last 16 amino acids Cα based representation Go model energy function 42 DOFs [Zhou and Karplus, `99]

  44. Computation Times (ß hairpin) Monte Carlo (30 simulations): Over 107energy computations ~10 hours of computer time 1 conformation Roadmap: ~50,000energy computations 23 seconds of computer time 2000 conformations ~6 orders of magnitude speedup!

  45. Using Path Sampling to Construct RoadmapsN. Singhal, C.D. Snow, and V.S. Pande. Using Path Sampling to Build Better Markovian State Models: Predicting the Folding Rate and Mechanism of a Tryptophan Zipper Beta Hairpin, J. Chemical Physics, 121(1):415-425, 2004 New idea: Paths computed with Molecular Dynamics simulation techniques are used to create the nodes of the roadmap More pertinent/better distributed nodes  Edges are labeled with the time needed to traverse them

  46. ~dt Sampling Nodes from Computed Paths (Path Shooting) F U

  47. j tij pij i Example: Langevin dynamics equation of motion is where R is a Gaussian random force Sampling Nodes from Computed Paths (Path Shooting) F U

  48. 3 3 1 1 P12, t12 2 2’ P12’, t12’ P14, t14 4 P12’ = P12 + P14 t12’ = P12xt12 + P14xt14 5 5 Node Merging If two nodes are closer apart than some e, they are merged into one and merging rules are applied to update edge probabilities and times

  49. 3 3 1 1 P12, t12 2 2’ P12’, t12’ P14, t14 4 P12’ = P12 + P14 t12’ = P12xt12 + P14xt14 5 5 Node Merging If two nodes are closer apart than some e, they are merged into one and merging rules are applied to update edge probabilities and times  Approximately uniform distribution of nodes over the reachable subset of conformational space

  50. Application: Computation of MFPT • Mean First Passage Time: the average time when a protein first reaches its folded state • First-Step Analysis yields: • MPFT(i) = SjPijx(tij + MPFT(j)) • MPFT(i) = 0 if i  F • Assuming first-order kinetics, the probability that a protein folds at time t is: where r is the folding rate • MFPT = =1/r

More Related