The Probabilistic Roadmap Approach to Study Molecular Motion

The Probabilistic Roadmap Approach to Study Molecular Motion Jean-Claude Latombe Kwan Im Thong Hood Cho Temple Visiting Professor, NUS Kumagai Professor, Computer Science, Stanford

Molecular motion is an essential process of life CspA

Understanding molecular motion could help cure many diseases Mad cow disease is caused by misfolding Drug molecules act bybinding to proteins

Stanford BioX cluster NMR spectrometer As few experimental tools are available, computational tools are critical • Computer simulation: • Monte Carlo simulation • Molecular Dynamics

But MD and MC simulation have two major drawbacks • Each simulation run yields a single pathway, while molecules tend to move along many different pathways

Intermediate states But MD and MC simulation have two major drawbacks • Each simulation run yields a single pathway, while molecules tend to move along many different pathways

But MD and MC simulation have two major drawbacks • Each simulation run yields a single pathway, while molecules tend to move along many different pathways Interest in ensemble properties

1- pfold pfold Unfolded state Folded state Example of Ensemble Property: Probability of Folding pfold Measure kinetic distance to folded state

Other Examples of Ensemble Properties • Order of formation of secondary structure elements • Average time for a ligand to escape a binding site • Folding rate of a protein • Key intermediates along folding pathways • Etc ...

But MD and MC simulation have two major drawbacks • Each simulation run yields a single pathway, while molecules tend to move along many different pathways Interest in ensemble properties • Each simulation run tends to waste much time in local minima

Roadmap-Based Representation • Network of conformations connected by local motion pathways • Compact representation of huge number of motion pathways • Coarse resolution relative to MC and MD simulation • Efficient algorithms for analyzing multiple pathways

free space [Kavraki, Svetska, Latombe,Overmars, 95] Roadmaps for Robot Motion Planning

Initial Work:Application ofRoadmaps to Ligand BindingA.P. Singh, J.C. Latombe, and D.L. Brutlag. A Motion Planning Approach to Flexible Ligand Binding. Proc. 7th Int. Conf. on Intelligent Syst. for Molecular Biology (ISMB), pp. 252-261, 1999 • The ligand is modeled as a flexible molecule, but the protein is assumed rigid • A conformation of the ligand is defined by the position and orientation of a group of 3 atoms relative to the proteinand by the torsional angles of the ligand

P = if Emin E  Emax Roadmap Construction (Node Generation) • Conformations of the ligand are sampled at random around the protein • The energy E at each sampled conformation is computed: E = Einteraction + Einternal Einteraction = electrostatic + van der Waals potential Einternal = Snon-bonded pairs of atoms electrostatic + van der Waals • A sampled conformation is retained as a node with probability:0 if E > Emax Emax-E Emax-Emin 1 if E < Emin  Denser distribution of nodes in low-energy regions of conformational space

ε qi qi+1 q q’ Emax Roadmap Construction (Edge Generation) • Each node is connected to each of its closest neighbors by a straight edge • Each edge is discretized at some resolution ε (= 1Å) • If any E(qi) > Emax , then the edge is rejected E

ε qi qi+1 q q’ Heuristic measureof energetic difficultyof moving from q to q’ Roadmap Construction (Edge Generation) • Each node is connected to each of its closest neighbors by a straight edge • Each edge is discretized at some resolution ε (= 1Å) • If all E(qi)  Emax , then the edge is retained and is assigned two weights w(qq’) and w(q’q) where: (probability that the ligand moves from qi to qi+1 when it is constrained to move along the edge)

Querying the Roadmap • For a given goal node qg (e.g., binding conformation), the Dijkstra’s single-source algorithm computes the lowest-weight paths from qg to each node (in either direction) in O(NlogN) time, where N = number of nodes • Various quantities can then be easily computed in O(N) time, e.g., average weights of all paths entering qg and of all paths leaving qg(~ binding and dissociation rates Kon and Koff) Protein: Lactate dehydrogenase Ligand: Oxamate (7 degrees of freedom)

Experiments on 3 Complexes • PDB ID: 1ldm Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues) Ligand: Oxamate (6 atoms, 7 dofs) • PDB ID: 4ts1 Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423 atoms, 319 residues) Ligand: L- leucyl-hydroxylamine (13 atoms, 9 dofs) • PDB ID: 1stp Receptor: Streptavidin (901 atoms, 121 residues) Ligand: Biotin (16 atoms, 11 dofs)

active site Computation of Potential Binding Conformations • Sample many (several 1000’s) ligand’s conformations at random around protein • Repeat several times: • Select lowest-energy conformations that are close to protein surface • Resample around them • Retain k (~10) lowest-energy conformations whose centers of mass are at least 5Å apart lactate dehydrogenase

Results for 1ldm • Some potential binding sites have slightly lower energy than the active site  Energy is not a discriminating factor for recognizing active site • Average path weights (energetic difficulty) to enter and leave binding site are significantly greater for the active site  Indicates that the active site is surrounded by an energy barrier that “traps” the ligand

Application of Roadmaps to Protein FoldingN.M. Amato, K.A. Dill, and G. Song. Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures. J. Comp. Biology, 10(2):239-255, 2003 • Known native state • Degrees of freedom: φ-ψ angles • Energy: van der Waals, hydrogen bonds, hydrophobic effect • New idea: Sampling strategy

Sampling Strategy(Node Generation) • High dimensionality  non-uniform sampling • Conformations are sampled using Gaussian distribution around native state • Conformations are sorted into bins by number of native contacts (pairs of C atoms that are closeapart in native structure) • Sampling ends when all bins have minimum number of conformations  “good” coverage of conformational space

Application: Order of Formation of Secondary Structure Elements • The lowest-weight path is extracted from each denatured conformation to the folded one • The order of formation of SSE’s is computed along each path • The formation order that appears the most often over all paths is considered the SSE formation order of the protein

Order of Formation of Secondary Structures along a Path • The contact matrix showing the time step when each native contact appears is built

Protein CI2 (1a + 4 b)

The native contact between residues 5 and 60 appears at step 216 60 5 Protein CI2 (1a + 4 b)

Order of Formation of Secondary Structures along a Path • The contact matrix showing the time step when each native contact appears is built • The time step at which a structure appears is approximated as the average of the appearance time steps of its contacts

a forms at time step 122 (II) b3 and b4 come together at 187 (V) b2 and b3 come together at 210 (IV) b1 and b4 come together at 214 (III) Protein CI2 (1a + 4 b)

Application: Order of Formation of Secondary Structure Elements • The lowest-weight path is extracted from each denatured conformation to the folded one • The order of formation of SSE’s is computed along each path • The formation order that appears the most often over all paths is considered the SSE formation order of the protein

SSE’s roadmap size 1a+4b 5126, 70k 5471, 104k 3a 7975, 104k 1a+4b 8357, 119k 1a+5b Comparison with Experimental Data

vi Pij vj Stochastic RoadmapsM.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Comp. Biol., 10(3-4):257-281, 2003 New Idea:Capture the stochastic nature of molecular motion by assigning probabilities to edges

Pii vi Pij Edge Probabilities Follow Metropolis criteria: Self-transition probability: vj

Stochastic roadmap simulation and Monte Carlo simulation converge to the Boltzmann distribution, i.e., the number of times SRS is at a node in V converges toward Zwhen the number of nodes grows (and they are uniformly distributed) Stochastic Roadmap Simulation V Pij

Roadmap as Markov Chain j Pij i • Transition probability Pij depends only on i and j

1- pfold pfold Probability of Folding pfold Unfolded state Folded state

U: Unfolded state F: Folded state l k j Pik Pil Pij m Pim i Pii First-Step Analysis Let fi = pfold(i) After one step: fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm

U: Unfolded state F: Folded state =1 =1 First-Step Analysis • One linear equation per node • Solution gives pfold for all nodes • No explicit simulation run • All pathways are taken into account • Sparse linear system l k j Pik Pil Pij m Pim i Pii Let fi = pfold(i) After one step: fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm

Number of Self-Avoiding Walks on a 2D Grid 1, 2, 12, 184, 8512, 1262816, 575780564, 789360053252, 3266598486981642, (10x10) 41044208702632496804, (11x11) 1568758030464750013214100, (12x12) 182413291514248049241470885236 > 1028 http://mathworld.wolfram.com/Self-AvoidingWalk.html

In contrast … Computing pfold with MC simulation requires: Foreveryconformation q of interest • Perform many MC simulation runs from q • Count number of times F is attained first

Computational Tests • 1HDD (Engrailed homeodomain) • 3 a helices • 12 DOF • 1ROP (repressor of primer) • 2 a helices • 6 DOF H-P energy model with steric clash exclusion [Sun et al., 95]

pfold for ß hairpin Immunoglobin binding protein (Protein G) Last 16 amino acids Cα based representation Go model energy function 42 DOFs [Zhou and Karplus, `99]

Correlation with MC Approach 1ROP

Computation Times (ß hairpin) Monte Carlo (30 simulations): Over 107energy computations ~10 hours of computer time 1 conformation Roadmap: ~50,000energy computations 23 seconds of computer time 2000 conformations ~6 orders of magnitude speedup!

Using Path Sampling to Construct RoadmapsN. Singhal, C.D. Snow, and V.S. Pande. Using Path Sampling to Build Better Markovian State Models: Predicting the Folding Rate and Mechanism of a Tryptophan Zipper Beta Hairpin, J. Chemical Physics, 121(1):415-425, 2004 New idea: Paths computed with Molecular Dynamics simulation techniques are used to create the nodes of the roadmap More pertinent/better distributed nodes  Edges are labeled with the time needed to traverse them

~dt Sampling Nodes from Computed Paths (Path Shooting) F U

j tij pij i Sampling Nodes from Computed Paths (Path Shooting) F U

3 3 1 1 P12, t12 2 2’ P12’, t12’ P14, t14 4 P12’ = P12 + P14 t12’ = P12xt12 + P14xt14 5 5 Node Merging • If two nodes are closer apart than some e, they are merged into one  roadmap • Rules are applied to update edge probabilities and times

Application: Computation of MFPT • Mean First Passage Time: the average time when a protein first reaches its folded state • First-Step Analysis yields: • MPFT(i) = SjPijx(tij + MPFT(j)) • MPFT(i) = 0 if i  F • Assuming first-order kinetics, the probability that a protein folds at time t is: where r is the folding rate • MFPT = =1/r

Computational Test • 12-residue tryptophan zipper beta hairpin (TZ2) • Folding@Home used to generate trajectories (fully atomistic simulation) ranging from 10 to 450 ns • 1750 trajectories (14 reaching folded state) •  22,400-node roadmap • MFPT ~ 2-9 ms, which is similar to experimental measurements (from fluorescence and IR)

Conformational Analysis of Protein LoopsJ. Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran. Geometric Algorithms for the Conformational Analysis of Long Protein Loops. J. Comp. Chemistry, 25:956-967, 2004 New idea: Explore the clash-free subset of the conformational space of a loop, by building a tree-shaped roadmap Kinematic model: f-y angles on the backbone + ci torsional angles in side-chains

The Probabilistic Roadmap Approach to Study Molecular Motion

The Probabilistic Roadmap Approach to Study Molecular Motion

Presentation Transcript

Probabilistic Roadmap Methods (PRMs)

A Probabilistic Approach to Logic Equivalence Checking

A Probabilistic Approach to Personalized Tag Recommendation

Probabilistic Robotics: Motion and Sensing

Visibility based Probabilistic Motion Planning

A Probabilistic Approach to Personalized Tag Recommendation

Motion Planning for Car-like Robots using a Probabilistic Learning Approach

Molecular Motion Pathways: Computation of Ensemble Properties with Probabilistic Roadmaps

Application of Probabilistic Roadmaps to the Study of Protein Motion

Probabilistic Roadmap

A probabilistic XML approach to data integration

A probabilistic approach to microRNA-target binding

Standard: SPS5a - Molecular Motion

A probabilistic approach to exploring global dynamics

Molecular Motion

A Probabilistic Approach to Semantic Representation

A Probabilistic Approach to Vieta’s Formula

The Proper Approach To Bible Study

The Probabilistic Roadmap Approach to Study Molecular Motion

The Roadmap Approach to Regulating Digital Financial Services

Probabilistic Motion Planning