Exploring Folding Landscapes with Motion Planning Techniques

Exploring Folding Landscapes with Motion Planning Techniques Bonnie Kirkpatrick Montana State University Dr. Nancy Amato Guang Song Xinyu Tang Texas A&M University Parallel Architectures, Algorithms, and Optimizations Laboratory

Outline • Motivation: Biopolymers • Goal: Folding Landscapes • Method: Motion Planning • Application 1: Protein Folding • Application 2: RNA Folding

Motivation: Biopolymers

Protein and RNA Molecules • Protein and RNA molecules are a complex 3-dimensional folding of a sequence of bases. • Primary Structure • Sequence of bases • Each base is represented by a letter of the alphabet • i.e. ACGUGCCAUCG • Obtained from experiment • Tertiary Structure • The sequence loops back on itself and folds in 3-dimensions. Tertiary representation of an RNA molecule.

Tertiary Structure • Chemical bonds (or contacts) form between complementary bases in close proximity. • There are many possible conformationsof the primary sequence. • Example sequence: CACAGAGUGU • Two possible conformations are shown. • Potential energy calculations based on number and types of bonds are used to classify conformations. • The lowest energy conformation is known as the native structure. • Conformations with few bonds and high energy are referred to as unfolded. Two possible conformations of the sequence. Bonds are blue.

Goal: Folding Landscapes

The Folding Process (The Black Box) Unfolded Conformation (high energy) AGGCUACUGGGAGCCUUCUCCCC Physical Laws cause folding Native Conformation (low energy)

Folding Landscapes • Description of the “black box” • A space in which every point corresponds to a conformation (or set of conformations) and its associated potential energy value (C-space). • A complete folding landscape contains a point for every possible conformation of a given sequence. Conformation Space Potential Native State Tetrahymena Ribozyme Landscape [Russell, Zhuang, Babcock, Millett, Doniach, Chu, and Herschlag, 2002]

Folding Landscapes (cont.) • Conformational changesdescribe how a molecule changes physically to fold from one conformation to another • Continuous • Protein Folding Model • Bond angles change with continuous rotations • Discrete • RNA Folding Model • Bonds either exist or do not exist

Features of Folding Landscapes • Folding pathways consist of the set of conformational changes a molecule is likely to fold though when moving from one conformation to another. • N to X to Y • Energy barriers are areas of the landscape with high energy that separate groups of conformations. • Y is separated from X and N • Intermediate states are conformations lying on the folding pathway represent local minimums of potential. • Y and X Native State Mutant α mRNA fragment [Chen and Dill, 2000]

unfolded folded A Protein Folding Pathway

A RNA Folding Pathway Unfolded Energy Barrier Native State Phenylalanine tRNA [Hofacker, 1998]

Mapping Folding Landscapes • Existing techniques for mapping landscapes are limited to relatively short sequences (~200 nucleotides). • A robotics motion planning technique called PRM has successfully been applied to protein folding.

Method: Motion Planning

(Basic) Motion Planning (in a nutshell): Given amovableobject, find asequence of valid configurationsthat moves the object from the start to the goal. start goal obstacles Motion Planning Motion Planning for Foldable Objects: Given a foldable object, find a valid folding sequence that transforms the object from one folded state to another.

[Kavraki, Svestka, and Latombe, 1996] A conformation Probabilistic Roadmap Method (PRM) Conformation space Construct the roadmap: 1. Generate nodes. 2. Connect to form roadmap The Roadmap is like a net being laid down on protein’s potential landscape. Potential • Now the roadmap can be used: • To find a path • To extract multiple paths Native state

Application 1: Protein Folding

Outline • Probabilistic Roadmap Methods (PRMs) for Protein Folding • the native fold is known • bias sampling around native fold • Results • Protein folding landscapes • Secondary structure formation order and validation • timed contact map • Folding kinetics

[Song and Amato, 2001] • amino acid: pair of phi/psi angles • protein: a sequence of amino acids. • conformation node is: Model of a protein

PRM: Node Generation N • Take advantage of the known native state. • map the potential landscape/funnel leading to it. • sample around it and gradually grow out. • generate conformations by randomly selecting phi/psi angles • Criterion for accepting a node: Compute potential energy E of each node and retain it with probability P(E):

Native state PRM: Node Generation • Start with native structure. • Gradually grow out. Denser distribution around native state

Find k closest nodes for each roadmap node • Assign edge weight to reflect energetic feasibility: • lower weight  more feasible • [Singh, Latombe, Brutlag, 1999] Native state PRM: Roadmap Connection

where Energy Computation • Potential (ref. Levitt’83) • van der Waals + hydrogen bonds + disulphide bonds + hydrophobic effect • All-atom model • Free Energy (ref. Fiebig & Dill ’93, Munoz & Eaton’99)

PRMs for Protein Folding: Key Issues • Validation • In RECOMB ‘01 (Song & Amato), our results validated with hydrogen exchange experiments. [Li & Woodward 1999] • Energy Functions • The degree to which the roadmap accurately reflects folding landscape depends on the quality of energy calculation.

Analysis of Landscape • Folding Potential Landscape • Secondary structure formation order • timed contact map • experimental validation • Studying Folding Kinetics • 2-state folding kinetics • calculation of folding rates • identifying 2-state, 3-state, … k-state kinetics

Distributions for different types:Potential Energy vs. RMSD for roadmap nodes all alpha alpha + beta all beta

protein G (domain B1) 135 142 (IV:  1-4)   1-2 1-4 140 143 114 140 143 140 141 142 144 139 143 143 131  3-4 Timed Contact Map:formation order for a Path residue # residue #  Formation order: ,  3-4,  1-2,  1-4 Average T = 142

Protein GB1(56 amino acids) • 1 alpha helix & 4 beta-strands Hydrogen Exchange Results first helix, and beta 3-4 Our Paths 80%: helix, beta 3-4, beta 1-2, beta 1-4 20%: helix, beta 1-2, beta 3-4, beta 1-4 Validating Folding Pathways [Li & Woodward 1999] • our paths are: from all the nodes with little structure to the native state • secondary structure formation order checked on each path w/ timed contact map

Secondary structure formation order and validation • Proteins primarily from [Munoz & Eaton PNAS’99] for comparison purposes • Contact us if you want us to analyze your proteins!

F U N R Folding kinetics:statistical mechanical model • Proteins treated as statistical system • Define interactions => partition function • Free energy as a function of reaction coordinate (R) • Then decide folding kinetics and folding rate • [Munoz & Eaton, Alm & Baker. PNAS’99] • Assumption and limitation of statistical model • Very limited interactions to simplify partition function calculation • Assume selected reaction coordinate good (monotonically increasing) • Cannot provide folding trajectories • Strength: as a theoretical model, it is good for analysis

Free Energy Native Contacts Free Energy Landscape:2-state folding kinetics • Our method can produce similar results (plus more). • 2-state folding kinetics indicated • Both plots ‘imply’ nativelikeness should always increase • Plots like these lose info because of averaging effect our roadmap model statistical model [Munoz & Eaton PNAS’99] Blue, red and green lines are three levels of approximation Blue line: free energy average

cluster paths into several groups extract 2-state,3-state, …, k-state kinetics from same roadmap not possible with statistical mechanical models A B A B Average 3-state 2-state 2-state Studying folding kinetics at pathway level

Protein G Native contacts Free energy Studying folding kinetics at pathway level • trajectories not available from statistical model • native contact not monotonically increasing • Diverse free energy profiles

Protein Folding:Conclusion & Future Work • PRM roadmaps approximate folding landscapes • Efficiently produce multiple folding pathways • Secondary structure formation order • better than trajectory-based simulation methods, such as Monte Carlo, molecular dynamics • Provide a good way to study folding kinetics • multiple folding kinetics in same landscape (roadmap) • natural way to study the statistical behavior of folding • more realistic than statistical models (e.g. Lattice models, Baker’s model PNAS’99, Munoz’s model, PNAS’99)

Application 2: RNA Folding

Outline • RNA Model using secondary structure • Conformation Space • Node Generation • Node Evaluation • Node Connection • Edge Weights

RNA Secondary Structure • Two-dimensional representation of the tertiary structure • Planar representation • Sufficient structural information • Pseudo knots are considered a tertiary structure, rather than a secondary structure

Violates criteria (1) Violates criteria (2) Secondary Structure Formalized • A secondary structure conformation is specified by a set of intra-chain contacts (bonds between base pairs) that follow certain rules. • Given any two intra-chain contacts [i, j] with i < j and [k,l] with k < l, then: • If i = k, then j = l • Each base can appear in only one contact pair • If k < j, then i < k < l < j • No pseudo-knots

C-space Where n is the number of possible contact pairs PRM: Conformation Space • Let U be the set of every possible combination of contact pairs. • Let C-space (the conformation space), C, be the sub-set of U containing only valid secondary structures. • C-space is smaller than U, but is still very large. • Sequence: (ACGU)10 • Length: 40 nucleotides • C-Space: 1.6x108 structures • Purpose: generate nodes in C-space that describe the space without covering it

C-space PRM: Node Generation • Random Node Generation Algorithm • Starting with an empty configuration, c, random contacts are added to c one at a time. • Each step preserves the condition that c contains a valid set of base pair contacts. • Contacts are added until there are no more contacts that do not conflict with the contact set of c. • Every node generated has valid secondary structure and is a member of C-space. • Since every generated node has the maximal number of contacts, the sampling is biased toward the area of C-space near the native state.

PRM: Node Evaluation • Evaluation of Nodes • Potential energy determines how good a node is. • Only add a node to the roadmap if it has a low energy. • Probability of a node q being added to the roadmap:

… C1 = S1 S2 Sn-1 Sn = C2 PRM: Node Connection • Given two nodes in C-Space, C1 and C2, find a path between them consisting of a sequence of nodes: { C1 = S1, S2, …, Sn-1, Sn = C2 } • The path must have the property that for each i, 1 < i<n, the set of contact pair of Si differs from that of Si-1 by the application of one transformation operation: (1) open or (2) close a single contact pair.

Node Connection (cont.) • There exists a path between any two nodes in C-Space. • Not just any path will do; we want a good one. • Bad paths have high energy nodes in them. • How do we find the lowest energy path?

Node Connection (cont.) • more contacts  less potential energy • Heuristic: if a contact is opened by the transition from one node to another, try to close a contact in the next transition c1 = s1: ..(.((..))).. open s2: ..(.(....)).. close s3: ..(.((.).)).. open s4: ..(..(.)..).. close C2 = s5: ..(.((.)).)..

Edge Weight • Depends on the nodes generated in the node connection phase. • Difference in potential energy • ΔEi = E(si+1) – E(si)

Future Work • Analysis of the roadmap • Finding the low energy folding pathways • Shortest path algorithm • Validation • How do we know if our results agree with experimental results? • Proposal • Compare a fully enumerated roadmap to experimental folding rates • Compare a more sparse roadmap with a fully enumerated roadmap • Proposal • Solve the master equation using stacking pairs (they are representative of all the dynamics) and our model • Compare our results with results from the Zhang and Chen’s statistical mechanical model [2002]

References Shi-Jie Chen and Ken A. Dill. Rna folding energy landscapes. PNAS, 97:646-651, 2000. Ivo L. Hofacker. Rna secondary structures: A tractable model of biopolymer folding. J.Theor.Biol., 212:35-46, 1998. Ivo L. Hofacker Jan Cupal and Peter F. Stadler. Dynamic programming algorithm for the density of states of rna secondry structures. Computer Science and Biology 96, 96:184-186, 1996. L. Kavraki, P. Svestka, J. C. Latombe, and M. Overmars. Probabilistic roadmaps for path planning in high-dimensional conguration spaces. IEEE Trans. Robot. Automat., 12(4):566-580, August 1996. J. C. Latombe. Robot Motion Planning. Kluwer Academic Publishers, Boston, MA, 1991. R. Li and C. Woodward. The hydrogen exchange core and protein folding. Protein Sci., 8:1571-1591, 1999. John S. McCaskill. The equilibrium partition function and base pair binding probabilities for rna secondary structure. Biopolymers, 29:1105-1119, 1990. Ruth Nussinov, George Piecznik, Jerrold R. Griggs, and Danel J. Kleitman. Algorithms for loop matching. SIAM J. Appl. Math., 35:68-82, 1972. R. Russell, X. Zhuang, H. Babcock, I. Millet, S. Doniach, S. Chu, and D. Herschlag. Exploring the folding landscape of a structured RNA. Proc. Natl. Acad. Sci., 99:155-60., 2002. Proc. Natl. Acad. Sci. U.S.A. 99, 155-60. D. Sanko and J.B. Kruskal. Time warps, string edits and macromolecules: the theory and practice of sequence comparison. Addison Wesley, London, 1983. A.P. Singh, J.C. Latombe, and D.L. Brutlag. A motion planning approach to exible ligand binding. In 7th Int. Conf. on Intelligent Systems for Molecular Biology (ISMB), pages 252-261, 1999. G. Song and N. M. Amato. Using motion planning to study protein folding pathways. In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pages 287-296, 2001. Stefan Wuchty. Suboptimal secondary structures of rna. Master Thesis, 1998.

Exploring Folding Landscapes with Motion Planning Techniques

Exploring Folding Landscapes with Motion Planning Techniques

Presentation Transcript

Motion Planning

Exploring landscapes . . .

Using Motion Planning to Study Protein Folding Pathways

Motion Planning

Motion Planning with Visibility Constraints

Energy landscapes and folding dynamics

Motion Planning:

Using Motion Planning to Study Protein Folding Pathways

Motion Planning

Motion Planning

Napkin Folding Techniques

Motion Planning

Motion Planning

V5: Exploring energy landscapes for protein folding: by all-atom MD simulations

Exploring landscapes . . .

Using Motion Planning to Map Protein Folding Landscapes

Exploring Folding Landscapes with Motion Planning Techniques

Motion Planning

Motion Planning

Motion Planning