
Presented by: Michelle Cavallo I.B. PhD Student Advisor: Dr. R. Narayanan

Predicting Protein Folding Pathways. Zaki, Nadimpally, Bardhan, and Bystroff. Data Mining in Bioinformatics.


Presentation Transcript


  1. Predicting Protein Folding Pathways. Zaki, Nadimpally, Bardhan, and Bystroff. Data Mining in Bioinformatics. Presented by: Michelle Cavallo I.B. PhD Student Advisor: Dr. R. Narayanan

  2. Overview • Problem: • Identify a time-ordered sequence of folding events that make up a structured protein folding pathway • Solution: • Novel “unfolding” approach for predicting the folding pathway • Apply graph-based methods on a weighted secondary structure graph of a protein to predict the sequence of unfolding events • Reverse the event sequence to see the folding pathway • Experiments: • Successful predictions for proteins with partially known folding pathways

  3. Introduction • Proteins fold spontaneously and reproducibly in an aqueous solution • Structure is determined by sequence • Function is determined by structure Hemoglobin: a globular protein

  4. Protein Problems • Two major protein problems for bioinformatics: • The Structure Prediction Problem • Determine the 3D tertiary (3º) structure from the linear amino acid sequence • The Pathway Prediction Problem • Given an amino acid sequence and its 3D structure, determine the folding pathway that leads from the linear structure to the 3º structure • Major focus has been on structure prediction

  5. Structure Prediction • Traditional approaches to structure prediction have focused on: • Evolutionary homology • Fold recognition (goodness of fit score for sequence-structure alignment) • Ab initio simulations (conformational search for the lowest energy state) • Conformational search space is huge • Proteins fold in milliseconds—a structured folding pathway must play an important role in this conformational search • Experimental evidence does indicate that certain events always occur early in the folding process and certain others always occur later

  6. Towards Pathway-based Structure Prediction • To make pathway-based approaches to structure prediction a reality, plausible protein folding pathways need to be predicted. • The ability to predict folding pathways can greatly enhance structure prediction methods.

  7. Studying Folding Pathways • One approach to studying folding pathways is to identify folding possibilities in an unfolded protein. • This is infeasible because there are too many possibilities. • The approach used in this study is to start with a folded protein in its final state and learn how to “unfold” the protein. • The reversed unfolding sequence could then be a plausible protein folding pathway. • The solution: Use minimum cuts on weighted graphs to determine a plausible sequence of unfolding steps.

  8. Protein Contact Maps • A protein contact map represents the distance between every two residues of a 3D protein structure in a 2D matrix. • Represented in a symmetrical, square Boolean matrix of pairwise interresidue contacts • “A contact map for a protein with N residues is an N x N binary matrix C whose element C (i, j) = 1 if residues i and j are in contact and C (i, j) = 0 otherwise” • Protein contact maps can be created using different tools, e.g. BioPython, Structer
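
The contact map definition quoted on slide 8 can be illustrated with a minimal Python sketch. The function name `contact_map`, the use of one coordinate per residue, and the 7.0 Å distance cutoff are assumptions made for illustration; the slides do not state which atoms or threshold define a contact.

```python
import math

def contact_map(coords, threshold=7.0):
    """Build an N x N binary contact matrix C from per-residue coordinates.

    coords: list of (x, y, z) tuples, one per residue (assumed, e.g. one atom per residue).
    threshold: distance cutoff in angstroms (assumed value, not taken from the slides).
    C[i][j] = 1 if residues i and j are within the cutoff, else 0.
    The diagonal is left at 0 here; that is a convention choice.
    """
    n = len(coords)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                C[i][j] = C[j][i] = 1  # keep the matrix symmetric
    return C

# Three residues on a line: the first two are close, the third is far away.
example = contact_map([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)])
```

In practice the coordinates would be read from a PDB file with a structure parser; that step is omitted here.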

  9. Protein Contact Maps Figure 7.2 shows the 3º structure and contact map for IgG-binding protein from PDB

  10. Graphs and Minimum Cuts • A protein can be represented as a weighted secondary structure element graph (WSG) • Vertices = the SSEs that make up the protein • β-strands represented as triangles • α-helices represented as circles • Edges denote proximity relationships between SSEs • Edges weighted by strength of interactions between SSEs • Edge construction and weights are determined from the contact map
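
One way the WSG on slide 10 might be derived from a contact map is sketched below. The slides say only that edges and weights come from the contact map, so the weighting rule used here (counting residue contacts between two SSEs) is an assumption, as are the function name `build_wsg` and the SSE labels.

```python
def build_wsg(contacts, sses):
    """Build a weighted SSE graph as a dict {(sse_u, sse_v): weight}.

    contacts: N x N binary contact matrix (list of lists).
    sses: dict mapping an SSE label to the residue indices it covers.
    Assumed weighting: the number of residue-residue contacts between two SSEs.
    SSE pairs with no contacts get no edge.
    """
    names = list(sses)
    edges = {}
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            u, v = names[a], names[b]
            weight = sum(contacts[i][j] for i in sses[u] for j in sses[v])
            if weight > 0:
                edges[(u, v)] = weight
    return edges

# Toy example: five residues, three hypothetical SSEs.
C = [[0] * 5 for _ in range(5)]
for i, j in [(0, 2), (1, 2)]:
    C[i][j] = C[j][i] = 1
wsg = build_wsg(C, {"b1": range(0, 2), "a1": range(2, 4), "b2": range(4, 5)})
```

Here `b1` and `a1` share two contacts and so are joined by an edge of weight 2, while `b2` has no contacts and stays isolated.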

  11. Graphs and Minimum Cuts

  12. Solution/Approach Outline • Approach to predicting a folding pathway using the idea of “unfolding” • “Use a graph representation of a protein, where a vertex denotes a 2º structure and an edge denotes the interactions between the two SSEs” (2º structure elements). • Unfold the protein through a series of mincuts

  13. Unfolding via Mincuts • Unfold one piece at a time, each time choosing the cut which will have the least impact on the remaining structure • The sequence can then be reversed to identify plausible pathways for protein folding • This series of mincuts predicts the most likely sequence of unfolding events

  14. Unfolding via Mincuts • A mincut is the set of edges that partitions a WSG into two components with the smallest total interaction weight between them • The Stoer-Wagner (SW) deterministic polynomial-time mincut algorithm was used since it is simple and fast. • “The SW algorithm works iteratively by merging the vertices until only one unmerged vertex remains”

  15. Unfolding via Mincuts • “SW starts with an arbitrary vertex and adds the most highly connected vertex to the current set” • This process is repeated until all vertices have been added in order of decreasing attraction to the first
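
The merging procedure described on slides 14 and 15 could be sketched roughly as follows. This is a plain, unoptimized rendering of the Stoer-Wagner idea rather than the authors' implementation; the dense weight-matrix input and the name `stoer_wagner` are assumptions.

```python
def stoer_wagner(weights):
    """Global minimum cut of an undirected weighted graph.

    weights: symmetric matrix (list of lists); weights[u][v] is the edge weight, 0 if absent.
    Returns (cut_weight, side), where side lists the original vertices on one side of the cut.
    """
    n = len(weights)
    w = [row[:] for row in weights]        # working copy, modified by merges
    groups = [[v] for v in range(n)]       # original vertices inside each merged vertex
    active = list(range(n))
    best_weight, best_side = float("inf"), []

    while len(active) > 1:
        # One phase: start from an arbitrary vertex and repeatedly add the
        # vertex most strongly connected to the set built so far.
        attach = {v: 0 for v in active}
        remaining = list(active)
        order = []
        while remaining:
            v = max(remaining, key=lambda x: attach[x])
            remaining.remove(v)
            order.append(v)
            for u in remaining:
                attach[u] += w[v][u]
        s, t = order[-2], order[-1]

        # The "cut of the phase" separates the last-added vertex from the rest.
        if attach[t] < best_weight:
            best_weight, best_side = attach[t], groups[t][:]

        # Merge t into s and continue with one fewer vertex.
        groups[s].extend(groups[t])
        for u in active:
            if u != s and u != t:
                w[s][u] += w[t][u]
                w[u][s] = w[s][u]
        active.remove(t)

    return best_weight, sorted(best_side)

# Path graph 0 -(3)- 1 -(1)- 2 -(3)- 3: the weakest link is the 1-2 edge.
graph = [[0, 3, 0, 0],
         [3, 0, 1, 0],
         [0, 1, 0, 3],
         [0, 0, 3, 0]]
cut_weight, side = stoer_wagner(graph)
```

For the toy path graph the cut has weight 1 and splits the vertices into {0, 1} and {2, 3}.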

  16. Unfolding via Mincuts “An unfolding event is a set of edges that form a mincut in the WSG for a protein.”

  17. The UNFOLD Algorithm • Determine mincut for initial WSG • Break ties arbitrarily • Delete edges forming this cut from WSG • This yields two new connected subgraphs • Recursively process each subgraph to yield a sequence of mincuts corresponding to the unfolding events • Reverse this sequence to obtain predicted folding pathway
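
The recursive steps listed on slide 17 can be sketched in a few lines of Python. To keep the example short, the minimum cut is found by exhaustive search over bipartitions, which stands in for Stoer-Wagner and is only practical for small graphs. The names `unfold`, `min_cut`, and `cut_weight`, the edge-dictionary representation, and the SSE labels are all hypothetical.

```python
from itertools import combinations

def cut_weight(edges, side):
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in edges.items() if (u in side) != (v in side))

def min_cut(vertices, edges):
    """Brute-force minimum cut (stand-in for Stoer-Wagner, small graphs only).

    Ties are broken arbitrarily: the first minimum found is kept.
    """
    ordered = sorted(vertices)
    first, rest = ordered[0], ordered[1:]
    best = None
    for k in range(len(rest)):             # the side holding `first` never takes every vertex
        for extra in combinations(rest, k):
            side = {first, *extra}
            w = cut_weight(edges, side)
            if best is None or w < best[0]:
                best = (w, side)
    return best

def unfold(vertices, edges):
    """Return unfolding events as (cut_weight, side_a, side_b), outermost cut first."""
    vertices = set(vertices)
    if len(vertices) < 2:
        return []
    # Restrict to edges inside this subgraph; earlier cut edges are thereby deleted.
    sub = {e: w for e, w in edges.items() if e[0] in vertices and e[1] in vertices}
    weight, side = min_cut(vertices, sub)
    other = vertices - side
    event = (weight, sorted(side), sorted(other))
    return [event] + unfold(side, sub) + unfold(other, sub)

# Toy WSG with four hypothetical SSEs.
wsg = {("b1", "b2"): 5, ("b2", "a1"): 1, ("a1", "b3"): 4}
events = unfold({"a1", "b1", "b2", "b3"}, wsg)
folding_pathway = list(reversed(events))   # reverse unfolding to get the predicted folding order
```

In this toy graph the first unfolding event breaks the weak `b2`-`a1` link, so in the reversed sequence the two strongly bound pairs form first and join last.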

  18. The UNFOLD Algorithm • Sequence of mincuts that can be visualized as a tree • Nodes represent sets of vertices (graphs) produced by mincuts • Children of a node represent partitions resulting from the mincut

  19. Consideration • “Allowance should be made for several folding events to take place simultaneously. • However, there may be intermediate stages that must happen before higher order folding can take place.” • “The results should not be taken to imply a strict folding timeline, but rather as a way to understand major events that are mandatory in the folding pathway.”

  20. Experimentation • No one has determined a complete protein folding pathway • However, there is evidence supporting intermediate pathway stages for several well-studied proteins • Proteins with known intermediate pathway stages were analyzed with UNFOLD

  21. Detailed Test Case: 4DFR • Dihydrofolate Reductase (PDB ID: 4DFR) • Involved in nucleotide metabolism • Has an adenine binding domain which is formed (folded) early on in the folding pathway • An α1 and β2 interaction • 4DFR has four α-helices and eight β-strands.

  22. 4DFR Detailed Test Case Continued • Shown below are the WSG, unfolding sequence, and a series of intermediate stages in the folding pathway • “According to the mincut-based UNFOLD algorithm, the vertex set {β2α2β3β1} lies on the folding pathway in agreement with the experimental results.”

  23. 4DFR Detailed Test Case Continued Predicted folding sequence for 4DFR

  24. Pathways for Other Proteins • Several other proteins with known protein folding pathway intermediate stages were UNFOLDed • Bovine Pancreas Trypsin Inhibitor, Chymotrypsin Inhibitor 2, Human Procarboxypeptidase A2, Cell Cycle Protein p13suc1, β-lactoglobulin, Interleukin-1β, Protein Acylphosphatase, Twitchin Ig Superfamily Domain Protein, Myoglobin and leghemoglobin • UNFOLD results reflected experimental results

  25. Conclusion • A repeated mincut approach (the UNFOLD algorithm) can be used for automated prediction of protein folding pathways

  26. Future Perspectives • Plan to test UNFOLD on the entire collection of proteins in the PDB • Want to study proteins from the same family to look for prediction of consistent pathways • Similarities and dissimilarities are both of interest

  27. Limitations • “UNFOLD arbitrarily picks only one mincut out of perhaps several mincuts that have the same capacity” • Constructing all possible pathways might provide stronger evidence of intermediate states • “All native interactions are considered energetically equivalent, and thus larger stabilizing interactions are not differentiated.” • Simplified model based on topology • Folding mechanism inferred from native structure alone • May be ok, because investigations indicate folding mechanisms are largely determined by topology

  28. Biology Perspectives • “The ability to predict folding pathways can greatly enhance structure prediction methods” • We want to predict structures to assign putative functions to novel genes!

  29. Biology Perspectives Continued • It is very difficult to determine a protein structure in the lab • X-ray crystallography • Technique is difficult to perform • Results are difficult to interpret • We would like to have fast, easy methods for predicting structure in silico.

  30. Biology Perspectives Continued • Protein folding pathway prediction is of particular interest in prion research • Prions = misfolded proteins which cause transmissible spongiform encephalopathy • Creutzfeldt-Jakob Disease • Gerstmann-Sträussler-Scheinker Syndrome (GSS) • Fatal Familial Insomnia (FFI) • Kuru
