Protein Structure Prediction

Samantha Chui, Oct. 26, 2004. The central dogma of biology runs from DNA sequence to protein sequence (transcription and translation) to protein structure (folding). Question: given a protein sequence, to what conformation will it fold? How does nature do it?


Presentation Transcript


  1. Protein Structure Prediction Samantha Chui Oct. 26, 2004

  2. Central Dogma of Biology • DNA sequence → (transcription & translation) → Protein sequence → (folding) → Protein structure • Question: Given a protein sequence, to what conformation will it fold?

  3. How does nature do it? • Hydrophobicity vs. hydrophilicity • Van der Waals interaction • Electrostatic interaction • Hydrogen bonds • Disulfide bonds

  4. Current Approaches • Experimental methods: X-ray crystallography; NMR spectroscopy • Computational methods: • Homology modeling: similar sequences fold into similar structures • Threading: dissimilar sequences may fold into similar structures • Ab initio: no similarity assumptions; conformational search

  5. Assembly of sub-structural units [Diagram labels: protein sequence, known structures, fragment library, …, predicted structure]

  6. “Small Libraries of Protein Fragments Model Native Protein Structures Accurately”, Rachel Kolodny, Patrice Koehl, Leonidas Guibas, and Michael Levitt, 2002 • Goal: Find a finite set of protein fragments that can be used to construct accurate discrete conformations for any protein • Steps: (1) Generate fragments from known proteins (2) Cluster fragments to identify common structural motifs (3) Test library accuracy on proteins not in the initial set

  7. Datasets of protein fragments • 200 unique protein domains from the Protein Data Bank (PDB) • 36,397 residues • Four sets of backbone fragments: 4, 5, 6, and 7 residues long • Divide each protein domain into consecutive fragments beginning at a random initial position
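The fragment-generation step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_into_fragments` is hypothetical, and two details are assumptions of the sketch, namely that the random start lies in the first f residues and that leftover residues at the ends are dropped.

```python
import random

def split_into_fragments(chain, f, rng=None):
    """Cut a chain into consecutive, non-overlapping fragments of length f.

    chain: list of per-residue items (e.g. C-alpha coordinates).
    Assumptions of this sketch: the first fragment starts at a random
    offset in [0, f), and residues that do not fill a whole fragment
    are discarded.
    """
    rng = rng or random.Random()
    start = rng.randrange(f)
    return [chain[i:i + f] for i in range(start, len(chain) - f + 1, f)]
```

Running this once per fragment length (4, 5, 6, and 7) over every domain would give the four fragment sets.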

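  8. Fragment structural similarity • Coordinate root-mean-square (cRMS) deviation of Cα atoms • $\mathrm{cRMS}(A,B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^2}$, where $d_i$ is the distance between the i-th pair of matched atoms • One-to-one mapping between atoms in structure A and structure B • Translate and rotate to find the best alignment • Equals 0 if the structures superimpose perfectly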
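A minimal sketch of the cRMS computation, assuming NumPy is available. The slide says only "translate and rotate to find best alignment"; the Kabsch algorithm (SVD of the covariance matrix) is used here as one standard way to do that, not necessarily the method used in the paper. The function name `crms` is hypothetical.

```python
import numpy as np

def crms(a, b):
    """cRMS deviation between two (N, 3) C-alpha traces after optimal
    superposition. Uses the Kabsch algorithm (an assumption of this sketch)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matching shape")
    # Translate: move both centroids to the origin.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Rotate: optimal rotation from the SVD of the 3x3 covariance matrix.
    u, _, vt = np.linalg.svd(a.T @ b)
    # Force a proper rotation (determinant +1), never a reflection.
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    diff = a @ rot - b
    return float(np.sqrt((diff ** 2).sum() / len(a)))
```

Because reflections are excluded, a fragment and its mirror image get a nonzero cRMS, which matters for chiral backbone geometry.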

  9. Pruning and clustering • Outliers have a large cRMS deviation from all other fragments • Discard according to a fragment-length-specific threshold • k-means simulated annealing clustering • Repeatedly run k-means clustering, merge nearby clusters, and split dispersed clusters • Scoring function: total variance $= \sum (x - \mu)^2$ • Less sensitive to the initial choice of cluster centers than plain k-means
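The scoring function can be illustrated with plain k-means. This sketch deliberately omits the merge and split moves of the simulated-annealing variant described above, and it clusters points in ordinary Euclidean space rather than fragments under cRMS; both simplifications are assumptions made to keep the example short. All function names are hypothetical.

```python
import random

def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def assign(points, centers):
    """Put each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        clusters[j].append(p)
    return clusters

def kmeans(points, k, rng, iters=100):
    """Plain k-means from k randomly chosen starting centers."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centers)
        # An empty cluster keeps its previous center.
        updated = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)

def total_variance(centers, clusters):
    """Sum over all points of the squared distance to their cluster mean."""
    return sum(sq_dist(p, mu) for mu, c in zip(centers, clusters) for p in c)
```

A lower total variance means tighter clusters, so the score can be used to compare runs started from different seeds.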

  10. Compiling the libraries • Select cluster centroids as library entries: the fragment with the minimum sum of cRMS deviations from all other fragments in its cluster • Together they form a representative set of protein fragments • Library contents are highly dependent on the clustering procedure • For each set of fragments, start with 50 random seeds and choose the library with the minimal total variance score
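Both selection rules on this slide are simple minimizations, sketched below. The names `medoid`, `best_library`, and `cluster_once` are hypothetical; `dist` stands in for the cRMS deviation and `cluster_once(seed)` for one full clustering run that returns a library together with its total variance score.

```python
def medoid(cluster, dist):
    """Return the cluster member with the minimum summed distance to all others."""
    return min(cluster, key=lambda a: sum(dist(a, b) for b in cluster))

def best_library(cluster_once, n_seeds=50):
    """Run the clustering from n_seeds random seeds and keep the best result.

    cluster_once(seed) -> (library, total_variance) is a hypothetical callable.
    """
    return min((cluster_once(seed) for seed in range(n_seeds)),
               key=lambda result: result[1])
```

Choosing an actual cluster member rather than a coordinate average guarantees that every library entry is a real, physically plausible fragment.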

  11. Evaluating the quality of a library • Local-fit: how well the library fits the local conformation of all proteins in the test set • Global-fit: how well the library fits the global three-dimensional conformation of all proteins in the test set

  12. Local-fit method • Protein structures broken into set of all overlapping fragments of length f • Find for each protein fragment the most similar fragment in the library (cRMS) • Score = Average cRMS value over all fragments in all proteins in the test set
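The local-fit score can be written in a few lines. This is a sketch under stated assumptions: `dist` is a caller-supplied distance function standing in for cRMS, and the function names are hypothetical.

```python
def overlapping_fragments(chain, f):
    """All overlapping length-f windows of a chain."""
    return [chain[i:i + f] for i in range(len(chain) - f + 1)]

def local_fit_score(proteins, library, f, dist):
    """Average, over every fragment of every test protein, of the distance
    to the most similar library entry. dist is assumed to be cRMS."""
    best = [min(dist(frag, entry) for entry in library)
            for chain in proteins
            for frag in overlapping_fragments(chain, f)]
    return sum(best) / len(best)
```

The score is an average over fragments, not over proteins, so longer proteins contribute more terms.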

  13. Local-fit results

  14. Global-fit method • Concatenate best local-fit library fragments just found • Determine fragment’s orientation by superimposing its first three Cα atoms onto last three Cα atoms of preceding fragment

  15. Global-fit method • Number of possible sequences of fragments exponential in protein’s length • Greedy algorithm finds good rather than best global-fit approximation • Start at N terminus, approximate increasingly larger segments of the protein • Concatenate library fragment which will yield structure of minimal cRMS deviation from corresponding segment • Deterministic, linear time
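A rough sketch of the greedy build-up, assuming NumPy. Each new fragment is oriented by superimposing its first three Cα atoms onto the last three of the chain built so far, and the fragment giving the lowest cRMS against the corresponding target segment is kept. The function names are hypothetical, Kabsch superposition is an assumed choice, and for brevity the cRMS is recomputed over the whole prefix at every step, so this version is not linear time.

```python
import numpy as np

def superpose(mobile, target):
    """Return (rot, shift) so that mobile @ rot + shift best fits target."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    u, _, vt = np.linalg.svd((mobile - mc).T @ (target - tc))
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0   # proper rotation only
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return rot, tc - mc @ rot

def crms(a, b):
    rot, shift = superpose(a, b)
    return float(np.sqrt(((a @ rot + shift - b) ** 2).sum() / len(a)))

def greedy_global_fit(target, library, overlap=3):
    """target: (n, 3) array; library: list of (f, 3) arrays.
    Returns the built chain and the indices of the chosen fragments."""
    target = np.asarray(target, dtype=float)
    library = [np.asarray(frag, dtype=float) for frag in library]
    f = len(library[0])
    # Start at the N terminus with the fragment closest to the first f residues.
    first = min(range(len(library)), key=lambda j: crms(library[j], target[:f]))
    built, picks = library[first].copy(), [first]
    while len(built) + (f - overlap) <= len(target):
        best = None
        for j, frag in enumerate(library):
            # Orient the fragment on the last `overlap` atoms of the chain.
            rot, shift = superpose(frag[:overlap], built[-overlap:])
            candidate = np.vstack([built, (frag @ rot + shift)[overlap:]])
            score = crms(candidate, target[:len(candidate)])
            if best is None or score < best[0]:
                best = (score, j, candidate)
        _, j, built = best
        picks.append(j)
    return built, picks
```

Each step commits to one fragment and never backtracks, which is why the result is a good approximation rather than the best one.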

  16. Global-fit results • Global-fit deviations shown: 0.91 Å, 1.85 Å, 2.78 Å • Libraries shown: 50 fragments of 7 residues (2.66 states/residue); 100 fragments of 5 residues (10 states/residue); 20 fragments of 5 residues (4.47 states/residue)
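The states/residue figures on this slide are consistent with a simple formula: a library of s fragments of length f, chained with a three-residue overlap, adds f − 3 new residues per choice, giving s^(1/(f−3)) states per residue. This relation is inferred from the numbers shown and from the three-atom overlap in the global-fit method; the slides do not state it. The function name is hypothetical.

```python
def states_per_residue(library_size, fragment_length, overlap=3):
    """Library complexity, assuming each fragment adds
    (fragment_length - overlap) new residues to the chain."""
    return library_size ** (1.0 / (fragment_length - overlap))

for size, length in [(50, 7), (100, 5), (20, 5)]:
    print(size, length, round(states_per_residue(size, length), 2))
```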

  17. Assembly of sub-structural units [Diagram labels: protein sequence, known structures, fragment library, …, predicted structure]

  18. “Protein structure prediction via combinatorial assembly of sub-structural units”, Yuval Inbar, Hadar Benyamini, Ruth Nussinov, and Haim J. Wolfson, 2003

  19. CombDock • Input: structural units (SUs) with known 3D conformations • SUs considered rigid bodies • rotated and translated with respect to each other • Goal: predict overall structure • Constraints • Penetration: avoid steric clashes • Backbone: restriction on maximum distance between consecutive SUs

  20. All pairs docking • N(N-1)/2 pairs of SUs • Calculate candidate transformations by matching complementary local features on the surfaces of the SUs • Apply each transformation to the second SU of the pair • Keep the K best transformations for each pair • Cluster to ensure all K transformations yield significantly different complexes
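The bookkeeping for this step can be sketched as below. The surface-matching and clustering stages are not modeled: `candidates(i, j)` is a hypothetical callable that yields `(score, transformation)` pairs for SUs i and j, and the sketch assumes a higher score is better.

```python
import heapq
from itertools import combinations

def all_pairs_docking(n_units, candidates, k):
    """Keep the k best-scoring candidate transformations for each of the
    N(N-1)/2 unordered pairs of structural units.

    candidates(i, j) -> iterable of (score, transformation); hypothetical.
    """
    return {(i, j): heapq.nlargest(k, candidates(i, j), key=lambda c: c[0])
            for i, j in combinations(range(n_units), 2)}
```

The resulting dictionary maps each pair to its K parallel edges, which is the input the assembly stage needs.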

  21. Combinatorial assembly • Multigraph representation • Vertices = SUs • Edges = transformations between two SUs • K parallel edges between any two vertices • Final protein conformation = spanning tree • N SUs, one connected component, no cycles • [Figure: vertices i, j, k joined by parallel edges 1, 2, …, K; the transformation between i and k is induced by the transformations (ij, jk)]

  22. Combinatorial Assembly • $N^{N-2}K^{N-1}$ different spanning trees • Not all spanning trees are valid complexes • Use a heuristic algorithm • Two subtrees are adjacent iff there exists an index i such that vertex i is in one subtree and vertex i+1 is in the other • Sequential tree (recursive definition): either a single vertex, or a tree with an edge that connects two adjacent sequential trees
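Two small helpers make the size of the search space and the adjacency rule concrete. The count follows from Cayley's formula (N^(N-2) labeled trees on N vertices) with K choices for each of the N-1 edges. The function names are hypothetical.

```python
def spanning_tree_count(n, k):
    """Number of spanning trees of the complete multigraph on n vertices
    with k parallel edges per vertex pair: n^(n-2) * k^(n-1)."""
    if n < 1:
        raise ValueError("need at least one vertex")
    if n == 1:
        return 1
    return n ** (n - 2) * k ** (n - 1)

def adjacent(a, b):
    """True iff some index i lies in one vertex set and i + 1 in the other."""
    return any(i + 1 in b for i in a) or any(i + 1 in a for i in b)
```

Even for ten SUs with 100 transformations per pair the count is 10^26, which is why exhaustive enumeration is ruled out.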

  23. Combinatorial Assembly • Hierarchical algorithm of N stages • ith stage: generate sequential trees with i vertices • Construct trees by connecting adjacent sequential trees of smaller sizes generated earlier • Keep D best sequential trees at each step • Discard trees which do not meet backbone and penetration constraints • Score = sum of scores of transformations
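The hierarchical stage structure can be sketched with abstract scores. Several points are assumptions of this sketch rather than details from the slides: higher scores are better, the D best trees are kept per contiguous range of vertices, and `is_valid` is a hypothetical stand-in for the backbone and penetration checks. Since adjacent sequential trees always merge into a contiguous index range, trees are indexed by `(lo, hi)`.

```python
import heapq
from itertools import product

def assemble(n, edges, d, is_valid=lambda tree: True):
    """Hierarchical assembly sketch.

    edges[(u, v)] with u < v: list of (score, label) candidate transformations.
    Returns up to d best (score, tree) pairs spanning vertices 0..n-1,
    where tree is a sorted tuple of (u, v, label) edges.
    """
    # Stage 1: every single vertex is a sequential tree with score 0.
    best = {(v, v): [(0.0, ())] for v in range(n)}
    for size in range(2, n + 1):          # stage i builds trees with i vertices
        for lo in range(n - size + 1):
            hi = lo + size - 1
            found = {}                    # tree -> score, removes duplicates
            for mid in range(lo, hi):     # split into two adjacent subtrees
                pairs = product(best[(lo, mid)], best[(mid + 1, hi)])
                for (s1, t1), (s2, t2) in pairs:
                    for u, v in product(range(lo, mid + 1), range(mid + 1, hi + 1)):
                        for score, label in edges.get((u, v), []):
                            tree = tuple(sorted(t1 + t2 + ((u, v, label),)))
                            if is_valid(tree):
                                found[tree] = s1 + s2 + score
            ranked = heapq.nlargest(d, found.items(), key=lambda item: item[1])
            best[(lo, hi)] = [(score, tree) for tree, score in ranked]
    return best[(0, n - 1)]
```

Pruning to D trees per stage keeps the work polynomial, at the cost of possibly discarding a subtree that would have belonged to the best overall complex.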

  24. Combinatorial Assembly

  25. CombDock Results

  26. Conclusion • Experimental methods: X-ray crystallography; NMR spectroscopy • Computational methods: • Homology modeling: similar sequences fold into similar structures • Threading: dissimilar sequences may fold into similar structures • Ab initio: no similarity assumptions; conformational search • [Diagram labels: protein sequence, known structures, fragment library, …, predicted structure]

  27. References • Kolodny et al., “Small libraries of protein fragments model native protein structures accurately”, 2002 • Inbar et al., “Protein structure prediction via combinatorial assembly of sub-structural units”, 2003
