Protein Structural Prediction - PowerPoint PPT Presentation

Presentation Transcript

  1. Protein Structural Prediction

  2. Protein Structure is Hierarchical

  3. Structure Determines Function The Protein Folding Problem • What determines structure? • Energy • Kinematics • How can we determine structure? • Experimental methods • Computational predictions

  4. Primary Structure: Sequence • The primary structure of a protein is the amino acid sequence

  5. Primary Structure: Sequence • Twenty different amino acids have distinct shapes and properties

  6. Primary Structure: Sequence A useful mnemonic for the hydrophobic amino acids is "FAMILY VW"
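The mnemonic can be turned into a quick check on a sequence. The sketch below is only an illustration: the function name `hydrophobic_fraction` is made up here, and it treats exactly the eight one-letter codes in "FAMILY VW" as hydrophobic, as the slide does.

```python
# Hydrophobic residues taken from the "FAMILY VW" mnemonic (one-letter codes).
HYDROPHOBIC = set("FAMILYVW")


def hydrophobic_fraction(sequence: str) -> float:
    """Return the fraction of residues in `sequence` that are in FAMILY VW."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return sum(residue in HYDROPHOBIC for residue in seq) / len(seq)
```

For example, `hydrophobic_fraction("AK")` counts alanine but not lysine.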

  7. Secondary Structure: α, β, & loops • α helices and β sheets are stabilized by hydrogen bonds between backbone oxygen and hydrogen atoms

  8. Secondary Structure: α helix

  9. Secondary Structure: β sheet • β sheet • β bulge

  10. Second-and-a-half-ary Structure: Motifs • beta helix • beta barrel • beta trefoil

  11. Tertiary Structure: Domains

  12. Mosaic Proteins

  13. Tertiary Structure: A Protein Fold

  14. Protein Folds • Composed of α, β, other

  15. Quaternary Structure: Multimeric Proteins or Functional Assemblies • Multimeric Proteins • Hemoglobin: a tetramer • Macromolecular Assemblies • Ribosome: protein synthesis • Replisome: DNA copying

  16. Protein Folding • The amino-acid sequence of a protein determines the 3D fold [Anfinsen et al., 1950s] Some exceptions: • All proteins can be denatured • Some proteins have multiple conformations • Some proteins get folding help from chaperones • The function of a protein is determined by its 3D fold • Can we predict 3D fold of a protein given its amino-acid sequence?

  17. The Levinthal Paradox • Given a small protein (100 aa), assume 3 possible conformations per peptide bond • $3^{100} \approx 5 \times 10^{47}$ conformations • Fastest motions take $10^{-15}$ sec, so sampling all conformations would take $5 \times 10^{32}$ sec • 60 × 60 × 24 × 365 = 31536000 seconds in a year • Sampling all conformations would take $1.6 \times 10^{25}$ years • Yet each protein folds quickly into a single stable native conformation: the Levinthal paradox
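The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch using the slide's own assumptions (100 residues, 3 conformations per peptide bond, one conformation sampled every 10^-15 seconds); the function and parameter names are illustrative only.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000, as on the slide


def levinthal_estimate(n_residues=100, states_per_bond=3, seconds_per_sample=1e-15):
    """Return (conformations, total seconds, total years) for exhaustive sampling."""
    conformations = states_per_bond ** n_residues      # exact integer, 3**100
    total_seconds = conformations * seconds_per_sample
    total_years = total_seconds / SECONDS_PER_YEAR
    return conformations, total_seconds, total_years


conformations, seconds, years = levinthal_estimate()
print(f"{float(conformations):.1e} conformations")
print(f"{seconds:.1e} seconds, or {years:.1e} years")
```

Changing `n_residues` shows how quickly the estimate grows: each added residue multiplies the search time by 3.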

  18. Quick Overview of Energy

  19. The Hydrophobic Effect • Important for folding, because every amino acid participates! • Experimentally determined hydrophobicity levels: Fauchère and Pliska (1983). Eur. J. Med. Chem. 18, 369-75.

  20. Protein Structure Determination • Experimental • X-ray crystallography • NMR spectroscopy • Computational – Structure Prediction (The Holy Grail) • Sequence implies structure, therefore in principle we can predict the structure from the sequence alone

  21. Protein Structure Prediction • ab initio • Use just first principles: energy, geometry, and kinematics • Homology • Find the best match to a database of sequences with known 3D-structure • Threading • Meta-servers and other methods

  22. Ab initio Prediction • Sampling the global conformation space • Lattice models / Discrete-state models • Molecular Dynamics • Pre-set libraries of fragment 3D motifs • Picking native conformations with an energy function • Solvation model: how protein interacts with water • Pair interactions between amino acids • Predicting secondary structure • Local homology • Fragment libraries

  23. Lattice String Folding • HP model: main modeled force is hydrophobic attraction • NP-hard in both 2-D square and 3-D cubic • Constant approximation algorithms • Not so relevant biologically
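To make the HP model concrete, here is a minimal sketch of one common scoring convention on the 2-D square lattice: each pair of H residues that are lattice neighbors but not adjacent in the chain contributes -1. The function name `hp_energy` and the input layout (a string of 'H'/'P' plus a list of integer grid points) are assumptions made for this example, and the code only scores a given conformation; it does not search for the optimum, which is the NP-hard part.

```python
def hp_energy(sequence, coords):
    """Score a 2-D lattice conformation under the HP model.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    coords:   list of (x, y) integer lattice points, one per residue.
    Returns minus the number of non-consecutive H-H lattice contacts.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (ax, ay), (bx, by) in zip(coords, coords[1:]):
        if abs(ax - bx) + abs(ay - by) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    index_at = {pos: i for i, pos in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

For instance, the chain `"HPPH"` folded around a unit square brings its two H residues into contact, while the same chain laid out in a straight line has no contacts.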

  24. Lattice String Folding

  25. ROSETTA • http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php • http://depts.washington.edu/bakerpg/papers/Bonneau-ARBBS-v30-p173.pdf • Monte Carlo based method • Limit conformational search space by using the sequence-structure motif I-Sites library (http://isites.bio.rpi.edu/Isites/) • 261 patterns in library • Certain positions in motif favor certain residues • Remove all sequences with <25% identity • Find structures of the 25 nearest sequence neighbors of each 9-mer • Rationale • Local structures often fold independently of full protein • Can predict large areas of protein by matching sequence to I-Sites

  26. I-Sites Examples • Non-polar helix • Abundance of alanine at all positions • Non-polar side chains favored at positions 3, 6, 10 (methionine, leucine, isoleucine) • Amphipathic helix • Non-polar side chains favored at positions 6, 9, 13, 16 (methionine, leucine, isoleucine) • Polar side chains favored at positions 1, 8, 11, 18 (glutamic acid, lysine)

  27. ROSETTA Method • New structures generated by swapping compatible fragments • Accepted structures are clustered based on energy and structural size • Best cluster is the one with the greatest number of conformations within 4 Å rms deviation of the cluster's center structure • Representative structures taken from each of the best five clusters and returned to the user as predictions
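The cluster-selection step can be illustrated with a simplified greedy procedure. This is a sketch under stated assumptions, not Rosetta's actual implementation: it assumes a precomputed symmetric matrix `dist` of pairwise rms deviations (in Å) between accepted structures, and the name `greedy_clusters` is invented for this example.

```python
def greedy_clusters(dist, cutoff=4.0, max_clusters=5):
    """Repeatedly pick the structure with the most neighbors within `cutoff`.

    dist: square matrix (list of lists) of pairwise rms deviations,
          with dist[i][i] == 0.
    Returns a list of (center_index, member_indices), largest cluster first.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining and len(clusters) < max_clusters:
        best_center, best_members = None, []
        for center in sorted(remaining):
            members = [m for m in sorted(remaining) if dist[center][m] <= cutoff]
            if len(members) > len(best_members):
                best_center, best_members = center, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)  # each structure joins one cluster only
    return clusters
```

The center of each returned cluster would play the role of the representative structure; ties between equally populated centers are broken here by lowest index, which is an arbitrary choice.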

  28. Robetta & Rosetta

  29. Rosetta results in CASP

  30. Rosetta Results • In CASP4, Rosetta’s best models ranged from 6–10 Å rmsd Cα • For comparison, good comparative models give 2–5 Å rmsd Cα • Most effective with small proteins (<100 residues) and structures with helices
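The rmsd figures quoted here measure the root-mean-square distance between matching Cα atoms of a model and the experimental structure. The sketch below assumes the two coordinate sets are already optimally superimposed and listed in the same residue order; the superposition step itself is not shown, and the function name `ca_rmsd` is illustrative.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))
```

Without the prior superposition the value would also reflect the arbitrary placement of the two structures in space, so it would overstate the structural difference.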

  31. Only a few folds are found in nature

  32. The SCOP Database: Structural Classification Of Proteins • FAMILY: proteins that are >30% similar, or >15% similar and have similar known structure/function • SUPERFAMILY: proteins whose families have some sequence and function/structure similarity suggesting a common evolutionary origin • COMMON FOLD: superfamilies that have the same secondary structures in the same arrangement, probably resulting from physics and chemistry • CLASS: alpha, beta, alpha–beta, alpha+beta, multidomain

  33. Status of Protein Databases • PDB • SCOP: Structural Classification of Proteins. 1.67 release: 24037 PDB Entries (15 May 2004), 65122 Domains • EMBL

  34. Evolution of Proteins – Domains • The number of members in different families obeys a power law • 429 families common in all 14 eukaryotes; • 80% of animal domains, 90% of fungi domains • 80% of proteins are multidomain in eukaryotes; • domains usually combine pairwise in the same order. Why? • Chothia, Gough, Vogel, Teichmann, Science 300:1701-1703, 2003 • Evolution of proteins happens mainly through duplication, recombination, and divergence

  35. Homology-based Prediction • Align query sequence with sequences of known structure, usually >30% similar • Superimpose the aligned sequence onto the structure template, according to the computed sequence alignment • Perform local refinement of the resulting structure in 3D • The number of unique structural folds is small (possibly a few thousand) • 90% of new structures submitted to PDB in the past three years have similar folds in PDB
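The >30% threshold in the first step can be checked once an alignment is available. The sketch below is one simple convention and rests on assumptions: it takes two already-aligned sequences of equal length with `-` as the gap character, and it measures percent identity over columns where neither sequence has a gap. The slide says "similar", which may refer to a substitution-matrix-based measure instead; the function name `percent_identity` is illustrative.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent of gap-free alignment columns where the two residues match."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if q == t:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0


def is_usable_template(aligned_query, aligned_template, threshold=30.0):
    """Apply the slide's rule of thumb: keep templates above the threshold."""
    return percent_identity(aligned_query, aligned_template) > threshold
```

Other conventions divide by the full alignment length or by the shorter sequence length, and these can give noticeably different percentages for gappy alignments.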

  36. Examples of Fold Classes

  37. Homology-based Prediction • Raw model • Loop modeling • Side chain placement • Refinement

  38. Homology-based Prediction