Protein structure prediction: The holy grail of bioinformatics

Presentation Transcript

  1. Protein structure prediction: The holy grail of bioinformatics

  2. Proteins: four levels of structural organization • Primary structure • Secondary structure • Tertiary structure • Quaternary structure

  3. Primary structure = the linear amino acid sequence

  4. Secondary structure = spatial arrangement of amino-acid residues that are adjacent in the primary structure

  5. α helix = a helical structure whose chain coils tightly as a right-handed screw, with all the side chains sticking outward in a helical array. The tight structure of the α helix is stabilized by hydrogen bonds within the same strand between -NH and -CO groups spaced four residues apart (the -CO group of residue i bonds to the -NH group of residue i + 4).
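As a minimal illustration of the four-residue spacing, the sketch below lists, for an idealized helical segment, which backbone -CO (residue i) is paired with which -NH (residue i + 4). The function name and the 0-based residue numbering are assumptions for illustration only; in real structures hydrogen bonds are identified from atomic coordinates, not from sequence positions.

```python
def helix_hbond_pairs(start: int, end: int) -> list[tuple[int, int]]:
    """Return idealized (acceptor, donor) residue pairs for one alpha helix.

    Residues start..end (inclusive, 0-based) are assumed to form a single helix.
    The -CO of residue i accepts a hydrogen bond from the -NH of residue i + 4,
    so only pairs whose donor still lies inside the segment are listed.
    """
    return [(i, i + 4) for i in range(start, end - 3)]


print(helix_hbond_pairs(0, 9))  # [(0, 4), (1, 5), (2, 6), (3, 7), (4, 8), (5, 9)]
```

A 10-residue helix (residues 0 to 9) therefore has six such i → i + 4 pairs; the first four -NH groups and the last four -CO groups have no partner inside the segment.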

  6. The β-pleated sheet is made of extended β strands that are stabilized by hydrogen bonds between -NH and -CO groups on adjacent strands.

  7. An antiparallel β sheet. Adjacent β strands run in opposite directions. Hydrogen bonds between NH and CO groups connect each amino acid to a single amino acid on an adjacent strand, stabilizing the structure.

  8. A parallel β sheet. Adjacent β strands run in the same direction. Hydrogen bonds connect each amino acid on one strand with two different amino acids on the adjacent strand.

  9. Silk fibroin

  10. Secondary structure elements: • α helix • β sheet (parallel and antiparallel) • tight turns • flexible loops • irregular elements (random coil)

  11. Tertiary structure = three-dimensional structure of protein

  12. The tertiary structure is formed by the folding of secondary structures by covalent and non-covalent forces, such as hydrogen bonds, hydrophobic interactions, salt bridges between positively and negatively charged residues, as well as disulfide bonds between pairs of cysteines.

  13. Quaternary structure = spatial arrangement of subunits and their contacts.

  14. Holoproteins & apoproteins [Figure: holoprotein = apoprotein + prosthetic group]

  15. Apohemoglobin = 2α + 2β

  16. Prosthetic group: heme

  17. Hemoglobin = apohemoglobin + 4 heme

  18. Christian B. Anfinsen 1916-1995 Sela M, White FH, & Anfinsen CB. 1959. The reductive cleavage of disulfide bonds and its application to problems of protein structure. Biochim. Biophys. Acta. 31:417-426.

  19. Not all proteins fold independently: some require chaperones.

  20. The denaturation and renaturation of proteins

  21. Reducing agents: • Ammonium thioglycolate (alkaline), pH 9.0–10 • Glyceryl monothioglycolate (acid), pH 6.5–8.2

  22. Oxidant

  23. What do we need to know in order to state that the tertiary structure of a protein has been solved? Ideally: we need to determine the position of all atoms and their connectivity. Less ideally: we need to determine the position of all Cα atoms (the backbone structure).

  24. Protein structure: Limitations and caveats • Not all proteins or parts of proteins assume a well-defined 3D structure in solution. • Protein structure is not static; different parts of the structure undergo various degrees of thermal motion. • There may be a number of slightly different conformations in solution. • Some proteins undergo conformational changes when interacting with other molecules (e.g. ligands, substrates or other proteins).

  25. Experimental protein structure determination • X-ray crystallography: most accurate; in vitro; needs crystals; ~$100–200K per structure • NMR: fairly accurate; in solution; no need for crystals; limited to very small proteins • Cryo-electron microscopy: imaging technology; low resolution

  26. Why predict protein structure? • Structural knowledge = some understanding of function and mechanism of action • Predicted structures can be used in structure-based drug design • It can help us understand the effects of mutations on structure and function • It is a very interesting scientific problem (still unsolved in its most general form after more than 50 years of effort)

  27. Secondary structure prediction

  28. Secondary structure prediction • Historically, the first structure prediction methods predicted secondary structure • Can be used to improve alignment accuracy • Can be used to detect domain boundaries within proteins with remote sequence homology • Often the first step towards 3D structure prediction • Informative for mutagenesis studies

  29. Protein secondary structures (simplification) • α-helix • β-strand • coil (everything else)

  30. Assumptions • the entire information needed to form secondary structure is contained in the primary sequence • the side groups of residues determine the structure • examining windows of 13–17 residues is sufficient to predict secondary structure • α-helices are 5–40 residues long • β-strands are 5–10 residues long
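The window assumption can be made concrete with a short sketch that cuts a sequence into one window per residue, each centred on the residue whose state is to be predicted. The function name, the default size of 17 and the "-" padding character used past the chain ends are illustrative choices, not part of any particular published method.

```python
def windows(sequence: str, size: int = 17, pad: str = "-") -> list[str]:
    """Return one window per residue, centred on that residue.

    `size` must be odd (e.g. 13 or 17) so each residue sits in the middle.
    Positions that fall beyond either terminus are filled with `pad`.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


print(windows("ACDEF", size=5))  # ['--ACD', '-ACDE', 'ACDEF', 'CDEF-', 'DEF--']
```

Padding keeps every window the same length, so the first and last residues of a chain can be scored with the same machinery as interior residues.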

  31. Predicting secondary structure from primary structure • accuracy 64–75% • higher accuracy for α-helices than for β-sheets • accuracy is dependent on protein family • predictions of engineered (artificial) proteins are less accurate

  32. A surprising result! Chameleon sequences

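  33. The “Chameleon” sequence. Native sequence, containing sequence 1 (which forms an α-helix) and sequence 2 (which forms a β-strand): TEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK. Replace both sequences with an engineered peptide (“chameleon”): TEAVDAWTVEKAFKTFANDNGVDGAWTVEKAFKTFTVTEK. The same peptide adopts an α-helix in the first position and a β-strand in the second. Source: Minor and Kim. 1996. Nature 380:730-734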
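The two sequences on the slide differ only in two 11-residue stretches. The sketch below checks this by locating the engineered 11-mer in the second sequence and reading off the native residues it replaced. The constant names are hypothetical, and the 11-mer AWTVEKAFKTF was obtained by comparing the two sequences shown on the slide.

```python
# Sequences as given on the slide (Minor & Kim 1996).
NATIVE = "TEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK"
ENGINEERED = "TEAVDAWTVEKAFKTFANDNGVDGAWTVEKAFKTFTVTEK"
CHAMELEON = "AWTVEKAFKTF"  # 11-mer read off by comparing the two sequences


def find_all(seq: str, motif: str) -> list[int]:
    """Return every 0-based start index at which motif occurs in seq."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


starts = find_all(ENGINEERED, CHAMELEON)
replaced = [NATIVE[s:s + len(CHAMELEON)] for s in starts]

# Outside the two replaced stretches, the sequences should be identical.
masked = set()
for s in starts:
    masked.update(range(s, s + len(CHAMELEON)))
flanks_match = all(a == b for i, (a, b) in enumerate(zip(NATIVE, ENGINEERED)) if i not in masked)

print(starts, replaced, flanks_match)
# [5, 24] ['AATAEKVFKQY', 'EWTYDDATKTF'] True
```

The chameleon therefore replaces AATAEKVFKQY (sequence 1, helical context) and EWTYDDATKTF (sequence 2, strand context), while everything else in the chain is unchanged, so any difference in local structure comes from context rather than from the 11 residues themselves.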

  34. Measures of prediction accuracy • Qindex and Q3 • Correlation coefficient

  35. Qindex: (Qhelix, Qstrand, Qcoil, Q3) • percentage of residues correctly predicted as α-helix, β-strand or coil, or over all 3 conformations (Q3). • Drawback: even a random assignment of structure can achieve a high score (Holley & Karplus 1991)
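To make Q3 and the per-state scores concrete, here is a minimal sketch over 3-state strings (H = helix, E = strand, C = coil; this one-letter encoding is an assumption of the sketch). Per-state scores are computed relative to the observed state, which is one common reading of Qhelix, Qstrand and Qcoil. The usage example takes a made-up 20-residue assignment and a trivial predictor that calls every residue coil, a close relative of the random-assignment drawback: it still reaches Q3 = 65% without predicting a single helix or strand.

```python
def q_scores(observed: str, predicted: str) -> dict[str, float]:
    """Percentage accuracies for a 3-state (H, E, C) prediction.

    Q3 is the percentage of all residues whose state is predicted correctly.
    QH, QE and QC are computed per observed state: the percentage of residues
    observed in that state that were also predicted in it.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    scores = {"Q3": 100.0 * correct / len(observed)}
    for state in "HEC":
        idx = [i for i, o in enumerate(observed) if o == state]
        if idx:  # skip states that never occur in the observed structure
            scores["Q" + state] = 100.0 * sum(predicted[i] == state for i in idx) / len(idx)
    return scores


observed = "CCCCCHHHHCCCCCCEEECC"  # made-up example, not real data
print(q_scores(observed, "C" * len(observed)))
# {'Q3': 65.0, 'QH': 0.0, 'QE': 0.0, 'QC': 100.0}
```

Reporting the per-state values alongside Q3 exposes the problem immediately: QH and QE are both zero.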

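  36. Correlation coefficient (Matthews), computed separately for each state, e.g. for α-helix: $C_\alpha = \dfrac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$, where TP, TN, FP and FN are the numbers of residues correctly predicted as helix, correctly predicted as non-helix, over-predicted as helix and under-predicted (missed helix), respectively. C_α = 1 (=100%) for a perfect prediction.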
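A minimal sketch of a per-state correlation coefficient, assuming the Matthews form in which every residue is counted as a true/false positive or negative for one state at a time. The H/E/C one-letter encoding and the choice to return 0.0 when the denominator is zero (e.g. a state that is never predicted) are conventions assumed here.

```python
import math


def matthews(observed: str, predicted: str, state: str = "H") -> float:
    """Matthews correlation coefficient for one state (C_alpha when state == 'H')."""
    tp = tn = fp = fn = 0
    for o, p in zip(observed, predicted):
        if p == state and o == state:
            tp += 1  # correctly predicted in the state
        elif p != state and o != state:
            tn += 1  # correctly predicted as not in the state
        elif p == state:
            fp += 1  # over-predicted
        else:
            fn += 1  # under-predicted (missed)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


observed = "CCCCCHHHHCCCCCCEEECC"  # made-up example, not real data
print(matthews(observed, observed))                # 1.0
print(matthews(observed, "C" * len(observed)))     # 0.0
```

Unlike a raw percentage, the coefficient gives a predictor that never calls helix C_α = 0 no matter how many coil residues it gets right, while an exact prediction gives C_α = 1.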

  37. Methods of secondary structure prediction

  38. First generation methods: single-residue statistics • Chou & Fasman (1974 & 1978): some residues have particular secondary-structure preferences. Based on empirical frequencies of residues in α-helices, β-sheets and coils. Examples: Glu favors α-helix; Val favors β-strand
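The "empirical frequencies" idea can be sketched as a propensity: how often a residue type occurs in a given state, relative to how often any residue occurs in that state. The function below assumes residues and states are given as aligned one-letter strings (H/E/C for states); the sequence and assignment in the example are made-up toy data, not the statistics Chou and Fasman derived.

```python
from collections import Counter


def propensities(sequence: str, states: str, state: str) -> dict[str, float]:
    """Chou-Fasman-style propensity of each residue type for one state.

    P(aa) = (fraction of aa residues found in `state`) / (fraction of all residues in `state`).
    P > 1 means the residue occurs in that state more often than average.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have equal length")
    total = Counter(sequence)
    in_state = Counter(aa for aa, s in zip(sequence, states) if s == state)
    overall = sum(in_state.values()) / len(sequence)
    if overall == 0:
        raise ValueError(f"state {state!r} does not occur in the data")
    return {aa: (in_state[aa] / total[aa]) / overall for aa in total}


# Toy data only: 7 residues, the first four labelled helix.
p_helix = propensities("EEELVVV", "HHHHEEE", "H")
print({aa: round(v, 2) for aa, v in p_helix.items()})  # {'E': 1.75, 'L': 1.75, 'V': 0.0}
```

With real data the counts come from many solved structures, and a residue such as Glu ends up with P > 1 for helix, which is what the slide's "Glu favors α-helix" summarizes.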

  39. Chou-Fasman method

  40. Chou-Fasman Method • Accuracy: Q3 = 50-60%
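The full Chou-Fasman procedure has several steps (nucleation, extension, resolving overlaps between helix and strand). As a hedged sketch of only the first step, assuming the commonly quoted rule that a helix can nucleate where at least four of six consecutive residues are helix formers (P_α > 1.0), the scan below reports candidate nucleation sites. The function name, the treatment of unlisted residues as neutral and the propensity values in the example are toy assumptions, not the published table.

```python
def helix_nuclei(sequence: str, p_alpha: dict[str, float],
                 window: int = 6, needed: int = 4) -> list[int]:
    """Start indices of windows that could nucleate a helix (Chou-Fasman-style rule).

    A window of `window` residues counts as a nucleus if at least `needed`
    of its residues have a helix propensity above 1.0. Residues missing from
    `p_alpha` are treated as neutral (1.0), i.e. not helix formers.
    """
    starts = []
    for i in range(len(sequence) - window + 1):
        formers = sum(p_alpha.get(aa, 1.0) > 1.0 for aa in sequence[i:i + window])
        if formers >= needed:
            starts.append(i)
    return starts


toy_p_alpha = {"E": 1.5, "A": 1.4, "G": 0.6, "P": 0.6}  # toy values, not Chou-Fasman's table
print(helix_nuclei("GPGEAEAGPG", toy_p_alpha))  # [1, 2, 3]
```

In the full method such nuclei are then extended in both directions until the average propensity of a tetrapeptide drops below 1.0; that step is omitted here.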

  41. Second generation methods: segment statistics • Similar to single-residue methods, but incorporating additional information (adjacent residues, segmental statistics). • Problems: • Low accuracy: Q3 below 66%. • Q3 of β-strands (E): 28%–48%. • Predicted structures were too short.

  42. The GOR method • developed by Garnier, Osguthorpe & Robson • builds on Chou-Fasman Pij values • evaluates each residue PLUS the adjacent 8 N-terminal and 8 C-terminal residues • sliding window of 17 residues • underpredicts β-strand regions • GOR method accuracy Q3 = ~64%
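A minimal sketch of GOR-style window scoring, assuming a simplified model in which each residue at offset m from the centre (m from −8 to +8, i.e. the 17-residue window) contributes an independent value to each state, and the state with the highest sum is predicted. The table format keyed by (state, offset, residue) and all values in the example are hypothetical toy numbers; the published GOR parameters are derived from structure statistics and are not reproduced here.

```python
Info = dict[tuple[str, int, str], float]


def gor_score(sequence: str, j: int, state: str, info: Info, half: int = 8) -> float:
    """Sum the contributions of residues j-half .. j+half to `state` at position j."""
    total = 0.0
    for m in range(-half, half + 1):
        k = j + m
        if 0 <= k < len(sequence):  # window positions beyond the chain ends contribute nothing
            total += info.get((state, m, sequence[k]), 0.0)
    return total


def gor_predict(sequence: str, info: Info, states: str = "HEC", half: int = 8) -> str:
    """Assign each residue the highest-scoring state (ties go to the earlier state in `states`)."""
    return "".join(
        max(states, key=lambda s: gor_score(sequence, j, s, info, half))
        for j in range(len(sequence))
    )


toy_info = {
    ("H", 0, "A"): 1.0,  # A at the centre favors helix
    ("E", 0, "V"): 1.0,  # V at the centre favors strand
    ("C", 0, "G"): 1.0,  # G at the centre favors coil
    ("E", 1, "V"): 1.5,  # a V one position downstream favors strand at the centre
}
print(gor_predict("AAVG", toy_info))  # HEEC
```

The second residue is an A, yet it is predicted as strand because of the V that follows it: this neighbor effect is what distinguishes window methods such as GOR from single-residue statistics.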

  43. Third generation methods • Third generation methods reached 77% accuracy. • They are based on two new ideas: 1. A biological idea – using evolutionary information based on conservation analysis of multiple sequence alignments. 2. A technological idea – using neural networks.

  44. Artificial Neural Networks An attempt to imitate the human brain (assuming that this is the way it works).
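To make the neural-network idea concrete, here is a minimal forward pass: a 13-residue window is one-hot encoded with 21 inputs per position (20 amino acids plus a padding symbol), passed through one sigmoid hidden layer, and turned into probabilities for helix, strand and coil by a softmax. Layer sizes, the padding symbol, the absence of bias terms and the example window are illustrative assumptions, and the weights are random, so the outputs mean nothing until the network is trained on sequences of known structure. Third generation methods feed alignment-derived evolutionary information into such networks instead of a single sequence.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALPHABET = AMINO_ACIDS + "-"  # '-' marks window positions beyond the chain ends


def one_hot_window(window: str) -> list[float]:
    """Encode a window as concatenated one-hot vectors, 21 inputs per position."""
    vec = []
    for aa in window:
        vec.extend(1.0 if aa == letter else 0.0 for letter in ALPHABET)
    return vec


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def forward(x: list[float], w_hidden: list[list[float]], w_out: list[list[float]]) -> list[float]:
    """One sigmoid hidden layer followed by a 3-way softmax over (H, E, C)."""
    hidden = [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x)))) for row in w_hidden]
    return softmax([sum(w * h for w, h in zip(row, hidden)) for row in w_out])


# Untrained, randomly initialised weights: the output is meaningless until trained.
rng = random.Random(0)
SIZE, N_HIDDEN = 13, 10
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(21 * SIZE)] for _ in range(N_HIDDEN)]
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)] for _ in range(3)]

x = one_hot_window("------MKTAYIA")  # hypothetical 13-residue window at an N-terminus
probs = forward(x, w_hidden, w_out)
print(dict(zip("HEC", probs)))
```

Training would adjust `w_hidden` and `w_out` (e.g. by backpropagation) so that the highest probability matches the observed state of the window's central residue.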