Protein structure prediction: The holy grail of bioinformatics


Presentation Transcript

  1. Protein structure prediction: The holy grail of bioinformatics

  2. Proteins: Four levels of structural organization • Primary structure • Secondary structure • Tertiary structure • Quaternary structure

  3. Primary structure = the linear amino acid sequence

  4. Secondary structure = spatial arrangement of amino-acid residues that are adjacent in the primary structure

  5. α helix = a helical structure whose chain coils tightly as a right-handed screw, with all the side chains sticking outward in a helical array. The tight structure of the α helix is stabilized by same-strand hydrogen bonds between -NH groups and -CO groups spaced at four-residue intervals.

  6. The β-pleated sheet is made of loosely coiled β strands that are stabilized by hydrogen bonds between -NH and -CO groups from adjacent strands.

  7. An antiparallel β sheet. Adjacent β strands run in opposite directions. Hydrogen bonds between NH and CO groups connect each amino acid to a single amino acid on an adjacent strand, stabilizing the structure.

  8. A parallel β sheet. Adjacent β strands run in the same direction. Hydrogen bonds connect each amino acid on one strand with two different amino acids on the adjacent strand.

  9. Silk fibroin

  10. Secondary structure elements • α helix • β sheet (parallel and antiparallel) • tight turns • flexible loops • irregular elements (random coil)

  11. Tertiary structure = three-dimensional structure of protein

  12. The tertiary structure is formed by the folding of secondary structures, held together by covalent and non-covalent forces such as hydrogen bonds, hydrophobic interactions, and salt bridges between positively and negatively charged residues, as well as disulfide bonds between pairs of cysteines.

  13. Quaternary structure = spatial arrangement of subunits and their contacts.

  14. Holoproteins & Apoproteins • Holoprotein = Apoprotein + Prosthetic group

  15. Apohemoglobin = 2α + 2β

  16. Prosthetic group Heme

  17. Hemoglobin = Apohemoglobin + 4 Heme

  18. Christian B. Anfinsen (1916-1995). Sela M, White FH & Anfinsen CB. 1959. The reductive cleavage of disulfide bonds and its application to problems of protein structure. Biochim. Biophys. Acta 31:417-426.

  19. Not all proteins fold independently. Chaperones.

  20. The denaturation and renaturation of proteins

  21. Reducing agents • Ammonium thioglycolate (alkaline), pH 9.0-10 • Glyceryl monothioglycolate (acid), pH 6.5-8.2

  22. Oxidant

  23. What do we need to know in order to state that the tertiary structure of a protein has been solved? • Ideally: We need to determine the position of all atoms and their connectivity. • Less ideally: We need to determine the position of all Cα atoms (backbone structure).

  24. Protein structure: Limitations and caveats • Not all proteins or parts of proteins assume a well-defined 3D structure in solution. • Protein structure is not static; different parts of the structure show different degrees of thermal motion. • There may be a number of slightly different conformations in solution. • Some proteins undergo conformational changes when interacting with other molecules.

  25. Experimental Protein Structure Determination • X-ray crystallography: most accurate; in vitro; needs crystals; ~$100-200K per structure • NMR: fairly accurate; in solution; no need for crystals; limited to very small proteins • Cryo-electron microscopy: imaging technology; low resolution

  26. Why predict protein structure? • Structural knowledge = some understanding of function and mechanism of action • Predicted structures can be used in structure-based drug design • It can help us understand the effects of mutations on structure and function • It is a very interesting scientific problem (still unsolved in its most general form after more than 50 years of effort)

  27. Secondary structure prediction

  28. Secondary structure prediction • Historically, the first structure prediction methods predicted secondary structure • Can be used to improve alignment accuracy • Can be used to detect domain boundaries within proteins with remote sequence homology • Often the first step towards 3D structure prediction • Informative for mutagenesis studies

  29. Protein Secondary Structures (Simplifications) • α-HELIX • β-STRAND • COIL (everything else)

  30. Assumptions • The entire information for forming secondary structure is contained in the primary sequence • Side groups of residues will determine structure • Examining windows of 13-17 residues is sufficient to predict secondary structure • α-helices are 5–40 residues long • β-strands are 5–10 residues long

  31. Predicting Secondary Structure From Primary Structure • accuracy 64-75% • higher accuracy for α-helices than for β-sheets • accuracy is dependent on protein family • predictions of engineered (artificial) proteins are less accurate

  32. A surprising result! Chameleon sequences

  33. The “Chameleon” sequence • Native protein, containing sequence 1 and sequence 2: TEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK • Replace both sequences with an engineered peptide (“chameleon”): TEAVDAWTVEKAFKTFANDNGVDGAWTVEKAFKTFTVTEK • The same peptide forms an α-helix at the first position and a β-strand at the second. Source: Minor and Kim. 1996. Nature 380:730-734
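The two 40-residue strings on this slide can be compared directly to recover which segments were replaced. The following is a minimal sketch, not part of the original presentation; the helper name `longest_repeat` is hypothetical, and the brute-force search is only suitable for short sequences like these.

```python
# Sequences copied from the slide (native and engineered variants).
native = "TEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK"
engineered = "TEAVDAWTVEKAFKTFANDNGVDGAWTVEKAFKTFTVTEK"


def longest_repeat(seq):
    """Return (peptide, first_start, second_start) for the longest
    substring that occurs twice without overlapping (0-based starts)."""
    n = len(seq)
    for length in range(n // 2, 0, -1):           # try longest first
        for start in range(n - 2 * length + 1):
            piece = seq[start:start + length]
            second = seq.find(piece, start + length)
            if second != -1:
                return piece, start, second
    return "", -1, -1


chameleon, first, second = longest_repeat(engineered)
size = len(chameleon)

# The native segments that the chameleon peptide replaced.
sequence_1 = native[first:first + size]
sequence_2 = native[second:second + size]

print("chameleon :", chameleon)
print("sequence 1:", sequence_1, "at residues", first + 1, "-", first + size)
print("sequence 2:", sequence_2, "at residues", second + 1, "-", second + size)
```

Residue numbers printed here are relative to the 40-residue excerpt on the slide, not to the full-length protein.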

  34. Measures of prediction accuracy • Qindex and Q3 • Correlation coefficient

  35. Qindex: (Qhelix, Qstrand, Qcoil, Q3) • Percentage of residues correctly predicted as α-helix, β-strand, or coil, or over all 3 conformations. • Drawback: even a random assignment of structure can achieve a high score (Holley & Karplus 1991)
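As a rough illustration of these scores, here is a minimal sketch. It assumes that observed and predicted structures are given as equal-length strings over the letters H (helix), E (strand) and C (coil), and that each per-state score is normalized by the number of residues observed in that state. Both conventions are assumptions made for this example, not something the slide specifies.

```python
def q_scores(observed: str, predicted: str) -> dict:
    """Per-state accuracies (QH, QE, QC) and overall Q3, in percent."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    scores = {}
    for state in "HEC":
        # Residues observed in this state, and those also predicted as such.
        total = sum(1 for o in observed if o == state)
        correct = sum(1 for o, p in zip(observed, predicted)
                      if o == state and p == state)
        scores["Q" + state] = 100.0 * correct / total if total else float("nan")
    # Q3: fraction of all residues whose predicted state matches the observed one.
    matches = sum(1 for o, p in zip(observed, predicted) if o == p)
    scores["Q3"] = 100.0 * matches / len(observed)
    return scores


# Made-up example strings, for illustration only.
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHCCCCEEHHCC"
scores = q_scores(observed, predicted)
print(scores)  # Q3 is 75.0 here: 12 of the 16 residues match
```

The drawback noted on the slide is easy to reproduce with such a function: a predictor that always outputs the most common state scores well on Q3 for any protein dominated by that state.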

  36. Correlation coefficient • Cα = 1 (= 100%)
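The formula itself did not survive in this transcript. The sketch below assumes the coefficient meant here is the Matthews correlation coefficient, which is commonly used for per-state secondary structure accuracy; the function name and the H/E/C string encoding are choices made for this example.

```python
import math


def state_correlation(observed: str, predicted: str, state: str) -> float:
    """Matthews correlation coefficient for one state (e.g. 'H' for helix).

    Returns a value between -1 and 1; 0.0 is returned when the
    denominator vanishes (a convention chosen for this sketch).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    tp = tn = fp = fn = 0
    for o, p in zip(observed, predicted):
        if o == state and p == state:
            tp += 1          # correctly predicted as `state`
        elif o != state and p != state:
            tn += 1          # correctly predicted as not `state`
        elif o != state and p == state:
            fp += 1          # over-prediction
        else:
            fn += 1          # under-prediction
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


perfect = state_correlation("CCHHHHCC", "CCHHHHCC", "H")
inverted = state_correlation("CCHHHHCC", "HHCCCCHH", "H")
print(perfect, inverted)
```

Unlike the Q scores, this measure penalizes over-prediction as well as under-prediction, so a trivial "everything is helix" predictor does not score well.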

  37. Methods of secondary structure prediction

  38. First generation methods: single residue statistics • Chou & Fasman (1974 & 1978): Some residues have particular secondary-structure preferences. • Based on empirical frequencies of residues in α-helices, β-sheets, and coils. • Examples: Glu → α-helix; Val → β-strand
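The idea of deriving preferences from empirical frequencies can be sketched as follows. The propensity used here is the fraction of a residue type found in a given state divided by the overall fraction of residues in that state, so values above 1 indicate a preference. The tiny training set is invented for illustration and is not real structural data; the function name is likewise hypothetical.

```python
from collections import Counter


def propensities(sequences, structures, state):
    """Propensity of each residue type for `state` ('H', 'E' or 'C')."""
    aa_total = Counter()      # occurrences of each residue type
    aa_in_state = Counter()   # occurrences of each residue type in `state`
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        for aa, s in zip(seq, ss):
            aa_total[aa] += 1
            if s == state:
                aa_in_state[aa] += 1
    n_state = sum(aa_in_state.values())
    if n_state == 0:
        raise ValueError("no residues observed in state " + state)
    overall = n_state / sum(aa_total.values())
    return {aa: (aa_in_state[aa] / aa_total[aa]) / overall for aa in aa_total}


# Toy data (made up). Note that 'E' in a sequence means glutamate,
# while 'E' in a structure string means strand.
toy_sequences = ["EAEEGVVV", "VEEAVGVE"]
toy_structures = ["HHHHCEEE", "CHHHEEEC"]

helix = propensities(toy_sequences, toy_structures, "H")
strand = propensities(toy_sequences, toy_structures, "E")
print("Glu helix propensity :", round(helix["E"], 2))
print("Val strand propensity:", round(strand["V"], 2))
```

With real data, the input would be sequences paired with secondary structure assignments from solved structures.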

  39. Chou-Fasman method

  40. Chou-Fasman Method • Accuracy: Q3 = 50-60%

  41. Second generation methods: segment statistics • Similar to single-residue methods, but incorporating additional information (adjacent residues, segmental statistics). • Problems: • Low accuracy: Q3 below 66% (results). • Q3 of β-strands (E): 28%-48%. • Predicted structures were too short.

  42. The GOR method • developed by Garnier, Osguthorpe & Robson • builds on Chou-Fasman Pij values • evaluates each residue PLUS the adjacent 8 N-terminal and 8 carboxyl-terminal residues • sliding window of 17 residues • underpredicts β-strand regions • GOR method accuracy: Q3 = ~64%
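The 17-residue window described above (the central residue plus 8 neighbors on each side) can be sketched as below. This shows only the windowing step, not the GOR scoring itself; padding the chain ends with "-" is an assumption made so that every residue gets a full-length window.

```python
def windows(sequence: str, flank: int = 8, pad: str = "-"):
    """Return one window of 2*flank + 1 residues centered on each position."""
    padded = pad * flank + sequence + pad * flank
    size = 2 * flank + 1
    return [padded[i:i + size] for i in range(len(sequence))]


# Arbitrary example sequence (the 20 standard amino acids in one-letter code).
example = "ACDEFGHIKLMNPQRSTVWY"
example_windows = windows(example)

for residue, window in zip(example, example_windows):
    # The residue being predicted sits at index 8, the middle of the window.
    print(residue, window)
```

A window-based predictor would then score each window against position-specific statistics and assign the best-scoring state to the central residue.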

  43. Third generation methods • Third generation methods reached 77% accuracy. • They rest on two new ideas: 1. A biological idea: using evolutionary information based on conservation analysis of multiple sequence alignments. 2. A technological idea: using neural networks.
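The evolutionary information mentioned here is typically summarized per alignment column. The following minimal sketch computes column frequencies and Shannon entropy as one possible conservation measure; the four-sequence alignment is invented for illustration, gap characters would simply be counted as another symbol, and no particular published method is being reproduced.

```python
import math
from collections import Counter


def column_profile(alignment):
    """Per-column residue frequencies for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({aa: n / len(column) for aa, n in counts.items()})
    return profile


def column_entropy(freqs):
    """Shannon entropy in bits; 0 means a fully conserved column."""
    return -sum(p * math.log2(p) for p in freqs.values())


# Toy alignment (made up).
toy_alignment = ["AVLK", "AVIK", "AILR", "AVLK"]
profile = column_profile(toy_alignment)
entropies = [column_entropy(col) for col in profile]

for index, (freqs, h) in enumerate(zip(profile, entropies), start=1):
    print(index, freqs, round(h, 3))
```

Profiles of this kind, rather than single sequences, are what a neural network predictor would take as input in the scheme described above.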

  44. Artificial Neural Networks An attempt to imitate the human brain (assuming that this is the way it works).