Protein structure prediction: The holy grail of bioinformatics
Proteins: Four levels of structural organization:
Primary structure = the linear amino acid sequence
Secondary structure = spatial arrangement of amino-acid residues that are adjacent in the primary structure
α helix = a helical structure whose chain coils tightly as a right-handed screw, with all the side chains sticking outward in a helical array. The tight structure of the α helix is stabilized by same-strand hydrogen bonds between -NH groups and -CO groups spaced at four amino-acid residue intervals.
The β-pleated sheet is made of loosely coiled β strands that are stabilized by hydrogen bonds between -NH and -CO groups from adjacent strands.
An antiparallel β sheet. Adjacent β strands run in opposite directions. Hydrogen bonds between NH and CO groups connect each amino acid to a single amino acid on an adjacent strand, stabilizing the structure.
A parallel β sheet. Adjacent β strands run in the same direction. Hydrogen bonds connect each amino acid on one strand with two different amino acids on the adjacent strand.
β sheet (parallel and antiparallel)
irregular elements (random coil)
Tertiary structure = three-dimensional structure of protein
The tertiary structure is formed by the folding of secondary structures by covalent and non-covalent forces, such as hydrogen bonds, hydrophobic interactions, salt bridges between positively and negatively charged residues, as well as disulfide bonds between pairs of cysteines.
Quaternary structure = spatial arrangement of subunits and their contacts.
Holoproteins & Apoproteins
Apohemoglobin = 2α + 2β
Hemoglobin = Apohemoglobin + 4 heme
Christian B. Anfinsen
Sela M, White FH, & Anfinsen CB. 1959. The reductive cleavage of disulfide bonds and its application to problems of protein structure. Biochim. Biophys. Acta. 31:417-426.
Not all proteins fold independently.
The denaturation and renaturation of proteins
Ammonium thioglycolate (alkaline) pH 9.0-10
Glycerylmonothioglycolate (acid) pH 6.5-8.2
What do we need to know in order to state that the tertiary structure of a protein has been solved?
Ideally: We need to determine the position of all atoms and their connectivity.
Less ideally: We need to determine the position of all Cα atoms (backbone structure).
Secondary structure prediction
COIL (everything else)
Figure: sequence 1 and sequence 2, labeled α-helix and β-strand. Replace both sequences with an engineered peptide (“chameleon”).
Source: Minor and Kim. 1996. Nature 380:730-734
Q index: (Qhelix, Qstrand, Qcoil, Q3)
- even a random assignment of structure can achieve a high score (Holley & Karplus 1991)
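The Q index can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes Q3 is the fraction of all residues whose predicted state matches the observed state, and that each per-state score (Qhelix, Qstrand, Qcoil) is the fraction of residues observed in that state that were predicted correctly. The one-letter codes H/E/C, the function name `q_scores`, and the two toy strings are hypothetical.

```python
from collections import Counter

def q_scores(predicted, observed):
    """Return Q3 and per-state Q values for two equal-length H/E/C strings."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and equal in length")
    total = Counter(observed)      # residues observed in each state
    correct = Counter()            # correctly predicted residues per observed state
    for p, o in zip(predicted, observed):
        if p == o:
            correct[o] += 1
    scores = {"Q3": sum(correct.values()) / len(observed)}
    for state, name in (("H", "Qhelix"), ("E", "Qstrand"), ("C", "Qcoil")):
        if total[state]:           # skip states that never occur in the observed string
            scores[name] = correct[state] / total[state]
    return scores

# Toy example (made-up strings, H = helix, E = strand, C = coil)
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHCCCCEEECCC"
scores = q_scores(predicted, observed)
print(scores["Q3"])  # 0.8125
```

Because Q3 only counts matching positions, it says nothing about whether the predicted segments have sensible lengths or boundaries, which is one reason the per-state scores are usually reported alongside it.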
Cα = 1 (= 100%)
Chou & Fasman (1974 & 1978):
Some residues have particular secondary-structure preferences. Based on empirical frequencies of residues in α-helices, β-sheets, and coils.
Examples: Glu → α-helix
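The idea of an empirical preference can be illustrated with a small sketch. It assumes a propensity is defined as the fraction of a residue's occurrences found in a given state, divided by the fraction of all residues found in that state, so values above 1 indicate a preference. The function name `propensities` and the two annotated toy sequences are hypothetical; they are not the published Chou & Fasman parameters or data.

```python
from collections import Counter

def propensities(sequences, structures, state):
    """Propensity of each residue type for `state` (e.g. "H"), from annotated sequences."""
    overall = Counter()    # occurrences of each residue type
    in_state = Counter()   # occurrences of each residue type within `state`
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must align")
        for aa, s in zip(seq, ss):
            overall[aa] += 1
            if s == state:
                in_state[aa] += 1
    n_state = sum(in_state.values())
    if n_state == 0:
        return {}
    background = n_state / sum(overall.values())  # fraction of all residues in `state`
    return {aa: (in_state[aa] / overall[aa]) / background for aa in overall}

# Toy data (made up): H = helix, E = strand, C = coil
seqs = ["AEELLKGSP", "GVVIEAGPN"]
sss = ["HHHHHHCCC", "CEEEHHCCC"]
helix = propensities(seqs, sss, "H")
print(round(helix["E"], 2), helix["G"])  # Glu favors helix here, Gly does not
```

With real data the counts would come from a large set of proteins of known structure; two short strings only show the arithmetic.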
1. A biological idea –
Using evolutionary information based on conservation analysis of multiple sequence alignments.
2. A technological idea –
Using neural networks.
An attempt to imitate the human brain (assuming that this is the way it works).
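The first idea, conservation analysis of a multiple sequence alignment, can be sketched as follows. This is an illustration under assumptions, not the method of any particular predictor: it computes per-column residue frequencies (a simple profile) and uses Shannon entropy as the conservation measure. The function names and the toy alignment are hypothetical, and gaps are simply ignored.

```python
import math
from collections import Counter

def column_profile(alignment):
    """Per-column residue frequencies for equal-length aligned sequences ('-' = gap)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(row[i] for row in alignment if row[i] != "-")
        n = sum(counts.values())
        profile.append({aa: c / n for aa, c in counts.items()} if n else {})
    return profile

def column_entropy(freqs):
    """Shannon entropy in bits; 0 means the column is fully conserved."""
    return sum(-p * math.log2(p) for p in freqs.values())

# Toy alignment (made up)
alignment = ["MKVLA", "MRVLG", "MKILS", "MKV-A"]
profile = column_profile(alignment)
entropies = [column_entropy(col) for col in profile]
print(entropies[0], entropies[4])  # conserved first column vs. variable last column
```

A profile of this kind, rather than the single sequence, is what lets a predictor see which positions tolerate substitutions and which do not.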
Exploit evolutionary information. Based on conservation analysis of multiple sequence alignments.
Rost, B. & Sander, C. (1993) J. Mol. Biol. 232, 584-599.
Jones, D. T. (1999) J. Mol. Biol. 292, 195-202.
Arguably remains the top secondary structure prediction method (won all CASP competitions since 1998).
Secondary Structure Prediction
September 13, 2011
More than 13,137,813 known protein sequences, 76,495 experimentally determined structures.
The gap is getting bigger.
Folds: Compact folding arrangements of a polypeptide chain (a protein or part of a protein).
The terms “domain” and “fold” are sometimes used interchangeably.
Found in Calcium binding proteins such as Calmodulin
Four helix bundle
The structure of a protein consists of the 3D (X,Y,Z) coordinates of each non-hydrogen atom of the protein.
Some protein structures also include coordinates of covalently linked prosthetic groups, non-covalently linked ligand molecules, or metal ions.
For some purposes (e.g. structural alignment) only the Cα coordinates are needed.
Example of PDB format:
                                X        Y       Z  occupancy  temp. factor
ATOM   18  N   GLY   27    40.315  161.004  11.211       1.00         10.11
ATOM   19  CA  GLY   27    39.049  160.737  10.462       1.00         14.18
ATOM   20  C   GLY   27    38.729  159.239  10.784       1.00         20.75
ATOM   21  O   GLY   27    39.507  158.484  11.404       1.00         21.88
Note: the ATOM records of the PDB format provide no information about connectivity between atoms. The last two numbers (occupancy, temperature factor) relate to disorder of atomic positions in crystals.
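Reading such records and keeping only the Cα coordinates can be sketched as below. The sketch assumes the simplified, whitespace-separated records shown in the example above (record type, atom number, atom name, residue name, residue number, X, Y, Z, occupancy, temperature factor). Real PDB files use fixed-width columns and carry further fields, so splitting on whitespace is an assumption that holds only for this example. The names `Atom` and `parse_atom_line` are hypothetical.

```python
from typing import NamedTuple

class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float

def parse_atom_line(line):
    """Parse one simplified ATOM record (10 whitespace-separated fields)."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "ATOM":
        raise ValueError(f"not a simplified ATOM record: {line!r}")
    _, serial, name, res_name, res_seq, x, y, z, occ, b = fields
    return Atom(int(serial), name, res_name, int(res_seq),
                float(x), float(y), float(z), float(occ), float(b))

# The four records from the example above
records = """\
ATOM 18 N GLY 27 40.315 161.004 11.211 1.00 10.11
ATOM 19 CA GLY 27 39.049 160.737 10.462 1.00 14.18
ATOM 20 C GLY 27 38.729 159.239 10.784 1.00 20.75
ATOM 21 O GLY 27 39.507 158.484 11.404 1.00 21.88
"""
atoms = [parse_atom_line(line) for line in records.splitlines()]

# Keep only the C-alpha atoms, e.g. for structural alignment
ca_coords = [(a.x, a.y, a.z) for a in atoms if a.name == "CA"]
print(ca_coords)  # [(39.049, 160.737, 10.462)]
```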
Goal: 3D structure from 1D sequence
An existing fold
A new fold
Based on the two major observations (and some simplifications):
Homology Modeling: How it works
[Rost, Protein Eng. 1999]
Which of the known folds is likely to be similar to the (unknown) fold of a new protein when only its amino-acid sequence is known?
MTYKLILN …. NGVDGEWTYTE
The number of unique structural (domain) folds in nature is fairly small (possibly a few thousand)
90% of new structures submitted to PDB in the past three years have similar structural folds in PDB
Goal: Predict structure from “first principles”
Qian et al. (Nature, 2007) used distributed computing* to predict the 3D structure of a protein from its amino-acid sequence. Here, their predicted structure (grey) of a protein is overlaid with the experimentally determined crystal structure (color) of that protein. The agreement between the two is excellent.
*70,000 home computers for about two years.
3-D Protein Model
links to lots of protein prediction resources
The root mean square deviation (RMSD) is the measure of the average distance between the backbones of superimposed proteins. In the study of globular protein conformations, one customarily measures the similarity in three-dimensional structure by the RMSD of the Cα atomic coordinates after optimal rigid body superposition.
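The RMSD of Cα coordinates can be written out directly. The sketch below assumes the two structures have already been superimposed and that their Cα atoms are paired one to one in the same order. It does not perform the superposition itself. The function name `rmsd` and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates, paired by index."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example: the second chain is the first one shifted by 1 unit along z
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(chain_a, chain_b))  # 1.0
```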
A widely used way to compare the structures of biomolecules or solid bodies is to translate and rotate one structure with respect to the other so as to minimize the RMSD. This minimum, RMSDmin, can be used as a distance measure between two proteins.
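Written as a formula, under assumed notation that does not appear in the original slides: let x_i and y_i be the coordinates of the i-th paired Cα atoms in the two structures, N the number of pairs, R a rotation matrix, and t a translation vector. The minimized RMSD is then

```latex
\mathrm{RMSD}_{\min}
  = \min_{R,\;t}\;
    \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl\lVert R\,x_i + t - y_i \bigr\rVert^{2}},
\qquad R^{\top}R = I,\quad \det R = 1 .
```

The constraint on R restricts the search to proper rotations, so one structure cannot be mirrored onto the other.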