An Optimization Approach to Protein Structure Prediction
Richard Byrd, Betty Eskow, Robert Schnabel, Brett Bader, Lianjun Jiang (University of Colorado)
Teresa Head-Gordon (Univ. of California, Berkeley)
Silvia Crivelli (Lawrence Berkeley Laboratory)

Problem Definition
Predict the 3-dimensional shape, or
native state, of a protein given its
sequence of constituent amino acids.
Assuming the native state of a protein corresponds to its minimum free energy state, use a global optimization method to find the minimum energy configuration of the target protein.
Proteins consist of a long chain of
amino acids called the primary structure.
The constituent amino acids may encourage hydrogen bonding and form regular structures, called secondary structures.
The secondary structures fold
together to form a compact
3-dimensional or tertiary structure.
Chemistry of Proteins
Hydrogen bonds strongly influence a protein’s shape. They largely occur in secondary structures and help hold the protein together.
e.g., modestly sized protein
The 3-dimensional structure of the protein found in nature is believed to minimize potential energy, where x = atom coordinates. The energy terms involve:
b = bond length
θ = bond angle
ω = dihedral angle, with torsion terms of the form c_d[1 + cos(nω + γ)]
r_ij = distance between atoms i and j (used in the Lennard-Jones potential)
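As a rough illustration of two of the terms named above, the sketch below evaluates a single torsion term and a single Lennard-Jones pair term. The slide does not give the Lennard-Jones form or any parameter values, so the standard 12-6 form, the names `epsilon`, `sigma` and `phase`, and all default values are assumptions made for illustration only.

```python
import math

def torsion_term(omega, c_d, n, phase=0.0):
    """One torsion contribution c_d * [1 + cos(n*omega + phase)].

    omega is the dihedral angle in radians. The name `phase` is an
    assumption; the phase symbol is not legible in the source.
    """
    return c_d * (1.0 + math.cos(n * omega + phase))

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones energy for one atom pair (assumed form).

    epsilon is the well depth and sigma the zero-crossing distance;
    both defaults are placeholders, not force-field parameters.
    """
    if r <= 0:
        raise ValueError("distance must be positive")
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)
```

With these definitions the Lennard-Jones term is zero at r = sigma and reaches its minimum of -epsilon at r = 2^(1/6) sigma, which is a quick sanity check on any implementation.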
Internal coordinates are determined using bonds, bond angles and dihedral angles.
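To make the internal coordinates concrete, here is a minimal sketch that computes a bond length, a bond angle and a dihedral angle from Cartesian atom positions. The function names are hypothetical and the sign convention of the dihedral is just the one that falls out of this particular atan2 formula; the authors' code may use a different convention.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bond_length(p, q):
    """Distance between two bonded atoms."""
    return math.dist(p, q)

def bond_angle(p, q, r):
    """Angle p-q-r in radians, with q at the vertex."""
    u, v = _sub(p, q), _sub(r, q)
    cos_t = _dot(u, v) / (math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v)))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp rounding error

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)
```

A cis arrangement of four atoms gives a dihedral of 0 and a trans arrangement gives a magnitude of pi, which is a convenient check before using the function on real coordinates.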
i, j are aliphatic carbons; M Gaussians with position (c_k), depth (h_k) and width (w_k) describe 2 minima: (1) molecules in contact and (2) molecules separated by a distance of 1 water molecule.
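The sum-of-Gaussians description can be sketched as follows. The exact functional form is not shown on the slide, so the expression h_k * exp(-((r - c_k) / w_k)^2) is an assumption, as are the function names and the sign convention (a negative h_k gives an attractive well).

```python
import math

def gaussian_pair_energy(r, gaussians):
    """Energy of one carbon pair at distance r.

    gaussians is a list of (c_k, h_k, w_k) tuples: position, depth, width.
    Assumed form: sum_k h_k * exp(-((r - c_k) / w_k) ** 2).
    """
    return sum(h * math.exp(-((r - c) / w) ** 2) for c, h, w in gaussians)

def hydrophobic_energy(coords, gaussians):
    """Sum the pair energy over all pairs i < j of aliphatic carbon positions."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            total += gaussian_pair_energy(math.dist(coords[i], coords[j]), gaussians)
    return total

# Hypothetical parameters for illustration only: a contact well and a
# shallower solvent-separated well. These are not the authors' values.
EXAMPLE_GAUSSIANS = [(4.0, -1.0, 0.5), (7.0, -0.5, 0.5)]
```

With two well-separated Gaussians like these, the pair energy has one minimum near each c_k, matching the contact and solvent-separated minima described above.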
Given the amino acid sequence of a
protein, find the 3-dimensional
structure likely to be found in nature.
Simplify problem by utilizing domain-specific knowledge
Select a protein
and a subset of
Cluster minima and test stopping criteria
E = Σ_dihedrals k_φ[1 - cos(φ - φ0)] + k_ψ[1 - cos(ψ - ψ0)]
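A minimal sketch of this dihedral biasing term, assuming one (φ, ψ) pair per biased residue and a shared target (φ0, ψ0) for the predicted secondary structure. The function name and the default force constants are hypothetical.

```python
import math

def dihedral_bias(angles, phi0, psi0, k_phi=1.0, k_psi=1.0):
    """Penalty that is zero when every (phi, psi) equals (phi0, psi0).

    angles is a list of (phi, psi) pairs in radians. k_phi and k_psi
    are placeholder force constants, not values from the source.
    """
    return sum(
        k_phi * (1.0 - math.cos(phi - phi0)) + k_psi * (1.0 - math.cos(psi - psi0))
        for phi, psi in angles
    )
```

Each residue contributes between 0 (at the target angles) and 2 * (k_phi + k_psi) (both angles off by pi), so the term pulls the backbone toward the predicted conformation without forbidding other values.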
E_HB = w_i w_{i+4} / D r_{i,i+4} (the w's are weights from the server for residues i and i+4 in the helix)
E_HB = w_i w_j / D r_{i,j}
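The helix form of this hydrogen-bond term might be evaluated as in the sketch below. Several things are assumptions here: that the term is summed over all i, that D multiplies the distance in the denominator, and that one representative position per residue is used for r. The sign and the value of D are taken as written and left to the caller.

```python
import math

def helix_hbond_term(weights, coords, D=1.0):
    """Sum over i of w_i * w_{i+4} / (D * r_{i,i+4}).

    weights[i] is the prediction weight for residue i and coords[i] is
    one representative position per residue (an assumption). D is a
    placeholder constant; its meaning is not given in the source.
    """
    total = 0.0
    for i in range(len(coords) - 4):
        r = math.dist(coords[i], coords[i + 4])
        total += weights[i] * weights[i + 4] / (D * r)
    return total
```

The second form, for residues i and j, would follow the same pattern with an explicit list of (i, j) pairs in place of the fixed i, i+4 spacing.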
Neural nets trained on a large database of proteins can predict secondary structure likely to be in a target protein.
BBBB B AAAAAAA BBBBB
13552 6789992 56673
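Output of this kind can be turned into segments with a few lines of code. The sketch below assumes that B marks a predicted β-strand residue, A a predicted α-helix residue, and that each digit is a per-residue confidence; it also ignores the spacing, which may have marked unassigned residues in the original figure. The function name is hypothetical.

```python
from itertools import groupby

def parse_prediction(labels, confidences):
    """Group a per-residue prediction into (label, start, end, mean_conf).

    start and end are half-open indices into the space-stripped string.
    """
    labels = labels.replace(" ", "")
    confidences = confidences.replace(" ", "")
    if len(labels) != len(confidences):
        raise ValueError("label and confidence strings differ in length")
    segments, pos = [], 0
    for label, run in groupby(labels):
        n = len(list(run))
        conf = [int(c) for c in confidences[pos:pos + n]]
        segments.append((label, pos, pos + n, sum(conf) / n))
        pos += n
    return segments

segments = parse_prediction("BBBB B AAAAAAA BBBBB", "13552 6789992 56673")
```

For the strings above this yields three segments: a strand, a helix and a second strand, each with its mean confidence, which is the form the later sheet-pairing step needs.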
Forming β-sheets from the predicted β-strands is a combinatorial problem.
Which strands are paired?
Which residues are paired?
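The two questions can be made concrete by enumerating candidates. The sketch below lists, for every pair of strands, both orientations and a small range of register shifts, together with the residue pairs each choice implies. It only enumerates single strand-strand pairings, not complete sheet topologies, and the names, the half-open (start, end) strand ranges and the `max_shift` limit are assumptions for illustration.

```python
from itertools import combinations

def residue_pairs(a, b, antiparallel, shift):
    """Residue pairs for strands a and b, each a half-open (start, end) range."""
    pairs = []
    for k in range(a[1] - a[0]):
        i = a[0] + k
        j = (b[1] - 1 - k - shift) if antiparallel else (b[0] + k + shift)
        if b[0] <= j < b[1]:  # keep only partners that lie inside strand b
            pairs.append((i, j))
    return pairs

def enumerate_pairings(strands, max_shift=1):
    """All (strand_a, strand_b, antiparallel, shift, pairs) candidates."""
    out = []
    for (ia, a), (ib, b) in combinations(enumerate(strands), 2):
        for antiparallel in (False, True):
            for shift in range(-max_shift, max_shift + 1):
                pairs = residue_pairs(a, b, antiparallel, shift)
                if pairs:
                    out.append((ia, ib, antiparallel, shift, pairs))
    return out
```

Even this restricted enumeration grows quickly: the number of strand pairs is quadratic in the number of strands, and each pair is multiplied by the orientations and shifts considered, before any choice of which pairings coexist in one sheet.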
Distribution of Beta Sheets in Proteins with Applications to Structure Prediction
Ruczinski, Kooperberg, Bonneau, and Baker
Proteins 48, 2002
Massively parallel exploration of optimization space
2UTG_A prediction: 7.5Å R.M.S.D. from crystal structure
1POU: 6.3Å R.M.S.D. from NMR structure
Results on Phospholipase C beta C-terminus, turkey (containing 242 amino acids). Ribbon structure comparison between experiment (center), our submitted M1 prediction (right), and a next-generation run of the global optimization algorithm (left). The M1 prediction, our lowest energy submission, had an RMSD with experiment of 8.46Å. The new run lowered the energy of our previous best minimizer, resulting in a new structure with an RMSD of 7.7Å.
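For reference, the RMSD figures quoted here measure the root-mean-square distance between corresponding atoms of two structures. The sketch below computes that quantity for coordinates that are already in a common frame; it deliberately omits the optimal superposition step that a real comparison would normally perform first, and the choice of which atoms to compare is left to the caller. The function name is hypothetical.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two equal-length coordinate lists.

    Assumes the structures are already superposed; no alignment is done.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(sq / len(a))
```

Because the deviations are squared before averaging, a few badly placed regions can dominate the value, which is worth keeping in mind when comparing numbers such as 8.46Å and 7.7Å.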
Best structure predicted on one of the hardest targets
Our method is more effective than some knowledge-based methods on targets for which less information from known proteins is available.
Global optimization algorithm is very effective at improving structures from a small initial population.