Bioinformatics
Ayesha M. Khan, Spring 2013
Protein Modelling
What is protein modelling?
Suppose we have neither the resources nor the expertise for X-ray crystallography or NMR, only the protein sequence (the target), and we would like to know its 3D structure.
Many programs for automated homology modeling are now available, so anyone can construct a homology model on a regular PC. However, constructing a “good” homology model (at least for sequences that are not highly similar to a template) usually requires some expertise and human intervention rather than a fully automated run.
A few of the freely available programs for homology modeling:
SWISS-MODEL: produces accurate models; fast; good tutorials available. http://swissmodel.expasy.org/
I-TASSER: produces accurate models; easy to use, but slow.
Modeller: must be downloaded and installed locally. http://salilab.org/modeller/modeller.html
The rate of new protein sequence determination is far outpacing the rate of structure determination by X-ray crystallography and NMR. Therefore, initiatives are underway to automatically generate homology models for large numbers of new protein sequences.
One database of automatically generated homology models is SWISS-MODEL Repository: http://swissmodel.expasy.org/repository/
Is a homology model CORRECT?
Since the actual (experimentally determined) structure of the target is not known, there is no way to say whether or not the homology model is “correct.”
The best a researcher can do is compare the homology model to the structure of the template from which it was derived. If the atom positions in the model do not deviate very much from those of the template, the homology model is said to be “accurate.” The greater the deviation between model and template, the lower the accuracy of the model.
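The deviation between model and template is commonly summarized as a root-mean-square deviation (RMSD) over paired atoms. The following is a minimal sketch, not the method of any particular program: it assumes the two structures have already been superimposed and that the atoms are listed in matching order. The function name `rmsd` and the coordinates are made up for illustration.

```python
import math

def rmsd(model_coords, template_coords):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumes the structures are already superimposed and that the
    i-th atom in one list corresponds to the i-th atom in the other.
    """
    if len(model_coords) != len(template_coords):
        raise ValueError("coordinate lists must pair up atom for atom")
    if not model_coords:
        raise ValueError("need at least one atom pair")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model_coords, template_coords):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model_coords))

# Made-up coordinates in angstroms, for illustration only
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.0)]
print(round(rmsd(model, template), 3))
```

A smaller value means the model stays closer to the template; a value of zero means the paired atoms coincide exactly.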
When is a homology model definitely INCORRECT?
A homology model has regions that are incorrect if it contains structural features that do not occur in native proteins, such as:
• Hydrophobic side chains on the surface of the model (these side chains should be buried)
• Buried polar or ionic groups that do not have their hydrogen-bonding or ionic-bonding capabilities “satisfied” by neighboring groups
• Unreasonable bond lengths or angles
• Unfavorable noncovalent contacts between atoms (clashes)
• Unreasonable dihedral angles
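One of the checks listed above, unfavorable contacts between atoms, can be sketched as a simple distance test. This is a rough illustration only: the 2.0 angstrom cutoff, the atom names, and the coordinates are assumptions chosen for the example, and real validation tools use per-atom-type radii rather than a single cutoff.

```python
import math
from itertools import combinations

def find_clashes(atoms, min_distance=2.0, bonded=frozenset()):
    """Return (name_a, name_b, distance) for atom pairs closer than min_distance.

    atoms:  dict mapping an atom name to its (x, y, z) position in angstroms
    bonded: set of frozenset({name_a, name_b}) pairs to skip, since
            covalently bonded atoms are expected to be close
    """
    clashes = []
    for (name_a, pos_a), (name_b, pos_b) in combinations(atoms.items(), 2):
        if frozenset((name_a, name_b)) in bonded:
            continue
        distance = math.dist(pos_a, pos_b)
        if distance < min_distance:
            clashes.append((name_a, name_b, round(distance, 2)))
    return clashes

# Hypothetical atoms, for illustration only
atoms = {
    "CA1": (0.0, 0.0, 0.0),
    "CB1": (1.5, 0.0, 0.0),
    "CA2": (1.0, 1.0, 0.0),
    "CA3": (6.0, 0.0, 0.0),
}
bonded = {frozenset(("CA1", "CB1"))}
print(find_clashes(atoms, bonded=bonded))
```

Here the bonded pair is ignored, while the atom placed about 1.4 and 1.1 angstroms from its non-bonded neighbors is reported twice, once per clashing partner.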
Threading for tertiary structure prediction
Structure is more conserved than sequence, so many proteins share similar folds, even in the absence of sequence similarity.
If a suitable template does not exist for homology modeling of a target sequence, threading can be used to identify a potential structure for the target from among known structures of proteins that do not share significant sequence similarity with the target sequence.
Threading predicts the structural fold of a protein by fitting its sequence into a structural database and selecting the best-fitting fold. Essentially, the target sequence is tested for compatibility with all structures in the database. Various methods are used to compare the target sequence to the known structures and determine which one, if any, it fits best.
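The idea of testing one sequence against every fold in a library can be illustrated with a deliberately simplified toy. It is not how GenTHREADER or any real threading program scores folds: the scoring rule (hydrophobic residues prefer buried positions, others prefer exposed ones), the `HYDROPHOBIC` set, the gapless placement, and the three-fold library are all assumptions made up for this sketch.

```python
# Residues treated as hydrophobic in this toy example (an assumption)
HYDROPHOBIC = set("AVLIMFWC")

def fit_score(sequence, environments):
    """Score a gapless placement of a sequence onto one fold.

    environments: one letter per position, 'B' (buried) or 'E' (exposed).
    Each position adds +1 when the residue suits its environment, else -1.
    """
    if len(sequence) != len(environments):
        raise ValueError("sequence and fold must have the same length")
    score = 0
    for residue, env in zip(sequence, environments):
        is_hydrophobic = residue in HYDROPHOBIC
        is_buried = env == "B"
        score += 1 if is_hydrophobic == is_buried else -1
    return score

def best_fold(sequence, fold_library):
    """Score the sequence against every same-length fold; return best name and all scores."""
    scores = {
        name: fit_score(sequence, envs)
        for name, envs in fold_library.items()
        if len(envs) == len(sequence)
    }
    if not scores:
        raise ValueError("no fold in the library matches the sequence length")
    return max(scores, key=scores.get), scores

# Hypothetical fold library and target sequence, for illustration only
library = {"fold_1": "BEBEBE", "fold_2": "EEBBEE", "fold_3": "EBEBEB"}
name, scores = best_fold("LKVEIS", library)
print(name, scores)
```

Real methods allow gaps, use richer scoring terms, and assess whether the top score is statistically meaningful, which matters because the best-scoring fold is not necessarily a correct one.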
Unlike homology modeling, threading does not produce an all-atom structural model of the target sequence. Nevertheless, the relatively poor models it yields can still provide insight into the function of a new protein.
There is a high rate of false positives when using threading.
A few threading programs:
GenTHREADER: another version, pGenTHREADER, uses profiles and predicted secondary structure to increase accuracy.
3D-PSSM: beware, the template library may be outdated.
Structures A-C are homology models based on about 60% (A), 40% (B), and 30% (C) sequence identity to their template structure.
Structures D and E are ab initio predictions using a program called Rosetta.
Predicted structures are in red, and actual structures are in blue.
The accuracy of the models decreases significantly in going from A to E, but the overall structure is still roughly correct.
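The sequence identity figures quoted for structures A-C come from a target-template alignment. As a small sketch under stated assumptions, percent identity can be computed as the share of identical residues among the aligned (non-gap) columns. Conventions for the denominator differ between tools, so the choice here is an assumption, and the two aligned sequences are invented for the example.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * identical / compared

# Invented alignment, for illustration only
target_aln = "MKT-AYIAKQ"
template_aln = "MKSLAYLA-Q"
print(percent_identity(target_aln, template_aln))
```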
CASP: Critical Assessment of Techniques for Protein Structure Prediction
CASP is an international contest held every two years in which scientists try to predict the structures of proteins using methods they have developed that include homology modeling, threading, and ab initio techniques.
Contestants are given the sequences of proteins whose structures have been determined by X-ray crystallography or NMR but have not yet been made public. After contestants have made and submitted their predictions, the actual structures are released, the predictions are compared to the actual structures, and the predictions are assessed for accuracy.
The CASP contest is a major driving force in the development of tertiary structure prediction methods.
CASP began in 1994; CASP10 was held in 2012.
When dealing with predicted protein structures, it is important to remember:
“Models are not molecules observed”
No matter how they are obtained, before we ask what they tell us, we must ask how well macromolecular models fit with other things we already know.
A model is like any scientific theory: it is useful only to the extent that it supports predictions that we can test by experiment. Our initial confidence in it is justified only to the extent that it fits what we already know. Our confidence can grow only if its predictions are verified.