Protein structure prediction. Protein domains can be defined based on geometry: a domain is a group of residues with high contact density, i.e. the number of contacts within a domain is higher than the number of contacts between domains. Domains can be:
- chain-continuous domains
- chain-discontinuous domains
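The geometric definition can be checked directly by counting contacts. Below is a minimal sketch, assuming a contact means two C-alpha atoms within 8 Å that are at least 3 residues apart in sequence; both cutoffs (`CONTACT_CUTOFF`, `MIN_SEQ_SEP`) and the function names are hypothetical choices, not values from the lecture.

```python
import math
from itertools import combinations

# Hypothetical cutoffs: the lecture does not define what counts as a contact.
CONTACT_CUTOFF = 8.0   # C-alpha/C-alpha distance in angstroms
MIN_SEQ_SEP = 3        # ignore residues this close along the chain

def contact_counts(ca_coords, domain_of):
    """Count residue-residue contacts within and between domains.

    ca_coords: list of (x, y, z) C-alpha coordinates, one per residue.
    domain_of: list of domain labels in the same residue order.
    Returns (within, between).
    """
    within = between = 0
    for i, j in combinations(range(len(ca_coords)), 2):
        if j - i < MIN_SEQ_SEP:
            continue  # skip trivial contacts between chain neighbours
        if math.dist(ca_coords[i], ca_coords[j]) <= CONTACT_CUTOFF:
            if domain_of[i] == domain_of[j]:
                within += 1
            else:
                between += 1
    return within, between

def looks_like_domains(ca_coords, domain_of):
    """Geometric criterion: more contacts within domains than between them."""
    within, between = contact_counts(ca_coords, domain_of)
    return within > between
```

Because the labels in `domain_of` need not be contiguous along the chain, the same check applies unchanged to chain-discontinuous domains.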
Protein fold – arrangement of secondary structures into a unique topology/tertiary structure.
Example of alpha+beta proteins:
Unsolved problem: direct prediction of protein structure from physicochemical principles.
Solved problem: recognizing which of the known folds are similar to the fold of an unknown protein.
Fold recognition is based on observations/assumptions:
[Figure: structure prediction flowchart with the boxes "Does sequence align with a protein of known structure?", "Database similarity search", "Protein family analysis", "Relationship to known structure?", "Predicted three-dimensional structural model", "Three-dimensional comparative modeling", "Three-dimensional structural analysis in laboratory" and "Is there a predicted structure?"]
Prediction of the three-dimensional structure of a protein from its sequence. Different approaches:
Homology (comparative) modeling aims to produce protein models with accuracy close to that of experimental structures and is used for:
Recognition of similarity between the target and template.
Target – protein with unknown structure.
Template – protein with known structure.
Main difficulty: deciding which template to pick when there are multiple candidate template structures.
A template structure can be found by searching the PDB with sequence-sequence alignment methods.
Two zones of sequence alignment. Two sequences are practically guaranteed to fold into the same structure if their alignment length and sequence identity fall into the “safe” zone.
Homology modeling zone
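Whether an alignment lies in the safe zone depends on both its length and its percent identity. The sketch below computes both and compares them with a length-dependent threshold; the threshold formula is an assumption, borrowed from Rost's (1999) twilight-zone curve as a stand-in for the boundary in the lecture's figure, and `margin` is a hypothetical knob that moves the boundary further into the safe region.

```python
import math

def percent_identity(aligned_target, aligned_template):
    """Percent identity and alignment length of two gapped sequences of equal length.

    Only columns where neither sequence has a gap ('-') are counted.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs), len(pairs)

def identity_threshold(length, margin=0.0):
    """Identity (%) above which an alignment of this length is treated as safe.

    Assumption: Rost's curve, 480 * L^(-0.32 * (1 + exp(-L/1000))) for L <= 450
    and 19.5 % beyond; `margin` shifts the whole curve upwards.
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if length > 450:
        return 19.5 + margin
    return margin + 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))

def in_safe_zone(aligned_target, aligned_template, margin=0.0):
    identity, length = percent_identity(aligned_target, aligned_template)
    return length > 0 and identity >= identity_threshold(length, margin)
```

Short alignments need far higher identity than long ones (about 29 % at 100 aligned residues under this curve), which is why both length and identity define the zone.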
Once the target-template alignment is available, copy the backbone coordinates of the aligned template residues.
If two aligned residues are the same, copy their side chain coordinates as well.
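The two copying rules can be written as a short loop. This is a minimal sketch under assumed data structures: the alignment is a list of `(target_index, template_index)` pairs, each template residue is a dict of atom name to coordinates, and `copy_coordinates` is a hypothetical name.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")

def copy_coordinates(alignment, target_seq, template_seq, template_atoms):
    """Build a partial model by copying coordinates from the template.

    alignment: list of (target_index, template_index) pairs of aligned residues.
    template_atoms: one dict per template residue, atom name -> (x, y, z).
    Returns dict target_index -> dict atom name -> coordinates.
    """
    model = {}
    for t_idx, s_idx in alignment:
        atoms = template_atoms[s_idx]
        # Rule 1: backbone atoms are copied for every aligned residue.
        residue = {name: atoms[name] for name in BACKBONE_ATOMS if name in atoms}
        # Rule 2: side-chain atoms are copied only if the residue types are identical.
        if target_seq[t_idx] == template_seq[s_idx]:
            residue.update({name: xyz for name, xyz in atoms.items()
                            if name not in BACKBONE_ATOMS})
        model[t_idx] = residue
    return model
```

Target residues that are absent from `alignment` get no coordinates at all; these gaps are what loop modeling has to fill.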
Insertions and deletions occur mostly between secondary structure elements, in the loop regions. Loop conformations are difficult to predict.
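Before any loop-modeling approach is applied, the regions that need it have to be located in the alignment. A minimal sketch, assuming the same hypothetical `(target_index, template_index)` pair representation: insertions are runs of target residues with no template partner, and deletions show up as jumps in the template index between consecutive target residues.

```python
def unaligned_segments(target_length, alignment):
    """Inclusive (start, end) ranges of target residues with no template partner."""
    aligned = {t for t, _ in alignment}
    segments, start = [], None
    for i in range(target_length):
        if i not in aligned and start is None:
            start = i                      # a new unaligned run begins
        elif i in aligned and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:                  # run extends to the C-terminus
        segments.append((start, target_length - 1))
    return segments

def deletion_sites(alignment):
    """Target positions after which template residues are skipped (deletions in the target)."""
    pairs = sorted(alignment)
    return [t1 for (t1, s1), (t2, s2) in zip(pairs, pairs[1:])
            if t2 == t1 + 1 and s2 > s1 + 1]
```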
Approaches to loop modeling:
Side chain conformations – rotamers. In similar proteins - side chains have similar conformations.
If the percent identity is high, side-chain conformations can be copied from the template to the target. If it is not very high, side chains are modeled using rotamer libraries, and the candidate rotamers are scored with energy functions.
Problem: side-chain conformations depend on the backbone conformation, which is itself predicted rather than experimentally determined.
E = min(E1, E2, E3): among the candidate rotamers (here three), the one with the lowest energy is chosen.
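The selection rule E = min(E1, E2, E3) amounts to an argmin over each residue's rotamer list. A minimal sketch, assuming rotamers are given as chi-angle tuples and that `energy` is a hypothetical scoring function supplied by the caller.

```python
def pick_rotamers(candidates, energy):
    """Choose the lowest-energy rotamer for each residue.

    candidates: dict residue_id -> list of rotamers (e.g. tuples of chi angles).
    energy: callable(residue_id, rotamer) -> float (hypothetical energy function).
    Returns dict residue_id -> (best_rotamer, its_energy).
    """
    chosen = {}
    for res, rotamers in candidates.items():
        best = min(rotamers, key=lambda rot: energy(res, rot))
        chosen[res] = (best, energy(res, best))
    return chosen

# Usage with made-up energies for three serine rotamers (E1, E2, E3).
toy_energies = {(-60.0,): 1.2, (60.0,): 0.4, (180.0,): 2.0}
result = pick_rotamers({"SER12": list(toy_energies)}, lambda res, rot: toy_energies[rot])
```

This scores each residue independently against a fixed backbone; interactions between neighbouring side chains and with a shifting backbone are what the iterative refinement has to handle.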
Energy optimization of entire structure.
Since the backbone conformation depends on the side-chain conformations and vice versa, an iterative approach is used:
Shift in backbone
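The back-and-forth between backbone and side chains is a form of alternating minimization. The sketch below is a deliberately tiny toy, not a structure-refinement method: one number `b` stands in for the backbone, one number `s` for the side chains, and the coupled quadratic `energy` is hypothetical, chosen so that each half-step has a closed-form minimum.

```python
def energy(b, s):
    # Toy coupled energy: the (b - s) term makes each variable's optimum depend on the other.
    return (b - 1.0) ** 2 + (s - 2.0) ** 2 + 0.5 * (b - s) ** 2

def best_backbone(s):
    # argmin over b with s fixed: dE/db = 2(b - 1) + (b - s) = 0
    return (2.0 + s) / 3.0

def best_side_chain(b):
    # argmin over s with b fixed: dE/ds = 2(s - 2) + (s - b) = 0
    return (4.0 + b) / 3.0

def iterate(b=0.0, s=0.0, tol=1e-10, max_iter=1000):
    """Alternate 'shift the backbone' and 'repack side chains' until nothing changes."""
    for _ in range(max_iter):
        new_b = best_backbone(s)        # shift in backbone for current side chains
        new_s = best_side_chain(new_b)  # re-optimize side chains on the new backbone
        converged = abs(new_b - b) < tol and abs(new_s - s) < tol
        b, s = new_b, new_s
        if converged:
            break
    return b, s, energy(b, s)
```

Each half-step can only lower the energy, so the loop settles at the joint minimum (b = 1.25, s = 1.75 for this toy function) even though neither variable is ever optimized with the other free.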
Goal: to find protein with known structure which best matches a given sequence.
Because the similarity between the target and even its closest template is low, sequence-sequence alignment methods fail.
Solution: threading – sequence-structure alignment method.
In sequence-structure alignment, the target sequence is compared with every structural template in the database.
w is derived from the frequencies of amino acid contacts in the PDB; a_i is the amino acid type of the target residue aligned with position i of the template; N is the number of contacts.
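For a fixed alignment, a threading score of this kind sums w(a_i, a_j) over the template's contacts. A minimal sketch, assuming a reduced two-letter alphabet (H = hydrophobic, P = polar) with made-up w values; a real potential would be derived from PDB contact statistics as described above. Whether the lecture normalizes by N is not recoverable here, so the function returns the sum and N separately.

```python
# Hypothetical contact potential on a reduced H/P alphabet (lower = more favourable).
W = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.2}

def threading_energy(target, alignment, contacts, w=W):
    """Sum of w(a_i, a_j) over template contacts whose positions are both aligned.

    target: target sequence over the H/P alphabet.
    alignment: dict template_position -> target index.
    contacts: list of (i, j) template positions in contact.
    Returns (energy, N) where N is the number of contacts scored.
    """
    total, n = 0.0, 0
    for i, j in contacts:
        if i in alignment and j in alignment:
            total += w[(target[alignment[i]], target[alignment[j]])]
            n += 1
    return total, n

def best_template(target, templates):
    """templates: dict name -> (alignment, contacts); returns the lowest-energy name."""
    return min(templates, key=lambda name: threading_energy(target, *templates[name])[0])
```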
“Frozen approximation”: interactions between two target amino acids cannot be handled by traceback in the alignment matrix, because the partner's identity is not known while the alignment is being built, so that:
>> 3.8 Angstroms
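With partners frozen to the template's own residues, each (template position, residue type) pair gets a fixed score, and ordinary dynamic programming becomes possible again. A minimal sketch, reusing a hypothetical H/P potential `W` and a hypothetical linear gap penalty `GAP`; this is a generic global alignment that minimizes energy, not a specific published threading program.

```python
W = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.2}
GAP = 0.5  # hypothetical gap penalty in the same energy units

def frozen_profile(template_seq, contacts, w=W):
    """Energy of residue type a at template position i, with every contact
    partner frozen to the template's own amino acid."""
    partners = {i: [] for i in range(len(template_seq))}
    for i, j in contacts:
        partners[i].append(j)
        partners[j].append(i)
    alphabet = {a for a, _ in w}
    return [{a: sum(w[(a, template_seq[j])] for j in partners[i]) for a in alphabet}
            for i in range(len(template_seq))]

def thread(target, template_seq, contacts, gap=GAP, w=W):
    """Global DP alignment minimizing frozen-approximation energy; returns (energy, pairs)."""
    prof = frozen_profile(template_seq, contacts, w)
    n, m = len(target), len(template_seq)
    D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = float("inf")
            if i > 0 and j > 0:
                best = D[i - 1][j - 1] + prof[j - 1][target[i - 1]]  # align i-1 with j-1
            if i > 0:
                best = min(best, D[i - 1][j] + gap)                  # gap in template
            if j > 0:
                best = min(best, D[i][j - 1] + gap)                  # gap in target
            D[i][j] = best
    pairs, i, j = [], n, m  # traceback now works because every score is position-fixed
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + prof[j - 1][target[i - 1]]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return D[n][m], pairs[::-1]
```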
Prediction of the structure of methylglyoxal synthase based on the template of carbamoyl phosphate synthase
Protein engineering – altering protein sequence to change protein function or structure
Protein design – designing a protein de novo that satisfies a given requirement
- in vivo (measuring effect on the whole cell)
- in vitro (phage display: the gene is inserted into phage DNA and expressed on the phage surface; phages are selected if the expressed protein binds an immobilized target protein)
B. Matthews, 1989:
Result: three disulfide bonds were introduced into lysozyme by mutagenesis.
From B. Matthews et al.
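Choosing where to place engineered disulfides starts from the known structure. The design criteria used in that work are not given here; the sketch below only shows one common first-pass geometric filter, under assumptions: residue pairs are kept if their C-alpha atoms lie 4.4 to 6.8 Å apart and are at least 3 residues apart in sequence. The window, the separation and `disulfide_candidates` are hypothetical choices for illustration.

```python
import math
from itertools import combinations

# Assumed filter values, not taken from the lecture.
CA_MIN, CA_MAX = 4.4, 6.8   # C-alpha/C-alpha distance window (angstroms)
MIN_SEQ_SEP = 3             # ignore residues close along the chain

def disulfide_candidates(ca_coords):
    """Residue index pairs whose C-alpha distance falls inside the window."""
    out = []
    for i, j in combinations(range(len(ca_coords)), 2):
        if j - i < MIN_SEQ_SEP:
            continue
        d = math.dist(ca_coords[i], ca_coords[j])
        if CA_MIN <= d <= CA_MAX:
            out.append((i, j, round(d, 2)))
    return out
```

Pairs passing such a filter would still need side-chain geometry checks and, ultimately, experimental testing of the mutants.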