CSCE555 Bioinformatics
Lecture 18: Protein Tertiary Structure Prediction
Meeting: MW 4:00PM-5:15PM SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555
University of South Carolina
Department of Computer Science and Engineering
2008 www.cse.sc.edu
Outline
Computationally it is an exceedingly difficult problem!
Assume three possible conformations for each amino acid residue (e.g., α-helix, β-sheet, and random coil). For a protein of 100 residues, the total number of conformations is
3^100 = 515377520732011331036461129765621272702107522001 ≈ 5 × 10^47.
If converting from one conformation to another took 100 psec (10^-10 sec), a random search of all conformations would require
5 × 10^47 × 10^-10 sec ≈ 1.6 × 10^30 years.
However, protein folding takes place on the order of msec to sec. Therefore, proteins fold not via a random search but via a more sophisticated search process.
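The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate as stated on the slide; the constant names are illustrative, and the length of a year (365.25 days) is an assumption not given in the lecture.

```python
# Levinthal-style estimate using the numbers quoted above.
STATES_PER_RESIDUE = 3                   # e.g. helix, sheet, coil
N_RESIDUES = 100
SECONDS_PER_STEP = 1e-10                 # 100 psec per conformational change
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumption: Julian year

# Python integers are arbitrary precision, so 3^100 is computed exactly.
n_conformations = STATES_PER_RESIDUE ** N_RESIDUES

# Time to visit every conformation once, in seconds and then in years.
search_seconds = n_conformations * SECONDS_PER_STEP
search_years = search_seconds / SECONDS_PER_YEAR

print(f"conformations: {n_conformations:.2e}")
print(f"search time:   {search_years:.2e} years")
```

Changing N_RESIDUES shows how quickly the search time grows: every additional residue multiplies it by three.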
We want to watch the folding process of a protein using molecular simulation techniques.
1- "Collapse" - driving force is burial of hydrophobic aa’s (fast - msecs)
2- Molten globule - helices & sheets form, but "loose" (slow - secs)
3- "Final" native folded state - compaction, some secondary (2°) structures rearranged
Native state? - assumed to be lowest free energy
- may be an ensemble of structures
Knowledge based approaches
Anfinsen’s theory: Protein native structure corresponds to the state with the lowest free energy of the protein-solvent system.
Provides both folding pathway & folded structure
Can only apply to very small proteins
Empirical all-atom forcefields: CHARMM, AMBER, ECEPP-3, GROMOS, OPLS
Parameterization: Quantum mechanical calculations, experimental data
Simplified potential: UNRES (united residue)
We are interested in minimum points on Potential Energy Surface (PES)
1. Make random move and produce a new conformation
2. Calculate the energy change ΔE for the new conformation
3. Accept or reject the move based on the Metropolis criterion, with acceptance probability P = exp(-ΔE/kT):
If ΔE < 0, then P > 1: accept the new conformation;
Otherwise: accept if P > rand(0,1), else reject.
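The three Monte Carlo steps above can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: the acceptance probability P = exp(-ΔE/kT) is the standard Metropolis form, and the one-dimensional double-well toy_energy, the step size, kT, and the seed are all hypothetical stand-ins for a real force field and conformational move set.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Metropolis criterion: accept downhill moves, uphill with P = exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def toy_energy(x):
    """Hypothetical 1-D double-well 'energy surface' (stand-in for a force field)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def monte_carlo(energy, x0, n_steps=20000, step=0.5, kT=0.2, seed=0):
    """Random-walk Monte Carlo search; returns the lowest-energy point visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)       # 1. random move
        e_new = energy(x_new)                      # 2. energy of the new state
        if metropolis_accept(e_new - e, kT, rng):  # 3. accept or reject
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


# Start in the higher (right-hand) well; uphill moves let the walk cross the barrier.
best_x, best_e = monte_carlo(toy_energy, x0=1.0)
print(best_x, best_e)
```

Occasionally accepting uphill moves is what lets the search leave a local minimum; a search that accepted only downhill moves would stay in the well it started in.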
Two primary methods
1) Homology modeling
2) Threading (fold recognition)
Both rely on availability of experimentally determined structures that are "homologous" or at least structurally very similar to target
Provide folded structure only
(can combine steps 1 & 2 by using PDB-BLAST)
Identify “best” fit between target sequence & template structure
The number of unique structural folds in nature is fairly small (probably 2000-3000)
Until very recently, 90% of new structures submitted to PDB had similar structural folds in PDB
Align target sequence with template structures (fold library) from the Protein Data Bank (PDB)
Calculate energy score to evaluate goodness of fit between target sequence & template structure
Rank models based on energy scores
Find “correct” sequence-structure alignment of a target sequence with its native-like fold in PDB
Bad alignment means bad score!
(eg, ASTRAL domain library derived from the PDB)
Supplement with additional decoys, e.g., generated using an ab initio approach such as Rosetta (Baker)
based on contact statistics from PDB
Miyazawa & Jernigan (ISU)
Ep: What is the "probability" that two specific residues are in contact?
Es: How well does a specific residue fit its structural environment?
Eg: Alignment gap penalty?
Total energy: Ep + Es + Eg
Goal: Find a sequence-structure alignment that minimizes the energy function
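As a rough illustration of scoring and ranking sequence-structure alignments by Ep + Es + Eg, the sketch below uses a two-letter hydrophobic/polar (H/P) alphabet. Every number and name in it is hypothetical: the energy tables are made-up toy values (not Miyazawa & Jernigan parameters), and the template is a four-position fragment with a single contact.

```python
# Toy threading score: total energy = Ep + Es + Eg.
# All parameters below are hypothetical illustration values.
PAIR_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}
ENV_ENERGY = {
    ("H", "buried"): -0.5, ("H", "exposed"): 0.5,
    ("P", "buried"): 0.5, ("P", "exposed"): -0.2,
}
GAP_PENALTY = 1.0


def threading_energy(target, alignment, contacts, environments):
    """alignment[i] is the target index placed at template position i, or None."""
    # Ep: pairwise term, summed over template contacts with both positions filled.
    e_pair = 0.0
    for i, j in contacts:
        a, b = alignment[i], alignment[j]
        if a is not None and b is not None:
            e_pair += PAIR_ENERGY[tuple(sorted((target[a], target[b])))]
    # Es: how well each placed residue fits the environment of its position.
    e_env = sum(
        ENV_ENERGY[(target[a], environments[i])]
        for i, a in enumerate(alignment)
        if a is not None
    )
    # Eg: penalize empty template positions and unaligned target residues.
    n_aligned = sum(1 for a in alignment if a is not None)
    n_gaps = (len(alignment) - n_aligned) + (len(target) - n_aligned)
    return e_pair + e_env + GAP_PENALTY * n_gaps


TARGET = "HPPH"
ENVIRONMENTS = ["buried", "exposed", "exposed", "buried"]  # template positions
CONTACTS = [(0, 3)]                                        # template contact map

candidates = {
    "in-register": [0, 1, 2, 3],
    "shifted": [1, 2, 3, None],
}
# Rank candidate alignments from lowest (best) to highest energy.
ranked = sorted(
    candidates,
    key=lambda name: threading_energy(TARGET, candidates[name], CONTACTS, ENVIRONMENTS),
)
print(ranked)
```

A real threading program searches over a very large space of alignments (for example with dynamic programming) rather than enumerating two by hand; the sketch only shows how the three terms combine into one score that can be ranked.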
The goal of CAFASP is to evaluate the performance of fully automatic structure prediction servers available to the community. In contrast to the normal CASP procedure, which allows user intervention, CAFASP asks how well servers do without any intervention of experts, i.e. how well ANY user using only automated methods can predict protein structure.
Servers with name
in italic are
MaxSub score ranges from 0 to 1
Therefore, maximum total score is 30
(http://www.cs.bgu.ac.il/~dfischer/CAFASP3, released in December 2002.)
Red: true structure
Blue: correct part of prediction
Green: wrong part of prediction