CSCE555 Bioinformatics
Lecture 18 Protein Tertiary Structure Prediction
Meeting: MW 4:00PM-5:15PM SWGN2A21
Instructor: Dr. Jianjun Hu
Course page:
University of South Carolina
Department of Computer Science and Engineering
2008

Outline



  • Experimental limitations of protein structure determination
  • Tertiary structure prediction
    • Ab initio
    • Homology modeling
    • Threading
Experimental Protein Structure Determination
  • High-resolution structure determination
    • X-ray crystallography (<1 Å)
    • Nuclear magnetic resonance (NMR) (~1-2.5 Å)
  • Lower-resolution structure determination
    • Cryo-EM (electron microscopy): ~10-15 Å
  • Theoretical models?
    • Highly variable, but a few are as accurate as X-ray structures!
Tertiary Structure Prediction
  • The fold (tertiary structure) prediction problem can be formulated as a search for the minimum-energy conformation
    • The search space is defined by the backbone φ/ψ dihedral angles and the side-chain rotamers
    • The search space is enormous even for small proteins!
    • The number of local minima increases exponentially with the number of residues

Computationally it is an exceedingly difficult problem!

Levinthal Paradox of Protein Folding: How Does Nature Search?

We assume that there are three conformations for each amino acid (e.g., α-helix, β-sheet, and random coil). If a protein is made up of 100 amino acid residues, the total number of conformations is

$$3^{100} = 515377520732011331036461129765621272702107522001 \approx 5 \times 10^{47}.$$

If 100 ps ($10^{-10}$ s) were required to convert from one conformation to another, a random search of all conformations would require

$$5 \times 10^{47} \times 10^{-10}\ \text{s} = 5 \times 10^{37}\ \text{s} \approx 1.6 \times 10^{30}\ \text{years}.$$

However, proteins fold on a millisecond-to-second time scale. Therefore, proteins fold not via a random search but through a more sophisticated search process.

We want to watch the folding process of a protein using molecular simulation techniques.
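The Levinthal arithmetic above can be checked with a few lines of Python. The constants are the slide's assumptions (3 states per residue, 100 residues, 100 ps per conformational change); the 365.25-day year is an assumption of this sketch.

```python
# Levinthal-style estimate using the slide's assumptions.
STATES_PER_RESIDUE = 3
N_RESIDUES = 100
SECONDS_PER_MOVE = 1e-10               # 100 ps per conformational change
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed Julian year

n_conformations = STATES_PER_RESIDUE ** N_RESIDUES  # exact 48-digit integer
search_seconds = n_conformations * SECONDS_PER_MOVE
search_years = search_seconds / SECONDS_PER_YEAR

print(f"conformations: {n_conformations:.2e}")   # conformations: 5.15e+47
print(f"random search: {search_years:.1e} years")  # random search: 1.6e+30 years
```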

Steps in Protein Folding

1- "Collapse": the driving force is burial of hydrophobic amino acids (fast, milliseconds)

2- Molten globule: helices and sheets form, but the structure is "loose" (slow, seconds)

3- "Final" native folded state: compaction, with some secondary (2°) structures rearranged

Native state? Assumed to be the state of lowest free energy; it may be an ensemble of structures.


Protein Folding Funnel

[Figure: folding funnel showing local minima and the global minimum, which corresponds to the native structure]

Protein Structure Prediction
  • Ab initio
    • Use only first principles: energy, geometry, and kinematics
  • Homology
    • Find the best match in a database of sequences with known 3D structures
  • Threading
  • Meta-servers and other methods

Homology and threading are knowledge-based approaches.

Ab Initio Prediction
  • Basic idea

Anfinsen’s theory: a protein's native structure corresponds to the state with the lowest free energy of the protein-solvent system.

  • General procedures
    • Develop a potential (energy) function
      • Evaluate the energy of a protein conformation
      • Select the native structure
    • Conformational search algorithm
      • Produce new conformations
      • Search the potential energy surface and locate the global minimum (native conformation)

Provides both the folding pathway and the folded structure

Can only be applied to very small proteins
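The two ingredients above (an energy function plus a conformational search) can be illustrated with a toy model. This sketch swaps in the 2D HP lattice model, which is not part of the lecture: residues are only H (hydrophobic) or P (polar), each non-bonded H-H lattice contact scores -1, and the search exhaustively enumerates every self-avoiding walk. Names such as `hp_energy` and `fold` are hypothetical.

```python
# Toy ab initio folding in the 2D HP lattice model (illustrative stand-in).
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_energy(sequence, coords):
    """Energy function: -1 for each pair of H residues that are lattice
    neighbours but not consecutive in the chain."""
    index_at = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbours and counts each contact once
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def conformations(n):
    """Search space: every self-avoiding walk of n residues.
    The first bond is fixed along +x to remove rotated copies."""
    if n == 1:
        yield [(0, 0)]
        return
    path = [(0, 0), (1, 0)]
    occupied = set(path)

    def extend():
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            step = (x + dx, y + dy)
            if step not in occupied:
                path.append(step)
                occupied.add(step)
                yield from extend()
                path.pop()
                occupied.remove(step)

    yield from extend()


def fold(sequence):
    """Exhaustive search for the lowest-energy conformation."""
    best_energy, best_coords = None, None
    for coords in conformations(len(sequence)):
        e = hp_energy(sequence, coords)
        if best_energy is None or e < best_energy:
            best_energy, best_coords = e, coords
    return best_energy, best_coords


energy, coords = fold("HPPHPH")
print(energy, coords)
```

The number of walks grows exponentially with chain length, so this exhaustive search becomes impractical after a few dozen residues, echoing why ab initio methods only apply to very small proteins.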

Potential Functions for PSP
  • Potential function
    • Physics-based energy function
      • Empirical all-atom forcefields: CHARMM, AMBER, ECEPP-3, GROMOS, OPLS
      • Parameterization: quantum mechanical calculations, experimental data
      • Simplified potential: UNRES (united residue)
    • Solvation energy
      • Implicit solvation models: Generalized Born (GB) model, surface-area-based model
      • Explicit solvation models: TIP3P water (computationally expensive)



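General Form of All-atom Forcefields

  • Bond stretching term
  • Angle bending term
  • Dihedral term
  • Van der Waals term
  • H-bonding term
  • Electrostatic term

The nonbonded terms (van der Waals, H-bonding, electrostatic) are the most time-demanding part, since they are summed over all pairs of atoms.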
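A sketch of typical functional forms for the six terms listed above, loosely following AMBER-style conventions (harmonic bonds and angles, a periodic dihedral, 12-6 Lennard-Jones, 12-10 H-bonding, Coulomb). The exact forms and parameters differ between forcefields; the function names and any parameter values are illustrative assumptions, and the pair loop omits the bonded-pair exclusions real forcefields apply.

```python
import math


def bond_energy(r, k_b, r0):
    """Bond stretching: harmonic, k_b * (r - r0)^2."""
    return k_b * (r - r0) ** 2


def angle_energy(theta, k_theta, theta0):
    """Angle bending: harmonic in the bond angle (radians)."""
    return k_theta * (theta - theta0) ** 2


def dihedral_energy(phi, v_n, n, gamma):
    """Dihedral: periodic term (V_n / 2) * (1 + cos(n*phi - gamma))."""
    return 0.5 * v_n * (1 + math.cos(n * phi - gamma))


def vdw_energy(r, a, b):
    """Van der Waals: 12-6 Lennard-Jones, A/r^12 - B/r^6."""
    return a / r**12 - b / r**6


def hbond_energy(r, c, d):
    """H-bonding: 12-10 term, C/r^12 - D/r^10 (used by some older forcefields)."""
    return c / r**12 - d / r**10


def coulomb_energy(r, q_i, q_j, dielectric=1.0, k=332.06):
    """Electrostatics: k*q_i*q_j/(eps*r); k is roughly 332 kcal*A/(mol*e^2)."""
    return k * q_i * q_j / (dielectric * r)


def nonbonded_energy(coords, charges, a, b):
    """vdW + electrostatics over all atom pairs. The O(N^2) double loop is
    why the nonbonded terms dominate the cost (no exclusions applied here)."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            total += vdw_energy(r, a, b) + coulomb_energy(r, charges[i], charges[j])
    return total
```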




Search Potential Energy Surface

We are interested in the minima of the potential energy surface (PES).

  • Conformational search techniques
    • Energy Minimization
    • Monte Carlo
    • Molecular Dynamics
    • Others: Genetic Algorithm, Simulated Annealing
Energy Minimization
  • Energy minimization locates a local minimum of the PES near the starting conformation
  • Methods
    • First-order methods: steepest descent, conjugate gradient minimization
    • Second-derivative methods: Newton-Raphson method
    • Quasi-Newton methods: L-BFGS
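A minimal steepest-descent sketch on a hypothetical one-coordinate "energy surface", a tilted double well E(x) = (x² - 1)² + 0.3x, shows the local-minimum limitation: the method slides downhill from its starting point and stops in whichever well it starts in. The surface, step size, and tolerance are assumptions chosen for illustration.

```python
def energy(x):
    """Hypothetical tilted double well: local min near +0.96, global near -1.04."""
    return (x[0] ** 2 - 1) ** 2 + 0.3 * x[0]


def gradient(x):
    """Analytic first derivative of energy()."""
    return [4 * x[0] * (x[0] ** 2 - 1) + 0.3]


def steepest_descent(grad, x0, step=0.01, tol=1e-8, max_iter=100_000):
    """First-order minimization: repeatedly step against the gradient."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:  # converged: gradient ~ 0
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


right = steepest_descent(gradient, [2.0])   # ends in the higher, local minimum
left = steepest_descent(gradient, [-2.0])   # ends in the global minimum
print(right, energy(right))
print(left, energy(left))
```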

Monte Carlo
  • In molecular simulations, ‘Monte Carlo’ is an importance sampling technique.

1. Make a random move to produce a new conformation.

2. Calculate the energy change ΔE for the new conformation.

3. Accept or reject the move based on the Metropolis criterion, using the Boltzmann factor $P = \exp(-\Delta E / k_B T)$:

If ΔE < 0, then P > 1: accept the new conformation;

Otherwise: if P > rand(0,1), accept; else reject.
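The three Metropolis steps can be sketched on a hypothetical one-coordinate energy surface. Gaussian trial moves, kT = 0.3 (in the same arbitrary energy units), and the step count are assumptions of this sketch; unlike pure minimization, occasional uphill moves let the walk cross barriers.

```python
import math
import random


def energy(x):
    """Hypothetical tilted double well: local min near +0.96, global near -1.04."""
    return (x * x - 1) ** 2 + 0.3 * x


def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion with Boltzmann factor P = exp(-dE/kT)."""
    if delta_e <= 0:  # P >= 1: always accept downhill (or flat) moves
        return True
    return math.exp(-delta_e / kT) > rng.random()


def monte_carlo(energy_fn, x0, kT=0.3, step=0.5, n_steps=20_000, seed=0):
    rng = random.Random(seed)
    x, e = x0, energy_fn(x0)
    best_x, best_e = x, e
    accepted = 0
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step)          # 1. random move
        e_new = energy_fn(x_new)
        delta_e = e_new - e                       # 2. energy change
        if metropolis_accept(delta_e, kT, rng):   # 3. accept or reject
            x, e = x_new, e_new
            accepted += 1
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e, accepted / n_steps


best_x, best_e, rate = monte_carlo(energy, 0.96)  # start in the local well
print(best_x, best_e, rate)
```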

Comparative Modeling (Knowledge-based Approach)

Two primary methods:

1) Homology modeling

2) Threading (fold recognition)

Both rely on the availability of experimentally determined structures that are "homologous" or at least structurally very similar to the target.

Provide the folded structure only (no folding pathway)

Homology Modeling
  • Identify homologous protein sequences (-BLAST)
  • Among the available structures, choose the one with the closest sequence match to the target as the template

(can combine steps 1 & 2 by using PDB-BLAST)

  • Build a model by placing residues in the corresponding positions of the homologous structure, then refine by "tweaking"
  • Homology modeling works "well"
    • Computationally? Not very expensive
    • Accuracy? Higher sequence identity → better model
      • Requires ~30% sequence identity with a sequence whose structure is known
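The ~30% rule of thumb depends on how percent identity is computed. This sketch counts identical residues over alignment columns where neither sequence has a gap; other definitions (e.g., dividing by the shorter sequence length) are also common. The example sequences and the threshold constant name are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences,
    computed over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


MIN_IDENTITY_FOR_HOMOLOGY_MODELING = 30.0  # rule of thumb from the slide

pid = percent_identity("MKT-LLV", "MRTALLI")
print(f"{pid:.1f}% identity; template usable: {pid >= MIN_IDENTITY_FOR_HOMOLOGY_MODELING}")
# 66.7% identity; template usable: True
```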

Homology-based Prediction

[Figure: homology-based prediction pipeline: raw model → loop modeling → side-chain placement]
Threading - Fold Recognition
  • Threading works "sometimes"
    • Computationally? Can be expensive or cheap, depending on the energy function and on whether threading is "all-atom" or "backbone-only"
    • Accuracy? In theory, should not depend on sequence identity (should depend on the quality of the template library and on "luck")
    • Usually, higher sequence identity to a protein of known structure → better model

Goal: identify the “best” fit between the target sequence and a template structure

Threading Algorithm for PSP
  • Database of 3D structures and sequences
    • Protein Data Bank (or a non-redundant subset)
  • Query sequence
    • Sequence identity < 25% to known structures
  • Alignment protocol
    • Dynamic programming
  • Evaluation protocol
    • Distance-based potential or secondary structure
  • Ranking protocol
  • Basic premise:
    • The number of unique structural folds in nature is fairly small (probably 2000-3000)
    • Until very recently, 90% of new structures submitted to the PDB had similar structural folds already in the PDB
    • Statistics from the Protein Data Bank (~40,000 structures)
  • Thus, the chances for a protein to have a native-like structural fold in the PDB are quite good
    • Note: proteins with similar structural folds could be either homologs or analogs


Steps in Threading

1. Start with the target sequence.
2. Align the target sequence with template structures (fold library) from the Protein Data Bank (PDB).
3. Calculate an energy score to evaluate the goodness of fit between the target sequence and each template structure.
4. Rank the models based on their energy scores.
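A minimal sketch of the align-score-rank loop. Each template is reduced to a string of environment classes ("B" buried, "E" exposed), residues are split into hydrophobic or polar, and dynamic programming finds the alignment minimizing environment-fit energy plus a linear gap penalty. The environment classes, energy table, gap value, and template names are all hypothetical. The pairwise contact term is deliberately left out: with pairwise terms and gaps, finding the optimal threading is known to be NP-hard, so plain dynamic programming no longer applies.

```python
HYDROPHOBIC = set("AVILMFWC")

# Hypothetical environment-fit energies (lower = better fit).
ENV_ENERGY = {
    (True, "B"): -1.0,   # hydrophobic residue in a buried position
    (True, "E"): 0.5,    # hydrophobic residue exposed to solvent
    (False, "B"): 1.0,   # polar residue buried
    (False, "E"): -0.5,  # polar residue exposed
}
GAP = 1.0  # hypothetical linear gap penalty per gap position


def thread(sequence, environments, gap=GAP):
    """Global DP alignment of a sequence onto a template's environment string,
    minimizing environment energy + gap penalty. Returns the minimum energy."""
    n, m = len(sequence), len(environments)
    # cost[i][j]: best energy aligning sequence[:i] with environments[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        hydrophobic = sequence[i - 1] in HYDROPHOBIC
        for j in range(1, m + 1):
            match = cost[i - 1][j - 1] + ENV_ENERGY[(hydrophobic, environments[j - 1])]
            cost[i][j] = min(match, cost[i - 1][j] + gap, cost[i][j - 1] + gap)
    return cost[n][m]


def rank_templates(sequence, library):
    """Score the sequence against every template; lowest energy first."""
    return sorted((thread(sequence, env), name) for name, env in library.items())


print(rank_templates("LLKK", {"template_1": "BBEE", "template_2": "EEBB"}))
```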

Threading Issues

Find the “correct” sequence-structure alignment of a target sequence with its native-like fold in the PDB

  • Structure database: must be complete; no decent model if there is no good template in the library!
  • Sequence-structure alignment algorithm:
    • Bad alignment → bad score!
  • Energy function (scoring scheme):
    • must distinguish the correct sequence-fold alignment from incorrect sequence-fold alignments
    • must distinguish the “correct” fold from close decoys
  • Prediction reliability assessment: how do we determine whether the predicted structure is correct (or even close)?
Threading: Template database
  • Build a database of structural templates (e.g., the ASTRAL domain library derived from the PDB)
  • Supplement it with additional decoys, e.g., generated using an ab initio approach such as Rosetta (Baker)

Threading: Energy function
  • Two main methods (and combinations of these)
    • Structural profile (environmental): physico-chemical properties of amino acids
    • Contact potential (statistical): based on contact statistics from the PDB, e.g., Miyazawa & Jernigan (ISU)
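The idea behind a statistical contact potential can be sketched as a log-odds score: residue-type pairs that are observed in contact more often than random pairing predicts get a favorable (negative) energy. This is a simplified inverse-Boltzmann version for illustration only, not the Miyazawa & Jernigan quasi-chemical formulation itself; the input contact list and the pseudocount are assumptions.

```python
import math
from collections import Counter


def contact_potential(contacts, pseudocount=1.0):
    """Simplified statistical contact potential (in kT units):
    e(a, b) = -ln(observed(a, b) / expected(a, b)), where the expected count
    assumes residue types pair at random in proportion to their frequency
    among contacting residues."""
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    residue_counts = Counter(res for pair in contacts for res in pair)
    n_contacts = len(contacts)
    n_ends = 2 * n_contacts  # each contact involves two residues
    types = sorted(residue_counts)
    potential = {}
    for i, a in enumerate(types):
        for b in types[i:]:
            f_a = residue_counts[a] / n_ends
            f_b = residue_counts[b] / n_ends
            # Random-pairing expectation; an unordered a-b pair can form two ways.
            expected = n_contacts * f_a * f_b * (1 if a == b else 2)
            observed = pair_counts[(a, b)]
            # Pseudocount keeps never-observed pairs finite.
            potential[(a, b)] = -math.log((observed + pseudocount) / (expected + pseudocount))
    return potential


# Hypothetical contact list, e.g. extracted from a set of PDB structures.
contacts = [("L", "L"), ("L", "L"), ("K", "K"), ("K", "K")]
print(contact_potential(contacts))
```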

Protein Threading: Typical energy function

Ep (pairwise term): what is the "probability" that two specific residues are in contact?

Es (environment term): how well does a specific residue fit its structural environment?

Eg (gap term): alignment gap penalty

Total energy: $E = E_p + E_s + E_g$

Goal: find a sequence-structure alignment that minimizes the energy function.
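For one fixed sequence-structure alignment, the total energy E = Ep + Es + Eg can be evaluated directly. In this sketch the template is described by an environment string and a set of contacting position pairs; `pair_energy` and `env_energy` are hypothetical scoring callables, and the gap term is a simple linear penalty (real threading programs often use affine gap costs).

```python
def threading_energy(sequence, alignment, template_env, template_contacts,
                     pair_energy, env_energy, gap_penalty):
    """E = E_p + E_s + E_g for one fixed sequence-structure alignment.

    alignment[i] is the template position for sequence[i], or None for a gap.
    template_contacts is a set of (p, q) template positions in contact.
    """
    placed = {pos: sequence[i] for i, pos in enumerate(alignment) if pos is not None}
    # E_p: pairwise term over template contacts whose both ends are occupied
    e_p = sum(pair_energy(placed[p], placed[q])
              for p, q in template_contacts if p in placed and q in placed)
    # E_s: how well each placed residue fits its structural environment
    e_s = sum(env_energy(res, template_env[pos]) for pos, res in placed.items())
    # E_g: unaligned residues plus unused template positions
    n_gaps = sum(pos is None for pos in alignment)
    n_gaps += len(template_env) - len(placed)
    e_g = gap_penalty * n_gaps
    return e_p + e_s + e_g


# Hypothetical scoring functions for a two-letter toy alphabet.
def pair_energy(a, b):
    return -1.0 if a == b else 0.0


def env_energy(residue, env):
    # favorable if L (hydrophobic) sits in B (buried) or K sits in E (exposed)
    return -1.0 if (residue == "L") == (env == "B") else 1.0


print(threading_energy("LL", [0, 1], "BB", {(0, 1)}, pair_energy, env_energy, 1.0))
```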



The goal of CAFASP is to evaluate the performance of fully automatic structure prediction servers available to the community. In contrast to the normal CASP procedure, CAFASP aims to answer the question of how well servers do without any intervention of experts, i.e. how well ANY user using only automated methods can predict protein structure. CAFASP assesses the performance of methods without the user intervention allowed in CASP.

Performance Evaluation in CAFASP3

[Figure: CAFASP3 server ranking; servers with names in italics are meta-servers]

The MaxSub score for each target ranges from 0 to 1; therefore, the maximum total score is 30.

(, released in December, 2002.)

One structure where RAPTOR did best

[Figure: RAPTOR prediction superimposed on the true structure. Red: true structure; blue: correct part of prediction; green: wrong part of prediction]

  • Target size: 144
  • Superimposable size within 5 Å: 118
  • RMSD: 1.9 Å
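The two evaluation numbers above can be sketched as follows, assuming the predicted and true Cα coordinates have already been optimally superimposed (e.g., by the Kabsch algorithm, not shown). Which residue subset the reported RMSD covers is not stated on the slide; the function names and toy coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of already superimposed (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def count_within(coords_a, coords_b, cutoff=5.0):
    """Number of residue pairs within `cutoff` Angstroms after superposition."""
    return sum(math.dist(p, q) <= cutoff for p, q in zip(coords_a, coords_b))


predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 1.0, 0.0)]
true_ca = [(0.5, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 7.0, 0.0)]
print(count_within(predicted, true_ca), rmsd(predicted, true_ca))
```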
Automated Web-Based Homology Modeling
  • SWISS Model :
  • WHAT IF :
  • The CPHModels Server :
  • 3D Jigsaw :
  • SDSC1 :
  • EsyPred3D :
Comparative Modeling Server & Program
  • InsightII