Application of Monte Carlo Simulation: Removing averaging artifacts in protein structure prediction

Dukka


Outline

  • Background

  • Protein Structure Prediction and CASP

  • TASSER algorithm

  • MCORE algorithm


Background

Experimental and computational methods often output results as an ensemble of protein structures.

Examples: NMR, protein structure prediction, protein docking, RNA structure prediction.

A single representative structure is required for comparison or further analysis.

Representative structure (consensus structure) = a centroid structure obtained by averaging the Cartesian coordinates of the ensemble of superimposed structures.

The RMSD between the ‘averaged structure’ and any reference structure is always less than or equal to the average RMSD of the individual members (Zagrovic et al.).

However, the centroid structure has averaging artifacts that make its bond angles and bond lengths unphysical.
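The two centroid properties on this slide (the RMSD bound and the unphysical bond lengths) can be checked numerically. The following is a minimal sketch on a synthetic ensemble of Cα traces that are assumed to be already superimposed in a common frame; the 3.8 Å bond length, the noise level, and all function names are illustrative assumptions rather than anything taken from the TASSER data.

```python
import numpy as np

BOND = 3.8  # assumed CA-CA distance in Å, for illustration only

def rmsd(a, b):
    """RMSD between two (N, 3) coordinate arrays given in the same frame."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def chain_from_directions(directions):
    """Build a CA trace whose consecutive beads are exactly BOND apart."""
    unit = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    return np.vstack([np.zeros(3), np.cumsum(BOND * unit, axis=0)])

rng = np.random.default_rng(0)
base_dirs = rng.normal(size=(49, 3))
reference = chain_from_directions(base_dirs)

# Ensemble members: the same chain with randomly perturbed bond directions.
ensemble = np.stack([
    chain_from_directions(base_dirs + rng.normal(scale=0.3, size=base_dirs.shape))
    for _ in range(20)
])
centroid = ensemble.mean(axis=0)  # coordinate average, as for the centroid structure

centroid_rmsd = rmsd(centroid, reference)
mean_member_rmsd = float(np.mean([rmsd(m, reference) for m in ensemble]))
centroid_bonds = np.linalg.norm(np.diff(centroid, axis=0), axis=1)

print(f"RMSD(centroid, reference) = {centroid_rmsd:.3f}")
print(f"mean member RMSD          = {mean_member_rmsd:.3f}")
print(f"mean centroid bond length = {centroid_bonds.mean():.3f} (members: {BOND})")
```

Every member has bonds of exactly 3.8 Å, yet the averaged bond vectors are shorter, which is the averaging artifact the rest of the talk sets out to remove.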


Protein Structure Prediction and CASP

Critical Assessment of protein Structure Prediction (CASP) is a biennial contest in which different groups try to predict the structure of a protein whose structure has not yet been released to the outside world.

One of the most popular and objective contests in the bioinformatics field.

CASP8 is just over.

Major observations from CASP7:

  • Methods are more or less mature

  • Consensus servers usually outperform individual servers

  • A lot of work still needs to be done in the refinement step


Refinement

  • Given a set of conformations, obtain the conformation that is closest to the native structure.

  • Molecular force fields such as AMBER and CHARMM can be used, but they are known to be imperfect.

  • Furthermore, there is still no perfect definition of “closest”. Hence, CASP is introducing additional measures of closeness to the native, such as the HB score.

  • Often, the closest prediction is not ranked first. Hence, ‘Refinement’ is getting a lot of attention.


TASSER algorithm (Threading/ASSembly/Refinement)

[Figure: TASSER algorithm; label: Centroid Structure. Source: Zhang & Skolnick, 2004]


Problem Identification

  • TASSER is one of the best prediction servers in both CASP7 and CASP8.

  • A large number of conformations are generated after the assembly step. However, only a couple of models can be submitted.

  • Clustering is used, and the centroid of the largest cluster (Combo model) is reported as the output; this has proven successful.

  • Artifacts in the TASSER (Combo) output

    • Unrealistic bond lengths and bond angles due to averaging artifacts

  • Scope

    • To fix these unrealistic bond lengths and bond angles

    • C-alpha space

    • Energy minimization


Combo and Closc Models

COMBO model: the centroid structure of the densest cluster.

CLOSC model: the structure that is closest to the centroid of the densest cluster.

[Figure: fraction of clashes]
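The difference between the two models can be sketched in a few lines. This assumes the members of the densest cluster are already superimposed and stored as an (M, N, 3) array; the function names and the random test data are hypothetical and only illustrate the two definitions.

```python
import numpy as np

def rmsd(a, b):
    """RMSD between two (N, 3) coordinate arrays given in the same frame."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def combo_and_closc(cluster):
    """Return the centroid (COMBO) and the index of the member closest to it (CLOSC).

    cluster: (M, N, 3) array of superimposed CA traces from one cluster.
    """
    combo = cluster.mean(axis=0)
    distances = [rmsd(member, combo) for member in cluster]
    return combo, int(np.argmin(distances))

rng = np.random.default_rng(1)
cluster = rng.normal(size=(10, 30, 3))  # synthetic stand-in for a real cluster
combo, closc_index = combo_and_closc(cluster)
closc = cluster[closc_index]
```

CLOSC is an actual member of the ensemble, so it keeps whatever geometry that member had, while COMBO is an average and can carry the averaging artifacts.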


PULCHRA

  • PULCHRA is based on steepest descent minimization and a simple force field.

  • Sometimes it cannot escape a kinetic trap.

  • For a heavily distorted chain, the minimization procedure does not converge, or the optimized model still exhibits irregularities.

Rotkiewicz and Skolnick, 2008


MCORE

  • Start from a ‘close-by model’, or generate an extended structure based on the Combo model

  • Monte-Carlo minimization

  • Output the best structure


Generation of Extended Structure

Consecutive Cα-Cα distances are taken from the distance distribution in the PDB, which falls mainly into three types: x-PRO = 3.77 Å, x-{ALA|ARG|ASN|LEU|LYS|MET} = 3.81 Å, and x-{ASP|CYS|GLU|GLY|HIS|ILE|PHE|SER|THR|TRP|TYR|VAL} = 3.80 Å
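A minimal sketch of how such an extended trace could be built from a sequence. Two assumptions are made here that the slide does not state: the distance is keyed on the second residue of each pair (reading "x-PRO" as "any residue followed by PRO"), and the beads are placed on a straight line along x. GLN does not appear in either list on the slide, so the sketch falls back to 3.80 Å for it.

```python
import numpy as np

# Residues listed on the slide with a 3.81 Å distance.
GROUP_381 = {"ALA", "ARG", "ASN", "LEU", "LYS", "MET"}

def ca_distance(residue):
    """CA-CA distance in Å to the given residue from its predecessor."""
    if residue == "PRO":
        return 3.77
    if residue in GROUP_381:
        return 3.81
    return 3.80  # remaining residues; also the assumed fallback for GLN

def extended_trace(sequence):
    """Place CA atoms on a straight line along x (a simplifying assumption)."""
    x = [0.0]
    for residue in sequence[1:]:
        x.append(x[-1] + ca_distance(residue))
    coords = np.zeros((len(sequence), 3))
    coords[:, 0] = x
    return coords

trace = extended_trace(["MET", "PRO", "GLY", "ALA"])
bonds = np.linalg.norm(np.diff(trace, axis=0), axis=1)
print(bonds)
```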


Monte-Carlo

  • Three major components of any Monte-Carlo approach

    • Energy Function

      • Can be a generic force field or any combination of terms

    • Move Sets

      • Critical to the performance of the algorithm, more of an art(?)

    • Convergence Criteria

      • Naïve way: run for a fixed number of steps

      • Introduce some criteria based on the generated conformations


Monte-Carlo: Metropolis Criterion

  • Starting from a state A, make a change in the configuration to obtain a new (nearby) configuration B.

  • Compute the energy EB of the new configuration.

  • If EB < EA, accept the new configuration, since lowering the energy is desirable.

  • If EB > EA, calculate the acceptance probability p = exp(−(EB − EA)/kT).

  • Draw r from the uniform distribution on [0,1]; if r < p, accept the new configuration B, otherwise reject it.
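The acceptance rule above can be sketched as a single function. This uses the standard Metropolis probability exp(−ΔE/kT); the `temperature` parameter (kT, in the same units as the energies) and the function name are assumptions, since the slides do not say which temperature was used.

```python
import math
import random

def metropolis_accept(e_old, e_new, temperature, rng=random):
    """Decide whether to accept a move from energy e_old to energy e_new.

    temperature is kT in the same units as the energies (assumed parameter).
    rng is anything with a random() method returning a float in [0, 1).
    """
    if e_new <= e_old:
        return True  # downhill (or neutral) moves are always accepted
    p = math.exp(-(e_new - e_old) / temperature)
    return rng.random() < p  # uphill moves are accepted with probability p

# Usage: fraction of accepted uphill moves with energy difference 1 at kT = 1.
rng = random.Random(0)
trials = 20000
accepted = sum(metropolis_accept(0.0, 1.0, 1.0, rng) for _ in range(trials))
print(accepted / trials, math.exp(-1.0))
```

Occasionally accepting uphill moves is what lets the search leave local minima, in contrast to a pure steepest descent.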



Move Sets

  • Move Sets

    • Global move-set

      • Rest-all bead move

    • Local move-set

      • 1-bead move

      • 2-bead move

      • 3-bead move

      • 4-bead move

      • 5-bead move

    • End-bond move

      • 1,2,3-bead C-terminal end bond move

      • 1,2,3-bead N-terminal end bond move


Move Sets

  • Calculate the unit vector along the axis defined by beads i-1 and i+1.

  • Calculate the rotation matrix around this vector.

  • Calculate the new position of bead i.

  • The important thing is to preserve the bond length, i.e. the distance between consecutive C-alphas.

[Figure: one-bead move and two-bead move, with beads labelled i-1, i, i+1]
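The one-bead move can be sketched as follows: bead i is rotated about the axis through beads i-1 and i+1, which leaves both of its bond lengths unchanged. The rotation matrix is built with Rodrigues' formula; the function names and the toy coordinates are illustrative and not taken from the MCORE code.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues rotation matrix for a rotation of `angle` radians about `axis`."""
    u = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -u[2], u[1]],
                  [u[2], 0.0, -u[0]],
                  [-u[1], u[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)

def one_bead_move(coords, i, angle):
    """Rotate bead i about the axis through beads i-1 and i+1; return a new array."""
    new = coords.copy()
    origin = coords[i - 1]
    axis = coords[i + 1] - origin
    new[i] = origin + rotation_matrix(axis, angle) @ (coords[i] - origin)
    return new

coords = np.array([[0.0, 0.0, 0.0],
                   [3.0, 2.0, 0.0],
                   [6.0, 0.0, 0.0],
                   [9.0, 2.0, 0.0]])
moved = one_bead_move(coords, 1, np.pi / 2)

before = np.linalg.norm(np.diff(coords, axis=0), axis=1)
after = np.linalg.norm(np.diff(moved, axis=0), axis=1)
print(before, after)  # bond lengths are unchanged by the move
```

Because both neighbours lie on the rotation axis, their distances to bead i are invariant under the rotation, so no bond length needs to be repaired after the move.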


[Figure: three-bead move, four-bead move, five-bead move and rest-bead move, with the axis of rotation marked]


End-bond Move Sets

[Figure: end-bond moves, with the axis of rotation marked]


Energy Function

  • Closeness to target: penalize if the difference in C-alpha position between the target and the starting structure is not within a certain cutoff.

  • Excluded volume: penalize if the distance is less than 4.0 Å.

  • Bond angle: penalize if the angle is not between 70° and 150°.

  • N: number of C-alpha atoms
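A minimal sketch of an energy with these three terms. The slide gives the thresholds (70° to 150°, 4.0 Å) but not the functional forms, so several choices here are assumptions: each term simply counts violations, all weights are 1, the sum is divided by N, the target cutoff defaults to 1.0 Å, and directly bonded neighbours are skipped in the excluded-volume term.

```python
import numpy as np

def bond_angles(coords):
    """Angle in degrees at each interior bead, between its two bonds."""
    a = coords[:-2] - coords[1:-1]
    b = coords[2:] - coords[1:-1]
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def energy(coords, target, cutoff=1.0):
    """Count violations of the three restraints, normalized by the number of beads."""
    n = len(coords)

    # Bond angle: outside the 70-150 degree window.
    angles = bond_angles(coords)
    angle_term = np.sum((angles < 70.0) | (angles > 150.0))

    # Excluded volume: non-bonded pairs closer than 4.0 Å.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=2)  # k=2 skips bonded neighbours
    clash_term = np.sum(dist[i, j] < 4.0)

    # Closeness to target: beads further than `cutoff` from the target position.
    target_term = np.sum(np.linalg.norm(coords - target, axis=1) > cutoff)

    return float(angle_term + clash_term + target_term) / n

# Zigzag chain with 3.8 Å bonds and angles of about 104 degrees.
good = np.array([[3.0 * k, 2.332 * (k % 2), 0.0] for k in range(6)])
bad = good.copy()
bad[5] = good[0] + np.array([0.5, 0.0, 0.0])  # fold the last bead onto the first

print(energy(good, good), energy(bad, good))
```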


Assessment of Move Sets and Energy Function

  • Before doing the actual computation, we have to test whether the move sets and the energy function work properly.

  • So we have to design some test cases. A positive test case is to drive an extended structure to the native structure.

    • Desired result:

      should be able to drive the extended structure ‘very close’ to the native structure in a relatively small number of steps


Data Set

  • 1363 proteins with fewer than 200 residues whose Combo RMSD to the native is less than 6.5 Å.

  • 1363 Centroid structures (COMBO models)

  • 1363 CLOSC models

  • 1363 Close-by structures (CLOSC models + Pulchra Refinement)

  • 1363 Native structures.


Driving Extended to Native

[Figure: average energy vs. steps, and average RMSD to native (Å) vs. steps. Annotated values: 0.045, 0.06, 0.041. At 10000 steps, RMSD = 0.039.]


Driving Extended to Native

[Figure: Ext-refined vs. CA, 0.033 Å]


Convergence criteria

|rmsd_diff(i, l)| < tolerance, where l = i + j, j = 1, …, L

Different values of L were tried; L = 49 with a tolerance of 0.005 seems reasonable.
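One way to read this criterion is as a plateau test on the RMSD trajectory: the run stops once the RMSD recorded at some step i differs by less than the tolerance from each of the following L recorded values. The sketch below implements that reading, which is an assumption about what rmsd_diff means; the function name and the list-based history are hypothetical.

```python
def has_converged(rmsd_history, window=49, tolerance=0.005):
    """Return True if the last `window` values all stay within `tolerance`
    of the value recorded `window` steps before the end."""
    if len(rmsd_history) < window + 1:
        return False  # not enough steps recorded yet
    recent = rmsd_history[-(window + 1):]
    anchor = recent[0]
    return all(abs(value - anchor) < tolerance for value in recent[1:])

# Usage: a steadily improving run followed by a flat plateau.
improving = [1.0 - 0.01 * k for k in range(60)]
plateau = improving + [0.4] * 50
print(has_converged(improving), has_converged(plateau))
```

Compared with running for a fixed number of steps, a test like this lets easy targets stop early while harder ones keep running.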


Propose two algorithms

  • MCORE: Start from a ‘close-by model’ and drive it towards the COMBO model.

    • CLOSC models serve as the close-by models.

    • Used when a close-by model is readily available

  • MCORE-EXT: Start from an extended structure and drive it towards the COMBO model.

    • Used when a close-by model is not readily available


MCORE: Driving Close-by models to COMBO

[Figure: average energy vs. steps]


Why can MCORE not get much closer to COMBO?

[Figure: two panels, fraction of atoms clashing in COMBO and fraction of atoms clashing in MCORE, each against RMSD of MCORE to COMBO (Å)]


MCORE vs. Combo

[Figure: RMSD of MCORE to native (Å) and RMSD of COMBO to native (Å)]

For 38 proteins, the MCORE model has an even lower RMSD than the respective Combo model.


Comparison of Different Models

[Figure: comparison of four models. Fraction of atoms in clashes: 0.010, 0.065, 0.000, 0.63. RMSD to native (Å): 3.35, 3.36, 3.54, 3.28.]


TM-score of four models

[Figure: TM-scores of the four models: 0.770, 0.746, 0.747, 0.754]


Results


Some Examples


[Figure: 1akhA superpositions. Panels: refined vs. native, com vs. refined, pulchra vs. native, com vs. native. Annotations: 0.78 Å, 0.354 Å, 12 clashes, 0.674 Å, 0.68 Å.]


[Figure: 3bbn_ superpositions. Panels: refined vs. combo, combo vs. native, pulchra vs. native, refined vs. native. Annotations: 0.852 Å, 2.948 Å, 2.918 Å, 3.099 Å.]


All-atom model reconstruction

  • Built the main chain atoms of the refined Cα trace.

  • Rebuilt side-chains using two methods

    • Pulchra (-c)

    • Scwrl

[Figure annotations: 3.92 (all), 2.868 (Cα), 3.95 (all)]


Conclusion

  • Designed an algorithm to remove averaging artifacts and applied it to refine the Combo model.

  • Acknowledgments

    • Dr. Jeff Skolnick and all the members of the Skolnick Lab, especially Lila, Shashi, Hongyi, Seung Yup,…….

    • Dr. Dennis Livesay

  • Future Work

    • Refinement in All-atom space
