An optimization approach to protein structure prediction

Richard Byrd, Betty Eskow, Robert Schnabel, Brett Bader, Lianjun Jiang
University of Colorado

Teresa Head-Gordon
Univ. of California, Berkeley

Silvia Crivelli
Lawrence Berkeley Laboratory


Problem Definition

Predict the 3-dimensional shape, or native state, of a protein given its sequence of constituent amino acids.

Approach

Assuming the native state of a protein corresponds to its minimum free energy state, use a global optimization method to find the minimum energy configuration of the target protein.


Importance of Protein Folding

  • 3-Dimensional structure useful in molecular drug design.

  • Laboratory experiments are expensive:

    • X-ray crystallography

    • NMR

  • Genome projects are providing sequences for many proteins whose structure will need to be determined.


Protein Structures

Proteins consist of a long chain of amino acids called the primary structure.

[Figure: chain of amino acids, e.g. Gly, Leu, Ser, Pro]

The constituent amino acids may encourage hydrogen bonding and form regular structures, called secondary structures.

[Figure: α-helix and β-sheet]

The secondary structures fold together to form a compact 3-dimensional or tertiary structure.


Chemistry of Proteins

[Figure: protein chain diagram with atoms N, H, O and side groups R; labels: Side chain, Amino acid, Backbone, H-bond]

Hydrogen bonds strongly influence a protein’s shape. They largely occur in secondary structures and help hold the protein together.


Computational Approaches to Protein Structure Prediction

  • Comparative Modeling

    • Compares and aligns the target to the amino acid sequence of a known protein

  • Fold Recognition

    • Searches for the best fitting fold template from a library of known protein folds

  • New Fold Methods

    • Not based on knowledge of complete protein sequences or folds

    • e.g. energy minimization


Global Optimization Problem

The 3-dimensional structure of the protein found in nature is believed to minimize potential energy:

Min V(x), where x = atom coordinates

Challenges:

  • $O(e^{n^2})$ local minima

  • Very large parameter space, e.g., for a modestly sized protein:

    • 100-300 amino acids

    • ~ 1,600 atoms

    • ~ 4,800 variables

  • Model of the energy surface may not match nature


    Amber Energy Function

    $$
    V(x) = \sum_{\text{bonds}} c_l (b - b_0)^2
         + \sum_{\text{bond angles}} c_a (\theta - \theta_0)^2
         + \sum_{\text{dihedral angles}} c_d \left[ 1 + \cos(n\omega + \gamma) \right]
         + \sum_{\text{charged pairs}} \frac{q_i q_j}{\varepsilon\, r_{ij}}
         + \sum_{\text{nonbonded pairs}} c_w\, \varphi(r_{ij})
    $$

    where b = bond length, θ = bond angle, ω = dihedral angle (with phase offset γ), r_ij = distance between atoms i and j (with charges q_i, q_j and dielectric constant ε), and φ = Lennard-Jones potential.

    Internal coordinates are determined using bonds, bond angles and dihedral angles.
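The five groups of terms in V(x) can be illustrated with a minimal Python sketch. Everything numeric here is an assumption for illustration: the coefficients are not Amber parameters, the charged-pair term is assumed to be a Coulomb interaction, and the Lennard-Jones potential is assumed to have the common 12-6 form with unit well depth. The function names are hypothetical.

```python
import math

def bond_term(c_l, b, b0):
    """Harmonic bond-stretch penalty c_l * (b - b0)^2."""
    return c_l * (b - b0) ** 2

def angle_term(c_a, theta, theta0):
    """Harmonic bond-angle penalty c_a * (theta - theta0)^2 (radians)."""
    return c_a * (theta - theta0) ** 2

def dihedral_term(c_d, n, omega, gamma):
    """Periodic dihedral term c_d * [1 + cos(n*omega + gamma)]."""
    return c_d * (1.0 + math.cos(n * omega + gamma))

def coulomb_term(q_i, q_j, r_ij, eps=1.0):
    """ASSUMED charged-pair term: q_i * q_j / (eps * r_ij)."""
    return q_i * q_j / (eps * r_ij)

def lennard_jones(r_ij, sigma=1.0):
    """ASSUMED 12-6 form with unit well depth: 4[(sigma/r)^12 - (sigma/r)^6]."""
    s6 = (sigma / r_ij) ** 6
    return 4.0 * (s6 * s6 - s6)

def total_energy(bonds, angles, dihedrals, charged_pairs, nonbonded_pairs, c_w=1.0):
    """Sum the five groups of terms; each argument is a list of parameter tuples."""
    v = sum(bond_term(c_l, b, b0) for c_l, b, b0 in bonds)
    v += sum(angle_term(c_a, th, th0) for c_a, th, th0 in angles)
    v += sum(dihedral_term(c_d, n, om, ga) for c_d, n, om, ga in dihedrals)
    v += sum(coulomb_term(qi, qj, r) for qi, qj, r in charged_pairs)
    v += sum(c_w * lennard_jones(r) for r in nonbonded_pairs)
    return v
```

In a real force field each tuple would come from a parameter file keyed by atom type; passing them in explicitly keeps the sketch self-contained.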


    Additional energy terms to model protein behavior in an aqueous environment

    • Formulated from simulations of pairs of hydrophobic molecules in water

    • E_SOLVATION is defined over pairs i, j of aliphatic carbons. M Gaussians with position (c_k), depth (h_k) and width (w_k) describe 2 minima: (1) molecules in contact and (2) molecules separated by a distance of 1 water molecule.

    • Advantages of this model:

      • Provides stabilizing force for forming hydrophobic cores.

      • Well defined model of the hydrophobic effect of small hydrophobic groups in water.

      • Computationally tractable and differentiable
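The exact formula for E_SOLVATION is not given in the transcript. As a rough sketch only, the description (a sum of M Gaussians over pairs of aliphatic carbons) could be coded as below. The functional form `h_k * exp(-((r - c_k) / w_k)^2)`, the function names, and any Gaussian parameters passed in are assumptions for illustration.

```python
import math
from itertools import combinations

def pair_solvation(r, gaussians):
    """Sum of M Gaussians evaluated at separation r.

    Each Gaussian is a tuple (c_k, h_k, w_k): position, depth, width.
    ASSUMED form: h_k * exp(-((r - c_k) / w_k)^2); a negative h_k gives a well.
    """
    return sum(h * math.exp(-((r - c) / w) ** 2) for c, h, w in gaussians)

def solvation_energy(carbon_coords, gaussians):
    """Sum the pair term over all distinct pairs (i, j) of aliphatic carbons."""
    return sum(
        pair_solvation(math.dist(a, b), gaussians)
        for a, b in combinations(carbon_coords, 2)
    )
```

With two wells, for example a deeper one at the contact distance and a shallower one about one water diameter further out, the function is smooth everywhere, which is what makes it usable inside a gradient-based minimizer.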


    Global Optimization Approaches

    • Deterministic methods

      • Branch and bound, interval methods

      • Very reliable, deterministic guarantees

      • Too expensive for more than 20-50 variables

    • Stochastic methods

      • Random steps or sampling

      • Probabilistic guarantees

      • Practical for < 300 variables

    • Heuristic search

      • e.g. Simulated annealing, Tabu search, Genetic algorithms

      • Effective on some very large problems

      • No practical guarantees


    A Stochastic-Perturbation Global Optimization Approach

    • Generate and maintain a pool of candidates (configurations), as in genetic algorithms.

    • Solve the full-dimensional problem as a series of small-dimensional ones.

    • Use protein database information to bias toward likely substructures.


    Algorithm Phases

    Given the amino acid sequence of a protein, find the 3-dimensional structure likely to be found in nature.

    Simplify problem by utilizing domain-specific knowledge

    • Phase 1: Generate Initial Population

    • Phase 2: Global Optimization


    Phase 1: Create Initial Population

    • Submit amino acid sequence to server:

      EFIAIYDYKAETEEDLTIKKGEKLEIIEKEGDWWKAKAIGSGEIGY
      IPANYIAAAE

    • Use server predictions to determine the location of α-helices, β-strands, and coils:

      CCCCHHHHHHEEEEEEEEEEEECCEEEEEEEEEEEHHHHHHHHCCC
      HHHHHHCCCC

    • Use ProteinShop visualization tool to form configurations with secondary structure: assign ideal values to the dihedral angles in the sequence according to the predictions. Manipulate β-strands to form β-sheets.

    • Perform energy minimizations
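As a hedged illustration of the "assign ideal values to the dihedral angles according to the predictions" step, the sketch below splits a per-residue prediction string (H = helix, E = strand, C = coil) into runs and looks up target backbone angles. The angle values are commonly quoted approximations and are an assumption here, not values taken from the slides; the function names are hypothetical and this is not ProteinShop's interface.

```python
from itertools import groupby

# Commonly quoted approximate backbone angles in degrees (phi, psi).
# ASSUMED for illustration; the slides do not list the values used.
IDEAL_PHI_PSI = {"H": (-57.0, -47.0), "E": (-120.0, 120.0)}

def segments(prediction):
    """Split a string of H/E/C labels into (label, start, length) runs."""
    out, pos = [], 0
    for label, run in groupby(prediction):
        n = len(list(run))
        out.append((label, pos, n))
        pos += n
    return out

def initial_dihedrals(prediction):
    """Return one (phi, psi) target per residue; coil residues get None (left free)."""
    return [IDEAL_PHI_PSI.get(label) for label in prediction]
```

For a toy string such as `"CCHHHHEEEC"` (not server output), `segments` reports one coil run, one helix run, one strand run, and a final coil residue, and `initial_dihedrals` gives targets only for the helix and strand residues.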


    Phase 2: Improve Local Minima

    • Uses a combination of breadth-first and depth-first searches from initial pool

    • Dihedral angles act as “internal coordinates” and reduce the number of variables, speeding an optimization run

    Each iteration:

    • Select a protein and a subset of dihedral angles

    • Small-scale global optimization

    • Full-dimensional local optimization

    • Cluster minima and test stopping criteria


    Small-Scale Global Optimization in Phase 2

    • Minimize energy over 5-20 torsion angles

    • Use a stochastic global optimization algorithm based on sampling, sample pruning and local minimization (Rinnooy Kan et al.).

    • From best start points, do local minimizations using quasi-Newton
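The sample, prune, and locally minimize pattern can be sketched in a few lines of Python. This is a simplified stand-in under stated assumptions: pruning just keeps the lowest-energy samples rather than applying the clustering rules of Rinnooy Kan et al., the local minimizer is plain gradient descent with backtracking rather than quasi-Newton, and the energy function is a toy. All names are hypothetical.

```python
import math
import random

def numeric_grad(f, x, h=1e-6):
    """Central-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def local_minimize(f, x0, step=0.1, iters=500):
    """Gradient descent with backtracking (stand-in for quasi-Newton)."""
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        g = numeric_grad(f, x)
        t = step
        while t > 1e-10:
            cand = [xi - t * gi for xi, gi in zip(x, g)]
            fc = f(cand)
            if fc < fx:          # accept the first step that lowers the energy
                x, fx = cand, fc
                break
            t *= 0.5
        else:
            break                # no decreasing step found: treat as converged
    return x, fx

def small_scale_global(f, dim, n_samples=400, keep=10, seed=0):
    """Sample angles uniformly, keep the best few, minimize from each."""
    rng = random.Random(seed)
    samples = [[rng.uniform(-math.pi, math.pi) for _ in range(dim)]
               for _ in range(n_samples)]
    starts = sorted(samples, key=f)[:keep]      # pruning step
    return min((local_minimize(f, s) for s in starts), key=lambda r: r[1])

def toy_energy(x):
    """Multimodal toy energy with global minimum 0 at every angle equal to 1.0."""
    return sum((1 - math.cos(a - 1.0)) + 0.5 * (1 - math.cos(3 * (a - 1.0)))
               for a in x)
```

Calling `small_scale_global(toy_energy, dim=2)` returns the best minimizer found and its energy. In the setting described above, `f` would evaluate the protein energy with all but the selected 5-20 torsion angles held fixed.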


    Full-Scale Local Minimizations

    • Using best points from small-scale global, do local minimizations.

    • Because of problem size we use limited-memory quasi-Newton.

    • Best local minimizers are added to pool.


    Biasing Functions

    • Used to form secondary structure during the first phase and sometimes in full-dimensional local minimizations.

    • Dihedral angle biasing:

      $E_{\phi\psi} = \sum_{\text{dihedrals}} k_\phi [1 - \cos(\phi - \phi_0)] + k_\psi [1 - \cos(\psi - \psi_0)]$

    • Hydrogen bond biasing:

      • For α-helices:

        $E_{HB} = w_i w_{i+4} / D r_{i,i+4}$ (w's are weights from the server for residues i and i+4 in the helix)

      • To form β-sheets from β-strands:

        $E_{HB} = w_i w_j / D r_{i,j}$


    Neural Network Predictions

    Neural nets trained on a large database of proteins can predict secondary structure likely to be in a target protein.

    Sequence: SKIGIDGFGRIGRLVLRAALSCGAQ
    Type:     BBBB B AAAAAAA BBBBB
    Weight:   13552 6789992 56673


    Forming β-sheets from the predicted β-strands is a combinatorial problem.

    • Which strands are paired?

    • Which orientation: anti-parallel or parallel?

    • Which residues are paired: even or odd?

    Reference: Ruczinski, Kooperberg, Bonneau, and Baker, "Distribution of Beta Sheets in Proteins with Applications to Structure Prediction," Proteins 48, 2002.
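To give a feel for the combinatorics, the sketch below enumerates the choices for a single strand pair: which two strands, which orientation, and which residue register. It is a counting illustration only; the function name is hypothetical, and a full sheet topology combines several compatible pairs, so the real search space grows faster than this.

```python
from itertools import combinations, product

ORIENTATIONS = ("anti-parallel", "parallel")
REGISTERS = ("even", "odd")

def candidate_pairings(n_strands):
    """All (i, j, orientation, register) choices for one pair of strands."""
    return [
        (i, j, orient, reg)
        for i, j in combinations(range(n_strands), 2)
        for orient, reg in product(ORIENTATIONS, REGISTERS)
    ]
```

The count is C(n, 2) x 2 x 2, so three predicted strands already give 12 single-pair candidates and four strands give 24.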


    Parallel Organization

    • Select k subsets of dihedral angles

    • Maintain a queue of (configuration, subspace) pairs for k optimization crews to work on

    • Each optimization crew performs a small-scale global optimization of its assigned configuration and subspace.

    • Gather intermediate results and re-insert them into the work queue. Idle optimization crews do full-dimensional local minimizations or additional small-scale global optimization.

    • Massively parallel exploration of optimization space

    • Automatic load balancing
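The queue-of-tasks organization can be sketched with Python's standard thread pool. This is only an illustration of the pattern, under several assumptions: threads stand in for the parallel optimization crews, the "optimization" is a toy that zeroes the selected angles, and the energy is a sum of squares. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def energy(config):
    """Toy energy: sum of squared angles."""
    return sum(a * a for a in config)

def small_scale_step(config, subspace):
    """Toy stand-in for a small-scale global optimization over `subspace`."""
    new = list(config)
    for i in subspace:
        new[i] = 0.0          # pretend the optimum for these angles is 0
    return new

def run_crews(pool_configs, subspaces, n_crews=4, rounds=2):
    """Each round: queue every (configuration, subspace) task, gather, re-insert."""
    configs = [list(c) for c in pool_configs]
    with ThreadPoolExecutor(max_workers=n_crews) as crews:
        for _ in range(rounds):
            futures = [crews.submit(small_scale_step, c, s)
                       for c in configs for s in subspaces]
            # Idle workers pull the next task as soon as they finish one,
            # which gives load balancing without explicit scheduling.
            results = [f.result() for f in as_completed(futures)]
            results.sort(key=energy)
            configs = results[:len(pool_configs)]   # best results re-enter the queue
    return configs
```

Sorting the gathered results by energy makes the outcome independent of the order in which workers finish.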


    2UTG_A: 7.5Å R.M.S.D. from crystal structure

    1POU: 6.3Å R.M.S.D. from NMR structure


    CASP Competition

    • Community-wide experiment on the Critical Assessment of Techniques for Protein Structure Prediction

    • Protein crystallographers and NMR spectroscopists provide structures prior to their publication for blind prediction by participants.

    • Biennial competition open to all computational methods, including servers.

    • Difficulty of targets is assessed by which type of method works to predict the structure: comparative modeling (CM), fold recognition (FR), or new fold (NF).

    • We participated in CASP4 (Dec. 2000) and CASP5 (Dec. 2002).



    Results on Phospholipase C beta C-terminus, turkey (containing 242 amino acids). Ribbon structure comparison between experiment (center), our submitted M1 prediction (right), and a next-generation run of the global optimization algorithm (left). The M1 prediction, our lowest energy submission, had an RMSD from experiment of 8.46Å. The new run lowered the energy of our previous best minimizer, resulting in a new structure with an RMSD of 7.7Å.
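For readers unfamiliar with the RMSD figures quoted here, the quantity is the root-mean-square distance between corresponding atoms of two structures. The sketch below assumes the two coordinate lists are already optimally superimposed and matched atom for atom. It omits the superposition step, and the transcript does not say which atoms were compared.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    ASSUMES the structures are already superimposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

If every atom of one structure is displaced by 3 Å from its partner, the function returns 3.0. Identical structures give 0.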


    CASP4 Results Summary

    Best structure predicted on one of the hardest targets

    Our method is more effective than some knowledge-based methods on targets for which less information from known proteins is available.

    Global optimization algorithm is very effective at improving structures from a small initial population.



    Our submitted CASP5 models of targets (domains) that were assessed in the CASP5 NEW FOLD category.


    Our submissions for CASP5 Target 162


    CASP5 Results Summary

    • Ranked ~15/165 groups in assessments of New Fold (and NF/FR) Results.

    • Our method uses less knowledge from known protein structures than most other (New Fold) methods participating in CASP5.

    • More diverse starting populations (especially for β-sheet proteins) using the visualization tool led to better performance in some cases.


    Future Research Directions

    • Simpler energy models for early stages of the algorithm, and alternative models of solvation.

    • New techniques for choosing β-strand pairings.

    • Improve our techniques for maintaining existing secondary structure in our models.

