Computational Protein Design

Computational Protein Design, a.k.a. the Inverse Folding Problem. Topic 18. Chapter 39, Gu and Bourne, “Structural Bioinformatics”

Protein Design is an Inverse Problem of Structure Prediction MDVGQAVIFLGPPGAGKGTQASRLAQELGFKKLSTGDILRDHVARGTPLGERVRPIMERGDLVPDDLILELIREELAERVIFDGFPRTLAQAEALDRLLSETGTRLLGVVLVEVPEEELVRRIL… Biology Adapted from Amy Keating’s slides at MIT.

Different Types of Protein Design
De novo design: design of new proteins (grand challenge)
-- novel protein folds
-- binding interfaces
-- enzymatic activities
-- etc.
Redesign of existing proteins (immediate practical applications)
-- increased thermostability
-- altered binding specificity
-- improved binding affinity
-- enhanced enzymatic activity
-- altered substrate specificity
Current Opinion in Biotechnology 2007, 18:1-7.

Protein Design Problems Annu. Rev. Biochem. 2008. 77:363-382.

Goal: design a protein that adopts a given structure (figure: design target, designed protein)
Open problems with assessment:
-- What resolution is required (fold, sidechain, loop, etc.)?
-- Stability of the designed protein
-- Structural uniqueness
-- Must solve the structure to know how you did!
There are typically many sequences that adopt the fold, so you must try to find the one that is the most stable. That is, minimize the quantity: ΔG_fold = G_folded - G_unfolded. Search through many possible sequences, and then pick the one with the best (lowest) ΔG_fold.

The big challenges: search and energy
Search: The search space is astronomical: 20^n sequences for a protein of n residues. Except in rare subspace search problems, this is computationally intractable.
Energy: It is practically impossible to compute ΔG_fold because...
-- What is the structure of the folded state (sidechain and loop positions)?
-- How do we model the unfolded state?
-- Entropy?!
Instead, we focus on the energy of the folded protein, meaning native structure interactions. That is, replace ΔG_fold with ΔE_fold using molecular mechanics (MM) force fields.
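To make "astronomical" concrete, here is a minimal sketch that counts the sequences for a chain of n residues, assuming all 20 standard amino acids are allowed at every position. The function names are illustrative only and do not come from the slides.

```python
import math

def sequence_space_size(n_residues: int, alphabet: int = 20) -> int:
    """Number of distinct sequences of length n_residues (exact integer)."""
    return alphabet ** n_residues

def log10_sequence_space(n_residues: int, alphabet: int = 20) -> float:
    """Base-10 logarithm of the search-space size, handy for large n."""
    return n_residues * math.log10(alphabet)

if __name__ == "__main__":
    for n in (10, 50, 100):
        print(f"n = {n:3d}: about 10^{log10_sequence_space(n):.1f} sequences")
```

Even a 100-residue protein gives roughly 10^130 sequences, which is why exhaustive enumeration is not an option and why the rotamer-level search on top of it needs pruning or sampling.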

Sidechain packing (figure: design target, designed protein) As we did with structure prediction in homology modeling, we will typically use a rotamer library-based approach.

Search algorithms for large spaces
Exhaustive search: too slow!
Stochastic methods
-- Monte Carlo
-- Genetic algorithms
Pruning algorithms (which are deterministic)
-- Branch and Bound
-- Dead End Elimination
For all-atom protein design, some amount of stochasticity is generally required. Purely deterministic approaches rarely succeed in designing complete proteins.
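As an illustration of the stochastic family, the following is a minimal Metropolis Monte Carlo sketch over discrete choices (residue identities or rotamers). The energy function here is a toy stand-in (Hamming distance to a hypothetical target string), not a force field; `metropolis_search`, `toy_energy`, and `TARGET` are names invented for this example.

```python
import math
import random

TARGET = "ACDEAC"  # hypothetical "ideal" sequence used only by the toy energy

def toy_energy(state):
    """Toy energy: number of positions that differ from TARGET."""
    return sum(a != b for a, b in zip(state, TARGET))

def metropolis_search(energy, n_positions, choices, n_steps=5000, kT=1.0, seed=0):
    """Single-site Metropolis Monte Carlo; returns the best state seen."""
    rng = random.Random(seed)
    state = [rng.choice(choices) for _ in range(n_positions)]
    e = energy(state)
    best, best_e = list(state), e
    for _ in range(n_steps):
        pos = rng.randrange(n_positions)
        old = state[pos]
        state[pos] = rng.choice(choices)      # propose a single-site change
        delta = energy(state) - e
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            e += delta
            if e < best_e:
                best, best_e = list(state), e
        else:
            state[pos] = old                  # reject: restore previous choice
    return best, best_e

if __name__ == "__main__":
    best, best_e = metropolis_search(toy_energy, 6, "ACDE", kT=0.5)
    print("".join(best), best_e)
```

The occasional acceptance of uphill moves is what lets the search escape local minima. In a real design run the temperature would typically be lowered over time (simulated annealing), and the energy would come from a force field.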

Dead End Elimination Eliminate, one at a time, rotamer choices that cannot under any circumstance be part of the minimum energy solution. From Wikipedia: DEE is a method for minimizing a function over a discrete set of independent variables. The basic idea is to identify "dead ends", i.e., "bad" combinations of variables that cannot possibly yield the global minimum and to refrain from searching such combinations further. Hence, dead-end elimination is a mirror image of dynamic programming techniques in which "good" combinations are identified and explored further. Although the method itself is general, it has been developed and applied mainly to the problems of predicting and designing the structures of proteins.

Dead End Elimination Identify and eliminate rotamers that cannot be part of the best solution. Note: Cannot afford to calculate energies for all of these configurations!

Dead End Elimination What is the least energy it would cost to replace i_s with i_r? Note: Only need to do p x r comparisons (versus r^p), where: r = average # of rotamers/residue, p = # residues.

DEE algorithm applied to protein design If ΔE > 0, then eliminate i_r. Apply iteratively to all rotamer pairs. The energy profile changes as rotamers are eliminated, leading to elimination of further rotamers.
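The elimination loop can be sketched as follows. This assumes the ΔE on the slide is the Goldstein form of the criterion: the smallest possible energy change, over all surviving neighbor rotamers, when i_s is replaced by i_r. The data layout (`self_e[i][r]` for single-rotamer energies, `pair_e[(i, j)][r][u]` for pair energies with i < j) and the function names are assumptions made for this example.

```python
from itertools import product

def pair(pair_e, i, r, j, u):
    """Pair energy between rotamer r at position i and rotamer u at position j."""
    return pair_e[(i, j)][r][u] if i < j else pair_e[(j, i)][u][r]

def dee_goldstein(self_e, pair_e):
    """Return, per position, the set of rotamers that survive elimination."""
    n = len(self_e)
    alive = [set(range(len(self_e[i]))) for i in range(n)]
    changed = True
    while changed:                      # repeat until no rotamer can be removed
        changed = False
        for i in range(n):
            for r in tuple(alive[i]):
                for s in tuple(alive[i]):
                    if s == r:
                        continue
                    # Least cost of replacing i_s with i_r in any environment.
                    delta = self_e[i][r] - self_e[i][s]
                    for j in range(n):
                        if j != i:
                            delta += min(pair(pair_e, i, r, j, u) - pair(pair_e, i, s, j, u)
                                         for u in alive[j])
                    if delta > 0:       # i_r is always worse than i_s: dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

def total_energy(self_e, pair_e, rots):
    """Energy of one full rotamer assignment (one rotamer per position)."""
    n = len(rots)
    e = sum(self_e[i][rots[i]] for i in range(n))
    e += sum(pair_e[(i, j)][rots[i]][rots[j]] for i in range(n) for j in range(i + 1, n))
    return e

def brute_force_gmec(self_e, pair_e):
    """Exhaustive minimum, feasible only for tiny problems; used as a check."""
    return min(product(*(range(len(s)) for s in self_e)),
               key=lambda rots: total_energy(self_e, pair_e, rots))
```

Because a rotamer is removed only when another surviving rotamer beats it in every environment, the global minimum energy conformation always stays in the surviving sets. On small inputs, `brute_force_gmec` can be used to confirm this.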

Coiled-coil design (Mayo et al.)

Biosensor design (Hellinga et al.) The Hellinga lab has designed many different receptors based on the bPBP fold.

Protein-protein interface design (Love, Mayo, et al.)

Rosetta Design
-- Sketch input structure (the fold)
-- Initial sequence selection (primarily 12-6, HB, and Born terms)
-- Monte Carlo minimization (both at rotamer and backbone levels)
-- Sequence optimization
-- Repeat until convergence
-- Final structure
Note: this step is analogous to structure prediction!

Top7 (Baker, Kuhlman, et al.)

Conformational switch (Kuhlman, et al.) Folded to unfolded transition as zinc is titrated in (figure labels: folded, unfolded)

The Holy Grail state of the art The ideal: Designed sequences that meet both criteria

TS: transition state

Design model: purple X-ray crystal structure: green

The dirty little secret of protein design… For every high-impact success in the protein design literature, there are dozens (perhaps hundreds) of spectacular failures that go unreported. Paraphrased from S. Mayo (Protein Society Meeting, 2006).

Scientific misconduct? Design of a novel triosephosphate isomerase

Scientific misconduct? Design of a novel triosephosphate isomerase DEE repacking around catalytic site

Scientific misconduct? Design of a novel triosephosphate isomerase Lineweaver-Burk plots As do I!
