Sidechain Placement and Protein Design

GCMB’07, 2 May

Protein design • Sequence → Structure → Function • Figure: ribose binding protein (sequence fragments KDTIALVVST…, YPVDLKLVVKQ) • Modify sequence to change structure and function (TNT binding) [Looger03] or behavior (folding order) [Ambroggio06]

Protein Design or Redesign • Create an amino acid sequence that folds to a stable protein and performs a desired function • Avoid: • Sampling all sequences • Solving protein folding • Relying on molecular dynamics • A successful design strategy: build on an existing structure • Scaffold: backbone from a known folded structure • Redesign ~20 residues • Find side chains that fit

Outline • Sidechain Rotamers & Rotamer Libraries • Algorithms for Sidechain Placement • Brute Force • Dead End Elimination • Simulated Annealing • Stochastic Mean Field • Dynamic Programming • A Biased View of Protein Structure & Design • How is design done? • Why is it successful?

Protein Structure • Chemical • 1-Dimensional: sequence of amino acids • Two components for each amino acid • Backbone (N, Cα, C and O atoms) • Side chain (residue) • Placed residue: a position in an amino acid sequence • Figure: chemical structures of the peptides MSS and MSW

N N 2 1 Side chain geometry • Conformation flexibility from dihedral angles • Side chain internal geometry • Bond angles and bond lengths fixed • Dihedrals c1, c2, … may rotate • Rotamers: rotational isomers • Side chains have preferred conformations • Prefer dihedrals around 60o, 180o and -60o • Rotamer Library: set of dihedral angles [Ponder87, Dunbrack93, Lovel2000]

Side chain conformation • Side chains differ in size (number of atoms) and degrees of freedom (number of χ angles)

Figure: serine χ1 distribution • A chosen combination of side chain torsion angles χ1, χ2, etc. for a residue is known as a rotamer.

Side chain conformations: canonical staggered forms • Figure: Newman projections for χ1 of glutamate, labeled with the name of each conformation (t = trans, g = gauche) • Side chain angles are defined moving outward from the backbone, starting with the N atom: the χ1 angle is N–Cα–Cβ–Cγ, the χ2 angle is Cα–Cβ–Cγ–Cδ, … • IUPAC nomenclature: http://www.chem.qmw.ac.uk/iupac/misc/biop.html
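χ1 is the dihedral N–Cα–Cβ–Cγ. As an illustration only (not taken from the lecture), the sketch below computes a dihedral angle in degrees from four atom coordinates using the usual sign convention, then snaps it to the nearest staggered value (+60°, 180°, -60°). The function names `dihedral` and `nearest_staggered` are hypothetical.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180].
    For chi1, pass the N, CA, CB and CG coordinates."""
    def sub(a, b):
        return [a[k] - b[k] for k in range(3)]

    def dot(a, b):
        return sum(a[k] * b[k] for k in range(3))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    length = math.sqrt(dot(b1, b1))
    b1 = [x / length for x in b1]
    # Project both outer bonds onto the plane perpendicular to the central bond.
    v = [b0[k] - dot(b0, b1) * b1[k] for k in range(3)]
    w = [b2[k] - dot(b2, b1) * b1[k] for k in range(3)]
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def nearest_staggered(chi):
    """Snap a dihedral (degrees) to the closest staggered value: +60, 180 or -60."""
    def circular_gap(a, b):
        d = (a - b) % 360.0
        return min(d, 360.0 - d)
    return min((60.0, 180.0, -60.0), key=lambda c: circular_gap(chi, c))
```

For example, a measured χ1 of -170° is snapped to the 180° (trans) form; the circular distance handles the wrap-around at ±180°.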

2 p|1 1 p  No. 1 No.    Backbone independent rotamer library • Dunbrack & Cohen, 1997

What do rotamer libraries provide? [J. Meiler07] • Rotamer libraries significantly reduce the number of conformations that need to be evaluated during the search. • This is done with almost no risk of missing the real conformations. • Even small libraries of about 100-150 rotamers cover about 96-97% of the conformations actually found in protein structures. • The probabilities of each rotamer in the library provide estimates of the potential energy due to interactions within the side chain and with the local backbone atoms, using the Boltzmann distribution: $E = -kT \ln(P)$
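Inverting the Boltzmann distribution turns library statistics into pseudo-energies. A minimal sketch, assuming the library supplies positive counts (or probabilities) per rotamer and that kT ≈ 0.593 kcal/mol (roughly 298 K); both the default and the name `rotamer_energies` are illustrative assumptions:

```python
import math

def rotamer_energies(counts, kT=0.593):
    """Convert rotamer counts or probabilities into pseudo-energies E = -kT ln(P).
    kT defaults to ~0.593 kcal/mol (about 298 K, an assumption); every entry must be > 0."""
    total = float(sum(counts))
    return [-kT * math.log(c / total) for c in counts]
```

A rotamer observed twice as often as another ends up kT·ln 2 ≈ 0.41 kcal/mol lower in energy. Rotamers with zero observations would need a pseudocount before this conversion.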

N N 2 1 Side chain geometry • Conformation flexibility from dihedral angles • Side chain internal geometry • Bond angles and bond lengths fixed • Dihedrals c1, c2, … may rotate • Rotamers : rotational isomers • Side chains have preferred conformations • Prefer dihedrals around 60o, 180o and -60o • Rotamer Library: set of dihedral angles [Ponder87, Dunbrack93, Lovel2000] • http://dunbrack.fccc.edu/bbdep/bbdepdownload.php(Backbone dependent and independent libraries) • http://kinemage.biochem.duke.edu/databases/rotamer.html (Backbone independent library)

Rotamers in crystallographic refinement • Fit structure to electron density from X-ray diffraction • Figure: red indicates clashes with added hydrogen atoms; a better choice of side chain rotamer and a modified backbone resolve the clashes

Outline • Sidechain Rotamers & Rotamer Libraries • Algorithms for Sidechain Placement • Brute Force Search • Dead End Elimination • Simulated Annealing • Stochastic Mean Field • Dynamic Programming

Side Chain Placement Problem • Given • A fixed protein backbone • A set of fixed (background) residues • A set of changing (molten) residues • A list of allowed amino acids for each molten residue • A rotamer library • A pairwise decomposable energy function • Find the assignment of rotamers to the molten residues, S, that minimizes the energy function • Figure (Kinemage): rotamers for ubiquitin surface residues

Energy Functions • $f$: Protein Structure → $\mathbb{R}$ • Lennard-Jones: $\frac{a}{d^{12}} - \frac{b}{d^{6}}$ • van der Waals attractive energies • repulsive energies from atom overlap • Electrostatics: $\frac{c}{d}$ • Solvent effects • Hydrogen bonds • Often pairwise decomposable: $f = \sum_{i<j} f(i,j)$ • sum of atom-pair or rotamer-pair interaction energies
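A pairwise decomposable energy is just a sum of pair terms. The sketch below assumes, purely for illustration, a single Lennard-Jones pair (a, b) for every atom pair and an electrostatic coefficient c equal to the product of two point charges; real force fields use per-atom-type parameters, dielectric models and cutoffs. Names are hypothetical.

```python
import math
from itertools import combinations

def pair_energy(d, a, b, c):
    """Pair term at distance d: Lennard-Jones a/d^12 - b/d^6 plus electrostatic c/d."""
    return a / d**12 - b / d**6 + c / d

def total_energy(coords, charges, a=1.0, b=2.0):
    """Sum of pair terms over all atom pairs i < j (pairwise decomposable).
    Assumes the same a and b for every pair; c is the product of the two charges."""
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        energy += pair_energy(d, a, b, charges[i] * charges[j])
    return energy
```

With a = 1 and b = 2 the Lennard-Jones well sits at d = (2a/b)^(1/6) = 1 with depth -1; shorter distances quickly become repulsive, which is the "atom overlap" term.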

min( Esingle(Si) + Epair(Si,Sj) ) S S i i < j Side Chain Placement Problem Find the assignment of rotamers to the molten residues, S, that minimizes the energy function Functions stated in terms of rotamer energies rotamer / background energy rotamer pair energies Esingle Epair

Side Chain Placement Problem • NP-Complete • Reduction from SAT [Pierce2002] • Techniques • Optimality Guarantee • Dead-End Elimination [Desmet92, Goldstein94, Looger2001] • Integer Linear Programming [Eriksson2001] • Branch and Bound [Gordon99, Canutescu2003] • Dynamic Programming [Leaver-Fay2005] • No Optimality Guarantee • Genetic Algorithms [Jones94] • Simulated Annealing [Holm92, Hellinga94, Kuhlman03] • Self-Consistent Mean Field [Koehl96]

Dead End Elimination (DEE) • Reduce the search space without losing the Global Minimum Energy Conformation (GMEC). • Eliminate rotamers that cannot be in the GMEC by comparing upper and lower bounds on their energies; more accurate bounds are more computationally expensive. • Use brute force search on the remaining rotamers. • Typically assumes that the scoring function can be expressed as a sum of pairwise interactions

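A first, simple condition for elimination • A rotamer can be eliminated for a residue when the minimum (best) energy it obtains by interaction with other rotamers is still higher (worse) than the maximum energy of some other rotamer: $E_{single}(i_r) + \sum_{j \neq i} \min_s E_{pair}(i_r, j_s) > E_{single}(i_t) + \sum_{j \neq i} \max_s E_{pair}(i_t, j_s)$ • Figure: score of rotamer $i_r$ against the rotamers $j_s$ it interacts with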
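The simple test compares the best case of rotamer r at residue i with the worst case of a competitor t at the same residue. A sketch under assumed data layouts (`e_single[i][r]`, and `e_pair[(i, j)][r][s]` for i < j, both hypothetical); it checks against the full rotamer lists, whereas an iterated DEE run would restrict each sum to rotamers not yet eliminated:

```python
def pair(e_pair, i, r, j, s):
    """Epair between rotamer r at residue i and rotamer s at residue j (0 if no edge)."""
    if (i, j) in e_pair:
        return e_pair[(i, j)][r][s]
    if (j, i) in e_pair:
        return e_pair[(j, i)][s][r]
    return 0.0

def desmet_eliminable(i, r, e_single, e_pair):
    """True if the best case of rotamer r at residue i is still worse than
    the worst case of some other rotamer t at the same residue."""
    others = [j for j in range(len(e_single)) if j != i]

    def rotamers(j):
        return range(len(e_single[j]))

    best_r = e_single[i][r] + sum(
        min(pair(e_pair, i, r, j, s) for s in rotamers(j)) for j in others)
    for t in rotamers(i):
        if t == r:
            continue
        worst_t = e_single[i][t] + sum(
            max(pair(e_pair, i, t, j, s) for s in rotamers(j)) for j in others)
        if best_r > worst_t:
            return True
    return False
```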

The Goldstein improvement • A rotamer can be safely eliminated when there exists a rotamer that has lower (better) energy for each given environment. • This criterion is more powerful, though it typically requires more computation time. • Figure: scores of rotamers $i_s$ and $i_t$ across rotamer space
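In the commonly cited form of Goldstein's criterion, rotamer r at residue i is eliminated if some t satisfies $E_{single}(i_r) - E_{single}(i_t) + \sum_{j \neq i} \min_s \left[ E_{pair}(i_r, j_s) - E_{pair}(i_t, j_s) \right] > 0$: the minimum is taken over differences, so one environment must make r worse than t in every case. A sketch with the same hypothetical data layout as above (`e_single[i][r]`, `e_pair[(i, j)][r][s]` for i < j):

```python
def pair(e_pair, i, r, j, s):
    """Epair between rotamer r at residue i and rotamer s at residue j (0 if no edge)."""
    if (i, j) in e_pair:
        return e_pair[(i, j)][r][s]
    if (j, i) in e_pair:
        return e_pair[(j, i)][s][r]
    return 0.0

def goldstein_eliminable(i, r, e_single, e_pair):
    """True if a single other rotamer t at residue i beats r in every environment."""
    others = [j for j in range(len(e_single)) if j != i]
    for t in range(len(e_single[i])):
        if t == r:
            continue
        gap = e_single[i][r] - e_single[i][t] + sum(
            min(pair(e_pair, i, r, j, s) - pair(e_pair, i, t, j, s)
                for s in range(len(e_single[j])))
            for j in others)
        if gap > 0:
            return True
    return False
```

On the test instance below, rotamer 1 of residue 0 is eliminated even though the simple best-case/worst-case comparison would keep it: its best case (2.0) is below rotamer 0's worst case (10.0), but it is worse than rotamer 0 by at least 1.0 in every environment.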

Even more powerful criteria can be obtained with even more computation • A rotamer can be safely eliminated when, for each environment, there exists some rotamer that has lower (better) energy. • Figure: scores of rotamers $r_s$, $r_t$ and $r_{t'}$ across rotamer space
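The condition above swaps the quantifiers of Goldstein's test: the better rotamer may differ from one environment to the next. The sketch checks that literal condition exactly by enumerating every environment, so it is exponential and only meant to make the logic concrete; it is not an efficient published DEE variant. The data layout (`e_single[i][r]`, `e_pair[(i, j)][r][s]` for i < j) and names are hypothetical.

```python
from itertools import product

def pair(e_pair, i, r, j, s):
    """Epair between rotamer r at residue i and rotamer s at residue j (0 if no edge)."""
    if (i, j) in e_pair:
        return e_pair[(i, j)][r][s]
    if (j, i) in e_pair:
        return e_pair[(j, i)][s][r]
    return 0.0

def dominated_in_every_environment(i, r, e_single, e_pair):
    """True if, for every assignment of the other residues, some rotamer t != r
    at residue i has strictly lower energy than r (exponential enumeration)."""
    others = [j for j in range(len(e_single)) if j != i]

    def local(rot, env):
        # Energy terms involving residue i in rotamer rot, given the environment env.
        return e_single[i][rot] + sum(pair(e_pair, i, rot, j, s) for j, s in zip(others, env))

    for env in product(*(range(len(e_single[j])) for j in others)):
        e_r = local(r, env)
        if not any(local(t, env) < e_r for t in range(len(e_single[i])) if t != r):
            return False
    return True
```

In the test instance, rotamer 2 of residue 0 loses to rotamer 0 in one environment and to rotamer 1 in the other, so no single competitor beats it everywhere and a Goldstein test against one rotamer would keep it.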

Dynamic Programming via an Interaction Graph • Surface residues on ubiquitin’s β-sheet • Figure: interaction graph defined by Rosetta’s energy function

Interaction Graph • G = {V, E}, a multi-hypergraph • vertices ↔ molten residues, $v \in V$ • state space ↔ rotamers for a residue, $S(v)$ • edge ↔ possibility of residue interaction, $e \subseteq V$ • scoring function ↔ interaction energy, $f_e: \prod_{v \in e} S(v) \to \mathbb{R}$ • Figure: a hypergraph compared with a graph
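For the pairwise case, one way to read "edge ↔ possibility of residue interaction" is: connect two residues whenever some rotamer pair has a non-negligible pair energy. A small sketch assuming hypothetical pair tables `e_pair[(u, v)][r][s]` and a tolerance `tol` of our own choosing:

```python
def interaction_graph(n, e_pair, tol=0.0):
    """Adjacency sets for residues 0..n-1: connect u and v when some rotamer
    pair has |Epair| > tol, i.e. the two residues can interact."""
    adj = {v: set() for v in range(n)}
    for (u, v), table in e_pair.items():
        if any(abs(x) > tol for row in table for x in row):
            adj[u].add(v)
            adj[v].add(u)
    return adj
```

Pairs whose table is entirely zero (residues too far apart to interact in any rotamers) contribute no edge, which keeps the graph sparse.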

Esingle(Si) + Epair(Si,Sj) i i < j Interaction Graph Evaluation (Pairwise case) • For G = {V, E}, min • Each vertex, v, has a function to capture interactions with the background: f{v} : S(v) R • Each pair of interacting vertices, {u, v}, defines an edge with a function to capture pair interactions: f{u,v} : S(u) xS(v) R • Given an interaction graph, G={V,E}, find the state assignment S that minimizes SwVE fw

Bottom-Up Dynamic Programming • Eliminate node v • Let $E_v$ be the edges incident upon v • Let $N_v$ be the neighbors of v • For each edge $e \in E_v$ with scoring function $f_e$, let $f_{e,v=s}$ be edge e’s scoring function with vertex v fixed in state s • Create a new hyperedge incident upon $N_v$ • Compute $f_{N_v} = \min_{s \in S(v)} \sum_{e \in E_v} f_{e,v=s}$ • Remove v from the graph
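The elimination step can be sketched as min-sum variable elimination over table-valued scoring functions. This is an illustrative reconstruction, not the lecture's implementation: each function is a hypothetical `(scope, table)` pair whose table maps one state per scope vertex to an energy, and only the minimum energy is returned (recovering the optimal states would also require storing the argmin for each table entry and back-tracking).

```python
from itertools import product

def eliminate_all(functions, sizes, order):
    """Min-sum dynamic programming by eliminating vertices in the given order.
    functions: list of (scope, table); scope is a tuple of vertices and table
               maps a tuple of states (one per scope vertex) to an energy.
    sizes:     sizes[v] = |S(v)|.  order: every vertex exactly once.
    Returns the minimum total energy."""
    funcs = list(functions)
    for v in order:
        incident = [f for f in funcs if v in f[0]]            # E_v
        funcs = [f for f in funcs if v not in f[0]]
        nbrs = tuple(sorted({u for scope, _ in incident        # N_v
                             for u in scope if u != v}))
        table = {}
        for states in product(*(range(sizes[u]) for u in nbrs)):
            ctx = dict(zip(nbrs, states))
            best = float("inf")
            for s in range(sizes[v]):                          # min over s in S(v)
                ctx[v] = s
                total = sum(t[tuple(ctx[u] for u in scope)] for scope, t in incident)
                best = min(best, total)
            table[states] = best
        funcs.append((nbrs, table))                            # new hyperedge on N_v
    # After eliminating every vertex, only constant (empty-scope) functions remain.
    return sum(t[()] for _, t in funcs)
```

Each step loops over the states of v and of all its neighbors, so the cost grows with the largest neighborhood created during elimination, which is where the treewidth of the interaction graph enters.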

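Scoring Function Representation: Tables • Edge e = {u, v}: a two-dimensional table indexed by S(u) × S(v) • Figure: table for edge {u, v}, with the states of S(u) and S(v) (labeled a–e and f–j) along its axes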

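Scoring Function Representation: Tables • Hyperedge e = {u, v, w}: a three-dimensional table indexed by S(u) × S(v) × S(w) • Figure: table for hyperedge {u, v, w}, with states labeled a–e, f–j and k–o along its three axes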

Experiments and Results • “Rotamer Relaxation Task” • Sequence fixed: choose new rotamers for each residue • “Redesign Task” • Search of conformation and sequence spaces • Ubiquitin’s 15 surface residues • Large rotamer library • Relaxation: 32 states per vertex, interaction graph of treewidth 4 • Redesign: 680 states per vertex, interaction graph of treewidth 3 (one edge dropped)

Dynamic Programming for Hydrogen Placement • Dynamic programming (DP) is limited by the treewidth of graph instances • Treewidths of the graphs arising in protein design are too large for DP to be practical • Adding hydrogen atoms to PDB files • Hydrogen placement via combinatorial optimization: REDUCE [Word99] • Non-pairwise decomposable energy function • Previously used brute force • Replaced with dynamic programming • Interaction graphs have low treewidth • Effective in practice: running times drop from minutes to milliseconds • REDUCE v3.02 in the MolProbity suite, and distributed from http://kinemage.biochem.duke.edu/software/reduce.php
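Whether DP is practical depends on the largest neighborhood created while eliminating vertices. A common way to estimate this is greedy min-degree elimination, sketched below; it yields an upper bound on the treewidth, not necessarily the exact value, and it is not claimed to be the ordering heuristic used in REDUCE. The function name is hypothetical.

```python
def min_degree_width(adj):
    """Greedy min-degree elimination on a graph given as {vertex: set of neighbours}.
    Returns (elimination order, width); width is the largest neighbourhood met
    during elimination, an upper bound on the treewidth."""
    g = {v: set(ns) for v, ns in adj.items()}
    order, width = [], 0
    while g:
        v = min(g, key=lambda u: (len(g[u]), u))  # fewest neighbours, ties by id
        nbrs = g.pop(v)
        width = max(width, len(nbrs))
        # Eliminating v links its neighbours into a clique (the new hyperedge on N_v).
        for a in nbrs:
            g[a].discard(v)
            g[a].update(nbrs - {a})
        order.append(v)
    return order, width
```

A 4-cycle gets width 2, a path gets width 1, and a complete graph on four vertices gets width 3, matching their treewidths.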

Simulated Annealing • Stochastic optimization technique • Monte Carlo • Make a random change, determine ΔE • Metropolis criterion [Metropolis57] • Accept with probability $p = e^{-\Delta E / kT}$ if $\Delta E > 0$, and $p = 1$ otherwise • Gradually lower temperature T • In Side Chain Placement • Assign each residue a rotamer • Repeat • Select a random residue and a random alternate rotamer • Find ΔE induced by substituting the alternate rotamer • Accept/reject the substitution according to the Metropolis criterion
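The loop above can be sketched as follows. Assumptions (not from the lecture): hypothetical tables `e_single[i][r]` and `e_pair[(i, j)][r][s]` for i < j, a geometric cooling schedule, and a temperature expressed directly in energy units (i.e. it plays the role of kT). Because the energy is pairwise decomposable, ΔE only needs the terms that involve the changed residue.

```python
import math
import random

def anneal(e_single, e_pair, t_start=10.0, t_end=0.01, steps=20000, seed=0):
    """Simulated annealing with the Metropolis criterion.
    Returns (best energy seen, best rotamer assignment as a tuple)."""
    rng = random.Random(seed)
    n = len(e_single)

    def pair(i, r, j, s):
        if (i, j) in e_pair:
            return e_pair[(i, j)][r][s]
        if (j, i) in e_pair:
            return e_pair[(j, i)][s][r]
        return 0.0

    def local(i, r, assign):
        # Only terms involving residue i change when its rotamer changes.
        return e_single[i][r] + sum(pair(i, r, j, assign[j]) for j in range(n) if j != i)

    assign = [rng.randrange(len(rots)) for rots in e_single]
    e = sum(e_single[i][r] for i, r in enumerate(assign)) + sum(
        t[assign[i]][assign[j]] for (i, j), t in e_pair.items())
    best_e, best = e, tuple(assign)
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / max(1, steps - 1))  # geometric cooling
        i = rng.randrange(n)
        if len(e_single[i]) < 2:
            continue
        new = rng.randrange(len(e_single[i]) - 1)
        if new >= assign[i]:
            new += 1  # a random rotamer different from the current one
        d_e = local(i, new, assign) - local(i, assign[i], assign)
        if d_e <= 0 or rng.random() < math.exp(-d_e / temp):  # Metropolis criterion
            assign[i] = new
            e += d_e
            if e < best_e:
                best_e, best = e, tuple(assign)
    return best_e, best
```

Tracking the best assignment seen, rather than only the final one, is a cheap safeguard; like any annealing schedule, this gives no optimality guarantee.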

Self-consistent mean field • I planned to cull a description from Patrice’s BioEbook sections: • http://nook.cs.ucdavis.edu:8080/~koehl/BioEbook/design_scmf.html • http://nook.cs.ucdavis.edu:8080/~koehl/BioEbook/scmf.html but didn’t have time in class.
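As a rough, non-authoritative illustration of the general self-consistent mean field idea [Koehl96]: each residue keeps a probability distribution over its rotamers; each rotamer's energy is evaluated in the probability-weighted field of all other residues; the probabilities are reset to Boltzmann weights of those energies; and the cycle repeats until it stops changing. The temperature `kT`, the damping factor and the synchronous update are assumptions of this sketch, as are the table layouts `e_single[i][r]` and `e_pair[(i, j)][r][s]` for i < j.

```python
import math

def scmf(e_single, e_pair, kT=0.6, damping=0.5, iters=500, tol=1e-8):
    """Self-consistent mean field sketch. Returns (most probable rotamer per
    residue, probability matrix P with P[i][r] for rotamer r at residue i)."""
    n = len(e_single)

    def pair(i, r, j, s):
        if (i, j) in e_pair:
            return e_pair[(i, j)][r][s]
        if (j, i) in e_pair:
            return e_pair[(j, i)][s][r]
        return 0.0

    P = [[1.0 / len(rots)] * len(rots) for rots in e_single]  # start uniform
    for _ in range(iters):
        new_P, change = [], 0.0
        for i in range(n):
            # Mean-field energy of each rotamer of i, weighted by the other residues' P.
            field = [e_single[i][r] + sum(P[j][s] * pair(i, r, j, s)
                                          for j in range(n) if j != i
                                          for s in range(len(P[j])))
                     for r in range(len(e_single[i]))]
            low = min(field)
            w = [math.exp(-(f - low) / kT) for f in field]  # Boltzmann weights
            z = sum(w)
            q = [(1 - damping) * wr / z + damping * p for wr, p in zip(w, P[i])]
            change = max(change, max(abs(a - b) for a, b in zip(q, P[i])))
            new_P.append(q)
        P = new_P
        if change < tol:
            break
    best = [max(range(len(p)), key=p.__getitem__) for p in P]
    return best, P
```

Reading off the most probable rotamer per residue is a heuristic: the converged field need not correspond to the GMEC, which is why SCMF appears under "No Optimality Guarantee".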

The practical problem of side chain modeling [M07] • The way we approach protein structure prediction today is very different from the way nature does it. • Due to technical issues such as computation time, we are usually forced to accept a fixed backbone and only then place the side chains on it. • The quality of side chain modeling therefore depends heavily on the position of the backbone: if the initial backbone conformation is wrong, side chain modeling will be correspondingly poor. • What is really needed is a “combined” algorithm that optimizes backbone conformation simultaneously with side chain modeling.


Why Design Proteins? • Nature uses proteins • to signal events • to catalyze reactions • to move cells (motors) • to bear weight (I-beams) • Design is an experiment to help understand folding/binding • Industrial biosynthesis • Proteins are both efficient and specific • Cure disease • Antibodies • Inhibitory peptides as drugs • Perturb cell signaling pathways

Why do RosettaDesign, Dezymer, … work? • Geometric approximations (3d jigsaw puzzles) are surprisingly effective in design. • They mine PDB structures for behaviors of native proteins and fragments. • They precompute energies for pairwise interactions. • They use many fast computers to allow detailed sampling of discrete conformations. • Fast optimization algorithms • Competition

How do RosettaDesign, Dezymer, … fail? • Computationally difficult to achieve good packing and hydrogen bond satisfaction in protein core: • Scores for packing, solvation and hydrogen bond satisfaction cannot be pairwise additive. • Scores often used as filters; we’d prefer to optimize. • Stability of designed proteins • Multistate or negative design

Protein Stability • A naturally occurring protein adopts a compact geometry when placed in water • Stability is the difference ΔG between the free energies of the folded (F) and unfolded (U) states • Figure: free energy G of the unfolded (U) and folded (F) states

Protein Stability • A naturally occurring protein adopts a compact geometry when placed in water • Different proteins have different free energies in their unfolded states • Figure: free energy G for several proteins, with differing stabilities ΔG

Challenges in Protein Design • Side chain placement is hard • The complexity of an individual instance of the side chain placement problem (SCPP) is related to the treewidth of its interaction graph. • Tight, collision-free packing is often impossible on the input scaffold • The interaction graph can be extended to allow simultaneous optimization of side chain and backbone structures • Protein stability is not well captured by pairwise decomposable energy functions • The interaction graph supports using non-pairwise decomposable energy functions during side chain placement
