Computational Protein Design: A problem in combinatorial optimization

Computational Protein Design: A problem in combinatorial optimization. CSE 549 Guest Lecture, September 17, 2009. David Green, Applied Mathematics & Statistics.

What is a protein? • Polymers (chains) of amino acids. • There are 20 different amino acids that can be part of the chain. • Machines of the cell. • It’s proteins that do most of the work involved in life!

Polymers of amino acids. • Amino acids link to form polypeptides. • There is a backbone of constant composition. • There are side chains that vary.

The twenty amino acids. • AA side chains vary from: • Big to small. • Non-polar (all C and H) to polar. • Positive to negative. • Flexible to rigid.

The machinery of life. • Protein sensors (receptors) are responsible for all the senses (sight, smell, taste, touch, hearing). • Enzymes are proteins that catalyze chemical reactions, like the ones that convert food to energy. • Specialized structural proteins make skin elastic, and make the lens of the eye work. • Muscles are primarily composed of proteins that combine structural and enzymatic parts to make a machine.

Why design proteins? • New sensors based on biology. • Proteins have been engineered to detect TNT (explosive) and sarin (nerve gas). • Proteins are used as treatments for many diseases. • Protein engineering has helped improve proteins that are given to cancer patients on radiation or chemotherapy. • Work in the Green lab is ongoing to design proteins for use as anti-HIV prophylactics. • Many nanotechnology applications that haven’t even been considered yet!

Where do proteins come from? • The genome contains instructions for every protein in a cell. • A few HUGE molecules of DNA. • Each gene is the code for one protein. • There are ~30,000 genes in humans. • Genes are expressed through an intermediate molecule, RNA. • Many copies of each protein can be made.

The Central Dogma of Molecular Biology. • Then proteins do the work!

How do proteins work? • Proteins fold into a unique 3-dimensional structure. • The amino acid sequence of a protein dictates its structure. • The function of a protein is controlled by its structure.

Many polymers are long, unstructured chains. • Polyethylene • Is made of long chains of the same monomers. • Adopts a random mesh of inter-weaving strands. • This structure gives us PLASTIC!

DNA has the same structure for every sequence. • The “double-helix” is a great structure for storing and replicating information.

Protein structures are well-defined and diverse! • One chain or many. • Elongated or globular. • Many forms of symmetry (or none).

What does a protein look like? • Cyanovirin – A protein that inhibits the entry of HIV into human cells.

What does a protein look like? • The atoms of a protein form a compact, well-packed cluster.

What does a protein look like? • A protein can be thought of as a nearly solid object.

What does a protein look like? • Simplified cartoons make the structure easier to see.

What does a protein look like? • The path of the backbone of a protein is called its “fold”.

What does a protein look like? • Different types of amino acids are found all along the protein chain.

What does a protein look like? • Each amino acid has a side chain that protrudes from the backbone.

What does a protein look like? • Many proteins bind other molecules, like the sugar molecules here.

What does a protein look like? • Binding interfaces are usually a close fit of two complementary surfaces.

What does a protein look like? • The core of a protein is key in keeping a stable structure.

Many side chains fill the core.

The core is well packed …

… with groups from all along the chain.

Each side chain fits perfectly.

What is a protein? • A protein is a complicated three-dimensional structure, made up by an amazing 3-D jigsaw puzzle of interlocking amino acids. • Amino acids pack together not just geometrically, but with complementary chemical groups as well. • Proteins move too, but we’ll ignore that for now.

How can we design one?!? • Choose a fold (path of the backbone). • Pack the core with the right set of amino acids to achieve the desired fold. • Choose other amino acids to achieve the desired function (such as binding to a target molecule, or getting the right molecular motions).

Structure prediction is a forward problem. • Given a protein sequence, what is the structure that it will adopt (fold to)? • This is a VERY hard problem, and it is not yet fully solved. • Prediction is difficult because you are stuck with what nature gives you.

Protein design is an inverse problem. • Given a desired 3-dimensional protein structure, what is a sequence that will fold to that structure? • We have the freedom to add constraints that simplify the problem. • As a result, methods for protein design have had many successes. • Pabo. Nature 301: 200 (1983). • Drexler. PNAS 78: 5275-5278 (1981).

A designed sequence should fold according to design. • ANY sequence which folds to the correct target structure (and carries out the desired function) can be considered a successful design. • There is more than one right answer, unlike in prediction!

Choosing a backbone fold. • The structure dictates the function, and a big part of structure is the fold. • We still don’t really know how to choose the “best” fold. • Instead, we just borrow from nature – redesign a natural protein to do something new.

Zinc finger proteins bind DNA.

A zinc ion holds them together. • The protein will not fold if zinc is not present. • The protein only binds DNA when it is folded. • A group at Caltech set out to design a zinc finger that doesn’t need zinc!

1997: The first fully automated protein design! • Dahiyat and Mayo. Science 278: 82-87 (1997).

Designing function. • Making a molecule bind is like designing the core: we want to make the interface between the two pieces complementary. • Other functions are a lot trickier … and we don’t have good ways to solve them yet, but we’re on our way.

2003: A Duke group designs a set of protein sensors. • Looger, Dwyer, Smith and Hellinga. Nature 423: 185-190 (2003).

Protein design is a BIG problem. • The zinc finger is one of the smallest protein domains … about 30 amino acids long. • How many different 30 amino acid polypeptides are there? • Choose from any of 20 amino acids at each position. • Total sequences = 20^30 ≈ 1x10^39 • Mass of earth = 6x10^27 g • Mass of a grain of sand ~ 1x10^-3 g • At one grain per sequence, that is roughly 2x10^8 earths’ worth of sand grains. • Enumeration of possible states is beyond impossible; we must take advantage of the need to achieve complementary interactions between amino acids.
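The arithmetic on this slide can be checked directly. The following is a minimal sketch that uses only the slide's own rough figures (20 amino acids, 30 positions, earth mass 6x10^27 g, sand grain mass 1x10^-3 g); the variable names are illustrative.

```python
# Size of sequence space for a 30-residue protein, using the slide's numbers.
N_AMINO_ACIDS = 20
CHAIN_LENGTH = 30

# Python integers are arbitrary precision, so this count is exact.
total_sequences = N_AMINO_ACIDS ** CHAIN_LENGTH

# Rough figures quoted on the slide, both in grams.
EARTH_MASS_G = 6e27
SAND_GRAIN_MASS_G = 1e-3
grains_per_earth = EARTH_MASS_G / SAND_GRAIN_MASS_G

# How many earths made entirely of sand would give one grain per sequence?
earths_of_sand = total_sequences / grains_per_earth

print(f"sequences        = {total_sequences:.2e}")
print(f"grains per earth = {grains_per_earth:.2e}")
print(f"earths of sand   = {earths_of_sand:.2e}")
```

The comparison is only as good as the two mass estimates, so the result should be read as an order of magnitude.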

Many different structures are possible. • An arginine and a glutamate interact.

Many different structures are possible. • An arginine and a glutamate interact in several different conformations.

Really Big!!! • Amino-acid side chains are flexible. • But not every shape (conformation) is equal. • Each amino acid has a set of preferred conformations (rotamers). • 1 to 80 per amino acid. • Instead of choosing from 20 amino acids … we need to choose from ~400 (at least) amino acid rotamers! • Total structures = 400^30 ≈ 1x10^78 • (approx. number of atoms in the universe!!!!!)
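For numbers this large it is often easier to work with exponents than with the counts themselves. A small sketch, assuming the slide's figure of about 400 rotamers per position (the function name is illustrative):

```python
import math

CHAIN_LENGTH = 30
ROTAMERS_PER_POSITION = 400  # the slide's rough lower bound

def log10_states(choices_per_position: int, positions: int) -> float:
    """Return log10 of choices**positions without building the huge number."""
    return positions * math.log10(choices_per_position)

seq_exp = log10_states(20, CHAIN_LENGTH)
rot_exp = log10_states(ROTAMERS_PER_POSITION, CHAIN_LENGTH)

print(f"sequences:           about 10^{seq_exp:.1f}")
print(f"rotamer assignments: about 10^{rot_exp:.1f}")
```

Going from amino acids to rotamers doubles the exponent, because 400 = 20^2.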

Packing side chains – a puzzle. • How do you solve a jigsaw puzzle? • Impossible to try all combinations of piece placement. • The number of unique ways of placing N pieces on a grid is (4^N)(N!). • For N=100, (1.6x10^60)(9.3x10^157) = 1.5x10^218. • Trying each piece one by one is better, but still infeasible. • Number of iterative tries for an N-piece puzzle: for N=100, 1.37x10^6.
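The brute-force count for the jigsaw can be reproduced with exact integer arithmetic. This sketch assumes the reading of the slide in which each of the N pieces has 4 possible orientations and the pieces can be assigned to grid cells in N! orders; the function name is illustrative.

```python
import math

def placements(n: int) -> int:
    """Ways to place n pieces: n! orderings times 4 orientations per piece."""
    return (4 ** n) * math.factorial(n)

n = 100
orientations = 4 ** n
orderings = math.factorial(n)
total = placements(n)

print(f"4^{n} = {orientations:.1e}")
print(f"{n}!  = {orderings:.1e}")
print(f"total = {total:.1e}")
```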

Packing side chains – be smart. • How do you solve a jigsaw puzzle? • Group pieces by colors and patterns. • Iterate over matching of pieces that are complementary. • Shape is important. • The pattern must also match.

Pattern matching in proteins? • What does it mean for two amino acids in the core of a protein to “match”? • Must fit close together (but not too close): steric complementarity. • Neighboring atoms must have complementary charges (neutral likes neutral, positive likes negative): electrostatic complementarity.

Steric fit: Lennard-Jones potential. • Van der Waals attraction between atoms at moderate distances. • Repulsion of atoms from one another at short distances. • If atoms are not nearby, the energy between them will be very close to zero. • The total score of the “goodness” of fit in a molecule is the sum of the energy for every pair of atoms.
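The slide text does not give the functional form. A common way to write the 12-6 Lennard-Jones potential is E(r) = epsilon * [(r_min/r)^12 - 2 (r_min/r)^6], which the sketch below implements. The values of epsilon and r_min are illustrative placeholders, not parameters from the lecture.

```python
def lennard_jones(r: float, epsilon: float = 0.1, r_min: float = 3.8) -> float:
    """12-6 Lennard-Jones energy for two atoms separated by distance r.

    epsilon is the depth of the energy well and r_min is the distance at
    which the energy is lowest. Both defaults are illustrative placeholders.
    """
    ratio6 = (r_min / r) ** 6
    # ratio6**2 is the short-range repulsion; -2*ratio6 is the attraction.
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)

# Too close, ideal distance, moderate distance, far apart.
for r in (3.0, 3.8, 5.0, 12.0):
    print(f"r = {r:5.1f}  E = {lennard_jones(r):+.4f}")
```

The four distances show the behavior described in the bullets: strong repulsion when atoms overlap, the most favorable energy at r_min, weak attraction a little further out, and an energy close to zero at long range.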

Electrostatic fit: Coulomb’s Law • Atoms in molecules can be thought of as having tiny charges on them, even if the total charge on a molecule is zero. • Coulomb’s Law describes the energy of how two charges interact. • The overall electrostatic fit is calculated by adding up the energy of all pairs of atoms. • Like charges give a positive value. • Opposite charges give a negative value. • Neutral (zero charge) groups don’t matter.
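Coulomb's Law for two point charges is E = k * q1 * q2 / (dielectric * r). The sketch below assumes charges in units of the electron charge, distances in Angstroms, and a conversion constant of about 332 kcal*Angstrom/(mol*e^2), a value commonly used in molecular modeling. The dielectric parameter is an assumption added for generality and is not mentioned on the slide.

```python
COULOMB_CONSTANT = 332.06  # approx., kcal*Angstrom/(mol*e^2)

def coulomb(q1: float, q2: float, r: float, dielectric: float = 1.0) -> float:
    """Electrostatic energy (kcal/mol) of two point charges r Angstroms apart."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)

# Illustrative partial charges, all at the same separation.
print("like charges:    ", coulomb(+0.5, +0.5, 3.0))
print("opposite charges:", coulomb(+0.5, -0.5, 3.0))
print("one neutral atom:", coulomb(0.0, -0.5, 3.0))
```

The three cases match the sign rules in the bullets: positive for like charges, negative for opposite charges, and zero when either atom is neutral.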

The total energy describes the fitness of a structure. • Van der Waals + Coulomb’s Law, for every pair of atoms, and all added up. • Negative energies are favorable, positive energies unfavorable. • Nature works to MINIMIZE energy.
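As a rough illustration of "for every pair of atoms, and all added up", the sketch below sums a Lennard-Jones term and a Coulomb term over all atom pairs. The `Atom` type, the shared `EPSILON` and `R_MIN` values, and the three example atoms are hypothetical; real force fields use per-atom-type parameters and usually skip directly bonded pairs, which this sketch does not attempt.

```python
import itertools
import math
from typing import NamedTuple, Sequence

class Atom(NamedTuple):
    x: float
    y: float
    z: float
    charge: float  # partial charge in units of e

EPSILON = 0.1       # hypothetical well depth, kcal/mol
R_MIN = 3.8         # hypothetical optimal distance, Angstroms
COULOMB_K = 332.06  # approx., kcal*Angstrom/(mol*e^2)

def pair_energy(a: Atom, b: Atom) -> float:
    """Van der Waals plus Coulomb energy for one pair of atoms."""
    r = math.dist(a[:3], b[:3])
    ratio6 = (R_MIN / r) ** 6
    vdw = EPSILON * (ratio6 ** 2 - 2.0 * ratio6)
    elec = COULOMB_K * a.charge * b.charge / r
    return vdw + elec

def total_energy(atoms: Sequence[Atom]) -> float:
    """Sum the pair energy over every unordered pair of atoms."""
    return sum(pair_energy(a, b) for a, b in itertools.combinations(atoms, 2))

atoms = [
    Atom(0.0, 0.0, 0.0, +0.5),
    Atom(3.8, 0.0, 0.0, -0.5),
    Atom(0.0, 3.8, 0.0, 0.0),
]
print(f"total energy = {total_energy(atoms):.3f} kcal/mol")
```

A lower (more negative) total means a better fit, so a design procedure would compare candidate structures by this number and prefer the smallest.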

Protein Design as a Discrete Conformational Search. • [Figure: Position 1, Position 2 and Position 3, each with its own set of conformational states of the system.]
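The discrete search can be sketched for a toy system with three positions. All numbers below are hypothetical: `n_states` gives the number of candidate states at each position, `self_energy` scores each state on its own, and `pair_energy` scores each pair of states at two different positions. Exhaustive enumeration is used here only because the toy problem has 12 combinations; the counts earlier in the lecture show why it cannot scale to a real protein.

```python
import itertools

# Hypothetical toy data: candidate states (rotamers) at each of 3 positions.
n_states = [2, 3, 2]

# self_energy[i][s]: energy of state s at position i, ignoring neighbors.
self_energy = [
    [0.0, 1.0],
    [0.5, 0.0, 2.0],
    [0.0, 0.3],
]

# pair_energy[(i, j)][s][t]: interaction of state s at i with state t at j.
pair_energy = {
    (0, 1): [[-1.0, 0.0, 3.0], [0.0, -2.5, 0.0]],
    (0, 2): [[0.0, 0.5], [-0.5, 0.0]],
    (1, 2): [[0.0, -1.0], [0.0, 0.0], [-3.0, 4.0]],
}

def energy(assignment):
    """Total energy of one choice of state per position."""
    total = sum(self_energy[i][s] for i, s in enumerate(assignment))
    for (i, j), table in pair_energy.items():
        total += table[assignment[i]][assignment[j]]
    return total

def exhaustive_search():
    """Enumerate every combination of states and return the lowest-energy one."""
    all_assignments = itertools.product(*(range(n) for n in n_states))
    return min(all_assignments, key=energy)

best = exhaustive_search()
print("best assignment:", best, "energy:", energy(best))
```

In this toy data, choosing the lowest self energy at each position independently gives the assignment (0, 1, 0), which is not the overall minimum. The pair terms are what make the choices at different positions depend on one another.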
