THE BUILDING BLOCKS OF LIFE. BUILT FOR YOU

Putting Engineering back into Protein Engineering. Jun Liao, UC Santa Cruz; Manfred K. Warmuth, UC Santa Cruz; Jeremy Minshull, DNA 2.0.

Presentation Transcript


  1. Putting Engineering back into Protein Engineering
     Jun Liao, UC Santa Cruz
     Manfred K. Warmuth, UC Santa Cruz
     Jeremy Minshull, DNA 2.0

  2. Protein Engineering Current Paradigms
     • Mechanism-based
        • (Rational) detailed structural analysis
     • Empiricism-based
        • (Non-rational) library-based

  3. Mechanism-Based Protein Engineering
     Based on thermodynamic principles
     • Calculations are approximate
        • calculation cost
        • structures are really not rigid (MDS)
     • Calculations are primarily able to predict binding
        • catalysis is a special case of binding to a transition state
     • Changes in amino acids are designed based on these principles
        • very small numbers (<5) of new proteins are synthesized and tested

  4. Empiricism-Based Protein Engineering
     [Figure labels: Proteins related to wild type; Simulated cross over; New variants]
     • Uses similar principles to evolution
        • make many variants
        • screen to find those with the best properties
     • No mechanistic understanding needed
     • Produces large numbers of variants (>1,000), which are very difficult and expensive to screen for practically relevant properties

  5. The Key Challenge in Protein Engineering
     [Figure labels: High-throughput screens (surrogate assays); Reality; Molecular mechanistic models (do not model activity)]
     What we need is not what we assay for.

  6. What we want in Protein Engineering
     Wish List
     • No need to develop surrogate assay
     • Variants are tested directly under application conditions
     • Rapid process
     Requirements
     • Identification of appropriate amino acid substitutions
     • Design and synthesis of information-rich variants
     • Interpretation of quantitative functional data using machine learning techniques

  7. Protein Engineering using Machine Learning
     • Starting point: Select a protein with some correct initial properties.
     • Initial design: a) Choose substitutions. b) Design an initial variant set (<50) containing those substitutions.
     • Reality check: Synthesize and test the variant set for function(s) of interest.
     • Machine learning: Model the effect of sequence changes on function(s) of interest.
     • New design: Propose a new variant set (<50) based on the model.
     • Iterate: Return to the reality check with the new variant set.
     • End: Select the best variant(s).

  8. Engineering of Proteinase K
     • Long-term goal of engineering proteinase K to degrade polylactic acid
     • Member of the serine protease family
     • Large amounts of phylogenetic and sequence information available
     • Several different measurable activities available for optimization

  9. Protein Engineering using Machine Learning (workflow slide repeated from slide 7)

  10. Expert System for Substitution Selection
      Expert system:
      - Calculation of 9 independent scores that measure changes that have succeeded in other places in Nature
      - Weight and combine scores to pick best changes
      [Figure label: Proteins related to proteinase K]
      19 switches = search space of 2^19 ≈ 500,000
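
The search-space figure on slide 10 follows from treating each of the 19 substitutions as an independent on/off switch. The sketch below checks that arithmetic and shows one possible way nine per-substitution scores could be weighted and combined into a ranking. The substitution names, score values, weights, and function names are all made up for illustration; the slide does not say how the scores are computed or weighted.

```python
# Each of the 19 candidate substitutions is either present or absent,
# so the number of distinct variants is 2**19.
n_switches = 19
search_space = 2 ** n_switches  # 524288, i.e. roughly 500,000


def combined_score(scores, weights):
    """Weighted sum of per-substitution scores (hypothetical scheme)."""
    if len(scores) != len(weights):
        raise ValueError("need one weight per score")
    return sum(w * s for w, s in zip(weights, scores))


def rank_substitutions(score_table, weights, top_n):
    """Return the top_n substitution names, best combined score first."""
    ranked = sorted(
        score_table,
        key=lambda name: combined_score(score_table[name], weights),
        reverse=True,
    )
    return ranked[:top_n]


# Made-up example: three substitutions, each with 9 scores in [0, 1].
example_scores = {
    "sub_A": [0.9, 0.8, 0.7, 0.9, 0.6, 0.8, 0.7, 0.9, 0.8],
    "sub_B": [0.2, 0.3, 0.1, 0.4, 0.2, 0.3, 0.1, 0.2, 0.3],
    "sub_C": [0.5, 0.6, 0.5, 0.4, 0.6, 0.5, 0.5, 0.6, 0.4],
}
equal_weights = [1.0] * 9  # assumption: all nine scores count equally
best_two = rank_substitutions(example_scores, equal_weights, top_n=2)
```

A weighted sum is only the simplest combination rule; any monotone combination of the scores would fit the slide's description equally well.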

  11. Finding Optima in Complex Landscapes: Design of Experiment
      [Figure: two panels with axes Aa 1 and Aa 2 and sampled points marked x. Panel labels: Changing 1 amino acid at a time; Making multiple changes simultaneously]
      …Now try to envision doing this not with 2, but 200 amino acids / dimensions

  12. Design of Initial Proteinase K Variants

  13. Protein Engineering using Machine Learning (workflow slide repeated from slide 7, with the added label "Back to Proteinase K")

  14. First proteinase K dataset

  15. Protein Engineering using Machine Learning (workflow slide repeated from slide 7)

  16. Sequence-Activity Modeling: How Does it Work?
      1. Represent the sequence as a matrix

         Seq1  AGRWGIGAYHKLIMA
         Seq2  AGRTGVGVYHKLIMA
         Seq3  AGRWGIGVYHRLIMA
         Seq4  AGRTGVGAYHRLIMA

         becomes

               T   W   V   I   V   A   R   K
               x1  x2  x3  x4  x5  x6  x7  x8
         Seq1  0   1   0   1   0   1   0   1
         Seq2  1   0   1   0   1   0   0   1
         Seq3  0   1   0   1   1   0   1   0
         Seq4  1   0   1   0   0   1   1   0

      2. Measure the activity or activities of interest under the final application conditions
      3. y = c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_4 + … + c_i x_i
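
The encoding on slide 16 can be reproduced in a few lines. The sketch below maps the four example sequences to the 0/1 rows shown on the slide, using one column per (position, residue) pair at the four variable positions (4, 6, 8, and 11, counted from 1), in the slide's column order. The names `COLUMNS`, `encode`, and `predict` are illustrative, and the coefficients in the final line are made up; real coefficients would come from fitting measured activities.

```python
SEQS = {
    "Seq1": "AGRWGIGAYHKLIMA",
    "Seq2": "AGRTGVGVYHKLIMA",
    "Seq3": "AGRWGIGVYHRLIMA",
    "Seq4": "AGRTGVGAYHRLIMA",
}

# (1-based position, residue) for x1..x8, in the slide's column order:
# T W | V I | V A | R K
COLUMNS = [(4, "T"), (4, "W"), (6, "V"), (6, "I"),
           (8, "V"), (8, "A"), (11, "R"), (11, "K")]


def encode(seq, columns=COLUMNS):
    """Return the 0/1 indicator row for one sequence."""
    return [1 if seq[pos - 1] == residue else 0 for pos, residue in columns]


def predict(x, coefficients):
    """Linear model from the slide: y = c_1 x_1 + ... + c_i x_i."""
    return sum(c * xi for c, xi in zip(coefficients, x))


X = {name: encode(seq) for name, seq in SEQS.items()}

# Made-up coefficients, only to show how a prediction is formed.
y_seq1 = predict(X["Seq1"], [1, 2, 3, 4, 5, 6, 7, 8])
```

Because every variable position here has exactly two residues, each pair of columns is redundant (one is always the complement of the other); a regularized fit tolerates that redundancy.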

  17. Assessing the Proteinase K Sequence-Activity Relationship
      [Figure axes: Measured activity; Predicted activity; wild type (wt) marked]
      y = c_1 x_1 + c_2 x_2 + c_3 x_3 + c_4 x_4 + … + c_i x_i

  18. Learning Methods
      • Variety of regression methods
         • Ridge Regression & Lasso
         • SVM Regression & LPSVM Regression
         • Matching Loss Regression & One-norm Matching Loss Regression
         • Partial Least Squares Regression
         • LPBoost Regression
      • Use bagging to improve the prediction stability
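
As one concrete instance of the methods listed on slide 18, the sketch below fits ridge regression in pure Python and then bags it by averaging coefficients over bootstrap resamples. It is a minimal illustration, not the authors' implementation: the regularization strength `lam`, the number of resamples, the absence of an intercept term, and the demo data are all assumptions.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, lam=1.0):
    """Coefficients c minimizing ||Xc - y||^2 + lam * ||c||^2 (lam > 0)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def bagged_ridge(X, y, lam=1.0, n_models=25, seed=0):
    """Average ridge coefficients over bootstrap resamples of the data."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    total = [0.0] * p
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        c = ridge_fit([X[i] for i in idx], [y[i] for i in idx], lam)
        total = [t + ci for t, ci in zip(total, c)]
    return [t / n_models for t in total]


# Made-up demo data: all 8 on/off combinations of 3 substitutions, with
# activity generated from known coefficients (2.0, -1.0, 0.5).
X_demo = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y_demo = [2.0 * a - 1.0 * b + 0.5 * c for a, b, c in X_demo]
c_ridge = ridge_fit(X_demo, y_demo, lam=1e-9)
c_bagged = bagged_ridge(X_demo, y_demo, lam=1.0, n_models=25, seed=0)
```

The spread of the per-resample coefficients (not computed here) gives a rough measure of how stable each weight estimate is.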

  19. Variants Design I
      • Main issue: Exploitation vs. Exploration
      • Optimum design (Exploitation)
         • Take the combination of substitutions predicted to have maximal activity
         • Also consider
            • Substitution frequency in the dataset
            • Variation of weight estimation
         • Used in 2nd & 3rd iterations
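
For a linear model over on/off substitutions, the combination with maximal predicted activity is simply every substitution with a positive weight. The sketch below adds the two extra considerations from slide 19 as simple filters. The specific rule (keep a substitution only if its mean weight minus `k` standard deviations is positive and it appears in at least `min_count` tested variants) is an assumption made for illustration; the slide names the considerations but not how they are applied.

```python
def optimum_design(mean_w, std_w, counts, k=1.0, min_count=3):
    """Indices of substitutions whose weight is reliably positive.

    mean_w: estimated weight per substitution (e.g. averaged over models)
    std_w:  spread of that estimate (e.g. across bagged models)
    counts: how many tested variants contained each substitution
    k, min_count: hypothetical thresholds, not taken from the slides
    """
    chosen = []
    for i, (m, s, n) in enumerate(zip(mean_w, std_w, counts)):
        if n >= min_count and m - k * s > 0:
            chosen.append(i)
    return chosen


# Made-up numbers for four substitutions.
mean_w = [1.5, 0.4, -0.8, 2.0]
std_w = [0.3, 0.6, 0.2, 0.5]
counts = [10, 8, 12, 2]

cautious = optimum_design(mean_w, std_w, counts)               # both filters on
greedy = optimum_design(mean_w, std_w, counts, k=0.0, min_count=0)  # pure exploitation
```

With the filters switched off (`k=0.0`, `min_count=0`) the function reduces to pure exploitation: every substitution with a positive mean weight is included.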

  20. Variants Design II
      • Diversity design (Exploration)
         • Calculate the combination of substitutions predicted to have maximal activity that is also
            • No more than 5 changes from a sequence that has already been tested
            • No closer than 3 changes from a sequence that has already been tested or selected for synthesis
         • Used in 2nd iteration
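
The distance rules on slide 20 can be read as constraints on Hamming distance between on/off substitution vectors. The sketch below picks variants greedily in order of predicted activity, keeping a candidate only if its nearest tested sequence is between 3 and 5 changes away and it is at least 3 changes from every variant already selected. The greedy strategy, the brute-force enumeration, and the demo weights are assumptions; the slides do not describe the search procedure.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def diversity_design(weights, tested, n_select, max_far=5, min_apart=3):
    """Greedy pick of high-scoring variants under the distance rules.

    Enumerates all 2**len(weights) on/off combinations, so it is only
    practical for a modest number of substitutions.
    """
    def score(v):
        return sum(w * x for w, x in zip(weights, v))

    candidates = sorted(product((0, 1), repeat=len(weights)),
                        key=score, reverse=True)
    selected = []
    for v in candidates:
        if len(selected) == n_select:
            break
        d_tested = min(hamming(v, t) for t in tested)
        # Must be within max_far of some tested sequence, but not
        # closer than min_apart to any tested sequence.
        if d_tested > max_far or d_tested < min_apart:
            continue
        # Must also be at least min_apart from every earlier pick.
        if any(hamming(v, s) < min_apart for s in selected):
            continue
        selected.append(v)
    return selected


# Made-up demo: 6 substitutions, only the unmodified parent tested so far.
weights_demo = [1.0, 0.8, 0.6, -0.5, 0.4, -0.2]
tested_demo = [(0, 0, 0, 0, 0, 0)]
picks = diversity_design(weights_demo, tested_demo, n_select=2)
```

The lower bound keeps new variants from duplicating information already in the dataset, while the upper bound keeps them close enough to tested sequences for the model's predictions to be trusted.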

  21. Three Iterations of Activity Engineering
      • ONLY 58 variants were tested to allow design of the fourth set, which contained
         • 3 variants 20-30x improved over wild type
         • 50% of variants more active than the best of previous sets
         • 70% of variants more active than wild type
      • 3-11 changes found in variants better than WT
      [Figure: activity relative to wild type (y-axis, 0 to 50) for variants in order synthesized (x-axis). Legend: 1st set: 34 variants; 2nd set: 24 variants; 3rd set: 38 variants; wild-type]

  22. Activity Improvement
      [Figure label: Improving Activity]

  23. Variants are Improved in Multiple Properties
      [Figure axes: Activity (pmol/s/ml); Half life at 68°C (s)]

  24. Conclusions
      • Machine learning
         • Making a very small number of variants (58) allows a productive search of a total space with roughly 500,000 possible combinations
      • Synthetic Biology
         • Recent advances in gene synthesis methods were essential for this type of exploration

  25. The Future
      • Proteins are the building blocks of life with a wide array of applications (therapeutics, diagnostics, industrial catalysts)
      • Finding a reliable mechanism for optimizing proteins for human applications would be an amazing feat
      • We steal ideas about how proteins evolve from nature, but optimize proteins outside their in vivo constraints (the proteins don't have to be compatible with life)
