
Protein Folding Prediction

Lauren M. Yarholar, Rufei Lu, Warren Yates, Armando Diaz, Miguel J. Bagajewicz, Ph.D. School of Chemical, Biological, and Materials Engineering, College of Engineering, University of Oklahoma.



Presentation Transcript


  1. Protein Folding Prediction • Lauren M. Yarholar, Rufei Lu, Warren Yates, Armando Diaz, Miguel J. Bagajewicz, Ph.D. • School of Chemical, Biological, and Materials Engineering, College of Engineering, University of Oklahoma

  2. Overview • Introduction • Background • Problem • Energy Forms • Methods • Genetic Algorithm • Results and Discussion • Conclusion • VBA (Visual Basic for Applications) Program Demonstration

  3. Background • A protein is a chain of amino acids connected by peptide bonds, running from the N-terminus to the C-terminus. • Amino acid classes • Acidic • Basic • Aliphatic • Polar uncharged • Aromatic

  4. Background - Amino Acid

  5. Background - Amino Acid

  6. Protein Folding – Primary

  7. Protein Folding – Secondary

  8. Protein Folding – Tertiary

  9. Protein Folding – Quaternary

  10. Appropriate protein folding is critical for function and health • Proteins catalyze over 1,000 biochemical reactions in the human body.

  11. Appropriate protein folding is critical for function and health • Protein misfolding is responsible for over 20 diseases. • Mad cow disease is caused by an “evil” (misfolded) protein - the “evil” protein and the normal protein have identical primary structures, but their tertiary structures differ. [Figure: normal PrP vs. diseased PrP]

  12. Difficulties of Predicting Protein Structures • Some proteins fold in as little as a millionth of a second • Theoretically, a protein of only 100 amino acids folding by trial and error would take 100 billion years to try out all possible conformations! • Protein structures are highly dependent upon environmental parameters such as temperature, pH, and solvent.

  13. Protein Folding Prediction Methods • Comparative - Uses evolutionarily related proteins • Advantages: fast and simple • Disadvantages: conformation depends upon environmental parameters • Fold Recognition - Uses a database of known 3-D protein structures • Advantages: more accurate than comparative • Disadvantages: not enough NMR-confirmed protein structures • Ab Initio - Uses both scientific and engineering approaches • Advantages: has the potential to predict the exact shape and intermediate structures • Disadvantages: computing limitations, difficulty in selecting the correct potential energy function

  14. Problems with the Current Prediction Methods • Not enough NMR-confirmed protein structures in the Protein Data Bank (PDB) • Evolutionary relatedness does not necessarily translate to similar structure • Ab initio difficulties • Hydrophilic and hydrophobic modeling gives only the general arrangement of the protein • 2-D modeling does not predict the 3-D shape of the protein • The Monte Carlo computing method is time-consuming and does not necessarily reach the global minimum

  15. Objectives • Develop a genetic-algorithm-based program to predict protein conformation • Reduce the number of generations needed for a prediction, thereby improving the efficiency of the search • Explore additional operators to modify the genetic algorithm • Predict the conformation of a short 5-AA peptide, enkephalin

  16. Energy Calculation Potential Energy Model Energy Calculation Energy VBA program

  17. Potential Energy Models • Electrostatic Energy • Nonbonding Energy • Hydrogen Bonding Energy • Cysteine-Cysteine Loop Energy

  18. Electrostatic Energy • Energy term calculated over atom pairs • Modeled after Coulomb's law • Interaction between two charges separated by a distance (rij)
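
The pairwise Coulomb term described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the GAPSS implementation: the function names, the use of SI units (chosen to match the "E, Joule" axis on the next slide), and the `dielectric` parameter are all assumptions made for the example.

```python
import math

COULOMB_CONSTANT = 8.9875517923e9    # N*m^2/C^2
ELEMENTARY_CHARGE = 1.602176634e-19  # C
ANGSTROM = 1.0e-10                   # m

def electrostatic_energy(q_i, q_j, r_ij_angstrom, dielectric=1.0):
    """Coulomb energy in joules of two point charges.

    q_i, q_j are partial charges in units of the elementary charge and
    r_ij_angstrom is their separation in Angstrom. The dielectric
    constant is a hypothetical knob; the slides do not specify one.
    """
    if r_ij_angstrom <= 0:
        raise ValueError("distance must be positive")
    r = r_ij_angstrom * ANGSTROM
    charge_product = (q_i * ELEMENTARY_CHARGE) * (q_j * ELEMENTARY_CHARGE)
    return COULOMB_CONSTANT * charge_product / (dielectric * r)

def total_electrostatic_energy(charges, coords, dielectric=1.0):
    """Sum the pair term over every unique atom pair i < j."""
    total = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r_ij = math.dist(coords[i], coords[j])
            total += electrostatic_energy(charges[i], charges[j], r_ij, dielectric)
    return total
```

Like charges give a positive (repulsive) energy that falls off as 1/r, while opposite charges give a negative (attractive) one.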

  19. Electrostatic Energy [Figure: electrostatic term for two like charges (+ +) separated by a distance r; axes: E (Joule) vs. r (Angstrom)]

  20. Nonbonding Energy • Two types of Lennard-Jones potential • 1-4 interactions - atoms connected by three bonds • 1-5 and higher interactions - atoms connected by more than three bonds

  21. Nonbonded Energy • Modeled after the Lennard-Jones potential (repulsive and attractive forces) [Figure: forces F and -F between atoms 1 and 2; 1-4 interactions and 1-5 interactions]
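
As a rough sketch of the nonbonded term, the snippet below evaluates a 6-12 Lennard-Jones potential and classifies an atom pair as a 1-4 or 1-5 interaction by the number of bonds separating the atoms. The parameter names `epsilon` (well depth) and `r_min` (distance at the minimum) are assumptions for illustration; the slides do not list the parameter sets actually used for each interaction type.

```python
def lennard_jones(r, epsilon, r_min):
    """6-12 Lennard-Jones energy with its minimum of -epsilon at r = r_min."""
    if r <= 0:
        raise ValueError("distance must be positive")
    ratio6 = (r_min / r) ** 6
    # ratio6**2 is the repulsive r^-12 term, -2*ratio6 the attractive r^-6 term.
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)

def interaction_type(bonds_apart):
    """Label a pair by how many bonds separate the two atoms.

    Pairs closer than three bonds are skipped here on the assumption
    that fixed bond lengths and bond angles already account for them.
    """
    if bonds_apart < 3:
        return None
    return "1-4" if bonds_apart == 3 else "1-5"
```

At short range the r^-12 term dominates and the energy is strongly positive; beyond `r_min` the energy rises back toward zero from below.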

  22. Hydrogen Bonding • Energy associated with the hydrogen bonding in the protein.

  23. Cysteine-Cysteine Loop Closing • Included if there are one or more intramolecular disulfide bonds

  24. Atom Position Calculation Backbone Calculation Side-Chain Calculation Branch of Side-Chain Calculation

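  25. Introduction • A dihedral angle is the rotation angle about the bond between two adjacent atoms, measured between the bond leading into that pair and the bond leading out of it (four consecutive atoms in total) • Phi (φ) is the rotation about the N-Cα bond, psi (ψ) about the Cα-C’ bond, and omega (ω) about the C’-N peptide bond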
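
To make the definition concrete, here is a small sketch that computes the dihedral angle defined by four consecutive atom positions, using the standard atan2 formulation. The function name and the use of plain tuples for coordinates are choices made for this example only.

```python
import math

def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def dihedral_angle(p1, p2, p3, p4):
    """Dihedral angle in degrees about the p2-p3 bond, in (-180, 180]."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

With this sign convention a cis arrangement gives 0 degrees and a trans arrangement gives 180 degrees.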

  26. Backbone Calculation • The first 3 atoms on the peptide chain are fixed • The coordinate system is chosen arbitrarily, with its origin at the first H atom of the N-terminus • Assumptions: • Minimal bond length stretch • Bond angles stay constant • The torsion angle (dihedral angle) applies starting with the 4th atom • Fixed coordinates: H (0, 0, 0), N (-1.04, 0, 0), Cα (-1.52, 1.37, 0) [Figure: X, Y, Z axes with angles θ and ω]
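
Under the assumptions on this slide (fixed bond lengths and bond angles, three fixed starting atoms), each further backbone atom can be placed from the previous three atoms plus one torsion angle. The sketch below shows one common way to do this; it is not taken from the presentation, and the bond length and bond angle in the usage lines are illustrative values only.

```python
import math

def _sub(u, v):
    return [u[k] - v[k] for k in range(3)]

def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def _unit(u):
    length = math.sqrt(sum(c * c for c in u))
    return [c / length for c in u]

def place_atom(a, b, c, bond_length, bond_angle_deg, torsion_deg):
    """Return the position of atom d given the three preceding atoms a, b, c.

    bond_length is |c-d|, bond_angle_deg is the b-c-d angle, and
    torsion_deg is the a-b-c-d dihedral angle.
    """
    theta = math.radians(bond_angle_deg)
    phi = math.radians(torsion_deg)
    bc = _unit(_sub(c, b))                 # direction of the b-c bond
    n = _unit(_cross(_sub(b, a), bc))      # normal of the a-b-c plane
    m = _cross(n, bc)                      # completes the local frame
    local = (-bond_length * math.cos(theta),
             bond_length * math.sin(theta) * math.cos(phi),
             bond_length * math.sin(theta) * math.sin(phi))
    return [c[k] + local[0] * bc[k] + local[1] * m[k] + local[2] * n[k]
            for k in range(3)]

# Usage with the three fixed atoms from the slide; 1.53 Angstrom and
# 110 degrees are assumed values, not taken from the presentation.
h, n_atom, c_alpha = (0.0, 0.0, 0.0), (-1.04, 0.0, 0.0), (-1.52, 1.37, 0.0)
c_prime = place_atom(h, n_atom, c_alpha, 1.53, 110.0, 180.0)
```

Repeating the call along the chain, each time with the last three placed atoms and the next torsion angle, builds the whole backbone from a list of torsion angles.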

  27. Backbone Calculation The first 3 Bn parameters are fixed by the previous assumptions; B1, B2, and B3 correspond to H-, -N-, and Cα

  28. Side and Branch Group Calculation • Fischer projections are used to determine the dihedral angles of side-group atoms • Assumptions: • Tetrahedral structure: 120° apart, ω2 = 120° + ω1 • Bent structure: 180° apart, ω2 = 180° + ω1 • ω1 = dihedral angle
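
The offset rule on this slide amounts to adding a fixed angle to the first dihedral and wrapping the result back into the usual range. A minimal sketch, assuming angles are stored in degrees in the range [-180, 180); the function and dictionary names are hypothetical.

```python
# Offsets in degrees, as stated on the slide.
BRANCH_OFFSETS = {"tetrahedral": 120.0, "bent": 180.0}

def wrap_angle(angle_deg):
    """Wrap any angle into the range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0

def branch_dihedral(w1_deg, geometry):
    """Dihedral w2 of a branch atom, derived from the first dihedral w1."""
    return wrap_angle(w1_deg + BRANCH_OFFSETS[geometry])
```

For example, a first dihedral of -60 degrees in a tetrahedral group puts the branch atom at 60 degrees.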

  29. Genetic Algorithm • GA Search and Optimization • Fitness Function • Genetic Operators • α-helix and β-sheet implementation • Binary GA

  30. Genetic Algorithm • Search and optimization method that mimics natural selection • Terms to define • Chromosome – a set of torsion angles • Gene – an individual torsion angle • Generation – a single pass through the GA search loop • Loops through the reproduction, mutation, and adaptation processes to obtain the best-fit model

  31. Genetic Algorithm • Uses a computer simulation to perform an intelligent search/optimization for the native protein conformation, taken to be the conformation with the lowest energy [Figure label: native conformation]

  32. Genetic Algorithm based Protein Structure Search (GAPSS) • GAPSS was developed in a Visual Basic add-in environment • Modified genetic operators • Fitness-function-based selection • Multiple entries crossover • Non-uniform mutation • Adaptation • Advantages • Faster convergence • User-friendly

  33. Fitness Function • Three primary energy terms: Electrostatic, Nonbonded (6-12), and Hydrogen Bonded • Exclude Torsion Energy • Not a real interaction energy • Introduce a penalty for positive torsion • Cysteine loop-closing term is included only when more than one cysteine is present in the protein

  34. Genetic Operator - Selection • Selection Operator • Ranked Selection – the higher the rank, the higher the probability of being chosen • Fitness Selection – the better the fitness, the higher the probability of being chosen • Benefits of Selection • Aids the elitism search [Figure labels: higher rank or better fitness; lower rank or worse fitness]
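
The two selection schemes can be sketched as weighted random draws. This is an illustration under stated assumptions, not the GAPSS code: lower energy is treated as better fitness, the rank weights are linear (N, N-1, ..., 1), and the fitness weights are each chromosome's energy gap to the worst member. All function names are hypothetical.

```python
import random

def ranked_selection(population, energies, rng=random):
    """Pick one chromosome; the lowest-energy one has the largest weight."""
    order = sorted(range(len(population)), key=lambda i: energies[i])
    weights = [len(order) - rank for rank in range(len(order))]  # N, N-1, ..., 1
    chosen = rng.choices(order, weights=weights, k=1)[0]
    return population[chosen]

def fitness_selection(population, energies, rng=random):
    """Roulette-wheel pick weighted by how far each energy lies below the worst."""
    worst = max(energies)
    # The small constant keeps the total weight positive when all energies tie.
    weights = [worst - energy + 1e-9 for energy in energies]
    return rng.choices(population, weights=weights, k=1)[0]
```

Ranked selection depends only on the ordering of the energies, so it keeps steady selection pressure even when the energy values are very close or very far apart; fitness selection reacts to the actual energy gaps.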

  35. Genetic Operator - Mutation • Mutation Operator • Uniform Mutation – randomly replaces a gene with a value from -180 to 180 • Non-uniform Mutation – adds or subtracts a random value between 0 and 180 • Effects of Mutation • Introduces variance into the search • Aids the search for the global minimum by pushing the gradient search out of local minima
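
Both mutation operators are simple to express on a chromosome stored as a list of torsion angles in degrees. In this sketch the per-gene mutation probability `rate` and the wrapping of results back into [-180, 180) are assumptions; the slide only states the value ranges.

```python
import random

def wrap_angle(angle_deg):
    """Wrap any angle into the range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0

def uniform_mutation(chromosome, rate, rng=random):
    """Replace each gene, with probability `rate`, by a fresh random angle."""
    return [rng.uniform(-180.0, 180.0) if rng.random() < rate else gene
            for gene in chromosome]

def non_uniform_mutation(chromosome, rate, rng=random):
    """Shift each gene, with probability `rate`, by a random step of 0 to 180."""
    mutated = []
    for gene in chromosome:
        if rng.random() < rate:
            step = rng.uniform(0.0, 180.0)
            gene = wrap_angle(gene + rng.choice((-1.0, 1.0)) * step)
        mutated.append(gene)
    return mutated
```

Uniform mutation discards the old value entirely, whereas the non-uniform variant perturbs it, so the new angle stays correlated with the old one.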

  36. Genetic Operator - Crossover • Crossover Operator • Random 2-point Crossover – randomly exchanges 2 angles at a time between parents • Multiple Entries Crossover – multiple random exchanges • Benefits of Crossover • Aids the search for elites • Optimizes the search by keeping the optimal folding segments
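
A possible reading of the two crossover operators is sketched below. The first function is the textbook two-point crossover (swap the segment between two random cut points), which may differ in detail from the operator used in GAPSS; the second swaps genes at several randomly chosen positions. Function names and the `n_exchanges` parameter are hypothetical.

```python
import random

def two_point_crossover(parent_a, parent_b, rng=random):
    """Swap the segment between two random cut points of two parent lists."""
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b

def multiple_entries_crossover(parent_a, parent_b, n_exchanges, rng=random):
    """Swap the genes at n_exchanges distinct random positions."""
    child_a, child_b = list(parent_a), list(parent_b)
    for pos in rng.sample(range(len(parent_a)), n_exchanges):
        child_a[pos], child_b[pos] = child_b[pos], child_a[pos]
    return child_a, child_b
```

Neither operator creates new angle values; each child gene always comes from one of the two parents at the same position, which is how well-folded segments can survive into the next generation.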

  37. Genetic Operator - Adaptation • Adaptation Operator • Gradient search applied to each chromosome • Predicts the energy profile • Benefits of Adaptation • Provides a local-minimum search • Determines the energy profile of the native folding process
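
One way to picture the adaptation step is a plain gradient descent on the torsion angles, recording the energy after every accepted step as the energy profile. The sketch below uses a finite-difference gradient and a step-halving rule; both are assumptions, as are the names `adapt`, `step`, and `tol`. The `energy` argument stands for any function that maps a chromosome to a number.

```python
def numerical_gradient(energy, chromosome, h=1e-3):
    """Central-difference estimate of dE/d(angle) for every gene."""
    gradient = []
    for k in range(len(chromosome)):
        up = list(chromosome)
        down = list(chromosome)
        up[k] += h
        down[k] -= h
        gradient.append((energy(up) - energy(down)) / (2.0 * h))
    return gradient

def adapt(energy, chromosome, step=0.25, max_iters=200, tol=1e-6):
    """Descend to a nearby local minimum; return (chromosome, energy profile)."""
    current = list(chromosome)
    profile = [energy(current)]
    for _ in range(max_iters):
        gradient = numerical_gradient(energy, current)
        if max(abs(g) for g in gradient) < tol:
            break  # gradient is effectively zero: a local minimum
        trial = [x - step * g for x, g in zip(current, gradient)]
        trial_energy = energy(trial)
        if trial_energy < profile[-1]:
            current = trial
            profile.append(trial_energy)
        else:
            step *= 0.5  # overshoot: retry with a smaller step
    return current, profile
```

Because only energy-lowering steps are accepted, the recorded profile decreases monotonically, and the search stops at whichever local minimum lies downhill from the starting chromosome.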

  38. Three GA Approaches • Free GA search – no restriction on dihedral angles, with the exception of omega and ring structures • Advantages: usable for any protein search, an empirical way of obtaining protein conformations, and useful for energy profile searches • α-helix and β-sheet specific GA search – randomly selects segments of the protein as α-helices and β-sheets • Advantages: faster than the free GA and more accurate for α-helices and β-sheets • Binary GA search – uses binary instead of decimal to represent dihedral angles • Advantages: no barrier when doing crossover

  39. α-helix and β-sheet Implementation • Creates α-helices and β-sheets of random lengths at random start positions • Each α-helix or β-sheet created in this way is described by two parameters • Crossover involves trading the two parameters between two individuals

  40. α-helix and β-sheet Implementation • When α-helices are crossed over, each individual’s new energy is compared to its old energy. If there is a net improvement, the crossover is kept. • The “former helix” regions are then filled with random torsion angles as usual [Figure labels: green region, blue region]

  41. Binary Code Implementation • Convert torsion angles to binary code • Integer and decimal parts are coded separately to shorten the total number of digits - 17 digits altogether • The idea is to represent the torsion angles on a single chromosome as one long continuous chain • Crossover and mutation operators are similar to those of the GA above • Example chromosome: 101001010100100001010011101011000010101101010010000101001010100100001010010101001010010100101010011100
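
The slide gives the total of 17 digits per angle but not how they are split. The sketch below shows one hypothetical layout that adds up to 17: 1 sign bit, 8 bits for the integer part, and 8 bits for the decimal part stored as hundredths of a degree. The layout and all names are assumptions for illustration, not the encoding used in GAPSS.

```python
SIGN_BITS, INT_BITS, FRAC_BITS = 1, 8, 8          # assumed split
BITS_PER_ANGLE = SIGN_BITS + INT_BITS + FRAC_BITS  # 17 digits per angle

def encode_angle(angle_deg):
    """Encode an angle in [-180, 180] as a 17-character bit string."""
    if not -180.0 <= angle_deg <= 180.0:
        raise ValueError("angle must lie in [-180, 180]")
    sign = "1" if angle_deg < 0 else "0"
    hundredths = round(abs(angle_deg) * 100)
    integer_part, decimal_part = divmod(hundredths, 100)
    return sign + format(integer_part, "08b") + format(decimal_part, "08b")

def decode_angle(bits):
    """Inverse of encode_angle for a single 17-character bit string."""
    sign = -1.0 if bits[0] == "1" else 1.0
    integer_part = int(bits[1:1 + INT_BITS], 2)
    decimal_part = int(bits[1 + INT_BITS:BITS_PER_ANGLE], 2)
    return sign * (integer_part + decimal_part / 100.0)

def encode_chromosome(angles):
    """Concatenate all angles into one continuous bit chain."""
    return "".join(encode_angle(a) for a in angles)

def decode_chromosome(bits):
    return [decode_angle(bits[k:k + BITS_PER_ANGLE])
            for k in range(0, len(bits), BITS_PER_ANGLE)]
```

With a layout like this, crossover can cut the chain at any bit position rather than only between angles. A bit flip or cut can also produce values outside the valid range, so a repair or wrapping step would be needed after the operators; that step is not shown here.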

  42. Results and Discussion Individual AA Prediction Enkephalin Prediction Performance Analysis Discussion

  43. Single AA Prediction • All single amino acids (AA) were predicted with GAPSS • GA parameters • Initial population: 20 • Generation limit: 15 • Percentage of mutations: 90% • Compared to native single AA folding
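
To show how the parameters listed above fit together, here is a minimal GA loop with the same defaults (population 20, 15 generations, 90% mutation). It is a toy sketch, not GAPSS: the single-cut crossover, the reading of "percentage of mutations" as the probability that a child receives one mutated gene, the rank-weighted parent choice, and the elitism step are all assumptions, and `energy` is any user-supplied function of a list of torsion angles.

```python
import random

def wrap_angle(angle_deg):
    """Wrap any angle into the range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0

def run_ga(energy, n_genes, population_size=20, generations=15,
           mutation_rate=0.9, seed=0):
    """Return the best chromosome found and the best energy per generation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)]
                  for _ in range(population_size)]
    history = [min(energy(ch) for ch in population)]
    for _ in range(generations):
        ranked = sorted(population, key=energy)
        weights = [population_size - rank for rank in range(population_size)]
        next_population = [list(ranked[0])]  # elitism: keep the current best
        while len(next_population) < population_size:
            mother, father = rng.choices(ranked, weights=weights, k=2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = mother[:cut] + father[cut:]
            if rng.random() < mutation_rate:
                k = rng.randrange(n_genes)
                child[k] = wrap_angle(child[k] + rng.uniform(-180.0, 180.0))
            next_population.append(child)
        population = next_population
        history.append(min(energy(ch) for ch in population))
    return min(population, key=energy), history
```

Because the best chromosome is always carried over unchanged, the best energy recorded in `history` can never get worse from one generation to the next.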

  44. Single AA Prediction [Figure panels: Asparagine (N, Asn), Alanine (A, Ala), Aspartic Acid (D, Asp), Cysteine (C, Cys), Glutamine (Q, Gln), Glutamic Acid (E, Glu), Glycine (G, Gly), Isoleucine (I, Ile)]

  45. Single AA Prediction [Figure panels: Leucine (L, Leu), Serine (S, Ser), Methionine (M, Met), Valine (V, Val), Threonine (T, Thr)]

  46. Enkephalin Prediction • Enkephalin is a pentapeptide that is involved in regulating pain • Two forms of enkephalin • Methionine-enkephalin – Tyr-Gly-Gly-Phe-Met • Leucine-enkephalin – Tyr-Gly-Gly-Phe-Leu • Short enough to confirm the accuracy of GAPSS, yet still contains complex ring side groups

  47. Enkephalin Prediction • Zero-gradient conformations suggest that GAPSS is capable of locating local minima • Backbone conformations showed strong similarity • Side group conformations still show discrepancies between the predicted and theoretical structures

  48. Local Minimum Conformations • GAPSS was able to locate a few local minimum protein conformations

  49. Enkephalin Prediction - Backbone • Backbone structure was predicted by GAPSS [Figure: GA-predicted backbone structure vs. NMR-confirmed backbone structure]

  50. Enkephalin Prediction • Discrepancies between side groups are attributed to the omission of entropy and solvation energy and to the centered partial charge assumption [Figure: GA-predicted backbone structure vs. NMR-confirmed backbone structure]
