Protein Folding Prediction - PowerPoint PPT Presentation

Presentation Transcript

  1. Protein Folding Prediction Lauren M. Yarholar Rufei Lu Warren Yates Armando Diaz Miguel J Bagajewicz, Ph.D. School of Chemical, Biological, and Materials Engineering, College of Engineering, University of Oklahoma

  2. Overview • Introduction • Background • Problem • Energy Forms • Methods • Genetic Algorithm • Results and Discussion • Conclusion • VBA (Visual Basic for Applications) Program Demonstration

  3. Background • A protein is a string of amino acids connected by peptide bonds. • Amino acid • Acidic • Basic • Aliphatic • Polar uncharged • Aromatic N-Terminus C-Terminus

  4. Background - Amino Acid

  5. Background - Amino Acid

  6. Protein Folding – Primary

  7. Protein Folding – Secondary

  8. Protein Folding – Tertiary

  9. Protein Folding – Quaternary

  10. Appropriate protein folding is critical for function and health • Proteins catalyze over 1,000 biochemical reactions in the human body.

  11. Appropriate protein folding is critical for function and health • Protein misfolding is responsible for over 20 diseases. • Mad cow disease is caused by an “evil” protein - The “evil” protein and the normal protein have identical primary structures, but their tertiary structures differ. [Figure: normal PrP versus diseased PrP]

  12. Difficulties of Predicting Protein Structures • Some proteins fold as fast as a millionth of a second • Theoretically, a protein of only 100 amino acids following the trial and error method would take 100 billion years to try out all possible conformations! • Protein structures are highly dependent upon various environmental parameters. • Such as temperature, pH, solvent, etc.

  13. Protein Folding Prediction Methods • Comparative - Uses evolutionarily related proteins • Advantages: fast and simple • Disadvantages: conformation depends upon environmental parameters • Fold Recognition - Utilizes a database of known 3-D protein structures • Advantages: more accurate than comparative • Disadvantages: not enough NMR-confirmed protein structures • Ab Initio - Uses both scientific and engineering approaches • Advantages: has the potential to predict the exact shape and intermediate structures • Disadvantages: computing limitations, difficulty in selecting the correct potential energy function

  14. Problems with the Current Prediction Methods • Not enough NMR-confirmed protein structures in the Protein Data Bank (PDB) • Evolutionary relatedness does not necessarily translate to similar structure • Ab initio difficulties • Hydrophilic and hydrophobic modeling gives only the general arrangement of the protein • 2-D modeling does not predict the 3-D shape of the protein • The Monte Carlo computing method is time-consuming and does not necessarily reach the global minimum

  15. Objectives • Develop a genetic-algorithm-based program to predict protein conformation • Reduce the number of generations needed for prediction, thus enhancing the efficiency of the search • Explore additional operators to modify the genetic algorithm • Predict the conformation of a short 5-AA peptide, enkephalin

  16. Energy Calculation Potential Energy Model Energy Calculation Energy VBA program

  17. Potential Energy Models • Electrostatic Energy • Nonbonding Energy • Hydrogen Bonding Energy • Cysteine-Cysteine Loop Energy

  18. Electrostatic Energy • Energy term calculated over atom pairs • Modeled after the Coulomb force • Force between two charges separated by a distance (rij)
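The pairwise Coulomb term described on this slide can be sketched as follows. This is a minimal illustration, not the presenters' VBA code: it assumes point charges given in units of the elementary charge, distances in angstroms, and energy in joules. The `dielectric` parameter and the function names are assumptions introduced here.

```python
import math

COULOMB_K = 8.9875517923e9            # Coulomb constant, N m^2 / C^2
ELEMENTARY_CHARGE = 1.602176634e-19   # C
ANGSTROM = 1e-10                      # m

def coulomb_energy(q_i, q_j, r_ij, dielectric=1.0):
    """Energy in joules of two point charges (units of e) r_ij angstroms apart.

    The dielectric constant is an assumed parameter; the slides do not give one.
    """
    if r_ij <= 0:
        raise ValueError("distance must be positive")
    charge_product = (q_i * ELEMENTARY_CHARGE) * (q_j * ELEMENTARY_CHARGE)
    return COULOMB_K * charge_product / (dielectric * r_ij * ANGSTROM)

def electrostatic_energy(charges, coords, dielectric=1.0):
    """Sum the Coulomb term over every unique atom pair (i < j)."""
    total = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r_ij = math.dist(coords[i], coords[j])
            total += coulomb_energy(charges[i], charges[j], r_ij, dielectric)
    return total
```

Like charges give a positive (repulsive) energy and opposite charges a negative one; which atom pairs a real force field excludes (for example directly bonded atoms) is not specified in the slides and is left out of this sketch.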

  19. Electrostatic Energy [Figure: two positive charges separated by a distance r; plot of the electrostatic term, E (joules) versus r (angstroms)]

  20. Nonbonding Energy • Two types of Lennard-Jones potential • 1-4 interactions - atoms connected by three bonds • 1-5 and higher interactions - atoms connected by more than three bonds

  21. Nonbonded Energy • Modeled after the Lennard-Jones potential • Repulsive/attractive forces [Figure: equal and opposite forces F and -F between atoms 1 and 2; energy curves for 1-4 interactions and 1-5 interactions]
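A short sketch of a 6-12 Lennard-Jones term and of the 1-4 versus 1-5 classification may help here. The well depth `epsilon`, the zero-crossing distance `sigma`, and both function names are illustrative assumptions; the slides do not give parameter values or say how the two interaction types are parameterized differently.

```python
def lennard_jones(r, epsilon, sigma):
    """6-12 Lennard-Jones energy: 4*epsilon*((sigma/r)**12 - (sigma/r)**6).

    The r**-12 term models short-range repulsion, the r**-6 term attraction.
    """
    if r <= 0:
        raise ValueError("distance must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def interaction_type(bonds_between):
    """Classify an atom pair by the number of bonds separating the two atoms.

    Pairs fewer than three bonds apart are treated as bonded geometry here and
    get no nonbonded term (an assumption of this sketch).
    """
    if bonds_between < 3:
        return None
    return "1-4" if bonds_between == 3 else "1-5"
```

The energy is zero at r = sigma and reaches its minimum of -epsilon at r = 2^(1/6) * sigma.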

  22. Hydrogen Bonding • Energy associated with the hydrogen bonding in the protein.

  23. Cysteine-Cysteine Loop Closing • Included if there are one or more intramolecular disulfide bonds

  24. Atom Position Calculation Backbone Calculation Side-Chain Calculation Branch of Side-Chain Calculation

  25. Introduction • A dihedral angle is the angle of rotation between the bond joining one pair of adjacent atoms and the bond joining the next pair • Phi (φ) is the rotation about the N–Cα bond, psi (ψ) about the Cα–C’ bond, and omega (ω) about the C’–N peptide bond
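To make the definition concrete, the sketch below computes the dihedral angle defined by four consecutive atom positions using the standard atan2 formulation. The function and helper names are illustrative and do not come from the presentation.

```python
import math

def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For a backbone, phi would be evaluated on the atoms C’(i-1), N, Cα, C’ and psi on N, Cα, C’, N(i+1).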

  26. Backbone Calculation • The first 3 atoms on the peptide chain are fixed • The coordinate system is placed arbitrarily, with its origin at the first H atom of the N-terminus • Assumptions: • Minimal bond length stretch • Bond angle stays constant • Torsion angle (dihedral angle) applies to the 4th atom [Figure: x, y, z axes with bond angle θ and torsion angle ω; fixed atoms H (0, 0, 0), N (-1.04, 0, 0), Cα (-1.52, 1.37, 0)]

  27. Backbone Calculation • The first 3 Bn parameters are fixed by the previous assumption; B1, B2, and B3 correspond to H, N, and Cα
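One common way to place a fourth atom from three fixed ones, given a bond length, a bond angle, and a torsion angle, is sketched below. It is offered as an illustration of the idea under the stated assumptions (fixed bond lengths and angles), not as the Bn formulation used in the presentation. The function name and the numeric values in the example are assumptions; only the three starting coordinates come from the slides.

```python
import math

def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _unit(u):
    norm = math.sqrt(sum(a * a for a in u))
    return tuple(a / norm for a in u)

def place_atom(a, b, c, bond_length, bond_angle_deg, torsion_deg):
    """Return atom d such that |c-d| = bond_length, angle b-c-d = bond_angle,
    and the dihedral a-b-c-d = torsion (angles in degrees)."""
    theta = math.radians(bond_angle_deg)
    tau = math.radians(torsion_deg)
    bc = _unit(_sub(c, b))                  # axis along the b-c bond
    n = _unit(_cross(_sub(b, a), bc))       # normal to the a-b-c plane
    m = _cross(n, bc)                       # completes the local frame
    local = (-bond_length * math.cos(theta),
             bond_length * math.sin(theta) * math.cos(tau),
             bond_length * math.sin(theta) * math.sin(tau))
    return tuple(c[i] + local[0] * bc[i] + local[1] * m[i] + local[2] * n[i]
                 for i in range(3))

# Fixed starting atoms from the slides; the bond length, bond angle, and
# torsion used for the next atom are illustrative values only.
h_atom = (0.0, 0.0, 0.0)
n_atom = (-1.04, 0.0, 0.0)
c_alpha = (-1.52, 1.37, 0.0)
c_prime = place_atom(h_atom, n_atom, c_alpha, 1.52, 111.0, 180.0)
```

Repeating this step along the chain, each time using the last three placed atoms, builds the whole backbone from a list of torsion angles.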

  28. Side and Branch Group Calculation • Fischer projections are used to determine the dihedral angles of side-group atoms • Assumptions: • Tetrahedral structure: substituents 120° apart, ω2 = 120° + ω1 • Bent structure: substituents 180° apart, ω2 = 180° + ω1 • ω1 = dihedral angle

  29. Genetic Algorithm • GA Search and Optimization • Fitness Function • Genetic Operators • α-helix and β-sheet implementation • Binary GA

  30. Genetic Algorithm • Search and optimization method that mimics natural selection • Terms to define • Chromosome – a set of torsion angles • Gene – an individual torsion angle • Generation – a single pass through the GA search loop • Loops through the reproduction, mutation, and adaptation processes to obtain the best-fit model
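The loop described above can be sketched in a few lines. This is a generic, simplified genetic algorithm and not the GAPSS implementation: the one-point crossover, the "keep the best half as parents" rule, the elitism step, and all names are assumptions made for illustration. The default population size, generation limit, and mutation percentage are borrowed from the single-amino-acid runs reported later in the deck.

```python
import random

def run_ga(energy, n_genes, pop_size=20, generations=15,
           mutation_rate=0.9, seed=0):
    """Minimize energy(chromosome); a chromosome is a list of torsion angles."""
    rng = random.Random(seed)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []  # best energy seen at the start of each generation
    for _ in range(generations):
        population.sort(key=energy)          # lowest energy first
        history.append(energy(population[0]))
        parents = population[: pop_size // 2]
        children = [population[0][:]]        # elitism: carry the best over
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = mother[:cut] + father[cut:]          # reproduction
            if rng.random() < mutation_rate:             # mutation
                child[rng.randrange(n_genes)] = rng.uniform(-180.0, 180.0)
            children.append(child)
        population = children
    best = min(population, key=energy)
    history.append(energy(best))
    return best, history
```

Because the best chromosome is always copied into the next generation, the recorded best energy never increases from one generation to the next.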

  31. Genetic Algorithm • Use a computer simulation to perform an intelligent search/optimization to find the native protein conformation that requires the least amount of energy Native Conformation

  32. Genetic Algorithm based Protein Structure Search (GAPSS) • GAPSS is developed under Visual Basic Add-in environment • Modified genetic operators • Fitness function based selection • Multiple entries crossover • Non-uniform mutation • Adaptation • Advantages • Faster convergence • User-friendly

  33. Fitness Function • Three primary energy terms: Electrostatic, Nonbonded (6-12), and Hydrogen Bonded • Torsion energy is excluded • Not a real interaction energy • A penalty is introduced for positive torsion • Cystine Loop-Closing is introduced only when more than one cysteine is present in the protein

  34. Genetic Operator - Selection • Selection Operator • Ranked Selection – the higher the rank, the higher the probability of being chosen • Fitness Selection – the better the fitness, the higher the probability of being chosen • Benefits of Selection • Aids the elitism search [Figure: chromosomes ordered from higher rank or better fitness to lower rank or worse fitness]
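Both selection schemes can be sketched with weighted random sampling. The exact weighting used in GAPSS is not given in the slides, so the linear rank weights and the "distance from the worst energy" fitness weights below are assumptions, as are the function names. Lower energy is treated as better fitness.

```python
import random

def ranked_selection(population, energies, k, rng=random):
    """Pick k chromosomes with probability proportional to rank.

    The lowest-energy chromosome gets weight n, the highest-energy one weight 1.
    """
    n = len(population)
    order = sorted(range(n), key=lambda i: energies[i])
    weights = [0] * n
    for rank, index in enumerate(order):
        weights[index] = n - rank
    return rng.choices(population, weights=weights, k=k)

def fitness_selection(population, energies, k, rng=random):
    """Pick k chromosomes with probability growing as energy decreases."""
    worst = max(energies)
    span = worst - min(energies)
    if span == 0:
        weights = [1.0] * len(population)   # all equally fit
    else:
        # Small offset so the worst chromosome keeps a nonzero chance.
        weights = [(worst - e) + 0.01 * span for e in energies]
    return rng.choices(population, weights=weights, k=k)
```

Ranked selection depends only on the ordering of the energies, while fitness selection also responds to how large the energy gaps are.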

  35. Genetic Operator - Mutation • Mutation Operator • Uniform Mutation – randomly replace with a value from -180 to 180 • Non-uniform mutation – add or subtract a random value between 0 and 180 • Effects of Mutation • Introduce variance to search • Aid the search for global minimum by directing gradient search out of the local minima
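The two mutation operators, as worded on this slide, can be sketched as follows. Mutating a single randomly chosen gene and wrapping the result back into the range -180 to 180 degrees are assumptions of this sketch, and the function names are illustrative.

```python
import random

def wrap_angle(angle):
    """Map any angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def uniform_mutation(chromosome, rng=random):
    """Replace one randomly chosen torsion angle with a value from -180 to 180."""
    mutated = list(chromosome)
    mutated[rng.randrange(len(mutated))] = rng.uniform(-180.0, 180.0)
    return mutated

def non_uniform_mutation(chromosome, rng=random):
    """Add or subtract a random value between 0 and 180 to one torsion angle."""
    mutated = list(chromosome)
    index = rng.randrange(len(mutated))
    delta = rng.uniform(0.0, 180.0) * rng.choice((-1.0, 1.0))
    mutated[index] = wrap_angle(mutated[index] + delta)
    return mutated
```

Both functions return a new list and leave the parent chromosome untouched, so the parent can still compete in the next selection step.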

  36. Genetic Operator - Crossover • Crossover Operator • Random 2-point Crossover – randomly exchange between parents 2 angles at a time • Multiple Entries Crossover – multiple random exchange • Benefits of Crossover • Aid the search for elites • Optimize the search by keeping the optimal folding segments
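The slide wording leaves some room for interpretation, so the sketch below shows one plausible reading of each operator: a classic two-point crossover that swaps the segment between two random cut points, and a multiple-entries crossover that swaps genes at several random positions. Function names and the `n_swaps` parameter are assumptions.

```python
import random

def two_point_crossover(parent_a, parent_b, rng=random):
    """Swap the segment between two random cut points."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of genes")
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b

def multi_entry_crossover(parent_a, parent_b, n_swaps, rng=random):
    """Swap genes at n_swaps distinct, randomly chosen positions."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of genes")
    child_a, child_b = list(parent_a), list(parent_b)
    for position in rng.sample(range(len(parent_a)), n_swaps):
        child_a[position], child_b[position] = child_b[position], child_a[position]
    return child_a, child_b
```

In both cases every gene position in the two children still holds exactly the two parental values, so a well-folded segment can pass intact from parent to child.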

  37. Genetic Operator - Adaptation • Adaptation Operator • Gradient search applied to each chromosome • Predict energy profile • Benefits of Adaptation • Provide the local minima search • Determine the energy profile of the native folding process
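A gradient search on a single chromosome might look like the sketch below. It assumes a finite-difference gradient and steepest descent with step halving, which the slides do not specify; the function names, step size, and tolerances are illustrative. The list of accepted energies stands in for the "energy profile" mentioned on the slide.

```python
def numerical_gradient(energy, angles, h=1e-3):
    """Central-difference estimate of d(energy)/d(angle) for each torsion angle."""
    gradient = []
    for i in range(len(angles)):
        up = list(angles)
        down = list(angles)
        up[i] += h
        down[i] -= h
        gradient.append((energy(up) - energy(down)) / (2.0 * h))
    return gradient

def adapt(energy, angles, step=1.0, max_iters=200, tol=1e-6):
    """Steepest descent toward a nearby local minimum.

    Returns the refined angles and the energies of the accepted steps.
    """
    current = list(angles)
    profile = [energy(current)]
    for _ in range(max_iters):
        gradient = numerical_gradient(energy, current)
        if max(abs(g) for g in gradient) < tol:
            break                      # gradient is effectively zero
        trial = [a - step * g for a, g in zip(current, gradient)]
        trial_energy = energy(trial)
        if trial_energy < profile[-1]:
            current = trial
            profile.append(trial_energy)
        else:
            step *= 0.5                # overshoot: shrink the step and retry
            if step < 1e-12:
                break
    return current, profile
```

Only downhill steps are accepted, so the profile decreases monotonically and the search stops at a local minimum rather than the global one; escaping it is left to mutation and crossover.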

  38. Three GA Approaches • Free GA search – no restriction on dihedral angles, with the exception of omega and ring structures • Advantages: usable in any protein search, an empirical way of obtaining protein conformation, and useful for energy profile search • α-helix and β-sheet specific GA search – randomly selects segments of the protein as α-helices and β-sheets • Advantages: enhances the speed of free GA and gives an accurate search for α-helices and β-sheets • Binary GA search – uses binary instead of decimal to represent dihedral angles • Advantages: no barrier when doing crossover

  39. α-helix and β-sheet Implementation • Creates α-helices and β-sheets of random lengths at random start positions • Each α-helix or β-sheet created in this way is described by two parameters • Crossover will involve trading the two parameters between two individuals

  40. α-helix and β-sheet Implementation • When α-helices are crossed over, each individual’s new energy is compared to its old energy. If there is a net improvement, the crossover is kept. • The “former helix” regions are filled with random torsion angles as usual [Figure: green region and blue region]

  41. Binary Code Implementation • Convert torsion angles to binary code • Integer and decimal parts are coded separately to shorten the total number of digits - 17 digits altogether • The idea is to represent the torsion angles on a single chromosome as one long continuous chain • Crossover and mutation operators are all similar to those of the GA • Example chain: 101001010100100001010011101011000010101101010010000101001010100100001010010101001010010100101010011100
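The slide gives the total of 17 binary digits per angle but not how they are divided. The sketch below assumes one possible layout, purely for illustration: 1 sign bit, 8 bits for the integer part, and 8 bits for the fractional part in steps of 1/256 of a degree. The layout, constants, and function names are all hypothetical.

```python
INT_BITS = 8    # assumed: integer part of |angle|, 0..180 fits in 8 bits
FRAC_BITS = 8   # assumed: fractional part in steps of 1/256 degree
GENE_BITS = 1 + INT_BITS + FRAC_BITS   # 17 digits, matching the slide's total

def encode_angle(angle):
    """Encode one torsion angle (degrees, -180..180) as a 17-character bit string."""
    if not -180.0 <= angle <= 180.0:
        raise ValueError("angle must lie between -180 and 180 degrees")
    sign = "1" if angle < 0 else "0"
    magnitude = abs(angle)
    whole = int(magnitude)
    fraction = int((magnitude - whole) * 2 ** FRAC_BITS)
    return sign + format(whole, "08b") + format(fraction, "08b")

def decode_angle(bits):
    """Invert encode_angle; precision is limited to 1/256 of a degree."""
    if len(bits) != GENE_BITS:
        raise ValueError("expected a 17-bit string")
    sign = -1.0 if bits[0] == "1" else 1.0
    whole = int(bits[1:1 + INT_BITS], 2)
    fraction = int(bits[1 + INT_BITS:], 2) / 2 ** FRAC_BITS
    return sign * (whole + fraction)

def encode_chromosome(angles):
    """Concatenate the per-angle codes into one continuous chain."""
    return "".join(encode_angle(a) for a in angles)

def decode_chromosome(bits):
    """Split the chain into 17-bit genes and decode each one."""
    return [decode_angle(bits[i:i + GENE_BITS])
            for i in range(0, len(bits), GENE_BITS)]
```

With this layout a bit-level crossover or mutation can produce an integer part above 180, so decoded angles would need to be wrapped or rejected before the energy is evaluated.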

  42. Results and Discussion • Individual AA Prediction • Enkephalin Prediction • Performance Analysis • Discussion

  43. Single AA Prediction • All single amino acids were predicted with GAPSS • GA parameters • Initial population: 20 • Generation limit: 15 • Percentage of mutations: 90% • Compared to the native single AA folding

  44. Single AA Prediction • Asparagine (N, Asn) • Alanine (A, Ala) • Aspartic Acid (D, Asp) • Cysteine (C, Cys) • Glutamine (Q, Gln) • Glutamic Acid (E, Glu) • Glycine (G, Gly) • Isoleucine (I, Ile)

  45. Single AA Prediction • Leucine (L, Leu) • Serine (S, Ser) • Methionine (M, Met) • Valine (V, Val) • Threonine (T, Thr)

  46. Enkephalin Prediction • Enkephalin is a pentapeptide that is involved in regulating pain • Two forms of enkephalin • Methionine-enkephalin – Tyr-Gly-Gly-Phe-Met • Leucine-enkephalin – Tyr-Gly-Gly-Phe-Leu • Short enough to confirm the accuracy of GAPSS, yet it still contains complex ring side groups

  47. Enkephalin Prediction • Zero-gradient conformations suggest that GAPSS is capable of locating local minima • Backbone conformations showed close similarity • Side group conformations still show discrepancies between predicted and theoretical

  48. Local Minimum Conformations • GAPSS was able to locate a few local minimum protein conformations

  49. Enkephalin Prediction - Backbone • Backbone structure was predicted by GAPSS [Figure: GA-predicted backbone structure versus NMR-confirmed backbone structure]

  50. Enkephalin Prediction • Discrepancies between side groups are due to the lack of entropy and solvation energy terms and to the center partial charge assumption [Figure: GA-predicted backbone structure versus NMR-confirmed backbone structure]