
Proteins Secondary Structure Predictions


ryder-watts

Presentation Transcript


  1. Structural Bioinformatics: Proteins Secondary Structure Predictions

  2. Structure Prediction Motivation • Better understand protein function • Broaden homology • Detect similar function where sequence differs (only ~50% of remote homologies can be detected based on sequence) • Explain disease • Explain the effect of mutations • Design drugs

  3. Myoglobin – the first high-resolution protein structure. Solved in 1958 by John Kendrew, who shared the 1962 Nobel Prize in Chemistry with Max Perutz (both of Cambridge University). As of 1.1.2012 there were 72,468 protein structures in the protein structure database: a great increase, but still an order of magnitude fewer than the 533,657 protein sequences in UniProt.

  4. What can we do? MERFGYTRAANCEAP… Predicting the three-dimensional structure of a protein from its sequence is very hard (sometimes impossible). However, we can predict the secondary structure with relatively high precision.

  5. What do we mean by Secondary Structure? Secondary structures are the building blocks of the protein structure.

  6. What do we mean by Secondary Structure? Secondary structure is usually divided into three categories: • Alpha helix • Beta strand (sheet) • Anything else – turn/loop

  7. Alpha Helix: Pauling (1951) • A consecutive stretch of 5-40 amino acids (average 10). • A right-handed spiral conformation. • 3.6 amino acids per turn. • Stabilized by H-bonds. (Figure labels: 3.6 residues, 5.6 Å)

  8. Beta Strand: Pauling and Corey (1951) • Different polypeptide chains run alongside each other and are linked together by hydrogen bonds. • Each section is called a β-strand and consists of 5-10 amino acids. (Figure label: β-strand)

  9. Beta Sheet. The strands lie adjacent to each other, forming a beta-sheet. (Figure: antiparallel and parallel sheets; distance labels 3.25 Å, 4.6 Å, 3.47 Å, 4.6 Å)

  10. Loops • Connect the secondary structure elements. • Have various lengths and shapes. • Are located at the surface of the folded protein and therefore may have an important role in biological recognition processes.

  11. Three dimensional Tertiary Structure Describes the packing of alpha-helices, beta-sheets and random coils with respect to each other on the level of one whole polypeptide chain

  12. Secondary? Tertiary? (Figure: structures of RBP and Globin)

  13. How do the (secondary and tertiary) structures relate to the primary protein sequence??

  14. SEQUENCE STRUCTURE • Early experiments showed that the sequence of a protein is sufficient to determine its structure (Anfinsen). • Protein structure is more conserved than protein sequence and more closely related to function.

  15. How (can) different amino acid sequences determine similar protein structures? Lesk and Chothia 1980

  16. The Globin Family

  17. Different sequences can result in similar structures (Figure: PDB entries 1ecd and 2hhd)

  18. We can learn about the important features that determine structure and function by comparing the sequences and structures.

  19. The Globin Family

  20. Why is Proline 36 conserved in all the globin family ?

  21. Where are the gaps?? The gaps in the pairwise alignment are mapped to the loop regions

  22. How are remote homologs related in terms of their structure? (Figure labels: retinol-binding protein (RBP), odorant-binding protein, apolipoprotein D, β-lactoglobulin)

  23. PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3

      Score = 159 bits (404), Expect = 1e-38
      Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

      Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
                  V  L+ LA  A              +  S  V+ENFD  ++ G WY + K
      Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

      Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
                 + I A +S+ E G +    K          V  +     ++  +PAK +++++ +
      Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

      Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
                      +WI+ TDY+ YA+ YSC            + ++  R+P  LPPE
      Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
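The identity and gap counts in the alignment header can be re-derived from the two aligned rows themselves. The following is a minimal Python sketch, not part of the original slides: the function name `alignment_stats` is hypothetical, and the three alignment blocks are simply concatenated. "Positives" are not recomputed here because they depend on the scoring matrix or profile used by PSI-BLAST.

```python
# Aligned rows copied from the PSI-BLAST output on slide 23
# (the three blocks are joined into one string per sequence).
query = (
    "WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ"
    "DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ"
    "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA"
)
sbjct = (
    "MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG"
    "NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----"
    "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET"
)


def alignment_stats(a, b):
    """Return (alignment length, identical columns, gapped columns)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    identities = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    gaps = sum(1 for x, y in zip(a, b) if x == "-" or y == "-")
    return len(a), identities, gaps


length, identities, gaps = alignment_stats(query, sbjct)
print(f"Identities = {identities}/{length} ({identities / length:.0%})")
print(f"Gaps = {gaps}/{length} ({gaps / length:.0%})")
```

Counting columns this way reproduces the header values of 41/170 identities and 19/170 gaps.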

  24. The Retinol Binding Protein and β-lactoglobulin

  25. Structure Prediction Goal: Predict protein structure based on sequence information

  26. Prediction Approaches • Two stage approach 1. Primary (sequence) to secondary structure 2. Secondary to tertiary • One stage approach - Primary to tertiary structure

  27. Secondary Structure Prediction • Given a primary sequence ADSGHYRFASGFTYKKMNCTEAA what secondary structure will it adopt ?

  28. Secondary Structure Prediction Methods • Chou-Fasman / GOR Method: based on amino acid frequencies • Machine learning methods: PHDsec and PSIpred • HMM (Hidden Markov Model)

  29. Chou and Fasman (1974)

      Name            P(a)   P(b)   P(turn)
      Alanine          142     83      66
      Arginine          98     93      95
      Aspartic Acid    101     54     146
      Asparagine        67     89     156
      Cysteine          70    119     119
      Glutamic Acid    151     37      74
      Glutamine        111    110      98
      Glycine           57     75     156
      Histidine        100     87      95
      Isoleucine       108    160      47
      Leucine          121    130      59
      Lysine           114     74     101
      Methionine       145    105      60
      Phenylalanine    113    138      60
      Proline           57     55     152
      Serine            77     75     143
      Threonine         83    119      96
      Tryptophan       108    137      96
      Tyrosine          69    147     114
      Valine           106    170      50

      The propensity of an amino acid to be part of a certain secondary structure (e.g. Proline has a low propensity of being in an alpha helix or beta sheet → breaker). Success rate of 50%.

  30. Secondary Structure Method Improvements: ‘Sliding window’ approach • Most alpha helices are ~12 residues long • Most beta strands are ~6 residues long • Look at all windows of size 6/12 • Calculate a score for each window. If the score > threshold → predict this is an alpha helix/beta sheet. Example sequence: TGTAGPOLKCHIQWMLPLKK
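The sliding-window idea can be sketched in a few lines of Python. This is a simplified illustration, not the full Chou-Fasman procedure: the window score is assumed to be the mean propensity from the slide 29 table, the threshold of 100 is an assumption, symbols missing from the table (such as the `O` in the slide's example sequence) are assumed to score a neutral 100, and the function names are hypothetical.

```python
# P(a) and P(b) propensities copied from the Chou-Fasman table on slide 29.
P_ALPHA = {
    "A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
    "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
    "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}
P_BETA = {
    "A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
    "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
    "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170,
}

NEUTRAL = 100  # assumption: score used for symbols that are not in the table


def window_means(seq, table, size):
    """Mean propensity of every window of `size` residues, left to right."""
    values = [table.get(aa, NEUTRAL) for aa in seq]
    return [sum(values[i:i + size]) / size
            for i in range(len(values) - size + 1)]


def mark_windows(seq, table, size, threshold, label):
    """Label each residue covered by a window whose mean exceeds the threshold."""
    marks = ["-"] * len(seq)
    for start, mean in enumerate(window_means(seq, table, size)):
        if mean > threshold:
            marks[start:start + size] = [label] * size
    return "".join(marks)


seq = "TGTAGPOLKCHIQWMLPLKK"  # example sequence from slide 30
print(seq)
print(mark_windows(seq, P_ALPHA, 12, 100, "H"))  # helix windows, 12 residues
print(mark_windows(seq, P_BETA, 6, 100, "E"))    # strand windows, 6 residues
```

Helix and strand predictions are kept as separate tracks here; deciding between them where they overlap would need an extra rule that the slides do not specify.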

  31. Improvements since the 1980s • Adding information from conservation in the MSA • Smarter algorithms (e.g. machine learning, HMM). Success rate → 75%-80%

  32. Machine learning approach for predicting Secondary Structure (PHD, PSIpred). Step 1: Generating a multiple sequence alignment. (Figure labels: Query, SwissProt, Subject)

  33. Step 2: Additional sequences are added using a profile. We end up with an MSA which represents the protein family. (Figure labels: Query, seed, MSA, Subject)

  34. Step 3: The sequence profile of the protein family is compared (by machine learning methods) to sequences with known secondary structure. (Figure labels: Query, seed, MSA, Machine Learning Approach, Known structures, Subject)

  35. HMM approach for predicting Secondary Structure (SAM) • HMM enables us to calculate the probability of assigning a sequence to a secondary structure TGTAGPOLKCHIQWML HHHHHHHLLLLBBBBB p = ?

  36. Table annotations: • Beginning with an α-helix • The probability of observing Alanine as part of a β-sheet • α-helix followed by α-helix • The probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn = 0.15. Table built from a large database of known secondary structures.

  37. The above table enables us to calculate the probability of assigning a secondary structure to a protein • Example: TGQ, HHH, p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 ≈ 0.000021
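The worked example on slide 37 can be reproduced with a short Python sketch. The six numbers are taken from the slide; reading them as one start probability, three emission probabilities and two helix-to-helix transition probabilities is an assumption, since the underlying table is not reproduced in the transcript. The dictionary and function names are hypothetical.

```python
# Numbers from the slide 37 example. Their roles below are an assumed reading:
# start in helix, emit T, stay in helix, emit G, stay in helix, emit Q.
START = {"H": 0.45}
TRANSITION = {("H", "H"): 0.8}
EMISSION = {("H", "T"): 0.041, ("H", "G"): 0.028, ("H", "Q"): 0.0635}


def path_probability(seq, states):
    """Joint probability of a sequence and one given state path in an HMM."""
    if not seq or len(seq) != len(states):
        raise ValueError("seq and states must be non-empty and equally long")
    p = START[states[0]] * EMISSION[(states[0], seq[0])]
    for i in range(1, len(seq)):
        p *= TRANSITION[(states[i - 1], states[i])]
        p *= EMISSION[(states[i], seq[i])]
    return p


p = path_probability("TGQ", "HHH")
print(f"p(TGQ, HHH) = {p:.4e}")
```

Multiplying the six factors gives roughly 2.1 x 10^-5. For longer sequences the product shrinks quickly, so summing log-probabilities is the usual way to avoid numerical underflow.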
