Structural Bioinformatics
Proteins Secondary Structure Predictions
Structure Prediction Motivation
- Better understand protein function
- Broaden homology
- Detect similar function where sequence differs (only ~50% of remote homologies can be detected based on sequence)
- Explain disease
Myoglobin – the first high resolution protein structure
Solved in 1958 by John Kendrew and colleagues at Cambridge University.
Kendrew and Max Perutz shared the 1962 Nobel Prize in Chemistry.
“Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates.”
Predicting the three-dimensional structure of a protein from its sequence is very hard
(sometimes impossible)
However, we can predict the secondary structure with relatively high accuracy
Secondary structure elements are the building blocks of the protein structure.
Secondary structure is usually divided into three categories:
- Alpha helix
- Beta strand (sheet)
- Anything else – turn/loop
Alpha Helix: Pauling (1951)
Beta Strand: Pauling and Corey (1951)
The strands become adjacent to each other, forming a beta-sheet.
Tertiary structure describes the packing of alpha-helices, beta-sheets and random coils with respect to each other at the level of one whole polypeptide chain
- Early experiments showed that the sequence of a protein is sufficient to determine its structure (Anfinsen)
- Protein structure is more conserved than protein sequence and more closely related to function.
Lesk and Chothia 1980
The Globin Family
Different sequences can result in similar structures
We can learn about the important features that determine structure and function by comparing the sequences and structures
The Globin Family
Why is Proline 36 conserved throughout the globin family?
The gaps in the pairwise alignment are mapped to the loop regions
How are remote homologs related in terms of their structure?
PSI-BLAST alignment of RBP and b-lactoglobulin: iteration 3
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)
Query: 3 WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
V L+ LA A + S V+ENFD ++ G WY + K
Sbjct: 1 MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59
Query: 55 DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
+ I A +S+ E G + K V + ++ +PAK +++++ +
Sbjct: 60 NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112
Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
+WI+ TDY+ YA+ YSC + ++ R+P LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
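The identity and gap figures in the alignment header are simple column counts. As a minimal sketch (the function name `alignment_stats` and the returned keys are illustrative assumptions, not part of BLAST), the last block of the alignment (Query 115-164, Sbjct 113-159) can be counted as follows:

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Count identical columns and gap columns in a gapped pairwise alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(query, subject))
    # A column is an identity when both rows carry the same residue (not a gap).
    identities = sum(q == s and q != "-" for q, s in pairs)
    # A column is a gap column when either row carries '-'.
    gaps = sum(q == "-" or s == "-" for q, s in pairs)
    length = len(pairs)
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "identity_pct": 100.0 * identities / length,
        "gap_pct": 100.0 * gaps / length,
    }


# Last block of the RBP / b-lactoglobulin alignment, copied from the output above.
QUERY_BLOCK = "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA"
SBJCT_BLOCK = "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET"

stats = alignment_stats(QUERY_BLOCK, SBJCT_BLOCK)
print(stats)
```

The same arithmetic applied to the whole 170-column alignment gives the reported 41/170, which is about 24% identity.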
The Retinol Binding Protein
Structure prediction based on sequence information
1. Primary (sequence) to secondary structure
2. Secondary to tertiary
- Primary to tertiary structure
According to the most simplified model:
What secondary structure will a given sequence adopt?
Name             P(a)   P(b)   P(turn)
Alanine           142     83      66
Arginine           98     93      95
Aspartic Acid     101     54     146
Asparagine         67     89     156
Cysteine           70    119     119
Glutamic Acid     151     37      74
Glutamine         111    110      98
Glycine            57     75     156
Histidine         100     87      95
Isoleucine        108    160      47
Leucine           121    130      59
Lysine            114     74     101
Methionine        145    105      60
Phenylalanine     113    138      60
Proline            57     55     152
Serine             77     75     143
Threonine          83    119      96
Tryptophan        108    137      96
Tyrosine           69    147     114
Valine            106    170      50
The propensity of an amino acid to be part of a certain secondary structure (e.g. Proline has a low propensity for both alpha helix and beta sheet, so it acts as a breaker)
Success rate of 50%
‘Sliding window’ approach
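One simple way to combine the propensity table with a sliding window is to average each propensity over the residues around a position and pick the state with the highest mean. The sketch below is only that simplified rule, not a full published prediction algorithm. The one-letter state labels (`H`, `E`, `C`), the window size of 5 and the function name `predict` are assumptions made for illustration.

```python
# (P(a), P(b), P(turn)) per residue, copied from the propensity table above.
PROPENSITY = {
    "A": (142, 83, 66),   "R": (98, 93, 95),    "D": (101, 54, 146),
    "N": (67, 89, 156),   "C": (70, 119, 119),  "E": (151, 37, 74),
    "Q": (111, 110, 98),  "G": (57, 75, 156),   "H": (100, 87, 95),
    "I": (108, 160, 47),  "L": (121, 130, 59),  "K": (114, 74, 101),
    "M": (145, 105, 60),  "F": (113, 138, 60),  "P": (57, 55, 152),
    "S": (77, 75, 143),   "T": (83, 119, 96),   "W": (108, 137, 96),
    "Y": (69, 147, 114),  "V": (106, 170, 50),
}
STATES = "HEC"  # assumed labels: helix, strand, turn/loop


def predict(sequence: str, window: int = 5) -> str:
    """Assign each residue the state with the highest mean propensity in its window."""
    half = window // 2
    prediction = []
    for i in range(len(sequence)):
        # The window is truncated at the ends of the sequence.
        segment = sequence[max(0, i - half): i + half + 1]
        means = [
            sum(PROPENSITY[aa][k] for aa in segment) / len(segment)
            for k in range(3)
        ]
        prediction.append(STATES[means.index(max(means))])
    return "".join(prediction)


print(predict("AEAMLEKAG"))
```

Averaging over a window smooths out single residues with atypical propensities, which is the point of looking at neighbours rather than at each position in isolation.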
Success -> 75%-80%
Generating a multiple sequence alignment
Additional sequences are added using a profile. We end up with an MSA which represents the protein family.
The sequence profile of the protein family is compared (by machine learning methods) to sequences with known secondary structure.
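A sequence profile can be thought of as per-column residue frequencies of the MSA. The sketch below illustrates that idea on a made-up four-sequence alignment. `TOY_MSA` and `column_profile` are hypothetical names, and real profile methods also apply sequence weighting and pseudocounts, which are omitted here.

```python
from collections import Counter


def column_profile(msa: list[str]) -> list[dict[str, float]]:
    """Return, for each alignment column, the frequency of each residue (gaps ignored)."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all rows of an MSA must have the same length")
    profile = []
    for column in zip(*msa):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        # A column made only of gaps yields an empty frequency table.
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


# Made-up alignment used only to illustrate the calculation.
TOY_MSA = ["MKV-LA", "MRVELA", "MKIELG", "MKV-LA"]

profile = column_profile(TOY_MSA)
for position, frequencies in enumerate(profile, start=1):
    print(position, frequencies)
```

A frequency vector per position, rather than a single residue, is what a machine learning predictor would receive as input in this picture.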
HMM approach for predicting Secondary Structure (SAM)
p = ?
Beginning with an α-helix
The probability of observing Alanine as part of a β-sheet
α-helix followed by α-helix
The probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn = 0.15
Table built according to large database of known secondary structures
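As a rough illustration of how such a table might be estimated, transition probabilities can be obtained by counting consecutive state pairs in sequences whose secondary structure is known. The two labelled strings below are invented toy data, and the state letters (`H` helix, `E` strand, `T` turn) and the function name `transition_table` are assumptions.

```python
from collections import Counter, defaultdict


def transition_table(labelled: list[str]) -> dict[str, dict[str, float]]:
    """Estimate P(next state | current state) from per-residue state strings."""
    counts = defaultdict(Counter)
    for states in labelled:
        # Count every pair of consecutive states.
        for current, following in zip(states, states[1:]):
            counts[current][following] += 1
    table = {}
    for current, following_counts in counts.items():
        total = sum(following_counts.values())
        table[current] = {s: n / total for s, n in following_counts.items()}
    return table


# Invented per-residue annotations standing in for a large structure database.
TOY_DATABASE = ["HHHHTEEE", "HHTTEEEH"]

table = transition_table(TOY_DATABASE)
for state, row in table.items():
    print(state, row)
```

Emission probabilities (the chance of observing a given amino acid in a given state) can be counted in the same way from residue and state pairs.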
p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 ≈ 0.000020995 (about 2.1 x 10^-5)
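The product can be checked numerically. In the sketch below the six factors are read, as an assumption, as one start probability followed by alternating emission and transition probabilities. The labels are illustrative, since the slides do not name the individual residues.

```python
import math

# (label, probability) pairs; the labels are an assumed reading of the slide.
factors = [
    ("start in alpha-helix", 0.45),
    ("emission, residue 1", 0.041),
    ("alpha-helix -> alpha-helix", 0.8),
    ("emission, residue 2", 0.028),
    ("alpha-helix -> alpha-helix", 0.8),
    ("emission, residue 3", 0.0635),
]

# Probability of the path: product of all start, emission and transition terms.
p = math.prod(value for _, value in factors)

# Summing logarithms gives the same result and avoids underflow on long sequences.
log_p = sum(math.log(value) for _, value in factors)

print(f"p = {p:.4e}")
print(f"log p = {log_p:.4f}")
```

Multiplying the six factors gives roughly 2.1 x 10^-5. For realistic sequence lengths the product becomes extremely small, which is why HMM implementations usually work with log probabilities.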