Structural Bioinformatics
Protein Secondary Structure Prediction

Structure Prediction Motivation
- Better understand protein function
- Broaden homology: detect similar function where sequence differs (only ~50% of remote homologies can be detected based on sequence)
- Explain disease
Solved in 1958 by Max Perutz and John Kendrew of Cambridge University.
They won the 1962 Nobel Prize in Chemistry.
“Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates.”
Predicting the three-dimensional structure of a protein from its sequence is very hard (sometimes impossible).
However, we can predict its secondary structure with relatively high accuracy.
Secondary structures are the building blocks of protein structure.
Secondary structure is usually divided into three categories:
- Alpha helix
- Beta strand (sheet)
- Anything else: turn/loop
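The three categories are often written as a per-residue string. The sketch below assumes the letters H (alpha helix), E (beta strand) and C (turn/loop), a convention chosen here for illustration, and splits such a string into contiguous secondary-structure elements. The function name and the example string are hypothetical.

```python
from itertools import groupby

# Assumed three-letter alphabet for this sketch:
# H = alpha helix, E = beta strand, C = anything else (turn/loop)
STATES = {"H": "alpha helix", "E": "beta strand", "C": "turn/loop"}


def segments(ss: str) -> list[tuple[str, int, int]]:
    """Split a per-residue secondary-structure string into
    (state, start, end) runs, using 0-based inclusive indices."""
    runs = []
    pos = 0
    for state, group in groupby(ss):
        if state not in STATES:
            raise ValueError(f"unknown state {state!r}")
        length = len(list(group))
        runs.append((state, pos, pos + length - 1))
        pos += length
    return runs


# Hypothetical assignment for a 14-residue fragment
print(segments("CCHHHHHCCEEEEC"))
# → [('C', 0, 1), ('H', 2, 6), ('C', 7, 8), ('E', 9, 12), ('C', 13, 13)]
```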
Alpha Helix: Pauling (1951)
Beta Strand: Pauling and Corey (1951)
Adjacent strands pair up side by side, forming a beta sheet.
Tertiary structure: describes the packing of alpha helices, beta sheets and random coils relative to each other within a single polypeptide chain.
Sequence (primary protein sequence)
- Early experiments (Anfinsen) showed that the sequence of a protein is sufficient to determine its structure.
- Protein structure is more conserved than protein sequence and more closely related to function.
Lesk and Chothia 1980
The Globin Family
Different sequences can result in similar structures
By comparing sequences and structures, we can learn which features are important in determining structure and function.
Why is proline 36 conserved across the entire globin family?
Gaps in the pairwise alignment map to loop regions.
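One way to check this observation on a given alignment is to look up the secondary structure of every residue that sits opposite a gap. The sketch below assumes the states of the ungapped sequence B are given as an H/E/C string; the function name and the toy alignment are hypothetical.

```python
def states_opposite_gaps(row_a: str, row_b: str, ss_b: str) -> list[str]:
    """Return the secondary-structure state of every residue in sequence B
    that is aligned to a gap in sequence A.

    row_a, row_b: the two rows of a pairwise alignment ('-' = gap)
    ss_b: per-residue states (H/E/C assumed) of the ungapped sequence B
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    states = []
    idx_b = 0  # position in the ungapped sequence B
    for a, b in zip(row_a, row_b):
        if b == "-":
            continue  # no residue of B in this column
        if a == "-":
            states.append(ss_b[idx_b])
        idx_b += 1
    return states


# Toy example: the two residues inserted in B (K, L) lie in a loop (C)
print(states_opposite_gaps("ACD--EFGH", "ACDKLEFGH", "HHHCCEEEE"))
```

If gaps do fall in loops, most of the returned states should be C.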
Retinol-binding protein (RBP)
How are remote homologs related in terms of their structure?
PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)
Query: 3 WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
V L+ LA A + S V+ENFD ++ G WY + K
Sbjct: 1 MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59
Query: 55 DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
+ I A +S+ E G + K V + ++ +PAK +++++ +
Sbjct: 60 NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112
Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
+WI+ TDY+ YA+ YSC + ++ R+P LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
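The Identities and Gaps figures in the header are column counts over the alignment length (170 here). A minimal sketch of that bookkeeping, assuming two equal-length aligned rows with '-' for gaps; Positives are omitted because they require a substitution matrix. The percentage is truncated rather than rounded, which reproduces 69/170 (40%) above. The helper names and toy rows are hypothetical.

```python
def alignment_stats(query: str, subject: str) -> tuple[int, int, int]:
    """Return (identities, gap_columns, alignment_length) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, gaps, len(query)


def blast_style(count: int, length: int) -> str:
    """Format a count like the header, e.g. '41/170 (24%)'.
    The percentage is truncated to an integer."""
    return f"{count}/{length} ({100 * count // length}%)"


ident, gaps, length = alignment_stats("ACDEFG--HK", "ACDQFGSTHR")
print("Identities =", blast_style(ident, length))  # Identities = 6/10 (60%)
print("Gaps =", blast_style(gaps, length))         # Gaps = 2/10 (20%)
```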
The Retinol Binding Protein
Structure prediction based on sequence information
1. Primary (sequence) to secondary structure
2. Secondary to tertiary
- Primary to tertiary structure
According to the simplest model: given an amino acid, what secondary structure will it adopt?
Amino acid       P(α)   P(β)  P(turn)
Alanine           142     83       66
Arginine           98     93       95
Aspartic Acid     101     54      146
Asparagine         67     89      156
Cysteine           70    119      119
Glutamic Acid     151     37       74
Glutamine         111    110       98
Glycine            57     75      156
Histidine         100     87       95
Isoleucine        108    160       47
Leucine           121    130       59
Lysine            114     74      101
Methionine        145    105       60
Phenylalanine     113    138       60
Proline            57     55      152
Serine             77     75      143
Threonine          83    119       96
Tryptophan        108    137       96
Tyrosine           69    147      114
Valine            106    170       50
Each value is the propensity of an amino acid to be part of a given secondary structure, scaled so that 100 means no preference (e.g., proline has a low propensity for both alpha helices and beta sheets and acts as a helix/sheet breaker).
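Under the simplest model, each residue is assigned the state for which its propensity is highest. A minimal sketch using the values from the table (these are the Chou-Fasman propensities), keyed by the standard one-letter amino acid codes. Ties (cysteine: 119/119) go to the column listed first, an arbitrary choice made here. This is only the per-residue lookup, not the full Chou-Fasman procedure; the function name is hypothetical.

```python
# Propensities from the table (scaled by 100), keyed by one-letter code:
# residue -> (P(alpha), P(beta), P(turn))
PROPENSITY = {
    "A": (142, 83, 66),  "R": (98, 93, 95),   "D": (101, 54, 146),
    "N": (67, 89, 156),  "C": (70, 119, 119), "E": (151, 37, 74),
    "Q": (111, 110, 98), "G": (57, 75, 156),  "H": (100, 87, 95),
    "I": (108, 160, 47), "L": (121, 130, 59), "K": (114, 74, 101),
    "M": (145, 105, 60), "F": (113, 138, 60), "P": (57, 55, 152),
    "S": (77, 75, 143),  "T": (83, 119, 96),  "W": (108, 137, 96),
    "Y": (69, 147, 114), "V": (106, 170, 50),
}
STATES = ("helix", "strand", "turn")


def most_likely_state(residue: str) -> str:
    """Return the state with the highest propensity for one residue.
    On a tie, the state listed first in STATES wins."""
    values = PROPENSITY[residue.upper()]
    return STATES[values.index(max(values))]


print([most_likely_state(r) for r in "MEVP"])
# → ['helix', 'helix', 'strand', 'turn']
```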
Success rate of 50%
‘Sliding window’ approach
Success -> 75%-80%
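A sliding window lets each prediction depend on neighbouring residues rather than on a single residue. The sketch below is a deliberately simplified illustration, not GOR or the full Chou-Fasman nucleation and extension rules: it averages the helix and strand propensities from the table over a window centred on each residue (width 5 by default, an arbitrary choice) and calls a helix or strand when the average exceeds 100. The function name and thresholds are assumptions of this sketch.

```python
# (P(alpha), P(beta)) from the propensity table, one-letter codes
PROPENSITY = {
    "A": (142, 83), "R": (98, 93), "D": (101, 54), "N": (67, 89),
    "C": (70, 119), "E": (151, 37), "Q": (111, 110), "G": (57, 75),
    "H": (100, 87), "I": (108, 160), "L": (121, 130), "K": (114, 74),
    "M": (145, 105), "F": (113, 138), "P": (57, 55), "S": (77, 75),
    "T": (83, 119), "W": (108, 137), "Y": (69, 147), "V": (106, 170),
}


def window_predict(seq: str, width: int = 5) -> str:
    """Predict H (helix), E (strand) or C (other) per residue from
    propensities averaged over a centred window (truncated at the ends)."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    seq = seq.upper()
    states = []
    for i in range(len(seq)):
        window = seq[max(0, i - half): i + half + 1]
        avg_a = sum(PROPENSITY[r][0] for r in window) / len(window)
        avg_b = sum(PROPENSITY[r][1] for r in window) / len(window)
        if avg_a > 100 and avg_a >= avg_b:
            states.append("H")
        elif avg_b > 100:
            states.append("E")
        else:
            states.append("C")
    return "".join(states)


print(window_predict("AAAAAPPPPP", width=3))  # HHHHHCCCCC
```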
Generating a multiple sequence alignment
Query (PHD, PSIPRED)
Additional sequences are added using a profile. We end up with an MSA that represents the protein family.
The sequence profile of the protein family is compared (by machine learning methods) to sequences with known secondary structure.
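The simplest form of a sequence profile is the residue frequency in each alignment column. A minimal sketch, assuming an MSA given as equal-length strings with '-' for gaps; real tools use more elaborate, weighted profiles (PSIPRED, for example, feeds PSI-BLAST position-specific scoring matrices to neural networks). The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def column_profile(msa: list[str]) -> list[dict[str, float]]:
    """Residue frequencies for every column of an MSA.
    Gaps ('-') are ignored; an all-gap column gives an empty dict."""
    if not msa:
        raise ValueError("empty alignment")
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in msa if row[col] != "-")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile


# Hypothetical family alignment: the query on the first row plus three homologues
msa = ["ACDE", "ACNE", "SC-E", "ACDQ"]
for i, column in enumerate(column_profile(msa)):
    print(i, column)
```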
HMM approach for predicting secondary structure (SAM)
p = ?
Beginning with an α-helix
The probability of observing alanine as part of a β-sheet
The probability of observing a residue which belongs to an α-helix followed by another α-helix residue
The probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn = 0.15
Tables built from a large database of known secondary structures
p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 ≈ 0.000021 (about 2.1 x 10^-5)
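The product above is the joint probability of the residues and one particular state path: a start probability, then alternating emission and transition probabilities. A minimal sketch, assuming the factors are read as start = 0.45, emissions 0.041, 0.028 and 0.0635, and helix-to-helix transitions 0.8 (not every factor is labelled, so this reading is an assumption). The log-space variant is a common way to avoid numerical underflow on long sequences. Both function names are hypothetical.

```python
import math


def path_probability(start: float, emissions: list[float], transitions: list[float]) -> float:
    """P = start * e1 * t(1->2) * e2 * t(2->3) * e3 * ...

    start:       probability that the path begins in the first state
    emissions:   P(residue i | state i), one per residue
    transitions: P(state i+1 | state i), one fewer than emissions
    """
    if not emissions or len(transitions) != len(emissions) - 1:
        raise ValueError("need n emissions and n - 1 transitions")
    p = start * emissions[0]
    for t, e in zip(transitions, emissions[1:]):
        p *= t * e
    return p


def log_path_probability(start: float, emissions: list[float], transitions: list[float]) -> float:
    """Same quantity as a sum of natural logs, which avoids underflow."""
    if not emissions or len(transitions) != len(emissions) - 1:
        raise ValueError("need n emissions and n - 1 transitions")
    return math.log(start) + sum(map(math.log, emissions)) + sum(map(math.log, transitions))


p = path_probability(0.45, [0.041, 0.028, 0.0635], [0.8, 0.8])
print(f"{p:.4e}")  # 2.0995e-05
```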