Prediction of protein disorder - PowerPoint PPT Presentation

Presentation Transcript

  1. Prediction of protein disorder Zsuzsanna Dosztányi Institute of Enzymology, Budapest, Hungary zsuzsa@enzim.hu

  2. Protein Structure/Function Paradigm • Dominant view: 3D structure is a prerequisite for protein function • Amino acid sequence → Structure → Function

  3. But…. • Heat stability • Protease sensitivity • Failed attempts to crystallize • Lack of NMR signals • “Weird” sequences …

  4. IDPs • Intrinsically disordered proteins/regions (IDPs/IDRs) • Do not adopt a well-defined structure in isolation under native-like conditions • Highly flexible ensembles, little secondary structure, no folded structure • Functional proteins

  5. Protein disorder is prevalent [Figure: percentage of proteins with long disordered regions (LDR, > 40 residues) in each kingdom (A, B, E); y-axis 0 to 60%]

  6. Protein disorder is important • Prion protein: prion disease • CFTR: cystic fibrosis • τ (tau): Alzheimer’s • α-synuclein: Parkinson’s • p53, BRCA1: cancer

  7. Protein disorder is functional [Figure: percentage of regulatory, signaling, biosynthetic and metabolic proteins by length of disordered region (30<, 40<, 50<, 60<)] Iakoucheva et al. (2002) J. Mol. Biol. 323, 573

  8. p53 tumor suppressor • Transactivation domain (TAD) • DNA-binding domain (DBD) • Tetramerization domain (TD) • Regulatory domain (RD) Wells et al. PNAS 2008; 105: 5762

  9. Heterogeneity in protein disorder • Transient structures • Flexible loop • RC-like (random-coil-like) • Compact

  10. Modularity in proteins • Many proteins contain multiple domains • Composed of ordered and disordered segments • Average length of a PDB chain is < 300 • Average length of a human protein ~ 500 • Average length of cancer-related proteins > 900 • Structural properties of full-length proteins …

  11. Bioinformatics of protein disorder • Part 1 • Databases • Prediction of protein disorder • Part 2 • Prediction of functional regions within IDPs

  12. Datasets • Ordered proteins in the PDB • over 94,000 structures • a few thousand folds • Some structures in the PDB are classified as disordered! • they only adopt a well-defined structure in complex • in crystals, with cofactors, proteins, … • Disorder in the PDB • Regions of missing electron density in PDB structures • NMR structures with large structural variations • Less than 10% of all positions • Usually short (<10 residues), often at the termini

  13. DisProt www.disprot.org • Current release: 6.02 • Release date: 05/24/2013 • Number of proteins: 694 • Number of disordered regions: 1539 • Experimentally verified disordered proteins collected from the literature (X-ray, NMR, CD, proteolysis, SAXS, heat stability, gel filtration, …)

  14. Additional databases • Combining experiments and predictions • Genome level annotations • MobiDB: http://mobidb.bio.unipd.it • D2P2: http://d2p2.pro • IDEAL: http://www.ideal.force.cs.is.nagoya-u.ac.jp/IDEAL

  15. Amino acid compositions He et al. Cell Res. 2009; 19: 929

  16. Sequence properties of disordered proteins • Amino acid compositional bias • High proportion of polar and charged amino acids (Gln, Ser, Pro, Glu, Lys) • Low proportion of bulky, hydrophobic amino acids (Val, Leu, Ile, Met, Phe, Trp, Tyr) • Low sequence complexity • Signature sequences identifying disordered proteins • Protein disorder is encoded in the amino acid sequence
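
The compositional bias listed on this slide can be checked on any sequence with a few lines of Python. This is a minimal sketch: the two residue groups are taken from the slide, while the function name `composition_bias` is only an illustrative choice and no threshold for calling a sequence disordered is implied.

```python
from collections import Counter

# Residue groups as listed on the slide (one-letter codes).
DISORDER_PROMOTING = set("QSPEK")   # Gln, Ser, Pro, Glu, Lys
ORDER_PROMOTING = set("VLIMFWY")    # Val, Leu, Ile, Met, Phe, Trp, Tyr


def composition_bias(seq):
    """Return (fraction of disorder-promoting, fraction of order-promoting) residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    n = len(seq)
    disorder = sum(counts[aa] for aa in DISORDER_PROMOTING) / n
    order = sum(counts[aa] for aa in ORDER_PROMOTING) / n
    return disorder, order


if __name__ == "__main__":
    # Toy sequences, chosen only to show the two extremes.
    print(composition_bias("PSEKQPSEKQ"))
    print(composition_bias("VLIVFWYLIM"))
```

A sequence rich in the first group and poor in the second is a candidate for disorder, but composition alone says nothing about where in the sequence the disorder lies.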

  17. Uversky plot: charge-hydrophobicity (two parameters) • Mean net charge • Mean hydrophobicity Uversky (2002) Eur. J. Biochem. 269, 2

  18. Making it position specific: FoldIndex • http://bip.weizmann.ac.il/fldbin/findex • Example: p53 Prilusky (2005) Bioinformatics 21, 3435
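
The slides do not spell out the formula, so the following is a sketch based on the boundary line of the charge-hydrophobicity plot as it is used in the FoldIndex publication: I_F = 2.785·⟨H⟩ − |⟨R⟩| − 1.151, where ⟨H⟩ is the mean Kyte-Doolittle hydrophobicity rescaled to 0..1 and ⟨R⟩ is the mean net charge. Positive values suggest a folded segment, negative values an unfolded one. The window length of 51 and the truncation of windows at the termini are assumptions of this sketch, not details taken from the slides.

```python
# Kyte-Doolittle hydropathy scale (range -4.5 .. 4.5).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled to the 0..1 range."""
    return sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute mean net charge: K, R count +1; D, E count -1."""
    charge = sum((aa in "KR") - (aa in "DE") for aa in seq)
    return abs(charge) / len(seq)


def fold_index(seq):
    """Whole-segment index: > 0 suggests folded, < 0 suggests unfolded."""
    return 2.785 * mean_hydrophobicity(seq) - mean_net_charge(seq) - 1.151


def fold_index_profile(seq, window=51):
    """Position-specific values from a sliding window (truncated at the termini)."""
    half = window // 2
    return [
        fold_index(seq[max(0, i - half): i + half + 1])
        for i in range(len(seq))
    ]


if __name__ == "__main__":
    toy = "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP"
    for position, value in enumerate(fold_index_profile(toy, window=21), start=1):
        print(position, round(value, 3))
```

Calling `fold_index` on a whole sequence reproduces the two-parameter classification of the Uversky plot; `fold_index_profile` applies the same rule locally to make it position specific.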

  19. Disorder prediction methods: amino acid propensity scales (GlobPlot) • Compare the tendency of amino acids: • to be in coil (irregular) structure • to be in regular secondary structure elements Linding (2003) NAR 31, 3701

  20. GlobPlot

  21. GlobPlot • From position-specific predictions: • Where are the ordered domains? • Longer disordered segments? • Noise vs. real data

  22. GlobPlot: http://globplot.embl.de/ • Downhill regions correspond to putative domains (GlobDom) • Uphill regions correspond to predicted protein disorder
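
The downhill/uphill reading of a GlobPlot curve follows from plotting a running sum of per-residue propensities. The sketch below illustrates that idea only: `TOY_PROPENSITY` is a HYPOTHETICAL scale invented for the example (it is not the scale used by GlobPlot), residues missing from it contribute zero, and the smoothing applied by the real tool is omitted.

```python
from itertools import accumulate

# HYPOTHETICAL toy propensities, NOT the GlobPlot scale:
# positive = prefers coil/disorder, negative = prefers regular secondary structure.
TOY_PROPENSITY = {
    "P": 1.0, "S": 0.5, "G": 0.5, "E": 0.3, "K": 0.3,
    "L": -0.8, "I": -1.0, "V": -0.9, "F": -0.9, "W": -1.0,
}


def cumulative_curve(seq, scale=TOY_PROPENSITY):
    """Running sum of propensities; residues missing from the scale add 0."""
    return list(accumulate(scale.get(aa, 0.0) for aa in seq))


def uphill_segments(curve, min_len=5):
    """Return (start, end) pairs, end exclusive, where the curve keeps rising."""
    segments = []
    start = None
    previous = 0.0
    for i, value in enumerate(curve):
        rising = value > previous
        if rising and start is None:
            start = i
        elif not rising and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
        previous = value
    if start is not None and len(curve) - start >= min_len:
        segments.append((start, len(curve)))
    return segments


if __name__ == "__main__":
    toy = "LIVFW" + "PSGEKPS" + "LIV"
    curve = cumulative_curve(toy)
    print(uphill_segments(curve))  # rising stretches = putative disorder
```

The `min_len` filter is one simple way to address the "noise vs. real data" question: short rises are ignored, long ones are reported.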

  23. Disorder prediction methods: physical principles (IUPred) • If a residue cannot form enough favorable interactions within its sequential environment, it will not adopt a well-defined structure: it will be disordered Dosztányi (2005) JMB 347, 827

  24. Energy description of proteins • Estimation of interaction energies based on statistical potentials: calculated from the frequency of amino acid interactions in globular proteins alone, based on the Boltzmann hypothesis • For example: • L-I interaction is frequent (hydrophobic effect), so the L-I interaction energy is low (favorable) • K-R interaction is rare (electrostatic repulsion), so the K-R interaction energy is high (unfavorable)
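
Under the Boltzmann hypothesis, a pair that is observed more often than expected gets a negative (favorable) energy, E = −ln(observed / expected) in units of kT. The sketch below shows one common way to set this up; the contact counts in `TOY_COUNTS` are HYPOTHETICAL numbers invented for the example, and the reference state (expected counts from the overall contact frequencies of each residue type) is an assumption, not necessarily the one used for the potentials in these slides.

```python
import math
from collections import defaultdict

# HYPOTHETICAL contact counts, invented for illustration only.
# Each unordered residue pair is assumed to appear once.
TOY_COUNTS = {
    ("I", "L"): 50, ("K", "R"): 2,
    ("I", "K"): 10, ("I", "R"): 10,
    ("K", "L"): 14, ("L", "R"): 14,
}


def contact_energies(counts):
    """Return {pair: energy in kT} using E = -ln(observed / expected)."""
    total = sum(counts.values())
    # Number of contact "ends" contributed by each residue type.
    ends = defaultdict(int)
    for (a, b), n in counts.items():
        ends[a] += n
        ends[b] += n
    energies = {}
    for (a, b), observed in counts.items():
        if observed == 0:
            continue  # the logarithm is undefined for unobserved pairs
        f_a = ends[a] / (2 * total)
        f_b = ends[b] / (2 * total)
        expected = total * f_a * f_b * (1 if a == b else 2)
        energies[tuple(sorted((a, b)))] = -math.log(observed / expected)
    return energies


if __name__ == "__main__":
    for pair, energy in sorted(contact_energies(TOY_COUNTS).items()):
        print(pair, round(energy, 3))
```

With these toy counts the I-L pair comes out with a negative energy and the K-R pair with a positive one, matching the two examples on the slide.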

  25. Predicting protein disorder - IUPred • The algorithm: • Based only on the composition of the environment of the residue D, we try to predict whether it is in a disordered region or not: …PSVEPPLSQETFSDL WKLLPENNVLSPLPSQAMDDLMLSP D DIEQWFTEDPGPDEAPRMPEAAPRVA PAPAAPTPAA... • Amino acid composition of the environment: A – 10%, C – 0%, D – 12%, E – 10%, F – 2%, etc. • Estimate the interaction energy between the residue and its environment • Decide the probability of the residue being disordered based on this
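
The first two steps of this algorithm can be sketched as follows: compute the amino acid composition of the residues flanking a position, then combine it with a residue-specific set of weights to estimate an energy. Everything numeric here is an assumption: `TOY_MATRIX` holds HYPOTHETICAL weights invented for the example (not the IUPred parameters), the flank size is arbitrary, and the final step of turning the energy into a disorder probability is not shown.

```python
from collections import Counter

# HYPOTHETICAL weights, invented for illustration; NOT the IUPred parameters.
# TOY_MATRIX[x][y]: contribution to the energy of residue x per unit
# fraction of residue type y in its environment (negative = favorable).
TOY_MATRIX = {
    "D": {"A": -0.1, "L": -0.3, "P": 0.4, "E": 0.6, "D": 0.6},
    "L": {"L": -1.0, "I": -1.1, "A": -0.4, "P": 0.2, "E": 0.3},
}


def environment_composition(seq, i, flank=10):
    """Fractions of each residue type within `flank` positions of i (i excluded)."""
    start = max(0, i - flank)
    environment = seq[start:i] + seq[i + 1:i + 1 + flank]
    if not environment:
        return {}
    counts = Counter(environment)
    return {aa: n / len(environment) for aa, n in counts.items()}


def estimated_energy(seq, i, energy_matrix=TOY_MATRIX, flank=10):
    """Composition-weighted energy of residue i; unknown pairs contribute 0."""
    composition = environment_composition(seq, i, flank)
    row = energy_matrix.get(seq[i], {})
    return sum(row.get(aa, 0.0) * frac for aa, frac in composition.items())


if __name__ == "__main__":
    fragment = "PLPSQAMDDLMLSPDDIEQWFTEDPGPDEAP"
    position = fragment.index("D")  # first D in the toy fragment
    print(environment_composition(fragment, position))
    print(estimated_energy(fragment, position))
```

The point of the design is that no structure is needed: the energy is estimated from the sequence neighborhood alone, which is what allows the approach to be applied to proteins without a known fold.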

  26. IUPred: http://iupred.enzim.hu/

  27. Disorder prediction methods: machine learning (DISOPRED2) • Binary classification problem Ward (2004) JMB 337, 635

  28. DISOPRED2 • …..AMDDLMLSPDDIEQWFTED….. → SVM with linear kernel, F(inp) → Assign label: D (disordered) or O (ordered)

  29. DISOPRED2 Cutoff value!
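
A linear-kernel SVM reduces to a weighted sum of the input features plus a bias, and the cutoff applied to that score decides the D/O label. The sketch below illustrates only this last step; the weights, bias, feature vectors and cutoff values are HYPOTHETICAL, and the way the real predictor encodes its sequence window is not reproduced here.

```python
def decision_value(features, weights, bias):
    """Linear decision function F(x) = w . x + b."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def assign_labels(scores, cutoff=0.0):
    """Label each residue D (disordered) if its score reaches the cutoff, else O."""
    return "".join("D" if score >= cutoff else "O" for score in scores)


if __name__ == "__main__":
    # HYPOTHETICAL two-feature encoding and model parameters.
    weights, bias = [0.5, -1.0], 0.25
    per_residue_features = [[1.0, 2.0], [1.5, 0.5], [2.0, 0.2]]
    scores = [decision_value(x, weights, bias) for x in per_residue_features]
    print(assign_labels(scores, cutoff=0.0))
    print(assign_labels(scores, cutoff=1.0))
```

Raising the cutoff converts borderline D residues to O, which trades sensitivity for a lower false positive rate; this is why the cutoff has to be reported together with any prediction.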

  30. PONDR VSL2 • Differences between short and long disorder: • amino acid composition • methods trained on one type of dataset and tested on the other gave lower efficiencies • PONDR VSL2: separate predictors for short and long disorder, combined into length-independent predictions Peng (2006) BMC Bioinformatics 7, 208

  31. PONDR-FIT • Disorder prediction methods: meta-predictor • Sequence → PONDR VLXT, PONDR VL3, PONDR VSL2, IUPred, FoldIndex, TopIDP → ANN → Prediction Xue et al. Biochim Biophys Acta. 2010; 1804: 996

  32. Complexity of protein disorder

  33. Prediction of protein disorder • Disordered residues can be predicted from the amino acid sequence • ~ 80% accuracy at the residue level • Methods can be specific to certain types of disorder • accordingly, accuracies vary depending on datasets • Predictions are based on binary classification of disorder