
Development of classification methods to predict new 14-3-3-binding proteins and phosphopeptides

Fábio M. Marques Madeira. Supervisor: Professor Geoff Barton. 7th May 2013.


Presentation Transcript


  1. Development of classification methods to predict new 14-3-3-binding proteins and phosphopeptides. Fábio M. Marques Madeira. Supervisor: Professor Geoff Barton. 7th May 2013.

  2. 14-3-3s dock onto pairs of tandem phosphoSer/Thr. [Diagram labels: 2R-ohnologue families; P, P; Kinase 1; 14-3-3; Kinase 2.] Hundreds of structurally and functionally diverse targets.

  3. The binding specificity of 14-3-3s is determined by overall steric fit and the sequence flanking the phosphoSer/Thr site. Mode I: RSX(pS/T)XP. Mode II: RX(F/Y)X(pS)XP. Mode III: C-terminal X(pS/T). Johnson et al. (2011) Molecular & Cellular Proteomics 10, M110.005751.
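
The Mode I and Mode II consensus motifs on this slide can be read as simple sequence patterns. The following is a minimal sketch, not the method used in the talk: it scans an unmodified protein sequence for candidate sites, treating X as any residue. The names `MOTIFS`, `SITE_OFFSET` and `find_candidate_sites` are hypothetical, and because a plain sequence carries no phosphorylation information, a hit only marks a residue that could match the motif if phosphorylated. Mode III is left out because the slide gives only a short description of it.

```python
import re

# Patterns transcribed from the slide; "." stands for X (any residue).
# A lookahead is used so that overlapping matches are all reported.
MOTIFS = {
    "mode_I": re.compile(r"(?=(RS.[ST].P))"),    # RSX(pS/T)XP
    "mode_II": re.compile(r"(?=(R.[FY].S.P))"),  # RX(F/Y)X(pS)XP
}

# 0-based offset of the candidate phospho-residue inside each motif.
SITE_OFFSET = {"mode_I": 3, "mode_II": 4}


def find_candidate_sites(sequence):
    """Return (mode, 1-based site position, matched text), sorted by position."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            position = match.start() + SITE_OFFSET[name] + 1
            hits.append((name, position, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(find_candidate_sites("AARSASAPGG"))
```

Exact consensus matching of this kind is far stricter than a scoring matrix, which is one reason a classifier trained on known sites can find binders that a fixed pattern misses.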

  4. ANIA: ANnotation and Integrated Analysis of the 14-3-3 interactome

  5. Development and evaluation of three new classifiers: position-specific scoring matrix (PSSM), artificial neural network (ANN) and support vector machine (SVM).
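
To make the first of these classifiers concrete, here is a minimal sketch of how a log-odds PSSM can be built from aligned positive windows and used to score a new window. The talk does not give its construction details, so the uniform background frequency, the pseudocount of 1 and the function names `build_pssm` and `score_window` are all assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(windows, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length windows.

    Assumes a uniform background (1/20 per residue) and an additive
    pseudocount; both are illustrative choices.
    """
    length = len(windows[0])
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for window in windows:
            if window[pos] in counts:  # skip padding or unknown residues
                counts[window[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {aa: math.log2((c / total) / background) for aa, c in counts.items()}
        )
    return pssm


def score_window(pssm, window):
    """Sum the per-position log-odds; unknown residues contribute 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))


if __name__ == "__main__":
    # Toy windows, not real 14-3-3-binding sites.
    toy_pssm = build_pssm(["RSAS", "RSGS", "RSAT"])
    print(score_window(toy_pssm, "RSAS"), score_window(toy_pssm, "AAAA"))
```

A window is then classified as a predicted binder when its score exceeds a threshold chosen on the training data.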

  6. Defining positive and negative examples for training and testing. Training datasets: Current (Pos 93, Neg); Previous (Pos 76, Neg). [Figure labels: 72 proteins; N- to C-terminus schematic with pS/T sites; 1,192 Likely Neg.]

  7. Defining positive and negative examples for training and testing. Training datasets: Current (Pos 93, Neg); Previous (Pos 76, Neg); 1,192 Likely Neg. Blind datasets: Previous (17 Pos, 17 Neg); Current (38 Pos, 38 Neg). Sequence redundancy thresholds: 60%, 50% and 40%. Different motif regions/lengths: [-11:11], [-9:9], [-7:7], [-5:5], [-3:3].
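
The motif regions listed on this slide are symmetric windows around the phosphoSer/Thr site, with the site itself at position 0. A small sketch of how such a window might be extracted follows. The function name `extract_window` and the use of "-" to pad windows that run past a protein terminus are assumptions; the talk does not say how terminal sites were handled.

```python
def extract_window(sequence, site_index, flank=5, pad="-"):
    """Return the [-flank:flank] window around a 0-based site index.

    Positions beyond either terminus are filled with `pad` so that
    every window has length 2 * flank + 1.
    """
    residues = []
    for i in range(site_index - flank, site_index + flank + 1):
        if 0 <= i < len(sequence):
            residues.append(sequence[i])
        else:
            residues.append(pad)
    return "".join(residues)


if __name__ == "__main__":
    # Toy sequence; the candidate site is the S at 0-based index 5.
    print(extract_window("AARSASAPGGKL", 5, flank=3))
```

Changing `flank` from 3 to 11 reproduces the range of region lengths compared on the slide.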

  8. Development and evaluation of three new classifiers. The area under the curve (AUC) was tested by jackknife.

  9. Development and evaluation of three new classifiers. Q: accuracy; MCC: Matthews correlation coefficient.
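
For reference, both measures follow from the confusion matrix of a binary classifier: Q = (TP + TN) / (TP + TN + FP + FN), and MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The sketch below computes them with these standard definitions; the function names are hypothetical and the counts in the example are made up, not results from the talk.

```python
import math


def accuracy_q(tp, tn, fp, fn):
    """Q: fraction of all examples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """MCC: ranges from -1 to 1; returns 0.0 when the denominator is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(accuracy_q(8, 8, 2, 2), matthews_cc(8, 8, 2, 2))
```

MCC is useful alongside Q because it stays informative when positives and negatives are unbalanced, as they are when many likely negatives are included.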

  10. Amino acid alphabet reduction reduces accuracy. Grouping 20 amino acids in 10 physicochemical classes: Livingstone and Barton, 1993; Li et al., 2003. • Overall, alphabet reduction led to lower classification performance, suggesting that some sequence features that influence 14-3-3 binding were lost by the reduction.
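
Alphabet reduction replaces each residue with a symbol for its class before training. The slide does not list the groupings that were tested, so the ten classes below are a purely hypothetical example and should not be read as the Livingstone and Barton or Li et al. schemes; `EXAMPLE_GROUPS` and `reduce_alphabet` are illustrative names.

```python
# Hypothetical 10-class grouping, for illustration only.
EXAMPLE_GROUPS = ["AG", "C", "DE", "FWY", "H", "ILMV", "KR", "NQ", "P", "ST"]

# Map every residue to the first letter of its class.
RESIDUE_TO_CLASS = {
    residue: group[0] for group in EXAMPLE_GROUPS for residue in group
}


def reduce_alphabet(sequence):
    """Rewrite a sequence in the reduced alphabet; unknown symbols are kept."""
    return "".join(RESIDUE_TO_CLASS.get(residue, residue) for residue in sequence)


if __name__ == "__main__":
    # Toy window, not a real binding site.
    print(reduce_alphabet("RSATLP"))
```

In this example grouping S and T collapse into one symbol, as do K and R, which illustrates how position-specific preferences between similar residues can be lost.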

  11. Protein secondary structure, disorder and conservation do not improve the performance of the ANN. Features tested: sequence conservation; protein secondary structure by Jpred; protein disorder by IUPred, DisEMBL and GlobPlot. Legend: P, positives; N, negatives (true + likely neg); L, likely neg only; R, random neg.

  12. Blind testing shows that the PSSM is the best overall predictor: 80% overall accuracy.

  13. Prediction of new 14-3-3-binding sites using the PSSM. Human proteome.

  14. The PSSM predictor outperforms Scansite in terms of accuracy. Scansite includes a set of predictions based on the type I 14-3-3-binding motif: RSX(pS/T)XP. [Figure panels: Scansite; PSSM.]

  15. Conclusions • New strategy to map negative datasets • Performance improvement (AUC from ~0.80 to 0.88) and 80% accuracy for the PSSM model (60% sequence redundancy threshold and [-5:5] motif region) • Large-scale prediction of the human 14-3-3-binding proteome • The PSSM classifier outperforms Scansite in terms of accuracy

  16. Future work • Test training of the classifiers using non-symmetrical motif regions, e.g. [-6:3] • Investigate new machine learning algorithms such as Bayesian classifiers • Use the PSSM classifier to predict the 14-3-3-binding proteome of model organisms such as Arabidopsis thaliana • Integrate predictions in ANIA and investigate whether the candidate sites are lynchpin sites conserved across 2R-ohnologue family members

  17. Acknowledgements • Geoff Barton • Chris Cole • All members in the Computational Biology group • Carol MacKintosh and Michele Tinti
