
Hongjian Li Department of Computer Science and Engineering Chinese University of Hong Kong

Hongjian Li Department of Computer Science and Engineering Chinese University of Hong Kong 17 January 2013 hjli@cse.cuhk.edu.hk http://www.cse.cuhk.edu.hk/~hjli. Protein-Ligand Complex, e.g. 1HCL. Intermolecular interactions Van der Waals force Hydrogen bonds π ‒ π interactions etc.


Presentation Transcript


  1. Hongjian Li • Department of Computer Science and Engineering, Chinese University of Hong Kong • 17 January 2013 • hjli@cse.cuhk.edu.hk • http://www.cse.cuhk.edu.hk/~hjli

  2. Protein-Ligand Complex, e.g. 1HCL • Intermolecular interactions • Van der Waals force • Hydrogen bonds • π–π interactions • etc. • Binding affinity • Kd, Ki, IC50 • The lower, the better • −log10 K • The higher, the better
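The relation between the two affinity scales on slide 2 can be sketched in a few lines. This is a minimal illustration, assuming the constant K (Kd, Ki or IC50) is given in mol/L; the function name `pk_from_molar` and the example values are hypothetical and not taken from the talk.

```python
import math


def pk_from_molar(k_molar: float) -> float:
    """Return -log10(K) for a constant K (Kd, Ki or IC50) given in mol/L."""
    if k_molar <= 0:
        raise ValueError("K must be positive")
    return -math.log10(k_molar)


# Hypothetical example values, not taken from PDBbind.
weak = pk_from_molar(1e-3)    # 1 mM binder
strong = pk_from_molar(1e-9)  # 1 nM binder
print(weak, strong)
```

A smaller K gives a larger −log10 K, which is why the slide reads "the lower, the better" for K and "the higher, the better" for −log10 K.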

  3. PDBbind v2012 • A: experimentally determined structures as of 2012 • B: protein-ligand, DNA/RNA-ligand, protein-DNA/RNA, protein-protein complexes • C: Complexes with Kd, Ki, or IC50 • 7,121 protein-ligand complexes • D: Protein-ligand complexes with Kd or Ki • E: 90% BLAST, 67 non-redundant clusters • With highest, lowest, medium affinity

  4. Docking’s Two Purposes • Redocking, i.e. pose identification • RMSD < 2 Å • 78% success rate • Scoring • Hard to recover Kd, Ki, IC50 • R ∈ [0.2, 0.5] • R = 0.531 for Vina, R = 0.528 for idock [Figure panels: Ref., Pred., Pred.]
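The redocking criterion on slide 4 (RMSD < 2 Å) could be checked roughly as follows. This is a sketch under stated assumptions: both poses list the same atoms in the same order, coordinates are in Å in the same protein frame (so no superposition is done), and ligand symmetry is ignored. The function names are hypothetical.

```python
import math


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses.

    Each pose is a list of (x, y, z) tuples in Å with atoms in matching order.
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))


def is_redocking_success(predicted, reference, threshold=2.0):
    """True if the predicted pose lies within `threshold` Å RMSD of the reference."""
    return rmsd(predicted, reference) < threshold


# Toy two-atom ligand: one pose shifted by 1 Å, another by 3 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
far = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(near, reference), is_redocking_success(far, reference))
```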

  5. Existing Scoring Functions • Assume a predetermined theory-inspired functional form, e.g. Vina/idock

  6. Previous Work • Non-parametric machine learning • Implicitly capture binding effects that are hard to model explicitly • Deng et al., 2004 • Distance-dependent interaction frequencies • Kernel Partial Least Squares • Small external test sets, 6 or 10 compounds • Amini et al., 2007 • Support vector regression (SVR) • Family-specific scoring functions • 26 to 72 complexes, cross validation

  7. RF-Score • The first application of Random Forests (RFs) to predicting protein–ligand binding affinity • 9 common atom types for protein and ligand • Occurrences for a particular j–i atom type pair • d_cutoff = 12 Å, Z(C)=6, Z(N)=7, Z(O)=8, Z(P)=15, Z(S)=16
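The occurrence counts on slide 7 can be sketched as a pair-counting loop: for each protein atom type j and ligand atom type i, count the protein–ligand atom pairs closer than the cutoff. This is an illustrative sketch, not the authors' implementation. The assumptions: atoms are supplied as (atomic number, coordinates) tuples, the nine types are C, N, O, F, P, S, Cl, Br, I, the first index is the protein atom type, and the comparison is a strict "<". The full 9 × 9 grid is kept here; the 36 features mentioned on later slides presumably correspond to the pairs that actually occur.

```python
import math
from itertools import product

# Atomic numbers assumed for the nine common types: C, N, O, F, P, S, Cl, Br, I.
ELEMENTS = [6, 7, 8, 9, 15, 16, 17, 35, 53]
D_CUTOFF = 12.0  # Å, as on the slide


def contact_features(protein_atoms, ligand_atoms, cutoff=D_CUTOFF):
    """Count protein-ligand atom pairs within `cutoff` for every type pair.

    Atoms are (atomic_number, (x, y, z)) tuples. Returns {(j, i): count},
    where j is the protein atomic number and i the ligand atomic number.
    """
    counts = {pair: 0 for pair in product(ELEMENTS, ELEMENTS)}
    for z_protein, xyz_protein in protein_atoms:
        for z_ligand, xyz_ligand in ligand_atoms:
            key = (z_protein, z_ligand)
            if key in counts and math.dist(xyz_protein, xyz_ligand) < cutoff:
                counts[key] += 1
    return counts


# Toy complex with made-up coordinates.
protein = [(6, (0.0, 0.0, 0.0)), (7, (0.0, 0.0, 20.0))]
ligand = [(6, (0.0, 0.0, 5.0)), (8, (0.0, 0.0, 10.0))]
features = contact_features(protein, ligand)
print(features[(6, 6)], features[(6, 8)], features[(7, 8)], features[(7, 6)])
```

In the toy complex the protein nitrogen is 15 Å from the ligand carbon, so the (7, 6) count stays at zero while the other three pairs each contribute one contact.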

  8. Visualization of Feature Vector

  9. RF-Score • RF trains binary trees using the CART algorithm • RF grows each tree without pruning from a bootstrap sample of the training data • RF selects the best split at each node of the tree from a typically small number (mtry) of randomly chosen features, mtry ∈ {2,…,36} • RF stops splitting a node with <= 5 samples • Prediction from an individual tree is the arithmetic mean of its samples in a leaf node • P = 500 trees, RF prediction is the arithmetic mean
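The sampling steps on slide 9 can be illustrated without growing real trees. The sketch below covers only three pieces: drawing a bootstrap sample (the samples left out form the out-of-bag set), picking mtry random candidate features at a node, and averaging per-tree predictions. The CART split search itself is omitted, and all function names are hypothetical.

```python
import random
from statistics import mean


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; the rest are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def candidate_features(n_features, mtry, rng):
    """Pick mtry distinct feature indices to consider for one node split."""
    return rng.sample(range(n_features), mtry)


def forest_predict(tree_predictions):
    """Forest output is the arithmetic mean of the individual tree outputs."""
    return mean(tree_predictions)


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_indices(100, rng)
split_candidates = candidate_features(36, 6, rng)
print(len(set(in_bag)), len(out_of_bag), split_candidates)
print(forest_predict([5.0, 7.0]))
```

On average roughly a third of the training samples end up out-of-bag for any one tree, which is what makes the internal validation on the next slide possible.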

  10. RF-Score • Out-of-bag (OOB) data as internal validation • Possible mtry values cover all the feature subset sizes, i.e. {2,…,36} • Training set: PDBbind v2007 refined set minus core set, N = 1105

  11. Prediction Accuracy • R = 0.953, RMSE = 0.74 • R_OOB = 0.699 • RMSE_OOB = 1.52 • R = 0.776, RMSE = 1.58 • Close to OOB
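The two metrics reported on slide 11 are Pearson's correlation coefficient R and the root-mean-square error RMSE between measured and predicted affinities. A plain-Python sketch of both, using made-up numbers rather than any PDBbind data:

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_true = sum((t - mean_true) ** 2 for t in y_true)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return cov / math.sqrt(var_true * var_pred)


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Made-up values: predictions are perfectly correlated but badly scaled.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(measured, predicted), rmse(measured, predicted))
```

The toy case shows why both numbers are reported: R is insensitive to scale and offset, so a high R can coexist with a large RMSE.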

  12. y-Scrambling Validation • To eliminate chance correlation • Random permutation of y-data • Over 10 independent trials on the test set • R = −0.018 with standard deviation S_R = 0.095 • RMSE = 2.42 with S_RMSE = 0.04 • Conclusion: chance correlation is negligible
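The y-scrambling procedure on slide 12 might look like the sketch below. Several things are assumptions here: the training labels are what gets permuted, the model is refitted for each trial, and evaluation uses the true test labels. A 3-nearest-neighbour regressor on one feature stands in for the Random Forest purely to keep the example short, and the data are synthetic.

```python
import math
import random
from statistics import mean, stdev


def pearson_r(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def knn_predict(x_train, y_train, x, k=3):
    """Stand-in model (NOT a Random Forest): mean y of the k nearest points."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x))[:k]
    return mean(y_train[i] for i in nearest)


def y_scrambling(x_train, y_train, x_test, y_test, trials=10, seed=0):
    """Refit on permuted training labels; return mean and SD of test-set R."""
    rng = random.Random(seed)
    correlations = []
    for _ in range(trials):
        y_permuted = list(y_train)
        rng.shuffle(y_permuted)  # breaks the x-y relationship
        predictions = [knn_predict(x_train, y_permuted, x) for x in x_test]
        correlations.append(pearson_r(y_test, predictions))
    return mean(correlations), stdev(correlations)


# Synthetic data with a perfectly learnable relationship.
x_train = [i / 10 for i in range(40)]
y_train = [2 * x + 1 for x in x_train]
x_test = [0.05 + i / 5 for i in range(20)]
y_test = [2 * x + 1 for x in x_test]

r_real = pearson_r(y_test, [knn_predict(x_train, y_train, x) for x in x_test])
r_scrambled_mean, r_scrambled_sd = y_scrambling(x_train, y_train, x_test, y_test)
print(r_real, r_scrambled_mean, r_scrambled_sd)
```

If the model trained on true labels scores far above the scrambled runs, the correlation is unlikely to be due to chance, which is the conclusion the slide draws.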

  13. Data Size Matters • RMSE_OOB gets closer to RMSE as N_train increases

  14. Variable Importance • x-Scrambling • %IncMSE > 20 • X6,6 • Hydrophobic interactions • X7,6, X8,6, X16,6, X6,8 • Polar–non-polar contacts • X8,8, X7,8, X7,7, X8,7 • Hydrogen bonds • Z(C)=6,Z(N)=7,Z(O)=8,Z(P)=15,Z(S)=16
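The x-scrambling idea on slide 14 amounts to permuting one feature column and measuring how much the prediction error grows. The sketch below reports a plain percentage increase in MSE; the exact normalisation behind the %IncMSE values on the slide may differ. The model is a hypothetical stand-in that uses only feature 0, and the data are synthetic.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of `model` (a callable on one row) over a data set."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(targets)


def percent_inc_mse(model, rows, targets, feature, rng):
    """Percentage increase in MSE after shuffling one feature column."""
    baseline = mse(model, rows, targets)
    column = [row[feature] for row in rows]
    rng.shuffle(column)
    scrambled = [
        row[:feature] + [value] + row[feature + 1:]
        for row, value in zip(rows, column)
    ]
    return 100.0 * (mse(model, scrambled, targets) - baseline) / baseline


def toy_model(row):
    """Hypothetical fitted model: uses feature 0 and ignores feature 1."""
    return 2.0 * row[0]


# Synthetic rows [informative feature, irrelevant feature] with small noise.
rows = [[float(i), float(i % 3)] for i in range(20)]
targets = [2.0 * i + (0.1 if i % 2 == 0 else -0.1) for i in range(20)]

rng = random.Random(0)
importance_0 = percent_inc_mse(toy_model, rows, targets, feature=0, rng=rng)
importance_1 = percent_inc_mse(toy_model, rows, targets, feature=1, rng=rng)
print(importance_0, importance_1)
```

Scrambling the feature the model relies on inflates the error, while scrambling the ignored feature leaves it unchanged; ranking features by this increase gives the kind of ordering shown on the slide.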

  15. Less Coverage of Atom Types • Only the 3 most common atom types {C,N,O} • 36 features → 9 features • Conclusion • No large performance decrease • Test cases with {F,P,S,Cl,Br,I} account for the difference

  16. Comparison w/ the State of the Art • Test set: PDBbind v2007 core set, N = 195 • Same training and test sets for top 3 scores

  17. Conclusion • RF-Score via non-parametric machine learning • Circumvent the need for modelling assumptions • High correlation on a diverse test set • Drawback • Interpretability of features in terms of interactions • Future work • Distance-dependent features • Atom’s hybridization state & bonding environment

  18. Non-Standard AAs & Metal Ions [Figure panels: Ref., Vina, idock]

  19. Q & A
