
Classification of Drugs by SVM

CZ3253: Computer Aided Drug design Lecture 7: Drug Design Methods II: SVM Prof. Chen Yu Zong Tel: 6874-6877 Email: csccyz@nus.edu.sg http://xin.cz3.nus.edu.sg Room 07-24, level 7, SOC1, National University of Singapore. Classification of Drugs by SVM.



Presentation Transcript


  1. CZ3253: Computer Aided Drug Design. Lecture 7: Drug Design Methods II: SVM. Prof. Chen Yu Zong. Tel: 6874-6877. Email: csccyz@nus.edu.sg. http://xin.cz3.nus.edu.sg. Room 07-24, level 7, SOC1, National University of Singapore

  2. Classification of Drugs by SVM • A drug is classified as either belonging (+) or not belonging (-) to a class. Examples of drug classes: inhibitor of a protein, blood-brain barrier (BBB) penetrating, genotoxic. Examples of protein classes: enzyme EC 3.4 family, DNA-binding. • By screening against all classes, the property of a drug or the function of a protein can be identified. [Diagram: a drug is screened by one SVM per class: Class-1 SVM (-), Class-2 SVM (-), Class-3 SVM (+), so the drug belongs to Family-3.]
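The screening idea on this slide can be sketched in Python. This is only an illustration, assuming every class already has a trained linear decision function; the class names, weights, biases and the `screen` helper are hypothetical and do not come from the lecture.

```python
def decision_value(weights, bias, features):
    """Signed score of a linear classifier: a positive value means 'belongs' (+)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def screen(features, classifiers):
    """Return the names of all classes whose classifier votes (+)."""
    return [name for name, (weights, bias) in classifiers.items()
            if decision_value(weights, bias, features) > 0]


# Hypothetical, hand-picked (weights, bias) pairs, one per class.
classifiers = {
    "Class-1": ([1.0, -1.0, 0.0], -0.5),
    "Class-2": ([-1.0, 0.0, 1.0], -1.5),
    "Class-3": ([0.0, 1.0, 1.0], -1.5),
}

drug = [0.0, 1.0, 1.0]  # hypothetical feature vector of one drug
hits = screen(drug, classifiers)
```

With these made-up numbers only the Class-3 classifier returns a positive score, which mirrors the (-), (-), (+) pattern in the diagram.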

  3. Classification of Drugs or Proteins by SVM What is SVM? • A support vector machine (SVM) is a statistical machine learning method that learns from examples and classifies objects into one of two classes. Advantages of SVM: • Handles diversity among class members (members need not resemble one another). • Uses structure-derived physico-chemical features as the basis for drug classification (no structure similarity is required by the algorithm).

  4. SVM References • C. Burges, "A tutorial on support vector machines for pattern recognition", Data Mining and Knowledge Discovery, Kluwer Academic Publishers, 1998 (on-line). • R. Duda, P. Hart, and D. Stork, Pattern Classification, John Wiley, 2nd edition, 2001 (section 5.11, hard copy). • S. Gong et al., Dynamic Vision: From Images to Face Recognition, Imperial College Press, 2001 (sections 3.6.2, 3.7.2, hard copy). • Online lecture notes (http://www.cs.unr.edu/~bebis/MathMethods/SVM/lecture.pdf). • Publications on SVM drug prediction: • J. Chem. Inf. Comput. Sci. 44, 1630 (2004) • J. Chem. Inf. Comput. Sci. 44, 1497 (2004) • Toxicol. Sci. 79, 170 (2004).

  5. Machine Learning Method. Inductive learning: example-based learning. [Figure: descriptors are computed for positive examples and negative examples.]

  6. Machine Learning Method. Feature vectors: the descriptors of each positive or negative example form a feature vector: A=(1, 1, 1), B=(0, 1, 1), C=(1, 1, 1), D=(0, 1, 1), E=(0, 0, 0), F=(1, 0, 1).

  7. SVM Method. Feature vectors in input space: A=(1, 1, 1), B=(0, 1, 1), C=(1, 1, 1), D=(0, 1, 1), E=(0, 0, 0), F=(1, 0, 1). [Figure: points A, B, E, F plotted in an input space with axes X, Y, Z.]
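To make the step from feature vectors to a separating border concrete, here is a minimal sketch that trains a linear SVM on the six vectors A to F. The slides do not say which vectors are positive or negative, so the labels below (A to D positive, E and F negative) are an assumption, as are the function names and the choice of plain subgradient descent on the hinge loss in place of a quadratic programming solver.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.01, steps=5000):
    """Batch subgradient descent on the regularized hinge loss.

    Returns the (w, b) with the lowest objective seen during training.
    """
    dim, n = len(points[0]), len(points)
    w, b = [0.0] * dim, 0.0
    best = (float("inf"), w[:], b)
    for _ in range(steps):
        margins = [y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
                   for x, y in zip(points, labels)]
        objective = (0.5 * lam * sum(wi * wi for wi in w)
                     + sum(max(0.0, 1.0 - m) for m in margins) / n)
        if objective < best[0]:
            best = (objective, w[:], b)
        # Subgradient: regularizer plus every point that violates the margin.
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y, m in zip(points, labels, margins):
            if m < 1.0:
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return best[1], best[2]


# Feature vectors A..F from the slide; the +1/-1 labels are assumed.
points = [(1, 1, 1), (0, 1, 1), (1, 1, 1), (0, 1, 1), (0, 0, 0), (1, 0, 1)]
labels = [+1, +1, +1, +1, -1, -1]

w, b = train_linear_svm(points, labels)


def predict(x):
    """Classify a feature vector by the sign of w.x + b."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


predictions = [predict(x) for x in points]
```

Under the assumed labels the two groups differ only in the second coordinate, so a plane perpendicular to the Y axis is enough to separate them.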

  8. SVM Method. Project to a higher dimensional space. [Figure: protein family members and nonmembers, shown with the original border and with the new border after projection.]

  9. SVM Method. [Figure: protein family members and nonmembers separated by the new border; the support vectors are marked.]

  10. SVM Method. [Figure: protein family members and nonmembers separated by the new border; the support vectors are marked.]

  11. Best Linear Separator?

  12. Best Linear Separator?

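  13. Find Closest Points in Convex Hulls. [Figure: closest points c and d of the two convex hulls.]

  14. Plane Bisect Closest Points. [Figure: the separating plane bisects the segment between c and d.]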
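Once the closest points c and d are known, the bisecting plane follows directly. The sketch below assumes c and d are already given (finding them is the quadratic program mentioned on the next slide); the coordinates and function names are hypothetical.

```python
def bisecting_plane(c, d):
    """Plane w.x + b = 0 that perpendicularly bisects the segment from d to c."""
    w = [ci - di for ci, di in zip(c, d)]                  # normal points from d to c
    midpoint = [(ci + di) / 2.0 for ci, di in zip(c, d)]
    b = -sum(wi * mi for wi, mi in zip(w, midpoint))       # plane passes through midpoint
    return w, b


def side(w, b, x):
    """Signed value of w.x + b: positive on c's side, negative on d's side."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


c = [2.0, 2.0]  # hypothetical closest point of one convex hull
d = [0.0, 0.0]  # hypothetical closest point of the other convex hull
w, b = bisecting_plane(c, d)
```

The sign of `side(w, b, x)` then tells which hull a new point x lies closer to.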

  15. Find using quadratic program. Many existing and new solvers.

  16. Best Linear Separator: Supporting Plane Method. Maximize the distance between two parallel supporting planes. Distance = "margin".
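For reference, the margin has a simple closed form in the standard textbook formulation. The symbols $w$, $b$, $x_i$ and $y_i \in \{+1, -1\}$ are assumptions here and are not taken from the slide. If the two supporting planes are scaled to $w \cdot x + b = +1$ and $w \cdot x + b = -1$, then:

```latex
\text{margin} = \frac{(+1) - (-1)}{\lVert w \rVert} = \frac{2}{\lVert w \rVert},
\qquad
\max_{w,b}\ \frac{2}{\lVert w \rVert}
\;\Longleftrightarrow\;
\min_{w,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to } y_i\,(w \cdot x_i + b) \ge 1 \ \text{for all } i .
```

Maximizing the margin is therefore the same as minimizing $\lVert w \rVert^{2}$ under linear constraints, which is a quadratic program.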

  17. Best Linear Separator?

  18. SVM Method Border line is nonlinear

  19. Non-linear transformation: use of kernel function SVM method
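A kernel function lets the classifier work in the transformed space without computing the transformation explicitly. The sketch below uses the RBF (Gaussian) kernel, which later slides name for the bagged models; the support vectors, coefficients and `gamma` value are made up for illustration and are not fitted to any data.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def kernel_decision(x, support_vectors, coeffs, bias, gamma=1.0):
    """f(x) = sum_i coeff_i * K(sv_i, x) + bias, with coeff_i = alpha_i * y_i."""
    return sum(a * rbf_kernel(sv, x, gamma)
               for sv, a in zip(support_vectors, coeffs)) + bias


# Hypothetical XOR-like layout: no straight line separates the two classes.
support_vectors = [(0, 0), (1, 1), (0, 1), (1, 0)]
coeffs = [1.0, 1.0, -1.0, -1.0]   # assumed alpha_i * y_i values
bias = 0.0

score_corner = kernel_decision((0, 0), support_vectors, coeffs, bias)
score_other = kernel_decision((0, 1), support_vectors, coeffs, bias)
```

The two scores have opposite signs even though the four points are not linearly separable in the input space, which is the situation the non-linear border on the slide depicts.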

  20. SVM method Non-linear transformation

  21. SVM Method

  22. SVM Method

  23. SVM Method

  24. SVM Method

  25. SVM for Classification of Drugs How to represent a drug? • Each structure is represented by a specific feature vector assembled from structural and physico-chemical properties: • Simple molecular properties (molecular weight, no. of rotatable bonds, etc.; 18 in total) • Molecular connectivity and shape (28 in total) • Electro-topological state and polarity (84 in total) • Quantum chemical properties (electric charge, polarizability, etc.; 13 in total) • Geometrical properties (molecular size vector, van der Waals volume, molecular surface, etc.; 16 in total) J. Chem. Inf. Comput. Sci. 44, 1630 (2004) J. Chem. Inf. Comput. Sci. 44, 1497 (2004) Toxicol. Sci. 79, 170 (2004).
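The five descriptor groups listed on this slide add up to 18 + 28 + 84 + 13 + 16 = 159 values per structure. A minimal sketch of assembling them into one feature vector follows; the group keys and the `assemble_feature_vector` helper are hypothetical, and the zeros stand in for real descriptor values that a chemistry toolkit would supply.

```python
# Group sizes as listed on the slide; the key names are made up.
DESCRIPTOR_GROUPS = {
    "simple_molecular": 18,
    "connectivity_shape": 28,
    "electrotopological_state": 84,
    "quantum_chemical": 13,
    "geometrical": 16,
}


def assemble_feature_vector(groups):
    """Concatenate the descriptor groups in a fixed order, checking their sizes."""
    vector = []
    for name, size in DESCRIPTOR_GROUPS.items():
        values = groups[name]
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        vector.extend(values)
    return vector


# Placeholder descriptor values for one hypothetical structure.
example = {name: [0.0] * size for name, size in DESCRIPTOR_GROUPS.items()}
vector = assemble_feature_vector(example)
```

Keeping the group order fixed matters because the SVM treats each position in the vector as the same feature for every drug.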

  26. SVM Feature Selection. CACO2 - 718 descriptors, average of 10 models. Q2 is MSE scaled by variance: Q2 = (mean square error) / (true variance). Test Q2 = .7073
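The Q2 definition above translates directly into code. This sketch assumes the "true variance" is the population variance of the observed values (dividing by n); the function name is made up.

```python
def q2(y_true, y_pred):
    """Q2 = mean squared error / variance of the true values (lower is better)."""
    n = len(y_true)
    mean = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    variance = sum((t - mean) ** 2 for t in y_true) / n
    return mse / variance


perfect = q2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])     # predictions match exactly
mean_only = q2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])   # always predict the mean
```

Under this definition a perfect model scores 0 and a model that always predicts the mean scores 1, so smaller test Q2 values indicate better models.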

  27. Feature Selection. Using a subset of descriptors might greatly improve results. • Do feature selection using a linear SVM with 1-norm regularization. [Figure: 2-norm versus 1-norm.]

  28. Feature Selection via Sparse SVM/LP • Construct linear ν-SVM using 1-norm LP. • Pick best C, ν for SVM. • Keep descriptors with nonzero coefficients.

  29. Bagged Feature Selection. [Flowchart: partition the training data into a training set and a validation set (random variable r); run the linear SVM algorithm for feature selection to obtain a linear regression model; repeat B times; bag the B models and obtain a subset of features.]
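The bagging loop in this flowchart can be sketched as follows. The slide does not say how the B feature subsets are combined, so the majority-vote rule (`min_votes`) is an assumption, and `toy_selector` is only a stand-in for the sparse linear SVM; all names and data here are hypothetical.

```python
import random


def bagged_feature_selection(rows, select_features, n_bags=10,
                             train_fraction=0.8, min_votes=0.5, seed=0):
    """Repeat random partitions, select features on each, keep frequent ones."""
    rng = random.Random(seed)
    votes = {}
    for _ in range(n_bags):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(train_fraction * len(shuffled))
        train = shuffled[:cut]  # shuffled[cut:] would be the validation set
        for feature in select_features(train):
            votes[feature] = votes.get(feature, 0) + 1
    return sorted(f for f, v in votes.items() if v / n_bags >= min_votes)


def toy_selector(train):
    """Stand-in for the sparse linear SVM: keep features whose class means differ."""
    pos = [x for x, y in train if y > 0]
    neg = [x for x, y in train if y < 0]
    if not pos or not neg:
        return []
    keep = []
    for j in range(len(train[0][0])):
        gap = abs(sum(x[j] for x in pos) / len(pos)
                  - sum(x[j] for x in neg) / len(neg))
        if gap > 0.5:
            keep.append(j)
    return keep


# Toy data: feature 0 tracks the label, feature 1 is constant noise.
rows = [([1.0, 1.0], +1)] * 5 + [([0.0, 1.0], -1)] * 5
selected = bagged_feature_selection(rows, toy_selector)
```

In a real pipeline the selector would fit the 1-norm SVM on each training partition and return the descriptors with nonzero coefficients, with the validation partition used to tune its parameters.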

  30. Bagged SVM (RBF). CACO2 - 31 descriptors. Test Q2 = .134

  31. Starplot Caco2 - 31 Descriptors: SlogP.VSA0, ABSDRN6, DRNB10, DRNB00, PIPB04, PEOE.VSA.FHYD, PEOE.VSA.FNEG, a.don, KB11, PEOE.VSA.4, BNPB31, PEOE.VSA.FPOL, PEOE.VSA.PPOS, FUKB14, KB54, SlogP.VSA6, PIPMAX, EP2, PEOE.VSA.FPPOS, SMR.VSA2, ANGLEB45, apol, BNPB50, SlogP.VSA9, pmiZ, BNP8, PIPB53, ABSFUKMIN, BNPB21, ABSKMIN, SIKIA

  32. [Flowchart labels: Feature Selection; Visualize Features; Assess Chemistry; Chemistry In/Out Modeling; Data + Descriptors; Test Data; Chemistry Interpretation; SVM Model; Construct SVM Nonlinear Model; Predict Bioactivities.]

  33. Bagged SVM (RBF). CACO2 - 15 descriptors. Test Q2 = .166

  34. CACO2 - 15 Variables: a.don, DRNB10, PEOE.VSA.FNEG, BNPB31, KB54, ABSDRN6, ABSKMIN, FUKB14, SMR.VSA2, PEOE.VSA.FPPOS, SIKIA, SlogP.VSA0, ANGLEB45, DRNB00, pmiZ

  35. Chemical Insights • Hydrophobicity: a.don • Size and shape: ABSDRN6, SMR.VSA2, ANGLEB45, pmiZ. Large is bad. Flat is bad. Globular is good. • Polarity: PEOE.VSA.FPPOS, PEOE.VSA.FNEG; negative partial charge is good. These correspond to conventional wisdom (rule of 5).

  36. Hybrid TAE/SHAPE • Shape is an important overall factor • DRNB10, DRNB00: del rho dot N • BNPB31: bare nuclear potential • KB54: kinetic energy descriptors; very large lipophilic molecules don't work • FUKB14: Fukui surface • Interpretations are difficult • They point to chemistry challenges/hypotheses

  37. Final SVM Approach • Construct large set of descriptors. • Perform feature selection: • Sensitivity Analysis or SVM-LP • Construct many SVM models • Optimize using QP or LP • Evaluate by Validation Set or Leave-one-out • Select best models by grid or pattern search • Bag best k models to create final function
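The last step, bagging the best k models, can be sketched as below. The slide does not specify how the models are combined, so averaging their predictions is an assumption; the models here are trivial placeholder functions with made-up validation errors.

```python
def bag_best_models(models, k):
    """Average the predictions of the k models with the lowest validation error.

    models: list of (validation_error, predict_fn) pairs.
    """
    best = sorted(models, key=lambda m: m[0])[:k]

    def bagged(x):
        return sum(predict(x) for _, predict in best) / len(best)

    return bagged


# Hypothetical candidates, e.g. from a grid search over SVM parameters.
models = [
    (0.30, lambda x: 1.0),
    (0.10, lambda x: x),
    (0.20, lambda x: x + 2.0),
]

final_function = bag_best_models(models, k=2)
prediction = final_function(1.0)
```

Here the two models with errors 0.10 and 0.20 are kept and the one with error 0.30 is dropped before averaging.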

  38. Drug Discovery Results (LOO)

  39. SVM-based drug design and property prediction software. Useful for inhibitor/activator/substrate prediction, drug safety and pharmacokinetic prediction. [Workflow: the chemical structure of your drug is entered in one of two ways, either through the internet (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi) or on a local machine (a computer loaded with SVMProt). The structure is sent to a support vector machine classifier for every drug class to answer "Which class does your drug belong to?". Output: identified classes; drug designed or property predicted.] J. Chem. Inf. Comput. Sci. 44, 1630 (2004) J. Chem. Inf. Comput. Sci. 44, 1497 (2004) Toxicol. Sci. 79, 170 (2004).

  40. SVM Drug Prediction Results • Protein inhibitor/activator/substrate prediction: • 86% of the 129 estrogen receptor activators and 84% of 101 non-activators correctly predicted. • 81% of 116 P-glycoprotein substrates and 79% of 85 non-substrates correctly predicted. • Drug toxicity prediction: • 97% of 102 TdP+ and 84% of 243 TdP- agents correctly predicted. • 73% of 229 genotoxic and 93% of 631 non-genotoxic agents correctly predicted. • Pharmacokinetics prediction: • 95% of 276 BBB+ and 82% of 139 BBB- agents correctly predicted. • 90% of 131 human intestine absorption and 80% of 65 non-absorption agents correctly predicted. • J. Chem. Inf. Comput. Sci. 44, 1630 (2004) • J. Chem. Inf. Comput. Sci. 44, 1497 (2004) • Toxicol. Sci. 79, 170 (2004).
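The per-class percentages above can be combined into an overall accuracy by weighting each rate by its class size. The sketch below does this for two of the reported data sets; because the slide gives rounded percentages, the results are approximate, and the function name is made up.

```python
def overall_accuracy(pos_rate, n_pos, neg_rate, n_neg):
    """Weighted average of the per-class rates of correct prediction."""
    correct = pos_rate * n_pos + neg_rate * n_neg
    return correct / (n_pos + n_neg)


# Estrogen receptor activators (129) vs non-activators (101).
er_accuracy = overall_accuracy(0.86, 129, 0.84, 101)

# BBB+ agents (276) vs BBB- agents (139).
bbb_accuracy = overall_accuracy(0.95, 276, 0.82, 139)
```

With the rounded inputs this gives roughly 85% overall for the estrogen receptor set and roughly 91% for the BBB set.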
