Derrick DeConti - PAINS train

A machine learning approach to identifying false positives in chemical drug screens.

Presentation Transcript


  1. PAINS Train: Identifying false positives in drug screens. Derrick DeConti, Insight Health Data Science

  2. Drug Screens Target of Interest

  3. Non-specific (promiscuous) vs. Specific [figure: compounds shown against Target A, Target B, Target C, Target D]

  4. Pan Assay Interference Compounds (PAINS)

  5. Current Practice ●Poor sensitivity ●Experience – Learned heuristics ●Poor precision ●Structural similarity – Hard filters ●High FDR
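
The three quantities named on this slide (sensitivity, precision, FDR) all follow from confusion-matrix counts. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and do not come from the presentation.

```python
def screening_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute sensitivity, precision, and false discovery rate (FDR).

    tp: compounds flagged as PAINS that really are PAINS
    fp: compounds flagged as PAINS that are not
    fn: PAINS compounds the filter missed
    """
    flagged = tp + fp          # everything the filter called PAINS
    actual = tp + fn           # everything that truly is PAINS
    sensitivity = tp / actual if actual else 0.0
    precision = tp / flagged if flagged else 0.0
    fdr = fp / flagged if flagged else 0.0   # FDR = 1 - precision
    return {"sensitivity": sensitivity, "precision": precision, "fdr": fdr}


# Hypothetical counts, for illustration only.
metrics = screening_metrics(tp=30, fp=70, fn=20)
```

Because FDR is the complement of precision, a filter with poor precision necessarily has a high FDR, which is why the two appear together on the slide.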

  6. Machine Learning Approach ●Based on chemical structure ●Creates own substructure classification ●Provide a likelihood of promiscuity

  7. Demo

  8. Labeled Data PAINS Non-PAINS – Baell 2010 [1] – ChEMBL – Empirically derived [1] Baell, J. B., & Holloway, G. A. (2010). New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. Journal of Medicinal Chemistry, 53(7), 2719–2740. doi: 10.1021/jm901137j

  9. Transformation of Data with RDKit [ 1 0 0 ..., 0 1 0] 2,048 bit vector
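
The slide does not say which RDKit fingerprint was used, so the following is not RDKit code. It is a toy, standard-library-only stand-in that illustrates the general idea of hashing structural fragments into a fixed 2,048-bit vector. Treating character n-grams of a SMILES string as "fragments" is an assumption made purely for illustration; a real fingerprint works on the molecular graph.

```python
import hashlib

N_BITS = 2048  # vector length stated on the slide


def toy_fingerprint(smiles: str, n_bits: int = N_BITS, max_len: int = 3) -> list:
    """Hash every substring of length 1..max_len into a fixed-size bit vector.

    This is a toy illustration of a hashed (folded) fingerprint,
    not a chemically meaningful descriptor.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # A stable hash, so the same fragment always maps to the same bit.
            digest = hashlib.sha1(fragment.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % n_bits
            bits[index] = 1
    return bits


# Aspirin, written as a SMILES string.
fp = toy_fingerprint("CC(=O)Oc1ccccc1C(=O)O")
```

Each compound becomes a binary row of equal length, which is the form the clustering and classification steps on later slides operate on.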

  10. Clustering of PAINS

  11. Clustering of PAINS Overlap in cluster-based classification

  12. Predictive Classification Random Forest ●Binary format of data ●Interdependence within vector Testing ●5-fold cross-validation ●20% left out for validation set
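
One way to read the testing scheme on this slide is: set aside 20% of the labeled compounds as a final validation set, then run 5-fold cross-validation on the remaining 80%. The sketch below implements that reading with index lists only. The function name, the seed, and the interpretation itself are assumptions; the slide does not spell out the exact procedure.

```python
import random


def holdout_and_folds(n_samples: int, holdout_fraction: float = 0.2,
                      n_folds: int = 5, seed: int = 0):
    """Return (holdout_indices, [(train_indices, test_indices), ...])."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Final validation set: never seen during cross-validation.
    n_holdout = int(round(n_samples * holdout_fraction))
    holdout = indices[:n_holdout]
    remaining = indices[n_holdout:]

    # Deal the remaining indices into n_folds roughly equal folds.
    folds = [remaining[i::n_folds] for i in range(n_folds)]

    splits = []
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train_idx, test_idx))
    return holdout, splits


holdout, splits = holdout_and_folds(100)
```

With 100 samples this yields a 20-sample validation set and five train/test splits of 64 and 16 samples each.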

  13. Secondary Chemical Set ●Neglected tropical disease compound set ●Composed of two structural classes ●Drug-like molecules

  14. Validation ●Spike in PAINS – Into secondary chemical set ●Test previously trained random forest – Versus cluster based method
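
The spike-in idea can be sketched as follows: mix a known number of PAINS into the secondary set, run a classifier over the mixture, and count how many spiked compounds it recovers. All names here are hypothetical, the compound identifiers are placeholders, and the `predict` callable stands in for whichever method (random forest or cluster-based) is being tested.

```python
import random


def spike_in(secondary, pains, n_spike, seed=0):
    """Mix n_spike randomly chosen PAINS into the secondary set.

    Returns a shuffled list of (compound, label) pairs, label 1 = PAINS.
    """
    rng = random.Random(seed)
    spiked = rng.sample(pains, n_spike)
    labeled = [(c, 0) for c in secondary] + [(c, 1) for c in spiked]
    rng.shuffle(labeled)
    return labeled


def count_outcomes(labeled, predict):
    """Tally confusion-matrix counts for a predict(compound) -> bool callable."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for compound, label in labeled:
        flagged = bool(predict(compound))
        if flagged and label == 1:
            counts["tp"] += 1
        elif flagged and label == 0:
            counts["fp"] += 1
        elif not flagged and label == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Placeholder identifiers; a perfect "classifier" is used only to show the flow.
secondary_set = [f"ntd_{i}" for i in range(20)]
pains_set = [f"pains_{i}" for i in range(10)]
mixed = spike_in(secondary_set, pains_set, n_spike=5)
outcomes = count_outcomes(mixed, predict=lambda name: name.startswith("pains_"))
```

Running the same mixture through two different `predict` callables gives directly comparable counts for the two methods.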

  15. Comparison of Methods

  16. FDA Drug Promiscuity ●1,055 FDA approved drugs – 38 classified as promiscuous ●86% cluster closely

  17. About Me Bioinformatics Genomics

  18. Random Forest Optimization

  19. Secondary Chemical Set

  20. K-means clustering within PAINS
