Computational Approaches Used With Industry Provided Repurposing Candidates - Uses in Rare and Neglected Diseases Sean Ekins 1,2 , Christopher Southan 3 , Michael Travers 1 , Antony J. Williams 4 , Joel S. Freundlich 5, 6 , Barry A. Bunin 1 and Alex M. Clark 7
Computational Approaches Used With Industry Provided Repurposing Candidates - Uses in Rare and Neglected Diseases
Sean Ekins1,2 , Christopher Southan3, Michael Travers1, Antony J. Williams4, Joel S. Freundlich5, 6, Barry A. Bunin1 and Alex M. Clark7
1 Collaborative Drug Discovery, 1633 Bayshore Hwy, Suite 342, Burlingame, CA 94010, U.S.A., 2 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay Varina, NC 27526, U.S.A., 3TW2Informatics Limited, Goteborg, 42166, Sweden, 4 Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, NC 27587, USA. 5 Department of Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ – New Jersey Medical School, 185 South Orange Avenue Newark, NJ 07103, USA.6 Department of Microbiology, Immunology and Pathology, Colorado State University, 200 West Lake Street, CO 80523, USA.6Molecular Materials Informatics, 1900 St. Jacques #302, Montreal, Quebec, Canada H3J 2S1.
Recent repurposing project tendering calls by the National Center for Advancing Translational Sciences (US) and the Medical Research Council (UK) have included compound information and pharmacological data. However none of the internal company development code names were assigned to chemical structures in the official documentation. We now describe data gathering and curation to assign structures and in silico analysis. We also describe how this data has been shared using Mobile apps and how collaborative software can facilitate target predictions by integrating to public data sources. These efforts suggest potential new uses for molecules that can be tested in vitro.
We are currently seeing a shift towards drug repositioning or repurposing  as a strategy to find new uses for previously approved drugs and “parked” or “off the shelf” molecules which have reached the clinic without any safety signals but did not show efficacy against their intended primary disease target. Both NCATS and MRC have released sets of compounds from drug companies without structures representing 56 and 22 small molecules, respectively . We have attempted to collate these molecular structures then use them with computational machine learning methods and similarity based methods to predict potential bioactivity. This has potential for quick hypothesis testing and triage of ideas, as well as highlighting potential molecules for rare and neglected diseases, in which the resources are limited for testing.
We have identified molecular structures for the majority of MRC and NCATS compounds included in the industry provided repurposing candidate datasets that were not available previously . We have demonstrated how mobile apps  and collaborative software  can be used to share the molecules and data (Box 1) as well as to suggest new uses for the compounds, currently under evaluation. All of these technologies and the datasets we have created are accessible now.
If we are to enable in silico approaches to be used as described here for repurposing candidates, it is important that such efforts in future release the structures to prevent replication of efforts and errors . Our ongoing work will aim to identify whether selected compounds have activity against neglected and rare diseases and accelerate this discovery process. Such computational tools may represent a disruptive strategy involving active collaboration .
 Ekins S, Williams AJ, Krasowski MD and Freundlich JS, In silico repositioning of approved drugs for rare and neglected diseases, Drug Disc Today, 16: 298-310, 2011.
 Southan C, Williams AJ and Ekins S, Challenges and Recommendations for obtaining chemical structures of industry provided repurposing candidates, Drug Disc Today, 18: 58-70, 2013.
 Ekins S, Clark AM and Williams AJ,Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration, Mol Informatics, 31: 585-597, 2012.
 Ekins S, Reynolds RC, Kim H, Koo M-S, Ekonomidis M, Talaue M, Paget SD, Woolhiser LK Lenaerts AJ, Bunin BA, Connell N and Freundlich JS. Novel Bayesian leveraging bioactivity and cytotoxicity information for drug discovery, Chem Biol, In Press 2013.
 Ekins S, Reynolds RC,, Franzblau SG, Wan B, Freundlich JS and Bunin BA. Enhancing hit identification in Mycobacterium tuberculosis drug discovery using validated dual-event Bayesian models, Submitted 2012
 Hohman M, Gregory K, Chibale K, Smith PJ, Ekins S and Bunin B, Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery, Drug Disc Today, 14: 261-270, 2009.
 Williams AJ, Ekins S and Tkachenko V, Towards a Gold Standard: Regarding Quality in Public Domain Chemistry Databases and Approaches to Improving the Situation, Drug Discovery Today,17 (13-14):685-701, 2012.
 Ekins S, Waller CL, Bradley MP, Clark AM and Williams AJ, Four disruptive strategies for removing drug discovery bottlenecks, Drug Discovery Today, In Press 2013.
The CDD API was supported by Award Number 2R42AI088893-02 “Identification of novel therapeutics for tuberculosis combining cheminformatics, diverse databases and logic based pathway analysis” from the National Institutes of Allergy and Infectious Diseases.
Methods and Results
We have previously described in detail the painstaking process to curate the molecules . 41 NCATS compounds and 12 MRC compounds were identified with structures (see box). These molecules were then tweeted from the Mobile Molecular DataSheet and can be readily accessed in the Open Drug Discovery Teams mobile app ( Figure 1) These molecules were scored with three M. tuberculosis Bayesian Models [4,5] and a malaria model developed in Discovery Studio (Accelrys, San Diego CA).
The molecules have also been uploaded in a CDD vault  and used via an API to connect with similar molecules (>0.8 Tanimoto similarity) in ChEMBL. This has enabled us to predict potential targets for the compounds (Figure 2) and provide links to these.
We have also assessed the commercial availability of the molecules from vendors and added the information in CDD. To date we have ordered 3 molecules predicted to have some M. tuberculosis activity and these include a known kinase inhibitor (Figure 3). These will be tested in vitro against potential bacterial targets as described previously [4,5].
Figure 2. How CDD can be used to visualize the MRC and NCATS molecules, and via the API suggest potential targets in ChEMBL with similarity based on the closest molecules associated with targets.
Box 1. Data sources
Figure 1. How the Open Drug Discovery Teams App can be used to visualize molecules, in this case the MRC and NCATS compounds tweeted from the Mobile Molecular DataSheet App.
Figure 2. Using CDD to visualize the MRC and NCATS molecules, the predictions from TB Bayesian models and commercial availability.