Enhancement of hit rate in HTS by using fragment-based virtual screening techniques

Enhancement of hit rate in HTS by using fragment-based virtual screening techniques Peter Ertl Novartis Pharma AG, Information and Knowledge Management Basel, Switzerland peter.ertl@pharma.novartis.com

Virtual Screening • Molecules available for screening • 1 - 2 millions in in-house archives of large pharma and agrochemical companies • 3 - 4 millions of samples available commercially • We need to find optimal balance between spending resources for buying, synthesising and screening molecules, and maximal coverage in drug discovery programs. • Virtual screening • discard junk • identify molecules with good chances to be active

Approaches to Virtual Screening

Fragment-based Virtual Screening Based on the assumption that some fragments in molecule increase the probability of bioactivity, while others have negative effect on activity. In a training cycle an analysis of fragment distribution between active and inactive molecules is made and based on this an “activity contribution” is assigned to each fragment. Activity of new / virtual molecule may be estimated by summing up contributions of all fragments in the molecule.

Fragment-based Virtual Screening Active Molecules Inactive Molecules Possible Fragment Types linear fragments substituents (Rgroups) atoms with environment (HOSE codes) rings scaffolds List of fragments List of fragments • Statistical analysis of fragment frequencies in both sets and calculation of fragment “activity contributions”. • 2 test of statistical significance

Fragment-based Virtual Screening List of fragments (several thousands) with calculated “activity contributions”. 3.45 3.34 3.12 3.02 ... -2.96 -3.43 Bioactivity of new / virtual molecule may be estimated as a sum of activity contributions of all fragments in this molecule - naive Bayes approach. Method is fast, only topological information (SMILES) required. ...

Virtual Screening of the NCI anti-HIV Data NCI AIDS antiviral screen 41050 confirmed inactive molecules 421 confirmed active molecules - 1.02 % Crossvalidation procedure - data divided randomly into two halves - training and test sets. training set used to derive the model identification of active fragments and calculation of fragment activity contributions test set used to test the model comparison of predicted activities with the real screening data Source of data dtp.nci.nih.gov/docs/aids/aids_data.html

Virtual Screening of the NCI anti-HIV Data Training - generation of the model 20,525 inactive molecules  39,267 fragments 210 active molecules  1,375 fragments Fragment analysis provided 1503 significant fragments with scores ranging from 3.64 to -3.65 . The scores have been used to predict activities for molecules from the test set.

Distribution of Fragments Activity distribution of 1503 significant fragments identified.

Example of Active Fragments Example of “activity enhancing” fragments (augmented atoms)

Activity Enhancement Performance of fragment-based screening compared with random selection of molecules

Real Activity Enhancement Fragment-based screening - 50% of the data used for training

Performance with Different Training Sets Performance of fragment-based screening when trained on 1/10, 1/3, and 1/2 of the data

Sequential Screening By iterative screening one starts by selecting a random set of molecules and “learns” (improves the model) during the iterative screening process. During the process we may add also some molecules “randomly” to prevent falling into the local minima Selection and screening of a set of random molecules Creation of a model Screening of selected molecules, adding them to the pool Selection of further molecules based on the model

Performance Comparison Comparison of performance of train-test virtual screening and sequential screening

Virtual Screening - Pharma Data A Nearly ideal behaviour at the initial phase of screening, 20-fold enhancement in comparison with random screen.

Virtual Screening - Pharma Data B Virtual screening based on very limited initial information. Model development based on : active set of 23 known ligands from the literature reference set of 100,000 average organic molecules

Conclusions • enrichment up to 20 times in comparison with random screening • high throughput (~106 molecules / day) • ideal technique to pre-screen pool of available compounds between applying computationally more demanding tools • method is complementary to the virtual docking (both methods use different type of information) • minimal requirements on the initial information • results may be used in combichem design (active building blocks) and in the prioritisation of compounds for synthesis

Enhancement of hit rate in HTS by using fragment-based virtual screening techniques