Joseph Jamal Rush III, October 31, 2009, Bioinformatics, Mentor: Mr. Mondal, Claflin University

Evaluating a Knowledge-based Scoring Function for Drug Discovery
Case Study: DrugScore

Scoring Function
• A scoring function evaluates a particular docked pose, for example by counting the favorable intermolecular interactions it makes, such as hydrogen bonds and hydrophobic contacts.
• Scoring functions are useful for drug design and other types of molecular design.
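To make "counting favorable interactions" concrete, here is a minimal Python sketch. It assumes a hypothetical atom format `(element, x, y, z)`, treats every N/O pair as a possible hydrogen bond and every C/C pair as a hydrophobic contact, and uses illustrative cutoffs (3.5 Å and 4.5 Å). Real scoring functions use far more careful atom typing and geometry checks.

```python
import math

# Hypothetical atom format for illustration: (element, x, y, z) in angstroms.
Atom = tuple[str, float, float, float]

POLAR = {"N", "O"}   # crude stand-in for hydrogen-bond donors/acceptors
NONPOLAR = {"C"}     # crude stand-in for hydrophobic atoms


def count_contacts(protein: list[Atom], ligand: list[Atom],
                   hbond_cutoff: float = 3.5,
                   hydrophobic_cutoff: float = 4.5) -> dict[str, int]:
    """Count favorable protein-ligand contacts for one pose using distance cutoffs only."""
    counts = {"hbond": 0, "hydrophobic": 0}
    for p_atom in protein:
        for l_atom in ligand:
            d = math.dist(p_atom[1:], l_atom[1:])
            if p_atom[0] in POLAR and l_atom[0] in POLAR and d <= hbond_cutoff:
                counts["hbond"] += 1
            elif p_atom[0] in NONPOLAR and l_atom[0] in NONPOLAR and d <= hydrophobic_cutoff:
                counts["hydrophobic"] += 1
    return counts


protein = [("O", 0.0, 0.0, 0.0), ("C", 10.0, 0.0, 0.0)]
ligand = [("N", 2.9, 0.0, 0.0), ("C", 14.0, 0.0, 0.0)]
print(count_contacts(protein, ligand))  # {'hbond': 1, 'hydrophobic': 1}
```

A pose with more such contacts would be ranked higher; an actual scoring function would also weight each interaction type differently.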

3 Classes of Scoring Functions
• Empirical scoring functions
• Knowledge-based scoring functions
• Force-field scoring functions

Empirical Scoring Function
• Empirical scoring functions are calibrated directly against a set of protein-ligand complexes whose structures and binding affinities have been determined experimentally, using multivariate regression analysis.
• Examples of empirical scoring functions are Böhm’s scoring function, ChemScore, and X-SCORE.
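The calibration step can be sketched as an ordinary multiple linear regression. This is a minimal illustration, not the procedure behind any of the named functions: the two interaction terms (hydrogen-bond count and hydrophobic-contact count) and the training data are hypothetical and synthetic.

```python
import numpy as np


def fit_empirical_weights(terms: np.ndarray, affinities: np.ndarray) -> np.ndarray:
    """Multiple linear regression: affinity ~ w0 + w1*term1 + ... + wk*termk.

    terms: (n_complexes, k) matrix of interaction terms per complex.
    affinities: (n_complexes,) experimentally measured binding affinities.
    Returns [w0, w1, ..., wk].
    """
    design = np.column_stack([np.ones(len(terms)), terms])  # intercept column first
    weights, *_ = np.linalg.lstsq(design, affinities, rcond=None)
    return weights


def empirical_score(weights: np.ndarray, term_row: np.ndarray) -> float:
    """Predict an affinity for a new complex from its interaction terms."""
    return float(weights[0] + np.dot(weights[1:], term_row))


# Synthetic training data (NOT real measurements); columns are
# [hydrogen-bond count, hydrophobic-contact count].
terms = np.array([[1, 2], [3, 0], [0, 5], [2, 2], [4, 1]], dtype=float)
affinities = -1.0 - 0.5 * terms[:, 0] - 0.2 * terms[:, 1]
weights = fit_empirical_weights(terms, affinities)
print(np.round(weights, 3))
```

Because the synthetic affinities were generated from known weights, the fit recovers them; with real data the residuals indicate how well the chosen terms explain binding.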

Force-Field Scoring Function
• Force-field scoring functions generally quantify the sum of the interaction energy between the target and the ligand and the internal energy of the ligand.
• Examples of force-field scoring functions are GoldScore, D-Score, and G-Score.
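A minimal sketch of such a sum, assuming a 12-6 Lennard-Jones term plus a Coulomb term with a constant dielectric of 4.0 (both illustrative choices; GoldScore, D-Score, and G-Score each use their own functional forms and parameters). The atom format and the `lj_params` table are hypothetical, and the ligand internal energy is passed in rather than computed.

```python
import math

COULOMB_CONSTANT = 332.0636  # kcal*angstrom/(mol*e^2): converts q_i*q_j/r to kcal/mol


def pair_energy(r: float, a: float, b: float, q_i: float, q_j: float,
                dielectric: float = 4.0) -> float:
    """12-6 Lennard-Jones term plus a Coulomb term for one atom pair (kcal/mol)."""
    return a / r**12 - b / r**6 + COULOMB_CONSTANT * q_i * q_j / (dielectric * r)


def force_field_score(protein, ligand, lj_params, ligand_internal_energy=0.0):
    """Sum target-ligand pair energies, then add the ligand's internal energy.

    protein, ligand: lists of (atom_type, charge, (x, y, z)).
    lj_params: {(protein_type, ligand_type): (A, B)} Lennard-Jones coefficients.
    """
    interaction = 0.0
    for p_type, p_charge, p_xyz in protein:
        for l_type, l_charge, l_xyz in ligand:
            a, b = lj_params[(p_type, l_type)]
            r = math.dist(p_xyz, l_xyz)
            interaction += pair_energy(r, a, b, p_charge, l_charge)
    return interaction + ligand_internal_energy
```

For example, two opposite unit charges 2 Å apart with no Lennard-Jones term give 332.0636 × (−1) / (4 × 2) ≈ −41.5 kcal/mol.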

Knowledge-based Scoring Functions
• Knowledge-based scoring functions are built from pairwise atom-atom interactions.
• They estimate binding affinities from statistical observations of intermolecular close contacts in experimentally determined 3D structures.
• These functions use the sum of potentials of mean force (PMF) between protein and ligand atoms, derived from the Brookhaven Protein Data Bank (PDB), as a measure of protein-ligand binding affinity.
• Examples: DrugScore and PMF
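The usual way to turn contact statistics into a potential is the inverse Boltzmann relation, W(r) = −kT · ln(g_pair(r) / g_ref(r)): distances observed more often than in the reference state get a negative (favorable) value. The sketch below assumes contacts have already been counted into the same distance bins for one atom-type pair and for a reference state; the pseudocount that avoids log(0) in empty bins is an illustrative choice.

```python
import math


def pmf_from_counts(pair_counts: list[int], reference_counts: list[int],
                    kT: float = 1.0, pseudocount: float = 1e-6) -> list[float]:
    """Inverse-Boltzmann potential per distance bin: W(r) = -kT * ln(g_pair(r) / g_ref(r)).

    Both histograms are normalized to frequencies; because they share the same
    bins, any shell-volume normalization cancels in the ratio.
    """
    n_pair = sum(pair_counts)
    n_ref = sum(reference_counts)
    potential = []
    for c_pair, c_ref in zip(pair_counts, reference_counts):
        g_pair = c_pair / n_pair + pseudocount   # pseudocount avoids log(0)
        g_ref = c_ref / n_ref + pseudocount
        potential.append(-kT * math.log(g_pair / g_ref))
    return potential


# Toy histograms over three distance bins (illustrative numbers only).
print([round(w, 3) for w in pmf_from_counts([10, 30, 10], [20, 20, 10])])
```

Here the middle bin is over-represented relative to the reference, so it receives a favorable (negative) potential of about −ln 1.5.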

Scoring Function Under Test
• DrugScore, a knowledge-based scoring function, is used in this project.
• The DrugScore scoring function has two components: a distance-dependent potential and a surface-dependent potential. In our study we use only the distance-dependent potential.
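Scoring a pose with a distance-dependent potential amounts to summing a tabulated pair potential ΔW_ij(r) over every protein-ligand atom pair within a cutoff. The sketch below assumes hypothetical inputs: atoms as `(atom_type, (x, y, z))`, a `potentials` dictionary mapping an atom-type pair to one value per distance bin, and a 6 Å cutoff with 0.1 Å bins as defaults. It does not reproduce DrugScore's actual atom types or tables.

```python
import math

# Hypothetical inputs: atoms are (atom_type, (x, y, z)); `potentials` maps a
# (protein_type, ligand_type) pair to a list of Delta W values, one per distance bin.


def distance_dependent_score(protein, ligand, potentials, bin_width=0.1, cutoff=6.0):
    """Sum tabulated pair potentials over all protein-ligand atom pairs closer than `cutoff`."""
    total = 0.0
    for p_type, p_xyz in protein:
        for l_type, l_xyz in ligand:
            r = math.dist(p_xyz, l_xyz)
            if r >= cutoff:
                continue  # pairs beyond the cutoff contribute nothing
            table = potentials.get((p_type, l_type))
            if table is None:
                continue  # no statistics for this atom-type pair
            bin_index = min(int(r / bin_width), len(table) - 1)
            total += table[bin_index]
    return total


# Toy example with 1 A bins; the type names and values are illustrative only.
potentials = {("O.3", "N.am"): [5.0, 2.0, -1.5, -0.5, -0.2, 0.0]}
protein = [("O.3", (0.0, 0.0, 0.0))]
ligand = [("N.am", (2.5, 0.0, 0.0)), ("N.am", (7.0, 0.0, 0.0)), ("C.3", (1.0, 0.0, 0.0))]
print(distance_dependent_score(protein, ligand, potentials, bin_width=1.0))  # -1.5
```

Only the N.am atom at 2.5 Å contributes: the one at 7 Å is beyond the cutoff, and the C.3 pair has no table.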

Equations for DrugScore

Evaluation Methods
• The performance of a scoring function is measured by the docking enrichment factor.
• The docking enrichment factor reflects how well the docking calculations pick out true positives (known ligands) from the background database, compared with random selection.

2 Enrichment Factors
• Own decoy: the enrichment factor is computed using only the protein's native ligands and the decoys generated for those ligands.
• Entire database: the enrichment factor is computed using all the ligands and all the decoys in the database. To obtain the entire-database enrichment factor for each protein, every ligand and decoy in the database must be scored against that protein.
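The difference between the two settings is which compounds enter the ranking for a given protein. A minimal sketch, assuming a hypothetical `TargetSet` container of compound IDs per target. For the entire-database case it also assumes that the other targets' ligands are counted as non-binders for the protein under evaluation, which the slide does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class TargetSet:
    """Hypothetical container for one DUD target's compound IDs."""
    ligands: list[str]
    decoys: list[str]


def compound_pool(target: str, dataset: dict[str, TargetSet], mode: str):
    """Return (actives, inactives) used to compute the enrichment factor for `target`."""
    actives = list(dataset[target].ligands)
    if mode == "own_decoy":
        # Only this target's ligands and the decoys generated for them.
        inactives = list(dataset[target].decoys)
    elif mode == "entire_database":
        # Every decoy in the database, plus the ligands of the other targets
        # (assumed here to act as non-binders for this target).
        inactives = []
        for name, entry in dataset.items():
            inactives.extend(entry.decoys)
            if name != target:
                inactives.extend(entry.ligands)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return actives, inactives


dataset = {
    "cox1": TargetSet(ligands=["L1", "L2"], decoys=["D1", "D2", "D3"]),
    "pr": TargetSet(ligands=["L3"], decoys=["D4"]),
}
print(compound_pool("cox1", dataset, "own_decoy"))
print(compound_pool("cox1", dataset, "entire_database"))
```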

Calculating the Docking Enrichment Factor
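The EF_top% column in the cox1 results table is consistent with EF = (hit_sampled / n_sampled) / (hit_total / n_total), where n_sampled is n_total × top% / 100 rounded to the nearest integer. The sketch below encodes that reading; the function names are hypothetical, and the rounding rule is inferred from the table rather than stated on the slides.

```python
def top_subset_size(n_total: int, top_percent: float) -> int:
    """Number of top-ranked compounds examined (rounded to the nearest integer)."""
    return round(n_total * top_percent / 100)


def enrichment_factor(hit_sampled: int, n_sampled: int, hit_total: int, n_total: int) -> float:
    """EF = (hit rate in the top subset) / (hit rate expected from random selection)."""
    return (hit_sampled / n_sampled) / (hit_total / n_total)


# cox1 at the top 5%: 936 compounds, 25 ligands, 3 ligands among the top 47.
n_sampled = top_subset_size(936, 5)
print(n_sampled, f"{enrichment_factor(3, n_sampled, 25, 936):.3f}")  # 47 2.390
```

An EF of 1.0 means the ranking is no better than random; at 100% of the database the EF is always 1.0 by construction.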

Strategy - Docking Part
• Download the protein, ligand, and decoy files from the DUD database.
• Rename each protein file from rec.pdb to proteinName.pdb.
• Extract individual mol2 files for the ligands and decoys.
• Use AutoDock to get the best docking coordinates for the ligands and decoys.
• Extract the best docking coordinates of the ligands and decoys from the AutoDock log files (*.dlg).

Strategy - Scoring Part (Knowledge-Based Scoring Function)
• Extract the atom records from the protein files.
• Run the knowledge-based scoring function program to calculate scores.
• Create output files containing the score for each protein-ligand and protein-decoy pair.
• Sort the scores of all compounds for each protein, and calculate the enrichment factor for that protein.
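The last two steps can be sketched as reading the score file and ranking one protein's compounds. This assumes the four-column `tag protein compound score` layout of the scoring results, with tag `L` for a ligand and `D` for a decoy, and assumes that a lower score is more favorable (pass `lower_is_better=False` otherwise).

```python
def parse_scores(text: str) -> list[tuple[str, str, str, float]]:
    """Parse 'tag protein compound score' lines; tag is assumed to be L (ligand) or D (decoy)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4 or parts[0] not in ("L", "D"):
            continue  # skip headers and blank lines
        tag, protein, compound, score = parts
        rows.append((tag, protein, compound, float(score)))
    return rows


def enrichment_at(rows, protein: str, top_percent: float, lower_is_better: bool = True):
    """Rank one protein's compounds by score and compute the EF for the top `top_percent`."""
    ranked = sorted((r for r in rows if r[1] == protein),
                    key=lambda r: r[3], reverse=not lower_is_better)
    n_total = len(ranked)
    n_sampled = max(1, round(n_total * top_percent / 100))
    hit_total = sum(1 for r in ranked if r[0] == "L")
    hit_sampled = sum(1 for r in ranked[:n_sampled] if r[0] == "L")
    ef = (hit_sampled / n_sampled) / (hit_total / n_total)
    return n_total, n_sampled, hit_total, hit_sampled, ef


scores = """tag protein ligand score
L cox1 A -10.0
D cox1 B -8.0
D cox1 C -5.0
L cox1 E -1.0
D pr F -3.0
"""
print(enrichment_at(parse_scores(scores), "cox1", 25))  # (4, 1, 2, 1, 2.0)
```

Running `enrichment_at` for top% = 5, 10, ..., 100 produces one row per threshold, in the same shape as the enrichment table.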

Tools for Protein-Ligand Docking
• AutoDock 4, a popular free docking program, is used to dock each protein with all of its own ligands and decoys from DUD.
• The best coordinates generated by AutoDock for each ligand and decoy docked to its own protein are used to calculate the scores.
• The running time of AutoDock 4 depends on the structures of the protein, ligand, and decoy, and on the hardware of the machine running it.
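Because every ligand and decoy is docked separately, the runs are easy to batch. A minimal sketch, assuming `autodock4` is on the PATH and one docking parameter file (.dpf) has already been prepared per compound. The path layout is hypothetical. It uses AutoDock 4's `-p` (parameter file) and `-l` (log file) options and records wall-clock time per job, since run time varies with the protein, the compound, and the machine.

```python
import subprocess
import time
from pathlib import Path


def autodock_commands(dpf_paths: list[str]) -> list[list[str]]:
    """Build one 'autodock4 -p <file>.dpf -l <file>.dlg' command per docking parameter file."""
    return [["autodock4", "-p", p, "-l", str(Path(p).with_suffix(".dlg"))] for p in dpf_paths]


def run_all(commands: list[list[str]], dry_run: bool = True) -> list[float]:
    """Run (or just print) each docking job and record its wall-clock time in seconds."""
    timings = []
    for cmd in commands:
        start = time.perf_counter()
        if dry_run:
            print(" ".join(cmd))  # show what would be executed
        else:
            subprocess.run(cmd, check=True)  # raises if AutoDock exits with an error
        timings.append(time.perf_counter() - start)
    return timings


# Hypothetical layout: one .dpf per compound inside a directory named after the protein.
run_all(autodock_commands(["cox1/ZINC03862207.dpf", "cox1/ZINC04617752.dpf"]))
```

Keeping `dry_run=True` by default makes it easy to inspect the job list before launching hundreds of docking runs.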

3 Proteins Used in the Experiment
• cox1 (cyclooxygenase-1): 4492 atoms in the protein file, 25 ligands, 911 decoys
• hsp90 (heat shock protein 90): 1627 atoms in the protein file, 37 ligands, 979 decoys
• pr (progesterone receptor): 2004 atoms in the protein file, 27 ligands, 1036 decoys

Scoring Results
tag  protein  ligand        score
D    cox1     ZINC03862207  456696768.00
L    cox1     ZINC04617752  1752637952.00
(tag: L = ligand, D = decoy)

------------------------------------------------------------------
protein  top%  n_total  n_sampled  hit_total  hit_sampled  EF_top%
------------------------------------------------------------------
cox1        5      936         47         25            3    2.390
cox1       10      936         94         25            6    2.390
cox1       15      936        140         25            6    1.605
cox1       20      936        187         25            6    1.201
cox1       25      936        234         25            6    0.960
cox1       30      936        281         25            6    0.799
cox1       35      936        328         25            6    0.685
cox1       40      936        374         25            6    0.601
cox1       45      936        421         25            6    0.534
cox1       50      936        468         25            9    0.720
cox1       55      936        515         25            9    0.654
cox1       60      936        562         25            9    0.600
cox1       65      936        608         25            9    0.554
cox1       70      936        655         25            9    0.514
cox1       75      936        702         25            9    0.480
cox1       80      936        749         25           12    0.600
cox1       85      936        796         25           13    0.611
cox1       90      936        842         25           16    0.711
cox1       95      936        889         25           19    0.800
cox1      100      936        936         25           25    1.000

Conclusion
• Docking algorithms, the design of the scoring function, and the accuracy of the benchmark data are all factors that can affect the evaluation.
• The results show that DrugScore does not achieve good enrichment for cox1 and hsp90, but it shows better enrichment for pr. Therefore, DrugScore cannot be used universally to estimate binding affinity for all proteins. To overcome this problem, we need a universal scoring function, which is our future project.

Future Work
• To evaluate the scoring function more thoroughly, we plan to increase the number of runs of AutoDock 4's genetic algorithm and check whether this improves the performance of the knowledge-based scoring function.
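In AutoDock 4, the number of genetic-algorithm runs per docking is set by the `ga_run` keyword in the docking parameter file (.dpf). The sketch below rewrites that value in the text of an existing .dpf; the example parameter lines and the new value of 50 are illustrative only.

```python
import re


def set_ga_runs(dpf_text: str, n_runs: int) -> str:
    """Return a copy of an AutoDock 4 .dpf text with the 'ga_run' value replaced."""
    pattern = re.compile(r"^(ga_run\s+)\d+", re.MULTILINE)
    new_text, n_replaced = pattern.subn(lambda m: m.group(1) + str(n_runs), dpf_text)
    if n_replaced == 0:
        raise ValueError("no 'ga_run' line found in the parameter file")
    return new_text


dpf = "ga_pop_size 150\nga_run 10    # do this many hybrid GA-LS runs\nanalysis\n"
print(set_ga_runs(dpf, 50))
```

More runs sample more candidate poses per compound, at a proportional cost in docking time.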

Acknowledgements
• Mr. Ananda Mondal
• Mrs. Pamela Shuler, HBCU-Undergraduate Program Manager
• HBCU-Undergraduate Program, HRD-0713853
• National Science Foundation
• Claflin University
