Surflex: Fully Automatic Flexible Molecular Docking Using a Molecular Similarity-Based Search Engine

Ajay N. Jain
UCSF Cancer Research Institute and Comprehensive Cancer Center, University of California
Presentation by Susan Tang
CS 379a, January 23, 2006

Protein-Ligand Docking Overview
Goal
- Predict how well a given set of ligands will bind to a protein structure
- Predict the structure of the bound protein-ligand complexes
Components
- Search method: explores the different ways a ligand can fit into and interact with the protein
- Scoring function: assigns a quantitative value to each ligand/protein fit

Protein-Ligand Docking Overview
Criteria
1) Docking accuracy: ability to find a conformation and alignment (pose) of the ligand in the protein that is close to the experimentally observed one
2) Scoring accuracy: ability to rank a correct pose of a molecule higher than an incorrect one
3) Screening utility: ability to pick out only the true ligands from a set that also contains false positives
4) Speed: how fast the algorithm can screen a library of ligands

Surflex: A new docking methodology
• Combines Hammerhead's empirical scoring function with a molecular similarity method that generates putative poses of ligand fragments
• Like Hammerhead, Surflex has a mode that uses an incremental construction search. It also has a second, whole-molecule mode that is faster and more accurate
• Surflex is designed primarily as a screening tool for small-molecule libraries

Surflex: Computational Design
• Protomol Generation
First, an idealized active-site ligand is created from the protein structure of interest.
Input: (a) protein structure, (b) list of residues that identify the protein active site
Output: a protomol, the target to which potential ligands or ligand fragments are aligned on the basis of molecular similarity
Procedure:
1) Molecular fragments are placed in the protein binding site in multiple positions
2) The placements are optimized for interaction with the protein
3) High-scoring, nonredundant fragments are selected
4) The selected fragments form the protomol
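
The selection of high-scoring, nonredundant fragments (step 3) can be sketched as a greedy filter. This is a minimal illustration under stated assumptions, not Surflex's code: the `ProbePlacement` class, the `select_protomol` function, the score cutoff, and the distance-based redundancy test are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class ProbePlacement:
    """One candidate placement of a small probe fragment in the binding site (hypothetical)."""
    position: tuple[float, float, float]  # probe centre, in angstroms
    score: float                          # interaction score with the protein (higher is better)


def select_protomol(placements, min_score, min_separation):
    """Greedily keep high-scoring placements that are not redundant with ones already kept.

    Redundancy is judged by a simple distance cutoff between probe centres;
    this is an illustrative stand-in, not Surflex's actual criterion.
    """
    kept = []
    # Visit candidates from best to worst score.
    for p in sorted(placements, key=lambda p: p.score, reverse=True):
        if p.score < min_score:
            break  # list is sorted, so every remaining candidate scores lower still
        # Keep p only if it is far enough from every placement already kept.
        if all(math.dist(p.position, q.position) >= min_separation for q in kept):
            kept.append(p)
    return kept
```

With `min_separation=1.0`, a probe 0.5 angstroms from a better-scoring probe is dropped as redundant, while one 3 angstroms away is kept.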

Surflex: Computational Design
• Protomol for streptavidin compared with the native pose of biotin (green)
• The bond indicated in the figure is broken by Surflex to split biotin into fragments for docking

Surflex: Computational Design
• Docking
Ligands are docked into the protein so as to optimize the scoring function.
Input: (a) protein structure, (b) protomol, (c) ligand(s)
Output: the optimized poses of the docked ligands, along with their scores
Procedure:
1) Divide the input ligand into 1 to 10 molecular fragments
2) Conformationally search each fragment
3) Align each conformation of each fragment to the protomol to obtain poses with maximum molecular similarity to the protomol
4) Score the aligned fragments and keep those with the highest scores and minimal protein interpenetration
5) Construct the full ligand from the aligned fragments, using either the incremental construction approach or the whole-molecule approach
6) Further refine the conformation and alignment of the highest-scoring poses
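
Step 1 amounts to breaking a few bonds in the ligand's molecular graph and collecting the pieces. A minimal sketch, assuming a plain adjacency representation of the ligand; the function name `split_into_fragments` is hypothetical, and the choice of which bonds to break (Surflex's own fragmentation rule) is left to the caller rather than reproduced.

```python
from collections import deque


def split_into_fragments(n_atoms, bonds, cut_bonds):
    """Split a molecular graph into fragments by deleting the chosen bonds.

    n_atoms   -- atoms are numbered 0..n_atoms-1
    bonds     -- iterable of (i, j) atom-index pairs
    cut_bonds -- subset of bonds to break (chosen by the caller; Surflex's
                 own rule for picking them is not reproduced here)
    Returns a list of fragments, each a sorted list of atom indices.
    """
    cut = {frozenset(b) for b in cut_bonds}  # bond orientation does not matter
    neighbours = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        if frozenset((i, j)) not in cut:
            neighbours[i].append(j)
            neighbours[j].append(i)

    seen, fragments = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Breadth-first search collects one connected component (one fragment).
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            atom = queue.popleft()
            component.append(atom)
            for nb in neighbours[atom]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        fragments.append(sorted(component))
    return fragments
```

Breaking bond (2, 3) in a six-atom chain 0-1-2-3-4-5 yields the fragments [0, 1, 2] and [3, 4, 5].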

Surflex: Computational Design
Incremental Construction vs. Whole Molecule Algorithm
Incremental Construction
- Makes the strong assumption that maximizing the similarity of small individual fragments to the protomol will generate good poses
Whole Molecule Algorithm
- Bypasses the strong independence assumption made in incremental construction
- “Dead” pieces are carried along with the “live” piece during the conformational search
- When putative poses are generated against the protomol, the “dead” pieces, still in their arbitrary initial conformations, are included in the molecular similarity computation; the poses with the worst protein interpenetration are then eliminated
- The remaining poses are scored on the basis of their individual fragments
- A recursive search yields whole molecules built from fragments selected from different docked poses
- These whole molecules score well in total, summed over all fragments
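
The recursive search in the last two points can be illustrated with a small exhaustive search. This sketch assumes the fragments form a linear chain and that two fragment poses can be merged when their shared junction atom lands in nearly the same place; the `FragmentPose` class, the `best_assembly` function, and the `tol` value are hypothetical, and Surflex's real search is not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class FragmentPose:
    score: float
    head: tuple[float, float, float]  # junction atom shared with the previous fragment
    tail: tuple[float, float, float]  # junction atom shared with the next fragment


def best_assembly(candidates, tol=0.5):
    """Pick one pose per fragment (fragments form a linear chain) maximising total score.

    candidates[k] lists the poses available for fragment k, possibly taken from
    different whole-molecule dockings. Each fragment's tail junction must lie
    within `tol` angstroms of the next fragment's head junction.
    Returns (total_score, chosen_pose_indices), or None if nothing can be merged.
    """
    best = None

    def extend(k, prev, total, chosen):
        nonlocal best
        if k == len(candidates):
            # Complete molecule: keep it if it beats the best total so far.
            if best is None or total > best[0]:
                best = (total, list(chosen))
            return
        for idx, pose in enumerate(candidates[k]):
            # Skip poses whose junction atom does not meet the previous fragment's.
            if prev is not None and math.dist(prev.tail, pose.head) > tol:
                continue
            chosen.append(idx)
            extend(k + 1, pose, total + pose.score, chosen)
            chosen.pop()

    extend(0, None, 0.0, [])
    return best
```

The exhaustive recursion is exponential in the number of fragments; it is meant only to show why the best-scoring whole molecule need not use the best-scoring pose of each individual fragment.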

Surflex: Computational Design
• The figure illustrates the docking of biotin to streptavidin (blue)
• Gray indicates the “live” fragment
• Magenta indicates the “dead” fragment
• Green lines show the result of merging the two well-docked fragments at the atoms marked by yellow circles
• The merged pose closely follows the original configurations of the parent fragments

Surflex: Evaluation
• Evaluation of the reliability and accuracy of dockings
- Comparison with experimental results on 81 protein/ligand pairs
- The pairs were selected to represent structural diversity
• Evaluation of Surflex's utility as a screening tool
- Performed on 2 protein targets (thymidine kinase and estrogen receptor)
- Competing docking methods (GOLD, DOCK, FlexX) were tested side by side on the same data set for comparison
• Evaluation of Surflex's docking speed
- Investigates the relationship between docking time and the number of rotatable bonds

Surflex: Evaluation - Data Set Construction
134 protein-ligand complexes* → filter → 81 protein-ligand complexes
Filtering Criteria:
• 15 or fewer rotatable bonds, since most small molecules have <= 15 rotatable bonds
• No covalent attachments between ligand and protein, since Surflex's scoring function was developed strictly on noncovalent complexes
• Ligands with no obvious errors in structure, since it is undesirable to modify an existing protein-ligand complex before testing
* Data set used for the GOLD docking program
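
The three filtering criteria translate directly into a predicate over complexes. A sketch assuming each complex has already been annotated with its rotatable-bond count and two yes/no flags; the `Complex` record, its field names, and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Complex:
    name: str
    rotatable_bonds: int
    covalent_ligand: bool       # ligand covalently attached to the protein
    has_structure_errors: bool  # obvious errors in the ligand structure


def passes_filter(c: Complex, max_rotatable: int = 15) -> bool:
    """Apply the three criteria: few rotatable bonds, noncovalent, no obvious errors."""
    return (c.rotatable_bonds <= max_rotatable
            and not c.covalent_ligand
            and not c.has_structure_errors)


def filter_complexes(complexes):
    """Keep only the complexes that satisfy every criterion."""
    return [c for c in complexes if passes_filter(c)]
```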

Surflex: Evaluation - Results
1) Evaluation of the reliability and accuracy of dockings
Shows how thorough the search procedure is and to what extent the scoring function can recognize good dockings
• In 94% of cases, Surflex returned a pose within 2.5 angstroms RMSD of the experimental pose
• In 86% of cases, Surflex's top-scoring pose was within 2.5 angstroms
• For a single docking run from a random initial pose, the chance of finding a correct or nearly correct pose averaged ~70%
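
Docking accuracy here is the RMSD between docked and experimental ligand atoms, with 2.5 angstroms as the success cutoff. A minimal sketch of both reported numbers, assuming atoms are already matched one-to-one (no handling of symmetry-equivalent atoms) and that no superposition is needed because the docked and crystal poses share the protein's coordinate frame; `rmsd` and `success_rates` are hypothetical names.

```python
import math


def rmsd(pose, reference):
    """RMSD in angstroms between two equally ordered lists of (x, y, z) atom coordinates.

    No superposition is done: docked and crystal poses share the protein's frame.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(squared / len(pose))


def success_rates(results, cutoff=2.5):
    """results: one (top_scoring_rmsd, best_rmsd_among_returned_poses) pair per complex.

    Returns (fraction whose top-scoring pose is within cutoff,
             fraction with any returned pose within cutoff).
    """
    n = len(results)
    top = sum(1 for top_r, _ in results if top_r <= cutoff) / n
    any_pose = sum(1 for _, best_r in results if best_r <= cutoff) / n
    return top, any_pose
```

The second fraction is always at least the first, mirroring the gap between the 94% and 86% figures above.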

Surflex: Evaluation - Results
2) Evaluation of Surflex's utility as a screening tool
Tests the program's ability to detect true positives against a background of random molecules (sensitivity vs. specificity)
• Surflex achieved a true positive rate of > 80% at a false positive rate of < 1%
• Surflex performed best (lowest false positive rate for a given true positive rate) of all the individual and combined methods assayed
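
A true positive rate at a fixed false positive rate can be computed from raw docking scores by setting the score threshold from the decoys. A sketch assuming higher scores mean predicted binding; `tp_rate_at_fp_rate` is a hypothetical helper, and actives tied with the threshold are counted as rejected.

```python
import math


def tp_rate_at_fp_rate(active_scores, decoy_scores, fp_rate=0.01):
    """Fraction of true ligands scoring above the threshold that lets through
    at most `fp_rate` of the decoys. Higher scores mean predicted binding."""
    decoys = sorted(decoy_scores, reverse=True)
    # Number of decoys allowed above the threshold (small epsilon guards float error).
    n_allowed = math.floor(fp_rate * len(decoys) + 1e-9)
    # Threshold is the score of the best decoy that must be rejected.
    threshold = decoys[n_allowed] if n_allowed < len(decoys) else -math.inf
    return sum(1 for s in active_scores if s > threshold) / len(active_scores)
```

With 100 decoys and `fp_rate=0.01`, only the single best-scoring decoy is allowed above the threshold.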

Surflex: Evaluation - Results
3) Evaluation of Surflex's docking speed
Docking speed becomes very important when screening large compound libraries.
• Surflex's docking time was approximately linear in the number of rotatable bonds
• Rigid molecules took a few seconds, and each additional rotatable bond added ~10 seconds
• Surflex's mean running time was 44 seconds over the 81 protein-ligand complexes in the test set used earlier
• Reported docking times for FlexX, DOCK, and GOLD range from 50 to 100 seconds per molecule, so Surflex's speed is comparable
• Quantitative comparison across methods is difficult because of differences in hardware and methodology
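
The linear time model above (a few seconds for a rigid ligand plus roughly 10 seconds per rotatable bond) can be estimated from measured timings with an ordinary least-squares fit, then used to project library screening time. A sketch; `fit_time_model` and `predicted_library_hours` are hypothetical helpers, and the projection assumes serial docking on a single machine.

```python
def fit_time_model(rot_bonds, seconds):
    """Ordinary least-squares fit of t = a + b * n (docking time vs. rotatable bonds).

    Returns (a, b): a is the time for a rigid ligand, b the cost per extra rotatable bond.
    """
    n = len(rot_bonds)
    if n < 2 or len(seconds) != n:
        raise ValueError("need at least two paired observations")
    mean_x = sum(rot_bonds) / n
    mean_y = sum(seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in rot_bonds)
    if sxx == 0:
        raise ValueError("rotatable-bond counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rot_bonds, seconds))
    b = sxy / sxx          # slope: seconds per rotatable bond
    a = mean_y - b * mean_x  # intercept: seconds for a rigid ligand
    return a, b


def predicted_library_hours(a, b, rot_bond_counts):
    """Projected wall time, in hours, to dock every ligand in a library serially."""
    return sum(a + b * k for k in rot_bond_counts) / 3600
```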

Conclusions
• Surflex marks a step forward in flexible molecular docking programs
• Compared with the best docking methods available, Surflex is:
- as fast
- as accurate in terms of docked ligand RMSD
- much more accurate in terms of scoring
• Assaying the top-scoring 1% of compounds in a screening library should yield a large proportion of the true positives
• Potential areas of improvement
- Combine the scoring and interpenetration terms into a single score
- Include non-binding ligands (negative examples) when training the scoring function
- Account explicitly for nonbonded self-interactions within ligands
- Allow a degree of protein flexibility (side-chain movement)
