MANAGING THE WEALTH OF NEW HTS DATA

MANAGING THE WEALTH OF NEW HTS DATA Workstream No. 2 Liverpool 17 March 2010

FORMAT • Introductions • The Challenge • The Question • Participants’ Initial Response • Situation & Target Refinement • Proposal Development • Wrap-up & Next Steps

THE CHALLENGE Ian Bathurst

BY 2010 • 5 million chemical entities screened as antimalarials • Whole cell P. falciparum screens • Results to be put into public domain • MMV Initiative • >20K sub-micromolar hits • <1% likely to be druggable targets for chemotherapy • How many hits targeting the same processes? • Work to establish stage specificity of hits • Work to establish in vitro therapeutic indices

WHO IS SCREENING?

QUESTION Given the large number of molecular structures that have given positive hits in the HTS screens and which are to be released by the pharmaceutical industry (>20,000), to develop systems to: • Make the information available to the community in an accessible way; • Filter the structures with robust methods to identify those structures which are druggable and more promising starts for lead optimisation; • Allow the community to know who is working on which structures so that duplication can be avoided and resources not unnecessarily wasted.

PRELIMINARY POINTS FOR DISCUSSION • Assay conditions • Screen format • Detection • Time course • Definitions of the Output • Data format • How much data • What does it mean • How is it stored • How is it accessed

PRELIMINARY POINTS FOR DISCUSSION • Where does the data come from (who has done / is doing the screens) and where is it stored? • How widely is it available? • How do you access it?

HOW IS THE DATA MINED AND MANAGED • Who in the community is working on which structures? • How do we avoid duplication • How do we avoid wasting resources

HOW IS THE DATA MINED AND MANAGED • What data is available (formats)? • How can it be manipulated? • How do we share data after a project is started (the stick or the carrot)?

FUTURE ANTIMALARIAL SCREENS • Other stored data (speed of killing, clone) • Other stages • Other species

PARTICIPANTS’ INITIAL RESPONSE TO CHALLENGE

Chemoinformatics, Department of Information Studies, University of Sheffield • 3 Academic staff: Val Gillet; Peter Willett; John Holliday • 14 Researchers (PDRAs & PhD students) • Areas of expertise • Algorithms • Machine learning (evolutionary algorithms, clustering, substructural analysis, binary kernel discrimination); graph theory; 2D and 3D similarity methods • Applications • Virtual screening, analysis of HTS data, diversity analysis, de novo design, pharmacophore elucidation, protein-ligand docking, combinatorial library design

Example Software & Collaborations • GOLD Protein-ligand docking program • Marketed by Cambridge Crystallographic Data Centre • GASP and more recent GALAHAD programs for pharmacophore elucidation • Marketed by Tripos • Collaborations • GSK, Pfizer, Lilly, Novartis, AstraZeneca, Lhasa Limited, Cambridge Crystallographic Data Centre, Johnson & Johnson, Sanofi-Aventis, Syngenta

In-silico Filters • Remove undesirable compounds • Physicochemical Properties • Simple Lipinski type properties • More sophisticated “drug-like” filters? • Bad-lists of functional groups • ADME/Tox • Use of models for predicting adverse effects • hERG, P450s etc
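A simple Lipinski-type filter of the kind listed above can be sketched as follows. This is a minimal illustration, not a method taken from the slides: it assumes the descriptors (molecular weight, calculated logP, hydrogen-bond donors and acceptors) have already been computed by a cheminformatics toolkit, and the compound names and values are hypothetical. The cut-off of at most one violation follows the usual statement of the rule of five.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    # All values are assumed to be precomputed elsewhere (hypothetical data).
    name: str
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five criteria the compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def passes_filter(c: Compound, max_violations: int = 1) -> bool:
    """Keep compounds with no more than max_violations violations."""
    return lipinski_violations(c) <= max_violations


# Hypothetical hit list for illustration only.
hits = [
    Compound("hit-A", 320.4, 2.1, 2, 5),
    Compound("hit-B", 710.9, 6.3, 6, 12),
    Compound("hit-C", 540.0, 4.2, 1, 8),
]

kept = [c.name for c in hits if passes_filter(c)]
print(kept)
```

Bad-lists of functional groups and ADME/Tox models would be further filter stages applied in the same way; they need substructure matching or trained models and are not attempted here.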

Prioritising HTS hits • Cluster to find areas of chemical space where local hit rate enriched • 2D descriptors, chemotypes, 3D pharmacophore features • Use local regions to develop SAR models using actives and inactives • Bayesian classifiers, recursive partitioning etc • For known targets use protein-ligand docking • Multiple actives against multiple targets • Target identification • Compare hits to compounds known to be active against relevant targets • Similarity, pharmacophores etc • Develop SAR based on known compounds and use to predict targets of unknowns
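The first step above, finding clusters where the local hit rate is enriched, can be sketched as below. The sketch assumes that cluster assignments already exist (for example from 2D-descriptor clustering) and that each screened compound carries an active/inactive flag; the cluster labels and counts are hypothetical. Enrichment is taken here as local hit rate divided by the overall hit rate, which is one common convention and not necessarily the one the presenters had in mind.

```python
from collections import defaultdict


def cluster_enrichment(records):
    """Return {cluster_id: (size, hit_rate, enrichment)}.

    records: iterable of (cluster_id, is_active) pairs, one per screened compound.
    enrichment = local hit rate / global hit rate.
    """
    records = list(records)
    if not records:
        raise ValueError("no records supplied")
    n_active_total = sum(1 for _, active in records if active)
    if n_active_total == 0:
        raise ValueError("no actives in the data set; enrichment is undefined")
    global_rate = n_active_total / len(records)

    counts = defaultdict(lambda: [0, 0])  # cluster_id -> [size, actives]
    for cluster_id, active in records:
        counts[cluster_id][0] += 1
        counts[cluster_id][1] += int(bool(active))

    result = {}
    for cluster_id, (size, n_active) in counts.items():
        rate = n_active / size
        result[cluster_id] = (size, rate, rate / global_rate)
    return result


# Hypothetical screen: 20 compounds in three clusters.
screen = (
    [("A", True)] * 3 + [("A", False)] * 1 +
    [("B", True)] * 1 + [("B", False)] * 5 +
    [("C", False)] * 10
)

enrichment = cluster_enrichment(screen)
# Rank clusters from most to least enriched.
ranked = sorted(enrichment, key=lambda cid: enrichment[cid][2], reverse=True)
for cid in ranked:
    size, rate, fold = enrichment[cid]
    print(f"cluster {cid}: n={size}, hit rate={rate:.2f}, enrichment={fold:.2f}")
```

Small clusters give noisy hit rates, so in practice a minimum cluster size or a statistical test would probably be wanted before building SAR models on the enriched regions.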

How to make information available? Pharma likely to be open to freely sharing information on commercially available compounds, cf. GSK (e.g. open access web database). Not likely to openly share non-patented proprietary compounds. MMV could continue to act as the “keeper” of all the data and use their consultants to prioritise compounds for follow up, going back to companies for more information if necessary and flagging when another group is working in the same space. (BUT – competition is not always a bad thing!)

How to pick the best molecules? Standard drug discovery approach: Rule of 5/Synthetic doability/compatibility with library synthesis/Structural alerts/Availability of existing analogues in compound file (“free” SAR) Additionally for malaria: Synthetic complexity/cost of goods/Speed of parasite kill/Activity against multiple parasite stages (esp. gametocytes) Consider target based prioritisation approach?

Analysis by target

Analysis by target Dihydrofolate Reductase Map to Plasmodium genome to identify putative targets. Could we share this type of analysis and prioritise on the basis of target? Less commercially sensitive?

Managing the Wealth of new HTS data Frederic Bost SCYNEXIS, Inc.

Choose basic visual models: bar graphs, scatter plots, spider plots, data dashboards Represent Database HTS Data (& all other relevant data) Compare Mine Data analysis tools Filter Remove all but the molecules & data of interest Apply methods from statistics and/or data mining to discern patterns, offer predictions Existence & similarity tools Hit selection/prioritization Retro feed knowledge Hit validation program

Database: Data quality and curation are critical Emphasis on standardization, SOPs, quality control How to motivate researchers to provide high quality data and to follow standards? Data analysis: Build data analysis models for neglected disease targets in an open-source mode Make the models and tools easily available to the community How to motivate researchers to develop these models? Knowledge sharing Set-up a communication platform to share learned knowledge Build a sense of community

V1S

V1S Tm93c1088 D10_yDHOD 6 DHFR, 1 new chemotype 4 bc1, 1 new chemotype

CDD Platform CDD Vault – Secure place for private data – private by default CDD Collaborate – Selectively share subsets of data CDD Public – expanding public data sets - Over 3M compounds, with molecular properties, similarity and substructure searching, data plotting etc published compounds for Malaria, TB etc (over 300K cpds) will host datasets from companies, foundations etc data from community members (ADME/Tox, etc), vendor libraries (Asinex, TimTec, Chembridge) Unique to CDD– simultaneously query your private data, collaborators’ data, & public data, Easy GUI www.collaborativedrug.com CRIMALDDI Meeting 2010

Linking databases • >23 million compounds. A database linking to over 300 data sources and underpinning the semantic web. Linking to patents and publications, chemical suppliers and online resources. A host for crowdsourced data, annotation and curation. tony@chemspider.com

Managing the wealth of new HTS data Malaria box of ~6.3k compounds annotated for IC50 and CC50 Data on HEOS 10 target screens completed (manuscript in preparation); data could be made available on HEOS Powders available for about 4k compounds Data on currently investigated series available on HEOS Challenge is data transfer from internal to external databases, specifically human resources to manage database curation Abandoned series: deposit data on HEOS as well Publish all data on abandoned series in peer-reviewed journals
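Where compounds are annotated with both IC50 (antiparasitic potency) and CC50 (cytotoxicity), an in vitro selectivity index can be computed as CC50 / IC50 and used to rank hits. The following is a minimal sketch under that assumption; the compound identifiers and concentrations are hypothetical and are not values from the malaria box.

```python
def selectivity_index(ic50_um: float, cc50_um: float) -> float:
    """In vitro selectivity index: CC50 / IC50, both in the same units (here µM)."""
    if ic50_um <= 0 or cc50_um <= 0:
        raise ValueError("IC50 and CC50 must be positive")
    return cc50_um / ic50_um


# Hypothetical annotations: compound id -> (IC50 in µM, CC50 in µM).
annotations = {
    "cpd-1": (0.05, 25.0),
    "cpd-2": (0.8, 4.0),
    "cpd-3": (0.2, 50.0),
}

# Rank compounds from most to least selective.
ranked = sorted(
    annotations,
    key=lambda cid: selectivity_index(*annotations[cid]),
    reverse=True,
)
for cid in ranked:
    print(f"{cid}: SI = {selectivity_index(*annotations[cid]):.0f}")
```

A higher index means a wider margin between parasite killing and host-cell toxicity; any threshold for follow-up would be a project decision and is not implied here.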

Managing the wealth of new HTS data WS-2 Donatella Taramelli Università di Milano • Define a role (if any) for • Medium-small Academic groups • Research centers • Europe • Developing Countries

Managing the wealth of new HTS data WS-2 • Define the target • Orally available anti Pf erythrocytic stage? Other stages? Other species? • Define the numbers • Select down to 20-30 molecules? • Define the time frame

Activity Selection on chemical feasibility Drugability Oral bioavailability Metabolic stability HTS In silico modeling Toxicity Selection on putative target

REFINING THE CHALLENGE AND THE QUESTION

DEVELOPING THE PROPOSAL

WRAP-UP & NEXT STEPS
