
Ontology and Annotation of PubChem Bioassays

Ontology and Annotation of PubChem Bioassays. Uma Vempati. Outline: Background of drug discovery; High throughput screening (HTS); PubChem database: lack of organization; Motivation to create BioAssay Ontology; PubChem annotations: preliminary results; Annotations: applications.

Presentation Transcript


  1. Ontology and Annotation of PubChem Bioassays Uma Vempati

  2. Outline • Background of drug discovery • High throughput screening (HTS) • PubChem database: lack of organization • Motivation to create BioAssay Ontology • PubChem annotations: preliminary results • Annotations: applications

  3. What is a drug? • Drug: small chemical compound (<500 Da) • Orally bioavailable (easy to administer): ADME • Must be efficacious (improvement of a disease over placebo or existing drug) • Interaction with one or a group of protein targets involved in the disease (inhibition or activation to remedy disease-causing process) • Must be safe (minimal side effects, toxicity, drug-drug interactions, etc.) • Specific interaction with ONLY the intended target(s) and no other proteins • Fenfluramine, pergolide, cabergoline and Vioxx are examples of drugs with serious side effects!

  4. Challenge to find new drugs: • Consider chemical space • Known chemical space: 50 × 10⁶ • Total chemical space: 10²⁰-10⁶⁰ • Biologically relevant chemical space • 29 providers with 30% redundancy! • ~1,500 approved drugs • Figure: Relationship between the continuum of chemical space (light blue) and the discrete areas of chemical space that are occupied by compounds with specific affinity for biological molecules. Jean-Yves Ortholand, SBS conference, 2010; Lipinski and Hopkins, 2004

  5. Drug discovery process: Preclinical • Target identification/validation (biological pathways) • Identification of hits • High-throughput screening (HTS): 1-2 million compounds (cmpds) • Development of lead series • HTS follow-up and hit expansion (commercial cmpds and synthetic chemistry) • Lead optimization: 250 cmpds • Medicinal chemistry • Cell-based screens (efficacy towards the target, selectivity) • Drug absorption, metabolism (solubility, clogP, microsome stability, P450, protein binding, hERG inhibition, various tox assays, etc.) • Animal studies (pharmacology, toxicity)

  6. Drug discovery process: Clinical • Phase I: 10 compounds on few healthy volunteers (safety and efficacy) • Phase II: On few patients to check for intended effect • Phase III: wide-scale tests on thousands of patients in carefully controlled clinical testing • Average cost of making a new drug: $800 million to $2 billion • Average time to make a new drug: 12-15 years • Outcome: 1 out of 5 tested in phase III trials becomes a drug • Drugs represent 10% of healthcare costs!! Neal Masia (Pfizer), 2008

  7. Drugs to treat rare diseases or ones prevalent in low-income countries • Malaria infects 300-500 million people in the world per year • >1 million of them die due to resistance to the available drug, chloroquine • How many new drugs were made? Only 4 out of 1,400 new medicines developed worldwide between 1975 and 1999 were antimalarials. Richard Wilder and P. V. Venugopal, 2008

  8. NIH Roadmap: screening in academic institutions • Molecular Libraries Screening Centers Network (MLSCN): Launched in 2005. Goal: “to expand the availability and use of chemical probes to explore the function of genes, cells, and pathways in health and disease, and to provide annotated information on the biological activities of compounds contained in the central Molecular Libraries Small Molecule Repository in a public database (PubChem).” • MLPCN • Comprehensive Centers: Broad, Burnham, NCGC, and Scripps • Specialized Screening Centers: Johns Hopkins, Southern Research Institute, and UNM • Specialized Chemistry Centers: Kansas and Vanderbilt • PubChem: Largest repository of small molecule screening data. Went public in 2004. Has 115 contributing organizations; ~70 million substances.

  9. Public Small Molecule Bioactivity Databases • PubChem: http://pubchem.ncbi.nlm.nih.gov/ • ChEMBL: http://www.ebi.ac.uk/chembl/ • Kinase SARfari: http://www.sarfari.org/ • PDSP: http://pdsp.med.unc.edu/indexR.html • BindingDB: http://www.bindingdb.org/bind/index.jsp • DrugBank: http://www.drugbank.ca/

  10. June 05 2010 • NCBI PubChem Apr 05 2010

  11. HTS produces huge amounts of data • Often data from an HTS campaign is only used once • Costs up to $1 million • Need to get more value out of HTS • Countless HTS campaigns in pharma and now academia • “Drowning in data yet starving for knowledge” • “Information overload”: mainly a lack of structure and organization • Development of ontologies for knowledge representation • Biocuration of these HTS assays • Development of novel software tools

  12. Simple queries that cannot be run on PubChem • In what types of assays are my compounds active? • Identify inhibitors of kinases in biochemical assays. • Identify compounds active in multiple luciferase reporter gene assays. • Identify compounds active in cell viability assays and organize by cell lines and assay types. • List all assays that target GPCRs. • Identify likely artifacts in ATP-coupled luciferase kinase activity assays. • Identify active compounds in assays related to pathway X.
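To make the point of these queries concrete, here is a minimal sketch of how one of them ("List all assays that target GPCRs") becomes a simple filter once each assay carries structured annotations. The `AnnotatedAssay` record, its field names, and the three sample assays are hypothetical and only for illustration; they are not PubChem data and not the BAO schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedAssay:
    """Hypothetical annotation record; field names are assumptions."""
    aid: int            # made-up assay identifier
    assay_format: str   # e.g. "biochemical", "cell-based"
    technology: str     # e.g. "luciferase reporter gene"
    target_class: str   # e.g. "kinase", "GPCR"


def find_assays(assays, **criteria):
    """Return the assays whose annotations match every given field=value pair."""
    return [
        assay for assay in assays
        if all(getattr(assay, field) == value for field, value in criteria.items())
    ]


# Made-up sample annotations, for illustration only.
ASSAYS = [
    AnnotatedAssay(1, "biochemical", "ATP-coupled luciferase", "kinase"),
    AnnotatedAssay(2, "cell-based", "luciferase reporter gene", "GPCR"),
    AnnotatedAssay(3, "cell-based", "viability", "none"),
]

gpcr_assays = find_assays(ASSAYS, target_class="GPCR")
kinase_biochemical = find_assays(ASSAYS, assay_format="biochemical", target_class="kinase")
```

With free-text assay descriptions the same question needs text mining; with controlled annotation fields it is an equality test on a field.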

  13. No common terminology in PubChem

  14. No uniform PubChem endpoints • Each assay has two standardized endpoint names: outcome and score • All other data points are stored in an arbitrary / flexible format • Problem: many assays use the same endpoints with different names, making comparisons difficult • As of 12/2009, there are more than 12,000 unique endpoint names in PubChem, many of which are equivalent
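The endpoint-name problem can be sketched as a lookup from free-text names to one canonical term. The synonym table below is a hypothetical example; the variant spellings are invented for illustration and are not taken from PubChem, and a real mapping would be curated against the ontology.

```python
from collections import defaultdict

# Hypothetical synonym table (illustrative names only, not PubChem data).
ENDPOINT_SYNONYMS = {
    "ic50": "IC50",
    "ic50 (um)": "IC50",
    "ic_50": "IC50",
    "percent inhibition": "percent inhibition",
    "% inhibition": "percent inhibition",
    "inhibition (%)": "percent inhibition",
}


def normalize_endpoint(name):
    """Return the canonical endpoint term, or None if the name is unmapped."""
    # Lower-case and collapse whitespace so trivial variants share one key.
    key = " ".join(name.strip().lower().split())
    return ENDPOINT_SYNONYMS.get(key)


def group_by_canonical(names):
    """Group raw endpoint names under their canonical term."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize_endpoint(raw) or "UNMAPPED"].append(raw)
    return dict(groups)


grouped = group_by_canonical(["IC_50", "  IC50 (uM) ", "% Inhibition", "Ki"])
```

Names left under "UNMAPPED" are the ones that still need a curator's decision, which is where manual annotation effort would go.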

  15. BAO Goals (宝 = precious) Initial Aims: • Develop BioAssay Ontology (BAO) • Annotate PubChem data • Software development (BAO Search, BAO Annotator) • Enable variety of queries in a simple manner Longer-term Objective: • Map / integrate other ontologies • Enable “reasoning” across different domains • Generate new knowledge using computational agents • Integrate with clinical data

  16. What do we need to capture from an assay? [Diagram: a BioAssay with numbered components Perturbagen, Meta Target, Format, Technology, Analysis, Endpoint and Measure Group (for multiplexed or HCS assays). Examples shown in the diagram: purpose (single concentration, concentration response, profiling); compound; RNAi; normalization method; assay quality; activity at 10mM; IC50; viability; reporter gene; readout; standard kit; cell type; reagents.]

  17. BAO Model of a BioAssay

  18. PubChem annotations: Assay system used • Formats of 1924 assays: Cell Based 46% • Biochemical 43% • Organism Based 10% • Virus 1%

  19. PubChem annotations: Assay system used • Cell Based: Eukaryote 96%, Prokaryote 4% • Biochemical: Purified protein 96%, Sub-cellular organelles 3%, Other 1% • Organism Based: Yeast 42%, Mouse 41%, Pathogenic organisms 11%, Other 6%

  20. Annotations of luciferase assays • Total luciferase assays in PubChem: 316 (out of 2299) • Luciferase catalyzes the oxidation of luciferin to oxyluciferin with the release of light (energy efficient) • Luciferase assays are used for four major purposes:

  21. Luciferase Assay Technologies

  22. Luciferase Assay Kits

  23. BAO: under construction • The design/method to determine the action of the perturbagen. • Binding reporter: energy transfer, scintillation, luminescent proximity • Enzyme reporter: luciferin-coupled, ATP-coupled • Protein conformation reporter • Viability reporter: ATP, Caspase, NADH • Redistribution reporter: GFP, Calcium, cAMP • Inducible reporter: Luc, Bla, LacZ, GFP • Further specifications: detection technique, apparatus, signal activity direction
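The assay design classes on this slide can be held as a small two-level structure. The sketch below assumes that each reporter class is the parent of the terms listed directly after it on the slide; that grouping is a reading of the flattened slide text, not a confirmed excerpt of the BAO. The function name `design_classes` is hypothetical.

```python
# Two-level hierarchy, assuming each class owns the terms listed after it.
ASSAY_DESIGN = {
    "Binding reporter": ["Energy transfer", "Scintillation", "Luminescent proximity"],
    "Enzyme reporter": ["Luciferin-coupled", "ATP-coupled"],
    "Protein conformation reporter": [],
    "Viability reporter": ["ATP", "Caspase", "NADH"],
    "Redistribution reporter": ["GFP", "Calcium", "cAMP"],
    "Inducible reporter": ["Luc", "Bla", "LacZ", "GFP"],
}


def design_classes(term):
    """Return every design class that lists the given term (case-insensitive)."""
    wanted = term.casefold()
    return [
        design_class
        for design_class, subterms in ASSAY_DESIGN.items()
        if any(sub.casefold() == wanted for sub in subterms)
    ]


gfp_classes = design_classes("GFP")
```

A term such as GFP appears under more than one class, so the lookup returns a list rather than a single parent; an annotation therefore has to record the class as well as the reporter.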

  24. Cloudy with a chance of remedy?

  25. Stephan Schürer • Vance Lemmon • Ubbo Visser • Mitsunori Ogihara • Robin Smith • Dusica Vidovic • Kunie Sakurai • Yuanyuan Jia • Chris Mader • Felimon Gayalino • Nakul Datar • Caty Chung • Amar Koleti • Saminda Abeyruwan • Nick Tsinoremas
