Bioinformatics: Impact on Health and Drug Development

Bioinformatics: Impact on Health and Drug Development Symposium 6: Ballroom B 7th International ISSX Meeting Vancouver, BC Aug. 31, 2004

Bioinformatics: Impact on Health & Drug Development • 7:40 am – Bioinformatics in Drug Discovery and Development – D.S. Wishart • 8:20 am – PharmGKB: The Pharmacogenetics and Pharmacogenomics Knowledge Base – R. Altman • 9:00 am – Bioinformatics and Visual Genomics: Seeing Genes, Proteins and Metabolism – C. Sensen

Bioinformatics: Impact on Health & Drug Development • 9:40 am – Coffee Break • 10:20 am – Automated Docking and MD Simulations of Substrate Binding in Cytochrome P450 – N. Vermeulen • 11:00 am – Metabolic Profiling Using an LC/MS & NMR Based Approach – J. Shockcor • 11:40 am – Posters and Refreshments

Bioinformatics in Drug Discovery and Development David Wishart, University of Alberta 7th International ISSX Meeting Vancouver, BC Aug. 29-Sept. 2, 2004

The Pyramid of Life • Metabolomics: 1400 Chemicals • Proteomics: 10,000 Proteins • Genomics: 30,000 Genes • (Bioinformatics spans all levels)

Drug Discovery & Development • Drug Development Pipeline (cost, duration) • Discovery: $80 million, 3.5 yrs • Phase I: $40 million, 1 yr • Phase II: $50 million, 2 yrs • Phase III: $200 million, 3 yrs • FDA Approval: $50 million, 2.5 yrs • Supporting disciplines: Chemistry, Genomics, Proteomics, Metabolomics, Bioinformatics

Bioinformatics (or Computational Biology) • Not just the study of DNA or protein sequence data • Inclusive definition – concerns the storage, display, reduction, management, analysis, extraction, simulation, modelling, fitting or prediction of biological, medical or pharmaceutical data

Key Informatics Challenges in Drug Development • Using genomic, proteomic, metabolomic & structural data to ID drug targets or drug leads • Using genomic, metabolomic and structural data to predict drug metabolism, xenobiotic toxicity and characterize adverse drug reactions

Drugs from Genomes Gene Therapies Protein Drugs Drug Targets

Two Types of Diseases • Endogenous Disease: arises from in-born sequence errors in germ cells or spontaneous (or age-related) mutations in somatic cells • Exogenous Disease: arises from an infectious vector (virus, bacterium or parasite) that has its origins outside the body

Endogenous Diseases • Select cohort with disease or condition • Isolate gene region showing distinct features • Sequence whole region of interest • Compare to Human UniGene Map • ID location of common mutations • Predict function & cell location of gene product • Predict/Determine structure of gene product • Design antagonists, agonists or replacement

Exogenous Diseases • Sequence pathogen or pathogens • Identify critical genes (metabolic enzymes, toxins or pseudo-toxins, targeting receptors or coat proteins) • Select unique (low homology) genes • Use prior knowledge to ID lead compounds • Develop vaccine candidates
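The "select unique (low homology) genes" step can be sketched as a simple filter. This is only an illustration: it assumes each pathogen gene already has a best percent-identity score against the human proteome (for example from a BLAST search), and the gene names, scores and 30% cutoff are all hypothetical.

```python
# Sketch: keep pathogen genes with low similarity to any human protein.
# All gene names, identity values and the cutoff below are hypothetical.

def select_unique_genes(best_identity, max_identity=30.0):
    """Return gene names whose best percent identity to a human protein
    is at or below max_identity, ordered from least to most similar."""
    unique = [(identity, gene) for gene, identity in best_identity.items()
              if identity <= max_identity]
    return [gene for identity, gene in sorted(unique)]

# Hypothetical best-hit identities (percent) against human proteins.
best_identity_to_human = {
    "geneA": 12.5,   # little similarity: kept as a candidate target
    "geneB": 78.0,   # close human homologue: excluded
    "geneC": 25.0,
}

candidates = select_unique_genes(best_identity_to_human)
print(candidates)
```

A real pipeline would presumably also weigh whether the gene is essential to the pathogen; the filter above covers only the homology criterion.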

Bioinformatics… • Both exogenous and endogenous diseases require methods for rapid and comprehensive genomic, proteomic and metabolomic annotation • Identifying drug targets or drug candidates requires linking metabolomic or chemical compound data with sequence and pathway data

Genome Annotation - Magpie C. Sensen

Metabolomes (KEGG) • Number of pathways 17,263 • Number of organisms 213 • Number of genes 754,236 • Number of compounds 11,165 • Number of glycans 10,895 • Number of chemical reactions 6,140 http://www.genome.jp/kegg/kegg1.html

Therapeutic Target DB (C.Y. Zong) http://xin.cz3.nus.edu.sg/group/cjttd/TTD_ns.asp

Database Integration KEGG Magpie DrugBank TTD

The DrugBank Home Page http://redpoll.pharmacy.ualberta.ca

DrugBank • A freely accessible, web-enabled, fully queryable database that links drug structure/activity data with protein structure/function/sequence data • Contains nomenclature, synthesis, structure, activity, chemistry info on FDA drugs • Contains nomenclature, structure, sequence, pharmacology, drug metabolism info on corresponding biomolecule targets • Extensive querying & search tools

DrugBank Browser http://redpoll.pharmacy.ualberta.ca

DrugBank DrugCard

DrugBank DrugCard • Common names, alternate names, brand names, IUPAC names, CAS #, mixtures, source, manufacturer, MSDS link, PIN, DIN • Structure, formula, solubility, toxicity, state, LogP, melting/boiling point, synthesis, 3D structure, SMILES, MOL-file, PDB file, NMR & MS spectra, λ max • Drug class, indication, pharmacology, mechanism, drug target, prescription information, metabolites & metabolism, metabolism SNPs • Target sequence, GenBank link, target structure (2°, 3° or model), PDB file, target MW, target #AA, cellular location, chromosome, chromosome position, SNPs

DrugBank Querying • Sorting (by MW, indication, category) • Text query (boolean query, AND, OR, NOT, *) using GLIMPSE • Sequence query (BLAST search) • Structure query (draw structure, search for similar structures) • Relational data extraction (columns of numbers or text for graphing)

DrugBank Applications • Newly sequenced proteomes can be analyzed automatically for similarities to existing drug targets, giving researchers quick lead ideas • Newly determined protein structures can be “Autodocked” to a large database of known, well-behaved compounds to suggest lead ideas

DrugBank Applications • Newly synthesized or identified lead compounds can be compared to existing structures to assess/predict possible efficacy, cross reactivity, metabolism or physical properties • Existing drugs can be compared or analyzed for key trends, properties or features to help in drug design synthesis efforts
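One common way to compare a new lead compound with existing structures is the Tanimoto coefficient on structural fingerprints. The sketch below is a generic illustration and is not taken from DrugBank: fingerprints are modelled as sets of integer feature IDs, and the compound names and feature sets are hypothetical.

```python
# Sketch: rank library compounds by Tanimoto similarity to a query.
# Fingerprints are sets of "on" feature IDs; all values are hypothetical.

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared features / features in either set."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / len(union)

def rank_by_similarity(query_fp, library):
    """Return (score, name) pairs, most similar first, ties broken by name."""
    scored = [(tanimoto(query_fp, fp), name) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

library = {
    "drug_1": {1, 2, 3, 4},
    "drug_2": {3, 4, 5, 6},
    "drug_3": {7, 8},
}
query = {1, 2, 3, 5}

ranking = rank_by_similarity(query, library)
for score, name in ranking:
    print(f"{name}: {score:.2f}")
```

The nearest neighbours in such a ranking are the compounds whose known properties would be the first to check against the new lead.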

Key Informatics Challenges in Drug Development • Using genomic, metabolomic & structural data to ID drug targets or drug leads • Using genomic, metabolomic & structural data to predict or characterize drug metabolism, xenobiotic toxicity and adverse drug reactions

Predicting Drug Metabolism Through CyP450 Docking N. Vermeulen

Predicting Gene-Drug Interactions via Curated Community Knowledge R. Altman

Seeking Gene-Drug Relations through PolySearch http://redpoll.pharmacy.ualberta.ca

PolySearch • Supports PubMed text searching for gene, drug & disease associations (user provides disease/gene/drug name) • Automatically scores & ID’s genes and searches for known SNPs or mutations against std. SNP databases • Grabs gene sequences and generates primers around SNPs • Archives (MySQL database) or sends results as HTML page to user

PolySearch • Searches over 14 million PubMed records, >3400 diseases (and synonyms), 14,000 human genes (43,000 synonyms), >1000 compounds or drugs (>3000 compound synonyms) • Assesses quality using SCI list of impact factors for 8600+ journals • Example of growing use of text mining in bioinformatics
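The basic idea behind this kind of text mining can be sketched as counting how often a gene is mentioned in the same abstract as a disease. This toy example is not PolySearch's actual scoring scheme; the abstracts, gene names, synonyms and disease term are all made up for illustration.

```python
# Sketch: count abstracts in which a disease term co-occurs with a gene.
# Abstract text, gene names and synonyms below are hypothetical.
import re
from collections import Counter

def mentions(text, term):
    """True if term appears in text as a whole word or phrase (case-insensitive)."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

def count_cooccurrences(abstracts, disease_terms, gene_synonyms):
    """Return a Counter mapping gene -> number of abstracts that mention
    both the disease (any term) and the gene (any synonym)."""
    counts = Counter()
    for text in abstracts:
        if not any(mentions(text, term) for term in disease_terms):
            continue
        for gene, synonyms in gene_synonyms.items():
            if any(mentions(text, name) for name in synonyms):
                counts[gene] += 1
    return counts

abstracts = [
    "Variants in GENE1 are associated with disease X in adults.",
    "G1 expression is altered in disease X patients.",
    "GENE2 is discussed here without reference to any condition.",
    "Disease X risk and GENE2 polymorphisms.",
]
gene_synonyms = {"GENE1": ["GENE1", "G1"], "GENE2": ["GENE2"]}

counts = count_cooccurrences(abstracts, ["disease X"], gene_synonyms)
print(counts.most_common())
```

A production system would add weighting (for example by journal quality, as the slide mentions) on top of raw counts.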

Characterizing ADR & Drug Metabolism via Spectroscopy • Not all ADRs can be predicted in vitro or in silico • Identifying drug metabolites and characterizing metabolic changes in blood or urine requires advanced computational/bioinformatics methods • Represents an emerging application of bioinformatics & computational biology

Metabonomics (figure labels: Efficacy, Toxicity, Primary Molecules, Secondary Molecules, Filtration, Dilution, Concentration, Resorption, Chemical Fingerprint)

Characterizing ADR & Drug Metabolism via Spectroscopy (figure label: Sample Injection)

Classifying ADR via PCA (J. Shockcor) • Figure: PCA scores plot of PC1 versus PC2, with sample groups labelled Control, ANIT and PAP
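For readers unfamiliar with PCA, the following is a minimal sketch of how a first principal component and its scores can be computed. It is generic PCA by power iteration on toy data, not the analysis behind the plot; in practice each row would presumably be a processed spectrum, and the sample values here are hypothetical.

```python
# Sketch: first principal component (PC1) by power iteration.
# The toy samples below are hypothetical, not spectral data.
import math

def first_principal_component(rows, iterations=200):
    """Return (pc1, scores): the unit direction of largest variance and
    each centred sample's projection onto it.
    Note: power iteration stalls if the all-ones start vector happens to be
    orthogonal to PC1; a robust implementation would use a library routine."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

samples = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
pc1, scores = first_principal_component(samples)
print(pc1, scores)
```

Plotting the PC1 and PC2 scores of every sample gives a scores plot of the kind shown on the slide, where samples with similar profiles fall close together.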

Chemical Shift Chromatography • Mixture separation by HPLC (followed by ID via Mass Spec) • Mixture separation by NMR (simultaneous separation & ID)

Spectral Fitting (Principles) • Mixture spectrum modelled as a combination of reference spectra: Compound A, Compound B, Compound C • Constrained Least Squares Fitting
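The constrained least squares idea can be sketched as follows: find non-negative amounts of each reference spectrum whose weighted sum best matches the mixture spectrum. This is a minimal illustration using projected gradient descent, which is an assumption and not necessarily the method used in any particular software; the four-point "spectra" are hypothetical.

```python
# Sketch: non-negative least squares fit of a mixture spectrum.
# Spectra are short hypothetical intensity lists, not real NMR data.

def fit_mixture(mixture, references, iterations=2000):
    """Estimate amounts c_k >= 0 minimising
    || mixture - sum_k c_k * references[k] ||^2
    by projected gradient descent on the normal equations."""
    k = len(references)
    # gram[i][j] = ref_i . ref_j ; rhs[i] = ref_i . mixture
    gram = [[sum(a * b for a, b in zip(references[i], references[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(a * b for a, b in zip(references[i], mixture))
           for i in range(k)]
    # The trace bounds the largest eigenvalue, so this step size is stable.
    step = 1.0 / sum(gram[i][i] for i in range(k))
    coeffs = [0.0] * k
    for _ in range(iterations):
        grad = [sum(gram[i][j] * coeffs[j] for j in range(k)) - rhs[i]
                for i in range(k)]
        # Gradient step, then project back onto the constraint c >= 0.
        coeffs = [max(0.0, c - step * g) for c, g in zip(coeffs, grad)]
    return coeffs

compound_a = [1.0, 0.0, 0.0, 0.5]
compound_b = [0.0, 1.0, 0.0, 0.5]
compound_c = [0.0, 0.0, 1.0, 0.0]
# Mixture built as 2 x A + 0 x B + 3 x C.
mixture = [2.0, 0.0, 3.0, 1.0]

amounts = fit_mixture(mixture, [compound_a, compound_b, compound_c])
print([round(x, 3) for x in amounts])
```

The non-negativity constraint matters because a negative concentration has no physical meaning, even when it would reduce the fitting error.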

NMR Analysis of Urine Chenomx Inc. – Eclipse 2.0

Current Compound List • (+)-(-)-Methylsuccinic Acid • 2,5-Dihydroxyphenylacetic Acid • 2-hydroxy-3-methylbutyric acid • 2-Oxoglutaric acid • 3-Hydroxy-3-methylglutaric acid • 3-Indoxyl Sulfate • 5-Hydroxyindole-3-acetic Acid • Acetamide • Acetic Acid • Acetoacetic Acid • Acetone • Acetyl-L-carnitine • Alpha-Glucose • Alpha-ketoisocaproic acid • Benzoic Acid • Betaine • Beta-Lactose • Citric Acid • Creatine • Creatinine • D(-)Fructose • D-(+)-Glyceric Acid • D(+)-Xylose • Dimethylamine • DL-B-Aminoisobutyric Acid • DL-Carnitine • DL-Citrulline • DL-Malic Acid • Ethanol • Formic Acid • Fumaric Acid • Gamma-Amino-N-Butyric Acid • Gamma-Hydroxybutyric Acid • Gentisic Acid • Glutaric acid • Glycerol • Glycine • Glycolic Acid • Hippuric acid • Homovanillic acid • Hypoxanthine • Imidazole • Inositol • isovaleric acid • L(-) Fucose • L-alanine • L-asparagine • L-aspartic acid • L-Histidine • L-homocitrulline • L-Isoleucine • L-Lactic Acid • L-Lysine • L-Methionine • L-phenylalanine • L-Serine • L-Threonine • L-Valine • Malonic Acid • Methylamine • Mono-methylmalonate • N,N-dimethylglycine • N-Butyric Acid • Pimelic Acid • Propionic Acid • Pyruvic Acid • Salicylic acid • Sarcosine • Succinic Acid • Sucrose • Taurine • trans-4-hydroxy-L-Proline • Trimethylamine • Trimethylamine-N-Oxide • Urea

Metabolic Microarray • Figure: grid of metabolite levels for Patients 1 to 15, each cell coded as Normal, Below Normal, Above Normal or Absent • Metabolites shown: Acetic Acid, Betaine, Carnitine, Citric Acid, Creatinine, Dimethylglycine, Dimethylamine, Hippuric Acid, Lactic Acid, Succinic Acid, Trimethylamine, Trimethylamine-N-Oxide, Urea, Lactose, Suberic Acid, Sebacic Acid, Homovanillic Acid, Threonine, Alanine, Glycine, Glucose
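The four-way coding used in the metabolic microarray can be sketched as a comparison of each measured concentration with a normal range. The ranges and patient values below are hypothetical placeholders in arbitrary units, not clinical reference values, and the rule for "Absent" (at or below a detection limit) is an assumption.

```python
# Sketch: code metabolite levels as Normal / Below Normal / Above Normal / Absent.
# Ranges and measurements are hypothetical, in arbitrary units.

def classify(concentration, low, high, detection_limit=0.0):
    """Map one measured concentration to a microarray category."""
    if concentration <= detection_limit:
        return "Absent"
    if concentration < low:
        return "Below Normal"
    if concentration > high:
        return "Above Normal"
    return "Normal"

def profile_patient(measurements, normal_ranges):
    """Classify every metabolite that has a normal range; a metabolite
    missing from the measurements is treated as not detected."""
    return {name: classify(measurements.get(name, 0.0), low, high)
            for name, (low, high) in normal_ranges.items()}

normal_ranges = {
    "Creatinine": (5.0, 15.0),
    "Citric Acid": (1.0, 4.0),
    "Lactose": (0.1, 0.5),
}
patient_1 = {"Creatinine": 9.0, "Citric Acid": 0.4, "Lactose": 0.0}

profile = profile_patient(patient_1, normal_ranges)
print(profile)
```

Running the same function over every patient yields one column per patient, which is the grid layout the slide depicts.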

The Human Metabolome Project • $7.2 million Genome Canada project starting Sept. 1, 2004 (10 PI’s in analytical & clinical chemistry & bioinformatics) • Expect to ID and archive >1400 metabolites and metabolite ranges using NMR, MS, HPLC & informatics • Establishment of the Human Metabolome Databank (HMD)

The HMD • Web-accessible, freely available & continuously updated compilation of base-line metabolites in urine and plasma • Similar content to DrugBank, including pathway prediction and metabolic modeling • Compound ordering

Conclusions • Bioinformatics is being used to integrate genomic, metabolomic & structural data to help ID drug targets or drug leads • Bioinformatics combines genomic, metabolomic & structural data to help predict or characterize drug metabolism, xenobiotic toxicity and adverse drug reactions

Conclusions • Unlike genomics/proteomics data, most drug, drug metabolism, ADR and ADME data is still in books or journals – not in electronic form • This limits development of tools, databases and predictive software • As more data is made electronic, look to increased use of simulation and modelling software to predict ADME, ADR and toxicology

The Future… • Greater integration • More freeware and greater web-accessibility • Greater use of text mining and machine learning methods • Focus on predictions • (Figure: Genomics, Proteomics and Metabolomics, all spanned by Bioinformatics)

Acknowledgements • Anchi Guo (PDF) • Murtaza Hassanali (student) • Nelson Young (RA/Programmer) • Haiyan Zhang (Programmer/Analyst) • Bahram Habibi-Nazhad (PDF) • Jennifer Woolsey (student) • Chenomx Inc. (Edmonton) • Genome Canada, NSERC
