Bioinformatics from a drug discovery perspective

EMBRACE Workshop, 22-23 March 2007. Niclas Jareborg, AstraZeneca R&D Södertälje.

Presentation Transcript


  1. Bioinformatics from a drug discovery perspective EMBRACE Workshop, 22-23 March 2007 Niclas Jareborg AstraZeneca R&D Södertälje

  2. AstraZeneca Drug Discovery • Research Areas: CV/GI (Cardiovascular/Gastrointestinal), RIRA (Respiratory/Inflammation), CNS/Pain, Cancer, Infection • Discovery Sites • UK: Charnwood (RIRA), Alderley Park (Cancer, CV/GI, RIRA) • North America: Boston (Cancer, Infection), Wilmington (CNS/Pain), Montreal (CNS/Pain) • Sweden: Lund (RIRA), Mölndal (CV/GI), Södertälje (CNS/Pain) • India: Bangalore (Infection) • Bioinformatics • All research areas (RAs) have their own bioinformatics teams • Infrastructure at Alderley Park (databases, large Linux clusters) • IS organisation

  3. A target is defined as… • ... a biological target protein on which a chemical entity (e.g. a drug molecule) exerts its action • A drug target must be associated with a disease

  4. Drug discovery process: Genes → Target identification → Target validation → Hit identification (HTS) → Hit to lead (lead identification) → Lead optimisation → Candidate drug → Clinical trials • Other diagram labels: Protein, Assay, Compound library, Hit, Effort

  5. Target Definition • Alternative Splicing • Identify pharmacologically relevant target variant(s) • Sequence variation • Function • Target • Metabolizing enzyme • Binding of substance • Identify most common variant • Might differ in different populations!
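Checking whether the most common variant differs between populations comes down to comparing allele frequencies per population. A minimal sketch, assuming hypothetical allele observations for a single SNP in a target gene (population names and counts are invented for illustration):

```python
from collections import Counter

# Hypothetical allele observations for one SNP, per population (illustrative only).
observations = {
    "population_A": ["G"] * 70 + ["A"] * 30,
    "population_B": ["G"] * 40 + ["A"] * 60,
    "population_C": ["G"] * 85 + ["A"] * 15,
}

def allele_frequencies(alleles):
    """Return {allele: frequency} for a list of observed alleles."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def major_allele(alleles):
    """Most common allele; ties keep the allele encountered first."""
    return Counter(alleles).most_common(1)[0][0]

def populations_differing(observations):
    """Return the pooled major allele and the populations whose major allele differs."""
    pooled = [a for alleles in observations.values() for a in alleles]
    overall = major_allele(pooled)
    differing = sorted(p for p, alleles in observations.items()
                       if major_allele(alleles) != overall)
    return overall, differing

overall, differing = populations_differing(observations)
print(overall, differing)  # G ['population_B']
```

Pooling first and then comparing each population against the pooled result mirrors the slide's warning: a variant that is "most common" overall may not be the most common one in every population.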

  6. Target Definition • Expression • Is the target expressed in a relevant human tissue? • Databases • Microarrays • Immunohistochemistry • In situ hybridization • Proteomics • Literature

  7. Target Definition • Selectivity • How similar are related proteins? • Do similar proteins have functions that we do not want to affect? • Animal models • Orthologous genes • Same family size? • Splice variants • Same as in human? • Polymorphisms • Differences between inbred strains • Tissue expression • Overlap human? • Available transgenes or knock-outs
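The selectivity question "How similar are related proteins?" is usually answered with sequence alignment. The sketch below is a toy Needleman-Wunsch global alignment with identity scoring (match +1, mismatch -1, linear gap -1) used to compute percent identity. Real analyses would use a substitution matrix such as BLOSUM62 with affine gap penalties through an established alignment tool. The sequences and paralogue names are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the two aligned strings, using '-' for gaps."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

def percent_identity(a, b):
    """Identical aligned positions as a percentage of the alignment length."""
    aligned_a, aligned_b = global_align(a, b)
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)

# Hypothetical target and paralogue sequences.
target = "MKTAYIAKQR"
paralogues = {"PARALOG_1": "MKTAYIAKQK", "PARALOG_2": "MSTAWIAEQR"}
ranked = sorted(paralogues, key=lambda p: percent_identity(target, paralogues[p]),
                reverse=True)
print(ranked)  # ['PARALOG_1', 'PARALOG_2']
```

The most similar paralogues are the first candidates for selectivity screening, since a compound binding the target is most likely to hit them too.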

  8. Genetics & Bioinformatics: bioinformatics input to the drug discovery process • Phases: Research, Development, Commercialisation (milestones MS1 to MS5) • Stages: Target Identification, Hit Identification, Lead Identification, Lead Optimisation, CD (candidate drug) Pre-nomination, Development for Launch, Registration, Launch, Sales • Bioinformatics input: support target identification; flag up population variants in target; identify polymorphic and splice variants; primary screening; selectivity screening (identify paralogues); support choice of model organism(s); support biomarker identification

  9. In-house generated gene-centric information resource • Splice variants • Tissue expression • DNA and protein sequence • Similarity to other species • Genetic mutations

  10. In-house generated gene-centric information resource • Gene symbol • Synonyms • Patents • Splice variants • Literature • Pathways • Functional motifs • Tissue expression • DNA and protein sequence • Similarity to other species • Genetic mutations
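A minimal sketch of how such a gene-centric record might be modelled, assuming Python dataclasses (3.9+). The field names mirror the slide; `GeneRecord` and `GeneIndex` are hypothetical names. The synonym index shows why synonyms matter: any alias should resolve to the same entry.

```python
from dataclasses import dataclass, field

@dataclass
class GeneRecord:
    """One entry in a hypothetical gene-centric resource; fields mirror the slide."""
    symbol: str
    synonyms: list[str] = field(default_factory=list)
    splice_variants: list[str] = field(default_factory=list)
    tissue_expression: dict[str, float] = field(default_factory=dict)
    orthologues: dict[str, str] = field(default_factory=dict)  # species -> gene id
    literature: list[str] = field(default_factory=list)        # e.g. PubMed IDs
    pathways: list[str] = field(default_factory=list)
    patents: list[str] = field(default_factory=list)

class GeneIndex:
    """Resolves a symbol or any synonym (case-insensitively) to one record."""

    def __init__(self):
        self._by_symbol = {}
        self._alias = {}

    def add(self, record):
        self._by_symbol[record.symbol] = record
        for name in [record.symbol, *record.synonyms]:
            self._alias[name.upper()] = record.symbol

    def get(self, name):
        symbol = self._alias.get(name.upper())
        return self._by_symbol.get(symbol) if symbol else None

index = GeneIndex()
index.add(GeneRecord(symbol="TNF", synonyms=["TNFA", "TNFSF2"]))
print(index.get("tnfsf2").symbol)  # TNF
```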

  11. Target identification: targets from different experimental approaches, as well as validation using different technologies • Sources: EST sequencing campaigns, genetics/genome information, proteomics, literature, differential biology, microarrays (Affymetrix, glass, etc.), in silico • Target candidates • Validation (in silico, lab bench) • Validation as potential targets • Specificity / selectivity

  12. Target identification: ~30,000 human genes → What? Link to disease? Where? Novel? → 1 potential target
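The funnel on this slide can be read as a sequence of filters applied in turn. A sketch, assuming each candidate has already been annotated with yes/no answers to the slide's four questions (the `Candidate` fields and gene names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    # Hypothetical annotations; real values would come from integrated data sources.
    symbol: str
    function_known: bool      # "What?"
    disease_linked: bool      # "Link to disease?"
    in_relevant_tissue: bool  # "Where?"
    novel: bool               # "Novel?"

FILTERS = [
    ("What?", lambda c: c.function_known),
    ("Link to disease?", lambda c: c.disease_linked),
    ("Where?", lambda c: c.in_relevant_tissue),
    ("Novel?", lambda c: c.novel),
]

def funnel(candidates):
    """Apply the filters in order, recording how many candidates survive each step."""
    surviving = list(candidates)
    counts = [("start", len(surviving))]
    for question, keep in FILTERS:
        surviving = [c for c in surviving if keep(c)]
        counts.append((question, len(surviving)))
    return surviving, counts

genes = [
    Candidate("GENE_A", True, True, True, True),
    Candidate("GENE_B", True, True, True, False),
    Candidate("GENE_C", True, False, True, True),
    Candidate("GENE_D", False, True, True, True),
]
survivors, counts = funnel(genes)
print([c.symbol for c in survivors])  # ['GENE_A']
```

Recording the count after each step makes it visible which question removes most candidates.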

  13. The human genome offers many potential drug targets

  14. Current drug targets: few target classes • Based on 483 drugs in Goodman and Gilman's "The Pharmacological Basis of Therapeutics" • Samuel Svensson, PhD, AstraZeneca R&D Södertälje

  15. Number of druggable targets smaller than expected? • Only a subfraction of gene products play a direct role in disease pathophysiology • ~30,000 human genes • Druggable genome: ~2,000-3,000 genes (500 GPCRs, 50 NHRs, >200 ion channels, >1,000 enzymes, e.g. 450 proteases, 500 kinases, >200 others) • Pathogen & commensal gut bacteria genes • < 5,000 targets for small-molecule drugs • ~2,000-3,000 druggable targets

  16. Updating the (shrinking?) "Targetome" • Down to 22K? (see PMID: 15174140) • Some of the 120 InterPro domains are unpromising; many potential targets are still functional orphans; realistically nearer 2,000? • OMIM still only at 1,900, and only low numbers of "robust" genetic association results

  17. Current trends • “Blue sky genomics” -> literature • Finding “unknown” targets -> prioritizing the lists • Moving from single target focus • Comparing and ranking of target candidates • Integration of relevant but disparate data sources • Better understanding of the target “neighbourhood” • Disease mechanism • Biomarkers • Toxicology
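Comparing and ranking target candidates is often done with a weighted score over integrated evidence. The criteria, weights and scores below are assumptions for illustration only, not a scheme from the presentation:

```python
# Hypothetical criteria and weights; evidence scores are assumed to lie in [0, 1].
# A negative weight penalises undesirable evidence such as toxicology risk.
WEIGHTS = {"disease_link": 0.4, "tissue_expression": 0.2,
           "selectivity": 0.2, "biomarker_available": 0.1, "tox_risk": -0.1}

def rank_candidates(evidence, weights=WEIGHTS):
    """Return (symbol, score) pairs sorted by descending weighted score.
    Criteria missing for a candidate count as 0."""
    scored = {
        symbol: sum(w * scores.get(criterion, 0.0) for criterion, w in weights.items())
        for symbol, scores in evidence.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

evidence = {
    "TARGET_1": {"disease_link": 1.0, "tissue_expression": 1.0},
    "TARGET_2": {"disease_link": 0.5, "selectivity": 1.0, "tox_risk": 1.0},
    "TARGET_3": {"tissue_expression": 1.0},
}
for symbol, score in rank_candidates(evidence):
    print(f"{symbol}: {score:.2f}")
```

Treating missing evidence as 0 is a design choice: it rewards well-characterised targets, which may work against novel ones.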

  18. Sources of contextual information • Structured (mature technology) vs. unstructured (emerging technology); the slide shows an 80% / 20% split • The current approach to retrieving information from unstructured sources is manual extraction, i.e. finding documents and reading them! • Structured: internal chemical DBs; internal biological DBs; external commercial DBs (GVK Bio, Ingenuity IPA…); external public DBs (EMBL, PDB, SNPdb, etc.) • Unstructured: internal documents (tox reports, clinical trial reports); external documents (patents: USPTO, WIPO, EP, etc.; literature: Medline, Embase); press releases (competitor, supplier, collaborator, academic, etc.); government agencies; conference proceedings; news feeds

  19. Dissecting the decision-making process: Finding, Extracting, Integrating, Creating • Locating relevant documents and information • Retrieving them in a usable format • Reading information • Locating the facts within documents • Understanding what it means • Putting the information into context • Turning information into knowledge • Developing new hypotheses • Input into decision making

  20. Issues with the manual approach (Finding, Extracting, Integrating, Creating) • Difficult to capture breadth • Chance to miss things • "White space" in failing to find things • Limited time to read things • Focus on reviews and summaries • Based on individual scientists' own knowledge • Narrow • Biased • Hypotheses are "per project" • Reactive, not proactive

  21. Text mining • Sources • Literature • Patents • In-house reports • Information • Protein-protein interactions • Tissue expression • Pharmacological differences • Splice variants, Polymorphisms • Species • Toxicology • etc

  22. Emerging systems: text mining • Extraction of facts from unstructured data sources • Natural language processing, ontologies • Linguamatics I2E • Knowledgebase generation
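Linguamatics I2E is a commercial NLP platform and its interface is not shown here. As a minimal sketch of the underlying idea, assuming a hypothetical mini-lexicon that stands in for an ontology, dictionary-based entity recognition maps each surface form in the text to a canonical identifier and entity type:

```python
import re

# Hypothetical mini-lexicon: lower-cased surface form -> (canonical id, entity type).
LEXICON = {
    "tnf": ("TNF", "gene"), "tnf-alpha": ("TNF", "gene"),
    "caspase-3": ("CASP3", "gene"), "casp3": ("CASP3", "gene"),
    "thalidomide": ("Thalidomide", "drug"),
    "neoplasia": ("Neoplasia", "disease"),
}

# Tokens are runs of letters/digits, optionally joined by internal hyphens.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

def tag_entities(text):
    """Return (start, end, surface, canonical_id, type) for every lexicon hit."""
    hits = []
    for m in TOKEN.finditer(text):
        entry = LEXICON.get(m.group().lower())
        if entry:
            hits.append((m.start(), m.end(), m.group(), *entry))
    return hits

for hit in tag_entities("Thalidomide reduces TNF-alpha production."):
    print(hit)
```

Normalising synonyms to one identifier is what lets facts from different documents be merged into a knowledge base; production systems add linguistic analysis on top of such lookups.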

  23. Biomedical entity-relationship data • Relationship types: co-published information; gene:metabolite, gene:disease, gene:gene and gene:chemical/drug semantic relationships • Example network entities: BCL2, PARP, TNF, CASP9, CASP3, CASP8, MTPN, ADP-ribose, thalidomide, hyperplasia, neoplasia • Example relations: increases, activates, synthesizes, associated with, inhibits, binds, inactivates, increases expression, co-published
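The slide separates co-published information (two entities mentioned in the same document) from semantic relationships such as "binds" or "inhibits". Counting co-publication is the simpler of the two. A sketch, assuming each abstract has already been reduced to a set of canonical entity IDs (the document sets below are invented):

```python
from collections import Counter
from itertools import combinations

def co_publication_counts(documents):
    """documents: iterable of sets of entity ids, one set per document.
    Returns a Counter mapping sorted (id_a, id_b) pairs to the number of
    documents in which both entities are mentioned."""
    counts = Counter()
    for entities in documents:
        counts.update(combinations(sorted(entities), 2))
    return counts

def partners(counts, entity, min_docs=1):
    """Entities co-published with `entity`, most frequent first."""
    result = Counter()
    for (a, b), n in counts.items():
        if n >= min_docs and entity in (a, b):
            result[b if a == entity else a] += n
    return result.most_common()

# Hypothetical per-document entity sets.
docs = [{"BCL2", "CASP9", "CASP3"}, {"CASP9", "CASP3"},
        {"CASP3", "PARP"}, {"TNF", "CASP8", "CASP3"}]
counts = co_publication_counts(docs)
print(partners(counts, "CASP3"))
```

Co-occurrence says only that two entities appear together, not how they are related; that is why the slide treats it as a separate, weaker relationship type.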

  24. Pilot systems: pathway analysis with Ingenuity IPA (www.ingenuity.com)

  25. BER system in action • Question: What is the underlying biology, pathology, physiology etc. associated with this list of entities? What is it telling me? • Input: a significant biological entity list (gene list, protein list, metabolite list) from gene expression, proteomic, metabonomic or genetic data • ERSystem (gene/metabolite knowledgebase), with an evidence trail back to the literature • Output: the biological environment of the list; canonical pathways associated with the list; diseases and biological processes associated with the list; hypothesis generation
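A common way to find canonical pathways associated with an entity list is over-representation analysis with a hypergeometric (one-sided Fisher's exact) test. This sketch does not reproduce Ingenuity IPA or the ERSystem; the gene sets and background size are hypothetical, and no multiple-testing correction is applied:

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total

def pathway_enrichment(gene_list, pathways, background_size):
    """Return (pathway, overlap, p_value) tuples, most significant first."""
    genes = set(gene_list)
    results = []
    for name, members in pathways.items():
        overlap = len(genes & members)
        if overlap:
            p = hypergeom_sf(overlap, background_size, len(members), len(genes))
            results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])

# Hypothetical gene sets over a hypothetical background of 100 genes.
pathways = {
    "Apoptosis": {"BCL2", "CASP3", "CASP8", "CASP9", "TNF"},
    "Other": {f"G{i}" for i in range(1, 11)},
}
for name, overlap, p in pathway_enrichment(["BCL2", "CASP3", "CASP9", "G1"], pathways, 100):
    print(f"{name}: overlap={overlap}, p={p:.2e}")
```

The p-value asks how surprising the overlap is given the pathway size and list size; a large pathway will overlap any list by chance more often than a small one.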

  26. Structuring the knowledge: delivers facts as networks of information (knowledge bases) • Example: GI tox knowledge map • Node types: Species (human, rat, dog, etc.); Clinical observations (diarrhoea, vomiting, loose stools, bloating, nausea, etc.); Compound; Genes; Pathology (GI toxicity, GI pathology); Cellular processes • Relationship types: observed in, affects, linked with, is a, involved in
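One way to keep a knowledge map consistent is to accept only facts whose subject type, relation and object type appear in a schema. A sketch using node and relation types from the slide; the relation directions, the `KnowledgeMap` class and all node names are assumptions:

```python
# Hypothetical schema: allowed (subject type, relation, object type) triples.
SCHEMA = {
    ("Compound", "affects", "Gene"),
    ("Gene", "involved in", "CellularProcess"),
    ("CellularProcess", "linked with", "Pathology"),
    ("Pathology", "observed in", "Species"),
    ("ClinicalObservation", "linked with", "Pathology"),
}

class KnowledgeMap:
    def __init__(self, schema=SCHEMA):
        self.schema = schema
        self.types = {}   # node name -> node type
        self.edges = []   # (subject, relation, object)

    def add_node(self, name, node_type):
        self.types[name] = node_type

    def add_fact(self, subj, relation, obj):
        """Store a fact only if it fits the schema; return True if stored."""
        key = (self.types.get(subj), relation, self.types.get(obj))
        if key not in self.schema:
            return False
        self.edges.append((subj, relation, obj))
        return True

    def paths(self, start, goal_type, max_len=4):
        """Depth-first search for node paths (at most max_len nodes) from
        `start` to any node of `goal_type`."""
        found, stack = [], [(start, [start])]
        while stack:
            node, path = stack.pop()
            if self.types.get(node) == goal_type and len(path) > 1:
                found.append(path)
                continue
            if len(path) >= max_len:
                continue
            for s, _, o in self.edges:
                if s == node and o not in path:
                    stack.append((o, path + [o]))
        return found

km = KnowledgeMap()
for name, node_type in [("CMPD-1", "Compound"), ("GENE-X", "Gene"),
                        ("epithelial barrier", "CellularProcess"),
                        ("GI toxicity", "Pathology"), ("Diarrhoea", "ClinicalObservation")]:
    km.add_node(name, node_type)
km.add_fact("CMPD-1", "affects", "GENE-X")
km.add_fact("GENE-X", "involved in", "epithelial barrier")
km.add_fact("epithelial barrier", "linked with", "GI toxicity")
km.add_fact("Diarrhoea", "linked with", "GI toxicity")
print(km.paths("CMPD-1", "Pathology"))
```

A path query like this is the kind of question the map is built to answer: which chain of genes and processes could link a compound to an observed toxicity.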

  27. Data source integration • Data sources: genes, expression, targets, chemistry, literature, patents, CI • Extraction: automated ETL engines; focused NLP extraction • ETL: business rules, direct scoring; project ontologies; queries • Disease/Target KB feeding DataMarts • Interfaces: CIRA TSR, CVGI TSR, disease interface, KB interface • Representation: visualisation, complex data query
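The ETL layer can be sketched as extract, transform (map names onto ontology terms and apply business rules) and load (merge into a target-centric data mart). The synonym table, the reject-unmapped rule and the data below are hypothetical:

```python
# Hypothetical synonym table standing in for a project ontology.
ONTOLOGY = {"tnf-alpha": "TNF", "tnfa": "TNF", "tnf": "TNF",
            "casp3": "CASP3", "caspase-3": "CASP3"}

def extract(sources):
    """Flatten {source_name: rows} into records tagged with their origin."""
    for source, rows in sources.items():
        for row in rows:
            yield {**row, "source": source}

def transform(record):
    """Normalise the gene name via the ontology; drop records it cannot map."""
    gene = ONTOLOGY.get(record["gene"].strip().lower())
    if gene is None:
        return None  # assumed business rule: unmapped names are rejected
    return {**record, "gene": gene}

def load(records):
    """Merge records into a mart: gene -> {source: [values]}."""
    mart = {}
    for rec in records:
        mart.setdefault(rec["gene"], {}).setdefault(rec["source"], []).append(rec["value"])
    return mart

def run_etl(sources):
    transformed = (transform(r) for r in extract(sources))
    return load(r for r in transformed if r is not None)

sources = {
    "expression": [{"gene": "TNF-alpha", "value": "colon"},
                   {"gene": "XYZ", "value": "liver"}],
    "literature": [{"gene": "TNFA", "value": "doc-1"}],
}
print(run_etl(sources))  # {'TNF': {'expression': ['colon'], 'literature': ['doc-1']}}
```

Because the ontology mapping happens in one place, records from different sources that use different names for the same gene end up under one key.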

  28. Workflow technology • Enables scientists to use, modify and implement solutions that specialist groups help them put in place; removes (in principle) the need to make extensive IS projects for new data types.
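Workflow systems typically represent a pipeline as a directed acyclic graph of steps that run in dependency order, so a new step can be added without rewriting the rest. A minimal sketch using Python's `graphlib` module (3.9+); the step names and functions are hypothetical:

```python
from graphlib import TopologicalSorter

def run_workflow(steps, dependencies, inputs):
    """Run named steps in dependency order.
    steps: name -> function(results_dict) -> value
    dependencies: name -> set of step names it needs
    inputs: initial values visible to every step"""
    results = dict(inputs)
    for name in TopologicalSorter(dependencies).static_order():
        if name in steps:  # names without a step function are treated as inputs
            results[name] = steps[name](results)
    return results

steps = {
    "genes": lambda r: r["query"].split(","),
    "normalised": lambda r: sorted({g.strip().upper() for g in r["genes"]}),
    "report": lambda r: f"{len(r['normalised'])} genes",
}
dependencies = {"genes": set(), "normalised": {"genes"}, "report": {"normalised"}}
results = run_workflow(steps, dependencies, {"query": "tnf, casp3,TNF"})
print(results["report"])  # 2 genes
```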

  29. The Knowledge Technology Ziggurat (each level builds on the one below; together they support the decision-making process) • Content Licensing & Access (unstructured information) • Document Retrieval and Storage: Find • Fact Extraction (Text Mining): Extract • Knowledge Structuring: Integrate • Modelling: Create • Slide annotations: systems biology; developing semantic relationships; knowledge bases; current focus

  30. "Bio" and "Chemo" informatics join to aid target selection (axes: sequences, structures, chemistry) • Sequences → gene names → disease → literature links • Expression data, gene structure, SNPs & splicing • Cross-species (orthology) comparisons • Functional genomics: mouse, fish, yeast • Families of known targets • Sequence → alignment → structure → homology modelling • AZ protein and ligand structures • Literature inhibitors and PDB ligands • Docking & virtual screening • Linking non-homologs with analogous mechanisms and binding pockets • Links to endogenous ligands & modulators • Patented inhibitors • Competitor compounds • HTS, focused screens & project SAR data • Library and fragment data • Fingerprint structure search
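Cross-species (orthology) comparisons are often bootstrapped with reciprocal best hits: gene h in one species and gene m in another are paired if each is the other's best-scoring hit. A sketch assuming precomputed similarity scores (for example BLAST bit scores); all identifiers and scores are invented:

```python
def best_hits(scores):
    """scores: {(query, subject): score}. Return {query: best-scoring subject}.
    On ties the first pair seen is kept."""
    best = {}
    for (q, s), score in scores.items():
        if q not in best or score > best[q][1]:
            best[q] = (s, score)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

human_vs_mouse = {("hGENE1", "mGene1"): 500, ("hGENE1", "mGene2"): 300,
                  ("hGENE2", "mGene2"): 400, ("hGENE2", "mGene1"): 450}
mouse_vs_human = {("mGene1", "hGENE1"): 510, ("mGene1", "hGENE2"): 460,
                  ("mGene2", "hGENE2"): 390, ("mGene2", "hGENE1"): 310}
print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))  # [('hGENE1', 'mGene1')]
```

In this toy data hGENE2 gets no orthologue because its best hit is already claimed by hGENE1. Cases like this, typical of gene families that differ in size between species, are why the selectivity slide asks whether animal models have the "same family size".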

  31. What do we need to do? (Diagram: Clinical Practice, Chemistry, Biology)

  32. Hypothesis generation using informatics/modelling • Ligand-protein association via experimental & virtual methods • Term association via text mining • Example diagram: candidate compound → proteins → testicular degeneration

  33. A multidimensional jigsaw puzzle • Target - Biological mechanisms - Disease • Target/Off-target - Biological mechanisms - Toxicology • Polymorphisms • Splice variants • Interaction partners • Tissues • Compounds • Animal models • etc etc etc…

  34. Current needs • Pathways / Systems biology • Mining of unstructured data • Connect biology and chemistry informatics domains • System / data integration • Ontologies! • Workflow technology

  35. AZ - EBI • AZ member of the Industry programme • Training and Education • Network meetings • Research, Standards
