
Small Molecules Resources at the EBI

Small Molecules Resources at the EBI. Dr. Louisa Bellis, Chemical Content Curator, ChEMBL Group, EMBL-EBI, UK. Bioinformatics Resources for Immunologists, 6th September 2013. Agenda: Introduction; Small molecule resources; ChEBI; ChEMBL; Searching and browsing; Hands-on exercises.


Presentation Transcript


  1. Small Molecules Resources at the EBI. Dr. Louisa Bellis, Chemical Content Curator, ChEMBL Group, EMBL-EBI, UK. Bioinformatics Resources for Immunologists, 6th September 2013

  2. Agenda: Introduction; Small molecule resources; ChEBI; ChEMBL; Searching and browsing; Hands-on exercises

  3. Small Molecules within Bioinformatics: Genomes; Literature; Expressions; Nucleotide sequences; Protein sequences; Protein domains, families; Enzymes; 3D structures; Small molecules; Pathways; Systems

  4. Annotation of bioinformatics data. Essential for capturing understanding and knowledge associated with core data. Often captured in free text, which is easier to read and better for conveying understanding to a human audience, but… • Difficult for computers to parse • Quality varies from database to database • Terminology used varies from annotator to annotator • Towards annotation using standard vocabularies: ontologies within bioinformatics

  5. Small Molecule Databases can be used to: • Investigate historical compounds and associated bioactivity data. • Create Structure-Activity Relationships (SARs) • Direct synthesis • Direct end product testing

  6. ChEBI and ChEMBL

  7. What is ChEBI? Chemical Entities of Biological Interest • Freely available • Focused on ‘small’ chemical entities (no proteins or nucleic acids) • Illustrated dictionary of chemical nomenclature • High quality, manually annotated • Provides chemical ontology • ~39,000 ChEBI 3* compounds • Access ChEBI at http://www.ebi.ac.uk/chebi/

  8. ChEBI Data Overview • Nomenclature: caffeine; 1,3,7-trimethylxanthine; methyltheobromine • Ontology: metabolite; CNS stimulant; trimethylxanthines • Chemical data: Formula: C8H10N4O2; Charge: 0; Mass: 194.19 • Database Xrefs: MSDchem: CFF; KEGG DRUG: D00528 • Chemical Informatics: InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3; SMILES: CN1C(=O)N(C)c2ncn(C)c2C1=O • Visualisation
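The mass on the caffeine slide can be checked against its formula. The snippet below is a minimal sketch, not part of the original presentation: it assumes rounded standard atomic weights and a hypothetical helper `formula_mass` that only handles simple formulae (element symbols with optional counts, no brackets or charges).

```python
import re

# Rounded standard atomic weights; only the elements needed for this example.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def formula_mass(formula: str) -> float:
    """Sum atomic weights for a simple formula such as 'C8H10N4O2'.

    Hypothetical helper for illustration: no brackets, charges or isotopes.
    """
    total = 0.0
    # Each match is (element symbol, optional count), e.g. ("C", "8").
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


caffeine_mass = formula_mass("C8H10N4O2")
print(f"{caffeine_mass:.2f}")  # 194.19, matching the slide
```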

  9. ChEBI compound page

  10. ChEBI Chemical Structures. Chemical structure may be interactively explored using the MarvinView applet. Available in formats: Image; Molfile; InChI and InChIKey; SMILES

  11. Automatic Cross-references

  12. The ChEBI ontology. Organised into three sub-ontologies, namely: • Molecular structure ontology • Subatomic particle ontology • Role ontology. Example shown: (R)-adrenaline

  13. Molecular structure ontology

  14. Role ontology

  15. ChEBI ontology relationships • Generic ontology relationships • Chemistry-specific relationships
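How ontology relationships can be followed from an entity is sketched below. This is a toy in-memory graph, not real ChEBI data or the ChEBI API: the edge list, the relationship labels `is_a` and `has_role`, and the helper names `build_index` and `ancestors` are assumptions for illustration, loosely based on the caffeine example on the data overview slide.

```python
from collections import defaultdict

# Toy edges (child, relationship, parent); invented for illustration only.
EDGES = [
    ("caffeine", "is_a", "trimethylxanthine"),
    ("trimethylxanthine", "is_a", "methylxanthine"),
    ("caffeine", "has_role", "CNS stimulant"),
    ("caffeine", "has_role", "metabolite"),
]


def build_index(edges):
    """Map (child, relationship) to the list of direct parents."""
    index = defaultdict(list)
    for child, rel, parent in edges:
        index[(child, rel)].append(parent)
    return index


def ancestors(index, entity, rel="is_a"):
    """Follow one relationship type transitively and return all ancestors."""
    seen, stack = [], [entity]
    while stack:
        node = stack.pop()
        for parent in index.get((node, rel), []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


index = build_index(EDGES)
print(ancestors(index, "caffeine"))      # ['trimethylxanthine', 'methylxanthine']
print(index[("caffeine", "has_role")])   # ['CNS stimulant', 'metabolite']
```

Keeping the relationship type as part of the lookup key means a structural walk (`is_a`) never mixes with role assignments (`has_role`), which mirrors the separation between the structure and role sub-ontologies.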

  16. Viewing ChEBI ontology

  17. What is ChEMBL? • Database of bioactive, drug-like small molecules • Stores 2D structures and calculated properties (logP, molecular weight, Lipinski parameters etc.) • Contains abstracted bioactivity data, e.g. binding data and IC50, from multiple primary scientific journals • Covers about 33 years of compound synthesis and testing • Annotated FDA-approved drugs • Access ChEMBL at https://www.ebi.ac.uk/chembldb/

  18. Data Statistics • Focused towards compounds with drug-like properties by extraction from medicinal chemistry journals • Includes small molecules (~92%) and peptides (~7%) • Abstracted from 50,095 papers across 47 journals • 1,487,579 compound records (~450,000 directly from PubChem) • 1,295,510 distinct compound structures • 11,420,351 activities (>6.0 million directly from PubChem) • binding measurements, functional assays and ADMET • 9,844 targets, with over 5,400 protein targets and over 2,440 human targets • Deposition of PubChem Substances and Bioassay assays

  19. ChEMBL Data Overview Compound Target Bioactivity >Thrombin MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE Compound Assay SAR Data Ki=4.5 nM APTT 11 min

  20. Clinical Trials Phase 1 Phase 2 Phase 3 Launch Target Discovery Lead Discovery Lead Optimisation Preclinical Development • Medicinal Chemistry • Structure-based drug design • Selectivity screens • ADMET screens • Cellular/Animal disease models • Pharmacokinetics • High-throughput Screening (HTS) • Fragment-based screening • Focused libraries • Screening collection • Target identification • Microarray profiling • Target validation • Assay development • Biochemistry • Clinical/Animal disease models • Toxicology • In vivo safety pharmacology • Formulation • Dose prediction Safety & Efficacy Indication Discovery & expansion PK tolerability Efficacy Discovery Development Use Med. Chem. SAR Clinical Candidates Drugs ChEMBL database ~15,000 candidates ~2,400 drugs > 10,000,000 bioactivities > 1,300,000 compounds ~30,000 distinct lead series

  21. ChEMBL Target Types • Molecular: Protein (Single Protein, e.g. PDE5; Protein Complex, e.g. Nicotinic acetylcholine receptor; Protein Family, e.g. Muscarinic receptors); Nucleic acid (e.g. DNA) • Non-molecular: Cell-line (e.g. HEK293 cells); Tissue (e.g. Nervous); Subcellular-fraction (e.g. Mitochondria); Organism (e.g. Drosophila)

  22. ChEMBL compound page

  23. Clickable structure; Structural Representations; Drug Information

  24. ChEMBL --> ChEBI Link:

  25. ChemSpider Links: The link works both ways: ChEMBL links to ChemSpider, and ChemSpider links back to ChEMBL. The records are matched on Standard InChI.

  26. Wikipedia Links: We also have links with Wikipedia. These also use the Standard InChI as the common identifier. The links point to the Compound Report Card in ChEMBL.

  27. Searching and Browsing

  28. Chemical names • Common or trivial names are those in widespread use. • Advantages of common names: they are simple, easy to pronounce and universally recognised. • The main disadvantage is ambiguity: the same common name may refer to more than one type of chemical, and similar names are easily confused, e.g. • Fluorene • Fluorine

  29. Systematic names • A systematic name is one which corresponds to the chemical structure such that the structure can be determined from the name, e.g. 1,2-dimethyl-naphthalene • Software packages exist which can generate structures from the systematic names (e.g. ACD/Name, ChemOffice, MarvinSketch). • More than one correct systematic name can be assigned to the same molecular structure, depending on the manner in which naming rules are applied (e.g. IUPAC names).

  30. Examples of common and systematic names • Common names: caffeine; guaranine; theine • Systematic names: 1,3,7-trimethyl-3,7-dihydro-1H-purine-2,6-dione; 7-methyltheophylline; 1,3,7-trimethyl-2,6-dioxopurine

  31. The ChEBI web service • Programmatic access to a ChEBI entry • SOAP based Java implementation • Clients currently available in Java and Perl • Methods include: • getLiteEntity • getCompleteEntity and getCompleteEntityByList • getOntologyParents • getOntologyChildren and getAllOntologyChildrenInPath • getStructureSearch • Documented at http://www.ebi.ac.uk/chebi/webServices.do.

  32. Web services • Allow users to create their own applications to query data User application

  33. The ChEBI web service • Programmatic access to a ChEBI entry • SOAP based Java implementation • Clients currently available in Java and perl • Methods • getLiteEntity • getCompleteEntity and getCompleteEntityByList • getOntologyParents • getOntologyChildren and getAllOntologyChildrenInPath • getStructureSearch • Documented at http://www.ebi.ac.uk/chebi/webServices.do.

  34. Web service client object model: getLiteEntity; getCompleteEntity; getOntology (Parents and Children)

  35. ChEMBL Web Services • Programmatic access to the ChEMBL database • Provide Java, Perl and Python scripts to help you get started with the ChEMBL RESTful Web Service API • Can be used to bring back compounds, lists of compounds, images, targets and assays • https://www.ebi.ac.uk/chembldb/index.php/ws
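A compound lookup over a RESTful API of this kind can be sketched as follows. This is a minimal sketch under stated assumptions, not the scripts mentioned on the slide: the base URL and the `molecule/<id>.json` path are those of the current ChEMBL data API rather than the 2013 address above, and `molecule_url` and `fetch_molecule` are hypothetical helper names. The fetch needs network access, so it is left as a commented usage line.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base URL of the current ChEMBL data API, not the address on the slide.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def molecule_url(chembl_id: str) -> str:
    """Build the JSON resource URL for one compound, e.g. 'CHEMBL25'."""
    return f"{BASE_URL}/molecule/{quote(chembl_id)}.json"


def fetch_molecule(chembl_id: str, timeout: float = 30.0) -> dict:
    """Download and parse one compound record (requires network access)."""
    with urlopen(molecule_url(chembl_id), timeout=timeout) as response:
        return json.load(response)


print(molecule_url("CHEMBL25"))
# Usage (not run here): record = fetch_molecule("CHEMBL25")
```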

  36. Examples of Web Services

  37. Interface searching

  38. ChEBI simple and advanced text search. Narrow to category. Boolean operators: AND, OR and BUT NOT
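The three operators of the advanced text search behave like set operations on the hits for each term. The sketch below illustrates that idea only: the `INDEX` contents are invented toy hits, and the function names are hypothetical, not part of ChEBI.

```python
# Toy term index: search term -> set of matching entry names (invented data).
INDEX = {
    "xanthine": {"caffeine", "theobromine", "theophylline"},
    "stimulant": {"caffeine", "nicotine", "theobromine"},
    "alkaloid": {"caffeine", "nicotine"},
}


def search_and(a: str, b: str) -> set:
    """Entries matching both terms."""
    return INDEX[a] & INDEX[b]


def search_or(a: str, b: str) -> set:
    """Entries matching either term."""
    return INDEX[a] | INDEX[b]


def search_but_not(a: str, b: str) -> set:
    """Entries matching the first term but not the second."""
    return INDEX[a] - INDEX[b]


print(sorted(search_and("xanthine", "stimulant")))      # ['caffeine', 'theobromine']
print(sorted(search_or("xanthine", "alkaloid")))        # ['caffeine', 'nicotine', 'theobromine', 'theophylline']
print(sorted(search_but_not("xanthine", "stimulant")))  # ['theophylline']
```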

  39. Structure drawing tools; Search options

  40. Search Results. Hover over a result for a larger structure. Click to go to the entry page.

  41. Types of structure search • Identity – based on InChI, e.g. InChI=1/H2O/h1H2 • Substructure – uses fingerprints to narrow the search range, then performs the full substructure search algorithm • Similarity – based on the Tanimoto coefficient calculated between the fingerprints. Example fingerprints: a = 0010110010, b = 1010110111. Tanimoto(a,b) = c / (a+b-c) = 4 / (4+7-4) = 0.57, where a and b in the formula are the numbers of bits set in each fingerprint (4 and 7) and c is the number of bits set in both (4)
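The Tanimoto calculation on this slide can be reproduced with the two example fingerprints. The code is a minimal sketch that treats fingerprints as strings of 0 and 1; the function names are hypothetical, and `passes_screen` is only an assumed illustration of how a fingerprint pre-filter for substructure search might work (every bit set in the query must also be set in the candidate), not the algorithm ChEBI actually uses.

```python
def popcount(bits: str) -> int:
    """Number of bits set in a fingerprint written as a 0/1 string."""
    return bits.count("1")


def tanimoto(a: str, b: str) -> float:
    """Tanimoto coefficient c / (a + b - c) for two equal-length fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    common = sum(1 for x, y in zip(a, b) if x == y == "1")
    union = popcount(a) + popcount(b) - common
    return common / union if union else 0.0


def passes_screen(query: str, candidate: str) -> bool:
    """Pre-filter: True if every bit set in the query is set in the candidate."""
    return all(c == "1" for q, c in zip(query, candidate) if q == "1")


a = "0010110010"  # 4 bits set
b = "1010110111"  # 7 bits set
print(round(tanimoto(a, b), 2))  # 0.57
print(passes_screen(a, b))       # True
```

A candidate that fails the screen cannot contain the query as a substructure, so only candidates that pass need the slower full substructure match.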

  42. Browse via Periodic Table Molecular entities / Elements
