An Introduction to Open Small-molecule Resources of High Utility for Systems Biologists - PowerPoint PPT Presentation


An Introduction to Open Small-molecule Resources of High Utility for Systems Biologists. Tutorial for the International Conference on Systems Biology, Göteborg, August 2008. Christopher Southan, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.


Presentation Transcript



An Introduction to Open Small-molecule Resources of High Utility for Systems Biologists

Tutorial for the International Conference on Systems Biology

Göteborg, August 2008

Christopher Southan, European Bioinformatics Institute,

Wellcome Trust Genome Campus, Cambridge, UK



Context

  • Medicinal chemistry has a long history of providing a bridge between biology and chemistry by identifying compounds that produce biological effects

  • It is increasingly recognised that bioactive compounds are an essential part of the perturbation toolbox for systems biology

  • Advancing biological knowledge via a broad spectrum of small molecule investigations can lead to improved understanding not only of systems biology but also of disease mechanisms and new opportunities for therapeutic intervention



Systems Chemical Biology

Oprea et al. Nat Chem Biol. 2007;3(8):447-50. PMID: 17637771

“The increasing availability of data related to genes, proteins and their modulation by small molecules has provided a vast amount of biological information leading to the emergence of systems biology and the broad use of simulation tools for data analysis. However, there is a critical need to develop cheminformatics tools that can integrate chemical knowledge with these biological databases and simulation approaches, with the goal of creating systems chemical biology.”



Chemical Biology goes back a long way ….



So does Bioactive Compound Structure Representation…..



But .... Times Have Changed for Chemical Information



Strophanthidin: from 1952 to 2008: Now just a click to Hinxton…



Or Bethesda….



The times have also changed for Chemical Biology



And the Union of Chemistry and Biology



November 2004: The Seeds of Revolution



PubChem and ChEBI: Revolutionary Consequences

  • Arrival of the “missing entity” of formal and linked chemical structure representation within the global web of bioinformatic relationships

  • Ability to search across links between biochemical data, biological effects and chemical structure information

  • Deposition not just of HTS results but a wide range of other types of screening data directly linked to chemical structure information in public repositories

  • Proliferation of cheminformatics tools, databases, nomenclatures, and ontologies in the public domain

  • A quantum jump in the global enablement of chemical biology and medicinal chemistry



Post-Revolution: How Many Compounds Are Out There?

  • Chemical Structure Lookup Service: 36 million, 100 sources

  • ChemSpider: 21.5 million, 150 sources

  • PubChem: 19,296,269, 70 sources

  • SureChem: 9 million, from US, European and WO patents

But how many are verified as bioactive?


Relationships in Bioactive Chemical Space

[Figure: diagram with regions labelled metabolomes & natural products; drugs; chem genomics & sys biol probes; assay data; drug-like cpds from literature & patents; protein sequences]


Searchable Chemical Structure Designations and Representations in Databases

  • SD/MOL files

  • IUPAC standard name

  • Sketched image

  • SMILES

  • InChI codes

  • InChI strings

  • Experimental 3D structure

  • Code names (CID 121880)

  • Generic, trade and MeSH names

  • CAS numbers

  • Database accession numbers, e.g. PubChem CID, SID, ChEBI ID, ChemSpider ID

All can be exact-match searched, some allow similarity searching, and some also inter-convert



SD/MOLfile

The basic MDL chemical table files of atoms, bonds, connectivity and 3D coordinates

    benzene
    ACD/Labs0812062058
    6 6 0 0 0 0 0 0 0 0 1 V2000
    1.9050 -0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    1.9050 -2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.7531 -0.1282 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.7531 -2.7882 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    -0.3987 -0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    -0.3987 -2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    2 1 1 0 0 0 0
    3 1 2 0 0 0 0
    4 2 2 0 0 0 0
    5 3 1 0 0 0 0
    6 4 1 0 0 0 0
    6 5 2 0 0 0 0
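To make the layout of the table concrete, here is a minimal Python sketch that reads the benzene record above into atoms and bonds. It is an illustration under stated assumptions, not a full molfile reader: real V2000 files use fixed-width columns, whereas this version splits on whitespace because the column alignment was lost in the transcript. The names `parse_v2000`, `Atom` and `Bond` are hypothetical, and the blank third header line is assumed.

```python
from dataclasses import dataclass

# Benzene record copied from the slide. The blank third header line
# (the comment line) is an assumption; the parser does not depend on it.
BENZENE_MOLFILE = """benzene
ACD/Labs0812062058

6 6 0 0 0 0 0 0 0 0 1 V2000
1.9050 -0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.9050 -2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7531 -0.1282 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7531 -2.7882 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.3987 -0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.3987 -2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 0
3 1 2 0 0 0 0
4 2 2 0 0 0 0
5 3 1 0 0 0 0
6 4 1 0 0 0 0
6 5 2 0 0 0 0
"""


@dataclass
class Atom:
    x: float
    y: float
    z: float
    element: str


@dataclass
class Bond:
    begin: int  # 1-based index into the atom block
    end: int    # 1-based index into the atom block
    order: int  # 1 = single, 2 = double, 3 = triple


def parse_v2000(text):
    """Return (atoms, bonds) from a whitespace-separated V2000 record."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # The counts line is the one that ends with the version tag.
    start = next(i for i, fields in enumerate(rows) if fields[-1] == "V2000")
    n_atoms, n_bonds = int(rows[start][0]), int(rows[start][1])
    atom_rows = rows[start + 1:start + 1 + n_atoms]
    bond_rows = rows[start + 1 + n_atoms:start + 1 + n_atoms + n_bonds]
    atoms = [Atom(float(f[0]), float(f[1]), float(f[2]), f[3]) for f in atom_rows]
    bonds = [Bond(int(f[0]), int(f[1]), int(f[2])) for f in bond_rows]
    return atoms, bonds


atoms, bonds = parse_v2000(BENZENE_MOLFILE)
```

The counts line gives the number of atoms and bonds; each bond row then refers back to atom rows by position, which is how connectivity is stored.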



Experimental 3D Structures

Cn3D view of PDB 1I7G on the left; PubChem entry for tesaglitazar (CID 208901) on the right


SMILES (Simplified Molecular Input Line Entry System): a line notation for encoding molecular structures

  • Human readable

  • Interconverts with 2D sketchers

  • Can then be searched



Structure Sketchers/Converters



IUPAC Systematic Naming of Organic Chemical Compounds

  • International Union of Pure and Applied Chemistry (IUPAC)

  • Should be human readable and allow an unambiguous structural formula to be drawn

  • Usable for automated text-to-structure conversion

  • Taxol

    (2aR,4S,4aS,6R,9S,11S,12S,12aR,12bS)-1,2a,3,4,4a,6,9,10,11,12,12a,12b-Dodecahydro-4,6,9,11,12,12b-hexahydroxy-4a,8,13,13-tetramethyl-7,11-methano-5H-cyclodeca(3,4)benz(1,2b)oxet-5-one 6,12b-diacetate, 12-benzoate, 9-ester with (2R,3S)-N-benzoyl-3-phenylisoserine



IUPAC International Chemical Identifier (InChI) Textual Identifier for Chemical Substances

  • A formalized, machine-generated string representation of a chemical structure; unlike an IUPAC name, it is not intended to be human readable

  • Expresses more information than the simpler SMILES notation and differs in that every structure has a unique InChI string

  • The InChI algorithm converts structural information in a three-step process: normalization (to remove redundant information), canonicalization (to generate a unique number label for each atom), and serialization (to give a string of characters), but without explicit 3D information

  • The 25 character InChIKey is a hashed version of the full InChI designed to allow for easy web searches of chemical compounds (e.g. Google)
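The serialized string is easier to follow once it is split into its layers. The sketch below does this for benzene in Python. It assumes the present-day standard InChI prefix `1S`, which postdates this tutorial, and the function name `split_inchi` is hypothetical. It handles only the common case where the first layer after the version is the molecular formula.

```python
def split_inchi(inchi):
    """Split an InChI string into its version, formula and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        # Each later layer starts with a one-letter tag, e.g. 'c' for
        # connectivity and 'h' for hydrogen atoms.
        layers[layer[0]] = layer[1:]
    return layers


# Standard InChI for benzene (assumption: standard InChI, version 1S).
BENZENE_INCHI = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"
layers = split_inchi(BENZENE_INCHI)
```

The `c` layer lists the canonical atom numbers produced by the canonicalization step, and the `h` layer records which of those atoms carry hydrogens.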



CAS Registry Number

  • Unique numeric identifier containing up to 10 digits, divided by hyphens into three parts, e.g. 58-08-2 for caffeine (Google it)

  • Has no chemical significance

  • Widely used but not open-access because the source chemical information links to the CAS commercial databases e.g. SciFinder

  • Consequently the consistency of mappings to open identifiers cannot be verified
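Although the mapping from a CAS number to a structure cannot be checked openly, the number's own format can. The following Python sketch validates the three-part layout and the final check digit. The checksum rule (each digit of the first two parts, counted from the right, multiplied by its position, summed, modulo 10) is not stated in the slides and is included here as background; `is_valid_cas` is a hypothetical helper name.

```python
import re

# Three hyphen-separated parts: 2 to 7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas):
    """Return True if the string has CAS layout and a correct check digit."""
    match = CAS_PATTERN.fullmatch(cas)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


caffeine_ok = is_valid_cas("58-08-2")
```

For caffeine, the body digits 5, 8, 0, 8 give 8×1 + 0×2 + 8×3 + 5×4 = 52, and 52 modulo 10 is the check digit 2. A passing check says only that the number is well formed, not that it maps to the intended compound.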



PubChem Identifiers: CIDs and SIDs

  • PubChem is the NCBI informatics backbone for the NIH Molecular Libraries Initiative

  • A suite of three databases: PubChem Compound (unique structures with computed properties), PubChem BioAssay (results supplied by depositors) and PubChem Substance (deposited compound structures)

  • The ten MLI-funded screening centers run cellular and target-based HTS assays using a compound collection of ~250K and submit the results to PubChem
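As a hedged illustration of how a CID can be used programmatically, the Python sketch below builds request URLs for PUG REST, PubChem's present-day web interface. PUG REST is not mentioned in the slides, so treat the URL layout as an assumption to check against the current PubChem documentation. The function names are hypothetical, and no network request is made here.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(cid, properties=("MolecularFormula", "InChIKey")):
    """URL requesting computed properties for one PubChem Compound CID."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/property/{','.join(properties)}/JSON"


def name_to_cid_url(name):
    """URL requesting the CIDs that match a compound name."""
    # Percent-encode the name so spaces and slashes stay inside one path segment.
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


# Tesaglitazar, CID 208901, is the example compound used later in this tutorial.
tesaglitazar_url = compound_property_url(208901)
lookup_url = name_to_cid_url("tesaglitazar")
```

Fetching either URL with any HTTP client should return JSON, but response fields are deliberately not assumed in this sketch.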


PubChem is now a Global Hub, Including Bioinformatic DBs with In-links

  • MMDB, PDB ligands: 55K

  • ChEBI, enzyme ligands: 8K

  • ZINC, ready-to-dock: 3.8 million

  • KEGG, drugs and metabolites: 14K

  • ChemBank, chemical genomics: 0.4 million

  • Human Metabolite db: 2K

  • ChemIDplus, NIH tox data: 383K

  • MEROPS protease inhibitors

  • ChemSpider: 20 million

  • DrugBank, drugs and targets: 4K

  • Drugs of the Future: 3.4K

  • GPCR-Ligand Database

  • Nature Chemical Biology: 0.8K

  • LIPID MAPS, metabolism: 8.8K



Searchable Measures of Chemical Similarity

  • 1D: measured or computed molecular properties, e.g., molecular weight, number of rings, molecular surface area or volume, pKa, logP etc

  • 3D: map a molecular surface, chemical graphs, spectral descriptors, distribution of electrostatic charge around a molecule

  • 2D fingerprints are by far the most common, based on a bit-string encoding of substructural occurrences
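As a small example of a computed 1D property, the Python sketch below derives molecular weight from a molecular formula. The atomic masses are rounded standard values for a handful of elements only, and `molecular_weight` is a hypothetical helper, not part of any database toolkit. It handles simple formulas without brackets or charges.

```python
import re

# Rounded standard atomic masses for a few common elements (assumption:
# sufficient for illustration; extend the table as needed).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Cl": 35.45}

# An element symbol followed by an optional count, e.g. 'C8' or 'Cl'.
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula):
    """Sum atomic masses for a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in FORMULA_TOKEN.findall(formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


caffeine_mw = molecular_weight("C8H10N4O2")  # about 194.19
```

Because such properties are single numbers, databases can index them and answer range queries (for example, a molecular weight window) without any structure comparison.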



Molecular Fingerprints for Similarity Searching

  • Each bit in the fingerprint (or fragment bit-string) represents one molecular fragment. Typical length is ~1000 bits

  • The bit string for a molecule records the presence (“1”) or absence (“0”) of each fragment in the molecule

  • Compare fingerprints of two molecules to identify common bits and hence common substructures (and hence overall structural resemblance)
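The idea can be sketched with a toy fragment dictionary in Python. Real fingerprints use roughly a thousand bits derived automatically from the structure; here there are only five hypothetical fragments, and each molecule's fragments are assigned by hand purely for illustration.

```python
# Toy fragment dictionary: one bit position per fragment (hypothetical).
FRAGMENTS = ["benzene ring", "carboxylic acid", "ester", "hydroxyl", "amine"]


def fingerprint(present):
    """Encode a set of fragment names as a bit string, '1' = present."""
    return "".join("1" if fragment in present else "0" for fragment in FRAGMENTS)


def common_fragments(fp_a, fp_b):
    """List the fragments whose bit is set in both fingerprints."""
    return [fragment
            for fragment, bit_a, bit_b in zip(FRAGMENTS, fp_a, fp_b)
            if bit_a == "1" and bit_b == "1"]


# Hand-assigned fragment sets for two related molecules.
aspirin = fingerprint({"benzene ring", "carboxylic acid", "ester"})
salicylic_acid = fingerprint({"benzene ring", "carboxylic acid", "hydroxyl"})
shared = common_fragments(aspirin, salicylic_acid)
```

Here `aspirin` is `11100` and `salicylic_acid` is `11010`; the two shared bits correspond to the benzene ring and the carboxylic acid.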


Tanimoto Chemical Similarity

[Figure: Venn diagram of two fingerprints, A and B, with regions labelled a, c and b]

  • Tally features:

    • Unique (a,b)

    • Both on (c)

    • Both off (d)

  • Similarity Formula

    • Tanimoto=c/(a+b+c)

Beware: Chemical Similarity searches are not standardised between databases
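The formula above can be written out directly in Python. This is a minimal sketch that takes two equal-length bit strings; returning 0.0 when neither fingerprint has any bit set is a convention chosen here, not something stated in the slides.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b + c) for two equal-length bit strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    pairs = list(zip(fp_a, fp_b))
    a = sum(1 for x, y in pairs if x == "1" and y == "0")  # on only in A
    b = sum(1 for x, y in pairs if x == "0" and y == "1")  # on only in B
    c = sum(1 for x, y in pairs if x == "1" and y == "1")  # on in both
    if a + b + c == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return c / (a + b + c)


score = tanimoto("11100", "11010")  # a = 1, b = 1, c = 2, so 2 / 4 = 0.5
```

Bits that are off in both fingerprints (d) do not enter the formula, so the score does not rise merely because two molecules both lack many fragments. Scores from different databases are comparable only if the fingerprint definitions match.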



PubChem Chemical Searching



Bio-Chem Data Joins



A Pharmaceutical Portfolio from PubChem



Disambiguation

From: Wells et al. Reaching for high-hanging fruit in drug discovery at protein–protein interfaces

1R6N

1Y2F



OSRA: Optical Structure Recognition



Checking Chemical Patents

  • Taking Nutlin-3 as an example, the SMILES entry from PubChem

    CC(C)OC1=C(C=CC(=C1)OC)C2=NC(C(N2C(=O)N3CCNC(=O)C3)C4=CC=C(C=C4)Cl)C5=CC=C(C=C5)Cl

    was pasted into the SureChem search box

  • There are nine exact matches including the granted patent application from Roche shown below
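Because SMILES is plain text, a quick sanity check before pasting a string into a search box needs only string tools. The Python sketch below counts the heavy atoms in the Nutlin-3 SMILES quoted above. It is not a SMILES parser: it recognises only unbracketed organic-subset atoms, rejects bracket atoms outright, and the helper name `heavy_atom_counts` is hypothetical.

```python
import re
from collections import Counter

# Nutlin-3 SMILES as quoted above, split over two lines for readability.
NUTLIN_3 = ("CC(C)OC1=C(C=CC(=C1)OC)C2=NC(C(N2C(=O)N3CCNC(=O)C3)"
            "C4=CC=C(C=C4)Cl)C5=CC=C(C=C5)Cl")

# Two-letter symbols must come first so 'Cl' is not read as carbon.
ATOM_TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms in a SMILES string without bracket atoms."""
    if "[" in smiles:
        raise ValueError("bracket atoms are not handled by this sketch")
    # Aromatic atoms are lower case in SMILES; fold them into the element symbol.
    return Counter(token.capitalize() for token in ATOM_TOKEN.findall(smiles))


counts = heavy_atom_counts(NUTLIN_3)
# counts: 30 carbon, 4 nitrogen, 4 oxygen and 2 chlorine atoms
```

A mismatch between these counts and the formula on the database record would suggest the string was truncated or altered when copied.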


Exploring Relationships in Entrez

[Figure: Entrez data types (protein sequences, PubMed literature, protein 3D structures, small molecule structures, bioactivity assay results) and the relationships that connect them: BLAST sequence similarity, MeSH-indexed biological terms, VAST structure similarity, 2D chemical structure similarity (3D soon), activity profile similarity]



Linkage between Swiss-Prot-DrugBank-PubChem-MMDB

(411)

(15728) = 181

(2501)

see these marketed target links

