An Introduction to Open Small-molecule Resources of High Utility for Systems Biologists. Tutorial for the International Conference on Systems Biology, Göteborg, August 2008. Christopher Southan, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK.

An Introduction to Open Small-molecule Resources of High Utility for Systems Biologists

Tutorial for the International Conference on Systems Biology

Göteborg, August 2008

Christopher Southan, European Bioinformatics Institute,

Wellcome Trust Genome Campus, Cambridge, UK


Context

  • Medicinal chemistry has a long history of providing a bridge between biology and chemistry by identifying compounds that produce biological effects

  • It is increasingly recognised that bioactive compounds are an essential part of the perturbation toolbox for systems biology

  • Advancing biological knowledge via a broad spectrum of small molecule investigations can lead to improved understanding not only of systems biology but also of disease mechanisms and new opportunities for therapeutic intervention


Systems Chemical Biology

Oprea et al. Nat Chem Biol. 2007;3(8):447-50. PMID: 17637771

“The increasing availability of data related to genes, proteins and their modulation by small molecules has provided a vast amount of biological information leading to the emergence of systems biology and the broad use of simulation tools for data analysis. However, there is a critical need to develop cheminformatics tools that can integrate chemical knowledge with these biological databases and simulation approaches, with the goal of creating systems chemical biology.”


Chemical Biology goes back a long way…




Strophanthidin: from 1952 to 2008. Now just a click to Hinxton…


Or Bethesda…


The times have also changed for Chemical Biology


And the Union of Chemistry and Biology


November 2004: The Seeds of Revolution


PubChem and ChEBI: Revolutionary Consequences

  • Arrival of the “missing entity” of formal and linked chemical structure representation within the global web of bioinformatic relationships

  • Ability to search across links between biochemical data, biological effects and chemical structure information

  • Deposition not just of HTS results but a wide range of other types of screening data directly linked to chemical structure information in public repositories

  • Proliferation of cheminformatics tools, databases, nomenclatures, and ontologies in the public domain

  • A quantum jump in the global enablement of chemical biology and medicinal chemistry


Post-Revolution: How Many Compounds are Out There?

  • Chemical Structure Lookup Service: 36 million, 100 sources

  • ChemSpider: 21.5 million, 150 sources

  • PubChem: 19,296,269, 70 sources

  • SureChem: 9 million, from US, European and WO patents

But how many are verified as bioactive?


Relationships in Bioactive Chemical Space

[Figure: overlapping sets in bioactive chemical space. Labels: metabolomes & natural products; drugs; chem genomics & sys biol probes; assay data; drug-like cpds from literature & patents; Protein Sequences]


Searchable Chemical Structure Designations and Representations in Databases

  • SD/MOL files

  • IUPAC standard name

  • Sketched image

  • SMILES

  • InChI codes

  • InChI strings

  • Experimental 3D structure

  • Code names (CID 121880)

  • Generic, trade and MeSH names

  • CAS numbers

  • Database accession numbers, e.g. PubChem CID, SID, ChEBI ID, ChemSpider ID

All can be exact-match searched, some allow similarity searching, and some also inter-convert


SD/MOLfile

The basic MDL chemical table files of atoms, bonds, connectivity and 3D coordinates

benzene
ACD/Labs0812062058
6 6 0 0 0 0 0 0 0 0 1 V2000
1.9050 -0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.9050 -2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7531 -0.1282 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7531 -2.7882 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.3987 -0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.3987 -2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 0
3 1 2 0 0 0 0
4 2 2 0 0 0 0
5 3 1 0 0 0 0
6 4 1 0 0 0 0
6 5 2 0 0 0 0
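To make the layout of the connection table concrete, here is a minimal Python sketch that reads the benzene record shown on this slide. It is an illustration under stated assumptions, not a full MOL file reader: the blank comment line and the closing `M  END` line are assumed to have been lost from the slide text and are restored here, fields are split on whitespace (real V2000 files are fixed-width, so this can fail for larger molecules), and the name `parse_molblock` is hypothetical.

```python
# Benzene record from the slide. Assumptions: the blank third header line and
# the final "M  END" line were dropped in the slide text and are restored here.
MOLBLOCK = """benzene
  ACD/Labs0812062058

  6  6  0  0  0  0  0  0  0  0  1 V2000
    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  3  1  2  0  0  0  0
  4  2  2  0  0  0  0
  5  3  1  0  0  0  0
  6  4  1  0  0  0  0
  6  5  2  0  0  0  0
M  END
"""


def parse_molblock(text):
    """Return (name, atoms, bonds) from a small V2000 record.

    Whitespace splitting is a simplification: the real format is fixed-width.
    """
    lines = text.splitlines()
    name = lines[0].strip()
    # Line 4 is the counts line: number of atoms, then number of bonds.
    counts = lines[3].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))

    bonds = []
    first_bond = 4 + n_atoms
    for line in lines[first_bond:first_bond + n_bonds]:
        # Each bond line: first atom index, second atom index, bond order.
        a, b, order = (int(v) for v in line.split()[:3])
        bonds.append((a, b, order))
    return name, atoms, bonds


name, atoms, bonds = parse_molblock(MOLBLOCK)
print(name, len(atoms), "atoms,", len(bonds), "bonds")
```

The three single and three double bonds in the bond block are the alternating (Kekulé) representation of the benzene ring; note also that every z coordinate in this particular record is zero, i.e. it is a 2D depiction.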


Experimental 3D Structures

Cn3D view of PDB 1I7G on the left; PubChem tesaglitazar (CID 208901) on the right


SMILES: Simplified Molecular Input Line Entry System, a notation for encoding molecular structures

  • Interconverts with 2D sketchers

  • Can then be searched

  • Human readable
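Because a SMILES string is plain text, even a few lines of code can pull simple facts out of it. The sketch below counts heavy atoms per element with a regular expression, using the Nutlin-3 SMILES quoted later in this tutorial. It is not a SMILES parser: it ignores bonds, rings and stereochemistry, handles bracket atoms only roughly, and the names `ATOM_PATTERN` and `heavy_atom_counts` are hypothetical. A real toolkit should be used for anything beyond a quick check.

```python
import re
from collections import Counter

# Bracket atoms first, then two-letter halogens, then single-letter atoms
# (upper case = aliphatic, lower case = aromatic) of the organic subset.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Inside a bracket atom: optional isotope digits, then the element symbol.
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms per element in a SMILES string (rough sketch)."""
    counts = Counter()
    for token in ATOM_PATTERN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_SYMBOL.match(token)
            if match is None:  # e.g. a wildcard atom; skipped in this sketch
                continue
            symbol = match.group(1)
        else:
            symbol = token
        symbol = symbol.capitalize()  # aromatic "c" is counted as "C"
        if symbol != "H":
            counts[symbol] += 1
    return counts


NUTLIN_3 = ("CC(C)OC1=C(C=CC(=C1)OC)C2=NC(C(N2C(=O)N3CCNC(=O)C3)"
            "C4=CC=C(C=C4)Cl)C5=CC=C(C=C5)Cl")
print(dict(heavy_atom_counts(NUTLIN_3)))
```

Ring-closure digits, bond symbols and parentheses are simply skipped by the pattern, which is why the same function works for both the Kekulé form used above and an aromatic form such as `c1ccccc1`.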


Structure Sketchers/Converters


IUPAC Systematic Naming of Organic Chemical Compounds

  • International Union of Pure and Applied Chemistry (IUPAC)

  • Should be human readable and allow an unambiguous structural formula to be drawn

  • Usable for automated text-to-structure conversion

  • Taxol

    (2aR,4S,4aS,6R,9S,11S,12S,12aR,12bS)-1,2a,3,4,4a,6,9,10,11,12,12a,12b-Dodecahydro-4,6,9,11,12,12b-hexahydroxy-4a,8,13,13-tetramethyl-7,11-methano-5H-cyclodeca(3,4)benz(1,2b)oxet-5-one 6,12b-diacetate, 12-benzoate, 9-ester with (2R,3S)-N-benzoyl-3-phenylisoserine


IUPAC International Chemical Identifier (InChI): Textual Identifier for Chemical Substances

  • A formalized string representation of a chemical structure, developed by IUPAC, but not designed to be human readable

  • Expresses more information than the simpler SMILES notation and differs in that every structure has a unique InChI string

  • The InChI algorithm converts structural information in a three-step process: normalization (to remove redundant information), canonicalization (to generate a unique number label for each atom), and serialization (to give a string of characters), but without explicit 3D information

  • The InChIKey (25 characters at the time of this tutorial, 27 in the later standard InChIKey) is a hashed version of the full InChI designed to allow for easy web searches of chemical compounds (e.g. Google)
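The serialized form can be illustrated by splitting an InChI string into its layers. The sketch below uses the standard InChI for benzene and a hypothetical helper, `split_inchi`. It assumes a simple identifier with a formula layer and at most one layer per prefix letter; real identifiers can omit the formula or repeat prefixes in later sections, which this sketch does not handle.

```python
def split_inchi(inchi):
    """Split a simple InChI string into its layers (illustrative only)."""
    prefix, _, body = inchi.partition("=")
    if prefix != "InChI" or not body:
        raise ValueError("not an InChI string")
    # Layers are separated by "/": version, formula, then prefixed layers.
    version, formula, *layers = body.split("/")
    named = {"version": version, "formula": formula}
    for layer in layers:
        # The first character names the layer, e.g. "c" for connectivity
        # and "h" for hydrogen atoms.
        named[layer[0]] = layer[1:]
    return named


benzene = split_inchi("InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H")
print(benzene)
```

The connectivity layer lists the canonical atom numbers produced by the canonicalization step, so two differently drawn benzene rings serialize to the same string.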


CAS Registry Number

  • Unique numeric identifier containing up to 10 digits, divided by hyphens into three parts, e.g. 58-08-2 for caffeine (Google it)

  • Has no chemical significance

  • Widely used but not open-access because the source chemical information links to the CAS commercial databases e.g. SciFinder

  • Consequently the consistency of mappings to open identifiers cannot be verified
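Although the number carries no chemical meaning, its final digit is a check digit, so obvious typing errors can be caught without access to the CAS databases. The following sketch validates the format and check digit only; it says nothing about whether a number is actually assigned to a compound. The function name `cas_check_digit_ok` is hypothetical.

```python
def cas_check_digit_ok(cas_rn):
    """Return True if a CAS Registry Number has a valid format and check digit."""
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return False
    # Format: 2 to 7 digits, then 2 digits, then a single check digit.
    if not (2 <= len(parts[0]) <= 7 and len(parts[1]) == 2 and len(parts[2]) == 1):
        return False
    digits = parts[0] + parts[1]
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d)
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(parts[2])


# Caffeine, 58-08-2: 8*1 + 0*2 + 8*3 + 5*4 = 52, and 52 mod 10 = 2.
print(cas_check_digit_ok("58-08-2"))
```

A check like this is useful when merging identifier lists from several sources, since it flags transcription errors before any lookup is attempted.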


PubChem Identifiers: CIDs and SIDs

  • PubChem is the NCBI informatics backbone for the NIH Molecular Libraries Initiative

  • A suite of three databases: PubChem Compound (unique structures with computed properties), PubChem BioAssay (results supplied by depositors) and PubChem Substance (deposited compound structures)

  • The ten MLI-funded screening centers run cellular and target-based HTSs using a compound collection of ~250K and submit the results to PubChem
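CIDs and SIDs can also be used programmatically. The sketch below only builds request URLs for PubChem's PUG REST interface, a service that postdates this 2008 tutorial and is therefore an assumption about how a reader today might follow up. No network request is made, and the helper names are hypothetical. The CID is the tesaglitazar record mentioned in this tutorial; the SID in the test is an arbitrary placeholder.

```python
from urllib.parse import quote

# Base address of the PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(cid, properties=("MolecularFormula", "InChIKey")):
    """URL returning computed properties for one Compound record (CID)."""
    props = ",".join(quote(p) for p in properties)
    return f"{PUG_REST}/compound/cid/{int(cid)}/property/{props}/JSON"


def substance_to_compound_url(sid):
    """URL returning the CID(s) standardized from one deposited Substance (SID)."""
    return f"{PUG_REST}/substance/sid/{int(sid)}/cids/JSON"


print(compound_property_url(208901))
```

The resulting string can be opened in a browser or passed to `urllib.request.urlopen`; keeping URL construction separate from fetching makes the sketch easy to check offline.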


PubChem is now a Global Hub

Including bioinformatic dbs with in-links

  • MMDB, PDB ligands: 55K

  • ChEBI, enzyme ligands: 8K

  • ZINC, ready-to-dock: 3.8 million

  • KEGG, drugs and metabolites: 14K

  • ChemBank, chemical genomics: 0.4 million

  • Human Metabolite db: 2K

  • ChemIDplus, NIH tox data: 383K

  • MEROPS protease inhibitors

  • ChemSpider: 20 million

  • DrugBank, drugs and targets: 4K

  • Drugs of the Future: 3.4K

  • GPCR-Ligand Database

  • Nature Chemical Biology: 0.8K

  • LIPID MAPS, metabolism: 8.8K


Searchable Measures of Chemical Similarity

  • 1D: measured or computed molecular properties, e.g., molecular weight, number of rings, molecular surface area or volume, pKa, logP etc

  • 3D: map a molecular surface, chemical graphs, spectral descriptors, distribution of electrostatic charge around a molecule

  • 2D fingerprints are by far the most common, based on a bit-string encoding of substructural occurrences


Molecular Fingerprints for Similarity Searching

  • Each bit in the fingerprint (or fragment bit-string) represents one molecular fragment. Typical length is ~1000 bits

  • The bit string for a molecule records the presence (“1”) or absence (“0”) of each fragment in the molecule

  • Compare fingerprints of two molecules to identify common bits and hence common substructures (and hence overall structural resemblance)
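The three points above can be sketched with a toy fingerprint. The eight-entry fragment list and the hand-assigned fragment sets for aspirin and salicylic acid are illustrative assumptions only; real fingerprints use on the order of a thousand bits and derive the fragments from the structure automatically.

```python
# Hypothetical fragment dictionary: one bit position per fragment.
FRAGMENTS = ["aromatic ring", "carboxylic acid", "phenol", "amine",
             "ester", "amide", "halogen", "ether"]


def fingerprint(fragments_present):
    """Bit string with '1' where the fragment is present and '0' where absent."""
    return "".join("1" if f in fragments_present else "0" for f in FRAGMENTS)


def common_fragments(fp_a, fp_b):
    """Fragments whose bits are set in both fingerprints."""
    return [f for f, x, y in zip(FRAGMENTS, fp_a, fp_b) if x == "1" and y == "1"]


# Fragment sets assigned by hand for this illustration.
aspirin = fingerprint({"aromatic ring", "carboxylic acid", "ester"})
salicylic_acid = fingerprint({"aromatic ring", "carboxylic acid", "phenol"})

print(aspirin, salicylic_acid)
print(common_fragments(aspirin, salicylic_acid))
```

The shared bits point to the common substructures, which is the basis for the overall structural resemblance score.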


Tanimoto Chemical Similarity

[Figure: two overlapping feature sets A and B, with regions labelled a, c and b]

  • Tally features:

    • Unique (a,b)

    • Both on (c)

    • Both off (d)

  • Similarity Formula

    • Tanimoto=c/(a+b+c)

Beware: Chemical Similarity searches are not standardised between databases
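As a worked instance of the formula, the sketch below computes Tanimoto = c/(a+b+c) for two hypothetical 8-bit fingerprints stored as Python integers. The function names are illustrative, and returning 0.0 when both fingerprints are empty is a convention chosen here, not something stated on the slide.

```python
def popcount(x):
    """Number of bits set to 1 in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity c / (a + b + c) for two integer bit-string fingerprints."""
    c = popcount(fp_a & fp_b)   # features on in both
    a = popcount(fp_a & ~fp_b)  # features unique to A
    b = popcount(fp_b & ~fp_a)  # features unique to B
    if a + b + c == 0:
        return 0.0              # convention for two empty fingerprints
    return c / (a + b + c)


# Two hypothetical fingerprints sharing two bits, with one unique bit each:
# a = 1, b = 1, c = 2, so the similarity is 2 / (1 + 1 + 2) = 0.5.
print(tanimoto(0b11001000, 0b11100000))
```

The features that are off in both molecules (d) do not enter the formula, so padding a fingerprint with unused bits leaves the score unchanged. The score does depend on which fragments the fingerprint encodes, which is one reason results differ between databases.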


PubChem Chemical Searching


Bio-Chem Data Joins


A Pharmaceutical Portfolio from PubChem


Disambiguation

From: Wells et al. Reaching for high-hanging fruit in drug discovery at protein–protein interfaces

1R6N

1Y2F


OSRA: Optical Structure Recognition


Checking Chemical Patents

  • Taking Nutlin-3 as an example, the SMILES entry from PubChem

    CC(C)OC1=C(C=CC(=C1)OC)C2=NC(C(N2C(=O)N3CCNC(=O)C3)C4=CC=C(C=C4)Cl)C5=CC=C(C=C5)Cl

    was pasted into the SureChem search box

  • There are nine exact matches including the granted patent application from Roche shown below


Exploring Relationships in Entrez

[Figure: Entrez relationship map. Labels: BLAST Sequence Similarity; Protein Sequence; Biological Terms, MeSH indexed; Literature, PubMed; VAST Structure Similarity; Protein 3D Structures; Bioactivity Assay Results; 2D Chemical Structure Similarity (3D soon); Small Molecule Structures; Protein Sequences; Activity Profile Similarity]


Linkage between Swiss-Prot-DrugBank-PubChem-MMDB

[Figure: database linkage counts as labelled on the slide: (411), (15728) = 181, (2501)]

see these marketed target links

