An Introduction to Open Small-molecule Resources of High Utility for Systems Biologists

Tutorial for the International Conference on Systems Biology

Göteborg, August 2008

Christopher Southan, European Bioinformatics Institute,

Wellcome Trust Genome Campus, Cambridge, UK

Context
  • Medicinal chemistry has a long history of providing a bridge between biology and chemistry by identifying compounds that produce biological effects
  • It is increasingly recognised that bioactive compounds are an essential part of the perturbation toolbox for systems biology
  • Advancing biological knowledge via a broad spectrum of small-molecule investigations can improve understanding not only of systems biology but also of disease mechanisms, and can open new opportunities for therapeutic intervention
Systems Chemical Biology

Oprea et al. Nat Chem Biol. 2007 (8):447-50 PMID: 17637771

“The increasing availability of data related to genes, proteins and their modulation by small molecules has provided a vast amount of biological information leading to the emergence of systems biology and the broad use of simulation tools for data analysis. However, there is a critical need to develop cheminformatics tools that can integrate chemical knowledge with these biological databases and simulation approaches, with the goal of creating systems chemical biology.”

PubChem and ChEBI: Revolutionary Consequences
  • Arrival of the “missing entity” of formal and linked chemical structure representation within the global web of bioinformatic relationships
  • Ability to search across links between biochemical data, biological effects and chemical structure information
  • Deposition not just of HTS results but a wide range of other types of screening data directly linked to chemical structure information in public repositories
  • Proliferation of cheminformatics tools, databases, nomenclatures, and ontologies in the public domain
  • A quantum jump in the global enablement of chemical biology and medicinal chemistry
Post-Revolution: How Many Compounds Are Out There?
  • Chemical Structure Lookup Service: 36 million, 100 sources
  • ChemSpider: 21.5 million, 150 sources
  • PubChem: 19,296,269, 70 sources
  • SureChem: 9 million, from US, European and WO patents

But how many are verified as bioactive?

Relationships in Bioactive Chemical Space

Figure: overlapping regions of bioactive chemical space: metabolomes & natural products; drugs; chemical genomics & systems biology probes; assay data; drug-like compounds from literature & patents; all linked to protein sequences

Searchable Chemical Structure Designations and Representations in Databases
  • SD/MOL files
  • IUPAC standard name
  • Sketched image
  • SMILES
  • InChI codes
  • InChI strings
  • Experimental 3D structure
  • Code names (CID 121880)
  • Generic, trade and MeSH names
  • CAS numbers
  • Database accession numbers, e.g. PubChem CID, SID, ChEBI ID, ChemSpider ID

All can be exact-match searched; some allow similarity searching, and some can also be inter-converted.

SD/MOLfile

The basic MDL chemical table files of atoms, bonds, connectivity and 3D coordinates

    benzene
    ACD/Labs0812062058
    6 6 0 0 0 0 0 0 0 0 1 V2000
    1.9050 -0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    1.9050 -2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.7531 -0.1282 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.7531 -2.7882 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    -0.3987 -0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    -0.3987 -2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
    2 1 1 0 0 0 0
    3 1 2 0 0 0 0
    4 2 2 0 0 0 0
    5 3 1 0 0 0 0
    6 4 1 0 0 0 0
    6 5 2 0 0 0 0
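To make the layout of the benzene record concrete, here is a minimal Python sketch that reads it: the counts line gives the number of atoms and bonds, each atom line holds x, y, z and the element symbol, and each bond line holds two atom indices and a bond order. It assumes whitespace-separated fields, which works for this small example; the real V2000 format uses fixed-width columns (and a closing `M  END` line, added here), so a cheminformatics toolkit is the safer choice in practice. The function names are illustrative, not from any library.

```python
from collections import defaultdict

# Benzene record from the slide, re-spaced; "M  END" terminates a V2000 file.
BENZENE_MOL = """benzene
ACD/Labs0812062058

  6  6  0  0  0  0  0  0  0  0  1 V2000
    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  3  1  2  0  0  0  0
  4  2  2  0  0  0  0
  5  3  1  0  0  0  0
  6  4  1  0  0  0  0
  6  5  2  0  0  0  0
M  END
"""


def parse_v2000(text):
    """Return (atoms, bonds) from a small V2000 molfile (whitespace split)."""
    lines = text.splitlines()
    # Lines 0-2 are the header block (name, program/timestamp, comment).
    counts = lines[3].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        first, second, order = (int(v) for v in line.split()[:3])
        bonds.append((first, second, order))
    return atoms, bonds


def bond_order_sums(n_atoms, bonds):
    """Sum of bond orders at each atom (atoms are numbered from 1)."""
    totals = defaultdict(int)
    for a, b, order in bonds:
        totals[a] += order
        totals[b] += order
    return [totals[i] for i in range(1, n_atoms + 1)]


atoms, bonds = parse_v2000(BENZENE_MOL)
print(bond_order_sums(len(atoms), bonds))  # [3, 3, 3, 3, 3, 3]
```

Every ring carbon ends up with one single and one double bond, the alternating Kekulé form of benzene encoded by the bond block.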

Experimental 3D Structures

Cn3D views: PDB 1I7G (left) and PubChem tesaglitazar, CID 208901 (right)

SMILES: Simplified Molecular Input Line Entry System for Encoding Molecular Structures
  • Interconverts with 2D sketchers
  • Can then be searched
  • Human readable
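Because SMILES is a plain line notation, its grammar can be checked with very little code. The sketch below is a simplified tokenizer, not a full SMILES parser: it assumes the organic subset plus bracket atoms, checks that branches `( )` balance and that ring-closure digits pair up, and counts the atoms written explicitly (implicit hydrogens are not counted). The Nutlin-3 SMILES is the PubChem entry quoted later in this tutorial; the function names are hypothetical.

```python
import re

# Bracket atoms, organic-subset atoms (two-letter Br/Cl first so they win),
# bond symbols, branch parentheses and ring-closure labels (%nn or one digit).
TOKEN_RE = re.compile(r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#$:/\\.]|[()]|%\d{2}|\d)")

NUTLIN3 = ("CC(C)OC1=C(C=CC(=C1)OC)C2=NC(C(N2C(=O)N3CCNC(=O)C3)"
           "C4=CC=C(C=C4)Cl)C5=CC=C(C=C5)Cl")


def tokenize(smiles):
    tokens = TOKEN_RE.findall(smiles)
    # findall silently skips characters it cannot match, so check nothing was lost.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported character in SMILES: {smiles!r}")
    return tokens


def count_atoms(smiles):
    """Return the number of explicitly written atoms after basic syntax checks."""
    depth = 0           # current branch nesting level
    open_rings = set()  # ring-closure labels opened but not yet closed
    atoms = 0
    for tok in tokenize(smiles):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
        elif tok.isdigit() or tok.startswith("%"):
            open_rings ^= {tok}  # first use opens a ring bond, second use closes it
        elif tok.startswith("[") or tok.isalpha():
            atoms += 1
    if depth != 0:
        raise ValueError("unbalanced '('")
    if open_rings:
        raise ValueError(f"unclosed ring bond(s): {sorted(open_rings)}")
    return atoms


print(count_atoms("C1=CC=CC=C1"))  # 6 (benzene)
print(count_atoms(NUTLIN3))        # 40 heavy atoms
```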
IUPAC Systematic Naming of Organic Chemical Compounds
  • International Union of Pure and Applied Chemistry (IUPAC)
  • Should be human readable and allow an unambiguous structural formula to be drawn
  • Usable for automated text-to-structure conversion
  • Taxol, for example:

(2aR,4S,4aS,6R,9S,11S,12S,12aR,12bS)-1,2a,3,4,4a,6,9,10,11,12,12a,12b-Dodecahydro-4,6,9,11,12,12b-hexahydroxy-4a,8,13,13-tetramethyl-7,11-methano-5H-cyclodeca(3,4)benz(1,2b)oxet-5-one 6,12b-diacetate, 12-benzoate, 9-ester with (2R,3S)-N-benzoyl-3-phenylisoserine

IUPAC International Chemical Identifier (InChI): Textual Identifier for Chemical Substances
  • A formalized string generated from the chemical structure itself (not from the IUPAC name); not designed to be human readable
  • Expresses more information than the simpler SMILES notation, and differs in that every structure has a single unique InChI string, whereas the same structure can be written as many different SMILES
  • The InChI algorithm converts structural information in a three-step process: normalization (to remove redundant information), canonicalization (to generate a unique number label for each atom) and serialization (to give a string of characters), but without explicit 3D information
  • The 25-character InChIKey is a hashed version of the full InChI, designed to allow easy web searches for chemical compounds (e.g. with Google)
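The serialization step produces a string of slash-separated layers: a version prefix, the molecular formula, then layers tagged by a one-letter prefix (for example `c` for connections and `h` for hydrogens). This minimal sketch splits an InChI into those parts; the benzene string is its Standard InChI (the `1S` prefix marks Standard InChI). It is a simplified reader for illustration, and the function name is hypothetical.

```python
def parse_inchi(inchi):
    """Split an InChI string into version, formula and the remaining layers.

    Simplification: layers are keyed by their one-letter prefix, so a prefix
    that occurs twice (possible in non-standard InChIs with a fixed-H layer)
    keeps only its last value.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {part[0]: part[1:] for part in rest if part}
    return version, formula, layers


BENZENE_INCHI = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"
version, formula, layers = parse_inchi(BENZENE_INCHI)
print(version, formula, layers)  # 1S C6H6 {'c': '1-2-4-6-5-3-1', 'h': '1-6H'}
```

The `c` layer lists the canonical atom numbers around the ring, which is why every benzene record, however it was drawn, serializes to this same string.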
CAS Registry Number
  • Unique numeric identifier containing up to 10 digits, divided by hyphens into three parts, e.g. 58-08-2 for caffeine (Google it)
  • Has no chemical significance
  • Widely used but not open access, because the underlying chemical information is held in the commercial CAS databases, e.g. SciFinder
  • Consequently, the consistency of mappings to open identifiers cannot be verified
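Although a CAS number carries no chemical meaning, its final digit is a checksum: each digit of the first two parts is multiplied by its position counted from the right (starting at 1), and the sum modulo 10 must equal the check digit. For caffeine, 58-08-2 gives 8×1 + 0×2 + 8×3 + 5×4 = 52, so the check digit is 2. This catches typing errors but says nothing about whether the number maps to the right structure. A small sketch (function name hypothetical):

```python
import re

# First part 2-7 digits, second part 2 digits, then a single check digit.
CAS_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas):
    """Check the format and check digit of a CAS Registry Number string."""
    match = CAS_RE.fullmatch(cas)
    if not match:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(pos * int(d) for pos, d in enumerate(reversed(body), start=1))
    return total % 10 == check


print(is_valid_cas("58-08-2"))  # True (caffeine)
print(is_valid_cas("58-08-3"))  # False (wrong check digit)
```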
PubChem Identifiers: CIDs and SIDs
  • PubChem is the NCBI informatics backbone for the NIH Molecular Libraries Initiative (MLI)
  • A suite of three databases: PubChem Compound (unique structures with computed properties), PubChem BioAssay (results supplied by depositors) and PubChem Substance (deposited compound structures)
  • The ten MLI-funded screening centers run cellular and target-based HTS assays on a compound collection of ~250K and submit the results to PubChem
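The relationship between the Substance and Compound databases can be pictured as deduplication: each deposited record gets its own SID, and every SID whose structure standardizes to the same unique structure shares one CID. The sketch below assumes the structures are already standardized (here, as InChI strings); PubChem's real standardization pipeline is far more involved, and the SIDs and sequential CIDs are hypothetical.

```python
from collections import defaultdict


def assign_cids(substances):
    """Map each deposited SID to a CID shared by all SIDs with the same structure.

    `substances` is a list of (sid, structure_key) pairs, where the key is
    assumed to be an already-standardized structure string.
    """
    cid_for_structure = {}
    sid_to_cid = {}
    for sid, structure in substances:
        if structure not in cid_for_structure:
            # Hypothetical numbering: CIDs issued sequentially from 1.
            cid_for_structure[structure] = len(cid_for_structure) + 1
        sid_to_cid[sid] = cid_for_structure[structure]
    return sid_to_cid


def sids_by_cid(sid_to_cid):
    """Invert the mapping: list the deposited SIDs behind each CID."""
    grouped = defaultdict(list)
    for sid, cid in sid_to_cid.items():
        grouped[cid].append(sid)
    return dict(grouped)


# Hypothetical deposits: two depositors submit benzene, one submits ethanol.
deposits = [
    (101, "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"),
    (102, "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    (103, "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"),
]
mapping = assign_cids(deposits)
print(mapping)               # {101: 1, 102: 2, 103: 1}
print(sids_by_cid(mapping))  # {1: [101, 103], 2: [102]}
```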
PubChem Is Now a Global Hub, Including Bioinformatics Databases with In-links

Figure: databases with in-links to PubChem (central hub), with approximate numbers of linked compounds:
  • MMDB, PDB ligands: 55K
  • ChEBI, enzyme ligands: 8K
  • ZINC, ready-to-dock: 3.8 million
  • KEGG, drugs and metabolites: 14K
  • ChemBank, chemical genomics: 0.4 million
  • Human Metabolite db: 2K
  • ChemIDplus, NIH tox data: 383K
  • MEROPS protease inhibitors
  • ChemSpider: 20 million
  • DrugBank, drugs and targets: 4K
  • Drugs of the Future: 3.4K
  • GPCR-Ligand Database
  • Nature Chemical Biology: 0.8K
  • LIPID MAPS, metabolism: 8.8K

Searchable Measures of Chemical Similarity
  • 1D: measured or computed molecular properties, e.g. molecular weight, number of rings, molecular surface area or volume, pKa, logP
  • 3D: molecular surface maps, chemical graphs, spectral descriptors, distribution of electrostatic charge around a molecule
  • 2D fingerprints are by far the most common, based on a bit-string encoding of substructure occurrences
Molecular Fingerprints for Similarity Searching
  • Each bit in the fingerprint (or fragment bit-string) represents one molecular fragment. Typical length is ~1000 bits
  • The bit string for a molecule records the presence (“1”) or absence (“0”) of each fragment in the molecule
  • Compare fingerprints of two molecules to identify common bits and hence common substructures (and hence overall structural resemblance)
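The encoding described above can be sketched with an integer used as a bit string. Here the fragment dictionary is a hypothetical six-entry list and the fragments present in each molecule are assigned by hand; real fingerprints (such as the PubChem substructure fingerprint) perceive hundreds of fragments directly from the structure. Aspirin and salicylic acid both contain a benzene ring and a carboxylic acid, and differ in an ester versus a phenol group.

```python
# Hypothetical fragment dictionary: bit i is set when fragment i is present.
FRAGMENTS = ["benzene ring", "carboxylic acid", "ester", "phenol", "amine", "halogen"]


def fingerprint(present):
    """Encode a set of fragment names as an integer bit string."""
    fp = 0
    for i, fragment in enumerate(FRAGMENTS):
        if fragment in present:
            fp |= 1 << i
    return fp


def to_bitstring(fp):
    """Render the bits in dictionary order (bit 0 first)."""
    return "".join("1" if (fp >> i) & 1 else "0" for i in range(len(FRAGMENTS)))


def common_fragments(fp_a, fp_b):
    """Fragments whose bits are set in both fingerprints."""
    shared = fp_a & fp_b
    return [f for i, f in enumerate(FRAGMENTS) if (shared >> i) & 1]


aspirin = fingerprint({"benzene ring", "carboxylic acid", "ester"})
salicylic_acid = fingerprint({"benzene ring", "carboxylic acid", "phenol"})
print(to_bitstring(aspirin))                      # 111000
print(to_bitstring(salicylic_acid))               # 110100
print(common_fragments(aspirin, salicylic_acid))  # ['benzene ring', 'carboxylic acid']
```

A bitwise AND finds the shared fragments in one operation, which is why fingerprint screening scales to millions of compounds.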
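Tanimoto Chemical Similarity

Figure: Venn diagram of the bits set in fingerprints A and B, with regions a (only in A), b (only in B) and c (in both)
  • Tally features:
    • Set in only one fingerprint: a (only A), b (only B)
    • Set in both: c
    • Set in neither: d (not used by the Tanimoto coefficient)
  • Similarity formula:
    • Tanimoto = c / (a + b + c)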

Beware: chemical similarity searches are not standardised between databases, because fingerprint definitions and similarity thresholds differ
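The tally and formula translate directly into bit operations on integer fingerprints: a and b count bits set in only one fingerprint, c counts bits set in both, and d is never needed. This is a minimal sketch; returning 0.0 for two empty fingerprints is a convention chosen here, not a universal rule. The example fingerprints differ in one bit each way and share two, so the score is 2 / (1 + 1 + 2) = 0.5.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(fp_a, fp_b):
    """Tanimoto = c / (a + b + c) for integer bit-string fingerprints."""
    a = popcount(fp_a & ~fp_b)  # bits set only in A
    b = popcount(fp_b & ~fp_a)  # bits set only in B
    c = popcount(fp_a & fp_b)   # bits set in both
    if a + b + c == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return c / (a + b + c)


print(tanimoto(0b000111, 0b001011))  # 0.5
```

Because a + b + c is the size of the union of set bits, this is the same as the intersection-over-union of the two fragment sets.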

Disambiguation

From: Wells et al. Reaching for high-hanging fruit in drug discovery at protein–protein interfaces

Figure: PDB entries 1R6N and 1Y2F

Checking Chemical Patents
  • Taking Nutlin-3 as an example, the SMILES entry from PubChem

    CC(C)OC1=C(C=CC(=C1)OC)C2=NC(C(N2C(=O)N3CCNC(=O)C3)C4=CC=C(C=C4)Cl)C5=CC=C(C=C5)Cl

  was pasted into the SureChem search box
  • There are nine exact matches, including the granted patent application from Roche
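An exact-match structure search cannot simply compare SMILES strings, because one structure can be written many ways (CCO and OCC are both ethanol). One cheap, necessary but not sufficient, check is that two candidates share the same composition. The sketch below computes a heavy-atom formula from an organic-subset SMILES (no bracket atoms, hydrogens not counted); it is an illustration of a prefilter, not SureChem's method, and the function names are hypothetical. For the Nutlin-3 SMILES above it gives C30Cl2N4O4, matching the heavy atoms of Nutlin-3 (C30H30Cl2N4O4).

```python
import re
from collections import Counter

# Organic-subset atom symbols; two-letter Br/Cl are tried before single letters.
ATOM_RE = re.compile(r"Br|Cl|[BCNOPSFI]|[bcnops]")

NUTLIN3 = ("CC(C)OC1=C(C=CC(=C1)OC)C2=NC(C(N2C(=O)N3CCNC(=O)C3)"
           "C4=CC=C(C=C4)Cl)C5=CC=C(C=C5)Cl")


def heavy_atom_formula(smiles):
    """Heavy-atom formula in Hill-style order (C first, then alphabetical)."""
    if "[" in smiles:
        raise ValueError("bracket atoms are not handled in this sketch")
    # Aromatic lowercase symbols (c, n, ...) count as the same element.
    counts = Counter(tok.capitalize() for tok in ATOM_RE.findall(smiles))
    order = (["C"] if "C" in counts else []) + sorted(e for e in counts if e != "C")
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def could_be_exact_match(query, candidate):
    """Prefilter only: equal heavy-atom formulas are necessary, not sufficient."""
    return heavy_atom_formula(query) == heavy_atom_formula(candidate)


print(heavy_atom_formula(NUTLIN3))         # C30Cl2N4O4
print(could_be_exact_match("CCO", "OCC"))  # True
print(could_be_exact_match("CCO", "CCN"))  # False
```

Candidates that pass the prefilter would still need a full comparison of canonical structures (e.g. InChI) to confirm an exact match.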
Exploring Relationships in Entrez

Figure: Entrez data types and the similarity relationships that link them:
  • Literature (PubMed): biological terms, MeSH indexed
  • Protein sequences: BLAST sequence similarity
  • Protein 3D structures: VAST structure similarity
  • Small-molecule structures: 2D chemical structure similarity (3D soon)
  • Bioactivity assay results: activity profile similarity

Linkage between Swiss-Prot, DrugBank, PubChem and MMDB

Figure: overlap of linked records, with counts (411), (15728) = 181 and (2501); see these marketed target links
