SMILES 2

C371 Lecture

Based on Dr. David Wild’s C571 Presentations

Fall 2004

Linear Notations
  • Represent the atoms, bonds, and connectivity as a linear text string
  • SMILES
    • Concise
    • Originally designed for manual command line entry into text-only systems
    • Now widely used
  • Can be input to a spreadsheet cell, on one line of a text file, or in an Oracle database text field
  • Canonicalization algorithms generate a single unique (canonical) SMILES for each structure
Review of SMILES
  • Atoms represented by normal chemical symbols (uppercase for aliphatic atoms, lowercase for aromatic atoms)
  • Adjacent atoms imply single bonds
  • Use = for double, # for triple bonds
  • Hydrogens usually implicit
  • Parentheses imply branching
  • Ring closure indicated by numbers
SMILES Review (cont’d)
  • Can make Hydrogens explicit
  • Atoms outside the organic subset (B, C, N, O, P, S, F, Cl, Br, I) are put in square brackets, e.g., [Xe]
  • Charged species also in square brackets with a + or -, e.g., [Na+] or [O-]
  • Unknown atoms indicated by a *
  • Tetrahedral stereochemistry represented by @ or @@ inside square brackets; double-bond geometry by / and \
SMILES for Tyrosine

NC(Cc1ccc(O)cc1)C(=O)O
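
The notation rules reviewed above can be made concrete with a small tokenizer. This is only a sketch, not a SMILES parser: it labels the token classes covered in the review (bracket atoms, organic-subset atoms, aromatic atoms, the * wildcard, = and # bonds, branches, ring closures) and rejects everything else. The names `TOKEN_SPEC` and `tokenize` are illustrative and do not come from any toolkit.

```python
import re

# One entry per token class from the review slides. Order matters:
# bracket atoms and two-letter symbols are tried before single letters.
TOKEN_SPEC = [
    ("bracket_atom", r"\[[^\]]+\]"),         # e.g. [Xe], [Na+], [nH]
    ("aliphatic_atom", r"Cl|Br|[BCNOPSFI]"),  # organic subset, uppercase
    ("aromatic_atom", r"[bcnops]"),          # aromatic atoms, lowercase
    ("wildcard", r"\*"),                     # unknown atom
    ("bond", r"[=#]"),                       # double or triple bond
    ("branch", r"[()]"),                     # branch open / close
    ("ring_closure", r"%\d\d|\d"),           # ring-closure label
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(smiles):
    """Split a SMILES string into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# Tyrosine, as written on the slide above.
tyrosine_tokens = tokenize("NC(Cc1ccc(O)cc1)C(=O)O")
heavy_atoms = sum(1 for kind, _ in tyrosine_tokens if kind.endswith("_atom"))
```

Counting the atom tokens for tyrosine gives its heavy-atom count; the hydrogens never appear as tokens because they are implicit.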

SMILES for Isatin

O=c2[nH]c1ccccc1c2=O

Canonicalizing SMILES – Morgan Algorithm
  • Each atom has a connectivity value: how many atoms it is connected to
  • That value is replaced by the sum of the connectivity values of its neighbors
  • This continues iteratively until the number of distinct values stops increasing
  • Atoms are numbered in decreasing order of connectivity value
    • In case of a tie, other properties are used (e.g. atomic number, bond order, etc).
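
The iteration described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CANGEN or Daylight implementation: the molecule is given as a hypothetical adjacency dictionary of heavy atoms, the function name `morgan_values` is made up for the example, and tie-breaking by atomic number or bond order is left out.

```python
def morgan_values(neighbors):
    """Return extended connectivity values for a heavy-atom graph.

    neighbors maps each atom index to the list of atoms bonded to it.
    """
    # Start from the plain connectivity (number of attached atoms).
    values = {atom: len(nbrs) for atom, nbrs in neighbors.items()}
    n_distinct = len(set(values.values()))
    while True:
        # Replace each value by the sum of the neighbors' values.
        new_values = {
            atom: sum(values[n] for n in nbrs) for atom, nbrs in neighbors.items()
        }
        new_distinct = len(set(new_values.values()))
        # Stop once another round no longer separates more atoms,
        # keeping the values from the previous round.
        if new_distinct <= n_distinct:
            return values
        values, n_distinct = new_values, new_distinct


# n-Pentane, CCCCC, with atoms numbered 0..4 along the chain.
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
values = morgan_values(pentane)

# Number atoms in decreasing order of value; ties are left in input order here.
order = sorted(pentane, key=lambda atom: -values[atom])
```

For pentane the starting connectivities only separate terminal from internal carbons; one round of summing also separates the central carbon from its two neighbors, and a further round would merge classes again, so the iteration stops there.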
Canonicalizing SMILES – CANGEN
  • Two-stage procedure used by Daylight
  • First stage, CANON, generates a canonical connection table using a modified version of the Morgan Algorithm that produces a tree structure
  • Second stage, GENES, creates a unique SMILES using a depth-first search of the molecular graph tree output by CANON
  • More information: J. Chem. Inf. Comput. Sci. 1989, 29, 97-101
Representing reactions

CH4 + 2O2 CO2 + 2H2O

  • Need to identify the 2D arrangement of products and reagents and distinguish them)
    • Possibly map which starting material atoms map to which product atoms.
  • Other information (e.g., yield, equilibrium constants, conditions generally stored separately
  • Not all reactions specified stoichiometrically
Simple Reaction SMILES
  • Each reagent and product represented as SMILES
  • Reagents on the left of a “>>”; products on the right
  • Individual reagents and products are separated by a “.”

CH4 + 2O2 → CO2 + 2H2O

Reaction SMILES: C.O=O>>O=C=O.O

Reaction SMILES example
  • Agents are specified between the two “>” characters

Reaction SMILES: C.O=O>O=[O+]-[O-]>O=C=O.O

Reaction SMILES example
  • Note implicit hydrogens

Reaction SMILES: C(=O)Cl.NC>>C(=O)NC.Cl
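
Because the three parts of a reaction SMILES are delimited by ">" and the individual molecules by ".", splitting one needs nothing beyond string operations. The sketch below assumes well-formed input and uses a made-up function name, `split_reaction_smiles`; it does not validate the component SMILES.

```python
def split_reaction_smiles(rxn):
    """Split 'reactants>agents>products' into lists of component SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("a reaction SMILES contains exactly two '>' characters")
    # An empty part (as in 'A>>B') means no molecules in that role.
    reactants, agents, products = (p.split(".") if p else [] for p in parts)
    return {"reactants": reactants, "agents": agents, "products": products}


# The two examples from the slides above.
with_agent = split_reaction_smiles("C.O=O>O=[O+]-[O-]>O=C=O.O")
without_agent = split_reaction_smiles("C(=O)Cl.NC>>C(=O)NC.Cl")
```

With ">>" the middle part is empty, so the agents list comes back empty rather than containing an empty string.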

Atom-mapping SMIRKS representation
  • Each reactant atom gets a tag (e.g., “C” becomes “[C:1]”) which maps to the atom carrying the same tag in the products.
  • Hydrogens are explicit

SMIRKS:

[C:1](=[O:2])[Cl:3].[H:99][N:4]([H:100])[C:0]>>[C:1](=[O:2])[N:4]([H:100])[C:0].[Cl:3][H:99]
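
One quick sanity check on an atom-mapped reaction is that every tag on the reactant side reappears exactly once on the product side. The sketch below pulls the tags out with a regular expression; the helper names `atom_maps` and `maps_balanced` are hypothetical, and the check says nothing about whether the chemistry itself is sensible.

```python
import re

# Matches a bracket atom carrying a map tag, e.g. [C:1] or [H:100],
# and captures the number after the colon.
MAP_RE = re.compile(r"\[[^\]:]*:(\d+)\]")


def atom_maps(side):
    """Return the sorted list of atom-map tags found in one side of a reaction."""
    return sorted(int(tag) for tag in MAP_RE.findall(side))


def maps_balanced(reaction):
    """True if reactants and products carry exactly the same set of tags."""
    reactants, _agents, products = reaction.split(">")
    return atom_maps(reactants) == atom_maps(products)


# The amide-formation example from the slide above.
smirks = (
    "[C:1](=[O:2])[Cl:3].[H:99][N:4]([H:100])[C:0]"
    ">>[C:1](=[O:2])[N:4]([H:100])[C:0].[Cl:3][H:99]"
)
balanced = maps_balanced(smirks)
```

In this example the chlorine (tag 3) and one nitrogen-bound hydrogen (tag 99) leave together as HCl, while all seven tags are preserved across the arrow.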

Daylight RS/SMIRKS Sites
  • Basic reaction representation (Reaction SMILES)
    • http://www.daylight.com/dayhtml_tutorials/languages/smiles/index.html
  • SMIRKS introduction
    • http://www.daylight.com/dayhtml_tutorials/languages/smirks/index.html
  • SMIRKS theory
    • http://www.daylight.com/dayhtml/doc/theory/theory.rxn.html
  • SMIRKS depicter
    • http://www.daylight.com/daycgi_tutorials/react.cgi
Representing generic structures
  • A generic structure is one which, by ambiguity, represents a (possibly infinite) set of possible structures
  • Ambiguity usually takes the form of “R” groups
  • Originally used for representing patents
  • Now used for representing combinatorial libraries too
  • Also known as Markush Structures
Specifying a substructure query with SMARTS
  • SMARTS: a superset of SMILES extended to allow partial structures (substructures) and optional parts of molecules to be represented
  • Simple example

*C(=O)O

where the * matches any atom, so it acts as the attachment point to the rest of the molecule

  • More information:
    • http://www.daylight.com/meetings/summerschool01/course/basics/smarts.html
    • http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
Try out a SMARTS search
  • DepictMatch:
    • http://www.daylight.com/cgi-bin/contrib/depictmatch.cgi
  • Enter a set of SMILES and a SMARTS, and any part of each SMILES that matches the SMARTS is highlighted
  • As an example, we’ll use the sample dataset described on the following two slides, with *C(=O)O (carboxyl group) and [R]C(=O)O (carboxyl attached to a ring atom) as our SMARTS
Sample dataset

Acetaminophen

Alprenolol

Amphetamine

Captopril

Chlorpromazine

Diclofenac

Gabapentin

Salicylate

Sample Dataset SMILES file
  • CC(=O)Nc1ccc(O)cc1 Acetaminophen
  • CC(C)NCC(O)COc1ccccc1CC=C Alprenolol
  • CC(N)Cc1ccccc1 Amphetamine
  • CC(CS)C(=O)N1CCCC1C(=O)O Captopril
  • CN(C)CCCN1c2ccccc2Sc3ccc(Cl)cc13 Chlorpromazine
  • OC(=O)Cc1ccccc1Nc2c(Cl)cccc2Cl Diclofenac
  • NCC1(CC(=O)O)CCCCC1 Gabapentin
  • COC(=O)c1ccccc1O Salicylate
Web / Oracle Systems
  • Advantages
    • Single database for structures and data
    • No software to install on client machines (except maybe plug-ins like Chime)
    • Not dependent on (expensive) contract with MDL
    • Highly customizable
  • Disadvantages
    • Requires extensive web-based interface software to be written, for registration, searching, etc
    • Company will have to maintain system internally
    • Requires current ISIS system to be abandoned
Chemistry Cartridges
  • Daylight DayCart
    • http://www.daylight.com/products/daycart.html
  • Tripos Auspyx
    • http://www.tripos.com/sciTech/inSilicoDisc/chemInfo/auspyx.html
  • Accelrys Accord for Oracle
    • http://www.accelrys.com/accord/oracle.html
  • MDL Direct
    • http://www.mdl.com/products/framework/rel_chemistry_server/index.jsp
  • IDBS ActivityBase
    • http://www.id-bs.com/products/abase/
  • JChem Cartridge
    • http://www.jchem.com
Example - DayCart
  • Store SMILES as string (VARCHAR2) in Oracle database
  • Cartridge provides extra functions and extensions to functions for searching based on chemical structures
  • Structure search implemented by EXACT function
  • Substructure search implemented by MATCHES function
  • Similarity search implemented by TANIMOTO and EUCLID functions
Measuring similarity between molecules
  • Similar Property Principle: “Molecules with similar structure are likely to have similar biological activity”
  • Generally the Tanimoto Coefficient or Euclidean Distance between fingerprints is used
Fingerprint Similarity – Tanimoto
  • Also known as Jaccard Coefficient
  • ‘1’s in common / ‘1’s set in either fingerprint
  • 0’s are treated as not significant
  • Similarity is between 0 (dissimilar) and 1 (same)
  • A commonly used cutoff for likely biologically similar molecules is 0.7 or 0.8

Tanimoto Similarity = c / (#a + #b - c)

c = ‘1’s in common
#a = ‘1’s in fingerprint A
#b = ‘1’s in fingerprint B

  • Example:

A 101101011
B 011101101

c = 4
#a = 6
#b = 6

Tanimoto Similarity = 4 / (6 + 6 - 4) = 0.5
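
The worked example can be reproduced with a short function. This is a sketch that treats a fingerprint as a string of '0' and '1' characters, which is an assumption made for readability; real fingerprints are usually packed bit vectors. The function name `tanimoto` here is a plain Python helper and is unrelated to the DayCart SQL function of the same name.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two equal-length bit strings such as '1011'."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = fp_a.count("1")                                      # '1's in fingerprint A
    b = fp_b.count("1")                                      # '1's in fingerprint B
    c = sum(1 for x, y in zip(fp_a, fp_b) if x == y == "1")  # '1's in common
    if a + b - c == 0:
        # Both fingerprints are all zeros; returning 0.0 is an assumed convention.
        return 0.0
    return c / (a + b - c)


similarity = tanimoto("101101011", "011101101")
```

Positions where both fingerprints hold a 0 never enter the calculation, which is what "0's are treated as not significant" means in practice.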

Fingerprint similarity – Euclidean
  • Pythagorean distance
  • For binary dimensions, equivalent to the square root of the Hamming distance (i.e. square root of the number of bits that are different)
  • 0’s are treated as significant
  • Smaller values mean more similar
  • Example:

A          101101011
B          011101101
Different? xx    xx

Euclidean distance = sqrt(4) = 2.0
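
For binary fingerprints the Euclidean distance reduces to the square root of the number of differing bits, as the slide notes. A minimal sketch, again assuming fingerprints are given as strings of '0' and '1' and using an illustrative function name:

```python
import math


def euclidean_distance(fp_a, fp_b):
    """Euclidean distance between two equal-length bit strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    # Each differing bit contributes (1 - 0)^2 = 1 to the sum of squares,
    # so the sum of squares equals the Hamming distance.
    hamming = sum(1 for x, y in zip(fp_a, fp_b) if x != y)
    return math.sqrt(hamming)


distance = euclidean_distance("101101011", "011101101")
```

Unlike the Tanimoto coefficient, a position where both fingerprints hold a 0 counts as agreement here, so long sparse fingerprints tend to look close to each other.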

Oracle table Test for sample dataset

Smiles                            Name            LogP
--------------------------------  --------------  -----
CC(=O)Nc1ccc(O)cc1                Acetaminophen    0.27
CC(C)NCC(O)COc1ccccc1CC=C         Alprenolol       2.81
CC(N)Cc1ccccc1                    Amphetamine      1.76
CC(CS)C(=O)N1CCCC1C(=O)O          Captopril        0.84
CN(C)CCCN1c2ccccc2Sc3ccc(Cl)cc13  Chlorpromazine   5.20
OC(=O)Cc1ccccc1Nc2c(Cl)cccc2Cl    Diclofenac       4.02
NCC1(CC(=O)O)CCCCC1               Gabapentin      -1.37
COC(=O)c1ccccc1O                  Salicylate       2.60

DayCart structure search using SQL

select * from Test where
exact(Smiles, 'CC(N)Cc1ccccc1') = 1;

Smiles          Name         LogP
--------------  -----------  ----
CC(N)Cc1ccccc1  Amphetamine  1.76

DayCart substructure search

select * from Test where
matches(Smiles, '*C(=O)O') = 1;

Smiles                          Name        LogP
------------------------------  ----------  -----
CC(CS)C(=O)N1CCCC1C(=O)O        Captopril    0.84
OC(=O)Cc1ccccc1Nc2c(Cl)cccc2Cl  Diclofenac   4.02
NCC1(CC(=O)O)CCCCC1             Gabapentin  -1.37
COC(=O)c1ccccc1O                Salicylate   2.60

Substructure search for carboxylic acid

Acetaminophen

Alprenolol

Amphetamine

Captopril

Chlorpromazine

Diclofenac

Gabapentin

Salicylate

DayCart substructure / value search

select * from Test where
(matches(Smiles, '*C(=O)O') = 1)
AND (LogP > 1.0);

Smiles                          Name        LogP
------------------------------  ----------  ----
OC(=O)Cc1ccccc1Nc2c(Cl)cccc2Cl  Diclofenac  4.02
COC(=O)c1ccccc1O                Salicylate  2.60

DayCart similarity search

Query structure: Aspirin, CC(=O)Oc1ccccc1C(=O)O

select * from TEST where
tanimoto(SMILES, 'CC(=O)Oc1ccccc1C(=O)O') > 0.6;

SMILES              NAME           LOGP
------------------  -------------  ----
COC(=O)c1ccccc1O    Salicylate     2.60
CC(=O)Nc1ccc(O)cc1  Acetaminophen  0.27
CC(N)Cc1ccccc1      Amphetamine    1.76

Similarity search for carboxylic acid

Acetaminophen

Alprenolol

Amphetamine

Captopril

Chlorpromazine

Diclofenac

Gabapentin

Salicylate

More examples of DayCart

http://www.daylight.com/meetings/summerschool02/course/admin/daycart_hints.html