SMILES PowerPoint Presentation

Presentation Transcript

  1. SMILES

  2. Simplified molecular input line entry system • The simplified molecular input line entry system, or SMILES, is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings • SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules

  3. SMILES • Simplified Molecular Input Line Entry System (SMILES) • Widely used AND computationally efficient • Uses atomic symbols and a set of intuitive rules • Uses hydrogen-suppressed molecular graphs (HSMG)

  4. Canonical SMILES and Isomeric SMILES • The term Canonical SMILES refers to the version of the SMILES specification that includes rules for ensuring that each distinct chemical molecule has a single unique SMILES representation • A common application of Canonical SMILES is for indexing and ensuring uniqueness of molecules in a database • The term Isomeric SMILES refers to the version of the SMILES specification that includes extensions to support the specification of isotopes, chirality, and configuration about double bonds • A notable feature of these rules is that they allow rigorous partial specification of chirality.

  5. Graph-based definition • In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph • The chemical graph is first trimmed to remove hydrogen atoms and cycles are broken to turn it into a spanning tree • Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes • Parentheses are used to indicate points of branching on the tree
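
The graph-based procedure above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the graph is already hydrogen-suppressed, all bonds are single, and there are fewer than ten rings. The function name `write_smiles` and its input format (a list of element symbols plus a list of index pairs) are hypothetical choices, not part of any SMILES toolkit.

```python
def write_smiles(atoms, bonds, start=0):
    """Write a SMILES-like string by depth-first traversal (single bonds only)."""
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    visited = set()
    children = {i: [] for i in range(len(atoms))}  # spanning-tree edges
    ring_bonds = []                                # bonds broken to open each cycle

    def build_tree(atom, parent):
        visited.add(atom)
        for nbr in neighbors[atom]:
            if nbr == parent:
                continue
            if nbr in visited:
                # Reaching an already visited atom means this bond closes a cycle.
                bond = frozenset((atom, nbr))
                if bond not in ring_bonds:
                    ring_bonds.append(bond)
            else:
                children[atom].append(nbr)
                build_tree(nbr, atom)

    build_tree(start, None)

    # Give both ends of every broken bond the same numeric suffix label.
    labels = {}
    for digit, bond in enumerate(ring_bonds, start=1):
        for atom in bond:
            labels.setdefault(atom, []).append(str(digit))

    def emit(atom):
        text = atoms[atom] + "".join(labels.get(atom, []))
        kids = children[atom]
        for kid in kids[:-1]:
            text += "(" + emit(kid) + ")"  # every child but the last is a branch
        if kids:
            text += emit(kids[-1])         # the last child continues the main chain
        return text

    return emit(start)


print(write_smiles(list("CCOCC"), [(0, 1), (1, 2), (1, 3), (3, 4)]))  # CC(O)CC
print(write_smiles(["C"] * 6, [(i, (i + 1) % 6) for i in range(6)]))   # C1CCCCC1
```

The output depends on the starting atom and on the order in which neighbors are listed, which is why a separate canonicalization step is needed to obtain a unique string.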

  6. SMILES Bonds • Single: - (can be omitted) • Double: = • Triple: # • Aromatic: : (can be omitted)

  7. SMILES Branches • Represented by enclosure in parentheses • Can be nested or stacked • Examples: • CC(O)CC is 2-butanol • OCC(C)C is isobutanol • OC(C)(C)C is tert-butanol
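
One way to see how parentheses encode branching is to read the string with a stack. The sketch below is illustrative only: it assumes single-letter atom symbols, implicit single bonds, and no rings, and the helper name `parse_branches` is hypothetical.

```python
def parse_branches(smiles):
    """Return (atoms, bonds) for a simple SMILES made of letters and parentheses."""
    atoms, bonds = [], []
    stack = []       # atoms to return to when a branch closes
    previous = None  # index of the atom the next symbol bonds to
    for ch in smiles:
        if ch == "(":
            stack.append(previous)    # remember the branch point
        elif ch == ")":
            previous = stack.pop()    # go back to the branch point
        else:
            atoms.append(ch)
            if previous is not None:
                bonds.append((previous, len(atoms) - 1))
            previous = len(atoms) - 1
    return atoms, bonds


atoms, bonds = parse_branches("OC(C)(C)C")  # tert-butanol
print(bonds)  # [(0, 1), (1, 2), (1, 3), (1, 4)]
```

In the tert-butanol example the central carbon (index 1) appears in all four bonds, which is what the two stacked branches express.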

  8. SMILES Bonds • Ethene: C=C • Chloroethene: ClC=C • 1,1-Dichloroethene: ClC(Cl)=C • cis-1,2-Dichloroethene: ClC=CCl (as written, this string does not itself encode cis/trans geometry) • Trichloroethene: ClC(Cl)=CCl • Perchloroethene: ClC(Cl)=C(Cl)Cl

  9. SMILES Symbols • String of alphanumeric characters and certain punctuation symbols • Terminates at the first space encountered when read left to right • The ORGANIC SUBSET: B, C, N, O, P, S, F, Cl, Br, I
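
A small tokenizer makes the symbol rules concrete: reading stops at the first space, and the two-letter symbols Cl and Br must be tried before C and B. The sketch below is an assumption-laden illustration (the name `tokenize` is hypothetical; chirality marks and two-digit ring labels are not handled).

```python
import re

# Two-letter organic-subset symbols come first so "Cl" is not read as "C" + "l".
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|\[[^\]]+\]|[-=#:/\\().]|\d")


def tokenize(smiles):
    """Split a SMILES string into atom, bond, branch, and ring-digit tokens."""
    smiles = smiles.split(" ", 1)[0]  # the notation ends at the first space
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unrecognized character in " + repr(smiles))
    return tokens


print(tokenize("ClC(Cl)=C(Cl)Cl"))
print(tokenize("c1ccccc1 benzene"))
```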

  10. Other SMILES Atoms • Aliphatic or nonaromatic carbon: C • Atom in aromatic ring: lowercase letter • Designate ring closure with pairs of matching digits, e.g. c1ccccc1 is Benzene, whereas C1CCCCC1 is Cyclohexane

  11. SMILES Charges • Specify attached hydrogens and charges in square brackets • The number of attached hydrogens is given by the symbol H followed by an optional digit

  12. SMILES Charges • [H+]: proton • [OH-]: hydroxyl anion • [OH3+]: hydronium cation • [Fe++]: iron(II) cation • [NH4+]: ammonium cation
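
The bracket notation for hydrogens and charges can be read with a single regular expression. The sketch below is a simplified illustration: it assumes the bracket holds only an element symbol, an optional hydrogen count, and an optional charge (isotopes and chirality are ignored), and the name `parse_bracket_atom` is hypothetical.

```python
import re

BRACKET = re.compile(
    r"\[(?P<symbol>[A-Z][a-z]?)"       # element symbol, e.g. N or Fe
    r"(?P<h>H\d?)?"                    # attached hydrogens: H, H2, H3, ...
    r"(?P<charge>\++|-+|[+-]\d+)?\]"   # charge: +, ++, -, +2, -1, ...
)


def parse_bracket_atom(text):
    """Return (symbol, hydrogen_count, charge) for one bracketed atom."""
    match = BRACKET.fullmatch(text)
    if match is None:
        raise ValueError("not a simple bracket atom: " + repr(text))
    h = match.group("h")
    hydrogens = 0 if h is None else int(h[1:] or 1)  # bare "H" means one hydrogen
    c = match.group("charge")
    if c is None:
        charge = 0
    elif c[-1].isdigit():
        charge = int(c)                               # "+2" -> 2, "-1" -> -1
    else:
        charge = len(c) if c[0] == "+" else -len(c)   # "++" -> 2, "-" -> -1
    return match.group("symbol"), hydrogens, charge


for atom in ["[H+]", "[OH-]", "[OH3+]", "[Fe++]", "[NH4+]"]:
    print(atom, parse_bracket_atom(atom))
```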

  13. SMILES Cyclic Structures • Break one single or one aromatic bond in each ring • Number in any order • Designate ring-breaking atoms by the same digit following the atomic symbol

  14. Cyclic Structures • Numbers indicate start and stop of ring • The same number marks the start and end of a ring and is entered immediately after the start and end atoms • Digits 1 to 9 are used (the full SMILES specification also allows two-digit labels written with a % prefix) • A number should appear exactly twice, once at each end of the broken bond • An atom can carry two consecutive numbers, e.g., naphthalene: c12ccccc1cccc2
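
The pairing of ring digits can be checked mechanically. The sketch below is illustrative and assumes only organic-subset atoms without square brackets and single-digit ring labels; the name `ring_closures` is hypothetical.

```python
def ring_closures(smiles):
    """Return (opening_atom, closing_atom) index pairs for each ring digit."""
    open_digits = {}  # digit -> index of the atom that opened it
    pairs = []
    atom = -1         # index of the most recently read atom
    i = 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):  # two-letter symbols count as one atom
            atom += 1
            i += 2
            continue
        ch = smiles[i]
        if ch.isalpha():
            atom += 1
        elif ch.isdigit():
            if ch in open_digits:
                pairs.append((open_digits.pop(ch), atom))  # second use closes the ring
            else:
                open_digits[ch] = atom                     # first use opens the ring
        i += 1
    if open_digits:
        raise ValueError("unclosed ring digit(s): " + ", ".join(sorted(open_digits)))
    return pairs


print(ring_closures("c1ccccc1"))        # [(0, 5)]
print(ring_closures("c12ccccc1cccc2"))  # [(0, 5), (0, 9)]
```

In the naphthalene string, atom 0 opens both rings, which is the "two consecutive numbers" case.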

  15. SMILES Conventions • Avoid two consecutive left parentheses if possible • Strive for the fewest possible branches • Tautomeric bonds are not designated; enter the appropriate form

  16. Further Restrictions • A branch cannot begin a SMILES notation • A branch cannot immediately follow a double- or triple-bond symbol • Example: C=(CC)C is invalid, but C(=CC)C and C(CC)=C are valid SMILES
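
The two restrictions above are simple enough to test with plain string checks. This is a minimal sketch that checks only these two rules, not full SMILES validity; the function name `violates_branch_rules` is hypothetical.

```python
def violates_branch_rules(smiles):
    """True if a branch starts the string or directly follows '=' or '#'."""
    if smiles.startswith("("):
        return True  # a branch cannot begin the notation
    # A branch cannot immediately follow a double- or triple-bond symbol.
    return "=(" in smiles or "#(" in smiles


for s in ["C=(CC)C", "C(=CC)C", "C(CC)=C"]:
    print(s, "invalid" if violates_branch_rules(s) else "ok")
```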

  17. SMILES Fragments • Nitro: N(=O)(=O) • Nitrate: ON(=O)(=O) • Nitrite: ON(=O) • Sulfonic acid: S(=O)(=O)O • Cyanide/Nitrile: C#N • Azide: N=N#N • Azido: N+=N-

  18. SMILES Metals [Al] [As] [Au] [Be] [Bi] [Cd] [Ca] [Fe] [Hg] [K] [Li] [Mg] [Na] [Ni] [Pt] [Sb] [Sn] [Zn] [Zr]

  19. Disconnected Structures • Tetramethylammonium bromide: C[N+](C)(C)C.[Br-]
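
Disconnected parts of a structure are separated by a period, so a salt can be split into its ions with a plain string split. The sketch below uses tetramethylammonium bromide written as C[N+](C)(C)C.[Br-]; it assumes charges appear only inside square brackets as repeated + or - signs, and both function names are hypothetical.

```python
import re


def split_components(smiles):
    """Split a SMILES string into its disconnected parts."""
    return smiles.split(".")


def net_charge(smiles):
    """Sum the + and - signs found inside square brackets."""
    total = 0
    for bracket in re.findall(r"\[[^\]]*\]", smiles):
        total += bracket.count("+") - bracket.count("-")
    return total


salt = "C[N+](C)(C)C.[Br-]"
print(split_components(salt))                           # ['C[N+](C)(C)C', '[Br-]']
print([net_charge(p) for p in split_components(salt)])  # [1, -1]
print(net_charge(salt))                                 # 0
```

Counting signs only inside brackets matters because a bare - outside brackets is the single-bond symbol.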

  20. Isomeric and Chiral SMILES • Configuration about a double bond is indicated by forward and backward slashes: / \ • Examples: • trans-1,2-dibromoethene: Br/C=C/Br • cis-1,2-dibromoethene: Br/C=C\Br • Chirality is indicated by the “@” symbol
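
For the simple pattern used in these examples, where one substituent is written before the first carbon and one after the second, matching slashes mean trans and opposite slashes mean cis. The sketch below covers only that X/C=C/Y layout and is not a general stereochemistry parser; the name `double_bond_geometry` is hypothetical.

```python
import re


def double_bond_geometry(smiles):
    """Return 'cis', 'trans', or None for strings shaped like X/C=C/Y."""
    match = re.search(r"([/\\])C=C([/\\])", smiles)
    if match is None:
        return None  # no slash marks on both sides of the double bond
    # Same slash direction on both sides: substituents on opposite sides (trans).
    return "trans" if match.group(1) == match.group(2) else "cis"


print(double_bond_geometry("Br/C=C/Br"))   # trans
print(double_bond_geometry("Br/C=C\\Br"))  # cis
print(double_bond_geometry("BrC=CBr"))     # None
```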

  21. Another Application • SMILESCAS Database http://esc.syrres.com/interkow/smilecas.htm • Over 103,000 SMILES notations • Input CAS Registry Number • Leads to SMILES and thence to a structure search

  22. Example 1 CC(C(C)(C)(Br))C

  23-37. Examples 2 through 16 (slides with no transcript text)