SDF File analysis

SDF file analysis: creation, composition, checking. Chemical table files are files that contain information about chemicals. They come in various formats: RGfiles, Rxnfiles, RDfiles, XDfiles, Clipboard, Molfile and SDF.


Presentation Transcript


  1. SDF File analysis • Creation, composition, checking

  2. Concerning chemical table files • Chemical table files are files that contain information about chemicals • Various formats • RGfiles, Rxnfiles, RDfiles, XDfiles and Clipboard • Molfile, SDF

  3. MDL Molfile • A file format for holding information about the atoms, bonds, connectivity and coordinates of a molecule • Most cheminformatics software and some computational software packages can read it • Standard version: V2000 • Contains a header and a connection table

  4. MDL Molfile content
      Generated by Molgen 5.0
      11 9 0 0 0 0
      -0.0666 -1.5989 0.0514 C 0 0 0 0 0 0 0 0 0 0 0 0 0
      1.2913 -1.6184 -0.1221 C 0 0 0 0 0 0 0 0 0 0 0 0 0
      -0.9621 -1.2620 -0.9586 O 0 0 0 0 0 0 0 0 0 0 0 0 0
      -0.0783 1.8974 -0.4702 O 0 0 0 0 0 0 0 0 0 0 0 0 0
      -0.4844 1.6346 0.9333 O 0 0 0 0 0 0 0 0 0 0 0 0 0
      -0.5244 -1.8601 1.0528 H 0 0 0 0 0 0 0 0 0 0 0 0 0
      1.7535 -1.3543 -1.1238 H 0 0 0 0 0 0 0 0 0 0 0 0 0
      1.9833 -1.8974 0.7324 H 0 0 0 0 0 0 0 0 0 0 0 0 0
      -1.9833 -1.2177 -0.8648 H 0 0 0 0 0 0 0 0 0 0 0 0 0
      0.8090 1.5332 -0.8167 H 0 0 0 0 0 0 0 0 0 0 0 0 0
      -1.3677 1.1615 1.1238 H 0 0 0 0 0 0 0 0 0 0 0 0 0
      1 2 2 0 0 0 0
      1 3 1 0 0 0 0
      1 6 1 0 0 0 0
      2 7 1 0 0 0 0
      2 8 1 0 0 0 0
      3 9 1 0 0 0 0
      4 5 1 0 0 0 0
      4 10 1 0 0 0 0
      5 11 1 0 0 0 0
      M  END
      $$$$
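A minimal sketch of how the connection table on this slide could be read. It assumes the fields are separated by whitespace, because the fixed-width V2000 columns are not preserved in this transcript; `parse_ctab` and `CTAB` are illustrative names, not part of any library.

```python
from collections import Counter

# Connection table copied from the slide, starting at the counts line.
# Assumption: whitespace-separated fields (the original column layout is lost).
CTAB = """\
11 9 0 0 0 0
-0.0666 -1.5989 0.0514 C 0 0 0 0 0 0 0 0 0 0 0 0 0
1.2913 -1.6184 -0.1221 C 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.9621 -1.2620 -0.9586 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.0783 1.8974 -0.4702 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.4844 1.6346 0.9333 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.5244 -1.8601 1.0528 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1.7535 -1.3543 -1.1238 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1.9833 -1.8974 0.7324 H 0 0 0 0 0 0 0 0 0 0 0 0 0
-1.9833 -1.2177 -0.8648 H 0 0 0 0 0 0 0 0 0 0 0 0 0
0.8090 1.5332 -0.8167 H 0 0 0 0 0 0 0 0 0 0 0 0 0
-1.3677 1.1615 1.1238 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
1 3 1 0 0 0 0
1 6 1 0 0 0 0
2 7 1 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
4 5 1 0 0 0 0
4 10 1 0 0 0 0
5 11 1 0 0 0 0
M  END
"""


def parse_ctab(text):
    """Return (atoms, bonds) from a block that starts at the counts line."""
    lines = text.strip().splitlines()
    counts = lines[0].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])

    atoms = []
    for line in lines[1:1 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))

    bonds = []
    for line in lines[1 + n_atoms:1 + n_atoms + n_bonds]:
        first, second, order = (int(tok) for tok in line.split()[:3])
        bonds.append((first, second, order))
    return atoms, bonds


atoms, bonds = parse_ctab(CTAB)
formula = Counter(symbol for symbol, *_ in atoms)
print(dict(formula))  # element counts of the atom block
print(bonds[0])       # (first atom, second atom, bond order)
```

The element counts give C2H6O3, and the first bond line reads as a double bond between atoms 1 and 2.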

  5. MDL SDF file • SDF = structure-data file • Wraps the molfile format: each record is a molfile followed by named data items and a closing $$$$ line

  6. SDF content §1 – molecular information
      ./MinCheck/C2_H6_N0_O3_F0_S0_1.log
      OpenBabel04161413273D
      Gaussian 09 # G3MP2B3 Opt(Cartesian,Tight,CalcAll,MaxStep=1,MaxCycles=300) QCISD
      11 9 0 0 0 0 0 0 0 0999 V2000
      0.4466 -1.5390 0.0292 C 0 0 0 0 0 0 0 0 0 0 0 0
      1.4790 -2.1676 -0.5273 C 0 0 0 0 0 0 0 0 0 0 0 0
      -0.2693 -0.5704 -0.6322 O 0 0 0 0 0 0 0 0 0 0 0 0
      -0.3941 2.0659 0.3307 O 0 0 0 0 0 0 0 0 0 0 0 0
      -1.5836 1.3451 0.7668 O 0 0 0 0 0 0 0 0 0 0 0 0
      0.1141 -1.7508 1.0446 H 0 0 0 0 0 0 0 0 0 0 0 0
      1.7979 -1.9482 -1.5413 H 0 0 0 0 0 0 0 0 0 0 0 0
      2.0238 -2.9170 0.0345 H 0 0 0 0 0 0 0 0 0 0 0 0
      -1.0239 -0.2837 -0.0806 H 0 0 0 0 0 0 0 0 0 0 0 0
      0.0506 1.3459 -0.1697 H 0 0 0 0 0 0 0 0 0 0 0 0
      -2.2708 1.8377 0.2828 H 0 0 0 0 0 0 0 0 0 0 0 0
      1 6 1 0 0 0 0
      2 1 2 0 0 0 0
      2 8 1 0 0 0 0
      3 9 1 0 0 0 0
      3 1 1 0 0 0 0
      4 5 1 0 0 0 0
      7 2 1 0 0 0 0
      10 4 1 0 0 0 0
      11 5 1 0 0 0 0
      M  END

  7. SDF content §2 – input and calculated parameters
      > <Scale factor> 0.96
      > <Stoichiometry> C2H6O3
      > <Charge> 0
      > <Multiplicity> 1
      > <Molecular mass> 78.03169
      > <DegreeOfFreedom> 27
      > <Permanent dipole moment(B3LYP, Debye)> 1.475
      > <ABC(cm-1)> 14.133 1.731 1.655
      > <Scaled freq(cm-1)> 49.1 59.1 80.1 182.8 222.6 335.5 460.0 529.6 663.0 762.0 812.3 911.3 928.1 944.3 1124.8 1287.3 1299.6 1321.8 1403.2 1483.7 1689.2 3041.9 3064.2 3147.0 3408.9 3472.7 3557.0
      > <IR intensities(rel.)> 4.5 3.8 6.6 7.8 25.1 93.3 16.9 79.8 60.8 214.2 73.0 2.9 55.0 16.5 33.8 210.3 56.9 126.8 4.4 22.8 90.0 19.2 0.4 8.3 59.4 559.4 26.8
      > <Temp(K)> 298.150
      > <Pressure(atm)> 1.00000
      > <DfHg_G3MP2B3(kJ/mol)> -269.7
      > <Scaled S(J/molK)> 363.4
      > <UNScaled CV(J/molK)> 98.9

  8. SDF content §3 – molecular descriptors
      > <MPD> 2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; 9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; 8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; 13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; 13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;
      > <MNA> -C(-H(-C)-C(-H-H-C)-O(-H-C)) -C(-H(-C)-H(-C)-C(-H-C-O)) -O(-H(-O)-C(-H-C-O)) -O(-H(-O)-O(-H-O)) -O(-H(-O)-O(-H-O)) -H(-C(-H-C-O)) -H(-C(-H-H-C)) -H(-C(-H-H-C)) -H(-O(-H-C)) -H(-O(-H-O)) -H(-O(-H-O))
      > <SMI> C(=C)O.OO
      > <MolRT> 3
      > <InChi> InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H
      > <InChiKey> JJZZTHKXWWHOAE-UHFFFAOYSA-N
      > <MCDL> CH;CHH;3OH[2,3;;;5]
      $$$$
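As a rough illustration of how the data items in slides 7 and 8 can be collected, the sketch below reads the `> <Name>` headers of one SDF record into a dictionary. It assumes the usual SDF layout (header line, value line or lines, blank line, `$$$$` at the end of the record); the excerpt reuses a few values from the slides, and `read_data_items` is a hypothetical helper name.

```python
# A shortened record tail: only some of the data items from the slides.
# Assumption: each value sits on the line(s) after its "> <Name>" header
# and is closed by a blank line, as in standard SDF files.
RECORD_TAIL = """\
M  END
> <Stoichiometry>
C2H6O3

> <Molecular mass>
78.03169

> <DegreeOfFreedom>
27

> <Scaled freq(cm-1)>
49.1 59.1 80.1 182.8 222.6 335.5 460.0 529.6 663.0 762.0 812.3 911.3 928.1 944.3 1124.8 1287.3 1299.6 1321.8 1403.2 1483.7 1689.2 3041.9 3064.2 3147.0 3408.9 3472.7 3557.0

> <SMI>
C(=C)O.OO

$$$$
"""


def read_data_items(record_text):
    """Map each data item name to its value (multi-line values are joined)."""
    items = {}
    name = None
    for line in record_text.splitlines():
        if line.startswith("$$$$"):      # end of the record
            break
        if line.startswith(">"):         # data header, e.g. "> <SMI>"
            start, end = line.find("<"), line.rfind(">")
            name = line[start + 1:end] if 0 <= start < end else None
            if name is not None:
                items[name] = []
        elif name is not None:
            if line.strip():
                items[name].append(line.strip())
            else:                        # a blank line closes the item
                name = None
    return {key: "\n".join(value) for key, value in items.items()}


items = read_data_items(RECORD_TAIL)
freqs = [float(tok) for tok in items["Scaled freq(cm-1)"].split()]
print(items["Stoichiometry"], items["SMI"])
print(len(freqs) == int(items["DegreeOfFreedom"]))
```

The last line is a simple consistency check: the number of listed frequencies should match the `DegreeOfFreedom` entry.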

  9. Molecular fragment schemes • Developed in the ’50s • Screens (structural keys, fingerprints) were developed in the ’70s • They are generally large strings that can be stored efficiently (compressed) • Important role • in providing efficient substructure searching capabilities in large chemical databases, • in similarity searching, • in clustering large data sets, • in assessing chemical diversity, • in conducting SAR and QSAR studies

  10. Images of the optimized structure (depicted differently) • GaussView • ChemDraw • www.chemicalize.org (searched by InChI)

  11. MPD (MOLPRINT 2D) • MPD = MOLPRINT 2D descriptor format • A molecular similarity searching technique based on atom environments • Atom environments are count vectors of heavy atoms present at a topological distance from each heavy atom of a molecule
      > <MPD> 2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; 9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; 8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; 13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; 13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;
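The MPD string can be unpacked into per-atom count vectors with a few lines of code. The sketch below assumes that each entry has the form `centre_type;distance-count-type;...`, which is a reading of the sample itself (it matches atom 1, a carbon with three neighbours at distance 1 and three hydrogens at distance 2) rather than a documented specification; `parse_mpd` is an illustrative name.

```python
# MPD value copied from the slide: one entry per atom, separated by spaces.
MPD = (
    "2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; "
    "9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; "
    "8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; "
    "13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; "
    "13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;"
)


def parse_mpd(value):
    """Return a list of (centre_type, {(distance, atom_type): count})."""
    environments = []
    for entry in value.split():
        tokens = [tok for tok in entry.split(";") if tok]
        centre_type = int(tokens[0])
        counts = {}
        for tok in tokens[1:]:
            # Assumed layout of each token: distance-count-type
            distance, count, atom_type = (int(part) for part in tok.split("-"))
            counts[(distance, atom_type)] = count
        environments.append((centre_type, counts))
    return environments


environments = parse_mpd(MPD)
print(len(environments))
print(environments[0])
```

With the environments in this form, two molecules can be compared entry by entry, for example by counting how many environments they share.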

  12. MNA • MNA = Multilevel Neighbourhoods of Atoms • 2D molecular fragments suitable for use in QSAR modelling • Output: a complete descriptor fingerprint per molecule • Fragment: starting at the origin, each atom is appended to the descriptor, immediately followed by a parenthesized list of its neighbours
      > <MNA> -C(-H(-C)-C(-H-H-C)-O(-H-C)) -C(-H(-C)-H(-C)-C(-H-C-O)) -O(-H(-O)-C(-H-C-O)) -O(-H(-O)-O(-H-O)) -O(-H(-O)-O(-H-O)) -H(-C(-H-C-O)) -H(-C(-H-H-C)) -H(-C(-H-H-C)) -H(-O(-H-C)) -H(-O(-H-O)) -H(-O(-H-O))

  13. SMILES (SMI) • SMILES = Simplified Molecular Input Line Entry System • A linear text format which can describe the connectivity and chirality of a molecule • Specifically represents a valence model of a molecule, not a computer data structure, a mathematical abstraction, or an "actual substance"
      > <SMI> C(=C)O.OO

  14. MolRT (easter egg, it’s molarity…)

  15. InChI • InChI = International Chemical Identifier • A reliable computerized method for representing chemical identities • A representation of the chemical structure with details • Simple but unique identifier for molecules (like a barcode) • Different layers separated by delimiters (/) • Main layer • Charge layer • Stereochemical layer • Isotopic layer • Fixed-H layer • Reconnected layer
      > <InChi> InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H
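The layer structure can be made visible by splitting the string at the `/` delimiters. The sketch below is a simplification under stated assumptions: it keys each layer by its one-letter prefix (`c` for connections, `h` for hydrogens in this example), so it does not handle identifiers in which a prefix repeats, such as after a fixed-H or reconnected layer. `split_inchi` is an illustrative name.

```python
def split_inchi(inchi):
    """Split an InChI into (version, formula, {prefix: layer_body})."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    version = parts[0]                       # e.g. "1S" for standard InChI, version 1
    formula = parts[1] if len(parts) > 1 else ""
    # Remaining layers start with a one-letter prefix (c, h, q, p, ...).
    layers = {part[0]: part[1:] for part in parts[2:] if part}
    return version, formula, layers


version, formula, layers = split_inchi(
    "InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H"
)
print(version)             # version and standard flag
print(formula.split("."))  # one formula per component
print(layers)              # connection and hydrogen sublayers of the main layer
```

For the example on the slide this yields two components, C2H4O and H2O2, each with its own part in the `c` and `h` sublayers (separated by `;`).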

  16. InChIKey • A shortened, more web-search-friendly form of the InChI code • Its length is fixed at 27 characters • The first 14 characters represent the molecular skeleton/connectivity matrix • The next block contains 8+1 characters • the first 8-character block encodes stereochemistry and isotopic substitution information • the +1 character defines the kind of InChIKey (S = standard, N = non-standard) • Next character: the InChI version used • Final character: protonation indicator
      > <InChiKey> JJZZTHKXWWHOAE-UHFFFAOYSA-N
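The fixed layout described on this slide translates directly into string slicing. The following sketch splits a key into the parts named above; the dictionary keys and the function name `split_inchikey` are illustrative choices.

```python
def split_inchikey(key):
    """Split a 27-character InChIKey into the parts listed on the slide."""
    blocks = key.split("-")
    if len(key) != 27 or [len(block) for block in blocks] != [14, 10, 1]:
        raise ValueError("unexpected InChIKey layout")
    first, second, protonation = blocks
    return {
        "connectivity": first,        # 14 characters: skeleton/connectivity
        "other_layers": second[:8],   # 8 characters: stereo, isotopes, ...
        "kind": second[8],            # S = standard, N = non-standard
        "version": second[9],         # InChI version indicator
        "protonation": protonation,   # final character
    }


parts = split_inchikey("JJZZTHKXWWHOAE-UHFFFAOYSA-N")
for name, value in parts.items():
    print(f"{name}: {value}")
```

Two isomers with the same connectivity but different stereochemistry would share the first block and differ in the second, which makes this split handy when comparing keys.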

  17. MCDL • MCDL = Modular Chemical Descriptor Language; first published in 2001 • Developed for linear representation of structural and other chemical information for chemical databases • Similar to InChI: both languages are modular; constitution, connectivity, and stereochemistry are represented by individual "modules" • MCDL provides direct placement of hydrogen atoms, whereas InChI uses a separate block
      > <MCDL> CH;CHH;3OH[2,3;;;5]

  18. Other useful links and references • Todeschini, Roberto / Consonni, Viviana: Molecular Descriptors for Chemoinformatics, 2nd, revised and enlarged edition, 2009. ISBN 978-3-527-31852-0, Wiley-VCH, Weinheim • Bender A, Mussa HY, Glen RC, Reiling S.: Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance, J Chem Inf Comput Sci. 2004 Sep-Oct;44(5):1708-18. • Gakh AA, Burnett MN.: Modular Chemical Descriptor Language (MCDL): composition, connectivity, and supplementary modules, J Chem Inf Comput Sci. 2001 Nov-Dec;41(6):1494-9. • http://arxiv.org/ftp/arxiv/papers/1311/1311.3723.pdf • http://openbabel.org/wiki/Multilevel_Neighborhoods_of_Atoms • http://openbabel.org/wiki/SMILES • http://www.daylight.com/meetings/summerschool98/course/dave/smiles-intro.html • http://www.inchi-trust.org/ (and references therein) • http://www.iupac.org/home/publications/e-resources/inchi/download.html (and references therein) • http://www.chemspider.com/inchi-resolver/

  19. Your objectives for today • To check your .sdf file for two chosen isomers • To collect all the codes • To compare them with each other and find differences
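One possible way to organize the comparison is to put the collected codes of each isomer into a dictionary and list the entries that differ. In the sketch below, the first record uses values from the slides, while the second record contains placeholder strings only; replace them with the codes read from your own second .sdf record. `compare_codes` is a hypothetical helper name.

```python
def compare_codes(first, second):
    """Return (same, different) for two {code_name: value} dictionaries."""
    same, different = {}, {}
    for name in sorted(set(first) | set(second)):
        a, b = first.get(name), second.get(name)
        if a == b:
            same[name] = a
        else:
            different[name] = (a, b)
    return same, different


# Codes of the first isomer, taken from the slides.
isomer_a = {
    "Stoichiometry": "C2H6O3",
    "SMI": "C(=C)O.OO",
    "InChiKey": "JJZZTHKXWWHOAE-UHFFFAOYSA-N",
}
# Placeholder values (NOT real data): fill in from the second record.
isomer_b = {
    "Stoichiometry": "C2H6O3",
    "SMI": "PLACEHOLDER_SMILES",
    "InChiKey": "PLACEHOLDER_KEY",
}

same, different = compare_codes(isomer_a, isomer_b)
print("identical:", list(same))
print("different:", list(different))
```

Isomers share the stoichiometry by definition, so the interesting output is the list of codes that differ.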

  20. Thank you for your attention!
