This work discusses the importance of standardizing molecular structures in chemoinformatics to improve consistency and enhance functionality. It covers various processes such as canonicalization, uniformization, beautification, and modification of molecular structures without altering their chemical content. The roles of tautomers, mesomers, and aromatic structures are highlighted, along with practical applications in virtual synthesis and structure database management. The techniques utilized aim to make structures more visually appealing, simplify search results, and facilitate further analysis.
Standardizer: Molecular Cosmetics for Chemoinformatics
György Pirok, Nóra Máté, István Cseh, Szilárd Dóránt, Péter Kovács, Szabolcs Csepregi, Ferenc Csizmadia
Why standardize structures?
• Canonicalization: uniformization of structures without changing the chemical content, to recognize duplicates and functional groups (aromatization, mesomers, tautomers, ...)
• Beautification: making the structures visually more attractive (dearomatization, cleaning coordinates, wedge orientation, ...)
• Modification: conversion of structures by modifying their original content as a preparation step for further chemoinformatics tasks (transformations, removing stereo, removing R-groups, ...)
It is often difficult to assign a standardization action to a single category.
Canonicalization
• Hydrogens: making hydrogens explicit; making hydrogens implicit
• Tautomers: converting to canonical tautomer form; transforming to user-defined tautomer form
• Resonant structures: aromatizing Kekulé rings; converting to canonical mesomer form; transforming to user-defined mesomer form
• Other: removing small fragments; removing user-defined fragments; expanding stoichiometry; setting the chiral flag
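The actions listed above are applied one after another to each structure. The following is a minimal sketch of that idea, not the Standardizer implementation: it treats a structure as a plain SMILES string and chains string-level actions in order. The class name `ToyStandardizerPipeline` and the method `keepLargestFragment` are hypothetical, and "largest fragment" is approximated by string length, which is only a crude stand-in for a real atom count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.UnaryOperator;

/** Toy pipeline (hypothetical): an ordered list of string-level actions applied to a SMILES string. */
final class ToyStandardizerPipeline {
    private final List<UnaryOperator<String>> actions = new ArrayList<>();

    /** Appends an action; actions run in the order they were added. */
    ToyStandardizerPipeline add(UnaryOperator<String> action) {
        actions.add(action);
        return this;
    }

    /** Runs every action in sequence, feeding each result into the next action. */
    String apply(String smiles) {
        String current = smiles;
        for (UnaryOperator<String> action : actions) {
            current = action.apply(current);
        }
        return current;
    }

    /**
     * Crude stand-in for "removing small fragments": keeps the longest
     * dot-separated SMILES component. String length is an assumption used
     * here in place of a real atom count.
     */
    static String keepLargestFragment(String smiles) {
        return Arrays.stream(smiles.split("\\."))
                .max(Comparator.comparingInt(String::length))
                .orElse(smiles);
    }

    public static void main(String[] args) {
        ToyStandardizerPipeline pipeline = new ToyStandardizerPipeline()
                .add(String::trim)
                .add(ToyStandardizerPipeline::keepLargestFragment);
        // Sodium acetate written as two fragments; the counter-ion is dropped.
        System.out.println(pipeline.apply(" [Na+].CC(=O)[O-] "));
    }
}
```

Because the order of actions matters in a real configuration as well (for example, removing fragments before choosing a canonical tautomer), the sketch keeps the actions in an ordered list rather than a set.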
Beautification
• Hydrogens: making hydrogens implicit
• Resonant structures: converting aromatic rings to Kekulé format
• Groups: contracting/expanding/ungrouping abbreviated and multiple groups
• Cleaning: calculating 2D coordinates; reallocating wedge bonds; template-based cleaning; 3D geometry optimization
Template-based Cleaning: 2D-coordinate calculation of macrocycles or bridged systems
Template-based Cleaning: aligning search results to the query
Canonicalization During Database Import
[Diagram: the client sends input structures to the server, where JChem Base / Cartridge passes them through Standardizer with a canonicalization configuration; the canonicalized structures are stored alongside the original structures in the relational database.]
Sending Query to the Database
[Diagram: the client sends a query structure to the server; JChem Base / Cartridge canonicalizes it with Standardizer using the canonicalization configuration, and the canonicalized query is compared to the canonicalized structures in the relational database.]
Displaying Result Structures
[Diagram: on the server, JChem Base / Cartridge reads the original structures from the relational database, Standardizer applies a beautification configuration, and the beautified structures are returned to the client.]
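The three workflow slides above (import, query, display) can be sketched as a small in-memory table. This is an illustration under stated assumptions, not the JChem Base or Cartridge API: `ToyStructureTable`, `importStructure`, `search`, and `placeholderCanonicalizer` are hypothetical names, structures are plain SMILES strings, and the search is an exact match on the canonical form only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** In-memory stand-in for the structure table (hypothetical; not the JChem Base API). */
final class ToyStructureTable {
    /** One stored row: the structure as submitted plus its canonical search key. */
    private static final class Row {
        final String original;
        final String canonical;

        Row(String original, String canonical) {
            this.original = original;
            this.canonical = canonical;
        }
    }

    private final UnaryOperator<String> canonicalizer; // applied at import and at query time
    private final UnaryOperator<String> beautifier;    // applied only when displaying hits
    private final List<Row> rows = new ArrayList<>();

    ToyStructureTable(UnaryOperator<String> canonicalizer, UnaryOperator<String> beautifier) {
        this.canonicalizer = canonicalizer;
        this.beautifier = beautifier;
    }

    /** Import: keep the original and store its canonical form next to it. */
    void importStructure(String original) {
        rows.add(new Row(original, canonicalizer.apply(original)));
    }

    /** Query: canonicalize the query, match on canonical forms, return beautified originals. */
    List<String> search(String query) {
        String canonicalQuery = canonicalizer.apply(query);
        List<String> hits = new ArrayList<>();
        for (Row row : rows) {
            if (row.canonical.equals(canonicalQuery)) {
                hits.add(beautifier.apply(row.original));
            }
        }
        return hits;
    }

    /** Placeholder canonicalizer (assumption): trims and keeps the longest dot-separated fragment. */
    static String placeholderCanonicalizer(String smiles) {
        String best = "";
        for (String fragment : smiles.trim().split("\\.")) {
            if (fragment.length() > best.length()) {
                best = fragment;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Whitespace trimming stands in for a real beautification configuration.
        ToyStructureTable table = new ToyStructureTable(
                ToyStructureTable::placeholderCanonicalizer, String::trim);
        table.importStructure("[Na+].CC(=O)[O-]");
        table.importStructure(" CC(=O)[O-] ");
        table.importStructure("CCO");
        // The potassium salt matches both acetate rows through the canonical form.
        System.out.println(table.search("CC(=O)[O-].[K+]"));
    }
}
```

Keeping the original next to the canonical form mirrors the slides: matching happens on canonicalized structures, while what the user sees is derived from the structure as it was originally submitted.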
Modification + custom transformations
API and command line interface
API (Java):
Standardizer st = new Standardizer(new File("standardize.xml"));
st.standardize(mol);
Command line:
standardize input.sdf -c config.xml -o output.smiles
How Can ChemAxon Help?
• Free for non-commercial websites
• Free for academic teaching and research ("Academic Package")
• Free Academic Package to be extended to cover academic networks – campus-wide roll-out
Acknowledgments
• Ferenc Csizmadia
• Nóra Máté
• István Cseh
• Szabó Attila
• Szilárd Dóránt
• Péter Kovács
• Szabolcs Csepregi