
What’s new in JChem back-end and Markush storage, search and enumeration

Szabolcs Csepregi. Solutions for Cheminformatics. Contents: ChemAxon chemical database tools; main features of JChem Base and Cartridge; example interfaces (JSP, ASP, AJAX examples); integration with other CXN products.


Presentation Transcript


  1. What’s new in JChem back-end and Markush storage, search and enumeration
     Szabolcs Csepregi
     Solutions for Cheminformatics

  2. Contents
     • ChemAxon chemical database tools
     • Main features of JChem Base, Cartridge
     • Example interfaces: JSP, ASP, AJAX examples
     • Integration with other CXN products
     • Markush structure storage, search and enumeration
     • Recent developments, plans

  3. Chemical database products
     JChem Base
     • A library for adding chemical structures to relational database systems. Available in Java, JSP and .NET
     • An open-source web application example is available.
     JChem Cartridge for Oracle
     • Extends Oracle SQL with chemical operators and an index.
     • SQL interface for ChemAxon functionality
     Instant JChem
     • An all-in-one desktop chemical database application.
     JChem Web Services: SOAP interface to JChem Base
     JC4XL: Excel integration (coming)

  4. Compatibility and integration
     Supported chemical file formats:
     • SMILES
     • MDL MOL/RXN/SDF/RDF (v2000 and v3000)
     • CML, MRV
     • IUPAC and traditional names
     • InChI, mol2, PDB, etc.
     Database engines:
     • Oracle, MySQL, MS SQL Server, MS Access, PostgreSQL, IBM DB2, Derby, etc.
     All operating systems through:
     • Java API (JChem Base)
     • .NET API (JChem Base + IKVM), for Windows
     • SQL (Cartridge)

  5. Structure searching: features
     • Substructure, Similarity, Full, Full fragment, etc. search types
     • Wide range of query atoms
     • Query properties
     • R-group queries
     • Full SMARTS support
     • Coordination compounds
     • Link nodes
     • Pseudo atoms, Lone pairs
     • Relative stereo
     • Reaction search features
     • Polymers
     • Position variation
     • Hit coloring
     • ...
     www.chemaxon.com/conf/Structural_Search.ppt

  6. Structure searching: options
     Some selected structure search options:
     • Chemical Terms filter constraint
     • Tautomer search
     • Stereo on/off
     • Ignore charge/isotope/radical/valence/polymers
     • Vague bond matching modes: “or aromatic”; ignore bond types
     • Inverse hit list
     • Maximum search time / number of hits
     • SQL SELECT statement for pre-filtering
     • Ordering of results
     • etc.

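  7. Structure search: performance
     • Compound registration (chart not reproduced in this transcript)
     • Substructure search in PubChem, 19.5 million compounds (chart not reproduced in this transcript)
     Test environment: JChem Base 5.2.0, Intel Quad Q6600 2.4GHz, 8GB RAM; Oracle 10.2.0.3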

  8. Table types
     Control allowed chemical structures and available operations:
     • Molecule
     • Reaction
     • Markush
     • Query
     • Any structure

  9. Example web applications
     Open source JSP, ASP examples
     • Marvin applets are used for query drawing and structure visualization
     AJAX example
     • Back-end is JChem Web Services
     • No Java is needed for browsing
     Demo

  10. Integration
     Integration with other ChemAxon tools:
     • Custom, uniform chemical representation. (Standardizer: see separate presentation today.)
     • Automatically calculated properties by Chemical Terms: calculated columns (Calculator plugins)
     • Additional similarity calculations (Screen, JChem Base only)
     • Tautomer handling:
        • Tautomer search
        • Tautomer duplicate filter table/index option
        • Custom tautomer transforms or canonical tautomer using Standardizer
     • Query drawing and structure visualization (Marvin)
     Provides the most consistent interface and back-end.

  11. Integration
     Additional Cartridge functionality:
     • JChem index (for non-JChem tables)
     • Communication with Oracle optimizer
     • Reaction based enumeration (Reactor)
     • Format conversions, including image generation
     • Markush enumeration (Calculator plugins)
     • Property predictions through Chemical Terms (Calculator plugins)

  12. Registration system
     • A new component for a registration system is under development (API only)
     • Main features:
        • Customizable business logic
        • Multilevel duplication control
        • Customizable corporate registration ID
        • Handling of salts, batches, lots, samples, and mixtures
        • Identification, split and registration of salt and solvent structures
        • Storage of input structures in original format
        • Mock registration (dry run)
        • Pre-registration through a transitory area
     • Basic, customizable implementation examples
     • Separate examples for chemists and registrars
     • Web and Instant JChem interfaces will follow later

  13. Handling of Markush structures

  14. Markush structures
     • Combinatorial Markush structure registration and search
     • Features handled in search and enumeration:
        • R-groups (nesting to any depth)
        • Atom lists, bond lists
        • Position variation bond
        • Link nodes
        • Repeating units
        • Homology groups (aryl, alkyl, etc.)
           • Built-in
           • User-defined
     • Compatible Markush enumeration plugin

  15. Markush Enumeration
     • Markush enumeration plugin
        • Full enumeration
        • Selected parts only
        • Random enumeration
     • Calculate library size: exact size of huge Markush libraries (arbitrary precision) or magnitude
     • Scaffold alignment and coloring
     • Markush code
     • Optional example homology group enumeration
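The enumeration modes and the library-size calculation listed on this slide can be illustrated with a small sketch. It is not the Markush enumeration plugin or any ChemAxon API: the `{R1}` placeholder notation, the `SCAFFOLD` and `R_GROUPS` data, and all function names are hypothetical, and structures are treated as plain strings rather than molecular graphs. It assumes only that each R-group is an independent list of alternatives, which may themselves contain further R-groups.

```python
import random
import re

# Hypothetical toy Markush structure: "{R1}"-style placeholders stand for R-groups.
SCAFFOLD = "c1ccc({R1})cc1{R2}"
R_GROUPS = {
    "R1": ["F", "Cl", "O{R3}"],  # the third alternative nests another R-group
    "R2": ["C", "CC"],
    "R3": ["C", "CC", "C(C)C"],
}
PLACEHOLDER = re.compile(r"\{(R\d+)\}")


def library_size(fragment):
    """Count the members without enumerating them.

    Each placeholder contributes the sum of the sizes of its alternatives;
    independent placeholders multiply. Python integers are arbitrary
    precision, so the exact count stays exact for very large libraries.
    """
    size = 1
    for name in PLACEHOLDER.findall(fragment):
        size *= sum(library_size(alt) for alt in R_GROUPS[name])
    return size


def magnitude(size):
    """Order of magnitude (floor of log10) of a positive library size."""
    return len(str(size)) - 1


def enumerate_all(fragment):
    """Full enumeration: expand the first placeholder, then recurse."""
    match = PLACEHOLDER.search(fragment)
    if match is None:
        yield fragment
        return
    for alt in R_GROUPS[match.group(1)]:
        yield from enumerate_all(
            fragment[:match.start()] + alt + fragment[match.end():]
        )


def random_member(fragment, rng):
    """Random enumeration: pick one alternative per placeholder.

    Choosing uniformly at each level does not sample uniformly over the
    library when alternatives have different sub-library sizes.
    """
    while (match := PLACEHOLDER.search(fragment)) is not None:
        alt = rng.choice(R_GROUPS[match.group(1)])
        fragment = fragment[:match.start()] + alt + fragment[match.end():]
    return fragment


members = list(enumerate_all(SCAFFOLD))
print(library_size(SCAFFOLD), len(members))
print(random_member(SCAFFOLD, random.Random(0)))
```

Counting by sums and products is what makes the size of a huge library computable when full enumeration is not: the cost depends on the number of R-group definitions, not on the number of enumerated structures.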

  16. Markush storage & search
     • Available in JChem Base and Instant JChem
     • No enumeration involved: can handle very complex Markush structures (tested up to 10^40, but no explicit limits were built in)
     • Substructure and Full structure search
     • Basic query features supported
     • Substructure hit visualization: “Markush structure reduction”

  17. Markush demo

  18. What’s new

  19. What’s new: JChem Base
     5.1
     • Position variation in queries
     • New fast & reliable tautomer duplicate search
     5.2
     • .NET API
     • Polymer storage and search
     • New query options and features, including searching of attached data, group matching of undefined R-atoms, and repeating units
     • Improved substructure search performance
     • JChem Web Services
     • New metrics for similarity search (Tversky, etc.) (5.2.2)
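For readers unfamiliar with the Tversky metric mentioned above, here is a minimal sketch of the standard Tversky index. It is not JChem code: fingerprints are assumed to be plain sets of on-bit positions, the function name is hypothetical, and the slides do not say how JChem names or orders the two weights.

```python
def tversky(a, b, alpha, beta):
    """Standard Tversky index for two fingerprints given as sets of on-bits.

    T = |A & B| / (|A & B| + alpha * |A - B| + beta * |B - A|)

    alpha = beta = 1 gives the Tanimoto coefficient and alpha = beta = 0.5
    gives the Dice coefficient. Unequal weights make the measure asymmetric.
    """
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    if denominator == 0:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return common / denominator


query = {1, 2, 3, 4}
target = {3, 4, 5}

print(tversky(query, target, 1.0, 1.0))  # Tanimoto
print(tversky(query, target, 0.5, 0.5))  # Dice
print(tversky(query, target, 1.0, 0.0))  # only bits unique to the query are penalized
```

With `alpha = 1` and `beta = 0` the score reaches 1 whenever every query bit is also set in the target, which is why asymmetric weights are often described as giving a substructure-like similarity.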

  20. What’s new: JChem Base
     Polymer support details
     • Polymer brackets and properties (type, connectivity, etc.) considered during search and registration
     • Attached data search (optional): data attached to atoms/bonds/brackets
     • Source- and structure-based representation equivalence is checked (but can be switched off)
        • Addition to a double bond, e.g. polystyrene.
        • Polymerization through elimination of water or HCl, e.g. polyester, polyamide.

  21. What’s new: JChem Base
     Polymer support details (cont.)
     • Ladder type polymers
     • Phase-shifting for head-to-tail (ht) SRUs (can be switched off)
     • End group matching:
        • * atoms: unspecified end groups
        • Search option to switch end group matching on/off
     • Copolymer types: co, alt, rnd, blk, grf, xl, mer, mod
     • Polymer mixtures
     • New search options
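The idea behind phase-shifting can be sketched on a deliberately simplified model. This is not how JChem represents polymers: a head-to-tail repeating unit is assumed here to be a linear sequence of backbone atom symbols, and both function names are hypothetical. Under that assumption, two bracketings describe the same chain when one sequence is a cyclic rotation of the other; reversal is not considered because head-to-tail connectivity fixes the direction.

```python
def rotations(unit):
    """All cyclic rotations of a repeating unit given as a tuple of atom symbols."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def same_repeating_unit(a, b):
    """True if b is a phase-shifted (cyclically rotated) version of a."""
    a, b = tuple(a), tuple(b)
    return len(a) == len(b) and b in rotations(a)


# Poly(ethylene oxide) drawn with three different bracket positions.
print(same_repeating_unit(("C", "C", "O"), ("C", "O", "C")))  # True
print(same_repeating_unit(("C", "C", "O"), ("O", "C", "C")))  # True
print(same_repeating_unit(("C", "C", "O"), ("C", "O", "O")))  # False
```

A real implementation works on molecular graphs with branches and bond orders; the rotation check only conveys why the same polymer can be registered with its brackets in several positions.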

  22. What’s new: Cartridge-specific
     5.1
     • Tautomer duplicate filtering index option
     • Alter index option
     • Improved import speed (5.1.3)
     • Improved upgrade: no need to remove/recreate indices (5.1.4)
     5.2
     • Interactive installer
     • Increased substructure search performance (5.2.2)
     • Tversky similarity search (5.2.2)

  23. What’s new: Markush
     New features:
     • Homology groups
        • 19 built-in groups
        • Customizable:
           • Examples (for built-in groups, enumeration only)
           • Full user-defined homology groups, defined by R-group definition
        • Marvin templates for easier sketching
     • Import reagent files as R-groups
     • Position variation and Repeating units

  24. Plans

  25. Plans: JChem Base & Cartridge
     JChem Base
     • Further speed improvements (SSS, similarity)
     • New vague bond level options
     • R-group decomposition integration
     • Improved support for Screen molecular descriptors
     Cartridge
     • Screen molecular descriptors (BCUT, pharmacophore similarity, chemical hashed fp, etc.) and metrics (Euclidean, Dice, etc.) for similarity search
     • User-defined descriptor fingerprints
     • Markush tables and search
     • JChem Server, JChem cluster
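The two metrics named in the Cartridge plans can be written down from their textbook definitions. The sketch below is not Screen or Cartridge code; it assumes binary fingerprints are sets of on-bit positions and scalar descriptors are equal-length lists of numbers, and both function names are hypothetical.

```python
import math


def dice_similarity(a, b):
    """Dice coefficient for two binary fingerprints given as sets of on-bits.

    Returns a similarity in [0, 1]; 1 means identical bit sets.
    """
    total = len(a) + len(b)
    if total == 0:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return 2 * len(a & b) / total


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length descriptor vectors.

    Returns a dissimilarity: 0 means identical vectors, larger means less similar.
    """
    if len(x) != len(y):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


print(dice_similarity({1, 2, 3, 4}, {3, 4, 5}))
print(euclidean_distance([0.0, 3.0], [4.0, 0.0]))
```

The two run in opposite directions: Dice is a similarity that a search would threshold from below, while Euclidean is a distance that would be thresholded from above.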

  26. Plans: Markush
     • .VMN import (format used by Merged Markush Service & Derwent World Patent Index)
     • Multiple graphical attachment points of R-groups
     • Homology variation queries
     • Overlap analysis of Markush structures
     • Homology group properties (# of atoms, branching points, # of heteroatoms, etc.)
     • Conditions for Markush variables

  27. Summary
     • JChem Base and Cartridge are comprehensive and efficient
     • Markush structure storage, search and enumeration are now reaching patent feature coverage
     • Continuous development, improvements in the pipeline

  28. Find out more
     • Product descriptions & links: www.chemaxon.com/products.html
     • Forum: www.chemaxon.com/forum
     • Presentations and posters: www.chemaxon.com/conf
     • Download: www.chemaxon.com/download.html
