Solutions for Cheminformatics

Presentation Transcript

  1. Solutions for Cheminformatics: Migration from ISIS environment. Szabolcs Csepregi et al., November 2008

  2. Migration - Topics • ChemAxon - Product Overview • From ISIS/Host and MDL Direct to JChem Cartridge • Alternatives to Cheshire (Standardizer) • From ISIS/Base to Instant JChem • From ISIS for Excel to JChem for Excel • Migrating Custom Applications • ChemAxon Web Services • Appendix: ChemAxon for Developers (Resources)

  3. Product Map

  4. ChemAxon Embedded - examples • Workflow: Pipeline Pilot, Inforsense, and KNIME • ELN: Agilent, Contur, DeltaSoft, Kinematic, etc. • SAR: Spotfire, Synaptic Science, Omniviz • Databases: Aureus, GVK, Jubilant Biosys, Patcore • Web: Thomson Reuters, Wiley, Houghton Mifflin, Cengage, Prentice Hall, Collaborative Drug Discovery, RCSB PDB, BindingDB, NIH/NLM ChemIDPlus, Molport, etc.

  5. The Marvin family: MarvinSketch, MarvinView, MarvinSpace • Structure, query & reaction editing • Individual and structure table visualization • Publication quality macromolecule visualization • Available as Java applets for HTML pages and Java beans for standalone apps (full API)

  6. Marvin Development History (timeline, 1998 to 2008): SDF, RDF, XYZ animations, CML, templates, compressed formats, Swing, 3D models; SMILES, SMARTS, PDB, R-groups, isotopes, shortcuts, Marvin Beans; Ball and stick, JPG, PNG, SVG, Cut & Paste with ISIS/ChemDraw, 2D cleaning, (de)aromatization, reaction drawing; Applets, Molfiles, stereo support, Windows, Unix; Mac support, signed applets, Java Web Start, atom mapping; Partial charge, pKa, logP/logD, 3D optimization, radicals, abbreviated groups; Tautomers, resonance, lone pairs, conformers, 3D sketching, MarvinSpace, Topology analysis, presentation quality graphics, ...; More Plugins, more R-groups, EMF, PDF and Mol2, Improved property storage in MRV, SDfiles and RDfiles, .NET support in MarvinBeans; Name to structure, OLE 2, Chemical Terms, Customizable GUI; Marvin file format, enhanced stereo, shapes, text boxes, multiple groups, link nodes, TPSA, recursive SMARTS, Donor/Acceptor, electron arrows; Structure to name, Coordination compounds, Polymer drawing, OLE, Markush enumeration plugin; Configurations

  7. Calculator Plugins: A variety of structure based calculations are available from the Marvin GUI, the cxcalc command line tool and the API. The calculations are widely used within several JChem tools and are available as functions of Chemical Terms expressions. • Elemental Analysis • IUPAC Name: Standard IUPAC Name • Protonation: pKa, Major Microspecies, Isoelectric Point • Partitioning: logP, logD • Charge: Charge, Polarizability, Orbital Electronegativity • Isomers: Tautomerization, Resonance, Stereoisomer • Conformation: Conformer, Molecular Dynamics • Geometry: Topology Analysis, Geometry, Polar Surface Area (2D), Molecular Surface Area (3D) • Markush enumeration • Other: Hydrogen Bond Donor-Acceptor, Huckel Analysis, Refractivity

  8. Chemical Naming: Structure to Name / Name to Structure • Supported nomenclatures: chains, monocycles; traditional names with and without heteroatom; spiro ring systems; ethers; common characteristic groups, ionic compounds; unlimited number of atoms and rings; all atom types; stereochemistry; etc. • Usage: drag & drop or copy & paste to MarvinSketch; label updated in real time; automatic format recognition; batch from command line

  9. JChem family

  10. JChem development history (timeline, 2000 to 2008): Oracle, MySQL, SQL Server, Access, hashed fingerprints, substructure and similarity search; Clustering, diversity; DB2, PostgreSQL, R-group searching; Reaction searching, fragmentation, reaction processing, standardization, pharmacophores, screening; Cartridge, enhanced stereo searching, recursive SMARTS, Chemical Terms, virtual synthesis; Position variation queries; Instant JChem: Federated search, Cartridge support...; JChem for Excel; Calculated columns, Installer, Tautomer Duplicate filtering, Query tables, Markush tables, Speed enhancements for JChem Cartridge, form design, relational data for Instant JChem ...; R-decomposition, R-enumeration, reaction library, custom fingerprints, random synthesis, link nodes…; Tautomer search, Instant JChem reaction similarity, LibraryMCS, GUI for Standardizer/Reactor …

  11. DB2 Structural Search JChem Base • Features: • Fast and sophisticated searching (chemical and non-chemical data, Chemical Terms filter, many options) • Custom standardization • Calculated columns • Combinatorial Markush structure tables • Interfaces: • Integration with most relational database engines • JChem Cartridge for tight Oracle SQL integration • JSP integration: open source web example • Desktop-ready through Instant JChem

  12. Searching in combinatorial Markush structures • Combinatorial Markush structure registration and search • Markush features handled in search & enumeration: • R-groups (nesting to any depth) • Atom lists, bond lists • Position variation bond • Link nodes • Compatible Markush enumeration plugin • Not all query features supported • Detailed description:

  13. JChem Cartridge for Oracle • Access JChem functionality via SQL functions • All search features of JChem Base • JChem index for chemical data in arbitrary database structure • Chemical filters and property predictors using Chemical Terms • Standardization (structure canonicalization) during registration • Structure format conversions • 2D, 3D image generation • Library enumeration using virtual reactions and Markush structures

  14. Instant JChem: Desktop application for local and remote chemical database management, search and structure based prediction • Simply connect to external databases and share your native database simultaneously • Powerful search functionalities • Scalable: explore large datasets (10^6+) • Dynamically predict properties using Calculator Plugins • Apply canonicalization rules for import and viewing • Wide import / export options • Merge data sets into a single set • Very active development: what do you want to do?

  15. JChem for Excel (under development) • Microsoft Excel integrated solution for Marvin and JChem functionality • Use Excel's powerful features: Functions, Sorting, Filtering, Charts… • Implemented in C# .NET and Visual Studio • Proof that ChemAxon APIs can be used in a Java-less .NET environment • Easy to install and deploy

  16. Standardizer Canonicalization with Standardizer • Structure canonicalization • Mesomers • Tautomers • Solvent and counter ion removal • Aromatization, dearomatization • Explicit/implicit hydrogen conversion • Stoichiometry expansion • Stereo manipulations • 2D cleaning • Template based cleaning • Custom rules • Availability • JChemBase, Cartridge & IJC • API (Java and .NET) • Batch processing • GUI

  17. Drug discovery tools

  18. Migration - Topics • ChemAxon - Product Overview • From ISIS/Host and MDL Direct to JChem Cartridge • Alternatives to Cheshire (Standardizer) • From ISIS/Base to Instant JChem • From ISIS for Excel to JChem for Excel • Migrating Custom Applications • ChemAxon Web Services • Appendix: ChemAxon for Developers (Resources)

  19. Contents • A short introduction of JChem Cartridge • MDL/Symyx features in JChem • Migration from MDL/Direct and ISIS/Host • Migration case studies and user feedback

  20. Purpose of JChem Cartridge
      • Access JChem functionality using SQL:
          SELECT count(*) FROM nci WHERE jc_contains(structure, 'Brc1cnc2ccccc12') = 1
      • Access JChem in any programming environment offering Oracle connectivity (.NET, Java, Perl, PHP, Python, Apache mod_plsql...)
      • Execute SQL queries efficiently using extensible indexes
      • Precompute chemical information on structures by creating jc_idxtype indexes:
          CREATE INDEX jcxnci ON nci(structure) INDEXTYPE IS jc_idxtype
      • The jc_idxtype implementation scans the indexed column for eligible structures in one single performance-optimized operation: domain index scan
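
As an illustration of calling the cartridge from a client language, the following minimal Python sketch runs the slide's jc_contains query through a generic DB-API cursor. The table nci and column structure are the slide's example names; the helper name count_substructure_hits and the use of a :query bind variable for the query structure are assumptions made for illustration, not part of the presentation.

```python
from typing import Any

# SQL from the slide, with the query structure moved to a named bind variable.
# Passing the structure as a bind variable is an assumption of this sketch.
SUBSTRUCTURE_COUNT_SQL = (
    "SELECT count(*) FROM nci "
    "WHERE jc_contains(structure, :query) = 1"
)


def count_substructure_hits(cursor: Any, query_smiles: str) -> int:
    """Count rows whose structure contains the query as a substructure.

    `cursor` is any DB-API 2.0 style cursor that accepts named bind
    variables passed as a dict (hypothetical helper, not a ChemAxon API).
    """
    cursor.execute(SUBSTRUCTURE_COUNT_SQL, {"query": query_smiles})
    row = cursor.fetchone()
    return int(row[0])
```

With an Oracle driver the cursor would come from an open connection; for a dry run, any object with execute and fetchone methods is enough.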

  21. Features of JChem Cartridge • Adds chemistry knowledge into the SQL language of Oracle (SELECT, INSERT, UPDATE, ...) • Substructure, superstructure, exact structure, similarity searching • Fast: typically 10k hits in 3M structures within a second • Complex chemical expressions using the Chemical Terms language that includes logP, pKa, ... • Automatic property calculation during registration • Standardization (canonicalization) during registration • Structure format conversions (MRV, Molfile, SDfile, RDfile, SMILES, CML, etc.) • 2D, 3D image generation • Structure enumeration using reaction rules • Interaction with Oracle optimizer

  22. Operators and functions
      • jc_compare: substructure/similarity/exact searching combined with Chemical Terms expressions
      • jc_matchcount: number of occurrences of the query structure in the target
      • jc_evaluate: Chemical Terms evaluation
      • jc_molweight: molecular weight
      • jc_formula: molecular formula
      • jc_react: structure enumeration based on virtual reactions
      • jc_standardize: structure canonicalization
      • jc_molconvert: conversion to different formats (image generation is supported)
      • jc_tanimoto: similarity search
      • jcf.hitColorAndAlign: substructure coloring and alignment
      Similarity search example displaying ID, SMILES code, and molweight:
          SELECT cd_id, cd_smiles, cd_molweight FROM my_structures WHERE jc_tanimoto(cd_smiles, 'CC(=O)Oc1ccccc1C(O)=O') >= 0.8;
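
The 0.8 threshold in the similarity example is a Tanimoto coefficient: the number of fingerprint bits set in both structures divided by the number of bits set in either. The sketch below shows only this arithmetic on toy bit sets; the function name and the bit positions are made up for illustration, and how JChem derives its fingerprints is not covered here.

```python
from typing import Iterable


def tanimoto(a: Iterable[int], b: Iterable[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as on-bit positions."""
    bits_a, bits_b = set(a), set(b)
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


# Toy fingerprints (made-up bit positions, not real JChem output).
query_bits = {1, 4, 7, 9, 12}
target_bits = {1, 4, 7, 9, 15}

score = tanimoto(query_bits, target_bits)  # 4 shared bits / 6 distinct bits
print(round(score, 3))   # 0.667
print(score >= 0.8)      # False: this pair would not pass the 0.8 threshold
```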

  23. Structure search features • Wide range of query atoms • Query properties • R-group queries • Full SMARTS support • Coordination compounds • Link nodes • Position variation • Pseudo atoms • Lone pairs • Relative stereo • Reaction search features • Hit coloring See detailed information on structure search:

  24. Search options • Chemical Terms filter constraint • Tautomer search • sp hybridization state check • Stereo on/off • Ignore charge/isotope/radical/valence/mixture brackets • Vague bond matching modes: "or aromatic"; ignore bond types • Inverse hit list • Maximum search time / number of hits • SQL SELECT statement for pre-filtering • Ordering of results • etc.

  25. Compatibility and integration • File formats: SMILES, MDL molfile (v2000 and v3000), MDL SDF, RXN, RDF, MRV, IUPAC name, InChI • Operating systems: Windows, Linux, Solaris, HP-UX, etc. • DB engines: Oracle versions 9i R2 or above. For alternative RDBMS systems, see the JChem Base presentation:

  26. Index parameters
      • Index parameters affect:
        • Fingerprint attributes
        • Standardizer configuration
        • Table space and storage options of the index table
      • Examples:
        • Standardization by stripping hydrogens and using basic aromatization:
            CREATE INDEX jcxnci ON nci(structure) INDEXTYPE IS jc_idxtype PARAMETERS('STD_CONFIG=dehydrogenize:optional..aromatize:b')
        • Add structural keys to fingerprint for more efficient substructure searching (structural keys are defined in table stfp_keys):
            CREATE INDEX jcxnci ON nci(structure) INDEXTYPE IS jc_idxtype PARAMETERS('STRUCTURALFP_CONFIG=select structure from stfp_keys')
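
Both index statements on this slide share one shape, so a migration script could generate them for each structure column. The following is a minimal sketch, assuming a hypothetical helper create_jchem_index_sql; only the index type jc_idxtype and the parameter strings come from the slide, and the identifier check is a simplification added here.

```python
import re
from typing import Optional

# Simplified check for unquoted Oracle-style identifiers (an assumption of
# this sketch, used only to avoid pasting arbitrary text into the DDL).
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def create_jchem_index_sql(index: str, table: str, column: str,
                           parameters: Optional[str] = None) -> str:
    """Build a CREATE INDEX statement in the form shown on the slide."""
    for name in (index, table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    sql = f"CREATE INDEX {index} ON {table}({column}) INDEXTYPE IS jc_idxtype"
    if parameters:
        # Double any single quotes so the string literal stays well formed.
        escaped = parameters.replace("'", "''")
        sql += f" PARAMETERS('{escaped}')"
    return sql


print(create_jchem_index_sql(
    "jcxnci", "nci", "structure",
    "STD_CONFIG=dehydrogenize:optional..aromatize:b",
))
```

Generating the DDL as text keeps the sketch independent of any database driver; the resulting string would still need to be executed and verified against a real Oracle instance.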

  27. Supported Column Types • VARCHAR2: typically for short formats, e.g. SMILES • CLOB, BLOB: for longer formats, e.g. MDL molfile, Marvin (mrv)

  28. MDL Feature Compatibility • Generic atom types and bond types • Atom query properties • Atom Lists / not lists • Aliases • Pseudo atoms • Atom values • Group and brackets • Abbreviated groups • Multiple groups • Repeating units • Polymers • Mixtures • Attached Data • Link Nodes • R-groups, R-logic • Stereochemistry • Chiral flag • Parity • Double bond stereo • Enhanced stereo (abs/and/or) • inv/ret • Reacting center on bonds • Reaction mapping • Topology (ring/chain) • Option for ISIS-like look • Others… The learning curve for chemists familiar with ISIS is very short. After some practice, Marvin is reported to be a more productive drawing environment. Most MDL features are available in Marvin and JChem, along with many others not available in MDL technology.

  29. MDL Feature Compatibility: what is missing • Polymer search (coming in 5.2) • Attached data S-group search (coming in 5.2) • 3D special features • Exact change flag (reaction)

  30. Migration from MDL/Direct cartridge • 2 alternatives: • JChem indexes need to be created on structure columns of existing tables, or • Structural data migrated to new tables with JChem Cartridge indexes • The MDL/Direct SQL operators need to be changed to JChem operators in all uses. • Non-chemistry tables: no need for migration

  31. Migration from ISIS/Host • The molecule source needs to be accessible to JChem: • Through exporting SD/RDfiles from ISIS and importing into new tables with JChem index, or • Setting the option in ISIS/Host to include molfile in RCG tables, and: • Use SQL to insert the mol field into JChem tables, or • Add a JChem index on the original tables • ISIS/Host interfaces need to be rewritten to use SQL only, referencing JChem operators. • Hviews and GUIs need to be replaced separately. (See later slides.) • Non-chemistry tables: no need for migration

  32. An Independent Comparison: FMC migrated from MDL® ISIS/Base, ISIS/Host to ChemAxon's JChem and later published a detailed scientific comparison. • Used 1.8 million vendor compounds to create a testing database • Prepared 115 different query structures for comparison • 51 simple substructure searches • 51 similarity searches • 64 complex searches • Search hits were identical in almost all cases; the major differences result from MDL's incorrect aromatic bond definitions for 5-membered aromatic rings. ChemAxon's approach is the chemically correct one, and its performance is higher (faster).

  33. Identical Results

  34. Differences

  35. Vague Bonds For the sake of perfect compatibility with MDL searching ChemAxon provides vague bond options to retrieve results according to MDL systems.

  36. Technical Comparison • Supported platforms: ISIS®: Sun Solaris, Windows Servers; JChem: Sun Solaris, Windows Servers, Linux, Irix, Mac • Supported databases: ISIS: Oracle; JChem: Oracle, MySQL, SQL Server, PostgreSQL, Access, DB2 • Processing SD files: ISIS: 31 hours (Pipeline Pilot & ISIS); JChem: 11 hours (JChem) • Technology transparency: ISIS®: unclear data/table structures; JChem: clear understanding of the flow of data, the structure of data, and the execution process; native Oracle tables and procedures • Performance: ISIS®: slow similarity search; JChem: fast similarity search

  37. Comparison Conclusions • Technical conclusion: • Clear and straightforward understanding of data representation and system architecture • Integrated system • Quicker and less error-prone • Less hassle for software development • From a technical point of view, ChemAxon is favorable • Business conclusion: ChemAxon was the better choice

  38. Migration Experience Questionnaire Five companies were interviewed about their JChem Cartridge migration experiences in the form of a questionnaire containing 14 questions. • A UK based service/biotech company • A Swedish biotech company • A US branch of a Swiss pharmaceutical company • A Japanese pharmaceutical company • A US branch of a Japanese pharmaceutical company

  39. Migration Experience 1. What was the platform you used before the migration? • All systems were run using the Daycart cartridge on Linux servers • MDL Cartridge running on Sun Solaris • Daycart • We used ISIS/Host as a server, the client was ISIS/Base customized using ISIS/PL • Daylight and IDBS Chembridge 2. How long did it take to migrate? • Very simple, hardly any time at all, just a few hours to uninstall old cartridge, install new cartridge and build indexes. Then modify a few SQL statements in the code to use the new cartridge functions. • It took a full weekend to switch over and convert all old databases. • Since we use SQL for structure searches, the actually change in the application code are few. Code changes takes about 1 day. However, we spent at least two weeks to compare the daylight and jcart. • It took 1 year for planning, and another 1 year for designing and developing the system. 1-year-migration time includes all of the operation that is needed. That means our technical people worked for this project 1 year. We migrated the data structure of HView, but the form was re-designed in order to fit our existing (wet) workflow. • Two months

  40. Migration Experience 3. How many technical people were required in the migration process? • It was fairly simple so just one developer with all round programming, database, and chemistry knowledge. • One person • 2 people • 6 technical people. 2 were contacting with users. For the system design, 11 users were involved from chemistry, HTS, eADME groups. • 1.5 4. Why did you decide on leaving the previous platform? (problems) • Purely the cost. We found the Daycart system to be very good, very stable, fast, and the API was well thought out. However, it was just too expensive for us. • Old technology not offering new functionality. High cost, in particular for new licenses. • Daycart (at least at that time) did not take MOL query, not all query structures could be correctly presented as smiles/smarts. • Two main reasons were the maintainance cost, and the accessibility. We had to suppress the raising system (software) cost, and at the same time we had to enlarge the number of users and client PCs from which we could use DB system. • Cost, maintenance and risk

  41. Migration Experience 5. What alternative platforms were considered/evaluated? • Prior to selecting ChemAxon we looked at all the cartridges available at the time • The Accord cartridge was also evaluated. Some others did not qualify for evaluation. • None • Accord (Accelrys), and ChemOffice (Cambridge Soft) were two major alternatives. • Symyx/MDL Direct Oracle cartridge 6. Why did you choose ChemAxon technology? (advantages) • Cost was a major factor, but also because we felt we could work with ChemAxon to develop the tools further as we wanted to use them. A very open approach. Another reason was that all the tools we needed were available from a single vendor, i.e. Oracle cartridge for searching, and sketching and viewing tools. • Almost as good as Accord but with better impact on improvement and support. • Marvin Sketch and JCart represent the molecules in MOL using exactly the same backend library. MOL is used instead of smiles/smarts. Much faster search. Price is good . • We could keep the cost lowest by using ChemAxon, and more than that, the affinity for the web technology was favorable to our future vision of the cheminformatics system. • The greatest advantage is the low cost and great support. We have always had MDL/Direct cartridge, but the greatest advantage is the low cost and stellar support speaks specifically to ChemAxon.

  42. Migration Experience 7. What were the most problematic issues that occurred during the migration? (negative impressions) • Understanding the finer points of all the search functions / options i.e. precisely how things like aromaticity, stereochemistry, etc. are handled. We've also had to spend time considering how to restandardise structures and how to rewrite SQL. When doing a straight forward structure search (i.e. benchmarking), the JChem cartridge performs very well against other systems such as Daylight, however, if you want to incorporate joins between tables can considerably affect the query times even when using what we call ChemAxon SQL. • Structure matching bugs in the cartridge and undocumented actions needed to be performed. • JCart installation was not so smooth 3 years ago. Much better now. Most of the problem and issues are because some structures are interpreted different between the two software. Some are Daylight bugs and others are jchem bugs. JChem has fix all their share. • There were little problem, what I remember is that the response was slower than expected when the chemical object was included in the page. • Identifying all the integration points.

  43. Migration Experience 8. How did you overcome these difficulties? (resolutions) • We spent a lot of time experimenting with the different functions/options so we completely understand what they do. • The structure search bugs was overcome by rewriting the registration procedures, undocumented actions were overcome by hard work. • Wait until major bugs in JChem are fixed. We live with about 0.01% of inconsistencies and work it out later. • The needless chemical objects were replaced by pictures. • Availability and quick turn around to patch any 9. Did you expect any other problems that did not occur? (positive impressions) • We thought there may be problems running two different cartridges on the same table but this worked fine • Not really. Most MDL features were available in JChem. This was one of the selection criteria, particularly important for chemical registration. • No • We expected that the transfer of the existing data might be problematic, and that the system change might be inconsistent with existing 'wet' workflow. That was why we organized 11 users as a system designing team, and I think the team worked well. • Migration went very smooth

  44. Migration Experience 10. What additional components were purchased together with the JChem Cartridge? • Most of them! • Descriptor calculations. • None. User probably should consider plug-ins for calculating HBD, HBA, logp, psa, etc. We did not because we need to stick to CLOGP in order to be consistent with the rest of the company. • Standardizer. • Standardizer. 11. How much technical support did you need from ChemAxon for the migration? • Initially quite a lot, though the products have been developed a lot since then. We haven't required much support for structure migration, but we've also migrated a load of SMIRKS and we've needed support for that mainly because of the way in which they were handled in the old system (non-standard). • A few needed support cases where filed on the support forum and fairly quickly resolved. • Lots, we had close communication with dev team during the migration. • Our technical people sent e-mail several times to your support team. • Little.

  45. Migration Experience 12. Were/Are you satisfied with the ChemAxon support? • Yes. Support has always been good. • Yes very satisfied. The support has always been very fast and accurate. • Yes. • Yes. • Yes. 13. Did the migration reach its original goals? • So far, yes! The systems are up and running. • Yes. • Yes. • Yes. • Yes.

  46. Migration Experience 14. Are you satisfied with the performance/functions of the ChemAxon powered system? • The number of functions available and flexibility of the JChem tools is excellent, and allows us to develop very interesting and useful drug discovery software for our scientists. • Yes. • Yes. • Yes. • Yes.

  47. Useful migration resources • ChemAxon's Marvin & JChem (v 3.1.3) vs. MDL® ISIS/Draw ISIS/Host (v 4.0), Seong Jae Yu, David Roush*, Usha Ganesh, Young Moon, Henry Liu, FMC Corp. • User Group Meeting presentations:

  48. Migration - Topics • ChemAxon - Product Overview • From ISIS/Host and MDL Direct to JChem Cartridge • Alternatives to Cheshire (Standardizer) • From ISIS/Base to Instant JChem • From ISIS for Excel to JChem for Excel • Migrating Custom Applications • ChemAxon Web Services • Appendix: ChemAxon for Developers (Resources)

  49. Cheshire Alternatives from ChemAxon What is Cheshire? “Cheshire is a scripting language that enables you to write scripts to validate, modify, or gather information about chemical structures, such as molecules and reactions.” What alternatives can ChemAxon offer? • ChemAxon’s Java API (also available from .NET) • Chemical Terms • Standardizer

  50. Java API for Cheminformatics from ChemAxon ChemAxon’s class library consists of more than 1500 chemistry related classes tuned for usability and high performance.