1 / 33

Representation of Markush structures — from molecules towards patents

Representation of Markush structures — from molecules towards patents. Szabolcs Csepregi. Solutions for Cheminformatics. Contents. ChemAxon What are Markush structure s? How to get them? What can be done with them? Enumeration Storage, search Challenges in chemical representation

poss
Download Presentation

Representation of Markush structures — from molecules towards patents

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Representation of Markush structures — from molecules towards patents Szabolcs Csepregi Solutions for Cheminformatics

  2. Contents • ChemAxon • What are Markush structures? • How to get them? • What can be done with them? • Enumeration • Storage, search • Challenges in chemical representation • Under development

  3. ChemAxon Cheminformatics toolkits and applications HQ: Budapest, Hungary Founded: 1998 Main customers: pharma, biotech, publishing 3rd party applications and web sites. (e.g. Integrity, Reaxis, PDB ligand search, ELN-s, registration systems, etc)

  4. ChemAxon Main products: • Structure drawing & visualization (Marvin family) • Chemical DB tools (JChem family) • Property predictions (Calculator plugins) • Drug discovery tools (Reactor, JKlustor, etc.) Development strategy: customer-driven

  5. What are Markush structures and how to get them?

  6. Markushstructures Generic notation for describing many molecules (= Markush library) in a compact form. Main usage: • Combinatorial chemistry • Chemistry-related patents

  7. Markush structures • Current features handled: • R-groups • Atom lists, bond lists • Position variation bond • Link nodes • Repeating units • Homology groups (aryl, alkyl, etc.)

  8. ChemAxon Markush project Goals: • Extend structural search capabilities to combinatorial Markush structures • Markush enumeration Complications: • Practical examples may be very complex, methods using explicit enumeration may be impossible • Extension of current molecular formats (generic features) Timeline • Pilot study started in 2005 Q4, • First prototype shown at UGM, 2006 June • Released in JChem 5.0, 2008 • Markush DARC format support 5.3.0 2010

  9. How to get Markush structures? Drawing – Marvin Sketch

  10. How to get Markush structures? Patent literature – Markush DARC format (*.vmn) Compatible with Thomson Reuters MMS patent Markush database(Test set available.)

  11. How to get Markush structures? Combinatorial chemistry – Reagent clipping • Replace reacting group with attachment point (Reactor tool) • Turn fragments to R-group definitions (Molconvert tool) • Add a scaffold (Molconvert tool)

  12. How to get Markush structures? Combinatorial chemistry – R-group decomposition • Filter and identify ligands in chemical library • Create Markush structure from R-table (R-group decomposition tool)

  13. What to do with them?

  14. Markush Enumeration • Markush enumeration plugin • Full enumeration • Selected parts only • Random enumeration • Calculate library size • Scaffold alignmentand coloring • Markush code • Optional example homology groupenumeration

  15. Markush storage & search • JChem Base and Instant JChem • No enumeration involved • Can handle complex Markush structures (1040or more) • Substructure and Full structure search • Broad translation of homology groups is supported. (Homology in DB, specific in query.)

  16. Markush storage & search Query Result in original Markush Substructure hit visualization

  17. Markush storage & search Query Result in original Markush Reduced result Substructure hit visualization: „Markush structure reduction”

  18. Main use cases Patent search hits refining / visualization, White space analysis, Patent busting, Markush structure curation, In-house storage of small Markush DB, etc...

  19. MMS evaluation Instant JChem project

  20. Challenges in chemical representation(solved)

  21. Representation - What we already had Single or double Generic notation in queries: • Atom lists, bond lists • R-group queries (Problem: RGFile R-logic and patent R-logic are different! - Solution: Just ignore R-logic.) • Link nodes • Some generic atoms (X) – represented as pseudo atoms.

  22. Challenge 1: Attachment point Attachment points for definitions Order of ligands for G15 (R15) R-group definitions Parent group(root) • Multiple – ligand order and attachment orderHeavily used in Markush DARC (up to 8 attachments!) • Represented as atom property

  23. Challenge 1: Attachment point G3’s attachment point „1” is mapped to G4’s attachment point „1” • Embedded R-groups: Grandparent relations may be needed between attachment points:

  24. Challenge 1: Attachment point Attachment points for definitions Order of ligands for R2 • Temporary representation: attached data • ligand order • attachment point in R-group definition • still an atom property • ligand order sometimes in parent group(grandparent relation)

  25. Challenge 1: Attachment point Attachment points for definitions Attachment point for R3 Order of ligands for R2 Order of ligands for R4 • Real attachment object with bond (under development) • eliminates need for grandparent relations table:

  26. Challenge 2: Abbreviations • Superatom S-groups were originally in Marvin (~700 built-in shortcuts) • Expand / Contract • Search code already handled them in specific structures. • M. DARC had 21 shortcuts + 31 peptides. • Attachment point next to abbreviations • Needed to be visible „outside” and handled correctly „inside”. • New attachment point solves this also:

  27. Challenge 3: Homology groups (generics) • Pseudoatom representation • Naming (Still looking for the most descriptive „long” names.) • Extra conditions: general atom property framework (under development)

  28. Challenge 4: Frequency variation • Link nodes • Repeating units: modified SRU • Multipliers: • special SRU, 1 outer bonds. • (Currently visualization only.) • Moieties: • special SRU, 0 outer bonds • to describe (variable) stoichiometry • (Currently visualization only.)

  29. Challenge 5: Position variation bond New special S-group type Relocatable multicenter atom represents group for bonds Also useful to represent multicenter charge and coordination compounds:

  30. What (else) keep us busy

  31. Under development • Further improvements in Markush DARC support: • Ring segment groups (XX form a ring) • New, more robust representation for attachment points • Homology properties (low alkyl, fused aryl, C1-3, N2-5, etc) • Ranking of results • New ways to navigate/zoom Markush structures • Maximum common substructure search • Biased enumeration and covering Markush – based on examples in patent. • Improve search speed to handle larger Markush sets. • Other Markush formats – Markush InChI standard committee • Overlap analysis of Markush structures • Conditions for Markush variables

  32. Summary Markush structure storage, search and enumeration at ChemAxon now patentcoverage Compatible patent data is available from Thomson Reuters Well thought out chemical representation Continuous development, improvements in the pipeline

  33. Acknowledgements Development team: Nóra Máté, Róbert Wágner, Szilárd Dóránt, Tamás Csizmazia, Tim Dudgeon, Erika Bíró, Ali Baharev, Ferenc Csizmadia, et al. Tim Miller, Steve Hajkowski, Gez Cross and Linda Clark at Thomson Reuters for useful discussions, help and example Markush DARC files Many early adopters and colleagues within the field for suggestions and feedback

More Related