JKlustor
clustering chemical libraries
presented by …
maintained by Miklós Vargyas

Last update: 25 March 2010

JKlustor

Chemical clustering by similarity and structure

JKlustor

Description of the product

JKlustor performs similarity and structure based clustering of compound libraries and focused sets in both hierarchical and non-hierarchical fashion.

Availability

  • part of JChem
  • IJC (parts)
  • server version (accessible via API)
  • batch application programs
  • HTML user interface
  • one desktop application with GUI
  • GUI is available as an applet

Summary of key features
  • Wide range of methods
    • Unsupervised, agglomerative clustering
    • Hierarchical and non-hierarchical methods
    • Similarity based and structure based techniques
  • Flexible search options
    • Tanimoto and Euclidean metrics, weighting
    • Maximum common substructure identification
    • Chemical property matching, including atom type, bond type, hybridization, and charge
  • Interactive display
    • interactive hierarchy browser (dendrogram viewer)
    • SAR-table
    • R-table
  • Efficient
    • performance of the tools scales between linearly and quadratically with the number of input structures
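
The Tanimoto and Euclidean metrics listed above can be illustrated with a minimal Python sketch. It is not JKlustor code: the function names, the representation of a fingerprint as a set of on-bit indices, and the optional per-dimension weights are all assumptions made for this example.

```python
import math

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints match
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def weighted_euclidean(x, y, weights=None):
    """Euclidean distance between two numeric descriptor vectors.

    `weights` is a hypothetical per-dimension weighting; all 1.0 by default.
    """
    if weights is None:
        weights = [1.0] * len(x)
    return math.sqrt(sum(w * (p - q) ** 2 for w, p, q in zip(weights, x, y)))

fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}
print(tanimoto(fp1, fp2))                    # 2 shared bits out of 5 distinct bits
print(weighted_euclidean([0, 0], [3, 4]))    # unweighted distance
```

Tanimoto is a similarity (1.0 means identical bit sets), while the Euclidean value is a distance (0.0 means identical vectors), so thresholds for the two run in opposite directions.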
Benefits
  • Versatile
    • Choose the most appropriate method for the clustering problem
    • Combine methods to achieve best results
    • Use your trusted molecular descriptors in similarity calculation
    • Easy integration in corporate discovery pipelines
    • Cluster chemical files directly; no need to import structures into a database
  • Intuitive
    • Cluster formation is self-explanatory

Similarity based clustering

  • Hierarchical
    • Ward
  • Non-hierarchical
    • Sphere exclusion
    • k-means
    • Jarvis-Patrick
Ward Clustering Features
  • Ward's minimum variance method results in tight, well separated clusters
  • uses Murtagh's reciprocal nearest neighbor (RNN) algorithm to speed up clustering
  • quadratic scaling of running time (with respect to number of input structures)
  • memory consumption scales linearly
  • best used with smaller sets (like focused libraries); copes with fewer than 100K structures
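
Ward's minimum variance criterion can be sketched in a few lines of Python. This is a deliberately naive illustration, not the RNN-accelerated implementation mentioned above: it rescans all cluster pairs at every step, and the function name and point representation are assumptions of this example.

```python
def ward(points, n_clusters):
    """Naive Ward agglomeration: repeatedly merge the pair of clusters whose
    merge increases the total within-cluster variance the least."""
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        (ma, ca), (mb, cb) = a, b
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        # Ward's criterion: size-weighted squared distance between centroids.
        return len(ma) * len(mb) / (len(ma) + len(mb)) * d2

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        (ma, ca), (mb, cb) = clusters[i], clusters[j]
        merged = ma + mb
        centroid = [(len(ma) * x + len(mb) * y) / len(merged) for x, y in zip(ca, cb)]
        clusters[i] = (merged, centroid)
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(members) for members, _ in clusters]

points = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 50)]
print(ward(points, 3))
```

Stopping at a fixed number of clusters is one way to cut the hierarchy; recording each merge instead would yield the full dendrogram.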
Sphere Exclusion Clustering Features
  • based on fingerprints and/or other numerical data
  • running time linear with respect to number of input structures
  • memory scales sub-linearly
  • can easily cope with millions of structures
  • suitable for diverse subset selection
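
A minimal sketch of the sphere exclusion idea, assuming fingerprints are sets of on-bit indices and Tanimoto similarity is the metric. The function names and the rule for picking the next centre (simply the first unassigned structure) are choices made for this example, not a description of JKlustor's implementation.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def sphere_exclusion(fingerprints, threshold):
    """Pick a centre, claim everything within `threshold` similarity of it,
    exclude those structures, and repeat on what is left."""
    unassigned = list(range(len(fingerprints)))
    clusters = []
    while unassigned:
        centre = unassigned[0]
        members = {i for i in unassigned
                   if tanimoto(fingerprints[centre], fingerprints[i]) >= threshold}
        clusters.append((centre, sorted(members)))
        unassigned = [i for i in unassigned if i not in members]
    return clusters

fps = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8, 9}, {7, 8}, {20}]
for centre, members in sphere_exclusion(fps, 0.6):
    print(centre, members)
```

The list of centres is itself a diverse subset: no two centres are more similar than the threshold, because any structure that close to an earlier centre was already excluded.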
k-means Clustering Features
  • based on fingerprints and/or other numerical data
  • minimises variance within each cluster
  • number of clusters can directly be controlled
  • finds the centre of natural clusters in the input data
  • running time scales exponentially with respect to number of input structures
  • can cope with <100Ks of structures
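
As an illustration of how k-means controls the number of clusters and locates cluster centres, here is a minimal sketch of the standard Lloyd iteration in Python. The function names and the initialisation (the first k points serve as starting centres) are assumptions of this example, chosen to keep it deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Alternate between assigning points to the nearest centre and
    moving each centre to the mean of its points, until nothing changes."""
    centres = [list(p) for p in points[:k]]  # naive initialisation (assumption)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centre if a cluster went empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centres

points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0)]
assignment, centres = kmeans(points, 2)
print(assignment)
print(centres)
```

The result depends on the starting centres, so practical use typically tries several initialisations and keeps the one with the lowest within-cluster variance.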
Jarp Clustering Features
  • variable-length Jarvis-Patrick clustering
  • based on fingerprints and/or other numerical data
  • takes structures/fingerprints and data values either from files or from database tables
  • running time scales better than quadratic but worse than linear (with respect to number of input structures)
  • memory scales linearly
  • Jarp can cope with 100Ks of structures
  • depending on data and parameters may create large number of singletons
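
The Jarvis-Patrick idea can be sketched as follows. This is the classic fixed-length form, where each structure has exactly k nearest neighbours; it is an assumption of this example that a variable-length variant would instead build each neighbour list from a similarity threshold. The function names and parameters (`k`, `k_min`) are hypothetical and do not reflect Jarp's actual options.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def jarvis_patrick(fps, k, k_min):
    """Two structures join the same cluster if each is in the other's
    k-nearest-neighbour list and the lists share at least k_min entries."""
    n = len(fps)
    neighbours = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: -tanimoto(fps[i], fps[j]))
        neighbours.append(set(others[:k]))

    parent = list(range(n))  # union-find forest over structure indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if (j in neighbours[i] and i in neighbours[j]
                    and len(neighbours[i] & neighbours[j]) >= k_min):
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6},
       {10, 11, 12}, {10, 11, 13}, {10, 11, 14}, {99}]
print(jarvis_patrick(fps, k=2, k_min=1))
```

The last fingerprint shares no bits with the others, so no structure lists it as a neighbour and it ends up alone, which is the singleton behaviour noted in the final bullet above.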
Ward Clustering Example
  • 8 different sets of known active compounds mixed together
    • 5-HT3-antagonists
    • ACE inhibitors
    • angiotensin 2 antagonists
    • D2 antagonists
    • delta antagonists
    • FTP antagonists
    • mGluR1 antagonists
    • thrombin inhibitors
  • ChemAxon’s 2D Pharmacophore fingerprint was generated
  • Fingerprints of the mixture were clustered by Ward
    • 9 clusters were formed
      • 8 centroids (cluster representative elements) corresponded to the 8 activity classes
      • 1 was a singleton
    • All 8 real clusters contained structures only from the activity class of the centroid (over 95% true positive classification)
Ward Clustering Example

Cluster of the D2 antagonists


Structure based clustering

  • Non-hierarchical
    • Bemis-Murcko frameworks
  • Hierarchical
    • LibraryMCS
Bemis-Murcko frameworks features
  • based on structure of molecules
  • cluster formation is apparent, visual, meets human expectations
  • running time linear with respect to number of input structures
  • memory scales sub-linearly
  • can easily cope with millions of structures
  • suitable for quick overview of very large sets
  • spots scaffold hops
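
The framework of a molecule (its ring systems plus the linkers between them) can be obtained by repeatedly stripping terminal atoms. The sketch below shows this on a plain graph. It assumes a molecule is given as a list of bonds between integer atom indices, ignores atom and bond types, and uses hypothetical function names; it does not show how JKlustor represents or canonicalises frameworks.

```python
def from_bonds(bonds):
    """Build an adjacency map {atom: set of neighbours} from (a, b) bond pairs."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def murcko_framework(adjacency):
    """Strip side chains: remove atoms with at most one neighbour until
    none remain. What is left is the rings plus the linkers joining them."""
    adj = {a: set(nbrs) for a, nbrs in adjacency.items()}  # work on a copy
    leaves = [a for a, nbrs in adj.items() if len(nbrs) <= 1]
    while leaves:
        a = leaves.pop()
        if a not in adj:
            continue  # already removed via an earlier queue entry
        for b in adj.pop(a):
            adj[b].discard(a)
            if len(adj[b]) <= 1:
                leaves.append(b)
    return adj

# Two six-membered rings joined through linker atom 6,
# with a two-atom side chain (13, 14) on ring atom 3.
ring_a = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
ring_b = [(7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 7)]
bonds = ring_a + ring_b + [(0, 6), (6, 7), (3, 13), (13, 14)]
print(sorted(murcko_framework(from_bonds(bonds))))
```

Each atom is removed at most once, which is consistent with the linear running time listed above. Grouping molecules into clusters would additionally require comparing frameworks for identity, for example through a canonical string, which this sketch leaves out.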

LibraryMCS

Identifies the largest subgraph shared by several molecular structures

LibraryMCS features
  • based on structure of molecules
  • cluster formation is apparent, visual, meets human expectations
  • running time near-linear with respect to number of input structures
  • can cope with 100K-200K structures
  • suitable for very thorough analysis
  • spots scaffold hops
  • substituent-activity (property) analysis
LibraryMCS integration at Abbott

“Clustering for the masses…”,

presented by Derek Debe at ChemAxon’s US UGM, Boston, 2008

JKlustor roadmap
  • In the development pipeline
    • Bemis-Murcko generalisations
    • IJC integration
    • KNIME integration
    • New GUI
    • Manual clustering
    • Multiple class membership
    • Disconnected MCS (MOS)
  • Planned
    • PipelinePilot integration
    • Spotfire integration
    • JChemBase, JChemCartridge integration
    • JC4XLS integration
  • Blue sky
    • Multitouch gestures
    • LibraryMCS for 1M compound libraries