Kernel Functions for Chemical Classification
Download
1 / 1

Chemicals are structured as graphs. Vertices and edges correspond to atoms and bonds. - PowerPoint PPT Presentation


  • 86 Views
  • Uploaded on

Kernel Functions for Chemical Classification Aaron Smalter, Jun Huan, Gerald Lushington {asmalter,jhuan,[email protected] Chemical and Graph Classification. Support Vector Machine. Chemicals are structured as graphs. Vertices and edges correspond to atoms and bonds. Labeled, undirected.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Chemicals are structured as graphs. Vertices and edges correspond to atoms and bonds.' - alaire


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Kernel Functions for Chemical ClassificationAaron Smalter, Jun Huan, Gerald Lushington{asmalter,jhuan,[email protected]

Chemical and Graph Classification

Support Vector Machine

  • Chemicals are structured as graphs.

    • Vertices and edges correspond to atoms and bonds.

    • Labeled, undirected.

  • Graph classification is critical for drug development and screening.

    • Sifting through large databases of compounds requires efficiency.

    • Costs of chemical manufacture and assay experiments necessitate accuracy.

  • Traditional chemical classifiers use vector representations of chemicals, neglecting the rich structure of graph models.

  • SVM is a fast, accurate classifier designed for vector data.

  • Crucially, SVM internally represents data points as inner products between pair of input vectors.

    • SVM can then linearly classify non-linear data distributions by applying the kernel trick, and replacing the inner product <x,y> with some similarity measurement function, K(x,y)‏

    • The key is that this kernel function K can be defined on non-vector data, allowing direct operation on structured data such as graphs.

Figure 1. Using graphs to model chemicals.

Figure 2. A kernel function maps nonlinear data (left) into a linearly separable space (right).

Graph Kernel Functions

Our Work

  • Problem of chemical graph classification changes:

    • fromfinding vector representations of graphs, to defining high-quality kernel functions to compare graphs.

  • Previous kernel functions -

    • Decompose graphs into substructures such as paths, cycles, and trees.

    • Optimally assign vertices based on neighborhood similarity.

    • Respective limitations are:

      • Dependency on particular decompositions; pattern enumeration time.

      • Inefficient recursive comparison and a flaw rendering them not true kernel functions.

  • We can improve graph kernels with several ideas:

    • Embed frequent patterns by using their occurrences as features in the graph. [1]

    • Use wavelet functions to compress neighborhood information.[2]

    • Avoid finding an optimal assignment by using setmatching and summing the kernels between all vertex pairs.

Figure 4. Frequent patterns annotate graph vertices.

Figure 3. Finding an optimal assignment using a bipartite graph.

Figure 5. A wavelet function overlays a chemical graph.

Fig 6. Comparing graph kernels, our GPM method performs best overall.

This work supported by K-INBRE (NIH/NCRR award #P20 RR016475), the KU CMLD (NIH/NIGM award #P50 GM069663), and NIH grant #R01 GM868665.

[1] A. Smalter, J. Huan, G. Lushington. Chemical Compound Classification with Automatically Mined Structure Patterns. Proc. of the 6th Asia Pacific Bioinformatics Conference (APBC). 2008.

[2] A. Smalter, J. Huan, G. Lushington. Graph Wavelet Alignment Kernels for Drug Virtual Screening. Proc. of the 7th Annual Int. Conf. On Computational Systems Bioinformatics. 2008.


ad