
Chemicals are structured as graphs. Vertices and edges correspond to atoms and bonds.

Kernel Functions for Chemical Classification. Aaron Smalter, Jun Huan, Gerald Lushington ({asmalter,jhuan,glushington}@ku.edu). Topics: chemical and graph classification; support vector machines. Chemicals are structured as labeled, undirected graphs whose vertices and edges correspond to atoms and bonds.


Presentation Transcript


  1. Kernel Functions for Chemical Classification
Aaron Smalter, Jun Huan, Gerald Lushington
{asmalter,jhuan,glushington}@ku.edu

Chemical and Graph Classification
• Chemicals are structured as graphs.
• Vertices and edges correspond to atoms and bonds.
• The graphs are labeled and undirected.
• Graph classification is critical for drug development and screening.
• Sifting through large databases of compounds requires efficiency.
• Costs of chemical manufacture and assay experiments necessitate accuracy.
• Traditional chemical classifiers use vector representations of chemicals, neglecting the rich structure of graph models.
Figure 1. Using graphs to model chemicals.

Support Vector Machine
• SVM is a fast, accurate classifier designed for vector data.
• Crucially, SVM accesses the data only through inner products between pairs of input vectors.
• SVM can therefore classify non-linearly distributed data with a linear classifier by applying the kernel trick: replacing the inner product <x,y> with a similarity function K(x,y).
• The key is that this kernel function K can be defined on non-vector data, allowing direct operation on structured data such as graphs.
Figure 2. A kernel function maps nonlinear data (left) into a linearly separable space (right).

Graph Kernel Functions
• The problem of chemical graph classification changes from finding vector representations of graphs to defining high-quality kernel functions to compare graphs.
• Previous kernel functions:
  • Decompose graphs into substructures such as paths, cycles, and trees.
  • Optimally assign vertices based on neighborhood similarity.
• Their respective limitations are:
  • Dependency on particular decompositions; pattern enumeration time.
  • Inefficient recursive comparison, and a flaw that renders them not true kernel functions.
Figure 3. Finding an optimal assignment using a bipartite graph.

Our Work
• We can improve graph kernels with several ideas:
  • Embed frequent patterns by using their occurrences as features in the graph. [1]
  • Use wavelet functions to compress neighborhood information. [2]
  • Avoid finding an optimal assignment by using set matching and summing the kernels between all vertex pairs.
Figure 4. Frequent patterns annotate graph vertices.
Figure 5. A wavelet function overlays a chemical graph.
Figure 6. Comparing graph kernels, our GPM method performs best overall.

This work was supported by K-INBRE (NIH/NCRR award #P20 RR016475), the KU CMLD (NIH/NIGM award #P50 GM069663), and NIH grant #R01 GM868665.

[1] A. Smalter, J. Huan, G. Lushington. Chemical Compound Classification with Automatically Mined Structure Patterns. Proc. of the 6th Asia Pacific Bioinformatics Conference (APBC). 2008.
[2] A. Smalter, J. Huan, G. Lushington. Graph Wavelet Alignment Kernels for Drug Virtual Screening. Proc. of the 7th Annual Int. Conf. on Computational Systems Bioinformatics. 2008.
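
The set-matching idea from "Our Work" (summing a vertex kernel over all vertex pairs instead of searching for an optimal assignment) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' method: the helper names (make_graph, vertex_kernel, graph_kernel, gram_matrix), the toy molecules, and in particular the base vertex kernel, which simply compares atom labels and neighbor labels rather than using the frequent-pattern or wavelet features described in the poster.

```python
from collections import Counter


def make_graph(labels, edges):
    """Build a labeled, undirected graph.

    labels maps vertex id -> atom symbol; edges is a list of (u, v) bonds.
    """
    neighbors = {v: [] for v in labels}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return {"labels": labels, "neighbors": neighbors}


def vertex_kernel(g, u, h, v):
    """Hypothetical base kernel between vertex u of g and vertex v of h.

    Returns 0 if the atom labels differ; otherwise 1 plus the size of the
    multiset intersection of the two vertices' neighbor labels.
    """
    if g["labels"][u] != h["labels"][v]:
        return 0
    nu = Counter(g["labels"][w] for w in g["neighbors"][u])
    nv = Counter(h["labels"][w] for w in h["neighbors"][v])
    shared = sum((nu & nv).values())  # Counter & keeps the minimum counts
    return 1 + shared


def graph_kernel(g, h):
    """Sum the vertex kernel over all vertex pairs (no optimal assignment)."""
    return sum(
        vertex_kernel(g, u, h, v) for u in g["labels"] for v in h["labels"]
    )


def gram_matrix(graphs):
    """Pairwise kernel values, the quantity a kernel classifier consumes."""
    return [[graph_kernel(a, b) for b in graphs] for a in graphs]


# Toy molecules with hydrogens omitted (illustrative data only).
ethanol = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
methanol = make_graph({0: "C", 1: "O"}, [(0, 1)])
ethane = make_graph({0: "C", 1: "C"}, [(0, 1)])

K = gram_matrix([ethanol, methanol, ethane])
for row in K:
    print(row)
```

Because the base kernel is a product of a label-match kernel and one plus a histogram-intersection kernel, and the graph kernel is a sum of it over all vertex pairs, the result stays symmetric and positive semidefinite, which is the property the optimal-assignment approach is said to lack. A matrix like K is what a kernel SVM uses in place of inner products; scikit-learn's SVC, for example, accepts one via kernel="precomputed".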
