Marginalized Kernels & Graph Kernels

Max Planck Institute for Biological Cybernetics

Koji Tsuda


Kernels and Learning

  • In kernel-based learning algorithms, problem solving is decoupled into:

    • A general-purpose learning algorithm (e.g. SVM, PCA, …), often a linear algorithm

    • A problem-specific kernel

[Figure: a complex learning task is split into a simple (linear) learning algorithm and a specific kernel function]


Current Synthesis

  • Modularity and re-usability

    • Same kernel, different learning algorithms

    • Different kernels, same learning algorithms

[Figure: Data 1 (Sequence) and Data 2 (Network) are turned into Gram matrices (not necessarily stored) by Kernel 1 and Kernel 2; the Gram matrices are the input to Learning Algo 1 and Learning Algo 2]


Lectures so far

  • A kernel represents the similarity between two objects, defined as the dot product in the feature space

  • Various String Kernels

  • Importance of Positive Definiteness
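A minimal sketch of the first and last points, assuming a plain linear kernel as the example and a hypothetical helper named `gram_matrix` (neither comes from the lecture). It shows that the kernel value is a dot product in feature space, and that positive definiteness appears in practice as a Gram matrix without negative eigenvalues.

```python
import numpy as np

def gram_matrix(points, kernel):
    """Evaluate the kernel on every pair of points (hypothetical helper)."""
    n = len(points)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(points[i], points[j])
    return K

def linear_kernel(x, y):
    """Dot product; here the feature space is the original space itself."""
    return float(np.dot(x, y))

# Three illustrative points in the plane (made-up data).
X = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
K = gram_matrix(X, linear_kernel)

# A positive (semi)definite kernel yields a symmetric Gram matrix whose
# eigenvalues are all non-negative, up to rounding error.
eigenvalues = np.linalg.eigvalsh(K)
is_psd = bool(np.all(eigenvalues >= -1e-10))
```

Any other kernel function with the same two-argument signature can be passed to `gram_matrix` unchanged, which is the modularity the earlier slides describe.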


Kernel Methods: the mapping

[Figure: points in the original space are mapped by $\phi$ to the feature (vector) space]


Overview of this lecture

  • Marginalized kernels

    • General idea about defining kernels using latent variables

    • An example with string kernels

  • Marginalized Graph Kernels

    • Kernel for labeled graphs (~ several hundred nodes)

    • Similarity for chemical compounds (drug discovery)

  • Diffusion Kernels

    • Closeness between nodes of a network

    • Used for function prediction of proteins based on biological networks (protein-protein interaction nets)


Marginalized kernels

K. Tsuda, T. Kin, and K. Asai.

Marginalized kernels for biological sequences

Bioinformatics, 18(Suppl. 1):S268-S275, 2002.


Biological Sequences: Classification Tasks

  • DNA sequences (A,C,G,T)

    • Gene Finding, Splice Sites

  • RNA sequences (A,C,G,U)

    • MicroRNA discovery, Classification into Rfam families

  • Amino Acid Sequences (20 symbols)

    • Remote Homolog Detection, Fold recognition


Structures hidden in sequences (I)

  • Exon/intron of DNA (Gene)


Structures hidden in sequences (II)

  • It is crucial to infer hidden structures and exploit them for classification

[Figure: RNA secondary structure; protein 3D structures]


Hidden Markov Models

  • Visible variable $x$: Symbol sequence

  • Hidden variable $h$: Context

  • An HMM has parameters

    • Transition probability

    • Emission probability

  • An HMM models the joint probability $p(x, h)$
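As a rough illustration of how the two parameter sets define the joint probability, the sketch below multiplies an initial probability, the emission probabilities, and the transition probabilities along one sequence. The function name and the array layout (`initial`, `transition[a, b]`, `emission[state, symbol]`) are assumptions made for this example.

```python
import numpy as np

def hmm_joint_probability(x, h, initial, transition, emission):
    """p(x, h) for symbol indices x and hidden-state indices h.

    Assumed layout: initial[a] = p(h_1 = a),
    transition[a, b] = p(h_i = b | h_{i-1} = a),
    emission[a, k] = p(x_i = k | h_i = a).
    """
    p = initial[h[0]] * emission[h[0], x[0]]
    for i in range(1, len(x)):
        p *= transition[h[i - 1], h[i]] * emission[h[i], x[i]]
    return p

# Made-up two-state, two-symbol model for illustration only.
initial = np.array([0.5, 0.5])
transition = np.array([[0.9, 0.1], [0.2, 0.8]])
emission = np.array([[0.5, 0.5], [0.1, 0.9]])
p = hmm_joint_probability([0, 1], [0, 1], initial, transition, emission)
```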


HMM for gene finding

Engineered HMM:

  • Some parameters are set to constants a priori

  • Reflects prior knowledge about the sequence


Training Hidden Markov Models

  • Training examples consist of string-context pairs

    • E.g., Fragments of DNA sequences with known splice sites

  • Parameters are estimated by maximizing the likelihood


Using trained hidden Markov models to estimate the context

  • A trained HMM can compute the posterior probability $p(h \mid x)$

  • Given the sequence x, what is the probability of the context h?

  • You can never predict the context perfectly!

x: A C C T G T A A A

h: 1 2 1 2 2 2 2 1 1   (probability 0.0003)

h: 2 2 1 1 1 1 2 1 1   (probability 0.0006)


Kernels for Sequences

  • Similarity between sequences of different lengths

  • How do you use the trained HMM for computing the kernel?

ACGGTTCAA

ATATCGCGGGAA


Count Kernel

  • Inner product between symbol counts

  • Extension: Spectrum kernels (Leslie et al., 2002)

    • Counts the number of k-mers (k-grams) efficiently

  • Not good for sequences with frequent context change

    • E.g., coding/non-coding regions in DNA
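The count kernel and its k-mer extension can be sketched as below. This is a naive dictionary-based version for illustration, not the efficient algorithm of Leslie et al.; the function names and the `normalize` option are assumptions of this example, and sequences are assumed to be at least `k` symbols long.

```python
from collections import Counter

def kmer_counts(sequence, k=1):
    """Count every length-k substring (k-mer) of the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def spectrum_kernel(s, t, k=1, normalize=False):
    """Inner product between the k-mer count vectors of s and t.

    k=1 gives the count kernel. With normalize=True the counts are divided
    by the number of k-mers, so sequences of different lengths are comparable.
    """
    counts_s, counts_t = kmer_counts(s, k), kmer_counts(t, k)
    # A Counter returns 0 for k-mers that do not occur in t.
    value = sum(counts_s[kmer] * counts_t[kmer] for kmer in counts_s)
    if normalize:
        value /= sum(counts_s.values()) * sum(counts_t.values())
    return value

# The two example sequences from the "Kernels for Sequences" slide.
k1 = spectrum_kernel("ACGGTTCAA", "ATATCGCGGGAA", k=1)
```

Because the counts ignore where a symbol occurs, two sequences with the same composition but different coding/non-coding layout get the same kernel value, which is the weakness noted in the last bullet.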


Hidden Markov Models for Estimating Context

  • Visible variable: Symbol sequence

  • Hidden variable: Context

  • An HMM can estimate the posterior probability of the hidden variables from data


Marginalized kernels

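  • Design a joint kernel $K_z(z, z')$ for the combined variable $z = (x, h)$

    • The hidden variable is usually not available

    • Take the expectation with respect to the hidden variable

  • The marginalized kernel for visible variables: $K(x, x') = \sum_{h} \sum_{h'} p(h \mid x)\, p(h' \mid x')\, K_z(z, z')$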


Designing a joint kernel for sequences

  • Symbols are counted separately in each context

  • $c_{kl}(x, h)$: count of a combined symbol $(k, l)$, i.e. visible symbol $k$ in hidden context $l$

  • Joint kernel: count kernel with context information


Marginalization of the joint kernel

  • Joint kernel

  • Marginalized count kernel
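To make the marginalization concrete, the sketch below enumerates every hidden sequence, weights the joint count kernel by the posterior probabilities of both hidden sequences, and sums. It is a brute-force illustration that only works for very short sequences; the function names, the array layout of the HMM, and the length-normalized counts are assumptions of this example.

```python
import itertools
import numpy as np

def joint_probability(x, h, initial, transition, emission):
    """p(x, h) with transition[a, b] = p(b | a), emission[state, symbol]."""
    p = initial[h[0]] * emission[h[0], x[0]]
    for i in range(1, len(x)):
        p *= transition[h[i - 1], h[i]] * emission[h[i], x[i]]
    return p

def posterior_over_paths(x, initial, transition, emission):
    """p(h | x) for every hidden sequence h, by exhaustive enumeration."""
    n_states = len(initial)
    joint = {h: joint_probability(x, h, initial, transition, emission)
             for h in itertools.product(range(n_states), repeat=len(x))}
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}

def combined_counts(x, h, n_symbols, n_states):
    """Length-normalized counts of combined symbols (symbol k, context l)."""
    counts = np.zeros((n_symbols, n_states))
    for symbol, state in zip(x, h):
        counts[symbol, state] += 1.0 / len(x)
    return counts

def marginalized_count_kernel_bruteforce(x1, x2, initial, transition, emission):
    """Expectation of the joint count kernel over both hidden sequences."""
    n_states, n_symbols = emission.shape
    post1 = posterior_over_paths(x1, initial, transition, emission)
    post2 = posterior_over_paths(x2, initial, transition, emission)
    value = 0.0
    for h1, p1 in post1.items():
        c1 = combined_counts(x1, h1, n_symbols, n_states)
        for h2, p2 in post2.items():
            c2 = combined_counts(x2, h2, n_symbols, n_states)
            value += p1 * p2 * float(np.sum(c1 * c2))  # joint count kernel
    return value
```

The cost grows exponentially with sequence length, which is why the per-position posteriors from dynamic programming are used in practice.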


Computing Marginalized Counts from HMM

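  • Marginalized count is described as $\gamma_{kl}(x) = \frac{1}{n} \sum_{i=1}^{n} p(h_i = l \mid x)\, I(x_i = k)$, where $n$ is the sequence length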

  • Posterior probability of i-th hidden variable is efficiently computed by dynamic programming
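The dynamic program referred to here is the forward-backward algorithm. The sketch below is an unscaled version (so it would underflow on long sequences) that computes the per-position posteriors, accumulates them into marginalized counts, and takes the inner product. Function names and the HMM array layout are assumptions of this example.

```python
import numpy as np

def state_posteriors(x, initial, transition, emission):
    """p(h_i = l | x) for every position i, by forward-backward.

    Assumed layout: transition[a, b] = p(h_i = b | h_{i-1} = a),
    emission[a, k] = p(x_i = k | h_i = a). No scaling is applied.
    """
    n, n_states = len(x), len(initial)
    alpha = np.zeros((n, n_states))
    beta = np.zeros((n, n_states))
    alpha[0] = initial * emission[:, x[0]]
    for i in range(1, n):
        alpha[i] = (alpha[i - 1] @ transition) * emission[:, x[i]]
    beta[-1] = 1.0
    for i in range(n - 2, -1, -1):
        beta[i] = transition @ (emission[:, x[i + 1]] * beta[i + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def marginalized_counts(x, initial, transition, emission):
    """Expected length-normalized count of (symbol k, context l)."""
    post = state_posteriors(x, initial, transition, emission)
    n_states, n_symbols = emission.shape
    counts = np.zeros((n_symbols, n_states))
    for i, symbol in enumerate(x):
        counts[symbol] += post[i] / len(x)
    return counts

def marginalized_count_kernel(x1, x2, initial, transition, emission):
    """Inner product between the marginalized counts of two sequences."""
    g1 = marginalized_counts(x1, initial, transition, emission)
    g2 = marginalized_counts(x2, initial, transition, emission)
    return float(np.sum(g1 * g2))
```

With a single hidden state the posteriors are all 1 and the result reduces to the length-normalized count kernel.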


2nd order marginalized count kernel

  • If adjacency relations between symbols carry essential meaning, the count kernel is not sufficient

  • 2nd order marginalized count kernel

    • 4 neighboring symbols (i.e. 2 visible and 2 hidden) are combined and counted


Protein clustering experiment

  • 84 proteins containing five classes

    • gyrB proteins from five bacteria species

  • Clustering methods

    • HMM + {FK, MCK1, MCK2} + K-means (FK: Fisher kernel; MCK1, MCK2: first- and second-order marginalized count kernels)

  • Evaluation

    • Adjusted Rand Index (ARI)
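For reference, the Adjusted Rand Index compares a clustering with the true classes by counting pairs of items placed together in both, then correcting for chance. A small sketch with an assumed function name follows; it requires Python 3.8 or later for `math.comb`.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_true)
    # Contingency counts: items sharing a true class and a predicted cluster.
    pairs = Counter(zip(labels_true, labels_pred))
    sum_both = sum(comb(c, 2) for c in pairs.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_both - expected) / (maximum - expected)
```

The index is 1 for identical partitions (regardless of how clusters are named) and close to 0 for random assignments.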


Kernel Matrices


Clustering Evaluation


Applications since then

  • Marginalized Graph Kernels (Kashima et al., ICML 2003)

  • Sensor networks (Nguyen et al., ICML 2004)

  • Labeling of structured data (Kashima et al., ICML 2004)

  • Robotics (Shimosaka et al., ICRA 2005)

  • Kernels for Promoter Regions (Vert et al., NIPS 2005)

  • Web data (Zhao et al., WWW 2006)

  • Multiple Instance Learning (Kwok et al., IJCAI 2007)


Summary (Marginalized Kernels)

  • General framework for defining kernels from generative models

  • Fisher kernel as a special case

  • Broad applications

  • Combination with CRFs and other advanced stuff?


2. Marginalized Graph Kernels

H. Kashima, K. Tsuda, and A. Inokuchi.

Marginalized kernels between labeled graphs.

ICML 2003, pages 321-328, 2003.


Motivations for graph analysis

  • Existing methods assume “tables”

    Serial Num   Name   Age   Sex      Address
    0001         ○○     40    Male     Tokyo
    0002         ××     31    Female   Osaka

  • Structured data beyond this framework

    → New methods for analysis


Graphs..


Graph Structures in Biology

  • Compounds

  • DNA Sequence

  • RNA

[Figure: an RNA secondary structure drawn as a graph of bases with U-A and C-G pairs; a chemical compound drawn as a graph of C, H, and O atoms]


Marginalized Graph Kernels

(Kashima, Tsuda, Inokuchi, ICML 2003)

  • Goal: define a kernel function $K(G, G')$ between two graphs

  • Both vertices and edges are labeled


Label path

  • Sequence of vertex and edge labels

  • Generated by a random walk

  • Uniform initial, transition, terminal probabilities
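A label path can be sampled as sketched below: pick a start vertex uniformly, then at each step either stop with a fixed probability or move to a uniformly chosen neighbor, recording the edge and vertex labels along the way. The graph encoding (a list of vertex labels plus a dict of labeled undirected edges), the function name, and the value of `stop_prob` are assumptions of this example.

```python
import random

def sample_label_path(vertex_labels, edges, stop_prob=0.3, rng=None):
    """Sample one label path: vertex label, edge label, vertex label, ...

    vertex_labels: list of labels, indexed by vertex.
    edges: dict {(u, v): edge_label}, each undirected edge listed once.
    """
    rng = rng or random.Random()
    neighbors = {v: [] for v in range(len(vertex_labels))}
    for (u, v), label in edges.items():
        neighbors[u].append((v, label))
        neighbors[v].append((u, label))
    current = rng.randrange(len(vertex_labels))   # uniform initial vertex
    path = [vertex_labels[current]]
    # Continue with probability 1 - stop_prob while a neighbor exists.
    while neighbors[current] and rng.random() >= stop_prob:
        nxt, edge_label = rng.choice(neighbors[current])  # uniform transition
        path += [edge_label, vertex_labels[nxt]]
        current = nxt
    return path
```

A sampled path has the same shape as the examples `A c D b E` and `B c D a A`: vertex labels at even positions, edge labels at odd positions.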


Path-probability vector


Kernel definition

  • Kernels for paths

    A c D b E

    B c D a A

  • Take the expectation over all possible paths!

  • Marginalized kernels for graphs


Computation

  • Only the transition probability is considered here; initial and terminal probabilities are omitted

  • $A(v)$: Set of paths ending at $v$

  • $K_V(v, v')$: Kernel computed from the paths ending at $(v, v')$

  • $K_V$ is written recursively

  • Kernel computed by solving linear equations (polynomial time)

[Figure: vertices $v$ and $v'$ with their path sets $A(v)$ and $A(v')$]
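The linear-equation view can be sketched as follows. This is a simplified illustration, not the exact formulation of the paper: it assumes delta kernels on vertex and edge labels, a uniform start distribution, a constant stopping probability `q`, uniform transitions to neighbors, and that every vertex has at least one neighbor. It also writes the recursion over pairs of start vertices rather than end vertices. One unknown is kept per vertex pair, and the recursion becomes a single linear system.

```python
import numpy as np

def marginalized_graph_kernel(labels1, edges1, labels2, edges2, q=0.5):
    """Expected number of matching label paths between two labeled graphs.

    labelsX: list of vertex labels. edgesX: dict {(u, v): edge_label},
    each undirected edge listed once. q: assumed constant stop probability.
    """
    n1, n2 = len(labels1), len(labels2)

    def neighbor_lists(edges, n):
        nb = [[] for _ in range(n)]
        for (u, v), label in edges.items():
            nb[u].append((v, label))
            nb[v].append((u, label))
        return nb

    nb1, nb2 = neighbor_lists(edges1, n1), neighbor_lists(edges2, n2)
    size = n1 * n2
    T = np.zeros((size, size))   # coupling between vertex pairs
    r = np.zeros(size)           # contribution of paths that stop at once
    for v in range(n1):
        for w in range(n2):
            if labels1[v] != labels2[w]:
                continue         # vertex labels differ: no matching path
            row = v * n2 + w
            r[row] = q * q
            for u, edge_u in nb1[v]:
                for x, edge_x in nb2[w]:
                    if edge_u == edge_x:
                        step = ((1 - q) / len(nb1[v])) * ((1 - q) / len(nb2[w]))
                        T[row, u * n2 + x] += step
    # R = r + T R, i.e. (I - T) R = r: one dense solve over n1 * n2 unknowns.
    R = np.linalg.solve(np.eye(size) - T, r)
    return float(R.sum() / size)   # uniform start probabilities
```

Each row of `T` sums to at most `(1 - q)^2 < 1`, so the system has a unique solution.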


Graph Kernel Applications

  • Chemical Compounds (Mahe et al., 2005)

  • Protein 3D structures (Borgwardt et al., 2005)

  • RNA graphs (Karklin et al., 2005)

  • Pedestrian detection

  • Signal Processing


Predicting Mutagenicity

  • MUTAG benchmark dataset

    • Mutation of Salmonella typhimurium

    • 125 positive data (effective for mutations)

    • 63 negative data (not effective for mutations)

Mahe et al. J. Chem. Inf. Model., 2005


Classification of Protein 3D structures

  • Graphs for protein 3D structures

    • Node: Secondary structure elements

    • Edge: Distance of two elements

  • Calculate the similarity by graph kernels

Borgwardt et al. “Protein function prediction via graph kernels”, ISMB2005


Classification of proteins: Accuracy

Borgwardt et al. “Protein function prediction via graph kernels”, ISMB2005


Pedestrian detection in images (F. Suard et al., 2005)


Classifying RNA graphs (Y. Karklin et al., 2005)


Strong points of MGK

  • Polynomial time computation O(n^3)

  • Positive definite kernel

    • Support Vector Machines

    • Kernel PCA

    • Kernel CCA

    • And so on…


Diffusion Kernels: Biological Network Analysis


Biological Networks

  • Protein-protein physical interaction

  • Metabolic networks

  • Gene regulatory networks

  • Network induced from sequence similarity

  • Thousands of nodes (genes/proteins)

  • Hundreds of thousands of edges (interactions)


Physical Interaction Network


Physical Interaction Network

  • Undirected graphs of proteins

  • Edge exists if two proteins physically interact

    • Docking (Key – Keyhole)

  • Interacting proteins tend to have the same biological function


Metabolic Network


Metabolic Network

  • Node: Chemical compounds

  • Edge: Enzyme catalyzing the reaction (EC Number)

  • KEGG Database (Kyoto University)

  • Collection of pathways (subnetworks)

  • Can be converted into a network of enzymes (proteins)

[Figure: Fumarate and (S)-Malate linked by enzyme 4.2.1.2; (S)-Malate and Oxaloacetate linked by enzyme 1.1.1.37]


Protein Function Prediction

  • For some proteins, their functions are known

  • But the functions of many proteins are still unknown


Function Prediction Using a Network

  • Determining a protein’s function is a central goal of molecular biology

  • It has to be determined by biological experiments, but accurate computational prediction helps

  • Proteins close to each other in the networks tend to share the same functional category

  • Use the network for function prediction!

  • (Combination with other information sources)


Prediction of one functional category

  • +1/-1: Labeled proteins with/without a specific function

  • ?: Unlabeled proteins


Indirect connection


Diffusion kernels (Kondor and Lafferty, 2002)

  • Function prediction by SVM using a network

    • Kernels are needed!

  • Define closeness of two nodes

    • Has to be positive definite

How Close?


Definition of Diffusion Kernel

  • A: Adjacency matrix

  • D: Diagonal matrix of degrees

  • L = D - A: Graph Laplacian matrix

  • Diffusion kernel matrix: $K = \exp(-\beta L)$

    • $\beta$: Diffusion parameter

  • Matrix exponential, not elementwise exponential


Computation of Matrix Exponential

  • Definition

  • Eigen-decomposition
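A small sketch of both routes, assuming the standard power-series definition $e^{M} = \sum_{k \ge 0} M^k / k!$ and the fact that for a symmetric matrix $L = U \Lambda U^\top$ the exponential is $U e^{-\beta \Lambda} U^\top$. The function names and the three-node path graph are made up for illustration.

```python
import numpy as np

def graph_laplacian(adjacency):
    """L = D - A, with D the diagonal matrix of node degrees."""
    return np.diag(adjacency.sum(axis=1)) - adjacency

def diffusion_kernel(adjacency, beta):
    """exp(-beta * L) through the eigen-decomposition of the Laplacian."""
    eigenvalues, eigenvectors = np.linalg.eigh(graph_laplacian(adjacency))
    # Scale each eigenvector column by exp(-beta * eigenvalue), then rotate back.
    return (eigenvectors * np.exp(-beta * eigenvalues)) @ eigenvectors.T

def matrix_exponential_series(matrix, n_terms=30):
    """Truncated power series sum_k M^k / k! (fine for small, well-scaled M)."""
    result = np.eye(len(matrix))
    term = np.eye(len(matrix))
    for k in range(1, n_terms):
        term = term @ matrix / k
        result = result + term
    return result

# Path graph 0 - 1 - 2 (illustrative).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
K = diffusion_kernel(A, beta=0.5)
```

Since every eigenvalue of `K` has the form `exp(-beta * lambda) > 0`, the kernel is positive definite for any `beta`, which is what an SVM requires.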


Adjacency Matrix and Degree Matrix


Graph Laplacian Matrix L


Actual Values of Diffusion Kernels

Closeness from the “central node”


Interpretation: Stochastic Process

  • For each node $i$, consider a random variable $Z_i$

  • Initial condition

    • Zero mean, variance $\sigma^2$

    • Independent of each other (covariance zero)

  • At each time step, each variable sends a fraction $\alpha$ of its value to the neighbors


Stochastic Process (2)

  • Time Evolution Operator

  • Covariance

  • Reduce the time step from 1 to $\Delta t$

  • Diffusion parameter

  • Taking the limit
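The steps above can be written out as follows. This is a reconstruction under stated assumptions, following the usual presentation of this argument: $Z(t)$ is the vector of node variables with initial covariance $\sigma^2 I$, $\alpha$ is the fraction sent to each neighbor per step, and $L = D - A$ is the graph Laplacian. The symbols $Z$, $\alpha$, $\sigma^2$, $T$, and $\Delta t$ are chosen for this sketch.

```latex
\begin{align*}
Z(t+1) &= (I - \alpha L)\, Z(t) = T\, Z(t),
  \qquad T = I - \alpha L
  && \text{(time evolution operator)} \\
\operatorname{Cov}\bigl(Z(t)\bigr) &= T^{t}\,(\sigma^{2} I)\,(T^{t})^{\top}
  = \sigma^{2}\, T^{2t}
  && \text{(covariance, using } T = T^{\top}\text{)} \\
T_{\Delta t} &= I - \alpha\,\Delta t\, L
  && \text{(time step reduced from 1 to } \Delta t\text{)} \\
\lim_{\Delta t \to 0} \sigma^{2}\,\bigl(I - \alpha\,\Delta t\, L\bigr)^{2t/\Delta t}
  &= \sigma^{2} \exp(-2\alpha t\, L) = \sigma^{2} \exp(-\beta L),
  \qquad \beta = 2\alpha t
  && \text{(taking the limit)}
\end{align*}
```

Up to the constant factor $\sigma^2$, the limiting covariance between nodes is the diffusion kernel, which also explains why it is positive definite.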


Interpretation via random walking

  • Random walk according to the transition probability

  • Transition probability is constant

  • Remaining probability = self-loop

  • $K_{ij}$ is equal to the probability that a walk started at $i$ is at $j$ after infinitely many time steps
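One way to check this interpretation numerically, assuming the walk moves to each neighbor with the constant probability `beta / n_steps` and stays put with the remaining probability: the `n_steps`-step transition matrix should approach `exp(-beta * L)` as `n_steps` grows. The function names and the parameter choices are assumptions of this sketch.

```python
import numpy as np

def lazy_walk_matrix(adjacency, beta, n_steps):
    """n_steps-step transition matrix of the lazy random walk.

    Each step moves to any given neighbor with probability beta / n_steps;
    the remaining probability is a self-loop. Requires
    beta / n_steps * (maximum degree) <= 1 so that rows are probabilities.
    """
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    one_step = np.eye(len(adjacency)) - (beta / n_steps) * laplacian
    return np.linalg.matrix_power(one_step, n_steps)

def diffusion_kernel(adjacency, beta):
    """exp(-beta * L) via the eigen-decomposition of the Laplacian."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return (eigenvectors * np.exp(-beta * eigenvalues)) @ eigenvectors.T

# Path graph 0 - 1 - 2 (illustrative).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
walk = lazy_walk_matrix(A, beta=0.5, n_steps=100000)
kernel = diffusion_kernel(A, beta=0.5)
max_gap = float(np.max(np.abs(walk - kernel)))
```

Entry `(i, j)` of `walk` is the probability of being at node `j` after `n_steps` steps when starting at node `i`; the gap to the kernel shrinks roughly in proportion to `1 / n_steps`.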


Experimental Results by Lee et al. (2006)

  • Yeast Proteins

  • 34 functional categories

    • Decomposed into binary classification problems

  • Physical Interaction Network only

  • Methods

    • Markov Random Field

    • Kernel Logistic Regression (Diffusion Kernel)

      • Use additional knowledge of correlated functions

    • Support Vector Machine (Diffusion Kernel)

  • ROC score

    • Higher is better


Experimental results by Lee et al. (2006)


Concluding Remarks

  • Kernel methods have been applied to many different objects

    • Marginalized Kernels: Latent variables

    • Marginalized Graph Kernels: Graphs

    • Diffusion Kernels: Networks

  • Still an active field

    • Mining and Learning with Graphs (MLG) Workshop Series

    • Journal of Machine Learning Research Special Issue on Graphs (Paper due: 10.2.2008)

  • THANK YOU VERY MUCH!!

