Similarity-based Classifiers: Problems and Solutions
Presentation Transcript



Similarity-based Classifiers: Problems and Solutions



Classifying based on similarities:

[Example paintings by Van Gogh and by Monet; for a new painting: Van Gogh or Monet?]



The Similarity-based Classification Problem

[Diagram: the inputs are paintings, the labels are the painter]




Examples of Similarity Functions

Computational Biology

  • Smith-Waterman algorithm (Smith & Waterman, 1981)

  • FASTA algorithm (Lipman & Pearson, 1985)

  • BLAST algorithm (Altschul et al., 1990)

Computer Vision

  • Tangent distance (Duda et al., 2001)

  • Earth mover’s distance (Rubner et al., 2000)

  • Shape matching distance (Belongie et al., 2002)

  • Pyramid match kernel (Grauman & Darrell, 2007)

Information Retrieval

  • Levenshtein distance (Levenshtein, 1966)

  • Cosine similarity between tf-idf vectors (Manning & Schütze, 1999)
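To make the last entry concrete, here is a minimal sketch of cosine similarity between tf-idf vectors; the toy corpus, function names, and tf-idf weighting details are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build simple tf-idf vectors: term frequency times log inverse document frequency."""
    vocab = sorted({w for d in docs for w in d.split()})
    df = Counter(w for d in docs for w in set(d.split()))
    n = len(docs)
    return [[Counter(d.split())[w] * math.log(n / df[w]) for w in vocab] for d in docs]

def cosine_similarity(u, v):
    """psi(u, v) = <u, v> / (||u|| ||v||); a similarity function, not a metric."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["van gogh painted sunflowers",
        "monet painted water lilies",
        "van gogh painted irises"]
v = tfidf_vectors(docs)
print(cosine_similarity(v[0], v[2]))  # higher: the two van gogh documents share weighted terms
print(cosine_similarity(v[0], v[1]))  # 0.0 here: the only shared term ("painted") has idf 0
```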



Approaches to Similarity-based Classification





Can we treat similarities as kernels?




Example: Amazon similarity

[96 × 96 matrix of similarities between books]

[Eigenvalue spectrum of the Amazon similarity matrix: eigenvalue vs. rank]
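The indefiniteness shown by such an eigenvalue spectrum can be checked numerically; here is a minimal sketch, using a simulated similarity matrix in place of the actual Amazon data (which is not reproduced here):

```python
import numpy as np

# Stand-in for a 96 x 96 similarity matrix such as the Amazon book data (not the real data).
rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=(96, 96))

S_sym = 0.5 * (S + S.T)              # kernels must be symmetric, so symmetrize first
eigvals = np.linalg.eigvalsh(S_sym)  # eigenvalues in ascending order

print("smallest eigenvalue:", eigvals[0])
print("number of negative eigenvalues:", int(np.sum(eigvals < 0)))
# Any negative eigenvalue means S_sym is indefinite and therefore not a valid kernel (Gram) matrix.
```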


Well, let’s just make S be a kernel matrix

[Plots: the eigenvalue spectrum of S and its modified versions]

Flip, Clip or Shift?

Best bet is Clip.
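Flip, clip, and shift are eigenvalue modifications; below is a minimal sketch using the usual definitions (my implementation, assuming flip = absolute values, clip = zero out negatives, shift = add the magnitude of the most negative eigenvalue):

```python
import numpy as np

def modify_spectrum(S, method="clip"):
    """Make a similarity matrix PSD by modifying the eigenvalues of its symmetric part.

    clip:  set negative eigenvalues to zero
    flip:  replace eigenvalues by their absolute values
    shift: add |smallest eigenvalue| to every eigenvalue
    """
    S = 0.5 * (S + S.T)
    lam, U = np.linalg.eigh(S)
    if method == "clip":
        lam = np.maximum(lam, 0.0)
    elif method == "flip":
        lam = np.abs(lam)
    elif method == "shift":
        lam = lam - min(lam.min(), 0.0)
    else:
        raise ValueError(f"unknown method: {method}")
    return (U * lam) @ U.T  # U diag(lam) U^T

rng = np.random.default_rng(0)
S = rng.uniform(size=(10, 10))
K = modify_spectrum(S, "clip")
print(np.linalg.eigvalsh(K).min())  # >= 0 up to numerical precision
```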



Well, let’s just make S be a kernel matrix

Learn the best kernel matrix for the SVM:

(Luss & d’Aspremont, NIPS 2007; Chen et al., ICML 2009)



Approaches to Similarity-based Classification



Let the similarities to the training samples be features

  • SVM (Graepel et al., 1998; Liao & Noble, 2003)

  • Linear programming (LP) machine (Graepel et al., 1999)

  • Linear discriminant analysis (LDA) (Pekalska et al., 2001)

  • Quadratic discriminant analysis (QDA) (Pekalska & Duin, 2002)

  • Potential support vector machine (P-SVM) (Hochreiter & Obermayer, 2006; Knebel et al., 2008)
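As a minimal sketch of the similarities-as-features idea (with a synthetic similarity matrix standing in for a real one, and scikit-learn's LinearSVC as one possible linear SVM):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical setup: S[i, j] = psi(x_i, x_j) for n training samples with labels y,
# and s_test[j] = psi(x_test, x_j). Random numbers stand in for a real similarity matrix.
rng = np.random.default_rng(0)
n = 100
y = rng.integers(0, 2, size=n)
S = rng.uniform(size=(n, n))
s_test = rng.uniform(size=(1, n))

# Each sample is represented by its vector of similarities to the n training samples.
clf = LinearSVC(C=1.0).fit(S, y)
print(clf.predict(s_test))
```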



Approaches to Similarity-based Classification



Weighted Nearest-Neighbors

Take a weighted vote of the k-nearest-neighbors:

Algorithmic parallel of the exemplar model of human learning.
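The voting rule itself appeared as a formula on the slide; the following is a minimal sketch of one standard form of similarity-weighted k-NN voting (the data and the choice of weights are illustrative, not from the slides):

```python
import numpy as np

def weighted_knn_predict(s, y, k, weights):
    """Weighted k-NN vote.

    s[i]       similarity psi(x, x_i) of the test sample to training sample i
    y[i]       class label of training sample i
    weights[i] non-negative weight w_i for training sample i
    Returns the class with the largest total weight among the k most similar samples.
    """
    nn = np.argsort(-s)[:k]  # indices of the k most similar training samples
    classes = np.unique(y[nn])
    votes = {c: weights[nn][y[nn] == c].sum() for c in classes}
    return max(votes, key=votes.get)

# Simplest choice: weight each neighbour by its similarity to the test sample.
s = np.array([0.9, 0.2, 0.7, 0.1, 0.8])
y = np.array([0, 1, 1, 0, 1])
print(weighted_knn_predict(s, y, k=3, weights=s))  # -> 1
```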



Design Goals for the Weights (Chen et al. JMLR 2009)

Design Goal 1 (Affinity): w_i should be an increasing function of ψ(x, x_i).

Design Goal 2 (Diversity): w_i should be a decreasing function of ψ(x_i, x_j), for j ≠ i.



Linear Interpolation Weights

Linear interpolation weights will meet these goals:





LIME weights

Linear interpolation weights will meet these goals:

Linear interpolation with maximum entropy (LIME) weights (Gupta et al., IEEE PAMI 2006):
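The LIME objective was shown as a formula; a hedged reconstruction of the idea (the trade-off parameter λ and the exact form are my notation, so consult Gupta et al., 2006 for the precise statement) is:

\[
\hat{w} \;=\; \arg\min_{w \in \Delta} \; \Big\| \sum_{i=1}^{k} w_i x_i - x \Big\|_2^2 \;-\; \lambda\, H(w),
\qquad H(w) = -\sum_{i=1}^{k} w_i \log w_i,
\]

where Δ = {w : w_i ≥ 0, Σ_i w_i = 1} is the probability simplex: the weights should linearly interpolate the test point while staying as uniform (high-entropy) as possible.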




Kernelize Linear Interpolation (Chen et al. JMLR 2009)

The ridge penalty regularizes the variance of the weights.

Only inner products are needed, so they can be replaced with a kernel or with similarities!


KRI Weights Satisfy Design Goals

Kernel ridge interpolation (KRI) weights satisfy both design goals: affinity and diversity.

Remove the constraints on the weights, and the solution can be shown to be equivalent to local ridge regression: KRR weights.
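The KRI and KRR weight formulas appeared as images; a hedged reconstruction consistent with the surrounding text (λ is the ridge parameter, K the matrix of similarities among the k neighbours, and k(x) the vector of similarities of the test point x to those neighbours) is:

\[
\text{KRI:}\quad \hat{w} \;=\; \arg\min_{w \,\ge\, 0,\; \mathbf{1}^T w \,=\, 1} \; w^T K w \;-\; 2\, k(x)^T w \;+\; \lambda \|w\|_2^2,
\]
\[
\text{KRR:}\quad \hat{w} \;=\; (K + \lambda I)^{-1} k(x) \quad \text{(the same objective with the simplex constraints removed).}
\]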


Weighted k-NN: Example 1

[Figure: KRI weights vs. KRR weights]

Weighted k-NN: Example 2

[Figure: KRI weights vs. KRR weights]

Weighted k-NN: Example 3

[Figure: KRI weights vs. KRR weights]



Approaches to Similarity-based Classification



Generative Classifiers





Similarity Discriminant Analysis (Cazzanti and Gupta, ICML 2007, 2008, 2009)


Reg. Local SDA performance: competitive.



Some Conclusions

Performance depends heavily on the oddities of each dataset.

Weighted k-NN with affinity-diversity weights works well.

Preliminary: Reg. Local SDA works well.

Probabilities are useful.

Local models are useful:

- less approximation required

- hard to model the entire space / underlying manifold?

- always feasible





Lots of Open Questions

Making S PSD.

Fast k-NN search for similarities

Similarity-based regression

Relationship with learning on graphs

Try it out on real data

Fusion with Euclidean features (see our FUSION 2009 papers)

Open theoretical questions (Chen et al. JMLR 2009, Balcan et al. ML 2008)



Code/Data/Papers: idl.ee.washington.edu/similaritylearning

“Similarity-based Classification” by Chen et al., JMLR 2009



Training and Test Consistency

For a test sample x, given …, shall we classify x as …?

No! If a training sample were used as a test sample, its class could change!



Data Sets

Amazon

Aural Sonar

Protein

[Plots: eigenvalue vs. eigenvalue rank for the Amazon, Aural Sonar, and Protein similarity matrices]



Data Sets

Voting

Yeast-5-7

Yeast-5-12

[Plots: eigenvalue vs. eigenvalue rank for the Voting, Yeast-5-7, and Yeast-5-12 similarity matrices]



SVM Review

Empirical risk minimization (ERM) with regularization:

Hinge loss:

SVM Primal:
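The three formulas on this slide were images; the standard forms they refer to are:

\[
\text{Regularized ERM:}\quad \min_{f} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big) + \lambda \,\|f\|^2,
\]
\[
\text{Hinge loss:}\quad \ell\big(f(x), y\big) = \max\big(0,\; 1 - y f(x)\big),
\]
\[
\text{SVM primal:}\quad \min_{w, b, \xi} \; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.}\;\; y_i\big(w^T \phi(x_i) + b\big) \ge 1 - \xi_i,\;\; \xi_i \ge 0.
\]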



Learning the Kernel Matrix

Find the best K for classification, regularized toward S:

SVM that learns the full kernel matrix:
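The optimization itself was a formula on the slide; a hedged reconstruction (γ is a regularization weight in my notation, and d(α, K) = 1ᵀα − ½(α∘y)ᵀK(α∘y) is the standard SVM dual objective) is:

\[
\min_{K \,\succeq\, 0} \;\; \max_{0 \,\le\, \alpha \,\le\, C,\; y^T \alpha \,=\, 0} \; d(\alpha, K) \;+\; \gamma\, \| K - S \|_F^2 .
\]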



Related Work

SVM Dual:

Robust SVM (Luss & d’Aspremont, 2007):

“This can be interpreted as a worst-case robust classification problem with bounded uncertainty on the kernel matrix K.”
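A hedged sketch of the two formulas referenced above, in the same notation as the learned-kernel SVM:

\[
\text{SVM dual:}\quad \max_{0 \,\le\, \alpha \,\le\, C,\; y^T \alpha \,=\, 0} \; \mathbf{1}^T \alpha - \tfrac{1}{2} (\alpha \circ y)^T K (\alpha \circ y),
\]
\[
\text{Robust SVM:}\quad \max_{0 \,\le\, \alpha \,\le\, C,\; y^T \alpha \,=\, 0} \;\; \min_{K \,\succeq\, 0} \; \mathbf{1}^T \alpha - \tfrac{1}{2} (\alpha \circ y)^T K (\alpha \circ y) + \rho\, \| K - S \|_F^2 ,
\]

where ρ controls how far the learned K may move from S, matching the bounded-uncertainty interpretation quoted above.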



Related Work

Let

Rewrite the robust SVM as

Theorem (Sion, 1958)

Let M and N be convex spaces, one of which is compact, and let f(μ, ν) be a function on M × N that is quasi-concave and upper semi-continuous in μ for each ν ∈ N, and quasi-convex and lower semi-continuous in ν for each μ ∈ M. Then

\[ \sup_{\mu \in M} \inf_{\nu \in N} f(\mu, \nu) \;=\; \inf_{\nu \in N} \sup_{\mu \in M} f(\mu, \nu). \]



Related Work

Let

Rewrite the robust SVM as

By Sion’s minimax theorem, the robust SVM is equivalent to:

zero duality gap

Compare



Learning the Kernel Matrix

It is not trivial to directly solve:

Lemma (Generalized Schur Complement)

Let K ∈ 𝕊^n (symmetric), z ∈ ℝ^n, and c ∈ ℝ. Then

\[ \begin{bmatrix} K & z \\ z^T & c \end{bmatrix} \succeq 0 \]

if and only if K ⪰ 0, z is in the range of K, and c − zᵀK†z ≥ 0 (where K† is the pseudo-inverse).

Let , and notice that since .



Learning the Kernel Matrix

It is not trivial to directly solve:

However, it can be expressed as a convex conic program:

  • We can recover the optimal by .



Learning the Spectrum Modification

Concerns about learning the full kernel matrix:

  • Though the problem is convex, the number of variables is O(n²).

  • The flexibility of the model may lead to overfitting.
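The slides do not spell out the lower-dimensional parameterization here; purely as a hypothetical illustration of working with a few spectrum parameters rather than all of K, one could choose among the clip/flip/shift transforms by cross-validation (the scikit-learn calls are real; the selection scheme is my own sketch, not the authors' method):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Candidate spectrum modifications, applied to the eigenvalues of the symmetrized S.
TRANSFORMS = {
    "clip":  lambda lam: np.maximum(lam, 0.0),
    "flip":  lambda lam: np.abs(lam),
    "shift": lambda lam: lam - min(lam.min(), 0.0),
}

def choose_spectrum_modification(S, y, cv=3):
    """Pick the modification whose PSD kernel gives the best cross-validated SVM accuracy."""
    S = 0.5 * (S + S.T)
    lam, U = np.linalg.eigh(S)
    scores = {}
    for name, g in TRANSFORMS.items():
        K = (U * g(lam)) @ U.T  # U diag(g(lam)) U^T is PSD by construction
        clf = SVC(kernel="precomputed", C=1.0)
        scores[name] = cross_val_score(clf, K, y, cv=cv).mean()
    best = max(scores, key=scores.get)
    return best, scores
```

This keeps the search over a handful of named transforms instead of the O(n²) entries of a full kernel matrix, which also limits the overfitting risk noted above.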

