Presented by: Peng Zhang 4/15/2011 - PowerPoint PPT Presentation

Low-Rank Kernel Learning with Bregman Matrix Divergences. Brian Kulis, Matyas A. Sustik and Inderjit S. Dhillon. Journal of Machine Learning Research 10 (2009) 341-376. Presented by: Peng Zhang, 4/15/2011. Outline: Motivation, Major Contributions, Preliminaries, Algorithms, Discussions.


Low-Rank Kernel Learning with Bregman Matrix Divergences
Brian Kulis, Matyas A. Sustik and Inderjit S. Dhillon
Journal of Machine Learning Research 10 (2009) 341-376

Presented by:

Peng Zhang




Outline

  • Motivation

  • Major Contributions

  • Preliminaries

  • Algorithms

  • Discussions

  • Experiments

  • Conclusions



Motivation

  • Low-rank matrix nearness problems

    • Learning low-rank positive semidefinite (kernel) matrices for machine learning applications

    • Divergence (distance) between data objects

    • Find divergence measures suited to particular classes of matrices

      • Efficiency

  • Low-rank positive semidefinite (PSD) matrices are common in machine learning with kernel methods

    • Existing learning techniques must enforce an explicit positive semidefinite constraint, which is computationally expensive

  • Bypass this constraint by using divergences that enforce positive semidefiniteness automatically


Major Contributions

  • Goal

    • Efficient algorithms that can find a PSD (kernel) matrix as ‘close’ as possible to some input PSD matrix under equality or inequality constraints

  • Proposals

    • Use the LogDet or von Neumann divergence to measure closeness in PSD matrix learning

    • Use Bregman projections to minimize these divergences

      • Computationally efficient, scaling linearly with the number of data points n and quadratically with the rank r of the input matrix

  • Properties of the proposed algorithms

    • Range-space preserving property (rank of output = rank of input)

    • Do not decrease rank

    • Computationally efficient

      • Running times are linear in number of data points and quadratic in the rank of the kernel (for one iteration)



Preliminaries

  • Kernel methods

    • Inner products in feature space

    • Only information needed is kernel matrix K

      • K is always PSD

    • If K is low rank

    • Use low rank decomposition to improve computational efficiency
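To make the efficiency point concrete, here is a minimal NumPy sketch assuming the kernel is stored only through an n x r factor G with K = GG^T (the factor and the helper names are hypothetical, for illustration only). Kernel entries then cost O(r) and matrix-vector products O(nr), so the n x n matrix never has to be formed.

```python
import numpy as np

def kernel_entry(G: np.ndarray, i: int, j: int) -> float:
    """K[i, j] = g_i . g_j, computed in O(r) from two rows of G."""
    return float(G[i] @ G[j])

def kernel_matvec(G: np.ndarray, v: np.ndarray) -> np.ndarray:
    """K v = G (G^T v), computed in O(n r) instead of O(n^2)."""
    return G @ (G.T @ v)

rng = np.random.default_rng(0)
n, r = 500, 5
G = rng.standard_normal((n, r))  # hypothetical rank-r factor
v = rng.standard_normal(n)

# Form K explicitly only to check the shortcuts; K = G G^T is PSD with rank r.
K = G @ G.T
print(np.linalg.matrix_rank(K))                 # 5
print(np.allclose(kernel_matvec(G, v), K @ v))  # True
```

Storing G takes O(nr) memory instead of O(n^2), which is what makes the low-rank representation attractive when r is much smaller than n.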

Low Rank Kernel Matrix Learning


Intuitively, a Bregman divergence D_F(x, y) is the difference between the value of a strictly convex function F at point x and the value of the first-order Taylor expansion of F around point y, evaluated at point x.


  • Bregman vector divergences

  • Extension to Bregman matrix divergences
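The standard definitions behind these bullets are D_φ(x, y) = φ(x) − φ(y) − ∇φ(y)^T (x − y) for a strictly convex, differentiable φ and, for matrices, D_φ(X, Y) = φ(X) − φ(Y) − tr(∇φ(Y)^T (X − Y)). A small NumPy sketch (helper names are illustrative, not from the paper) that evaluates the definition directly and recovers two familiar special cases:

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>.

    np.sum(A * B) is the Frobenius inner product tr(A^T B), so the same code
    also evaluates the matrix form phi(X) - phi(Y) - tr(grad phi(Y)^T (X - Y)).
    """
    return phi(x) - phi(y) - float(np.sum(grad_phi(y) * (x - y)))

def sq_norm(x):          # phi(x) = ||x||^2 (squared Frobenius norm for matrices)
    return float(np.sum(x * x))

def sq_norm_grad(x):
    return 2.0 * x

def neg_entropy(x):      # phi(x) = sum(x log x - x), defined for x > 0
    return float(np.sum(x * np.log(x) - x))

def neg_entropy_grad(x):
    return np.log(x)

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.4, 0.4, 0.2])
print(bregman(sq_norm, sq_norm_grad, x, y))          # squared Euclidean distance ||x - y||^2
print(bregman(neg_entropy, neg_entropy_grad, x, y))  # generalized KL: sum(x log(x/y) - x + y)
```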



  • Special Bregman matrix divergences

    • The von Neumann divergence (DvN)

    • The LogDet divergence (Dld)

Both are defined here for full-rank matrices
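For full-rank symmetric positive definite X and Y, the two divergences are D_vN(X, Y) = tr(X log X − X log Y − X + Y) and D_ld(X, Y) = tr(XY^{-1}) − log det(XY^{-1}) − n. Below is a direct NumPy transcription; computing the matrix logarithm through an eigendecomposition is one reasonable choice for symmetric matrices, not the only one, and the test matrices are arbitrary.

```python
import numpy as np

def sym_logm(X: np.ndarray) -> np.ndarray:
    """Matrix logarithm of a symmetric positive definite matrix (via eigh)."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def von_neumann_div(X: np.ndarray, Y: np.ndarray) -> float:
    """D_vN(X, Y) = tr(X log X - X log Y - X + Y)."""
    return float(np.trace(X @ sym_logm(X) - X @ sym_logm(Y) - X + Y))

def logdet_div(X: np.ndarray, Y: np.ndarray) -> float:
    """D_ld(X, Y) = tr(X Y^{-1}) - log det(X Y^{-1}) - n."""
    n = X.shape[0]
    trace_term = np.trace(np.linalg.solve(Y, X))  # tr(Y^{-1} X) = tr(X Y^{-1})
    logdet_term = np.linalg.slogdet(X)[1] - np.linalg.slogdet(Y)[1]
    return float(trace_term - logdet_term - n)

rng = np.random.default_rng(1)
R = rng.standard_normal((4, 4))
X = R @ R.T + 4 * np.eye(4)        # random SPD matrix
Y = np.diag([1.0, 2.0, 3.0, 4.0])
print(von_neumann_div(X, X), logdet_div(X, X))          # both numerically zero
print(von_neumann_div(X, Y) > 0, logdet_div(X, Y) > 0)  # True True
```

For diagonal matrices both reduce to familiar vector divergences: D_vN becomes the generalized KL divergence of the diagonals, and D_ld becomes the Itakura-Saito distance Σ(a_i/b_i − log(a_i/b_i) − 1).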



  • Important properties of DvN and Dld

    • Both divergences are defined only over positive definite matrices

      • Hence no explicit positive definiteness constraint is needed

    • Range-space preserving property

    • Scale-invariance of LogDet

    • Transformation invariance

    • Others

      • Beyond the transductive setting: the learned kernel function can be evaluated over new data points
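The invariance properties can be checked numerically. With the definitions re-declared so the block runs on its own, D_ld(αX, αY) = D_ld(X, Y) for α > 0 and D_ld(M^T X M, M^T Y M) = D_ld(X, Y) for any invertible M, whereas D_vN is invariant only under orthogonal M and scales linearly, D_vN(αX, αY) = α D_vN(X, Y). The random test matrices and the factor 3.0 are arbitrary choices.

```python
import numpy as np

def sym_logm(X):
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def von_neumann_div(X, Y):
    return float(np.trace(X @ sym_logm(X) - X @ sym_logm(Y) - X + Y))

def logdet_div(X, Y):
    n = X.shape[0]
    return float(np.trace(np.linalg.solve(Y, X))
                 - (np.linalg.slogdet(X)[1] - np.linalg.slogdet(Y)[1]) - n)

def random_spd(rng, n):
    R = rng.standard_normal((n, n))
    return R @ R.T + n * np.eye(n)

rng = np.random.default_rng(2)
X, Y = random_spd(rng, 5), random_spd(rng, 5)
M = rng.standard_normal((5, 5))                   # invertible with probability 1
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # orthogonal

d_ld = logdet_div(X, Y)
d_vn = von_neumann_div(X, Y)
scale_ok = np.isclose(logdet_div(3.0 * X, 3.0 * Y), d_ld)            # scale invariance
transform_ok = np.isclose(logdet_div(M.T @ X @ M, M.T @ Y @ M), d_ld)  # transformation invariance
orth_ok = np.isclose(von_neumann_div(Q.T @ X @ Q, Q.T @ Y @ Q), d_vn)  # orthogonal invariance
vn_scale_ok = np.isclose(von_neumann_div(3.0 * X, 3.0 * Y), 3.0 * d_vn)  # linear scaling
print(scale_ok, transform_ok, orth_ok, vn_scale_ok)
```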



  • Spectral Bregman matrix divergence

    • Generating convex function

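      • A convex scalar function applied to each eigenvalue and summed: $\phi(X) = \sum_i \varphi(\lambda_i(X))$

  • The resulting Bregman matrix divergence can be expressed through the eigenvalues and eigenvectors of X and Y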
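With X = VΛV^T and Y = UΘU^T, a spectral Bregman divergence can be written as D_φ(X, Y) = Σ_{i,j} (v_i^T u_j)^2 (φ(λ_i) − φ(θ_j) − (λ_i − θ_j) φ'(θ_j)). The sketch below evaluates this double sum and, as a sanity check, compares it with the closed forms of the LogDet divergence (φ(λ) = −log λ) and the von Neumann divergence (φ(λ) = λ log λ − λ). Function names and test matrices are illustrative.

```python
import numpy as np

def spectral_bregman(X, Y, f, fprime):
    """Sum over i, j of (v_i^T u_j)^2 * (f(lam_i) - f(theta_j) - (lam_i - theta_j) f'(theta_j))."""
    lam, V = np.linalg.eigh(X)
    theta, U = np.linalg.eigh(Y)
    W = (V.T @ U) ** 2       # W[i, j] = (v_i^T u_j)^2
    L = lam[:, None]         # eigenvalues of X along rows
    T = theta[None, :]       # eigenvalues of Y along columns
    return float(np.sum(W * (f(L) - f(T) - (L - T) * fprime(T))))

def neg_log(t):              # LogDet generator
    return -np.log(t)

def neg_log_prime(t):
    return -1.0 / t

def ent(t):                  # von Neumann generator
    return t * np.log(t) - t

def ent_prime(t):
    return np.log(t)

rng = np.random.default_rng(3)
A, B = rng.standard_normal((2, 4, 4))
X = A @ A.T + 4 * np.eye(4)
Y = B @ B.T + 4 * np.eye(4)

# Closed forms for comparison.
closed_ld = (np.trace(np.linalg.solve(Y, X))
             - (np.linalg.slogdet(X)[1] - np.linalg.slogdet(Y)[1]) - 4)
lx, vx = np.linalg.eigh(X)
ly, vy = np.linalg.eigh(Y)
log_x = (vx * np.log(lx)) @ vx.T
log_y = (vy * np.log(ly)) @ vy.T
closed_vn = np.trace(X @ log_x - X @ log_y - X + Y)

print(np.isclose(spectral_bregman(X, Y, neg_log, neg_log_prime), closed_ld))
print(np.isclose(spectral_bregman(X, Y, ent, ent_prime), closed_vn))
```

The weights W are doubly stochastic (rows and columns sum to one), which is why the double sum reproduces the trace-based formulas.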



  • Kernel matrix learning problem of this paper

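    • Non-convex in general, because of the rank constraint rank(K) ≤ r

    • Convex when using the LogDet/von Neumann divergence, because the rank constraint is then enforced implicitly and can be dropped

    • Constraints of interest are squared Euclidean distances between points in feature space: $(e_i - e_j)^T K (e_i - e_j) = K_{ii} + K_{jj} - 2K_{ij} = \mathrm{tr}(KA)$ with $A = (e_i - e_j)(e_i - e_j)^T$

    • Each such A is rank one, and the problem can be: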

    • Learn a kernel matrix over all data points from side information (labels or constraints)
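The rank-one structure of the distance constraints is easy to verify: with z = e_i − e_j and A = zz^T, tr(KA) = z^T K z = K_ii + K_jj − 2K_ij, the squared Euclidean distance between points i and j in feature space. A small check, with a random factor G standing in for real data and a hypothetical upper bound u for a "similar" pair:

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 6, 3
G = rng.standard_normal((n, r))   # hypothetical feature-space coordinates
K = G @ G.T

i, j = 1, 4
z = np.zeros(n)
z[i], z[j] = 1.0, -1.0
A = np.outer(z, z)                # rank-one constraint matrix

trace_form = np.trace(K @ A)
entry_form = K[i, i] + K[j, j] - 2.0 * K[i, j]
feature_form = np.sum((G[i] - G[j]) ** 2)
print(np.linalg.matrix_rank(A), np.allclose([trace_form, entry_form], feature_form))

u = 1.5 * feature_form            # hypothetical bound for a "similar" pair
print("constraint tr(KA) <= u satisfied:", trace_form <= u)
```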



  • Bregman projections

    • A method for solving the previous problem without the rank constraint

      • Choose one constraint at a time

      • Perform a Bregman projection so that the current solution satisfies that constraint

      • Using LogDet and von Neumann divergences, projections can be computed efficiently

      • Convergence guaranteed, but may require many iterations
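The cyclic structure does not depend on the divergence. As an illustration only, the sketch below uses the simplest Bregman divergence, φ(x) = ½‖x‖² (squared Euclidean distance), with affine equality constraints a_i^T x = b_i; each Bregman projection then has a closed form and the loop is the classical Kaczmarz method. It is a stand-in for, not an implementation of, the LogDet or von Neumann projections, and the tolerance and cycle limit are arbitrary.

```python
import numpy as np

def project_hyperplane(x, a, b):
    """Bregman projection for phi = 0.5 * ||x||^2 onto {x : a . x = b}."""
    return x + (b - a @ x) / (a @ a) * a

def cyclic_projections(x0, A, b, tol=1e-12, max_cycles=10_000):
    """Project onto one constraint at a time, cycling until x stops changing."""
    x = x0.copy()
    for cycle in range(1, max_cycles + 1):
        x_prev = x.copy()
        for a_i, b_i in zip(A, b):
            x = project_hyperplane(x, a_i, b_i)
        if np.linalg.norm(x - x_prev) < tol:
            return x, cycle
    return x, max_cycles

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)
x0 = rng.standard_normal(6)

x, cycles = cyclic_projections(x0, A, b)
# For equality constraints the limit is the projection of x0 onto the
# intersection {x : A x = b}, which has a closed form in this Euclidean case.
x_direct = x0 + A.T @ np.linalg.solve(A @ A.T, b - A @ x0)
print(cycles, np.allclose(A @ x, b), np.allclose(x, x_direct))
```

Each individual projection is cheap, but the number of cycles depends on how the constraints interact, which mirrors the "may require many iterations" caveat above.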



  • Bregman divergences for low rank matrices

    • Deal with matrices with 0 eigenvalues

      • Infinite divergences might occur because

      • These imply a rank constraint if the divergence is finite

Range …

Rank …



  • Rank deficient LogDet and von Neumann Divergences

  • Rank deficient Bregman projections

    • von Neumann:

    • LogDet:
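For rank-deficient matrices, the LogDet divergence is finite only when range(X) = range(Y) (the von Neumann divergence only requires range(X) ⊆ range(Y)). In that case one way to evaluate it is the spectral formula restricted to the nonzero eigenvalues, as sketched below; the relative tolerance that decides which eigenvalues count as zero is an assumption of this sketch. The check confirms that for X = QAQ^T and Y = QCQ^T with orthonormal Q, the result equals the ordinary full-rank LogDet divergence between the r x r matrices A and C.

```python
import numpy as np

def lowrank_logdet_div(X, Y, rel_tol=1e-10):
    """LogDet divergence of rank-deficient PSD X, Y via the spectral formula.

    Returns inf unless range(X) == range(Y). Eigenvalues below
    rel_tol * (largest eigenvalue) are treated as zero (an assumption).
    """
    lam, V = np.linalg.eigh(X)
    theta, U = np.linalg.eigh(Y)
    keep_x = lam > rel_tol * lam.max()
    keep_y = theta > rel_tol * theta.max()
    Vr, lam_r = V[:, keep_x], lam[keep_x]
    Ur, theta_r = U[:, keep_y], theta[keep_y]
    # Equal ranges: same dimension and range(X) contained in range(Y).
    if Vr.shape[1] != Ur.shape[1] or not np.allclose(Ur @ (Ur.T @ Vr), Vr, atol=1e-8):
        return np.inf
    W = (Vr.T @ Ur) ** 2                       # W[i, j] = (v_i^T u_j)^2
    ratio = lam_r[:, None] / theta_r[None, :]  # lambda_i / theta_j
    return float(np.sum(W * (ratio - np.log(ratio) - 1.0)))

def logdet_div(A, C):
    """Full-rank LogDet divergence tr(A C^{-1}) - log det(A C^{-1}) - r."""
    r = A.shape[0]
    return float(np.trace(np.linalg.solve(C, A))
                 - (np.linalg.slogdet(A)[1] - np.linalg.slogdet(C)[1]) - r)

rng = np.random.default_rng(6)
n, r = 8, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis of the range
Ra, Rc = rng.standard_normal((2, r, r))
A = Ra @ Ra.T + r * np.eye(r)
C = Rc @ Rc.T + r * np.eye(r)
X, Y = Q @ A @ Q.T, Q @ C @ Q.T                    # rank 3, same range

print(np.isclose(lowrank_logdet_div(X, Y), logdet_div(A, C)))  # True

Q2, _ = np.linalg.qr(rng.standard_normal((n, r)))  # a different 3-dim range
print(lowrank_logdet_div(X, Q2 @ C @ Q2.T))        # inf
```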


Algorithm Using LogDet

  • Cyclic projection algorithm using LogDet divergence

    • Update for each projection

    • Can be simplified to $X_{t+1} = X_t + \beta_t X_t A_t X_t$ for a scalar $\beta_t$

    • The range space is unchanged, and no eigendecomposition is required

    • Update (21) costs O(n^2) operations per iteration

  • Improving update efficiency with a factored n x r matrix G, where $X = GG^T$

    • This update can be done using Cholesky rank-one update

    • O(r^3) complexity

  • Further improve update efficiency to O(r^2)

    • Combines Cholesky rank-one update with matrix multiplication
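A NumPy sketch of the two LogDet updates for a single equality constraint z^T X z = b with z = e_i − e_j. Solving that projection gives X ← X + β X z z^T X with p = z^T X z and β = (b − p)/p², which is the closed form used here; inequality constraints, which the slides' algorithm also supports, need extra bookkeeping that this sketch omits. The factored version keeps X = G₀BB^TG₀^T and multiplies the r x r matrix B by the Cholesky factor of I + βĝĝ^T, where ĝ = B^TG₀^Tz. It calls np.linalg.cholesky for simplicity, which costs O(r^3) per step; the constraint list and sizes are hypothetical.

```python
import numpy as np

def logdet_project_full(X, i, j, b):
    """LogDet projection of X onto {z^T X z = b}, z = e_i - e_j: O(n^2)."""
    z = np.zeros(X.shape[0])
    z[i], z[j] = 1.0, -1.0
    Xz = X @ z
    p = float(z @ Xz)              # current squared distance (must be > 0)
    beta = (b - p) / p**2          # makes the new z^T X z equal to b
    return X + beta * np.outer(Xz, Xz)

def logdet_cycle_factored(G0, constraints, n_cycles=1):
    """Same projections, tracking only the r x r matrix B with G = G0 B."""
    r = G0.shape[1]
    B = np.eye(r)
    for _ in range(n_cycles):
        for i, j, b in constraints:
            g = B.T @ (G0[i] - G0[j])   # ghat = G^T z, O(r^2) for z = e_i - e_j
            p = float(g @ g)
            beta = (b - p) / p**2       # I + beta g g^T stays PD since 1 + beta p = b / p > 0
            L = np.linalg.cholesky(np.eye(r) + beta * np.outer(g, g))  # O(r^3)
            B = B @ L                   # O(r^3)
    return B

rng = np.random.default_rng(7)
n, r = 10, 3
G0 = rng.standard_normal((n, r))        # X0 = G0 G0^T has rank r
constraints = [(0, 1, 0.5), (2, 3, 4.0), (4, 5, 1.0)]  # hypothetical (i, j, target)

X = G0 @ G0.T
for i, j, b in constraints:
    X = logdet_project_full(X, i, j, b)

B = logdet_cycle_factored(G0, constraints)
G = G0 @ B
print(np.allclose(G @ G.T, X))          # factored update matches the full update
print(np.linalg.matrix_rank(X))         # rank stays r
```

The full update touches all n^2 entries, while the factored loop only ever works with r-vectors and r x r matrices; G = G₀B is formed once at the end.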


Algorithm Using LogDet

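  • Each update factors $I + \beta \hat{g}\hat{g}^T = LL^T$, where $\hat{g} = G^T z$ for the constraint vector z; then $G = G_0 B$, where B is the product of all L matrices from every iteration and $X_0 = G_0 G_0^T$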

  • L can be determined implicitly
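One way to avoid forming L, sketched below: the Cholesky factor of I + αxx^T has the structure L = diag(d) + strictly-lower(xc^T), where d and c follow from a short O(r) recursion (α₁ = α, h_j = 1 + α_j x_j², d_j = √h_j, c_j = α_j x_j / d_j, α_{j+1} = α_j / h_j). Multiplying B by L then needs only a running suffix sum over the columns of B, for O(r²) total. This recursion is derived directly from the Cholesky factorization and is not necessarily the paper's exact procedure; the comparison against np.linalg.cholesky is there for verification.

```python
import numpy as np

def times_chol_rank_one(B, x, alpha):
    """Return B @ L, where L L^T = I + alpha x x^T, in O(r^2) without forming L."""
    r = x.shape[0]
    d = np.empty(r)
    c = np.empty(r)
    a = alpha
    for j in range(r):                 # O(r) recursion for diagonal d and multipliers c
        h = 1.0 + a * x[j] ** 2
        if h <= 0:
            raise ValueError("I + alpha x x^T is not positive definite")
        d[j] = np.sqrt(h)
        c[j] = a * x[j] / d[j]
        a = a / h
    out = np.empty_like(B)
    s = np.zeros(B.shape[0])           # s = sum_{i > j} x_i * B[:, i]
    for j in range(r - 1, -1, -1):     # column j of B L = d_j B[:, j] + c_j s
        out[:, j] = d[j] * B[:, j] + c[j] * s
        s += x[j] * B[:, j]
    return out

rng = np.random.default_rng(8)
r = 6
B = rng.standard_normal((r, r))
x = rng.standard_normal(r)
for alpha in (0.7, -0.5 / (x @ x)):    # both keep I + alpha x x^T positive definite
    L = np.linalg.cholesky(np.eye(r) + alpha * np.outer(x, x))
    print(np.allclose(times_chol_rank_one(B, x, alpha), B @ L))
```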


Algorithm Using LogDet

  • What are the constraints? How is convergence checked?


Convergence is checked by how much v has changed

May require large number of iterations



Algorithm Using von Neumann

  • Cyclic projection algorithm using von Neumann divergence

    • Update for each projection

    • This can be modified to

    • To calculate the projection parameter, find the unique root of the function
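The root-finding step can be illustrated in the full-rank case. This is a deliberately simple sketch, not the paper's low-rank algorithm: it projects X onto a single constraint z^T X z = b under the von Neumann divergence, uses eigendecompositions for the matrix exponential and logarithm (O(n³) per function evaluation), and finds the scalar multiplier α by bracketing plus bisection. The bracket-expansion strategy, iteration count and test data are arbitrary choices.

```python
import numpy as np

def sym_fn(X, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(X)
    return (V * fn(w)) @ V.T

def von_neumann_project(X, z, b, iters=200):
    """Full-rank sketch: argmin_Y D_vN(Y, X) subject to z^T Y z = b (b > 0).

    Stationarity gives Y(alpha) = exp(log X + alpha z z^T); z^T Y(alpha) z is
    increasing in alpha, so alpha is the unique root of
    f(alpha) = z^T Y(alpha) z - b, found here by bisection.
    """
    log_x = sym_fn(X, np.log)
    zzT = np.outer(z, z)

    def Y_of(alpha):
        return sym_fn(log_x + alpha * zzT, np.exp)

    def f(alpha):
        return float(z @ Y_of(alpha) @ z) - b

    lo, hi = -1.0, 1.0
    while f(lo) > 0:          # expand the bracket until f(lo) <= 0 <= f(hi)
        lo *= 2.0
    while f(hi) < 0:
        hi *= 2.0
    for _ in range(iters):    # bisection on the scalar multiplier
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return Y_of(0.5 * (lo + hi))

rng = np.random.default_rng(9)
R = rng.standard_normal((5, 5))
X = R @ R.T + np.eye(5)
z = np.zeros(5)
z[0], z[2] = 1.0, -1.0
Y = von_neumann_project(X, z, b=0.5)
print(np.isclose(z @ Y @ z, 0.5), np.linalg.eigvalsh(Y).min() > 0)
```

Every evaluation of f requires a matrix exponential, which is why a root finder in the inner loop makes each von Neumann projection more expensive than the closed-form LogDet update.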


Algorithm Using von Neumann

  • Slightly slower than Algorithm 2

The root finder slows down each iteration




Discussions

  • Limitations of Algorithm 2 and Algorithm 3

    • The initial kernel matrix must be low-rank

      • Not applicable for dimensionality reduction

    • Number of iterations may be large

      • This paper only optimized the computations for each iteration

      • Reducing the total number of iterations is left for future work

  • Handling new data points

    • Transductive setting

      • All data points are available up front

      • Some of the points have labels or other supervision

      • When a new data point is added, the entire kernel matrix must be re-learned

    • Workaround

      • View B as a linear transformation

      • Apply B to the new points
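A minimal sketch of this workaround, under the assumption that the initial kernel is linear, so that G₀ is simply the n x d data matrix and each point x has initial coordinates g₀ = x. The learned coordinates are then g = B^T x for training and new points alike, and the learned kernel is k(x, y) = x^T BB^T y, an inner product with W = BB^T. The matrix B below is random, a hypothetical stand-in for a learned one.

```python
import numpy as np

def embed(points, B):
    """Rows x^T map to rows x^T B, i.e. g = B^T x for each point."""
    return points @ B

def learned_kernel(P, Q, B):
    """k(p, q) = (B^T p)^T (B^T q) for all rows p of P and q of Q."""
    return embed(P, B) @ embed(Q, B).T

rng = np.random.default_rng(10)
n, d = 20, 3
X_train = rng.standard_normal((n, d))   # G0 = data matrix (linear initial kernel)
B = rng.standard_normal((d, d))         # hypothetical learned B
x_new = rng.standard_normal((1, d))     # a point not seen during learning

K_train = learned_kernel(X_train, X_train, B)  # equals (G0 B)(G0 B)^T
k_new = learned_kernel(x_new, X_train, B)      # 1 x n kernel row, no re-learning

W = B @ B.T
diff = x_new[0] - X_train[0]
dist_learned = float(np.sum((embed(x_new, B)[0] - embed(X_train[:1], B)[0]) ** 2))
print(np.allclose(K_train, (X_train @ B) @ (X_train @ B).T))
print(np.allclose(k_new, x_new @ W @ X_train.T))
print(np.isclose(dist_learned, diff @ W @ diff))  # Mahalanobis-type distance
```

The same view explains why learning a low-rank kernel with a fixed range space amounts to learning a linear transformation of the input data.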



  • Generalizations to more constraints

    • Slack variables

      • When the number of constraints is large, the Bregman divergence minimization problem may have no feasible solution

      • Introduce slack variables

      • Allows constraints to be violated but penalized

    • Similarity constraints

      • , or

    • Distance constraints

    • O(r^2) per projection

    • For arbitrary linear constraints, O(nr) per projection



  • Special cases

    • DefiniteBoost optimization problem

    • Online-PCA

    • Nearest correlation matrix problem

  • Minimizing LogDet divergence and semidefinite programming (SDP)

    • SDP relaxation of min-balanced-cut problem

    • Can be solved via LogDet divergence minimization



Experiments

  • Transductive learning and clustering

  • Data sets

    • Digits

      • Handwritten samples of digits 3, 8 and 9 from the UCI repository

    • GyrB

      • Protein data set with three bacteria species

    • Spambase

      • 4601 email messages with 57 attributes, spam/not spam labels

    • Nursery

      • 12960 instances with 8 attributes and 5 class labels

  • Classification

    • k-nearest neighbor classifier

  • Clustering

    • Kernel k-means algorithm

    • Evaluated with the normalized mutual information (NMI) measure
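Both evaluation tools work directly from a kernel matrix or from label vectors. In this compact sketch, k-NN uses the kernel-induced squared distance K_ii + K_jj − 2K_ij, and NMI is computed as I(U;V)/√(H(U)H(V)). That normalization is an assumption: several NMI variants exist and the slides do not say which one was used. Labels for the k-NN vote are assumed to be non-negative integers, and the toy data is synthetic.

```python
import numpy as np

def kernel_knn_predict(K, train_idx, train_labels, test_idx, k=1):
    """Majority vote among the k nearest training points under d(i, j) = K_ii + K_jj - 2 K_ij."""
    diag = np.diag(K)
    D = (diag[test_idx][:, None] + diag[train_idx][None, :]
         - 2.0 * K[np.ix_(test_idx, train_idx)])
    nearest = np.argsort(D, axis=1)[:, :k]
    votes = train_labels[nearest]               # shape (n_test, k)
    return np.array([np.bincount(row).argmax() for row in votes])

def nmi(u, v):
    """Normalized mutual information I(U; V) / sqrt(H(U) H(V))."""
    _, ui = np.unique(u, return_inverse=True)
    _, vi = np.unique(v, return_inverse=True)
    counts = np.zeros((ui.max() + 1, vi.max() + 1))
    np.add.at(counts, (ui, vi), 1.0)
    P = counts / counts.sum()                   # joint distribution
    pu, pv = P.sum(axis=1), P.sum(axis=0)       # marginals
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pu, pv)[nz]))
    hu = -np.sum(pu * np.log(pu))
    hv = -np.sum(pv * np.log(pv))
    return float(mi / np.sqrt(hu * hv)) if hu > 0 and hv > 0 else 0.0

# Two well-separated clusters under a linear kernel.
rng = np.random.default_rng(11)
pts = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
labels = np.array([0] * 10 + [1] * 10)
K = pts @ pts.T
train_idx, test_idx = np.arange(0, 20, 2), np.arange(1, 20, 2)
pred = kernel_knn_predict(K, train_idx, labels[train_idx], test_idx, k=3)
print(np.array_equal(pred, labels[test_idx]))   # True
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))          # approximately 1.0 (same partition, renamed labels)
```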



  • Learn a kernel matrix only using constraints

    • Low rank kernels learned by proposed algorithms attain accurate clustering and classification

    • Use original data to get initial kernel matrix

    • The more constraints are used, the more accurate the results

    • Convergence

      • von Neumann divergence

        • Convergence was attained in 11 cycles for 30 constraints and 105 cycles for 420 constraints

      • LogDet divergence

        • Between 17 and 354 cycles


Simulation Results

Significant improvements

0.948 classification accuracy

For DefiniteBoost, 3220 cycles to convergence


Simulation Results

(Figure panels: Rank 57, Rank 8)

LogDet needs fewer constraints

LogDet converges much more slowly

(Future work)

But it often has a lower overall running time


Simulation Results

  • Metric learning and large scale experiments

    • Learning a low-rank kernel with the same range space as the input kernel is equivalent to learning a linear transformation of the input data

    • Compare proposed algorithms with metric learning algorithms

      • Metric learning by collapsing classes (MCML)

      • Large-margin nearest neighbor metric learning (LMNN)

      • Squared Euclidean Baseline



Conclusions

  • Developed LogDet/von Neumann divergence-based algorithms for low-rank matrix nearness problems

  • Running times are linear in number of data points and quadratic in the rank of the kernel

  • The algorithms can be used in conjunction with a number of kernel-based learning algorithms

Thank you
