
Multiple Kernel Learning

Marius Kloft

Technische Universität Berlin

(now: Courant / Sloan-Kettering Institute)


Machine Learning
  • Example
    • Object detection in images
  • Aim
    • Learning the relation between two random quantities $X$ and $Y$
      • from observations $(x_1, y_1), \ldots, (x_n, y_n)$
  • Kernel-based learning: predictors of the form $f(x) = \sum_{i=1}^{n} \alpha_i\, k(x_i, x) + b$, built from a kernel function $k$
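As a concrete illustration (not from the slides), a Gram matrix for a Gaussian RBF kernel can be computed as follows; the kernel choice and the bandwidth `gamma` are assumptions made for this example:

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

X = np.random.default_rng(0).normal(size=(5, 3))  # 5 observations, 3 features
K = rbf_gram(X)  # 5x5 positive semi-definite kernel matrix
```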
Multiple Views / Kernels
  • (Lanckriet et al., 2004)
  • [Figure: an image described through multiple views/kernels, e.g., color, shape, and space]
  • How to combine the views?
    • Via weightings: $k = \sum_{m=1}^{M} \theta_m k_m$ with weights $\theta_m \geq 0$ (see the sketch below)
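A minimal sketch of this weighted kernel combination; the Gram matrices here are placeholders, not the talk's actual kernels:

```python
import numpy as np

def combine_kernels(kernel_matrices, theta):
    """Weighted sum of base Gram matrices: K = sum_m theta_m * K_m."""
    return sum(t * K for t, K in zip(theta, kernel_matrices))

# Three hypothetical views (color, shape, space), combined uniformly
n = 4
K_color, K_shape, K_space = np.eye(n), np.eye(n), np.eye(n)  # placeholder PSD matrices
K = combine_kernels([K_color, K_shape, K_space], theta=[1/3, 1/3, 1/3])
```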

Computation of Weights?
  • State of the art
    • Sparse weights
      • Kernels / views are completely discarded
          • But why discard information?
  • (Lanckriet et al., 2004; Bach et al., 2004)
From Vision to Reality?
  • State of the art: sparse methods
    • often empirically ineffective
    • (Gehler et al., Noble et al., Cortes et al., Shawe-Taylor et al., NIPS Workshop 2008, ICML, ICCV, ...)
  • New methodology
    • more efficient and more effective in practice
    • learning bounds with rates up to $O(M/n)$
Presentation of Methodology

Non-sparse Multiple Kernel Learning

New Methodology
  • (Kloft et al., NIPS 2009; ECML 2009/10; JMLR 2011)
  • Generalized formulation
    • arbitrary loss
    • arbitrary norms
      • e.g., $\ell_p$-norms: $\|\theta\|_p = \big(\sum_{m=1}^{M} \theta_m^{\,p}\big)^{1/p}$
      • the 1-norm leads to sparsity
  • Computation of weights?
    • Model
      • Kernel: $k = \sum_{m=1}^{M} \theta_m k_m$
    • Mathematical program (schematically): minimize the regularized empirical risk jointly over predictor and weights,
      $$\min_{\theta \geq 0,\; \|\theta\|_p \leq 1}\;\; \min_{w,\,b}\;\; \frac{1}{2}\sum_{m=1}^{M} \frac{\|w_m\|^2}{\theta_m} \;+\; C\sum_{i=1}^{n} \ell\Big(\sum_{m=1}^{M} \langle w_m, \phi_m(x_i)\rangle + b,\; y_i\Big)$$

Convex problem. Optimization over the weights and the predictor.
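For fixed $w$, the inner minimization over the weights admits a closed-form solution; schematically (cf. Kloft et al., JMLR 2011, for the precise statement):

```latex
\theta_m \;=\; \frac{\|w_m\|_2^{2/(p+1)}}{\Big(\sum_{m'=1}^{M} \|w_{m'}\|_2^{2p/(p+1)}\Big)^{1/p}},
\qquad m = 1, \dots, M,
```

so that $\|\theta\|_p = 1$; as $p \to 1$ the weights concentrate on the kernels with the largest $\|w_m\|$ (sparsity), while as $p \to \infty$ they become uniform.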

Theoretical Analysis
  • Theoretical foundations
    • Active research topic
      • NIPS workshop 2009
    • We show:
      • Theorem (Kloft & Blanchard). The local Rademacher complexity of MKL is bounded in terms of the truncated eigenvalue spectra of the kernels.
  • Corollaries (learning bounds)
    • Bound:
      • depends on the kernel spectra
      • best known rate
  • (Cortes, Mohri, Rostamizadeh, ICML 2010)
  • (Kloft & Blanchard, NIPS 2011, JMLR 2012; Kloft, Rückert, Bartlett, ECML 2010)
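The exact bounds are in the cited papers; as a rough summary of the resulting rates (constants and lower-order terms omitted, with $1/p + 1/p^{*} = 1$):

```latex
\underbrace{O\!\Big(M^{1/p^{*}}\sqrt{\tfrac{\log M}{n}}\Big)}_{\text{global analysis}}
\qquad\longrightarrow\qquad
\underbrace{O\!\Big(\tfrac{M}{n}\Big)}_{\text{local analysis: fast rate under eigenvalue decay}}
```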
Proof (Sketch)
  • Relate the original class to the centered class
  • Bound the complexity of the centered class
    • via the Khintchine-Kahane and Rosenthal inequalities
  • Bound the complexity of the original class
  • Relate the bound to the truncation of the spectra of the kernels
Optimization
  • (Kloft et al., JMLR 2011)
  • Implementation
    • in C++ ("SHOGUN Toolbox")
      • Matlab/Octave/Python/R support
    • Runtime: ~1-2 orders of magnitude faster than earlier MKL solvers
  • Algorithms
    • Newton method
    • sequential, quadratically constrained programming with level set projections
    • block-coordinate descent algorithm (convergence proved); sketch:
      • alternate
        • solve (P) w.r.t. w (a standard SVM problem)
        • solve (P) w.r.t. θ (analytical closed-form update)
      • until convergence
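A minimal Python sketch of this alternating scheme (not the SHOGUN implementation), assuming precomputed Gram matrices and borrowing scikit-learn's SVC as the inner SVM solver; the θ-update follows the closed form above:

```python
import numpy as np
from sklearn.svm import SVC

def lp_mkl(kernels, y, p=2.0, C=1.0, n_iter=20):
    """Block-coordinate descent sketch: alternate an SVM solve (w-step)
    with the analytical update of the kernel weights (theta-step)."""
    M = len(kernels)
    theta = np.full(M, M ** (-1.0 / p))  # uniform start with ||theta||_p = 1
    for _ in range(n_iter):
        K = sum(t * Km for t, Km in zip(theta, kernels))
        svm = SVC(kernel="precomputed", C=C).fit(K, y)
        a = np.zeros(len(y))
        a[svm.support_] = svm.dual_coef_.ravel()  # signed dual coefficients
        # ||w_m||^2 = theta_m^2 * a^T K_m a for each base kernel
        norms_sq = np.array([t**2 * (a @ Km @ a) for t, Km in zip(theta, kernels)])
        theta = norms_sq ** (1.0 / (p + 1))       # closed-form theta update ...
        theta /= np.linalg.norm(theta, ord=p)     # ... normalized to ||theta||_p = 1
    return theta, svm
```

Usage: pass a list of n-by-n Gram matrices and binary labels; for p close to 1 the returned weights become sparse, while large p drives them toward uniform.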

Toy Experiment
  • Findings
    • the choice of p is crucial
    • the optimal p depends on the true sparsity of the problem
      • the SVM fails in the sparse scenarios
      • 1-norm MKL is best in the sparsest scenario only
      • p-norm MKL proves robust in all scenarios
  • Bounds
    • can be minimized w.r.t. p
    • the bounds reflect the empirical results well
  • Design
    • two 50-dimensional Gaussians
      • mean μ1 := binary vector; zero-mean features are irrelevant
      • 50 training examples
      • one linear kernel per feature
    • six scenarios
      • vary the % of irrelevant features
      • variance chosen so that the Bayes error stays constant
  • Results
    • [Figure: test error (0-50%) as a function of sparsity (0%, 44%, 64%, 82%, 92%, 98% irrelevant features)]
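A hypothetical re-creation of this toy design (the exact means and the per-scenario variance scaling used in the talk are not specified here):

```python
import numpy as np

def make_toy(n=50, d=50, n_relevant=10, sep=1.0, seed=0):
    """Two d-dimensional Gaussians whose means differ only in the first
    n_relevant features; the remaining zero-mean features are irrelevant."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(d)
    mu[:n_relevant] = sep                          # binary mean vector
    y = rng.choice([-1, 1], size=n)
    X = rng.normal(size=(n, d)) + np.outer(y, mu)  # class-conditional means +/- mu
    kernels = [np.outer(X[:, j], X[:, j]) for j in range(d)]  # one linear kernel per feature
    return X, y, kernels

# Varying n_relevant across runs varies the % of irrelevant features
X, y, kernels = make_toy(n_relevant=28)            # ~44% irrelevant features
```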

Applications
  • Lesson learned:
    • the optimality of a method depends on the true underlying sparsity of the problem
  • Applications studied:
    • Computer vision
      • visual object recognition
    • Bioinformatics
      • subcellular localization
      • protein fold prediction
      • DNA transcription start site detection
      • metabolic network reconstruction

[Diagram: the studied applications, grouped around the theme of non-sparsity]

Application Domain: Computer Vision
  • Visual object recognition
    • Aim: annotation of visual media (e.g., images)
    • Motivation:
      • content-based image retrieval
  • [Example images: aeroplane, bicycle, bird]
Application Domain: Computer Vision
  • Multiple kernels, based on
    • color histograms
    • shapes (gradients)
    • local features (SIFT words)
    • spatial features
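As an illustration of one such base kernel (the kernel function is an assumption for this example; the talk does not fix it here), a χ² kernel over color histograms:

```python
import numpy as np

def chi2_kernel(H, width=1.0):
    """k(h, h') = exp(-(1/width) * sum_j (h_j - h'_j)^2 / (h_j + h'_j))
    between all pairs of rows of the histogram matrix H."""
    eps = 1e-12  # guard against empty bins
    diff = (H[:, None, :] - H[None, :, :]) ** 2
    dist = (diff / (H[:, None, :] + H[None, :, :] + eps)).sum(-1)
    return np.exp(-dist / width)

rng = np.random.default_rng(0)
H = rng.random((5, 16))                # 5 images, 16-bin color histograms
H /= H.sum(axis=1, keepdims=True)      # normalize each histogram
K_color = chi2_kernel(H)
```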
Application Domain: Computer Vision
  • Datasets
    • 1. VOC 2009 challenge
      • 7054 train / 6925 test images
      • 20 object categories
          • aeroplane, bicycle,…
    • 2. ImageCLEF2010 Challenge
      • 8000 train / 10000 test images
          • taken from Flickr
      • 93 concept categories
          • partylife, architecture, skateboard, …
  • 32 kernels
    • 4 base types, varied over color channel combinations and spatial tilings (levels of a spatial pyramid)
    • one one-vs.-rest classifier for each visual concept
Application Domain: Computer Vision
  • Challenge results
    • Employed our approach for ImageCLEF2011 Photo Annotation challenge
      • achieved the winning entries in 3 categories
  • Preliminary results:
    • using the BoW-S kernels alone gives worse results
      • → BoW-S alone is not sufficient

  • (Binder, Kloft, et al., 2010, 2011)
Application Domain: Computer Vision
  • Experiment
    • Disagreement of single-kernel classifiers
    • BoW-S kernels induce more or less the same predictions
  • Why can MKL help?
    • some images are better captured by certain kernels
    • different images may be captured well by different kernels
Application Domain: Genetics
  • (Kloft et al., NIPS 2009, JMLR 2011)
  • Detection of transcription start sites
    • by means of kernels based on:
      • sequence alignments
      • distribution of nucleotides
        • downstream, upstream
      • folding properties
        • binding energies and angles
  • Empirical analysis
    • [Figure: detection accuracy (AUC) as a function of training set size]
    • higher accuracies than sparse MKL and ARTS (for small n)
      • ARTS = winner of an international comparison of 19 models (Abeel et al., 2009)
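For illustration, a simple k-mer (spectrum) representation of DNA sequences and the linear kernel it induces; this is a stand-in for sequence-based kernels in general, not the exact kernels used in the talk:

```python
import numpy as np
from itertools import product

def spectrum_features(seqs, k=3):
    """Count occurrences of every length-k substring over the DNA alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    X = np.zeros((len(seqs), len(kmers)))
    for r, s in enumerate(seqs):
        for j in range(len(s) - k + 1):
            X[r, index[s[j:j + k]]] += 1
    return X

seqs = ["ACGTACGTAC", "TTGACGTTGA"]
X = spectrum_features(seqs)
K_seq = X @ X.T  # linear kernel on k-mer counts
```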
Application Domain: Genetics
  • (Kloft et al., NIPS 2009, JMLR 2011)
  • Theoretical analysis
    • impact of the $\ell_p$-norm on the bound
    • confirms the experimental results:
      • stronger theoretical guarantees for the proposed approach (p > 1)
      • the empirically and theoretically optimal values of p approximately coincide
Application Domain: Pharmacology
  • Protein fold prediction
    • prediction of the fold class of a protein
      • the fold class is related to the protein's function
          • e.g., important for drug design
    • data set and kernels from Y. Ying
      • 27 fold classes
      • fixed train and test sets
      • 12 biologically inspired kernels
          • e.g., hydrophobicity, polarity, van der Waals volume
  • Results
    • accuracy:
    • 1-norm MKL and SVM on par
    • p-norm MKL performs best
      • 6% higher accuracy than the baselines
Further Applications
  • Computer vision
    • visual object recognition
  • Bioinformatics
    • subcellular localization
    • protein fold prediction
    • DNA splice site detection
    • metabolic network reconstruction

[Diagram: the studied applications, grouped around the theme of non-sparsity]

Conclusion: Non-sparse Multiple Kernel Learning
  • Applications
    • Visual object recognition: winner of the ImageCLEF 2011 Photo Annotation challenge
    • Computational biology: gene detector competitive with the winner of an international comparison
  • Scalability: training with >100,000 data points and >1,000 kernels
  • Theory: sharp learning bounds


Thank you for your attention

  • I will be happy to answer any additional questions
References
  • Abeel, Van de Peer, Saeys (2009). Toward a gold standard for promoter prediction evaluation. Bioinformatics.
  • Bach (2008). Consistency of the Group Lasso and Multiple Kernel Learning. Journal of Machine Learning Research (JMLR).
  • Kloft, Brefeld, Laskov, Sonnenburg (2008). Non-sparse Multiple Kernel Learning. NIPS Workshop on Kernel Learning.
  • Kloft, Brefeld, Sonnenburg, Laskov, Müller, Zien (2009). Efficient and Accurate Lp-norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2009).
  • Kloft, Rückert, Bartlett (2010). A Unifying View of Multiple Kernel Learning. ECML.
  • Kloft, Blanchard (2011). The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2011).
  • Kloft, Brefeld, Sonnenburg, Zien (2011). Lp-Norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 12(Mar):953-997.
  • Kloft, Blanchard (2012). On the Convergence Rate of Lp-norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 13(Aug):2465-2502.
  • Lanckriet, Cristianini, Bartlett, El Ghaoui, Jordan (2004). Learning the Kernel Matrix with Semidefinite Programming. Journal of Machine Learning Research (JMLR).