Transfer Learning Algorithms for Image Classification
Ariadna Quattoni, MIT CSAIL
Advisors: Michael Collins, Trevor Darrell

Motivation
Goal: build classifiers for thousands of visual categories; exploit rich and complex feature representations; leverage data from multiple related tasks.
We study efficient transfer algorithms for image classification that can exploit supervised training data from a set of related tasks, automatically derived from unlabeled images + meta-data.
A new model and optimization algorithm for training jointly sparse classifiers in high-dimensional feature spaces.
Large dataset of unlabeled images + meta-data.
[Ando & Zhang, JMLR 2005]
Task-specific parameters
A method for learning image representations from unlabeled images + meta-data.
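A minimal sketch of one way such a representation can be learned, in the spirit of the structural-learning approach of [Ando & Zhang, JMLR 2005]: train one linear predictor per auxiliary task derived from meta-data, stack the weight vectors, and take an SVD to extract a shared low-dimensional subspace. All names, shapes, and the random stand-in weights below are illustrative assumptions, not the poster's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_tasks, k = 100, 20, 5   # feature dim, number of auxiliary tasks, subspace dim

# Stand-in for per-task classifier weights learned from unlabeled
# images + meta-data (e.g. one-vs-all predictors of meta-data words).
W = rng.normal(size=(d, n_tasks))

# Shared structure: the top-k left singular vectors of the stacked weights.
U, _, _ = np.linalg.svd(W, full_matrices=False)
theta = U[:, :k]             # d x k projection onto the shared subspace

# Any image feature vector x can now be re-represented as theta.T @ x.
x = rng.normal(size=d)
z = theta.T @ x
```

The learned projection `theta` gives a compact representation that can be reused when training a classifier for a new category with little labeled data.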
Use data from related tasks to make learning easier. Two settings:
Resource: Large amounts of supervised data for a set of related tasks.
Goal: Improve performance on a target task for which training data is scarce.
Resource: Small amount of training data for a large number of related tasks.
Goal: Improve average performance over all classifiers.
[Thrun 1996, Baxter 1997, Caruana 1997, Argyriou 2006, Amit 2007]
A new model and optimization algorithm for training jointly sparse classifiers: we penalize solutions that utilize too many features.
Results on Collection D.
The L∞ norm on each row promotes non-sparsity within the row: a selected feature is shared across tasks.
An L1 norm on the maximum absolute values of the coefficients across tasks promotes sparsity: the classifiers jointly use few features.
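The L1/L∞ regularizer described above can be sketched in a few lines: for a coefficient matrix W (rows = features, columns = tasks), take the maximum absolute value in each row, then sum those maxima over rows. The example matrix is illustrative.

```python
import numpy as np

# Coefficient matrix: rows are features, columns are tasks (toy values).
W = np.array([
    [ 0.9, -0.8,  1.1],   # feature shared by all tasks
    [ 0.0,  0.0,  0.0],   # feature used by no task
    [ 0.0,  0.5,  0.0],   # feature used by a single task
])

row_max = np.abs(W).max(axis=1)   # L-infinity norm of each row
l1_linf = row_max.sum()           # L1 norm of the row maxima  -> ~1.6
```

Zeroing a whole row (feature) reduces the penalty, while spreading a feature across tasks costs no more than using it in one task: this is what encourages *joint* sparsity.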
With a hinge loss, the optimization problem can be expressed as a linear program.
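A toy sketch of that linear program, under assumed details (hinge loss, no bias term, tiny hand-made data; not the poster's exact formulation): introduce slacks xi for the hinge constraints and per-feature bounds t_j with -t_j <= w_kj <= t_j for every task k, then minimize sum(xi) + C * sum(t).

```python
import numpy as np
from scipy.optimize import linprog

d, C = 3, 1.0
# Two toy tasks; in both, only feature 0 is informative.
X = [np.array([[ 1., 0., 0.], [-1., 0., 1.]]),
     np.array([[ 2., 1., 0.], [-2., 0., 0.]])]
y = [np.array([1., -1.]), np.array([1., -1.])]
m = len(X)
n = sum(len(yk) for yk in y)

# Variable vector: [w (m*d), xi (n), t (d)].
nw, nxi, nt = m * d, n, d
c = np.concatenate([np.zeros(nw), np.ones(nxi), C * np.ones(nt)])

A, b = [], []
row_xi = 0
for k in range(m):
    for i in range(len(y[k])):
        # hinge: y * (w_k . x) >= 1 - xi   ->   -y * x . w_k - xi <= -1
        r = np.zeros(nw + nxi + nt)
        r[k*d:(k+1)*d] = -y[k][i] * X[k][i]
        r[nw + row_xi] = -1.0
        A.append(r); b.append(-1.0)
        row_xi += 1
for k in range(m):
    for j in range(d):
        for s in (1.0, -1.0):   # s * w_kj - t_j <= 0, i.e. |w_kj| <= t_j
            r = np.zeros(nw + nxi + nt)
            r[k*d + j] = s
            r[nw + nxi + j] = -1.0
            A.append(r); b.append(0.0)

bounds = [(None, None)] * nw + [(0, None)] * nxi + [(0, None)] * nt
res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds)
W = res.x[:nw].reshape(m, d)   # per-task weights; uninformative features ~ 0
```

On this toy data the solver drives the bounds t_1, t_2 (and hence the corresponding weights in every task) to zero, recovering the jointly sparse solution.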
Performance on predicting a held-out topic using the relevant features R only.
The formulation extends to arbitrary convex losses.
Synthetic experiment (4 features, 6 problems, C = 14): the entries of A.
Performance on predicting a held-out topic.
[Grauman and Darrell 2005, Nister and Stewenius 2006]
The differences are statistically significant.