(Final Version) Simultaneous Image Classification and Annotation Chong Wang, David Blei, Li Fei-Fei Computer Science Department Princeton University Published in CVPR 2009 Presented by Eric Wang 7-3-09
Outline • Introduction • Review of sLDA and Corr-LDA • Model description • Model Inference and Parameter Estimation • Empirical Results • Conclusion
Introduction • Image classification refers to assigning to each image a class label that globally describes the image. • Image annotation refers to assigning words that describe individual regions of the image. • The images considered in this paper are both classified with a class label and annotated with free text. • This paper combines the basic framework of Corr-LDA with a heavily modified version of supervised LDA (sLDA) to yield a model that simultaneously classifies images and annotates their individual regions.
Review of sLDA • The generative process below is repeated for each document. • In this model, the response variable y is a continuous random variable. • The regression parameters η and σ² are treated as unknown constants to be estimated, rather than as random variables. Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.
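A minimal sketch of the sLDA generative process for one document, with η denoting the regression coefficients and σ² the response variance:

```latex
% sLDA generative process for one document (sketch)
\begin{align*}
\theta &\sim \mathrm{Dirichlet}(\alpha) \\
z_n \mid \theta &\sim \mathrm{Mult}(\theta), \qquad n = 1,\dots,N \\
w_n \mid z_n &\sim \mathrm{Mult}(\beta_{z_n}) \\
y \mid z_{1:N} &\sim \mathcal{N}\!\left(\eta^{\top}\bar{z},\ \sigma^{2}\right),
  \qquad \bar{z} = \tfrac{1}{N}\textstyle\sum_{n=1}^{N} z_n
\end{align*}
```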
Review of sLDA • An application of sLDA considered by Blei et al. was regressing a corpus of textual movie reviews onto the number of stars each review gives. Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.
Review of Corr-LDA • The generative process below is repeated for each image/caption pair. • Corr-LDA is a simple extension of LDA to annotated images: each caption word is attached to one of the image regions, so the annotations describe specific regions rather than the image as a whole. Source: D. M. Blei and M. I. Jordan. Modeling annotated data. In SIGIR, 2003.
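A sketch of the Corr-LDA generative process for an image with N region descriptors r_{1:N} and M annotation words w_{1:M}; the notation below is assumed for illustration (in this paper the region likelihood is a multinomial over codewords):

```latex
% Corr-LDA generative process for one image/caption pair (sketch)
\begin{align*}
\theta &\sim \mathrm{Dirichlet}(\alpha) \\
z_n \mid \theta &\sim \mathrm{Mult}(\theta), \qquad
  r_n \mid z_n \sim p(r_n \mid z_n, \pi), \qquad n = 1,\dots,N \\
y_m &\sim \mathrm{Uniform}(1,\dots,N), \qquad
  w_m \mid y_m, z_{1:N} \sim \mathrm{Mult}(\beta_{z_{y_m}}), \qquad m = 1,\dots,M
\end{align*}
```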
Annotated sLDA Model • This step is identical to LDA: the topic proportions are drawn once per document. • In this paper, the Dirichlet hyperparameter is not optimized.
Annotated sLDA Model • A region is characterized by one of 240 codewords (quantized from 128-dimensional SIFT features). • Regions are found by segmenting images with the normalized-cuts (N-cuts) algorithm. • Each topic parameter defines a multinomial distribution over the quantized codewords.
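As a rough illustration of this preprocessing (segmentation and SIFT extraction are assumed to be done elsewhere; the function and variable names below are hypothetical):

```python
# Sketch: build a 240-word codebook from pooled SIFT descriptors via k-means,
# then quantize each segmented region to its nearest codewords.
from sklearn.cluster import KMeans

def build_codebook(all_sift_descriptors, n_codewords=240, seed=0):
    """Cluster 128-d SIFT descriptors (one row each) into a fixed codebook."""
    kmeans = KMeans(n_clusters=n_codewords, random_state=seed, n_init=10)
    kmeans.fit(all_sift_descriptors)
    return kmeans

def quantize_region(codebook, region_descriptors):
    """Map one region's SIFT descriptors to codeword indices."""
    return codebook.predict(region_descriptors)
```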
Annotated sLDA Model • The class label c depends only on the topic indicators z_{1:N}, through a modified sLDA framework (the softmax form is written out below). • The total number of classes is known a priori, and the class labels are treated separately from the annotations. This is simpler than the approach of L.-J. Li, R. Socher and L. Fei-Fei in that there is no "switch" variable deciding whether a word is an annotation or a label. • The softmax function is well studied and is also known as multinomial logistic regression.
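Concretely, the class label is drawn from a softmax over the empirical topic frequencies; the form below follows the standard multi-class sLDA construction, with η_c a class-specific coefficient vector:

```latex
% Softmax (multinomial logistic) class distribution given the topic indicators
\begin{equation*}
p(c \mid z_{1:N}, \eta)
  = \frac{\exp\!\left(\eta_c^{\top}\bar{z}\right)}
         {\sum_{l=1}^{C} \exp\!\left(\eta_l^{\top}\bar{z}\right)},
\qquad \bar{z} = \frac{1}{N}\sum_{n=1}^{N} z_n .
\end{equation*}
```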
Annotated sLDA Model • The annotations are assigned to specific regions in the same manner as in Corr-LDA. • This will, for example, encourage words such as "blue" and "white" to be associated with regions (and thus codewords) that capture sky. • Though not shown explicitly in the graphical model, the codeword-topic and annotation-topic multinomials have symmetric Dirichlet priors.
Inference of Latent Variables These updates are identical to those used in Corr-LDA. • Let φ_n parameterize a multinomial over the K topics for region n. • Let γ parameterize a Dirichlet over the topic proportions. • Let λ_m parameterize a multinomial over the image regions for annotation word m. • These updates are local to each document (hence the omission of the document index d).
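With these definitions, the mean-field variational family for one document factorizes as below (a sketch in the usual Corr-LDA style; the paper's notation may differ slightly):

```latex
% Fully factorized variational distribution for one document (sketch)
\begin{equation*}
q(\theta, z_{1:N}, y_{1:M})
  = q(\theta \mid \gamma)\,
    \prod_{n=1}^{N} q(z_n \mid \phi_n)\,
    \prod_{m=1}^{M} q(y_m \mid \lambda_m).
\end{equation*}
```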
Inference of Latent Variables • This equation updates the variational posterior over the topic assignments. • Note that this update depends on both the class label c and the annotation words w_{1:M}.
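Schematically, and omitting the variational bound needed to handle the softmax normalizer, the region-level topic update combines terms from the topic prior, the codeword likelihood, the class label, and the annotations; this is only meant to show which terms enter, not the paper's exact expression:

```latex
% Schematic form of the update for phi_{nk} (not the exact expression)
\begin{equation*}
\phi_{nk} \;\propto\; \exp\!\Big(
  \mathbb{E}_q[\log \theta_k]
  + \log \pi_{k, r_n}
  + \tfrac{1}{N}\,\eta_{c,k}
  + \sum_{m=1}^{M} \lambda_{mn} \log \beta_{k, w_m}
  \;-\; \text{(terms from the softmax normalizer)}
\Big).
\end{equation*}
```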
Inference of Latent Variables: Parameter Estimation • The update for codeword f in codeword topic i is available in closed form, up to a normalizing constant. • The softmax parameters η have no closed-form update and are optimized via conjugate gradient.
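Up to normalization, the M-step for the codeword-topic multinomials takes the usual LDA-style form; a sketch assuming φ are the region-level variational topic posteriors and r_{dn} is the codeword of region n in document d:

```latex
% M-step update for codeword topics (sketch, up to normalization)
\begin{equation*}
\hat{\pi}_{if} \;\propto\; \sum_{d=1}^{D} \sum_{n=1}^{N_d}
  \phi_{dni}\, \mathbf{1}\!\left[r_{dn} = f\right].
\end{equation*}
```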
Inference of Latent Variables: Parameter Estimation • The update for annotation word w in annotation topic i is likewise available in closed form, up to a normalizing constant. • As above, the softmax parameters η are optimized via conjugate gradient.
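The slides only state that η is fit with conjugate gradient; below is a generic sketch of that step using SciPy, where `neg_elbo_eta` and its gradient are hypothetical placeholders for the η-dependent part of the variational bound:

```python
# Generic sketch: fit the softmax parameters eta by conjugate gradient.
# `neg_elbo_eta` and `neg_elbo_eta_grad` are hypothetical callables giving
# the eta-dependent negative ELBO terms and their gradient.
from scipy.optimize import minimize

def fit_eta(eta_init, neg_elbo_eta, neg_elbo_eta_grad):
    """Minimize the negative ELBO with respect to the flattened eta vector."""
    result = minimize(
        fun=neg_elbo_eta,          # scalar objective in eta
        x0=eta_init.ravel(),       # start from the current estimate
        jac=neg_elbo_eta_grad,     # analytic gradient
        method="CG",               # nonlinear conjugate gradient
    )
    return result.x.reshape(eta_init.shape)
```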
Empirical Results • LabelMe dataset • 8 classes: "highway," "inside city," "tall building," "street," "forest," "coast," "mountain," and "open country." • 200 256x256 training images per class. • UIUC dataset • 8 types of sports: "badminton," "bocce," "croquet," "polo," "rock climbing," "rowing," "sailing" and "snowboarding." • 1792 256x256 training images. • A 240-codeword dictionary is used. • Annotations that appeared fewer than 3 times were removed.
Empirical Results: Classification • The black line represents the performance of Bosch et al. 2006, which applies a non-annotated LDA to the image regions and a KNN classifier to the resulting representations. • The blue line is the performance of Fei-Fei and Perona 2005, which uses labeled but unannotated images. • The models presented in this paper are much more resistant to overfitting than those of Bosch et al. and Fei-Fei and Perona.
Empirical Results: Classification • Confusion matrices comparing multi-class sLDA with annotations and multi-class sLDA, both with 100 topics. • Annotations seem to improve performance slightly, although, as the previous slide shows, the main benefit is more consistent performance as a function of the number of topics.
Empirical Results: Annotation • The F-measure is used as the score. • Results are reported over all numbers of topics considered above. • LabelMe: • 38.2% (Corr-LDA) • 38.7% (multi-class sLDA with annotations) • UIUC-Sport: • 34.7% (Corr-LDA) • 35.0% (multi-class sLDA with annotations).
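For reference, a minimal sketch of the standard per-image F-measure for annotation, computed from predicted and ground-truth word sets (the paper's exact evaluation protocol may differ):

```python
# Standard F-measure for predicted vs. ground-truth annotation words.
# `predicted` and `truth` are sets of annotation words for one image.
def f_measure(predicted, truth):
    if not predicted or not truth:
        return 0.0
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)

# Example: two of the three predicted words are correct.
print(f_measure({"sky", "tree", "car"}, {"sky", "tree", "road", "building"}))
```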
Conclusion • Combining image annotation with classification provides state-of-the-art image classification performance. • However, adding the classification framework provides only a small improvement to annotation performance. • The authors' primary contribution is showing that image classification and annotation are related and can be performed simultaneously in the same framework. • Inference is performed with variational EM.