Image Ranking and Retrieval based on Multi-Attribute Queries (CVPR 2011) • Behjat Siddiquie¹, Rogerio S. Feris², Larry S. Davis¹ • ¹University of Maryland, College Park ²IBM T. J. Watson Research Center
Outline • 1. Introduction • 2. Multi-Attribute Retrieval and Ranking • 3. Experiments and Results • 4. Conclusion
Outline • 1. Introduction
Introduction • Attributes are correlated: a person who has a mustache is almost certainly male, and a person who is Asian is unlikely to have blonde hair • A new framework for multi-attribute image retrieval and ranking, which retrieves images based not only on the attributes that are part of the query, but also on the remaining attributes within the vocabulary that could potentially provide information about the query
Introduction • There are three key contributions: • 1. Deals with ranking and retrieval within the same formulation • 2. Supports arbitrary multi-attribute queries, which is non-trivial, as the number of possible multi-label queries for a vocabulary of size L is 2^L • 3. Demonstrates that attributes within a single object category, and even across multiple object categories, are interdependent
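For concreteness, with the 27-attribute vocabulary used later for LFW, enumerating or training a separate model per query is clearly infeasible:

```latex
2^{L} \;=\; 2^{27} \;=\; 134{,}217{,}728 \text{ possible multi-attribute queries}
```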
Outline • 2. Multi-Attribute Retrieval and Ranking - 2.1. Retrieval - 2.2. Ranking
Retrieval • Given a set of labels X and a set of training images Y • Corresponding to each label xi (xi ∈ X), a mapping is learned to predict the set of images y (y ⊆ Y) that contain the label xi • Given a multi-attribute query Q, where Q ⊆ X, our goal is to retrieve the images from the set Y that are relevant to Q
Retrieval • The prediction function fw : Q → y returns the set y* that maximizes the score over the weight vector w • w is composed of two components • wa : for modeling the appearance of individual attributes • wp : for modeling the dependencies between them
Retrieval • φa(xi, yk) : the feature vector representing image yk for attribute xi • φp(xj, yk) : indicates the presence of attribute xj in image yk • wa · φa(xi, yk) : a standard linear model for recognizing attribute xi based on the feature representation φa(xi, yk) • wp · φp(xj, yk) : a potential function encoding the correlation between the pair of attributes xi and xj
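A plausible reconstruction of the overall score being maximized, assembled from the components above; the summation structure and the superscripts on wa and wp are assumptions rather than quotes from the paper:

```latex
f_w(Q) \;=\; \arg\max_{y \subseteq Y} \sum_{y_k \in y}
\Big[ \sum_{x_i \in Q} w_a^{i\,\top} \varphi_a(x_i, y_k)
\;+\; \sum_{x_i \in Q} \sum_{x_j \in X} w_p^{ij}\, \varphi_p(x_j, y_k) \Big]
```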
Retrieval • Train a model w which, given a multi-label query Q ⊆ X, can correctly predict the subset of images in a test set that contain all the labels in Q • C : a parameter controlling the trade-off between the training error and regularization • Qt : the t-th training query • ξt : the slack variable corresponding to query Qt • Δ( · , · ) : the loss function
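The slack variables and loss above fit a margin-rescaled structured SVM; a generic sketch of such an objective, in which the joint feature map Ψ and the exact constraint set are assumptions rather than the paper's formulation:

```latex
\min_{w,\ \xi \ge 0} \;\; \tfrac{1}{2}\lVert w \rVert^2 \;+\; C \sum_{t} \xi_t
\qquad \text{s.t.} \quad
w^\top \Psi(Q_t, y_t^{*}) \;-\; w^\top \Psi(Q_t, y) \;\ge\; \Delta(y_t^{*}, y) \;-\; \xi_t
\quad \forall t,\ \forall y \subseteq Y
```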
Retrieval • Δ(yt*, yt) : the loss, which makes it possible to optimize training error under different performance metrics
Ranking • The prediction function fw : Q → z returns a permutation z* of the set of images Y • Π(Y) is the set of all possible permutations of the set of images Y
Ranking • A(r) : any non-increasing function • r(zk) : the rank of image zk • If we care only about the ranks of the top K images, A(r) can be defined to be zero beyond rank K (one possible choice is sketched below)
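One natural truncated choice, assuming an NDCG-style logarithmic discount; the paper's exact definition of A(r) may differ:

```latex
A(r) \;=\;
\begin{cases}
\dfrac{1}{\log_2(1 + r)} & \text{if } r \le K,\\[6pt]
0 & \text{otherwise.}
\end{cases}
```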
Ranking • Given a query Q, we divide the training images into |Q| + 1 sets based on their relevance. The most relevant set consists of images that contain all the attributes in the query Q and is assigned a relevance rel(j) = |Q| • Example: for the query “Young Asian Woman Wearing Sunglasses” (|Q| = 4), rel(j) ranges from 0 to 4
Ranking • This ensures that, if there are no images containing all the query attributes, the images containing the largest number of them are ranked highest • While we assign equal weights to all the attributes, one could conceivably assign higher weights to attributes that are difficult to modify (race or gender) and lower weights to attributes that are easily changed (wearing sunglasses); a small sketch of this relevance assignment follows
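A minimal sketch of this relevance assignment, assuming binary per-image attribute annotations; the optional weights argument is hypothetical and not part of the paper's (equal-weight) formulation:

```python
def relevance(image_attrs, query, weights=None):
    """Relevance of an image for a multi-attribute query.

    image_attrs: set of attribute names present in the image
    query: set of queried attribute names
    weights: optional dict mapping attribute -> importance
             (defaults to 1 for every attribute, as on the slide)
    """
    if weights is None:
        weights = {a: 1.0 for a in query}
    # Each matched query attribute contributes its weight; with unit
    # weights this is simply |image_attrs ∩ query|, giving the
    # |Q| + 1 relevance levels (0 .. |Q|) described above.
    return sum(weights.get(a, 1.0) for a in query if a in image_attrs)


# Example: the 4-attribute query from the previous slide
query = {"Young", "Asian", "Woman", "Sunglasses"}
print(relevance({"Young", "Asian", "Woman"}, query))                 # 3.0
print(relevance({"Young", "Asian", "Woman", "Sunglasses"}, query))   # 4.0
```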
Ranking • A max-margin framework for training our ranking model: • Δ(z*, z) is a function denoting the loss incurred in predicting the permutation z instead of the correct permutation z*
Ranking • Δ(z*, z) = 1 − NDCG@100(z*, z) • The normalized discounted cumulative gain (NDCG) score is a standard measure for evaluating ranking algorithms • rel(j) : the relevance of the j-th ranked image • Z : a normalization constant ensuring that the correct ranking results in an NDCG score of 1
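A minimal sketch of this loss, assuming a linear gain and logarithmic discount; the gain variant and the function names are assumptions:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (ranks are 1-indexed)."""
    return sum(rel / math.log2(1 + j)
               for j, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(predicted_relevances, k):
    """NDCG@k: DCG of the predicted ranking divided by the best achievable DCG (Z)."""
    ideal = sorted(predicted_relevances, reverse=True)
    z = dcg_at_k(ideal, k)
    return dcg_at_k(predicted_relevances, k) / z if z > 0 else 0.0

def ranking_loss(predicted_relevances, k=100):
    """Training loss: 1 - NDCG@k of the predicted permutation."""
    return 1.0 - ndcg_at_k(predicted_relevances, k)

# Example: relevances of images in predicted rank order for a |Q| = 4 query
print(ranking_loss([4, 2, 3, 0, 1], k=100))
```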
Outline • 3. Experiments and Results - 3.1. Evaluation - 3.2. Labeled Faces in the Wild (LFW) - 3.3. FaceTracer Dataset - 3.4. PASCAL
Evaluation • Retrieval: - 1. Reverse Multi-Label Learning (RMLL) [19] - 2. TagProp [9] • [19] J. Petterson and T. S. Caetano. Reverse multi-label learning. NIPS, 2010. • [9] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid. Discriminative metric learning in nearest neighbor models for image auto-annotation. ICCV, 2009.
Evaluation • Ranking: - 1. rankSVM [12] - 2. rankBoost [7] - 3. Direct Optimization of Ranking Measures (DORM) [18] - 4. TagProp [9] • [12] T. Joachims. Optimizing search engines using clickthrough data. KDD, 2002. • [7] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. JMLR, 2003. • [18] Q. V. Le and A. J. Smola. Direct optimization of ranking measures. http://arxiv.org/abs/0704.3359, 2007. • [9] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid. Discriminative metric learning in nearest neighbor models for image auto-annotation. ICCV, 2009.
Evaluation • Datasets: - 1. Labeled Faces in the Wild (LFW) [11] - 2. FaceTracer [15] - 3. PASCAL VOC 2008 [4] • [11] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report, 2007. • [15] N. Kumar, P. Belhumeur, and S. Nayar. FaceTracer: A search engine for large collections of images with faces. ECCV, 2008. • [4] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2008 (VOC2008) Results.
Labeled Faces in the Wild (LFW) • A subset consisting of 9992 images from LFW was annotated with a set of 27 attributes (Table 1). We randomly chose 50% of these images for training and the remaining were used for testing
Labeled Faces in the Wild (LFW) • The attribute detector for hat or bald will give higher weights to features extracted from the topmost grids in the “horizontal parts” and “layout” spatial configurations
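A minimal sketch of why top-of-image features matter for such attributes, assuming a simple horizontal-strip histogram layout (the paper's actual descriptors and grid configurations differ):

```python
import numpy as np

def horizontal_strip_features(image, n_strips=4, n_bins=16):
    """Concatenate an intensity histogram per horizontal strip (top strip first)."""
    strips = np.array_split(image, n_strips, axis=0)
    feats = [np.histogram(s, bins=n_bins, range=(0, 255))[0] / s.size
             for s in strips]
    return np.concatenate(feats)

# A linear attribute model scores an image as w_a . features; for "hat"
# or "bald", the learned weights on the first n_bins dimensions
# (the topmost strip) would carry most of the mass.
image = np.random.randint(0, 256, size=(128, 128))
f = horizontal_strip_features(image)
print(f.shape)  # (64,) = 4 strips x 16 bins
```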
Attribute pair dependencies • Mutually exclusive: (White, Asian), (Eyeglasses, No-Eyewear), (Short-Hair, Long-Hair) • Rarely co-occur: (Kid, Beard), (Lipstick, Male) • Commonly co-occur: (Middle-aged, Eyeglasses), (Senior, Gray-Hair)
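Such dependencies can also be read off empirically from the annotations; a minimal sketch, assuming a binary image-by-attribute label matrix (this is an empirical proxy, not the paper's learned pairwise potentials):

```python
import numpy as np

def attribute_correlations(labels):
    """Pearson correlation between attribute columns of a binary label matrix.

    labels: (n_images, n_attributes) array of 0/1 annotations.
    Strongly negative entries suggest mutually exclusive attributes;
    strongly positive entries suggest commonly co-occurring ones.
    """
    return np.corrcoef(labels, rowvar=False)

# Toy example with three attributes: Male, Mustache, Lipstick
labels = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
])
corr = attribute_correlations(labels)
print(corr[0, 1])  # Male vs Mustache: positive
print(corr[0, 2])  # Male vs Lipstick: negative
```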
Ranking Performance on the FaceTracer Dataset • FaceTracer contains many more images of babies and small children than LFW
Outline • 4. Conclusion
Conclusion • Presented an approach for ranking and retrieval of images based on multi-attribute queries. We utilize a structured prediction framework to integrate ranking and retrieval within the same formulation • Furthermore, our approach models the correlations between different attributes, leading to improved ranking/retrieval performance
Future Work • Plan to explore image retrieval for more complex queries, such as scene descriptions consisting of the objects present, along with their attributes and the relationships among them