Unsupervised Foreground Detection and Category Learning for Image Clustering

Foreground Focus: Finding Meaningful Features in Unlabeled Images
Yong Jae Lee and Kristen Grauman, University of Texas at Austin

Supervised learning methods yield good recognition performance in practice. But…
• Supervision is expensive
  • Collect training examples, perform labeling, segmentation, etc.
• Supervision has bias
  • Variability of the target data may not be captured (i.e., the training data is not general enough)
We propose an unsupervised foreground detection and category learning method based on image clustering.

Related Work
• Unsupervised Category Discovery
  • Topic models (pLSA, LDA) - Fergus et al., Sivic et al., Quelhas et al., ICCV 2005; Fei-Fei & Perona, CVPR 2005; Liu & Chen, ICCV 2007
  • Image clustering - Grauman & Darrell, CVPR 2006; Dueck & Frey, ICCV 2007
  • Image clustering with localization - Kim et al., CVPR 2008
• Supervised Feature Selection / Part Discovery
  • Discriminative feature selection - Dorko & Schmid, ICCV 2003; Quack et al., ICCV 2007
  • Weakly supervised learning - Weber et al., ECCV 2000; Fergus et al., CVPR 2003; Chum & Zisserman, CVPR 2007…
  • Query expansion - Chum et al., ICCV 2007

Problem [Figure: clusters formed from full image matches]

Mutual Relationship between Foreground Features and Clusters • If we have only foreground features, we can form good clusters… [Figure: clusters formed from foreground matches, compared with clusters formed from full image matches]

Mutual Relationship between Foreground Features and Clusters • If we have good clusters, we can detect the foreground…

Mutual Relationship between Foreground Features and Clusters • If we have good clusters, we can detect the foreground… • If we have only foreground features, we can form good clusters…

Our Approach • Unsupervised task that iteratively seeks the mutual support between discovered objects and their defining features • Refine feature weights given current clusters • Update clusters based on weighted semi-local feature matches [Plot: feature weights vs. feature index]

Sets of local features:
X = {(f1(X),w1),(f2(X),w2),…,(fn(X),wn)}
Y = {(f1(Y),w1),(f2(Y),w2),…,(fm(Y),wm)}

Optimal Partial Matching
X = {(f1(X),w1),(f2(X),w2),…,(fn(X),wn)}
Y = {(f1(Y),w1),(f2(Y),w2),…,(fm(Y),wm)}
Earth Mover’s Distance [Rubner et al., IJCV 2000], defined in terms of:
• fi(X), fj(Y): features from sets X and Y
• D(fi(X), fj(Y)): distance between the descriptors
• the flow: scalars giving the amount of weight mapped from fi(X) to fj(Y)
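For reference, the Earth Mover's Distance of Rubner et al. is commonly written as below. The flow symbol F_ij and the superscripts that separate the weights of X from those of Y are notation assumed here; the exact form shown on the slide may differ.

```latex
\[
\mathrm{EMD}(X, Y) =
  \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij}\, D\big(f_i(X), f_j(Y)\big)}
       {\sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij}}
\]
where the flow $F$ minimizes $\sum_{i,j} F_{ij}\, D\big(f_i(X), f_j(Y)\big)$ subject to
\[
F_{ij} \ge 0, \qquad
\sum_{j=1}^{m} F_{ij} \le w_i^{X}, \qquad
\sum_{i=1}^{n} F_{ij} \le w_j^{Y}, \qquad
\sum_{i=1}^{n} \sum_{j=1}^{m} F_{ij} = \min\Big(\sum_{i=1}^{n} w_i^{X},\; \sum_{j=1}^{m} w_j^{Y}\Big).
\]
```

The last constraint is what makes the match partial: only as much weight as the lighter set holds has to be moved.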

Feature Contribution to Match
[Figure: features f1(X), f2(X), f3(X) of set X matched to features f1(Y), f2(Y) of set Y; each match has distance D(fi(X), fj(Y))]
Weight computation is influenced by both the flow (amount of mass transferred) and the distance between the matching features: Contribution = weight / distance
[Plot: contribution to match vs. feature index]
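The rule "Contribution = weight / distance" can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the matrix layout (rows index features of X, columns index features of Y), and the small `eps` that guards against zero distances are all assumptions made here.

```python
def match_contributions(flow, dist, eps=1e-9):
    """Per-feature contribution of each feature in X to its match with Y.

    flow[i][j]: weight moved from feature i of X to feature j of Y (assumed given).
    dist[i][j]: descriptor distance between those two features.
    eps: hypothetical guard against division by zero, not from the slides.
    """
    contributions = []
    for flow_row, dist_row in zip(flow, dist):
        total = 0.0
        for moved, d in zip(flow_row, dist_row):
            if moved > 0:
                # More mass moved over a shorter distance means a larger contribution.
                total += moved / (d + eps)
        contributions.append(total)
    return contributions


flow = [[0.5, 0.0],
        [0.0, 0.25]]
dist = [[1.0, 2.0],
        [3.0, 0.5]]
contributions = match_contributions(flow, dist)
```

In this toy case the second feature moves half as much weight as the first but over half the distance, so both features contribute about equally.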

Computing Feature Weights [Plot: contribution to match vs. feature index]

Computing Feature Weights [Plot: new feature weights vs. feature index]
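One plausible way to turn match contributions into new feature weights is sketched below. The slides show only the before and after plots, so the aggregation rule is an assumption: contributions are summed over the matches to every other image in the same cluster and then normalized to sum to one. The function name and the uniform fallback are hypothetical.

```python
def update_feature_weights(contributions_per_match):
    """Combine one image's per-feature contributions into new weights.

    contributions_per_match[k][i]: contribution of feature i in the match
    against the k-th cluster-mate (assumed layout).
    """
    num_features = len(contributions_per_match[0])
    totals = [0.0] * num_features
    for contributions in contributions_per_match:
        for i, value in enumerate(contributions):
            totals[i] += value
    norm = sum(totals)
    if norm == 0:
        # Hypothetical fallback: no feature matched anything, keep weights uniform.
        return [1.0 / num_features] * num_features
    return [t / norm for t in totals]


new_weights = update_feature_weights([[0.5, 0.5],
                                      [1.5, 0.5]])
```

Features that match well across many cluster-mates accumulate weight, which is the sense in which good clusters reveal the foreground.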

Computing Image Similarity: Matching features have high weights and high similarity → high contribution to match score [Figure: feature weights of the two images]

Computing Image Similarity: Matching features have low weights and low similarity → low (negligible) contribution to match score [Figure: feature weights of the two images]

Computing Image Similarity: One matching feature has a low weight and the other a high weight, with high similarity. The amount of weight that is matched is always the smaller of the two feature weights → low contribution to match score [Figure: feature weights of the two images]
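The three cases can be checked with a toy calculation. The only part taken from the slides is that the matched weight is the smaller of the two feature weights; scoring a pair as matched weight times a similarity in [0, 1] is an illustrative assumption, as are the function name and the numbers.

```python
def pair_contribution(weight_x, weight_y, similarity):
    """Toy score for one matched feature pair.

    The matched mass is capped by the smaller weight (from the slides);
    multiplying by similarity is an assumption made for illustration.
    """
    matched = min(weight_x, weight_y)
    return matched * similarity


high_high = pair_contribution(0.9, 0.8, similarity=0.9)  # high weights, high similarity
low_low = pair_contribution(0.1, 0.1, similarity=0.1)    # low weights, low similarity
low_high = pair_contribution(0.1, 0.9, similarity=0.9)   # mixed weights, high similarity
```

Under these assumptions the mixed case scores far below the high-weight case even though the descriptors are equally similar, because the low-weight feature limits how much mass can be matched.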

Forming Clusters

Forming Clusters: Compute Pair-wise Partial Matching Image Similarities

Forming Clusters: Normalized Cuts Clustering

Mutual Relationship between Foreground Features and Clusters • If we have good clusters, we can detect the foreground… • If we have only foreground features, we can form good clusters… • Now we have the pieces to do both…

Cluster and Feature Weight Refinement: Iteration 1 • Images as Local Feature Sets → Pair-wise Partial Matching → Normalized Cuts Clustering → Initial Set of Clusters [Plot: feature weights vs. feature index]

Cluster and Feature Weight Refinement: Iteration 1 • Compute Feature Weights → New Feature Weights [Plot: feature weights vs. feature index]

Cluster and Feature Weight Refinement: Iteration 2 • Images as Local Feature Sets w/ New Weights → Pair-wise Partial Matching (Noticeable Change in Matching) → Normalized Cuts Clustering [Plot: feature weights vs. feature index]

Cluster and Feature Weight Refinement: Iteration 2 • New Set of Clusters → Compute Feature Weights → New Feature Weights [Plot: feature weights vs. feature index]

Cluster and Feature Weight Refinement: Iteration 3 • Pair-wise Partial Matching + Normalized Cuts → Final Set of Clusters → New Feature Weights [Plot: feature weights vs. feature index]
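The alternation walked through in these iterations can be summarized as a short skeleton. It is a sketch under stated assumptions rather than the authors' code: the three callables stand in for partial matching, Normalized Cuts, and the weight update and must be supplied by the caller; all names are hypothetical; starting from uniform weights and running a fixed three iterations are assumptions.

```python
def refine_clusters_and_weights(feature_sets, num_clusters, similarity_fn,
                                cluster_fn, reweight_fn, num_iterations=3):
    """Alternate between clustering images and re-weighting their features.

    similarity_fn(feats_a, weights_a, feats_b, weights_b) -> float
    cluster_fn(similarity_matrix, num_clusters) -> list of cluster labels
    reweight_fn(feature_sets, weights, labels) -> new per-image weight lists
    Each feature set is assumed to be non-empty.
    """
    # Assumed initialization: every feature in an image starts equally weighted.
    weights = [[1.0 / len(fs)] * len(fs) for fs in feature_sets]
    labels = None
    n = len(feature_sets)
    for _ in range(num_iterations):
        # Pair-wise weighted matching between all images.
        similarity = [[similarity_fn(feature_sets[a], weights[a],
                                     feature_sets[b], weights[b])
                       for b in range(n)] for a in range(n)]
        # Update clusters from the current similarities.
        labels = cluster_fn(similarity, num_clusters)
        # Refine feature weights given the current clusters.
        weights = reweight_fn(feature_sets, weights, labels)
    return labels, weights
```

Passing the components in as functions keeps the skeleton independent of any particular matching or clustering library.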

Local features may not produce good matches… • Local features: Lazebnik et al., BMVC 2004; Sivic & Zisserman, CVPR 2004; Agarwal & Triggs, ECCV 2006; Pantofaru et al., Beyond Patches Wkshp 2006; Quack et al., ICCV 2007 • Semi-local features: our proximity distribution descriptor

Experiments
• Goals:
  • Unsupervised foreground discovery
  • Unsupervised category discovery
  • Comparison with related methods
• Datasets: Caltech-101, Microsoft Research Cambridge, Caltech-4
• Semi-local features: densely sampled SIFT, DoG SIFT, Hessian-Affine SIFT
• Number of clusters: the number of classes

Quality of Foreground Detection • Object categories with the highest clutter were chosen • Two supervised classifiers were built: 1) trained on all features, 2) trained on foreground features only • Categories were ranked by how much segmentation helped supervised classification

Quality of Foreground Detection: 10-class subset, highly weighted features [Figure]

Quality of Clusters Formed • Cluster quality for the 4-class and 10-class subsets of Caltech-101 • Quality measure: F-measure • Black dotted lines indicate the best possible quality that could be obtained if the ground truth segmentation were known
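The slide does not say which variant of the F-measure is used. The sketch below shows one common clustering F-measure, given here as an assumption: each class is paired with the cluster that gives it the best F-score, and the per-class scores are averaged weighted by class size. The function name is hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Class-size-weighted best-match F-measure (one common definition)."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    overlap = Counter(zip(true_labels, cluster_labels))
    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            common = overlap[(cls, clu)]
            if common == 0:
                continue
            precision = common / clu_size
            recall = common / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (cls_size / n) * best
    return total


perfect = clustering_f_measure(["a", "a", "b", "b"], [0, 0, 1, 1])
merged = clustering_f_measure(["a", "a", "b", "b"], [0, 0, 0, 0])
```

A clustering that recovers the classes exactly scores 1, while lumping both classes into a single cluster lowers precision and therefore the score.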

Comparison with clustering methods
• Affinity Propagation: message-passing algorithm that identifies good exemplars by propagating non-metric affinities [Dueck & Frey, ICCV 2007]
• Partial Match Clusters: forms groups with partial-match spectral clustering but does not iteratively improve foreground feature weights and cluster assignments [Grauman & Darrell, CVPR 2006]
• Datasets: Caltech-101 subsets, 7-class (N=441) and 20-class (N=1230); Caltech-4 dataset (N=3188), 10 runs with 400 randomly selected images

Comparison with topic models
• Comparison of accuracy of foreground discovery
• Positive class: Caltech motorcycle class (826 images)
• Negative class: Caltech background class (900 images)
• Foreground detection rate: threshold varied among top 20% most confident features
[1] Correspondence-based pLSA variant [Liu & Chen, ICCV 2007]
[2] pLSA with spatial information [Liu & Chen, CVPR Workshop 2006]

Assumptions and Limitations • Support of the pattern among multiple examples in the dataset • Some support must be detected in the initial iteration • Background can be consistently reoccurring: introduce semi-supervision

Contributions • Unsupervised foreground feature selection from unlabeled images • Automatic object category learning • Mutual reinforcement of foreground and category discovery benefits both • Novel semi-local descriptor

Future Work • Incremental updates to unlabeled dataset • Extension to multi-label cluster assignments • Automatic Model Selection: k • Automatically construct summaries of unstructured image collections

Questions?

Quality of Foreground Detection and Clusters Formed • Microsoft Research Cambridge v1 (MSRC-v1) dataset

Proximity Distribution Descriptor [Figure: p is the base feature; ellipses denote features, their patterns indicate the visual word types, and numbers indicate rank order of spatial proximity to the base feature] Motivated by Proximity Distribution Kernels [Ling & Soatto, ICCV 2007]
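One plausible reading of the figure is sketched below: for a base feature, neighbors are ranked by spatial distance, and the descriptor records, for each rank r, how many of the r nearest neighbors carry each visual word. The cumulative layout, the (x, y, word) tuple format, and the function name are assumptions for illustration; the authors' exact descriptor may differ.

```python
import math


def proximity_distribution(features, base_index, num_words, max_rank):
    """Cumulative visual-word counts over the nearest neighbors of one feature.

    features: list of (x, y, word_id) tuples (assumed layout).
    Returns one row per rank r; row r-1 counts the words among the
    r spatially nearest neighbors of the base feature.
    """
    base_x, base_y, _ = features[base_index]
    neighbors = sorted(
        (f for i, f in enumerate(features) if i != base_index),
        key=lambda f: math.hypot(f[0] - base_x, f[1] - base_y),
    )
    descriptor = []
    counts = [0] * num_words
    for _, _, word in neighbors[:max_rank]:
        counts[word] += 1
        descriptor.append(list(counts))
    return descriptor


features = [(0, 0, 0), (1, 0, 1), (3, 0, 0), (2, 0, 1)]
descriptor = proximity_distribution(features, base_index=0, num_words=2, max_rank=3)
```

Because each row also encodes the words seen at closer ranks, two base features match well only when their surrounding neighborhoods agree, which is what makes the descriptor semi-local.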
