Building and Using a Semantivisual Image Hierarchy

Li-Jia Li, Yongwhan Lim, Li Fei-Fei, Chong Wang, David M. Blei
CVPR 2010

Outline
• Introduction
• Building the hierarchy
  • Graphical model
  • Learning
• Semantivisual image hierarchy
  • Implementation
  • Visualizing the semantivisual hierarchy
  • Quantitative evaluation
• Application
  • Annotation
  • Labeling
  • Classification

Introduction
• A meaningful image hierarchy makes organizing, browsing and searching images more convenient and effective.
• Good image hierarchies can serve as a knowledge ontology for end tasks such as image retrieval, annotation or classification.
• Existing hierarchies are typically built from a single modality:
  • Language-based
  • Based on low-level visual features

Building the hierarchy
• A multi-modal model represents images and their textual tags on the semantivisual hierarchy.
• Each image is associated with a path of the hierarchy; its regions can be assigned to different nodes along that path.
• Each image is decomposed into a set of over-segmented regions $R = [R_1, \dots, R_r, \dots, R_N]$.
• Each of the N regions is characterized by four appearance features.

Building the hierarchy -- Graphical model
• Each image-text pair (R, W) is assigned to a path $C_c = [C_{c,1}, \dots, C_{c,l}, \dots, C_{c,L}]$ of L levels.
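The graphical model figure is not reproduced in this text. As a rough illustration only, here is a minimal generative sketch of one image-text pair under our own assumptions, not the paper's exact specification: the path is taken as given (in the model it is drawn from the nCRP), level proportions come from a symmetric Dirichlet with a hypothetical concentration `alpha`, each region picks a level (its concept index Z) and draws one codeword per appearance feature from that node, and each tag is coupled to a region through a uniform variable S and drawn from that region's node. All function and parameter names are hypothetical.

```python
import random


def sample_dirichlet(alpha, k, rng):
    # Symmetric Dirichlet draw via normalized Gamma(alpha, 1) variates.
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]


def sample_categorical(probs, rng):
    # Draw an index with probability proportional to probs.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_pair(path, node_feature_dists, node_word_dists,
                  n_regions, n_tags, alpha=1.0, seed=0):
    """Hypothetical generative sketch for one image-text pair (R, W).

    path: node ids C_c = [C_{c,1}, ..., C_{c,L}], root first (assumed given here).
    node_feature_dists[node][f]: distribution over codewords of feature f.
    node_word_dists[node]: distribution over the tag vocabulary.
    """
    rng = random.Random(seed)
    theta = sample_dirichlet(alpha, len(path), rng)  # assumed level proportions
    # Z: each region picks a level (concept index) along the path.
    z = [sample_categorical(theta, rng) for _ in range(n_regions)]
    # R: each region draws one codeword per appearance feature from its node.
    regions = [[sample_categorical(dist, rng) for dist in node_feature_dists[path[z[r]]]]
               for r in range(n_regions)]
    # S: each tag is coupled to one region, uniformly at random.
    s = [rng.randrange(n_regions) for _ in range(n_tags)]
    # W: each tag is drawn from the node of the region it is coupled to.
    words = [sample_categorical(node_word_dists[path[z[j]]], rng) for j in s]
    return z, regions, s, words
```

With four feature distributions per node, every generated region carries four codewords, matching the four appearance features above.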

Building the hierarchy -- Learning the semantivisual image hierarchy
• Input: a set of unorganized images and the user tags associated with them
• Gibbs sampling draws the concept indices Z, the coupling variables S and the paths C
• Sampling Z depends on
  1) the likelihood of the region's appearance,
  2) the likelihood of the tags associated with this region, and
  3) the concept indices of the other regions in the same image-text pair.
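A minimal sketch of how the three factors for Z combine for a single region. It assumes point estimates of each node's codeword and tag distributions (a collapsed sampler would use count-based predictive probabilities instead) and a hypothetical smoothing parameter `alpha` for factor 3. The names `node_feature_dists` and `node_word_dists` are illustrative, not taken from the paper.

```python
import random


def sample_region_level(r, z, regions, s, words, path,
                        node_feature_dists, node_word_dists, alpha, rng):
    """Resample the concept index z[r] of region r (hypothetical sketch)."""
    n_levels = len(path)
    # Factor 3: levels currently used by the *other* regions of the same pair.
    counts = [0] * n_levels
    for other, level in enumerate(z):
        if other != r:
            counts[level] += 1
    weights = []
    for level, node in enumerate(path):
        w = counts[level] + alpha
        # Factor 1: likelihood of the region's appearance codewords at this node.
        for f, codeword in enumerate(regions[r]):
            w *= node_feature_dists[node][f][codeword]
        # Factor 2: likelihood of the tags coupled to this region (s[j] == r).
        for j, region_of_tag in enumerate(s):
            if region_of_tag == r:
                w *= node_word_dists[node][words[j]]
        weights.append(w)
    # random.choices normalizes the weights; at least one must be positive.
    return rng.choices(range(n_levels), weights=weights, k=1)[0]
```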

Building the hierarchy -- Learning (continued)
• Sampling S
  • Its conditional distribution depends solely on the likelihood of the tag.
• Sampling C
  • Influenced by two terms: the prior probability induced by the nCRP (nested Chinese restaurant process), which reflects the current arrangement of the hierarchy, and the likelihood of the image-text pair.
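Both conditionals can be sketched with the same hypothetical representation. `sample_tag_region` assumes a uniform prior over regions, so only the tag likelihood matters. `ncrp_path_prior` computes the nCRP prior of a path: at each level an existing child is chosen with probability proportional to the number of other pairs through it, and a new child with probability proportional to a concentration `gamma`. `sample_path` multiplies that prior by a caller-supplied likelihood over a hypothetical list of candidate paths; a full sampler would enumerate every existing path and every possible new branch.

```python
import math
import random


def sample_tag_region(j, words, z, path, node_word_dists, rng):
    """Resample s[j]: uniform prior over regions, so only the tag likelihood matters."""
    weights = [node_word_dists[path[level]][words[j]] for level in z]
    return rng.choices(range(len(z)), weights=weights, k=1)[0]


def ncrp_path_prior(path, child_counts, gamma):
    """nCRP prior of a root-to-leaf path (hypothetical sketch).

    child_counts[parent][child]: number of *other* image-text pairs through child.
    A child absent from child_counts[parent] is treated as a new node.
    """
    prob = 1.0
    for parent, child in zip(path, path[1:]):
        counts = child_counts.get(parent, {})
        total = sum(counts.values())
        m = counts.get(child, 0)
        # Existing child: m / (total + gamma); new child: gamma / (total + gamma).
        prob *= (m if m > 0 else gamma) / (total + gamma)
    return prob


def sample_path(candidates, child_counts, gamma, log_likelihood, rng):
    """Sample C with probability proportional to nCRP prior times pair likelihood."""
    log_w = [math.log(ncrp_path_prior(c, child_counts, gamma)) + log_likelihood(c)
             for c in candidates]
    top = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in log_w]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Working in log space keeps the product of many small likelihoods from underflowing before normalization.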

A semantivisual image hierarchy -- Implementation
• 4,000 user-uploaded images with 538 unique user tags
• Each image is divided into small patches of 10×10 pixels.
• Each patch is assigned to a codeword in a codebook of 500 visual words obtained by K-means.
• We obtain four region codebooks: color (HSV histogram), location, texture, and normalized SIFT histogram.
• To speed up learning, we initialize the levels in a path according to tf-idf scores.
• We obtain a hierarchy of 121 nodes, 4 levels and 53 paths.
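The quantization and initialization steps could look roughly like the following. The codebook is assumed to be already trained by K-means; `nearest_codeword` performs only the assignment step. The direction of the tf-idf initialization is our assumption, not stated on the slide: here, tags with low tf-idf (frequent across images, hence more general) are ordered toward the root. All helper names are hypothetical.

```python
import math
from collections import Counter


# Hypothetical helpers; the codebook is assumed to be already trained by K-means.
def nearest_codeword(descriptor, codebook):
    # Index of the closest centroid under squared Euclidean distance.
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[k])))


def region_histogram(patch_descriptors, codebook):
    # Bag-of-codewords count for the 10x10 patches in one region (unnormalized).
    hist = [0] * len(codebook)
    for descriptor in patch_descriptors:
        hist[nearest_codeword(descriptor, codebook)] += 1
    return hist


def tag_tfidf(tag_lists):
    # tag_lists: one list of tags per image; returns one {tag: tf-idf} dict per image.
    n_images = len(tag_lists)
    df = Counter(tag for tags in tag_lists for tag in set(tags))
    scores = []
    for tags in tag_lists:
        tf = Counter(tags)
        scores.append({tag: (count / len(tags)) * math.log(n_images / df[tag])
                       for tag, count in tf.items()})
    return scores


def initial_level_order(image_scores):
    # Assumption: low tf-idf (frequent, general) tags are placed nearer the root.
    return sorted(image_scores, key=lambda tag: (image_scores[tag], tag))
```

A tag that appears on every image gets an idf of log(1) = 0, so under this heuristic it always lands at the most general level.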

A semantivisual image hierarchy -- Visualizing the semantivisual hierarchy
• The hierarchy shows general-to-specific relationships from the root to the leaves.
• Purely visual information cannot provide a meaningful image hierarchy.
• A purely language-based hierarchy would miss close visual connections between concepts.

A semantivisual image hierarchy -- Quantitative evaluation of image hierarchies
• Good clustering: images that share similar concepts, i.e., images along the same path, should be annotated with largely similar tags.
• Good hierarchical structure given a path: the images and their associated tags at different levels of the path should show good general-to-specific relationships.
• For the evaluation, a path of L levels is selected from the hierarchy.

Application -- Hierarchical annotation of images
• Given our learned image ontology, we can propose a hierarchical annotation for an unlabeled query image.
• nCRP does not perform well on sparse tag words: its proposed hierarchy assigns many words to the root node, resulting in very few paths.
• A simple nearest-neighbor method such as KNN cannot find a good association between the test images and the training images in our challenging dataset with large visual diversity.
• In contrast, our model learns an accurate association of visual and text data simultaneously.
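A simplified, hypothetical version of path selection for a query image: score each learned path by the likelihood of the query's region codewords, treating each region as a uniform mixture over the levels of the path, then report each node's most probable tag from root to leaf. The uniform level weights and the `node_top_word` lookup are assumptions for illustration; the actual model infers the level proportions. Distributions are assumed smoothed so that no probability is exactly zero.

```python
import math


def path_log_likelihood(regions, path, node_feature_dists):
    """Log-likelihood of a query's regions under one path (hypothetical sketch).

    regions: list of regions, each a list of codeword indices (one per feature).
    Assumption: each region is a uniform mixture over the levels of the path,
    and all distributions are smoothed so no probability is exactly zero.
    """
    total = 0.0
    for region in regions:
        mix = 0.0
        for node in path:
            p = 1.0
            for f, codeword in enumerate(region):
                p *= node_feature_dists[node][f][codeword]
            mix += p / len(path)
        total += math.log(mix)
    return total


def annotate(regions, paths, node_feature_dists, node_top_word):
    # Pick the most likely learned path, then read one tag per node, root to leaf.
    best = max(paths, key=lambda c: path_log_likelihood(regions, c, node_feature_dists))
    return [node_top_word[node] for node in best]
```

The output is ordered from the most general tag to the most specific one, which is what makes the annotation hierarchical rather than a flat tag list.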

Application -- Image labeling
• Serving as an image and text knowledge ontology, our semantivisual hierarchy and model can also be used for image labeling without a hierarchical relation.
• We collect the top 5 predicted words for each image.
• Our model captures the hierarchical structure of images and tags.
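Flat labeling can be sketched by mixing the tag distributions of the nodes along an image's inferred path and keeping the five highest-scoring words. The per-level weights `level_weights` are a hypothetical input (for example, the fraction of the image's regions assigned to each level), and the function name is illustrative.

```python
import heapq


def top_k_words(path, level_weights, node_word_dists, vocab, k=5):
    """Rank tags by p(w) = sum_l weight_l * p(w | node at level l) (hypothetical)."""
    scores = [0.0] * len(vocab)
    for weight, node in zip(level_weights, path):
        for i, p in enumerate(node_word_dists[node]):
            scores[i] += weight * p
    best = heapq.nlargest(k, range(len(vocab)), key=lambda i: scores[i])
    return [vocab[i] for i in best]
```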

Application -- Image classification
• Another 4,000 images are held out as test images.
• By encoding semantic meaning into the hierarchy, our semantivisual hierarchy provides a more descriptive structure, which could be helpful for classification.

Conclusion
• We use images and their tags to construct a meaningful hierarchy that organizes images in a general-to-specific structure.
• Our quantitative evaluation by human subjects shows that our hierarchy is more meaningful and accurate than the other hierarchies compared.
