

Session: Object Recognition II: using Language, Tuesday, June 15, 2010. Rongrong Ji¹, Hongxun Yao¹, Xiaoshuai Sun¹, Bineng Zhong¹, and Wen Gao¹,². ¹Visual Intelligence Laboratory, Harbin Institute of Technology. ²School of Electronic Engineering and Computer Science, Peking University.


Presentation Transcript


  1. Session: Object Recognition II: using Language, Tuesday, June 15, 2010. Rongrong Ji¹, Hongxun Yao¹, Xiaoshuai Sun¹, Bineng Zhong¹, and Wen Gao¹,². ¹Visual Intelligence Laboratory, Harbin Institute of Technology. ²School of Electronic Engineering and Computer Science, Peking University

  2. Overview • Problem Statement • Building patch-labeling correspondence • Generative semantic embedding • Experimental comparisons • Conclusions

  3. Overview • Problem Statement • Building patch-labeling correspondence • Generative semantic embedding • Experimental comparisons • Conclusions

  4. Introduction • Building visual codebooks • Quantization-based approaches: K-Means, Vocabulary Tree, Approximate K-Means, etc. • Feature-indexing-based approaches: K-D Tree, R-Tree, Ball Tree, etc. • Refining visual codebooks • Topic model decompositions: pLSA, LDA, GMM, etc. • Spatial refinement: visual pattern mining, discriminative visual phrases, etc. (a baseline quantization sketch follows below)
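As context for these quantization-based baselines, here is a minimal sketch of building an unsupervised codebook by clustering local descriptors with k-means. The library choice (scikit-learn) and parameter values are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch (not the authors' implementation): building an unsupervised
# visual codebook by quantizing local descriptors with k-means, the baseline
# that the semantic embedding later refines. Descriptor extraction (e.g.
# 128-D SIFT vectors) is assumed to have happened already.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_codebook(descriptors: np.ndarray, num_words: int = 1000):
    """Cluster local descriptors into `num_words` visual words."""
    kmeans = MiniBatchKMeans(n_clusters=num_words, batch_size=4096, n_init=3)
    kmeans.fit(descriptors)
    return kmeans  # cluster centers act as visual words

def bag_of_words(kmeans, image_descriptors: np.ndarray) -> np.ndarray:
    """Quantize one image's descriptors and return its normalized word histogram."""
    words = kmeans.predict(image_descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```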

  5. Introduction • With the prosperity of the Web community, user tags are widely available [Figure: pipeline of Visual Database → Local Feature Extraction → Visual Codebook Construction (our contribution, supervised by user tags) → Near-Duplicate Retrieval and Object Category Recognition]

  6. Introduction • Problems in achieving this goal • Supervision is available only at the image level • Semantic labels are correlative (e.g. "Flower" vs. "Rose") • Model generality [Figure: example photo tagged "A traditional San Francisco street view with car and buildings", with labels such as "Flower" and "Rose"]

  7. Introduction • Our contributions • Publish a ground-truth patch-labeling set: http://vilab.hit.edu.cn/~rrji/index_files/SemanticEmbedding.htm • A generalized semantic embedding framework, easy to deploy into different codebook models • Modeling label correlations: correlative tagging is modeled within the semantic embedding

  8. Introduction • The proposed framework [Figure: build patch-level label correspondence, then supervised visual codebook construction]

  9. Overview • Problem Statement • Building patch-labeling correspondence • Generative semantic embedding • Experimental comparisons • Conclusions

  10. Building Patch-Labeling Correspondence • Collecting over 60,000 Flickr photos • For each photo • DoG detection + SIFT • “Face” labels (for instance)
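For the "DoG detection + SIFT" step named on this slide, a minimal extraction sketch is shown below. It assumes OpenCV (version 4.4 or later, where SIFT is in the main package); it is an illustration, not the authors' pipeline.

```python
# Minimal sketch (assumption: OpenCV >= 4.4 with SIFT available): extracting
# DoG keypoints and SIFT descriptors per photo, mirroring "DoG detection + SIFT".
import cv2

def extract_sift(image_path: str):
    """Return (keypoints, 128-D descriptors) for one image, or ([], None)."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        return [], None
    sift = cv2.SIFT_create()  # DoG interest-point detector + SIFT descriptor
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return keypoints, descriptors
```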

  11. Building Patch-Labeling Correspondence • Purify the patch-labeling correspondences • Density-Diversity Estimation (DDE): purify the le links in <Di, si> • Formulation • For a semantic label si, its initial correspondence set is <Di, si> • Di: the patches extracted from images carrying label si • le: a correspondence link from label si to a local patch

  12. Building Patch-Labeling Correspondence • Density • For a given correspondence in <Di, si>, Density reveals its representability for si, computed from the average neighborhood distance among its m nearest neighbors • Diversity • For a given correspondence in <Di, si>, Diversity reveals its uniqueness score for si, computed from nm (the number of images represented in the neighborhood) and ni (the total number of images with label si)

  13. Building Patch-Labeling Correspondence • Correspondences whose DDE score falls below a threshold T are discarded • A high T emphasizes precision over recall (see the DDE sketch below)
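The sketch below illustrates DDE-style purification under stated assumptions: density is approximated as the inverse mean distance to the m nearest neighbors, diversity as the fraction of distinct source images in that neighborhood (nm / ni), and the two are simply multiplied before thresholding at T. The exact formulas and their combination are those of the paper, not this sketch.

```python
# Illustrative DDE sketch only: the exact Density/Diversity formulas are in the
# paper. Here density = inverse mean distance to the m nearest neighbors, and
# diversity = fraction of distinct source images in that neighborhood (nm / ni).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def dde_filter(patches: np.ndarray, image_ids: np.ndarray,
               n_images_with_label: int, m: int = 10, T: float = 0.5):
    """Return indices of patch-label correspondences whose score exceeds T."""
    nn = NearestNeighbors(n_neighbors=m + 1).fit(patches)
    dist, idx = nn.kneighbors(patches)           # first neighbor is the patch itself
    density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-8)
    density /= density.max()                     # normalize to [0, 1]
    diversity = np.array([
        len(set(image_ids[row[1:]])) / float(n_images_with_label)
        for row in idx
    ])
    scores = density * diversity
    return np.flatnonzero(scores > T)            # high T: precision over recall
```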

  14. Building Patch-Labeling Correspondence • Case study: “Face” label (before DDE)

  15. Building Patch-Labeling Correspondence • Case study: “Face” label (after DDE)

  16. Overview • Problem Statement • Building patch-labeling correspondence • Generative semantic embedding • Experimental comparisons • Conclusions

  17. Generative Semantic Embedding • Generative Hidden Markov Random Field • A Hidden Field for semantic modeling • An Observed Field for local patch quantization

  18. Generative Semantic Embedding • Hidden Field • Each label node si produces correspondence links (le) to a subset of patches in the Observed Field • Links lij denote semantic correlations between si and sj, measured with WordNet::Similarity (Pedersen et al., AAAI 2004)
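The slide cites the WordNet::Similarity package (Pedersen et al., AAAI 2004). As a hedged stand-in, the sketch below approximates a label-label correlation with NLTK's WordNet path similarity; this substitute measure and the function name are assumptions, not the paper's exact similarity.

```python
# Hedged stand-in for WordNet::Similarity: approximate tag-tag correlation with
# NLTK's WordNet path similarity. This is an assumption, not the paper's measure.
from nltk.corpus import wordnet as wn   # requires: nltk.download("wordnet")

def label_correlation(label_a: str, label_b: str) -> float:
    """Return a [0, 1] similarity between two tag words via WordNet."""
    synsets_a, synsets_b = wn.synsets(label_a), wn.synsets(label_b)
    if not synsets_a or not synsets_b:
        return 0.0
    scores = [a.path_similarity(b) or 0.0 for a in synsets_a for b in synsets_b]
    return max(scores)

# e.g. label_correlation("flower", "rose") comes out noticeably higher than
# label_correlation("flower", "car")
```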

  19. Generative Semantic Embedding • Observed Field • Any two patch nodes follow a visual metric (e.g. L2) • Once there is a link leij between a patch di and a label sj, we constrain di by sj from the Hidden Field

  20. Generative Semantic Embedding • In the ideal case • Each patch di is conditionally independent given its neighbors in the Hidden Field • The feature set D is regarded as (partially) generated from the Hidden Field

  21. Generative Semantic Embedding • Formalize the clustering procedure • Assign a unique cluster label ci to each patch di • Hence, D is quantized into a codebook C with corresponding features • Cost for a codebook candidate C: P(C|D) = P(C)P(D|C)/P(D), where P(D) is a constant • P(C) acts as the semantic constraint, P(D|C) as the visual distortion

  22. Generative Semantic Embedding • Semantic constraint P(C) • Define an MRF on the Observed Field • For a quantizer assignment C, its probability can be expressed as a Gibbs distribution induced by the Hidden Field, as shown below
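For reference, the standard Gibbs/MRF form such a prior takes is sketched below; the specific clique potentials Vc induced by the Hidden-Field links are defined in the paper, so treat this as the generic template only.

```latex
% Generic Gibbs/MRF template for the prior P(C); the clique potentials V_c
% induced by the Hidden-Field links are the paper's and are not reproduced here.
P(C) \;=\; \frac{1}{Z}\exp\!\Big(-\sum_{c\,\in\,\mathcal{C}} V_c(C)\Big),
\qquad
Z \;=\; \sum_{C'} \exp\!\Big(-\sum_{c\,\in\,\mathcal{C}} V_c(C')\Big)
```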

  23. Generative Semantic Embedding • That is • Two data points di and dj in the Observed Field contribute to P(C) (by being encouraged to share a cluster label, ci = cj) if and only if their corresponding labels sx and sy are linked by lxy in the Hidden Field

  24. Generative Semantic Embedding • Visual constraint P(D|C) • Measures whether the codebook C is visually consistent with the current data D • Expressed as the visual distortion of quantization

  25. Generative Semantic Embedding • Overall cost • Since P(D) is constant, finding the MAP solution of P(C|D) reduces to maximizing the product of the visual constraint P(D|C) and the semantic constraint P(C), as sketched below
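A hedged reconstruction of that objective in negative-log form follows. The squared-L2 distortion assumed for P(D|C) and the trade-off weight lambda on the semantic energy E_sem are illustrative assumptions; the exact potentials are those of the paper's Hidden Field.

```latex
% Hedged reconstruction of the negative-log MAP objective. The squared-L2
% distortion and the weight \lambda are illustrative assumptions; E_sem denotes
% the Hidden-Field-induced MRF energy.
\hat{C} \;=\; \arg\max_{C} P(D \mid C)\,P(C)
        \;=\; \arg\min_{C}\;
        \underbrace{\sum_{i}\lVert d_i - \mu_{c_i}\rVert^2}_{\text{visual distortion}}
        \;+\;
        \underbrace{\lambda\,E_{\mathrm{sem}}(C)}_{\text{semantic constraint}}
```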

  26. Generative Semantic Embedding • EM solution • E step: assign local patches to the closest clusters • M step: update the visual word centers
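Below is a minimal sketch of such a semantically regularized k-means EM loop. The pairwise penalty (added when two semantically linked patches land in different clusters), the weight lambda_sem, and all function and variable names are illustrative assumptions, not the paper's exact update rules.

```python
# Minimal sketch of a semantically regularized k-means EM loop. Assumptions
# (not from the paper): a pairwise penalty lambda_sem is charged whenever two
# semantically linked patches are assigned to different clusters, and the
# M step is the ordinary mean update.
import numpy as np

def semantic_kmeans(D, links, k, lambda_sem=1.0, iters=20, seed=0):
    """D: (n, d) patch descriptors; links: list of (i, j, w) semantic links."""
    rng = np.random.default_rng(seed)
    centers = D[rng.choice(len(D), size=k, replace=False)].copy()
    assign = np.zeros(len(D), dtype=int)
    for _ in range(iters):
        # E step: pick, per patch, the cluster with the lowest visual
        # distortion plus semantic-disagreement penalty.
        for i, d in enumerate(D):
            cost = ((centers - d) ** 2).sum(axis=1)
            for (a, b, w) in links:
                if a == i:
                    cost += lambda_sem * w * (np.arange(k) != assign[b])
                elif b == i:
                    cost += lambda_sem * w * (np.arange(k) != assign[a])
            assign[i] = int(np.argmin(cost))
        # M step: update each visual word center as the mean of its patches.
        for c in range(k):
            members = D[assign == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return centers, assign
```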

  27. Generative Semantic Embedding

  28. Generative Semantic Embedding • Model generalization • Degenerates to a supervised codebook under a label-independence assumption • Degenerates to unsupervised codebooks by setting all links l = 0 (see the note below)
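In terms of the illustrative sketch above (not the authors' code), the unsupervised degeneration corresponds to dropping the links and zeroing the semantic weight:

```python
# Dropping the semantic links recovers plain k-means (unsupervised codebook);
# semantic_kmeans and D are the illustrative names from the sketch above.
centers, assign = semantic_kmeans(D, links=[], k=1000, lambda_sem=0.0)
```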

  29. Overview • Problem Statement • Building patch-labeling correspondence • Generative semantic embedding • Experimental comparisons • Conclusions

  30. Experimental Comparisons Case study of ratios between inter-class distance and intra-class distance with and without semantic embedding in the Flickr dataset.
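As a rough illustration of this kind of measurement, the sketch below computes one common inter/intra-class distance ratio over bag-of-words features; the exact definition used in the paper may differ, and higher values indicate better class separation.

```python
# Illustrative metric only (assumed definition, not necessarily the paper's):
# ratio of mean inter-class scatter to mean intra-class scatter. Higher is better.
import numpy as np

def inter_intra_ratio(X: np.ndarray, labels: np.ndarray) -> float:
    classes = np.unique(labels)
    centroids = np.stack([X[labels == c].mean(axis=0) for c in classes])
    global_centroid = X.mean(axis=0)
    inter = np.mean([np.linalg.norm(c - global_centroid) for c in centroids])
    intra = np.mean([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    return float(inter / (intra + 1e-12))
```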

  31. Experimental Comparisons MAP@1 comparisons between our GSE model and Vocabulary Tree [1] and GNP [34] on the Flickr 60,000 database.

  32. Experimental Comparisons Confusion table comparisons on the PASCAL VOC dataset against the method in [24].
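For completeness, a generic sketch of how such a confusion table for category recognition is tabulated with scikit-learn; the class list and variable names are placeholders, not the paper's experimental protocol.

```python
# Generic evaluation sketch: rows are true classes, columns are predicted classes.
from sklearn.metrics import confusion_matrix

def confusion_table(y_true, y_pred, class_names):
    """Tabulate predictions against ground truth for the given class order."""
    return confusion_matrix(y_true, y_pred, labels=class_names)
```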

  33. Overview • Problem Statement • Building patch-labeling correspondence • Generative semantic embedding • Experimental comparisons • Conclusions

  34. Conclusion • Our contributions • Propagating Web labeling from images to local patches to supervise codebook quantization • A generalized semantic embedding framework for supervised codebook building • Modeling correlative semantic labels in supervision • Future work • Adapting one supervised codebook for different tasks • Pushing supervision forward into local feature detection and extraction

  35. Thank you!
