- 69 Views
- Uploaded on
- Presentation posted in: General

Inferring Semantic Concepts from Community- Contributed Images and Noisy Tags

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Inferring Semantic Concepts from Community- Contributed Images and Noisy Tags

Jinhui Tang†, Shuicheng Yan †, Richang Hong †, Guo-Jun Qi ‡, Tat-Seng Chua †

† National University of Singapore

‡ University of Illinois at Urbana-Champaign

- Motivation
- Sparse-Graph based Semi-supervised Learning
- Handling of Noisy Tags
- Inferring Concepts in Semantic Concept Space
- Experiments
- Summarization and Future Work

No manual annotation are required.

- With models:
- SVM
- GMM
- …

- Infer labels directly:
- k-NN
- Graph-based semi-supervised methods

- A common disadvantage:
- Have certain parameters that require manual tuning
- Performance is sensitive to parameter tuning

- The graphs are constructed based on visual distance
- Many links between samples with unrelated-concepts
- The label information will be propagated incorrectly.

- Locally linear reconstruction:
- Still needs to select neighbors based on visual distance

- Sparse Graph based Learning
- Noisy Tag Handling
- Inferring Concepts in the Concept Space

- Human vision system seeks a sparse representation for the incoming image using a few visual words in a feature vocabulary. (Neural Science)

- Advantages:
- Reducethe concept-unrelated links to avoid the propagation of incorrect information;
- Practical for large-scale applications, since the sparse representation can reduce the storage requirement and is feasible for large-scale numerical computation.

Normal Graph Construction.

Sparse Graph Construction.

- The ℓ1-norm based linear reconstruction error minimization can naturally lead to a sparse representation for the images *.

- The sparse reconstruction can be obtained by solving the following convex optimization problem:

minw||w||1 , s.t.x=Dw

w ∈ Rn : the vector of the reconstruction coefficients;

x∈ Rd : feature vector of the image to be reconstructed;

D∈ Rd*n (d < n) : a matrix formed by the feature vectors of the other images in the dataset.

* J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transaction on Pattern Analysis and Machine Intelligence, 31(2):210–227, Feb. 2009

- Handle the noise on certain elements of x:
- Reformulate x = Dw+ξ, where ξ ∈ Rd is the noise term.
- Then :

- Set the edge weight of the sparse graph:

- Result:

- The problem with :
- Muu is typically very large for image annotation
- It is often computationally prohibitive to calculate its inverse directly
- Iterative solution with non-negative constraints:
- may not be reasonable since some samples may have negative contributions to the other samples

- Solution:
- Reformulate:

- The generalized minimum residual method (usually abbreviated as GMRES) can be used to iteratively solve this large-scale sparse system of linear equations effectively and efficiently.

√: correct; ?: ambiguous; m: missing

- We cannot assume that the training tags are fixed during the inference process.
- The noisy training tags should be refined during the label inference.
- Solution: adding two regularization terms into the inferring framework to handle the noise:

- Solution:
- Set the original label vector as the initial estimation of ideal label vector, that is, set , and then solve
and we can obtain a refined fl.

- Fix fl and solve
- Use the obtained to replace the y in the previous graph-based method, and we can solve the sparse system of linear equations to infer the labels of the unlabeled samples.

- Set the original label vector as the initial estimation of ideal label vector, that is, set , and then solve

- It is well-known that inferring concepts based on low-level visual features cannot work very well due to the semantic gap.

- To bridge this semantic gap
- Construct a concept space and then infer the semantic concepts in this space.
- The semantic relations among different concepts are inherently embedded in this space to help the concept inference.

- Low-semantic-gap: Concepts in the constructed space should have small semantic gaps;

- Informative: These concepts can cover the semantic space spanned by all useful concepts (tags), that is, the concept space should be informative;

- Compact: The set including all the concepts forming the space should be compact (i.e., the dimension of the concept space is small).

- Basic terms:
- Ω : the set of all concepts;
- Θ : the constructed concept set.

- Three measures:
- Semantic Modelability: SM(Θ)
- Coverage of Semantic Concept Space: CE(Θ, Ω)
- Compactness: CP(Θ)=1/#(Θ)

- Objective:

- Simplification: fix the size of the concept space.

- Then we can transform this maximization to a standard quadratic programming problem.
- See the paper for more details.

- Image mapping: xi D(i)
- Query concept mapping: cxQ(cx)
- Ranking the given images:

- Dataset
- NUS-WIDE LiteVersion (55,615 images)

- Low-level Features
- Color Histogram (CH) and Edge Direction Histogram (EDH), combine directly.

- Evaluation
- 81 concepts
- AP and MAP

Ex1: Comparisons among Different Learning Methods

Ex1: Comparisons among Different Learning Methods

- Ex2: Concept Inference with and without Concept Space

Ex3: Inference with Tags vs. Inference with Ground-truth

We can achieve an MAP of 0.1598 by inference from tags in the concept space, which is comparable to the MAP obtained by inference from ground-truth of training labels.

- Exploited the problem of inferring semantic concepts from community-contributed images and their associated noisy tags.
- Three points:
- Sparse graph based label propagation
- Noisy tag handling
- Inference in a low-semantic-gap concept space

- Training set construction from the web resource

Thanks! Questions?