
Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos

Yiming Liu, Dong Xu, Ivor W. Tsang, Jiebo Luo

Nanyang Technological University & Kodak Research Lab



Motivation

  • Digital cameras and mobile-phone cameras are spreading rapidly:

    • More and more personal photos;

    • Retrieving images from enormous collections of personal photos is becoming an important topic.

How to retrieve?



Previous Work

  • Content-Based Image Retrieval (CBIR)

    • Users provide images as queries to retrieve personal photos.

  • The paramount challenge is the semantic gap:

    • The gap between low-level visual features and high-level semantic concepts.

[Figure: the semantic gap. Query and result images carry high-level concepts, but the comparison is done between low-level feature vectors in the database.]


[Figure: annotate the photos in the database with high-level concepts, compare the annotations with the query, and rank the results.]

A More Natural Way For Consumer Applications

  • Image annotation is used to classify images w.r.t. high-level semantic concepts.

    • Semantic concepts are analogous to the textual terms describing document contents.

  • It serves as an intermediate stage for textual query based image retrieval.

  • It lets the user retrieve the desired personal photos using textual queries.

[Figure: a textual query is matched against annotation results, i.e. high-level concepts such as “Sunset”.]


[Figure: web images come with contextual information (e.g. “building”, “people, family”, “people, wedding”, “sunset”) that can be linked to consumer photos.]

Our Goal

  • Leverage information from web images to retrieve consumer photos in personal photo collections.

  • A real-time textual query based consumer photo retrieval system without any intermediate annotation stage.

    • Web images are accompanied by tags, categories, and titles.


[Figure: system framework. Given a textual query, automatic web image retrieval (using WordNet) selects relevant/irrelevant images from a large collection of web images with descriptive words; a classifier trained on them ranks the raw consumer photos; relevance feedback refines the top-ranked photos.]

System Framework

  • When the user provides a textual query, it is used to find relevant and irrelevant images in the web image collection.

  • Then, a classifier is trained based on these web images.

  • Consumer photos are then ranked based on the classifier’s decision values.

  • The user can also give relevance feedback to refine the retrieval results.


[Figure: for the query “boat”, an inverted file selects relevant web images, while WordNet-based semantic word trees (with descendants such as “barge”, “ark”, “dredger”, “houseboat”) determine the irrelevant web images.]

Automatic Web Image Retrieval


  • Given the user’s textual query, first search for it in the semantic word trees.

  • Web images whose descriptions contain the query word are considered “relevant web images”.

  • Web images whose descriptions contain neither the query word nor any of its two-level descendants are considered “irrelevant web images”. A sketch of this selection follows.
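
A minimal sketch of this selection in Python, using NLTK’s WordNet interface. The `web_images` structure (a list of (image_id, word-set) pairs) is a hypothetical stand-in for the paper’s inverted file:

```python
from nltk.corpus import wordnet as wn

def query_terms(query, depth=2):
    """Collect the query word plus its WordNet descendants up to `depth` levels."""
    terms = {query}
    frontier = wn.synsets(query, pos=wn.NOUN)
    for _ in range(depth):
        frontier = [h for s in frontier for h in s.hyponyms()]
        terms.update(l.name().replace("_", " ")
                     for s in frontier for l in s.lemmas())
    return terms

def split_web_images(web_images, query):
    """web_images: list of (image_id, set_of_description_words) pairs."""
    terms = query_terms(query)
    relevant = [img for img, words in web_images if query in words]
    # Irrelevant: descriptions contain neither the query nor any descendant.
    irrelevant = [img for img, words in web_images if not (words & terms)]
    return relevant, irrelevant
```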



Decision Stump Ensemble

  • Train a decision stump on each feature dimension.

  • Combine the stumps, weighted according to their training error rates (sketched below).
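
A minimal sketch of the ensemble, assuming binary labels in {-1, +1}; weighting each stump by its training accuracy is an illustrative reading of “combine them with their training error rates”:

```python
import numpy as np

def train_stump(x, y):
    """Best (error, threshold, polarity) decision stump on one feature dimension."""
    best = (1.0, 0.0, 1)
    for theta in np.unique(x):
        for polarity in (1, -1):
            pred = polarity * np.sign(x - theta)
            err = np.mean(pred != y)
            if err < best[0]:
                best = (err, theta, polarity)
    return best

def train_ensemble(X, y):
    """One stump per dimension, weighted here by training accuracy (1 - error)."""
    model = []
    for d in range(X.shape[1]):
        err, theta, pol = train_stump(X[:, d], y)
        model.append((d, theta, pol, 1.0 - err))
    return model

def decision_value(model, x):
    return sum(w * pol * np.sign(x[d] - theta) for d, theta, pol, w in model)
```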



Why Decision Stump Ensemble?

  • Main reason: low time cost.

    • Our goal: a (quasi) real-time retrieval system.

    • As base classifiers, SVMs are much slower;

    • For combining classifiers, boosting is also much slower.

  • The advantages of the decision stump ensemble:

    • Low training cost;

    • Low testing cost;

    • Very easy to parallelize.



Asymmetric Bagging

  • Imbalance: count(irrelevant) >> count(relevant)

    • Side effects, e.g. overfitting.

  • Solution: asymmetric bagging (sketched after the figure below).

    • Repeat the training 100 times, each time with a different randomly sampled subset of the irrelevant web images.

[Figure: 100 training sets, each pairing all relevant images with a different random sample of the irrelevant images.]
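
A minimal sketch of the asymmetric bagging loop, reusing the hypothetical `train_ensemble`/`decision_value` helpers from the previous sketch; the per-bag sample size (equal to the number of relevant images) is an assumption:

```python
import numpy as np

def asymmetric_bagging(X_rel, X_irr, n_bags=100, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_bags):
        # Keep all relevant images; sample an equally sized subset of the
        # far more numerous irrelevant images (subset size is an assumption).
        idx = rng.choice(len(X_irr), size=len(X_rel), replace=False)
        X = np.vstack([X_rel, X_irr[idx]])
        y = np.concatenate([np.ones(len(X_rel)), -np.ones(len(X_rel))])
        models.append(train_ensemble(X, y))
    return models

def bagged_score(models, x):
    """Average decision value over the bagged stump ensembles."""
    return float(np.mean([decision_value(m, x) for m in models]))
```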



Relevance Feedback

  • The user labels nl relevant or irrelevant consumer photos.

    • Use this information to further refine the retrieval results.

  • Challenge 1: nl is usually small.

  • Challenge 2: cross-domain learning.

    • The source classifier is trained on the web image domain.

    • The user labels photos from the consumer photo domain.



Method 1: Cross-Domain Combination of Classifiers

  • Re-train classifiers with data from both domains?

    • Neither effective nor efficient.

  • A simple but effective method (sketched below):

    • Train an SVM on the consumer photo domain with the user-labeled photos;

    • Convert the responses of the source classifier and the SVM to probabilities, and add them up;

    • Rank consumer photos based on this summed value.

  • Referred to as DS_S+SVM_T.
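
A minimal sketch of DS_S+SVM_T, assuming the bagged stump ensemble above as the source classifier. The sigmoid used to turn its score into a probability is an illustrative choice; the SVM probabilities come from scikit-learn’s standard Platt scaling:

```python
import numpy as np
from sklearn.svm import SVC

def rank_ds_s_plus_svm_t(models, X_photos, X_labeled, y_labeled):
    # Target-domain SVM with calibrated probabilities (probability=True).
    svm = SVC(probability=True).fit(X_labeled, y_labeled)
    p_svm = svm.predict_proba(X_photos)[:, 1]          # P(y = +1) per photo
    # Source classifier score, squashed to a probability with a sigmoid.
    src = np.array([bagged_score(models, x) for x in X_photos])
    p_src = 1.0 / (1.0 + np.exp(-src))
    return np.argsort(-(p_src + p_svm))                # best photos first
```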



Method 2: Cross-Domain Regularized Regression (CDRR)

  • Construct a linear regression function fT(x):

    • For labeled photos: fT(xi) ≈ yi;

    • For unlabeled photos: fT(xi) ≈ fs(xi), where fs is the source classifier.


  • Design a target linear classifier fT(x) = wTx.

  • For the user-labeled images x1,…,xl, fT(x) should match the user’s label y(x); for the other images, fT(x) should match fs(x); a regularizer controls the complexity of fT(x). With trade-off parameters λl and λu (names ours), the objective takes the form (a reconstruction from the slide’s annotations):

    min over w:  λl Σi=1..l (fT(xi) − yi)² + λu Σ over unlabeled j (fT(xj) − fs(xj))² + ‖w‖²

  • This problem can be solved with a least-squares solver (sketched below).
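
A minimal sketch of the least-squares solution, under the reconstructed objective above with hypothetical trade-off parameters `lam_l` and `lam_u`:

```python
import numpy as np

def cdrr(X_lab, y_lab, X_unlab, fs_unlab, lam_l=1.0, lam_u=0.1):
    """Solve min_w lam_l*sum(w.x_i - y_i)^2 + lam_u*sum(w.x_j - fs_j)^2 + |w|^2."""
    d = X_lab.shape[1]
    # Normal equations: (lam_l*Xl'Xl + lam_u*Xu'Xu + I) w = lam_l*Xl'y + lam_u*Xu'fs
    A = lam_l * X_lab.T @ X_lab + lam_u * X_unlab.T @ X_unlab + np.eye(d)
    b = lam_l * X_lab.T @ y_lab + lam_u * X_unlab.T @ fs_unlab
    return np.linalg.solve(A, b)        # w; target classifier fT(x) = w.x
```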



Hybrid Method

  • A combination of the two methods (sketched below).

  • For the labeled consumer photos:

    • Measure the average distance davg to their 30 nearest unlabeled neighbors in feature space;

    • If davg < ε: use DS_S+SVM_T;

    • Otherwise: use CDRR.

  • Reason:

    • Consumer photos that are visually similar to the user-labeled images should be influenced more by those labels.
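
A minimal sketch of the hybrid rule, assuming Euclidean distances and a hypothetical threshold `eps`:

```python
import numpy as np

def choose_method(X_labeled, X_unlabeled, eps, k=30):
    # Pairwise Euclidean distances between labeled and unlabeled photos.
    dists = np.linalg.norm(X_labeled[:, None, :] - X_unlabeled[None, :, :], axis=2)
    # Average distance to the k nearest unlabeled neighbors, over labeled photos.
    d_avg = np.mean(np.sort(dists, axis=1)[:, :k])
    return "DS_S+SVM_T" if d_avg < eps else "CDRR"
```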



Experimental Results



Dataset and Experimental Setup

  • Web Image Database:

    • 1.3 million photos from photoSIG;

    • Relatively professional photos.

  • Text descriptions for web images:

    • Titles, portfolios, and categories accompanying the web images;

    • Remove the common high-frequency words;

    • Remove the rarely used words;

    • Finally, 21,377 words remain in our vocabulary.



Dataset and Experimental Setup

  • Testing Dataset #1: Kodak dataset

    • Collected by Eastman Kodak Company:

      • From about 100 real users;

      • Over a period of one year.

    • 1358 images:

      • The first keyframe of each video.

    • 21 concepts:

      • We merge “group_of_two” and “group_of_three_or_more” into one concept.



Dataset and Experimental Setup

  • Testing Dataset #2: Corel dataset

    • 4999 images:

      • 192x128 or 128x192.

    • 43 concepts:

      • We remove all concepts with fewer than 100 images.



Visual Features

  • Grid-based color moments (225D)

    • Three moments of three color channels from each block of a 5x5 grid.

  • Edge direction histogram (73D)

    • 72 edge direction bins plus one non-edge bin.

  • Wavelet texture (128D)

  • Concatenate all three kinds of features (sketched below):

    • Normalize each dimension to mean 0, standard deviation 1;

    • Use the first 103 principal components.
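
A minimal sketch of this post-processing with scikit-learn, assuming the 426-D concatenated features (225 + 73 + 128) are already extracted:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def postprocess(X):
    """X: n x 426 matrix of concatenated color/edge/wavelet features."""
    X = StandardScaler().fit_transform(X)           # per-dimension mean 0, stddev 1
    return PCA(n_components=103).fit_transform(X)   # keep the first 103 components
```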



Retrieval without Relevance Feedback

  • Across all concepts, the average number of relevant images is 3703.5.



Retrieval without Relevance Feedback

  • kNN: rank consumer photos by their average distance to the 300 nearest neighbors among the relevant web images (sketched below).

  • DS_S: our decision stump ensemble.
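
A minimal sketch of this kNN baseline, assuming Euclidean distance:

```python
import numpy as np

def knn_rank(X_photos, X_relevant, k=300):
    """Score each photo by its mean distance to its k nearest relevant web images."""
    scores = []
    for x in X_photos:
        d = np.linalg.norm(X_relevant - x, axis=1)
        scores.append(np.sort(d)[:k].mean())
    return np.argsort(scores)   # ascending: smaller average distance ranks higher
```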



Retrieval without Relevance Feedback

  • Time cost:

    • We use OpenMP to parallelize our method;

    • With 8 threads, both methods reach interactive speed;

    • However, kNN is expected to be much more expensive on large-scale datasets.



Retrieval with Relevance Feedback

  • In each round, the user labels at most 1 positive and 1 negative image among the top 40.

  • Methods for comparison:

    • kNN_RF: add the user-labeled photos to the relevant image set and re-apply kNN;

    • SVM_T: train an SVM on the user-labeled images in the target domain;

    • A-SVM: Adaptive SVM;

    • MR: Manifold Ranking based relevance feedback.



Retrieval with Relevance Feedback

  • Setting of y(x) for CDRR:

    • Positive: +1.0;

    • Negative: -0.1.

  • Reason:

    • The top-ranked negative images are not extremely negative;

    • A positive label says what the concept is; a negative label only says what it is not.




Retrieval with Relevance Feedback

  • On Corel dataset:



Retrieval with Relevance Feedback

  • On Kodak dataset:



Retrieval with Relevance Feedback

  • Time cost:

    • All methods except A-SVM can achieve real-time speed.



System Demonstration



Query: Sunset



Query: Plane



The User Is Providing Relevance Feedback …



After 2 Positive and 2 Negative Feedbacks…



Summary

  • Our goal: (quasi) real-time textual query based consumer photo retrieval.

  • Our method:

    • Use web images and their surrounding text descriptions as an auxiliary database;

    • Asymmetric bagging with decision stumps;

    • Several simple but effective cross-domain learning methods to help relevance feedback.



Future Work

  • How to efficiently use more powerful source classifiers?

  • How to further improve the speed:

    • Keep training time within 1 second;

    • Keep testing time manageable when the consumer photo set is very large.



Thank you!

  • Any questions?

