Object retrieval using visual query context

Object Retrieval Using Visual Query Context. Linjun Yang Bo Geng Yang Cai Alan Hanjalic Xian-Sheng Hua. Presented By: Shimon Berger. What is a Visual Query?. TinEye Google Image Search Google Goggles. Current Shortcomings. Bounding box Complex shapes User inaccuracy

Presentation Transcript

Object Retrieval Using Visual Query Context

Linjun Yang

Bo Geng

Yang Cai

Alan Hanjalic

Xian-Sheng Hua

Presented By: Shimon Berger

What is a Visual Query?

  • TinEye

  • Google Image Search

    • Google Goggles

Current Shortcomings

  • Bounding box

    • Complex shapes

    • User inaccuracy

  • Issues with the image itself

    • Too small

    • Lacks texture

Bad Query Image vs. Good Query Image

How Can We Improve a Visual Query?

Objects in real life aren’t bound by a box



  • Introduce a contextual object retrieval (COR) model

  • Evaluate experimentally using 3 image datasets

  • Demonstrate the benefit of introducing contextual data into the query

Existing Methods

  • Relevance feedback

  • “Bag of visual words”

    • Scale-invariant feature transform (SIFT)

  • Cosine retrieval model

  • Language modeling
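
As a rough illustration of the cosine retrieval baseline over a bag of visual words, the sketch below compares two images represented as lists of visual-word IDs. It assumes raw term frequencies; real systems typically add tf-idf weighting, and the function name is hypothetical rather than taken from the presentation.

```python
import math
from collections import Counter


def cosine_similarity(query_words, doc_words):
    """Cosine similarity between two bags of visual-word IDs.

    Assumption: plain term-frequency vectors, no tf-idf weighting.
    """
    q = Counter(query_words)
    d = Counter(doc_words)
    # Dot product over the words the two bags share.
    dot = sum(q[w] * d[w] for w in q if w in d)
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    norm_d = math.sqrt(sum(c * c for c in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)
```

Database images would then be ranked by decreasing similarity to the query.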

Proposed COR Model

  • Based on the Kullback-Leibler retrieval model

    • Detect interest points

    • Extract SIFT descriptors

    • Convert into visual words

    • Match words to documents in a database

  • Uses Jelinek-Mercer smoothing method

    • Captures important patterns, while removing noise
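
A minimal sketch of how a Kullback-Leibler retrieval score with Jelinek-Mercer smoothing might be computed, assuming the query is already a probability distribution over visual words. The smoothing weight `lam`, its default value, and the function names are illustrative assumptions, not values from the presentation.

```python
import math
from collections import Counter


def jm_smoothed_prob(word, doc_counts, doc_len, coll_counts, coll_len, lam=0.5):
    """Jelinek-Mercer smoothing: mix the document and collection estimates."""
    p_doc = doc_counts.get(word, 0) / doc_len if doc_len else 0.0
    p_coll = coll_counts.get(word, 0) / coll_len if coll_len else 0.0
    return (1.0 - lam) * p_doc + lam * p_coll


def kl_score(query_model, doc_words, coll_counts, coll_len, lam=0.5):
    """Rank-equivalent form of negative KL divergence: sum_w p(w|Q) log p(w|D).

    query_model maps visual-word ID -> p(w|Q). Higher scores are better.
    """
    doc_counts = Counter(doc_words)
    doc_len = len(doc_words)
    score = 0.0
    for word, p_q in query_model.items():
        p_d = jm_smoothed_prob(word, doc_counts, doc_len, coll_counts, coll_len, lam)
        # A word unseen in the whole collection has p_d == 0 and is skipped.
        if p_q > 0 and p_d > 0:
            score += p_q * math.log(p_d)
    return score
```

The collection term keeps the score finite for documents that miss some query words, which is the smoothing role described above.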

COR Model

  • Begins with contrast-based saliency detection

    • Produces saliency score

    • Uses  as a control variable

  • Estimate search intent score for each visual word

    • Indicates probability of a given visual word to reflect user’s search intent

COR Search Intent Score

  • Standard LM approach uses binary search intent score

  • Two proposed algorithms to compute SI from bounding box with context:

    • Based on pixel distance from bounding box (spatial propagation)

    • Based on color coherence of the pixels (appearance propagation)
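
To make the role of the search intent (SI) score concrete, the sketch below builds a query language model in which each visual word is weighted by the SI score of the points where it occurs. The normalization, the `(x_left, y_top, x_right, y_bottom)` box layout, and all names are assumptions for illustration; the binary function corresponds to the standard context-unaware case.

```python
from collections import defaultdict


def binary_intent(point, box):
    """Standard LM case: score 1 inside the bounding box, 0 outside."""
    x, y = point
    x_left, y_top, x_right, y_bottom = box
    return 1.0 if x_left <= x <= x_right and y_top <= y <= y_bottom else 0.0


def query_model(features, intent_fn):
    """Build p(w|Q) from (word_id, (x, y)) pairs weighted by an SI function.

    Assumption: a word's probability is its summed SI score divided by
    the total SI score over all features.
    """
    weights = defaultdict(float)
    for word, point in features:
        weights[word] += intent_fn(point)
    total = sum(weights.values())
    if total == 0:
        return {}
    return {w: s / total for w, s in weights.items() if s > 0}
```

Passing a softer `intent_fn` that gives nonzero scores outside the box is what lets context words enter the query model.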

Spatial Propagation (CORa)

  • Bounding box is usually rough and inaccurate

    • Lack of user effort

    • Limiting rectangular shape

  • Use smoothed approximation of bounding box

    • Dual-sigmoid function

    • Uses  as a control variable
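
One way to picture the smoothed bounding box is a dual-sigmoid per axis: one sigmoid rises at the near edge and another falls at the far edge. The sketch below combines the two axes by multiplication, which is an assumption, as are the parameter name `steepness` (standing in for the control variable) and its default value.

```python
import math


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dual_sigmoid_1d(v, lo, hi, steepness):
    """Close to 1 between lo and hi, decaying smoothly to 0 outside."""
    return _sigmoid(steepness * (v - lo)) * _sigmoid(steepness * (hi - v))


def spatial_intent(point, box, steepness=0.1):
    """Soft box-membership score for a point.

    box is (x_left, y_top, x_right, y_bottom). Larger steepness values
    approach the hard (binary) box; smaller values admit more context.
    """
    x, y = point
    x_left, y_top, x_right, y_bottom = box
    return (dual_sigmoid_1d(x, x_left, x_right, steepness)
            * dual_sigmoid_1d(y, y_top, y_bottom, steepness))
```

Points just outside the box still receive a moderate score, so nearby context contributes while distant regions are effectively ignored.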

Appearance Propagation (CORm)

  • Assign high scores to object of interest, normally in foreground

  • Assign low scores to background objects, or objects of no interest

  • Similar to image matting

    • Separate foreground and background using alpha values

    • Separate relevant objects from irrelevant in bounding box

Appearance Propagation (CORm)

Three step approach:

  • Estimate foreground and background models guided by bounding box

    • GrabCut algorithm

  • Use models to select foreground and background pixels

  • Search intent score estimated based on pixel information

    • Use pseudo-foreground and -background pixels to account for spatial smoothness

    • Top 10% of foreground pixels from inside box and top 20% of background pixels from outside box

CORm in Experiments

  • CORm is broken down into 2 variations:

    • CORg

      • Only uses GrabCut algorithm, not all 3 steps

    • CORw

      • Uses alpha values based on weighted foreground probability



  • Experiments performed using 3 image datasets:

    • Oxford5K

    • Oxford5K+ImageNet500K

    • Web1M

  • Datasets 1 and 2 use 11 landmarks (55 query images in total)

  • Dataset 3 adds 45 more query images

    • Randomly selected

    • Various categories



  • COR models compared to 2 baseline retrieval models:

    • Cosine

    • General language modeling (context-unaware)

  • Baseline models only use visual words from inside bounding box

  • All models evaluated in terms of average precision (AP)

    • AP values are averaged over all queries to obtain mean average precision (MAP)
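
For reference, AP and MAP can be computed as in the sketch below, which uses the common non-interpolated definition over a ranked list of relevance flags. The exact protocol of the benchmarks used here may differ (for instance in how ambiguous images are handled), so treat this as an illustration only.

```python
def average_precision(ranked_relevance, num_relevant=None):
    """Non-interpolated AP for one query.

    ranked_relevance: booleans (or 0/1) in ranked order, best match first.
    num_relevant: total relevant items; defaults to those in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    denom = num_relevant if num_relevant is not None else hits
    return precision_sum / denom if denom else 0.0


def mean_average_precision(runs):
    """MAP: the mean of per-query AP values."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)
```

For example, a ranking with relevant results at positions 1 and 3 has AP = (1/1 + 2/3) / 2.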



AP for different landmarks on Oxford5K dataset.

AP for different landmarks on Oxford5K+ImageNet500K dataset.

AP for different queries on Web1M dataset.

Web1M Dataset

Best performance enhancement on landmarks:

Control Parameters

  •  is the control for saliency

  •  is the control for the reliability of the bounding box

Future Work

  • Context-aware multimedia retrieval

    • Using the contextual information shown here

    • Text surrounding query image

    • User logs and history
