Object Retrieval Using Visual Query Context

Linjun Yang, Bo Geng, Yang Cai, Alan Hanjalic, Xian-Sheng Hua

Presented By: Shimon Berger

What is a Visual Query?

  • TinEye

  • Google Image Search

  • Google Goggles

Current Shortcomings

  • Bounding box

    • Complex shapes

    • User inaccuracy

  • Issues with the image itself

    • Too small

    • Lacks texture

How Can We Improve a Visual Query?

Objects in real life aren’t bounded by a box


  • Introduce a contextual object retrieval (COR) model

  • Evaluate experimentally using 3 image datasets

  • Demonstrate the benefit of introducing contextual data into the query

Existing Methods

  • Relevance feedback

  • “Bag of visual words”

    • Scale-invariant feature transform (SIFT)

  • Cosine retrieval model (sketched below)

  • Language modeling
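
As a rough illustration of the cosine baseline, the sketch below ranks database images by the cosine similarity of their bag-of-visual-words histograms. The function names and the histogram representation are illustrative, not taken from the paper.

```python
import numpy as np

def cosine_score(query_hist: np.ndarray, doc_hist: np.ndarray) -> float:
    """Cosine similarity between two bag-of-visual-words histograms."""
    denom = np.linalg.norm(query_hist) * np.linalg.norm(doc_hist)
    return float(query_hist @ doc_hist / denom) if denom > 0 else 0.0

def rank_images(query_hist: np.ndarray, db_hists: list) -> np.ndarray:
    """Rank database images by similarity to the query; best first."""
    scores = [cosine_score(query_hist, h) for h in db_hists]
    return np.argsort(scores)[::-1]
```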

Proposed COR Model

  • Based on the Kullback-Leibler (KL) retrieval model

    • Detect interest points

    • Extract SIFT descriptors

    • Convert into visual words

    • Match words to documents in a database

  • Uses the Jelinek-Mercer smoothing method (sketched below)

    • Captures important patterns, while removing noise
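
A minimal sketch of KL-style language-model scoring with Jelinek-Mercer smoothing, assuming visual-word count vectors and a collection-wide word distribution; the smoothing weight `lam` is a hypothetical parameter, and the paper's exact estimator may differ.

```python
import numpy as np

def jm_smooth(doc_counts: np.ndarray, collection_prob: np.ndarray,
              lam: float = 0.5) -> np.ndarray:
    """Jelinek-Mercer smoothing: mix the document's maximum-likelihood
    visual-word distribution with the collection-wide distribution."""
    ml = doc_counts / doc_counts.sum()
    return (1.0 - lam) * ml + lam * collection_prob

def kl_score(query_prob: np.ndarray, doc_prob: np.ndarray) -> float:
    """Negative KL divergence between query and document models;
    rank-equivalent to the query/document cross-entropy."""
    mask = query_prob > 0
    return float(np.sum(query_prob[mask] * np.log(doc_prob[mask])))
```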

COR Model

  • Begins with contrast-based saliency detection (sketched below)

    • Produces a saliency score

    • A control parameter weights its influence

  • Estimate search intent score for each visual word

    • Indicates the probability that a given visual word reflects the user’s search intent
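
One common formulation of contrast-based saliency scores each region by how strongly its features differ from all other regions in the image. The sketch below assumes simple per-region feature vectors (e.g., mean colors of patches) and is not necessarily the paper's exact measure.

```python
import numpy as np

def contrast_saliency(features: np.ndarray) -> np.ndarray:
    """Contrast-based saliency: a region is salient when its feature
    vector differs strongly from the other regions in the image.
    `features` has shape (n_regions, dim)."""
    diffs = features[:, None, :] - features[None, :, :]
    saliency = np.sqrt((diffs ** 2).sum(axis=-1)).sum(axis=1)
    return saliency / saliency.max()  # normalize to [0, 1]
```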

COR Search Intent Score

  • The standard LM approach uses a binary search intent score

  • Two algorithms are proposed to compute the SI score from the bounding box plus its context:

    • Based on pixel distance from bounding box (spatial propagation)

    • Based on color coherence of the pixels (appearance propagation)

Spatial Propagation (CORa)

  • Bounding box is usually rough and inaccurate

    • Lack of user effort

    • Limiting rectangular shape

  • Use a smoothed approximation of the bounding box

    • Dual-sigmoid function (sketched below)

    • A control parameter reflects the reliability of the bounding box

Appearance Propagation (CORm)

  • Assign high scores to the object of interest, normally in the foreground

  • Assign low scores to background objects or objects of no interest

  • Similar to image matting (see the sketch after this list)

    • Separate foreground and background using alpha values

    • Separate relevant objects from irrelevant in bounding box
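
A minimal, hypothetical illustration of the matting analogy: given an alpha matte (per-pixel foreground opacity), pixels inside the bounding box are split into relevant and irrelevant sets by thresholding alpha.

```python
import numpy as np

def split_by_alpha(alpha: np.ndarray, box_mask: np.ndarray,
                   thresh: float = 0.5):
    """Matting-style separation inside the bounding box: high-alpha
    pixels are treated as the object of interest, the rest as
    background clutter. All inputs are per-pixel arrays."""
    relevant = (alpha >= thresh) & box_mask
    irrelevant = (alpha < thresh) & box_mask
    return relevant, irrelevant
```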

Appearance Propagation (CORm), cont.

Three-step approach:

  • Estimate foreground and background models guided by bounding box

    • GrabCut algorithm (sketched below)

  • Use models to select foreground and background pixels

  • Search intent score estimated based on pixel information

    • Use pseudo-foreground and -background pixels to account for spatial smoothness

    • Top 10% of foreground pixels from inside box and top 20% of background pixels from outside box
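
A sketch of step 1 using OpenCV's GrabCut; the pseudo-pixel selection (top 10% / 20%) and the intent-score estimation of steps 2 and 3 are omitted here.

```python
import cv2
import numpy as np

def grabcut_foreground(img: np.ndarray, rect: tuple) -> np.ndarray:
    """Estimate foreground/background models from the bounding box
    with GrabCut and return a binary foreground mask.
    `img` is an 8-bit BGR image, `rect` is (x, y, w, h)."""
    mask = np.zeros(img.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # GMM parameters
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(img, mask, rect, bgd_model, fgd_model,
                5, cv2.GC_INIT_WITH_RECT)
    # GC_FGD / GC_PR_FGD mark (probable) foreground pixels.
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))
```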

CORm in Experiments

  • CORm is broken down into two variations:

    • CORg

      • Only uses GrabCut algorithm, not all 3 steps

    • CORw

      • Uses alpha values based on a weighted foreground probability


  • Experiments performed using 3 image datasets:

    • Oxford5K

    • Oxford5K+ImageNet500K

    • Web1M

  • Datasets 1 and 2 use 11 landmarks (55 query images in total) as queries

  • Dataset 3 adds an additional 45 query images

    • Randomly selected

    • Various categories


  • COR models compared to 2 baseline retrieval models:

    • Cosine

    • General language modeling (context-unaware)

  • Baseline models only use visual words from inside the bounding box

  • All models are evaluated in terms of average precision (AP)

    • AP values over all queries are averaged to obtain the mean average precision (MAP); see the sketch below
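
A minimal sketch of AP and MAP over binary relevance judgments for a ranked result list.

```python
import numpy as np

def average_precision(ranked_relevance) -> float:
    """AP for one query: the mean of the precision values at each rank
    where a relevant image is retrieved. `ranked_relevance` holds a
    0/1 relevance flag per rank, best-ranked first."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(per_query_relevance) -> float:
    """MAP: the per-query AP values averaged over all queries."""
    return float(np.mean([average_precision(r) for r in per_query_relevance]))
```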

Web1M Dataset

The best performance improvement was observed on the landmark queries.

Control Parameters

  • One parameter controls the weight given to saliency

  • A second parameter controls the reliability of the bounding box

Future Work

  • Context-aware multimedia retrieval

    • Using the contextual information shown here

    • Text surrounding query image

    • User logs and history