Supporting Noise-Free Queries in Large Image Databases

Image Retrieval
[Figure: retrieval pipeline with the components Database Images, Query Image, Image Database, Feature Extraction (for both database and query images), Select, Compare, Metadatabase, Feature Vectors, and Query Result]

Noise-Free Queries (NFQs)
• An NFQ is more precise than a rectangular query.
• The user can specify semantic constraints:
  • Spatial constraints (relative distances)
  • Scaling constraints (relative sizes)
[Figure: a rectangular query and a noise-free query, with results labeled "Similar" and "Less relevant"]

Challenges
• How do we extract features if we do not know the matching areas beforehand?
• How do we index the images?
[Figure: noise-free query]

One Solution – Local Color Histogram (LCH)
• Each subimage has its own color histogram.
• Any combination of these histograms can be selected and compared with the corresponding color histograms of the query image.
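
The LCH idea can be sketched as follows. This is a minimal illustration, not the implementation evaluated in the slides: it assumes the image is already quantized to 256 color indices, uses a regular grid of subimages, and compares histograms with an L1 distance. The function names, the grid size, and the distance are assumptions.

```python
import numpy as np

def local_color_histograms(image, grid=4, bins=256):
    """Split a 2-D array of color indices into grid x grid cells and
    return one histogram per cell, in row-major cell order."""
    image = np.asarray(image)
    row_groups = np.array_split(np.arange(image.shape[0]), grid)
    col_groups = np.array_split(np.arange(image.shape[1]), grid)
    hists = []
    for rows in row_groups:
        for cols in col_groups:
            cell = image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
            hists.append(np.bincount(cell.ravel(), minlength=bins)[:bins])
    return np.array(hists)

def lch_distance(query_hists, db_hists, cells):
    """L1 distance restricted to the selected cells (assumed metric)."""
    cells = list(cells)
    diff = query_hists[cells].astype(int) - db_hists[cells].astype(int)
    return int(np.abs(diff).sum())

# Toy 8x8 image: the top-left quadrant has color 5, everything else color 0.
img = np.zeros((8, 8), dtype=np.uint8)
img[:4, :4] = 5
hists = local_color_histograms(img, grid=2)
```

Selecting only some cells in `lch_distance` corresponds to choosing the histograms that cover the area of interest and ignoring the rest.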

Limitations of LCH
• Dilemma:
  • Large partitions are not precise.
  • Small partitions are too expensive.
• Limitation:
  • Scaling is difficult to handle.

Sampling-Based Approach
• Idea:
  • Sample 113 blocks of size 16x16.
  • Compare only the relevant blocks.
• Advantages:
  • Low storage overhead
  • Supports NFQs
  • Robust to translation and scaling
  • Supports spatial and scaling constraints

Handling Scaling
• All database images are sampled at one fixed rate (a).
• The query is sampled at 3 different rates:
  • A higher rate, to find larger matching objects (b)
  • The same rate, to find matching objects of the same size (c)
  • A lower rate, to find smaller matching objects (d)

Subimages
• We slide square windows of sizes 25, 41, 61, 85, and 113 sampled blocks over each database image.
• 85 indexing subimages are captured at the various sliding positions.
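
The slides do not say how the 113 sampled blocks are laid out. One layout that reproduces both the window sizes and the count of 85 subimages is an 8x8 grid of blocks interleaved with a 7x7 grid offset by half a step (64 + 49 = 113). This layout is an assumption, not something the slides state, and the function names are hypothetical. A minimal sketch that enumerates the windows under that assumption:

```python
from collections import Counter
from itertools import product

def sample_lattice():
    """Assumed layout: an 8x8 grid plus an interleaved 7x7 grid (113 points)."""
    outer = {(2 * i, 2 * j) for i, j in product(range(8), repeat=2)}
    inner = {(2 * i + 1, 2 * j + 1) for i, j in product(range(7), repeat=2)}
    return outer | inner

def enumerate_windows(points):
    """Return (corner, side, enclosed_count) for every square window whose
    top-left corner is a sample point, whose side is 6, 8, ..., 14 lattice
    units, and which fits inside the lattice."""
    extent = max(r for r, _ in points)  # 14 for the assumed lattice
    windows = []
    for side in range(6, extent + 1, 2):
        for r, c in sorted(points):
            if r + side > extent or c + side > extent:
                continue  # window would leave the image
            enclosed = sum(
                1 for y, x in points
                if r <= y <= r + side and c <= x <= c + side
            )
            windows.append(((r, c), side, enclosed))
    return windows

points = sample_lattice()
windows = enumerate_windows(points)
size_counts = Counter(n for _, _, n in windows)
print(len(points), len(windows), sorted(size_counts))
```

Under this assumed layout the enumeration yields 113 sample points and 85 windows, with enclosed-block counts of exactly 25, 41, 61, 85, and 113, which matches the numbers on the slide.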

Signature Computation
• For each indexing subimage, we compute its signature as seven average-variance pairs:
  • One from all the enclosed sampled blocks
  • Four from the sampled blocks in the four quarters
  • Two from the sampled blocks along the two diagonals
• The first component of the signature is called the short signature.
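
The signature described above can be sketched as follows. For simplicity the sketch assumes the sampled block values of one subimage are given as a dense square array with one value per block; how the slides' method assigns blocks to quarters and diagonals is not specified, so the split used here is an assumption and the function name is hypothetical.

```python
import numpy as np

def signature(blocks):
    """Return a 14-element vector: (average, variance) for the whole
    window, its four quarters, and its two diagonals, in that order."""
    blocks = np.asarray(blocks, dtype=float)
    n = blocks.shape[0]
    h = n // 2  # assumed quarter boundary; quarters are unequal when n is odd
    regions = [
        blocks,                        # all enclosed sampled blocks
        blocks[:h, :h],                # top-left quarter
        blocks[:h, h:],                # top-right quarter
        blocks[h:, :h],                # bottom-left quarter
        blocks[h:, h:],                # bottom-right quarter
        np.diagonal(blocks),           # main diagonal
        np.diagonal(np.fliplr(blocks)),  # anti-diagonal
    ]
    return np.array([s for r in regions for s in (r.mean(), r.var())])

sig = signature(np.arange(16).reshape(4, 4))
short_signature = sig[:2]  # first average-variance pair
```

The first pair, taken over all enclosed blocks, plays the role of the short signature; the full vector has 7 x 2 = 14 components.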

Indexing
• For each image, we map its 85 subimages to signature points in a 14-dimensional signature space (seven average-variance pairs).
• For each image, we cluster its 85 signature points into five MBRs (Minimum Bounding Regions).
• We insert these MBRs into an R* tree (height-balanced, with reinsertion on overflow).
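
The clustering step can be illustrated with a short sketch. The slides do not name the clustering method, so plain k-means is used here as a stand-in; the function name and parameters are hypothetical, and insertion into the R* tree is not shown.

```python
import numpy as np

def cluster_into_mbrs(points, k=5, iters=20, seed=0):
    """Group signature points with k-means (a stand-in method) and return
    the labels plus one bounding box (lo, hi) per non-empty cluster."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members.
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    mbrs = {}
    for j in range(k):
        members = points[labels == j]
        if len(members):
            mbrs[j] = (members.min(axis=0), members.max(axis=0))
    return labels, mbrs

# Stand-in data: 85 random points in a 14-dimensional signature space.
demo_points = np.random.default_rng(1).random((85, 14))
labels, mbrs = cluster_into_mbrs(demo_points)
```

Each bounding box is stored as a pair of corner vectors, which is the form a spatial index typically expects for an axis-aligned region.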

Query Processing – Preparation
• Sample the query image at different rates. For each sampling rate:
  • Determine the core area, which contains the maximum number of relevant sampled blocks and the least noise.
  • Determine the query rectangle.
  • Compute the signature of the core area.

Query Processing – Search
• Retrieve relevant clusters (MBRs) from the R* tree using the query rectangle.
• Eliminate irrelevant subimages in the qualifying MBRs using the short signature.
• Compare each subimage that passes this test against the original NFQ by matching the corresponding sampled blocks.
• Retrieve each image that has a matching subimage.
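
The first two search steps can be sketched as a filter pipeline. This is an illustration under stated assumptions: a linear scan stands in for the R* tree, the query rectangle is taken to be the query signature widened by a tolerance, and the names `search`, `tol`, and `short_tol` are hypothetical. The final block-by-block comparison is not shown.

```python
import numpy as np

def mbr_intersects(mbr_lo, mbr_hi, q_lo, q_hi):
    """True if two axis-aligned boxes overlap in every dimension."""
    return bool(np.all(mbr_lo <= q_hi) and np.all(q_lo <= mbr_hi))

def search(index, query_sig, tol, short_tol):
    """index: list of (image_id, mbr_lo, mbr_hi, signatures), where
    signatures is an (m, 14) array for the subimages in that MBR.
    Returns (image_id, subimage_position) candidates."""
    q_lo, q_hi = query_sig - tol, query_sig + tol  # assumed query rectangle
    candidates = []
    for image_id, lo, hi, sigs in index:
        if not mbr_intersects(lo, hi, q_lo, q_hi):
            continue  # whole cluster is irrelevant
        # Short-signature test on the first (average, variance) pair.
        close = np.all(np.abs(sigs[:, :2] - query_sig[:2]) <= short_tol, axis=1)
        candidates.extend((image_id, int(i)) for i in np.flatnonzero(close))
    return candidates

# Toy index with one MBR per image.
sigs_a = np.zeros((3, 14))
sigs_a[1] += 0.5
sigs_a[2] += 10.0
sigs_b = np.full((2, 14), 100.0)
index = [
    ("a", sigs_a.min(axis=0), sigs_a.max(axis=0), sigs_a),
    ("b", sigs_b.min(axis=0), sigs_b.max(axis=0), sigs_b),
]
result = search(index, np.zeros(14), tol=1.0, short_tol=1.0)
```

In the toy data, image "b" is rejected at the MBR stage and the third subimage of "a" is rejected by the short signature, so only the cheap tests run on most of the database.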

Query Processing – Summary
• Sample the NFQ at different rates.
• Determine the core area and compute its signature.
• Determine the query rectangle.
• Retrieve relevant clusters (MBRs) from the R* tree.
• Eliminate candidate subimages using the short signature.
• Match the sampled blocks of the remaining subimages.

Performance Comparison
• LCH
  • NFQ-capable
• Correlogram
  • One of the best whole-matching techniques
  • Can Correlogram support NFQs?

Experimental Studies
• Database: 15,808 images of various categories
• Workload: 100 queries
  • Type 1: Query and database images have the same size, and the NFQ covers less than half of the query image (30 queries).
  • Type 2: Query and database images have the same size, and the NFQ covers more than half of the query image (20 queries).
  • Type 3: Query and database images have different sizes (50 queries).

Type-3 Queries
• Only SamMatch can handle Type-3 queries.
• In the following example, there is no easy way to match the two identical apples using LCH.

Performance Results (Type 1)
[Figure: retrieval results for SamMatch (SM), Correlogram (Corr.), and LCH]

Performance Results (Type 2)

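Performance Results (Type 3)
• The sizes of the queries are different from those of the database images.
[Figure: two example queries with the numbers shown for their results: 4, 2, 3, 5, 12, 18 for the first query and 3, 216, 396, 2 for the second]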

Performance Metric
• Ai denotes a relevant image returned by the system.
• S is the scope of the query (i.e., the maximum number of images returned).
• q is the total number of relevant images in the database.
• Rationale: low-ranked images do not make it to the user.

Reliability
[Figure: reliability results in three panels: Type-1 Results, Type-2 Results, Type-3 Results]

Time & Space
• Assumption: no quick-and-dirty filtering and no indexing (since the LCH and Correlogram designs do not use them).
• SamMatch requires much less storage overhead:
  • LCH uses 21 color histograms: 21 × 256 × 2 bytes
  • Correlogram uses 4 color histograms: 4 × 256 × 2 bytes
  • SamMatch uses one byte per sampled block: 113 bytes
• In terms of exhaustive search, SamMatch is:
  • one time faster than Correlogram, and
  • two times faster than LCH
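
The per-image storage figures can be checked with a few lines of arithmetic. The numbers come from the slide; the variable names are illustrative.

```python
BINS = 256          # bins per color histogram
BYTES_PER_BIN = 2   # bytes stored per bin

lch_bytes = 21 * BINS * BYTES_PER_BIN         # 21 histograms per image
correlogram_bytes = 4 * BINS * BYTES_PER_BIN  # 4 histograms per image
sammatch_bytes = 113 * 1                      # one byte per sampled block

print(lch_bytes, correlogram_bytes, sammatch_bytes)
print(round(lch_bytes / sammatch_bytes, 1),
      round(correlogram_bytes / sammatch_bytes, 1))
```

This gives 10,752 bytes for LCH and 2,048 bytes for Correlogram against 113 bytes for SamMatch, so SamMatch needs roughly 1/95 of the LCH storage and roughly 1/18 of the Correlogram storage per image.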

Concluding Remarks
• Reducing noise interference is essential to achieving more reliable image retrieval.
• SamMatch supports NFQs effectively and efficiently:
  • Two times faster than LCH, and one time faster than Correlogram
• Other benefits of SamMatch include:
  • Matching objects at different scales
  • Uncovering translations of the matching areas
  • Handling spatial and scaling constraints
• SamMatch uses less than 1/16 the storage space required by LCH and Correlogram.
