Techniques and data structures for efficient multimedia similarity search
Techniques and Data Structures for Efficient Multimedia Similarity Search. Reading Assignment. Guojun Lu, "Multimedia Database Management Systems", Artech House, Publishers, 1999. Chapter 9: Techniques and Data Structures for Efficient Multimedia Similarity Search. Introduction.


Reading Assignment

  • Guojun Lu, "Multimedia Database Management Systems", Artech House, Publishers, 1999.

    • Chapter 9: Techniques and Data Structures for Efficient Multimedia Similarity Search


Introduction

  • Feature space is multidimensional.

  • Objective: divide the multidimensional space into subspaces, so that only a few subspaces need to be searched for each query.


Query Types

  • Point queries

  • Range queries

  • k nearest-neighbor queries
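The three query types can be illustrated with a brute-force linear scan. This is only a sketch: the Euclidean distance, the function names, and the tiny 2-D database are assumptions made for illustration, and the index structures discussed later exist precisely to avoid scanning every item like this.

```python
import math

def point_query(db, q):
    """Return the ids of items whose feature vector equals q exactly."""
    return [item_id for item_id, vec in db.items() if vec == q]

def range_query(db, q, radius):
    """Return the ids of items whose distance to q is at most `radius`."""
    return [item_id for item_id, vec in db.items() if math.dist(vec, q) <= radius]

def knn_query(db, q, k):
    """Return the ids of the k items closest to q, nearest first."""
    ranked = sorted(db, key=lambda item_id: math.dist(db[item_id], q))
    return ranked[:k]

# Hypothetical 2-D feature vectors keyed by item id.
db = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (5.0, 5.0), "d": (1.0, 0.0)}

print(point_query(db, (1.0, 1.0)))
print(range_query(db, (0.0, 0.0), 1.5))
print(knn_query(db, (0.0, 0.0), 2))
```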


Operations on Data Structures

  • Search

    • Most important as it is done “on-line”

    • Must be efficient

  • Insertion and Deletion

    • Less important as they can be performed “off-line”

    • Need not be efficient


Efficient Search Techniques

  • Filtering

  • B and B+-trees

  • Clustering

  • MB+-trees

  • k-d trees

  • Grid files

  • R trees

  • TV trees


Filtering

  • Reduce the search space by selecting specific items satisfying certain criteria.

  • Search on the multidimensional feature vector is carried out on the selected set of items only.


Filtering with Structured Attributes/Classification

  • Structured attributes associated with multimedia data.

    • e.g. date

  • Subject classification

    • e.g. image-content type=“Natural Scene”
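A minimal sketch of this kind of filtering, assuming each item carries a date and a content-type label next to its feature vector. The record layout, field names, and sample values are hypothetical. The structured attributes are tested first, and only the surviving items are ranked by feature distance.

```python
import math
from datetime import date

# Hypothetical records: structured attributes plus a feature vector.
items = [
    {"id": 1, "date": date(1998, 5, 1), "content_type": "Natural Scene", "vec": (0.1, 0.9)},
    {"id": 2, "date": date(1999, 1, 15), "content_type": "Portrait", "vec": (0.2, 0.8)},
    {"id": 3, "date": date(1999, 3, 10), "content_type": "Natural Scene", "vec": (0.7, 0.3)},
    {"id": 4, "date": date(1997, 11, 20), "content_type": "Natural Scene", "vec": (0.15, 0.85)},
]

def filtered_search(items, q, content_type, since):
    """Filter on structured attributes, then rank the survivors by distance to q."""
    candidates = [
        it for it in items
        if it["content_type"] == content_type and it["date"] >= since
    ]
    return sorted(candidates, key=lambda it: math.dist(it["vec"], q))

result = filtered_search(items, (0.2, 0.8), "Natural Scene", date(1998, 1, 1))
print([it["id"] for it in result])
```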


Filtering Using the Triangle Inequality

  • d is a distance metric and i, q, and k are the feature vectors of three objects. The triangle inequality states that d(i,q) ≤ d(i,k) + d(k,q).

  • Most feature distance measures are “metrics”, i.e. they satisfy:

    • d(i,k) = 0 iff i = k

    • d(i,k) = d(k,i)

    • The triangle inequality above


Application of Triangle Inequality to Multimedia Retrieval

  • MDB representation

    • Select m feature vectors as a comparison base F, where m << |MDB|, the size of the database.

    • For all i ∈ MDB and all fj ∈ F, calculate d(i,fj) and store it in MDB.

  • MDB retrieval for query q

    • Calculate d(q,fj) for all fj ∈ F.

    • Compute l(i) = max over 1 ≤ j ≤ m of |d(i,fj) − d(q,fj)|. By the triangle inequality, l(i) ≤ d(i,q).

    • Compute d(i,q) only for those i ∈ MDB with l(i) ≤ T, where T is a specified threshold.
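The retrieval procedure above can be sketched as follows. The Euclidean distance, the function names, and the sample vectors are assumptions for illustration. The sketch also counts how many full distance computations survive the filter, since avoiding those is the point of the technique.

```python
import math

def build_base_table(db, base):
    """Offline step: store d(i, f_j) for every item i and every base vector f_j."""
    return {i: [math.dist(vec, f) for f in base] for i, vec in db.items()}

def filtered_range_search(db, base, table, q, T):
    """Online step: return ({id: d(i, q)} for items with d(i, q) <= T, number of full distances computed)."""
    dq = [math.dist(q, f) for f in base]  # d(q, f_j): only m distance computations
    results, computed = {}, 0
    for i, di in table.items():
        lower = max(abs(a - b) for a, b in zip(di, dq))  # l(i), a lower bound on d(i, q)
        if lower > T:
            continue  # d(i, q) >= l(i) > T, so item i cannot qualify
        computed += 1
        d = math.dist(db[i], q)
        if d <= T:
            results[i] = d
    return results, computed

# Hypothetical database and comparison base (m = 2).
db = {"a": (0, 0), "b": (1, 0), "c": (10, 10), "d": (0, 9)}
base = [(0, 0), (10, 0)]
table = build_base_table(db, base)

results, computed = filtered_range_search(db, base, table, q=(1, 1), T=2)
print(sorted(results), computed)
```

Because l(i) never exceeds d(i,q), the filter cannot discard a true match; it can only let through some items that the full distance computation later rejects.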


Example

  • Find the database items whose distance to query q is less than 3, where the distances between q and the two base vectors f1 and f2 are 3 and 4, respectively.
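With d(q,f1) = 3 and d(q,f2) = 4, the filter condition can be worked out symbolically. This is a sketch that assumes the threshold test is taken as l(i) < 3, to match the strict "< 3" in the query:

```latex
\begin{aligned}
l(i) &= \max\bigl(\,|d(i,f_1) - 3|,\ |d(i,f_2) - 4|\,\bigr) \le d(i,q) \\
l(i) < 3 &\iff 0 < d(i,f_1) < 6 \ \text{ and } \ 1 < d(i,f_2) < 7
\end{aligned}
```

Any item whose stored distances fall outside these two intervals has d(i,q) ≥ 3 and can be discarded without computing d(i,q).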


Filtering in Color Histogram-Based Retrieval

  • Space reduction can take place by

    • Reducing the number of bins

    • Selecting only a subset of the database images for calculating the distances from the query image.

  • Space reduction is achieved through the following process:

    • Select potential image candidates.

    • Use full histogram comparison to calculate the distance between the query and the candidates.


Selecting Potential Image Candidates

  • Use very few bins in comparing color histograms.

  • Use the average color of images.
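A minimal sketch of the two-stage process using the "very few bins" option. The L1 histogram distance, the bin-merging scheme, the function names, and the sample pixel counts are assumptions for illustration. A real system would precompute the coarse histograms instead of rebuilding them per query.

```python
def l1_distance(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def coarsen(hist, factor):
    """Merge every `factor` adjacent bins into a single bin."""
    return [sum(hist[i:i + factor]) for i in range(0, len(hist), factor)]

def two_stage_search(db, query_hist, factor, n_candidates, k):
    """Stage 1: rank by coarse histograms. Stage 2: full comparison on candidates only."""
    q_coarse = coarsen(query_hist, factor)
    by_coarse = sorted(
        db, key=lambda name: l1_distance(coarsen(db[name], factor), q_coarse)
    )
    candidates = by_coarse[:n_candidates]
    ranked = sorted(candidates, key=lambda name: l1_distance(db[name], query_hist))
    return ranked[:k]

# Hypothetical 4-bin colour histograms (pixel counts per bin).
db = {
    "sunset": [70, 20, 10, 0],
    "forest": [10, 10, 50, 30],
    "beach": [50, 20, 20, 10],
    "night": [0, 10, 20, 70],
}
query = [60, 30, 10, 0]

print(two_stage_search(db, query, factor=2, n_candidates=2, k=2))
```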


B-Trees

  • B-Trees are always balanced.

  • B-Trees keep similar-valued records together on a disk page, which takes advantage of locality of reference.

  • B-Trees guarantee that every node in the tree will be full at least to a certain minimum percentage. This improves space efficiency while reducing the typical number of disk fetches necessary during a search or update operation.


B-Tree Definition

A B-Tree of order m has these properties:

  • The root is either a leaf or has at least two children.

  • Each node, except for the root and the leaves, has between ⌈m/2⌉ and m children.

  • All leaves are at the same level in the tree, so the tree is always height balanced.

  • The size of a B-Tree node is usually chosen to match the size of a disk block.

  • A B-Tree node could have hundreds of children.


B-Tree Search

Search in a B-Tree is a generalization of search in a 2-3 Tree.

  • Do binary search on keys in current node. If search key is found, then return record. If current node is a leaf node and key is not found, then report an unsuccessful search.

  • Otherwise, follow the proper branch and repeat the process.

[Figure: example B-Tree. Keys shown: 50; 20, 30, 70; 10, 12, 13, 21, 23, 35, 37, 40, 55, 59, 60, 73, 80.]

B+-Trees

The most commonly implemented form of the B-Tree is the B+-Tree.

Internal nodes of the B+-Tree neither store records nor pointers to records. They only store key values to guide the search.

Leaf nodes store records or pointers to records.

A leaf node may store more or fewer records than an internal node stores keys.

Q: What is the advantage of using a B+-tree over a B-tree implementation?
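A small sketch of the B+-Tree layout described above. The class names, the hand-built two-level tree, and the `next` link between leaves are assumptions. Sibling links are a common implementation choice that the slides do not state, and they make range scans a simple walk along the leaf level.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys, records):
        self.keys, self.records = keys, records
        self.next = None  # assumed link to the next leaf

class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children  # keys only guide the search

def find_leaf(node, key):
    """Descend from the root to the leaf that may contain `key`."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node

def lookup(root, key):
    """Return the record stored under `key`, or None if it is absent."""
    leaf = find_leaf(root, key)
    return leaf.records[leaf.keys.index(key)] if key in leaf.keys else None

def range_scan(root, lo, hi):
    """Return the records with lo <= key <= hi by walking the leaf chain."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k, r in zip(leaf.keys, leaf.records):
            if k > hi:
                return out
            if k >= lo:
                out.append(r)
        leaf = leaf.next
    return out

# Hand-built example: keys >= a separator go to the right of it.
leaves = [Leaf(ks, [f"r{k}" for k in ks]) for ks in ([10, 12, 15], [18, 20, 21], [23, 30, 31])]
leaves[0].next, leaves[1].next = leaves[1], leaves[2]
root = Internal([18, 23], leaves)

print(lookup(root, 21), range_scan(root, 12, 23))
```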


B+-Tree Example

Search for 21?


B+-Tree Insertion

Insert the following keys into an initially empty B+ tree of order 4 where each leaf node can hold a maximum of 5 records:

12 , 23 , 10 , 48 , 33 , 50 , 15 , 18 , 20 , 45 , 47 , 31 , 52 , 21 , 30

What is the cost of direct access to a B+ tree of order n with N database items?
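As a rough guide to the cost question: a direct access visits one node per level, so the cost is proportional to the height of the tree, which grows as the logarithm of N to a base between ⌈n/2⌉ and n. The sketch below estimates that range under simplifying assumptions: leaves are treated as having the same fan-out as internal nodes, and the root's exemption from the minimum is ignored. The function names are hypothetical.

```python
def levels_needed(n_items, fanout):
    """Smallest h such that fanout**h >= n_items, using integer arithmetic."""
    levels, capacity = 0, 1
    while capacity < n_items:
        capacity *= fanout
        levels += 1
    return levels

def height_bounds(n_items, order):
    """Return (best, worst) level counts for a tree of the given order."""
    if order < 3:
        raise ValueError("order must be at least 3 for this estimate")
    best = levels_needed(n_items, order)             # every node completely full
    worst = levels_needed(n_items, -(-order // 2))   # every node half full: ceil(order / 2)
    return best, worst

print(height_bounds(1_000_000, 100))
```

Under these assumptions, a million items in a tree of order 100 need only three or four node accesses per lookup.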


B+-Tree Deletion (Delete 18)


B+-Tree Deletion (Delete 12)


B+-Tree Deletion (Delete 33)


Clustering

  • Similar information items are grouped together to form a cluster based on a certain similarity measurement.

  • Each cluster is represented by the centroid of the feature vectors of that cluster.

  • How is search carried out?
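One common way to carry out the search, sketched here under assumptions, is to compare the query with the cluster centroids first and then search only inside the closest cluster. The function names, the Euclidean distance, and the sample clusters are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(v[d] for v in vectors) / n for d in range(len(vectors[0])))

def cluster_search(clusters, q, k):
    """Pick the cluster whose centroid is nearest to q, then rank its members."""
    centroids = {
        cid: centroid([vec for _, vec in members])
        for cid, members in clusters.items()
    }
    nearest = min(centroids, key=lambda cid: math.dist(centroids[cid], q))
    ranked = sorted(clusters[nearest], key=lambda m: math.dist(m[1], q))
    return nearest, [item_id for item_id, _ in ranked[:k]]

# Hypothetical clusters of (item id, 2-D feature vector) pairs.
clusters = {
    "A": [("a1", (1, 1)), ("a2", (2, 1)), ("a3", (1, 2))],
    "B": [("b1", (8, 8)), ("b2", (9, 8)), ("b3", (8, 9))],
}

print(cluster_search(clusters, q=(7.5, 8.0), k=2))
```

Searching a single cluster is fast but approximate: a query near a cluster boundary may have closer neighbours in an adjacent cluster, so implementations often examine the few nearest clusters instead of only one.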


2D Example

[Figure: 2D scatter plot of feature vectors, each marked "x", illustrating clustering.]
