Techniques and data structures for efficient multimedia similarity search
Download
1 / 24

Techniques and Data Structures for Efficient Multimedia Similarity Search - PowerPoint PPT Presentation


  • 70 Views
  • Uploaded on

Techniques and Data Structures for Efficient Multimedia Similarity Search. Reading Assignment. Guojun Lu, "Multimedia Database Management Systems", Artech House, Publishers, 1999. Chapter 9: Techniques and Data Structures for Efficient Multimedia Similarity Search. Introduction.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Techniques and Data Structures for Efficient Multimedia Similarity Search' - tad


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Reading assignment
Reading Assignment Similarity Search

  • Guojun Lu, "Multimedia Database Management Systems", Artech House, Publishers, 1999.

    • Chapter 9: Techniques and Data Structures for Efficient Multimedia Similarity Search


Introduction
Introduction Similarity Search

  • Feature space is multidimensional.

  • Objective: divide the multidimensional space into subspaces, so that only a few subspaces need to be searched for each query.


Query types
Query Types Similarity Search

  • Point queries

  • Range queries

  • k nearest-neighbor queries


Operations on data structures
Operations on Data Structures Similarity Search

  • Search

    • Most important as it is done “on-line”

    • Must be efficient

  • Insertion and Deletion

    • Less important as they can be performed “off-line”

    • Need not be efficient


Efficient search techniques
Efficient Search Techniques Similarity Search

  • Filtering

  • B and B+-trees

  • Clustering

  • MB+-trees

  • k-d trees

  • Grid files

  • R trees

  • TV trees


Filtering
Filtering Similarity Search

  • Reduce the search space by selecting specific items satisfying certain criteria.

  • Search on the multidimensional feature vector is carried out on the selected set of items only.


Filtering with structured attributes classification
Filtering with Structured Attributes/Classification Similarity Search

  • Structured attributes associated with multimedia data.

    • e.g. date

  • Subject classification

    • e.g. image-content type=“Natural Scene”


Filtering using the triangle inequality
Filtering using the triangle inequality Similarity Search

  • d is a distance metric and i,q, and k are feature vectors for three objects.

  • Most feature distance measures are “metrics”

    • d(i,k)=0 iff i=k

    • d(i,k)= d(k,i)

    • Triangle inequality above


Application of triangle inequality to multimedia retrieval
Application of Triangle Inequality to Multimedia Retrieval Similarity Search

  • MDB Representation

    • Select m feature vectors as a comparison base F, where m << |MDB|, the size of the database

    • iMDB fjF, calculate d(i,fj) and store in MDB.

  • MDB retrieval for query q

    • Calculate d(q,fj) fjF.

    • Compute l(i) = max 1jm |d(i,fj)- d(q,fj)|

    • Find d(i,q) iMDB  l(i)  T, where T is a specified threshold.


Example
Example Similarity Search

  • Find database items whose distance to query q < 3 where the distances between q and the 2 vectors f1 and f2 forming the base are 3 and 4, respectively


Filtering in the color histogram based retrieval
Filtering in the Color Histogram-Based Retrieval Similarity Search

  • Space reduction can take place by

    • Reducing the number of bins

    • Selecting only a subset of the database images for calculating the distances from the query image.

  • Space reduction is achieved through the following process:

    • Select potential image candidates.

    • Use full histogram comparison to calculate the distance between the query and the candidates.


Selecting potential image candidates
Selecting Potential Image Candidates Similarity Search

  • Use very few bins in comparing color histograms.

  • Use the average color of images.


B trees 1
B-Trees (1) Similarity Search

  • B-Trees are always balanced.

  • B-Trees keep similar-valued records together on a disk page, which takes advantage of locality of reference.

  • B-Trees guarantee that every node in the tree will be full at least to a certain minimum percentage. This improves space efficiency while reducing the typical number of disk fetches necessary during a search or update operation.


B tree definition
B-Tree Definition Similarity Search

A B-Tree of order m has these properties:

  • The root is either a leaf or has at least two children.

  • Each node, except for the root and the leaves, has between m/2 and m children.

  • All leaves are at the same level in the tree, so the tree is always height balanced.

    A B-Tree node is usually selected to match the size of a disk block.

  • A B-Tree node could have hundreds of children.


B tree search

50 Similarity Search

B-Tree Search

Search in a B-Tree is a generalization of search in a 2-3 Tree.

  • Do binary search on keys in current node. If search key is found, then return record. If current node is a leaf node and key is not found, then report an unsuccessful search.

  • Otherwise, follow the proper branch and repeat the process.

20

30

70

10

12

13

21

23

35

37

40

55

59

60

73

80


B trees
B Similarity Search+-Trees

The most commonly implemented form of the B-Tree is the B+-Tree.

Internal nodes of the B+-Tree neither store records nor pointers to records. They only store key values to guide the search.

Leaf nodes store records or pointers to records.

A leaf node may store more or less records than an internal node stores keys.

Q: What is the advantage of using a B+-tree over a B-tree implementation?


B tree example
B Similarity Search+-Tree Example

Search for 21?


B tree insertion
B Similarity Search+-Tree Insertion

Insert the following keys into an initially empty B+ tree of order 4 where each leaf node can hold a maximum of 5 records:

12 , 23 , 10 , 48 , 33 , 50 , 15 , 18 , 20 , 45 , 47 , 31 , 52 , 21 , 30

What is the cost of direct access to a B+ tree of order n with N database items?


B tree deletion delete 18
B Similarity Search+-Tree Deletion (Delete 18)


B tree deletion delete 12
B Similarity Search+-Tree Deletion (Delete 12)


B tree deletion delete 33
B Similarity Search+-Tree Deletion (Delete 33)


Clustering
Clustering Similarity Search

  • Similar information items are grouped together to form a cluster based on a certain similarity measurement.

  • Each cluster is represented by the centroid of the feature vectors of that cluster.

  • How is search carried out?


2d example
2D Example Similarity Search

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x

x


ad