
Searching in High-Dimensional Spaces

Searching in High-Dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases. Christian Böhm, Stefan Berchtold, Daniel A. Keim. ACM Computing Surveys, 2001. Introduction. Multimedia databases have become increasingly important in many application areas.


Presentation Transcript


  1. Searching in High-Dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases • Christian Böhm, Stefan Berchtold, Daniel A. Keim • ACM Computing Surveys, 2001

  2. Introduction • Multimedia databases have become increasingly important in many application areas • Content-based retrieval of similar objects • Similarity search • Feature transformation • Multimedia object → high-dimensional point (feature vector) • Search for points in the feature space that are close to a given query point

  3. Similarity Queries • Basic idea of feature-based similarity search • [Figure: complex data objects → feature transformation → high-dim. feature vectors → insert → high-dim. index, which answers an ε-search or NN-search. Panels: range query; nearest-neighbor query]
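The two query types named on this slide can be illustrated without any index at all. The following is a minimal sketch, assuming Euclidean distance and a tiny in-memory list of feature vectors; the function names and sample vectors are hypothetical and not taken from the slides. An index structure aims to return the same answers while reading far fewer points.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(points, q, eps):
    """epsilon-search: all points within distance eps of the query point q."""
    return [p for p in points if euclidean(p, q) <= eps]

def nn_query(points, q):
    """Nearest-neighbor search: the single point closest to q."""
    return min(points, key=lambda p: euclidean(p, q))

# Hypothetical 3-dimensional feature vectors (illustrative values only).
features = [(0.1, 0.2, 0.9), (0.4, 0.4, 0.5), (0.8, 0.1, 0.3)]
query = (0.5, 0.5, 0.5)

print(range_query(features, query, 0.6))
print(nn_query(features, query))
```

Both functions scan every vector, which is the baseline a high-dimensional index has to beat.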

  4. Effects in High-Dimensional Space • Curse of dimensionality • Can you imagine 5 or 10 dimensions? • Low-dimensional intuition: “Every d-dimensional sphere touching (or intersecting) the (d-1)-dimensional boundaries of the data space contains c”, where c is the center point of the data space • What happens if d = 16?
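The d = 16 question can be checked numerically. The sketch below assumes the unit data space [0, 1]^d with center c = (0.5, ..., 0.5), and a sphere centered at (0.3, ..., 0.3) with radius 0.7; these concrete values and the function name are choices made for illustration, not read from the slide. The sphere reaches every boundary face, yet in 16 dimensions its center lies about 0.8 away from c, so c is outside the sphere.

```python
import math

def sphere_facts(d, p_coord=0.3, radius=0.7):
    """Check the 'sphere touching all boundaries contains c' rule in [0, 1]^d.

    Returns (distance from sphere center to c, touches all faces?, contains c?).
    """
    center = [0.5] * d           # c, the center of the data space
    p = [p_coord] * d            # center of the sphere
    dist_to_center = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, center)))
    # By symmetry, the distance from p to the faces x_i = 0 and x_i = 1 is the
    # same in every dimension; a small tolerance guards against float rounding.
    touches_all_faces = all(
        face_dist <= radius + 1e-12
        for face_dist in (p_coord, 1.0 - p_coord)
    )
    contains_center = dist_to_center <= radius
    return dist_to_center, touches_all_faces, contains_center

for d in (2, 16):
    print(d, sphere_facts(d))
```

For d = 2 the rule holds; for d = 16 the same sphere no longer contains c.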

  5. Effects in High-Dimensional Space • Issues • Exponential growth of volume • Space partitioning • The majority of the data pages are located at the surface of the data space rather than in the interior • Coarse partitioning • [Figure labels: 0.917, 0.5, 0.25]
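Two of these effects follow from simple arithmetic. The sketch below assumes a unit data space and uniformly distributed points; the function names and the 0.1 margin are illustrative. The figure values 0.5 and 0.917 on this slide are consistent with the side length of a hypercube of volume 0.25 in 2 and in 16 dimensions, but that reading of the lost figure is an assumption.

```python
def edge_length(volume, d):
    """Side length of a d-dimensional hypercube with the given volume."""
    return volume ** (1.0 / d)

def fraction_near_surface(d, margin=0.1):
    """Fraction of [0, 1]^d lying within `margin` of the boundary.

    The interior cube has side 1 - 2 * margin, so its volume shrinks
    exponentially with d.
    """
    return 1.0 - (1.0 - 2.0 * margin) ** d

for d in (2, 16):
    print(d, round(edge_length(0.25, d), 3), round(fraction_near_surface(d), 3))
```

A query cube covering a quarter of the space needs a side of 0.5 in 2 dimensions but about 0.917 in 16, and in 16 dimensions more than 97% of the volume lies within 0.1 of the surface.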

  6. Common Principles • Structure & Regions • Hierarchical clustering • Spatially adjacent vectors are likely to reside in the same node

  7. Basic Algorithms • Index construction • Insert, Delete, and Update • Query processing • Exact match query • Range query • Nearest-neighbor query • Ranking query (generalized k-nearest-neighbor query) • Reverse nearest-neighbor query

  8. Nearest-Neighbor Query • No fixed criterion, known a priori, to exclude branches of the indexing structure • The criterion is the nearest-neighbor distance • But it is not known until the algorithm has terminated • Pessimistic estimation • The closest point among all points visited (closest point candidate)

  9. Nearest-Neighbor Query • RKV algorithm • MINDIST: the actual distance between the query point and the page region • MINMAXDIST: an estimate of the nearest-neighbor distance • Depth-first, branch-and-bound traversal • [Figure: MINMAXDIST and MINDIST]
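For rectangular page regions, both distance measures can be computed directly from the rectangle's corners. This is a sketch under stated assumptions: Euclidean distance, page regions given as lower and upper corner tuples `lb` and `ub`, and the commonly cited definition of MINMAXDIST for minimum bounding rectangles (each face of a minimum bounding rectangle touches at least one data point). The function names are illustrative.

```python
import math

def mindist(q, lb, ub):
    """Smallest distance from q to any location inside the rectangle."""
    total = 0.0
    for qi, lo, hi in zip(q, lb, ub):
        if qi < lo:
            total += (lo - qi) ** 2
        elif qi > hi:
            total += (qi - hi) ** 2
    return math.sqrt(total)

def minmaxdist(q, lb, ub):
    """Upper bound on the distance from q to the nearest point in an MBR.

    For each dimension k, take the nearer face in k and the farther
    coordinate in every other dimension; return the minimum over k.
    """
    near = [lo if qi <= (lo + hi) / 2 else hi for qi, lo, hi in zip(q, lb, ub)]
    far = [lo if qi >= (lo + hi) / 2 else hi for qi, lo, hi in zip(q, lb, ub)]
    far_sq = [(qi - f) ** 2 for qi, f in zip(q, far)]
    total_far = sum(far_sq)
    best = min(
        total_far - far_sq[k] + (q[k] - near[k]) ** 2
        for k in range(len(q))
    )
    return math.sqrt(best)

q = (0.0, 0.0)
lb, ub = (1.0, 1.0), (3.0, 2.0)
print(mindist(q, lb, ub), minmaxdist(q, lb, ub))
```

A branch whose MINDIST exceeds the smallest MINMAXDIST seen so far cannot contain the nearest neighbor, which is what lets a depth-first traversal prune it.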

  10. Nearest-Neighbor Query • HS algorithm • Access all pages of the index in the order of increasing distance to the query point • Active page list (APL)
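The idea of visiting pages in order of increasing distance can be sketched with a priority queue standing in for the active page list. The tree layout below (nested dicts with `lb`, `ub`, and either `children` or `points`) and all names are hypothetical, chosen only to keep the example self-contained; the sketch returns a single nearest neighbor and is not the authors' implementation.

```python
import heapq
import itertools
import math

def mindist(q, lb, ub):
    """Smallest Euclidean distance from point q to the rectangle [lb, ub]."""
    total = 0.0
    for qi, lo, hi in zip(q, lb, ub):
        if qi < lo:
            total += (lo - qi) ** 2
        elif qi > hi:
            total += (qi - hi) ** 2
    return math.sqrt(total)

def hs_nearest(root, q):
    """Best-first nearest-neighbor search over a tree of page regions."""
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    apl = [(mindist(q, root["lb"], root["ub"]), next(tie), root)]
    best, best_dist = None, math.inf
    visited = []
    # Stop as soon as the closest unvisited page is farther than the candidate.
    while apl and apl[0][0] < best_dist:
        _, _, node = heapq.heappop(apl)
        visited.append(node["name"])
        if "points" in node:  # data page: compare the stored points
            for p in node["points"]:
                dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
                if dist < best_dist:
                    best, best_dist = p, dist
        else:  # directory page: enqueue the child pages
            for child in node["children"]:
                entry = (mindist(q, child["lb"], child["ub"]), next(tie), child)
                heapq.heappush(apl, entry)
    return best, best_dist, visited

leaf_a = {"name": "A", "lb": (0, 0), "ub": (1, 1),
          "points": [(0.2, 0.2), (0.9, 0.9)]}
leaf_b = {"name": "B", "lb": (2, 2), "ub": (3, 3),
          "points": [(2.1, 2.1), (3.0, 3.0)]}
root = {"name": "root", "lb": (0, 0), "ub": (3, 3),
        "children": [leaf_a, leaf_b]}

print(hs_nearest(root, (1.2, 1.2)))
```

In this toy tree, page B is never read, because its region is farther from the query than the candidate already found in page A.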

  11. Nearest-Neighbor Query • Comparison • RKV: pr1 → pr12 → pr11 → … • HS: pr1 → pr2 → pr21

  12. Index Structures • Minimum bounding rectangles: R-tree family, X-tree • Bounding spheres: SS-tree, TV-tree • Combined regions: SR-tree, etc. • Space-filling curves • Pyramid-tree

  13. R, R*, R+-Tree • Overlap problem • For an overlap-free split, a dimension is needed in which the projections of the page regions have no overlap at some point • Existence of such a point becomes less likely as the dimension of the data space increases • R+ tree • An overlap-free variant of the R-tree using a forced-split strategy • High dimensionality leads to many forced-split operations • Storage utilization < 50%
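The condition for an overlap-free split can be tested with a sweep over each dimension. This is a minimal sketch, assuming page regions are given as (lower corner, upper corner) pairs; the function name and the sample rectangles are hypothetical.

```python
def overlap_free_split(rects):
    """Find a (dimension, split value) separating the rectangles without overlap.

    Returns None when, in every dimension, the projections of the rectangles
    form one connected chain, so any split would cut through some rectangle.
    """
    d = len(rects[0][0])
    for dim in range(d):
        # Sweep the projections in order of their lower bounds.
        ordered = sorted(rects, key=lambda r: r[0][dim])
        max_ub = ordered[0][1][dim]
        for lb, ub in ordered[1:]:
            if lb[dim] >= max_ub:
                # Gap found: everything swept so far ends at or before max_ub.
                return dim, max_ub
            max_ub = max(max_ub, ub[dim])
    return None

separable = [((0, 0), (2, 1)), ((1, 2), (3, 3))]
entangled = [((0, 0), (2, 2)), ((1, 1), (3, 3))]
print(overlap_free_split(separable))
print(overlap_free_split(entangled))
```

When the function returns None, a structure has to either accept overlap or cut rectangles, which is the forced split mentioned above.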

  14. X-Tree • Extension of the R*-tree • Designed for the management of high-dimensional objects • Overlap-free split (split history) • Supernodes (unbalanced split tree)

  15. kd-Tree • Advantage • Guarantee of no overlap • Disadvantages • Complete partitioning • Page regions are generally larger than necessary which yields a higher access probability • Unbalanced

  16. kd-Tree • kd-B-tree: balanced kd-tree; forced split • hB-tree: splits a node based on multiple attributes; forced split is avoided • LSDh-tree: coded region description; reduced space requirement

  17. SS-Tree • Spheres as page regions • Split • Split axis is determined as the dimension yielding the highest variance • Not amenable to an easy overlap-free split
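The split-axis rule and the spherical page region can be sketched in a few lines. The sketch assumes the sphere is centered on the centroid of the stored points with a radius reaching the farthest point; function names and sample points are illustrative.

```python
import math

def split_axis(points):
    """Index of the dimension with the highest variance among the points."""
    n, d = len(points), len(points[0])
    variances = []
    for dim in range(d):
        mean = sum(p[dim] for p in points) / n
        variances.append(sum((p[dim] - mean) ** 2 for p in points) / n)
    return max(range(d), key=variances.__getitem__)

def bounding_sphere(points):
    """Centroid-centered sphere (center, radius) enclosing all points."""
    n, d = len(points), len(points[0])
    center = tuple(sum(p[dim] for p in points) / n for dim in range(d))
    radius = max(
        math.sqrt(sum((a - c) ** 2 for a, c in zip(p, center)))
        for p in points
    )
    return center, radius

pts = [(0.0, 0.0), (1.0, 10.0), (2.0, 20.0)]
print(split_axis(pts))
print(bounding_sphere(pts))
```

Splitting along one axis generally leaves two spheres that still intersect, which is why an overlap-free split is hard to obtain with spherical regions.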

  18. Space Filling Curves • Range and nearest-neighbor queries based on distance calculations of page regions • Example, first split: lb = 47 = 101111, ub = 60 = 111100, longest common prefix p = 1, s = <p100…000> = 110000 = 48 • Second split: lb = 48 = 110000, ub = 60 = 111100, longest common prefix p = 11, s = <p100…000> = 111000 = 56 • [Figure labels: q, I, I1, I2, I21, I22]
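The split-point rule in this example (longest common prefix of lb and ub, followed by a 1 and then zeros) can be written with bit operations. This is a sketch that reproduces only the arithmetic shown on the slide; the function name and the `bits` parameter are assumptions.

```python
def split_point(lb, ub, bits):
    """Split value s = <p 1 0...0>, where p is the longest common bit prefix.

    lb and ub are curve values with lb < ub, both representable in `bits` bits.
    """
    if not 0 <= lb < ub < (1 << bits):
        raise ValueError("expected 0 <= lb < ub < 2**bits")
    diff = lb ^ ub                  # the highest set bit marks the first difference
    shift = diff.bit_length() - 1   # position of the first differing bit
    prefix = ub >> (shift + 1)      # the common prefix p
    return ((prefix << 1) | 1) << shift

s1 = split_point(47, 60, 6)
s2 = split_point(s1, 60, 6)
print(s1, format(s1, "06b"))
print(s2, format(s2, "06b"))
```

The two calls reproduce the values 48 (110000) and 56 (111000) from the example.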

  19. Pyramid Tree • Divide the data space such that the resulting partitions are shaped like peels of an onion • Pyramid mapping • Optimized for range queries on high-dim. data • Not affected by the curse of dimensionality
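The pyramid mapping reduces a d-dimensional point to a single number. The sketch below follows the commonly described form of the mapping and assumes a data space normalized to [0, 1]^d with center 0.5 in every dimension; the function name and the 0-based pyramid numbering are choices made here.

```python
def pyramid_value(v):
    """Map a point in [0, 1]^d to the one-dimensional value i + h.

    i is the number of the pyramid containing v (0 .. 2d - 1) and h is the
    height of v within that pyramid, its largest deviation from the center.
    """
    d = len(v)
    # The dimension with the largest deviation from the center decides the pyramid.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    i = j_max if v[j_max] < 0.5 else j_max + d
    height = abs(0.5 - v[j_max])
    return i + height

print(pyramid_value((0.1, 0.6)))  # pyramid 0, height about 0.4
print(pyramid_value((0.6, 0.9)))  # pyramid 3, height about 0.4
```

Because the height is at most 0.5, the integer part of the value identifies the pyramid, and the resulting values can be kept in an ordinary one-dimensional index such as a B+-tree.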

  20. Summary & Comparison

  21. Summary & Comparison

  22. Conclusions • Effects occurring in indexing high-dim. spaces • Principal ideas of the index structures that have been proposed to overcome the problems • Research on high-dim. indexing has a major impact on many practical applications and commercial multimedia database systems • Future research issues • Real-world data (neither uniformly distributed nor independent) • Partitioning strategies that perform well in high-dim. spaces • Approximate processing of NN queries
