
Pivoting M-tree: A Metric Access Method for Efficient Similarity Search

Tomáš Skopal (tomas.skopal@vsb.cz), Department of Computer Science, VŠB-Technical University of Ostrava. Presentation outline: similarity search in metric spaces, M-tree, PM-tree structure, range queries.


Presentation Transcript


  1. Pivoting M-tree: A Metric Access Method for Efficient Similarity Search • Tomáš Skopal, tomas.skopal@vsb.cz • Department of Computer Science, VŠB-Technical University of Ostrava

  2. Presentation Outline • Similarity search in Metric Spaces • M-tree • PM-tree • structure • range queries • hyper-ring storage • Experimental Results DATESO 2004

  3. Similarity search in Metric Spaces • Similarity search: methods for content-based retrieval in multimedia databases (or in Information Retrieval) • Similarity is modelled by a metric d • Restricting similarity to a metric conflicts with several theories of similarity; nevertheless, the triangle inequality is the basic tool for constructing metric regions, which makes efficient similarity search possible • Metric queries • range query (specified by a query object Q and a query radius rQ) • k-NN query (specified by a query object Q and the number of nearest neighbours k)
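
The two query types above can be illustrated with a minimal sketch that evaluates them by a plain sequential scan, with no index at all. The function names (`range_query`, `knn_query`) and the sample points are hypothetical, and the Euclidean distance is only one possible choice of the metric d.

```python
import heapq
import math

def range_query(data, q, r_q, d=math.dist):
    """Return every object whose distance to q is at most r_q."""
    return [o for o in data if d(q, o) <= r_q]

def knn_query(data, q, k, d=math.dist):
    """Return the k objects closest to q, nearest first."""
    return heapq.nsmallest(k, data, key=lambda o: d(q, o))

# Hypothetical 2D dataset under the L2 (Euclidean) metric.
data = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
in_range = range_query(data, (0.0, 0.0), 1.0)
nearest = knn_query(data, (0.0, 0.0), 3)
```

Both functions compute one distance per stored object, which is the baseline cost that a metric access method tries to reduce.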

  4. Metric Access Methods • Designed to search metric datasets while keeping the search costs (the number of distance computations) minimal. • When searching large multimedia databases, the I/O search costs have to be minimized as well. • Many MAMs have been developed so far: M-tree, GH-tree, GNAT, LAESA, D-index, VP-tree, MVP-tree, SAT, ... • Most MAMs are not suitable for similarity search in large datasets (they are either static methods or incur high I/O search costs) • only M-tree and (recently) D-index are suitable candidates

  5. M-tree • dynamic, balanced, and paged metric tree (like, e.g., the B+-tree or R-tree) • the leaves are clusters of objects • routing entries in the inner nodes represent metric regions, recursively bounding the object clusters in the leaves • during query evaluation, the triangle inequality allows irrelevant M-tree branches (i.e. their metric regions) to be discarded • [Figure: range query in Euclidean 2D space]
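
The pruning step described on this slide can be sketched as follows. This is a simplified illustration, not the M-tree implementation: the classes `Routing` and `Node` and the hand-built two-leaf tree are hypothetical, and optimisations of the real structure (such as stored parent distances) are left out. A subtree is skipped when d(Q, O) > rQ + r, because by the triangle inequality no object inside the region (O, r) can then lie within rQ of Q.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Routing:
    center: tuple      # routing object O
    radius: float      # covering radius r
    child: "Node"      # subtree bounded by the region (O, r)

@dataclass
class Node:
    objects: list = field(default_factory=list)   # filled in leaves
    routing: list = field(default_factory=list)   # filled in inner nodes

def range_search(node, q, r_q, d=math.dist, out=None):
    """Collect all objects within r_q of q, skipping irrelevant subtrees."""
    if out is None:
        out = []
    for e in node.routing:
        # Descend only if the query ball intersects the region's hyper-sphere.
        if d(q, e.center) <= r_q + e.radius:
            range_search(e.child, q, r_q, d, out)
    for o in node.objects:
        if d(q, o) <= r_q:
            out.append(o)
    return out

# Hypothetical tree with two leaf clusters.
leaf_a = Node(objects=[(0.0, 0.0), (1.0, 1.0)])
leaf_b = Node(objects=[(10.0, 10.0), (11.0, 10.0)])
root = Node(routing=[Routing((0.0, 0.0), 1.5, leaf_a),
                     Routing((10.0, 10.0), 1.0, leaf_b)])
hits = range_search(root, (0.5, 0.5), 1.0)
```

In this example the second subtree is never visited, since its routing object is much farther from the query than rQ + r.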

  6. PM-tree, motivation • metric regions in the M-tree are unnecessarily large: they index large portions of empty ("dead") space, which raises the probability of intersection with the query region and makes the search less efficient • reducing the "volume" of a metric region should lead to more effective discarding of irrelevant subtrees • the way to do this is to specify a metric region that bounds all the objects more "tightly"

  7. PM-tree, structure • Pivoting M-tree (PM-tree): a combination of the M-tree with pivot-based methods (LAESA-like) • given a fixed set of p pivots Pi (selected from the dataset), a PM-tree region is additionally defined by p hyper-ring regions (Pi, HR[i]) • each routing entry contains an array HR of p intervals <HR[i].min, HR[i].max> • each interval HR[i] bounds the distances of the objects to the respective pivot Pi • the intersection of the hyper-sphere and the hyper-rings forms a smaller region bounding all the objects • the more pivots, the more tightly bounded the region
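
As a small illustration of the HR array, the sketch below computes the hyper-ring intervals for one cluster of objects. The function name `hyper_rings`, the pivots, and the sample objects are assumptions made for the example; how the intervals are maintained during insertions and node splits is not covered here.

```python
import math

def hyper_rings(objects, pivots, d=math.dist):
    """For each pivot P_i return (HR[i].min, HR[i].max) over the objects."""
    rings = []
    for p in pivots:
        dists = [d(o, p) for o in objects]
        rings.append((min(dists), max(dists)))
    return rings

# Hypothetical cluster on the x-axis and two pivots.
objects = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
pivots = [(0.0, 0.0), (0.0, 4.0)]
rings = hyper_rings(objects, pivots)
```

Every object of the cluster lies inside each ring by construction, so the intersection of the rings with the covering hyper-sphere still bounds the whole cluster.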

  8. PM-tree, query processing • prior to processing a query (Q, rQ), the distances d(Q, Pi) for all i ≤ p must be computed • a metric region is relevant to a range query only if all the hyper-rings and the hyper-sphere intersect the range query region • the more hyper-rings, the lower the probability of intersection with the query • no additional distance computations are needed for the intersection test • [Figure: the same query against an M-tree region and a PM-tree region]
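
The intersection test can be sketched as below, assuming the usual interval-overlap formulation: the hyper-ring of pivot Pi intersects the query ball when d(Q, Pi) - rQ ≤ HR[i].max and d(Q, Pi) + rQ ≥ HR[i].min. The function name `region_relevant` and the numbers in the example are hypothetical.

```python
import math

def region_relevant(q_to_center, radius, q_to_pivots, rings, r_q):
    """True only if the query ball meets the hyper-sphere and every hyper-ring.

    q_to_center -- d(Q, O), distance from the query to the routing object
    q_to_pivots -- d(Q, P_i) for all pivots, computed once per query
    rings       -- list of (HR[i].min, HR[i].max) stored in the routing entry
    """
    if q_to_center > r_q + radius:
        return False                      # hyper-sphere test (as in M-tree)
    for dq, (lo, hi) in zip(q_to_pivots, rings):
        if dq - r_q > hi or dq + r_q < lo:
            return False                  # query ball misses this hyper-ring
    return True

# Hypothetical region: routing object (2, 0), covering radius 1,
# rings for pivots (0, 0) and (0, 4) over objects lying on the x-axis.
center, radius = (2.0, 0.0), 1.0
pivots = [(0.0, 0.0), (0.0, 4.0)]
rings = [(1.0, 3.0), (math.sqrt(17.0), 5.0)]

q, r_q = (2.0, 1.2), 0.5
q_to_pivots = [math.dist(q, p) for p in pivots]   # done once, before the search
sphere_only = region_relevant(math.dist(q, center), radius, [], [], r_q)
with_rings = region_relevant(math.dist(q, center), radius, q_to_pivots, rings, r_q)
```

Here the hyper-sphere alone would force a visit to the subtree, while the second hyper-ring filters it out. The loop only compares precomputed numbers, which matches the slide's point that the test needs no extra distance computations.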

  9. PM-tree, hyper-ring storage • The routing entries of PM-tree nodes are enlarged by the additional pivot-based information stored in the HR arrays • To keep the space overhead minimal, a compact storage of the HR[i] intervals is necessary • A distance histogram is created for each pivot Pi, and an interval <d_i^min, d_i^max> is chosen such that, e.g., 90% of the distances in the histogram fall into that interval • Each value HR[i].min and HR[i].max is scaled to the <d_i^min, d_i^max> interval using a single byte, i.e. each hyper-ring HR[i] takes 2 bytes • [Figure: storage of the HR array in a routing entry: Oi, r, ptr(T), ..., HR[1], HR[2], ..., HR[p]]
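
One possible way to realise the two-byte storage is sketched below. Everything here is an assumption made for illustration, not the scheme used in the PM-tree itself: the function names, the choice of the central 90% of sorted distances as the scale, rounding the minimum down and the maximum up so the decoded ring never becomes tighter than the original, and treating the codes 0 and 255 as open-ended bounds for values outside the scale.

```python
import math

def choose_scale(distances, coverage=0.90):
    """Pick <d_lo, d_hi> covering roughly the central `coverage` of distances."""
    s = sorted(distances)
    cut = round(len(s) * (1.0 - coverage) / 2.0)
    return s[cut], s[len(s) - 1 - cut]

def encode(hr_min, hr_max, d_lo, d_hi):
    """Quantise one hyper-ring interval to 2 bytes (min down, max up)."""
    step = (d_hi - d_lo) / 255.0
    lo = math.floor((hr_min - d_lo) / step)
    hi = math.ceil((hr_max - d_lo) / step)
    return bytes([max(0, min(255, lo)), max(0, min(255, hi))])

def decode(code, d_lo, d_hi):
    """Recover a conservative interval from the 2-byte code."""
    step = (d_hi - d_lo) / 255.0
    # Assumption: code 0 / 255 may stand for a value outside the scale,
    # so they decode to the widest possible bound.
    lo = 0.0 if code[0] == 0 else d_lo + code[0] * step
    hi = math.inf if code[1] == 255 else d_lo + code[1] * step
    return lo, hi

d_lo, d_hi = choose_scale(range(100))      # hypothetical distance sample
code = encode(3.3, 7.7, 2.0, 12.0)         # hypothetical ring on scale <2, 12>
lo, hi = decode(code, 2.0, 12.0)
```

The decoded interval is slightly wider than the original one, which costs a little filtering power but cannot cause a relevant region to be discarded.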

  10. Experimental results (synthetic) • synthetic dataset of 100,000 30-dimensional tuples distributed within 1000 clusters, L2 distance, query selectivity of 50 objects

  11. Experimental results (images) • collection of 10,000 images represented by 256-dimensional vectors (gray-level histograms), L2 distance, query selectivity of 50 objects

  12. Recent results (not included in the proceedings) • Cost models for range queries in PM-tree (ADBIS'04) • Experiments on image dataset (ADBIS'04) • Optimal k-NN query algorithm for PM-tree + cost models (to be published...)

  13. References • [1] Skopal T., Pokorný J., Snášel V.: PM-tree: Pivoting Metric Tree for Similarity Search in Multimedia Databases. Submitted to ADBIS 2004, Budapest, Hungary • [2] Skopal T.: Pivoting M-tree: A Metric Access Method for Efficient Similarity Search. DATESO 2004, Desná • [3] Skopal T., Pokorný J., Krátký M., Snášel V.: Revisiting M-tree Building Principles. ADBIS 2003, LNCS 2798, Springer-Verlag, Dresden, Germany
