Improving the Performance of M-tree Family by Nearest-Neighbor Graphs

Tomáš Skopal, David Hoksza. Charles University in Prague, Department of Software Engineering, Czech Republic.

Presentation Transcript


  1. Improving the Performance of M-tree Family by Nearest-Neighbor Graphs. Tomáš Skopal, David Hoksza. Charles University in Prague, Department of Software Engineering, Czech Republic

  2. Presentation Outline
     • Metric Access Methods (MAMs)
     • M-tree, PM-tree
     • Query processing and filtering
     • Nearest-neighbor graphs → M*-tree, PM*-tree
       • filtering
       • pivot selection strategies
     • Experiments
     ADBIS 2007

  3. Metric Access Methods
     • Indexing methods designed for searching metric datasets
     • Similarities among objects are modeled by a distance function that fulfills the metric properties
     • MAMs focus on minimizing the number of distance computations by storing precomputed distances in the index, which lets them filter out non-relevant objects when querying
     • Methods
       • GNAT, (m)vp-tree, D-index, (L)AESA, …
       • M-tree, PM-tree

  4. M-tree (Metric tree)
     • dynamic, hierarchical index structure
     • data space divided into ball-shaped data regions (hyper-spheres)
       • the root node represents a data region covering all data
       • child nodes represent regions covering parts of the space, …
     • built bottom-up, like a B-tree
       • when a node is full, a new node is created and the objects are separated between the two nodes
     • data regions form a balanced hierarchical structure
       • inner nodes → routing entries
       • leaf nodes → ground items

  5. Query Processing + Filtering
     • range and k-nearest-neighbor (kNN) queries
       • traversal starts from the root node
       • for kNN, the query radius decreases dynamically
     • basic filtering → filter out nodes whose parent data region doesn't intersect the query region
     • parent filtering → uses the precomputed distance of an object to its parent and the distance of the parent to the query
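The two filters can be sketched for a range query as follows. This is a minimal illustration and not the authors' implementation: the `Entry` class, the `range_query` function, and the one-dimensional toy data are assumptions made for the example. Parent filtering uses the triangle-inequality bound |d(P,Q) - d(O,P)| ≤ d(O,Q), so an entry can be discarded without computing d(O,Q).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical M-tree entry (illustrative only)."""
    obj: float                                # routing or ground object
    to_parent: float                          # precomputed distance to the parent routing object
    radius: float = 0.0                       # covering radius; 0 for ground entries
    children: Optional[List["Entry"]] = None  # None marks a ground (leaf) entry


def range_query(entries, d_parent_q, q, r_q, dist, out, stats):
    """Collect objects within r_q of q; d_parent_q is d(parent, q), or None at the root."""
    for e in entries:
        # Parent filtering: lower bound on d(e.obj, q) from stored distances only.
        if d_parent_q is not None and abs(d_parent_q - e.to_parent) > r_q + e.radius:
            stats["parent_filtered"] += 1
            continue
        d = dist(e.obj, q)
        stats["computations"] += 1
        # Basic filtering: the entry's region does not intersect the query ball.
        if d > r_q + e.radius:
            continue
        if e.children is None:
            out.append(e.obj)
        else:
            range_query(e.children, d, q, r_q, dist, out, stats)


# Toy data: points on a line, distance = absolute difference.
dist = lambda a, b: abs(a - b)
root = [
    Entry(0.0, 0.0, radius=1.0, children=[Entry(1.0, 1.0), Entry(0.0, 0.0)]),
    Entry(10.0, 0.0, radius=1.0, children=[Entry(10.0, 0.0), Entry(11.0, 1.0)]),
]
hits, stats = [], {"computations": 0, "parent_filtered": 0}
range_query(root, None, 1.5, 1.0, dist, hits, stats)
```

In this toy run one ground entry is discarded by parent filtering without a distance computation, and the second subtree is discarded by basic filtering.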

  6. PM-tree (Pivoting Metric tree)
     • PM-tree = M-tree enhanced by p static global pivots, with each hyper-sphere region enhanced by p hyper-ring regions (rings that restrict its volume)
     • the i-th ring is defined by the nearest and the furthest objects in the node with respect to the i-th pivot
     • the query region overlaps a node region only if it overlaps the hyper-sphere and all hyper-rings → more effective basic filtering
     [Figure: a query Q shown against an M-tree region and against a PM-tree region; in the PM-tree case Q doesn't overlap the 2nd ring.]
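The overlap rule above can be written as a short check. This is a sketch under assumptions: the function name `overlaps_pm_region` and its parameters are hypothetical, each ring is represented as a `(nearest, furthest)` pair of distances to its pivot, and the distances from the query to the pivots are assumed to be computed once per query.

```python
def overlaps_pm_region(d_q_center, radius, q_pivot_dists, rings, r_q):
    """Return True if the query ball may intersect a PM-tree node region.

    d_q_center    -- distance from the query to the routing object
    radius        -- covering radius of the hyper-sphere
    q_pivot_dists -- distances from the query to each of the p pivots
    rings         -- per pivot, (nearest, furthest) distance of node objects to that pivot
    r_q           -- query radius
    """
    # Hyper-sphere test (the plain M-tree condition).
    if d_q_center > r_q + radius:
        return False
    # Every hyper-ring must be overlapped by the query ball as well.
    for d_qp, (lo, hi) in zip(q_pivot_dists, rings):
        if d_qp + r_q < lo or d_qp - r_q > hi:
            return False
    return True


# The query overlaps the sphere and the first ring, but misses the second ring.
missed = overlaps_pm_region(1.0, 2.0, [5.0, 9.0], [(4.0, 6.0), (2.0, 3.0)], 1.0)
# Here the query overlaps the sphere and both rings.
kept = overlaps_pm_region(1.0, 2.0, [5.0, 9.0], [(4.0, 6.0), (7.0, 9.5)], 1.0)
```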

  7. Pivot space
     • global pivots map regions/data into a pivot space of dimensionality p (i-th coordinate → distance to the i-th pivot)
     • distances of a data region to the p pivots produce a p-dimensional minimum bounding rectangle
     • the overlap with rings can be understood in this sense as L∞ filtering (a region is filtered out if its L∞ distance to Q is greater than the query radius)
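A small sketch of the pivot-space view, assuming Euclidean toy data in the plane. The helper names `to_pivot_space`, `mbr`, and `linf_to_mbr` are made up for this illustration. The L∞ distance from the mapped query to the bounding rectangle is a lower bound on the distance from the query to any object in the region, so the region can be skipped when that bound exceeds the query radius.

```python
import math


def to_pivot_space(obj, pivots, dist):
    """Map an object to its vector of distances to the p pivots."""
    return [dist(obj, p) for p in pivots]


def mbr(objects, pivots, dist):
    """Minimum bounding rectangle of a set of objects in pivot space."""
    coords = [to_pivot_space(o, pivots, dist) for o in objects]
    return [(min(c), max(c)) for c in zip(*coords)]


def linf_to_mbr(point, box):
    """L-infinity distance from a pivot-space point to a rectangle (0 if inside)."""
    gap = 0.0
    for x, (lo, hi) in zip(point, box):
        gap = max(gap, lo - x, x - hi)
    return gap


pivots = [(0.0, 0.0), (10.0, 0.0)]
region_objects = [(1.0, 0.0), (3.0, 0.0)]
box = mbr(region_objects, pivots, math.dist)

q, r_q = (6.0, 0.0), 1.0
bound = linf_to_mbr(to_pivot_space(q, pivots, math.dist), box)
filtered = bound > r_q  # region cannot contain any object within r_q of q
```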

  8. M*-tree, PM*-tree
     • M*-tree = M-tree + nearest-neighbor (NN) graphs
       • present in every node
       • each object knows its NN (within its node)
       • example → O6 = NN(O4)
     • PM*-tree = PM-tree + nearest-neighbor (NN) graphs
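A per-node NN-graph can be sketched as a brute-force pass over the node's objects. The function name `build_nn_graph` and the returned dictionaries are assumptions for illustration; the slides do not say how the graph is stored or maintained during insertion. The reverse lists (rNNs) are included because the filtering step on the next slide uses them.

```python
def build_nn_graph(objects, dist):
    """Build the nearest-neighbor graph of one node (needs at least two objects).

    Returns (nn, rnn):
      nn[i]  = (j, d) where objects[j] is the nearest neighbor of objects[i] and d = dist(i, j)
      rnn[j] = indices i whose nearest neighbor is j (reverse nearest neighbors)
    """
    n = len(objects)
    nn = {}
    for i in range(n):
        # Brute force within the node; ties go to the lowest index.
        j = min((k for k in range(n) if k != i),
                key=lambda k: dist(objects[i], objects[k]))
        nn[i] = (j, dist(objects[i], objects[j]))
    rnn = {i: [] for i in range(n)}
    for i, (j, _) in nn.items():
        rnn[j].append(i)
    return nn, rnn


# Toy node: four points on a line.
nn, rnn = build_nn_graph([0.0, 1.0, 3.0, 7.0], lambda a, b: abs(a - b))
```

Each object has exactly one NN, so every object appears in exactly one rNN list; this is why the rNN sets are disjoint.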

  9. NN-graph Filtering
     • objects (NN-graph nodes) play the role of mutual local pivots
     • sacrifice
       • local pivot
       • object whose distance to the query is actually computed during query evaluation
       • used for possible filtering of its reverse nearest neighbors (rNNs)
     • filtering with NN-graph (one step of node processing)
       • fetch the first record (Si) from the sacrifice queue (SQ)
       • apply parent filtering to Si
       • if Si is not filtered → sacrifice it (compute the Q-Si distance)
       • try to filter out rNNs(Si) (NN-graph filtering)
       • move non-filtered rNNs(Si) to the beginning of SQ (rNN sets are disjoint → non-filtered ones become sacrifices)
       • apply basic filtering to Si
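The node-processing steps above can be sketched for a range query as follows. Several points are assumptions rather than details taken from the slides: the name `process_node`, the tuple layout of entries, and in particular the NN-graph filter condition, which is written here as the triangle-inequality bound |d(Q,Si) - d(Si,R)| > r_Q + r_R for an rNN R of Si.

```python
from collections import deque


def process_node(entries, nn, rnn, d_parent_q, q, r_q, dist):
    """One node of a range query with NN-graph filtering.

    entries[i] = (obj, to_parent, radius); nn[i] = (j, d(i, j)); rnn[j] = [i, ...]
    d_parent_q is d(parent, Q), or None in the root (no parent filtering there).
    Returns (survivors, computations), survivors being (index, d(Q, obj)) pairs.
    """
    queue = deque(range(len(entries)))   # sacrifice queue (SQ), here in storage order
    pending = set(queue)                 # entries neither processed nor filtered yet
    survivors, computations = [], 0
    while queue:
        i = queue.popleft()
        if i not in pending:             # already filtered, or a stale duplicate
            continue
        pending.discard(i)
        obj, to_parent, radius = entries[i]
        # Parent filtering on Si.
        if d_parent_q is not None and abs(d_parent_q - to_parent) > r_q + radius:
            continue
        # Si becomes a sacrifice: its distance to Q is actually computed.
        d_qs = dist(q, obj)
        computations += 1
        # NN-graph filtering of the reverse nearest neighbors of Si.
        kept = []
        for j in rnn[i]:
            if j not in pending:
                continue
            if abs(d_qs - nn[j][1]) > r_q + entries[j][2]:
                pending.discard(j)       # filtered without computing d(Q, j)
            else:
                kept.append(j)
        queue.extendleft(reversed(kept)) # non-filtered rNNs go to the front of SQ
        # Basic filtering on Si.
        if d_qs <= r_q + radius:
            survivors.append((i, d_qs))
    return survivors, computations


# Toy root node: four ground objects on a line, NN-graph given explicitly.
entries = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
nn = {0: (1, 1.0), 1: (0, 1.0), 2: (1, 2.0), 3: (2, 4.0)}
rnn = {0: [1], 1: [0, 2], 2: [3], 3: []}
survivors, computations = process_node(
    entries, nn, rnn, None, 7.2, 0.5, lambda a, b: abs(a - b))
```

In this toy run the second object is discarded through the NN-graph after the first one is sacrificed, so three distances are computed instead of four.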

  10. Sacrifice selection
     • selection of sacrifices is important
       • a good pivot filters many objects out
       • a poor pivot filters out good potential pivots (future sacrifices)
     • Heuristics
       • M*-tree
         • hMaxRNNCount: first in SQ is the object with the highest number of rNNs
         • hMinRNNDistance: first in SQ is the object nearest to its NN or rNN
         • hMinToParentDistance: first in SQ is the object closest to the parent object
       • PM*-tree
         • hMinLmaxDistance: first in SQ is the object with minimum L∞ distance
         • hMaxLmaxDistance: first in SQ is the object with maximum L∞ distance
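The three M*-tree heuristics can be read as sort keys for the initial sacrifice queue. This is only one possible reading: the slide specifies which object comes first, and sorting the whole queue by the same criterion is an assumption, as are the function name `order_sq` and the `nn`/`rnn` dictionary layout. The PM*-tree heuristics are left out because the slide does not say what the L∞ distance is measured against.

```python
def order_sq(n, nn, rnn, to_parent, heuristic):
    """Return entry indices 0..n-1 ordered for the sacrifice queue.

    nn[i] = (j, d(i, j)); rnn[j] = [i, ...]; to_parent[i] = d(object i, parent object)
    """
    if heuristic == "hMaxRNNCount":
        key = lambda i: -len(rnn[i])                    # most reverse NNs first
    elif heuristic == "hMinRNNDistance":
        # d(i, j) for an rNN j of i is stored as nn[j][1]
        key = lambda i: min([nn[i][1]] + [nn[j][1] for j in rnn[i]])
    elif heuristic == "hMinToParentDistance":
        key = lambda i: to_parent[i]                    # closest to the parent first
    else:
        raise ValueError(f"unknown heuristic: {heuristic}")
    return sorted(range(n), key=key)                    # stable: ties keep storage order


# Toy node: points 0, 1, 3, 7 on a line, parent object at 2.
nn = {0: (1, 1.0), 1: (0, 1.0), 2: (1, 2.0), 3: (2, 4.0)}
rnn = {0: [1], 1: [0, 2], 2: [3], 3: []}
to_parent = [2.0, 1.0, 1.0, 5.0]
by_count = order_sq(4, nn, rnn, to_parent, "hMaxRNNCount")
by_rnn_dist = order_sq(4, nn, rnn, to_parent, "hMinRNNDistance")
by_parent = order_sq(4, nn, rnn, to_parent, "hMinToParentDistance")
```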

  11. Experimental Results
     • Corel dataset
       • 65,615 feature vectors of images
       • L1 distance function
       • 8 dimensions
     • Polygons dataset
       • synthetic
       • 1,000,000 randomly generated 2D polygons (5-10 vertices)
       • Hausdorff set distance function
     • GenBank dataset
       • 250,000 strings of proteins (of lengths 50-100)
       • edit distance function
     • Testing of
       • computation costs (number of distance computations)
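For reference, the three distance functions named above can be sketched as follows, together with a small wrapper that counts distance computations, the cost measure used in the experiments. These are textbook definitions and not the authors' code. Treating a polygon as its set of vertices for the Hausdorff distance is an assumption, and the `CountingDistance` wrapper is a hypothetical helper.

```python
import math


def l1(a, b):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hausdorff(A, B):
    """Hausdorff distance between two finite point sets (here: polygon vertices)."""
    def directed(X, Y):
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))


def edit_distance(s, t):
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution or match
        prev = cur
    return prev[-1]


class CountingDistance:
    """Wrap a distance function and count how often it is called."""
    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.fn(a, b)


counted = CountingDistance(edit_distance)
d = counted("kitten", "sitting")
```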

  12. Experiments – Corel Dataset

  13. Experiments – Polygons Dataset

  14. Experiments – GenBank Dataset

  15. Conclusion
     • We have proposed
       • enhancing the nodes of M-tree-like structures with nearest-neighbor graphs
       • a filtering technique based on NN-graphs → NN-graph filtering
     • We have implemented
       • M*-tree (enhancement of M-tree by NN-graphs)
       • PM*-tree (enhancement of PM-tree by NN-graphs)
     • Experimental results
       • we have shown up to 45% speed-up
