Richard Swinbank 9th July 2004 Bulk Loading the M-tree to Enhance Query Performance


Presentation Transcript


  1. Richard Swinbank 9th July 2004 Bulk Loading the M-tree to Enhance Query Performance Alan P. Sexton & Richard Swinbank University of Birmingham

  2. Bulk Loading the M-tree • The M-tree • Hasn’t this been done already?! • Our approach and motivation • Outlier effects • Symmetry and Deletion • Conclusions

  3. The M-tree • Like B+ tree; multiway, paged, post-and-grow • ‘Discriminators’ are metric balls, not intervals • No concept of position, only distance • Query performance depends critically on overlap [Figure: diagram with entries labelled A-E and a-e]
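Why overlap matters can be seen from the usual pruning test for a range query: a subtree can only be skipped when the query ball and the subtree's covering ball are disjoint. The sketch below is illustrative only; the function name `must_visit`, the tuple representation of points, and the use of Euclidean distance are assumptions, not taken from the slides.

```python
import math

def must_visit(query, query_radius, centre, covering_radius, dist=math.dist):
    """Return True if a range query has to descend into a subtree.

    The subtree is summarised by a metric ball (centre, covering_radius).
    By the triangle inequality, no object inside that ball can lie within
    query_radius of the query when the two balls are disjoint.
    """
    return dist(query, centre) <= query_radius + covering_radius

# Two overlapping balls: a query falling in the overlap must search both subtrees.
balls = [((0.0, 0.0), 3.0), ((4.0, 0.0), 3.0)]
visited = [b for b in balls if must_visit((2.0, 0.0), 0.5, *b)]
```

The more the covering balls of sibling nodes overlap, the more subtrees pass this test for a given query, which is the cost the rest of the talk tries to reduce.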

  4. Hasn’t this been done already?! • Ciaccia et al., 1998 • Seeded trees: top-down growth • Cheaper to build than insertion-built trees • Comparable query performance • B+ tree • Sort data • Build bottom-up • M-tree • Cluster data • Build bottom-up?

  5. Bulk Loading the M-tree • 25%-40% query performance gain • Top: 1-NN query results • Bottom: Leaf radii for related trees

  6. Closest-pair clustering • Requirements • Upper (CMAX) and lower (CMAX/2) bound on cardinality • Minimise overlap of metric representation • Algorithm • Take closest pair of clusters (c1, c2) • If |c1| + |c2| <= CMAX, merge, otherwise remove larger cluster from working set • Repeat until working set is empty • Outlier effects
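The algorithm steps on this slide can be sketched as follows. This is a naive, unoptimised reading of the bullets, not the authors' implementation. Several details are assumptions: the distance between two clusters is taken as the distance between their medoids, the medoid is the member that minimises the covering radius, and points are tuples compared with Euclidean distance. The sketch enforces only the upper bound CMAX; it does not enforce the CMAX/2 lower bound.

```python
import math
from itertools import combinations

def medoid(cluster, dist):
    # Assumption: the medoid is the member with the smallest covering radius.
    return min(cluster, key=lambda p: max(dist(p, q) for q in cluster))

def closest_pair_clustering(points, cmax, dist=math.dist):
    working = [[p] for p in points]  # every point starts as its own cluster
    output = []
    while len(working) > 1:
        centres = [medoid(c, dist) for c in working]
        # Take the closest pair of clusters (indices i < j).
        i, j = min(combinations(range(len(working)), 2),
                   key=lambda ij: dist(centres[ij[0]], centres[ij[1]]))
        if len(working[i]) + len(working[j]) <= cmax:
            # Merge: the combined cluster still fits in one node.
            working[i] = working[i] + working[j]
            del working[j]
        else:
            # Otherwise the larger cluster is finished: move it to the output.
            k = i if len(working[i]) >= len(working[j]) else j
            output.append(working.pop(k))
    output.extend(working)  # the last remaining cluster, if any
    return output

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
clusters = closest_pair_clustering(pts, cmax=3)
```

In this small example the isolated point (50, 50) ends up in a cluster of its own rather than inflating the radius of a cluster of nearby points.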

  7. Outlier effects [Figure panels: M-tree insertion; Closest-pair clustering]

  8. Bulk Loading • Use closest-pair clustering to prepare a full level • Accumulate primary medoids to populate next level up • Algorithm • Cluster points • On-the-fly: • Write output clusters to disk: M-tree nodes • Generate parent entries: points for next level up • Repeat until next level is a single page • Bottom-up growth • Subtree containment
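A minimal in-memory sketch of the bottom-up loop described on this slide, under stated assumptions. The `Entry` class, the `make_parent` helper and the `slice_cluster` stand-in are hypothetical names introduced for illustration. `slice_cluster` simply cuts the list into runs of CMAX so that the block runs on its own; a closest-pair clustering routine would be passed as `cluster` instead. Nodes are kept in memory rather than written to disk, and the covering radius is computed as the usual upper bound (distance to each child plus that child's radius) so that every parent ball contains its whole subtree.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Entry:
    centre: tuple                              # routing object, or the data point itself
    radius: float = 0.0                        # covering radius (0 for a data point)
    child: list = field(default_factory=list)  # entries of the child node (empty for a data point)

def slice_cluster(entries, cmax):
    # Stand-in for a real clustering step: consecutive runs of at most cmax entries.
    return [entries[i:i + cmax] for i in range(0, len(entries), cmax)]

def make_parent(node, dist):
    # Choose the centre that gives the smallest radius covering every child ball.
    def cover(c):
        return max(dist(c, e.centre) + e.radius for e in node)
    centre = min((e.centre for e in node), key=cover)
    return Entry(centre, cover(centre), node)

def bulk_load(points, cmax, cluster=slice_cluster, dist=math.dist):
    if cmax < 2:
        raise ValueError("cmax must be at least 2 for the levels to shrink")
    level = [Entry(p) for p in points]
    # Repeat until the current level fits in a single page: that page is the root.
    # Note: a cluster function that keeps returning singletons would never shrink the level.
    while len(level) > cmax:
        level = [make_parent(node, dist) for node in cluster(level, cmax)]
    return level

root = bulk_load([(float(i), 0.0) for i in range(10)], cmax=3)
```

Because every level is built completely before the next one starts, all leaves end up at the same depth.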

  9. Subtree containment on Bulk Load

  10. Subtree containment on Insert

  11. SM-tree vs. M-tree

  12. Conclusions • Closest-pair clustering algorithm • Mitigates outlier effects • Improves query performance • Bulk loading algorithm • Bottom-up, balanced growth • Insert/Delete symmetry: SM-tree • Further work • Questions?
