Answering Metric Skyline Queries by PM-tree

Tomáš Skopal, Jakub Lokoč. Department of Software Engineering, FMP, Charles University in Prague.


Presentation Transcript


  1. Answering Metric Skyline Queries by PM-tree. Tomáš Skopal, Jakub Lokoč, Department of Software Engineering, FMP, Charles University in Prague

  2. Similarity search
     • content-based similarity search
     • single-example queries
       • range query
       • kNN query
     • multi-example queries
       • combination of single-example queries is not sufficient
       • should support
         • partial matching
         • compromise
       • metric skyline
     DATESO 2010, Štědronín - Plazy

  3. Metric skyline query (MSQ)
     • traditional skyline operator
       • linearly ordered attribute domains
       • dominance relation
       • + MDDRs (minimum dominating-dominated rectangles)
       • static schema
     • metric skyline
       • multi-example query (not just operator)
       • attributes specified at query time: i-th attribute = distance of database object to i-th query example Qi
       • result set interpretation: objects similar to all query examples yet distinct (dissimilar to each other)
       • dynamic schema, cannot be reduced to the classic skyline operator for efficient skyline processing
       • i.e., the coordinate system is established at query time
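The mapping on this slide can be illustrated with a small brute-force sketch. It is not the authors' implementation: the function names, the 2D toy points, and the use of Euclidean distance via `math.dist` are assumptions made only for illustration.

```python
import math

def distance_vector(obj, examples, dist=math.dist):
    """Map an object to its dynamic attributes: i-th attribute = dist(obj, Q_i)."""
    return tuple(dist(obj, q) for q in examples)

def dominates(a, b):
    """a dominates b if a is no worse in every dimension and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def metric_skyline_naive(database, examples, dist=math.dist):
    """Quadratic-time reference: keep objects whose vector no other vector dominates."""
    vectors = [distance_vector(o, examples, dist) for o in database]
    return [o for o, v in zip(database, vectors)
            if not any(dominates(w, v) for w in vectors)]

# Hypothetical toy data: two query examples and six 2D points.
examples = [(0, 0), (10, 0)]
database = [(1, 0), (5, 0), (9, 0), (5, 5), (5, 1), (20, 20)]
skyline = metric_skyline_naive(database, examples)
```

In this toy setting the three points lying on the segment between the two examples form the skyline; each is a different compromise between closeness to Q1 and closeness to Q2.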

  4. Generic algorithm for a hierarchic metric index
     • branch-and-bound algorithm (originally developed for R-tree and classic/spatial skyline operator)
     • dynamic mapping of the metric space into L1 vector space (examples)
     • heuristics: data/regions processed in L1 order guarantee no false dismissals
     • a priority heap is used, storing index entries equipped with MDDRs to be inspected (higher priority = lower L1 order of MDDR)
     The algorithm:
     0) The entry of the entire index is pushed on the heap (e.g., M-tree root node).
     1) An entry with the lowest L1 distance of its MDDR is popped from the heap.
     2) If the entry contains just one data object (e.g., entry in an M-tree leaf), it is added to the skyline set, while removing all entries from the heap dominated by the entry. Jump to 1.
     3) If the entry is a region (e.g., entry in an M-tree inner node), its child node is fetched. The MDDRs of the child node's entries are checked for dominance by the already determined skyline set, while the dominated ones are filtered from further processing.
     4) The MDDRs of the non-filtered child entries are derived, while those not dominated by the current skyline set are pushed into the heap. Jump to 1.
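The steps of the generic algorithm can be sketched over a toy hierarchy of ball regions. This is only an illustration under stated assumptions: the `Region` class is a hypothetical stand-in for an M-tree node, distances are Euclidean, and dominated heap entries are discarded lazily when popped instead of being removed from the heap at insertion time as in step 2.

```python
import heapq
import itertools
import math

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

class Region:
    """Hypothetical ball region; children are Regions or data points (tuples)."""
    def __init__(self, children):
        self.children = children
        pts = list(self.points())
        self.center = pts[0]
        self.radius = max(math.dist(self.center, p) for p in pts)

    def points(self):
        for child in self.children:
            if isinstance(child, Region):
                yield from child.points()
            else:
                yield child

def lower_corner(entry, examples):
    """Dominating (lower) corner of the entry's MDDR in the mapped space."""
    if isinstance(entry, Region):
        return tuple(max(0.0, math.dist(entry.center, q) - entry.radius)
                     for q in examples)
    return tuple(math.dist(entry, q) for q in examples)

def metric_skyline(root, examples):
    tie = itertools.count()  # tie-breaker so the heap never compares entries
    corner = lower_corner(root, examples)
    heap = [(sum(corner), next(tie), corner, root)]       # step 0
    skyline = []                                          # (vector, object) pairs
    while heap:
        _, _, corner, entry = heapq.heappop(heap)         # step 1: lowest L1 first
        if any(dominates(s, corner) for s, _ in skyline):
            continue                                      # lazy removal of dominated entries
        if not isinstance(entry, Region):
            skyline.append((corner, entry))               # step 2: data object
            continue
        for child in entry.children:                      # steps 3 and 4: expand region
            c = lower_corner(child, examples)
            if not any(dominates(s, c) for s, _ in skyline):
                heapq.heappush(heap, (sum(c), next(tie), c, child))
    return [obj for _, obj in skyline]

# Hypothetical two-level index over the same kind of toy data.
leaf_a = Region([(1, 0), (5, 0), (5, 1)])
leaf_b = Region([(9, 0), (5, 5), (20, 20)])
root = Region([leaf_a, leaf_b])
result = metric_skyline(root, [(0, 0), (10, 0)])
```

Processing in L1 order is what makes step 2 safe: any object that dominates a popped object has a strictly smaller L1 sum, so it (or a region enclosing it) has already left the heap.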

  5. M-tree
     • metric index based on B+-tree
     • inner node contains routing entries
       • ball regions (object and radius) + distance to parent region + pointer to subtree
     • leaf node contains ground entries
       • object + distance to parent region
     • 2 types of filtering by querying
       • parent filtering (cheap)
         • stored distance to parent is used
       • basic filtering (expensive)
         • distance computation needed
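The difference between the two filters can be sketched for a range query with radius `query_radius`. Both conditions follow from the triangle inequality; the function names and the toy coordinates are assumptions for illustration, not code from an M-tree library.

```python
import math

def parent_filter_prunes(d_query_parent, d_entry_parent, entry_radius, query_radius):
    """Cheap test: uses only the stored entry-to-parent distance and the
    already known query-to-parent distance; no new distance computation."""
    return abs(d_query_parent - d_entry_parent) > query_radius + entry_radius

def basic_filter_prunes(query, entry_center, entry_radius, query_radius,
                        dist=math.dist):
    """Expensive test: computes dist(query, entry_center) explicitly."""
    return dist(query, entry_center) > query_radius + entry_radius

query, parent = (0, 0), (10, 0)
d_qp = math.dist(query, parent)  # computed once, when the parent was visited

near_parent = (10.5, 0)  # routing object close to its parent
far_side = (20, 0)       # routing object as far from the parent as the query is

cheap_hit = parent_filter_prunes(d_qp, math.dist(near_parent, parent), 1.0, 2.0)
cheap_miss = parent_filter_prunes(d_qp, math.dist(far_side, parent), 1.0, 2.0)
basic_hit = basic_filter_prunes(query, far_side, 1.0, 2.0)
```

In a real index the entry-to-parent distance is read from the node rather than recomputed; it is computed here only to keep the example self-contained. The second entry survives the cheap test and is pruned only after the explicit distance computation.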

  6. MSQ implementation using M-tree
     • uses the generic algorithm enhanced by specific M-tree MDDRs, mapping the M-tree regions from metric space into L1 vector space (dimensions are distances of data/regions to the query examples Qi)
     • 2 types of M-tree MDDR
       • Par-MDDR
         • the mapped oversized region ball (using the distance to parent)
       • B-MDDR
         • the mapped region ball
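One plausible reading of the two rectangles, derived from the triangle inequality, is sketched below. The exact formulas are an assumption (the slide does not spell them out), as are the function names: B-MDDR uses the computed distances from each query example to the region center, while Par-MDDR bounds those distances using only the stored center-to-parent distance.

```python
import math

def b_mddr(center, radius, examples, dist=math.dist):
    """Per dimension i: [d(Q_i, center) - r, d(Q_i, center) + r], clamped at 0."""
    rect = []
    for q in examples:
        d = dist(center, q)
        rect.append((max(0.0, d - radius), d + radius))
    return rect

def par_mddr(d_examples_parent, d_center_parent, radius):
    """Oversized rectangle from d(Q_i, parent) and the stored d(center, parent)."""
    rect = []
    for d_qp in d_examples_parent:
        lo = max(0.0, abs(d_qp - d_center_parent) - radius)
        hi = d_qp + d_center_parent + radius
        rect.append((lo, hi))
    return rect

# Hypothetical toy configuration.
examples = [(0, 0), (10, 0)]
parent_center, center, radius = (4, 0), (6, 0), 1.0

tight = b_mddr(center, radius, examples)
loose = par_mddr([math.dist(q, parent_center) for q in examples],
                 math.dist(center, parent_center), radius)
```

Under these assumptions the Par-MDDR always encloses the B-MDDR, so it is weaker for pruning but costs no distance computation for the entry itself.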

  7. PM-tree
     • combination of M-tree and pivot tables (LAESA)
     • M-tree balls reduced by rings centered in global pivots Pi
     • routing and ground entries store also the ring radii
     • enhanced filtering
       • cheaply in pivot space (mapping of data/balls into L∞ vector space)
       • mapping of the query object into the pivot space is the only extra computation cost
       • if not filtered out in pivot space, regular M-tree filtering
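A minimal sketch of filtering in pivot space for a range query, assuming each region stores one ring `(lo, hi)` per global pivot, where `lo` and `hi` bound the distances from that pivot to the objects in the region. The function name and the data layout are hypothetical.

```python
import math

def pivot_filter_prunes(query_pivot_dists, rings, query_radius):
    """Prune if the query ball misses the ring of at least one pivot.

    query_pivot_dists[i] = d(Q, P_i), computed once per query.
    rings[i] = (lo, hi) for pivot P_i.
    """
    return any(d - query_radius > hi or d + query_radius < lo
               for d, (lo, hi) in zip(query_pivot_dists, rings))

pivots = [(0, 0), (10, 10)]
query, query_radius = (1, 1), 1.0
q_dists = [math.dist(query, p) for p in pivots]  # the only extra distance computations

far_region = [(8.0, 9.0), (3.0, 6.0)]     # hypothetical rings of a distant region
near_region = [(1.0, 3.0), (11.0, 14.0)]  # hypothetical rings of a nearby region

pruned_far = pivot_filter_prunes(q_dists, far_region, query_radius)
pruned_near = pivot_filter_prunes(q_dists, near_region, query_radius)
```

Entries that survive this test would still go through the regular M-tree filtering.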

  8. Paper contribution: PM-tree MSQ implementation
     • B-MDDR, Par-MDDR (inherited from M-tree)
     • Piv-MDDR
       • using PM-tree rings the MDDR can be tightened
       • for each dimension (example Qi) the maximal lower bound and minimal upper bound distance to the region is found (to the rings intersection)
     • pivot skyline
       • skyline initialized by pivots mapped to the L1 space
     • heavy optimization (reduction of heap size)
       • deferred heap processing
       • reinsertions into heap to save distance computations
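The Piv-MDDR bounds can be sketched as follows, assuming each region stores one ring `(lo, hi)` per pivot and the distances from every query example to every pivot are known. The bounds come from the triangle inequality; the function names and the numbers are illustrative assumptions, not the paper's code.

```python
def piv_mddr(example_pivot_dists, rings):
    """example_pivot_dists[i][j] = d(Q_i, P_j); rings[j] = (lo_j, hi_j).

    Returns, per query example, the maximal lower bound and the minimal
    upper bound on the distance from Q_i to any object in the region.
    """
    rect = []
    for dists in example_pivot_dists:
        lower = max(max(d - hi, lo - d, 0.0) for d, (lo, hi) in zip(dists, rings))
        upper = min(d + hi for d, (lo, hi) in zip(dists, rings))
        rect.append((lower, upper))
    return rect

def tighten(rect_a, rect_b):
    """Intersect two MDDRs dimension by dimension."""
    return [(max(a_lo, b_lo), min(a_hi, b_hi))
            for (a_lo, a_hi), (b_lo, b_hi) in zip(rect_a, rect_b)]

# Hypothetical values: two pivots, two query examples.
rings = [(2.0, 3.0), (6.0, 8.0)]
example_pivot_dists = [[10.0, 1.0], [4.0, 12.0]]
b_rect = [(6.0, 12.0), (3.0, 6.5)]  # a B-MDDR assumed to be already computed

piv_rect = piv_mddr(example_pivot_dists, rings)
final_rect = tighten(b_rect, piv_rect)
```

A tighter lower corner gives the region a higher L1 value, so it is more likely to be dominated by the current skyline and never fetched.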

  9. Experiments
     • subset of the CoPhIR database, one million 76-dimensional tuples representing 2 MPEG-7 features on Flickr images, Euclidean distance used
     • Polygons database, 250k 30-dimensional tuples representing 5-15 vertex 2D polygons, Hausdorff distance used
     • average over 200 metric skyline queries
     • each metric skyline query defined by 2-5 query examples

  10. Experiments

  11. Experiments

  12. Conclusions
     • PM-tree based metric skyline query implementation
       • up to 2x faster in terms of distance computations and I/O cost (with respect to the original M-tree implementation)
       • up to 20x faster in terms of heap operations
       • needs up to 20x less space for the heap
     Thank you for your attention! Questions?
