
Exploiting Page-Level Upper Bound (PLUB) for Multi-Type Nearest Neighbor (MTNN) Queries. Xiaobin Ma. Advisor: Shashi Shekhar. Dec 2005. Outline: Motivation; Problem statement; Related work and our contributions; Proposed algorithm and cost model; Experiment design and results.



  1. Exploiting Page-Level Upper Bound (PLUB) for Multi-Type Nearest Neighbor (MTNN) Queries Xiaobin Ma Advisor: Shashi Shekhar Dec, 2005

  2. Outline • Motivation • Problem statement • Related work and our contributions • Proposed algorithm and cost model • Experiment design and results • Conclusion and future work

  3. Motivation • GIS applications • Find shortest path • Through one point from each of different feature types

  4. A Running Example • Three feature types: • red (r), green (g), black (b) • q is the query point • The route drawn with a solid red line is the shortest route • Routes drawn with dashed lines are other possible routes

  5. Basic Concepts • <P1,P2,…,Pk> • ordered point sequence and P1,P2,…,Pk are from k different (feature) types of data sets • R(q, P1,P2,…,Pk) • a route from q through points P1,P2,…, and Pk • d(R(q, P1,P2,…,Pk)) • distance of route R(q, P1,P2,…,Pk) • Multi-Type Nearest Neighbor (MTNN) • ordered point sequence <P1’,P2’,…,Pk’> such that d(R(q,P1’,P2’,…,Pk’)) is minimum among all possible routes • d(R(q, P1’,P2’,…,Pk’)) is MTNN distance • MTNN query • A query finding MTNN
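The definitions above can be made concrete with a small brute-force sketch. It is only an illustration of the definitions, not an algorithm from the talk: the function names are hypothetical, points are assumed to be 2-D tuples, and the distance metric is assumed to be Euclidean. It enumerates every order of feature types and every choice of one point per type, so it is practical only for tiny inputs.

```python
import math
from itertools import permutations, product


def route_distance(q, points):
    """d(R(q, P1, ..., Pk)): length of the path q -> P1 -> ... -> Pk."""
    total, prev = 0.0, q
    for p in points:
        total += math.dist(prev, p)
        prev = p
    return total


def brute_force_mtnn(q, datasets):
    """Return (MTNN distance, MTNN point sequence) by exhaustive search.

    datasets is a list with one list of points per feature type.
    Every order of feature types and every choice of one point per
    type is tried, so the cost grows as k! * N1 * N2 * ... * Nk.
    """
    best_dist, best_route = math.inf, None
    for order in permutations(range(len(datasets))):
        for combo in product(*(datasets[i] for i in order)):
            d = route_distance(q, combo)
            if d < best_dist:
                best_dist, best_route = d, combo
    return best_dist, best_route
```

Such an exhaustive search is useful mainly as a correctness baseline when testing a faster method on small data sets.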

  6. Problem Statement for MTNN Query • Given: • A query point • Distance metric • k different (feature) types of spatial objects with data points numbers N1, N2, N3, … ,Nk respectively • R-tree for each data set • Find: Multi-type nearest neighbor (MTNN) • Objective: Minimize length of route from query point covering an instance of each feature • Constraint: • Correctness: The tour should be the shortest path for the query point and the given collection of spatial query feature types • Completeness: Only the shortest path is returned as the query result

  7. Related Work • Optimal sequenced route (OSR) query [Kolahdouzan et al., USC Tech. Report 05-840] • Optimal algorithms (RLORD) • Focus on optimal algorithms for a specified permutation of feature types • Point-based algorithms • Trip planning query (TPQ) [Li et al., SSTD 05] • Heuristic algorithms • Give approximate results

  8. RLORD Example • q is the query point • Search order is <r, b, g> • R(q,r2,b2,g2) is the greedy route • Radius of the circle is d(R(q,r2,b2,g2)) • [Figure: query point q with the red (r), black (b), and green (g) data points]

  9. RLORD Running Iterations • Use backward search strategy O=<g,b,r> • First iteration - examine feature type g • Put <g2>, <g3>, <g4>, <g5>, <g7>, <g9>, <g10>, <g12>, <g13>, <g14>, <g15>, <g16> in a set R • Second iteration - examine the next feature type in O • For every point bi in the black set, • iterate over every partial route <gj> in R: • IF d(R(q, bi)) + d(R(bi,gj)) < d(R(q,r2,b2,g2)) • THEN put <bi,gj> into a set R1 • keep the ordered sequence <bi,gj> from R1 for which d(R(bi,gj)) + d(R(gj)) is minimum, giving • <b1,g13>, <b2,g2>, <b3,g3>, <b4,g3>, <b6,g14>, <b7,g14>, <b11,g3>, <b12,g13>, <b13,g14>, <b14,g3>, <b15,g13> in a set R2 • R <- R2 • Examine the next feature type and repeat the above procedure until all feature types have been examined
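The backward iteration above can be sketched as a small dynamic program. This is a simplified illustration in the spirit of the slide, not the RLORD implementation: the function name is hypothetical, linear scans replace any index structure, points are assumed to be 2-D tuples, and the distance is assumed to be Euclidean. For each point of the current type it keeps only the shortest partial route that starts there and still fits within the upper bound.

```python
import math


def backward_search(q, ordered_sets, upper_bound):
    """Best route for one fixed visiting order, searched backward.

    ordered_sets[i] holds the points of the i-th feature type to visit.
    upper_bound is the length of a known route (e.g. the greedy route).
    Returns (distance, route) or None if nothing fits the bound.
    """
    # partial[p] = (length of best partial route starting at p, route)
    partial = {p: (0.0, (p,)) for p in ordered_sets[-1]
               if math.dist(q, p) <= upper_bound}
    for points in reversed(ordered_sets[:-1]):
        extended = {}
        for p in points:
            best = None
            for head, (length, route) in partial.items():
                cand = math.dist(p, head) + length
                # d(q, p) is a lower bound on the prefix from q to p,
                # so routes failing this test cannot beat the bound.
                if math.dist(q, p) + cand > upper_bound:
                    continue
                if best is None or cand < best[0]:
                    best = (cand, (p,) + route)
            if best is not None:
                extended[p] = best
        partial = extended
    if not partial:
        return None
    return min((math.dist(q, p) + length, route)
               for p, (length, route) in partial.items())
```

Keeping one partial route per point is what makes the number of stored sequences in each iteration match the number of surviving points, as in the lists on the slide.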

  10. Our Contributions • Formalized a new nearest neighbor search problem – Multi-Type Nearest Neighbor (MTNN) query problem • Proposed a new algorithm, i.e., Page Level Upper Bound (PLUB) based algorithm • Evaluated the proposed algorithm via cost model and experiment

  11. Key Ideas of PLUB • Prune search space at page level • Create candidate leaf page sequences • Search candidate MTNN in these candidate leaf page sequences

  12. Page Level Upper Bound (PLUB) Algorithm • Step 1: First upper bound search • Use the basic R-tree based nearest neighbor search algorithm with a greedy strategy to find an initial upper bound, which becomes the current upper bound • Step 2: R-tree search • Prune the search space with the current upper bound and form a set of candidate leaf node sequences, using the page level pruning approach • Step 3: Subset search • Search for the candidate MTNN in the candidate leaf node sequences • Go to step 2, using the candidate MTNN distance as the current upper bound, until all permutations of feature types have been examined
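Step 1 can be illustrated with a minimal greedy sketch. It assumes 2-D tuple points and Euclidean distance, and it uses a linear scan where the talk uses an R-tree nearest neighbor search; the function name is hypothetical. Starting at q, it repeatedly moves to the nearest point of the next feature type, and the length of the resulting route serves as an initial upper bound on the MTNN distance.

```python
import math


def greedy_upper_bound(q, ordered_sets):
    """Greedy route for one order of feature types.

    ordered_sets[i] is the list of points of the i-th type to visit.
    Returns (route length, route). The length is a valid upper bound
    because it is the length of an actual route.
    """
    total, current, route = 0.0, q, []
    for points in ordered_sets:
        # Linear scan standing in for an R-tree nearest neighbor query.
        nearest = min(points, key=lambda p: math.dist(current, p))
        total += math.dist(current, nearest)
        route.append(nearest)
        current = nearest
    return total, route
```

The greedy route is not guaranteed to be the shortest; in the running example it has length 3.37 while the reported shortest route has length 3.16.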

  13. PLUB – An Example • Inputs • q: query point • Euclidean distance • R-tree for each feature • R(q,r2,b2,g2) is the greedy route • Radius of the circle is d(R(q,r2,b2,g2)) = 3.37 • Rectangles are leaf pages in the R-trees • [Figure: data points grouped into leaf pages R1-R4, B1-B4, and G1-G4 around query point q]

  14. PLUB – An Example • Leaf page upper bound calculation (current search bound 3.37) • Only leaf node sequence <R1,B1,G1> is left • R(q,r2,b2,g2) is the greedy route • Radius of the circle is d(R(q,r2,b2,g2)) = 3.37 • Rectangles are leaf pages in the R-trees • [Figure: data points grouped into leaf pages R1-R4, B1-B4, and G1-G4 around query point q]
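The slides do not spell out how a leaf page sequence is tested against the current search bound, so the following is only one plausible sketch under stated assumptions. It treats each leaf page as an axis-aligned rectangle (xmin, ymin, xmax, ymax) and discards a page sequence when the sum of minimum point-to-rectangle and rectangle-to-rectangle distances already exceeds the bound. That sum is a lower bound on any route through the pages, so the test never discards the true answer. All function names are hypothetical.

```python
import math
from itertools import product


def point_rect_mindist(p, rect):
    """Minimum Euclidean distance from point p to rectangle rect."""
    dx = max(rect[0] - p[0], 0.0, p[0] - rect[2])
    dy = max(rect[1] - p[1], 0.0, p[1] - rect[3])
    return math.hypot(dx, dy)


def rect_rect_mindist(a, b):
    """Minimum Euclidean distance between rectangles a and b."""
    dx = max(a[0] - b[2], 0.0, b[0] - a[2])
    dy = max(a[1] - b[3], 0.0, b[1] - a[3])
    return math.hypot(dx, dy)


def sequence_lower_bound(q, pages):
    """Lower bound on the length of any route q -> pages[0] -> ... ."""
    bound = point_rect_mindist(q, pages[0])
    for a, b in zip(pages, pages[1:]):
        bound += rect_rect_mindist(a, b)
    return bound


def candidate_sequences(q, pages_per_type, upper_bound):
    """Keep the leaf page sequences that may contain a better route."""
    return [seq for seq in product(*pages_per_type)
            if sequence_lower_bound(q, seq) <= upper_bound]
```

This version enumerates every combination of pages before testing it; a practical implementation would prune partial sequences as they are built.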

  15. PLUB – An Example • Search for the candidate MTNN in <R1,B1,G1> (time unit: one P-P distance calculation) • 1st iteration • <g2>, <g10>, <g12>, <g13> • Time 4 • 2nd iteration • <b12,g13>, <b1,g13>, <b2,g2>, <b15,g13> • Time 4x4+4=20 • 3rd iteration • <r10,b15,g13>, <r9,b15,g13>, <r2,b2,g2>, <r11,b1,g13> • Time 4x4+4=20 • Output • Shortest route R(q,r11,b1,g13) with distance 3.16 • R(q,r2,b2,g2) is the greedy route • Radius of the circle is d(R(q,r2,b2,g2)) = 3.37 • Rectangles are leaf pages in the R-trees • [Figure: data points grouped into leaf pages R1-R4, B1-B4, and G1-G4 around query point q]

  16. Running Results of RLORD • First iteration (time unit p-p) • <g2>, <g3>, <g4>, <g5>,<g7>,<g9>, <g10>, <g12>, <g13>, <g14>, <g15>, <g16> • Time 11 • Second iteration • <b1,g13>, <b2,g2>, <b3,g3>, <b4,g3>, <b6,g14>, <b7,g14>, <b11,g3>, <b12,g13>, <b13,g14>, <b14,g3>, <b15,g13> • Time 11x12+12=144 • Third iteration • <r1,b11,g3>, <r2,b2,g2>, <r3,b11,g3>, <r8,b1,g13>, <r9,b15,g13>, <r10,b15,g13>, <r11,b1,g13>, <r12,b11,g3>, <r13,b1,g13>, <r14,b1,g13>, <r15,b1,g13> • Time 12x11+11=143 • R(q,r11,b1,g13) is shortest among all routes • Shortest distance value 3.16

  17. Running Time Comparison Table • R-R: rectangle to rectangle distance • P-P: point to point distance • RLORD has no R-R distance calculation, but has much more P-P calculation • Cost of R-R < 2 x cost of P-P

  18. Cost Model for PLUB (For One Permutation) • Total cost: $C_{R\text{-}T} + C_{LF} + C_{PN}$ • $C_{R\text{-}T}$: cost of R-tree traversal to find all R-tree leaf nodes intersected by the circle centered at query point q with radius equal to the current upper bound • $C_{LF}$: cost of the page level leaf node search for R-tree candidate leaf node sequences • $C_{PN}$: cost of the point level search for the candidate MTNN in the candidate leaf node sequences

  19. CR-T Model of PLUB • $C_{R\text{-}T}$: R-tree traversal cost • $C_{PR}$: cost of a point to rectangle distance calculation • $N_{t,i}$: number of tree nodes visited in the tree traversal for feature type i • $C_{R\text{-}T} = C_{PR} \times \sum_{i=1}^{k} N_{t,i}$

  20. CLF Model of PLUB • $C_{LF}$: cost of the search for R-tree candidate leaf node sequences • $N_{R\text{-}R}$: number of leaf nodes visited in the candidate leaf node sequence search • $C_{R\text{-}R}$: cost of a rectangle to rectangle distance calculation • $C_{LF} = N_{R\text{-}R} \times C_{R\text{-}R}$

  21. CPN Model of PLUB • $C_{PN}$: cost of searching for the MTNN in the candidate leaf node sequences • $F_{LS}$: leaf node candidate sequence filtering ability ratio • $n_l$: average number of points in a leaf node over all feature types • $p_i$: number of pages of feature type i • $C_{P\text{-}P}$: cost of a point to point distance calculation • $C_{ls}$: cost of searching for the MTNN in a single leaf node sequence • $C_{ls} = C_{P\text{-}P} \times \underbrace{\left[(n_l + n_l \times n_l) + \dots + (n_l + n_l \times n_l)\right]}_{k-1 \text{ terms}} = (k-1)\, n_l (n_l + 1) \times C_{P\text{-}P}$ • $C_{PN} = C_{ls} \times \prod_{i=1}^{k} p_i \times (1 - F_{LS})$

  22. Cost Model for RLORD (For One Permutation) • Total cost: $C'_{R\text{-}T} + C_{PS}$ • $C'_{R\text{-}T}$: cost of R-tree based coarse pruning, i.e. finding all data points inside the initial upper bound • $C'_{R\text{-}T} = C_{R\text{-}T} + C_{P\text{-}P} \times n_l \times \sum_{i=1}^{k} p_i$ • $C_{PS}$: cost of the candidate MTNN search in the remaining subsets • $C_{P\text{-}P}$: cost of a point to point distance calculation • $C_{PS} = C_{P\text{-}P} \times n_l \times \sum_{i=1}^{k-1} \left(p_i + n_l \times p_i \times p_{i+1}\right)$

  23. Cost Model Summary of PLUB and RLORD (One Permutation) • In random or approximately random datasets, $F_{LS}$ is not large enough, so PLUB takes more time • In clustered datasets, $F_{LS}$ tends to be very large. PLUB runs faster than RLORD when $1 - F_{LS} < \dfrac{n_l \times \sum_{i=1}^{k-1} \left(p_i + n_l \times p_i \times p_{i+1}\right)}{(k-1)\, n_l (n_l + 1) \times \prod_{i=1}^{k} p_i}$ • For clustered datasets, this condition becomes true as clusters become more compact • Left side: remaining ratio (r-ratio) • Right side: comparison ratio (c-ratio)
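The comparison on this slide can be evaluated numerically with a short sketch. It only transcribes the inequality from the slide (remaining ratio against comparison ratio); the function names and the sample values in the usage note are illustrative assumptions, not measurements from the talk.

```python
from math import prod


def comparison_ratio(k, n_l, pages):
    """Right side of the inequality (c-ratio).

    k     : number of feature types
    n_l   : average number of points per leaf node
    pages : [p1, ..., pk], number of pages per feature type
    """
    rlord_terms = sum(pages[i] + n_l * pages[i] * pages[i + 1]
                      for i in range(k - 1))
    return (n_l * rlord_terms) / ((k - 1) * n_l * (n_l + 1) * prod(pages))


def plub_predicted_faster(f_ls, k, n_l, pages):
    """True when the remaining ratio 1 - F_LS is below the c-ratio."""
    return (1.0 - f_ls) < comparison_ratio(k, n_l, pages)
```

For example, with k = 3, n_l = 4, and 4 pages per type (made-up values), the comparison ratio is 544 / 2560 = 0.2125, so the model predicts PLUB is faster only when more than about 79% of the leaf page sequences are filtered out.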

  24. Experiment Design

  25. Synthetic Data Sets Generation • Randomly generate cluster centers in the rectangle with bottom-left point (0,0) and top-right point (10000,10000) • Constraint: the minimum distance between two cluster centers is minCCDist • Around every cluster center, generate cluster member points • The maximum distance from a member point to its cluster center is ClusterSize • The simplified maximum cluster center distance is determined by: • maxCCDist = 10000.0/(int)(sqrt(CN)+1) • Thus the minimum cluster center distance used when generating cluster centers is: • minCCDist = BCF x maxCCDist • Then the cluster size is: • ClusterSize = ICF x minCCDist
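The generation procedure above might be sketched as follows. The three formulas are taken from the slide; everything else is an assumption made for illustration: the function names, the `points_per_cluster` parameter (the slides do not give a cluster population), rejection sampling for the centers, uniform sampling of members inside a disc, and allowing member points to fall slightly outside the 10000 x 10000 rectangle.

```python
import math
import random


def cluster_params(cn, bcf, icf, extent=10000.0):
    """Return (maxCCDist, minCCDist, ClusterSize) from the slide formulas."""
    max_cc_dist = extent / int(math.sqrt(cn) + 1)
    min_cc_dist = bcf * max_cc_dist
    cluster_size = icf * min_cc_dist
    return max_cc_dist, min_cc_dist, cluster_size


def generate_clusters(cn, bcf, icf, points_per_cluster, seed=0,
                      extent=10000.0, max_attempts=100000):
    """Generate cn cluster centers and their member points."""
    rng = random.Random(seed)
    _, min_cc_dist, cluster_size = cluster_params(cn, bcf, icf, extent)

    # Rejection sampling: keep a center only if it is far enough
    # from every center accepted so far.
    centers, attempts = [], 0
    while len(centers) < cn:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("could not place all cluster centers")
        c = (rng.uniform(0.0, extent), rng.uniform(0.0, extent))
        if all(math.dist(c, other) >= min_cc_dist for other in centers):
            centers.append(c)

    # Members are sampled uniformly in a disc of radius ClusterSize.
    points = []
    for cx, cy in centers:
        for _ in range(points_per_cluster):
            r = cluster_size * math.sqrt(rng.random())
            theta = rng.uniform(0.0, 2.0 * math.pi)
            points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return centers, points
```

With CN = 20, BCF = 0.5, and ICF = 0.5 the formulas give maxCCDist = 2000, minCCDist = 1000, and ClusterSize = 500.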

  26. Experiment Parameters • Feature Types:2-7 • Between-cluster Compactness Factor (BCF): 0.1-1.0 • In-cluster Compactness Factor (ICF):0.1-0.5 • Cluster Number(CN):20,50,100,200

  27. Synthetic Datasets Example • BCF=0.5,ICF=0.5,CN=20,Feature Type=2 • BCF=0.5,ICF=0.3,CN=20,Feature Type=2

  28. Experiment Setup & Data Sets • Setup • C / Pentium-IV 3.2GHz / Linux / 1GB Memory / Synthetic data • Synthetic data • Scalability test in terms of the number of feature types • Effect of data set density • Effect of between-cluster compactness factor • Effect of in-cluster compactness factor

  29. Scalability Test • Parameters • Fixed: BCF=0.1, ICF=0.1, CN=20 • Variable: feature types (2-7) • Trend • PLUB is much faster when the number of feature types is high

  30. Effect of Data Sets Density • Parameters • Fixed: FT = 7, BCF=0.1, ICF=0.5 • Variable: cluster number (20,50,100,200) • Trend • PLUB is always faster than RLORD for all densities of data sets

  31. Effect of Between-cluster Compactness Factor • Parameters • Fixed: FT = 7, ICF=0.3,CN=50, • Variable: BCF (0.1-1.0)

  32. Effect of Between-cluster Compactness Factor • Top: execution time vs. BCF • Trend • PLUB is faster than RLORD when BCF is less than 0.7 • PLUB is slower than RLORD when BCF is greater than 0.7

  33. Effect of Between-cluster Compactness Factor • Bottom: remaining ratio (r-ratio) and comparison ratio (c-ratio) vs. BCF • Trend • Both ratios increase as BCF increases • Remaining ratio is less than comparison ratio when BCF is less than 0.8

  34. Effect of Between-cluster Compactness Factor • Contradiction? • The remaining ratio increases, which means the pruning ratio decreases, yet the execution time decreases • Explanation: when BCF increases, fewer leaf nodes intersect the current search bound, so the total number of possible candidate leaf node sequences decreases dramatically

  35. Effect of Between-cluster Compactness Factor • Key information • when remaining ratio is less than comparison ratio, PLUB runs faster • when remaining ratio is greater than comparison ratio, PLUB takes more time than RLORD.

  36. Effect of In-cluster Compactness Factor • Parameters • Fixed: FT = 7, BCF=0.1,CN=50, • Variable: ICF (0.1-0.5) • Trend • PLUB is always faster than RLORD for ICF from 0.1 to 0.5

  37. Conclusion and Future Work • Formalized the MTNN query problem • Proposed a PLUB based algorithm for MTNN queries • Compared PLUB and RLORD • Future work: design heuristic algorithms to handle MTNN queries with a large number of feature types

  38. References • [1] M. Kolahdouzan, M. Sharifzadeh and C. Shahabi. The Optimal Sequenced Route Query. USC, CS Dept., Tech. Report 05-840, 2005. • [2] Feifei Li, Dihan Cheng, Marios Hadjieleftheriou, George Kollios and Shang-Hua Teng. On Trip Planning Queries in Spatial Databases. SSTD 2005.
