Progressive Computation of The Min-Dist Optimal-Location Query

Donghui Zhang, Yang Du, Tian Xia, Yufei Tao*. Northeastern University; * Chinese University of Hong Kong. VLDB ’06, Seoul, Korea.

Presentation Transcript


  1. Progressive Computation of The Min-Dist Optimal-Location Query Donghui Zhang, Yang Du, Tian Xia, Yufei Tao* Northeastern University * Chinese University of Hong Kong VLDB’06, Seoul, Korea

  2. Motivation • “What is the optimal location in Boston area to build a new McDonald’s store?” • Suppose a customer drives to the closest McDonald’s. • Optimality: Minimize AVG driving distance. Optimal Location Query

  3. Who will be interested? • Corporations: chain restaurants (e.g. McDonald’s, Burger King, Starbucks), supermarkets (e.g. Wal-Mart, Costco, Stop & Shop), and location-based service providers (e.g. Verizon, AT&T) • Computer scientists, especially in databases, computational geometry, and algorithms

  4. min-dist OL [Figure: four objects whose distances to their nearest sites are 200, 200, 600, and 600] • Without any new site: AD = (200+200+600+600)/4 = 400.

  5. min-dist OL [Figure: new site l1 placed 30 away from each of the two objects that were 200 from a site] • Without any new site: AD = (200+200+600+600)/4 = 400. • With new site l1: AD(l1) = (30+30+600+600)/4 = 315.

  6. min-dist OL [Figure: new site l2 placed 30 away from each of the two objects that were 600 from a site] • Without any new site: AD = (200+200+600+600)/4 = 400. • With new site l1: AD(l1) = (30+30+600+600)/4 = 315. • With new site l2: AD(l2) = (200+200+30+30)/4 = 115.

  7. Formal Definition • Given a set S of sites, a set O of objects, and a query range Q, the min-dist OL is a location $l \in Q$ that minimizes $AD(l) = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S \cup \{l\})$, where $d_{NN}(o, S \cup \{l\})$ is the distance between o and its nearest site. • “Solution”: compute AD(l) for every l. But…

  8. Challenges • There are infinitely many locations in Q! How can we produce a finite set of candidates while preserving optimality? • How can we avoid computing AD(l) for all candidates?

  9. Solution Highlights • Algorithm to compute AD(l). • Theorems to limit #candidates. • Lower bound of AD(l) for all locations l in a cell C. • Progressive algorithm.

  10. L1 Distance • d(o, s) = |o.x - s.x| + |o.y - s.y|
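As a minimal sketch of the definitions so far (not the authors' code), the L1 distance and a brute-force average distance could be written in Python as below. The function names `l1_dist` and `average_distance`, and the representation of objects and sites as `(x, y)` tuples, are assumptions made only for illustration.

```python
def l1_dist(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average_distance(objects, sites, new_site=None):
    """Average distance from each object to its nearest site.

    With new_site=None this is AD; otherwise it is AD(l) for l = new_site.
    Brute force: every object is compared with every site.
    """
    all_sites = list(sites) + ([new_site] if new_site is not None else [])
    total = sum(min(l1_dist(o, s) for s in all_sites) for o in objects)
    return total / len(objects)
```

This version scans all sites for every object, which is exactly the cost that the rest of the talk tries to avoid.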

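  11. 1. Compute AD(l) • Remember: $AD(l) = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S \cup \{l\})$ • Define: $AD = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S)$ • Let RNN(l) be the objects “attracted” by l. • AD(l) = AD if RNN(l) = ∅. [Figure: a location l with RNN(l) = ∅, so AD(l) = AD]

  12. 1. Compute AD(l) • Remember: $AD(l) = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S \cup \{l\})$ • Define: $AD = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S)$ • Let RNN(l) be the objects “attracted” by l. • AD(l) = AD if RNN(l) = ∅. [Figure: a location l with RNN(l) = {o7, o8}, so AD(l) < AD]

  13. 1. Compute AD(l) • Remember: $AD(l) = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S \cup \{l\})$ • Define: $AD = \frac{1}{|O|} \sum_{o \in O} d_{NN}(o, S)$ • Let RNN(l) be the objects “attracted” by l. • AD(l) = AD if RNN(l) = ∅. • Otherwise AD(l) = AD - ?, where “?” is the average savings for the customers in RNN(l).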

  14. 1. Compute AD(l) • Theorem: $AD(l) = AD - \frac{1}{|O|} \sum_{o \in RNN(l)} \big( d_{NN}(o, S) - d(o, l) \big)$ • S and O are “static” with respect to l. • AD can be pre-computed. • So can dNN(o, S). • To compute AD(l): • Find RNN(l). • For each o ∈ RNN(l), compute d(o, l).
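The idea on this slide, that AD and every dNN(o, S) are computed once and AD(l) is then obtained from the objects in RNN(l) alone, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the helper names (`precompute`, `rnn`, `ad_with_new_site`) are hypothetical, RNN(l) is found by a naive scan, and the savings formula is the one that follows from the definitions of AD and AD(l) with equal object weights.

```python
def l1_dist(a, b):
    """L1 (Manhattan) distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def precompute(objects, sites):
    """Compute dNN(o, S) for every object, and AD, once up front."""
    d_nn = [min(l1_dist(o, s) for s in sites) for o in objects]
    return d_nn, sum(d_nn) / len(d_nn)


def rnn(l, objects, d_nn):
    """Naive RNN(l): indices of objects strictly closer to l than to any site."""
    return [i for i, o in enumerate(objects) if l1_dist(o, l) < d_nn[i]]


def ad_with_new_site(l, objects, d_nn, ad):
    """AD(l) = AD minus the savings of the objects in RNN(l), averaged over all objects."""
    savings = sum(d_nn[i] - l1_dist(objects[i], l) for i in rnn(l, objects, d_nn))
    return ad - savings / len(objects)
```

Only the objects attracted by l contribute to the sum, so a faster way of finding RNN(l) directly speeds up every AD(l) evaluation.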

  15. How to compute RNN(l)? • This is an implementation detail, dealing with computational geometry and spatial databases. • Naïve solution: for each o ∈ O, compare its distance to l with its distances to all sites. • More efficient: • Compute the Voronoi cell of l. • Retrieve the objects inside the Voronoi cell using a range search on an R-tree.

  16. How to compute RNN(l)? (1) Compute Voronoi cell • Remember: RNN(l) is the set of objects closer to l than to any existing site in S. • Consider all sites. Draw the spatial region that is closer to l than to any site.

  17. How to compute RNN(l)? (2) Retrieve objects • Standard range search. • Any spatial access method works, e.g. an R-tree.

  18. Range query: find the objects in a given range, e.g. find all hotels in Boston. No index: scan through all objects. NOT EFFICIENT! [Figure: points a to m plotted on x and y axes running from 0 to 10]


  22. [Figure, repeated on slides 23 and 24: the points a to m on the same x/y axes, grouped into bounding rectangles E1 to E7, next to the corresponding R-tree with root entries E1 and E2, entries E3 to E7 beneath them, and the points in the leaves]

  25. 2. Limit #candidates • Theorem: draw horizontal and vertical grid lines through the objects within the X/Y range of Q. Only the grid intersections need to be considered! [Figure: query range Q with grid lines through the objects]

  26. 2. Limit #candidates • Theorem: draw horizontal and vertical grid lines through the objects within the X/Y range of Q. Only the grid intersections need to be considered! [Figure: query range Q with grid lines through the objects; 5x6=30 candidates]
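One possible reading of this theorem, sketched in Python: take a vertical line through every object whose x coordinate falls in the X range of Q, a horizontal line through every object whose y coordinate falls in the Y range of Q, and use the intersections as candidates. The function name `grid_candidates` and the `(xmin, ymin, xmax, ymax)` encoding of Q are assumptions for illustration; whether the borders of Q also contribute grid lines is not stated on the slide, so they are left out here.

```python
def grid_candidates(objects, q):
    """Candidate locations: intersections of grid lines drawn through objects.

    objects: iterable of (x, y) tuples.
    q: query range as (xmin, ymin, xmax, ymax)  (assumed encoding).
    """
    xmin, ymin, xmax, ymax = q
    # Vertical lines: x coordinates of objects within the X range of Q.
    xs = sorted({x for x, _ in objects if xmin <= x <= xmax})
    # Horizontal lines: y coordinates of objects within the Y range of Q.
    ys = sorted({y for _, y in objects if ymin <= y <= ymax})
    return [(x, y) for x in xs for y in ys]
```

With 5 distinct x values and 6 distinct y values this yields the 5x6=30 candidates of the slide's example.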

  27. 2. Limit #candidates • Proof idea: suppose the OL is not at a grid intersection; then moving it to one produces a better (or equal) result. • Consider RNN(l). • Moving l to the right ⇒ saves total distance. [Figure: location l and a shift δ]

  28. 2. VCU(Q) • A spatial region enclosing the objects that are closer to Q than to the sites in S. • It is the Voronoi cell of Q versus the sites in S.

  29. 2. Further Limit #candidates • Only consider objects in VCU(Q). [Figure: 5x6=30 candidates]

  30. 2. Further Limit #candidates • Only consider objects in VCU(Q). [Figure: 5x6=30 candidates]

  31. 2. Further Limit #candidates • Only consider objects in VCU(Q). [Figure: 4x4=16 candidates]

  32. Naïve Algorithm • Derive the candidates. • Compute AD(l) for each. • Pick the smallest. • Not efficient! Too many candidates! Computing AD(l) for each one requires: • computing RNN(l) • retrieving all of these objects…

  33. Progressive Idea • Treat Q as a cell and consider its corners.

  34. Progressive Idea • Divide the cell.

  35. Progressive Idea • Divide the cell.

  36. Progressive Idea • Recursively divide a sub-cell.

  37. Progressive Idea • Recursively divide a sub-cell. • Able to check all candidates.

  38. Progressive Idea • Q: What do you save? • A: Cell pruning: a cell C can be pruned if its lower bound ≥ AD(l0) for some candidate l0. [Figure: AD(l0) = 50; suppose 60 is a lower bound for AD(l), ∀ l ∈ C, so C is pruned]

  39. 3. LB(C): lower bound for AD(l), ∀ l ∈ C [Figure: cell C with corner values AD(c1)=1000, AD(c2)=3000, AD(c3)=4000, AD(c4)=2500]

  40. 3. LB(C): lower bound for AD(l), ∀ l ∈ C • Theorem: [formula missing from the transcript] is a lower bound, where p is the perimeter of C. • e.g. LB(C) = 3500 - p/4 [Figure: cell C with corner values AD(c1)=1000, AD(c2)=3000, AD(c3)=4000, AD(c4)=2500]
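The theorem's formula did not survive in this transcript, so the following is only one bound that is consistent with the slide's example, not necessarily the authors' exact statement. Under L1 distance, moving a location by δ changes AD by at most δ, and for any l in a rectangular cell the distances from l to two diagonally opposite corners sum to p/2. Averaging the two corner inequalities gives AD(l) ≥ (AD(c) + AD(c'))/2 - p/4 for each diagonal pair. The sketch assumes that c1/c4 and c2/c3 are the diagonal pairs, since that pairing reproduces 3500 - p/4; the function name `lower_bound` is hypothetical.

```python
def lower_bound(ad_c1, ad_c2, ad_c3, ad_c4, perimeter):
    """Lower bound of AD(l) over a rectangular cell from its four corner values.

    Assumes (c1, c4) and (c2, c3) are the two pairs of diagonally
    opposite corners; this pairing is inferred from the slide's example.
    """
    diag_a = (ad_c1 + ad_c4) / 2
    diag_b = (ad_c2 + ad_c3) / 2
    return max(diag_a, diag_b) - perimeter / 4
```

With the corner values on the slide, `lower_bound(1000, 3000, 4000, 2500, p)` evaluates to 3500 - p/4.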

  41. 3. LB(C): lower bound for AD(l), ∀ l ∈ C • A better lower bound. Theorem: [formula missing from the transcript] • Compared with the previous lower bound: • Higher quality, since the lower bound is larger. • More computation.

  42. 4. The Progressive Algorithm • (1) Maintain a heap of cells ordered by LB(). Initially there is one cell: Q. • (2) Maintain the best candidate lopt. • (3) Pick the cell with minimum LB() and partition it. • (4) Compute AD() for the corners of the sub-cells. • (5) Compute LB() for the sub-cells. • (6) Insert sub-cell ci into the heap if LB(ci) < AD(lopt). • (7) Go to (3).
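The loop on this slide can be sketched as a best-first search over the candidate grid. This is a simplified illustration under several assumptions, not the authors' implementation: AD(l) is computed by brute force instead of through RNN(l) and an R*-tree, cells are index rectangles into sorted candidate coordinates `xs` and `ys`, each cell is split in half per dimension rather than batch-partitioned, and the lower bound is the diagonal-corner bound max of (AD(c) + AD(c'))/2 minus p/4, which is valid for L1 distance. All function names are hypothetical.

```python
import heapq
from itertools import product


def l1_dist(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def make_ad(objects, sites):
    """Return a function l -> AD(l); dNN(o, S) is precomputed once."""
    d_nn = [min(l1_dist(o, s) for s in sites) for o in objects]

    def ad(l):
        return sum(min(d, l1_dist(o, l)) for o, d in zip(objects, d_nn)) / len(objects)

    return ad


def progressive_ol(xs, ys, ad):
    """Best-first search; a cell is an index rectangle (i0, i1, j0, j1)."""
    cache = {}

    def corner_ad(i, j):
        if (i, j) not in cache:
            cache[(i, j)] = ad((xs[i], ys[j]))
        return cache[(i, j)]

    def lb(cell):
        i0, i1, j0, j1 = cell
        p = 2 * ((xs[i1] - xs[i0]) + (ys[j1] - ys[j0]))
        d1 = (corner_ad(i0, j0) + corner_ad(i1, j1)) / 2
        d2 = (corner_ad(i0, j1) + corner_ad(i1, j0)) / 2
        return max(d1, d2) - p / 4

    def split(lo, hi):
        # Halve an index range; ranges with no interior index stay whole.
        if hi - lo < 2:
            return [(lo, hi)]
        mid = (lo + hi) // 2
        return [(lo, mid), (mid, hi)]

    root = (0, len(xs) - 1, 0, len(ys) - 1)
    best_ad, best_loc = min(
        (corner_ad(i, j), (xs[i], ys[j])) for i, j in product(root[:2], root[2:]))
    heap = [(lb(root), root)]
    while heap:
        bound, (i0, i1, j0, j1) = heapq.heappop(heap)
        if bound >= best_ad:
            break  # every remaining cell has a lower bound at least this large
        x_parts, y_parts = split(i0, i1), split(j0, j1)
        if len(x_parts) == 1 and len(y_parts) == 1:
            continue  # all candidates in this cell are corners, already evaluated
        for (a, b), (c, d) in product(x_parts, y_parts):
            for i, j in product((a, b), (c, d)):
                if corner_ad(i, j) < best_ad:
                    best_ad, best_loc = corner_ad(i, j), (xs[i], ys[j])
            sub_bound = lb((a, b, c, d))
            if sub_bound < best_ad:
                heapq.heappush(heap, (sub_bound, (a, b, c, d)))
    return best_loc, best_ad
```

At any point in the loop, `best_ad` and the smallest bound in the heap bracket the true optimum, which is what allows a caller to stop early with a known error interval.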

  43. Progressiveness • The algorithm quickly reports a candidate OL with a confidence interval, and keeps refining. [Figure: plotted over time, AD(real OL) lies inside the interval between LB(Q) and AD(best corner of Q)]

  44. Progressiveness • The algorithm quickly reports a candidate OL with a confidence interval, and keeps refining. [Figure: plotted over time, AD(real OL) lies inside the interval between LB(Q) and AD(best candidate)]

  45. Progressiveness • The algorithm quickly reports a candidate OL with a confidence interval, and keeps refining. • User may choose to terminate at any time. [Figure: plotted over time, AD(real OL) lies inside the interval between Min{ LB(C) | C in heap } and AD(best candidate)]

  46. Batch Partitioning • When partitioning a cell, partition it into multiple sub-cells at once. • Reason: computing AD(l) requires accessing the R*-tree of objects, so each access should serve the computation of multiple AD(l) values. • Tradeoff: partitioning too finely is wasteful, since some candidates could have been pruned.

  47. Performance Setup • O: 123,593 postal addresses in the northeastern part of the US, stored in an R*-tree. • S: 100 sites randomly selected from O. • Buffer: 128 pages. • Dell Pentium IV 3.2GHz. • Query size: 1% in each dimension.

  48. (review slide) 2. Further Limit #candidates • Only consider objects in VCU(Q). [Figure: 4x4=16 candidates]

  49. Effect of VCU Computation

  50. (review slide) 3. LB(C): lower bound for AD(l), ∀ l ∈ C • Theorem: [formula missing from the transcript] is a lower bound, where p is the perimeter of C. • e.g. LB(C) = 3500 - p/4 [Figure: cell C with corner values AD(c1)=1000, AD(c2)=3000, AD(c3)=4000, AD(c4)=2500]
