This paper presents an approach for finding the top-t most influential sites within a given spatial region, where the influence of a site is the total weight of the objects that have it as their nearest site. We introduce the minExistDNN metric, which supports pruning so that the query can be answered without pre-computing the influence of every site. The algorithm browses the site R-tree and the object R-tree once each, and experiments show more than an order of magnitude improvement over a Voronoi-diagram-based solution, making the method useful for applications such as urban planning and resource allocation.
On Computing Top-t Most Influential Spatial Sites Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du Northeastern University Boston, USA VLDB 2005, Trondheim, Norway
Outline
• Problem Definition
• Related Work
• The New Metric: minExistDNN
• Data Structures and Algorithm
• Experimental Results
• Conclusions
Problem Definition
• Given:
  • a set of sites S
  • a set of weighted objects O
  • a spatial region Q
  • an integer t
• Top-t most influential sites query: find the t sites in Q with the largest influences.
• Influence of a site s = total weight of the objects that consider s as their nearest site.
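The definition above can be made concrete with a brute-force sketch. This is only a reference implementation of the query semantics, not the algorithm of the talk; the function name `top_t_influential`, the dictionary layout, and the sample coordinates are all assumptions made for illustration. Note that each object picks its nearest site among all sites, while only sites inside Q are ranked.

```python
import math
from collections import defaultdict

def top_t_influential(sites, objects, in_region, t):
    """sites: {name: (x, y)}; objects: list of ((x, y), weight);
    in_region: predicate that tells whether a location lies in Q."""
    influence = defaultdict(float)
    for location, weight in objects:
        # The object credits its weight to its nearest site, wherever that site is.
        nearest = min(sites, key=lambda s: math.dist(sites[s], location))
        influence[nearest] += weight
    # Only sites inside the query region Q are candidates for the answer.
    candidates = [s for s in sites if in_region(sites[s])]
    candidates.sort(key=lambda s: influence[s], reverse=True)
    return [(s, influence[s]) for s in candidates[:t]]

# Hypothetical data: three sites and four weighted objects.
sites = {"s1": (0.0, 0.0), "s2": (10.0, 0.0), "s3": (10.0, 10.0)}
objects = [((1.0, 1.0), 2.0), ((2.0, 0.0), 1.0), ((9.0, 1.0), 4.0), ((9.0, 9.0), 1.0)]

top2_everywhere = top_t_influential(sites, objects, lambda p: True, 2)
top1_left_half = top_t_influential(sites, objects, lambda p: p[0] < 5.0, 1)
```

This sketch costs one nearest-site search per object, which is the work the index-based approach is designed to avoid.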
Motivation
• Which supermarket in Boston is the most influential among residential buildings?
  • Sites: supermarkets
  • Objects: residential buildings
  • Weight: number of people in a building
  • Query region: Boston
• Which wireless station in Boston is the most influential among mobile users?
Example
[Figure: sites s1 to s4 and objects o1 to o6]
• Suppose all objects have weight = 1, Q is the whole space, and t = 1.
• The most influential site is s1, with influence = 3.
Example
[Figure: sites s1 to s4 and objects o1 to o6, with Q drawn as a shaded rectangle]
• Now suppose Q is the shaded rectangle and t = 2.
• Top-2 most influential sites: s4 and s2.
Related Work
[Figure: sites s1 to s4 and objects o1 to o6]
• Bi-chromatic RNN query: considers two datasets, sites and objects.
• The RNNs of a site s ∈ S are the objects that consider s as their nearest site.
Related Work
[Figure: sites s1 to s4 and objects o1 to o6]
• Solutions to the RNN query based on pre-computation [KM00, YL01].
Related Work
• Solution to the RNN query based on the Voronoi diagram [SRAE01]:
  • Compute the Voronoi cell of s: the region enclosing the locations closer to s than to any other site.
  • Query the object R-tree using the Voronoi cell.
Related Work [SRAE01]
[Figure: sites s1 to s4 and objects o1 to o6]
Our Problem vs. RNN Query
• RNN query:
  • Takes a single site as input.
  • Interested in the actual set of RNNs.
• Top-t most influential sites query:
  • Takes a spatial region as input.
  • Interested in the aggregate weight of the RNNs.
Straightforward Solution 1
• For each site, pre-compute its influence.
• At query time, find the sites in Q and return the t sites with the largest influences.
• Drawback 1: Costly maintenance upon updates.
• Drawback 2: Tightly binds a set of sites to a set of objects.
Straightforward Solution 2
• An extension of the Voronoi-diagram-based solution to the RNN query.
• Find all sites in Q.
• For each such site, find its RNNs by using the Voronoi cell, and compute its influence.
• Return the t sites with the largest influences.
Straightforward Solution 2
• Drawback 1: All sites in Q need to be retrieved from the leaf nodes.
• Drawback 2: The object R-tree and the site R-tree are browsed multiple times.
  • For each site in Q, browse the site R-tree to compute the Voronoi cell.
  • For each such Voronoi cell, browse the object R-tree to compute the influence.
Features of Our Solution
• Systematically browse both trees once.
• Pruning techniques are provided based on a new metric, minExistDNN.
• No need to compute the influences for all sites in Q, or even to locate all sites in Q.
Motivation
[Figure: object MBRs O1, O2 and site MBRs S1, S2. O1 only affects S1, while O2 affects both S1 and S2.]
• Intuitively, if some object in Oi may consider some site in Sj as an NN, Oi affects Sj.
• To estimate the influences of all sites in a site MBR Sj, we need to know whether an object MBR Oi will affect Sj.
maxDist – A Loose Estimation
[Figure: MBRs O1, S1, S2 with maxDist(O1, S1) = 10 and minDist(O1, S2) = 8]
• If maxDist(O1, S1) < minDist(O1, S2), O1 does not affect S2.
• Why not good enough?
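The maxDist test above only needs two rectangle-to-rectangle distances. The following is a minimal sketch, assuming MBRs are stored as `(xmin, ymin, xmax, ymax)` tuples; that layout, the function names, and the sample rectangles are assumptions, not taken from the talk.

```python
import math

def min_dist(a, b):
    """Smallest distance between any point of rectangle a and any point of b."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

def max_dist(a, b):
    """Largest distance between any point of rectangle a and any point of b."""
    dx = max(a[2] - b[0], b[2] - a[0])
    dy = max(a[3] - b[1], b[3] - a[1])
    return math.hypot(dx, dy)

def loosely_pruned(o, s_near, s_far):
    """True if the maxDist test proves that o does not affect s_far."""
    return max_dist(o, s_near) < min_dist(o, s_far)

# Hypothetical rectangles.
o1 = (0.0, 0.0, 1.0, 1.0)
s1 = (2.0, 0.0, 3.0, 1.0)
s2_far = (10.0, 0.0, 11.0, 1.0)
s2_close = (3.5, 0.0, 4.5, 1.0)
```

With these rectangles, `s2_far` can be ruled out but `s2_close` cannot, even though every location in `o1` is much closer to `s1`'s rectangle than maxDist suggests.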
minMaxDist – A Tight Estimation?
[Figure: object o1 and site MBRs S1, S2 with minMaxDist(o1, S1) = 5 and minDist(o1, S2) = 6]
• An object o1 does not affect S2 if there exists S1 such that minMaxDist(o1, S1) < minDist(o1, S2).
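For a single object location, minMaxDist can be computed with the usual per-dimension formula: in each dimension, take the nearer edge of the rectangle in that dimension and the farther edge in the other, then keep the smaller result. It is a valid bound because every edge of an R-tree MBR touches at least one stored item. The sketch below is two-dimensional and assumes `(xmin, ymin, xmax, ymax)` rectangles; the helper names and coordinates are illustrative assumptions.

```python
import math

def min_dist_point(p, r):
    """Distance from point p to the closest point of rectangle r."""
    xmin, ymin, xmax, ymax = r
    dx = max(xmin - p[0], p[0] - xmax, 0.0)
    dy = max(ymin - p[1], p[1] - ymax, 0.0)
    return math.hypot(dx, dy)

def min_max_dist(p, r):
    """Upper bound on the distance from p to its nearest item inside MBR r."""
    lo, hi = (r[0], r[1]), (r[2], r[3])
    mid = [(lo[k] + hi[k]) / 2 for k in range(2)]
    near = [lo[k] if p[k] <= mid[k] else hi[k] for k in range(2)]
    far = [lo[k] if p[k] >= mid[k] else hi[k] for k in range(2)]
    # Nearer edge in dimension k, farther edge in the other dimension.
    return min(math.hypot(p[k] - near[k], p[1 - k] - far[1 - k]) for k in range(2))

def does_not_affect(o, s1, s2):
    """The slide's rule for a point object o: some site in s1 is surely closer than s2."""
    return min_max_dist(o, s1) < min_dist_point(o, s2)

# Hypothetical data.
o1 = (5.0, 1.0)
s1 = (0.0, 0.0, 2.0, 2.0)
s2 = (9.0, 0.0, 10.0, 2.0)
```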
minMaxDist – A Tight Estimation?
[Figure: MBRs O1, S1, S2 with sites s1, s2 and object o1; labels: minMaxDist(O1, S1) = 5, minDist(O1, S2) = 6, and distances 7 and 6]
• Not true for an object MBR O1.
A Tight Estimation?
• A metric m(O1, S1) should:
  • guarantee that each location in O1 is within m(O1, S1) of a site in S1,
  • and be the smallest distance with this property.
New Metric – minExistDNN_S1(O1)
• Definition: minExistDNN_S1(O1) = max { minMaxDist(l, S1) | location l ∈ O1 }
• O1 does not affect S2 if there exists S1 such that minExistDNN_S1(O1) < minDist(O1, S2).
Examples of minExistDNN_S1(O1)
[Figure: two example configurations of O1 and S1]
• How to calculate it?
Calculating minExistDNN_S1(O1)
[Figure: S1 with corners a, b, c, d; the space is split into partitions P1 to P8, each labeled with a corner: P1:b, P2:c, P3:a, P4:d, P5:c, P6:b, P7:d, P8:a]
• Step 1: Space partitioning. All locations l in the same partition share the same second-closest corner of S1, and the distance from l to that corner is minMaxDist(l, S1).
Space Partitioning
[Figure: O1 overlapping partitions P1:b and P2:c of S1 (corners a, b, c, d)]
• O1 is divided into multiple sub-regions, one in each partition.
Calculating minExistDNN_S1(O1)
[Figure: O1 overlapping partitions P1:b and P2:c of S1, with minExistDNN_S1(O1) marked]
• Step 2: Choose up to 8 locations on O1's border and compute their minMaxDist values to S1.
• minExistDNN is the largest one.
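The definition can be checked numerically with a brute-force sketch. It uses the second-closest-corner observation from Step 1 to evaluate minMaxDist, and then simply samples O1 on a grid instead of selecting the (up to) 8 border locations, so it is not the procedure of the talk. Because sampling can miss the true maximum, the result is a lower bound suitable for illustration only, not for safe pruning. The function names, the `steps` parameter, and the rectangles are assumptions.

```python
import math
from itertools import product

def min_max_dist(p, s):
    """minMaxDist from point p to MBR s: distance to the second-closest corner of s."""
    xmin, ymin, xmax, ymax = s
    corners = product((xmin, xmax), (ymin, ymax))
    return sorted(math.dist(p, c) for c in corners)[1]

def approx_min_exist_dnn(o, s, steps=51):
    """Grid-sampled estimate of max { minMaxDist(l, s) | l in o }."""
    ox0, oy0, ox1, oy1 = o
    best = 0.0
    for i, j in product(range(steps), repeat=2):
        # Sample locations cover o evenly, including its border and corners.
        l = (ox0 + (ox1 - ox0) * i / (steps - 1),
             oy0 + (oy1 - oy0) * j / (steps - 1))
        best = max(best, min_max_dist(l, s))
    return best

# Hypothetical rectangles in (xmin, ymin, xmax, ymax) form.
s1 = (0.0, 0.0, 2.0, 2.0)
o1 = (4.0, 0.0, 6.0, 2.0)
estimate = approx_min_exist_dnn(o1, s1)
```

For this configuration the maximum is reached at a far corner of `o1`, where the second-closest corner of `s1` lies at distance sqrt(20).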
Data Structure
• Two R-trees: S of sites, O of objects.
• Three queues:
  • queueSIN: entries of S inside Q.
  • queueSOUT: entries of S outside Q.
  • queueO: entries of O.
Data Structure
[Figure: query region Q with site entries S1 to S4 and object entries O1 to O4; queue contents shown for queueSIN, queueO, queueSOUT, in that order: S1 S2 O1 S3]
maxInfluence and minInfluence
• For each entry Sj in queueSIN:
  • maxInfluence: total weight of entries in queueO that affect Sj.
  • minInfluence: total weight of entries in queueO that ONLY affect Sj, divided by the number of sites in Sj.
• queueSIN is sorted in decreasing order of maxInfluence.
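The two bounds can be sketched from an "affects" relation. In this sketch the divisor of minInfluence is taken to be the number of sites stored under Sj, which is an interpretation of the slide; the function name, the dictionary layout, and the weights are hypothetical.

```python
def influence_bounds(site_entry, num_sites, affects, weight):
    """affects: {object entry: set of site entries it may affect};
    weight: {object entry: total weight of the objects under it}."""
    # Upper bound: every object entry that may affect this site entry.
    max_influence = sum(w for o, w in weight.items() if site_entry in affects[o])
    # Weight that can go nowhere else must be shared by the sites under this entry,
    # so at least one of them receives the average.
    exclusive = sum(w for o, w in weight.items() if affects[o] == {site_entry})
    return max_influence, exclusive / num_sites

# Hypothetical state: O1 only affects S1, O2 affects both S1 and S2, O3 only affects S2.
affects = {"O1": {"S1"}, "O2": {"S1", "S2"}, "O3": {"S2"}}
weight = {"O1": 6, "O2": 4, "O3": 3}

bounds_s1 = influence_bounds("S1", 3, affects, weight)
bounds_s2 = influence_bounds("S2", 1, affects, weight)
```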
Algorithm Overview
• Expand an entry from one of the three queues.
• Remove the entry from the queue.
• Retrieve the referenced node, and insert its (unpruned) entries into the same queue.
• Update maxInfluence and minInfluence if necessary.
• If the top-t entries in queueSIN are sites whose minInfluences are ≥ the maxInfluences of all remaining entries, return them.
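The stopping condition in the last bullet can be sketched as a small predicate. The entry representation (dictionaries with `is_site`, `min_influence`, and `max_influence` keys) is an assumption made for this example, and the sketch re-sorts the queue on each call for clarity rather than efficiency.

```python
def can_stop(queue_s_in, t):
    """True when the t best entries are actual sites that no other entry can overtake."""
    ranked = sorted(queue_s_in, key=lambda e: e["max_influence"], reverse=True)
    top, rest = ranked[:t], ranked[t:]
    if not all(e["is_site"] for e in top):
        return False  # an unexpanded node is still among the top-t
    threshold = max((e["max_influence"] for e in rest), default=float("-inf"))
    return all(e["min_influence"] >= threshold for e in top)

# Hypothetical queue: two resolved sites and one unexpanded node.
queue = [
    {"name": "sa", "is_site": True, "min_influence": 9.0, "max_influence": 9.0},
    {"name": "sb", "is_site": True, "min_influence": 7.0, "max_influence": 8.0},
    {"name": "Sc", "is_site": False, "min_influence": 1.0, "max_influence": 6.0},
]
stop_for_top2 = can_stop(queue, 2)
stop_for_top3 = can_stop(queue, 3)
```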
Example
[Figure: query region Q with site entries S1, S3, S5 to S9 and object entries O1, O5, O6]
• S6 is not affected by O1, prune S6.
• O5 does not affect S5 and S7, prune O5.
• Queues before: queueSIN: S1; queueO: O1; queueSOUT: S3
• Queues after: queueSIN: S5, S7; queueO: O6; queueSOUT: S9
A Pruning Case
[Figure: entries S1, S2, S3, S4 and O1, with minExistDNN_S1(O1) = 7, minExistDNN_S3(O1) = 4, and minDist(S2, O1) = 5]
• Expand S1.
• S2 is pruned because minExistDNN_S3(O1) < minDist(S2, O1).
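The pruning test itself is a single comparison once the distances are known. The sketch below reuses the three numbers shown on this slide; the function name and the dictionary layout are assumptions, and the distances are supplied as inputs rather than computed.

```python
def is_pruned(min_dist_to_target, min_exist_dnn_by_entry):
    """True if some other site entry is guaranteed to offer a closer site
    to every location of the object entry than the target entry can."""
    return any(d < min_dist_to_target for d in min_exist_dnn_by_entry.values())

min_dist_s2_o1 = 5.0  # minDist(S2, O1) from the slide

# With only S1 available, minExistDNN of O1 with respect to S1 is 7: no pruning.
before_expanding_s1 = is_pruned(min_dist_s2_o1, {"S1": 7.0})

# Once S3 is available, minExistDNN of O1 with respect to S3 is 4: S2 is pruned.
after_expanding_s1 = is_pruned(min_dist_s2_o1, {"S1": 7.0, "S3": 4.0})
```

This illustrates why expanding an entry can help: a smaller MBR tends to give a smaller minExistDNN, which makes the inequality easier to satisfy.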
Choosing an Entry to Expand
• Expand top entries in queueSIN.
• Expand the most important Oi.
  • Importance: |Oi| * #affected entries * area(Oi)
• Expand Sj that contains the most important Oi.
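The importance score is a plain product and can be sketched as follows. Here |Oi| is read as the number of objects under the entry, which is an assumption; the entry fields (`count`, `num_affected`, `mbr`) and the sample values are hypothetical.

```python
def importance(entry):
    """|Oi| * number of affected site entries * area(Oi)."""
    xmin, ymin, xmax, ymax = entry["mbr"]
    area = (xmax - xmin) * (ymax - ymin)
    return entry["count"] * entry["num_affected"] * area

# Hypothetical object entries.
queue_o = [
    {"name": "O1", "count": 10, "num_affected": 2, "mbr": (0.0, 0.0, 2.0, 2.0)},
    {"name": "O2", "count": 5, "num_affected": 3, "mbr": (0.0, 0.0, 3.0, 3.0)},
]
most_important = max(queue_o, key=importance)
```

Large, populous entries that affect many site entries score highest, since splitting them is the most likely to tighten the influence bounds.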
Choosing an Entry to Expand
[Figure: two panels showing Q, S1, O1 and S2 (S'2 in the second panel), with minDist(S1, O1) = 5 and minExistDNN_S2(O1) = 6]
• Estimate the probability of pruning Oi using some Sj in queueSOUT.
• After expanding S2, O1 is likely not to affect S1.
Experimental Setup
• Data sets:
  • 24,493 populated places in North America
  • 9,203 cultural landmarks in North America
• R-tree page size: 1 KB
• LRU buffer: 128 disk pages
• t = 4
• Compared against the solution using the Voronoi diagram.
Selected Experimental Results
#sites : #objects = 1 : 2.5
Selected Experimental Results
#sites : #objects = 2.5 : 1
Conclusions
• We addressed a new problem: the top-t most influential sites query.
• We proposed a new metric: minExistDNN. It can be used to prune the search space in NN/RNN-related problems.
• We carefully designed an algorithm that systematically browses both R-trees once.
• Experiments showed more than an order of magnitude improvement.
Thank you! Q & A