
Nearest Neighbor Search in Spatial and Spatiotemporal Databases

Dimitris Papadias

Hong Kong University of Science and Technology


Spatial and spatiotemporal databases

  • Spatial databases manage large collections of multi-dimensional objects.

  • Important query types

    • Window query: Retrieve all rivers in CA

    • Nearest neighbor: Find my nearest gas station

    • Spatial join: Report pairs of (city C, river R) such that R crosses C

  • Spatiotemporal databases handle the same query types, but for moving objects. Applications include:

    • Mobile computing

    • Traffic supervision

    • Flight control

    • Weather forecasting



TPR-trees [Saltenis et al., SIGMOD 00; our group, VLDB 03]

  • Extend the R-tree by introducing a velocity bounding rectangle (VBR) in non-leaf entries.

  • Objects are grouped together based on both their locations and their velocities.


Conventional NN search with R- (TPR-) trees

  • Depth-first [Roussopoulos et al., SIGMOD 95]

  • Best-first traversal [Hjaltason and Samet, TODS 99]: incremental and optimal
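
The best-first idea can be illustrated with a small sketch. It is not the published algorithm verbatim: the tree is a hypothetical nested dictionary (`mbr`, `children`, `points` are illustrative names), it lives in main memory, and node accesses are not counted. It only shows how a priority queue ordered by mindist yields neighbors incrementally.

```python
import heapq
import itertools
import math

def mindist(q, mbr):
    """Smallest Euclidean distance from point q to the rectangle mbr = (lo, hi)."""
    lo, hi = mbr
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def best_first_nn(root, q):
    """Yield (distance, point) pairs in non-decreasing distance from q.

    Hypothetical node layout (an assumption of this sketch):
      internal node: {"mbr": (lo, hi), "children": [node, ...]}
      leaf node:     {"mbr": (lo, hi), "points": [point, ...]}
    """
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(mindist(q, root["mbr"]), next(tie), root, False)]
    while heap:
        d, _, item, is_point = heapq.heappop(heap)
        if is_point:
            yield d, item  # nothing left in the heap can be closer
        elif "points" in item:
            for p in item["points"]:
                heapq.heappush(heap, (math.dist(q, p), next(tie), p, True))
        else:
            for child in item["children"]:
                heapq.heappush(heap, (mindist(q, child["mbr"]), next(tie), child, False))
```

Because the function is a generator, a caller can stop after the first result (single NN) or keep pulling results (k NN with k not known in advance).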


NN search - other approaches

  • Several algorithms and theoretical performance bounds have been devised for exact and approximate processing in main memory. Here we care about I/O efficiency (minimizing node and page accesses) and about cost models that predict practical performance (suitable for query optimization).

  • Several approaches for NN in high-dimensional spaces (but the problem is different due to the dimensionality curse). Here we consider low dimensional spaces (spatial and spatiotemporal databases).

  • Ferhatosmanoglu et al [SSTD 01] discover the NN in a constrained area of the data space (e.g., find the NN to the south of the query point).

  • Korn and Muthukrishnan [SIGMOD 00] discuss reverse nearest neighbor queries, where the goal is to retrieve the data points whose nearest neighbor is a specified query point.

  • Korn et al. [VLDB 02] study the same problem in the context of data streams, where the data are not known in advance.


NN search for mobile queries

  • [Zheng and Lee, SSTD 01]: return the current NN and the validity time of the result.

  • Restrictions: (i) assumes a maximum speed; (ii) applicable only to a single NN; (iii) requires Voronoi diagrams.

  • [Song and Roussopoulos, SSTD 01]: minimize the number of queries for moving clients by returning m>k NNs.

  • Problem: how to determine m.

IF 2dist(q,q') dist(q,b)-dist(q,a), THEN the 2 NN at q' be among the 4 NN of the first query.


Time parameterized NN (our group, SIGMOD 02)

  • Assuming a constant and known velocity, a TPNN returns:

    • The current query result R

    • The validity period T of R

    • The change C of the result at the end of T

Result:

R={i}, T=2, C={j}


TP NN queries: Influence Time

The object that will become the next nearest neighbor is the one with the minimum influence time.

Some objects have “infinite” influence time.
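
As an illustration, the influence time has a closed form in the simplest setting. The sketch below assumes static data points and a query moving linearly as q + v·t; moving data objects would need the general quadratic case, which is not shown. Function names are hypothetical.

```python
import math

def influence_time(q, v, nn, o):
    """Earliest t >= 0 at which o is as close as nn to the moving query q + v*t.

    Setting |q + v t - o|^2 = |q + v t - nn|^2 cancels the t^2 terms and gives
        2 t v.(o - nn) = |q - o|^2 - |q - nn|^2.
    Ties at t = 0 are ignored in this sketch.
    """
    gap = sum((a - b) ** 2 for a, b in zip(q, o)) - sum((a - b) ** 2 for a, b in zip(q, nn))
    closing = 2 * sum(vi * (oi - ni) for vi, oi, ni in zip(v, o, nn))
    if closing <= 0:
        return math.inf  # the query never gets closer to o than to nn
    return gap / closing

def next_nn(q, v, nn, others):
    """Return (object, influence_time) for the object with minimum influence time."""
    return min(((o, influence_time(q, v, nn, o)) for o in others), key=lambda pair: pair[1])
```

With the current NN known, `next_nn` returns the change C and the validity period T of a TP NN result. Objects the query moves away from get an infinite influence time.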


Processing TP NN with R- (TPR-) trees

  • Influence time of a MBR: the earliest possible time that any object in the MBR will become the new NN.

  • Algorithm: traverse the R-tree depth-first or best-first, using the influence time instead of the mindist.

  • The cost of TP NN queries is about the same as that of conventional queries, because we have to visit the influencing nodes anyway (to find the NN).


Continuous Nearest Neighbors (CNN) (our group, VLDB 02)

Given a line segment q=[s,e], find the NN of every point on q.

Result representation: {s(.NN=a), s1(.NN=c), s2(.NN=f), s3(.NN=h), e}.

The points (s, s1, s2, s3, e) are the split points.


Main idea

Maintain the set of split points incrementally.

[Figures: split points after processing a; split points after processing c]


Processing TP NN with an R- (TPR-) tree

Avoid examination of all points.

Given an MBR E and a query segment q, E must be searched if and only if there exists a split point $s_i \in SL$ such that $\mathrm{dist}(s_i, s_i.NN) > \mathrm{mindist}(s_i, E)$.
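
The pruning condition can be written down directly. In this sketch the split list SL is assumed to be a list of (split point, current NN) pairs and an MBR is a (lo, hi) pair of corner points; both representations are illustrative choices, not taken from the slides.

```python
import math

def mindist(p, mbr):
    """Smallest Euclidean distance from point p to the rectangle mbr = (lo, hi)."""
    lo, hi = mbr
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(p, lo, hi)))

def must_visit(split_list, mbr):
    """True if some split point could find a closer neighbor inside mbr."""
    return any(math.dist(s, nn) > mindist(s, mbr) for s, nn in split_list)
```

A node is skipped only when it is too far from every split point to improve any of their current neighbors.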


Location Based NN queries (LBNN) (our group, SIGMOD 03)

  • A location-based kNN query q returns

    • The current k NNs

    • A validity region such that the result remains the same as long as q remains in the region.

    • The validity region of q is the Voronoi Cell (VC) of the NN o.


Computing the Voronoi Cell on-the-fly

  • Step 1 – Find the current NN

  • Step 2 – Use time-parameterized (TP) NN queries to tighten the validity region


NN queries in road networks (our group, VLDB 03)

  • Find my nearest gas station in terms of driving distance.

  • Answer: Hotel b (the Euclidean NN is d)

Assumptions:

  • We can incrementally compute Euclidean NN using conventional NN algorithms.

  • We can compute the network distance between the query and any point (i.e., the length of the shortest path connecting them) using Dijkstra's algorithm.


Euclidean Restriction Algorithm

1st Euclidean NN

2nd Euclidean NN
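
A rough sketch of the Euclidean restriction idea under the two assumptions stated earlier. It is a simplification: candidates are sorted by Euclidean distance instead of being fetched incrementally from an R-tree, all network distances are computed by one Dijkstra run, and every edge length is assumed to be at least the Euclidean distance between its endpoints (so Euclidean distance lower-bounds network distance). The names `graph`, `coords`, and `candidates` are hypothetical.

```python
import heapq
import math

def network_distances(graph, source):
    """Dijkstra's algorithm; graph maps node -> list of (neighbor, edge_length)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for w, length in graph[u]:
            nd = d + length
            if nd < dist.get(w, math.inf):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist

def euclidean_restriction_nn(graph, coords, q, candidates):
    """Return (candidate, network_distance) of the network NN of node q."""
    net = network_distances(graph, q)
    by_euclid = sorted(candidates, key=lambda c: math.dist(coords[q], coords[c]))
    best, best_d = None, math.inf
    for c in by_euclid:
        # Any candidate whose Euclidean distance exceeds the best network
        # distance found so far cannot be the network NN.
        if math.dist(coords[q], coords[c]) > best_d:
            break
        d = net.get(c, math.inf)
        if d < best_d:
            best, best_d = c, d
    return best, best_d
```

The loop stops as soon as the next Euclidean neighbor lies beyond the current best network distance, which is what bounds the search region.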



NN in the presence of obstacles (not published)

  • The NN of q in terms of obstructed distance is b, although the Euclidean NN is a.


Visibility graphs

  • Have been used widely in Computational Geometry for shortest path problems (e.g., find the shortest path from pstart to pend that does not cross any obstacle).

  • Problem: We cannot maintain the entire visibility graph in memory for real spatial datasets.

  • Solution: We only need the obstacles and objects that affect the result of the query.


Obstacle nearest neighbor algorithm

  • Idea: Similar to the Euclidean Restriction algorithm for road networks.

  • BUT how do we perform the obstructed distance computations?


Obstructed distance computation

  • Goal: compute the obstructed distance between p and q.

  • First retrieve obstacles o1, o2 in the Euclidean range.

  • Compute a provisional distance d1(p,q) using only o1, o2.

  • d1(p,q) is not enough because the shortest path is obstructed by o3.

  • Perform a second Euclidean range query on the obstacle R-tree using d1(p,q) and retrieve o3, o4.

  • Compute a new obstructed distance d2(p,q) taking o3, o4 into account.

  • Repeat the process until the obstructed distance remains the same for two consecutive iterations.


Other related work

By our group: concepts similar to the ones presented here apply to several other spatial queries, e.g., TP spatial joins and continuous window queries.

  • Cost Models for TP and continuous queries [TODS 03].

  • Analysis of predictive NN (and other) queries [TODS to appear].

  • An Efficient Cost Model for Optimization of Nearest Neighbor Search in Low and Medium Dimensional Spaces [TKDE to appear].

By other groups: increasing interest in novel types of NN search in the context of mobile computing and data stream applications.

  • Iwerks et al. [VLDB 03] discuss continuous NN in the presence of object updates.

  • Shekhar et al [ACM GIS 03] discuss the in-route nearest neighbor query, which, given a trajectory, retrieves the single NN (e.g., gas station) that results in the minimum diversion from the trajectory.

  • Jensen et al [ACM GIS 03] discuss NN for objects moving on road networks.


Group NN queries (our group, ICDE 04)

  • Input: a set P={p1,…,pN} of static data points in multidimensional space and a group of query points Q={q1,…,qn}.

  • Output: the k (1) data point(s) with the smallest sum of distances to all points in Q. The distance between a data point p and Q is defined as dist(p,Q)=i=1~n|pqi|, where |pqi| is the Euclidean distance between p and query point qi.

  • Example: three users at locations q1, q2 and q3 want to find a meeting point (e.g., a restaurant); the corresponding query returns the data point p that minimizes the sum of Euclidean distances |pqi| for 1i3

  • Assumption: the data points are indexed by an R-trees. Q may or may not fit in main memory.
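
To make the definition concrete, here is a brute-force baseline that evaluates dist(p,Q) for every data point. It ignores the R-tree entirely and is only meant as a reference against which the index-based methods can be checked; the function name is hypothetical.

```python
import math

def group_nn(P, Q, k=1):
    """Return the k points of P with the smallest sum of Euclidean distances to Q."""
    def dist_to_group(p):
        return sum(math.dist(p, q) for q in Q)
    return sorted(P, key=dist_to_group)[:k]
```

For the meeting-point example, Q would hold the three user locations and P the candidate restaurants.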


Multiple Query Method (MQM)

  • Idea: Perform incremental NN queries for each point in Q and combine their results.

  • <p10, 7>, <p11, 6>, T=5 (2+3)

  • <p11, 7>

    T=6 (3+3)

    MQM terminates

  • Problem: MQM may visit the same node and discover the same data point many times (for different query points).
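
One way to read the MQM idea is as a threshold-style combination, sketched below. This is an interpretation under assumptions, not the paper's exact procedure: each query point keeps a threshold equal to the distance of its last retrieved neighbor, T is the sum of thresholds, and the search stops once T reaches the best group distance found so far. Sorted lists stand in for incremental NN queries on an R-tree.

```python
import math

def mqm(P, Q):
    """Return (point, group_distance) minimizing the sum of distances to Q."""
    # One "incremental NN stream" per query point (a sorted scan in this sketch).
    streams = [iter(sorted(P, key=lambda p, q=q: math.dist(p, q))) for q in Q]
    thresholds = [0.0] * len(Q)
    best, best_dist = None, math.inf
    while sum(thresholds) < best_dist:
        progressed = False
        for i, (q, stream) in enumerate(zip(Q, streams)):
            p = next(stream, None)
            if p is None:
                continue  # this stream is exhausted
            progressed = True
            thresholds[i] = math.dist(p, q)
            d = sum(math.dist(p, qj) for qj in Q)
            if d < best_dist:
                best, best_dist = p, d
            # Any unseen point is at least thresholds[j] away from each Q[j].
            if sum(thresholds) >= best_dist:
                break
        if not progressed:
            break
    return best, best_dist
```

The sketch also exhibits the problem noted above: the same data point is typically retrieved once per query point, and its group distance is recomputed each time.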


Minimum Bounding Method (MBM)

  • Applies the MBR of Q to prune the search space.

  • Heuristic 1: Let M be the MBR of Q, and best_dist be the distance of the best GNN found so far. A node N cannot contain qualifying points, if:

  • Heuristic 2: A node N cannot contain qualifying points, if:


File Multiple Query Method (F-MQM)

What happens if Q does not fit in memory?

  • F-MQM sorts query points according to their Hilbert value and splits Q into blocks {Q1, .., Qm} that fit in memory.

  • For each block, it computes the GNN using one of the main memory algorithms.

  • It finally combines their results using MQM.

Complication: once a NN of a group has been retrieved, we cannot compute its global distance (i.e., with respect to all query points) immediately.


F-MQM (cont)

Solution: lazy evaluation.

  • First we find the GNN p1 of the first group Q1

  • Then, we load in memory the second group Q2 and retrieve its NN p2. At the same time, we also compute the distance between p1 and Q2.

  • Similarly, when we load Q3, we update the current distances of p1 and p2 taking into account the objects of the third group.

  • After the end of the first round, we only have one data point (p1), whose global distance with respect to all query points has been computed.


File Minimum Bounding Method (F-MBM)

  • First, the points of Q are sorted by their Hilbert value and are assigned to groups (that fit in memory) according to this order.

  • For each group Qi, F-MBM keeps in memory its MBR Mi and cardinality ni (but not its contents).

  • F-MBM descends the R-tree of P (in depth-first or best-first traversal), only following nodes that may contain qualifying points.

Heuristic: Let best_dist be the distance of the best GNN found so far. A node N can be safely pruned if: