Continuous Query Processing on Spatio-Temporal Data Streams

Continuous Query Processing on Spatio-Temporal Data Streams Rimma V. Nehme Department of Computer Sciences, Worcester Polytechnic Institute Thesis Advisor: Elke A. Rundensteiner Thesis Reader: Michael A. Gennert June 27, 2005

Outline • Motivation • Part I: SCUBA • Motivation • Moving Clusters • SCUBA Algorithm • Analysis of SCUBA • Evaluation • Conclusions • Future Work • Part II: Performance vs. Accuracy • Discrete vs. Continuous Model • Accuracy Model • Evaluation • Conclusions Are we there yet?

Motivation • Send a notification to all cell phone users within 2 miles that we have a 50% off lunch sale • Monitor the traffic in the red areas • Continuously return the area covered by the herd during the migration

Challenges [Figure: moving objects, dynamic range queries, dynamic kNN queries] • Scalability • Large number of objects • Large number of queries • Limited resources • Memory • CPU • Real-time response requirement • Reduce the number of computations • Novel idea: exploit the fact that objects and queries move in groups (i.e., clusters) to optimize execution • The challenge is to provide fast query response in update-intensive environments

Big Picture • Traditional execution: SR [SR01], DQ [LPM02], CNN [TPS], TPR [SJL00] • Shared execution: SINA [MXA04], SEA-CNN [XMA05], Q-Index [PXK+02] • Shared cluster-based execution: my work (SCUBA) • Use clustering as a means to improve execution for densely moving objects and queries

Proposed Solution: Moving Clusters • Example query: continuously retrieve the closest police car next to me • Main idea: abstract individual entities into a cluster based on common attributes: -Direction -Speed -Spatial position • The execution of continuous moving queries on moving objects is then abstracted as a join-between moving clusters and a join-within moving clusters • Scalable Cluster-Based Algorithm for Evaluating Continuous Spatio-Temporal Queries on Moving Objects (SCUBA)

Architecture Overview • SCUBA-enabled motion operator execution in CAPE [Figure: moving-object and moving-range-query data streams feed the SCUBA motion operator, which maintains moving clusters and, each time the interval Δ expires, performs a grid-based join between/within clusters and outputs the results] • I present the system in the context of continuous spatio-temporal range queries

Moving Cluster Representation in SCUBA • A moving cluster has a centroid, a velocity vector, an actual cluster size, and a maximum cluster size ΘD • Cluster members are moving objects and moving queries • Inside the cluster, each member is represented by its position relative to the centroid

SCUBA Execution [Figure: objects and queries → in-memory clustering → clusters → cluster-based joining → send results → position update] • SCUBA produces results every Δ time units • SCUBA has three phases: Phase I runs until the Δ timeout fires, then Phases II and III run to completion before the cycle repeats • Phase I: Cluster Pre-Join Maintenance • Formation of new clusters • Dissolving "empty" clusters • Expanding existing clusters • Phase II: Cluster-Based Joining … • Phase III: Cluster Post-Join Maintenance • Dissolving "expiring" clusters • Relocating "non-expiring" clusters in the grid based on their velocity vectors

Phase I: Cluster Pre-Join Maintenance • Clustering is done incrementally (upon the arrival of updates) • Location update format: (ID, Loc_t, t, Speed, CNLoc, ...), where CNLoc is the location of the connection node (destination) • Use two thresholds plus the destination: • ΘD – distance threshold • ΘS – speed threshold • Destination • Clustering a new object, example: (1) A new moving object arrives (2) Hash the object into the grid (3) Add the object to its parent cluster and update the cluster attributes: -centroid position -radius -average speed -member count (4) If the cluster has expanded, check for overlap with neighboring cells (make new entries if necessary) (5) If the object left its existing cluster to form a new cluster and the old cluster is now "empty", dissolve the old cluster • The clustering algorithm is based on the Leader-Follower clustering algorithm (J.A. Hartigan. Clustering Algorithms, John Wiley and Sons, 1975)
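The incremental clustering steps above can be sketched as a leader-follower loop. This is a minimal illustration under stated assumptions, not SCUBA's implementation: the `Cluster` and `LeaderFollower` names, the `dest` string standing in for CNLoc, and the joining rule (same destination, speed within ΘS of the cluster's average speed, position within ΘD of the centroid) are hypothetical. Candidate clusters are found by a linear scan here, whereas SCUBA looks them up through the grid.

```python
import math
from dataclasses import dataclass, field

THETA_D = 100.0  # distance threshold ΘD (spatial units), value from the experimental settings
THETA_S = 10.0   # speed threshold ΘS (spatial units / time unit)


@dataclass
class Cluster:
    cid: int
    cx: float          # centroid x
    cy: float          # centroid y
    avg_speed: float
    dest: str          # hypothetical stand-in for the destination connection node (CNLoc)
    members: dict = field(default_factory=dict)  # member id -> (x, y, speed)

    def fits(self, x, y, speed, dest):
        """Follower test (assumed rule): same destination, similar speed, near the centroid."""
        return (dest == self.dest
                and abs(speed - self.avg_speed) <= THETA_S
                and math.hypot(x - self.cx, y - self.cy) <= THETA_D)

    def add(self, mid, x, y, speed):
        """Add a member and update centroid and average speed as running means."""
        n = len(self.members)
        self.cx = (self.cx * n + x) / (n + 1)
        self.cy = (self.cy * n + y) / (n + 1)
        self.avg_speed = (self.avg_speed * n + speed) / (n + 1)
        self.members[mid] = (x, y, speed)

    def remove(self, mid):
        """Remove a member and roll back its contribution to the running means."""
        x, y, speed = self.members.pop(mid)
        n = len(self.members)
        if n:
            self.cx = (self.cx * (n + 1) - x) / n
            self.cy = (self.cy * (n + 1) - y) / n
            self.avg_speed = (self.avg_speed * (n + 1) - speed) / n

    def radius(self):
        """Actual cluster size: distance from the centroid to the furthest member."""
        return max(math.hypot(x - self.cx, y - self.cy) for x, y, _ in self.members.values())


class LeaderFollower:
    def __init__(self):
        self.clusters = {}   # cluster id -> Cluster
        self.home = {}       # member id -> cluster id (cf. the ClusterHome table)
        self._next_id = 0

    def update(self, mid, x, y, speed, dest):
        """Process one location update; return the id of the member's cluster."""
        old = self.home.pop(mid, None)
        if old is not None:
            cluster = self.clusters[old]
            cluster.remove(mid)
            if not cluster.members:
                del self.clusters[old]  # dissolve the now-empty cluster
        for cluster in self.clusters.values():
            if cluster.fits(x, y, speed, dest):  # follower: join the first cluster that fits
                cluster.add(mid, x, y, speed)
                self.home[mid] = cluster.cid
                return cluster.cid
        # Leader: no existing cluster fits, so this member starts a new one
        cluster = Cluster(self._next_id, x, y, speed, dest)
        self._next_id += 1
        cluster.add(mid, x, y, speed)
        self.clusters[cluster.cid] = cluster
        self.home[mid] = cluster.cid
        return cluster.cid
```

For example, two objects heading to the same connection node at (0, 0) and (30, 40) with speeds 50 and 55 land in one cluster with centroid (15, 20) and radius 25; an object with a different destination always starts its own cluster.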

Phase II: Cluster-Based Joining • Location updates arrive and are clustered incrementally (Phase I) • When Δ expires, the cluster-based join (Phase II) computes the query results through join-between and join-within clusters; cluster pairs that do not overlap are ignored

Phase II: Cluster-Based Joining (cont.) • Join-Between • Between two clusters: tests whether the clusters overlap; pairs that do not overlap are ignored • Join-Within • For each cluster (joining the objects and queries inside it) • For two overlapping clusters (cross-join between the objects and queries of the two clusters)
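A sketch of join-between and join-within for range queries. It assumes, hypothetically, that each range query is a circle of radius `qr` around the query's position and that each cluster's extent is drawn large enough to cover every member object and the full range of every member query. Under that assumption the overlap test is a safe filter: an object inside some query's range lies inside both clusters' circles, so the two clusters must overlap. The dictionary layout is illustrative only.

```python
import math
from itertools import combinations


def extent(cluster):
    """Radius around the centroid covering every object and every query's full range."""
    cx, cy = cluster["centroid"]
    r = 0.0
    for x, y in cluster["objects"].values():
        r = max(r, math.hypot(x - cx, y - cy))
    for x, y, qr in cluster["queries"].values():
        r = max(r, math.hypot(x - cx, y - cy) + qr)
    return r


def join_within(objects, queries, out):
    """Check every (query, object) pair and record the matches in `out`."""
    for qid, (qx, qy, qr) in queries.items():
        for oid, (ox, oy) in objects.items():
            if math.hypot(ox - qx, oy - qy) <= qr:
                out.add((qid, oid))


def cluster_join(clusters):
    out = set()
    radii = [extent(c) for c in clusters]
    for c in clusters:  # join-within each cluster
        join_within(c["objects"], c["queries"], out)
    for i, j in combinations(range(len(clusters)), 2):
        (x1, y1), (x2, y2) = clusters[i]["centroid"], clusters[j]["centroid"]
        if math.hypot(x1 - x2, y1 - y2) > radii[i] + radii[j]:
            continue  # join-between: non-overlapping clusters are ignored
        # Overlapping clusters: cross-join objects and queries of the two clusters
        join_within(clusters[i]["objects"], clusters[j]["queries"], out)
        join_within(clusters[j]["objects"], clusters[i]["queries"], out)
    return out
```

Because the filter is safe under the stated assumption, the result can be checked against a brute-force `join_within` over all objects and all queries; the cluster-based version simply skips the pairs whose clusters are far apart.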

Phase III: Cluster Post-Join Maintenance • Clear the grid • Dissolve "expiring" clusters • Relocate "non-expiring" clusters based on their velocity vectors and insert them back into the grid [Figure: a cluster moving toward a connection node; new cluster position updated, expiring cluster dissolved]
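Phase III can be sketched as below. The slides do not define when a cluster "expires"; this sketch assumes, hypothetically, that a cluster expires when moving along its velocity vector for Δ would bring it to or past its destination connection node. Every other cluster gets the new centroid `centroid + velocity * delta`. Re-inserting the relocated clusters into the cleared grid is left out.

```python
import math


def post_join_maintenance(clusters, delta):
    """Phase III sketch: drop expiring clusters, move the rest along their velocity."""
    survivors = []
    for c in clusters:
        (x, y), (vx, vy), (dx, dy) = c["centroid"], c["velocity"], c["dest"]
        travel = math.hypot(vx, vy) * delta      # distance covered during the next interval
        remaining = math.hypot(dx - x, dy - y)   # distance left to the connection node
        if travel >= remaining:
            continue  # assumed expiry rule: the cluster reaches its destination, so dissolve it
        # Relocate a copy so the caller's cluster records are left unchanged
        survivors.append({**c, "centroid": (x + vx * delta, y + vy * delta)})
    return survivors
```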

Data Structures • Objects Table • Queries Table • ClusterHome Table • ClusterStorage Table • ClusterGrid
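One way the ClusterGrid could be organized, as a hypothetical sketch: a uniform n x n grid over a square space, where each cell holds the ids of the clusters whose bounding square touches it. Registering a circular cluster in every cell of its bounding square is conservative (a corner cell the circle does not actually reach may still list it), which is acceptable for a filter. The `space_size` parameter and the cell arithmetic are assumptions; the 100x100 default mirrors the experimental settings.

```python
from collections import defaultdict


class ClusterGrid:
    """Uniform n x n grid; each cell keeps the ids of the clusters that may overlap it."""

    def __init__(self, n_cells=100, space_size=10_000.0):
        self.n = n_cells
        self.cell = space_size / n_cells
        self.cells = defaultdict(set)  # (i, j) -> set of cluster ids

    def _idx(self, v):
        # Clamp so coordinates on or beyond the border map to an edge cell
        return min(self.n - 1, max(0, int(v // self.cell)))

    def _cells_for(self, cx, cy, r):
        # All cells touched by the bounding square of the circle (cx, cy, r)
        for i in range(self._idx(cx - r), self._idx(cx + r) + 1):
            for j in range(self._idx(cy - r), self._idx(cy + r) + 1):
                yield i, j

    def insert(self, cid, cx, cy, r):
        for key in self._cells_for(cx, cy, r):
            self.cells[key].add(cid)

    def candidates(self, cx, cy, r):
        """Cluster ids registered in any cell the given circle touches."""
        found = set()
        for key in self._cells_for(cx, cy, r):
            found |= self.cells.get(key, set())
        return found

    def clear(self):
        self.cells.clear()  # Phase III starts by clearing the grid
```

With 100 cells over a 10,000-unit side, a cluster of radius 60 centered at (150, 150) is registered in the 3 x 3 block of cells covering x and y from 0 to 300.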

O2(r2,2) O1(r1,1) O3(r3,3) Q4(r4,4) ΘD Q5(r5,5) Velocity Vector Moving Cluster-Based Load Shedding • Focus: Discarding data inside moving clusters Case 1: No Load Shedding (All relative positions of cluster members are preserved)

Moving Cluster-Based Load Shedding (cont.) • Case 2: Full Shedding: all relative positions of cluster members are discarded (e.g., the cluster keeps only its member list O1, O2, O3, Q4, Q5) -The cluster is the sole representation of the movements of its members -Assume all objects satisfy all queries inside the cluster -No Join-Within is needed

Moving Cluster-Based Load Shedding (cont.) • Case 3: Partial Shedding: only the relative positions of the furthest members (those outside the nucleus) are maintained -Introduce a new structure, the nucleus, to abstract the discarded members; its size is the nucleus threshold ΘN (e.g., ΘN = 0.45 * ΘD) -Assume all objects satisfy all queries inside the nucleus -No Join-Within is needed for cluster nucleus members
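Partial shedding with a nucleus can be sketched as follows. Members store their position relative to the centroid as polar coordinates (r, θ); members with r < ΘN are folded into the nucleus and keep only their id (a query also keeps its range). Following the slide, every nucleus object is assumed to satisfy every nucleus query. The slides do not say how shed members are joined against retained members; this sketch assumes, hypothetically, that they are approximated by the centroid. ΘN = 0 then gives no shedding, and any ΘN larger than ΘD gives full shedding.

```python
import math


def to_xy(r, theta):
    """Relative polar position (r, θ) -> Cartesian offset from the centroid."""
    return r * math.cos(theta), r * math.sin(theta)


def shed(objects, queries, theta_n):
    """objects: id -> (r, theta); queries: id -> (r, theta, query_range).
    Members with r < theta_n are folded into the nucleus and lose their position."""
    return {
        "nuc_objs": {o for o, (r, _) in objects.items() if r < theta_n},
        "nuc_qrys": {q: rng for q, (r, _, rng) in queries.items() if r < theta_n},
        "objs": {o: to_xy(r, t) for o, (r, t) in objects.items() if r >= theta_n},
        "qrys": {q: (*to_xy(r, t), rng) for q, (r, t, rng) in queries.items() if r >= theta_n},
    }


def join_within(c):
    """Join-within for one (possibly shed) cluster; returns (query id, object id) pairs."""
    # Nucleus rule from the slides: every nucleus object satisfies every nucleus query
    out = {(q, o) for q in c["nuc_qrys"] for o in c["nuc_objs"]}
    # Assumption: shed members are approximated by the centroid (0, 0) for all other pairs
    objs = {**c["objs"], **{o: (0.0, 0.0) for o in c["nuc_objs"]}}
    qrys = {**c["qrys"], **{q: (0.0, 0.0, rng) for q, rng in c["nuc_qrys"].items()}}
    for q, (qx, qy, rng) in qrys.items():
        for o, (ox, oy) in objs.items():
            if q in c["nuc_qrys"] and o in c["nuc_objs"]:
                continue  # nucleus pair: already answered without a check
            if math.hypot(ox - qx, oy - qy) <= rng:
                out.add((q, o))
    return out
```

As an illustration, take an object at (10, 0°) and a query at (20, 180°) with range 5: they are 30 units apart, so with ΘN = 0 the pair is not reported, but with ΘN = 45 both fall into the nucleus and the pair is reported, which is exactly the kind of false positive the load-shedding experiments measure.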

Experimental Settings • We use the Network-based Generator of Moving Objects to generate a set of moving objects and moving queries in Worcester County (TIGER/Line files) • Unless mentioned otherwise, the following parameters are used: • 10,000 moving objects and 10,000 moving queries; each moving object or query reports its new information (if changed) every time unit • 100% of objects and queries report a change of information • The speed of objects and queries is set to medium • ΘD = 100 (spatial units), ΘS = 10 (spatial units/time unit), ΘN = 0 (no load shedding) • Grid: 100x100

Experimental Results • Varying Grid Cell Sizes - The performance of regular grid-based execution improves with finer grid cells (but memory requirements increase as well)

Experimental Results (cont.) • Varying Skew Factor: • The higher the skew factor, the denser the objects and queries (i.e., the more clusterable they are) • EXPERIMENTS TO FINISH • Incremental vs. Non-incremental Clustering: • Join time improves slightly with non-incremental clustering • But the clustering wait time outweighs the advantage of the faster join

Experimental Results (cont.) • Moving Cluster-Based Load Shedding: - Varying ΘN relative to ΘD - Accuracy measured in terms of false positives (FP) and false negatives (FN) - Measure the average number of FP and FN (per object and query)
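The FP/FN metric can be computed by comparing an approximate answer set with the exact one (for instance, the answer computed with ΘN = 0). A minimal sketch; normalizing by the total number of objects plus queries is an assumption about what "per object and query" means.

```python
def false_positives_negatives(exact, approx):
    """exact, approx: sets of (query id, object id) result pairs."""
    fp = len(approx - exact)  # reported but not actually satisfied
    fn = len(exact - approx)  # satisfied but not reported
    return fp, fn


def average_fp_fn(exact, approx, n_objects, n_queries):
    # Assumption: "per object and query" = normalized by the total number of objects and queries
    fp, fn = false_positives_negatives(exact, approx)
    n = n_objects + n_queries
    return fp / n, fn / n
```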

Experimental Results (cont.) • Cluster Maintenance: Cluster maintenance time is cheap relative to the join time • EXPERIMENTS TO FINISH

Contributions • I proposed SCUBA, a novel cluster-based algorithm for continuously evaluating a set of concurrent continuous spatio-temporal queries. SCUBA is a generic model that is applicable to any location-aware server. • Scalability in SCUBA is achieved through shared cluster-based execution, where objects and queries with similar attributes are grouped into clusters. The execution of a set of concurrent continuous queries is abstracted as a join-between and join-within moving clusters. • SCUBA utilizes moving cluster-based load shedding, with two alternatives (full shedding and partial shedding of cluster members), to reduce resource usage while maintaining accurate answers. • Experimental results show that SCUBA outperforms a regular grid-based indexing scheme when executing on densely moving objects.

Future Work • Non-circular clusters • Extend to other types of spatio-temporal queries • Continuous kNN (CKNN) • Aggregate queries • Hierarchical clustering (merging and breaking down clusters) • Use real sensor data

Part II: Additional Work: Accuracy vs. Performance Tradeoff in Location-Aware Services

Part II: Accuracy vs. Performance Tradeoff • Motion can be described as (a) a list of discrete positions or (b) a continuous function [Figure: both representations plotted against time]

Related Work: Discrete & Continuous • Discrete: • mSTOMM [SDK02] • MobiEyes [GL04] • SINA [MXA04] • SEA-CNN [XMA05] • Q-Index [PXK+02] • Continuous: • DOMINO [WCL02] • A Framework for Representing Moving Objects [BBH04] • MON-Tree [AG04] • CHOROCHRONOS/TB-tree [PJT00] • Continuous Nearest Neighbor Search [TPS02] • Dynamic Queries [LPM02] • Discrete model: • Faster • Simpler computations (join) • Smaller memory requirements • Poor approximation of actual movement • Poor accuracy, especially with infrequent updates or when objects move fast • No knowledge of the object's whereabouts between updates • Load shedding has a dramatic effect on accuracy • Continuous model: • Slower • More complex computations (join) • Larger memory requirements • Better approximation of actual movement • Higher accuracy • Can answer questions about the durations of events • Can load shed while still producing relatively good-quality answers • I investigate when each model is more appropriate for a location-aware server

Linear Continuous Model [Figure: example road networks of Washington, DC, Los Angeles, and Chicago] • Use linear segments to approximate the movement between updates • Common justifications: • Simple • Arbitrarily complex movements can be approximated by piece-wise linear movements • Movement is constrained within a road network (roads tend to be linear) • Other functions describing motion can be plugged into the system
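Under the linear continuous model, the position between two consecutive updates is a linear interpolation, so a static rectangular range query can be answered with the exact time interval during which the object is inside it. A minimal sketch using per-axis clipping of the segment parameter; the function names and the rectangle format `(xmin, ymin, xmax, ymax)` are illustrative.

```python
def position_at(p0, t0, p1, t1, t):
    """Linear interpolation between two consecutive location updates (t0 < t1)."""
    s = (t - t0) / (t1 - t0)
    return (p0[0] + s * (p1[0] - p0[0]), p0[1] + s * (p1[1] - p0[1]))


def inside_interval(p0, t0, p1, t1, rect):
    """Time sub-interval of [t0, t1] during which the linearly moving point is inside
    rect = (xmin, ymin, xmax, ymax); None if it never enters."""
    lo, hi = 0.0, 1.0  # admissible range of the segment parameter s
    axes = ((p0[0], p1[0], rect[0], rect[2]), (p0[1], p1[1], rect[1], rect[3]))
    for a0, a1, amin, amax in axes:
        d = a1 - a0
        if d == 0:
            if not amin <= a0 <= amax:
                return None  # stationary on this axis and outside the range
            continue
        s_a, s_b = (amin - a0) / d, (amax - a0) / d
        lo, hi = max(lo, min(s_a, s_b)), min(hi, max(s_a, s_b))
    if lo > hi:
        return None
    return t0 + lo * (t1 - t0), t0 + hi * (t1 - t0)
```

For an object reported at (0, 0) at t = 0 and at (10, 0) at t = 10, and a range covering x from 2 to 5 and y from -1 to 1, the continuous model reports that the object was inside from t = 2 to t = 5. A discrete model that only sees those two updates reports nothing, since both reported positions lie outside the range.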

Accuracy vs. Performance Tradeoff • The continuous model is MORE ACCURATE but MORE EXPENSIVE • Accuracy model: • a comparison between discrete and continuous results • Assumptions: • The continuous model is taken as fully accurate (100% accuracy) • Discrete results are compared against continuous results • Idea: • Construct continuous segments out of discrete answers • Compare them to the continuous results

Accuracy Model • Step 1: Calculate the average result segment length • Step 2: Multiply the average result segment length by the number of discrete results • Step 3: Calculate accuracy • According to our model, discrete is ~30% as accurate as continuous

Accuracy Examples • Scenario 1: An object location update is received every time the object entered, stayed in, and left the query (Accuracy ≈ 100%) • Scenario 2: An object location update is received only once, while the object was inside the query (Accuracy ≈ 50%) • Scenario 3: No location update is received at any point in time while the object was inside the query (Accuracy ≈ 0%)

Experimental Results • We compare the performance of the two models: • Varying the speed of objects and queries • Varying the update probability of objects and queries • We use the Network-based Generator of Moving Objects to generate a set of moving objects and moving queries in Worcester County (TIGER/Line files) • 5,000 moving objects and 5,000 moving queries; each moving object or query reports its new information (if changed) every time unit • Results are computed every 2 time units • Unless mentioned otherwise, 100% of objects and queries report a change of information

Accuracy and Performance (Varying Speed) [Figure: accuracy and performance of the two models as speed varies from very slow to very fast]

Accuracy vs. Scalability (Varying Update Probability) • Update probability = frequency of updates from objects and queries: 100% = every timestamp, 50% = every other timestamp

Conclusions • The continuous model is preferred when: • objects move fast; • not all location updates are received (e.g., load shedding occurs); • location updates arrive out of sync due to network delay (in this case, we assume the system would load shed this data, as it is outside the current window of execution). • The discrete model is preferred when: • objects move slowly, or • location updates are very frequent • The continuous model can give higher accuracy with better performance with only 75% of location updates. • Next step: dynamically switching between location modeling techniques based on: • attributes of the arriving data and • performance and accuracy requirements

References
[SDK02] D. Stojanović and S. Djordjević-Kajan. Location-based Web services for tracking and visual route analysis of mobile objects. In Proceedings of YU INFO Conference, Kopaonik, 2002, CD-ROM (in Serbian).
[GL04] B. Gedik and L. Liu. MobiEyes: Distributed Processing of Continuously Moving Queries on Moving Objects in a Mobile System. In EDBT, 2004.
[MXA04] M. Mokbel, X. Xiong, and W. Aref. SINA: Scalable Incremental Processing of Continuous Queries in Spatio-temporal Databases. In SIGMOD, 2004.
[PXK+02] S. Prabhakar, Y. Xia, D. Kalashnikov, W. Aref, and S. Hambrusch. Query Indexing and Velocity Constrained Indexing: Scalable Techniques for Continuous Queries on Moving Objects. IEEE Transactions on Computers, 51(10):1124–1140, 2002.
[XMA05] X. Xiong, M. Mokbel, and W. Aref. SEA-CNN: Scalable Processing of Continuous K-Nearest Neighbor Queries in Spatio-temporal Databases. In ICDE, 2005.
[WCL02] O. Wolfson, H. Cao, H. Lin, G. Trajcevski, F. Zhang, and N. Rishe. Management of Dynamic Location Information in DOMINO. In EDBT, 2002, pages 769–771.
[BBH04] L. Becker, H. Blunck, K. Hinrichs, and J. Vahrenhold. A Framework for Representing Moving Objects. In Proceedings of the 14th International Conference on Database and Expert Systems Applications (DEXA 2004), Berlin, 2004, pages 854–863.
[AG04] V. T. Almeida and R. H. Güting. Indexing the Trajectories of Moving Objects in Networks. Technical Report 309, FernUniversität Hagen, Fachbereich Informatik, 2004.
[PJT00] D. Pfoser, C. S. Jensen, and Y. Theodoridis. Novel Approaches to the Indexing of Moving Object Trajectories. In Proceedings of the 26th International Conference on Very Large Data Bases, pages 395–406, 2000.
[TPS02] Y. Tao, D. Papadias, and Q. Shen. Continuous Nearest Neighbor Search. In VLDB, 2002.
[LPM02] I. Lazaridis, K. Porkaew, and S. Mehrotra. Dynamic Queries over Mobile Objects. In EDBT, 2002.
[SR01] Z. Song and N. Roussopoulos. K-Nearest Neighbor Search for Moving Query Point. In SSTD, 2001.
[TPS] Y. Tao, D. Papadias, and Q. Shen. Continuous Nearest Neighbor Search. In VLDB, 2002.
[SJL00] S. Saltenis, C. S. Jensen, S. T. Leutenegger, and M. A. Lopez. Indexing the Positions of Continuously Moving Objects. In SIGMOD, 2000.

Acknowledgments • Elke A. Rundensteiner • DSRG • Michael Gennert • George Heineman • Thomas Brinkhoff

The End Thank You
