On Discovering Moving Clusters in Spatio-temporal Data

Panos Kalnis (National University of Singapore), Nikos Mamoulis (University of Hong Kong), Spiridon Bakiras (Hong Kong University of Science and Technology)

What is a Moving Cluster?
• Dense clusters of objects that move similarly for a long time period
• Not necessarily the same objects during the lifetime of the cluster
• Examples:
  • Migrating animals
  • Convoy of cars
  • Military applications
• Solutions:
  • Efficient exact and approximate algorithms

Problem Formulation
• Example: a moving cluster
[Figure: example of a moving cluster]
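
The slides leave the formal definition to a figure. As we understand the paper, the parameter θ used throughout is an integrity threshold: a sequence of snapshot clusters c1, c2, …, ck (ci found in timeslice Si) is a moving cluster if |ci ∩ ci+1| / |ci ∪ ci+1| ≥ θ for every consecutive pair, with 0 < θ ≤ 1. The sketch below encodes that assumed criterion; clusters are sets of object ids, and `jaccard` and `is_moving_cluster` are hypothetical names.

```python
from typing import List, Set

def jaccard(a: Set[int], b: Set[int]) -> float:
    """|a ∩ b| / |a ∪ b|: the share of objects two snapshot clusters have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def is_moving_cluster(seq: List[Set[int]], theta: float) -> bool:
    """Assumed criterion: every consecutive pair of snapshot clusters overlaps by >= theta."""
    return all(jaccard(seq[i], seq[i + 1]) >= theta for i in range(len(seq) - 1))

# Objects may join or leave, as long as enough of them stay together.
c1 = {1, 2, 3, 4, 5}
c2 = {1, 2, 3, 4, 6}     # object 5 left, object 6 joined (overlap 4/6)
c3 = {1, 2, 3, 4, 6, 7}  # object 7 joined (overlap 5/6)
print(is_moving_cluster([c1, c2, c3], theta=0.6))  # True
print(is_moving_cluster([c1, c2, c3], theta=0.9))  # False
```

Note that no single object has to stay for the whole lifetime: membership can drift gradually while each step still meets θ.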

Related Work (Static)
• Partition-based clustering (k-medoids)
• Hierarchical clustering (BIRCH, CURE)
• Density-based clustering (DBSCAN)
[Figure: DBSCAN example with MinPts=3 and radius ε]

Related Work (Moving Objects)
• Grouping trajectories [Vlachos et al., ICDE 02]
  • Trajectory cluster: constant set of objects throughout its lifetime
  • Only similar movement; no spatial proximity
• Dense areas over time [Hadjieleftheriou et al., SSTD 03]
  • Static dense regions
  • No common objects between regions in a sequence
• Incremental DBSCAN/OPTICS [Ester et al., VLDB 98]
  • Assumes only a small percentage of objects moves
• Maintaining Data Bubbles [Nassar et al., SIGMOD 04]
  • Redistributes updated objects into existing bubbles

MC1: The Straightforward Approach
• G: set of moving clusters
• Apply clustering to the next timeslice Si
• Expand moving clusters in G
• Add new moving clusters to G
• Report ending clusters
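
A minimal sketch of the MC1 loop, assuming every timeslice has already been clustered (for example with DBSCAN) and using the θ overlap criterion as we understand it. `mc1` and its arguments are hypothetical. The nested loop over G and the clusters of Si is the all-pairs check that makes MC1 expensive.

```python
from typing import List, Set

def jaccard(a: Set[int], b: Set[int]) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

def mc1(snapshots: List[List[Set[int]]], theta: float) -> List[List[Set[int]]]:
    """snapshots[i]: snapshot clusters already computed for timeslice S_i.
    Returns every maximal moving cluster as a list of snapshot clusters."""
    G: List[List[Set[int]]] = []        # moving clusters still alive
    reported: List[List[Set[int]]] = []
    for clusters in snapshots:
        next_G = []
        used = set()                    # clusters of S_i that extended some moving cluster
        for mc in G:
            extended = False
            for j, c in enumerate(clusters):   # MC1: test every pair of clusters
                if jaccard(mc[-1], c) >= theta:
                    next_G.append(mc + [c])
                    used.add(j)
                    extended = True
            if not extended:
                reported.append(mc)     # mc ended at the previous timeslice
        # clusters that extended nothing start new moving clusters
        next_G.extend([c] for j, c in enumerate(clusters) if j not in used)
        G = next_G
    return reported + G                 # plus those alive after the last timeslice

S1 = [{1, 2, 3, 4}, {10, 11, 12}]
S2 = [{1, 2, 3, 5}, {20, 21, 22}]
for mc in mc1([S1, S2], theta=0.5):
    print(mc)
```

In this example {1, 2, 3, 4} extends into {1, 2, 3, 5} (overlap 3/5), {10, 11, 12} ends after S1, and {20, 21, 22} starts a new moving cluster.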

Hash-based DBSCAN
• Memory:
  • 10M objects with 1GB RAM
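
The slide does not detail the hash structure. One plausible reading, sketched here purely as an assumption: hash every object into a uniform grid whose cell side equals ε, so that an ε-range query only inspects the 3×3 block of cells around the query point, and run standard DBSCAN on top. `grid_dbscan` and its parameters are hypothetical names.

```python
import math
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

def grid_dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """DBSCAN whose eps-range queries use a hash grid with cell side eps.
    Returns one label per point: a cluster id >= 0, or -1 for noise."""
    def cell(p: Point) -> Tuple[int, int]:
        return (math.floor(p[0] / eps), math.floor(p[1] / eps))

    grid: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        grid[cell(p)].append(i)

    def neighbors(i: int) -> List[int]:
        # Any point within eps lies in the same cell or one of its 8 neighbours.
        (x, y), (cx, cy) = points[i], cell(points[i])
        out = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), []):
                    qx, qy = points[j]
                    if (qx - x) ** 2 + (qy - y) ** 2 <= eps * eps:
                        out.append(j)
        return out  # includes i itself

    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    next_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        labels[i] = next_id
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = next_id  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = next_id
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        next_id += 1
    return labels

pts = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.4, 10), (10, 10.4), (50, 50)]
print(grid_dbscan(pts, eps=1.0, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Only occupied cells are stored, so memory grows with the number of objects rather than with the extent of the space.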

MC1 is inefficient!
• Checks all possible combinations of clusters in consecutive timeslices
• Performs clustering for every timeslice

MC2: Minimizing Redundant Checks
• Clustering in every timeslice
• Select a random object in c1
• Search for the object in S2
• Repeat for the remaining objects
• Max: (1-θ)|ci| objects
[Figure: c1c2 is a moving cluster]
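
One way to read the (1-θ)|ci| bound, sketched here under that assumption: if c1c2 is a moving cluster then |c1 ∩ c2| ≥ θ|c1 ∪ c2| ≥ θ|c1|, so at most (1-θ)|c1| objects of c1 lie outside c2. Probing ⌊(1-θ)|c1|⌋ + 1 distinct random objects of c1 therefore lands in every qualifying c2, and only the clusters those probes land in need the full check. `mc2_candidates`, `cluster_of` and `clusters2` are hypothetical; each object is assumed to belong to at most one cluster of S2.

```python
import math
import random
from typing import Dict, List, Set

def mc2_candidates(c1: Set[int], cluster_of: Dict[int, int],
                   clusters2: Dict[int, Set[int]], theta: float,
                   rng: random.Random) -> List[int]:
    """Ids of clusters c2 in S2 such that c1 c2 meets the theta criterion.
    cluster_of maps an object id to its cluster id in S2 (noise objects are absent)."""
    # Max number of c1 objects outside a qualifying c2, plus one. The small epsilon
    # guards against floating-point round-down; probing extra objects is always safe.
    budget = math.floor((1 - theta) * len(c1) + 1e-9) + 1
    probes = rng.sample(sorted(c1), min(budget, len(c1)))
    found, seen = [], set()
    for o in probes:
        cid = cluster_of.get(o)
        if cid is None or cid in seen:
            continue
        seen.add(cid)
        c2 = clusters2[cid]
        if len(c1 & c2) / len(c1 | c2) >= theta:  # full check on candidates only
            found.append(cid)
    return found

c1 = set(range(10))
clusters2 = {0: set(range(9)) | {100}, 1: {9, 50, 51}}
cluster_of = {o: cid for cid, c in clusters2.items() for o in c}
print(mc2_candidates(c1, cluster_of, clusters2, theta=0.8, rng=random.Random(1)))  # [0]
```

With θ = 0.8 only 3 of the 10 objects of c1 are probed, instead of testing c1 against every cluster of S2.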

Ambiguity Cases: θ<0.5
• c0 can be extended by either c1 or c2: {c0c1, c2} or {c0c2, c1}
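
A small worked example of the ambiguity, using the θ overlap criterion as we understand it; the sets are made up for illustration. With θ = 0.4, c0 qualifies with both disjoint clusters, so either {c0c1, c2} or {c0c2, c1} is a valid outcome. For θ > 0.5 this cannot happen: each qualifying cluster would have to contain more than half of c0, and two disjoint clusters cannot both do that.

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)

c0 = set(range(1, 11))      # cluster in timeslice S_i (10 objects)
c1 = {1, 2, 3, 4, 5, 11}    # two disjoint clusters in S_{i+1}
c2 = {6, 7, 8, 9, 10, 12}   # each shares 5 objects with c0: overlap 5/11

extensions = {}
for theta in (0.4, 0.5):
    extensions[theta] = [name for name, c in (("c1", c1), ("c2", c2))
                         if jaccard(c0, c) >= theta]
print(extensions)  # {0.4: ['c1', 'c2'], 0.5: []}
```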

MC3: Approximate Moving Clusters
• Intuition: many clusters will remain the same even if objects move
• Avoid performing clustering in every timeslice
• For an object o:
  • If o belongs to cluster c in timeslice Si, assume that o also belongs to c in the next timeslice (note: objects may have moved)

Refine Clusters
• Hash new clusters in a grid
• Legal cluster:
  • Does not meet/intersect with other clusters
  • Is connected (its cells meet)
• Objects in legal clusters are not considered further
• For the rest of the objects, perform clustering
• Possible inaccuracies!!!
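
A minimal sketch of the legality test, under assumptions the slide does not spell out: the grid cell side is ε, two cells "meet" when they are equal or 8-neighbours, and each cluster arrives as the positions in Si+1 of the objects it contained in Si (the MC3 carry-over). Objects of clusters that fail the test would then be reclustered. All names are hypothetical.

```python
import math
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple

Cell = Tuple[int, int]

def cells_of(points: List[Tuple[float, float]], eps: float) -> Set[Cell]:
    """Grid cells (side eps) occupied by a cluster's objects."""
    return {(math.floor(x / eps), math.floor(y / eps)) for x, y in points}

def meets(cell: Cell, others: Set[Cell]) -> bool:
    """True if cell equals or is an 8-neighbour of some cell in others."""
    cx, cy = cell
    return any((cx + dx, cy + dy) in others for dx in (-1, 0, 1) for dy in (-1, 0, 1))

def is_connected(cells: Set[Cell]) -> bool:
    """Breadth-first search over 8-adjacent cells."""
    if not cells:
        return True
    start = next(iter(cells))
    seen, queue = {start}, deque([start])
    while queue:
        cx, cy = queue.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                n = (cx + dx, cy + dy)
                if n in cells and n not in seen:
                    seen.add(n)
                    queue.append(n)
    return len(seen) == len(cells)

def legal_clusters(clusters: Dict[Hashable, List[Tuple[float, float]]],
                   eps: float) -> Set[Hashable]:
    """clusters maps a cluster id from S_i to the S_{i+1} positions of its objects."""
    cells = {cid: cells_of(pts, eps) for cid, pts in clusters.items()}
    legal = set()
    for cid, own in cells.items():
        others = set().union(*(c for k, c in cells.items() if k != cid))
        if is_connected(own) and not any(meets(c, others) for c in own):
            legal.add(cid)
    return legal

carried = {
    "a": [(0.1, 0.1), (0.9, 0.2), (1.5, 0.5)],  # compact and isolated
    "b": [(10.0, 10.0), (10.5, 10.5)],           # its cell touches the cell of "c"
    "c": [(11.2, 10.1)],
    "d": [(20.0, 20.0), (25.0, 25.0)],           # its cells do not meet: not connected
}
print(legal_clusters(carried, eps=1.0))  # {'a'}
```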

Minimize Error
• Perform exact clustering to absorb (though not necessarily eliminate) the accumulated error
• Period for exact clustering: grows linearly, drops exponentially
• Exact clustering: if more than α|G| clusters have been added/removed
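
A sketch of the "grows linearly, drops exponentially" schedule, assuming the period is counted in timeslices, grows by one while at most α|G| moving clusters have been added to or removed from G, and is halved otherwise. The halving factor and `update_period` are our assumptions, not details given on the slide.

```python
def update_period(period: int, changed: int, num_moving: int, alpha: float) -> int:
    """Timeslices until the next exact clustering (hypothetical schedule).

    changed:    moving clusters added to or removed from G
    num_moving: |G|
    """
    if changed > alpha * num_moving:
        return max(1, period // 2)  # too much drift: drop exponentially
    return period + 1               # approximation holding up: grow linearly

period, history = 1, []
for changed in [0, 1, 0, 0, 30, 2, 0]:
    period = update_period(period, changed, num_moving=100, alpha=0.1)
    history.append(period)
print(history)  # [2, 3, 4, 5, 2, 3, 4]
```

The asymmetry mirrors the slide: expensive exact runs become rarer while the approximation is stable, but a burst of changes quickly brings them back.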

Experimental Evaluation
• 10K-50K objects per timeslice
• 50-100 timeslices, up to 5M objects
• Linux, C++, 1.3GHz CPU, 1.2GB RAM
• Generator: clusters move/rotate, objects appear/disappear

Varying data size (10K-50K per timeslice)
• θ=0.9, α=0.1
• Larger dataset: larger clusters, more interactions
[Figure annotation: Avg: 87%]

Varying number of clusters (100-800 per timeslice)
• 5M objects, θ=0.9, α=0.1
• Many clusters: reaches the error threshold fast
[Figure annotations: 96%, 87%, 73%]

Varying α
• 5M objects, θ=0.9, 800 clusters
• α small: may not recover!!!

Varying α for different agilities
• Low agility: fewer errors → faster

MC3 for varying θ
• 5M objects, α=0.1, 800 clusters
• θ large: incorrect clusters are pruned for not satisfying the θ criterion

Conclusions
• Moving clusters
  • Objects may move/change
  • Exact and approximate solutions
• Future work
  • Automatic setting of parameter α
  • Better error estimation
  • Constraints (e.g., a moving cluster must span at least k timeslices)

Questions?
