On Discovering Moving Clusters in Spatio-temporal Data

Panos Kalnis (National University of Singapore), Nikos Mamoulis (University of Hong Kong), Spiridon Bakiras (Hong Kong University of Science and Technology)

Presentation Transcript


  1. On Discovering Moving Clusters in Spatio-temporal Data • Panos Kalnis, National University of Singapore • Nikos Mamoulis, University of Hong Kong • Spiridon Bakiras, Hong Kong University of Science and Technology

  2. What is a Moving Cluster? • Dense clusters of objects that move similarly for a long time period • Not necessarily the same objects during the lifetime of the cluster • Examples • Migrating animals • Convoy of cars • Military applications • Solutions: • Efficient exact and approximate algorithms

  3. Problem Formulation • Example: • Moving cluster
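The formulation on this slide survives only as figure labels. As a minimal sketch, assume (the transcript does not state this) that two clusters from consecutive timeslices belong to the same moving cluster when the ratio of shared objects to all objects, |c1 ∩ c2| / |c1 ∪ c2|, is at least θ. The function names are hypothetical.

```python
def overlap_ratio(c1, c2):
    """Shared objects divided by all objects of the two clusters."""
    c1, c2 = set(c1), set(c2)
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 0.0


def continues(c1, c2, theta):
    """True if c2 is similar enough to c1 to extend the same moving cluster."""
    return overlap_ratio(c1, c2) >= theta
```

Under this assumption a cluster can lose and gain members between timeslices and still count as the same moving cluster, which matches the "not necessarily the same objects" remark on slide 2.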

  4. Related Work (Static) • Partition-based clustering (k-medoids) • Hierarchical clustering (BIRCH, CURE) • Density-based clustering (DBSCAN) • Figure: MinPts=3, radius ε

  5. Related Work (Moving Objects) • Grouping trajectories [Vlachos et al., ICDE 02] • Trajectory cluster: Constant set of objects through its lifetime • Only similar movement; no space proximity • Dense areas over time [Hadjieleftheriou et al., SSTD 03] • Static dense regions • No common objects between regions in sequence • Incremental DBSCAN/OPTICS [Ester et al., VLDB 98] • Only a small percentage of objects moves • Maintaining Data Bubbles [Nassar et al., SIGMOD 04] • Redistributes updated objects in existing bubbles

  6. MC1: The Straightforward Approach • G: set of moving clusters • Apply clustering to next timeslice Si • Expand moving clusters in G • Add new moving clusters to G • Report ending clusters
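The MC1 loop can be sketched as follows. This is an illustration under assumptions, not the authors' code: the per-timeslice clusters are taken as given (non-empty sets of object ids), the similarity test is assumed to be the shared-to-total object ratio compared against θ, and the name `mc1` and the list-of-snapshots representation are hypothetical.

```python
def mc1(timeslice_clusters, theta):
    """timeslice_clusters: one list of clusters (sets of object ids) per timeslice.
    Returns every moving cluster as a list of consecutive snapshot clusters."""
    def ratio(a, b):
        return len(a & b) / len(a | b)

    active = []     # G: moving clusters that may still be extended
    finished = []   # moving clusters that have ended
    for clusters in timeslice_clusters:
        next_active = []
        used = set()                      # clusters that extended some g
        for g in active:
            extended = False
            for i, c in enumerate(clusters):   # checks every combination
                if ratio(g[-1], c) >= theta:
                    next_active.append(g + [c])
                    used.add(i)
                    extended = True
            if not extended:
                finished.append(g)        # report ending cluster
        for i, c in enumerate(clusters):
            if i not in used:
                next_active.append([c])   # start a new moving cluster
        active = next_active
    finished.extend(active)               # flush clusters alive at the end
    return finished
```

The sketch also returns chains that last a single timeslice; a caller could filter on `len(chain)` to keep only longer ones.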

  7. Hash-based DBSCAN • Memory: • 10M objects with 1GB RAM
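The slide gives no details of the hash structure. One common way to avoid a spatial index, shown here purely as an assumption, is to hash points into a uniform grid with cell side ε so that every ε-neighbour of a point lies in its own cell or one of the eight adjacent cells. The function name `grid_dbscan` and the label convention are hypothetical.

```python
from collections import defaultdict, deque
from math import floor


def grid_dbscan(points, eps, min_pts):
    """points: list of (x, y). Returns labels: cluster id >= 0, or -1 for noise."""
    grid = defaultdict(list)              # (cell x, cell y) -> point indexes
    for idx, (x, y) in enumerate(points):
        grid[(floor(x / eps), floor(y / eps))].append(idx)

    def neighbors(idx):
        x, y = points[idx]
        cx, cy = floor(x / eps), floor(y / eps)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    px, py = points[j]
                    if (px - x) ** 2 + (py - y) ** 2 <= eps * eps:
                        found.append(j)   # includes idx itself
        return found

    labels = [None] * len(points)         # None means not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                # noise, may become a border point later
            continue
        labels[i] = cluster_id
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster_id    # border point: claim, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nb = neighbors(j)
            if len(nb) >= min_pts:        # core point: keep expanding
                queue.extend(nb)
        cluster_id += 1
    return labels
```

The grid stores only one index per point, which is in the spirit of the memory figure on the slide, although the sketch makes no claim about reproducing it.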

  8. MC1 is inefficient! • Checks all possible combinations of clusters in consecutive timeslices • Performs clustering for every timeslice

  9. MC2: Minimizing Redundant Checks • Clustering in every timeslice • Select a random object in c1 • Search for the object in S2 • Repeat for remaining objects • Max: (1-θ)|ci| objects • c1c2 is a moving cluster
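The probing idea can be sketched as below, assuming the similarity test is the shared-to-total object ratio compared against θ. If some cluster c2 in S2 passes that test, at least θ|c1| objects of c1 are in c2, so only a bounded number of probes can miss it. The sketch probes one object more than the number that can lie outside c2, a conservative reading of the slide's bound; the function name and the lookup table `next_cluster_of` are hypothetical.

```python
import random
from math import ceil


def find_successor(c1, next_cluster_of, next_clusters, theta, rng=random):
    """c1: set of object ids of a cluster in the current timeslice.
    next_cluster_of: dict object id -> index into next_clusters (absent = noise).
    next_clusters: list of sets. Returns the index of a matching cluster or None."""
    # At least ceil(theta * |c1|) objects of c1 must be shared with a match.
    # The small tolerance guards against float round-up and only adds probes.
    must_share = ceil(theta * len(c1) - 1e-9)
    max_probes = len(c1) - must_share + 1
    tried = set()
    for obj in rng.sample(sorted(c1), min(max_probes, len(c1))):
        k = next_cluster_of.get(obj)
        if k is None or k in tried:
            continue                       # object vanished or cluster already tested
        tried.add(k)
        c2 = next_clusters[k]
        if len(c1 & c2) / len(c1 | c2) >= theta:
            return k
    return None
```

Compared with checking every pair of clusters, each candidate in S2 is tested at most once per c1, and the loop stops as soon as the bound is exhausted.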

  10. Ambiguity Cases: θ<0.5 • {c0c1, c2} • {c0c2, c1}

  11. MC3: Approximate Moving Clusters • Intuition: Many clusters will remain the same even if objects move • Avoid performing clustering in every timeslice • For an object o • If o belongs to cluster c in timeslice Si • Assume that o also belongs to c in the next timeslice (notice: objects may have moved)
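The carry-over assumption can be sketched as a simple regrouping step: each object keeps its previous cluster id and is placed at its new position, while objects without a previous cluster are set aside. The function name and the dictionary layout are hypothetical.

```python
from collections import defaultdict


def carry_over(prev_label, positions):
    """prev_label: dict object id -> cluster id in timeslice S_i.
    positions: dict object id -> (x, y) in the next timeslice.
    Returns (candidate clusters, unassigned objects)."""
    candidates = defaultdict(dict)   # cluster id -> {object id: new position}
    unassigned = {}                  # objects with no cluster to inherit
    for obj, pos in positions.items():
        cid = prev_label.get(obj)
        if cid is None:
            unassigned[obj] = pos
        else:
            candidates[cid][obj] = pos
    return dict(candidates), unassigned
```

Objects that disappeared between the two timeslices drop out automatically because they have no entry in `positions`. The candidate clusters are only a guess and still need to be validated.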

  12. Refine clusters • Hash new clusters in a grid • Legal cluster: • Does not meet/intersect with other clusters • It is connected (cells meet) • Objects in legal clusters are not considered further • For the rest of the objects, perform clustering • Possible inaccuracies!!!
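The legality test can be sketched with a hashed grid. Two readings are assumed here because the slide does not define them: a cluster "meets" another when any of its cells is the same as, or adjacent to (8-neighbourhood), a cell of the other cluster, and a cluster is "connected" when its occupied cells form one component under the same adjacency. The cell size `cell` and all names are hypothetical.

```python
from collections import defaultdict
from math import floor

OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def legal_clusters(clusters, cell):
    """clusters: dict cluster id -> list of (x, y) positions.
    Returns the set of cluster ids that pass both legality conditions."""
    cells_of = {cid: {(floor(x / cell), floor(y / cell)) for x, y in pts}
                for cid, pts in clusters.items()}
    owners = defaultdict(set)        # grid cell -> ids of clusters occupying it
    for cid, cells in cells_of.items():
        for c in cells:
            owners[c].add(cid)

    def is_isolated(cid):
        # No other cluster occupies the same or an adjacent cell.
        for cx, cy in cells_of[cid]:
            for dx, dy in OFFSETS:
                if owners.get((cx + dx, cy + dy), set()) - {cid}:
                    return False
        return True

    def is_connected(cid):
        # Flood fill over occupied cells must reach all of them.
        cells = cells_of[cid]
        if not cells:
            return False
        start = next(iter(cells))
        seen, stack = {start}, [start]
        while stack:
            cx, cy = stack.pop()
            for dx, dy in OFFSETS:
                n = (cx + dx, cy + dy)
                if n in cells and n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen == cells

    return {cid for cid in clusters if is_isolated(cid) and is_connected(cid)}
```

Objects of the clusters returned here would be accepted as they are; the remaining objects would go through ordinary clustering, which is where the inaccuracies mentioned on the slide can enter.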

  13. Minimize Error • Perform exact clustering to absorb (may not eliminate) the accumulated error • Period for exact clustering: Grows linearly, drops exponentially • Exact clustering: If more than α|G| clusters have been added/removed
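One way to read "grows linearly, drops exponentially" is an additive-increase, multiplicative-decrease rule for the number of timeslices between exact clusterings. The sketch below is an assumption throughout: the step of one timeslice, the halving, and the use of the α|G| test as the error signal are not specified in the transcript, and the function name is hypothetical.

```python
def next_period(period, changed, num_clusters, alpha, min_period=1):
    """Return how many timeslices to wait before the next exact clustering.

    changed: clusters added or removed since the last exact clustering.
    num_clusters: |G|, the current number of moving clusters.
    """
    if changed > alpha * num_clusters:
        return max(min_period, period // 2)   # large drift: shrink the period quickly
    return period + 1                         # small drift: lengthen it slowly
```

For example, with |G| = 20 and α = 0.1 the threshold is two changed clusters: three changes would halve a period of 8 to 4, while one change would lengthen it to 9.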

  14. Experimental Evaluation • 10K-50K objects per timeslice • 50-100 timeslices, up to 5M objects • Linux, C++, 1.3GHz CPU, 1.2GB RAM • Generator: Clusters move/rotate, objects appear/disappear

  15. Varying data size (10K-50K per timeslice) • θ=0.9, α=0.1 • Larger dataset: larger clusters, more interactions • Avg: 87%

  16. Varying number of clusters (100-800 per timeslice) • 5M objects, θ=0.9, α=0.1 • Many clusters: Reaches error threshold fast • Plot labels: 96%, 87%, 73%

  17. Varying α • 5M objects, θ=0.9, 800 clusters • α small: may not recover!!!

  18. Varying α for different agilities • Low agility: Fewer errors → faster

  19. MC3 for varying θ • 5M objects, α=0.1, 800 clusters • θ large: incorrect clusters are pruned for not satisfying the θ criterion

  20. Conclusions • Moving clusters • Objects may move/change • Exact and approximate solutions • Future work • Automatic setting of parameter α • Better error estimation • Constraints (e.g., moving cluster must span at least k timeslices)

  21. Questions?
