1 / 34

Reporter: yu-ch Lo

Mining Frequent Spatio-temporal Sequential Patterns. Huiping Cao, Nikos Mamoulis, and David W. Cheung Department of Computer Science The University of Hong Kong Pokfulam Road, Hong Kong { hpcao, nikos, dcheung } @cs.hku.hk. Reporter: yu-ch Lo. Outline. Introduction Related work

analu
Download Presentation

Reporter: yu-ch Lo

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Mining Frequent Spatio-temporal Sequential Patterns Huiping Cao, Nikos Mamoulis, and David W. CheungDepartment of Computer ScienceThe University of Hong KongPokfulam Road, Hong Kong{hpcao, nikos, dcheung}@cs.hku.hk Reporter: yu-ch Lo

  2. Outline • Introduction • Related work • Spatio-temporal sequential patterns • Motivation • Problem definition • Growing • Experiments • Conclusion

  3. Introduction • The movement of an object (i.e., trajectory) can be described by a sequence of spatial locations sampled at consecutive timestamps (e.g., with the use of Global Positioning System (GPS) devices).

  4. Introduction • The main contributions of this paper are: • (i) We propose a model for spatio-temporal sequential patterns mining, based on appropriate definitions for pattern elements and pattern instances. • (ii) We present an effective method for extracting pattern elements. • (iii) We provide efficient pattern mining algorithms for discovering longer patterns.

  5. Related Work • A fixed sliding window w is used to extract segments (i.e., subsequences) in the event series, and the contribution of every segment to each candidate episode’s frequency is counted. W=5 a a b b c d e f g (a_b_c, _ab_c, a_ _bc, _a_bc)

  6. Spatio-temporal Sequential Patterns • A spatio-temporal sequence S is a list of locations,(x1, y1, t1), (x2, y2, t2), . . . , (xn, yn, tn), where ti represents the timestamp of location (xi, yi) (1 ≤ i ≤ n). Y (2,3,1) (6,3,3) (4,2,2) (1,1,0) X

  7. Motivation • A naive method is to use a regular grid (or some predefined spatial decomposition) to divide the space into regions by taking a user-defined parameter G Grid I:c2c4c8c9c6c2 . . . Grid II:c2c1c4c7c8c9c6 . . .

  8. Motivation • Although intuitive, this method has two problems: • i) We lose the information on how the object moves inside a cell, if the space decomposition is coarse. • ii) For two instances of a pattern, the locations may not fall into the same cell.

  9. Problems i) ii) d b a c

  10. lij S Problem Definition • A segment sij in a spatio-temporal sequence S (1 ≤ i < j ≤ n) is a contiguous subsequence of S, starting from (xi, yi, ti) and ending at (xj , yj , tj). • Given sij , we define its representative line segment lij.

  11. Problem Definition • Let ε be a distance error threshold, sij complies with ~ lij with respect to ǫ and is denoted as sij ∝ε lij , if dist((xk, yk), lij ) ≤ ε • for all k(i ≤ k ≤ j), where dist((xk, yk), l ) is the distance between (xk, yk) and line segment l.

  12. Problem Definition • When sij ∝ε lij , each point (xk, yk), i ≤ k ≤ j, in sij can be projected to a point (x′k,y′k) on ~ lij . (x′k, y′k) implicitly denotes the projection of (xk, yk) to lij .

  13. <= f × max(lij .len, lgh.len) <= θ (i) (ii) Problem Definition L2 L1

  14. S2 S1 L2 Problem Definition L1 ∀(x′k, y′k) ∈ lij

  15. Problem Definition

  16. Problem Definition

  17. Problem Definition • A contiguous subsequence of Ss, sisi+1 . . . si+q−1, is a pattern instance for P: r1r2 . . . rq if ∀j(1 ≤ j ≤ q), if the representative line segment for segment si+j−1 is similarand closeto the central line segment of region rj .

  18. DP(Douglas-Peucker) Algorithm • Our idea is to transform S to Ss using such a technique. • The DP (Douglas-Peucker) algorithm [3] is a classical top down approach for this problem. • We employ DP method because it has been proved to be the best algorithm in choosing splitting points [12].

  19. DP(Douglas-Peucker) Algorithm A A B B A C C C < threshold D D > threshold < threshold E E E Threshold = ε

  20. Growing • Discovering frequent singular patterns from Ss is a hard problem, since in the worst-case, case, all combinations of segments in Ss have to be considered as candidate. To expedite the process, we employ a heuristic, Growing.

  21. Growing • Let Segs be a set initially containing all the segments in Ss. • It selects the segment s with median length, i.e., the median of the lengths of the segments in Segs, as seed for the initial spatial region r. • Then, r is grown by merging other gments in Segs through filteringand verificationsteps.

  22. Growing -filtering

  23. ls

  24. 1 r2 1 r3 1 r4 1 r1 Mining Using Substring Tree • Create a substring tree structure >> Let SR be r1 r2 r3 r4 r1r3 r4 r2 r3 r4 r1 r2 r3 r4 ROOT 1 3 2 r1 3 2 2 4 1 r3 3 r4 1 2 r2 2 3 2 4 1 r4 2 r1 1 r2 1 r3 1 2 r3 3 2 2 r1 1 r2 1 r3 1 r2 1 r3 1 r4 1 r4 1 r2 1 r3 1 r2 2 2

  25. Mining Using Substring Tree • In the above example, there are initially four elements(root’s children.) in the stack. • Let minsup = 2.

  26. r3(4) r3r4(4)

  27. 3 r3r4r2(3) r3r4(4) r3r4(4) r3r4r1(2)

  28. Experiments • Real datasets: The real data contain tracked bus movements in Patras, Greece. Each sequence is the movement of a bus in a single day. • Locations were sampled every 30 seconds • The series length varies in the range between 1000 to 7000..

  29. Experiments • we tune the parameters to ε = 20 (map size is 100 × 100), f = 0.2, θ = 0.3 radian, and minsup = 3.

  30. Experiments

  31. Experiments

  32. Conclusion • In this paper, we modeled the problem of mining sequential patterns from spatio-temporal data by considering both spatial and temporal information. • In addition, we employed special properties of the problem (spatial connectivity,closeness) and a newly proposed substring tree to accelerate search for longer patterns.

More Related