
DMiST- Data Mining in Spatio-Temporal sets dmist


jenna

Presentation Transcript


  1. DMiST-Data Mining in Spatio-Temporal sets www.dmist.net

  2. Input
     Number of time steps = T. Example: T = 9.
     Entity: (x1,y1), (x2,y2), … , (x9,y9)
     [Figure: one trajectory sampled at t=0, t=1, … , t=8]

  3. Input
     Number of entities/animals/items = n. Example: n = 4 and T = 11.
     I1 : (x11,y11), … , (x1T,y1T)
     I2 : (x21,y21), … , (x2T,y2T)
     …
     In : (xn1,yn1), … , (xnT,ynT)
     [Figure: trajectories illustrating convergence, encounter and flock]
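The input described on slides 2 and 3 can be held as n trajectories of T positions each. The sketch below is one possible representation, not the project's own data format; the names `Trajectory`, `check_input` and `snapshot` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]  # positions of one entity at t = 0 .. T-1


def check_input(trajectories: List[Trajectory]) -> Tuple[int, int]:
    """Return (n, T) after verifying that every entity has T samples."""
    if not trajectories:
        raise ValueError("need at least one entity")
    T = len(trajectories[0])
    for idx, traj in enumerate(trajectories):
        if len(traj) != T:
            raise ValueError(f"entity {idx} has {len(traj)} samples, expected {T}")
    return len(trajectories), T


def snapshot(trajectories: List[Trajectory], t: int) -> List[Point]:
    """Positions of all entities at time step t."""
    return [traj[t] for traj in trajectories]
```

For example, `check_input` on two entities with three samples each returns `(2, 3)`, and `snapshot(trajectories, 1)` lists where both entities are at t=1.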

  4. Example: Caribou Satellite Collar Project, Canada. Number of caribou = 15. Time steps = once a week for 8 years.

  5. Input size?
     To obtain efficient solutions we need algorithms that scale well, i.e. algorithms whose running time grows slowly with the input size.
     n – number of entities (20 to millions)
     T – number of time steps (10 to thousands)
     m – size of a flock (2 to 200 entities)
     k – flock duration (5 to 50 time steps)
     Size of input = nT
     Practical algorithms: O((nT)^2)
     Fast algorithms: O(nT log nT)
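To get a feel for the gap between the two running-time classes on slide 5, the following sketch evaluates nT, (nT)^2 and nT log nT for given n and T. The function name `cost_estimates` is hypothetical, the base-2 logarithm is an arbitrary choice (big-O notation hides the base), and the values are operation-count estimates without constant factors, not measured running times.

```python
import math


def cost_estimates(n: int, T: int) -> dict:
    """Rough operation counts for an input of n entities and T time steps."""
    size = n * T  # size of input = nT
    return {
        "input_size": size,
        "quadratic": size ** 2,             # O((nT)^2), "practical"
        "n_log_n": size * math.log2(size),  # O(nT log nT), "fast"; base 2 assumed
    }


# Caribou example: 15 animals, roughly 8 * 52 = 416 weekly samples (approximate).
caribou = cost_estimates(15, 416)
```

Here the input size is 15 × 416 = 6240 positions, so even a quadratic algorithm stays manageable; with millions of entities the quadratic term becomes the bottleneck.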

  6. Six basic patterns
     • Encounter: At least m entities pass through a circular region of radius r.
     • Convergence: At least m entities are simultaneously within a circular region of radius r.
     • Flock: At least m entities move together during a time interval of length at least s; for every point in time there is a circular region of radius r that contains all the entities.
     • Recurrences: At least m entities visit a circular region of radius r at least k times.
     • Regular recurrences
     • Concurrent recurrences

  7. Members
     NICTA: Joachim Gudmundsson, Thomas Wolle, Ghazi Al-Naymat
     DSTO: Brenton Williams, Matthew Lowry
     Uni. of Queensland: Xiaofang Zhou, Heng Tao Shen, Hoyoung Jeung
     Uni. of Sydney: Sanjay Chawla
     Utrecht University: Marc van Kreveld

  8. Members (areas of expertise)
     NICTA: Algorithms (apx), Computational Geometry, Data mining
     DSTO: Applications, Data mining
     Uni. of Sydney: Data mining, Algorithms
     Uni. of Queensland: Database systems, Data mining
     Utrecht University: Algorithms, GIS

  9. Approximations
     Most problems cannot be solved fast! Instead we need to approximate the solution.
     Example: Convergence (radius r is given). Find all discs of radius r that contain at least m entities.
     Two options: approximate the radius, or approximate the number of entities.
     [Figure: convergence with m=10 and a disc of radius r]
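The exact question from slide 9, restricted to a single time step, can be answered by brute force: if some disc of radius r contains a set of points, a disc of the same radius containing the same points can be found centred either at one of the points or at one of the two centres of radius-r circles through a pair of them. The sketch below uses that fact; it is a cubic-time illustration, not the approximation algorithm from the presentation, and the names `candidate_centres`, `max_in_disc` and `has_convergence` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def candidate_centres(points: List[Point], r: float) -> List[Point]:
    """Each input point, plus centres of radius-r circles through pairs of points."""
    centres = list(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            dx, dy = x2 - x1, y2 - y1
            d = math.hypot(dx, dy)
            if d == 0 or d > 2 * r:
                continue  # identical points, or too far apart to share a disc
            h = math.sqrt(max(r * r - (d / 2) ** 2, 0.0))
            mx, my = (x1 + x2) / 2, (y1 + y2) / 2
            ux, uy = -dy / d, dx / d  # unit normal to the segment
            centres.append((mx + h * ux, my + h * uy))
            centres.append((mx - h * ux, my - h * uy))
    return centres


def max_in_disc(points: List[Point], r: float, eps: float = 1e-9) -> int:
    """Largest number of points covered by one disc of radius r."""
    best = 0
    for cx, cy in candidate_centres(points, r):
        count = sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= r + eps)
        best = max(best, count)
    return best


def has_convergence(points: List[Point], r: float, m: int) -> bool:
    """True if some disc of radius r contains at least m of the points."""
    return max_in_disc(points, r) >= m
```

The small tolerance `eps` is an assumption to absorb floating-point error for points lying exactly on the disc boundary.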

  10. Convergence
      Is there a disc of radius r that intersects at least m lines? → Is there a point that is "covered" by at least m rectangles?

  11. Convergence
      Bad news: Cannot be solved exactly faster than ~T n^2.
      Good news: 2-approximation of the number of entities in O(T n^2 / m) time.

  12. Encounter
      Is there a disc of radius r that intersects at least m entities at some point in time?
      [Figure: disc of diameter 2r and trajectories at times t1, t2, t3, t4]

  13. Encounter - detect
      Idea:
      • Consider one "cylinder" C with radius 2r.
      • Compute the intersections between C and the n-1 paths.
      • If more than 7m paths are inside C at any time, report "Encounter". Total time: O(n log n) per cylinder.
      • If not, solve exactly. Observation: the total size of all subsets within C is O(mn). Total time: O(n log n + nm) per cylinder.
      Overall time: O(T n^2 (log n + m)).
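The radius-2r cylinder on slide 13 rests on a simple observation: if a disc of radius r contains m entities at some time, then each of those entities has all m of them (itself included) within distance 2r at that time. The sketch below applies this as a filter at the sampled time steps only. It is a simplified illustration under that assumption, it does not reproduce the 7m threshold or the exact second phase, and the names `neighbours_in_cylinder` and `encounter_possible` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def neighbours_in_cylinder(trajs: List[Trajectory], i: int, r: float) -> List[int]:
    """For entity i, count at each time step the entities within 2r of it (itself included)."""
    counts = []
    for t in range(len(trajs[i])):
        cx, cy = trajs[i][t]
        counts.append(
            sum(1 for traj in trajs
                if math.hypot(traj[t][0] - cx, traj[t][1] - cy) <= 2 * r)
        )
    return counts


def encounter_possible(trajs: List[Trajectory], r: float, m: int) -> bool:
    """Necessary condition: False means no radius-r disc holds m entities at any sampled step."""
    return any(
        count >= m
        for i in range(len(trajs))
        for count in neighbours_in_cylinder(trajs, i, r)
    )
```

A `True` result only says that an encounter with radius 2r exists (centred at an entity); confirming radius r would need an exact check on the entities inside that cylinder.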

  14. Flock - definition
      m – flock size
      k – flock duration
      r – radius of disc
      [Figure: a flock at times t1, t2, t3, t4]

  15. Flock - Problem
      Problem: Find a largest flock.
      Problem is NP-hard. Problem as hard as MaxClique!
      [Figure: MaxClique reduction with entities a, b, c, d, e over time steps t1 to t5]

  16. Flock – Hardness result
      Cannot be approximated in polynomial time within a factor of n^(1-ε) of the optimal (even if we approximate the radius by a factor of 2).
      Hopeless?

  17. Flock
      Idea: An entity in the time interval [t1,td] → a point in 2d dimensions.
      [Figure: a trajectory through t1, … , t7 mapped to a point in 14-dimensional Euclidean space]

  18. Flock
      Intersection of k (2k-2)-dimensional "cylinders".
      [Figure: a trajectory through t1, … , t7]

  19. Flock
      1. For each i=k to T do
      2.   For every entity E in the time interval [ti,ti+k] do
      3.     transform E to a point in 2k-dimensional space
      4.   Build a "Skip Quadtree"
      5.   For each point do
      6.     perform a 2k-dimensional range counting query.
      Approximation: 3-approximation of the radius.
      Total time: O(Tk (n log n + (1.5)^(2k)))
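The transformation on slides 17 to 19 can be sketched as follows: every window of k consecutive positions becomes one point with 2k coordinates, and a range count around each point finds the entities that stay close during the whole window. This sketch replaces the skip quadtree with a brute-force scan, so it does not achieve the stated running time, and it reports flocks of radius 2r centred on an entity rather than the 3-approximation from the slide. The names `window_point`, `within_cylinders` and `find_flocks` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def window_point(traj: Trajectory, start: int, k: int) -> Tuple[float, ...]:
    """Flatten k consecutive positions into one 2k-dimensional point."""
    return tuple(c for (x, y) in traj[start:start + k] for c in (x, y))


def within_cylinders(p: Tuple[float, ...], q: Tuple[float, ...], radius: float) -> bool:
    """True if p and q are within `radius` of each other at every time step of the window."""
    return all(
        math.hypot(p[i] - q[i], p[i + 1] - q[i + 1]) <= radius
        for i in range(0, len(p), 2)
    )


def find_flocks(trajs: List[Trajectory], m: int, k: int, r: float):
    """Return (window start, centre entity, members) for every window where at
    least m entities stay within 2r of the centre entity for k time steps."""
    T = len(trajs[0])
    found = []
    for start in range(T - k + 1):
        pts = [window_point(traj, start, k) for traj in trajs]
        for i, p in enumerate(pts):
            # Brute-force range count; a spatial index would replace this scan.
            members = [j for j, q in enumerate(pts) if within_cylinders(p, q, 2 * r)]
            if len(members) >= m:
                found.append((start, i, members))
    return found
```

Using radius 2r around an entity is a necessary condition for a radius-r flock containing it, which is why the result is an approximation of the radius rather than an exact answer.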

  20. Flock – experimental results

  21. What should be reported?
      • Detect if a pattern exists, report.
      • Report all patterns.
      • Report "largest" pattern.

  22. Current and future research
      • Advanced patterns: regular recurrences, hierarchical patterns, …
      • Implement practical algorithms
      • Algorithms and association rule mining
      • Input data with errors?
      • External memory algorithms?
      • Generate test data
