Computational Geometry and Spatial Data Mining

Computational Geometry and Spatial Data Mining. Marc van Kreveld, Department of Information and Computing Sciences, Utrecht University. Clustering? Are the people clustered in this room? How do we define a cluster?

Presentation Transcript


  1. Computational Geometry and Spatial Data Mining Marc van Kreveld Department of Information and Computing Sciences Utrecht University

  2. Clustering? • Are the people clustered in this room? How do we define a cluster? • In spatial data mining we have objects/entities with a location given by coordinates • Cluster definitions involve distance between locations

  3. Clustering - options • Determine whether clustering occurs • Determine the degree of clustering • Determine the clusters • Determine the largest cluster • Determine the outliers

  4. Co-location • Are the men clustered? • Are the women clustered? • Is there a co-location of men and women?

  5. Co-location • Like before, we may be interested in • is there co-location? • the degree of co-location • the largest co-location • the co-locations themselves • the objects not involved in co-location

  6. Spatio-temporal data • Locations have a time stamp • Interesting patterns involve space and time

  7. Trajectory data • Entities with a trajectory (time-stamped motion path) • Interesting patterns involve subgroups with similar heading, expected arrival, joint motion, ... • n entities (trajectories); n = 10–100,000 • t time steps; t = 10–100,000; the input size is nt • subgroup of size m (unknown); m = 10–100,000

  8. Examples of trajectory data • Tracked animals (buffalo, birds, ...) • Tracked people (potential terrorists) • Tracked GSMs (e.g. for traffic purposes) • Trajectories of tornadoes • Sports scene analysis (players on a soccer field)

  9. Example pattern in trajectories • What is the location visited by most entities? location = circular region of specified radius

  10. Example pattern in trajectories • What is the location visited by most entities? location = circular region of specified radius [figure: a candidate location visited by 4 entities]

  11. Example pattern in trajectories • What is the location visited by most entities? location = circular region of specified radius [figure: a candidate location visited by 3 entities]

  12. Example pattern in trajectories • Compute buffer of each trajectory

  13. Example pattern in trajectories • Compute buffer of each trajectory • Compute the arrangement of the buffers and the cover count of each cell [figure: arrangement cells labeled with cover counts 0, 1, and 2]

  14. Example pattern in trajectories • One trajectory has t time stamps; its buffer can be computed in O(t log t) time • All buffers can be computed in O(nt log t) time • The arrangement can be computed in O(nt log(nt) + k) time, where k = O((nt)^2) is the complexity of the arrangement • Cell cover counts are determined in O(k) time

  15. Example pattern in trajectories • Total: O(nt log(nt) + k) time • If the most visited location is visited by m entities, this is O(nt log(nt) + ntm) • Note: the input size is nt: n entities, each with a location at t moments
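
A point c lies in the buffer of a trajectory exactly when the circle of radius r centered at c meets the trajectory, so the cover count at c is the number of entities that visit that circular location. The minimal Python sketch below evaluates the cover count at a single query point by brute force (O(nt) per query) instead of building the arrangement; it assumes each trajectory is a list of (x, y) vertices joined by straight segments, and the names `cover_count`, `in_buffer`, and `point_segment_distance` are illustrative only.

```python
import math

Point = tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(p, a)
    # Parameter of the projection of p onto line ab, clamped to [0, 1].
    s = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    s = max(0.0, min(1.0, s))
    return math.dist(p, (a[0] + s * dx, a[1] + s * dy))


def in_buffer(c: Point, trajectory: list[Point], r: float) -> bool:
    """True if c lies within distance r of the polygonal trajectory."""
    if len(trajectory) == 1:
        return math.dist(c, trajectory[0]) <= r
    return any(
        point_segment_distance(c, trajectory[i], trajectory[i + 1]) <= r
        for i in range(len(trajectory) - 1)
    )


def cover_count(c: Point, trajectories: list[list[Point]], r: float) -> int:
    """Number of trajectories that visit the radius-r circle centered at c."""
    return sum(in_buffer(c, tr, r) for tr in trajectories)


trajectories = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(5.0, -5.0), (5.0, 5.0)],
    [(20.0, 20.0), (30.0, 30.0)],
]
print(cover_count((5.0, 0.5), trajectories, 1.0))  # 2
```

The query point (5, 0.5) lies near the crossing of the first two trajectories, so both entities visit the circle of radius 1 around it, while the third entity does not.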

  16. Patterns in entity data • Spatial data: n points (locations) • Distance is important: clustering pattern • Presence of attributes (e.g. man/woman): co-location patterns • Spatio-temporal data: n trajectories, each with t time steps • Distance is time-dependent: flock pattern, meet pattern • Heading and speed are important and are also time-dependent

  17. Entities in subdivisions • Also a co-location pattern • Discovered simply by overlay, e.g., occurrences of oaks on different soil types

  18. Clustering entities in subdivisions • What if it is known that the entities only occur in regions of a certain type? [figure: bird nests and the radius of a cluster, situation without subdivision]

  19. Clustering entities in subdivisions • What if it is known that the entities only occur in regions of a certain type? [figure: bird nests and the radius of a cluster, situation with a land-water subdivision]

  20. Clustering entities in subdivisions [figure: burglary locations, with labels "house" and "car"]

  21. Region-restricted clustering • Joint research with Joachim Gudmundsson (NICTA, Sydney) and Giri Narasimhan (U of F, Miami), 2006 • Determine clusters in point sets that are sensitive to the geographic context (at least, for the relevant aspects) • Assuming that a set of regions is given where points can only occur, how should we define clusters?

  22. Region-restricted clustering • Given a set P of points, a set F of regions, a radius r, and a subset size m, a region-restricted cluster is a subset P' ⊆ P inside a circle C where • P' has size at least m • C has radius at most 2r • C contains at most πr^2 area of regions of F [figure: a circle of radius ≤ 2r in which the summed area of the regions of F is ≤ πr^2]

  23. Region-restricted clustering • Given a set P of n points, a set F of polygons with n_f edges in total, and values for r and m, report all region-restricted clusters of exactly m points • Exactly m points? • “Real” clustering (partition)? • Outliers?

  24. Region-restricted clustering • Exactly m points? Every cluster with more than m points consists of clusters of m points with smaller circles • “Real” clustering (partition)? • Outliers? [figure: example with m = 5]

  26. Region-restricted clustering • 1. Determine all smallest circles with m points of P inside • 2. Test if the radius is ≤ r (report) or > 2r (discard) • 3. If the radius is in between, determine the area of regions of F inside
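
The three-way decision in steps 2 and 3 can be written as a small Python function. This sketch reads the area bound of the cluster definition as πr^2, the area of a circle of radius r, which is also why a circle of radius at most r can be reported without measuring any area. The names `classify_circle` and `area_of_f_inside` are hypothetical, and the callback stands in for whatever area computation is used.

```python
import math
from typing import Callable


def classify_circle(radius: float, r: float,
                    area_of_f_inside: Callable[[], float]) -> str:
    """Steps 2 and 3 for one smallest circle around m points.

    area_of_f_inside is a hypothetical callback returning the area of the
    regions of F inside the circle; it is only called when r < radius <= 2r.
    """
    if radius <= r:
        # A circle this small cannot contain more than pi * r^2 area of F.
        return "report"
    if radius > 2 * r:
        return "discard"
    # In between: measure the area of F inside and compare with pi * r^2.
    if area_of_f_inside() <= math.pi * r * r:
        return "report"
    return "discard"


print(classify_circle(1.5, 1.0, lambda: 3.0))  # report
```

Passing the area as a callback keeps the cheap radius tests separate from the expensive area computation, mirroring how the slides only perform step 3 for the circles that survive step 2.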

  27. Region-restricted clustering • 1. Determine all smallest circles with m points of P inside • Use the order-(m-2) Voronoi diagram: its cells are the regions where the same (m-2) points are closest • Its vertices are centers of smallest circles around exactly m points
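
For small inputs, the circles that step 1 asks for can also be found by brute force instead of through the order-(m-2) Voronoi diagram. The sketch below is such a substitute, not the method on the slides: it uses the fact that the smallest enclosing circle of a point set is determined either by two points on a diameter or by three points forming a non-obtuse triangle, tries all such circles (O(n^3) candidates, each counted in O(n) time), and keeps those containing exactly m points. It assumes m ≥ 2; the function names and the tolerance `EPS` are illustrative assumptions.

```python
import math
from itertools import combinations

Point = tuple[float, float]
EPS = 1e-9  # tolerance for points lying on a circle


def circumcircle(a: Point, b: Point, c: Point):
    """Center and radius of the circle through a, b, c (None if collinear)."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < EPS:
        return None
    sa, sb, sc = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy), math.dist((ux, uy), a)


def non_obtuse(a: Point, b: Point, c: Point) -> bool:
    """True if no angle of triangle abc exceeds 90 degrees."""
    def dot(o, p, q):
        return (p[0] - o[0]) * (q[0] - o[0]) + (p[1] - o[1]) * (q[1] - o[1])
    return min(dot(a, b, c), dot(b, a, c), dot(c, a, b)) >= -EPS


def smallest_circles_with_m_points(points: list[Point], m: int):
    """Map each m-subset (as a set of indices) whose smallest enclosing
    circle contains no other points to that circle's (center, radius).
    Brute force; assumes m >= 2."""
    candidates = []
    for a, b in combinations(points, 2):  # circles with diameter ab
        center = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        candidates.append((center, math.dist(a, b) / 2))
    for a, b, c in combinations(points, 3):  # circles through 3 points
        if non_obtuse(a, b, c):
            circle = circumcircle(a, b, c)
            if circle is not None:
                candidates.append(circle)
    result = {}
    for center, radius in candidates:
        inside = frozenset(i for i, p in enumerate(points)
                           if math.dist(center, p) <= radius + EPS)
        if len(inside) == m and (inside not in result
                                 or radius < result[inside][1]):
            result[inside] = (center, radius)
    return result


pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
circles = smallest_circles_with_m_points(pts, 3)
print(circles[frozenset({0, 1, 2})])  # ((1.0, 0.0), 1.0)
```

Each returned radius can then be fed to the tests of steps 2 and 3. The brute force is far slower than the Voronoi approach for large n, but it is easy to check by hand.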

  28. ordinary = order-1 VD

  29. order-2 VD

  30. order-3 VD

  31. Region-restricted clustering • The order-m (or order-(m-2)) Voronoi diagram has O(nm) cells, edges, and vertices • It can be constructed in O(nm log n) time, so we get O(nm) smallest circles with m points inside; for each we also know the radius

  32. Region-restricted clustering • 2. Test if the radius is ≤ r (report) or > 2r (discard) • Trivial in O(1) time per circle, so O(nm) time overall

  33. Region-restricted clustering • 3. Determine the area of regions of F inside • Brute force: O(n_f) time per circle, so O(nm · n_f) time overall

  34. Region-restricted clustering • Complication: This need not give all region-restricted clusters! • Need to compute area of F inside a circle with moving center • Requires solving high-degree polynomials

  35. Region-restricted clusters • The anti-climax: we cannot give an exact algorithm! • If we take squares instead of circles, we can deal with the problem ...

  36. Region-restricted clustering • 3. Determine the area of regions of F inside • Brute force: O(n_f) time per square, so O(nm · n_f) time overall • The total time for steps 1, 2, and 3 is O(nm log n) + O(nm) + O(nm · n_f) = O(nm log n + nm · n_f)
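
With squares, the brute-force step 3 amounts to clipping every polygon of F against the square and summing the clipped areas. The sketch below does this with Sutherland-Hodgman clipping and the shoelace formula, as an illustration under the assumptions that each region is a simple polygon given as a list of (x, y) vertices and that the square is axis-parallel with center (cx, cy) and half side length `half`. All function names are hypothetical. Because the square is convex, the computed area stays correct for non-convex regions, even though the clipped output may then contain degenerate edges.

```python
Point = tuple[float, float]


def shoelace_area(poly: list[Point]) -> float:
    """Area of a polygon given by its vertices in order."""
    s = 0.0
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip_half_plane(poly, inside, crossing):
    """One Sutherland-Hodgman pass against a single half-plane."""
    out = []
    for i in range(len(poly)):
        prev, cur = poly[i - 1], poly[i]  # poly[-1] closes the ring
        if inside(cur):
            if not inside(prev):
                out.append(crossing(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(crossing(prev, cur))
    return out


def clip_to_square(poly, cx, cy, half):
    """Clip poly against the axis-parallel square of half side `half`."""
    def at_x(x):  # crossing point of segment pq with the line X = x
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def at_y(y):  # crossing point of segment pq with the line Y = y
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    lo_x, hi_x, lo_y, hi_y = cx - half, cx + half, cy - half, cy + half
    for inside, crossing in [
        (lambda p: p[0] >= lo_x, at_x(lo_x)),
        (lambda p: p[0] <= hi_x, at_x(hi_x)),
        (lambda p: p[1] >= lo_y, at_y(lo_y)),
        (lambda p: p[1] <= hi_y, at_y(hi_y)),
    ]:
        poly = clip_half_plane(poly, inside, crossing)
        if not poly:
            break
    return poly


def area_of_f_in_square(regions, cx, cy, half):
    """Brute force: clip every region of F and sum the clipped areas."""
    total = 0.0
    for region in regions:
        clipped = clip_to_square(region, cx, cy, half)
        if len(clipped) >= 3:
            total += shoelace_area(clipped)
    return total


regions = [[(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
           [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0)]]
print(area_of_f_in_square(regions, 0.0, 0.0, 1.0))  # 1.0
```

Each region is clipped in time linear in its number of edges, so one square costs O(n_f) in total, matching the brute-force bound on the slide.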

  37. Region-restricted clustering • 3. Determine the area of regions of F inside • Using a suitable data structure (only possible for squares): O(log^2 n_f) time per square, so O(nm log^2 n_f) time overall • The total time becomes O(nm log n + n_f log^2 n_f + nm log^2 n_f), for the order-(m-2) VD construction, the preprocessing of the data structure, and the total query time in the data structure, respectively

  38. Region-restricted clustering • The squares solution generalizes to regular polygons (e.g. 20-gons) • An approximation of the radius within (1+ε)r gives an O(n/ε^2 + n_f log^2 n_f + n log n_f/(m ε^2)) time algorithm [figure: a 16-gon]
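
One way to see why regular polygons support a (1+ε) approximation: a regular k-gon circumscribed about a circle of radius ρ has circumradius ρ/cos(π/k), so replacing a circle by such a polygon distorts the radius by the factor 1/cos(π/k). The Python sketch below, an illustration rather than the algorithm from the slide, finds the smallest k for which this factor is at most 1+ε; the function names and the lower limit of 4 sides are assumptions.

```python
import math


def polygon_radius_factor(k: int) -> float:
    """Circumradius / inradius of a regular k-gon, i.e. 1 / cos(pi / k)."""
    return 1.0 / math.cos(math.pi / k)


def sides_for_epsilon(eps: float, min_sides: int = 4) -> int:
    """Smallest k >= min_sides whose radius distortion is at most 1 + eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    k = min_sides
    while polygon_radius_factor(k) > 1.0 + eps:
        k += 1
    return k


print(sides_for_epsilon(0.02))  # 16
```

For ε = 0.02 the answer is 16, since 1/cos(π/16) ≈ 1.0196 while 1/cos(π/15) ≈ 1.0223. Because 1/cos(x) ≈ 1 + x^2/2 for small x, the number of sides needed grows roughly like 1/√ε.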

  39. Region-restricted clustering • Open problems: • Develop a region-restricted version of k-means clustering, single-link clustering, ... • Region-restricted co-location? • Replace region-restricted by a gradual model [figure: point densities of 0, 2, 5, and 8 per unit, labeled "typical" and "clusters"]

  40. Patterns in trajectories • n trajectories, each with t time steps, i.e., n polygonal lines with t vertices each • Already looked at: most visited location

  41. Patterns in trajectories • Flock: nearby positions of (sub)trajectories for some subset of the entities during some time • Convergence: same destination region for some subset of the entities • Encounter: same destination region with same arrival time for some subset of the entities • Similarity of trajectories • Same direction of movement, leadership, ... [figure: examples of a flock and a convergence]

  42. Patterns in trajectories • Flocking, convergence, encounter patterns • Laube, van Kreveld, Imfeld (SDH 2004) • Gudmundsson, van Kreveld, Speckmann (ACM GIS 2004) • Benkert, Gudmundsson, Huebner, Wolle (ESA 2006) • ... • Similarity of trajectories • Vlachos, Kollios, Gunopulos (ICDE 2002) • Shim, Chang (WAIM 2003) • ... • Lifelines, motion mining, modeling motion • Mountain, Raper (GeoComputation 2001) • Kollios, Sclaroff, Betke (DM&KD 2001) • Frank (GISDATA 8, 2001) • ...

  43. Patterns in trajectories • Flock: nearby positions of (sub)trajectories for some subset of the entities during some time • clustering-type pattern • different definitions are used • Given a radius r, subset size m, and duration T, a flock is a subset of size m that is inside a (moving) circle of radius r for a duration T
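
For a fixed candidate subset, this flock definition can be checked directly if time is discrete. The sketch below assumes that every trajectory is a list of (x, y) positions at the same t time steps, that the duration T is counted in consecutive time steps, and that the moving circle only has to contain the sampled positions; per time step it tests whether the minimum enclosing circle of the subset has radius at most r. The minimum enclosing circle is found by brute force over pairs and triples of points, which is adequate only for small m. The names `is_flock` and `min_enclosing_radius` are hypothetical, and finding the subsets themselves is not addressed here.

```python
import math
from itertools import combinations

Point = tuple[float, float]
EPS = 1e-9


def _candidate_circles(pts: list[Point]):
    """Circles that can be the minimum enclosing circle of pts."""
    if len(pts) == 1:
        yield pts[0], 0.0
    for a, b in combinations(pts, 2):
        yield ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2), math.dist(a, b) / 2
    for a, b, c in combinations(pts, 3):
        d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
        if abs(d) < EPS:
            continue  # collinear: no circumcircle
        sa, sb, sc = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
        ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
        uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
        yield (ux, uy), math.dist((ux, uy), a)


def min_enclosing_radius(pts: list[Point]) -> float:
    """Brute-force minimum enclosing circle radius, O(m^4); fine for small m."""
    return min(radius for center, radius in _candidate_circles(pts)
               if all(math.dist(center, p) <= radius + EPS for p in pts))


def is_flock(trajectories: list[list[Point]], members: list[int],
             r: float, T: int) -> bool:
    """True if the entities in `members` fit in a circle of radius r
    during at least T consecutive time steps."""
    steps = len(trajectories[members[0]])
    run = best = 0
    for step in range(steps):
        positions = [trajectories[i][step] for i in members]
        if min_enclosing_radius(positions) <= r + EPS:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best >= T


trajectories = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)],
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (10.0, 9.0)],
    [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (20.0, 20.0)],
]
print(is_flock(trajectories, [0, 1, 2], r=1.0, T=3))  # True
```

In this example the three entities stay within a circle of radius about 0.71 for the first three time steps and then scatter, so they form a flock for T = 3 but not for T = 4.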