Computational Geometry and Spatial Data Mining

    Presentation Transcript
    1. Marc van Kreveld (and Giri Narasimhan) Department of Information and Computing Sciences Utrecht University Computational Geometry and Spatial Data Mining

    2. Clustering? • Are the people clustered in this room? • How do we define a cluster? • In spatial data mining we have objects/entities with a location given by coordinates • Cluster definitions involve distance between locations • How do we define distance?

    3. Clustering - options • Determine whether clustering occurs • Determine the degree of clustering • Determine the clusters • Determine the largest cluster • Determine the largest empty region • Determine the outliers

    4. Co-location • Are the men clustered? • Are the women clustered? • Is there a co-location of men and women? • Determine regions favored exclusively by women. Men? Loners? Couples? Families? • Determine empty regions.

    5. Co-location • Like before, we may be interested in • is there co-location? • the degree of co-location • the largest co-location • the co-locations themselves • the objects not involved in co-location • Regions with no (or little) co-location

    6. Spatio-temporal data • Locations have a time stamp • Interesting patterns involve space and time • Anomalies?

    7. Trajectory data • Entities with a trajectory (time-stamped motion path) • Interesting patterns involve subgroups with similar heading, expected arrival, joint motion, ... • n entities = trajectories; n = 10 – 100,000 • t time steps; t = 10 – 100,000 • input size is nt • m size of subgroup (unknown); m = 10 – 100,000

    8. Examples of trajectory data • Tracked animals (buffalo, birds, ...) • Tracked people (potential terrorists) • Tracked GSMs (e.g. for traffic purposes) • Trajectories of tornadoes • Sports scene analysis (players on a soccer field)

    9. Example pattern in trajectories • What is the location visited by most entities? location = circular region of specified radius

    10. Example pattern in trajectories • What is the location visited by most entities? location = circular region of specified radius 4 entities

    11. Example pattern in trajectories • What is the location visited by most entities? location = circular region of specified radius 3 entities

    12. Example pattern in trajectories • Compute buffer of each trajectory

    13. Example pattern in trajectories • Compute buffer of each trajectory • Compute the arrangement of the buffers and the cover count of each cell [Figure: arrangement of the buffers, cells labeled with cover counts 0, 1, and 2]

    14. Example pattern in trajectories • One trajectory has t time stamps; its buffer can be computed in O(t log t) time • All buffers can be computed in O(nt log t) time • The arrangement can be computed in O(nt log (nt) + k) time, where k = O((nt)²) is the complexity of the arrangement • Cell cover counts are determined in O(k) time

    15. Example pattern in trajectories • Total: O(nt log (nt) + k) time • If the most visited location is visited by m entities, this is O(nt log (nt) + ntm) • Note: input size is nt; n entities, each with a location at t moments
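The cover count of a cell can be illustrated without building the arrangement: a point lies inside the buffer of a trajectory exactly when the trajectory passes within the given radius of that point. The sketch below counts, for one candidate center, how many trajectories visit the circular location around it. It is a brute-force check costing O(nt) per candidate, not the arrangement algorithm from the slides, and the function names are illustrative assumptions.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Parameter of the orthogonal projection of p, clamped to the segment.
    u = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    u = max(0.0, min(1.0, u))
    return math.hypot(p[0] - (a[0] + u * dx), p[1] - (a[1] + u * dy))

def visits(trajectory, center, radius):
    """True if the polygonal trajectory comes within `radius` of `center`."""
    if len(trajectory) == 1:
        only = trajectory[0]
        return math.hypot(only[0] - center[0], only[1] - center[1]) <= radius
    return any(
        point_segment_distance(center, a, b) <= radius
        for a, b in zip(trajectory, trajectory[1:])
    )

def cover_count(trajectories, center, radius):
    """Number of trajectories whose buffer of width `radius` contains `center`."""
    return sum(visits(tr, center, radius) for tr in trajectories)

# Hypothetical example data: three straight trajectories.
example_trajectories = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(0.0, 1.0), (10.0, 1.0)],
    [(0.0, 20.0), (10.0, 20.0)],
]
example_count = cover_count(example_trajectories, (5.0, 0.5), 1.0)
```

Evaluating this at one sample point per cell of the arrangement would reproduce the cover counts, but at a much higher cost than the O(k) traversal mentioned on the slide.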

    16. Patterns in entity data Spatial data • n points (locations) • Distance is important • clustering pattern • Presence of attributes (e.g. man/woman): • co-location patterns Spatio-temporal data • n trajectories, each has t time steps • Distance is time-dependent • flock pattern • meet pattern • Heading and speed are important and are also time-dependent

    17. Entities in subdivisions • Also co-location pattern • Discovered simply by overlay • E.g., occurrences of oaks on different soil types

    18. Clustering entities in subdivisions • What if it is known that the entities only occur in regions of a certain type? [Figure: situation without subdivision; labels: radius of cluster, bird nests]

    19. Clustering entities in subdivisions • What if it is known that the entities only occur in regions of a certain type? [Figure: situation with land-water subdivision; labels: radius of cluster, bird nests]

    20. Clustering entities in subdivisions [Figure labels: house, car, burglary]

    21. Region-restricted clustering Joint research with Joachim Gudmundsson (NICTA, Sydney) and Giri Narasimhan (U of F, Miami), 2006 • Determine clusters in point sets that are sensitive to the geographic context (at least, for the relevant aspects) • Assume that a set of regions is given and that points can only lie inside these regions. How should we define clusters?

    22. Region-restricted clustering • Given a set P of points, a set F of regions, a radius r and a subset size m, a region-restricted cluster is a subset P’ ⊆ P inside a circle C where • P’ has size at least m • C has radius at most 2r • C contains at most πr² area of regions of F [Figure: a circle of radius r next to a circle of radius ≤ 2r whose summed region area is ≤ πr²]

    23. Region-restricted clustering • Given a set P of n points, a set F of polygons with n_f edges in total, and values for r and m, report all region-restricted clusters of exactly m points • Exactly m points? • “Real” clustering (partition)? • Outliers?

    24. Region-restricted clustering • Exactly m points? Every cluster with > m points consists of clusters with m points with smaller circles • “Real” clustering (partition)? • Outliers? [Figure: example with m = 5]

    25. Region-restricted clustering • Exactly m points? Every cluster with > m points consists of clusters with m points with smaller circles • “Real” clustering (partition)? • Outliers? [Figure: example with m = 5]

    26. Region-restricted clustering • Step 1: Determine all smallest circles with m points of P inside • Step 2: Test if the radius is ≤ r (report) or > 2r (discard) • Step 3: If the radius is in between, determine the area of regions of F inside
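The three steps above can be sketched as a per-circle decision. This is a minimal illustration, assuming the candidate circles from step 1 are already available. The parameter `max_area` stands for the area bound from the cluster definition, and `region_area_inside` is a hypothetical callback that returns the area of regions of F inside the circle; neither name comes from the slides.

```python
def classify_circle(radius, r, max_area, region_area_inside):
    """Decide whether a candidate circle is reported as a cluster.

    radius             -- radius of the candidate circle (from step 1)
    r                  -- the radius parameter of the cluster definition
    max_area           -- allowed area of regions of F inside the circle
    region_area_inside -- zero-argument callback (hypothetical) that computes
                          the area of F inside the circle; only called when
                          the cheap radius tests are inconclusive
    """
    if radius <= r:
        return "report"       # step 2: small enough, no area test needed
    if radius > 2 * r:
        return "discard"      # step 2: too large under any circumstances
    # Step 3: radius in (r, 2r], so the region area decides.
    return "report" if region_area_inside() <= max_area else "discard"
```

Passing the area computation as a callback keeps the expensive step lazy: circles settled by the radius test in O(1) never trigger it.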

    27. Region-restricted clustering: Step 1 • Determine all minimal circles with m points of P inside • Determine all minimal circles with 3 points of P inside

    28. ordinary = order-1 VD

    29. Region-restricted clustering • Determine all smallest circles with m points of P inside • Use (m-2)-th order Voronoi diagram: cells where the same (m-2) points are closest • Its vertices are centers of smallest circles around exactly m points

    30. ordinary = order-1 VD

    31. order-2 VD

    32. order-3 VD

    33. Region-restricted clustering • The m-th order Voronoi diagram (or (m-2)) has O(nm) cells, edges, and vertices • It can be constructed in O(nm log n) time • Result: we get O(nm) smallest circles with m points inside; for each we also know the radius
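For small inputs, the circles from step 1 can be cross-checked by brute force. The sketch below is not the higher-order Voronoi diagram approach from the slides: it simply enumerates every circle that is the smallest enclosing circle of two points (their diametral circle) or of three points forming an acute triangle (their circumcircle), and keeps those with exactly m points inside or on the boundary. It runs in O(n⁴) time, uses a floating-point tolerance `EPS`, and all names are illustrative assumptions.

```python
import math
from itertools import combinations

EPS = 1e-9  # tolerance for points lying on the circle boundary

def diametral_circle(a, b):
    """Smallest circle through a and b: centered at the midpoint."""
    center = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return center, math.hypot(a[0] - b[0], a[1] - b[1]) / 2

def is_acute(a, b, c):
    """True if triangle abc is strictly acute (so it is not degenerate)."""
    sides = sorted([
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2,
        (b[0] - c[0]) ** 2 + (b[1] - c[1]) ** 2,
        (c[0] - a[0]) ** 2 + (c[1] - a[1]) ** 2,
    ])
    return sides[2] < sides[0] + sides[1]

def circumcircle(a, b, c):
    """Circle through three non-collinear points."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (ux, uy), math.hypot(a[0] - ux, a[1] - uy)

def circles_with_m_points(points, m):
    """All 2-point and 3-point smallest circles containing exactly m points."""
    candidates = [diametral_circle(a, b) for a, b in combinations(points, 2)]
    # For obtuse or right triangles the diametral circle of the longest
    # side is smaller, so only acute triangles contribute circumcircles.
    candidates += [
        circumcircle(a, b, c)
        for a, b, c in combinations(points, 3)
        if is_acute(a, b, c)
    ]
    result = []
    for center, radius in candidates:
        inside = sum(
            math.hypot(p[0] - center[0], p[1] - center[1]) <= radius + EPS
            for p in points
        )
        if inside == m:
            result.append((center, radius))
    return result
```

Each returned pair is `(center, radius)`, so the radius test of step 2 can be applied directly to the output.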

    34. Region-restricted clustering 2. Test if the radius is ≤ r (report) or > 2r (discard) • Trivial in O(1) time per circle, so O(nm) time overall

    35. Region-restricted clustering 3. Determine the area of regions of F inside • Brute force: O(n_f) time per circle, so O(nm · n_f) time overall

    36. Region-restricted clustering • Complication: This need not give all region-restricted clusters! • Need to compute area of F inside a circle with moving center • Requires solving high-degree polynomials

    37. Region-restricted clusters • The anti-climax: we cannot give an exact algorithm! • If we take squares instead of circles, we can deal with the problem ...

    38. Region-restricted clustering 3. Determine the area of regions of F inside • Brute force: O(n_f) time per square, so O(nm · n_f) time overall • The total time for steps 1, 2, and 3 is O(nm log n) + O(nm) + O(nm · n_f) = O(nm log n + nm · n_f) time
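The brute-force area computation for one square can be sketched by clipping every polygon of F against the four sides of the square and summing the clipped areas, which takes time linear in the number of polygon edges. The code below is an illustrative sketch that uses Sutherland-Hodgman clipping and the shoelace formula. It assumes simple polygons given as vertex lists and an axis-parallel square given by its center and half side length; the function names are not from the slides.

```python
def clip_axis(poly, axis, bound, keep_greater):
    """Clip a polygon against the half-plane coord[axis] >= bound (or <=)."""
    def inside(p):
        return p[axis] >= bound if keep_greater else p[axis] <= bound

    def crossing(p, q):
        # p and q lie on opposite sides, so q[axis] != p[axis].
        t = (bound - p[axis]) / (q[axis] - p[axis])
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    out = []
    for i, cur in enumerate(poly):
        prev = poly[i - 1]  # wraps around to the last vertex for i == 0
        if inside(cur):
            if not inside(prev):
                out.append(crossing(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(crossing(prev, cur))
    return out

def shoelace_area(poly):
    """Unsigned area of a polygon given as a list of (x, y) vertices."""
    twice = sum(
        poly[i - 1][0] * poly[i][1] - poly[i][0] * poly[i - 1][1]
        for i in range(len(poly))
    )
    return abs(twice) / 2

def region_area_in_square(polygons, cx, cy, half):
    """Total area of the polygons inside the square [cx±half] x [cy±half]."""
    total = 0.0
    for poly in polygons:
        clipped = clip_axis(poly, 0, cx - half, True)
        clipped = clip_axis(clipped, 0, cx + half, False)
        clipped = clip_axis(clipped, 1, cy - half, True)
        clipped = clip_axis(clipped, 1, cy + half, False)
        total += shoelace_area(clipped)
    return total
```

For example, `region_area_in_square([[(0, 0), (2, 0), (2, 2), (0, 2)]], 2, 2, 1)` clips a 2 × 2 region against the square [1, 3] × [1, 3] and yields the area of their overlap.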

    39. Region-restricted clustering 3. Determine the area of regions of F inside • Using a suitable data structure (only possible for squares): O(log² n_f) time per square, so O(nm log² n_f) time overall • The total time becomes O(nm log n + n_f log² n_f + nm log² n_f), where nm log n is the order-(m-2) VD construction, n_f log² n_f is the preprocessing of the data structure, and nm log² n_f is the total query time in the data structure

    40. Region-restricted clustering • The squares solution generalizes to regular polygons (e.g. 20-gons) • An approximation of the radius within (1+ε)r gives an O(n/ε² + n_f log² n_f + n log n_f / (mε²)) time algorithm [Figure: 16-gon]

    41. Region-restricted clustering • Open problems: • Develop a region-restricted version of k-means clustering, single link clustering, ... • Region-restricted co-location? • Replace region-restricted by a gradual model [Figure labels: typical, clusters; 0/unit, 2/unit, 5/unit, 8/unit]

    42. Patterns in trajectories • n trajectories, each with t time steps → n polygonal lines with t vertices • Already looked at most visited location

    43. Patterns in trajectories • Flock: near positions of (sub)trajectories for some subset of the entities during some time • Convergence: same destination region for some subset of the entities • Encounter: same destination region with same arrival time for some subset of the entities • Similarity of trajectories • Same direction of movement, leadership, ... [Figure labels: flock, convergence]
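The flock pattern can be made concrete for one candidate subset. The slides leave "near" and "some time" open, so the sketch below fixes them with two assumed parameters: `max_dist`, an upper bound on every pairwise distance within the subset, and `min_steps`, the minimum number of consecutive time steps. It only verifies a given subset and does not search for unknown flocks. Trajectories are assumed to be equally long lists of (x, y) positions sampled at the same time steps.

```python
import math
from itertools import combinations

def close_at(trajectories, subset, step, max_dist):
    """True if all entities in `subset` are pairwise within max_dist at `step`."""
    return all(
        math.hypot(
            trajectories[i][step][0] - trajectories[j][step][0],
            trajectories[i][step][1] - trajectories[j][step][1],
        ) <= max_dist
        for i, j in combinations(subset, 2)
    )

def flock_intervals(trajectories, subset, max_dist, min_steps):
    """Maximal step intervals (first, last) during which `subset` stays together.

    Only intervals spanning at least `min_steps` consecutive steps are kept.
    """
    steps = len(trajectories[subset[0]])
    intervals = []
    start = None
    # One extra iteration closes an interval that runs to the last step.
    for s in range(steps + 1):
        together = s < steps and close_at(trajectories, subset, s, max_dist)
        if together and start is None:
            start = s
        elif not together and start is not None:
            if s - start >= min_steps:
                intervals.append((start, s - 1))
            start = None
    return intervals

# Hypothetical example: entities 0 and 1 move together during steps 1 to 3.
example = [
    [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)],
    [(0, 5), (1, 1), (2, 1), (3, 1), (4, 9)],
    [(9, 9), (9, 9), (9, 9), (9, 9), (9, 9)],
]
example_result = flock_intervals(example, [0, 1], 1.5, 3)
```

Finding flocks without a given subset is the hard part, since the subset size m is unknown in advance; this check is only a building block for such a search.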