## Computational Geometry and Spatial Data Mining

**Computational Geometry and Spatial Data Mining**

Marc van Kreveld, Department of Information and Computing Sciences, Utrecht University

**Clustering?**

• Are the people clustered in this room? How do we define a cluster?
• In spatial data mining we have objects/entities with a location given by coordinates
• Cluster definitions involve distance between locations

**Clustering - options**

• Determine whether clustering occurs
• Determine the degree of clustering
• Determine the clusters
• Determine the largest cluster
• Determine the outliers

**Co-location**

• Are the men clustered?
• Are the women clustered?
• Is there a co-location of men and women?

**Co-location**

• Like before, we may be interested in
  • is there co-location?
  • the degree of co-location
  • the largest co-location
  • the co-locations themselves
  • the objects not involved in co-location

**Spatio-temporal data**

• Locations have a time stamp
• Interesting patterns involve space and time

**Trajectory data**

• Entities with a trajectory (time-stamped motion path)
• Interesting patterns involve subgroups with similar heading, expected arrival, joint motion, ...
• n entities = n trajectories; n = 10 – 100,000
• t time steps; t = 10 – 100,000 (input size is nt)
• m = size of subgroup (unknown); m = 10 – 100,000

**Examples of trajectory data**

• Tracked animals (buffalo, birds, ...)
• Tracked people (potential terrorists)
• Tracked GSMs (e.g. for traffic purposes)
• Trajectories of tornadoes
• Sports scene analysis (players on a soccer field)

**Example pattern in trajectories**

• What is the location visited by most entities?
• A location is a circular region of specified radius

(Figure: a candidate location visited by 4 entities.)
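To make the "most visited location" question concrete, here is a minimal brute-force sketch. It assumes each trajectory is a list of `(x, y)` vertices joined by straight segments, and that an entity visits a circular location when its path comes within the radius of the center. The function names and the list of candidate centers are hypothetical, and the sketch only compares given candidates rather than searching the whole plane.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the supporting line and clamp to the segment.
    u = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def visits(trajectory, center, radius):
    """True if the polygonal path enters the disk (center, radius)."""
    segments = list(zip(trajectory, trajectory[1:])) or [(trajectory[0], trajectory[0])]
    return any(point_segment_distance(center, a, b) <= radius for a, b in segments)

def visit_count(trajectories, center, radius):
    """Number of entities whose trajectory visits the circular location."""
    return sum(visits(t, center, radius) for t in trajectories)

def most_visited(trajectories, candidates, radius):
    """Candidate center visited by the most entities (assumed candidate set)."""
    return max(candidates, key=lambda c: visit_count(trajectories, c, radius))

# Hypothetical data: two paths cross near (2, 0), a third stays far away.
trajectories = [
    [(0, 0), (4, 0)],
    [(2, -3), (2, 3)],
    [(0, 5), (4, 5)],
]
print(visit_count(trajectories, (2, 0), 0.5))                     # 2
print(most_visited(trajectories, [(10, 10), (2, 5), (2, 0)], 0.5))  # (2, 0)
```

Each count costs time linear in the total number of trajectory vertices, so this is only useful for checking a handful of candidate locations.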
(Figure: another circular location of the specified radius, visited by 3 entities.)

**Example pattern in trajectories**

• Compute the buffer of each trajectory
• Compute the arrangement of the buffers and the cover count of each cell

(Figure: cells of the arrangement labeled with cover counts 0, 1, and 2.)

**Example pattern in trajectories**

• One trajectory has t time stamps; its buffer can be computed in $O(t \log t)$ time
• All buffers can be computed in $O(nt \log t)$ time
• The arrangement can be computed in $O(nt \log(nt) + k)$ time, where $k = O((nt)^2)$ is the complexity of the arrangement
• Cell cover counts are determined in $O(k)$ time

**Example pattern in trajectories**

• Total: $O(nt \log(nt) + k)$ time
• If the most visited location is visited by m entities, this is $O(nt \log(nt) + ntm)$
• Note: the input size is nt (n entities, each with a location at t moments)

**Patterns in entity data**

Spatial data:

• n points (locations)
• Distance is important: clustering pattern
• Presence of attributes (e.g. man/woman): co-location patterns

Spatio-temporal data:

• n trajectories, each has t time steps
• Distance is time-dependent: flock pattern, meet pattern
• Heading and speed are important and are also time-dependent

**Entities in subdivisions**

• Also a co-location pattern
• Discovered simply by overlay, e.g., occurrences of oaks on different soil types

**Clustering entities in subdivisions**

• What if it is known that the entities only occur in regions of a certain type?

(Figure: situation without subdivision; bird nests and the radius of a cluster.)
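The overlay idea from the "Entities in subdivisions" slide can be sketched as follows: assign each point to the region of the subdivision that contains it and count occurrences per region type. This is a minimal illustration, assuming regions are simple polygons given as vertex lists; the soil labels and coordinates are made up, and points exactly on a boundary are not treated specially.

```python
from collections import Counter

def point_in_polygon(point, polygon):
    """Ray casting (even-odd rule); polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def overlay_counts(points, regions):
    """Count points per region label; regions is a list of (label, polygon)."""
    counts = Counter()
    for p in points:
        for label, polygon in regions:
            if point_in_polygon(p, polygon):
                counts[label] += 1
                break  # regions of a subdivision do not overlap
    return counts

# Hypothetical subdivision with two soil types and four oak locations.
regions = [
    ("clay", [(0, 0), (2, 0), (2, 2), (0, 2)]),
    ("sand", [(2, 0), (4, 0), (4, 2), (2, 2)]),
]
oaks = [(0.5, 0.5), (1.0, 1.5), (1.5, 0.2), (3.0, 1.0)]
print(overlay_counts(oaks, regions))  # Counter({'clay': 3, 'sand': 1})
```

Comparing these counts with the area of each region type would give a rough indication of whether the entities favor one type.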
(Figure: the same situation with a land-water subdivision; bird nests and the radius of a cluster.)

**Clustering entities in subdivisions**

(Figure labels: house, car, burglary.)

**Region-restricted clustering**

Joint research with Joachim Gudmundsson (NICTA, Sydney) and Giri Narasimhan (U of F, Miami), 2006

• Determine clusters in point sets that are sensitive to the geographic context (at least, for the relevant aspects)

Assume that a set of regions is given, and points can only lie inside these regions. How should we define clusters?

**Region-restricted clustering**

• Given a set P of points, a set F of regions, a radius r and a subset size m, a region-restricted cluster is a subset $P' \subseteq P$ inside a circle C where
  • P' has size at least m
  • C has radius at most 2r
  • C contains at most $\pi r^2$ area of regions of F

(Figure labels: r; ≤ 2r; sum of region areas ≤ $\pi r^2$.)

**Region-restricted clustering**

• Given a set P of n points, a set F of polygons with $n_f$ edges in total, and values for r and m, report all region-restricted clusters of exactly m points
• Exactly m points?
• "Real" clustering (partition)?
• Outliers?

**Region-restricted clustering**

• Exactly m points? Every cluster with more than m points consists of clusters with m points in smaller circles
• "Real" clustering (partition)?
• Outliers?

(Figure: example with m = 5.)
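The definition of a region-restricted cluster can be checked for one given circle with a small sketch. It assumes the area bound in the definition is $\pi r^2$, represents the regions F by a hypothetical predicate `in_region(x, y)`, and only estimates the area of F inside the circle by grid sampling. It is a numerical illustration of the definition, not the algorithm discussed in these slides.

```python
import math

def region_area_in_circle(center, radius, in_region, samples=400):
    """Estimate the area of F inside the circle by midpoint sampling on a grid."""
    cx, cy = center
    step = 2.0 * radius / samples
    hits = 0
    for i in range(samples):
        x = cx - radius + (i + 0.5) * step
        for j in range(samples):
            y = cy - radius + (j + 0.5) * step
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 and in_region(x, y):
                hits += 1
    return hits * step * step

def is_region_restricted_cluster(points, center, radius, r, m, in_region):
    """Check the three conditions of the definition for the circle (center, radius)."""
    cx, cy = center
    inside = sum((x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 for x, y in points)
    if inside < m or radius > 2 * r:
        return False
    if radius <= r:
        return True  # the whole circle has area at most pi * r^2
    return region_area_in_circle(center, radius, in_region) <= math.pi * r * r

# Hypothetical example: nests on a narrow strip of land (x >= 1.5) next to water.
nests = [(1.6, 0.0), (1.7, 0.3), (1.8, -0.2)]
narrow_land = lambda x, y: x >= 1.5
all_land_right = lambda x, y: x >= 0.0
print(is_region_restricted_cluster(nests, (0, 0), 2.0, 1.0, 3, narrow_land))     # True
print(is_region_restricted_cluster(nests, (0, 0), 2.0, 1.0, 3, all_land_right))  # False
```

In the first case the circle of radius 2r contains little land, so the three nests count as a cluster; in the second case half of the circle is land, which exceeds the area bound.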
**Region-restricted clustering**

1. Determine all smallest circles with m points of P inside
2. Test if the radius is ≤ r (report) or > 2r (discard)
3. If the radius is in between, determine the area of regions of F inside

**Region-restricted clustering**

1. Determine all smallest circles with m points of P inside

• Use the (m-2)-th order Voronoi diagram: cells where the same (m-2) points are closest
• Its vertices are centers of smallest circles around exactly m points

(Figure: the ordinary Voronoi diagram is the order-1 VD.)

**Region-restricted clustering**

• The m-th order Voronoi diagram (or (m-2)-th) has $O(nm)$ cells, edges, and vertices
• It can be constructed in $O(nm \log n)$ time

⇒ we get $O(nm)$ smallest circles with m points inside; for each we also know the radius

**Region-restricted clustering**

2. Test if the radius is ≤ r (report) or > 2r (discard)

Trivial in $O(1)$ time per circle, so in $O(nm)$ time overall

**Region-restricted clustering**

3. Determine the area of regions of F inside

Brute force: $O(n_f)$ time per circle, so in $O(nm\,n_f)$ time overall

**Region-restricted clustering**

• Complication: this need not give all region-restricted clusters!
• Need to compute the area of F inside a circle with moving center
• Requires solving high-degree polynomials

**Region-restricted clusters**

• The anti-climax: we cannot give an exact algorithm!
• If we take squares instead of circles, we can deal with the problem ...

**Region-restricted clustering**

3. Determine the area of regions of F inside

Brute force: $O(n_f)$ time per square, so in $O(nm\,n_f)$ time overall

The total time for steps 1, 2, and 3 is $O(nm \log n) + O(nm) + O(nm\,n_f) = O(nm \log n + nm\,n_f)$

**Region-restricted clustering**

3. Determine the area of regions of F inside

Using a suitable data structure (only possible for squares): $O(\log^2 n_f)$ time per square, so in $O(nm \log^2 n_f)$ time overall

The total time becomes $O(nm \log n + n_f \log^2 n_f + nm \log^2 n_f)$, where

• $nm \log n$ is the order-(m-2) VD construction
• $n_f \log^2 n_f$ is the preprocessing of the data structure
• $nm \log^2 n_f$ is the total query time in the data structure

**Region-restricted clustering**

• The squares solution generalizes to regular polygons (e.g. 20-gons)
• An approximation of the radius within $(1+\varepsilon)r$ gives an $O(n/\varepsilon^2 + n_f \log^2 n_f + n \log n_f / (m \varepsilon^2))$ time algorithm

(Figure: a 16-gon.)

**Region-restricted clustering**

• Open problems:
  • Develop a region-restricted version of k-means clustering, single link clustering, ...
  • Region-restricted co-location?
  • Replace region-restricted by a gradual model

(Figure: gradual model with typical and cluster densities per region; labels 0, 2, 5, and 8 per unit.)

**Patterns in trajectories**

• n trajectories, each with t time steps ⇒ n polygonal lines with t vertices
• Already looked at most visited location

**Patterns in trajectories**

• Flock: near positions of (sub)trajectories for some subset of the entities during some time
• Convergence: same destination region for some subset of the entities
• Encounter: same destination region with same arrival time for some subset of the entities
• Similarity of trajectories
• Same direction of movement, leadership, ...

(Figure labels: flock, convergence.)

**Patterns in trajectories**

• Flocking, convergence, encounter patterns
  • Laube, van Kreveld, Imfeld (SDH 2004)
  • Gudmundsson, van Kreveld, Speckmann (ACM GIS 2004)
  • Benkert, Gudmundsson, Huebner, Wolle (ESA 2006)
  • ...
• Similarity of trajectories
  • Vlachos, Kollios, Gunopulos (ICDE 2002)
  • Shim, Chang (WAIM 2003)
  • ...
• Lifelines, motion mining, modeling motion
  • Mountain, Raper (GeoComputation 2001)
  • Kollios, Sclaroff, Betke (DM&KD 2001)
  • Frank (GISDATA 8, 2001)
  • ...

**Patterns in trajectories**

• Flock: near positions of (sub)trajectories for some subset of the entities during some time
  • clustering-type pattern
  • different definitions are used
• Given: radius r, subset size m, and duration T, a flock is a subset of size m that is inside a (moving) circle of radius r for a duration T
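The flock definition above can be illustrated with a small check for one given subset of entities. This is a sketch under simplifying assumptions: all trajectories are sampled at the same time steps, the duration T is counted in consecutive time steps, and positions between samples are ignored. The smallest enclosing circle is found by brute force over pairs and triples, which is only reasonable for small subsets; all names are hypothetical.

```python
import math
from itertools import combinations

def _contains_all(center, radius, pts, eps=1e-9):
    return all(math.dist(center, p) <= radius + eps for p in pts)

def _circumcircle(a, b, c):
    """Center and radius of the circle through a, b, c; None if collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), a)

def enclosing_radius(pts):
    """Radius of the smallest enclosing circle (brute force, small inputs only)."""
    if len(pts) == 1:
        return 0.0
    best = math.inf
    for a, b in combinations(pts, 2):
        center = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
        radius = math.dist(a, b) / 2.0
        if radius < best and _contains_all(center, radius, pts):
            best = radius
    for a, b, c in combinations(pts, 3):
        circle = _circumcircle(a, b, c)
        if circle and circle[1] < best and _contains_all(circle[0], circle[1], pts):
            best = circle[1]
    return best

def is_flock(trajectories, subset, r, duration):
    """True if the subset fits in a circle of radius r for `duration` consecutive steps."""
    steps = len(trajectories[subset[0]])
    run = 0
    for k in range(steps):
        pts = [trajectories[e][k] for e in subset]
        run = run + 1 if enclosing_radius(pts) <= r else 0
        if run >= duration:
            return True
    return False

# Hypothetical trajectories: "a" and "b" move side by side for three steps.
trajectories = {
    "a": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "b": [(0, 1), (1, 1), (2, 1), (3, 5)],
    "c": [(9, 9), (1, 0.5), (2, 0.5), (3, 0.5)],
}
print(is_flock(trajectories, ["a", "b"], 0.6, 3))       # True
print(is_flock(trajectories, ["a", "b", "c"], 0.6, 3))  # False
```

Finding flocks, as opposed to verifying one candidate subset, would also require searching over subsets of size m, which this sketch does not attempt.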