Computational geometry and spatial data mining

Marc van Kreveld (and Giri Narasimhan)

Department of Information and Computing Sciences

Utrecht University

Computational Geometry and Spatial Data Mining


Clustering

Clustering?

  • Are the people clustered in this room?

    • How do we define a cluster?

  • In spatial data mining we have objects/entities with a location given by coordinates

  • Cluster definitions involve distance between locations

    • How do we define distance?


Clustering options

Clustering - options

  • Determine whether clustering occurs

  • Determine the degree of clustering

  • Determine the clusters

  • Determine the largest cluster

  • Determine the largest empty region

  • Determine the outliers


Co location

Co-location

  • Are the men clustered?

  • Are the women clustered?

  • Is there a co-location of men and women?

  • Determine regions favored exclusively by women. Men? Loners? Couples? Families?

  • Determine empty regions.


Co location1

Co-location

  • Like before, we may be interested in

    • is there co-location?

    • the degree of co-location

    • the largest co-location

    • the co-locations themselves

    • the objects not involved in co-location

    • Regions with no (or little) co-location


Spatio temporal data

Spatio-temporal data

  • Locations have a time stamp

  • Interesting patterns involve space and time

  • Anomalies?


Trajectory data

Trajectory data

  • Entities with a trajectory (time-stamped motion path)

  • Interesting patterns involve subgroups with similar heading, expected arrival, joint motion, ...

  • n entities = trajectories; n = 10 – 100,000

  • t time steps; t = 10 – 100,000; input size is nt

  • m = size of subgroup (unknown); m = 10 – 100,000


Examples of trajectory data

Examples of trajectory data

  • Tracked animals (buffalo, birds, ...)

  • Tracked people (potential terrorists)

  • Tracked GSMs (e.g. for traffic purposes)

  • Trajectories of tornadoes

  • Sports scene analysis (players on a soccer field)


Example pattern in trajectories

Example pattern in trajectories

  • What is the location visited by most entities?

location = circular region of specified radius


Example pattern in trajectories1

Example pattern in trajectories

  • What is the location visited by most entities?

location = circular region of specified radius

4 entities


Example pattern in trajectories2

Example pattern in trajectories

  • What is the location visited by most entities?

location = circular region of specified radius

3 entities


Example pattern in trajectories3

Example pattern in trajectories

  • Compute buffer of each trajectory


Example pattern in trajectories4

Example pattern in trajectories

  • Compute buffer of each trajectory

  • Compute the arrangement of the buffers and the cover count of each cell

[Figure: arrangement of the trajectory buffers; cells are labeled with their cover counts (0, 1, or 2)]


Example pattern in trajectories5

Example pattern in trajectories

  • One trajectory has t time stamps; its buffer can be computed in O(t log t) time

  • All buffers can be computed in O(nt log t) time

  • The arrangement can be computed in O(nt log (nt) + k) time, where k = O((nt)²) is the complexity of the arrangement

  • Cell cover counts are determined in O(k) time


Example pattern in trajectories6

Example pattern in trajectories

  • Total: O(nt log (nt) + k) time

  • If the most visited location is visited by m entities, then k = O(ntm), so the total is O(nt log (nt) + ntm)

  • Note: input size is nt; n entities, each with a location at t moments
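A point lies inside the buffer of a trajectory exactly when the trajectory passes within the given radius of that point, so the cover count of a cell equals the number of trajectories that visit a disc centered anywhere in that cell. The sketch below is a brute-force check of one candidate center in O(nt) time. It is not the arrangement algorithm from the slides. The function names and the input layout (each trajectory as a list of (x, y) tuples) are assumptions made for illustration.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the supporting line and clamp to the segment.
    u = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def visits(trajectory, center, radius):
    """True if the polygonal trajectory comes within radius of center."""
    if len(trajectory) == 1:
        return math.hypot(trajectory[0][0] - center[0],
                          trajectory[0][1] - center[1]) <= radius
    return any(point_segment_distance(center, a, b) <= radius
               for a, b in zip(trajectory, trajectory[1:]))

def visitor_count(trajectories, center, radius):
    """Number of trajectories whose buffer of the given radius covers center."""
    return sum(1 for traj in trajectories if visits(traj, center, radius))

# Hypothetical example data: three straight trajectories.
trajectories = [
    [(0.0, 0.0), (4.0, 0.0)],
    [(0.0, 1.0), (4.0, 1.0)],
    [(0.0, 10.0), (4.0, 10.0)],
]
print(visitor_count(trajectories, (2.0, 0.5), 1.0))
```

Evaluating this at one point per cell would reproduce the cover counts, but the arrangement-based approach avoids recomputing distances for every cell.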


Patterns in entity data

Patterns in entity data

Spatial data

  • n points (locations)

  • Distance is important

    • clustering pattern

  • Presence of attributes (e.g. man/woman):

    • co-location patterns

Spatio-temporal data

  • n trajectories, each has t time steps

  • Distance is time-dependent

    • flock pattern

    • meet pattern

  • Heading and speed are important and are also time-dependent


Entities in subdivisions

Entities in subdivisions

  • Also co-location pattern

  • Discovered simply by overlay, e.g., occurrences of oaks on different soil types


Clustering entities in subdivisions

Clustering entities in subdivisions

  • What if it is known that the entities only occur in regions of a certain type?

[Figure: situation without subdivision; bird nests, with the radius of a cluster indicated]


Clustering entities in subdivisions1

Clustering entities in subdivisions

  • What if it is known that the entities only occur in regions of a certain type?

[Figure: situation with a land-water subdivision; bird nests, with the radius of a cluster indicated]


Clustering entities in subdivisions2

Clustering entities in subdivisions

[Figure labels: house, car, burglary]


Region restricted clustering

Region-restricted clustering

Joint research with Joachim Gudmundsson (NICTA, Sydney) and Giri Narasimhan (U of F, Miami), 2006

  • Determine clusters in point sets that are sensitive to the geographic context (at least, for the relevant aspects)

  • Assume that a set of regions is given, and points can only lie inside these regions. How should we define clusters?


Region restricted clustering1

Region-restricted clustering

  • Given a set P of points, a set F of regions, a radius r and a subset size m, a region-restricted cluster is a subset P’ ⊆ P inside a circle C where

    • P’ has size at least m

    • C has radius at most 2r

    • C contains at most πr² area of regions of F

[Figure: circle C of radius ≤ 2r around a cluster; the sum of the region areas inside C is ≤ πr², the area of a circle of radius r]


Region restricted clustering2

Region-restricted clustering

  • Given a set P of n points, a set F of polygons with n_f edges in total, and values for r and m, report all region-restricted clusters of exactly m points

  • Exactly m points?

  • “Real” clustering (partition)?

  • Outliers?


Region restricted clustering3

Region-restricted clustering

  • Exactly m points? Every cluster with > m points consists of clusters with m points with smaller circles

  • “Real” clustering (partition)?

  • Outliers?

m = 5


Region restricted clustering5

Region-restricted clustering

  • Determine all smallest circles with m points of P inside

  • Test if the radius is ≤ r (report) or > 2r (discard)

  • If the radius is in between, determine the area of regions of F inside
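The radius test in this outline can be sketched as a small classification function. The sketch assumes that the area bound in the cluster definition is the area of a circle of radius r, so that a circle with radius at most r can never contain too much region area and can be reported without an area computation. The function name and the returned labels are hypothetical.

```python
def classify_circle(radius, r):
    """Decide what to do with a smallest circle around m points.

    Assumption: the area bound equals the area of a circle of radius r,
    so a circle with radius <= r always satisfies it.
    """
    if radius <= r:
        return "report"      # area condition holds automatically
    if radius > 2 * r:
        return "discard"     # violates the radius bound of 2r
    return "check_area"      # the area of F inside must be computed

print(classify_circle(1.5, 1.0))
```

Only circles that fall in the "check_area" case need the more expensive area computation.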


Region restricted clustering step 1

Region-restricted clustering: Step 1

  • Determine all minimal circles with m points of P inside

  • Determine all minimal circles with 3 points of P inside


Computational geometry and spatial data mining

[Figure: ordinary (order-1) Voronoi diagram]


Region restricted clustering6

Region-restricted clustering

  • Determine all smallest circles with m points of P inside

    • Use (m-2)-th order Voronoi diagram: cells where the same (m-2) points are closest

    • Its vertices are centers of smallest circles around exactly m points


[Figures: ordinary (order-1), order-2, and order-3 Voronoi diagrams]


Region restricted clustering7

Region-restricted clustering

  • The m-th (or (m-2)-th) order Voronoi diagram has O(nm) cells, edges, and vertices

  • It can be constructed in O(nm log n) time ⇒ we get O(nm) smallest circles with m points inside; for each we also know the radius


Region restricted clustering8

Region-restricted clustering

2. Test if the radius is ≤ r (report) or > 2r (discard)

Trivial in O(1) time per circle, so in O(nm) time overall


Region restricted clustering9

Region-restricted clustering

3. Determine the area of regions of F inside

Brute force: O(n_f) time per circle, so O(nm · n_f) time overall


Region restricted clustering10

Region-restricted clustering

  • Complication: This need not give all region-restricted clusters!

    • Need to compute area of F inside a circle with moving center

    • Requires solving high-degree polynomials


Region restricted clusters

Region-restricted clusters

  • The anti-climax: we cannot give an exact algorithm!

  • If we take squares instead of circles, we can deal with the problem ...


Region restricted clustering11

Region-restricted clustering

3. Determine the area of regions of F inside

Brute force: O(n_f) time per square, so O(nm · n_f) time overall

The total time for steps 1, 2, and 3 is O(nm log n) + O(nm) + O(nm · n_f) = O(nm log n + nm · n_f)
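The brute-force area computation for one square can be sketched by clipping every polygon of F against the four sides of the square (Sutherland-Hodgman clipping) and summing the areas of the clipped pieces with the shoelace formula. The work per square is linear in the total number of polygon edges. The sketch assumes an axis-aligned square, simple polygons given as vertex lists, and regions that do not overlap each other. All function names are hypothetical.

```python
def clip_half_plane(polygon, inside, intersect):
    """One Sutherland-Hodgman pass: keep the part of polygon where inside(p) holds."""
    result = []
    for i, cur in enumerate(polygon):
        prev = polygon[i - 1]
        if inside(cur):
            if not inside(prev):
                result.append(intersect(prev, cur))
            result.append(cur)
        elif inside(prev):
            result.append(intersect(prev, cur))
    return result

def cross_x(x):
    """Intersection of a segment pq with the vertical line at x."""
    def f(p, q):
        t = (x - p[0]) / (q[0] - p[0])
        return (x, p[1] + t * (q[1] - p[1]))
    return f

def cross_y(y):
    """Intersection of a segment pq with the horizontal line at y."""
    def f(p, q):
        t = (y - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y)
    return f

def clip_to_square(polygon, cx, cy, half):
    """Clip polygon to the axis-aligned square with center (cx, cy) and side 2*half."""
    x_lo, x_hi, y_lo, y_hi = cx - half, cx + half, cy - half, cy + half
    passes = [
        (lambda p: p[0] >= x_lo, cross_x(x_lo)),
        (lambda p: p[0] <= x_hi, cross_x(x_hi)),
        (lambda p: p[1] >= y_lo, cross_y(y_lo)),
        (lambda p: p[1] <= y_hi, cross_y(y_hi)),
    ]
    for inside, intersect in passes:
        polygon = clip_half_plane(polygon, inside, intersect)
    return polygon

def polygon_area(polygon):
    """Unsigned area by the shoelace formula (0.0 for an empty polygon)."""
    total = 0.0
    for i, (x1, y1) in enumerate(polygon):
        x0, y0 = polygon[i - 1]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def region_area_in_square(regions, cx, cy, half):
    """Total area of the (assumed disjoint) regions inside the square."""
    return sum(polygon_area(clip_to_square(poly, cx, cy, half)) for poly in regions)

# Hypothetical example: one square region of side 2.
regions = [[(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]]
print(region_area_in_square(regions, 2.0, 2.0, 1.0))
```

A data structure that answers such area queries faster than a scan over all edges replaces exactly this per-square loop.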


Region restricted clustering12

Region-restricted clustering

3. Determine the area of regions of F inside

Using a suitable data structure (only possible for squares): O(log² n_f) time per square, so O(nm log² n_f) time overall

The total time becomes O(nm log n + n_f log² n_f + nm log² n_f), where

  • O(nm log n) is the order-(m-2) VD construction

  • O(n_f log² n_f) is the preprocessing of the data structure

  • O(nm log² n_f) is the total query time in the data structure


Region restricted clustering13

Region-restricted clustering

  • The squares solution generalizes to regular polygons (e.g. 20-gons)

  • An approximation of the radius within (1+ε)r gives an O(n/ε² + n_f log² n_f + n log n_f / (mε²)) time algorithm

[Figure: 16-gon]


Region restricted clustering14

Region-restricted clustering

  • Open problems:

    • Develop a region-restricted version of k-means clustering, single link clustering, ...

    • Region-restricted co-location?

    • Replace region-restricted by gradual model

[Figure: gradual model comparing typical densities with cluster densities; labels 0, 2, 5, and 8 per unit]


Patterns in trajectories

Patterns in trajectories

  • n trajectories, each with t time steps ⇒ n polygonal lines with t vertices

  • Already looked at most visited location


Patterns in trajectories1

Patterns in trajectories

  • Flock: near positions of (sub)trajectories for some subset of the entities during some time

  • Convergence: same destination region for some subset of the entities

  • Encounter: same destination region with same arrival time for some subset of the entities

  • Similarity of trajectories

  • Same direction of movement, leadership, ......

[Figure: examples of a flock and a convergence]


Patterns in trajectories2

Patterns in trajectories

  • Flocking, convergence, encounter patterns

    • Laube, van Kreveld, Imfeld (SDH 2004)

    • Gudmundsson, van Kreveld, Speckmann (ACM GIS 2004)

    • Benkert, Gudmundsson, Huebner, Wolle (ESA 2006)

    • ...

  • Similarity of trajectories

    • Vlachos, Kollios, Gunopulos (ICDE 2002)

    • Shim, Chang (WAIM 2003)

    • ...

  • Lifelines, motion mining, modeling motion

    • Mountain, Raper (GeoComputation 2001)

    • Kollios, Sclaroff, Betke (DM&KD 2001)

    • Frank (GISDATA 8, 2001)

    • ...


Patterns in trajectories3

Patterns in trajectories

  • Flock: near positions of (sub)trajectories for some subset of the entities during some time

    • clustering-type pattern

    • different definitions are used

  • Given: radius r, subset size m, and duration T, a flock is a subset of size m that is inside a (moving) circle of radius r for a duration T


Patterns in trajectories4

Patterns in trajectories

  • Longest flock: given a radius r and subset size m, determine the longest time interval for which m entities were within each other’s proximity (circle radius r)

[Figure: trajectories with time stamps 0 to 8; for m = 3, the longest flock occurs in the time interval [1.8, 6.4]]


Patterns in trajectories5

Patterns in trajectories

  • Meet: near some position of (sub)trajectories for some subset of the entities

    • clustering-type pattern

  • Given: radius r, subset size m, and duration T, a meet is a subset of size m that is inside a (stationary) circle of radius r for a duration T (for a flock, the circle was moving)
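To make the meet definition concrete, the sketch below finds the longest run of consecutive time steps during which at least m entities lie inside one stationary circle. It makes strong simplifying assumptions: the circle center is given rather than searched for, positions are only checked at the sampled time steps (not in between), and the entities inside may change from step to step. The function name and the input layout (one list of (x, y) positions per entity, all of equal length) are hypothetical.

```python
import math

def longest_meet_run(trajectories, center, r, m):
    """Longest run of consecutive time steps with >= m entities within r of center.

    Returns (first_step, last_step) or None if no step qualifies.
    """
    steps = len(trajectories[0])
    best = None
    run_start = None
    for k in range(steps):
        inside = sum(1 for traj in trajectories
                     if math.dist(traj[k], center) <= r)
        if inside >= m:
            if run_start is None:
                run_start = k
            if best is None or k - run_start > best[1] - best[0]:
                best = (run_start, k)
        else:
            run_start = None
    return best

# Hypothetical example: A stays at the center, B visits during steps 1 to 3,
# C never comes close.
traj_a = [(0.0, 0.0)] * 5
traj_b = [(5.0, 0.0), (0.5, 0.0), (0.5, 0.0), (0.5, 0.0), (5.0, 0.0)]
traj_c = [(5.0, 5.0)] * 5
print(longest_meet_run([traj_a, traj_b, traj_c], (0.0, 0.0), 1.0, 2))
```

Requiring the same m entities throughout the run would turn this into the stricter reading of the definition.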


Patterns in trajectories6

Patterns in trajectories

  • The same subset required for a flock or meet?

Example: meet with m = 4; duration is 3+ time steps or 4+ time steps?


Patterns in trajectories7

Patterns in trajectories

[Figure: examples for m = 3 of a flock and a meet, each with a fixed subset and with a variable subset]


Patterns in trajectories8

Patterns in trajectories

Exact results (input size is nτ: n entities, τ time steps)

  • Flock, fixed subset: NP-hard

  • Flock, variable subset: O(n³τ log n)

  • Meet, fixed subset: O(n⁴τ² log n + n²τ³)

  • Meet, variable subset: O(n⁴τ² log n + n²τ³)


Patterns in trajectories9

Patterns in trajectories

  • A radius-2 approximation of the longest flock can be computed in O(n²τ log n) time

  • Meaning: if the longest flock of size m for radius r has duration T, then we surely find a flock of size m and duration at least T for radius 2r

[Figure: the longest flock for radius r, and a flock that is at least as long for radius 2r]


Patterns in trajectories10

Patterns in trajectories

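Approximate radius results (input size is nτ: n entities, τ time steps)

  • Flock, fixed subset: O(n²τ log n) with radius factor 2 (exact: NP-hard)

  • Flock, variable subset: O((n²τ log n) / ε²) with radius factor 2+ε (exact: O(n³τ log n))

  • Meet, fixed subset: O((n²τ log n) / (mε²)) with radius factor 1+ε (exact: O(n⁴τ² log n + n²τ³))

  • Meet, variable subset: O((n²τ log n) / (mε²)) with radius factor 1+ε (exact: O(n⁴τ² log n + n²τ³))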


Fixed subset flock

Fixed subset flock

  • It is NP-complete to decide if a graph has a subgraph with m nodes that is a clique

  • For every node of the graph, make an entity with a trajectory

[Figure: graph with nodes v1, ..., v7 and the trajectories built from it, with radius r indicated; v1 is not adjacent to v4, v5, and v7, and the figure marks where all nodes not adjacent to v1 go]


Fixed subset flock1

Fixed subset flock

[Figure: the graph on v1, ..., v7 and its trajectories, showing the positions for "v4 in flock" and "v4 not in flock"]


Fixed subset flock2

Fixed subset flock

[Figure: the graph on v1, ..., v7 and its trajectories; flock {v4, v5, v7} of (full) duration 23 (3·7+2) and size 3]

The trajectories have a fixed flock of size m and full duration if and only if the graph has a clique of size m
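The graph side of this equivalence is the clique decision problem. A minimal brute-force sketch of that decision is given below; it tries every subset of m nodes, which takes exponential time in general and is consistent with the problem being NP-complete. The adjacency-set representation, the function name, and the example graph are hypothetical and are not the graph from the slides.

```python
from itertools import combinations

def has_clique(adjacency, m):
    """True if some m nodes are pairwise adjacent (brute force over all subsets)."""
    nodes = list(adjacency)
    return any(
        all(v in adjacency[u] for u, v in combinations(group, 2))
        for group in combinations(nodes, m)
    )

# Hypothetical example graph: a triangle 1-2-3 with a pendant node 4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(has_clique(graph, 3), has_clique(graph, 4))
```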


Fixed subset flock3

Fixed subset flock

  • Longest fixed flock is NP-hard

  • Max clique has no approximation ⇒ cannot approximate duration, nor flock size

  • The reduction applies for all radii < 2r

[Figure: trajectories of v1, ..., v7 with the positions for "v4 in flock" and "v4 not in flock"]


Flock and meet algorithms

Flock and meet algorithms

  • Go into 3D (space-time) for algorithms

[Figure: space-time view with a vertical time axis from 0 to 4, showing the duration of a flock and the duration of a meet]


Fixed subset flock approximation

Fixed subset flock, approximation

  • An efficient radius-2 approximation algorithm of longest fixed flock exists

  • Idea: if some v_i is in the longest flock, then all other entities of that flock are within distance 2r from v_i

[Figure: a flock with v_i, contained in the circle of radius 2r centered at v_i]


Fixed subset flock approximation1

Fixed subset flock, approximation

  • For each v_j, we can determine the O(τ) time intervals where v_j is in the column of v_i

  • Maintain the intersections for all entities in an augmented tree in O(nτ log n) time

  • Do this for all columns (role of v_i) and report the longest overall pattern

Total: O(n²τ log n) time
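The intervals in the first step can be computed one time step at a time. If both entities move linearly between two consecutive time stamps, their relative position is linear in the step parameter s in [0, 1], so the squared distance is a convex quadratic in s and the condition "distance ≤ 2r" holds on at most one sub-interval per time step. The sketch below solves that quadratic. The function name and argument layout are hypothetical, and linear motion between time stamps is an assumption.

```python
import math

def close_interval(pi0, pi1, pj0, pj1, dist):
    """Sub-interval [s0, s1] of [0, 1] where entities i and j are within dist.

    pi0, pi1: positions of i at the start and end of the time step.
    pj0, pj1: positions of j at the start and end of the time step.
    Returns None if they are never within dist during this step.
    """
    # Relative position d(s) = d0 + s * dv for s in [0, 1].
    d0x, d0y = pj0[0] - pi0[0], pj0[1] - pi0[1]
    d1x, d1y = pj1[0] - pi1[0], pj1[1] - pi1[1]
    dvx, dvy = d1x - d0x, d1y - d0y
    # |d(s)|^2 - dist^2 = a s^2 + b s + c
    a = dvx * dvx + dvy * dvy
    b = 2.0 * (d0x * dvx + d0y * dvy)
    c = d0x * d0x + d0y * d0y - dist * dist
    if a == 0.0:
        # No relative motion: the distance is constant during the step.
        return (0.0, 1.0) if c <= 0.0 else None
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    s0 = max(0.0, (-b - root) / (2.0 * a))
    s1 = min(1.0, (-b + root) / (2.0 * a))
    return (s0, s1) if s0 <= s1 else None

# Hypothetical example with r = 1: i stands still, j passes straight through.
print(close_interval((0.0, 0.0), (0.0, 0.0), (-4.0, 0.0), (4.0, 0.0), 2.0))
```

Calling this once per time step and merging adjacent results yields a number of intervals that is linear in the number of time steps.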


Variable subset flock exact

Variable subset flock, exact

  • The subset that forms the flock may change entities, but must stay of size m

  • Any flock subset at any instant has a disk D of radius r with at least 2 entities on the boundary: the defining entities

[Figure: disk of radius r with its two defining entities on the boundary]


Variable subset flock exact1

Variable subset flock, exact

  • Two entities define two cylinders through time by tracing the two possible radius r disks


Variable subset flock exact12

Variable subset flock, exact

  • A critical moment is where another entity is on the boundary of the disk; it may go outside or inside


Variable subset flock exact13

Variable subset flock, exact

  • At a critical moment:

    • a variable subset flock may start (m entities)

    • a variable subset flock may stop (<m entities)

    • Three pairs of defining entities have disks that coincide

  • There are also critical moments when two entities are at distance exactly 2r

  • Between two time steps t_i and t_{i+1} there are O(n³) critical moments ⇒ in total there are O(n³τ) critical moments, where τ is the number of time steps

[Figure: two entities at distance exactly 2r]


Variable subset flock exact14

Variable subset flock, exact

  • Let theO(n3) critical moments be the nodes in a directed acyclic graph G

  • Edges of G are between two consecutive critical moments of the same two defining entities

    • directed from earlier to later

    • weight is time between critical moments

    • only if at least m entities are inside the disk

A longest variable subset flock is a maximum weight path in G

time


Variable subset flock exact15

Variable subset flock, exact

  • The graph G can be built in O(n³τ log n) time, where τ is the number of time steps

  • A maximum weight path can be found in O(n³τ log n) time

A longest variable subset flock is a maximum weight path in G
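Finding a maximum weight path in a directed acyclic graph takes one pass over the edges in topological order. The sketch below assumes that nodes are numbered by increasing time, so that every edge goes from a lower index to a higher index and sorting the edges by tail node gives a valid processing order. It only illustrates the path computation, not the construction of G, and the function name and edge format are hypothetical.

```python
def longest_path_weight(num_nodes, edges):
    """Maximum total weight of a path in a DAG.

    edges: iterable of (u, v, w) with u < v, i.e. nodes are indexed in
    time order (an assumption that makes index order a topological order).
    """
    best = [0.0] * num_nodes  # best[v] = max weight of a path ending at v
    # Every edge into u has a smaller tail than u, so after sorting it is
    # relaxed before any edge that leaves u.
    for u, v, w in sorted(edges):
        best[v] = max(best[v], best[u] + w)
    return max(best, default=0.0)

# Hypothetical example: edge weights stand for the time between critical moments.
edges = [(0, 1, 2.0), (1, 2, 3.0), (0, 2, 4.0), (2, 3, 1.0)]
print(longest_path_weight(4, edges))
```

The sort costs a logarithmic factor in the number of edges; the pass itself is linear in the size of the graph.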


Patterns in trajectories summary

Patterns in trajectories, summary

  • Flock and meet patterns require algorithms in 3-dimensional space (space-time)

  • Exact algorithms are inefficient ⇒ only suitable for smaller data sets

  • Approximation can reduce running time by one or two orders of magnitude


Patterns in trajectories summary1

Patterns in trajectories, summary

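  • Flock, fixed subset

    • approximation: O(n²τ log n), radius factor 2

    • exact: NP-hard

  • Flock, variable subset

    • approximation: O((n²τ log n) / ε²), radius factor 2+ε

    • exact: O(n³τ log n)

  • Meet, fixed subset

    • approximation: O((n²τ log n) / (mε²)), radius factor 1+ε

    • exact: O(n⁴τ² log n + n²τ³)

  • Meet, variable subset

    • approximation: O((n²τ log n) / (mε²)), radius factor 1+ε

    • exact: O(n⁴τ² log n + n²τ³)

Here n is the number of entities and τ the number of time steps.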


Future research on longest trajectories

Future research on longest trajectories

  • Faster exact and approximation algorithms

  • Better approximation factors

  • Remove restriction of fixed shape of flocking region (compact or elongated both possible during same flock)

  • Longest duration convergence

[Figure: longest convergence]


Patterns in trajectories11

Patterns in trajectories

  • Flock and meet patterns require algorithms in 3-dimensional space (space-time)

  • Exact algorithms are inefficient ⇒ only suitable for smaller data sets

  • Approximation can reduce running time by an order of magnitude


To conclude

To conclude

  • With an exact definition of a spatial or spatio-temporal pattern, geometric algorithms can be used to compute all patterns

  • Many known structures from computational geometry are useful (Voronoi diagrams, arrangements, ...)

  • Since the (exact) algorithms may be inefficient, approximation may be a solution


To discuss

To discuss

  • What patterns must be detected in practice (both spatial and spatio-temporal)?

  • What is the most appropriate definition (formalization) of these?

  • Spatial association rules, auto-correlation, irregularities, classification, ... and other computable things in spatial/spatio-temporal data mining

