Clustering

An overview of clustering algorithms

Dènis de Keijzer

GIA 2004

Overview
  • Algorithms
    • GRAVIclust
    • AUTOCLUST
    • AUTOCLUST+
    • 3D Boundary-based Clustering
    • SNN
Gravity based spatial clustering
  • GRAVIclust
    • Initialisation Phase
      • calculate the initial cluster centres
    • Optimisation Phase
      • improve the positions of the cluster centres so as to minimise the distance function
GRAVIclust: Initialisation Phase
  • Input:
    • set of points P
    • matrix of distances between all pairs of points
      • assumption: actual access path distance
      • exists in GIS maps
        • e.g. http://www.transinfo.qld.gov.au
      • very versatile
        • footpath
        • road map
        • rail map
    • # of required clusters k
GRAVIclust: Initialisation Phase
  • Step 1:
    • calculate first initial centre
      • the point with the largest number of points within radius r
      • remove first initial centre & all points within radius r from further consideration
  • Step 2:
    • repeat Step 1 until k initial centres have been chosen
  • Step 3:
    • create initial clusters by assigning all points to the closest cluster centre (see the sketch below)
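A minimal Python sketch of this initialisation, assuming a precomputed pairwise distance matrix dist (access-path distances, as in the input slide) and a radius r chosen as on the next slides; all names are illustrative, not from the paper.

```python
import numpy as np

def initial_centres(dist, k, r):
    """Sketch of the GRAVIclust initialisation steps above.
    dist[i, j] is the precomputed distance between points i and j."""
    n = dist.shape[0]
    remaining = set(range(n))
    centres = []
    while len(centres) < k and remaining:
        # Step 1: pick the point with the most points within radius r ...
        counts = {i: sum(1 for j in remaining if dist[i, j] <= r) for i in remaining}
        c = max(counts, key=counts.get)
        centres.append(c)
        # ... and drop it, together with everything inside its radius
        remaining -= {j for j in remaining if dist[c, j] <= r}
    # Step 3: assign every point to its closest initial centre
    labels = np.array([min(centres, key=lambda c: dist[i, c]) for i in range(n)])
    return centres, labels
```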
GRAVIclust: radius calculation
  • Radius r
    • calculated based on the area of the region considered for clustering
    • static radius
      • based on the assumption that all clusters are of the same size
    • dynamic radius
      • recalculated after each initial cluster centre is chosen
GRAVIclust: Static vs. Dynamic
  • Static
    • reduced computation
    • # points within a radius r has to be calculated only once
    • not suitable for problems where the points are separated by large empty areas
  • Dynamic
    • increases computation time
    • ensures the radius is adjusted as the points are removed
  • Static and dynamic radii differ only when the point distribution is non-uniform
GRAVIclust: Optimisation Phase
  • Step 1:
    • for each cluster, calculate a new centre
      • based on the point closest to the cluster's centre of gravity
  • Step 2:
    • re-assign points to the new cluster centres
  • Step 3:
    • recalculate the distance function
      • never greater than the previous value
  • Step 4:
    • repeat Steps 1 to 3 until the value of the distance function equals the previous value (see the sketch below)
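A sketch of the optimisation loop under the same assumptions (point coordinates, the distance matrix dist, and the centres and labels produced by the initialisation sketch); the empty-cluster guard is an addition of this sketch, not something the slides mention.

```python
import numpy as np

def optimise(points, dist, centres, labels):
    """Iterate Steps 1-3 until the distance function stops improving."""
    points = np.asarray(points, dtype=float)
    prev_cost = np.inf
    while True:
        new_centres = []
        for c in centres:
            members = np.where(labels == c)[0]
            if len(members) == 0:            # guard: keep an empty cluster's old centre
                new_centres.append(c)
                continue
            # Step 1: new centre = member point closest to the cluster's centre of gravity
            gravity = points[members].mean(axis=0)
            new_centres.append(members[np.argmin(
                np.linalg.norm(points[members] - gravity, axis=1))])
        centres = new_centres
        # Step 2: re-assign every point to its nearest new centre
        labels = np.array([min(centres, key=lambda c: dist[i, c])
                           for i in range(len(points))])
        # Step 3: recompute the distance function; it never increases
        cost = dist[np.arange(len(points)), labels].sum()
        if cost >= prev_cost:                # Step 4: stop once it equals the previous value
            return centres, labels
        prev_cost = cost
```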
GRAVIclust
  • Deterministic
  • Can handle obstacles
  • Monotonic convergence of the distance function to a stable point
AUTOCLUST
  • Definitions
AUTOCLUST
  • Definitions II
AUTOCLUST
  • Phase 1:
    • finding boundaries
  • Phase 2:
    • restoring and re-attaching
  • Phase 3:
    • detecting second-order inconsistency
AUTOCLUST: Phase 1
  • Finding boundaries
    • Calculate
      • Delaunay Diagram
      • for each point pi
        • ShortEdges(pi)
        • LongEdges(pi)
        • OtherEdges(pi)
    • Remove
      • ShortEdges(pi) and LongEdges(pi)
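A Python sketch of Phase 1 using scipy's Delaunay triangulation. The "Definitions" slides are not reproduced in this transcript, so the classification criterion here is an assumption in the usual AUTOCLUST style: an edge incident to pi counts as short if it is shorter than LocalMean(pi) − MeanStDev(P), long if longer than LocalMean(pi) + MeanStDev(P), and "other" in between.

```python
import numpy as np
from scipy.spatial import Delaunay

def classify_edges(points):
    """Phase 1 sketch: build the Delaunay diagram and split each point's
    incident edges into ShortEdges, LongEdges and OtherEdges (assumed criteria)."""
    points = np.asarray(points, dtype=float)
    tri = Delaunay(points)
    # unique Delaunay edges, taken from the triangulation's triangles
    edges = {tuple(sorted((int(s[a]), int(s[b]))))
             for s in tri.simplices for a, b in [(0, 1), (1, 2), (0, 2)]}
    incident = {i: [] for i in range(len(points))}
    for i, j in edges:
        length = float(np.linalg.norm(points[i] - points[j]))
        incident[i].append((j, length))
        incident[j].append((i, length))
    local_mean = {i: np.mean([l for _, l in nbrs]) for i, nbrs in incident.items()}
    mean_st_dev = np.mean([np.std([l for _, l in nbrs]) for nbrs in incident.values()])
    short, long_, other = {}, {}, {}
    for i, nbrs in incident.items():
        short[i] = [j for j, l in nbrs if l < local_mean[i] - mean_st_dev]
        long_[i] = [j for j, l in nbrs if l > local_mean[i] + mean_st_dev]
        other[i] = [j for j, l in nbrs
                    if local_mean[i] - mean_st_dev <= l <= local_mean[i] + mean_st_dev]
    return short, long_, other
```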
AUTOCLUST: Phase 2
  • Restoring and re-attaching
    • for each point pi where ShortEdges(pi) ≠ ∅
      • Determine a candidate connected component C for pi
        • If there are 2 edges ej = (pi, pj) and ek = (pi, pk) in ShortEdges(pi) with CC[pj] ≠ CC[pk], then
          • Compute, for each edge e = (pi, pj) ∈ ShortEdges(pi), the size ||CC[pj]||, and let M = max over all e = (pi, pj) ∈ ShortEdges(pi) of ||CC[pj]||
          • Let C be the class label of the largest connected component (if there are two different connected components with cardinality M, let C be the one with the shortest edge to pi)
        • Otherwise, let C be the label of the connected component that all edges e ∈ ShortEdges(pi) connect pi to
      • If the edges in OtherEdges(pi) connect to a connected component different from C, remove them. Note that
        • all edges in OtherEdges(pi) are removed, and
        • only in this case will pi swap connected components
      • Add all edges e ∈ ShortEdges(pi) that connect pi to C
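A sketch of just the choice of candidate component C described above, under assumed data structures: short_edges[pi] lists the endpoints of pi's short edges, cc[p] is p's connected-component label after Phase 1, sizes[label] is that component's size, and length(pi, pj) returns an edge length.

```python
def candidate_component(pi, short_edges, cc, sizes, length):
    """Pick the candidate connected component C for point pi (Phase 2 sketch)."""
    labels = {cc[pj] for pj in short_edges[pi]}
    if len(labels) == 1:
        # every short edge agrees on one component: that component is C
        return labels.pop()
    # otherwise: the largest component reachable through a short edge,
    # ties broken by the shortest edge from pi
    m = max(sizes[lab] for lab in labels)
    best = min((pj for pj in short_edges[pi] if sizes[cc[pj]] == m),
               key=lambda pj: length(pi, pj))
    return cc[best]
```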
AUTOCLUST: Phase 3
  • Detecting second-order inconsistency
    • compute the LocalMean for 2-neighbourhoods
    • remove all edges in N2,G(pi) that are long edges
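A hedged sketch of this second-order check, assuming the graph kept after Phase 2 is an adjacency dict of sets and the same long-edge criterion assumed in the Phase 1 sketch; the 2-neighbourhood edge set N2,G(pi) is taken here to mean "edges incident to pi or to one of its neighbours".

```python
import numpy as np

def phase3(points, graph):
    """Phase 3 sketch: per point, gather the edges of its 2-neighbourhood,
    compute their local mean length, and drop the exceptionally long ones."""
    points = np.asarray(points, dtype=float)

    def length(i, j):
        return float(np.linalg.norm(points[i] - points[j]))

    # MeanStDev over the current graph (same assumed definition as in Phase 1)
    mean_st_dev = np.mean([np.std([length(i, j) for j in nbrs])
                           for i, nbrs in graph.items() if nbrs])
    to_remove = set()
    for p, nbrs in graph.items():
        hood = {p} | set(nbrs)
        n2_edges = {tuple(sorted((i, j))) for i in hood for j in graph[i]}
        if not n2_edges:
            continue
        local_mean = np.mean([length(i, j) for i, j in n2_edges])
        to_remove |= {e for e in n2_edges if length(*e) > local_mean + mean_st_dev}
    for i, j in to_remove:
        graph[i].discard(j)
        graph[j].discard(i)
    return graph
```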
AUTOCLUST
  • No user supplied arguments
    • eliminates expensive human-based exploration time for finding best-fit arguments
  • Robust to noise, outliers, bridges and type of distribution
  • Able to detect clusters with arbitrary shapes, different sizes and different densities
  • Can handle multiple bridges
  • O(n log n)
AUTOCLUST+
  • Construct Delaunay Diagram
  • Calculate MeanStDev(P)
  • For all edges e, remove e if it intersects any obstacle
  • Apply the 3 phases of AUTOCLUST to the planar graph resulting from the previous steps
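A sketch of the obstacle step only: every Delaunay edge that crosses an obstacle segment is dropped before the three AUTOCLUST phases run. Obstacles are assumed to be given as segments, and the orientation-based intersection test ignores collinear corner cases.

```python
import numpy as np

def remove_obstacle_edges(points, edges, obstacles):
    """Drop every edge (i, j) whose segment crosses an obstacle segment (a, b)."""
    points = np.asarray(points, dtype=float)

    def side(a, b, c):
        # which side of line a-b the point c lies on (2D orientation test)
        return np.sign((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))

    def crosses(p1, p2, q1, q2):
        return (side(p1, p2, q1) != side(p1, p2, q2) and
                side(q1, q2, p1) != side(q1, q2, p2))

    kept = []
    for i, j in edges:
        blocked = any(crosses(points[i], points[j],
                              np.asarray(a, dtype=float), np.asarray(b, dtype=float))
                      for a, b in obstacles)
        if not blocked:
            kept.append((i, j))
    return kept
```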
3D Boundary-based Clustering
  • Benefits from 3D Clustering
    • more accurate spatial analysis
    • distinguish
      • positive clusters:
        • clusters in higher dimensions but not in lower dimensions
      • negative clusters:
        • clusters in lower dimensions but not in higher dimensions
3D Boundary-based Clustering
  • Based on AUTOCLUST
  • Uses Delaunay Tetrahedrizations
  • Definitions:
    • ej potential inter-cluster edge if:
3D Boundary-based Clustering
  • Phase I
    • For all the piP, classify each edge ej incident to pi into one of three groups
      • ShortEdges(pi) when the length of ej is less than the range in AI(pi)
      • LongEdges(pi) when the length of ej is greater than the range in AI(pi)
      • OtherEdges(pi) when the length of ej is within AI(pi)
    • For all the piP, remove all edges in ShortEdges(pi) and LongEdges(pi)
3D Boundary-based Clustering
  • Phase II
    • Recuperate ShortEdges(pi) incident to border points using connected component analysis
  • Phase III
    • Remove exceptionally long edges in local regions
Shared Nearest Neighbour
  • Clustering in higher dimensions
    • Distances or similarities between points become more uniform, making clustering more difficult
    • Also, similarity between points can be misleading
      • i.e. a point can be more similar to a point that “actually” belongs to a different cluster
    • Solution
      • Shared nearest neighbor approach to similarity
SNN: An alternative definition of similarity
  • Euclidean distance
    • most common distance metric used
    • while useful in low dimensions, it doesn’t work well in high dimensions
SNN: An alternative definition of similarity
  • Define similarity in terms of their shared nearest neighbours
    • the similarity of the points is “confirmed” by their common shared nearest neighbours
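A small sketch of this similarity: the similarity of two points is the number of neighbours their k-nearest-neighbour lists share. Some SNN variants additionally require the two points to appear in each other's lists; the slides do not say which variant is meant.

```python
import numpy as np

def snn_similarity(points, k):
    """SNN similarity matrix: entry (i, j) counts the shared k-nearest neighbours."""
    points = np.asarray(points, dtype=float)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                      # a point is not its own neighbour
    knn = [set(np.argsort(row)[:k]) for row in d]    # k nearest neighbours per point
    n = len(points)
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = len(knn[i] & knn[j])
    return sim
```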
SNN: An alternative definition of density
  • SNN similarity, with the k-nearest neighbour approach
    • if the k-nearest neighbours of a point are close with respect to SNN similarity, then we say that there is a high density at this point
    • since it reflects the local configuration of the points in the data space, it is relatively insensitive to variations in density and the dimensionality of the space
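The slides describe the idea but not a formula, so the sketch below uses one common choice as an assumption: the density of a point is the number of points whose SNN similarity to it reaches a user-chosen threshold eps.

```python
import numpy as np

def snn_density(sim, eps):
    """SNN density sketch: for each point, count how many points have
    SNN similarity of at least eps with it (eps is an assumed threshold)."""
    sim = np.asarray(sim)
    return (sim >= eps).sum(axis=1)
```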
SNN: Algorithm
  • Compute the similarity matrix
    • corresponds to a similarity graph with data points for nodes and edges whose weights are the similarities between data points
  • Sparsify the similarity matrix by keeping only the k most similar neighbours
    • corresponds to keeping only the k strongest links of the similarity graph
  • Construct the shared nearest neighbour graph from the sparsified similarity matrix
  • Find the SNN density of each point
  • Find the core points
  • Form clusters from the core points
  • Discard all noise points
  • Assign all non-noise, non-core points to clusters (the sketch below puts these steps together)
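A hedged end-to-end sketch of the listed steps, reusing the snn_similarity and snn_density helpers from the earlier sketches; eps and min_pts are assumed, DBSCAN-style thresholds (the slides name the steps but not the parameters), and the full similarity matrix stands in for the sparsified graph.

```python
import numpy as np

def snn_cluster(points, k, eps, min_pts):
    """End-to-end SNN sketch following the step list above."""
    sim = snn_similarity(points, k)                # steps 1-3: SNN similarity graph
    density = snn_density(sim, eps)                # step 4: SNN density of each point
    core = np.where(density >= min_pts)[0]         # step 5: core points
    labels = -np.ones(len(points), dtype=int)      # -1 = noise / unassigned
    # step 6: form clusters by connecting core points that are SNN-similar enough
    cluster_id = 0
    for c in core:
        if labels[c] != -1:
            continue
        labels[c] = cluster_id
        stack = [c]
        while stack:
            p = stack.pop()
            for q in core:
                if labels[q] == -1 and sim[p, q] >= eps:
                    labels[q] = cluster_id
                    stack.append(q)
        cluster_id += 1
    # steps 7-8: points far from every core point stay noise; the rest join the
    # cluster of their most similar core point
    for p in range(len(points)):
        if labels[p] == -1 and len(core):
            nearest_core = core[np.argmax(sim[p, core])]
            if sim[p, nearest_core] >= eps:
                labels[p] = labels[nearest_core]
    return labels
```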
Shared Nearest Neighbour
  • Finds clusters of varying shapes, sizes, and densities, even in the presence of noise and outliers
  • Handles data of high dimensionality and varying densities
  • Automatically detects the # of clusters