Clustering. An overview of clustering algorithms Dènis de Keijzer GIA 2004. Overview. Algorithms GRAVIclust AUTOCLUST AUTOCLUST+ 3D Boundary-based Clustering SNN. Gravity based spatial clustering. GRAVIclust Initialisation Phase calculate the initial centre clusters

Clustering

An overview of clustering algorithms

Dènis de Keijzer

GIA 2004


Overview

  • Algorithms

    • GRAVIclust

    • AUTOCLUST

    • AUTOCLUST+

    • 3D Boundary-based Clustering

    • SNN


Gravity based spatial clustering

  • GRAVIclust

    • Initialisation Phase

      • calculate the initial cluster centres

    • Optimisation Phase

      • improve the position of the cluster centres so as to achieve a solution which minimizes the distance function


GRAVIclust: Initialisation Phase

  • Input:

    • set of points P

    • matrix of distances between all pairs of points

      • assumption: actual access path distance

      • exists in GIS maps

        • e.g. http://www.transinfo.qld.gov.au

      • very versatile

        • footpath

        • road map

        • rail map


GRAVIclust: Initialisation Phase

  • Input:

    • set of points P

    • matrix of distances between all pairs of points

    • number of required clusters k


GRAVIclust: Initialisation Phase

  • Step 1:

    • calculate first initial centre

      • the point with the largest number of points within radius r

      • remove first initial centre & all points within radius r from further consideration

  • Step 2:

    • repeat Step 1 until k initial centres have been chosen

  • Step 3:

    • create initial clusters by assigning all points to the closest cluster centre
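
The three initialisation steps can be sketched in Python as below. This is only an illustration under assumptions: the radius r is passed in as a fixed (static) value because the slides do not give its formula, ties are broken by lowest index, and the function names are hypothetical.

```python
def initial_centres(dist, k, r):
    """Pick up to k initial centres from an n x n distance matrix (list of lists)."""
    available = set(range(len(dist)))
    centres = []
    while len(centres) < k and available:
        # Step 1: the available point with the most available points within radius r.
        # max() over a sorted list returns the lowest index on ties (assumed tie-break).
        best = max(
            sorted(available),
            key=lambda i: sum(1 for j in available if j != i and dist[i][j] <= r),
        )
        centres.append(best)
        # Remove the centre and all points within radius r from further consideration.
        available -= {j for j in available if dist[best][j] <= r}
    # Step 2 is the loop itself; fewer than k centres are returned if points run out.
    return centres


def assign_to_closest(dist, centres):
    """Step 3: map every point index to the index of its closest centre."""
    return [min(centres, key=lambda c: dist[i][c]) for i in range(len(dist))]
```

Because the distance matrix is an input, the same sketch works whether the distances are straight-line or access-path distances.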


GRAVIclust: radius calculation

  • Radius r

    • calculated based on the area of the region considered for clustering

    • static radius

      • based on the assumption that all clusters are of the same size

    • dynamic radius

      • recalculated after each initial cluster centre is chosen


GRAVIclust: Static vs. Dynamic

  • Static

    • reduced computation

    • the number of points within radius r has to be calculated only once

    • not suitable for problems where the points are separated by large empty areas

  • Dynamic

    • increases computation time

    • ensures the radius is adjusted as the points are removed

  • The two approaches differ only when the distribution is non-uniform


GRAVIclust: Optimisation Phase

  • Step 1:

    • for each cluster, calculate new centre

      • based on the point closest to the cluster's centre of gravity

  • Step 2:

    • re-assign points to new cluster centres

  • Step 3:

    • recalculate distance function

      • never greater than previous

  • Step 4:

    • repeat Steps 1 to 3 until the value of the distance function equals the previous value
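
A minimal Python sketch of this loop follows. It rests on assumptions the slides do not state: points are coordinate tuples with Euclidean distances, and the distance function is the total distance from each point to its cluster centre. The name `optimise` is hypothetical. The loop stops as soon as the value no longer decreases, which covers the equality condition in Step 4.

```python
import math


def optimise(points, centres):
    """points: list of coordinate tuples; centres: list of indices into points."""

    def assign(cs):
        # Step 2: label each point with the position (in cs) of its closest centre.
        return [min(range(len(cs)), key=lambda c: math.dist(p, points[cs[c]]))
                for p in points]

    def cost_of(cs, labels):
        # Step 3 (assumed distance function): total point-to-centre distance.
        return sum(math.dist(p, points[cs[l]]) for p, l in zip(points, labels))

    labels = assign(centres)
    cost = cost_of(centres, labels)
    while True:
        new_centres = []
        for c, old in enumerate(centres):
            members = [i for i, l in enumerate(labels) if l == c]
            if not members:  # keep the old centre if a cluster is empty
                new_centres.append(old)
                continue
            # Step 1: centre of gravity, then the member point closest to it.
            gravity = tuple(sum(points[i][d] for i in members) / len(members)
                            for d in range(len(points[0])))
            new_centres.append(min(members, key=lambda i: math.dist(points[i], gravity)))
        new_labels = assign(new_centres)
        new_cost = cost_of(new_centres, new_labels)
        if new_cost >= cost:  # Step 4: no further improvement
            return centres, labels, cost
        centres, labels, cost = new_centres, new_labels, new_cost
```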


GRAVIclust

  • Deterministic

  • Can handle obstacles

  • Monotonic convergence of the distance function to a stable point


AUTOCLUST

  • Definitions


AUTOCLUST

  • Definitions II


AUTOCLUST

  • Phase 1:

    • finding boundaries

  • Phase 2:

    • restoring and re-attaching

  • Phase 3:

    • detecting second-order inconsistency


AUTOCLUST: Phase 1

  • Finding boundaries

    • Calculate

      • Delaunay Diagram

      • for each point pi

        • ShortEdges(pi)

        • LongEdges(pi)

        • OtherEdges(pi)

    • Remove

      • ShortEdges(pi) and LongEdges(pi)
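
The edge classification can be sketched as follows. The transcript does not reproduce the definitions slides, so the thresholds here are an assumption taken from the usual AUTOCLUST formulation: an edge at pi is short if its length is below LocalMean(pi) − MeanStDev(P), long if above LocalMean(pi) + MeanStDev(P), and "other" otherwise. The edge list is assumed to come from a Delaunay diagram computed elsewhere, and `classify_edges` is a hypothetical name.

```python
import math
from collections import defaultdict


def classify_edges(points, edges):
    """points: list of (x, y); edges: (i, j) index pairs, e.g. Delaunay edges.

    Returns three dicts mapping a point index to its short, long and other edges.
    """
    incident = defaultdict(list)
    for i, j in edges:
        length = math.dist(points[i], points[j])
        incident[i].append((length, (i, j)))
        incident[j].append((length, (i, j)))

    # LocalMean(pi) and the local standard deviation of incident edge lengths.
    local_mean, local_std = {}, {}
    for p, inc in incident.items():
        lengths = [length for length, _ in inc]
        mean = sum(lengths) / len(lengths)
        local_mean[p] = mean
        local_std[p] = math.sqrt(sum((x - mean) ** 2 for x in lengths) / len(lengths))

    # MeanStDev(P): average local standard deviation over points that have edges.
    mean_st_dev = sum(local_std.values()) / len(local_std)

    short, long_, other = defaultdict(list), defaultdict(list), defaultdict(list)
    for p, inc in incident.items():
        for length, e in inc:
            if length < local_mean[p] - mean_st_dev:
                short[p].append(e)
            elif length > local_mean[p] + mean_st_dev:
                long_[p].append(e)
            else:
                other[p].append(e)
    return short, long_, other
```

Removing ShortEdges(pi) and LongEdges(pi) then amounts to keeping only the edges that are in `other` for both of their endpoints.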


AUTOCLUST: Phase 2

  • Restoring and re-attaching

    • for each point pi where ShortEdges(pi) ≠ ∅

      • Determine a candidate connected component C for pi

        • If there are 2 edges ej = (pi,pj) and ek = (pi,pk) in ShortEdges(pi) with CC[pj] ≠ CC[pk], then

          • Compute, for each edge e = (pi,pj) ∈ ShortEdges(pi), the size ||CC[pj]||, and let M be the maximum of ||CC[pj]|| over all edges e = (pi,pj) ∈ ShortEdges(pi)

          • Let C be the class label of the largest connected component (if there are two different connected components with cardinality M, we let C be the one with the shortest edge to pi)


AUTOCLUST: Phase 2

  • Restoring and re-attaching

    • for each point pi where ShortEdges(pi) ≠ ∅

      • Determine a candidate connected component C for pi

        • If …

        • Otherwise, let C be the label of the connected component all edges e ∈ ShortEdges(pi) connect pi to


AUTOCLUST: Phase 2

  • Restoring and re-attaching

    • for each point pi where ShortEdges(pi) ≠ ∅

      • Determine a candidate connected component C for pi

      • If the edges in OtherEdges(pi) connect to a connected component different from C, remove them. Note that

        • all edges in OtherEdges(pi) are removed, and

        • only in this case, will pi swap connected components

      • Add all edges e ∈ ShortEdges(pi) that connect to C
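
The choice of the candidate component C can be illustrated with a short Python sketch. It assumes the graph left after Phase 1 is given as an edge list, labels components with a simple union-find, and uses hypothetical names (`connected_components`, `candidate_component`, and a `length` dictionary keyed by edge).

```python
from collections import Counter


def connected_components(n, edges):
    """Return CC, a list giving a component label for each of the n points."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(n)]


def candidate_component(p, short_edges, cc, length):
    """Pick C for point p from the components its short edges lead to.

    The largest component wins; ties go to the one with the shortest edge to p.
    With a single reachable component this reduces to the "Otherwise" case.
    """
    size = Counter(cc)
    shortest = {}
    for e in short_edges:
        q = e[1] if e[0] == p else e[0]
        label = cc[q]
        shortest[label] = min(shortest.get(label, float("inf")), length[e])
    return min(shortest, key=lambda lab: (-size[lab], shortest[lab]))
```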


AUTOCLUST: Phase 3

  • Detecting second-order inconsistency

    • compute the LocalMean for 2-neighbourhoods

    • remove all edges in N2,G(pi) that are long edges


AUTOCLUST

  • No user supplied arguments

    • eliminates expensive human-based exploration time for finding best-fit arguments

  • Robust to noise, outliers, bridges and type of distribution

  • Able to detect clusters with arbitrary shapes, different sizes and different densities

  • Can handle multiple bridges

  • Runs in O(n log n) time


AUTOCLUST+

  • Construct Delaunay Diagram

  • Calculate MeanStDev(P)

  • For all edges e, remove e if it intersects an obstacle

  • Apply the 3 phases of AUTOCLUST to the planar graph resulting from the previous steps
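
The obstacle step can be sketched in Python as below, assuming each obstacle is modelled as a line segment between two (x, y) points; the slides do not say how obstacles are represented, and the function names are hypothetical. The test used is the standard orientation-based segment intersection check.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a, b, c):
    """True if c, already known to be collinear with a-b, lies between a and b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases where an endpoint lies on the other segment.
    return ((o1 == 0 and _on_segment(p1, p2, q1))
            or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1))
            or (o4 == 0 and _on_segment(q1, q2, p2)))


def remove_blocked_edges(points, edges, obstacles):
    """Keep only the edges (i, j) that cross none of the obstacle segments."""
    return [(i, j) for i, j in edges
            if not any(segments_intersect(points[i], points[j], a, b)
                       for a, b in obstacles)]
```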


3D Boundary-based Clustering

  • Benefits from 3D Clustering

    • more accurate spatial analysis

    • distinguish

      • positive clusters:

        • clusters in higher dimensions but not in lower dimensions

      • negative clusters:

        • clusters in lower dimensions but not in higher dimensions


3D Boundary-based Clustering

  • Based on AUTOCLUST

  • Uses Delaunay Tetrahedrizations

  • Definitions:

    • ej potential inter-cluster edge if:


3D Boundary-based Clustering

  • Phase I

    • For all the piP, classify each edge ej incident to pi into one of three groups

      • ShortEdges(pi) when the length of ej is less than the range in AI(pi)

      • LongEdges(pi) when the length of ej is greater than the range in AI(pi)

      • OtherEdges(pi) when the length of ej is within AI(pi)

    • For all the piP, remove all edges in ShortEdges(pi) and LongEdges(pi)


3D Boundary-based Clustering

  • Phase II

    • Recuperate ShortEdges(pi) incident to border points using connected component analysis

  • Phase III

    • Remove exceptionally long edges in local regions


Shared Nearest Neighbour

  • Clustering in higher dimensions

    • Distances or similarities between points become more uniform, making clustering more difficult

    • Also, similarity between points can be misleading

      • i.e. a point can be more similar to a point that “actually” belongs to a different cluster

    • Solution

      • Shared nearest neighbor approach to similarity
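
The claim that distances become more uniform in higher dimensions can be checked with a small experiment. The sketch below assumes uniformly random points in the unit cube and measures the relative contrast (farthest minus nearest distance, divided by the nearest) from one query point; the function name and parameters are hypothetical.

```python
import math
import random


def relative_contrast(dim, n=200, seed=0):
    """(max - min) / min of the distances from the first point to all others."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n)]
    query = pts[0]
    d = [math.dist(query, p) for p in pts[1:]]
    return (max(d) - min(d)) / min(d)


low = relative_contrast(2)     # nearest neighbour is far closer than the farthest
high = relative_contrast(500)  # nearest and farthest are almost equally distant
```

On such data the contrast in 500 dimensions is expected to be far smaller than in 2 dimensions, which is why nearest-neighbour distances alone become a weak signal.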


SNN: An alternative definition of similarity

  • Euclidean distance

    • most common distance metric used

    • while useful in low dimensions, it doesn’t work well in high dimensions


SNN: An alternative definition of similarity

  • Define similarity in terms of their shared nearest neighbours

    • the similarity of the points is “confirmed” by their common shared nearest neighbours
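
One common way to compute this similarity is sketched below: two points are similar only if each is in the other's k-nearest-neighbour list, and the similarity is the number of neighbours they share. The mutual-membership condition and the use of Euclidean distance to build the lists are assumptions, and the function names are hypothetical.

```python
import math


def k_nearest(points, k):
    """For each point, the set of indices of its k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        result.append(set(others[:k]))
    return result


def snn_similarity(knn, i, j):
    """Shared-neighbour count, or 0 if i and j are not mutual neighbours."""
    if i in knn[j] and j in knn[i]:
        return len(knn[i] & knn[j])
    return 0
```

For example, with points (0,0), (1,0), (2,0), (3,0), (10,0) and k = 2, the first two points are mutual neighbours that share the third point, so their similarity is 1, while the first and last points have similarity 0.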


SNN: An alternative definition of density

  • SNN similarity, with the k-nearest neighbour approach

    • if the k-th nearest neighbour of a point, with respect to SNN similarity, is close, then we say that there is a high density at this point

    • since it reflects the local configuration of the points in the data space, it is relatively insensitive to variations in density and the dimensionality of the space


SNN: Algorithm

  • Compute the similarity matrix

    • corresponds to a similarity graph with data points for nodes and edges whose weights are the similarities between data points


SNN: Algorithm

  • Compute the similarity matrix

  • Sparsify the similarity matrix by keeping only the k most similar neighbours

    • corresponds to keeping only the k strongest links of the similarity graph


SNN: Algorithm

  • Compute the similarity matrix

  • Sparsify the similarity matrix …

  • Construct the shared nearest neighbour graph from the sparsified similarity matrix


SNN: Algorithm

  • Compute the similarity matrix

  • Sparsify the similarity matrix …

  • Construct the shared …

  • Find the SNN density of each point

  • Find the core points


SNN: Algorithm

  • Compute the similarity matrix

  • Sparsify the similarity matrix …

  • Construct the shared …

  • Find the SNN density of each point

  • Find the core points

  • Form clusters from the core points

  • Discard all noise points

  • Assign all non-noise, non-core points to clusters
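
The whole procedure can be sketched compactly in Python. This is an illustration under assumptions, not the original implementation: Euclidean distance stands in for the similarity matrix, density is counted as the number of k-nearest neighbours whose SNN similarity reaches a threshold, and the parameters `eps` and `min_pts` are hypothetical names that the slides do not mention. Noise points keep the label -1.

```python
import math


def snn_cluster(points, k, eps, min_pts):
    n = len(points)
    # Compute similarities and sparsify: keep only each point's k nearest neighbours.
    knn = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        knn.append(set(order[:k]))

    # Shared nearest neighbour graph: weight = shared neighbours of mutual neighbours.
    def sim(i, j):
        return len(knn[i] & knn[j]) if (i in knn[j] and j in knn[i]) else 0

    # SNN density, then core points.
    density = [sum(1 for j in knn[i] if sim(i, j) >= eps) for i in range(n)]
    core = [i for i in range(n) if density[i] >= min_pts]

    # Form clusters from the core points by linking cores with similarity >= eps.
    labels = [-1] * n
    next_label = 0
    for c in core:
        if labels[c] != -1:
            continue
        labels[c] = next_label
        stack = [c]
        while stack:
            u = stack.pop()
            for v in core:
                if labels[v] == -1 and sim(u, v) >= eps:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1

    # Non-core points join their most similar core point's cluster;
    # those with no sufficiently similar core point stay noise (-1).
    for i in range(n):
        if labels[i] == -1:
            best = max(core, key=lambda c: sim(i, c), default=None)
            if best is not None and sim(i, best) >= eps:
                labels[i] = labels[best]
    return labels
```

Note that the number of clusters is not an input; it falls out of how the core points link together.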


Shared Nearest Neighbour

  • Finds clusters of varying shapes, sizes, and densities, even in the presence of noise and outliers

  • Handles data of high dimensionality and varying densities

  • Automatically detects the number of clusters

