Clustering Methods: Part 6

Dimensionality

Ilja Sidoroff

Pasi Fränti

Speech and Image Processing Unit, Department of Computer Science

University of Joensuu, FINLAND

Dimensionality of data
  • Dimensionality of a data set = the minimum number of free variables needed to represent the data without information loss
  • A d-attribute data set has an intrinsic dimensionality (ID) of M if its elements lie entirely within an M-dimensional subspace of R^d (M < d)
Dimensionality of data
  • The use of more dimensions than necessary leads to problems:
    • greater storage requirements
    • algorithms run more slowly
    • finding clusters and creating good classifiers is more difficult (curse of dimensionality)
Curse of dimensionality
  • When the dimensionality of space increases, distance measures become less useful
    • all points become more or less equidistant from each other
    • most of the volume of a sphere is concentrated in a thin layer near its surface (e.g. see the next slide)
V(r) – volume of a sphere with radius r

D – dimension of the sphere
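
The formula on the original slide is not reproduced in the transcript; the standard result it illustrates is

V(r) = \frac{\pi^{D/2}}{\Gamma(D/2 + 1)} r^D, \qquad \frac{V(r) - V((1-\epsilon)r)}{V(r)} = 1 - (1-\epsilon)^D \to 1 \quad \text{as } D \to \infty,

i.e. for large D essentially all of the volume lies within a thin shell of relative thickness \epsilon near the surface.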

Two approaches
  • Estimation of dimensionality
    • knowing the ID of a data set can help in tuning classification or clustering performance
  • Dimensionality reduction
    • projecting the data onto some subspace
    • e.g. 2D/3D visualisation of a multi-dimensional data set
    • may result in information loss if the subspace dimension is smaller than the ID
Goodness of the projection

Can be estimated by two measures:

  • Trustworthiness: data points that are not neighbours in the input space are not mapped as neighbours in the output space.
  • Continuity: data points that are close in the input space are not mapped far apart in the output space [11].
Trustworthiness
  • N - number of feature vectors
  • r(i,j) – the rank of data sample j in the ordering according to the distance from i in the original data space
  • Uk(i) – set of feature vectors that are in the size k-neighbourhood of sample i in the projection space but not in the original space
  • A(k) – scales the measure to the range [0, 1] (see the formula below)
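
The formula itself is not reproduced in the transcript; in the form used by Venna [11], trustworthiness is

T(k) = 1 - A(k) \sum_{i=1}^{N} \sum_{j \in U_k(i)} \bigl( r(i,j) - k \bigr), \qquad A(k) = \frac{2}{N k (2N - 3k - 1)}.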
Continuity
  • r'(i,j) – the rank of data sample j in the ordering according to the distance from i in the projection space
  • Vk(i) – set of feature vectors that are in the size k-neighbourhood of sample i in the original space but not in the projection space
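
Correspondingly (again following [11]), continuity is

C(k) = 1 - A(k) \sum_{i=1}^{N} \sum_{j \in V_k(i)} \bigl( r'(i,j) - k \bigr),

with the same scaling factor A(k) as for trustworthiness.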
Example data sets
  • Swiss roll: 20000 3D points
  • 2D manifold in 3D space
  • http://isomap.stanford.edu
Example data sets
  • 64 × 64 pixel images of hands in different positions
  • Each image can be considered as a 4096-dimensional data element
  • Could also be interpreted in terms of finger extension – wrist rotation (2D)
Example data sets

http://isomap.stanford.edu

Synthetic data sets [11]
  • Sphere
  • S-shaped manifold
  • Six clusters

Principal component analysis (PCA)
  • Idea: find the directions of maximal variance and align the coordinate axes with them.
  • If variance is zero, that dimension is not needed.
  • Drawback: works well only with linear data [1]
PCA method (1/2)
  • Center the data so that its mean is zero
  • Calculate covariance matrix for data
  • Calculate eigenvalues and eigenvectors of the covariance matrix
  • Arrange eigenvectors according to the eigenvalues
  • For dimensionality reduction, choose the desired number of eigenvectors (2 or 3 for visualization); see the sketch below
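
A minimal Python sketch of these steps (function and variable names are illustrative, not from the slides):

import numpy as np

def pca_reduce(X, n_components=2):
    # 1. center the data so that its mean is zero
    Xc = X - X.mean(axis=0)
    # 2. covariance matrix of the centered data
    C = np.cov(Xc, rowvar=False)
    # 3. eigenvalues and eigenvectors (eigh returns them in ascending order)
    eigvals, eigvecs = np.linalg.eigh(C)
    # 4. arrange eigenvectors by decreasing eigenvalue
    order = np.argsort(eigvals)[::-1]
    # 5. keep the desired number of eigenvectors and project the data
    A = eigvecs[:, order[:n_components]]
    return Xc @ A, eigvals[order]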
PCA Method
  • Intrinsic dimensionality = number of non-zero eigenvalues
  • Dimensionality reduction by projection: y_i = A x_i
  • Here x_i is the input vector, y_i the output vector, and A is the matrix whose rows are the eigenvectors corresponding to the largest eigenvalues.
  • For visualization, typically 2 or 3 eigenvectors are preserved.
Example of PCA
  • The distances between points are different in projections.
  • Test set c:
    • two clusters are projected onto a single cluster
    • the S-shaped cluster is projected nicely
Another example of PCA [10]
  • Data set: points lying on a circle (x^2 + y^2 = 1), so only one free variable is needed (ID = 1)
  • PCA yields two non-null eigenvalues
  • u, v – principal components
Limitations of PCA
  • Since the eigenvectors are orthogonal, PCA works well only with linear data
  • Tends to overestimate the ID
  • Kernel PCA uses the so-called kernel trick to apply PCA to non-linear data as well
    • map the data non-linearly into a higher-dimensional space and perform the PCA analysis in that space (see the sketch below)
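
For example, scikit-learn's KernelPCA can be used for this; the parameter values below are illustrative only:

from sklearn.decomposition import KernelPCA

# map the data implicitly into a higher-dimensional feature space (RBF kernel)
# and perform PCA there; X is assumed to be an (N, d) data array
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=0.1)
Y = kpca.fit_transform(X)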
Multidimensional scaling method (MDS)
  • Project data into a new space while trying to preserve distances between data points
  • Define a stress function E measuring the difference between pairwise distances in the original and projection spaces
  • E is minimized using some optimization algorithm
  • With certain stress functions (e.g. Kruskal's), E = 0 means that a perfect projection exists
  • ID of the data is the smallest projection dimension where perfect projection exists
Metric MDS

The simplest stress function [2], raw stress:
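
The formula itself is not reproduced in the transcript; in its usual form the raw stress is

E = \sum_{i<j} \bigl( d(x_i, x_j) - d(y_i, y_j) \bigr)^2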

d(x_i, x_j) – distance in the original space

d(y_i, y_j) – distance in the projection space

y_i, y_j – representations of x_i, x_j in the output space

Sammon's Mapping
  • Sammon's mapping gives small distances a larger weight [5]:
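
The slide's formula is not reproduced in the transcript; Sammon's stress is usually written as

E = \frac{1}{\sum_{i<j} d(x_i, x_j)} \sum_{i<j} \frac{\bigl( d(x_i, x_j) - d(y_i, y_j) \bigr)^2}{d(x_i, x_j)},

so that errors in small original distances contribute relatively more to E.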
Kruskal's stress
  • Ranking the point distances accounts for decreasing distances in lower dimensional projections:
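
The slide's formula is not reproduced in the transcript; Kruskal's stress-1 is commonly written as

E = \sqrt{ \frac{\sum_{i<j} \bigl( d(y_i, y_j) - \hat{d}_{ij} \bigr)^2}{\sum_{i<j} d(y_i, y_j)^2} },

where the disparities \hat{d}_{ij} are obtained by monotone regression on the rank order of the original distances.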
MDS example
  • Separates clusters better than PCA
  • Local structures are not always preserved (leftmost test set)
Other MDS approaches
  • ISOMAP [12]
  • Curvilinear component analysis (CCA) [13]
Local methods
  • Previous methods are global in the sense that all the input data is considered at once.
  • Local methods consider only some neighbourhood of the data points, and may therefore be computationally less demanding
  • Try to estimate topological dimension of the data manifold
Fukunaga-Olsen algorithm [6]
  • Assume that the data can be divided into small regions, i.e. clustered
  • Each cluster (Voronoi set) of data vectors is assumed to lie on an approximately linear surface => the PCA method can be applied to each cluster
  • Eigenvalues are normalized by dividing by the largest eigenvalue
Fukunaga-Olsen algorithm
  • ID is defined as the number of normalized eigenvalues that are larger than a threshold T
  • Choosing a good threshold is itself a problem (a sketch of the whole procedure follows below)
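
A rough Python sketch of the procedure described on these two slides; the clustering step, the threshold value and the way per-cluster estimates are combined are illustrative choices, not prescribed by the slides:

import numpy as np
from sklearn.cluster import KMeans

def fukunaga_olsen_id(X, n_clusters=10, threshold=0.05):
    # divide the data into small regions (here simply with k-means)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    estimates = []
    for c in range(n_clusters):
        Xc = X[labels == c]
        if len(Xc) < 2:
            continue
        # local PCA: eigenvalues of the cluster covariance matrix, largest first
        eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
        normalized = eigvals / eigvals[0]
        # ID of the cluster = number of normalized eigenvalues above the threshold T
        estimates.append(int(np.sum(normalized > threshold)))
    # combine the per-cluster estimates, e.g. by averaging
    return int(round(np.mean(estimates)))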
Near neighbour algorithm
  • Trunk's method [7]:
    • An initial value for an integer parameter k is chosen (usually k=1).
    • k nearest neighbours for each data vector are identified.
    • For each data vector i, the subspace spanned by the vectors from i to each of its k neighbours is constructed.
Near neighbour algorithm
  • The angle between the vector to the (k+1)th nearest neighbour and its projection onto the subspace is calculated for each data vector
  • If the average of these angles is below a threshold, the ID is k; otherwise increase k and repeat the process

[Figure: the angle between the vector to the (k+1)th neighbour and the subspace spanned by the k nearest neighbours]
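
A Python sketch of the procedure; the angle threshold, the maximum k and the use of a KD-tree are illustrative choices:

import numpy as np
from scipy.spatial import cKDTree

def trunk_id_estimate(X, angle_threshold_deg=30.0, k_max=10):
    tree = cKDTree(X)
    for k in range(1, k_max + 1):
        # query the point itself plus its k+1 nearest neighbours
        _, idx = tree.query(X, k=k + 2)
        angles = []
        for i in range(len(X)):
            nbrs = X[idx[i, 1:k + 1]] - X[i]   # vectors from i to its k nearest neighbours
            extra = X[idx[i, k + 1]] - X[i]    # vector from i to its (k+1)th neighbour
            # orthonormal basis of the subspace spanned by the neighbour vectors
            Q, _ = np.linalg.qr(nbrs.T)
            proj = Q @ (Q.T @ extra)           # projection of the extra vector onto the subspace
            cosang = np.linalg.norm(proj) / (np.linalg.norm(extra) + 1e-12)
            angles.append(np.degrees(np.arccos(np.clip(cosang, 0.0, 1.0))))
        # if the average angle is small enough, the subspaces explain the data: ID is k
        if np.mean(angles) < angle_threshold_deg:
            return k
    return k_max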

Near neighbour algorithm
  • It is not clear how to select a suitable value for the threshold
  • Improvements to Trunk's method
    • Pettis et al. [8]
    • Verveer–Duin [9]
Fractal methods
  • Global methods, but with a different definition of dimensionality
  • Basic idea:
    • count the number of observations f(r) inside a ball of radius r
    • analyse the growth rate of f(r)
    • if f(r) grows as r^k, the dimensionality of the data can be considered to be k
Fractal methods
  • Dimensionality can be fractional, e.g. 1.5
  • Consequently, fractal methods do not provide projections to a lower-dimensional space (what would R^1.5 be, anyway?)
  • Fractal dimensionality estimate can be used in time-series analysis etc. [10]
Fractal methods
  • Different definitions for fractal dimensions [10]
    • Hausdorff dimension
    • Box-counting dimension
    • Correlation dimension
  • In order to get an accurate estimate of the dimension D, the data set cardinality must be at least 10^(D/2)
Hausdorff dimension
  • the data set is covered by cells s_i with variable diameters r_i, all r_i ≤ r
  • in other words, we look for a collection of covering sets s_i with diameters less than or equal to r which minimizes the sum
  • d-dimensional Hausdorff measure:
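
The measure itself is not reproduced in the transcript; in its standard form it is

\Gamma_H^d(r) = \inf \Bigl\{ \sum_i r_i^d : \Gamma \subseteq \bigcup_i s_i, \; r_i \le r \Bigr\}, \qquad \Gamma_H^d = \lim_{r \to 0} \Gamma_H^d(r).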
Hausdorff dimension
  • For every data set Γ, the measure Γ_H^d is infinite if d is less than some critical value D_H, and 0 if d is greater than D_H
  • The critical value D_H is the Hausdorff dimension of the data set
Box-Counting dimension
  • The Hausdorff dimension is not easy to calculate
  • The box-counting dimension D_B is an upper bound of the Hausdorff dimension and does not usually differ from it:
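
The formula is not reproduced in the transcript; in its usual form,

D_B = \lim_{r \to 0} \frac{\ln v(r)}{\ln (1/r)}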

v(r) – the number of boxes of size r needed to cover the data set

Box-Counting dimension
  • Although the box-counting dimension is easier to calculate than the Hausdorff dimension, the algorithmic complexity grows exponentially with the set dimensionality => it can be used only for low-dimensional data sets
  • The correlation dimension is a computationally more feasible fractal dimension measure
  • The correlation dimension is a lower bound of the box-counting dimension
Correlation dimension
  • Let x_1, x_2, x_3, ..., x_N be the data points
  • The correlation integral can be defined as:
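
The formula is not reproduced in the transcript; in the usual Grassberger-Procaccia form,

C(r) = \lim_{N \to \infty} \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} I\bigl( \lVert x_i - x_j \rVert \le r \bigr)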

I(x) is the indicator function:

I(x) = 1 if x is true,

I(x) = 0 otherwise.

Correlation dimension

The correlation dimension D_C is the limit of ln C(r) / ln r as r → 0; in practice it is estimated as the slope of ln C(r) plotted against ln r over a suitable range of radii (see the sketch below).
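
A small Python sketch of the slope-based estimate; the choice of radii (here percentiles of the pairwise distances) is an illustrative one:

import numpy as np
from scipy.spatial.distance import pdist

def correlation_dimension(X, n_radii=20):
    # all pairwise distances between the data points
    dists = pdist(X)
    # radii spanning a mid-range of the distance distribution (avoids C(r) = 0)
    r_values = np.logspace(np.log10(np.percentile(dists, 1)),
                           np.log10(np.percentile(dists, 50)), n_radii)
    # correlation integral C(r): fraction of pairs closer than r
    C = np.array([np.mean(dists <= r) for r in r_values])
    # D_C is estimated as the slope of ln C(r) versus ln r
    slope, _ = np.polyfit(np.log(r_values), np.log(C), 1)
    return slope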

Literature
  • [1] M. Kirby, Geometric Data Analysis: An Empirical Approach to Dimensionality Reduction and the Study of Patterns, John Wiley and Sons, 2001.
  • [2] J. B. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika 29 (1964) 1–27.
  • [3] R. N. Shepard, The analysis of proximities: Multidimensional scaling with an unknown distance function, Psychometrika 27 (1962) 125–140.
  • [4] R. S. Bennett, The intrinsic dimensionality of signal collections, IEEE Transactions on Information Theory 15 (1969) 517–525.
  • [5] J. W. Sammon, A nonlinear mapping for data structure analysis, IEEE Transactions on Computers C-18 (1969) 401–409.
  • [6] K. Fukunaga, D. R. Olsen, An algorithm for finding intrinsic dimensionality of data, IEEE Transactions on Computers C-20 (2) (1971) 176–183.
  • [7] G. V. Trunk, Statistical estimation of the intrinsic dimensionality of a noisy signal collection, IEEE Transactions on Computers 25 (1976) 165–171.
Literature

  • [8] K. Pettis, T. Bailey, A. K. Jain, R. Dubes, An intrinsic dimensionality estimator from near-neighbor information, IEEE Transactions on Pattern Analysis and Machine Intelligence 1 (1) (1979) 25–37.
  • [9] P. J. Verveer, R. Duin, An evaluation of intrinsic dimensionality estimators, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (1) (1995) 81–86.
  • [10] F. Camastra, Data dimensionality estimation methods: a survey, Pattern Recognition 36 (2003) 2945–2954.
  • [11] J. Venna, Dimensionality reduction for visual exploration of similarity structures, PhD thesis manuscript (submitted), 2007.
  • [12] J. B. Tenenbaum, V. de Silva, J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323.
  • [13] P. Demartines, J. Herault, Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets, IEEE Transactions on Neural Networks 8 (1) (1997) 148–154.