
Dimensionality

Ilja Sidoroff

Pasi Fränti

Speech and Image Processing Unit, Department of Computer Science

University of Joensuu, FINLAND

Dimensionality of data

- Dimensionality of a data set = the minimum number of free variables needed to represent the data without information loss
- A d-attribute data set has an intrinsic dimensionality (ID) of M if its elements lie entirely within an M-dimensional subspace of R^d (M < d)

Dimensionality of data

- The use of more dimensions than necessary leads to problems:
- greater storage requirements
- slower algorithms
- finding clusters and building good classifiers becomes more difficult (curse of dimensionality)

Curse of dimensionality

- When the dimensionality of the space increases, distance measures become less useful
- all points become more or less equidistant
- most of the volume of a sphere is concentrated in a thin layer near its surface (see next slide)

V(r) – volume of sphere with radius r

D – dimension of the sphere
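The concentration claim above can be checked numerically. Since V(r) is proportional to r^D, the fraction of a unit ball's volume lying in a thin outer shell of thickness eps is 1 − (1 − eps)^D, which approaches 1 as D grows. A minimal sketch:

```python
# Fraction of a unit ball's volume contained in an outer shell of
# thickness eps: V(r) ~ r^D implies the fraction is 1 - (1 - eps)^D.

def shell_fraction(D, eps=0.05):
    return 1.0 - (1.0 - eps) ** D

for D in (2, 10, 100, 1000):
    print(f"D={D:4d}: {shell_fraction(D):.4f}")
# → 0.0975, 0.4013, 0.9941, 1.0000
```

Already at D = 100, more than 99% of the volume sits in the outermost 5% shell, which is why nearest and farthest neighbours become nearly indistinguishable.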

Two approaches

- Estimation of dimensionality
- knowing the ID of a data set can help in tuning classification or clustering performance
- Dimensionality reduction
- projecting the data onto some subspace
- e.g. 2D/3D visualisation of a multi-dimensional data set
- may result in information loss if the subspace dimension is smaller than the ID

Goodness of the projection

Can be estimated by two measures:

- Trustworthiness: data points that are not neighbours in the input space should not be mapped as neighbours in the output space.
- Continuity: data points that are close in the input space should not be mapped far apart in the output space [11].

Trustworthiness

- N - number of feature vectors
- r(i,j) – the rank of data sample j in the ordering according to the distance from i in the original data space
- Uk(i) – set of feature vectors that are in the size k-neighbourhood of sample i in the projection space but not in the original space
- A(k) – Scales the measure between 0 and 1
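With these notations, the trustworthiness measure can be written as follows (a reconstruction from the definitions above and [11]; the slide's own equation did not survive extraction):

```latex
M_{\mathrm{trust}}(k) = 1 - A(k) \sum_{i=1}^{N} \sum_{j \in U_k(i)} \bigl( r(i,j) - k \bigr),
\qquad A(k) = \frac{2}{N k \,(2N - 3k - 1)}
```

The normalization A(k) scales the measure to [0, 1] for k < N/2; a value of 1 means no points intrude into neighbourhoods they did not belong to in the original space.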

Continuity

- r'(i,j) – the rank of data sample j in the ordering according to the distance from i in the projection space
- Vk(i) – set of feature vectors that are in the size k-neighbourhood of sample i in the original space but not in the projection space
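Continuity is defined symmetrically, with the roles of the two spaces exchanged (again a reconstruction from the definitions above and [11]):

```latex
M_{\mathrm{cont}}(k) = 1 - A(k) \sum_{i=1}^{N} \sum_{j \in V_k(i)} \bigl( r'(i,j) - k \bigr)
```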

Example data sets

- Swiss roll: 20000 3D points
- 2D manifold in 3D space
- http://isomap.stanford.edu

Example data sets

- 64×64 pixel images of hands in different positions
- Each image can be considered as a 4096-dimensional data element
- Could also be described in terms of finger extension and wrist rotation (2D)

Example data sets

http://isomap.stanford.edu

Principal component analysis (PCA)

- Idea: find the directions of maximal variance and align the coordinate axes with them.
- If the variance along a dimension is zero, that dimension is not needed.
- Drawback: works well only with linear data [1]

PCA method (1/2)

- Center the data so that its mean is zero
- Calculate covariance matrix for data
- Calculate eigenvalues and eigenvectors of the covariance matrix
- Arrange eigenvectors according to the eigenvalues
- For dimensionality reduction, choose the desired number of eigenvectors (2 or 3 for visualization)

PCA Method

- Intrinsic dimensionality = number of non-zero eigenvalues
- Dimensionality reduction by projection: y_i = A x_i
- Here x_i is the input vector, y_i the output vector, and A the matrix whose rows are the eigenvectors corresponding to the largest eigenvalues.
- For visualization, typically 2 or 3 eigenvectors are preserved.
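The five steps of the PCA method can be sketched in a few lines of NumPy (a minimal sketch; the helper name `pca_project` and the toy data set are illustrative, not from the slides):

```python
import numpy as np

def pca_project(X, m):
    """Project the rows of X onto the m leading principal components.

    X: (N, d) data matrix; returns (Y, eigenvalues) with Y of shape (N, m).
    """
    Xc = X - X.mean(axis=0)               # 1. center the data
    C = np.cov(Xc, rowvar=False)          # 2. covariance matrix (d x d)
    vals, vecs = np.linalg.eigh(C)        # 3. eigenvalues/eigenvectors
    order = np.argsort(vals)[::-1]        # 4. sort by decreasing eigenvalue
    vals, vecs = vals[order], vecs[:, order]
    A = vecs[:, :m]                       # 5. keep the m leading eigenvectors
    return Xc @ A, vals                   # each row is y_i = A^T x_i

# Points near a 2D plane embedded in 3D: two large eigenvalues, one tiny,
# so the estimated intrinsic dimensionality is 2.
rng = np.random.default_rng(0)
P = rng.normal(size=(500, 2))
X = np.column_stack([P[:, 0], P[:, 1], 0.01 * rng.normal(size=500)])
Y, vals = pca_project(X, 2)
print(vals)   # two eigenvalues near 1, the third tiny
```

In practice the "number of non-zero eigenvalues" rule becomes "number of eigenvalues above a noise threshold", since sampled data is never exactly zero-variance in any direction.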

Example of PCA

- The distances between points are different in projections.
- Test set c:
- two clusters are projected into one cluster
- s-shaped cluster is projected nicely

Another example of PCA [10]

- Data set: points lying on the circle x^2 + y^2 = 1, so ID = 1
- PCA yields two non-null eigenvalues
- u, v – principal components

Limitations of PCA

- Since the eigenvectors are orthogonal, PCA works well only with linear data
- Tends to overestimate ID
- Kernel PCA uses the so-called kernel trick to apply PCA also to non-linear data:
- make a non-linear projection into a higher-dimensional space and perform PCA in that space

Multidimensional scaling method (MDS)

- Project the data into a new space while trying to preserve the distances between data points
- Define a stress E (the difference between pairwise distances in the original and projection spaces)
- E is minimized using some optimization algorithm
- With certain stress functions (e.g. Kruskal's), a perfect projection exists when E = 0
- The ID of the data is the smallest projection dimension for which a perfect projection exists

Metric MDS

The simplest stress function [2], raw stress:

d(x_i, x_j) – distance in the original space

d(y_i, y_j) – distance in the projection space

y_i, y_j – representations of x_i, x_j in the output space
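With these notations, the raw stress can be written as (a reconstruction; the slide's formula did not survive extraction):

```latex
E = \sum_{i < j} \bigl( d(x_i, x_j) - d(y_i, y_j) \bigr)^2
```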

Sammon's Mapping

- Sammon's mapping gives small distances a larger weight [5]:
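A reconstruction of Sammon's stress, following [5] (the slide's formula did not survive extraction); dividing each squared error by d(x_i, x_j) is what gives small distances a larger weight:

```latex
E = \frac{1}{\sum_{i<j} d(x_i, x_j)}
    \sum_{i<j} \frac{\bigl( d(x_i, x_j) - d(y_i, y_j) \bigr)^2}{d(x_i, x_j)}
```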

Kruskal's stress

- Ranking the point distances compensates for the overall decrease of distances in lower-dimensional projections:
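A reconstruction of Kruskal's stress-1, following [2] (the slide's formula did not survive extraction):

```latex
E = \sqrt{ \frac{\sum_{i<j} \bigl( d(y_i, y_j) - \hat{d}_{ij} \bigr)^2}
                {\sum_{i<j} d(y_i, y_j)^2} }
```

Here the disparities \hat{d}_{ij} are obtained by monotone regression so that they preserve the rank order of the original distances, which makes the stress depend only on ranks rather than on absolute distances.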

MDS example

- Separates clusters better than PCA
- Local structures are not always preserved (leftmost test set)

Other MDS approaches

- ISOMAP [12]
- Curvilinear component analysis CCA [13]

Local methods

- The previous methods are global in the sense that all input data is considered at once.
- Local methods consider only some neighbourhood of the data points and may therefore be computationally less demanding
- They try to estimate the topological dimension of the data manifold

Fukunaga-Olsen algorithm [6]

- Assume that the data can be divided into small regions, i.e. clustered
- Each cluster (Voronoi set) of the data lies on an approximately linear surface, so the PCA method can be applied to each cluster separately
- Eigenvalues are normalized by dividing by the largest eigenvalue

Fukunaga-Olsen algorithm

- ID is defined as the number of normalized eigenvalues that are larger than a threshold T
- Defining a good threshold is a problem as such

Near neighbour algorithm

- Trunk's method [7]:
- An initial value for the integer parameter k is chosen (usually k = 1).
- The k nearest neighbours of each data vector are identified.
- For each data vector i, the subspace spanned by the vectors from i to each of its k nearest neighbours is constructed.

Near neighbour algorithm

- The angle between the vector to the (k+1)th nearest neighbour and its projection onto the subspace is calculated for each data vector
- If the average of these angles is below a threshold, the ID is k; otherwise k is increased and the process repeated

[Figure: the angle between the (k+1)th-neighbour vector and the subspace spanned by the k nearest-neighbour vectors]
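The angle computation at the heart of Trunk's method can be sketched with NumPy (a minimal sketch assuming Euclidean distances; the helper name `average_residual_angle` is illustrative, and the threshold loop is omitted):

```python
import numpy as np

def average_residual_angle(X, k):
    """Mean angle (radians) between each point's (k+1)th-neighbour vector
    and the subspace spanned by its k nearest-neighbour vectors."""
    N = X.shape[0]
    angles = []
    for i in range(N):
        diffs = X - X[i]
        order = np.argsort(np.linalg.norm(diffs, axis=1))
        B = diffs[order[1:k + 1]].T        # basis: k neighbour vectors (d x k)
        v = diffs[order[k + 1]]            # vector to the (k+1)th neighbour
        # orthogonal projection of v onto span(B) via least squares
        coef, *_ = np.linalg.lstsq(B, v, rcond=None)
        proj = B @ coef
        denom = np.linalg.norm(proj) * np.linalg.norm(v) + 1e-12
        cos = np.clip(proj @ v / denom, -1.0, 1.0)
        angles.append(np.arccos(cos))
    return np.mean(angles)

# For points on a 2D plane in 3D, the average angle is near zero at k = 2
# (the neighbour subspace already contains the (k+1)th neighbour) but
# clearly positive at k = 1, so the method would report ID = 2.
rng = np.random.default_rng(1)
P = rng.normal(size=(200, 2))
X = np.column_stack([P[:, 0], P[:, 1], P[:, 0] + P[:, 1]])
print(average_residual_angle(X, 1), average_residual_angle(X, 2))
```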

Near neighbour algorithm

- It is not clear how to select a suitable value for the threshold
- Improvements to Trunk's method
- Pettis et al. [8]
- Verver-Duin [9]

Fractal methods

- Global methods, but with a different definition of dimensionality
- Basic idea:
- count the observations inside a ball of radius r, giving f(r)
- analyse the growth rate of f(r)
- if f grows as r^k, the dimensionality of the data can be considered to be k

Fractal methods

- The dimensionality can be fractional, e.g. 1.5
- Hence fractal methods do not provide projections into a lower-dimensional space (what would R^1.5 be, anyway?)
- Fractal dimensionality estimates can be used in time-series analysis etc. [10]

Fractal methods

- Different definitions of the fractal dimension [10]:
- Hausdorff dimension
- Box-counting dimension
- Correlation dimension
- In order to get an accurate estimate of a dimension D, the data set cardinality must be at least 10^(D/2)

Hausdorff dimension

- the data set is covered by cells s_i with variable diameters r_i, all r_i < r
- in other words, we look for a collection of covering sets s_i with diameters less than or equal to r which minimizes the sum
- d-dimensional Hausdorff measure:
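A reconstruction of the d-dimensional Hausdorff measure from the definitions above and [10] (the slide's formula did not survive extraction):

```latex
\Gamma_H^d = \lim_{r \to 0} \; \inf_{\{s_i\} :\, r_i \le r} \; \sum_i r_i^d
```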

Hausdorff dimension

- For every data set, Γ^d_H is infinite if d is less than some critical value D_H, and 0 if d is greater than D_H
- The critical value D_H is the Hausdorff dimension of the data set

Box-Counting dimension

- The Hausdorff dimension is not easy to calculate
- The box-counting dimension D_B is an upper bound of the Hausdorff dimension and usually does not differ from it:

v(r) – the number of boxes of size r needed to cover the data set
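A reconstruction of the box-counting dimension from the definition of v(r) above (the slide's formula did not survive extraction):

```latex
D_B = \lim_{r \to 0} \frac{\ln v(r)}{\ln (1/r)}
```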

Box-Counting dimension

- Although the box-counting dimension is easier to calculate than the Hausdorff dimension, the algorithmic complexity grows exponentially with the set dimensionality => it can be used only for low-dimensional data sets
- The correlation dimension is a computationally more feasible fractal dimension measure
- The correlation dimension is a lower bound of the box-counting dimension

Correlation dimension

- Let x_1, x_2, x_3, ..., x_N be data points
- The correlation integral can be defined as:

I(x) is the indicator function:

I(x) = 1 if x is true,

I(x) = 0 otherwise.
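A reconstruction of the correlation integral from the definitions above (the slide's formula did not survive extraction): the fraction of point pairs lying within distance r of each other,

```latex
C(r) = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N}
       I\bigl( \lVert x_j - x_i \rVert \le r \bigr)
```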

Correlation dimension

- The correlation dimension D_C is defined as the limit of log C(r) / log r as r → 0
- In practice, D_C is estimated as the slope of the log C(r) versus log r curve over a suitable range of r
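The slope-based estimate can be sketched with NumPy (a minimal sketch; the helper name `correlation_dimension`, the radius range, and the toy circle data are illustrative, not from the slides):

```python
import numpy as np

def correlation_dimension(X, r_values):
    """Estimate the correlation dimension as the slope of
    log C(r) versus log r over the given radii."""
    N = len(X)
    # all pairwise Euclidean distances (upper triangle only)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)[np.triu_indices(N, k=1)]
    # correlation integral C(r) = fraction of pairs within distance r
    C = np.array([np.mean(dists <= r) for r in r_values])
    slope, _ = np.polyfit(np.log(r_values), np.log(C), 1)
    return slope

# Points on a unit circle embedded in 3D: the ambient dimension is 3,
# but the estimated correlation dimension is close to 1.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 1000)
X = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
D = correlation_dimension(X, np.logspace(-2.0, -0.5, 10))
print(round(D, 2))
```

Note that the radius range matters: r must be small enough to probe the manifold rather than its global shape, yet large enough that C(r) is estimated from a reasonable number of pairs.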

Literature

- [1] M. Kirby, Geometric Data Analysis: An Empirical Approach to Dimensionality Reduction and the Study of Patterns, John Wiley and Sons, 2001.
- [2] J. B. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika 29 (1964) 1–27.
- [3] R. N. Shepard, The analysis of proximities: Multidimensional scaling with an unknown distance function, Psychometrika 27 (1962) 125–140.
- [4] R. S. Bennett, The intrinsic dimensionality of signal collections, IEEE Transactions on Information Theory 15 (1969) 517–525.
- [5] J. W. Sammon, A nonlinear mapping for data structure analysis, IEEE Transactions on Computers C-18 (1969) 401–409.
- [6] K. Fukunaga, D. R. Olsen, An algorithm for finding intrinsic dimensionality of data, IEEE Transactions on Computers 20 (2) (1971) 176–183.
- [7] G. V. Trunk, Statistical estimation of the intrinsic dimensionality of a noisy signal collection, IEEE Transactions on Computers 25 (1976) 165–171.
- [8] K. Pettis, T. Bailey, A. Jain, R. Dubes, An intrinsic dimensionality estimator from near-neighbor information, IEEE Transactions on Pattern Analysis and Machine Intelligence 1 (1) (1979) 25–37.
- [9] P. J. Verveer, R. Duin, An evaluation of intrinsic dimensionality estimators, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (1) (1995) 81–86.
- [10] F. Camastra, Data dimensionality estimation methods: a survey, Pattern Recognition 36 (2003) 2945–2954.
- [11] J. Venna, Dimensionality Reduction for Visual Exploration of Similarity Structures, PhD thesis manuscript (submitted), 2007.
- [12] J. B. Tenenbaum, V. de Silva, J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (12) (2000) 2319–2323.
- [13] P. Demartines, J. Herault, Curvilinear component analysis: A self-organizing neural network for nonlinear mapping in cluster analysis, IEEE Transactions on Neural Networks 8 (1) (1997) 148–154.
