Independent Component Analysis

Presentation Transcript



Independent Component Analysis

An Introduction

Zhen Wei, Li Jin, Yuxue Jin

Department of Statistics

Stanford University



Outline

  • Introduction

    • History, Motivation and Problem Formulation

  • Algorithms

    • Stochastic Gradient Algorithm

    • FastICA

    • Ordering Algorithm

  • Applications

  • Concluding Remark



Introduction

  • Independent Component Analysis (ICA) has been widely discussed in signal processing, neural computation, and finance; it was first introduced as a tool for separating blind sources from a mixed signal. The basic idea of ICA is to reconstruct, from the observed sequences, the hypothesized independent original sequences.



ICA versus PCA

  • Similarity

    • Feature extraction

    • Dimension reduction

  • Difference

    • PCA uses only moments of the data up to second order and produces uncorrelated components

    • ICA strives to make the components as statistically independent as possible (see the sketch below)
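
A minimal sketch of this difference, assuming NumPy and scikit-learn are available; the square-wave/Laplacian sources and the mixing matrix are illustrative choices, not taken from the slides:

```python
# PCA decorrelates the mixtures; ICA additionally exploits nongaussianity
# to recover the original independent sources (up to scale and permutation).
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s = np.c_[np.sign(np.sin(3 * t)),        # non-Gaussian source 1: square wave
          rng.laplace(size=t.size)]      # non-Gaussian source 2: heavy-tailed noise
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])               # hypothetical mixing matrix
x = s @ A.T                              # observations x(t) = A s(t)

pca_out = PCA(n_components=2).fit_transform(x)                      # uncorrelated, still mixed
ica_out = FastICA(n_components=2, random_state=0).fit_transform(x)  # approximately the sources
```

Plotting pca_out against s shows the PCA components are still mixtures, while ica_out matches the original sources up to sign, scale, and ordering.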



Motivation - Blind Source Separation

  • Suppose that there are k unknown, mutually independent sources s1(t), ..., sk(t)

  • A data vector x(t) is observed at each time point t, such that x(t) = A s(t),

    where A is a full-rank square mixing matrix


Blind Source Separation

  • (Diagram) The independent source components pass through the mixing process A to produce the observed sequences; the de-mixing process W is then applied to the observations to recover the independent components.


Problem formulation

  • The goal of ICA is to find a linear mapping W such that the unmixed sequences u(t) = W x(t)

    are maximally statistically independent

  • Find some W such that W A = P C,

    where C is a diagonal matrix and P is a permutation matrix.



Principle of ICA: Nongaussianity

  • The fundamental restriction in ICA is that the independent components must be nongaussian for ICA to be possible.

  • This is because gaussianity is invariant under orthogonal transformations, which makes the matrix A unidentifiable when the independent components are gaussian.
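
The standard one-line argument behind this (stated here for completeness rather than taken verbatim from the slide) is that a rotation of independent gaussian sources is again a vector of independent gaussian sources:

```latex
s \sim \mathcal{N}(0, I),\quad U U^{\top} = I
\;\Longrightarrow\; U s \sim \mathcal{N}(0, I)
\;\Longrightarrow\; x = A s \;\overset{d}{=}\; (A U^{\top})(U s),
```

so A and A U^T explain the data equally well, and the mixing matrix is only determined up to an orthogonal factor.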



Measures of nongaussianity (1)

  • Kurtosis: kurt(y) = E{y^4} - 3 (E{y^2})^2, which equals zero for a gaussian variable

    • Kurtosis can be very sensitive to outliers when its value has to be estimated from a measured sample.
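
A small sketch of that sensitivity, assuming SciPy is available; the sample size and the injected outlier are arbitrary illustrative values:

```python
# Kurtosis is a fourth-moment statistic, so a single outlier can dominate the estimate.
import numpy as np
from scipy.stats import kurtosis  # Fisher (excess) kurtosis: 0 for a gaussian variable

rng = np.random.default_rng(1)
y = rng.standard_normal(10_000)
print(kurtosis(y))     # close to 0 for gaussian data
y[0] = 15.0            # one injected outlier
print(kurtosis(y))     # the estimate jumps far from 0 because of a single point
```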



Measures of nongaussianity (2)

  • Negentropy

    • A gaussian variable has the largest entropy among all random variables of equal variance.

    • Definition: J(y) = H(y_gauss) - H(y),

      where H denotes the (differential) entropy, H(y) = -E{log p(y)},

      and y_gauss is a gaussian random variable with the same covariance matrix as y
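
Since the densities needed for exact negentropy are unknown in practice, it is usually estimated; a commonly used approximation (from Hyvärinen's work, added here for reference rather than taken from the slide) is

```latex
J(y) \;\approx\; \bigl[\, \mathbb{E}\{G(y)\} - \mathbb{E}\{G(\nu)\} \,\bigr]^{2},
\qquad \nu \sim \mathcal{N}(0,1),
```

where G is a nonquadratic contrast function such as G(u) = log cosh(u).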



Measures of nongaussianity (3)

  • Mutual information

    • Definition: I(y1, ..., ym) = sum_i H(yi) - H(y1, ..., ym)

    • Mutual information is a natural measure of the dependence between random variables.

    • It is always non-negative, and zero if and only if the variables are statistically independent.



Relation between negentropy and Mutual Information

  • If we constrain the yi to be uncorrelated and of unit variance, then

    I(y1, ..., yn) = C - sum_i J(yi),

    where C is a constant that does not depend on W.

  • This shows that finding an invertible transformation W that minimizes the mutual information is equivalent to finding directions in which the negentropy is maximized.



Algorithms

  • Maximum likelihood (Bell and Sejnowski, 1995)

    • Maximum entropy

    • Minimum mutual information

  • Low-Complexity Coding and Decoding (LOCOCODE), Hochreiter et al. (1998)

  • Neuro-mimetic approach



Maximum Likelihood

  • The log-likelihood is:

    L(W) = sum_{t=1..T} sum_{i=1..n} log f_i(w_i^T x(t)) + T log |det W|,

    where the f_i are the density functions of the s_i and w_i^T is the i-th row of W

  • Connection to mutual information: maximizing the likelihood is equivalent to minimizing the mutual information of the outputs,

    if the f_i were equal to the true distributions of the unmixed components u_i = w_i^T x
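
The connection can be made explicit with a short, standard calculation (sketched here under the assumption that the f_i equal the true densities of the unmixed components u_i = w_i^T x):

```latex
\frac{1}{T}\,\mathbb{E}\{L(W)\}
  = \sum_i \mathbb{E}\{\log f_i(w_i^{\top} x)\} + \log|\det W|
  = -\sum_i H(u_i) + \log|\det W|
  = -\,I(u_1,\dots,u_n) - H(x),
```

since H(u) = H(x) + log|det W| for u = W x and H(x) does not depend on W; maximizing the likelihood therefore amounts to minimizing the mutual information of the outputs.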



Stochastic Gradient Algorithm

  • Initialize the weight matrix W

  • Iteration: update W with a stochastic gradient step (one common form is sketched below),

    where η is the learning rate and g is a nonlinear function

    • Repeat until W converges

  • The independent components are the components of u = W x
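
A minimal sketch of one common form of this update, the natural-gradient rule of Amari et al. (1996) with a tanh nonlinearity; the step size, sampling scheme, and iteration count are illustrative choices rather than the slide's exact settings:

```python
# Stochastic natural-gradient ICA: W <- W + eta * (I - g(u) u^T) W, with u = W x(t).
import numpy as np

def stochastic_ica(x, eta=0.01, n_iter=5000, seed=0):
    """x: centered observations of shape (n_samples, n_signals); returns the unmixing matrix W."""
    rng = np.random.default_rng(seed)
    n = x.shape[1]
    W = np.eye(n)
    for _ in range(n_iter):
        xt = x[rng.integers(len(x))]                 # draw one observation x(t)
        u = W @ xt                                   # current unmixed estimate
        g = np.tanh(u)                               # nonlinearity suited to super-gaussian sources
        W += eta * (np.eye(n) - np.outer(g, u)) @ W  # natural-gradient step
    return W
```

After convergence the independent components are read off as the rows of u(t) = W x(t).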



FastICA - Preprocessing

  • Centering:

    • Subtract the mean so that the components of x become zero-mean variables

  • Whitening

    • Transform the (centered) observed vector x linearly into x~ so that its components are uncorrelated and have unit variance: E{x~ x~^T} = I

    • One can show that x~ = E D^{-1/2} E^T x achieves this,

      where E D E^T is the eigen-decomposition of the covariance matrix E{x x^T}
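
A sketch of the whitening step via the eigen-decomposition of the sample covariance; the function and variable names are illustrative:

```python
# Whitening: find V = E D^{-1/2} E^T so that x_tilde = V x has identity covariance.
import numpy as np

def whiten(x):
    """x: centered data of shape (n_samples, n_signals); returns (x_tilde, V)."""
    cov = np.cov(x, rowvar=False)              # estimate E{x x^T}
    d, E = np.linalg.eigh(cov)                 # cov = E diag(d) E^T
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T    # whitening matrix E D^{-1/2} E^T
    return x @ V.T, V
```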



FastICA algorithm

  • Initialize the weight matrix W (rows w of unit norm)

  • Iteration: for each row, update w <- E{x g(w^T x)} - E{g'(w^T x)} w and renormalize w (see the sketch below),

    where x is the whitened data and g is a nonquadratic contrast nonlinearity (e.g. g(u) = tanh(u))

    • Repeat until convergence

  • The independent components are the components of u = W x
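
A minimal sketch of the one-unit fixed-point iteration on whitened data, using the common tanh contrast; the initialization, tolerance, and deflation details are illustrative:

```python
# One-unit FastICA: w <- E{x g(w^T x)} - E{g'(w^T x)} w, then renormalize.
import numpy as np

def fastica_one_unit(x_white, max_iter=200, tol=1e-6, seed=0):
    """x_white: whitened data of shape (n_samples, n_signals); returns one unmixing direction w."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(x_white.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = x_white @ w                                      # projections w^T x(t)
        g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        w_new = (x_white * g[:, None]).mean(axis=0) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:                   # converged up to sign
            return w_new
        w = w_new
    return w
```

Further components are obtained by repeating this with a Gram-Schmidt deflation step against the directions already found.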



Ordering of the ICAs

  • Unlike PCA, whose components have a well-defined and intuitive ordering given by the eigenvalues of the covariance matrix, ICA does not provide a natural ordering of its components, so this problem deserves further investigation.

  • We follow a heuristic scheme called testing-and-acceptance (TNA).



Ordering Algorithm



Applications (1)

  • Feature extraction: recognize the pattern of excess returns of mutual funds in the Chinese financial market

  • Data: time series of the excess returns of four mutual funds in the Chinese financial market



ICA components



ICA reconstruction



Applications (2)

  • Image de-noising

    • ICA

    • Sparse Code Shrinkage

  • The example is extracted from Hyvärinen (1999).



Image de-noising (1)

  • Suppose a noisy image model holds: x~ = x + n,

    where n is uncorrelated gaussian noise.

  • The noisy observations are transformed into sparse components u = W x~,

    where W is an orthogonal matrix that is the best orthogonal approximation of the inverse of the ICA mixing matrix.



Image de-noising (2)

  • Sparse code shrinkage transformation: apply s^ = g(u) componentwise and reconstruct the denoised image as x^ = W^T s^.

    The function g(.) is zero close to the origin and linear beyond a cut-off value that depends on the parameters of the Laplacian source density and the gaussian noise density.
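
A sketch of such a shrinkage nonlinearity for a Laplacian source prior with standard deviation d and gaussian noise variance sigma^2, where it reduces to soft thresholding; the threshold formula is the standard Laplacian-prior choice, stated here as an assumption rather than read off the slide:

```python
# Sparse code shrinkage with a Laplacian prior: g(u) is zero near the origin
# and (shifted) linear beyond the threshold.
import numpy as np

def shrink(u, sigma, d):
    """Apply g(.) componentwise to the sparse components u = W @ x_noisy."""
    threshold = np.sqrt(2.0) * sigma ** 2 / d
    return np.sign(u) * np.maximum(np.abs(u) - threshold, 0.0)

def denoise(x_noisy, W, sigma, d):
    """Transform with the orthogonal W, shrink, and transform back."""
    return W.T @ shrink(W @ x_noisy, sigma, d)
```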



1. Original image

2. Corrupted with noise

3. Recovered by ICA and sparse code shrinkage

4. Recovered by classical Wiener filtering



Concluding Remarks

  • ICA is a flexible and widely applicable tool that seeks a linear transformation of the observed data into maximally statistically independent components.

  • It is also interesting to note that the criteria used to compute ICA (maximum negentropy, minimum mutual information, maximum likelihood) are equivalent to each other, at least in the statistical sense. There is also a resemblance between the gradient-descent (Newton-Raphson) algorithms and the FastICA algorithm.

  • Other application prospects: audio and signal processing, image processing, telecommunications, finance, and education.



References

[1] Amari, S., Cichocki, A., and Yang, H. (1996). A New Learning Algorithm for Blind Signal Separation, Advances in Neural Information Processing Systems 8, pages 757-763.

[2] Bell, A. J. and Sejnowski, T. J. (1995). An Information-Maximization Approach to Blind Separation and Blind Deconvolution. Neural Computation, 7:1129-1159

[3] Cardoso, J.-F. and Souloumiac, A. (1993). Blind beamforming for non-Gaussian signals. IEE Proceedings F, 140(6):362-370.

[4] Chatfield, C. (1989). Analysis of Time Series: An Introduction, Fourth Edition. London: Chapman and Hall.



References continued

[5] Moulines, E., Cardoso, J.-F., and Gassiat, E. (1997). Maximum likelihood for blind separation and deconvolution of noisy signals using mixture models. Proc. ICASSP'97, volume 5, pages 3617-3620, Munich.

[6] Nadal, J.-P. and Parga, N. (1997). Redundancy reduction and independent component analysis: Conditions on cumulants and adaptive approaches. Neural Computation, 9:1421-1456.

[7] Xu, L., Cheung, C., Yang, H., and Amari, S. (1997). Maximum equalization by entropy maximization and mixture of cumulative distribution functions. Proc. of ICNN'97, pages 1821-1826, Houston.

[8] Yang, H., Amari, S., and Cichocki, A. (1997). Information back-propagation for blind separation of sources from non-linear mixtures. Proc. of ICNN'97, pages 2141-2146, Houston.

