Independent Component Analysis


### Independent Component Analysis

NiC fMRI-methodology Journal Club

05 Dec. 2008

There will be math …

Overview
• Theory
• PCA
• ICA
• Application
• Data reduction
• Spatial vs. Temporal ICA
• Group-ICA
Non-fMRI example
• The cocktail-party problem
• Many people are speaking in a room
• Given a number of microphone recordings from different locations, can the various sources be reconstructed?
Non-fMRI example

[Figure: two source signals]

• Signals are collected in row-vectors
• r1 = 0.8∙s1 + 0.6∙s2
• r2 = 0.2∙s1 + 0.5∙s2

[Figure: two microphone recordings]
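The mixing above can be reproduced in a few lines of NumPy (an illustrative sketch, not the slide's own code: the two source signals, a sine and a square wave, are made up; the mixing coefficients are the ones from the slide):

```python
import numpy as np

# Two made-up source signals, stored as row vectors (one per "speaker")
t = np.linspace(0, 1, 1000)
s1 = np.sin(2 * np.pi * 5 * t)           # a smooth source
s2 = np.sign(np.sin(2 * np.pi * 3 * t))  # a square-wave source
S = np.vstack([s1, s2])

# The slide's mixing: r1 = 0.8*s1 + 0.6*s2, r2 = 0.2*s1 + 0.5*s2
A = np.array([[0.8, 0.6],
              [0.2, 0.5]])
R = A @ S  # each row is one microphone recording

print(R.shape)  # (2, 1000)
```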

Non-fMRI example
• Assumptions
• Recordings consist of linearly transformed sources (‘weighted averages’)
• Original sources are completely unrelated
• Uncorrelated
• Independent
• Then
• Sources can be estimated by linearly (back-)transforming the recordings
• The optimal transformation makes the source estimates ‘as unrelated as possible’
• Minimize correlations
• Maximize independence
• In fMRI
• Sources of interest are the various ‘brain systems’
• Recordings are the acquisition volumes and/or voxel time courses

### Theory

PCA

ICA

PCA
• Recordings typically are not uncorrelated
• Goal: find a transformation matrix V such that the transformed signals Y = V∙R are uncorrelated
• If the transformed signals are properly normalized (|yi| = 1), this amounts to Y∙Yᵀ = I
PCA
• Q: how do we find V such that Y∙Yᵀ = V∙(R∙Rᵀ)∙Vᵀ = I? (note that R∙Rᵀ equals the covariance matrix)
• A: use the singular value decomposition (SVD) R = UL∙Λ∙URᵀ, such that R∙Rᵀ = UL∙Λ²∙ULᵀ, resulting in the solution V = Λ⁻¹∙ULᵀ
PCA
• In pictures, PCA performs ‘sphering’
• Fit the data cloud with an ellipsoidal function
• Rotate the ellipsoid such that its axes coincide with the coordinate axes
• Squeeze/stretch the ellipsoid to make it spherical
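The sphering transformation V = Λ⁻¹∙ULᵀ can be sketched in NumPy (the sources are illustrative, reusing the mixing coefficients of the cocktail-party example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated example recordings (2 x n), reusing the slide's mixing matrix
S = np.vstack([rng.uniform(-1, 1, 5000), np.sign(rng.standard_normal(5000))])
R = np.array([[0.8, 0.6], [0.2, 0.5]]) @ S
R = R - R.mean(axis=1, keepdims=True)

# SVD of the recordings: R = UL @ diag(lam) @ UR.T
UL, lam, UR_T = np.linalg.svd(R, full_matrices=False)

# Sphering transformation V = LAMBDA^-1 @ UL'
V = np.diag(1.0 / lam) @ UL.T
Y = V @ R

# The sphered signals are uncorrelated and normalized: Y @ Y.T = I
print(np.allclose(Y @ Y.T, np.eye(2)))  # True
```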

[Figure: the recordings (r1, r2) are rotated to (x1, x2) and then squeezed/stretched into the sphered signals (y1, y2)]

By means of a linear transformation of the recordings r1 and r2, we found signals y1 and y2 that are normalized and completely uncorrelated

Yet, these signals do not at all correspond with the original sources s1 and s2, and are obviously still mixtures

However, any rotation will preserve uncorrelatedness

So, how to determine the ‘best’ rotation?

PCA

[Figure: sphered recordings in the (y1, y2) plane]

ICA
• Statistical independence
• Knowledge of the value of y1 for a sample does not affect the statistical distribution of the values of y2
• Equivalently, the joint probability distribution is the product of the marginal probability distributions: P(y1, y2) = P(y1)∙P(y2)
• As a result, loosely speaking, all non-gaussian ‘features’ of the probability distribution are either caused by P(y1) or P(y2), but not by both, and therefore lie parallel with the coordinate axes
ICA
• Independence implies uncorrelatedness(But the reverse is not true!)
• Therefore, maximally independent signals are also likely approximately uncorrelated
• This suggests performing PCA first, to decorrelate the signals, and determining a suitable rotation next that optimizes independence but (automatically) preserves uncorrelatedness
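The last point can be checked numerically: any rotation of a whitened signal matrix leaves it perfectly uncorrelated (illustrative NumPy, whitened data constructed directly):

```python
import numpy as np

rng = np.random.default_rng(1)

# Any whitened 2 x n signal matrix: replace the singular values by ones
Y = rng.standard_normal((2, 5000))
U, _, Vt = np.linalg.svd(Y, full_matrices=False)
Y = U @ Vt  # now Y @ Y.T is exactly the identity

# Apply an arbitrary rotation
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Z = Q @ Y

# The rotated signals are still perfectly uncorrelated
print(np.allclose(Z @ Z.T, np.eye(2)))  # True
```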

No penalty if y1=0 or y2=0 (i.e., for points on any axis)

Positive penalty for points in any of the four quadrants

Increasing penalty for points that are further away from the axes

ICA

[Figure: data in the (y1, y2) plane; the penalty grows with distance from the axes]

The solution is determined up to
• Component order
• Component magnitude
• Component sign

ICA

[Figure: maximally independent source estimates in the (z1, z2) plane]

ICA
• Average penalty P = ⟨y1²∙y2²⟩
• Rotations preserve the Euclidean distance to the origin, so y1² + y2² is unchanged, and hence so is ⟨(y1² + y2²)²⟩ = ⟨y1⁴⟩ + 2∙⟨y1²∙y2²⟩ + ⟨y2⁴⟩
• It follows that minimizing P is equivalent to maximizing K = ⟨y1⁴ + y2⁴⟩
• K is closely related to kurtosis (‘peakiness’)
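Assuming the penalty is P = ⟨y1²∙y2²⟩ and K = ⟨y1⁴ + y2⁴⟩ (the slide's formulas were lost in transcription), the equivalence rests on ⟨(y1² + y2²)²⟩ = K + 2P being rotation-invariant, which can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.standard_normal((2, 10000))

def penalty_and_k(Z):
    P = np.mean(Z[0] ** 2 * Z[1] ** 2)   # average penalty P
    K = np.mean(Z[0] ** 4 + Z[1] ** 4)   # fourth-moment term K
    return P, K

vals = []
for theta in np.linspace(0, np.pi, 7):
    Q = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    P, K = penalty_and_k(Q @ Y)
    vals.append(K + 2 * P)  # <(y1^2 + y2^2)^2>, invariant under rotation

print(np.allclose(vals, vals[0]))  # True
```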
ICA
• Central limit theorem
• A mixture of a sufficiently large number of independent random variables (each with finite mean and variance) will be approximately normally distributed
• This suggests that the original sources are more non-gaussian than the mixtures
• Maximum non-gaussianity is the criterion to use to determine the optimal rotation!
• This cannot be successful if sources are normally distributed
• QuartiMax employs kurtosis as a measure of non-gaussianity
ICA
• If some sources are platykurtic (kurtosis below the gaussian value of 3), the method may fail
• Therefore, maximize the squared deviation from the gaussian value instead: K = Σi (⟨yi⁴⟩ − 3)²
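The kurtosis criterion can be sketched as a brute-force search over rotation angles (illustrative NumPy, not one of the named algorithms): whiten the mixtures, then pick the rotation maximizing the summed squared kurtosis deviation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two non-gaussian sources: one platykurtic (uniform), one leptokurtic (Laplace)
S = np.vstack([rng.uniform(-1, 1, 20000), rng.laplace(0, 1, 20000)])
R = np.array([[0.8, 0.6], [0.2, 0.5]]) @ S
R = R - R.mean(axis=1, keepdims=True)

# Sphering (PCA), then rescale rows to unit variance
UL, lam, _ = np.linalg.svd(R, full_matrices=False)
Y = np.diag(1.0 / lam) @ UL.T @ R
Y = Y / Y.std(axis=1, keepdims=True)

def rot(th):
    return np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])

def contrast(Z):
    # squared deviation of each component's kurtosis from the gaussian value 3
    return sum((np.mean(z ** 4) - 3.0) ** 2 for z in Z)

# Brute-force search over rotation angles (component swap repeats every 90 deg)
thetas = np.linspace(0, np.pi / 2, 500)
best = max(thetas, key=lambda th: contrast(rot(th) @ Y))
Z = rot(best) @ Y

# Each estimate should match one true source (up to order, sign, and scale)
corr = np.abs(np.corrcoef(np.vstack([Z, S]))[:2, 2:])
print(corr.max(axis=1))
```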
ICA
• Due to the fourth power, the method is sensitive to outliers
• Therefore, choose any other function G
• G(y) = y⁴
• G(y) = log(cosh(y))
• G(y) = 1 − exp(−y²)
• FastICA [Hyvärinen]
ICA
• Some non-gaussian distributions will happen to have K=0
• Therefore, use mutual information expressed in terms of negentropy
• InfoMax [Bell & Sejnowski]
ICA
• Entropy-calculations require approximation of the probability density function P itself
• Therefore, expand negentropy in terms of lower-order cumulants (generalized variance/skewness/kurtosis/…)
ICA

MATLAB code

```matlab
% Generate signals (S: 2-by-n matrix with one source per row)
A = [0.8,0.6;0.2,0.5];
R = A*S;
clear S A;

% PCA
[UL,LAMBDA,UR] = svd(R,'econ');
V = inv(LAMBDA)*UL';
Y = V*R;

% ICA (three alternative algorithms)
Z = rotatefactors(Y','Method','quartimax')';        % QuartiMax
Z = icatb_fastICA(Y,'approach','symm','g','tanh');  % FastICA (GIFT toolbox)
[d1,d2,Z] = icatb_runica(Y);                        % InfoMax (GIFT toolbox)
```
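For readers without MATLAB, the same pipeline can be sketched in NumPy; the ICA step below is a minimal symmetric fixed-point FastICA with the tanh nonlinearity (an assumption-laden sketch, not the icatb implementation):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy sources (one row each): a binary signal and a uniform signal
S = np.vstack([np.sign(rng.standard_normal(20000)), rng.uniform(-1, 1, 20000)])
R = np.array([[0.8, 0.6], [0.2, 0.5]]) @ S
R = R - R.mean(axis=1, keepdims=True)

# PCA / whitening: Z has identity sample covariance
UL, lam, _ = np.linalg.svd(R, full_matrices=False)
Z = np.sqrt(R.shape[1]) * (np.diag(1.0 / lam) @ UL.T @ R)

# Symmetric fixed-point FastICA with the tanh nonlinearity
W = np.linalg.qr(rng.standard_normal((2, 2)))[0]  # random orthogonal start
for _ in range(200):
    G = np.tanh(W @ Z)
    Gp = 1.0 - G ** 2
    W_new = (G @ Z.T) / Z.shape[1] - np.diag(Gp.mean(axis=1)) @ W
    Uw, sv, Vtw = np.linalg.svd(W_new)
    W = Uw @ Vtw  # symmetric decorrelation: W <- (W W^T)^(-1/2) W
Zhat = W @ Z

# Estimates should match the sources up to order, sign, and scale
corr = np.abs(np.corrcoef(np.vstack([Zhat, S]))[:2, 2:])
print(corr.max(axis=1))
```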

In practice, all mentioned algorithms have been reported to perform satisfactorily

[Figure: source estimates recovered by QuartiMax, FastICA, and InfoMax]

### Application

Data reduction

Spatial vs. Temporal ICA

Group-ICA

Data reduction
• fMRI-data are gathered in a matrix Y (voxels × time)
• Data are decomposed into principal components by means of SVD: Y = Ux ∙ Λ ∙ Utᵀ, with spatial components Ux (voxels × comp), singular values Λ (comp × comp), and temporal components Ut (time × comp)
• Only the first few strongest components are retained
• Each component is the product of
• a coefficient λi
• a spatial map (a column of Ux)
• a time course (a column of Ut)
• The retained components reconstruct the data: Y = λ1×(map1∙course1) + λ2×(map2∙course2) + λ3×(map3∙course3) + residuals

Temporal vs. Spatial ICA


• PCA decomposition results in
• Uncorrelated spatial maps

and

• Uncorrelated time courses
• ICA rotation results in
• Maximally independent time courses: tICA

or

• Maximally independent spatial maps: sICA
• Some methods employ criteria in both domains

[Figure: in temporal ICA, acquisitions are plotted against voxels i and j and each component defines a spatial map; in spatial ICA, voxels are plotted against acquisitions i and j and each component defines a time course]

Temporal vs. Spatial ICA
• PCA: Y = Ux ∙ Λ ∙ Utᵀ
• tICA: Y = Ux ∙ Λ ∙ Vx ∙ St = A ∙ St, with independent time courses St and spatial maps A = Ux ∙ Λ ∙ Vx
• sICA: Y = Sx ∙ Vt ∙ Λ ∙ Utᵀ = Sx ∙ A, with independent spatial maps Sx and time courses A = Vt ∙ Λ ∙ Utᵀ

Temporal vs. Spatial ICA
• Temporal ICA
• Components have independent temporal dynamics: “Strength of one component at a particular moment in time does not provide information on the strength of other components at that moment”
• Components may be correlated/dependent in space
• Popular for cocktail party problem
• Spatial ICA
• Components have independent spatial distributions: “Strength of one component in a particular voxel does not provide information on the strength of other components in that voxel”
• Components may be correlated/dependent in time
• Popular for fMRI
Temporal vs. Spatial ICA

Calhoun et al., 2001


Group-ICA
• Some problems at the subject level become even more pressing at the group level
• Independent components have no natural interpretation
• Independent components have no meaningful order
• The magnitude of independent component maps and time courses is undetermined
• The sign of independent component maps and time courses is arbitrary
• For group level analyses, some form of ‘matching’ is required
• Assumptions
• Equal coefficients across subjects?
• Equal distributions across subjects?
• Equal dynamics across subjects?
Group-ICA
• Method I: Averaging
• E.g.: Schmithorst et al., 2004
• Principle
• Average the data sets of all subjects before ICA
• Perform ICA on the mean data
• Key points
• All subjects are assumed to have identical components
• Equal coefficients > homogeneous population
• Equal distributions > comparable brain organization
• Equal dynamics > fixed paradigm; resting state impossible
• Statistical assessment at group level
• Enter ICA time courses into linear regression model (back-projection)
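Back-projection can be sketched as ordinary least squares of each voxel's time course on the group time courses (illustrative NumPy; the time courses T and the subject data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(6)
voxels, time, k = 300, 120, 2

# Hypothetical group ICA time courses T (comp x time) and one subject's data
T = rng.standard_normal((k, time))
maps_true = rng.standard_normal((voxels, k))
Yi = maps_true @ T + 0.05 * rng.standard_normal((voxels, time))

# Back-projection: least-squares fit of every voxel's time course onto T,
# i.e. the ordinary-regression solution B = Yi T' (T T')^-1
B = Yi @ T.T @ np.linalg.inv(T @ T.T)

print(np.allclose(B, maps_true, atol=0.05))  # True
```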
Group-ICA
• Method II: Tensor-ICA
• E.g.: Beckmann et al., 2005
• Principle
• Stack subjects’ data matrices along a third dimension
• Decompose data tensor into product of (acquisition-dependent) time course, (voxel-dependent) maps, and (subject-dependent) loadings
• Key points
• Components may differ only in strength
• Unequal coefficients > inhomogeneous population
• Equal distributions > comparable brain organization
• Equal dynamics > fixed paradigm; resting state impossible
• Statistical assessment at group level
• Components as a whole
Group-ICA
• Method III: Spatial concatenation
• E.g.: Svensén et al., 2002
• Principle
• Concatenate subjects’ data matrices along the spatial dimension
• Perform ICA on aggregate data matrix
• Partition resulting components into individual maps
• Key points
• Components may differ in strength and distribution
• Unequal coefficients > inhomogeneous population
• Unequal distributions > brain plasticity
• Equal dynamics > fixed paradigm; resting state impossible
• Statistical assessment at group level
• Voxel-by-voxel SPMs
Group-ICA
• Method IV: Temporal concatenation
• E.g.: Calhoun et al., 2001
• Principle
• Concatenate subjects’ data matrices along the time dimension
• Perform ICA on aggregate data matrix
• Partition resulting components into individual time courses
• Key points
• Components may differ in strength and dynamics
• Unequal coefficients > inhomogeneous population
• Equal distributions > comparable brain organization
• Unequal dynamics > flexible paradigm
• Statistical assessment at group level
• Components as a whole, from time course spectrum/power
• Voxel-by-voxel SPMs, from back-projection (careful with statistics!)
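The concatenation and partitioning steps of Method IV can be sketched in NumPy (shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
voxels, time, n_sub = 200, 50, 3

# Stack the subjects' (voxels x time) matrices along the time dimension
subjects = [rng.standard_normal((voxels, time)) for _ in range(n_sub)]
Y_agg = np.concatenate(subjects, axis=1)  # voxels x (n_sub * time)
print(Y_agg.shape)  # (200, 150)

# After ICA on Y_agg, the aggregate component time courses are partitioned
# back into one segment per subject; illustrated here on dummy courses
courses = rng.standard_normal((4, n_sub * time))
per_subject = np.hsplit(courses, n_sub)   # three (4 x 50) blocks
print([c.shape for c in per_subject])  # [(4, 50), (4, 50), (4, 50)]
```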
Group-ICA
• Method V: Retrospective matching
• Subjective or (semi)automatic matching on basis of similarity between distribution maps and/or time courses
• Also used to test the reproducibility of some stochastic ICA algorithms
• Various principles, various authors
• Principle
• Perform ICA on individual subjects
• Match similar individual components one-on-one across subjects
• Key points
• Components may differ in strength and dynamics
• Unequal coefficients > inhomogeneous population
• Unequal distributions > brain plasticity
• Unequal dynamics > flexible paradigm
• Statistical assessment at group level
• Voxel-by-voxel SPMs (careful with scaling and bias!)
Conclusion
• ICA has the major advantage that it requires minimal assumptions or prior knowledge
• However, interpretation of the meaning of components occurs retrospectively and may be ambiguous
• Unfortunately, methods and statistics are not yet fully characterized and are still under heavy development
• Therefore (IMHO), independent component analysis is an excellent tool for exploratory experiments, but it should not be your first choice for confirmatory studies