
Independent Component Analysis

NiC fMRI-methodology Journal Club

05 Dec. 2008


There will be math …

Overview
  • Theory
    • PCA
    • ICA
  • Application
    • Data reduction
    • Spatial vs. Temporal ICA
    • Group-ICA
Non-fMRI example
  • The cocktail-party problem
    • Many people are speaking in a room
    • Given a number of microphone recordings from different locations, can the various sources be reconstructed?
Non-fMRI example
  • Signals are collected in row-vectors
    • r1 = 0.8∙s1 + 0.6∙s2
    • r2 = 0.2∙s1 + 0.5∙s2

[Figure: waveforms of the two sources and of the resulting recordings]
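A minimal MATLAB sketch of this setup (the two toy sources below stand in for the recorded speech signals; the mixing weights are the ones from the slide):

% Two toy sources in rows: a sine wave and a binary noise signal
n  = 1000;  t = (1:n)/n;
s1 = sin(2*pi*5*t);
s2 = sign(randn(1,n));
S  = [s1; s2];
% Mix them into two 'microphone recordings'
A  = [0.8 0.6; 0.2 0.5];
R  = A*S;        % row 1 is r1, row 2 is r2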
Non-fMRI example
  • Assumptions
    • Recordings consist of linearly transformed sources (‘weighted averages’)
    • Original sources are completely unrelated
      • Uncorrelated
      • Independent
  • Then
    • Sources can be estimated by linearly (back-)transforming the recordings
    • The optimal transformation makes the source estimates ‘as unrelated as possible’
      • Minimize correlations
      • Maximize independence
  • In fMRI
    • Sources of interest are the various ‘brain systems’
    • Recordings are the acquisition volumes and/or voxel time courses

Theory

PCA

ICA

PCA
  • Recordings typically are not uncorrelated
  • Goal: find a transformation matrix V such that the transformed signals Y = V∙R are uncorrelated
  • If the transformed signals are properly normalized (|yi| = 1), this means Y∙Yᵀ = I
PCA
  • Q: how do we find V such that Y∙Yᵀ = V∙(R∙Rᵀ)∙Vᵀ = I? (Note that R∙Rᵀ equals the covariance matrix)
  • A: use the singular value decomposition (SVD), R = UL∙Λ∙URᵀ, such that R∙Rᵀ = UL∙Λ²∙ULᵀ, resulting in the solution V = Λ⁻¹∙ULᵀ
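A short MATLAB sketch of this whitening step, continuing from the toy recordings R generated earlier:

% R = UL*LAMBDA*UR', so V = inv(LAMBDA)*UL' whitens the recordings
[UL, LAMBDA, UR] = svd(R, 'econ');
V = inv(LAMBDA) * UL';
Y = V * R;
disp(Y * Y')     % identity matrix (up to round-off): uncorrelated, normalized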
PCA
  • In pictures, PCA performs ‘sphering’
    • Fit the data cloud with an ellipsoidal function
    • Rotate the ellipsoid such that its axes coincide with the coordinate axes
    • Squeeze/stretch the ellipsoid to make it spherical

[Figure: scatter plots of the data cloud at the three stages: recordings (axes r1, r2), after rotation (axes x1, x2), after squeezing/stretching (axes y1, y2)]
slide13
By means of a linear transformation of the recordings r1 and r2, we found signals y1 and y2 that are normalized and completely uncorrelated

Yet, these signals do not at all correspond with the original sources s1 and s2, and are obviously still mixtures

However, any rotation will preserve uncorrelatedness

So, how the determine the ‘best’ rotation?

PCA

y2

y1

ICA
  • Statistical independence
    • Knowledge of the value of y1 for a sample does not affect the statistical distribution of the values of y2
    • Equivalently, the joint probability distribution is the product of the marginal probability distributions: P(y1,y2) = P(y1)∙P(y2)
    • As a result, loosely speaking, all non-gaussian ‘features’ of the probability distribution are caused by either P(y1) or P(y2), but not by both, and therefore lie parallel to the coordinate axes
ICA
  • Independence implies uncorrelatedness (but the reverse is not true; see the example below!)
  • Therefore, maximally independent signals are also approximately uncorrelated
  • This suggests performing PCA first, to decorrelate the signals, and then determining a suitable rotation that optimizes independence but (automatically) preserves uncorrelatedness
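A small MATLAB illustration of that one-way implication (zero-mean signals assumed): y2 below is completely determined by y1, yet the two are uncorrelated; only a higher-order statistic exposes the dependence.

y1 = randn(1, 1e5);
y2 = y1.^2 - 1;                % fully dependent on y1 by construction
disp(mean(y1 .* y2))           % ~0: uncorrelated (odd moments vanish)
disp(mean(y1.^2 .* y2))        % ~2, not 0: clearly not independent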
ICA
  • A simple idea: penalize all data points with a penalty P = y1²∙y2²
    • No penalty if y1 = 0 or y2 = 0 (i.e., for points on either axis)
    • Positive penalty for points in any of the four quadrants
    • Increasing penalty for points that are further away from the axes

[Figure: scatter plot in the (y1, y2) plane, shaded by the penalty]
ICA
  • Minimize the penalty over a search space that consists of all rotations
  • The solution is determined only up to
    • Component order
    • Component magnitude
    • Component sign

[Figure: scatter plot of the rotated signals z1, z2, aligned with the coordinate axes]
ICA
  • Average penalty: P = ⟨y1²∙y2²⟩
  • Rotations preserve the Euclidean distance to the origin, so ⟨(y1² + y2²)²⟩ = ⟨y1⁴ + y2⁴⟩ + 2∙⟨y1²∙y2²⟩ is the same for every rotation
  • It follows that minimizing P is equivalent to maximizing K = ⟨y1⁴ + y2⁴⟩ (see the check below)
  • K is closely related to kurtosis (‘peakiness’)
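A quick numerical check of this equivalence in MATLAB, using two synthetic unit-variance independent sources (already white, so every rotation is a candidate solution):

% K + 2P is rotation-invariant, so minimizing P maximizes K
y = [sign(randn(1,1e4)); sqrt(3)*(2*rand(1,1e4) - 1)];
for th = linspace(0, pi/2, 5)
    z = [cos(th) -sin(th); sin(th) cos(th)] * y;
    P = mean(z(1,:).^2 .* z(2,:).^2);
    K = mean(z(1,:).^4 + z(2,:).^4);
    fprintf('theta = %4.2f   P = %5.3f   K = %5.3f   K + 2P = %5.3f\n', ...
            th, P, K, K + 2*P);
end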
ICA
  • Central limit theorem
    • A mixture of a sufficiently large number of independent random variables (each with finite mean and variance) will be approximately normally distributed
  • This suggests that the original sources are more non-gaussian than the mixtures (see the sketch below)
    • Maximum non-gaussianity is the criterion to use to determine the optimal rotation!
    • This cannot succeed if the sources are normally distributed
    • QuartiMax employs kurtosis as a measure of non-gaussianity
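A minimal numerical illustration in MATLAB: a Laplacian source has excess kurtosis 3, while a sum of twenty of them is already nearly gaussian (excess kurtosis near 0).

% Excess kurtosis of a zero-mean signal (0 for a gaussian)
kurt = @(y) mean(y.^4) / mean(y.^2)^2 - 3;
S = sign(randn(20, 1e5)) .* (-log(rand(20, 1e5)));   % 20 Laplacian sources
disp(kurt(S(1,:)))      % ~3: a single source is strongly super-gaussian
disp(kurt(sum(S)))      % ~0.15: the mixture is much closer to gaussian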
ICA
  • If some sources are platykurtic (negative kurtosis), the method may fail
  • Therefore, maximize the magnitude of the kurtosis, e.g. the sum of the squared kurtoses of the components, rather than its signed value
ICA
  • Due to the fourth power, the method is sensitive to outliers
  • Therefore, choose another function G
    • G(y) = y⁴
    • G(y) = log(cosh(y))
    • G(y) = 1 − exp(−y²)
  • FastICA [Hyvärinen]
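For reference, these choices as MATLAB function handles; loosely, non-gaussianity is measured by how far the average of G(y) deviates from its value for a gaussian variable, and the slower G grows, the less a single outlier can dominate that average.

G1 = @(y) y.^4;              % kurtosis-based: grows fast, outlier-sensitive
G2 = @(y) log(cosh(y));      % grows only linearly for large |y|
G3 = @(y) 1 - exp(-y.^2);    % bounded: outliers have limited influence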
ICA
  • Some non-gaussian distributions will happen to have K = 0
  • Therefore, use mutual information, expressed in terms of negentropy
  • InfoMax [Bell & Sejnowski]
ICA
  • Entropy calculations require an approximation of the probability density function P itself
  • Therefore, expand the negentropy in terms of lower-order cumulants (generalized variance/skewness/kurtosis/…)
  • JADE [Cardoso]
ICA

MATLAB code

% Generate signals
S = wavread('D:\sources.wav')';   % sources in rows (one per audio channel)
A = [0.8,0.6;0.2,0.5];            % mixing matrix
R = A*S;                          % recordings
clear S A;

% PCA (whitening): R = UL*LAMBDA*UR', V = inv(LAMBDA)*UL'
[UL,LAMBDA,UR] = svd(R,'econ');
V = inv(LAMBDA)*UL';
Y = V*R;

% ICA (pick one of the following rotations)
Z = rotatefactors(Y','Method','quartimax')';          % QuartiMax
Z = icatb_fastICA(Y,'approach','symm','g','tanh');    % FastICA
[d1,d2,Z] = icatb_runica(Y);                          % InfoMax
Z = icatb_jade_opac(Y);                               % JADE
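The estimates in Z match the sources only up to order, magnitude, and sign, so a sanity check against S (assuming the clear statement above was skipped) can compare absolute cosine similarities; a minimal sketch:

% abs(Z*S') after row-normalization should approximate a permutation
% matrix, with one entry near 1 in every row and column
for i = 1:2
    Z(i,:) = Z(i,:) / norm(Z(i,:));
    S(i,:) = S(i,:) / norm(S(i,:));
end
disp(abs(Z * S'))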

ICA
  • In practice, all mentioned algorithms have been reported to perform satisfactorily

[Figure: source estimates produced by QuartiMax, FastICA, InfoMax, and JADE for the example recordings]

Application

Data reduction

Spatial vs. Temporal ICA

Group-ICA

Data reduction
  • fMRI-data are gathered in a (voxels × time) matrix Y
  • Data are decomposed into principal components by means of SVD: Y = Ux ∙ Λ ∙ Ut, where Ux is (voxels × comp), Λ is (comp × comp), and Ut is (comp × time)
  • Only the first few strongest components are retained
  • Each component is the product of
    • a coefficient (λi)
    • a spatial map (a column of Ux)
    • a time course (a row of Ut)
  • Together: Y = λ1∙ux1∙ut1 + λ2∙ux2∙ut2 + λ3∙ux3∙ut3 + … + residuals

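A hedged MATLAB sketch of this reduction (the matrix sizes and the number of retained components k are illustrative choices, not prescriptions):

Y = randn(5000, 200);                 % stand-in for a voxels-by-time data set
[Ux, L, Ut] = svd(Y, 'econ');         % Y = Ux * L * Ut'
k   = 20;                             % retain the k strongest components
Yk  = Ux(:,1:k) * L(1:k,1:k) * Ut(:,1:k)';   % rank-k approximation of Y
res = Y - Yk;                         % discarded residuals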
Temporal vs. Spatial ICA
  • PCA decomposition results in
    • Uncorrelated spatial maps
and
    • Uncorrelated time courses
  • ICA rotation results in
    • Maximally independent time courses: tICA
or
    • Maximally independent spatial maps: sICA
  • Some methods employ criteria in both domains

[Figure: Temporal ICA: acquisitions plotted as points in the (voxel i, voxel j) plane; a component’s direction corresponds to its spatial map. Spatial ICA: voxels plotted as points in the (acquisition i, acquisition j) plane; a component’s direction corresponds to its time course.]

Temporal vs. Spatial ICA
  • PCA: Y = Ux ∙ Λ ∙ Ut
  • tICA: Y = Ux ∙ Λ ∙ Vx ∙ St = A ∙ St (rotate the time courses; A = Ux ∙ Λ ∙ Vx)
  • sICA: Y = Sx ∙ Vt ∙ Λ ∙ Ut = Sx ∙ A (rotate the spatial maps; A = Vt ∙ Λ ∙ Ut)

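A sketch of the two rotations using the same rotatefactors call as in the earlier code (sizes and k are again illustrative; on real data the PCA step would be the reduction shown above):

Y = randn(5000, 200);  k = 20;        % stand-in voxels-by-time data
[Ux, L, Ut] = svd(Y, 'econ');
% sICA: rotate the spatial eigenmaps into maximally independent maps
Sx = rotatefactors(Ux(:,1:k), 'Method', 'quartimax');
% tICA: rotate the eigen-time-courses into maximally independent courses
St = rotatefactors(Ut(:,1:k), 'Method', 'quartimax')';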
Temporal vs. Spatial ICA
  • Temporal ICA
    • Components have independent temporal dynamics: “Strength of one component at a particular moment in time does not provide information on the strength of other components at that moment”
    • Components may be correlated/dependent in space
    • Popular for the cocktail-party problem
  • Spatial ICA
    • Components have independent spatial distributions: “Strength of one component in a particular voxel does not provide information on the strength of other components in that voxel”
    • Components may be correlated/dependent in time
    • Popular for fMRI
Temporal vs. Spatial ICA

[Figure: comparison of temporal and spatial ICA decompositions; Calhoun et al., 2001]


Group-ICA
  • Some problems at the subject level become even more pressing at the group level
    • Independent components have no natural interpretation
    • Independent components have no meaningful order
    • The magnitude of independent component maps and time courses is undetermined
    • The sign of independent component maps and time courses is arbitrary
  • For group level analyses, some form of ‘matching’ is required
    • Assumptions
      • Equal coefficients across subjects?
      • Equal distributions across subjects?
      • Equal dynamics across subjects?
Group-ICA
  • Method I: Averaging
    • E.g.: Schmithorst et al., 2004
  • Principle
    • Average the data sets of all subjects before ICA
    • Perform ICA on the mean data
  • Key points
    • All subjects are assumed to have identical components
      • Equal coefficients → homogeneous population
      • Equal distributions → comparable brain organization
      • Equal dynamics → fixed paradigm; resting state impossible
    • Statistical assessment at group level
      • Enter ICA time courses into a linear regression model (back-projection)
Group-ICA
  • Method II: Tensor-ICA
    • E.g.: Beckmann et al., 2005
  • Principle
    • Stack subjects’ data matrices along a third dimension
    • Decompose the data tensor into a product of (acquisition-dependent) time courses, (voxel-dependent) maps, and (subject-dependent) loadings
  • Key points
    • Components may differ only in strength
      • Unequal coefficients → inhomogeneous population
      • Equal distributions → comparable brain organization
      • Equal dynamics → fixed paradigm; resting state impossible
    • Statistical assessment at group level
      • Components as a whole
Group-ICA
  • Method III: Spatial concatenation
    • E.g.: Svensén et al., 2002
  • Principle
    • Concatenate subjects’ data matrices along the spatial dimension
    • Perform ICA on the aggregate data matrix
    • Partition the resulting components into individual maps
  • Key points
    • Components may differ in strength and distribution
      • Unequal coefficients → inhomogeneous population
      • Unequal distributions → brain plasticity
      • Equal dynamics → fixed paradigm; resting state impossible
    • Statistical assessment at group level
      • Voxel-by-voxel SPMs
Group-ICA
  • Method IV: Temporal concatenation
    • E.g.: Calhoun et al., 2001
  • Principle (see the sketch below)
    • Concatenate subjects’ data matrices along the time dimension
    • Perform ICA on the aggregate data matrix
    • Partition the resulting components into individual time courses
  • Key points
    • Components may differ in strength and dynamics
      • Unequal coefficients → inhomogeneous population
      • Equal distributions → comparable brain organization
      • Unequal dynamics → flexible paradigm
    • Statistical assessment at group level
      • Components as a whole, from time course spectrum/power
      • Voxel-by-voxel SPMs, from back-projection (careful with statistics!)
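A minimal MATLAB sketch of temporal concatenation for two subjects (stand-in data; real pipelines such as GIFT add per-subject reduction steps that are omitted here):

Y1 = randn(5000, 200);  Y2 = randn(5000, 200);  % voxels-by-time, per subject
Ycat = [Y1, Y2];                      % concatenate along the time dimension
[Ux, L, Ut] = svd(Ycat, 'econ');  k = 20;
Sx = rotatefactors(Ux(:,1:k), 'Method', 'quartimax');  % shared spatial maps
A  = Sx \ Ycat;                       % least-squares aggregate time courses
A1 = A(:, 1:200);  A2 = A(:, 201:400);   % per-subject time courses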
Group-ICA
  • Method V: Retrospective matching
    • Subjective or (semi)automatic matching on the basis of similarity between distribution maps and/or time courses
    • Also used to test the reproducibility of some stochastic ICA algorithms
    • Various principles, various authors
  • Principle (see the sketch below)
    • Perform ICA on individual subjects
    • Match similar individual components one-on-one across subjects
  • Key points
    • Components may differ in strength, distribution, and dynamics
      • Unequal coefficients → inhomogeneous population
      • Unequal distributions → brain plasticity
      • Unequal dynamics → flexible paradigm
    • Statistical assessment at group level
      • Voxel-by-voxel SPMs (careful with scaling and bias!)
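A sketch of the matching step for two subjects in MATLAB, greedily pairing components by absolute spatial correlation (the maps Sx1, Sx2 are stand-ins; in practice they would come from the per-subject ICAs):

Sx1 = randn(5000, 5);  Sx2 = randn(5000, 5);  % voxels-by-components maps
C = corrcoef([Sx1, Sx2]);                     % correlations of all columns
C = abs(C(1:5, 6:10));                        % cross-subject block only
match = zeros(1, 5);
for i = 1:5
    [mx, j] = max(C(i, :));    % best remaining partner for component i
    match(i) = j;
    C(:, j) = -Inf;            % enforce one-to-one matching
end
disp(match)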
Conclusion
  • ICA has the major advantage that it requires minimal assumptions or prior knowledge
  • However, interpretation of the meaning of components occurs retrospectively and may be ambiguous
  • Unfortunately, the methods and statistics are not yet fully characterized and are still under heavy development
  • Therefore, in my opinion, independent component analysis is an excellent tool for exploratory experiments, but should not be your first choice for confirmatory studies