## An Introduction to Independent Component Analysis (ICA)


Yu-Te Wu
Institute of Radiological Sciences, National Yang-Ming University
Integrated Brain Function Laboratory, Taipei Veterans General Hospital

**The Principle of ICA: a Cocktail-Party Problem**

Three microphones record three speakers; each observed signal is a linear mixture of the source signals:

x1(t) = a11 s1(t) + a12 s2(t) + a13 s3(t)
x2(t) = a21 s1(t) + a22 s2(t) + a23 s3(t)
x3(t) = a31 s1(t) + a32 s2(t) + a33 s3(t)

**Independent Component Analysis**

Reference: A. Hyvärinen, J. Karhunen, E. Oja (2001). Independent Component Analysis. John Wiley & Sons.

**Central Limit Theorem**

- The distribution of a sum of independent random variables tends toward a Gaussian distribution.
- Observed signal = m1 IC1 + m2 IC2 + … + mn ICn. Each IC is non-Gaussian, but their mixture tends toward a Gaussian.

Consider the partial sum of a sequence {zi} of independent and identically distributed random variables:

xk = z1 + z2 + … + zk

Since the mean and variance of xk can grow without bound as k → ∞, consider instead of xk the standardized variables

yk = (xk − E{xk}) / sqrt(var(xk))

The distribution of yk tends to a Gaussian distribution with zero mean and unit variance as k → ∞ (Central Limit Theorem).

**How to Estimate the ICA Model**

- Principle for estimating the ICA model: maximization of non-Gaussianity.

**Measures of Non-Gaussianity**

- Kurtosis: kurt(x) = E{(x − μ)^4} − 3 [E{(x − μ)^2}]^2
- Super-Gaussian: kurtosis > 0; Gaussian: kurtosis = 0; sub-Gaussian: kurtosis < 0
- For independent x1 and x2: kurt(x1 + x2) = kurt(x1) + kurt(x2), and kurt(α x1) = α^4 kurt(x1)

**Whitening Process**

- Assume the measurement x = A s is zero mean and E{ss^T} = I.
- Let D and E be the eigenvalue and eigenvector matrices of the covariance matrix of x, i.e. E{xx^T} = E D E^T.
- Then V = D^(-1/2) E^T is a whitening matrix: for z = V x,

E{zz^T} = V E{xx^T} V^T = D^(-1/2) E^T (E D E^T) E D^(-1/2) = I
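The centering and whitening steps above can be sketched in NumPy. This is a minimal illustration, not code from the slides: the 2×2 mixing matrix, the uniform sources, and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent non-Gaussian sources: uniform on [-sqrt(3), sqrt(3)],
# which gives zero mean and unit variance.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))

# Sample kurtosis kurt(x) = E{x^4} - 3 (E{x^2})^2; a uniform source is
# sub-Gaussian, so this is negative (about -1.2).
kurt_s = np.mean(s[0] ** 4) - 3 * np.mean(s[0] ** 2) ** 2

# Mix the sources with a hypothetical mixing matrix A: x = A s.
A = np.array([[1.0, 0.5],
              [0.7, 1.2]])
x = A @ s

# Centering: subtract the mean of each observed signal.
x = x - x.mean(axis=1, keepdims=True)

# Whitening: eigendecomposition of the covariance E{xx^T} = E D E^T,
# then V = D^(-1/2) E^T and z = V x, so that E{zz^T} = I.
D, E = np.linalg.eigh(np.cov(x))
V = np.diag(D ** -0.5) @ E.T
z = V @ x

cov_z = np.cov(z)  # should be (numerically) the identity matrix
```

Note that `np.linalg.eigh` returns the eigenvalues in ascending order and the eigenvectors as columns, which is exactly the D and E the derivation above uses.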
**Importance of Whitening**

For the whitened data z, find a vector w such that the linear combination y = w^T z has maximum non-Gaussianity under the constraint E{y^2} = ||w||^2 = 1. That is, maximize |kurt(w^T z)| under the simpler constraint ||w|| = 1.

**Constrained Optimization**

Maximize F(w) subject to ||w||^2 = 1. With the Lagrangian

L(w, λ) = F(w) + λ (w^T w − 1)

setting ∂L/∂w = 0 gives ∂F/∂w + 2λw = 0, i.e. ∂F/∂w = −2λw. At the stationary point, the gradient of F(w) must point in the direction of w, i.e. be equal to w multiplied by a scalar.

**Gradient of Kurtosis**

F(w) = kurt(w^T z) = E{(w^T z)^4} − 3 [E{(w^T z)^2}]^2

Since z is whitened, E{(w^T z)^2} = w^T w, and with sample averages (1/T) Σ_t in place of the expectations E{·}:

∂F/∂w = 4 E{z (w^T z)^3} − 12 ||w||^2 w

so the gradient of |kurt(w^T z)| is 4 sign(kurt(w^T z)) [E{z (w^T z)^3} − 3 ||w||^2 w].

**Fixed-Point Algorithm Using Kurtosis**

Gradient ascent gives w(k+1) = w(k) + μ ∂F/∂w(k). At a stationary point the gradient equals −2λ w(k), so adding it to w(k) does not change the direction:

w(k+1) = (1 − 2μλ) w(k)

Therefore iterate directly on the stationary condition (after normalization ||w|| = 1, so 3 ||w||^2 w = 3w):

w ← E{z (w^T z)^3} − 3w
w ← w / ||w||

Convergence: |<w(k+1), w(k)>| = 1, since w(k) and w(k+1) are unit vectors.

**Fixed-Point Algorithm Using Kurtosis: One-by-One Estimation**

1. Center the data.
2. Whiten the data: z = V x.
3. Choose m, the number of ICs to estimate. Set the counter p ← 1.
4. Choose an initial guess of unit norm for wp, e.g. randomly.
5. Let wp ← E{z (wp^T z)^3} − 3 wp.
6. Do deflation decorrelation: wp ← wp − Σ_{j<p} (wp^T wj) wj.
7. Let wp ← wp / ||wp||.
8. If wp has not converged (|<wp(k+1), wp(k)>| ≠ 1), go back to step 5.
9. Set p ← p + 1. If p ≤ m, go back to step 4.
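The kurtosis-based fixed-point iteration can be sketched as follows. This is an illustrative toy setup, not the slides' own code: the sources, mixing matrix, sample size, and convergence tolerance are assumptions, and only one component is extracted (m = 1, so no deflation step is needed).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: whitened mixtures of two uniform (sub-Gaussian) sources.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20000))
A = np.array([[1.0, 0.5], [0.7, 1.2]])
x = A @ s
x -= x.mean(axis=1, keepdims=True)          # centering
D, E = np.linalg.eigh(np.cov(x))
z = np.diag(D ** -0.5) @ E.T @ x            # whitening: E{zz^T} = I

# Fixed-point iteration using kurtosis:
#   w <- E{z (w^T z)^3} - 3 w, then normalize w to unit length.
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(100):
    w_new = np.mean(z * (w @ z) ** 3, axis=1) - 3 * w
    w_new /= np.linalg.norm(w_new)
    if abs(w_new @ w) > 1 - 1e-9:           # |<w(k+1), w(k)>| = 1
        w = w_new
        break
    w = w_new

y = w @ z  # one estimated independent component, up to sign
```

The absolute value in the convergence test matters: for sub-Gaussian sources the iterate can flip sign at every step, so w(k+1) ≈ ±w(k) both count as converged.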
**Fixed-Point Algorithm Using Negentropy**

Kurtosis is very sensitive to outliers, which may be erroneous or irrelevant observations. For example, a zero-mean, unit-variance random variable with sample size 1000 that contains a single value equal to 10 has sample kurtosis (E{x^4} − 3 for standardized data) of at least 10^4/1000 − 3 = 7. We therefore need a more robust measure of non-Gaussianity: an approximation of negentropy.

**Entropy and Negentropy**

- A Gaussian variable has the largest entropy among all random variables of equal variance.
- Negentropy J(y) = H(y_gauss) − H(y), where y_gauss is a Gaussian variable with the same variance as y: J(y) = 0 for Gaussian y, and J(y) > 0 for non-Gaussian y.
- Negentropy is the optimal estimator of non-Gaussianity, but it is computationally difficult since it requires an estimate of the pdf. Hence we use an approximation of negentropy.

**High-Order Cumulant Approximation**

Replace the polynomial functions in the cumulant-based approximation of negentropy by nonpolynomial functions Gi, e.g. an odd G1 and an even G2. Since it is quite common that most random variables have approximately symmetric distributions, the odd term can be dropped, leaving a single even function G:

J(y) ∝ [E{G(y)} − E{G(ν)}]^2,  ν a standardized Gaussian variable

**Finding the Maximum by an Approximative Newton Method**

- The real Newton method is fast: it needs few steps to converge. But it requires a matrix inversion at every step: a large computational load.
- A plain gradient iteration does not have good convergence properties here, because the nonpolynomial moments do not have the same nice algebraic properties as cumulants.
- Special properties of the ICA problem allow an approximative Newton method: no matrix inversion is needed, yet it converges in roughly the same number of steps as the real Newton method.

According to the Lagrange multiplier argument, at a maximum of J(y) the gradient must point in the direction of w:

F(w) = E{z g(w^T z)} − βw = 0

where g is the derivative of G. Solve this equation by Newton's method:

JF(w) Δw = −F(w),  i.e.  Δw = [JF(w)]^(−1) [−F(w)]

Since the data are whitened, the Jacobian can be diagonalized: E{zz^T g'(w^T z)} ≈ E{zz^T} E{g'(w^T z)} = E{g'(w^T z)} I. Multiplying the Newton update by the scalar (β − E{g'(w^T z)}), which is removed by the subsequent normalization, gives the fixed-point update

w ← E{z g(w^T z)} − E{g'(w^T z)} w
w ← w / ||w||

Convergence: |<w(k+1), w(k)>| = 1, because ICs can be defined only up to a multiplicative sign.

**Fixed-Point Algorithm Using Negentropy: One-by-One Estimation**

1. Center the data.
2. Whiten the data: z = V x.
3. Choose m, the number of ICs to estimate. Set the counter p ← 1.
4. Choose an initial guess of unit norm for wp, e.g. randomly.
5. Let wp ← E{z g(wp^T z)} − E{g'(wp^T z)} wp.
6. Do deflation decorrelation: wp ← wp − Σ_{j<p} (wp^T wj) wj.
7. Let wp ← wp / ||wp||.
8. If wp has not converged, go back to step 5.
9. Set p ← p + 1. If p ≤ m, go back to step 4.

**Implementations**

The demonstration slides walk through the algorithm on a toy example (the original figures are not reproduced here):

- Create two uniform sources.
- Mix them into two observed signals.
- Centering.
- Whitening.
- Fixed-point iteration using kurtosis.
- Fixed-point iteration using negentropy.
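The negentropy-based algorithm with deflation can be sketched end to end. This is a minimal reconstruction of the demonstration, not the slides' code: it assumes G(y) = log cosh(y), so g = tanh and g' = 1 − tanh^2, and the two uniform sources and mixing matrix are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: whitened mixtures of two uniform sources.
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 20000))
A = np.array([[1.0, 0.5], [0.7, 1.2]])
x = A @ s
x -= x.mean(axis=1, keepdims=True)          # centering
D, E = np.linalg.eigh(np.cov(x))
z = np.diag(D ** -0.5) @ E.T @ x            # whitening

# One-by-one (deflation) estimation of m = 2 components with the update
#   w <- E{z g(w^T z)} - E{g'(w^T z)} w, g = tanh, then normalize.
m = 2
W = np.zeros((m, 2))
for p in range(m):
    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        wz = w @ z
        g = np.tanh(wz)
        g_prime = 1.0 - g ** 2
        w_new = np.mean(z * g, axis=1) - np.mean(g_prime) * w
        # Deflation decorrelation: remove projections on rows already found.
        w_new -= W[:p].T @ (W[:p] @ w_new)
        w_new /= np.linalg.norm(w_new)
        if abs(w_new @ w) > 1 - 1e-9:       # converged up to sign
            w = w_new
            break
        w = w_new
    W[p] = w

y = W @ z  # estimated components, up to sign and permutation
```

Because the whitened mixing matrix is orthogonal, projecting each new w off the rows already found is enough to steer the iteration toward a different component; the recovered rows of y match the sources only up to sign and ordering, as the convergence criterion anticipates.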