
# An Introduction to Independent Component Analysis (ICA)

##### Presentation Transcript

1. An Introduction to Independent Component Analysis (ICA). Yu-Te Wu, Institute of Radiological Sciences, National Yang-Ming University; Integrated Brain Function Laboratory, Taipei Veterans General Hospital

2. The Principle of ICA: a cocktail-party problem

$$x_1(t) = a_{11}\,s_1(t) + a_{12}\,s_2(t) + a_{13}\,s_3(t)$$
$$x_2(t) = a_{21}\,s_1(t) + a_{22}\,s_2(t) + a_{23}\,s_3(t)$$
$$x_3(t) = a_{31}\,s_1(t) + a_{32}\,s_2(t) + a_{33}\,s_3(t)$$

3. Independent Component Analysis. Reference: A. Hyvärinen, J. Karhunen, and E. Oja (2001). *Independent Component Analysis*. John Wiley & Sons.

4. Central limit theorem • The distribution of a sum of independent random variables tends toward a Gaussian distribution. Observed signal $= m_1\,\mathrm{IC}_1 + m_2\,\mathrm{IC}_2 + \cdots + m_n\,\mathrm{IC}_n$: each independent component is non-Gaussian, but their sum tends toward Gaussian.

5. Central Limit Theorem. Consider the partial sum $x_k = \sum_{i=1}^{k} z_i$ of a sequence $\{z_i\}$ of independent and identically distributed random variables. Since the mean and variance of $x_k$ can grow without bound as $k \to \infty$, consider instead of $x_k$ the standardized variable $y_k = (x_k - k\mu)/(\sigma\sqrt{k})$, where $\mu$ and $\sigma^2$ are the mean and variance of $z_i$. The distribution of $y_k$ converges to a Gaussian distribution with zero mean and unit variance as $k \to \infty$.
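An illustrative numerical check of this convergence (a minimal sketch, not part of the original slides): the standardized partial sums $y_k$ of iid uniform variables have empirical kurtosis tending to 0, the Gaussian value.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, np.sqrt(1 / 12)      # mean and std of Uniform(0, 1)

for k in (1, 2, 30):
    z = rng.uniform(0, 1, size=(100_000, k))
    y_k = (z.sum(axis=1) - k * mu) / (sigma * np.sqrt(k))  # standardized partial sum
    print(k, np.mean(y_k**4) - 3)     # empirical kurtosis -> 0 as k grows
```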

6. How to estimate the ICA model • Principle for estimating the ICA model: maximization of non-Gaussianity.

7. Measures of non-Gaussianity • Kurtosis: $\mathrm{kurt}(x) = E\{(x-\mu)^4\} - 3\,[E\{(x-\mu)^2\}]^2$. Super-Gaussian: kurtosis > 0; Gaussian: kurtosis = 0; sub-Gaussian: kurtosis < 0. For independent $x_1$ and $x_2$: $\mathrm{kurt}(x_1 + x_2) = \mathrm{kurt}(x_1) + \mathrm{kurt}(x_2)$, and $\mathrm{kurt}(\alpha x_1) = \alpha^4\,\mathrm{kurt}(x_1)$.
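A quick numerical check of these signs (a sketch; the distributions below are my own illustrative choices):

```python
import numpy as np

def kurt(x):
    """kurt(x) = E{(x - mu)^4} - 3 * [E{(x - mu)^2}]^2"""
    x = x - x.mean()
    return np.mean(x**4) - 3 * np.mean(x**2) ** 2

rng = np.random.default_rng(0)
n = 100_000
print(kurt(rng.laplace(size=n)))         # super-Gaussian: kurtosis > 0
print(kurt(rng.standard_normal(n)))      # Gaussian: kurtosis ~ 0
print(kurt(rng.uniform(-1, 1, size=n)))  # sub-Gaussian: kurtosis < 0
```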

8. Whitening process • Assume the measurement $x = As$ is zero mean and $E\{ss^T\} = I$. • Let $D$ and $E$ be the eigenvalue and eigenvector matrices of the covariance matrix of $x$, i.e. $E\{xx^T\} = EDE^T$. • Then $V = D^{-1/2}E^T$ is a whitening matrix: for $z = Vx$,

$$E\{zz^T\} = V\,E\{xx^T\}\,V^T = D^{-1/2}E^T\,(EDE^T)\,E\,D^{-1/2} = I.$$

9. Importance of whitening. For the whitened data $z$, find a vector $w$ such that the linear combination $y = w^Tz$ has maximum non-Gaussianity under the constraint $E\{(w^Tz)^2\} = 1$. Since $z$ is white, $E\{(w^Tz)^2\} = \|w\|^2$, so: maximize $|\mathrm{kurt}(w^Tz)|$ under the simpler constraint that $\|w\| = 1$.

10. Constrained optimization: $\max F(w)$ subject to $\|w\|^2 = w^Tw = 1$. Form the Lagrangian $L(w, \lambda) = F(w) + \lambda(\|w\|^2 - 1)$. Then

$$\frac{\partial L(w, \lambda)}{\partial w} = 0 \;\Rightarrow\; \frac{\partial F(w)}{\partial w} + 2\lambda w = 0 \;\Rightarrow\; \frac{\partial F(w)}{\partial w} = -2\lambda w.$$

At the stationary point, the gradient of $F(w)$ must point in the direction of $w$, i.e. equal $w$ multiplied by a scalar.

11. Gradient of kurtosis. For whitened $z$ with $E\{(w^Tz)^2\} = w^Tw$,

$$F(w) = \mathrm{kurt}(w^Tz) = E\{(w^Tz)^4\} - 3\,[E\{(w^Tz)^2\}]^2 = E\{(w^Tz)^4\} - 3\,(w^Tw)^2,$$

so, replacing expectations by sample averages $E\{y\} \approx \frac{1}{T}\sum_{t=1}^{T} y(t)$,

$$\frac{\partial F(w)}{\partial w} = \frac{4}{T}\sum_{t=1}^{T} z(t)\,[w^Tz(t)]^3 - 12\,(w^Tw)\,w = 4\left[E\{z\,(w^Tz)^3\} - 3\,(w^Tw)\,w\right],$$

and for the absolute value,

$$\frac{\partial\,|\mathrm{kurt}(w^Tz)|}{\partial w} = 4\,\mathrm{sign}(\mathrm{kurt}(w^Tz))\left[E\{z\,(w^Tz)^3\} - 3\,(w^Tw)\,w\right].$$

12. Fixed-point algorithm using kurtosis. Gradient ascent: $w_{k+1} = w_k + \mu\,\frac{\partial F(w_k)}{\partial w}$. Note that at a stationary point the gradient equals $-2\lambda w_k$, so adding it to $w_k$ does not change the direction: $w_{k+1} = w_k - 2\lambda\mu\,w_k = (1 - 2\lambda\mu)\,w_k$. This motivates the fixed-point iteration

$$w \leftarrow E\{z\,(w^Tz)^3\} - 3\,\|w\|^2\,w, \qquad w \leftarrow w / \|w\|.$$

Convergence: $|\langle w_{k+1}, w_k\rangle| \to 1$, since $w_k$ and $w_{k+1}$ are unit vectors.

13. Fixed-point algorithm using kurtosis (one-by-one estimation)
• Centering
• Whitening
• Choose m, the number of ICs to estimate. Set counter p ← 1
• Choose an initial guess of unit norm for wp, e.g. randomly
• Fixed-point iteration: let $w_p \leftarrow E\{z\,(w_p^Tz)^3\} - 3\,w_p$
• Do deflation decorrelation: $w_p \leftarrow w_p - \sum_{j=1}^{p-1} (w_p^T w_j)\,w_j$
• Let $w_p \leftarrow w_p / \|w_p\|$
• If $w_p$ has not converged ($|\langle w_p^{k+1}, w_p^{k}\rangle|$ not ≈ 1), go back to step 5
• Set p ← p+1. If p ≤ m, go back to step 4
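A minimal NumPy sketch of this one-by-one scheme (function and variable names are my own; it assumes z is already centered and whitened, shape n x T):

```python
import numpy as np

def fastica_kurtosis(z, m, max_iter=200, tol=1e-8, seed=0):
    """Deflation FastICA on whitened data z (n x T), kurtosis-based update."""
    rng = np.random.default_rng(seed)
    n, _ = z.shape
    W = np.zeros((m, n))                          # rows are the estimated w_p
    for p in range(m):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)                    # initial guess of unit norm
        for _ in range(max_iter):
            w_old = w
            y = w @ z
            w = (z * y**3).mean(axis=1) - 3 * w   # w <- E{z (w^T z)^3} - 3 w
            w = w - W[:p].T @ (W[:p] @ w)         # deflation decorrelation
            w /= np.linalg.norm(w)
            if abs(w @ w_old) > 1 - tol:          # |<w_k+1, w_k>| ~ 1
                break
        W[p] = w
    return W                                      # estimated ICs: y = W @ z
```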

14. Fixed-point algorithm using negentropy. Kurtosis is very sensitive to outliers, which may be erroneous or irrelevant observations. For example, a random variable with sample size 1000, zero mean, and unit variance that contains one value equal to 10 has kurtosis $E\{x^4\} - 3$ at least equal to $10^4/1000 - 3 = 7$. We need a more robust measure of non-Gaussianity: an approximation of negentropy.
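A numerical version of this outlier example (a sketch; the slide's 7 is a lower bound, so the printed value is somewhat above it):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)       # sample size 1000, mean ~0, variance ~1
x[0] = 10.0                         # one erroneous value equal to 10
x = (x - x.mean()) / x.std()        # re-standardize to unit variance
print(np.mean(x**4) - 3)            # kurtosis: at least 10**4/1000 - 3 = 7
```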

15. Fixed-point algorithm using negentropy. Entropy: $H(y) = -\int p_y(\eta)\,\log p_y(\eta)\,d\eta$. Negentropy: $J(y) = H(y_{\mathrm{gauss}}) - H(y)$, where $y_{\mathrm{gauss}}$ is a Gaussian variable with the same variance as $y$. The algorithm below uses an approximation of negentropy.

16. Fixed-point algorithm using negentropy. To maximize $J(y)$:

$$w \leftarrow E\{z\,g(w^Tz)\} - E\{g'(w^Tz)\}\,w, \qquad w \leftarrow w / \|w\|.$$

Convergence: $|\langle w_{k+1}, w_k\rangle| = 1$.

17. Fixed-point algorithm using negentropy (one-by-one estimation)
• Centering
• Whitening
• Choose m, the number of ICs to estimate. Set counter p ← 1
• Choose an initial guess of unit norm for wp, e.g. randomly
• Fixed-point iteration: let $w_p \leftarrow E\{z\,g(w_p^Tz)\} - E\{g'(w_p^Tz)\}\,w_p$
• Do deflation decorrelation: $w_p \leftarrow w_p - \sum_{j=1}^{p-1} (w_p^T w_j)\,w_j$
• Let $w_p \leftarrow w_p / \|w_p\|$
• If $w_p$ has not converged, go back to step 5
• Set p ← p+1. If p ≤ m, go back to step 4
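A matching NumPy sketch with the negentropy-based update, using the common choice $g(u) = \tanh(u)$, $g'(u) = 1 - \tanh^2(u)$ (names are my own; z is centered and whitened):

```python
import numpy as np

def fastica_negentropy(z, m, max_iter=200, tol=1e-8, seed=0):
    """Deflation FastICA on whitened data z (n x T) with g = tanh."""
    rng = np.random.default_rng(seed)
    n, _ = z.shape
    W = np.zeros((m, n))
    for p in range(m):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            w_old = w
            y = np.tanh(w @ z)
            # w <- E{z g(w^T z)} - E{g'(w^T z)} w
            w = (z * y).mean(axis=1) - (1 - y**2).mean() * w
            w = w - W[:p].T @ (W[:p] @ w)    # deflation decorrelation
            w /= np.linalg.norm(w)
            if abs(w @ w_old) > 1 - tol:
                break
        W[p] = w
    return W
```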

18.–19. Implementations • Create two uniform sources
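The plots and code shown on the original slides did not survive the transcript; a minimal NumPy sketch of this step (names are my own):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 5000
# Two independent uniform sources on [-sqrt(3), sqrt(3)]: zero mean, unit variance
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))
```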

20.–21. Implementations • Two mixed observed signals
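Continuing the sketch, the observations are linear mixtures $x = As$ of the sources; the mixing matrix below is an arbitrary illustrative choice:

```python
# Hypothetical non-singular mixing matrix (for illustration only)
A = np.array([[0.6, 0.8],
              [0.7, 0.3]])
x = A @ s            # each column x(t) = A s(t) is one observation
```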

22.–23. Implementations • Centering
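Continuing the sketch, centering subtracts the mean of each observed signal:

```python
x_centered = x - x.mean(axis=1, keepdims=True)   # make each row zero mean
```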

24.–25. Implementations • Whitening
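Continuing the sketch, whitening follows slide 8: eigendecompose the covariance of the centered data and apply $V = D^{-1/2}E^T$:

```python
cov = np.cov(x_centered)           # estimate of E{x x^T}
d, E = np.linalg.eigh(cov)         # eigenvalues d, eigenvector matrix E
V = np.diag(d ** -0.5) @ E.T       # whitening matrix V = D^(-1/2) E^T
z = V @ x_centered
print(np.round(np.cov(z), 3))      # E{z z^T} should be close to the identity
```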

26.–28. Implementations • Fixed-point iteration using kurtosis
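Continuing the sketch, the whitened data can be fed to the fastica_kurtosis function sketched after slide 13:

```python
W = fastica_kurtosis(z, m=2)
y = W @ z            # estimated sources (up to sign, scale, permutation)
# Correlate estimates with the true sources s: expect one entry near +/-1 per row
print(np.round(np.corrcoef(np.vstack([y, s]))[:2, 2:], 2))
```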

29.–33. Implementations • Fixed-point iteration using negentropy
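And likewise with the negentropy-based function sketched after slide 17:

```python
W = fastica_negentropy(z, m=2)
y = W @ z
print(np.round(np.corrcoef(np.vstack([y, s]))[:2, 2:], 2))   # again near +/-1 entries
```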

34. Entropy and negentropy. Entropy: $H(y) = -\int p_y(\eta)\,\log p_y(\eta)\,d\eta$. A Gaussian variable has the largest entropy among all random variables of equal variance. Negentropy: $J(y) = H(y_{\mathrm{gauss}}) - H(y)$, so $J(y) = 0$ for Gaussian $y$ and $J(y) > 0$ for non-Gaussian $y$. Negentropy is the optimal estimator of non-Gaussianity, but it is computationally difficult since it requires an estimate of the pdf; hence the fixed-point algorithm uses an approximation of negentropy.

35. Higher-order cumulant approximation. Replace the polynomial functions by any non-polynomial functions $G_i$, e.g. $G_1$ odd and $G_2$ even. It is quite common that random variables have approximately symmetric distributions, in which case the odd term vanishes and the approximation reduces to $J(y) \propto [E\{G(y)\} - E\{G(\nu)\}]^2$, where $\nu$ is a standardized Gaussian variable.
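A sketch of this one-term approximation with $G(u) = \log\cosh u$ (a standard even choice; the proportionality constant is dropped, and the input is assumed standardized):

```python
import numpy as np

def negentropy_approx(y, n_gauss=1_000_000, seed=0):
    """J(y) ~ [E{G(y)} - E{G(nu)}]^2, G(u) = log cosh u, nu ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    nu = rng.standard_normal(n_gauss)
    G = lambda u: np.log(np.cosh(u))
    return (np.mean(G(y)) - np.mean(G(nu))) ** 2

rng = np.random.default_rng(1)
n = 100_000
print(negentropy_approx(rng.standard_normal(n)))                   # ~ 0 for Gaussian
print(negentropy_approx(rng.uniform(-np.sqrt(3), np.sqrt(3), n)))  # > 0 for uniform
```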

36. Fixed-point algorithm using negentropy. According to the Lagrange multiplier condition, the gradient must point in the direction of $w$ at the optimum: $F(w) = E\{z\,g(w^Tz)\} - \beta w = 0$.

37. Finding the optimum by an approximative Newton method. The plain gradient iteration does not have good convergence properties here, because the non-polynomial moments do not have the same nice algebraic properties as cumulants. The real Newton method is fast: it needs few steps to converge, but it requires a matrix inversion at every step, a large computational load. Special properties of the ICA problem allow an approximative Newton method: it needs no matrix inversion, yet it converges in roughly the same number of steps as the real Newton method.

38. Fixed-point algorithm using negentropy. According to the Lagrange multiplier condition, the gradient must point in the direction of $w$: $F(w) = E\{z\,g(w^Tz)\} - \beta w = 0$. Solve this equation by Newton's method, $J_F(w)\,\Delta w = -F(w)$, i.e. $\Delta w = [J_F(w)]^{-1}\,[-F(w)]$. The approximation $E\{zz^T g'(w^Tz)\} \approx E\{zz^T\}\,E\{g'(w^Tz)\} = E\{g'(w^Tz)\}\,I$ diagonalizes the Jacobian, $J_F(w) \approx [E\{g'(w^Tz)\} - \beta]\,I$, so the Newton step is

$$w \leftarrow w - \frac{E\{z\,g(w^Tz)\} - \beta w}{E\{g'(w^Tz)\} - \beta}.$$

Multiplying through by $\beta - E\{g'(w^Tz)\}$ (harmless, since $w$ is renormalized) gives

$$w \leftarrow E\{z\,g(w^Tz)\} - E\{g'(w^Tz)\}\,w, \qquad w \leftarrow w / \|w\|.$$

39. Fixed-point algorithm using negentropy.

$$w \leftarrow E\{z\,g(w^Tz)\} - E\{g'(w^Tz)\}\,w, \qquad w \leftarrow w / \|w\|.$$

Convergence: $|\langle w_{k+1}, w_k\rangle| = 1$, because ICs can be defined only up to a multiplicative sign.