An Introduction to Independent Component Analysis (ICA)

Presentation Transcript

  1. An Introduction to Independent Component Analysis (ICA) Yu-Te Wu, Institute of Radiological Sciences, National Yang-Ming University; Integrated Brain Function Laboratory, Taipei Veterans General Hospital

  2. The principle of ICA: a cocktail-party problem x1(t) = a11 s1(t) + a12 s2(t) + a13 s3(t); x2(t) = a21 s1(t) + a22 s2(t) + a23 s3(t); x3(t) = a31 s1(t) + a32 s2(t) + a33 s3(t)
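The mixing model above can be sketched in a few lines of NumPy. The sources, mixing matrix, and signal shapes here are illustrative assumptions, not values from the slides:

```python
import numpy as np

# Cocktail-party mixing model x = A s: three "microphones" record
# linear mixtures of three sources (hypothetical example signals).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)

s = np.vstack([
    np.sin(2 * np.pi * 5 * t),           # s1: sinusoid
    np.sign(np.sin(2 * np.pi * 3 * t)),  # s2: square wave
    rng.uniform(-1, 1, t.size),          # s3: uniform noise
])

# Unknown mixing matrix A (the a_ij coefficients of the slide's equations).
A = np.array([[1.0, 0.5, 0.3],
              [0.6, 1.0, 0.4],
              [0.2, 0.7, 1.0]])

x = A @ s  # x_i(t) = a_i1 s1(t) + a_i2 s2(t) + a_i3 s3(t)
print(x.shape)
```

ICA's task is to recover `s` (and implicitly `A`) from `x` alone.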

  3. Independent Component Analysis Reference: A. Hyvärinen, J. Karhunen, E. Oja (2001), Independent Component Analysis, John Wiley & Sons.

  4. Central limit theorem • The distribution of a sum of independent random variables tends toward a Gaussian distribution. Observed signal = m1 IC1 + m2 IC2 + … + mn ICn: each IC is non-Gaussian, but the weighted sum tends toward Gaussian.

  5. Central limit theorem Consider the partial sum x_k = z_1 + … + z_k of a sequence {z_i} of independent and identically distributed random variables. Since the mean and variance of x_k can grow without bound as k → ∞, consider instead of x_k the standardized variable y_k = (x_k − kμ)/(σ√k). The distribution of y_k converges to a Gaussian distribution with zero mean and unit variance as k → ∞.
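The theorem can be checked empirically. This sketch (uniform z_i are an assumption for illustration) shows the excess kurtosis of the standardized partial sums shrinking toward the Gaussian value 0 as k grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def excess_kurtosis(y):
    # Excess kurtosis of standardized data; 0 for a Gaussian.
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

# z_i ~ Uniform(-1, 1): i.i.d. and non-Gaussian (excess kurtosis -1.2).
results = {}
for k in (1, 10, 100):
    z = rng.uniform(-1, 1, size=(k, 100_000))
    x_k = z.sum(axis=0)                       # partial sum
    y_k = (x_k - x_k.mean()) / x_k.std()      # standardized variable
    results[k] = excess_kurtosis(y_k)
print(results)
```

The k = 1 value sits near -1.2 (pure uniform), while k = 100 is close to 0, i.e. nearly Gaussian.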

  6. How to estimate the ICA model • Principle for estimating the ICA model: maximization of nongaussianity

  7. Measures of nongaussianity • Kurtosis: kurt(x) = E{(x − μ)^4} − 3[E{(x − μ)^2}]^2. Super-Gaussian: kurtosis > 0; Gaussian: kurtosis = 0; sub-Gaussian: kurtosis < 0. Properties: kurt(x1 + x2) = kurt(x1) + kurt(x2) for independent x1, x2, and kurt(αx1) = α^4 kurt(x1).
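A quick numerical check of the three kurtosis regimes, using the slide's definition directly (the sample distributions are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

def kurt(x):
    # kurt(x) = E{(x - mu)^4} - 3 [E{(x - mu)^2}]^2
    c = x - x.mean()
    return np.mean(c**4) - 3.0 * np.mean(c**2) ** 2

gauss = rng.standard_normal(n)        # Gaussian: kurtosis = 0
laplace = rng.laplace(size=n)         # super-Gaussian (peaky): kurtosis > 0
uniform = rng.uniform(-1, 1, size=n)  # sub-Gaussian (flat): kurtosis < 0

print(kurt(gauss), kurt(laplace), kurt(uniform))
```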

  8. Whitening process • Assume the measurement x = As is zero mean and E{ss^T} = I. • Let D and E be the eigenvalue and eigenvector matrices of the covariance matrix of x, i.e. E{xx^T} = EDE^T. • Then V = D^(−1/2) E^T is a whitening matrix: with z = Vx, E{zz^T} = V E{xx^T} V^T = D^(−1/2) E^T (EDE^T) E D^(−1/2) = I.
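The whitening step translates directly into an eigendecomposition of the sample covariance. The two-source mixture below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(3)

# Zero-mean mixtures x = A s of two unit-variance uniform sources.
s = rng.uniform(-1, 1, size=(2, 50_000)) * np.sqrt(3)
A = np.array([[2.0, 1.0], [1.0, 1.5]])
x = A @ s

# Eigendecomposition of the covariance: E{xx^T} = E D E^T.
cov = x @ x.T / x.shape[1]
d, E = np.linalg.eigh(cov)

# Whitening matrix V = D^(-1/2) E^T; then z = V x has E{zz^T} = I.
V = np.diag(d ** -0.5) @ E.T
z = V @ x

print(np.round(z @ z.T / z.shape[1], 6))  # identity up to numerical error
```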

  9. Importance of whitening For the whitened data z, find a vector w such that the linear combination y = w^T z has maximum nongaussianity under the constraint E{y^2} = 1. Since z is white, E{(w^T z)^2} = ||w||^2, so this becomes: maximize |kurt(w^T z)| under the simpler constraint ||w|| = 1.

  10. Constrained optimization Maximize F(w) subject to ||w||^2 = 1. Form the Lagrangian L(w, λ) = F(w) + λ(||w||^2 − 1). Setting ∂L(w, λ)/∂w = 0 gives ∂F(w)/∂w + 2λw = 0, i.e. ∂F(w)/∂w = −2λw. At the stationary point, the gradient of F(w) must point in the direction of w, i.e. be equal to w multiplied by a scalar.

  11. Gradient of kurtosis F(w) = kurt(w^T z) = E{(w^T z)^4} − 3[E{(w^T z)^2}]^2. Using the sample average E{y} ≈ (1/T) Σ_{t=1..T} y(t), and E{(w^T z)^2} = w^T w for whitened z: ∂F(w)/∂w = (4/T) Σ_{t=1..T} z(t)[w^T z(t)]^3 − 3·2(w^T w)(2w) = 4[E{z(w^T z)^3} − 3(w^T w)w]. For the absolute value, ∂|F(w)|/∂w = 4 sign(kurt(w^T z))[E{z(w^T z)^3} − 3(w^T w)w].

  12. Fixed-point algorithm using kurtosis Gradient ascent: w_{k+1} = w_k + μ ∂F(w_k)/∂w_k. At a stationary point the gradient equals −2λw_k (slide 10), so adding it to w_k does not change the direction: w_{k+1} = w_k − μ(2λ)w_k = (1 − 2μλ)w_k. This suggests the fixed-point iteration: w ← E{z(w^T z)^3} − 3||w||^2 w, then w ← w/||w||. Convergence criterion: |⟨w_{k+1}, w_k⟩| = 1, since w_k and w_{k+1} are unit vectors.
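The one-unit kurtosis iteration is short in code. This is a sketch on an assumed two-source uniform mixture (sources, mixing matrix, and tolerances are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Whitened mixtures of two unit-variance uniform (sub-Gaussian) sources.
s = rng.uniform(-1, 1, size=(2, 50_000)) * np.sqrt(3)
x = np.array([[2.0, 1.0], [1.0, 1.5]]) @ s
d, E = np.linalg.eigh(x @ x.T / x.shape[1])
z = np.diag(d ** -0.5) @ E.T @ x          # whitened: E{zz^T} = I

# Fixed-point iteration: w <- E{z (w^T z)^3} - 3w, then normalize.
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(100):
    w_new = (z * (w @ z) ** 3).mean(axis=1) - 3 * w
    w_new /= np.linalg.norm(w_new)
    converged = abs(w_new @ w) > 1 - 1e-9  # |<w_{k+1}, w_k>| = 1
    w = w_new
    if converged:
        break

y = w @ z  # recovered component, equal to one source up to sign
corr = np.abs(np.corrcoef(y, s))[0, 1:]
print(np.round(corr, 3))
```

One of the two correlations should be near 1: the iteration has locked onto a single independent component.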

  13. Fixed-point algorithm using kurtosis (one-by-one estimation) 1. Centering. 2. Whitening. 3. Choose m, the number of ICs to estimate; set counter p ← 1. 4. Choose an initial guess of unit norm for wp, e.g. randomly. 5. Fixed-point iteration: let wp ← E{z(wp^T z)^3} − 3wp. 6. Deflation decorrelation: wp ← wp − Σ_{j<p} (wp^T wj) wj. 7. Let wp ← wp/||wp||. 8. If wp has not converged (|⟨wp_{k+1}, wp_k⟩| not ≈ 1), go back to step 5. 9. Set p ← p + 1; if p ≤ m, go back to step 4.
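The nine steps above can be sketched end to end. The data (two uniform sources and a fixed mixing matrix) and the convergence tolerance are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Mixtures of m = 2 unit-variance uniform sources.
s = rng.uniform(-1, 1, size=(2, 50_000)) * np.sqrt(3)
x = np.array([[2.0, 1.0], [1.0, 1.5]]) @ s

x = x - x.mean(axis=1, keepdims=True)        # 1. centering
d, E = np.linalg.eigh(x @ x.T / x.shape[1])  # 2. whitening
z = np.diag(d ** -0.5) @ E.T @ x

m, W = 2, []                                 # 3. number of ICs
for p in range(m):
    w = rng.standard_normal(2)               # 4. random unit-norm init
    w /= np.linalg.norm(w)
    for _ in range(200):
        w_new = (z * (w @ z) ** 3).mean(axis=1) - 3 * w  # 5. update
        for wj in W:                         # 6. deflation decorrelation
            w_new -= (w_new @ wj) * wj
        w_new /= np.linalg.norm(w_new)       # 7. normalize
        converged = abs(w_new @ w) > 1 - 1e-9  # 8. convergence check
        w = w_new
        if converged:
            break
    W.append(w)                              # 9. next component

Y = np.vstack(W) @ z
C = np.abs(np.corrcoef(Y, s))[:2, 2:]        # |corr| of ICs vs. sources
print(np.round(C, 3))
```

Each recovered component should correlate almost perfectly (in absolute value) with exactly one source.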

  14. Fixed-point algorithm using negentropy Kurtosis is very sensitive to outliers, which may be erroneous or irrelevant observations. Example: a random variable with sample size 1000, zero mean and unit variance that contains a single value equal to 10 has kurtosis (here E{x^4} − 3) at least 10^4/1000 − 3 = 7. We therefore need a more robust measure of nongaussianity: an approximation of negentropy.
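The outlier sensitivity is easy to demonstrate numerically, following the slide's 1000-sample example:

```python
import numpy as np

rng = np.random.default_rng(6)

# 1000 standardized samples; for zero-mean unit-variance data the
# kurtosis measure reduces to E{x^4} - 3.
x = rng.standard_normal(1000)
x = (x - x.mean()) / x.std()
kurt_clean = np.mean(x**4) - 3

x_out = x.copy()
x_out[0] = 10.0                      # one erroneous observation
kurt_out = np.mean(x_out**4) - 3     # inflated by roughly 10^4 / 1000

print(kurt_clean, kurt_out)
```

A single bad sample moves the estimate from near 0 to well above 7, which is why the slides switch to negentropy-based contrast functions.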

  15. Fixed-point algorithm using negentropy Entropy: H(y) = −∫ f(y) log f(y) dy. Negentropy: J(y) = H(y_gauss) − H(y), where y_gauss is a Gaussian variable with the same variance as y. In practice J(y) is replaced by an approximation of negentropy.

  16. Fixed-point algorithm using negentropy Maximize J(y): w ← E{z g(w^T z)} − E{g'(w^T z)} w, then w ← w/||w||. Convergence criterion: |⟨w_{k+1}, w_k⟩| = 1.
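The negentropy-based update only changes the nonlinearity. The sketch below assumes g(u) = tanh(u), g'(u) = 1 − tanh(u)^2 (a common choice of contrast derivative; the slides leave g unspecified), on the same kind of assumed two-source mixture as before:

```python
import numpy as np

rng = np.random.default_rng(7)

# Whitened mixtures of two unit-variance uniform sources.
s = rng.uniform(-1, 1, size=(2, 50_000)) * np.sqrt(3)
x = np.array([[2.0, 1.0], [1.0, 1.5]]) @ s
d, E = np.linalg.eigh(x @ x.T / x.shape[1])
z = np.diag(d ** -0.5) @ E.T @ x

# w <- E{z g(w^T z)} - E{g'(w^T z)} w, with g = tanh, then normalize.
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(200):
    u = w @ z
    w_new = (z * np.tanh(u)).mean(axis=1) - (1 - np.tanh(u) ** 2).mean() * w
    w_new /= np.linalg.norm(w_new)
    converged = abs(w_new @ w) > 1 - 1e-9
    w = w_new
    if converged:
        break

corr = np.abs(np.corrcoef(w @ z, s))[0, 1:]
print(np.round(corr, 3))
```

Unlike the fourth-power used by kurtosis, tanh grows slowly for large arguments, which is what makes this variant robust to outliers.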

  17. Fixed-point algorithm using negentropy (one-by-one estimation) 1. Centering. 2. Whitening. 3. Choose m, the number of ICs to estimate; set counter p ← 1. 4. Choose an initial guess of unit norm for wp, e.g. randomly. 5. Fixed-point iteration: let wp ← E{z g(wp^T z)} − E{g'(wp^T z)} wp. 6. Deflation decorrelation: wp ← wp − Σ_{j<p} (wp^T wj) wj. 7. Let wp ← wp/||wp||. 8. If wp has not converged, go back to step 5. 9. Set p ← p + 1; if p ≤ m, go back to step 4.

  18-19. Implementations • Create two uniform sources (figures)

  20-21. Implementations • Two mixed observed signals (figures)

  22-23. Implementations • Centering (figures)

  24-25. Implementations • Whitening (figures)

  26-28. Implementations • Fixed-point iteration using kurtosis (figures)

  29-33. Implementations • Fixed-point iteration using negentropy (figures)

  34. Entropy and negentropy Entropy: H(y) = −∫ f(y) log f(y) dy. A Gaussian variable has the largest entropy among all random variables of equal variance. Negentropy: J(y) = H(y_gauss) − H(y); J(y) = 0 for Gaussian y and J(y) > 0 for nonGaussian y. Negentropy is the optimal measure of nongaussianity but is computationally difficult, since it requires an estimate of the pdf; hence the fixed-point algorithm uses an approximation of negentropy.

  35. High-order cumulant approximation J(y) ≈ (1/12) E{y^3}^2 + (1/48) kurt(y)^2. Replace the polynomial functions by nonpolynomial functions Gi, e.g. G1 odd and G2 even. Since it is quite common that most random variables have an approximately symmetric distribution, the odd term can usually be dropped.
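With the odd term dropped, a single even nonpolynomial G gives a usable contrast. The sketch below assumes G(u) = log cosh(u) (a standard choice in the FastICA literature; the proportionality constant is dropped) and checks that the approximation is larger for non-Gaussian variables:

```python
import numpy as np

rng = np.random.default_rng(8)

# Negentropy approximation with one even G:
#   J(y) ~ [E{G(y)} - E{G(nu)}]^2,  nu ~ N(0, 1), y standardized.
def negentropy_approx(y, n_gauss=1_000_000):
    y = (y - y.mean()) / y.std()
    nu = rng.standard_normal(n_gauss)
    logcosh = lambda u: np.log(np.cosh(u))
    return (logcosh(y).mean() - logcosh(nu).mean()) ** 2

gauss = rng.standard_normal(100_000)   # negentropy approximately 0
uniform = rng.uniform(-1, 1, 100_000)  # sub-Gaussian: positive
laplace = rng.laplace(size=100_000)    # super-Gaussian: positive

print(negentropy_approx(gauss),
      negentropy_approx(uniform),
      negentropy_approx(laplace))
```

Both sub- and super-Gaussian inputs score above the Gaussian baseline, so maximizing this quantity drives w toward a non-Gaussian (hence independent) component.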

  36. Fixed-point algorithm using negentropy According to the Lagrange multiplier method, at a maximum of the approximated negentropy the gradient must point in the direction of w.

  37. Finding the root by an approximative Newton method The plain iteration does not have good convergence properties, because the nonpolynomial moments do not have the same nice algebraic properties as cumulants. The real Newton method converges in few steps, but it requires a matrix inversion at every step, a large computational load. Special properties of the ICA problem allow an approximative Newton method: no matrix inversion is needed, yet it converges in roughly the same number of steps as the real Newton method.

  38. Fixed-point algorithm using negentropy According to the Lagrange multiplier method, the gradient must point in the direction of w: F(w) = E{z g(w^T z)} − βw = 0. Solve this equation by Newton's method: JF(w) Δw = −F(w), i.e. Δw = [JF(w)]^(−1)[−F(w)]. Approximating the Jacobian, JF(w) = E{zz^T g'(w^T z)} − βI ≈ E{zz^T} E{g'(w^T z)} − βI = [E{g'(w^T z)} − β] I, which is diagonal and trivial to invert. The Newton step is then w ← w − [E{z g(w^T z)} − βw]/[E{g'(w^T z)} − β]; multiplying through by β − E{g'(w^T z)} and rescaling gives w ← E{z g(w^T z)} − E{g'(w^T z)} w, w ← w/||w||.

  39. Fixed-point algorithm using negentropy w ← E{z g(w^T z)} − E{g'(w^T z)} w, w ← w/||w||. Convergence criterion: |⟨w_{k+1}, w_k⟩| = 1. The absolute value is used because ICs can be defined only up to a multiplicative sign.