Analyzing Patterns Using Principal Component Analysis

  1. Analyzing Patterns Using Principal Component Analysis BMSC601 W. Rose

  2. Hubley-Kozey et al. (2006) “Neuromuscular alterations during walking…”, J. Electromyo. Kinesiol. 16: 365-378. EMG and kinematic data are complex, or “high-dimensional”: each recording contains many numbers, because the signals vary both with time and by recording site. Pattern analysis looks for patterns in complex data and thereby reduces the data to a smaller set of numbers that can be compared quantitatively (“data reduction”). Principal component analysis (PCA) is one kind of pattern analysis.

  3. Assume we have multiple “copies” of a signal. The copies could come from multiple trials in one person, from different people, etc. Goal: find patterns in the signals and quantify whether the patterns differ between subgroups.

  4. PCA approach to the goal. Find the pattern (wave shape) that, when scaled up or down by an adjustable scaling factor, accounts for as much of the power in the signals as possible. This is the first principal component. The scaling factor is adjusted for each trial individually, but the basic shape is not adjustable and cannot “slide” along the time axis.
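
The slides show no code from the Matlab program, so the examples added here are hedged Python sketches, and all function names are hypothetical. This first sketch covers the fitting step described above for one fixed candidate shape. It computes the least-squares scale factor for each trial and the fraction of total signal power that the scaled copies account for. The 1st PC is the shape that maximizes this fraction. The sketch only scores a given shape; it does not search for the best one.

```python
# Sketch: best-fitting scale factor for a FIXED shape (no sliding in time),
# and the fraction of total signal power the scaled copies account for.

def best_scale(signal, shape):
    """Least-squares scale factor c that minimizes sum((signal - c*shape)**2).

    The shape is only stretched up or down (c may be negative); it is never
    shifted along the time axis.
    """
    num = sum(x * p for x, p in zip(signal, shape))
    den = sum(p * p for p in shape)
    return num / den


def power_explained(signals, shape):
    """Fraction of the total power in `signals` captured by best-scaled copies of `shape`."""
    total = sum(x * x for sig in signals for x in sig)
    resid = 0.0
    for sig in signals:
        c = best_scale(sig, shape)  # one scale factor per trial
        resid += sum((x - c * p) ** 2 for x, p in zip(sig, shape))
    return 1.0 - resid / total


if __name__ == "__main__":
    shape = [1.0, 2.0, 1.0]
    trials = [[2.0, 4.0, 2.0], [-1.0, -2.0, -1.0], [1.0, 1.0, 1.0]]
    print([best_scale(t, shape) for t in trials])
    print(power_explained(trials, shape))
```

With shape [1, 2, 1], the trial [2, 4, 2] gets scale factor 2 and [-1, -2, -1] gets -1. Those two trials alone are therefore accounted for completely (fraction 1.0). A trial such as [1, 1, 1] is fitted only partly.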

  5. PCA approach to the goal, cont. The best basic shape is the 1st principal component. Once the 1st PC is found, find the scale factor for each trial. The scale factor tells us “how much” of the 1st PC there is in each trial; it can even be negative. Collect the scale factors from the “experimentals” and the controls. Compare them using standard statistics (t test, Wilcoxon rank-sum test, ANOVA, etc.) to see whether the scale factors differ between groups. If they do, the “size” of the 1st PC differs between groups.
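
Here is a minimal sketch of the group comparison. It assumes Welch's unequal-variance t test is the chosen statistic, which is one of the options listed above. It returns only the t statistic and its approximate degrees of freedom. A p-value needs the Student t distribution, which SciPy provides through `scipy.stats.ttest_ind(a, b, equal_var=False)`. SciPy's `scipy.stats.ranksums` covers the Wilcoxon rank-sum test. The scale factors in the usage block are invented for illustration.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: scale factors (PC scores) for the trials in the two groups.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    se2 = va / na + vb / nb  # squared standard error of the difference in means
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    controls = [0.2, -0.1, 0.4, 0.0, 0.3]  # hypothetical 1st-PC scale factors
    patients = [0.9, 1.1, 0.6, 1.3, 0.8]
    t, df = welch_t(controls, patients)
    print(f"t = {t:.2f}, df = {df:.1f}")
```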

  6. PCA approach to the goal, cont. Subtract the scaled version of the 1st PC from each signal to get the “first residual” for each trial, i.e. the part of the signal not accounted for by the 1st PC. Now repeat the previous step using the 1st residuals instead of the actual signals. The 2nd PC is the shape that best accounts for the 1st residuals when it is scaled separately for each trial.

  7. PCA approach to the goal, cont. As with the 1st PC, we find a scale factor for each trial. The scale factor is “how much” of the 2nd PC there is in each trial, i.e. the amount by which the 2nd PC must be multiplied to best fit that trial’s 1st residual. Once again, we can use statistics to compare the 2nd-PC scale factors across groups and see whether the “size” of the 2nd PC differs between groups.

  8. PCA approach to the goal, cont. We now compute the “2nd residuals” by subtracting the scaled version of the 2nd PC from each 1st residual. We use the 2nd residuals to get the 3rd PC and its scaling factors, and so on. If there are N time points in each signal, we can continue until we have found N principal components. Only min(N, M-1) of them can have nonzero variance, where M is the number of signals, because M de-meaned signals span at most M-1 dimensions.
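
The sequential procedure of slides 4-8 can be sketched as follows, under two stated assumptions:

- The signals are de-meaned first, as in PCA (slide 10).
- Each new shape is found by power iteration, as the dominant eigenvector of the residuals' N x N product matrix. That eigenvector is the shape whose best-scaled copies capture the most residual power.

The function names are hypothetical, and the code illustrates the idea rather than reproducing the PCA02 program.

```python
import math
import random


def top_eigvec(s, iters=500, seed=0):
    """Dominant eigenvector of a symmetric positive semidefinite matrix (power iteration)."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in s]
    for _ in range(iters):
        w = [sum(sij * vj for sij, vj in zip(row, v)) for row in s]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # residuals are all zero: nothing left to explain
        v = [x / norm for x in w]
    return v


def sequential_pcs(signals, k):
    """Find k PCs one at a time, each from the residuals left by the previous ones.

    signals: M lists, each holding N samples (one per time point).
    Returns (pcs, scales): pcs[j] is a unit-length shape of length N, and
    scales[j][i] is "how much" of PC j+1 there is in signal i.
    """
    n, m = len(signals[0]), len(signals)
    mean = [sum(sig[t] for sig in signals) / m for t in range(n)]
    resid = [[x - mu for x, mu in zip(sig, mean)] for sig in signals]  # de-mean first
    pcs, scales = [], []
    for _ in range(k):
        # N x N matrix of summed residual products; dividing by (m - 1) would not
        # change its eigenvectors, so it is omitted here
        s = [[sum(r[a] * r[b] for r in resid) for b in range(n)] for a in range(n)]
        p = top_eigvec(s)  # shape that best accounts for the current residuals
        # per-trial scale factors (p has unit length, so the fit is a dot product)
        c = [sum(ri * pi for ri, pi in zip(r, p)) for r in resid]
        # subtract the scaled PC from each residual -> next residuals
        resid = [[ri - ci * pi for ri, pi in zip(r, p)] for r, ci in zip(resid, c)]
        pcs.append(p)
        scales.append(c)
    return pcs, scales


if __name__ == "__main__":
    # toy data: 6 signals built from two orthogonal shapes of unequal size
    s1, s2 = [0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]
    w1, w2 = [3, -3, 2, -2, 1, -1], [0.5, 0.5, -0.5, -0.5, 0, 0]
    sigs = [[a * x + b * y for x, y in zip(s1, s2)] for a, b in zip(w1, w2)]
    pcs, scales = sequential_pcs(sigs, 2)
    print([round(v, 3) for v in pcs[0]], [round(c, 3) for c in scales[0]])
```

Each PC is removed from the residuals before the next one is sought. As a result, the shapes come out mutually orthogonal and in order of decreasing power, and they match the eigenvectors of Cov(X) up to sign. Power iteration converges slowly when two PCs account for nearly equal variance, so a library eigen-solver is preferable for real data.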

  9. The high-numbered PCs are probably not very significant; they may just be fitting the noise. How many PCs should we analyze? There are two standard answers. Either accept just enough PCs to account for 90% (or 95%) of the variance in the original signals, or accept PCs until the next one accounts for less than 1% of the variance in the original signals.
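
Both stopping rules can be sketched in a few lines. The inputs are the percentages of variance per PC, which can be derived from the eigenvalues with the helper shown. The function names are hypothetical. Applied to the %Var column printed for EMG_set1 in slide 16 (first six PCs only), these rules give k(90%)=3, k(95%)=4 and k(1%)=4, the values the program reports.

```python
def percent_variance(eigenvalues):
    """Percent of total variance accounted for by each PC, largest first."""
    total = sum(eigenvalues)
    return [100.0 * lam / total for lam in sorted(eigenvalues, reverse=True)]


def k_cumulative(pct, threshold):
    """Smallest k such that the first k PCs together account for >= threshold percent."""
    cum = 0.0
    for k, p in enumerate(pct, start=1):
        cum += p
        if cum >= threshold:
            return k
    return len(pct)


def k_until_small(pct, cutoff=1.0):
    """Accept PCs until the next one accounts for less than `cutoff` percent."""
    k = 0
    for p in pct:
        if p < cutoff:
            break
        k += 1
    return k


if __name__ == "__main__":
    # %Var column printed for EMG_set1 (first six PCs only)
    pct = [46.4, 30.2, 14.7, 7.0, 0.2, 0.2]
    print(k_cumulative(pct, 90), k_cumulative(pct, 95), k_until_small(pct))  # 3 4 4
```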

  10. PCA as described above is done on signals whose mean values have been subtracted off, so the signals analyzed have zero mean. If desired, the means can be added back at the end. Hubley-Kozey et al. use the Karhunen-Loeve transformation (KLT) without subtracting the means. Raptopoulos et al. (J Biomech 2006) also use “KLT”, but they DO subtract the mean values first. They also use the correlation matrix instead of the covariance matrix.

  11. In PCA and KLT, start with a matrix X whose columns are the signals (e.g. EMGs) from different subjects and whose rows are different time points. For both KLT and PCA, compute Cov(X) (or Corr(X): Raptopoulos et al. 2006; Schutte et al. 2000; Deluzio et al. 1997): $\mathrm{Cov}(X) = [X - E(X)]\,[X - E(X)]^{T} / (m-1)$. Here E(X) is the matrix whose every column equals the mean vector, and m is the number of columns (signals). The eigenvectors of Cov(X) are the PCs. The eigenvalues of Cov(X) indicate how much of the variance each PC accounts for.
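
Below is a direct transcription of the formula above. It assumes X is stored as a list of N rows (time points), each with m entries (signals). The correlation version divides each covariance entry by the two corresponding standard deviations. Computing the eigenvectors is left to a library routine such as NumPy's `numpy.linalg.eigh`, which handles symmetric matrices. The function names are hypothetical.

```python
import math


def cov_matrix(x):
    """Cov(X) = [X - E(X)][X - E(X)]^T / (m - 1).

    x: N rows (time points), each with m entries (one per signal).
    Returns the N x N covariance matrix between time points.
    """
    n, m = len(x), len(x[0])
    mean = [sum(row) / m for row in x]  # the mean vector (mean signal)
    d = [[v - mu for v in row] for row, mu in zip(x, mean)]  # X - E(X)
    return [[sum(d[a][j] * d[b][j] for j in range(m)) / (m - 1) for b in range(n)]
            for a in range(n)]


def corr_matrix(x):
    """Corr(X): the covariance matrix rescaled so every diagonal entry is 1.

    Raises ZeroDivisionError if some time point has zero variance across signals.
    """
    c = cov_matrix(x)
    sd = [math.sqrt(c[i][i]) for i in range(len(c))]
    return [[c[a][b] / (sd[a] * sd[b]) for b in range(len(c))] for a in range(len(c))]


if __name__ == "__main__":
    # 2 time points (rows) x 3 signals (columns)
    X = [[1.0, 2.0, 3.0],
         [2.0, 4.0, 6.0]]
    print(cov_matrix(X))   # [[1.0, 2.0], [2.0, 4.0]]
    print(corr_matrix(X))  # [[1.0, 1.0], [1.0, 1.0]]
```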

  12. According to Gerbrands (1981), the KLT computes $y = T^{T}x$, whereas PCA computes $y = T^{T}[x - E(x)]$, where E(x) is the mean vector. Hall, Müller and Wang (2006) define the KLT with de-meaned X and equate the KLT with “functional PCA”.
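
The two definitions differ only in whether the mean vector is subtracted before projecting. This minimal sketch assumes the rows of `pcs` are the unit-length PCs (the columns of T). The function names are hypothetical.

```python
def klt_coeffs(x, pcs):
    """KLT in Gerbrands' sense: y = T^T x, with no de-meaning.

    pcs: the unit-length PCs (the columns of T), each a list of N samples.
    """
    return [sum(ti * xi for ti, xi in zip(t, x)) for t in pcs]


def pca_coeffs(x, pcs, mean):
    """PCA: y = T^T [x - E(x)], i.e. subtract the mean vector before projecting."""
    return klt_coeffs([xi - mi for xi, mi in zip(x, mean)], pcs)


if __name__ == "__main__":
    pcs = [[0.6, 0.8], [-0.8, 0.6]]  # toy orthonormal basis, N = 2
    x, mean = [1.0, 2.0], [1.0, 0.0]
    print("KLT:", klt_coeffs(x, pcs))
    print("PCA:", pca_coeffs(x, pcs, mean))
```

When the mean vector is close to zero, the two sets of coefficients nearly coincide. That is the situation described for EMG_set1.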

  13. Simulate some EMGs. Each EMG is a weighted sum of 4 components (pulses). The weight of each component (i.e. the factor that multiplies its y-scale) is a random number between -1 and +1, and the random numbers differ from trace to trace. Some random noise is also added at each time point.
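
The slides do not give the pulse shapes, noise level or random-number generator. This sketch therefore assumes:

- Gaussian-shaped pulses with peak 1, at hypothetical centres and widths.
- Uniformly distributed weights.
- Gaussian noise with a hypothetical standard deviation.

Only the structure follows the text: a weighted sum of 4 pulses, a new set of random weights for every trace, and noise at every time point.

```python
import math
import random

# Hypothetical pulse centres and widths, in samples: the slides do not give the
# actual pulse shapes, so Gaussian bumps with peak 1 are assumed here.
PULSES = ((20, 5), (40, 8), (60, 6), (80, 10))


def simulate_emgs(m, n=100, intervals=((-1.0, 1.0),) * 4, noise_sd=0.05, seed=1):
    """Return m simulated traces of n samples each.

    Each trace is a weighted sum of 4 pulses; weight k is drawn uniformly from
    intervals[k], independently for every trace, and Gaussian noise with
    standard deviation noise_sd is added at every time point.
    """
    rng = random.Random(seed)
    shapes = [[math.exp(-(((t - c) / w) ** 2)) for t in range(n)] for c, w in PULSES]
    traces = []
    for _ in range(m):
        weights = [rng.uniform(lo, hi) for lo, hi in intervals]
        traces.append([sum(wk * s[t] for wk, s in zip(weights, shapes))
                       + rng.gauss(0.0, noise_sd) for t in range(n)])
    return traces


def to_columns(traces):
    """Transpose to rows = time points, columns = signals (the layout PCA02 reads)."""
    return [list(row) for row in zip(*traces)]


if __name__ == "__main__":
    set1 = simulate_emgs(20)  # like Set 1: all weights in (-1, 1)
    set2b = simulate_emgs(20, intervals=((1, 3), (1, 2), (1, 3), (1, 4)), seed=3)
    set3a = simulate_emgs(40, intervals=((4, 8), (2, 6), (3, 7), (1, 5)), seed=4)
    x = to_columns(set1)
    print(len(x), "rows (=times);", len(x[0]), "columns (=different signals)")
```

The later data sets come from the same function with different weight intervals, as in the usage block. For example, ((1, 3), (1, 2), (1, 3), (1, 4)) reproduces the Set 2b ranges.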

  15. Set 1: s1 through s4 ≈ (-1,1). Six examples and the mean of 20 are shown.

  16. Example: running a PCA program in Matlab.

      >> PCA02
      Enter name of text file containing data (.txt will be added): EMG_set1
      100 rows (=times); 20 columns (=different signals)
      k(90%)=3, k(95%)=4, k(1%)=4
      PC   weight +- SE    %Var accounted for
       1   -0.337 +- 0.59   46.4
       2   -0.101 +- 0.47   30.2
       3   +0.162 +- 0.33   14.7
       4   -0.046 +- 0.23    7.0
       5   +0.011 +- 0.04    0.2
       6   +0.006 +- 0.04    0.2
                1 thru 10          11 thru 20
      PC   weight +- SE      weight +- SE
       1   -0.480 +- 0.78    -0.193 +- 0.91
       2    0.515 +- 0.72    -0.716 +- 0.58
       3   -0.142 +- 0.31     0.465 +- 0.58
       4   -0.148 +- 0.33     0.055 +- 0.33
      Variables saved in EMG_set1_PCA2.mat.
      >>

      The program also makes 3 plots.

  19. Set 1L: s1 through s4 ≈ (-1,1); M=20.

  20. Set 1L1: s1 through s4 ≈ (-1,1); M=100.

  21. Set 1L2: s1 through s4 ≈ (-1,1); M=100.

  22. Set 1VL: s1 through s4 ≈ (-1,1); M=200.

  23. Set 1VL: s1 through s4 ≈ (-1,1); M=200.

  24. Set 1VL: s1 through s4 ≈ (-1,1); M=200.

  25. Do it again using Hubley-Kozey’s version of the KLT (i.e. without de-meaning). [Need to check the KLT graphs and software to see whether the covariance matrix was used correctly – WCR 2006-11-16]

  26. [Figure panels: KLT; KLT (H-K)]

  27. [Figure panels: KLT (H-K); KLT]

  28. [Figure panel: KLT (H-K)]

  29. [Figure panels: KLT; KLT (H-K)]

  30. [Figure panels: KLT (H-K) (no de-meaning); PCA] For this data set (EMG_set1.txt), the mean values are ≈ 0, so both methods give similar results.

  31. Simulate EMGs using the same four components (pulses) as before. Make two populations whose random weights (s1 to s4) for the components are randomly distributed over the following intervals:

  33. s1 through s4 ≈ (1,3). Six examples and the mean of 20 are shown.

  34. s1, s3 ≈ (1,3); s2 ≈ (1,2); s4 ≈ (1,4). Six examples and the mean of 20 are shown.

  35. Set 2a: s1, s2, s3, s4 ≈ (1,3). Set 2b: s1, s3 ≈ (1,3); s2 ≈ (1,2); s4 ≈ (1,4).

  36. Analysis of simulated data set 2.

      >> PCA02
      Enter name of text file containing data (.txt will be added): EMG_set2
      100 rows (=times); 40 columns (=different signals)
      k(90%)=4, k(95%)=4, k(1%)=4
      PC   weight +- SE    %Var accounted for
       1   -0.977 +- 0.37   40.4
       2   -0.163 +- 0.32   28.8
       3   -0.870 +- 0.27   20.8
       4   -7.389 +- 0.17    8.0
       5   -0.119 +- 0.02    0.1
       6   -0.032 +- 0.02    0.1
                1 thru 20          21 thru 40
      PC   weight +- SE      weight +- SE
       1   -1.556 +- 0.49    -0.397 +- 0.55
       2   -0.981 +- 0.42     0.655 +- 0.40
       3   -0.192 +- 0.44    -1.549 +- 0.23
       4   -7.554 +- 0.23    -7.224 +- 0.24
      Variables saved in EMG_set2_PCA2.mat.
      >>

  37. Set 2: s1 through s4 ≈ (1,3); M=40.

  38. Set 2: s1 through s4 ≈ (1,3); M=40.

  39. Do it again using Hubley-Kozey’s version of the KLT (i.e. without de-meaning). [Need to check the KLT graphs and software to see whether the covariance matrix was used correctly – WCR 2006-11-16]

  40. [Figure panels: KLT; KLT (H-K)]

  41. [Figure panels: KLT (H-K); KLT]

  42. [Figure panel: KLT (H-K)]

  43. KLT reconstruction with 1 PC. [Figure panel: KLT (H-K style)]

  44. The KLT without de-meaning looks much worse, but only because it reconstructs with 1 PC. The reconstruction uses just enough PCs to account for at least 90% of the variance, and here 1 PC was enough. This needs to be changed so that both reconstructions include the first 4 PCs. [Figure panels: KLT (H-K); PCA] [Need to check the KLT graphs and software to see whether the covariance matrix was used correctly – WCR 2006-11-16]
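
The note above recommends comparing the two methods at a fixed number of PCs. One way is to reconstruct each signal from its first k PCs and measure the error. The sketch assumes orthonormal PCs. With `mean=None` it follows the KLT form (no de-meaning); otherwise it follows the PCA form. The PCs passed in should be the ones computed by the corresponding method. The function names are hypothetical.

```python
import math


def reconstruct(x, pcs, k, mean=None):
    """Rebuild signal x from its first k PCs.

    PCA (mean given):                x_hat = E(x) + sum_j y_j t_j,  y = T^T [x - E(x)]
    KLT (mean=None, no de-meaning):  x_hat = sum_j y_j t_j,         y = T^T x
    pcs: unit-length, mutually orthogonal PC shapes (the columns of T).
    """
    base = [float(v) for v in mean] if mean is not None else [0.0] * len(x)
    d = [xi - bi for xi, bi in zip(x, base)]
    x_hat = base[:]
    for t in pcs[:k]:
        y = sum(ti * di for ti, di in zip(t, d))  # coefficient of this PC
        x_hat = [xh + y * ti for xh, ti in zip(x_hat, t)]
    return x_hat


def rms_error(x, x_hat):
    """Root-mean-square difference between a signal and its reconstruction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))


if __name__ == "__main__":
    pcs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy orthonormal PCs
    x, mean = [3.0, 5.0, 7.0], [1.0, 1.0, 1.0]
    print(reconstruct(x, pcs, 1, mean))  # [3.0, 1.0, 1.0]
    print(reconstruct(x, pcs, 1))        # [3.0, 0.0, 0.0]
```

With the same k for both methods, the PCA reconstruction always retains the mean signal. The KLT reconstruction must spend PCs on the mean, which is why a fair comparison fixes k rather than applying the 90% rule separately to each method.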

  45. New simulated data set: EMG_set3. The pulses are the same as before. Weights are greater for pulses with greater variances. (All 4 pulses have min=0 and max=1, but some are wider and so have greater power.) Will this allow PCA to find PCs that correspond better to the original pulses? The two populations are s1 ≈ (4,8), s2 ≈ (2,6), s3 ≈ (3,7), s4a ≈ (1,5) (m=40) and s1 ≈ (4,8), s2 ≈ (2,6), s3 ≈ (3,7), s4b ≈ (0,4) (m=40).

  46. Set 3: s1 ≈ (4,8), s2 ≈ (2,6), s3 ≈ (3,7), s4a ≈ (1,5), s4b ≈ (0,4); m=80 (40 each).

  47. [Figure panel: PCA] Set 3: s1 ≈ (4,8), s2 ≈ (2,6), s3 ≈ (3,7), s4a ≈ (1,5), s4b ≈ (0,4); m=80.

  48. Set 3: s1 ≈ (4,8), s2 ≈ (2,6), s3 ≈ (3,7), s4a ≈ (1,5), s4b ≈ (0,4); m=80.

  49. Do it again using Hubley-Kozey’s version of the KLT (i.e. without de-meaning). [Need to check the KLT graphs and software to see whether the covariance matrix was used correctly – WCR 2006-11-16]

  50. [Figure panel: PCA without de-meaning] Set 3: s1 ≈ (4,8), s2 ≈ (2,6), s3 ≈ (3,7), s4a ≈ (1,5), s4b ≈ (0,4); m=80.
