
Lecture 20 Empirical Orthogonal Functions and Factor Analysis


Presentation Transcript


  1. Lecture 20: Empirical Orthogonal Functions and Factor Analysis

  2. Motivation: In Fourier analysis the choice of sine and cosine “patterns” was prescribed by the method. Could we use the data itself as a source of information about the shape of the patterns?

  3. Example: maps of some hypothetical function, say, sea surface temperature, forming a sequence in time.

  4. [Figure: the data, a sequence of maps in time]

  5. [Figure: the data]

  6. [Figure: plot of pattern importance vs. pattern number]

  7. Choose just the most important patterns. [Figure: plot of pattern importance vs. pattern number, with the 3 most important patterns marked]

  8. [Figure: the 3 most important patterns]

  9. Comparison: [Figure: original vs. reconstruction using only 3 patterns]. Note that this process has reduced the noise (since noise has no pattern common to all the images).

  10. [Figure: amplitudes of the patterns as functions of time]

  11. [Figure: amplitudes of the patterns as functions of time] Note: there is no requirement that a pattern be periodic in time.

  12. Discussion: mixing of end-members.

  13. A useful tool for data that have three “components”: the ternary diagram. [Figure: triangle with vertices A, B, and C]

  14. This works for 3 end-members, as long as A + B + C = 100%. [Figure: ternary diagram with gridlines at 0%, 25%, 50%, 75%, and 100% A; similarly for B and C]

  15. Suppose the data fall near a line on the diagram. [Figure: ternary diagram with data points clustered near a line]

  16. Choose end-members, or factors, f1 and f2, on that line. [Figure: the same ternary diagram, with f1 and f2 marked on the line through the data]

  17. [Figure: the same diagram, repeated]

  18. [Figure: the same diagram, with the line between f1 and f2 labeled “mixing line” and its 50% point marked]

  19. Idealize the data as lying exactly on the mixing line. [Figure: ternary diagram with the data projected onto the line between f1 and f2]

  20. You could represent the data exactly by adding a third “noise” factor, f3. It doesn’t much matter where you put f3, as long as it’s not on the line. [Figure: ternary diagram with f1 and f2 on the line and f3 off it]

  21. S: the amounts of the components (A, B, C, …) in each sample, s. With N samples and M components, S is N×M. Note that a sample occupies a row of S:

      S = [ (A in s1) (B in s1) (C in s1)
            (A in s2) (B in s2) (C in s2)
            (A in s3) (B in s3) (C in s3)
            …
            (A in sN) (B in sN) (C in sN) ]

  22. F: the amounts of the components (A, B, C, …) in each factor, f. With M components and M factors, F is M×M:

      F = [ (A in f1) (B in f1) (C in f1)
            (A in f2) (B in f2) (C in f2)
            (A in f3) (B in f3) (C in f3) ]

  23. C: the coefficients of the factors in each sample. With N samples and M factors, C is N×M:

      C = [ (f1 in s1) (f2 in s1) (f3 in s1)
            (f1 in s2) (f2 in s2) (f3 in s2)
            (f1 in s3) (f2 in s3) (f3 in s3)
            …
            (f1 in sN) (f2 in sN) (f3 in sN) ]

  24. SamplesNM S = C F (A in s1) (B in s1) (C in s1) (A in s2) (B in s2) (C in s2) (A in s3) (B in s3) (C in s3) … (A in sN) (B in sN) (C in sN) (f1 in s1) (f2 in s1) (f3 in s1) (f1 in s2) (f2 in s2) (f3 in s2) (f1 in s3) (f2 in s3) (f3 in s3) … (f1 in sN) (f2 in sN) (f3 in sN) (A in f1) (B in f1) (C in f1) (A in f2) (B in f2) (C in f2) (A in f3) (B in f3) (C in f3) = FactorsMM CoefficientsNM

  25. SamplesNM data approximated with only most important factorsp most important factors = those with the biggest coefficients S  C’ F’ (A in s1) (B in s1) (C in s1) (A in s2) (B in s2) (C in s2) (A in s3) (B in s3) (C in s3) … (A in sN) (B in sN) (C in sN) (f1 in s1) (f2 in s1) (f1 in s2) (f2 in s2) (f1 in s3) (f2 in s3) … (f1 in sN) (f2 in sN) (A in f1) (B in f1) (C in f1) (A in f2) (B in f2) (C in f2) = ignore f3 ignore f3 selectedcoefficientsNp selectedfactors pM

  26. View the samples as vectors in the space of components. Let the factors be unit vectors; then the coefficients are the projections (dot products) of the samples onto the factors. [Figure: sample vectors s1, s2, s3 and a unit factor vector f in A-B-C space]

  27. This suggests a method of choosing factors so that they have large coefficients: find the factor f that maximizes E = Σi [si·f]² with the constraint that f·f = 1. (Note: we square the dot product since it can be negative.) [Figure: samples s1, s2, s3 and a factor f in A-B-C space]

  28. Find the factor f that maximizes E = Σi [si·f]² subject to the constraint L = f·f − 1 = 0. Expanding,

      E = Σi [si·f]² = Σi [Σj Sij fj] [Σk Sik fk] = Σj Σk [Σi Sij Sik] fj fk = Σj Σk Mjk fj fk

      with Mjk = Σi Sij Sik, that is, M = SᵀS, a symmetric matrix. The constraint is L = Σi fi² − 1. Use Lagrange multipliers, finding the stationary points of E − λ²L, where λ² is the Lagrange multiplier (written as a square for reasons that will become apparent later). We solved this problem two lectures ago; its solution is the algebraic eigenvalue problem M f = λ² f. Recall that the eigenvalue is the corresponding value of E.
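
  As a numerical sanity check (a sketch with an arbitrary random S, not from the lecture), the maximizer of E over unit vectors is indeed the eigenvector of M = SᵀS with the largest eigenvalue:

      S = randn(10,3);               % arbitrary sample matrix, N=10, M=3
      M = S'*S;                      % Mjk = sum over i of Sij*Sik
      [V,D] = eig(M);                % eigenvectors (columns of V) and eigenvalues
      [~,k] = max(diag(D));          % pick the largest eigenvalue
      f = V(:,k);                    % unit-length factor, f'*f = 1
      E = sum((S*f).^2);             % equals the eigenvalue D(k,k)
      g = randn(3,1); g = g/norm(g); % any other unit vector...
      Eg = sum((S*g).^2);            % ...gives Eg <= E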

  29. So the factors solve the algebraic eigenvalue problem [SᵀS] f = λ²f. [SᵀS] is a square matrix with the same number of rows and columns as there are components, so there are as many factors as there are components: the factors span a space of the same dimension as the components. If you sort the eigenvectors by the size of their eigenvalues, the ones with the largest eigenvalues have the largest coefficients, so selecting the most important factors is easy.

  30. An important tidbit from the theory of eigenvalues and eigenvectors that we’ll use later on: consider [SᵀS] f = λ²f. Let Λ² be the diagonal matrix of eigenvalues λi², and let V be the matrix whose columns are the corresponding factors, f(i). Then [SᵀS] = V Λ² Vᵀ.
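
  This tidbit is easy to confirm numerically (a sketch; eig of the symmetric matrix SᵀS returns orthonormal eigenvectors):

      S = randn(10,3);               % arbitrary sample matrix
      [V,L2] = eig(S'*S);            % columns of V are factors, L2 holds the lambda^2
      err = norm(S'*S - V*L2*V');    % ~0 up to roundoff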

  31. Note also that the factors are orthogonal: f(i)·f(j) = 0 if i ≠ j. This is a mathematically pleasant property, but it may not always be the physically most relevant choice. [Figure: two ternary diagrams comparing an orthogonal pair of factors, one of which contains negative A, with a non-orthogonal pair that is close to the mean of the data]

  32. Upshot: the eigenvectors of [SᵀS] f = λ²f with the p largest eigenvalues identify a p-dimensional subspace in which most of the data lie. You can use those eigenvectors as factors, or you can choose any other p factors that span that subspace. In the ternary diagram example, they must lie on the line connecting the two SVD factors.

  33. Singular Value Decomposition (SVD): any N×M matrix S can be written as the product of three matrices, S = U Λ Vᵀ, where U is N×N and satisfies UᵀU = UUᵀ = I, V is M×M and satisfies VᵀV = VVᵀ = I, and Λ is an N×M diagonal matrix of singular values.

  34. Now note that if S = U Λ Vᵀ, then SᵀS = [U Λ Vᵀ]ᵀ [U Λ Vᵀ] = V Λ UᵀU Λ Vᵀ = V Λ² Vᵀ. Compare with the tidbit mentioned earlier: SᵀS = V Λ² Vᵀ. The SVD V is the same V we were talking about earlier. The columns of V are the eigenvectors f, so F = Vᵀ. So we can use the SVD to calculate the factors, F.

  35. But it’s even better than that! Write S = U Λ Vᵀ as S = [U Λ] [Vᵀ] = C F. So the coefficients are C = U Λ and, as shown previously, the factors are F = Vᵀ. So we can use the SVD to calculate both the coefficients, C, and the factors, F.

  36. MatLab code for computing C and F:

      [U, LAMBDA, V] = svd(S);   % singular value decomposition S = U*LAMBDA*V'
      C = U*LAMBDA;              % coefficients
      F = V';                    % factors
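
  A quick check that this decomposition reproduces the data exactly (a sketch with an arbitrary random S):

      S = randn(10,3);           % arbitrary N x M sample matrix
      [U,LAMBDA,V] = svd(S);     % U is 10x10, LAMBDA is 10x3, V is 3x3
      C = U*LAMBDA;              % coefficients
      F = V';                    % factors
      err = norm(S - C*F);       % ~0 up to roundoff, since S = U*LAMBDA*V'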

  37. MatLab code for approximating S ≈ Sp, using only the p most important factors:

      p = (whatever);               % number of factors to keep
      Up = U(:,1:p);                % first p columns of U
      LAMBDAp = LAMBDA(1:p,1:p);    % largest p singular values
      Cp = Up*LAMBDAp;              % selected coefficients, N x p
      Vp = V(:,1:p);                % first p columns of V
      Fp = (Vp)';                   % selected factors, p x M
      Sp = Cp * Fp;                 % approximation to S
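
  To choose p, it helps to look at the singular values themselves, since they measure pattern importance; a sketch, continuing from the snippets above (what counts as “significant” is a judgment call):

      lambda = diag(LAMBDA);         % singular values, sorted largest to smallest
      plot(lambda, 'o-');            % look for where the values level off
      xlabel('pattern number');
      ylabel('pattern importance');
      misfit = norm(S - Sp, 'fro');  % size of what the p-factor approximation omits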

  38. back to my example

  39. Each pixel is a component of the image, and the patterns are factors. Our derivation assumed that the data (samples, s(i)) were vectors. However, in this example the data are images (matrices), so what I had to do was write out the pixels of each image as a vector.

  40. Steps:
      1) load images
      2) reorganize images into S
      3) SVD of S to get U, Λ, and V
      4) examine Λ to identify the number of significant factors
      5) build S′, using only the significant factors
      6) reorganize S′ back into images

  41. MatLab code for reorganizing a sequence of images D(p,q,r) (p = 1…Nx, q = 1…Nx, r = 1…Nt) into the sample matrix S(r,s) (r = 1…Nt, s = 1…Nx²):

      for r = [1:Nt]                  % time r
          for p = [1:Nx]              % image row p
              for q = [1:Nx]          % image column q
                  s = Nx*(p-1)+q;     % sample-matrix column index s
                  S(r,s) = D(p,q,r);
              end
          end
      end

  42. MatLab code for reorganizing the sample matrix S(r,s) (r = 1…Nt, s = 1…Nx²) back into a sequence of images D(p,q,r) (p = 1…Nx, q = 1…Nx, r = 1…Nt):

      for r = [1:Nt]                            % time r
          for s = [1:Nx*Nx]                     % sample-matrix column index s
              p = floor( (s-1)/Nx + 0.01 ) + 1; % image row p
              q = s - Nx*(p-1);                 % image column q
              D(p,q,r) = S(r,s);
          end
      end
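
  For reference, both reorganizations can also be written without loops using reshape and permute (a sketch, equivalent to the two loops above, assuming D is Nx×Nx×Nt and S is Nt×Nx² as defined on the previous slides):

      S = reshape(permute(D,[3 2 1]), Nt, Nx*Nx);    % images -> sample matrix
      D = permute(reshape(S, Nt, Nx, Nx), [3 2 1]);  % sample matrix -> images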

  43. Reality of factors: are factors intrinsically meaningful, or just a convenient way of representing data? Example: suppose the samples are rocks and the components are element concentrations. Then thinking of the factors as minerals might make intuitive sense: a mineral has a fixed element composition, and a rock is a mixture of minerals.

  44. Many rocks, but just a few minerals. [Figure: ternary diagram with rocks 1–7 plotted inside the triangle whose vertices are minerals (factors) 1–3]

  45. Possibly desirable properties of factors:
      - Factors are unlike each other (different minerals typically contain different elements)
      - A factor contains either large or near-zero components (a mineral typically contains only a few elements)
      - Factors have only positive components (minerals are composed of positive amounts of chemical elements)
      - Coefficients of the factors are positive (rocks are composed of positive amounts of minerals)
      - Coefficients are typically either large or near-zero (rocks are composed of just a few major minerals)

  46. Transformations of factors. Start from S = C F and suppose we mix the old factors together to get a new set of factors: Fnew = T Fold, where the new factor matrix (M×M) is a transformation matrix T (M×M) times the old factor matrix (M×M); the (i,j) entry of T is the amount of old factor fj in new factor f′i.

  47. Transformations of factors: Fnew = T Fold. A requirement is that T⁻¹ exists, or else Fnew will not span the same space as Fold. Then S = C F = C I F = (C T⁻¹)(T F) = Cnew Fnew. So you could try to implement the desirable properties by designing an appropriate transformation matrix, T. A somewhat restrictive choice is T = R, where R is a rotation matrix (rotation matrices satisfy R⁻¹ = Rᵀ).
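
  A sketch of such a rotation in MatLab (assuming M = 3 components, with C and F from the SVD code above; the angle theta is arbitrary, and R rotates the first two factors into each other while leaving the third alone):

      theta = pi/6;                   % arbitrary rotation angle
      R = [cos(theta) -sin(theta) 0;
           sin(theta)  cos(theta) 0;
           0           0          1]; % rotation matrix, with inv(R) = R'
      Fnew = R*F;                     % new factors are mixtures of the old ones
      Cnew = C*R';                    % C*inv(R), using R' = inv(R)
      err = norm(S - Cnew*Fnew);      % ~0: the data are represented either way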

  48. A method for implementing one of these properties: that a factor contains either large or near-zero components (a mineral typically contains only a few elements).

  49. “A factor contains either large or near-zero components” is more-or-less equivalent to: lots of variance in the amounts of the components contained in the factor.

  50. The usual formula for the variance of data x is

      σd² = N⁻² [ N Σi xi² − (Σi xi)² ]

      Applying it to the squared elements of a factor f gives

      σf² = N⁻² [ N Σi fi⁴ − (Σi fi²)² ]

      Note that we are measuring the variance of the squares of the elements of f. Thus a factor has a large σf² if the absolute values of its elements show a lot of variation; the signs of the elements are irrelevant.
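
  A sketch of this measure in MatLab (the factor f and its values are hypothetical; note the equivalence with the built-in population variance of f.^2):

      f = [0.9; -0.85; 0.1; 0.05];                % hypothetical factor, column vector
      N = length(f);
      sf2 = ( N*sum(f.^4) - sum(f.^2)^2 ) / N^2;  % variance of the squared elements
      % equivalently: sf2 = var(f.^2, 1)          % var(x,1) normalizes by N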
