Multivariate data analysis

Principal Component Analysis


Principal Component Analysis (PCA)

  • Singular Value Decomposition

  • Eigenvector / eigenvalue calculation
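
The two routes listed above give the same components. A minimal sketch, assuming NumPy and a small random matrix (the data and variable names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))          # hypothetical data: I = 6 objects, K = 3 variables
Xc = X - X.mean(axis=0)              # mean-center each variable

# Route 1: singular value decomposition, Xc = U S V'
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Route 2: eigenvectors / eigenvalues of the cross-product matrix Xc' Xc
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
order = np.argsort(eigvals)[::-1]    # eigh returns eigenvalues in ascending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Squared singular values equal the eigenvalues
print(np.allclose(s ** 2, eigvals))
# Loading vectors agree up to an arbitrary sign
print(np.allclose(np.abs(Vt.T), np.abs(eigvecs)))
```

The sign of each component is arbitrary in both routes, so the comparison uses absolute values.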


Data Matrix (IxK)

  • Reduce variables

  • Improve projections

  • Remove noise

  • Find outliers

  • Find classes

[Figure: data matrix X with I rows and K columns]


PCA

  • Example with 2 variables, 6 objects

  • Find best (most informative) direction in space

  • Describe direction

  • Make projection


[Figure: the six objects plotted in the (x1, x2) plane]

[Figure: the 1st PC drawn through the points]

[Figure: projection of an object onto the 1st PC; the position along the PC is the score, the distance to the PC is the residual]

[Figure: the 1st PC described by a unit vector at angle a to the x1 axis, with components Loading p1 and Loading p2]

Loading p1 = cos(a)

Loading p2 = sin(a)
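
The two-variable example can be sketched numerically. The six data points below are made up for illustration (the slides do not list the values), and NumPy is assumed:

```python
import numpy as np

# Hypothetical data: 6 objects, 2 variables (x1, x2); not the values from the slides
X = np.array([[1.0, 1.2], [2.0, 1.9], [3.0, 3.2],
              [4.0, 3.8], [5.0, 5.1], [6.0, 5.9]])
Xc = X - X.mean(axis=0)              # mean-center

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p = Vt[0]                            # loading vector of the 1st PC (unit length)
if p[0] < 0:                         # the sign is arbitrary; choose p1 > 0
    p = -p

a = np.arctan2(p[1], p[0])           # angle between the 1st PC and the x1 axis
t = Xc @ p                           # scores: positions along the 1st PC
E = Xc - np.outer(t, p)              # residuals: what the projection leaves out

print(np.allclose(p, [np.cos(a), np.sin(a)]))   # loadings are cos(a) and sin(a)
print(np.isclose(np.linalg.norm(p), 1.0))       # unit vector
print(np.allclose(E @ p, 0.0))                  # residuals are perpendicular to the PC
```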


[Figure: data matrix X (I x K) with a score vector t (one value per object i) and a loading vector p (one value per variable k)]


X = t_1 p_1’ + t_2 p_2’ + ... + t_A p_A’ + E

X = TP’ + E

X: properly preprocessed data matrix (I x K)

T: score matrix (I x A)

P: loading matrix (K x A)

E: residual matrix (I x K)

t_a: score vector

p_a: loading vector
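
A minimal sketch of the decomposition X = TP’ + E, assuming NumPy and a truncated SVD as the way to obtain T and P. The helper name `pca_model` and the random data are hypothetical:

```python
import numpy as np

def pca_model(X, A):
    """Return scores T (I x A), loadings P (K x A) and residuals E (I x K)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :A] * s[:A]             # score vectors t_a as columns
    P = Vt[:A].T                     # loading vectors p_a as columns
    E = X - T @ P.T                  # what the A components do not describe
    return T, P, E

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 5))         # hypothetical data, I = 10, K = 5
X = X - X.mean(axis=0)               # preprocessing: mean-centering only

T, P, E = pca_model(X, A=2)
print(T.shape, P.shape, E.shape)     # (10, 2) (5, 2) (10, 5)

# The sum of rank-one terms t_a p_a' equals TP'
outer_sum = sum(np.outer(T[:, a], P[:, a]) for a in range(2))
print(np.allclose(outer_sum, T @ P.T))
print(np.allclose(outer_sum + E, X))
```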


The Wine Example

People magazine

Wise & Gallagher


          Wine    Beer   Spirit   LifeEx   HeartD
France    63.5    40.1      2.5     78.0     61.1
Italy     58.0    25.1      0.9     78.0     94.1
Switz     46.0    65.0      1.7     78.0    106.4
Austra    15.7   102.1      1.2     78.0    173.0
Brit      12.2   100.0      1.5     77.0    199.7
U.S.A.     8.9    87.8      2.0     76.0    176.0
Russia     2.7    17.1      3.8     69.0    373.6
Czech      1.7   140.0      1.0     73.0    283.7
Japan      1.0    55.0      2.1     79.0     34.7
Mexico     0.2    50.4      0.8     73.0     36.4


                       Wine      Beer   Spirit   LifeEx     HeartD
Mean                20.9900   68.2600   1.7500  75.9000   153.8700
Standard deviation  24.9270   38.6718   0.9132   3.2128   110.8182
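
The means and standard deviations can be reproduced from the wine data listed in the slides. A sketch assuming NumPy; the standard deviations match the slide only with the n - 1 denominator (`ddof=1`), which is therefore assumed here:

```python
import numpy as np

# Wine data as listed in the slides; columns: Wine, Beer, Spirit, LifeEx, HeartD
X = np.array([
    [63.5,  40.1, 2.5, 78.0,  61.1],   # France
    [58.0,  25.1, 0.9, 78.0,  94.1],   # Italy
    [46.0,  65.0, 1.7, 78.0, 106.4],   # Switz
    [15.7, 102.1, 1.2, 78.0, 173.0],   # Austra
    [12.2, 100.0, 1.5, 77.0, 199.7],   # Brit
    [ 8.9,  87.8, 2.0, 76.0, 176.0],   # U.S.A.
    [ 2.7,  17.1, 3.8, 69.0, 373.6],   # Russia
    [ 1.7, 140.0, 1.0, 73.0, 283.7],   # Czech
    [ 1.0,  55.0, 2.1, 79.0,  34.7],   # Japan
    [ 0.2,  50.4, 0.8, 73.0,  36.4],   # Mexico
])

mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)          # sample standard deviation (n - 1)
Z = (X - mean) / std                 # autoscaled data: mean 0, std 1 per variable

print(np.round(mean, 4))
print(np.round(std, 4))
```

Autoscaling puts variables with very different units and ranges (for example Spirit and HeartD) on an equal footing before PCA.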


[Figure: wine data, singular value versus component; λ1 = 46%, followed by 32%, 12%, 8% and 2%]


[Figure: wine data score plot, Score 1 (46%) versus Score 2 (32%), with the ten countries as labelled points]


[Figure: wine data loading plot, Loading 1 versus Loading 2, with points for Wine, Beer, Spirit, Life exp. and Heart dis.]


Conclusions

Scores = positions of objects in multivariate space

Loadings = importance of original variables for new directions

Try to explain a large enough portion of X (46+32 = 78%)
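
A sketch of how scores, loadings and the explained percentages could be computed for the wine data, assuming NumPy, autoscaling and an SVD. The slides report 46% and 32% for the first two components; the code prints its own values rather than asserting them:

```python
import numpy as np

# Wine data as listed in the slides; columns: Wine, Beer, Spirit, LifeEx, HeartD
X = np.array([
    [63.5,  40.1, 2.5, 78.0,  61.1], [58.0,  25.1, 0.9, 78.0,  94.1],
    [46.0,  65.0, 1.7, 78.0, 106.4], [15.7, 102.1, 1.2, 78.0, 173.0],
    [12.2, 100.0, 1.5, 77.0, 199.7], [ 8.9,  87.8, 2.0, 76.0, 176.0],
    [ 2.7,  17.1, 3.8, 69.0, 373.6], [ 1.7, 140.0, 1.0, 73.0, 283.7],
    [ 1.0,  55.0, 2.1, 79.0,  34.7], [ 0.2,  50.4, 0.8, 73.0,  36.4],
])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale (assumed preprocessing)

U, s, Vt = np.linalg.svd(Z, full_matrices=False)
T = U * s                            # scores: positions of the objects (countries)
P = Vt.T                             # loadings: weights of the original variables

ss = s ** 2                          # sum of squares per component
pct = 100 * ss / ss.sum()            # percentage of X explained per component

print(np.round(pct, 1))              # per component
print(np.round(pct[:2].sum(), 1))    # first two components together
```

Plotting `T[:, 0]` against `T[:, 1]` gives a score plot, and `P[:, 0]` against `P[:, 1]` a loading plot.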


The Apricot Example

Manley & Geladi


[Figure: apricot ("appelkoos") spectra, pseudoabsorbance versus wavelength (nm)]


[Figure: scree plot, singular value versus component number]


What is rank?

Mathematical rank: at most min(I,K)

Gives zero residual

Effective rank = A

Separates model from noise
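
The difference between mathematical and effective rank can be illustrated with simulated data. A sketch assuming NumPy; the sizes, the noise level and the helper `residual_ss` are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
I, K, A = 8, 5, 2
# Hypothetical data: an exact rank-2 structure plus small noise
signal = rng.normal(size=(I, A)) @ rng.normal(size=(A, K))
X = signal + 1e-3 * rng.normal(size=(I, K))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

def residual_ss(n_comp):
    """Residual sum of squares after n_comp components."""
    model = (U[:, :n_comp] * s[:n_comp]) @ Vt[:n_comp]
    return np.sum((X - model) ** 2)

print(np.linalg.matrix_rank(X))      # mathematical rank: min(I, K) = 5 because of the noise
print(residual_ss(min(I, K)))        # zero residual (up to rounding)
print(residual_ss(A))                # small but nonzero: only noise is left
print(np.round(s, 4))                # two large singular values, the rest near the noise level
```

With A = 2 the model keeps the structure and leaves the noise in E, which is the sense in which the effective rank separates model from noise.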


ANOVA

Comp#       SS      SS%   SS%cum
1      68.8269    98.10    98.10
2       1.2843     1.83    99.93
3       0.0463     0.07   100
4       0.0045     0.01
5       0.0007     0.00
6       0.0003     0.00
7       0.0002     0.00
8       0.0001     0.00
9       0.0000     0.00
10      0.0000     0.00
Total  70.1634   100
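
The SS% and SS%cum columns follow directly from the SS column. A short sketch, assuming NumPy and using the SS values exactly as listed (rounded to four decimals, so the total differs from the slide in the last digit):

```python
import numpy as np

# SS per component as listed in the slides (components 1 to 10)
ss = np.array([68.8269, 1.2843, 0.0463, 0.0045, 0.0007,
               0.0003, 0.0002, 0.0001, 0.0000, 0.0000])

ss_tot = ss.sum()                    # total sum of squares
pct = 100 * ss / ss_tot              # SS%
cum = np.cumsum(pct)                 # SS%cum

for a, (v, p, c) in enumerate(zip(ss, pct, cum), start=1):
    print(f"{a:2d} {v:8.4f} {p:6.2f} {c:7.2f}")
print(f"Total {ss_tot:.4f}")
```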


[Figure: score plot, Score 1 (98%) versus Score 2 (2%)]


ANOVA

SStot = λ_1 + λ_2 + λ_3 + ... + λ_(I or K)

SStot = SS_1 + SS_2 + SS_3 + ... + SS_(I or K)

From largest to smallest!


ANOVA

X = TP’ + E

data = model + residual

SStot = SSmod + SSres

R² = SSmod / SStot = 1 - SSres / SStot

Coefficient of determination (often in %)
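
The split SStot = SSmod + SSres and the two forms of the coefficient of determination can be checked numerically. A sketch assuming NumPy, random data and a two-component model; the helper `ss_split` is hypothetical:

```python
import numpy as np

def ss_split(X, A):
    """Return (SStot, SSmod, SSres) for an A-component PCA model of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    model = (U[:, :A] * s[:A]) @ Vt[:A]      # TP'
    resid = X - model                        # E
    return np.sum(X ** 2), np.sum(model ** 2), np.sum(resid ** 2)

rng = np.random.default_rng(4)
X = rng.normal(size=(12, 6))                 # hypothetical data
X = X - X.mean(axis=0)

ss_tot, ss_mod, ss_res = ss_split(X, A=2)
r2 = ss_mod / ss_tot

print(np.isclose(ss_tot, ss_mod + ss_res))   # data = model + residual
print(np.isclose(r2, 1 - ss_res / ss_tot))   # both forms agree
print(f"R2 = {100 * r2:.1f}%")
```

The sums of squares add up because the model part and the residual part are orthogonal to each other.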


Examples

Wines (2 comp.): R² = SSmod = 78%, SSres = 22%

Apricots 1 (2 comp.): R² = SSmod = 99.93%, SSres = 0.07%

Apricots 2 (3 comp.): R² = SSmod = 100%, SSres = ±0.0%


[Figure: spectra with outliers removed, absorbance versus wavelength (nm)]


[Figure: no outliers, singular values versus component; λ1 = 81%, followed by 16% and 3%]


[Figure: score plot, Score 2 (16%) versus Score 3 (3%), with groups labelled Whole fruit, No kernel and Thin slice]


[Figure: loadings 2 and 3 versus wavelength (nm)]


[Figure: loading plot, Loading 2 versus Loading 3]


More nomenclature

Score = Latent Variable

Loading vector = Eigenvector

Effective rank = Pseudorank = Model dimensionality = Number of components

SS_a = Eigenvalue

Singular value = SS_a^(1/2)
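
These equivalences can be checked on any mean-centered matrix. A sketch assuming NumPy and random data, with the sum of squares of component a taken as the sum of squares of its score vector:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(7, 4))                  # hypothetical data
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U * s                                    # score vectors as columns

ss_a = np.sum(T ** 2, axis=0)                # sum of squares of each score vector
eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]  # eigenvalues of X'X, largest first

print(np.allclose(ss_a, eigvals))            # sum of squares per component = eigenvalue
print(np.allclose(s, np.sqrt(ss_a)))         # singular value = its square root
print(np.isclose(ss_a.sum(), np.sum(X ** 2)))  # the components add up to SStot
```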


An analysis sequence

  • 1. Scale, mean-center data

  • 2. Calculate a few components

  • 3. Check scores, loadings

  • 4. Find outliers, groupings, explain

  • 5. Remove outliers


An analysis sequence

  • 6. Scale, mean-center data

  • 7. Calculate enough components

  • 8. Try to determine pseudorank

  • 9. Check score plots

  • 10. Check loading plots

  • 11. Check residuals
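
The scaling, component and residual steps of the sequence can be sketched for the wine data. NumPy is assumed, and the residual standard deviation is taken here as the root-mean-square residual per object, which is one possible definition and not necessarily the one used in the slides:

```python
import numpy as np

# Wine data as listed in the slides; columns: Wine, Beer, Spirit, LifeEx, HeartD
X = np.array([
    [63.5,  40.1, 2.5, 78.0,  61.1], [58.0,  25.1, 0.9, 78.0,  94.1],
    [46.0,  65.0, 1.7, 78.0, 106.4], [15.7, 102.1, 1.2, 78.0, 173.0],
    [12.2, 100.0, 1.5, 77.0, 199.7], [ 8.9,  87.8, 2.0, 76.0, 176.0],
    [ 2.7,  17.1, 3.8, 69.0, 373.6], [ 1.7, 140.0, 1.0, 73.0, 283.7],
    [ 1.0,  55.0, 2.1, 79.0,  34.7], [ 0.2,  50.4, 0.8, 73.0,  36.4],
])

# Scale and mean-center the data
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Calculate the components
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

def residual_std(A):
    """Root-mean-square residual per object after A components (assumed definition)."""
    E = Z - (U[:, :A] * s[:A]) @ Vt[:A]
    return np.sqrt(np.mean(E ** 2, axis=1))

# Check residuals: one row per model size A = 0..4, one column per country
res = np.vstack([residual_std(A) for A in range(5)])
print(np.round(res, 2))
```

An object whose residual stays large while the others drop as components are added is a candidate outlier for that model.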


[Figure: wines, residual standard deviation (plot labels 0 to 4)]

[Figure: wines, residual standard deviation (plot labels 0 to 4), second view]

